
LiteLLM + MCP Server: Self-Hosted WHMCS AI (2026)

LiteLLM + MCP Server for WHMCS: self-host your AI, route 100+ models, and keep billing data on your servers. Native MCP Gateway support.


MX Modules Team

#whmcs #ai #mcp #automation #privacy

LiteLLM is an open-source proxy that lets you route requests to 100+ AI models through a single endpoint. Since version 1.80.18, it has native MCP Gateway support, meaning MCP tools work with every LLM backend connected to the proxy.

For hosting providers with strict data control requirements, LiteLLM + MCP Server for WHMCS is the self-hosted AI stack. Your billing data never leaves your infrastructure. No external API calls unless you choose to make them.

This guide covers what LiteLLM is, why it matters for WHMCS, how the MCP Gateway works, and when to choose LiteLLM over cloud alternatives like OpenRouter.

What Is LiteLLM?

LiteLLM is an open-source proxy server that sits between your AI applications and LLM providers. It translates requests into each provider's format, handles retries, manages API keys, and provides a unified OpenAI-compatible API.

What it offers hosting providers:

  • 100+ model support (Anthropic, OpenAI, Ollama, vLLM, Azure, Bedrock, and more)
  • Native MCP Gateway since v1.80.18
  • Self-hosted via Docker or Kubernetes
  • MCP permissions per API key and per team
  • Request logging and cost tracking built in
  • Open source (MIT license, free)

The critical difference between LiteLLM and cloud services like OpenRouter: LiteLLM runs on your servers. You control the entire data pipeline.

Why Self-Hosted AI Matters for WHMCS

WHMCS contains sensitive business data: client payment methods, invoice amounts, revenue figures, support ticket contents, service configurations. When you use a cloud AI service, this data passes through external servers.

Cloud providers like Anthropic and OpenAI state they do not train on business data. But "not training on data" and "data never leaving your infrastructure" are different guarantees.

Self-hosted AI with LiteLLM means:

  • WHMCS data stays on your server at every step
  • No billing data transits through external AI providers (if using local models)
  • Full audit trail under your control
  • Compliance with data residency requirements (GDPR, HIPAA-adjacent, SOC 2)
  • No vendor lock-in to any single AI provider

For hosting providers subject to data processing agreements or operating in regulated markets, self-hosted AI is not a preference. It is a requirement.

The Architecture: LiteLLM + MCP Server + WHMCS

[AI Client] → [LiteLLM Proxy] → [Local Model (Ollama/vLLM)]
                    ↓
              [MCP Gateway]
                    ↓
              [MCP Server] → [WHMCS] → [Database]

LiteLLM acts as both the model router and the MCP gateway. When an AI client sends a request that requires WHMCS tools, LiteLLM:

  1. Routes the request to the configured LLM (local or cloud)
  2. Receives the LLM's decision about which MCP tools to call
  3. Executes the tool calls through its MCP Gateway
  4. Passes each call to MCP Server, which runs it against WHMCS
  5. Returns the results back through the same pipeline

The MCP Gateway means the same WHMCS tools work no matter which LLM processes the request. Switch from Ollama to Claude to GPT-4 without reconfiguring MCP.
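From the client's side, the whole pipeline is a single OpenAI-style request to the proxy. A minimal sketch follows; the virtual key is a hypothetical placeholder, the model name `local-llama` is taken from the configuration example later in this guide, and only the standard `/chat/completions` shape is assumed.

```python
# Sketch of an AI client talking to LiteLLM's OpenAI-compatible endpoint.
# LiteLLM handles routing, MCP tool execution, and the final response.
import json
import urllib.request

LITELLM_BASE = "http://localhost:4000"  # LiteLLM proxy (see Docker example below)
LITELLM_KEY = "sk-litellm-demo"         # hypothetical virtual key

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style /chat/completions request aimed at LiteLLM."""
    body = json.dumps({
        "model": model,  # LiteLLM routes on this name; MCP tools attach at the gateway
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{LITELLM_BASE}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {LITELLM_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("local-llama", "List overdue invoices for client #1042")
# urllib.request.urlopen(req) would send it; the client never needs to know
# which backend model answered or which WHMCS tools were called.
```

The client code stays identical whether the proxy routes to Ollama on the same machine or to a cloud provider.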

LiteLLM MCP Gateway: How It Works

The MCP Gateway in LiteLLM (added in v1.80.18) provides:

Unified tool access across models: Configure MCP Server once in LiteLLM. Every connected model gets access to the same 45 WHMCS tools.

Per-key permissions: Create API keys in LiteLLM with specific MCP tool access. Your support team key can read tickets but not access financial data. Your finance team key can read invoices but not modify services.

Per-team scoping: Assign MCP permissions at the team level. Different departments see different subsets of WHMCS data.

Auto-execute tool calls: Configure require_approval: "never" for autonomous execution. Or require approval for write operations while auto-approving reads.
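Per-key scoping is driven through LiteLLM's key-management API. The sketch below assumes the proxy's `/key/generate` endpoint; note that the exact field name for MCP tool scoping varies by LiteLLM version, so `allowed_mcp_tools` and the WHMCS tool names here are illustrative placeholders, not the real schema — check your LiteLLM version's key-management docs.

```python
# Hedged sketch: mint a scoped virtual key for a support team that can
# read tickets but not touch financial tools.
import json
import urllib.request

ADMIN_KEY = "sk-litellm-master"  # hypothetical proxy master key

def support_team_key_payload() -> dict:
    """Payload for a support-team key with a restricted MCP tool set."""
    return {
        "key_alias": "support-team",
        "models": ["local-llama"],          # which routed models the key may use
        "allowed_mcp_tools": [              # placeholder field for MCP scoping
            "whmcs_get_ticket",             # hypothetical WHMCS tool names
            "whmcs_list_tickets",
        ],
    }

def build_key_request() -> urllib.request.Request:
    """POST /key/generate on the proxy with the scoped payload."""
    return urllib.request.Request(
        "http://localhost:4000/key/generate",
        data=json.dumps(support_team_key_payload()).encode(),
        headers={"Authorization": f"Bearer {ADMIN_KEY}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

A finance-team key would swap in invoice-reading tools instead; the point is that the restriction lives in the proxy, not in each client.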

Example LiteLLM Configuration

model_list:
  - model_name: "local-llama"
    litellm_params:
      model: "ollama/llama3.3"
      api_base: "http://localhost:11434"
  - model_name: "claude-sonnet"
    litellm_params:
      model: "claude-sonnet-4-20250514"
      api_key: "sk-ant-..."
 
mcp_servers:
  - name: "whmcs"
    url: "https://your-whmcs.com/modules/addons/mx_mcp/mcp/sse.php"
    api_key: "your-mcp-api-key"

This configuration gives both the local Llama model and Claude Sonnet access to WHMCS tools through MCP Server. The AI client talks to LiteLLM, LiteLLM handles everything else.
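Switching backends is then a one-string change on the client. A small sketch, using the two model names from the configuration above; the question text is a placeholder.

```python
# Same question, two backends: only the model name differs, and the WHMCS
# MCP tools configured in LiteLLM are available either way.
def chat_payload(model: str, question: str) -> dict:
    """OpenAI-style chat body for the LiteLLM proxy."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }

question = "Summarize open tickets for client #1042"
local = chat_payload("local-llama", question)    # stays fully on-prem
cloud = chat_payload("claude-sonnet", question)  # routed out to Anthropic

# Only the routing target differs; the request shape is identical.
assert local["messages"] == cloud["messages"]
```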

Deployment Options

Docker

docker run -d \
  --name litellm \
  -p 4000:4000 \
  -v ./litellm_config.yaml:/app/config.yaml \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml

LiteLLM runs on port 4000. Point your AI clients to http://localhost:4000 as the API base URL. If running on the same server as WHMCS, the entire AI pipeline stays on one machine.

Kubernetes (Enterprise Scale)

LiteLLM provides Helm charts for Kubernetes deployment. Use this for multi-node setups where you need horizontal scaling, load balancing, or high availability.

Alongside WHMCS

Many hosting providers run WHMCS on a dedicated server. Add LiteLLM and a local model (Ollama) to the same server or a server on the same network. The WHMCS MCP Server endpoint is accessible locally without public internet exposure.

Local Model Options for WHMCS

If you want zero external API calls, run local models via Ollama or vLLM behind LiteLLM:

| Model | Parameters | RAM Required | WHMCS Performance |
|---|---|---|---|
| Llama 3.3 70B | 70B | 48GB+ | Good for data retrieval and summaries |
| Llama 3.1 8B | 8B | 8GB | Adequate for simple lookups |
| Mistral 7B | 7B | 8GB | Fast, good for ticket summaries |
| DeepSeek R1 | 7B-671B | Varies | Strong reasoning for analysis |
| Qwen 2.5 72B | 72B | 48GB+ | Good multilingual support |

For pure WHMCS data retrieval ("show me client X", "list overdue invoices"), 8B parameter models on 8GB RAM handle the task. For analysis and reasoning ("predict churn", "calculate profitability"), 70B+ models produce better results but need 48GB+ RAM or GPU offloading.
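One practical pattern is to route on query complexity: send plain lookups to the small model and reasoning-heavy questions to the large one. The keyword heuristic and model names below are illustrative assumptions, not part of any LiteLLM API.

```python
# Sketch of client-side model selection: a cheap 8B model for plain
# lookups, a 70B-class model for reasoning-heavy questions.
REASONING_HINTS = ("predict", "churn", "profitability", "analyze", "compare", "why")

def pick_model(query: str) -> str:
    """Return the LiteLLM model name suited to the query."""
    q = query.lower()
    if any(hint in q for hint in REASONING_HINTS):
        return "local-llama-70b"   # 48GB+ RAM class, better reasoning
    return "local-llama-8b"        # 8GB RAM class, fine for lookups

print(pick_model("show me client X"))      # local-llama-8b
print(pick_model("predict churn for Q3"))  # local-llama-70b
```

In production you might route on intent classification rather than keywords, but the shape is the same: the client picks a model name and LiteLLM does the rest.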

LiteLLM vs OpenRouter vs Direct API

| Factor | LiteLLM | OpenRouter | Direct API |
|---|---|---|---|
| Self-hosted | Yes | No | No |
| Data stays on-prem | Yes (with local models) | No | No |
| MCP Gateway | Native (v1.80.18+) | Via MCP servers | Per-client config |
| Models available | 100+ (any you configure) | 500+ | One provider |
| Cost | Free (OSS) + hardware | 5.5% fee + model costs | Model costs only |
| Per-key MCP permissions | Yes | No | No |
| Per-team MCP scoping | Yes | No | No |
| Setup complexity | High (Docker, config) | Low (API key) | Low (API key) |
| Maintenance | You manage it | Managed by OpenRouter | Managed by provider |

Choose LiteLLM when:

  • Data residency is a requirement
  • You need per-team or per-key MCP permissions
  • You want to run local models (Ollama, vLLM)
  • You manage multiple teams accessing WHMCS data
  • You are comfortable with Docker and self-hosted infrastructure

Choose OpenRouter when:

  • You want managed infrastructure
  • Setup speed matters more than data control
  • You need 500+ models including latest releases
  • See our OpenRouter guide

Choose direct API when:

  • You use only one model provider (e.g., Claude only)
  • Simplicity is the priority
  • You have fewer than 500 queries per month

Cost Analysis: Self-Hosted vs Cloud

Self-hosted AI has different economics than cloud:

Cloud (OpenRouter + MCP Server):

  • MCP Server: $22/month
  • Model costs: $7-180/month (depends on usage and models)
  • Infrastructure: $0 (uses cloud providers)
  • Total: $29-202/month

Self-hosted (LiteLLM + Ollama + MCP Server):

  • MCP Server: $22/month
  • LiteLLM: $0 (open source)
  • Local model (Ollama): $0 (open source)
  • Hardware: Existing server (or $50-150/month for a dedicated GPU VPS)
  • Model API costs: $0 (if using only local models)
  • Total: $22/month (on existing hardware) or $72-172/month (dedicated GPU VPS)

If you already have server capacity with 48GB+ RAM, self-hosted is the cheapest option. The per-query cost is effectively zero. If you need to provision new hardware, the break-even point is around 5,000-10,000 queries per month compared to cloud alternatives.
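The break-even figure above is simple division: fixed monthly hardware cost over per-query cloud cost. The ~$0.015/query cloud figure is an assumption for a mid-size model; plug in your own numbers.

```python
# Worked break-even: a dedicated GPU VPS at $50-150/month versus paying
# per query in the cloud.
def break_even_queries(monthly_hw_cost: float, cloud_cost_per_query: float) -> int:
    """Queries per month at which self-hosted hardware pays for itself."""
    return round(monthly_hw_cost / cloud_cost_per_query)

print(break_even_queries(75.0, 0.015))   # 5000
print(break_even_queries(150.0, 0.015))  # 10000
```

Below the break-even volume, cloud is cheaper; above it, every additional query on your own hardware is effectively free.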

Security Considerations

LiteLLM adds another layer in the stack, which means another surface to secure:

Access control: LiteLLM has its own API key system. Set strong keys. Do not expose the LiteLLM API to the public internet unless needed.

Network isolation: Run LiteLLM on the same private network as your WHMCS. The MCP Server endpoint does not need to be publicly accessible if LiteLLM connects locally.

Logging: LiteLLM logs all requests including tool calls. Combined with MCP Server audit logs, you get full traceability: who asked what, which model processed it, which WHMCS tools were called, and what data was returned.

Updates: LiteLLM is actively developed. Keep it updated. The MCP Gateway is relatively new (v1.80.18+) and receives frequent improvements.

Frequently Asked Questions

Can I mix local and cloud models through LiteLLM? Yes. Configure Ollama for simple queries and Claude Sonnet for complex analysis in the same LiteLLM instance. Route based on model name in the request. Both get access to the same WHMCS MCP tools.

Does LiteLLM work with MCP Server out of the box? Yes, since version 1.80.18. Add MCP Server as an MCP endpoint in LiteLLM's configuration. The MCP Gateway handles tool discovery and execution.

How much hardware do I need? LiteLLM itself is lightweight (runs in Docker with minimal resources). The hardware requirement depends on the local model. For Llama 3.1 8B: 8GB RAM. For Llama 3.3 70B: 48GB RAM or a GPU with 40GB+ VRAM.

Is this GDPR compliant? Self-hosting with local models means WHMCS data never leaves your infrastructure. This satisfies data residency requirements. However, GDPR compliance involves more than data residency. Consult your compliance team for a full assessment.

Can multiple team members use this simultaneously? Yes. LiteLLM supports concurrent requests. Create separate API keys per team member with different MCP permissions. The support team sees tickets and clients. The finance team sees invoices and revenue.

Summary

LiteLLM + MCP Server for WHMCS is the self-hosted AI stack for hosting providers who need full data control. The native MCP Gateway means your WHMCS tools work with any model. Per-key and per-team permissions let you restrict access by role. Zero per-query cost when using local models.

The trade-off is setup complexity and hardware requirements. If you are comfortable with Docker and have server capacity, the economics and privacy benefits are worth the effort.

Next steps:

MCP Server: AI Integration for WHMCS

Connect AI to your WHMCS. Query clients, invoices, and tickets using natural language. Try free for 15 days.



MX Modules Team

We run a hosting business on WHMCS. These modules are the tools we built to solve our own problems, and now we share them with other providers.