
LiteLLM + MCP Server: Self-Hosted WHMCS AI (2026)

LiteLLM + MCP Server for WHMCS: self-host your AI, route 100+ models, and keep billing data on your servers. Native MCP Gateway support.


MX Modules Team

#whmcs #ai #mcp #automation #privacy

LiteLLM is an open-source proxy that lets you route requests to 100+ AI models through a single endpoint. Since version 1.80.18, it has native MCP Gateway support, meaning MCP tools work with every LLM backend connected to the proxy.

For hosting providers with strict data control requirements, LiteLLM + MCP Server for WHMCS is the self-hosted AI stack. Your billing data never leaves your infrastructure. No external API calls unless you choose to make them.

This guide covers what LiteLLM is, why it matters for WHMCS, how the MCP Gateway works, and when to choose LiteLLM over cloud alternatives like OpenRouter.

What Is LiteLLM?

LiteLLM is an open-source proxy server that sits between your AI applications and LLM providers. It translates requests into each provider's format, handles retries, manages API keys, and provides a unified OpenAI-compatible API.

What it offers hosting providers:

  • 100+ model support (Anthropic, OpenAI, Ollama, vLLM, Azure, Bedrock, and more)
  • Native MCP Gateway since v1.80.18
  • Self-hosted via Docker or Kubernetes
  • MCP permissions per API key and per team
  • Request logging and cost tracking built in
  • Open source (MIT license, free)

The critical difference between LiteLLM and cloud services like OpenRouter: LiteLLM runs on your servers. You control the entire data pipeline.

Why Self-Hosted AI Matters for WHMCS

WHMCS contains sensitive business data: client payment methods, invoice amounts, revenue figures, support ticket contents, service configurations. When you use a cloud AI service, this data passes through external servers.

Cloud providers like Anthropic and OpenAI state they do not train on business data. But "not training on data" and "data never leaving your infrastructure" are different guarantees.

Self-hosted AI with LiteLLM means:

  • WHMCS data stays on your server at every step
  • No billing data transits through external AI providers (if using local models)
  • Full audit trail under your control
  • Compliance with data residency requirements (GDPR, HIPAA-adjacent, SOC 2)
  • No vendor lock-in to any single AI provider

For hosting providers subject to data processing agreements or operating in regulated markets, self-hosted AI is not a preference. It is a requirement.

The Architecture: LiteLLM + MCP Server + WHMCS

[AI Client] → [LiteLLM Proxy] → [Local Model (Ollama/vLLM)]
                    ↓
              [MCP Gateway]
                    ↓
              [MCP Server] → [WHMCS] → [Database]

LiteLLM acts as both the model router and the MCP gateway. When an AI client sends a request that requires WHMCS tools, LiteLLM:

  1. Routes the request to the configured LLM (local or cloud)
  2. Receives the LLM's decision about which MCP tools to call
  3. Executes the tool calls through its MCP Gateway
  4. Passes each call to MCP Server, which runs it against WHMCS
  5. Returns the results back through the same pipeline

The MCP Gateway means the same WHMCS tools work no matter which LLM processes the request. Switch from Ollama to Claude to GPT-4 without reconfiguring MCP.
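From the client's side, the whole pipeline is a single OpenAI-style request to the proxy. A minimal sketch follows; the virtual key is a hypothetical placeholder, the model name `local-llama` is taken from the configuration example later in this guide, and only the standard `/chat/completions` shape is assumed.

```python
# Sketch of an AI client talking to LiteLLM's OpenAI-compatible endpoint.
# LiteLLM handles routing, MCP tool execution, and the final response.
import json
import urllib.request

LITELLM_BASE = "http://localhost:4000"  # LiteLLM proxy (see Docker example below)
LITELLM_KEY = "sk-litellm-demo"         # hypothetical virtual key

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style /chat/completions request aimed at LiteLLM."""
    body = json.dumps({
        "model": model,  # LiteLLM routes on this name; MCP tools attach at the gateway
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{LITELLM_BASE}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {LITELLM_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("local-llama", "List overdue invoices for client #1042")
# urllib.request.urlopen(req) would send it; the client never needs to know
# which backend model answered or which WHMCS tools were called.
```

The client code stays identical whether the proxy routes to Ollama on the same machine or to a cloud provider.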

LiteLLM MCP Gateway: How It Works

The MCP Gateway in LiteLLM (added in v1.80.18) provides:

Unified tool access across models: Configure MCP Server once in LiteLLM. Every connected model gets access to the same 45 WHMCS tools.

Per-key permissions: Create API keys in LiteLLM with specific MCP tool access. Your support team key can read tickets but not access financial data. Your finance team key can read invoices but not modify services.

Per-team scoping: Assign MCP permissions at the team level. Different departments see different subsets of WHMCS data.

Auto-execute tool calls: Configure require_approval: "never" for autonomous execution. Or require approval for write operations while auto-approving reads.
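Per-key scoping is driven through LiteLLM's key-management API. The sketch below assumes the proxy's `/key/generate` endpoint; note that the exact field name for MCP tool scoping varies by LiteLLM version, so `allowed_mcp_tools` and the WHMCS tool names here are illustrative placeholders, not the real schema — check your LiteLLM version's key-management docs.

```python
# Hedged sketch: mint a scoped virtual key for a support team that can
# read tickets but not touch financial tools.
import json
import urllib.request

ADMIN_KEY = "sk-litellm-master"  # hypothetical proxy master key

def support_team_key_payload() -> dict:
    """Payload for a support-team key with a restricted MCP tool set."""
    return {
        "key_alias": "support-team",
        "models": ["local-llama"],          # which routed models the key may use
        "allowed_mcp_tools": [              # placeholder field for MCP scoping
            "whmcs_get_ticket",             # hypothetical WHMCS tool names
            "whmcs_list_tickets",
        ],
    }

def build_key_request() -> urllib.request.Request:
    """POST /key/generate on the proxy with the scoped payload."""
    return urllib.request.Request(
        "http://localhost:4000/key/generate",
        data=json.dumps(support_team_key_payload()).encode(),
        headers={"Authorization": f"Bearer {ADMIN_KEY}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

A finance-team key would swap in invoice-reading tools instead; the point is that the restriction lives in the proxy, not in each client.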

Example LiteLLM Configuration

model_list:
  - model_name: "local-llama"
    litellm_params:
      model: "ollama/llama3.3"
      api_base: "http://localhost:11434"
  - model_name: "claude-sonnet"
    litellm_params:
      model: "claude-sonnet-4-20250514"
      api_key: "sk-ant-..."
 
mcp_servers:
  - name: "whmcs"
    url: "https://your-whmcs.com/modules/addons/mx_mcp/mcp/sse.php"
    api_key: "your-mcp-api-key"

This configuration gives both the local Llama model and Claude Sonnet access to WHMCS tools through MCP Server. The AI client talks to LiteLLM, LiteLLM handles everything else.
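Switching backends is then a one-string change on the client. A small sketch, using the two model names from the configuration above; the question text is a placeholder.

```python
# Same question, two backends: only the model name differs, and the WHMCS
# MCP tools configured in LiteLLM are available either way.
def chat_payload(model: str, question: str) -> dict:
    """OpenAI-style chat body for the LiteLLM proxy."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }

question = "Summarize open tickets for client #1042"
local = chat_payload("local-llama", question)    # stays fully on-prem
cloud = chat_payload("claude-sonnet", question)  # routed out to Anthropic

# Only the routing target differs; the request shape is identical.
assert local["messages"] == cloud["messages"]
```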

Deployment Options

Docker

docker run -d \
  --name litellm \
  -p 4000:4000 \
  -v ./litellm_config.yaml:/app/config.yaml \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml

LiteLLM runs on port 4000. Point your AI clients to http://localhost:4000 as the API base URL. If running on the same server as WHMCS, the entire AI pipeline stays on one machine.

Kubernetes (Enterprise Scale)

LiteLLM provides Helm charts for Kubernetes deployment. Use this for multi-node setups where you need horizontal scaling, load balancing, or high availability.

Alongside WHMCS

Many hosting providers run WHMCS on a dedicated server. Add LiteLLM and a local model (Ollama) to the same server or a server on the same network. The WHMCS MCP Server endpoint is accessible locally without public internet exposure.

Local Model Options for WHMCS

If you want zero external API calls, run local models via Ollama or vLLM behind LiteLLM:

| Model | Parameters | RAM Required | WHMCS Performance |
|---|---|---|---|
| Llama 3.3 70B | 70B | 48GB+ | Good for data retrieval and summaries |
| Llama 3.1 8B | 8B | 8GB | Adequate for simple lookups |
| Mistral 7B | 7B | 8GB | Fast, good for ticket summaries |
| DeepSeek R1 | 7B-671B | Varies | Strong reasoning for analysis |
| Qwen 2.5 72B | 72B | 48GB+ | Good multilingual support |

For pure WHMCS data retrieval ("show me client X", "list overdue invoices"), 8B parameter models on 8GB RAM handle the task. For analysis and reasoning ("predict churn", "calculate profitability"), 70B+ models produce better results but need 48GB+ RAM or GPU offloading.
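One practical pattern is to route on query complexity: send plain lookups to the small model and reasoning-heavy questions to the large one. The keyword heuristic and model names below are illustrative assumptions, not part of any LiteLLM API.

```python
# Sketch of client-side model selection: a cheap 8B model for plain
# lookups, a 70B-class model for reasoning-heavy questions.
REASONING_HINTS = ("predict", "churn", "profitability", "analyze", "compare", "why")

def pick_model(query: str) -> str:
    """Return the LiteLLM model name suited to the query."""
    q = query.lower()
    if any(hint in q for hint in REASONING_HINTS):
        return "local-llama-70b"   # 48GB+ RAM class, better reasoning
    return "local-llama-8b"        # 8GB RAM class, fine for lookups

print(pick_model("show me client X"))      # local-llama-8b
print(pick_model("predict churn for Q3"))  # local-llama-70b
```

In production you might route on intent classification rather than keywords, but the shape is the same: the client picks a model name and LiteLLM does the rest.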

LiteLLM vs OpenRouter vs Direct API

| Factor | LiteLLM | OpenRouter | Direct API |
|---|---|---|---|
| Self-hosted | Yes | No | No |
| Data stays on-prem | Yes (with local models) | No | No |
| MCP Gateway | Native (v1.80.18+) | Via MCP servers | Per-client config |
| Models available | 100+ (any you configure) | 500+ | One provider |
| Cost | Free (OSS) + hardware | 5.5% fee + model costs | Model costs only |
| Per-key MCP permissions | Yes | No | No |
| Per-team MCP scoping | Yes | No | No |
| Setup complexity | High (Docker, config) | Low (API key) | Low (API key) |
| Maintenance | You manage it | Managed by OpenRouter | Managed by provider |

Choose LiteLLM when:

  • Data residency is a requirement
  • You need per-team or per-key MCP permissions
  • You want to run local models (Ollama, vLLM)
  • You manage multiple teams accessing WHMCS data
  • You are comfortable with Docker and self-hosted infrastructure

Choose OpenRouter when:

  • You want managed infrastructure
  • Setup speed matters more than data control
  • You need 500+ models including latest releases
  • See our OpenRouter guide

Choose direct API when:

  • You use only one model provider (e.g., Claude only)
  • Simplicity is the priority
  • You have fewer than 500 queries per month

Cost Analysis: Self-Hosted vs Cloud

Self-hosted AI has different economics than cloud:

Cloud (OpenRouter + MCP Server):

  • MCP Server: $22/month
  • Model costs: $7-180/month (depends on usage and models)
  • Infrastructure: $0 (uses cloud providers)
  • Total: $29-202/month

Self-hosted (LiteLLM + Ollama + MCP Server):

  • MCP Server: $22/month
  • LiteLLM: $0 (open source)
  • Local model (Ollama): $0 (open source)
  • Hardware: Existing server (or $50-150/month for a dedicated GPU VPS)
  • Model API costs: $0 (if using only local models)
  • Total: $22/month (on existing hardware) or $72-172/month (dedicated GPU VPS)

If you already have server capacity with 48GB+ RAM, self-hosted is the cheapest option. The per-query cost is effectively zero. If you need to provision new hardware, the break-even point is around 5,000-10,000 queries per month compared to cloud alternatives.
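The break-even figure above is simple division: fixed monthly hardware cost over per-query cloud cost. The ~$0.015/query cloud figure is an assumption for a mid-size model; plug in your own numbers.

```python
# Worked break-even: a dedicated GPU VPS at $50-150/month versus paying
# per query in the cloud.
def break_even_queries(monthly_hw_cost: float, cloud_cost_per_query: float) -> int:
    """Queries per month at which self-hosted hardware pays for itself."""
    return round(monthly_hw_cost / cloud_cost_per_query)

print(break_even_queries(75.0, 0.015))   # 5000
print(break_even_queries(150.0, 0.015))  # 10000
```

Below the break-even volume, cloud is cheaper; above it, every additional query on your own hardware is effectively free.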

Security Considerations

LiteLLM adds another layer in the stack, which means another surface to secure:

Access control: LiteLLM has its own API key system. Set strong keys. Do not expose the LiteLLM API to the public internet unless needed.

Network isolation: Run LiteLLM on the same private network as your WHMCS. The MCP Server endpoint does not need to be publicly accessible if LiteLLM connects locally.

Logging: LiteLLM logs all requests including tool calls. Combined with MCP Server audit logs, you get full traceability: who asked what, which model processed it, which WHMCS tools were called, and what data was returned.

Updates: LiteLLM is actively developed. Keep it updated. The MCP Gateway is relatively new (v1.80.18+) and receives frequent improvements.

Frequently Asked Questions

Can I mix local and cloud models through LiteLLM? Yes. Configure Ollama for simple queries and Claude Sonnet for complex analysis in the same LiteLLM instance. Route based on model name in the request. Both get access to the same WHMCS MCP tools.

Does LiteLLM work with MCP Server out of the box? Yes, since version 1.80.18. Add MCP Server as an MCP endpoint in LiteLLM's configuration. The MCP Gateway handles tool discovery and execution.

How much hardware do I need? LiteLLM itself is lightweight (runs in Docker with minimal resources). The hardware requirement depends on the local model. For Llama 3.1 8B: 8GB RAM. For Llama 3.3 70B: 48GB RAM or a GPU with 40GB+ VRAM.

Is this GDPR compliant? Self-hosting with local models means WHMCS data never leaves your infrastructure. This satisfies data residency requirements. However, GDPR compliance involves more than data residency. Consult your compliance team for a full assessment.

Can multiple team members use this simultaneously? Yes. LiteLLM supports concurrent requests. Create separate API keys per team member with different MCP permissions. The support team sees tickets and clients. The finance team sees invoices and revenue.

Summary

LiteLLM + MCP Server for WHMCS is the self-hosted AI stack for hosting providers who need full data control. The native MCP Gateway means your WHMCS tools work with any model. Per-key and per-team permissions let you restrict access by role. Zero per-query cost when using local models.

The trade-off is setup complexity and hardware requirements. If you are comfortable with Docker and have server capacity, the economics and privacy benefits are worth the effort.

Next steps:

MCP Server: AI Integration for WHMCS

Connect AI to your WHMCS. Query clients, invoices, and tickets using natural language. Try free for 15 days.



MX Modules Team

We run a hosting business on WHMCS. These modules are the tools we built to solve our own problems, and now we share them with other providers.