How free-claude-code Runs Claude Code on Any Model

Q: Is free-claude-code against Anthropic's terms?

It does not use Anthropic's API, auth, or billing. It replaces the Anthropic backend with another provider via the supported ANTHROPIC_BASE_URL variable and a dummy local token — the same documented pattern gateways like LiteLLM use.

Q: How does it make Claude Code free?

By routing requests to providers with genuine free tiers (NVIDIA NIM, Gemini AI Studio, OpenRouter free models, Mistral Experiment) or to local models via Ollama, LM Studio, or llama.cpp. You bring your own access.

Q: Can I run Claude Code fully locally?

Yes. The catalog ships local providers with default localhost base URLs (Ollama :11434, LM Studio :1234, llama.cpp :8080). Point the proxy at one and no request leaves your machine.

Q: Does it work with the VS Code extension?

Yes. Add ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, and the gateway model-discovery flag to the extension's environment-variables setting. It also supports the CLI and JetBrains ACP.

Q: How many providers does it support?

Seventeen hosted and local backends, including NVIDIA NIM, OpenRouter, Gemini, DeepSeek, Mistral, Cerebras, Groq, Fireworks, Z.ai, LM Studio, llama.cpp, and Ollama.

Q: Will quality match real Claude models?

No — you run whichever model you route to, not Anthropic's models. A strong hosted model can come close on coding tasks; a small local model trades quality for privacy and zero cost.

Q: Are my prompts private on free tiers?

Not necessarily. Free tiers vary, and the project's docs flag that Gemini's free tier may use prompts to improve Google's products in some regions. For privacy, route to a local model.

Q: How do I send Opus and Haiku to different models?

Set MODEL_OPUS, MODEL_SONNET, and MODEL_HAIKU independently in the Admin UI; leave a tier blank to inherit the fallback MODEL. The router resolves each tier to its own provider.

Reference: Alishahryar1/free-claude-code — MIT · Python

free-claude-code is a local proxy that speaks Anthropic's Messages API, so the Claude Code client thinks it is talking to Anthropic while the proxy quietly routes every request to one of 17 other model backends — free provider tiers, your own paid keys, or a model running on your own machine. The "free" is legitimate, not a billing hack.

What does free-claude-code actually do?

free-claude-code is a FastAPI server that exposes Anthropic-compatible endpoints (/v1/messages, /v1/models) on localhost:8082. You point Claude Code at it with two environment variables, and it translates and forwards each request to a provider you choose. The client never knows the difference.¹

This works because Claude Code reads ANTHROPIC_BASE_URL to decide where to send traffic. The same technique powers gateways like LiteLLM, whose docs tell you to export ANTHROPIC_BASE_URL="http://0.0.0.0:4000" to route Claude Code "through centralized authentication, usage tracking, and cost controls."² free-claude-code is a member of that established class, not an exploit.

The repo has drawn 32,378 stars and 4,916 forks since it was created on 2026-01-28, and ships under the MIT license.³ It is plain Python — 1.9 MB of it — built on FastAPI, httpx, and pydantic.³

Is "free" actually free, and is it allowed?

Yes, and the mechanism matters. free-claude-code does not touch Anthropic's auth or billing — it replaces the Anthropic backend entirely. No Anthropic key is involved; the ANTHROPIC_AUTH_TOKEN you set is the literal dummy string freecc, a token for your own local proxy.¹ Claude Code, the client, is free to download; free-claude-code just points it elsewhere.

"Free" then rests on two honest sources. First, real free tiers: NVIDIA NIM, Google's Gemini AI Studio, OpenRouter's free-models collection, and Mistral's Experiment plan. Second, fully local inference — Ollama, LM Studio, and llama.cpp, whose default base URLs in the catalog are all localhost.⁴ No leaked keys, no credit stacking. As Anthropic's own settings docs note, ANTHROPIC_AUTH_TOKEN is a first-class, supported variable.⁵

How does the request actually flow?

The critical path is short: Claude Code sends an Anthropic Messages request, the router resolves which model to use, optional local shortcuts intercept busywork, a normalizer reshapes the response, and a provider adapter talks to the real upstream — hosted or local.¹

flowchart LR
    client["Claude Code<br/>CLI / IDE / Bots"] -->|Anthropic Messages| api["/v1/messages<br/>(FastAPI)"]
    api --> router{"Model Router<br/>Opus/Sonnet/Haiku"}
    router --> opt["Local optimizations<br/>quota / prefix probes"]
    opt --> norm["Protocol normalizer<br/>streaming, tools, thinking"]
    norm --> adapters["Provider adapters"]
    adapters --> hosted["Hosted: NIM, Gemini,<br/>OpenRouter, DeepSeek…"]
    adapters --> local["Local: Ollama,<br/>LM Studio, llama.cpp"]

Every hop is a small, testable unit, which is why the project ships with pytest contract tests and a strict type checker in CI.¹

The model router: one name in, a provider out

The router's whole job is to turn a Claude model name into a concrete provider and model. It supports two paths: a direct provider/model slug like zai/glm-5.1, or per-tier mapping where Opus, Sonnet, and Haiku each route to a different backend.⁶

@dataclass(frozen=True, slots=True)
class ResolvedModel:
    original_model: str
    provider_id: str
    provider_model: str
    provider_model_ref: str
    thinking_enabled: bool

The result is a frozen, slotted dataclass — immutable and memory-lean. That per-tier routing is the clever part: you can send expensive Opus traffic to a strong hosted model, push Haiku to a local 8B model, and leave a cheap fallback for everything else.⁶

Two transports, registered as data

free-claude-code talks to 17 backends without 17 bespoke code paths. Each provider is described by a frozen ProviderDescriptor whose transport_type is a two-value literal: openai_chat or anthropic_messages.⁴ OpenAI-style providers get their streaming chat completions translated into Anthropic server-sent events; native providers pass through.

TransportType = Literal["openai_chat", "anthropic_messages"]

Adding a provider is data plus a factory function, not a new branch in the request handler. The catalog module deliberately imports zero provider implementations, a boundary enforced by contract tests.⁴ This is the reusable lesson: model the difference between integrations as a field, and the integrations stop multiplying your control flow.

The guard that turns a 500 into a crash

My favorite detail is a single assertion that runs at module import. The registry maintains three parallel sets — provider descriptors, factory functions, and supported provider IDs — and the module flatly refuses to load if those three sets ever disagree with each other, turning a wiring mistake into an immediate startup failure.⁷

if set(PROVIDER_DESCRIPTORS) != set(SUPPORTED_PROVIDER_IDS) or set(
    PROVIDER_FACTORIES
) != set(SUPPORTED_PROVIDER_IDS):
    raise AssertionError(
        "PROVIDER_DESCRIPTORS, PROVIDER_FACTORIES, and SUPPORTED_PROVIDER_IDS "
        "are out of sync"
    )

Forget to wire a factory for a new provider and the server will not start, instead of failing mysteriously on the first request that needs it. It is a cheap invariant with a high payoff — the kind of thing worth stealing for any plugin registry.

What I would steal from it

Three ideas travel well beyond this repo. First, emulate the client's protocol instead of forking the client — a stable interface plus a swappable backend is the entire product. Second, answer the client's busywork locally to save latency and quota. Third, the import-time invariant guard.

The local-busywork trick is the quiet winner: free-claude-code intercepts trivial Claude Code probes like quota checks and command-prefix detection and replies without ever calling an upstream model.⁸ None of these ideas are exotic, and all of them are easy to copy into your own tools — the same way a bring-your-own-key setup keeps your AI choices in your hands rather than a vendor's.

Is it worth using?

If you want to run Claude Code's polished client experience on a model you already pay for, or on a model running entirely on your own laptop, free-claude-code is a clean, well-typed, actively maintained way to do exactly that without forking anything. The seam is the product.

Treat the free tiers with eyes open: they carry rate limits, and some — Google's Gemini free tier among them — may use your prompts to improve their products outside certain regions.⁹ For private or sensitive work, route to a local model; the architecture makes that a one-line config change.

Frequently Asked Questions

Is free-claude-code against Anthropic's terms?

It does not use Anthropic's API, auth, or billing at all. It replaces the Anthropic backend with another provider by setting ANTHROPIC_BASE_URL — a supported variable — and uses a dummy local token. This is the same documented pattern gateways like LiteLLM rely on.²⁵

How does it make Claude Code "free"?

By routing Claude Code's requests to providers with genuine free tiers (NVIDIA NIM, Gemini AI Studio, OpenRouter free models, Mistral Experiment) or to models running locally on your own machine via Ollama, LM Studio, or llama.cpp. You bring your own access; nothing is bypassed.⁴

Can I run Claude Code fully locally?

Yes. The catalog ships local providers with default localhost base URLs: Ollama at :11434, LM Studio at :1234, llama.cpp at :8080. Point the proxy at one, set the model slug, and no request leaves your machine.⁴

Does it work with the VS Code extension?

Yes. You add ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, and the gateway model-discovery flag to the extension's environment-variables setting. It also supports the CLI and JetBrains ACP.¹

How many providers does it support?

Seventeen, spanning hosted gateways and local servers: NVIDIA NIM, OpenRouter, Gemini, DeepSeek, Mistral, Codestral, OpenCode Zen and Go, Wafer, Kimi, Cerebras, Groq, Fireworks, Z.ai, LM Studio, llama.cpp, and Ollama.¹

Will quality match real Claude models?

No — you are running whichever model you route to, not Anthropic's models. A strong hosted model can come close on many coding tasks; a small local model trades quality for privacy and zero cost. Per-tier routing lets you mix the two.⁶

Are my prompts private on free tiers?

Not necessarily. Free tiers vary, and the project's own docs flag that Gemini's free tier may use prompts to improve Google's products in some regions. For privacy, route to a local model.⁹

How do I send Opus and Haiku to different models?

Set MODEL_OPUS, MODEL_SONNET, and MODEL_HAIKU independently in the local Admin UI; leave a tier blank to inherit the fallback MODEL. The router resolves each tier to its own provider.⁶

The best engineering stories are not new models — they are clever seams between the tools you already use.

If you like keeping your AI choices in your own hands, you might like mnmnote.com, a local-first notes app where you bring your own key. See also: Run Private AI on Your Own Notes (No Cloud).

free-claude-code README, GitHub. https://github.com/Alishahryar1/free-claude-code (accessed 2026-06-05). ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶
"Using Claude Code with LiteLLM Proxy," LiteLLM docs. https://docs.litellm.ai/docs/tutorials/claude_responses_api (accessed 2026-06-05). ↩ ↩²
GitHub REST API, repos/Alishahryar1/free-claude-code and /languages — 32,378 stars, 4,916 forks, MIT, Python 1,926,681 bytes (accessed 2026-06-05). ↩ ↩²
config/provider_catalog.py, free-claude-code. https://github.com/Alishahryar1/free-claude-code/blob/main/config/provider_catalog.py (accessed 2026-06-05). ↩ ↩² ↩³ ↩⁴ ↩⁵
Claude Code settings — environment variables, Anthropic. https://code.claude.com/docs/en/settings (accessed 2026-06-05). ↩ ↩²
api/model_router.py, free-claude-code. https://github.com/Alishahryar1/free-claude-code/blob/main/api/model_router.py (accessed 2026-06-05). ↩ ↩² ↩³ ↩⁴
providers/registry.py, free-claude-code. https://github.com/Alishahryar1/free-claude-code/blob/main/providers/registry.py (accessed 2026-06-05). ↩
api/optimization_handlers.py, free-claude-code. https://github.com/Alishahryar1/free-claude-code/blob/main/api/optimization_handlers.py (accessed 2026-06-05). ↩
free-claude-code README, Google AI Studio (Gemini) provider section — free-tier data-use note. https://github.com/Alishahryar1/free-claude-code (accessed 2026-06-05). ↩ ↩²