Ask a developer what AI coding assistant they use and you’ll get one of a handful of answers: GitHub Copilot, Cursor, maybe Claude or ChatGPT for longer tasks. The commercial tools have done a good job of capturing mindshare, and to be honest, they’re genuinely impressive. But there’s a growing cohort of developers who aren’t comfortable with their code leaving their machine — and for them, the open-source and self-hosted ecosystem has matured considerably in 2026.
If your codebase contains sensitive business logic or a proprietary algorithm, processes personal data under GDPR, or simply belongs to a company with a restrictive data handling policy, the “just use Copilot” answer might not be available to you. Here’s what the alternatives actually look like now.
Why the commercial tools aren’t always an option
The data concern is real and specific. When you use Copilot, the context it needs to generate suggestions, including portions of your code, passes through Microsoft’s infrastructure, along with telemetry. For most developers on most projects, that’s an acceptable trade-off. But “most” isn’t “all”.
Financial services firms, law firms, defence contractors, and healthcare companies often have policies that prevent source code from leaving the organisation’s infrastructure — or at minimum require that any third-party processing is covered by a data processing agreement that the vendor’s standard terms don’t necessarily provide.
There’s also a cost argument. GitHub Copilot Individual is around £10/month. Multiply that across a team of 20 developers and you’re spending £2,400/year before you’ve bought a single server. If you have the infrastructure, running a self-hosted model can work out significantly cheaper at scale.
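The arithmetic above can be sketched as a quick calculation. The £10 per seat and 20 developers come from the figures above; the £3,000/year self-hosting cost is a purely hypothetical placeholder, so substitute your own hardware, power, and maintenance estimate.

```python
import math


def annual_subscription_cost(seats: int, price_per_seat_per_month: float) -> float:
    """Yearly spend on a per-seat subscription."""
    return seats * price_per_seat_per_month * 12


def break_even_seats(annual_self_hosted_cost: float, price_per_seat_per_month: float) -> int:
    """Smallest team size at which the subscription costs at least as much as self-hosting."""
    return math.ceil(annual_self_hosted_cost / (price_per_seat_per_month * 12))


# Figures from the article: 20 developers at roughly 10 GBP per month each.
team_cost = annual_subscription_cost(seats=20, price_per_seat_per_month=10)

# Hypothetical self-hosting budget (an assumption, not a measured figure).
seats_needed = break_even_seats(annual_self_hosted_cost=3000, price_per_seat_per_month=10)
```

Under that assumed budget, self-hosting only starts to pay for itself once the team is larger than the 20 developers in the example, which is why the “at scale” qualifier matters.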
The toolchain: Continue + Ollama
The most practical setup for local AI coding assistance right now is Continue (the VS Code and JetBrains extension) paired with Ollama (a tool for running LLMs on your own machine).
Continue is the interface layer — it integrates into your editor, handles the autocomplete UX, manages context windows, and routes requests to whatever model you point it at. Ollama handles the actual model serving: you pull a model, it runs locally, and Continue talks to it over a local API.
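To make the “local API” part concrete, here is a minimal sketch of calling Ollama’s HTTP endpoint directly, which is roughly the kind of request an editor extension sends on your behalf. It assumes Ollama is running on its default port (11434) and that a model tagged `qwen2.5-coder:7b` has already been pulled; the helper names `build_request` and `complete` are illustrative, not part of any library.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed, adjust if yours differs).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "qwen2.5-coder:7b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, model: str = "qwen2.5-coder:7b", timeout: float = 120.0) -> str:
    """Send the prompt to the local model and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `complete("Write a Python function that reverses a string.")` should return the model’s answer as a string. The request only ever goes to localhost, which is the whole point of the setup.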
The model choice matters a lot here. For coding specifically, Qwen2.5-Coder and DeepSeek-Coder-V2 are the current standouts in the open-source space. Both are genuinely competitive with older versions of Copilot for straightforward autocomplete and function completion. They’re not as strong as GPT-4o or Claude Sonnet 4.6 for complex multi-file refactors, but for the day-to-day “write this function” use case they’re more than adequate.
To run this on your own machine, you’ll want a reasonably recent GPU — a 12GB VRAM card handles the 7B parameter models well. On an M-series Mac (M2 Pro or later with 32GB+ unified memory), it runs smoothly. On a standard developer machine without a dedicated GPU, it’s slower but workable for smaller models.
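A rough back-of-the-envelope check shows why a 12GB card is comfortable for 7B models. The sketch below estimates memory for the weights alone (parameters times bits per weight); the 2 GB overhead for context cache and runtime is an assumed placeholder, and real usage varies with context length and the serving software.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory needed to hold the model weights, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_on_card(params_billions: float, bits_per_weight: int,
                 vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Rough fit check. overhead_gb is an assumed allowance, not a measured value."""
    return weight_memory_gb(params_billions, bits_per_weight) + overhead_gb <= vram_gb


# A 7B model at common precisions: 16-bit, 8-bit, and 4-bit quantised weights.
for bits in (16, 8, 4):
    print(bits, "bit:", weight_memory_gb(7, bits), "GB weights,",
          "fits in 12 GB" if fits_on_card(7, bits, 12) else "does not fit in 12 GB")
```

By this estimate the quantised 8-bit and 4-bit variants fit with room to spare, while unquantised 16-bit weights alone already exceed 12 GB.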
Cody by Sourcegraph
Cody is worth mentioning as a different approach: it’s AI coding assistance designed around codebase context. Where Copilot works primarily from your current file and recent context, Cody indexes your entire codebase (or a large chunk of it) and uses that for context when answering questions or generating code.
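To illustrate the general idea of codebase-wide context, here is a deliberately simple toy: rank files by how often they mention the words in a question, then hand the top matches to the model. This is not how Cody works internally (real tools use far more capable search and indexing); `tokenize` and `rank_files` are hypothetical names for this sketch only.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lowercase identifier-like tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*", text.lower())


def rank_files(question: str, files: dict[str, str], top_k: int = 2) -> list[str]:
    """Return up to top_k file paths whose contents best overlap with the question."""
    question_terms = set(tokenize(question))
    scored = []
    for path, content in files.items():
        counts = Counter(tokenize(content))
        # Score is the total number of occurrences of question terms in the file.
        score = sum(counts[term] for term in question_terms)
        scored.append((score, path))
    # Highest score first; ties broken alphabetically by path.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for score, path in scored[:top_k] if score > 0]
```

Given a question about invoices, a file that mentions invoice totals ranks above an unrelated authentication module, so it would be the one included as context.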
The self-hosted version of Cody connects to your own Sourcegraph instance — which you can run on your own infrastructure — and uses whichever LLM backend you configure. That gives you a genuinely powerful “ask questions about your codebase” experience without your code leaving your infrastructure.
The setup overhead is higher than with the Ollama route, since you’re deploying a Sourcegraph server rather than just running a model locally, but for a team working on a large, complex codebase, the context-awareness is worth it.
Tabby: the self-hosted Copilot alternative
Tabby is an open-source, self-hostable coding assistant designed specifically to be a drop-in replacement for Copilot. You run a Tabby server on your own infrastructure (it has decent Docker and Kubernetes support), and developers connect to it through IDE extensions for VS Code, JetBrains, and Neovim.
It supports a range of base models, including Codestral and the StarCoder2 family, and has a reasonably polished admin UI for managing models and monitoring usage. If you need to deploy AI coding assistance across a team on your own servers, Tabby is probably the most mature option right now.
The honest trade-offs
The self-hosted approach has real costs, and it’s worth being direct about them.
You’re on the hook for infrastructure. Running models locally is fine for individual developers; running them for a team of 30 means setting up and maintaining a server, managing model updates, and handling uptime. That’s not free.
Quality still lags the frontier models. DeepSeek-Coder-V2 and Qwen2.5-Coder are impressive for their size, but they’re not Claude Sonnet 4.6 or GPT-4o. If your developers do a lot of complex architectural work, exploratory coding, or debugging across large codebases, they’ll feel the gap.
Context windows are smaller. Local setups typically run with 8K–32K token context windows, partly because longer contexts consume more memory. Commercial tools increasingly offer much larger contexts, which matters when you’re working across multiple files.
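To get a feel for what those window sizes mean in practice, the sketch below estimates whether a set of files fits in a given context window. It assumes roughly four characters per token, which is only a rule of thumb (actual counts depend on the model’s tokenizer and on the code itself), and it reserves an assumed 1,024 tokens for the model’s reply.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; the chars_per_token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(texts: list[str], context_window: int,
                    reserved_for_output: int = 1024) -> bool:
    """Check whether all texts plus space reserved for the reply fit in the window."""
    prompt_tokens = sum(estimate_tokens(t) for t in texts)
    return prompt_tokens + reserved_for_output <= context_window


# Stand-ins for source files of about 8,000 characters each (roughly 2,000 tokens).
sample_file = "x" * 8000
three_files_fit_8k = fits_in_context([sample_file] * 3, context_window=8192)
four_files_fit_8k = fits_in_context([sample_file] * 4, context_window=8192)
four_files_fit_32k = fits_in_context([sample_file] * 4, context_window=32768)
```

By this estimate, three such files fit in an 8K window but a fourth does not, whereas a 32K window takes all four easily. That is the practical difference when a change spans several files.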
That said, for the use cases where self-hosting makes sense — a team that processes sensitive code, a company with data residency requirements, or developers who simply prefer to keep their work private — the toolchain in 2026 is good enough to be a genuine choice rather than a compromise.
The fact that you’re even weighing it seriously says something about how far the open-source ecosystem has come.