Local LLMs with Ollama on Your Homelab Server
Self-host your own language models – no cloud, no data leaving your network
This article is part of the AI Agents Guide – the curated learning path for AI agents.
Cloud LLMs are convenient, but not always the right choice: sensitive data, ongoing cost, dependence on one vendor. With Ollama you run language models locally on your own server – surprisingly easily. This article shows the entry, including an honest hardware reality check.
Why local?
- Privacy: Nothing leaves your server – ideal for private or confidential content.
- Cost: No token billing; just electricity.
- Offline & control: Runs without internet, no API limits, no model swapping overnight.
The price: local models are (still) weaker than the big cloud models – but perfectly sufficient for many tasks.
Installation
On a Linux server one line is enough:
curl -fsSL https://ollama.com/install.sh | sh
This installs Ollama as a service running in the background. (If you prefer it cleanly contained, use the official Docker image instead.)
Load your first model and chat
One command downloads the model and starts the chat directly:
ollama run llama3.2
The first time the model is downloaded, after that you land in an interactive prompt. With ollama list you see installed models, with ollama pull mistral you fetch more. Small models (1–4B parameters) are fast, larger ones (8–14B) need more RAM and patience.
The hardware reality
Honestly, the hardware decides how much fun this is. A rough rule of thumb for RAM needs:
- 3B model: ~4–6 GB – runs smoothly on almost any mini PC.
- 7–8B model: ~8–10 GB – usable, the sweet spot for a homelab.
- 14B+: 16 GB and up – noticeably slow without a GPU.
A GPU speeds things up enormously but isn't a must – small models run decently on CPU. What matters is enough RAM.
In my setup Ollama runs on the same mini PC as the rest of the homelab – with 24 GB of RAM, 7–8B models are no problem:
Ad · Affiliate link – if you buy through it, I may earn a commission. It doesn’t change the price for you.
Ollama as an API and with a UI
It gets interesting when other programs use Ollama. The service automatically exposes an API on port 11434:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Explain MQTT in one sentence.",
"stream": false
}'
Ollama also speaks an OpenAI-compatible API – so it plugs into many tools. A comfortable chat UI you get, for example, with LibreChat, which I described in detail here.
What I left out
- GPU setup (CUDA/ROCm) – worth it, but a topic of its own.
- Quantization – why a model comes in several sizes and which to pick.
- Custom Modelfiles – baking in a system prompt and parameters.
Conclusion & outlook
With one line of installation and one command your own LLM runs – local, private, free to operate. Through the API it becomes the basis for your own AI applications. And if you want to understand how such a model becomes an agent that uses tools on its own: that's exactly what the next article is about.
Ad · Affiliate link – if you buy through it, I may earn a commission. It doesn’t change the price for you.