Local LLMs with Ollama on Your Homelab Server

Ollama: your own language model, running locally on the homelab server.

Part of a guide

This article is part of the AI Agents Guide – the curated learning path for AI agents.

Cloud LLMs are convenient, but not always the right choice: sensitive data, ongoing cost, dependence on one vendor. With Ollama you run language models locally on your own server – surprisingly easily. This article shows the entry, including an honest hardware reality check.

Why local?

Privacy: Nothing leaves your server – ideal for private or confidential content.
Cost: No token billing; just electricity.
Offline & control: Runs without internet, no API limits, no model swapping overnight.

The price: local models are (still) weaker than the big cloud models – but perfectly sufficient for many tasks.

Installation

On a Linux server one line is enough:

bash

curl -fsSL https://ollama.com/install.sh | sh

This installs Ollama as a service running in the background. (If you prefer it cleanly contained, use the official Docker image instead.)

Load your first model and chat

One command downloads the model and starts the chat directly:

bash

ollama run llama3.2

The first time the model is downloaded, after that you land in an interactive prompt. With ollama list you see installed models, with ollama pull mistral you fetch more. Small models (1–4B parameters) are fast, larger ones (8–14B) need more RAM and patience.

The hardware reality

Honestly, the hardware decides how much fun this is. A rough rule of thumb for RAM needs:

3B model: ~4–6 GB – runs smoothly on almost any mini PC.
7–8B model: ~8–10 GB – usable, the sweet spot for a homelab.
14B+: 16 GB and up – noticeably slow without a GPU.

A GPU speeds things up enormously but isn't a must – small models run decently on CPU. What matters is enough RAM.

In my setup Ollama runs on the same mini PC as the rest of the homelab – with 24 GB of RAM, 7–8B models are no problem:

Ad · Affiliate link – if you buy through it, I may earn a commission. It doesn’t change the price for you.

Beelink SER5 Max (Ryzen 7 6800U, 24 GB RAM, 500 GB SSD) Amazon

Sparsamer Mini-PC für Homelab & Self-Hosting – genug RAM für Proxmox, Docker-Stacks oder eine Home-Assistant-VM.

View on Amazon

Ollama as an API and with a UI

It gets interesting when other programs use Ollama. The service automatically exposes an API on port 11434:

bash

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain MQTT in one sentence.",
  "stream": false
}'

Ollama also speaks an OpenAI-compatible API – so it plugs into many tools. A comfortable chat UI you get, for example, with LibreChat, which I described in detail here.

What I left out

GPU setup (CUDA/ROCm) – worth it, but a topic of its own.
Quantization – why a model comes in several sizes and which to pick.
Custom Modelfiles – baking in a system prompt and parameters.

Conclusion & outlook

With one line of installation and one command your own LLM runs – local, private, free to operate. Through the API it becomes the basis for your own AI applications. And if you want to understand how such a model becomes an agent that uses tools on its own: that's exactly what the next article is about.

// More recommendations

Ad · Affiliate link – if you buy through it, I may earn a commission. It doesn’t change the price for you.

netcup – 5 € Gutschein für Neukunden Hosting

5 € Rabatt für netcup-Neukunden (gilt nicht für Domains). Beim Bestellen einlösen.

View offer

Code: 36nc17813356860

ollama llm ai self-hosting homelab linux