# Local Models (GGUF)
Run inference entirely on your own hardware — no Ollama, no API key. Contenox can download GGUF model files directly from HuggingFace and serve them via a built-in llama.cpp backend.
## Curated models

Run `contenox model registry-list` to see all available models with sizes. The table below lists the curated set; approximate VRAM figures assume Q4_K_M quantization.
| Name | Description | ~VRAM |
|---|---|---|
| `tiny` | FastThink 0.5B (testing only) | ~1 GB |
| `llama3.2-1b` | Llama 3.2 1B | ~1 GB |
| `qwen2.5-1.5b` | Qwen 2.5 1.5B | ~1 GB |
| `granite-3.2-2b` | IBM Granite 3.2 2B | ~1 GB |
| `qwen3-4b` | Qwen 3 4B | ~3 GB |
| `gemma4-e2b` | Gemma 4 E2B | ~3 GB |
| `phi-4-mini` | Microsoft Phi-4 Mini | ~3 GB |
| `gemma4-e4b` | Gemma 4 E4B | ~5 GB |
| `granite-3.2-8b` | IBM Granite 3.2 8B | ~5 GB |
| `qwen2.5-7b` | Qwen 2.5 7B | ~5 GB |
| `qwen3-14b` | Qwen 3 14B | ~9 GB |
| `qwen3-30b` | Qwen 3 30B (MoE, fast) | ~19 GB |
| `kimi-linear` | Kimi Linear 48B (MoE) | ~30 GB |
| `llama4-scout` | Llama 4 Scout 17Bx16E | ~68 GB |
Note

Multi-GPU models (`llama4-scout`) require several GPUs or unified memory. MoE models (`qwen3-30b`, `kimi-linear`) use far less active VRAM than their total parameter count suggests.
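The ~VRAM column can be sanity-checked with a back-of-the-envelope rule: Q4_K_M stores roughly 4.85 bits per weight, plus some headroom for the KV cache and runtime buffers. The bits-per-weight figure and the overhead constant below are rough assumptions for illustration, not contenox internals:

```python
def approx_vram_gb(params_billion: float,
                   bits_per_weight: float = 4.85,  # rough Q4_K_M average
                   overhead_gb: float = 0.7) -> float:
    """Rough VRAM estimate for a Q4_K_M-quantized dense model."""
    weights_gb = params_billion * bits_per_weight / 8  # 1B params at 1 byte each = 1 GB
    return weights_gb + overhead_gb

print(round(approx_vram_gb(4), 1))   # → 3.1 (table says ~3 GB for qwen3-4b)
print(round(approx_vram_gb(14), 1))  # → 9.2 (table says ~9 GB for qwen3-14b)
```

For MoE models this rule applies to the full weight file on disk, while active VRAM at inference time is governed by the much smaller set of experts used per token.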
## 1. Download a model

Pick a model from the table and pull it. The file is stored at `~/.contenox/models/<name>/model.gguf`.

```shell
contenox model pull qwen3-4b
```

Progress is printed in-line. The download is resumable: if interrupted, re-run the same command.
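Resumable downloads of this kind typically work by sending an HTTP `Range` header so the server continues from the bytes already on disk. A minimal sketch of the idea (not contenox's actual implementation; it assumes the server honours `Range` requests, as HuggingFace does):

```python
import os
import urllib.request

def range_header(existing_bytes: int) -> dict[str, str]:
    """HTTP header asking the server to resume at a byte offset."""
    return {"Range": f"bytes={existing_bytes}-"}

def resumable_download(url: str, dest: str) -> None:
    """Append to any partial file on disk instead of starting over."""
    have = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url, headers=range_header(have))
    with urllib.request.urlopen(req) as resp, open(dest, "ab") as out:
        while chunk := resp.read(1 << 16):  # stream in 64 KiB chunks
            out.write(chunk)
```

Re-running after an interruption simply picks a new starting offset from the partial file's size.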
## 2. Register the local backend

This step is only needed once; the backend persists in `~/.contenox/local.db`.

```shell
contenox backend add local --type local --url ~/.contenox/models/
```

Contenox scans the models directory at startup and exposes every `*/model.gguf` it finds as a model name.
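The scan-and-expose step amounts to mapping each subdirectory containing a `model.gguf` to a model name. A minimal sketch of that convention (an illustration, not contenox source):

```python
from pathlib import Path

def discover_models(models_dir: str) -> dict[str, Path]:
    """Map each <name>/model.gguf under models_dir to its model name."""
    root = Path(models_dir).expanduser()
    return {p.parent.name: p for p in sorted(root.glob("*/model.gguf"))}
```

With the layout from step 1, `discover_models("~/.contenox/models/")` would yield `{"qwen3-4b": .../qwen3-4b/model.gguf}`, which is why the directory name doubles as the model name.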
## 3. Set the default model and run

```shell
contenox config set default-model qwen3-4b
contenox "hello, what can you do?"
```
Important

Do not set `default-provider` for local models; leave it unset, or routing will conflict. The local backend is selected automatically when the model name matches a downloaded file.
## Bring your own model

Any GGUF file hosted on HuggingFace (or at any public URL) can be pulled by name:

```shell
contenox model pull my-model --url https://huggingface.co/org/repo/resolve/main/model.gguf
```
Use `/resolve/main/` (not `/blob/main/`) in the URL: the `blob` path serves the HTML viewer page, while `resolve` serves the raw file.
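If you copy a URL from the HuggingFace file browser, it will usually contain `/blob/`; rewriting the first occurrence to `/resolve/` is enough. A small convenience helper (my own sketch, not a contenox feature):

```python
def to_raw_url(hf_url: str) -> str:
    """Rewrite a HuggingFace /blob/ page URL to its /resolve/ raw-file URL."""
    return hf_url.replace("/blob/", "/resolve/", 1)

print(to_raw_url("https://huggingface.co/org/repo/blob/main/model.gguf"))
# → https://huggingface.co/org/repo/resolve/main/model.gguf
```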
After the download completes, the model is automatically registered in the local registry and available from the local backend.
## Registry management

The model registry is the authoritative name → URL index. Manage it from the CLI or the Beam UI.
### CLI

```shell
contenox model registry-list   # list all curated + user-added entries
contenox model add my-model --url https://huggingface.co/org/repo/resolve/main/model.gguf
contenox model show my-model   # print registry details as JSON
contenox model rm my-model     # remove a user-added entry
```
Curated entries (`tiny`, `qwen3-4b`, etc.) cannot be removed; they are embedded in the binary.
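The registry semantics described above (a name → URL index with an immutable curated layer and a mutable user layer) can be sketched like this; the class and its methods are hypothetical illustrations, not contenox's API:

```python
class Registry:
    """Sketch: curated entries are read-only, user entries come and go."""

    def __init__(self, curated: dict[str, str]):
        self.curated = dict(curated)       # embedded in the binary, immutable
        self.user: dict[str, str] = {}     # added via `model add`

    def add(self, name: str, url: str) -> None:
        if name in self.curated:
            raise ValueError(f"{name} is curated and cannot be overridden")
        self.user[name] = url

    def remove(self, name: str) -> None:
        if name in self.curated:
            raise ValueError(f"{name} is curated and cannot be removed")
        del self.user[name]

    def resolve(self, name: str) -> str:
        """Look up the download URL for a model name."""
        return self.user.get(name) or self.curated[name]
```

Rejecting overrides of curated names is one possible design choice; it keeps `contenox model pull tiny` unambiguous no matter what users have added.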
### Beam UI

Open the Model Registry page at `/model-registry` in Beam. It shows all curated and user-added entries. User-added entries have a Delete button; curated entries display a Curated badge.

To add a custom entry via the UI, fill in the Name and Source URL fields in the add form and click Add.
## Next steps

- CLI reference — full `contenox model` subcommand reference
- Quickstart — wire the backend into your first agent
- Core Concepts — chains, tasks, hooks