Local Models (GGUF)

Run inference entirely on your own hardware — no Ollama, no API key. Contenox can download GGUF model files directly from HuggingFace and serve them via a built-in llama.cpp backend.

Curated models

Run contenox model registry-list to see all available models with sizes. The table below lists the curated set; approximate VRAM figures assume Q4_K_M quantization.

Name            Description                     ~VRAM
tiny            FastThink 0.5B (testing only)   ~1 GB
llama3.2-1b     Llama 3.2 1B                    ~1 GB
qwen2.5-1.5b    Qwen 2.5 1.5B                   ~1 GB
granite-3.2-2b  IBM Granite 3.2 2B              ~1 GB
qwen3-4b        Qwen 3 4B                       ~3 GB
gemma4-e2b      Gemma 4 E2B                     ~3 GB
phi-4-mini      Microsoft Phi-4 Mini            ~3 GB
gemma4-e4b      Gemma 4 E4B                     ~5 GB
granite-3.2-8b  IBM Granite 3.2 8B              ~5 GB
qwen2.5-7b      Qwen 2.5 7B                     ~5 GB
qwen3-14b       Qwen 3 14B                      ~9 GB
qwen3-30b       Qwen 3 30B (MoE, fast)          ~19 GB
kimi-linear     Kimi Linear 48B (MoE)           ~30 GB
llama4-scout    Llama 4 Scout 17Bx16E           ~68 GB

Note

Multi-GPU models (llama4-scout) require several GPUs or unified memory. MoE models (qwen3-30b, kimi-linear) still need VRAM for all of their weights, but only a few experts are active per token, so they run much faster than dense models of comparable parameter count.
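The ~VRAM column can be approximated from parameter count. As a rough sketch (an assumption, not an official contenox formula): a Q4_K_M file weighs in at about 0.6 bytes per parameter, plus roughly 0.5-1 GB of runtime overhead for the KV cache and llama.cpp buffers.

```shell
# Rough footprint estimate for a 4B-parameter model at Q4_K_M:
# ~0.6 bytes/parameter for the weights, plus ~0.7 GB runtime overhead.
# Both constants are rule-of-thumb assumptions.
awk 'BEGIN { params_b = 4; printf "%.1f GB\n", params_b * 0.6 + 0.7 }'
# prints: 3.1 GB
```

This lines up with the ~3 GB listed for qwen3-4b; larger contexts push the KV-cache overhead higher.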


1. Download a model

Pick a model from the table and pull it. The file is stored at ~/.contenox/models/<name>/model.gguf.

contenox model pull qwen3-4b

Progress is printed inline. The download is resumable: if it is interrupted, re-run the same command to continue from where it stopped.
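After a pull completes, a quick sanity check is to inspect the file's 4-byte magic, which for the GGUF format is the ASCII string GGUF (this check is a generic sketch, not a contenox command; the path assumes the default install location):

```shell
# Every valid GGUF file begins with the ASCII magic "GGUF".
# Anything else usually means a truncated download or an HTML error page.
head -c 4 ~/.contenox/models/qwen3-4b/model.gguf
# for a valid file this prints: GGUF
```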


2. Register the local backend

This step is only needed once — the backend persists in ~/.contenox/local.db.

contenox backend add local --type local --url ~/.contenox/models/

Contenox scans the models directory at startup and exposes each <name>/model.gguf it finds under the model name <name>.
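To preview which models the scan will pick up, you can replicate it with a standard find (a sketch; the path assumes the default models directory):

```shell
# List every model.gguf exactly one directory below the models root;
# each parent directory name becomes a served model name.
find ~/.contenox/models -mindepth 2 -maxdepth 2 -name model.gguf
```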


3. Set the default model and run

contenox config set default-model qwen3-4b
contenox "hello, what can you do?"

Important

Do not set default-provider for local models; leave it unset, otherwise routing will conflict. The local backend is selected automatically whenever the model name matches a downloaded file.


Bring your own model

Any GGUF file hosted on HuggingFace (or any public URL) can be pulled by name:

contenox model pull my-model --url https://huggingface.co/org/repo/resolve/main/model.gguf

Use /resolve/main/ (not /blob/main/) in the URL so HuggingFace serves the raw file.
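Links copied from the HuggingFace file browser contain /blob/ and serve an HTML page instead of the file; a one-line rewrite fixes them (a sketch using a hypothetical repo URL):

```shell
# Rewrite a HuggingFace "blob" page URL into the raw-download "resolve" form.
url='https://huggingface.co/org/repo/blob/main/model.gguf'
printf '%s\n' "$url" | sed 's|/blob/|/resolve/|'
# prints: https://huggingface.co/org/repo/resolve/main/model.gguf
```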

After the download completes, the model is automatically registered in the local registry and available from the local backend.


Registry management

The model registry is the authoritative name → URL index. Manage it from the CLI or Beam UI.

CLI

contenox model registry-list          # list all curated + user-added entries
contenox model add my-model --url https://huggingface.co/org/repo/resolve/main/model.gguf
contenox model show my-model          # print registry details as JSON
contenox model rm my-model            # remove a user-added entry

Curated entries (tiny, qwen3-4b, etc.) cannot be removed — they are embedded in the binary.

Beam UI

Open the Model Registry page at /model-registry in Beam. It shows all curated and user-added entries. User-added entries have a Delete button; curated entries display a Curated badge.

To add a custom entry via the UI, fill in the Name and Source URL fields in the add form and click Add.


Next steps