
ollama


Test your Ollama connection.

Confirm your local Ollama host is reachable and list the models you've pulled.

Stateless proxy: keys and hosts are never logged, stored, or persisted.


What this key does

Ollama is a local model runner — no API key. Auth is just network reachability to the host (usually localhost:11434). Use this page to confirm the daemon is up and your model is loaded.

How to get an Ollama API key

  1. Install Ollama from ollama.com/download.
  2. Pull a model: ollama pull llama3.2.
  3. By default the daemon listens on http://localhost:11434.
  4. Paste your host URL here. For a remote Ollama host, set OLLAMA_HOST=0.0.0.0:11434 on the server and use that machine's LAN IP in the URL.
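The steps above can be verified with a quick reachability probe. This is a sketch, assuming the default port and a system with curl installed:

```shell
# Fall back to Ollama's default listen address if OLLAMA_HOST is unset
HOST="${OLLAMA_HOST:-localhost:11434}"
echo "checking http://$HOST"
# /api/tags lists every model you have pulled; || catches a down daemon
curl -s "http://$HOST/api/tags" || echo "daemon not reachable"
```

If the JSON response includes your model under "models", both the daemon and the pull succeeded.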

Common errors and fixes

  • ECONNREFUSED: The daemon isn't running. Start ollama serve or open the Ollama desktop app.
  • 404 model not found: Pull the model first: ollama pull <model>.
  • Timeout: Cold model load can take 30+ seconds for large models. Re-run after the first request loads it into memory.
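For the ECONNREFUSED case, you can first check whether anything is listening on the default port. A small sketch, assuming lsof is available (macOS and most Linux distros):

```shell
# Probe Ollama's default port; print a hint if nothing is listening
if lsof -i :11434 >/dev/null 2>&1; then
  echo "something is listening on :11434"
else
  echo "nothing on :11434 -- run 'ollama serve' or open the desktop app"
fi
```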

Security best practices

  • Don't expose Ollama on a public IP without a reverse proxy + auth.
  • If you must allow LAN access, restrict the listen address to a specific interface.
  • Pulled models live on disk in ~/.ollama — treat that directory like any other code dependency.
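To restrict the listen address as suggested above, OLLAMA_HOST accepts a specific interface IP instead of 0.0.0.0 (192.168.1.50 below is a placeholder for your machine's LAN address):

```shell
# Bind the daemon to one LAN interface only, not every interface
OLLAMA_HOST=192.168.1.50:11434 ollama serve
```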

Pricing at a glance

Free — you pay for the hardware.

FAQ

Why no API key?
Ollama is a local runtime. Network reachability is the auth boundary.
Can I use OpenAI's SDK with Ollama?
Yes — Ollama exposes an OpenAI-compatible /v1/chat/completions on the same port.
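A minimal sketch of that compatible endpoint, assuming llama3.2 has been pulled and the daemon is on the default port:

```shell
# OpenAI-compatible chat completion against a local Ollama daemon
URL="http://localhost:11434/v1/chat/completions"
BODY='{"model":"llama3.2","messages":[{"role":"user","content":"Say hello"}]}'
# No real Authorization header is needed -- reachability is the only auth
curl -s "$URL" -H "Content-Type: application/json" -d "$BODY" || echo "daemon not reachable"
```

Existing OpenAI SDKs work the same way: point the base URL at http://localhost:11434/v1 and pass any placeholder API key, since the client requires one but Ollama ignores it.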
How do I expose Ollama to other machines?
Set OLLAMA_HOST=0.0.0.0:11434 and put it behind your VPN or a reverse proxy with auth.
Which model is fastest on Mac?
On Apple Silicon, llama3.2 (3B), phi-3-mini, and qwen2.5-coder-3b are great for interactive use.
Can I run Ollama in production?
Sure, but add auth, rate limiting, and a queue. The default daemon is single-tenant.
How do I see GPU usage?
ollama ps shows what's loaded. nvidia-smi or asitop shows GPU utilisation.