
ollama


Test your Ollama connection.

Confirm your local Ollama host is reachable and list the models you've pulled.

Stateless proxy: keys and hosts are never logged, stored, or persisted.


What this key does

Ollama is a local model runner — no API key. Auth is just network reachability to the host (usually localhost:11434). Use this page to confirm the daemon is up and your model is loaded.

How to get an Ollama API key

  1. Install Ollama from ollama.com/download.
  2. Pull a model: ollama pull llama3.2.
  3. By default the daemon listens on http://localhost:11434.
  4. Paste your host URL here. For a remote Ollama host, set OLLAMA_HOST=0.0.0.0:11434 on the server and use that machine's LAN IP in the URL.
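The steps above can be verified with a quick reachability probe. This is a sketch, assuming the default port and a system with curl installed:

```shell
# Fall back to Ollama's default listen address if OLLAMA_HOST is unset
HOST="${OLLAMA_HOST:-localhost:11434}"
echo "checking http://$HOST"
# /api/tags lists every model you have pulled; || catches a down daemon
curl -s "http://$HOST/api/tags" || echo "daemon not reachable"
```

If the JSON response includes your model under "models", both the daemon and the pull succeeded.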

Common errors and fixes

  • ECONNREFUSED: The daemon isn't running. Start ollama serve or open the Ollama desktop app.
  • 404 model not found: Pull the model first: ollama pull <model>.
  • Timeout: Cold model load can take 30+ seconds for large models. Re-run after the first request loads it into memory.
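For the ECONNREFUSED case, you can first check whether anything is listening on the default port. A small sketch, assuming lsof is available (macOS and most Linux distros):

```shell
# Probe Ollama's default port; print a hint if nothing is listening
if lsof -i :11434 >/dev/null 2>&1; then
  echo "something is listening on :11434"
else
  echo "nothing on :11434 -- run 'ollama serve' or open the desktop app"
fi
```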

Security best practices

  • Don't expose Ollama on a public IP without a reverse proxy + auth.
  • If you must allow LAN access, restrict the listen address to a specific interface.
  • Pulled models live on disk in ~/.ollama — treat that directory like any other code dependency.
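To restrict the listen address as suggested above, OLLAMA_HOST accepts a specific interface IP instead of 0.0.0.0 (192.168.1.50 below is a placeholder for your machine's LAN address):

```shell
# Bind the daemon to one LAN interface only, not every interface
OLLAMA_HOST=192.168.1.50:11434 ollama serve
```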

Pricing at a glance

Free — you pay for the hardware.

FAQ

Why no API key?
Ollama is a local runtime. Network reachability is the auth boundary.
Can I use OpenAI's SDK with Ollama?
Yes — Ollama exposes an OpenAI-compatible /v1/chat/completions on the same port.
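A minimal sketch of that compatible endpoint, assuming llama3.2 has been pulled and the daemon is on the default port:

```shell
# OpenAI-compatible chat completion against a local Ollama daemon
URL="http://localhost:11434/v1/chat/completions"
BODY='{"model":"llama3.2","messages":[{"role":"user","content":"Say hello"}]}'
# No real Authorization header is needed -- reachability is the only auth
curl -s "$URL" -H "Content-Type: application/json" -d "$BODY" || echo "daemon not reachable"
```

Existing OpenAI SDKs work the same way: point the base URL at http://localhost:11434/v1 and pass any placeholder API key, since the client requires one but Ollama ignores it.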
How do I expose Ollama to other machines?
Set OLLAMA_HOST=0.0.0.0:11434 and put it behind your VPN or a reverse proxy with auth.
Which model is fastest on Mac?
On Apple Silicon, llama3.2 (3B), phi-3-mini, and qwen2.5-coder-3b are great for interactive use.
Can I run Ollama in production?
Sure, but add auth, rate limiting, and a queue. The default daemon is single-tenant.
How do I see GPU usage?
ollama ps shows what's loaded. nvidia-smi or asitop shows GPU utilisation.