huggingface
chat
embedding
image
audio
open-source
Test your Hugging Face API key.
Validate a Hugging Face token, confirm whoami, and run a quick benchmark against the inference endpoint.
Stateless proxy: keys are never logged, stored, or persisted. What happens to your key →
What this key does
Hugging Face tokens authenticate against the Inference API (free, rate-limited, hundreds of thousands of models) and Inference Endpoints (your own dedicated GPUs). Tokens can be read, write, or fine-grained.
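A minimal sketch of calling the shared Inference API with a token, using only the Python standard library. The `api-inference.huggingface.co` URL is the classic shared endpoint; the function names and the example model id are illustrative, not part of any official client.

```python
import json
import os
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/{model_id}"

def auth_headers(token: str) -> dict:
    """Hugging Face expects a standard Bearer token header."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

def query(model_id: str, payload: dict, token: str) -> dict:
    """POST a JSON payload to the shared Inference API for one model."""
    req = urllib.request.Request(
        API_URL.format(model_id=model_id),
        data=json.dumps(payload).encode(),
        headers=auth_headers(token),
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__" and "HF_TOKEN" in os.environ:
    # Example model id only; any hosted model id from the Hub works here.
    print(query("distilbert-base-uncased-finetuned-sst-2-english",
                {"inputs": "I love this!"}, os.environ["HF_TOKEN"]))
```

The same token works for read-only inference whether it is a classic read token or a fine-grained token with inference permission.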
How to get a Hugging Face API key
- Sign in at huggingface.co/settings/tokens.
- Click New token.
- Pick a fine-grained token with Inference API permissions.
- Copy the hf_... token and paste it here.
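Once you have a token, you can verify it locally before pasting it anywhere: check the `hf_` prefix and stray whitespace, then ask the Hub's `whoami-v2` endpoint who it belongs to. A sketch, assuming the standard library only; the helper names are made up for this example.

```python
import json
import os
import urllib.request

def looks_like_hf_token(token: str) -> bool:
    """Cheap local sanity check before any network call: HF tokens start
    with 'hf_' and contain no whitespace (a common copy-paste error)."""
    return token.startswith("hf_") and token == token.strip() and " " not in token

def whoami(token: str) -> dict:
    """Ask the Hub who this token belongs to (GET /api/whoami-v2).
    Raises urllib.error.HTTPError with code 401 if the token is invalid."""
    req = urllib.request.Request(
        "https://huggingface.co/api/whoami-v2",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__" and "HF_TOKEN" in os.environ:
    print(whoami(os.environ["HF_TOKEN"]).get("name"))
```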
Common errors and fixes
- 401 Unauthorized: Key is invalid, revoked, or pasted with extra whitespace. Generate a new key from the provider console and try again.
- 403 Forbidden: Key is valid but lacks permission for this resource. Check project / org / workspace scope, or that billing is set up for this key.
- 429 Too Many Requests: You hit the per-minute or per-day rate limit. Wait a moment and retry, or upgrade your tier.
- 404 Not Found: The endpoint or model id changed. Check the provider docs for the current path and model identifier.
- 5xx: The provider is having issues. Check their status page before assuming the bug is yours.
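The table above is easy to encode as a small diagnostic helper, which keeps error handling in one place instead of scattering status checks across call sites. This is a sketch; the 503 entry reflects the Inference API's cold-start behavior covered in the FAQ below.

```python
# Map the HTTP statuses above to a short diagnosis. 503 is also common
# on the shared Inference API (cold model load) and is worth retrying.
EXPLANATIONS = {
    401: "Token invalid, revoked, or pasted with extra whitespace.",
    403: "Token valid but lacks permission; check scope or billing.",
    404: "Endpoint or model id changed; check the current docs.",
    429: "Rate limit hit; back off and retry, or upgrade your tier.",
    503: "Model is cold-starting on the shared API; retry shortly.",
}

def explain_status(code: int) -> str:
    """Return a one-line diagnosis for an Inference API status code."""
    if code in EXPLANATIONS:
        return EXPLANATIONS[code]
    if 500 <= code < 600:
        return "Provider-side issue; check the Hugging Face status page."
    return f"Unexpected status {code}; see the API docs."
```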
Security best practices
- Store keys in an env var or secret manager — never commit them to a repo, even a private one.
- Restrict scope: prefer per-project or per-deployment keys over a single root key shared across services.
- Rotate on a schedule (90 days is a sane default) and immediately on suspected leak.
- Audit usage in the provider console after rotation to confirm the old key has zero traffic.
- Set per-key spend limits where the provider supports them, so a leaked key has a bounded blast radius.
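The first practice, reading the key from the environment, is worth doing with a loud failure mode so a missing variable surfaces immediately instead of producing a confusing 401 later. A minimal sketch; `HF_TOKEN` is a conventional variable name, not one the API requires.

```python
import os

def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read the token from the environment and fail loudly if missing,
    rather than letting None leak into an Authorization header."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; export it or load it from your secret manager."
        )
    return token.strip()
```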
Pricing at a glance
Inference API is free with rate limits; Inference Endpoints are billed by GPU/hour. Pro accounts get higher rate limits.
FAQ
- What's the difference between Inference API and Inference Endpoints?
- API is shared, free, throttled. Endpoints are dedicated GPUs you spin up and pay for.
- Why do my requests return 503?
- Cold start. The first call to a model on the Inference API triggers a load. Retry after a few seconds.
- Read vs write vs fine-grained?
- Read: download models. Write: push. Fine-grained: scoped permissions. For inference, read or fine-grained is enough.
- How do I list models I can call?
- There are 500k+ — enumerating is impractical. Use the HF Hub search UI to find models, then call by id.
- Free rate limits?
- Roughly a few hundred requests per hour per user. Pro accounts get higher caps.
- Can I run multimodal models?
- Yes — image, audio, video models are all callable via the Inference API.
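The 503 cold-start answer above suggests a retry loop with backoff. Here is a sketch with the HTTP call injected as a callable, so the retry logic is testable without a network; the function name and signature are illustrative.

```python
import time

def call_with_retry(send, max_retries: int = 4, base_delay: float = 1.0):
    """Retry `send()` while it reports a cold start (HTTP 503).
    `send` is any zero-arg callable returning (status_code, body)."""
    for attempt in range(max_retries):
        status, body = send()
        if status != 503:
            return status, body
        # Exponential backoff: 1s, 2s, 4s, ... while the model loads.
        time.sleep(base_delay * (2 ** attempt))
    return send()  # final attempt, surfaced to the caller either way
```

Wrap your Inference API request in a lambda and pass it as `send`; anything other than a 503 (including 401s and 429s) is returned immediately for the caller to handle.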
Test other providers
Related reading
- API key security best practices for LLMs: how to store, scope, rotate, and revoke LLM API keys without leaking them through git, logs, or shared environments.
- Free LLM API keys for testing in 2026: which providers offer free credits, how long they last, and how to stretch them for prototyping without a credit card.