huggingface
chat
embedding
image
audio
open-source
Test your Hugging Face API key.
Validate a Hugging Face token, confirm whoami, and run a quick benchmark against the inference endpoint.
Stateless proxy: keys are never logged, stored, or persisted. What happens to your key →
What this key does
Hugging Face tokens authenticate against the Inference API (free, rate-limited, hundreds of thousands of models) and Inference Endpoints (your own dedicated GPUs). Tokens can be read, write, or fine-grained.
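A minimal sketch of calling the shared Inference API with a token, using only the Python standard library. The `api-inference.huggingface.co` URL is the classic shared endpoint; the function names and the example model id are illustrative, not part of any official client.

```python
import json
import os
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/{model_id}"

def auth_headers(token: str) -> dict:
    """Hugging Face expects a standard Bearer token header."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

def query(model_id: str, payload: dict, token: str) -> dict:
    """POST a JSON payload to the shared Inference API for one model."""
    req = urllib.request.Request(
        API_URL.format(model_id=model_id),
        data=json.dumps(payload).encode(),
        headers=auth_headers(token),
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__" and "HF_TOKEN" in os.environ:
    # Example model id only; any hosted model id from the Hub works here.
    print(query("distilbert-base-uncased-finetuned-sst-2-english",
                {"inputs": "I love this!"}, os.environ["HF_TOKEN"]))
```

The same token works for read-only inference whether it is a classic read token or a fine-grained token with inference permission.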
How to get a Hugging Face API key
- Sign in at huggingface.co/settings/tokens.
- Click New token.
- Pick a fine-grained token with Inference API permissions.
- Copy the hf_... token and paste it here.
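Once you have a token, you can verify it locally before pasting it anywhere: check the `hf_` prefix and stray whitespace, then ask the Hub's `whoami-v2` endpoint who it belongs to. A sketch, assuming the standard library only; the helper names are made up for this example.

```python
import json
import os
import urllib.request

def looks_like_hf_token(token: str) -> bool:
    """Cheap local sanity check before any network call: HF tokens start
    with 'hf_' and contain no whitespace (a common copy-paste error)."""
    return token.startswith("hf_") and token == token.strip() and " " not in token

def whoami(token: str) -> dict:
    """Ask the Hub who this token belongs to (GET /api/whoami-v2).
    Raises urllib.error.HTTPError with code 401 if the token is invalid."""
    req = urllib.request.Request(
        "https://huggingface.co/api/whoami-v2",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__" and "HF_TOKEN" in os.environ:
    print(whoami(os.environ["HF_TOKEN"]).get("name"))
```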
Common errors and fixes
- 401 Unauthorized: Key is invalid, revoked, or pasted with extra whitespace. Generate a new key from the provider console and try again.
- 403 Forbidden: Key is valid but lacks permission for this resource. Check project / org / workspace scope, or that billing is set up for this key.
- 429 Too Many Requests: You hit the per-minute or per-day rate limit. Wait a moment and retry, or upgrade your tier.
- 404 Not Found: The endpoint or model id changed. Check the provider docs for the current path and model identifier.
- 5xx: The provider is having issues. Check their status page before assuming the bug is yours.
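The table above is easy to encode as a small diagnostic helper, which keeps error handling in one place instead of scattering status checks across call sites. This is a sketch; the 503 entry reflects the Inference API's cold-start behavior covered in the FAQ below.

```python
# Map the HTTP statuses above to a short diagnosis. 503 is also common
# on the shared Inference API (cold model load) and is worth retrying.
EXPLANATIONS = {
    401: "Token invalid, revoked, or pasted with extra whitespace.",
    403: "Token valid but lacks permission; check scope or billing.",
    404: "Endpoint or model id changed; check the current docs.",
    429: "Rate limit hit; back off and retry, or upgrade your tier.",
    503: "Model is cold-starting on the shared API; retry shortly.",
}

def explain_status(code: int) -> str:
    """Return a one-line diagnosis for an Inference API status code."""
    if code in EXPLANATIONS:
        return EXPLANATIONS[code]
    if 500 <= code < 600:
        return "Provider-side issue; check the Hugging Face status page."
    return f"Unexpected status {code}; see the API docs."
```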
Security best practices
- Store keys in an env var or secret manager — never commit them to a repo, even a private one.
- Restrict scope: prefer per-project or per-deployment keys over a single root key shared across services.
- Rotate on a schedule (90 days is a sane default) and immediately on suspected leak.
- Audit usage in the provider console after rotation to confirm the old key has zero traffic.
- Set per-key spend limits where the provider supports them, so a leaked key has a bounded blast radius.
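The first practice, reading the key from the environment, is worth doing with a loud failure mode so a missing variable surfaces immediately instead of producing a confusing 401 later. A minimal sketch; `HF_TOKEN` is a conventional variable name, not one the API requires.

```python
import os

def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read the token from the environment and fail loudly if missing,
    rather than letting None leak into an Authorization header."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; export it or load it from your secret manager."
        )
    return token.strip()
```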
Pricing at a glance
Inference API is free with rate limits; Inference Endpoints are billed by GPU/hour. Pro accounts get higher rate limits.
FAQ
- What's the difference between Inference API and Inference Endpoints?
- API is shared, free, throttled. Endpoints are dedicated GPUs you spin up and pay for.
- Why do my requests return 503?
- Cold start. The first call to a model on the Inference API triggers a load. Retry after a few seconds.
- Read vs write vs fine-grained?
- Read: download models. Write: push. Fine-grained: scoped permissions. For inference, read or fine-grained is enough.
- How do I list models I can call?
- There are 500k+ — enumerating is impractical. Use the HF Hub search UI to find models, then call by id.
- Free rate limits?
- Roughly a few hundred requests per hour per user. Pro accounts get higher caps.
- Can I run multimodal models?
- Yes — image, audio, video models are all callable via the Inference API.
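The 503 cold-start answer above suggests a retry loop with backoff. Here is a sketch with the HTTP call injected as a callable, so the retry logic is testable without a network; the function name and signature are illustrative.

```python
import time

def call_with_retry(send, max_retries: int = 4, base_delay: float = 1.0):
    """Retry `send()` while it reports a cold start (HTTP 503).
    `send` is any zero-arg callable returning (status_code, body)."""
    for attempt in range(max_retries):
        status, body = send()
        if status != 503:
            return status, body
        # Exponential backoff: 1s, 2s, 4s, ... while the model loads.
        time.sleep(base_delay * (2 ** attempt))
    return send()  # final attempt, surfaced to the caller either way
```

Wrap your Inference API request in a lambda and pass it as `send`; anything other than a 503 (including 401s and 429s) is returned immediately for the caller to handle.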
Test other providers
Related reading
- API key security best practices for LLMs: how to store, scope, rotate, and revoke LLM API keys without leaking them through git, logs, or shared environments.
- Free LLM API keys for testing in 2026: which providers offer free credits, how long they last, and how to stretch them for prototyping without a credit card.