

Test your Hugging Face API key.

Validate a Hugging Face token, confirm whoami, and run a quick benchmark against the inference endpoint.

Stateless proxy — keys never logged, stored, or persisted. What happens to your key →


What this key does

Hugging Face tokens authenticate against the Inference API (free, rate-limited, hundreds of thousands of models) and Inference Endpoints (your own dedicated GPUs). Tokens can be read, write, or fine-grained.

How to get a Hugging Face API key

  1. Sign in at huggingface.co/settings/tokens.
  2. Click New token.
  3. Pick a fine-grained token with Inference API permissions.
  4. Copy the hf_... token and paste it here.

Common errors and fixes

  • 401 Unauthorized: Key is invalid, revoked, or pasted with extra whitespace. Generate a new key from the provider console and try again.
  • 403 Forbidden: Key is valid but lacks permission for this resource. Check project / org / workspace scope, or that billing is set up for this key.
  • 429 Too Many Requests: You hit the per-minute or per-day rate limit. Wait a moment and retry, or upgrade your tier.
  • 404 Not Found: The endpoint or model id changed. Check the provider docs for the current path and model identifier.
  • 5xx: The provider is having issues. Check their status page before assuming the bug is yours.
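The split above suggests a simple client-side policy: 401/403/404 are configuration problems that retrying will never fix, while 429 and 5xx are transient and worth retrying with backoff. A minimal sketch (helper names are our own):

```python
import time
import urllib.error
import urllib.request

# Transient statuses worth retrying; auth/path errors are excluded on purpose.
RETRYABLE = {429, 500, 502, 503, 504}

def is_retryable(status: int) -> bool:
    """401/403/404 indicate a key or path problem; fail fast on those."""
    return status in RETRYABLE

def fetch_with_retry(req: urllib.request.Request, attempts: int = 4,
                     base_delay: float = 2.0) -> bytes:
    """Retry transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if not is_retryable(err.code) or attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...
```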

Security best practices

  • Store keys in an env var or secret manager — never commit them to a repo, even a private one.
  • Restrict scope: prefer per-project or per-deployment keys over a single root key shared across services.
  • Rotate on a schedule (90 days is a sane default) and immediately on suspected leak.
  • Audit usage in the provider console after rotation to confirm the old key has zero traffic.
  • Set per-key spend limits where the provider supports them, so a leaked key has a bounded blast radius.
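The first two points can be enforced in code: read the key from the environment, fail loudly if it is absent, and never print more than a masked prefix. A small sketch (HF_TOKEN is a common convention, and the helpers are our own):

```python
import os
import sys

def load_hf_token(var: str = "HF_TOKEN") -> str:
    """Read the token from an environment variable and fail loudly if it
    is missing, rather than silently sending unauthenticated requests."""
    token = os.environ.get(var, "").strip()
    if not token:
        sys.exit(f"{var} is not set; export it or wire up a secret manager.")
    return token

def mask(token: str) -> str:
    """Safe form for logs: show the prefix only, never the full key."""
    return token[:7] + "..." if len(token) > 7 else "***"
```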

Pricing at a glance

Inference API is free with rate limits; Inference Endpoints are billed by GPU/hour. Pro accounts get higher rate limits.

FAQ

What's the difference between Inference API and Inference Endpoints?
API is shared, free, throttled. Endpoints are dedicated GPUs you spin up and pay for.
Why do my requests return 503?
Cold start. The first call to a model on the Inference API triggers a load. Retry after a few seconds.
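Instead of retrying in a loop, the classic Inference API also lets you opt into waiting: sending the x-wait-for-model header asks the server to hold the request until the model is loaded rather than answering 503 immediately. A sketch of the headers involved:

```python
def cold_start_headers(token: str) -> dict:
    """Headers for a call that should wait out a cold start instead of
    receiving an immediate 503 while the model loads."""
    return {
        "Authorization": f"Bearer {token}",
        "x-wait-for-model": "true",
    }
```

The trade-off is a long first request instead of a failed one, so pair it with a generous client timeout.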
Read vs write vs fine-grained?
Read: download models. Write: push. Fine-grained: scoped permissions. For inference, read or fine-grained is enough.
How do I list models I can call?
There are 500k+ — enumerating is impractical. Use the HF Hub search UI to find models, then call by id.
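If you want to search programmatically rather than through the UI, the Hub exposes a public models endpoint that needs no token for public models. A sketch (helper names are our own):

```python
import json
import urllib.parse
import urllib.request

HUB_MODELS_URL = "https://huggingface.co/api/models"

def build_search_url(query: str, limit: int = 5) -> str:
    """URL for a Hub model search, capped so we never enumerate the Hub."""
    params = urllib.parse.urlencode({"search": query, "limit": limit})
    return f"{HUB_MODELS_URL}?{params}"

def search_models(query: str, limit: int = 5) -> list[str]:
    """Return matching model ids, ready to pass to the Inference API."""
    url = build_search_url(query, limit)
    with urllib.request.urlopen(url, timeout=15) as resp:
        return [m["id"] for m in json.load(resp)]
```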
Free rate limits?
Roughly a few hundred requests per hour per user. Pro accounts get higher caps.
Can I run multimodal models?
Yes — image, audio, video models are all callable via the Inference API.
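For vision tasks, the request body is typically the raw file bytes rather than a JSON payload. A hedged sketch — the model id is just an illustration, and the helpers are our own:

```python
import urllib.request

API_BASE = "https://api-inference.huggingface.co/models"

def build_image_request(model_id: str, image_bytes: bytes,
                        token: str) -> urllib.request.Request:
    """POST raw image bytes to a vision model on the classic Inference API."""
    return urllib.request.Request(
        f"{API_BASE}/{model_id}",
        data=image_bytes,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )

def classify_image(model_id: str, path: str, token: str) -> bytes:
    """Read an image file and return the raw JSON response bytes."""
    with open(path, "rb") as f:
        req = build_image_request(model_id, f.read(), token)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()

# Example (illustrative model id):
# classify_image("google/vit-base-patch16-224", "cat.jpg", token)
```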