
vLLM HuggingFace Authentication

Problem

vLLM can't download Qwen/Qwen2.5-Coder-7B-Instruct without HuggingFace authentication:

401 Client Error: Unauthorized
Invalid credentials in Authorization header

Step 1: Get HuggingFace Token

  1. Go to: https://huggingface.co/settings/tokens
  2. Create a new token (read access is enough)
  3. Copy the token (it starts with hf_...)

Step 2: Accept Model License

  1. Go to: https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct
  2. Click "Agree and access repository" if prompted
  3. Accept any terms/conditions

Step 3: Login with HuggingFace CLI

On Thor:

huggingface-cli login

When prompted:
  • Paste your token (input will be hidden)
  • Answer "Yes" to "Add token as git credential?"

This will:
  • Save the token to ~/.cache/huggingface/token
  • Store the token in the git credential manager (persists across sessions)
  • Allow the setup script to mount the token into the vLLM container automatically

Verify authentication:

huggingface-cli whoami
# Should show your username
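
Outside the CLI, you can also probe the saved-token path directly. A minimal sketch (hf_token_saved is a hypothetical helper name; the default path is the one this doc uses, and huggingface-cli login is what writes it):

```shell
# Sketch: report whether a HuggingFace token file is already saved.
# huggingface-cli login writes the token to ~/.cache/huggingface/token.
hf_token_saved() {
    # $1: cache directory to check (e.g. "$HOME/.cache/huggingface")
    [ -f "$1/token" ]
}
```

Usage: `hf_token_saved "$HOME/.cache/huggingface" && echo "already logged in"`.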

Step 4: Run vLLM Setup

Now just run the setup script - it will detect your saved token:

./scripts/setup_vllm_thor.sh

The script will automatically:
  • Check for a saved token in ~/.cache/huggingface/token
  • Mount the HuggingFace cache directory into the container
  • Pass the HF_TOKEN environment variable if set

If you need to use a temporary token for one session:

export HF_TOKEN="hf_your_token_here"
./scripts/setup_vllm_thor.sh

Note: This token won't persist after closing your terminal. Use huggingface-cli login for permanent setup.
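
The precedence described above (session HF_TOKEN first, then the saved file) can be sketched as a small helper. resolve_hf_token is a hypothetical name for illustration, not necessarily what the setup script defines:

```shell
# Sketch: resolve a HuggingFace token the way the setup flow describes:
# prefer a session HF_TOKEN, fall back to the saved token file.
resolve_hf_token() {
    if [ -n "$HF_TOKEN" ]; then
        printf '%s' "$HF_TOKEN"
    elif [ -f "$HOME/.cache/huggingface/token" ]; then
        cat "$HOME/.cache/huggingface/token"
    fi
}
```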

Alternative Models

If you want to skip authentication, try one of these models (the first two are fully public; Llama still requires license acceptance):

Microsoft Phi-3.5-mini (3.8B)

MODEL="microsoft/Phi-3.5-mini-instruct"
  • No authentication required
  • Smaller, faster
  • Good for coding

TinyLlama (1.1B)

MODEL="TinyLlama/TinyLlama-1.1B-Chat-v1.0"
  • No authentication required
  • Very fast, minimal memory
  • Good for testing

Meta Llama 3.1 (8B) - May require auth

MODEL="meta-llama/Llama-3.1-8B-Instruct"
  • Requires Meta license acceptance
  • General purpose, good quality
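
The options above could be wired into the setup script with a small selector. pick_model and its short names are hypothetical; the model IDs are the ones listed above:

```shell
# Sketch: map a short name to one of the model IDs discussed in this doc,
# defaulting to the Qwen coder model.
pick_model() {
    case "$1" in
        phi)   echo "microsoft/Phi-3.5-mini-instruct" ;;
        tiny)  echo "TinyLlama/TinyLlama-1.1B-Chat-v1.0" ;;
        llama) echo "meta-llama/Llama-3.1-8B-Instruct" ;;
        *)     echo "Qwen/Qwen2.5-Coder-7B-Instruct" ;;
    esac
}
```

Usage: `MODEL=$(pick_model phi)` before invoking the setup script.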

Quick Test

To verify your authentication works:

# Check who you're logged in as
huggingface-cli whoami

# Test downloading the model
huggingface-cli download Qwen/Qwen2.5-Coder-7B-Instruct --repo-type model
# Should start downloading without 401 errors
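
Before spending time on a download attempt, the token string itself can be sanity-checked locally; the only assumption here is that HuggingFace user tokens start with hf_ (as noted in Step 1):

```shell
# Sketch: quick local check that a string looks like a HF user token
# before trying an authenticated download.
looks_like_hf_token() {
    case "$1" in
        hf_?*) return 0 ;;   # "hf_" followed by at least one character
        *)     return 1 ;;
    esac
}
```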

Troubleshooting

"Token is already saved" message

If you see this when running huggingface-cli login:

A token is already saved on your machine. Run `hf auth whoami` to get more information

You're already authenticated! Just run the vLLM setup script.

Container can't access token

If vLLM container shows 401 errors even after login:

  1. Check that the token file exists:

     ls -la ~/.cache/huggingface/token

  2. Verify the token in your environment:

     echo $HF_TOKEN

  3. Try setting it explicitly:

     export HF_TOKEN=$(cat ~/.cache/huggingface/token)
     ./scripts/setup_vllm_thor.sh
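
If the container still cannot see the token, check that the cache mount matches the path listed in the Summary. A sketch of that mount argument (hf_cache_mount is a hypothetical helper; the docker invocation and image name in the comment are assumptions, not the script's actual contents):

```shell
# Sketch: the bind-mount spec that exposes the host HF cache (and the
# token file inside it) at the path read inside the container.
hf_cache_mount() {
    echo "$HOME/.cache/huggingface:/root/.cache/huggingface"
}

# Hypothetical usage (image name is an assumption):
#   docker run -v "$(hf_cache_mount)" -e HF_TOKEN vllm/vllm-openai --model "$MODEL"
```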

Git credential helper not working

If the token doesn't persist after login:

# Configure git to store credentials
git config --global credential.helper store

# Then login again
huggingface-cli login

Recommended workaround: use Phi-3.5-mini - no auth needed, good quality, smaller and faster:

  1. Edit scripts/setup_vllm_thor.sh
  2. Change the MODEL line to: MODEL="microsoft/Phi-3.5-mini-instruct"
  3. Run: ./scripts/setup_vllm_thor.sh

This will work immediately without any HuggingFace setup!

Summary

Best Practice:

  1. Run huggingface-cli login (one-time setup)
  2. Accept "Add token as git credential?" (persists across sessions)
  3. Run ./scripts/setup_vllm_thor.sh (uses the saved token automatically)

The token is stored in:
  • ~/.cache/huggingface/token - the main token file
  • The git credential store - a backup via the git credential helper
  • The container, which gets access via the volume mount: -v "$HOME/.cache/huggingface:/root/.cache/huggingface"