DeepSeek V4 on HuggingFace: How to Access and Download the Open Weights
DeepSeek V4 is an open-weight release: all model weights are publicly available on HuggingFace under the permissive MIT License. Whether you want to run the model locally, fine-tune it for your use case, or simply inspect its architecture, HuggingFace is the primary distribution channel for DeepSeek V4.
This guide walks you through exactly where to find the models, what you get in each repository, how large the downloads are, and how to start using them.
DeepSeek V4 HuggingFace Repository Links
DeepSeek published four model repositories under the official deepseek-ai organization on HuggingFace:
| Repository | Type | Params (Total / Active) | Precision | Size |
|---|---|---|---|---|
| deepseek-ai/DeepSeek-V4-Flash-Base | Base (pre-trained) | 284B / 13B | FP8 Mixed | ~160 GB |
| deepseek-ai/DeepSeek-V4-Flash | Instruct (RLHF-tuned) | 284B / 13B | FP4 + FP8 Mixed | ~160 GB |
| deepseek-ai/DeepSeek-V4-Pro-Base | Base (pre-trained) | 1.6T / 49B | FP8 Mixed | ~865 GB |
| deepseek-ai/DeepSeek-V4-Pro | Instruct (RLHF-tuned) | 1.6T / 49B | FP4 + FP8 Mixed | ~865 GB |
All four repositories are part of the deepseek-ai/deepseek-v4 collection.
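If you prefer to discover the repositories programmatically rather than browse the collection page, the huggingface_hub client can list them. A minimal sketch (the search term is an assumption; adjust it to match the actual repository names):

from huggingface_hub import list_models

# List deepseek-ai repos whose names mention V4 (search term is an assumption)
for model in list_models(author="deepseek-ai", search="V4"):
    print(model.id)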
What's Inside Each Repository
Each V4 model repository contains:
- Model weights in SafeTensors format (split across multiple shards)
- DeepSeek_V4.pdf — the full technical report
- encoding/ folder — Python scripts for building OpenAI-compatible prompts and parsing model output
- inference/ folder — detailed instructions for running the model locally
- LICENSE — MIT License file
- README with model card, benchmark tables, and citations
The technical report (DeepSeek_V4.pdf) is hosted in the Pro repository and covers the full architecture details including the Hybrid Attention mechanism, mHC, and training methodology.
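Before committing to a multi-hundred-gigabyte download, you can check a repository's file listing without fetching anything. A minimal sketch using huggingface_hub's list_repo_files:

from huggingface_hub import list_repo_files

# Print every file path in the repo; nothing is downloaded
for path in list_repo_files("deepseek-ai/DeepSeek-V4-Flash"):
    print(path)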
License: MIT, Not Apache
A common misconception is that DeepSeek uses Apache 2.0 licensing (as it did for some earlier models). DeepSeek V4 is released under the MIT License, which is even more permissive:
- ✅ Commercial use permitted
- ✅ Modification permitted
- ✅ Distribution permitted
- ✅ Private use permitted
- ✅ No patent clauses or additional restrictions
This means you can build proprietary products on top of V4, fine-tune and redistribute derivatives, and use it in any commercial context without restriction (beyond preserving the MIT copyright notice).
How to Download DeepSeek V4 Weights
Option 1: HuggingFace CLI (Recommended)
pip install huggingface_hub
# Download V4-Flash (instruct, ~160 GB)
huggingface-cli download deepseek-ai/DeepSeek-V4-Flash --local-dir ./DeepSeek-V4-Flash
# Download V4-Pro (instruct, ~865 GB)
huggingface-cli download deepseek-ai/DeepSeek-V4-Pro --local-dir ./DeepSeek-V4-Pro
Option 2: Python with huggingface_hub
from huggingface_hub import snapshot_download

# Downloads every shard into ./DeepSeek-V4-Flash; interrupted runs resume where they left off
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V4-Flash",
    local_dir="./DeepSeek-V4-Flash"
)
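If you only want to inspect the architecture rather than pull every weight shard, snapshot_download also accepts allow_patterns to fetch a subset of files. A sketch (the glob patterns are assumptions about the repo layout):

from huggingface_hub import snapshot_download

# Fetch only config, tokenizer, and documentation files, skipping weight shards
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V4-Flash",
    local_dir="./DeepSeek-V4-Flash-meta",
    allow_patterns=["*.json", "*.md", "tokenizer*"]
)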
Option 3: ModelScope (Recommended for Users in China)
DeepSeek V4 is also available on ModelScope at matching repository paths (deepseek-ai/DeepSeek-V4-Flash, etc.), which may offer faster download speeds from within mainland China.
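ModelScope's Python client mirrors the HuggingFace workflow. A minimal sketch, assuming the mirrored repository path matches the HuggingFace one:

# pip install modelscope
from modelscope import snapshot_download

# Returns the local cache path of the downloaded snapshot
model_dir = snapshot_download("deepseek-ai/DeepSeek-V4-Flash")
print(model_dir)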
Storage and Bandwidth Requirements
| Model | Disk Space | VRAM Required | Recommended GPU Setup |
|---|---|---|---|
| V4-Flash | ~160 GB | ~160 GB VRAM | 2× H100 80GB or 8× A100 40GB |
| V4-Pro | ~865 GB | ~865 GB VRAM | 16× H100 80GB (or equivalent) |
| V4-Flash (quantized) | ~80 GB | ~80 GB VRAM | 2× RTX 4090 / 1× RTX 5090 |
| V4-Pro (quantized) | ~200 GB | ~200 GB VRAM | 4–8× H100 |
Note: DeepSeek uses FP4 + FP8 mixed precision, so the raw weights are already heavily compressed. Community quantizations (GGUF/GPTQ) appearing on HuggingFace can reduce these requirements further.
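As a back-of-envelope sanity check on those sizes (assuming the figures in the table and ignoring activation and KV-cache overhead), you can work backwards to the average bits per parameter:

# Rough bits-per-parameter implied by the published repository sizes
flash_bytes, flash_params = 160e9, 284e9
print(flash_bytes * 8 / flash_params)  # ≈ 4.5 bits/param (FP4 + FP8 mix)

pro_bytes, pro_params = 865e9, 1.6e12
print(pro_bytes * 8 / pro_params)      # ≈ 4.3 bits/param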
Running the Model: Key Setup Notes
DeepSeek V4 doesn't use a standard HuggingFace Jinja chat template. Instead, you must use the custom encoding scripts provided in the repository's encoding/ folder.
A minimal example:
import transformers
from encoding_dsv4 import encode_messages, parse_message_from_completion_text

messages = [
    {"role": "user", "content": "Explain the Hybrid Attention Architecture in DeepSeek V4"}
]

# Build the prompt with DeepSeek's custom encoding instead of a Jinja chat template
prompt = encode_messages(messages, thinking_mode="thinking")

# Tokenize with the model's own tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Flash")
tokens = tokenizer.encode(prompt)
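The parse helper imported above closes the loop after generation. A hedged sketch continuing the example (the exact signature of parse_message_from_completion_text is an assumption; check the scripts in the encoding/ folder):

# Generate a completion, then parse it back into a structured message
model = transformers.AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V4-Flash", device_map="auto", trust_remote_code=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)

# Keep only the newly generated tokens before parsing
completion = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:])
message = parse_message_from_completion_text(completion)  # signature assumed
print(message)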
For full inference setup, refer to the inference/README.md inside each repository.
Community Activity on HuggingFace
Within days of the April 24, 2026 launch, the DeepSeek V4-Pro repository saw over 123,000 downloads and 22 community Spaces built on top of it. The community quickly produced:
- GGUF quantizations for llama.cpp (enabling CPU+GPU hybrid inference)
- LM Studio-compatible versions
- Ollama builds
- Jan-compatible packages
These community-maintained quantizations are making V4-Flash runnable on a single RTX 4090 — a remarkable feat for a 284B-parameter model.
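As an illustration of what running a community quantization looks like, llama-cpp-python can pull a GGUF straight from the Hub. The repository and filename below are hypothetical placeholders, not official artifacts:

# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="someuser/DeepSeek-V4-Flash-GGUF",  # hypothetical community repo
    filename="*Q4_K_M.gguf",                    # glob selects the 4-bit quant
    n_gpu_layers=-1,                            # offload all layers that fit to the GPU
)
print(llm("Hello", max_tokens=32)["choices"][0]["text"])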
DeepSeek V4 and AI Platforms
If you prefer API access rather than managing local weights, the V4 models are also available through multiple inference providers. Platforms like Framia.pro integrate frontier AI models — including the latest DeepSeek releases — to give creators and developers seamless API access without managing infrastructure.
Conclusion
DeepSeek V4 on HuggingFace is one of the most accessible frontier model releases in AI history. Four repositories, MIT licensing, a comprehensive technical report, and custom inference tooling are all freely available. Whether you're running it on a GPU cluster, experimenting with community quantizations, or accessing it via API, HuggingFace is your starting point.