DeepSeek V4 on HuggingFace: How to Access and Download the Open Weights

DeepSeek V4-Pro and V4-Flash weights are freely available on HuggingFace under MIT License. Here's how to find, download, and start using them locally.

by Framia

DeepSeek V4 is fully open-source, with all model weights publicly available on HuggingFace under the permissive MIT License. Whether you want to run the model locally, fine-tune it for your use case, or simply inspect its architecture, HuggingFace is the primary distribution channel for DeepSeek V4.

This guide walks you through exactly where to find the models, what you get in each repository, how large the downloads are, and how to start using them.


The Official Repositories

DeepSeek published four model repositories in the official deepseek-ai HuggingFace collection:

Repository                           Type                   Params (Total / Active)   Precision        Size
deepseek-ai/DeepSeek-V4-Flash-Base   Base (pre-trained)     284B / 13B                FP8 Mixed        ~160 GB
deepseek-ai/DeepSeek-V4-Flash        Instruct (RLHF-tuned)  284B / 13B                FP4 + FP8 Mixed  ~160 GB
deepseek-ai/DeepSeek-V4-Pro-Base     Base (pre-trained)     1.6T / 49B                FP8 Mixed        ~865 GB
deepseek-ai/DeepSeek-V4-Pro          Instruct (RLHF-tuned)  1.6T / 49B                FP4 + FP8 Mixed  ~865 GB

All four repositories are part of the deepseek-ai/deepseek-v4 collection.


What's Inside Each Repository

Each V4 model repository contains:

  • Model weights in SafeTensors format (split across multiple shards)
  • DeepSeek_V4.pdf — the full technical report
  • encoding/ folder — Python scripts for building OpenAI-compatible prompts and parsing model output
  • inference/ folder — detailed instructions for running the model locally
  • LICENSE — MIT License file
  • README with model card, benchmark tables, and citations

The technical report (DeepSeek_V4.pdf) is hosted in the Pro repository and covers the architecture in full, including the Hybrid Attention mechanism, mHC, and the training methodology.
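Sharded SafeTensors repositories follow the standard HuggingFace convention: a model.safetensors.index.json file maps every tensor name to the shard file that stores it, so loaders can fetch only the shards they need. A minimal sketch of resolving a tensor to its shard; the index contents below are invented for illustration, not taken from the actual V4 repositories:

```python
import json

# A miniature stand-in for model.safetensors.index.json; a real index
# maps thousands of tensor names across dozens of shard files.
index_json = """
{
  "metadata": {"total_size": 171798691840},
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00070.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00070.safetensors",
    "model.layers.61.mlp.down_proj.weight": "model-00070-of-00070.safetensors"
  }
}
"""

index = json.loads(index_json)

def shard_for(tensor_name: str) -> str:
    """Return the shard file that stores a given tensor."""
    return index["weight_map"][tensor_name]

print(shard_for("model.layers.0.self_attn.q_proj.weight"))
# model-00001-of-00070.safetensors
```

This is also why partial downloads are practical: if you only need a handful of tensors, the index tells you which shards to pull.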


License: MIT, Not Apache

A common misconception is that DeepSeek uses Apache 2.0 licensing (as it did for some earlier models). DeepSeek V4 is released under the MIT License, which is even more permissive:

  • ✅ Commercial use permitted
  • ✅ Modification permitted
  • ✅ Distribution permitted
  • ✅ Private use permitted
  • ✅ No patent clauses or additional restrictions

This means you can build proprietary products on top of V4, fine-tune and redistribute derivatives, and use it in any commercial context without restriction (beyond preserving the MIT copyright notice).


How to Download DeepSeek V4 Weights

Option 1: Command Line with huggingface-cli

pip install huggingface_hub

# Download V4-Flash (instruct, ~160 GB)
huggingface-cli download deepseek-ai/DeepSeek-V4-Flash --local-dir ./DeepSeek-V4-Flash

# Download V4-Pro (instruct, ~865 GB)
huggingface-cli download deepseek-ai/DeepSeek-V4-Pro --local-dir ./DeepSeek-V4-Pro

Option 2: Python with huggingface_hub

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V4-Flash",
    local_dir="./DeepSeek-V4-Flash"
)
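snapshot_download also accepts allow_patterns and ignore_patterns (shell-style globs), so you can fetch just the tokenizer, configs, or the PDF without pulling the weight shards. The patterns use fnmatch semantics; here is a quick offline illustration of which files a pattern set would select, against a hypothetical file listing:

```python
from fnmatch import fnmatch

# Hypothetical repository listing; the real one comes from the HuggingFace API.
repo_files = [
    "config.json",
    "tokenizer.json",
    "model-00001-of-00070.safetensors",
    "model-00002-of-00070.safetensors",
    "DeepSeek_V4.pdf",
]

# The same glob semantics that snapshot_download's allow_patterns applies.
allow_patterns = ["*.json", "*.pdf"]

selected = [f for f in repo_files if any(fnmatch(f, p) for p in allow_patterns)]
print(selected)
# ['config.json', 'tokenizer.json', 'DeepSeek_V4.pdf']
```

Passing the same list to snapshot_download(..., allow_patterns=["*.json", "*.pdf"]) would download roughly those files and skip the multi-gigabyte shards entirely.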

DeepSeek V4 is also available on ModelScope at matching repository paths (deepseek-ai/DeepSeek-V4-Flash, etc.), which may offer faster download speeds from within mainland China.


Storage and Bandwidth Requirements

Model                 Disk Space   VRAM Required   Recommended GPU Setup
V4-Flash              ~160 GB      ~160 GB         2× H100 80GB or 8× A100 40GB
V4-Pro                ~865 GB      ~865 GB         16× H100 80GB (or equivalent)
V4-Flash (quantized)  ~80 GB       ~80 GB          2× RTX 4090 / 1× RTX 5090
V4-Pro (quantized)    ~200 GB      ~200 GB         4–8× H100

Note: DeepSeek ships FP4 + FP8 mixed-precision weights, so the official checkpoints are already heavily compressed. Community-provided quantizations (GGUF/GPTQ) appearing on HuggingFace can reduce these requirements further.
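The disk figures above are consistent with mixed precision. A rough back-of-envelope (my sketch, using decimal GB, weights only, ignoring index and config files) shows that the ~160 GB quoted for the FP4 + FP8 V4-Flash checkpoints sits between a pure-FP8 and a pure-FP4 encoding of 284B parameters:

```python
def est_size_gb(n_params: float, bytes_per_param: float) -> float:
    """Rough weight-file size in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9

n = 284e9  # V4-Flash total parameter count

fp8 = est_size_gb(n, 1.0)  # FP8 = 1 byte/param   -> ~284 GB
fp4 = est_size_gb(n, 0.5)  # FP4 = 4 bits/param   -> ~142 GB

# The published ~160 GB implies a blended ~0.56 bytes per parameter,
# i.e. most tensors stored in FP4 with a minority kept at FP8.
blended = 160e9 / n
```

The same arithmetic explains why the community ~80 GB quantizations roughly halve the footprint again.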


Running the Model: Key Setup Notes

DeepSeek V4 doesn't use a standard HuggingFace Jinja chat template. Instead, you must use the custom encoding scripts provided in the repository's encoding/ folder.

A minimal example:

# The scripts in the repository's encoding/ folder must be on your PYTHONPATH
from encoding_dsv4 import encode_messages, parse_message_from_completion_text

import transformers

messages = [
    {"role": "user", "content": "Explain the Hybrid Attention Architecture in DeepSeek V4"}
]

# Build the prompt with DeepSeek's custom encoding (not a Jinja chat template)
prompt = encode_messages(messages, thinking_mode="thinking")

tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Flash")
tokens = tokenizer.encode(prompt)

# After generation, parse the raw completion text back into a structured message:
# reply = parse_message_from_completion_text(completion_text)

For full inference setup, refer to the inference/README.md inside each repository.


Community Activity on HuggingFace

Within days of the April 24, 2026 launch, the DeepSeek V4-Pro repository saw over 123,000 downloads and 22 community Spaces built on top of it. The community quickly produced:

  • GGUF quantizations for llama.cpp (enabling CPU+GPU hybrid inference)
  • LM Studio-compatible versions
  • Ollama builds
  • Jan-compatible packages

These community-maintained quantizations are making V4-Flash runnable on a single RTX 4090 — a remarkable feat for a 284B-parameter model.


DeepSeek V4 and AI Platforms

If you prefer API access rather than managing local weights, the V4 models are also available through multiple inference providers. Platforms like Framia.pro integrate frontier AI models — including the latest DeepSeek releases — to give creators and developers seamless API access without managing infrastructure.


Conclusion

DeepSeek V4 on HuggingFace is one of the most accessible frontier model releases in AI history. Four repositories, MIT licensing, a comprehensive technical report, and custom inference tooling are all freely available. Whether you're running it on a GPU cluster, experimenting with community quantizations, or accessing it via API, HuggingFace is your starting point.