# DeepSeek V4 Model Card: Full Technical Reference for Developers
The DeepSeek V4 model card consolidates everything a developer needs to understand and deploy the V4 series. This reference covers the full technical specifications, access methods, known limitations, and usage guidelines for both V4-Pro and V4-Flash.
## Model Identity

| Field | DeepSeek-V4-Pro | DeepSeek-V4-Flash |
|---|---|---|
| Model ID | deepseek-v4-pro | deepseek-v4-flash |
| Developer | DeepSeek-AI (Hangzhou DeepSeek Artificial Intelligence Co., Ltd.) | DeepSeek-AI (Hangzhou DeepSeek Artificial Intelligence Co., Ltd.) |
| Release Date | April 24, 2026 (Preview) | April 24, 2026 (Preview) |
| License | MIT License | MIT License |
| Model Type | Decoder-only Transformer, MoE | Decoder-only Transformer, MoE |
| Architecture | Hybrid Attention (CSA + HCA) + mHC | Hybrid Attention (CSA + HCA) + mHC |
| Total Parameters | 1.6T | 284B |
| Active Parameters | 49B | 13B |
| Context Length | 1,000,000 tokens | 1,000,000 tokens |
| Precision | FP4 + FP8 Mixed | FP4 + FP8 Mixed |
| Download Size | ~865 GB | ~160 GB |
## HuggingFace Repository Map

| Repository | Type | URL |
|---|---|---|
| DeepSeek-V4-Pro | Instruct (RLHF-tuned) | huggingface.co/deepseek-ai/DeepSeek-V4-Pro |
| DeepSeek-V4-Pro-Base | Pre-trained base | huggingface.co/deepseek-ai/DeepSeek-V4-Pro-Base |
| DeepSeek-V4-Flash | Instruct (RLHF-tuned) | huggingface.co/deepseek-ai/DeepSeek-V4-Flash |
| DeepSeek-V4-Flash-Base | Pre-trained base | huggingface.co/deepseek-ai/DeepSeek-V4-Flash-Base |
## API Reference

### Endpoints

- Base URL: `https://api.deepseek.com/v1`
- Chat Completions: `POST /chat/completions`
- Compatible formats: OpenAI ChatCompletions API, Anthropic Messages API

### Model Names (API)

- `deepseek-v4-pro`: full-capability flagship
- `deepseek-v4-flash`: fast and cost-efficient
- ⚠️ Deprecated (retiring July 24, 2026): `deepseek-chat`, `deepseek-reasoner`
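Because the endpoint is OpenAI-compatible, the standard OpenAI Python client can be pointed at it directly. A minimal sketch, assuming your key is in a `DEEPSEEK_API_KEY` environment variable:

```python
import os
from openai import OpenAI

# Point the standard OpenAI client at DeepSeek's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Summarize the V4 hybrid attention design."}],
)
print(response.choices[0].message.content)
```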
### Pricing

| Model | Input | Output |
|---|---|---|
| deepseek-v4-flash | $0.14 / 1M tokens | $0.28 / 1M tokens |
| deepseek-v4-pro | $1.74 / 1M tokens | $3.48 / 1M tokens |
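To make the rates concrete, here is a minimal cost estimator built from the table above; the token counts in the example are illustrative placeholders, and any cache or batch discounts are ignored:

```python
# Per-million-token rates from the pricing table above (USD).
RATES = {
    "deepseek-v4-flash": {"input": 0.14, "output": 0.28},
    "deepseek-v4-pro": {"input": 1.74, "output": 3.48},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for a single request at list price."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# Example: a 200K-token context with a 2K-token reply on each model.
print(estimate_cost("deepseek-v4-flash", 200_000, 2_000))  # ~$0.0286
print(estimate_cost("deepseek-v4-pro", 200_000, 2_000))    # ~$0.3550
```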
## Architecture Details

### Hybrid Attention System

| Layer Type | Mechanism | Purpose |
|---|---|---|
| Recent token layers | Standard attention | Full fidelity for nearby context |
| Mid-range token layers | Compressed Sparse Attention (CSA) | Efficient access to moderate-distance context |
| Long-range token layers | Heavily Compressed Attention (HCA) | Compact representation of distant history |

Efficiency vs V3.2 at 1M context:

- FLOPs: 27% of V3.2 (a 73% reduction)
- KV cache: 10% of V3.2 (a 90% reduction)
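The table describes tiering by token distance, but no reference implementation or tier boundaries have been published. A minimal illustrative sketch of the idea, with hypothetical cutoff distances that are not from DeepSeek:

```python
from enum import Enum

class AttentionTier(Enum):
    STANDARD = "standard"  # full attention over recent tokens
    CSA = "csa"            # Compressed Sparse Attention for mid-range tokens
    HCA = "hca"            # Heavily Compressed Attention for distant tokens

# Hypothetical boundaries for illustration only; the real cutoffs are undocumented.
RECENT_WINDOW = 8_192
MID_RANGE_WINDOW = 131_072

def tier_for_distance(distance: int) -> AttentionTier:
    """Select the attention mechanism applied to a key/value token at a
    given distance behind the current query position."""
    if distance <= RECENT_WINDOW:
        return AttentionTier.STANDARD
    if distance <= MID_RANGE_WINDOW:
        return AttentionTier.CSA
    return AttentionTier.HCA
```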
### Training Innovations

| Innovation | Description |
|---|---|
| Optimizer | Muon (replaces AdamW) |
| Residual connections | mHC (Manifold-Constrained Hyper-Connections) |
| Pre-training data | 32T+ diverse tokens |
| Post-training Stage 1 | Expert specialization via SFT + RL (GRPO) |
| Post-training Stage 2 | Unified consolidation via on-policy distillation |
### Inference Modes

| Mode | API Parameter | Thinking Budget | Context Requirement |
|---|---|---|---|
| Non-think | `"thinking": {"type": "disabled"}` | None | Standard |
| Think High | `"thinking": {"type": "enabled", "budget_tokens": N}` | User-defined | Standard |
| Think Max | Special system prompt + `"thinking": {"type": "max"}` | Extended | 384K+ tokens recommended |
### Recommended Sampling Parameters

```json
{
  "temperature": 1.0,
  "top_p": 1.0
}
```
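Putting the mode flag and the sampling defaults together, a sketch of a Think High request against the chat completions endpoint; the budget value is an arbitrary example:

```python
import os
import requests

# Think High request body: mode flag from the Inference Modes table,
# sampling values from the recommended defaults above.
payload = {
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    "thinking": {"type": "enabled", "budget_tokens": 8192},  # example budget
    "temperature": 1.0,
    "top_p": 1.0,
}

resp = requests.post(
    "https://api.deepseek.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}"},
    json=payload,
    timeout=600,
)
print(resp.json())
```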
## Benchmark Reference

### V4-Pro-Max vs Frontier Models

| Benchmark | V4-Pro Max | Opus 4.6 Max | GPT-5.4 xHigh | Gemini-3.1-Pro High |
|---|---|---|---|---|
| MMLU-Pro | 87.5% | 89.1% | 87.5% | 91.0% |
| GPQA Diamond | 90.1% | 91.3% | 93.0% | 94.3% |
| HLE | 37.7% | 40.0% | 39.8% | 44.4% |
| LiveCodeBench | 93.5% | 88.8% | N/A | 91.7% |
| Codeforces | 3206 | N/A | 3168 | 3052 |
| SWE-bench Verified | 80.6% | 80.8% | N/A | 80.6% |
| SWE-bench Pro | 55.4% | 57.3% | 57.7% | 54.2% |
| Terminal Bench 2.0 | 67.9% | 65.4% | 75.1% | 68.5% |
| MRCR 1M | 83.5% | 92.9% | N/A | 76.3% |
| CorpusQA 1M | 62.0% | 71.7% | N/A | 53.8% |
## Local Deployment Reference

| Configuration | Storage | VRAM | Min GPU Setup |
|---|---|---|---|
| V4-Flash (Full) | 160 GB | ~160 GB | 2× H100 80GB |
| V4-Flash (Q4 quant) | ~80 GB | ~80 GB | RTX 5090 |
| V4-Pro (Full) | 865 GB | ~865 GB | 16× H100 80GB |
| V4-Pro (Q4 quant) | ~200–400 GB | ~200–400 GB | 4–8× H100 80GB |
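Given the download sizes above, it is worth fetching weights ahead of time with the standard `huggingface_hub` client and the repository IDs from the map earlier; the `local_dir` path below is a placeholder:

```python
from huggingface_hub import snapshot_download

# Download the full V4-Flash checkpoint (~160 GB); downloads are resumable.
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V4-Flash",
    local_dir="/models/deepseek-v4-flash",  # placeholder path
)
```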
## Chat Template

DeepSeek V4 does not use a standard HuggingFace Jinja chat template. Use the custom encoding scripts in each repository's `encoding/` folder:

```python
from encoding_dsv4 import encode_messages, parse_message_from_completion_text

# Encode a standard messages list into the model's raw prompt format.
messages = [{"role": "user", "content": "Hello"}]
prompt = encode_messages(messages, thinking_mode="no_think")
# thinking_mode options: "no_think", "thinking", "max_thinking"
```
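For the return path, the same module exports `parse_message_from_completion_text`; its exact signature is not documented in this card, so the call below is an assumption inferred from the import:

```python
# Assumed usage, inferred from the import above: parse the raw completion
# text returned by the model back into a structured message.
completion_text = "..."  # raw model output goes here
message = parse_message_from_completion_text(completion_text)
```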
## Known Limitations

- Text-only at launch: no native image, audio, or video understanding in the April 2026 preview release
- Preview status: edge cases may exist; DeepSeek recommends following its official accounts for updates
- Think Max context requirement: a 384K+ token context window is required for best Think Max performance
- Large download: V4-Pro at 865 GB requires significant bandwidth and storage for local deployment
- Chat template: non-standard encoding requires the repository-provided scripts rather than standard HuggingFace pipeline tools
## Official Resources

- Official Twitter: @deepseek_ai
- GitHub: github.com/deepseek-ai
- HuggingFace: huggingface.co/deepseek-ai
- API Documentation: api-docs.deepseek.com
- Email: service@deepseek.com
- Web Chat: chat.deepseek.com
For developers building on platforms like Framia.pro that integrate DeepSeek V4's capabilities, this model card serves as the authoritative technical reference for all integration decisions.
## Citation

```bibtex
@misc{deepseekai2026deepseekv4,
  title={DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence},
  author={DeepSeek-AI},
  year={2026},
}
```