DeepSeek-V3.2-Exp: Boosting Long-Context Efficiency with DeepSeek Sparse Attention
Published on: 06 October 2025
High-Level Architectural Shift
```mermaid
graph TD
    subgraph A["DeepSeek-V3.1-Terminus (Dense Attention)"]
        direction TB
        style A fill:#f9f,stroke:#333,stroke-width:2px
        Input1[Input Query] --> Attention1{Core Attention};
        All_KV1[All Key-Value Pairs from Context] --> Attention1;
        Attention1 --> Output1[Output];
        Complexity1["Complexity: O(L²)"] -.-> Attention1;
    end
    subgraph B["DeepSeek-V3.2-Exp (Sparse Attention)"]
        direction TB
        style B fill:#ccf,stroke:#333,stroke-width:2px
        Input2[Input Query] --> DSA{"DeepSeek Sparse Attention (DSA)"};
        All_KV2[All Key-Value Pairs from Context] --> DSA;
        DSA -- "Filters to Top-k Pairs" --> Attention2{Core Attention};
        Input2 --> Attention2;
        Attention2 --> Output2[Output];
        Complexity2["Complexity: O(Lk)"] -.-> Attention2;
    end
```
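To make the complexity labels in the diagram concrete, here is a back-of-the-envelope count of query-key score computations under each approach. The context length and the number of selected tokens per query are illustrative values, not DeepSeek's published configuration.

```python
# Rough score-count comparison for the two diagrams above (illustrative only).
L = 128_000  # context length (assumed for illustration)
k = 2_048    # tokens kept per query by the top-k selector (assumed for illustration)

dense_scores = L * L   # O(L^2): every query is scored against every key
sparse_scores = L * k  # O(Lk): every query is scored against only k selected keys

print(f"dense : {dense_scores:,} query-key scores")
print(f"sparse: {sparse_scores:,} query-key scores")
print(f"ratio : {dense_scores / sparse_scores:.1f}x fewer scores in the sparse path")
```

Note that the sparse path's indexer still scans the full context to score tokens, but it is designed to be far cheaper per token than full attention, so the dominant cost shifts from the O(L²) score matrix to the O(Lk) core attention.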
Inside DeepSeek Sparse Attention (DSA)
```mermaid
graph TD
    subgraph "DSA Internal Workflow"
        A["Input: Query Token & Full Context"] --> B["1. Lightning Indexer"];
        B -- "Computes relevance scores for all tokens" --> C["2. Top-k Selector"];
        C -- "Selects only the most relevant k tokens" --> D["Output: Sparse Key-Value Pairs"];
    end
    D --> E{Main Attention Mechanism};
    F[Original Query Token] --> E;
    E --> G[Final Output];
```
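The three-stage flow above can be sketched for a single query token as follows. This is a minimal sketch, not DeepSeek's implementation: the low-dimensional dot-product indexer, the tensor shapes, and names such as `TOP_K` are assumptions made for illustration.

```python
# Minimal single-query sketch of the DSA workflow (illustrative assumptions throughout).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
L, d, d_idx, TOP_K = 4096, 128, 32, 256  # context length, head dim, indexer dim, kept tokens

q      = torch.randn(d)       # current query token (single head)
keys   = torch.randn(L, d)    # keys for the full context
values = torch.randn(L, d)    # values for the full context

# 1. Lightning Indexer: a cheap relevance score for every context token,
#    modeled here as a low-dimensional projection plus dot product (an assumption).
W_q_idx = torch.randn(d, d_idx) / d ** 0.5
W_k_idx = torch.randn(d, d_idx) / d ** 0.5
scores = (keys @ W_k_idx) @ (q @ W_q_idx)           # shape: (L,)

# 2. Top-k Selector: keep only the highest-scoring context tokens.
topk_idx = scores.topk(TOP_K).indices               # shape: (TOP_K,)
k_sel, v_sel = keys[topk_idx], values[topk_idx]     # sparse key-value pairs

# 3. Main attention now runs over TOP_K tokens instead of L.
attn = F.softmax((k_sel @ q) / d ** 0.5, dim=-1)    # shape: (TOP_K,)
out  = attn @ v_sel                                 # final output, shape: (d,)
print(out.shape)
```

In the released model the indexer and core attention operate on batched, multi-head tensors with custom kernels; the sketch only mirrors the data flow shown in the diagram.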
Core Innovation and Benefits
```mermaid
mindmap
  root((DeepSeek Sparse Attention))
    ::icon(fa fa-lightbulb)
    Core Innovation: Selective Token Processing
      Lightning Indexer
        ::icon(fa fa-bolt)
        Rapidly scores token relevance
      Top-k Selector
        ::icon(fa fa-filter)
        Picks only the highest-scored tokens
    Problem Solved
      High Computational Cost of Dense Attention
        Complexity is O(L²)
        Scales poorly with long contexts
    Key Benefits
      ::icon(fa fa-rocket)
      Improved Efficiency
        New Complexity is O(Lk)
        Faster inference for long sequences
        Reduced API & compute costs
      Comparable Performance
        Maintains model quality despite sparsity
```
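The efficiency claim can also be seen at the level of whole sequences: the sparse path materializes an (L, k) attention-score matrix instead of an (L, L) one. The sketch below is a simplification, and in it a plain dot product stands in for the lightning indexer and causal masking is omitted, so only the core-attention term demonstrates the O(L²) to O(Lk) reduction.

```python
# Full-sequence sketch of the (L, L) vs (L, k) score-matrix difference (illustrative).
import torch
import torch.nn.functional as F

L, d, k = 4096, 64, 256
q, keys, values = torch.randn(L, d), torch.randn(L, d), torch.randn(L, d)

# Dense attention: every query scores every key -> (L, L) matrix.
dense_scores = q @ keys.T / d ** 0.5
dense_out = F.softmax(dense_scores, dim=-1) @ values

# Sparse attention: an indexer picks k keys per query, so the core attention
# only forms an (L, k) score matrix.
indexer_scores = q @ keys.T                         # stand-in for the lightning indexer
topk = indexer_scores.topk(k, dim=-1).indices       # (L, k) selected positions
k_sel, v_sel = keys[topk], values[topk]             # (L, k, d) gathered key-value pairs
sparse_scores = torch.einsum("ld,lkd->lk", q, k_sel) / d ** 0.5
sparse_out = torch.einsum("lk,lkd->ld", F.softmax(sparse_scores, dim=-1), v_sel)

print(dense_scores.shape, sparse_scores.shape)      # (4096, 4096) vs (4096, 256)
print(dense_out.shape, sparse_out.shape)            # both (4096, 64)
```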
Sources: