Feature and Model Support Matrix
This page provides an overview of UCM (Unified Cache Manager) compatibility across different models and inference frameworks. Use this matrix as a compatibility reference for model selection, deployment, and feature validation.
Legend 🧭
| Symbol | Description |
|---|---|
| ✅ | Fully supported |
| ❌ | Not supported |
| 🟡 | Not tested or verified |
Model Support and Feature Compatibility 🧩
Prefix Cache Support
This section lists prefix cache support for each model on each supported inference framework. Use it to check framework compatibility for deployments that require prefix cache.
| Model | vLLM | vLLM-Ascend | SGLang |
|---|---|---|---|
| DeepSeek V3.2 | ✅ | ✅ | ✅ |
| DeepSeek R1 | ✅ | ✅ | ✅ |
| DeepSeek V3/3.1 | ✅ | ✅ | ✅ |
| Qwen3.5 | ❌ | ❌ | ❌ |
| Qwen3 | ✅ | ✅ | ✅ |
| Qwen3-Moe | ✅ | ✅ | ✅ |
| Qwen3-Next | ❌ | ❌ | ❌ |
| Qwen2.5 | ✅ | ✅ | ✅ |
| GLM-5 | ✅ | ✅ | ❌ |
| GLM-4.x | ✅ | ✅ | ✅ |
| MiniMax-M2.5 | ✅ | ✅ | ✅ |
| Kimi-K2.5 | ❌ | ❌ | ❌ |
Note: The table lists a selected set of representative models. See Prefix Cache for more details.
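For automated pre-deployment checks, the prefix cache table can be transcribed into a small lookup structure. The following is a minimal sketch, not part of UCM: the names `FRAMEWORKS`, `PREFIX_CACHE_SUPPORT`, and `prefix_cache_supported` are hypothetical, and the data is copied by hand from the table above, so it must be kept in sync manually when the matrix changes.

```python
from typing import Optional

# Column order of the prefix cache table on this page.
FRAMEWORKS = ("vLLM", "vLLM-Ascend", "SGLang")

Y, N = True, False  # ✅ fully supported / ❌ not supported

# Hand-transcribed copy of the prefix cache table (hypothetical helper data,
# not an API provided by UCM).
PREFIX_CACHE_SUPPORT = {
    "DeepSeek V3.2":   (Y, Y, Y),
    "DeepSeek R1":     (Y, Y, Y),
    "DeepSeek V3/3.1": (Y, Y, Y),
    "Qwen3.5":         (N, N, N),
    "Qwen3":           (Y, Y, Y),
    "Qwen3-Moe":       (Y, Y, Y),
    "Qwen3-Next":      (N, N, N),
    "Qwen2.5":         (Y, Y, Y),
    "GLM-5":           (Y, Y, N),
    "GLM-4.x":         (Y, Y, Y),
    "MiniMax-M2.5":    (Y, Y, Y),
    "Kimi-K2.5":       (N, N, N),
}


def prefix_cache_supported(model: str, framework: str) -> Optional[bool]:
    """Return True or False as listed in the table.

    Returns None when the model is not listed, which should be treated
    like 🟡 (not tested or verified). Raises ValueError for a framework
    that is not one of the table's columns.
    """
    if framework not in FRAMEWORKS:
        raise ValueError(f"unknown framework: {framework!r}")
    row = PREFIX_CACHE_SUPPORT.get(model)
    if row is None:
        return None
    return row[FRAMEWORKS.index(framework)]


if __name__ == "__main__":
    for model, framework in [("GLM-5", "vLLM"), ("GLM-5", "SGLang")]:
        print(model, framework, prefix_cache_supported(model, framework))
```

Returning `None` for unlisted models keeps "not in the table" distinct from "explicitly unsupported", which matches the legend's separation of ❌ and 🟡.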
Inference Enhancement Features
This section lists support for the inference enhancement features Sparse Attention (shown as GsaOnDevice in the table), ReRoPE, and CacheBlend across the listed models.
| Model | GsaOnDevice | ReRoPE | CacheBlend |
|---|---|---|---|
| DeepSeek V3.2 | ✅ | ✅ | ✅ |
| DeepSeek R1 | ✅ | ✅ | ✅ |
| DeepSeek V3/3.1 | ✅ | ✅ | ✅ |
| Qwen3 | ✅ | ✅ | ✅ |
| Qwen2.5 | ✅ | ✅ | ✅ |
Note: See Sparse Attention and ReRoPE for more details.
Supported Compute Platforms and Devices
This section presents the currently supported compute platforms and devices.
| Compute Platform | Vendor | Device |
|---|---|---|
| CANN | Ascend | 910C, 910B |
| CUDA | NVIDIA | H100, H20, L40, L20 |
| MUSA | Mthreads | S5000 |
| MACA | MetaX | C500 |
Note: The table shows only selected platforms.
Notes and Limitations 📌
This matrix is provided as a compatibility reference for the configurations listed on this page.
Actual behavior may vary depending on hardware, runtime settings, backend changes, and model variants.
This support matrix is updated continuously. For the latest information, refer to the project's GitHub issues and pull requests.