Feature and Model Support Matrix#

This page provides an overview of UCM (Unified Cache Manager) compatibility across different models and inference frameworks. Use this matrix as a compatibility reference for model selection, deployment, and feature validation.

Legend 🧭#

| Symbol | Description |
| --- | --- |
| ✅ | Fully supported |
| ❌ | Not supported |
| 🟡 | Not tested or verified |

Model Support and Feature Compatibility 🧩#

Prefix Cache Support#

This section presents prefix cache support for each model across the supported inference frameworks. Use it as a reference when evaluating framework compatibility for deployments that require prefix caching.

| Model | vLLM (main) | vLLM-Ascend (main) | SGLang (main) |
| --- | --- | --- | --- |
| DeepSeek V3.2 | | | |
| DeepSeek R1 | | | |
| DeepSeek V3/3.1 | | | |
| Qwen3.5 | | | |
| Qwen3 | | | |
| Qwen3-Moe | | | |
| Qwen3-Next | | | |
| Qwen2.5 | | | |
| GLM-5 | | | |
| GLM-4.x | | | |
| MiniMax-M2.5 | | | |
| Kimi-K2.5 | | | |

Note: The table lists a selected set of representative models. See Prefix Cache for more details.
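As a rough illustration of how a framework column in the table maps to a deployment command, vLLM exposes prefix caching through the `--enable-prefix-caching` flag and external KV connectors through `--kv-transfer-config`. The connector name and config values below are illustrative placeholders, not verified UCM identifiers; consult the UCM deployment guide for the exact values.

```shell
# Sketch only: serve a model from the table with prefix caching enabled.
# "UCMConnector" and the kv-transfer-config payload are placeholders for
# whatever connector name the UCM deployment guide specifies.
vllm serve Qwen/Qwen2.5-7B-Instruct \
  --enable-prefix-caching \
  --kv-transfer-config '{"kv_connector": "UCMConnector", "kv_role": "kv_both"}'
```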

Inference Enhancement Features#

This section presents support information for inference enhancement features, including Sparse Attention, ReRoPE, and CacheBlend, across the listed models and framework versions.

| Model | GsaOnDevice (vLLM / vLLM-Ascend 0.11.0) | ReRoPE (vLLM 0.11.0) | CacheBlend (vLLM 0.9.2) |
| --- | --- | --- | --- |
| DeepSeek V3.2 | | | |
| DeepSeek R1 | | | |
| DeepSeek V3/3.1 | | | |
| Qwen3 | | | |
| Qwen2.5 | | | |

Note: See Sparse Attention and ReRoPE for more details.
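Because each feature column above is pinned to a specific framework version, a deployment script may want to gate on that version before enabling a feature. The helper below is a hypothetical sketch, not part of UCM; only the version numbers come from the table.

```python
# Hypothetical helper: checks whether an installed framework version matches
# the version each feature column above was validated against. Not a UCM API.
VALIDATED = {
    "GsaOnDevice": {"vLLM": "0.11.0", "vLLM-Ascend": "0.11.0"},
    "ReRoPE": {"vLLM": "0.11.0"},
    "CacheBlend": {"vLLM": "0.9.2"},
}

def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def is_validated(feature: str, framework: str, installed: str) -> bool:
    """Return True if the feature column lists this framework at this version."""
    validated = VALIDATED.get(feature, {}).get(framework)
    if validated is None:
        return False
    return parse_version(installed) == parse_version(validated)

print(is_validated("ReRoPE", "vLLM", "0.11.0"))     # True
print(is_validated("CacheBlend", "vLLM", "0.11.0"))  # False: validated on 0.9.2
```

A mismatch here does not mean the feature will fail, only that the combination falls outside what the table covers.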

Supported Compute Platforms and Devices#

This section presents the currently supported compute platforms and devices.

| Compute Platform | Vendor | Device |
| --- | --- | --- |
| CANN | Ascend | 910C, 910B |
| CUDA | NVIDIA | H100, H20, L40, L20 |
| MUSA | Mthreads | S5000 |
| MACA | MetaX | C500 |

Note: The table shows only selected platforms.

Notes and Limitations 📌#

  • This matrix is provided as a compatibility reference for the configurations listed on this page.

  • Actual behavior may vary depending on hardware, runtime settings, backend changes, and model variants.

  • This support matrix is continuously updated. For the latest information, please refer to the GitHub issues and pull requests.