Feature and Model Support Matrix#

This page provides an overview of UCM (Unified Cache Manager) compatibility across different models and inference frameworks. Use this matrix as a compatibility reference for model selection, deployment, and feature validation.

Legend 🧭#

| Symbol | Description |
| --- | --- |
| ✅ | Fully supported |
| ❌ | Not supported |
| 🟡 | Not tested or verified |

Model Support and Feature Compatibility 🧩#

Prefix Cache Support#

This section presents prefix cache support for each model across the supported inference frameworks. Use it as a reference when evaluating framework compatibility for deployments that require prefix caching.

| Model | vLLM (main) | vLLM-Ascend (main) | SGLang (main) |
| --- | --- | --- | --- |
| DeepSeek V3.2 | | | |
| DeepSeek R1 | | | |
| DeepSeek V3/3.1 | | | |
| Qwen3.5 | | | |
| Qwen3 | | | |
| Qwen3-Moe | | | |
| Qwen3-Next | | | |
| Qwen2.5 | | | |
| GLM-5 | | | |
| GLM-4.x | | | |
| MiniMax-M2.5 | | | |
| Kimi-K2.5 | | | |

Note: The table lists a selected set of representative models. See Prefix Cache for more details.
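As a rough illustration of how a framework column in the table maps to a deployment command, vLLM exposes prefix caching through the `--enable-prefix-caching` flag and external KV connectors through `--kv-transfer-config`. The connector name and config values below are illustrative placeholders, not verified UCM identifiers; consult the UCM deployment guide for the exact values.

```shell
# Sketch only: serve a model from the table with prefix caching enabled.
# "UCMConnector" and the kv-transfer-config payload are placeholders for
# whatever connector name the UCM deployment guide specifies.
vllm serve Qwen/Qwen2.5-7B-Instruct \
  --enable-prefix-caching \
  --kv-transfer-config '{"kv_connector": "UCMConnector", "kv_role": "kv_both"}'
```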

Inference Enhancement Features#

This section presents support information for inference enhancement features, including Sparse Attention, ReRoPE, and CacheBlend, across the listed models and framework versions.

| Model | GsaOnDevice (vLLM / vLLM-Ascend 0.11.0) | ReRoPE (vLLM 0.11.0) | CacheBlend (vLLM 0.9.2) |
| --- | --- | --- | --- |
| DeepSeek V3.2 | | | |
| DeepSeek R1 | | | |
| DeepSeek V3/3.1 | | | |
| Qwen3 | | | |
| Qwen2.5 | | | |

Note: See Sparse Attention and ReRoPE for more details.
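Because each feature column above is pinned to a specific framework version, a deployment script may want to gate on that version before enabling a feature. The helper below is a hypothetical sketch, not part of UCM; only the version numbers come from the table.

```python
# Hypothetical helper: checks whether an installed framework version matches
# the version each feature column above was validated against. Not a UCM API.
VALIDATED = {
    "GsaOnDevice": {"vLLM": "0.11.0", "vLLM-Ascend": "0.11.0"},
    "ReRoPE": {"vLLM": "0.11.0"},
    "CacheBlend": {"vLLM": "0.9.2"},
}

def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def is_validated(feature: str, framework: str, installed: str) -> bool:
    """Return True if the feature column lists this framework at this version."""
    validated = VALIDATED.get(feature, {}).get(framework)
    if validated is None:
        return False
    return parse_version(installed) == parse_version(validated)

print(is_validated("ReRoPE", "vLLM", "0.11.0"))     # True
print(is_validated("CacheBlend", "vLLM", "0.11.0"))  # False: validated on 0.9.2
```

A mismatch here does not mean the feature will fail, only that the combination falls outside what the table covers.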

Supported Compute Platforms and Devices#

This section presents the currently supported compute platforms and devices.

| Compute Platform | Vendor | Device |
| --- | --- | --- |
| CANN | Ascend | 910C, 910B |
| CUDA | NVIDIA | H100, H20, L40, L20 |
| MUSA | Mthreads | S5000 |
| MACA | MetaX | C500 |

Note: The table shows only selected platforms.

Notes and Limitations 📌#

  • This matrix is provided as a compatibility reference for the configurations listed on this page.

  • Actual behavior may vary depending on hardware, runtime settings, backend changes, and model variants.

  • This support matrix is continuously updated. For the latest information, please refer to the GitHub issues and pull requests.