# Extending UCM Store

## Overview

In the Unified Cache Manager (UCM) architecture, the Store component handles:

- **Space Management**: Allocation and scaling of KV Cache storage
- **Persistence**: Durable storage and recovery of KV Cache data
- **Tiered Transfer**: Efficient data movement between storage hierarchies
- **Data Processing**: Quantization, compression, and encoding transformations
## Built-in Store Ecosystem

UCM provides production-ready Store implementations with the following dependency architecture:

[Figure: dependency architecture of the built-in Stores]

## Extension Options

Beyond built-in Stores, UCM supports three extension patterns:

| Method | Implementation | Performance | Use Case | Rating | Reference Implementation |
|---|---|---|---|---|---|
| Pure Python | Python only | ★★★☆☆ | Prototyping, algorithm validation | ★★ | |
| Python/C++ Hybrid | Python + C++ | ★★★★☆ | Complex logic with performance-critical paths | ★★★ | |
| Pure C++ | C++ only | ★★★★★ | Production, high-performance scenarios | ★★★★★ | |
## Pure Python Extension

For rapid prototyping and algorithm validation where development speed outweighs runtime performance.

### Implementation Steps

1. Inherit the base class

```python
from ucm.store.ucmstore_v1 import Task, UcmKVStoreBaseV1


class UcmCustomPythonStore(UcmKVStoreBaseV1):
    def __init__(self, config: dict):
        super().__init__(config)
```
2. Implement the required methods

```python
from typing import List

import numpy as np
import torch

from ucm.store.ucmstore_v1 import Task, UcmKVStoreBaseV1


class UcmCustomPythonStore(UcmKVStoreBaseV1):
    # Same class as in step 1 (__init__ omitted here).
    # Override every abstract method declared by UcmKVStoreBaseV1.

    def lookup(self, block_ids: List[bytes]) -> List[bool]:
        """Check presence of blocks in external storage."""
        ...

    def lookup_on_prefix(self, block_ids: List[bytes]) -> int:
        """Check presence of a contiguous prefix of blocks in external storage."""
        ...

    def prefetch(self, block_ids: List[bytes]) -> None:
        """Asynchronously prefetch blocks into high-speed cache."""
        ...

    def load(self, block_ids: List[bytes], shard_index: List[int],
             dst_tensor: List[List[torch.Tensor]]) -> Task:
        """Initiate transfer of KV cache from storage to device."""
        ...

    def dump(self, block_ids: List[bytes], shard_index: List[int],
             src_tensor: List[List[torch.Tensor]]) -> Task:
        """Initiate transfer of KV cache from device to storage."""
        ...

    def load_data(self, block_ids: List[bytes], shard_index: List[int],
                  dst_addr: List[List[int]] | np.ndarray) -> Task:
        """Low-level fetch: copy KV data to device pointers."""
        ...

    def dump_data(self, block_ids: List[bytes], shard_index: List[int],
                  src_addr: List[List[int]] | np.ndarray) -> Task:
        """Low-level dump: copy KV data from device pointers."""
        ...

    def wait(self, task: Task) -> None:
        """Block until the given transfer task completes."""
        ...

    def check(self, task: Task) -> bool:
        """Non-blocking poll for task completion."""
        ...
```
Note: Full interface specifications are in `ucm/store/ucmstore_v1.py`.

3. Register your Store

```python
# ucm/store/factory_v1.py
UcmConnectorFactoryV1.register_connector(
    "UcmCustomPythonStore",        # registered Store name
    "ucm.store.custom.connector",  # module that defines the class
    "UcmCustomPythonStore",        # class name inside that module
)
```
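The three arguments to `register_connector` read as a registry name, a module path, and a class name. A common way for such a factory to work is to store only the strings and import the module lazily on first use, which keeps registration cheap and avoids importing heavy dependencies until a Store is actually created. The sketch below shows that pattern; `ConnectorRegistrySketch` and `create_connector` are hypothetical names, not UCM's factory implementation.

```python
import importlib
from typing import Any, Dict, Tuple


class ConnectorRegistrySketch:
    """Hypothetical registry: name -> (module path, class name), imported lazily."""

    _registry: Dict[str, Tuple[str, str]] = {}

    @classmethod
    def register_connector(cls, name: str, module_path: str, class_name: str) -> None:
        if name in cls._registry:
            raise ValueError(f"connector {name!r} is already registered")
        # Only strings are stored, so registering does not import the module.
        cls._registry[name] = (module_path, class_name)

    @classmethod
    def create_connector(cls, name: str, *args: Any, **kwargs: Any) -> Any:
        module_path, class_name = cls._registry[name]
        module = importlib.import_module(module_path)  # import happens on first use
        return getattr(module, class_name)(*args, **kwargs)


# Usage with a standard-library class standing in for a Store:
ConnectorRegistrySketch.register_connector("Counter", "collections", "Counter")
counts = ConnectorRegistrySketch.create_connector("Counter", "aab")
print(counts["a"])  # 2
```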
⚠️ **Performance Warning:** Python implementations are GIL-bound and unsuitable for high-throughput scenarios. Use them only for development and testing.
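Because Python Stores are meant for development and testing, a useful first artifact is an in-memory test double that exercises the method contract listed above without any storage backend. The sketch below is hypothetical: `InMemoryStoreSketch` does not inherit `UcmKVStoreBaseV1`, it replaces the tensor and `shard_index` arguments of `load`/`dump` with plain byte buffers, it uses `concurrent.futures.Future` in place of `Task`, and it assumes `lookup_on_prefix` returns the number of leading blocks that are present. Check `ucm/store/ucmstore_v1.py` for the real signatures and return semantics.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List


class InMemoryStoreSketch:
    """Hypothetical in-memory test double mirroring UcmKVStoreBaseV1 method names.

    Blocks live in a dict keyed by block ID; a Future stands in for Task.
    """

    def __init__(self, max_workers: int = 2) -> None:
        self._blocks: Dict[bytes, bytes] = {}
        self._lock = threading.Lock()  # callers may use the store from many threads
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def lookup(self, block_ids: List[bytes]) -> List[bool]:
        with self._lock:
            return [bid in self._blocks for bid in block_ids]

    def lookup_on_prefix(self, block_ids: List[bytes]) -> int:
        # Assumption: returns how many leading blocks are present.
        hits = 0
        for present in self.lookup(block_ids):
            if not present:
                break
            hits += 1
        return hits

    def prefetch(self, block_ids: List[bytes]) -> None:
        pass  # nothing to warm up: everything is already in memory

    def dump(self, block_ids: List[bytes], payloads: List[bytes]) -> Future:
        def copy_in() -> None:
            with self._lock:
                for bid, data in zip(block_ids, payloads):
                    self._blocks[bid] = bytes(data)

        return self._pool.submit(copy_in)  # asynchronous: returns a task handle

    def load(self, block_ids: List[bytes], dst: List[bytearray]) -> Future:
        def copy_out() -> None:
            with self._lock:
                for bid, buf in zip(block_ids, dst):
                    buf[:] = self._blocks[bid]  # a missing block raises KeyError

        return self._pool.submit(copy_out)

    def wait(self, task: Future) -> None:
        task.result()  # blocks; re-raises any exception from the copy

    def check(self, task: Future) -> bool:
        return task.done()  # non-blocking poll


store = InMemoryStoreSketch()
store.wait(store.dump([b"a", b"b"], [b"KV-a", b"KV-b"]))
print(store.lookup([b"a", b"x", b"b"]))  # [True, False, True]
print(store.lookup_on_prefix([b"a", b"b", b"x"]))  # 2
```

Keeping every transfer asynchronous, even in a toy store, means code written against it already handles `wait`/`check` the way a real backend requires.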
## Hybrid Python/C++ Extension

Best for balancing productivity with performance: implement hot paths in C++ and orchestrate them from Python.
### Architecture

```text
┌───────────────────────────┐
│   Python Wrapper Layer    │  ← Business logic, config, API surface
│   (ucm/store/cpy/*)       │
└─────────────┬─────────────┘
              │ pybind11
              ▼
┌───────────────────────────┐
│      C++ Core Layer       │  ← Performance-critical operations
│      (ucm/store/cc/*)     │    Memory management, compute kernels
└───────────────────────────┘
```
### Implementation Steps

1. Implement the C++ core (`hybrid_store.cc`)

```cpp
#include "ucm/store/ucmstore_v1.h"

namespace UC::HybridStore {

class HybridStore : public StoreV1 {
public:
    ~HybridStore() override;
    Status Setup(const Detail::Dictionary& config) override;
    std::string Readme() const override;

    // Core operations
    Expected<std::vector<uint8_t>> Lookup(const Detail::BlockId* blocks, size_t num) override;
    Expected<ssize_t> LookupOnPrefix(const Detail::BlockId* blocks, size_t num) override;
    void Prefetch(const Detail::BlockId* blocks, size_t num) override;
    Expected<Detail::TaskHandle> Load(Detail::TaskDesc task) override;
    Expected<Detail::TaskHandle> Dump(Detail::TaskDesc task) override;
    Expected<bool> Check(Detail::TaskHandle taskId) override;
    Status Wait(Detail::TaskHandle taskId) override;
};

}  // namespace UC::HybridStore
```
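Note: Full interface specifications are in `ucm/store/ucmstore_v1.h`.

2. Create Python bindings (`hybrid_store.cpy.cc`)

```cpp
#include <pybind11/pybind11.h>

namespace py = pybind11;

PYBIND11_MODULE(ucmhybridstore, m)
{
    py::class_<UC::HybridStore::HybridStore>(m, "HybridStore")
        .def(py::init<const Config&>())
        // Lookup takes (const BlockId*, size_t); binding it for Python callers
        // typically needs a small adapter lambda that converts the argument.
        .def("Lookup", &UC::HybridStore::HybridStore::Lookup, py::arg("block_ids").noconvert())
        .def("Load", &UC::HybridStore::HybridStore::Load);
    // ... bind the remaining interface methods the same way
}
```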
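3. Python wrapper layer

```python
from typing import List

from ucmhybridstore import HybridStore


class UcmHybridStoreWrapper:
    """Python layer: configuration and orchestration around the C++ core."""

    def __init__(self, config: dict):
        self._store = HybridStore(config)

    def lookup_with_retry(self, block_ids: List[bytes]) -> List[bool]:
        """Add Python-level retry logic on top of the C++ Lookup."""
        # Assumes Lookup returns one hit flag per block. A non-empty list of
        # all-False flags is still truthy, so test the flags, not the list.
        result = [bool(hit) for hit in self._store.Lookup(block_ids)]
        if not all(result):
            result = self._handle_miss(block_ids)
        return result

    def _handle_miss(self, block_ids: List[bytes]) -> List[bool]:
        # Placeholder policy: retry the lookup once. Replace with your own logic.
        return [bool(hit) for hit in self._store.Lookup(block_ids)]
```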
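Since the Python layer orchestrates while the C++ core does the work, a typical wrapper responsibility is driving several in-flight transfers. The sketch below polls tasks with the non-blocking `check` and falls back to the blocking `wait` for anything still pending at a deadline. It assumes only that the store object exposes `check(task) -> bool` and `wait(task)` as the Python interface does; `drain_tasks` and the `FutureStore` test double are hypothetical.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Iterable, List


def drain_tasks(
    store: Any,
    tasks: Iterable[Any],
    on_done: Callable[[Any], None],
    poll_interval: float = 0.001,
    timeout: float = 1.0,
) -> None:
    """Poll with store.check(); block with store.wait() on anything left at the deadline."""
    pending: List[Any] = list(tasks)
    deadline = time.monotonic() + timeout
    while pending and time.monotonic() < deadline:
        still_pending = []
        for task in pending:
            if store.check(task):  # non-blocking completion poll
                on_done(task)
            else:
                still_pending.append(task)
        pending = still_pending
        if pending:
            time.sleep(poll_interval)  # avoid a hot spin between polls
    for task in pending:  # deadline reached: fall back to blocking waits
        store.wait(task)
        on_done(task)


class FutureStore:
    """Hypothetical test double whose tasks are concurrent.futures.Future objects."""

    def check(self, task: Future) -> bool:
        return task.done()

    def wait(self, task: Future) -> None:
        task.result()  # blocks, and re-raises any error from the task


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(time.sleep, 0.01 * i) for i in range(3)]
    finished: List[Future] = []
    drain_tasks(FutureStore(), futures, finished.append)
print(len(finished))  # 3
```

Polling lets the Python layer overlap other work with transfers, while the final `wait` guarantees the function never returns with tasks still in flight.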
## Pure C++ Extension (Recommended)

Production-ready implementation with maximum performance and resource control.

### Why C++?

- **Zero-overhead abstraction**: Direct memory access, no Python runtime overhead
- **Full resource control**: Explicit memory management and threading
- **Seamless integration**: Stackable and chainable with built-in Stores
### Implementation Steps

1. Define the header (`custom_store.h`)

```cpp
#pragma once

#include "ucm/store/ucmstore_v1.h"

namespace UC::CustomStore {

class CustomStore : public StoreV1 {
public:
    ~CustomStore() override;
    Status Setup(const Detail::Dictionary& config) override;
    std::string Readme() const override;

    // Required interfaces
    Expected<std::vector<uint8_t>> Lookup(const Detail::BlockId* blocks, size_t num) override;
    Expected<ssize_t> LookupOnPrefix(const Detail::BlockId* blocks, size_t num) override;
    void Prefetch(const Detail::BlockId* blocks, size_t num) override;
    Expected<Detail::TaskHandle> Load(Detail::TaskDesc task) override;
    Expected<Detail::TaskHandle> Dump(Detail::TaskDesc task) override;
    Expected<bool> Check(Detail::TaskHandle taskId) override;
    Status Wait(Detail::TaskHandle taskId) override;
};

}  // namespace UC::CustomStore
```
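Note: Full interface specifications are in `ucm/store/ucmstore_v1.h`.

2. Expose a factory function (`custom_store.cc`)

```cpp
#include "custom_store.h"

// C linkage keeps the symbol name unmangled so it can be looked up at load time
extern "C" UC::StoreV1* MakeCustomStore() { return new UC::CustomStore::CustomStore(); }
```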
3. CMake configuration (`CMakeLists.txt`)

```cmake
file(GLOB_RECURSE UCM_CUSTOM_STORE_CC_SOURCE_FILES "./cc/*.cc")
add_library(customstore SHARED ${UCM_CUSTOM_STORE_CC_SOURCE_FILES})
target_include_directories(customstore PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/cc)
target_link_libraries(customstore PUBLIC storeintf)

file(RELATIVE_PATH INSTALL_REL_PATH ${UCM_ROOT_DIR} ${CMAKE_CURRENT_SOURCE_DIR})
install(TARGETS customstore LIBRARY DESTINATION ${INSTALL_REL_PATH} COMPONENT ucm)
```
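4. Dynamic registration (`custom_store.py`)

```python
from __future__ import annotations  # annotations are not evaluated at import time

from typing import Dict

from ucm.store.pipeline.connector import UcmPipelineStoreBuilder


# `ucmpipelinestore` is UCM's compiled pipeline module; its import is not shown here.
def _custom_pipeline_builder(
    config: Dict[str, object], pipeline: ucmpipelinestore.PipelineStore
) -> None:
    # Stack the custom C++ Store (built as libcustomstore.so) onto the pipeline
    pipeline.Stack("Custom", "custom/libcustomstore.so", config)


UcmPipelineStoreBuilder.register("Custom", _custom_pipeline_builder)
```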
5. YAML configuration

```yaml
ucm_connectors:
  - ucm_connector_name: "UcmPipelineStore"
    ucm_connector_config:
      store_pipeline: "Custom"
      # ... custom config
```
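The dynamic registration and the YAML configuration above meet at a name: `UcmPipelineStoreBuilder.register("Custom", ...)` records a builder, and `store_pipeline: "Custom"` appears to select it. The sketch below mimics that name-to-builder lookup with hypothetical stand-ins (`BuilderRegistrySketch`, `RecordingPipeline`) so the flow can be exercised without UCM installed; it illustrates the pattern and is not the actual `UcmPipelineStoreBuilder` implementation.

```python
from typing import Callable, Dict, List, Tuple


class RecordingPipeline:
    """Hypothetical stand-in for PipelineStore that records Stack() calls."""

    def __init__(self) -> None:
        self.layers: List[Tuple[str, str]] = []

    def Stack(self, name: str, lib_path: str, config: Dict[str, object]) -> None:
        self.layers.append((name, lib_path))


Builder = Callable[[Dict[str, object], RecordingPipeline], None]


class BuilderRegistrySketch:
    """Hypothetical registry mapping a pipeline name to its builder callback."""

    _builders: Dict[str, Builder] = {}

    @classmethod
    def register(cls, name: str, builder: Builder) -> None:
        cls._builders[name] = builder

    @classmethod
    def build(cls, connector_config: Dict[str, object]) -> RecordingPipeline:
        name = connector_config["store_pipeline"]  # e.g. "Custom" from the YAML
        if name not in cls._builders:
            raise KeyError(f"no builder registered for pipeline {name!r}")
        pipeline = RecordingPipeline()
        cls._builders[name](connector_config, pipeline)
        return pipeline


def _custom_pipeline_builder(config: Dict[str, object], pipeline: RecordingPipeline) -> None:
    pipeline.Stack("Custom", "custom/libcustomstore.so", config)


BuilderRegistrySketch.register("Custom", _custom_pipeline_builder)

# Python equivalent of the ucm_connector_config block in the YAML
pipeline = BuilderRegistrySketch.build({"store_pipeline": "Custom"})
print(pipeline.layers)  # [('Custom', 'custom/libcustomstore.so')]
```

Failing loudly on an unknown pipeline name surfaces a typo in `store_pipeline` at startup rather than at the first cache access.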
### Best Practices

- **Memory Management**: Use UCM smart pointers and memory pools; avoid raw `new`/`delete`.
- **Exception Safety**: Return `UC::Status` objects instead of throwing exceptions.
- **Thread Safety**: Implementations must be thread-safe; UCM calls them concurrently from multiple threads.
- **Performance**: Annotate hot paths with `UCM_PROFILER_SCOPE`.
## Quick Decision Guide

**Decision Matrix**

| Requirement | Pure Python | Hybrid | Pure C++ |
|---|---|---|---|
| Development speed | ★★★★★ | ★★★★ | ★★★ |
| Runtime performance | ★★★ | ★★★★ | ★★★★★ |
| Threading support | ❌ | ⚠️ | ✅ |
| Production ready | ❌ | ⚠️ | ✅ |
## Pre-implementation Checklist

- [ ] Reviewed `ucm/store/ucmstore_v1.h`
- [ ] Reviewed `ucm/store/ucmstore_v1.py`
- [ ] Defined supported data types and compression algorithms
- [ ] Estimated target QPS and latency SLOs
- [ ] Prepared unit tests (reference: `ucm/store/test/`)
- [ ] Selected extension method based on performance requirements
- [ ] Created stub implementation and validated registration
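For the last checklist item with a Pure C++ Store, a quick sanity check is whether the built shared library actually exports its factory function. Because `MakeCustomStore` is declared `extern "C"`, its symbol is unmangled and can be looked up by its plain name, for example with Python's standard `ctypes` module. The helper `has_factory_symbol` is hypothetical, and it only checks symbol export, not that UCM's registration succeeds.

```python
import ctypes
from typing import Optional


def has_factory_symbol(lib_path: Optional[str], symbol: str = "MakeCustomStore") -> bool:
    """Return True if the shared library at lib_path exports `symbol` unmangled.

    lib_path=None loads the symbols of the running process (POSIX only).
    """
    lib = ctypes.CDLL(lib_path)  # raises OSError if the library cannot be loaded
    # Attribute access looks the symbol up by name; a missing symbol raises
    # AttributeError, which hasattr() turns into False.
    return hasattr(lib, symbol)


# Example: has_factory_symbol("custom/libcustomstore.so")
```

Loading the library this way runs its static initializers and requires its shared-library dependencies to be resolvable, so a failure here can also point at a linking problem rather than a missing symbol.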
## Getting Help

- **Issues**: Report bugs
- **Examples**: See `ucm/store/test/e2e`

**Next Steps:** Once your Store is implemented, see the Prefix Cache Guide for pipeline configuration.