Skip to main content

A LLM serving engine extension to reduce TTFT and increase throughput, especially under long-context scenarios.

Project description

lmcache logo

Docs PyPI PyPI - Python Version Unit Tests Code Quality Integration Tests


OpenSSF Best Practices OpenSSF Scorecard Ask DeepWiki GitHub commit activity PyPI - Downloads YouTube Channel Views


| Blog | Documentation | Join Slack | Interest Form | Roadmap

Summary

LMCache is an LLM serving engine extension to reduce TTFT and increase throughput, especially under long-context scenarios. By storing the KV caches of reusable texts all over the datacenter (including GPU, CPU, Disk and even S3) with a wide range of acceleration technqiue (zero cpu copy, NIXL, GDS and more). LMCache reuses the KV caches of any reused text (not necessarily prefix) in any serving engine instance. Thus, LMCache saves precious GPU cycles and reduces user response delay.

By combining LMCache with vLLM, developers achieve 3-10x delay savings and GPU cycle reduction in many LLM use cases, including multi-round QA and RAG.

performance

LMCache is used, integrated, or referenced across a growing ecosystem of LLM serving platforms, infrastructure providers, and open-source projects:

For more details, please check our Ray Summit talk and technical report.

Features

  • 🔥 Integration with vLLM v1 with the following features:
    • High performance CPU KVCache offloading
    • Disaggregated prefill
    • P2P KVCache sharing
  • Integration with SGLang for KV cache offloading
  • Storage support as follows:
  • Installation support through pip and latest vLLM

Installation

To use LMCache, simply install lmcache from your package manager, e.g. pip:

pip install lmcache

Works on Linux NVIDIA GPU platform.

More detailed installation instructions are available in the docs, particularly if you are not using the latest stable version of vllm or using another serving engine with different dependencies. Any "undefined symbol" or torch mismatch versions can be resolved in the documentation.

Getting started

The best way to get started is to checkout the Quickstart Examples in the docs.

Documentation

Check out the LMCache documentation which is available online.

We also post regularly in LMCache blogs.

Examples

Go hands-on with our examples, demonstrating how to address different use cases with LMCache.

Interested in Connecting?

Fill out the interest form, sign up for our newsletter, join LMCache slack, or drop an email, and our team will reach out to you!

Community meeting

The community meeting Zoom Link for LMCache is hosted bi-weekly. All are welcome to join!

Meetings are held bi-weekly on: Tuesdays at 9:00 AM PT – Add to Google Calendar

We keep notes from each meeting on this document for summaries of standups, discussion, and action items.

Recordings of meetings are available on the YouTube LMCache channel.

Contributing

We welcome and value all contributions and collaborations. Please check out Contributing Guide on how to contribute.

We continually update [Onboarding] Welcoming contributors with good first issues!

Citation

If you use LMCache for your research, please cite our papers:

@inproceedings{liu2024cachegen,
  title={Cachegen: Kv cache compression and streaming for fast large language model serving},
  author={Liu, Yuhan and Li, Hanchen and Cheng, Yihua and Ray, Siddhant and Huang, Yuyang and Zhang, Qizheng and Du, Kuntai and Yao, Jiayi and Lu, Shan and Ananthanarayanan, Ganesh and others},
  booktitle={Proceedings of the ACM SIGCOMM 2024 Conference},
  pages={38--56},
  year={2024}
}

@article{cheng2024large,
  title={Do Large Language Models Need a Content Delivery Network?},
  author={Cheng, Yihua and Du, Kuntai and Yao, Jiayi and Jiang, Junchen},
  journal={arXiv preprint arXiv:2409.13761},
  year={2024}
}

@inproceedings{10.1145/3689031.3696098,
  author = {Yao, Jiayi and Li, Hanchen and Liu, Yuhan and Ray, Siddhant and Cheng, Yihua and Zhang, Qizheng and Du, Kuntai and Lu, Shan and Jiang, Junchen},
  title = {CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion},
  year = {2025},
  url = {https://doi.org/10.1145/3689031.3696098},
  doi = {10.1145/3689031.3696098},
  booktitle = {Proceedings of the Twentieth European Conference on Computer Systems},
  pages = {94–109},
}

@article{cheng2025lmcache,
  title={LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference},
  author={Cheng, Yihua and Liu, Yuhan and Yao, Jiayi and An, Yuwei and Chen, Xiaokun and Feng, Shaoting and Huang, Yuyang and Shen, Samuel and Du, Kuntai and Jiang, Junchen},
  journal={arXiv preprint arXiv:2510.09665},
  year={2025}
}

Socials

Linkedin | Twitter | Youtube

License

The LMCache codebase is licensed under Apache License 2.0. See the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lmcache-0.4.4.tar.gz (3.6 MB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

lmcache-0.4.4-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (11.8 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

lmcache-0.4.4-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (11.8 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

lmcache-0.4.4-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (11.7 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

lmcache-0.4.4-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (11.7 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

File details

Details for the file lmcache-0.4.4.tar.gz.

File metadata

  • Download URL: lmcache-0.4.4.tar.gz
  • Upload date:
  • Size: 3.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lmcache-0.4.4.tar.gz
Algorithm Hash digest
SHA256 edfd2fb6147a26e73092c8ffd34e852c1e1d96f3236457d8879c42944cffbd39
MD5 98df192352b03e014a071a1282bb49e1
BLAKE2b-256 a2b0a535e8fdd07083b7944f8b3ea6222e05da5ce5e18f37c7aaf938fb157fdc

See more details on using hashes here.

Provenance

The following attestation bundles were made for lmcache-0.4.4.tar.gz:

Publisher: publish.yml on LMCache/LMCache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lmcache-0.4.4-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for lmcache-0.4.4-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 15720c918342d93b36549ee4b97eddecbe50bdcd0df66f7ad9ad7c44a22cce08
MD5 4e939c8a7133f35e75b1adb9e37d6082
BLAKE2b-256 44ebe951ce0ee170c73f0c92eb56c1fbe653ea8e72481f9c988cee01ade646d4

See more details on using hashes here.

Provenance

The following attestation bundles were made for lmcache-0.4.4-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on LMCache/LMCache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lmcache-0.4.4-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for lmcache-0.4.4-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 e802f69bdc2a838deb140b4c9ee9dbcf4e775d6c793f3eb3ed563dcecac5b645
MD5 7d915cb31a4d698b3b4ccd89697628fe
BLAKE2b-256 83cecff5b389fdb0bcdc7f2d7905308febd5f9669c686b9670a9a0510af6ca1b

See more details on using hashes here.

Provenance

The following attestation bundles were made for lmcache-0.4.4-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on LMCache/LMCache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lmcache-0.4.4-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for lmcache-0.4.4-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 8fa54f0ab2058477cf3349551baa600b0a04ac35193d8851e6103dae9e56f1f9
MD5 ab5a25263d8f963f1a573d3a1a2f35d2
BLAKE2b-256 f52b392083124a0d60ff39c3ba516e7e64438d86e60b06aae7140cb002e9cc9a

See more details on using hashes here.

Provenance

The following attestation bundles were made for lmcache-0.4.4-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on LMCache/LMCache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lmcache-0.4.4-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for lmcache-0.4.4-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 1224c244be8bd3b2646234a5c3052106b77b3f5397301996a0f8ff6d2ea26c8a
MD5 c2e0053ab57f4f140c06dfcf03e70ecd
BLAKE2b-256 6dd36ffd070830f802e4e25f272f4bdcf050f0f3d25c630bfd26d7ca9a2eff70

See more details on using hashes here.

Provenance

The following attestation bundles were made for lmcache-0.4.4-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on LMCache/LMCache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page