
Delta compression for LLM fine-tunes - lossless or LoRA-equivalent SVD compression


∴ Sparse

Delta Compression for Fine-tuned Models and Datasets

Compress your 14GB fine-tune to 1.4GB (lossless) or 50MB (LoRA-equivalent). Reconstruct in 4 seconds.

Verified: GPT-2 compression → reconstruction → identical inference output

License: Apache 2.0 · Python 3.9+ · PyTorch · Rust

Quick Start · How It Works · CLI · Python API


What Sparse Does

Sparse compresses fine-tuned models and datasets as deltas from their base versions.

| Compression Mode | Size (7B)      | Quality  | Use Case                      |
|------------------|----------------|----------|-------------------------------|
| Lossless         | ~1.4 GB        | 100%     | Production, quality-critical  |
| Lossy (SVD)      | ~50 MB         | ~95-99%  | Sharing, size-critical        |
| Dataset Delta    | 60-80% savings | 100%     | Derivative datasets           |

Key benefit: Works on models you've already trained - no LoRA required during training.

Works with: Full fine-tunes, RLHF, model merges, translated/augmented datasets


Quick Start

pip install sparse-llm

Compress a Fine-tune

# Lossless compression (~1.4GB for 7B model)
sparse compress meta-llama/Llama-2-7b-hf ./my-finetune -o ./my-delta

# OR: Lossy compression (~50MB, LoRA-equivalent quality)
sparse compress-lossy meta-llama/Llama-2-7b-hf ./my-finetune -o ./my-delta --rank 16

Reconstruct from Delta

# From lossless delta
sparse reconstruct meta-llama/Llama-2-7b-hf ./my-delta -o ./reconstructed-model

# From lossy delta
sparse reconstruct-lossy meta-llama/Llama-2-7b-hf ./my-delta -o ./reconstructed-model

Dataset Delta

# Compress derivative dataset
sparse dataset-compress squad squad_v2 -o ./squad_v2_delta

# Reconstruct
sparse dataset-reconstruct ./squad_v2_delta

How It Works

Fine-tuned Model (14GB)  -  Base Model (14GB)  =  Delta (1.4GB or 50MB)
                                                        ↓
                                              Reconstruct: Base + Delta
  • Lossless: Sparse + INT8 encoding → ~10% of original size, 100% quality
  • Lossy (SVD): Low-rank approximation → ~0.4% of original, ~95-99% quality
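
To make the lossless path concrete, here is a minimal PyTorch sketch of the structure it describes: keep only the entries that changed, INT8-encode them with a per-tensor scale, and add them back onto the base at reconstruction time. The function names are illustrative, and a toy round-trip quantizer like this is not bit-exact the way the real codec is verified to be:

import torch

def compress_lossless_sketch(base_w: torch.Tensor, ft_w: torch.Tensor):
    # Most entries of a fine-tune delta are zero or near-zero, so store only
    # the changed positions, INT8-encoded with a per-tensor scale.
    delta = ft_w - base_w
    mask = delta != 0                          # sparse: positions that changed
    values = delta[mask]
    scale = values.abs().max().clamp(min=1e-12) / 127.0
    q = torch.round(values / scale).to(torch.int8)
    return mask, q, scale

def reconstruct_sketch(base_w, mask, q, scale):
    delta = torch.zeros_like(base_w)
    delta[mask] = q.to(base_w.dtype) * scale   # Reconstruct: Base + Delta
    return base_w + delta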

CLI Reference

# Lossless compression (100% quality)
sparse compress <base> <finetune> -o <output>
sparse reconstruct <base> <delta> [-o <output>]

# Lossy compression (~50MB, LoRA-equivalent quality)
sparse compress-lossy <base> <finetune> -o <output> [--rank 16]
sparse reconstruct-lossy <base> <delta> [-o <output>]

# Dataset commands
sparse dataset-compress <base> <derivative> -o <output>
sparse dataset-reconstruct <delta_dir>
sparse dataset-estimate <base> <derivative>

# Info
sparse info <path>

Python API

from core import compress_delta, reconstruct_from_delta
from core import compress_delta_svd_full, reconstruct_from_svd_delta

# Lossless compression
manifest = compress_delta(
    base_model_id="meta-llama/Llama-2-7b-hf",
    finetune_model_id="./my-finetune",
    output_path="./my-delta"
)
print(f"Compression: {manifest.compression_ratio:.1f}x")  # ~10x

# Extract LoRA (lossy, LoRA-equivalent)
manifest = compress_delta_svd_full(
    base_model_id="meta-llama/Llama-2-7b-hf",
    finetune_model_id="./my-finetune",
    output_path="./my-svd-delta",
    rank=16  # Like LoRA rank
)
print(f"Compression: {manifest.compression_ratio:.1f}x")  # ~280x

# Reconstruct (lossless)
model = reconstruct_from_delta("meta-llama/Llama-2-7b-hf", "./my-delta")

# Reconstruct from extracted LoRA
model = reconstruct_from_svd_delta("meta-llama/Llama-2-7b-hf", "./my-svd-delta")
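
In the spirit of the GPT-2 verification noted at the top, you can check a lossless round-trip yourself by comparing logits. A hedged sketch: the paths and prompt are placeholders, and it assumes reconstruct_from_delta returns a transformers-compatible model, as the examples above imply.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from core import reconstruct_from_delta

# Placeholder paths; the fine-tune and the reconstruction should agree exactly.
tok = AutoTokenizer.from_pretrained("./my-finetune")
original = AutoModelForCausalLM.from_pretrained("./my-finetune")
reconstructed = reconstruct_from_delta("meta-llama/Llama-2-7b-hf", "./my-delta")

inputs = tok("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    a = original(**inputs).logits
    b = reconstructed(**inputs).logits
print("bit-identical logits:", torch.equal(a, b))  # expected True for lossless mode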

Dataset API

from core import compress_dataset_delta, reconstruct_from_dataset_delta

# Compress
manifest = compress_dataset_delta("squad", "squad_v2", "./squad_v2_delta")
print(f"Savings: {manifest['size_stats']['savings_pct']:.1f}%")

# Reconstruct
dataset = reconstruct_from_dataset_delta("./squad_v2_delta")
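
Conceptually, a dataset delta keeps only what differs between the base and the derivative. The sketch below illustrates that idea using the datasets library and row ids; it is an illustration of the concept, not the actual on-disk delta format:

from datasets import load_dataset

base = load_dataset("squad", split="train")
derived = load_dataset("squad_v2", split="train")

base_ids = set(base["id"])
derived_ids = set(derived["id"])

added = derived_ids - base_ids      # rows the delta must store in full
removed = base_ids - derived_ids    # rows the delta only needs to reference

print(f"delta stores {len(added)} new rows + {len(removed)} tombstones "
      f"instead of all {len(derived)} rows")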

Performance

All optimizations are automatic - no configuration needed:

  • Rust SIMD acceleration: 5-10x faster compression
  • Base model caching: ~20s saved per compression
  • Smart heuristics: 10-20% better compression ratios
  • GPU reconstruction: 2-3x faster on CUDA
  • Lazy loading: 50-70% memory reduction for 30B+ models

Typical compression time: ~60s → ~8-12s (5-8x faster)
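
To illustrate the lazy-loading point above: instead of loading the whole checkpoint, the base can be streamed tensor by tensor from a memory-mapped safetensors file while the delta is applied. A minimal sketch with a hypothetical function name:

import torch
from safetensors import safe_open

# Stream base weights one tensor at a time rather than materializing a
# 30B+ model in RAM before applying the delta.
def apply_delta_lazily(base_path: str, delta: dict) -> dict:
    out = {}
    with safe_open(base_path, framework="pt") as f:   # memory-maps the file
        for name in f.keys():
            w = f.get_tensor(name)              # loads only this tensor
            out[name] = w + delta.get(name, 0)  # unchanged tensors pass through
    return out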

📚 Advanced optimizations: See API_REFERENCE.md for MmapDeltaStorage, DifferentialCompressor, and other utilities.


Sparse vs LoRA

|                          | LoRA/PEFT       | Sparse         |
|--------------------------|-----------------|----------------|
| When applied             | During training | After training |
| Works on existing models | ✗               | ✓              |
| Lossless option          | ✗               | ✓              |

Key insight: sparse compress-lossy gives you LoRA-sized files (~50MB) from models that weren't trained with LoRA.
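
The mechanism behind compress-lossy is a truncated SVD of the weight delta, which recovers the same pair of low-rank factors a LoRA adapter stores, even though the model was trained without LoRA. A minimal PyTorch sketch with illustrative names:

import torch

def extract_lora_sketch(base_w: torch.Tensor, ft_w: torch.Tensor, rank: int = 16):
    # Factor the weight delta into the two low-rank matrices a LoRA adapter
    # would have produced during training.
    delta = ft_w - base_w                               # (d_out, d_in)
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    B = U[:, :rank] * S[:rank]                          # (d_out, rank)
    A = Vh[:rank, :]                                    # (rank, d_in)
    return B, A                                         # B @ A ≈ delta

# Why this is so small: a 4096x4096 delta holds ~16.8M values, while the
# rank-16 factors hold 2 * 16 * 4096 ≈ 131K values, under 1% of the original.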


Requirements

  • Python 3.9+
  • PyTorch 2.0+
  • transformers
  • Rust extension (precompiled into the wheel; no Rust toolchain needed)

License

Apache 2.0 - See LICENSE for details.

Free for personal and commercial use.
