
A byte-level compression library implementing RLE and LZ77 with explicit encoding formats, manual pipelines, and automatic algorithm selection.


simple-compression

simple-compression is a small Python library that implements classic byte-oriented compression algorithms with an explicit and composable API.

The library is designed to operate directly on bytearray data and provides both manual and automatic compression pipelines. All encoded outputs are self-describing and can be decoded without external metadata.

Current version: v0.2.0


Scope and goals

This project focuses on:

  • Correct, deterministic implementations of classic compression algorithms
  • Explicit encoding formats that are easy to inspect and reason about
  • A simple API for chaining multiple compression stages
  • Safe and strict decoding

This library does not attempt to compete with production compressors in performance. It is intended for correctness, clarity, and control. Because of Python overhead, it is currently very slow and not practical for production use. For now, the library is best suited to learning about compression. Eventually it will be rewritten in C and optimized for performance.


Implemented algorithms

  • Run-Length Encoding (RLE)
  • LZ77
  • Huffman

Each algorithm has a fully defined binary format and a strict decoder.
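As an illustration of what a fully defined format with a strict decoder can look like, here is a minimal run-length sketch. It is not the format simple-compression uses: the (count, value) pair layout and the names rle_encode and rle_decode are assumptions made for this example only.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, value) byte pairs, with count in 1..255.

    Hypothetical layout for illustration; not the library's RLE format.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # Extend the run while the next byte matches and the count fits in one byte.
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Strictly decode (count, value) pairs, rejecting malformed input."""
    if len(encoded) % 2 != 0:
        raise ValueError("truncated RLE stream: dangling count byte")
    out = bytearray()
    for i in range(0, len(encoded), 2):
        count, value = encoded[i], encoded[i + 1]
        if count == 0:
            # A zero-length run can never be produced by rle_encode.
            raise ValueError("invalid RLE stream: zero-length run")
        out += bytes([value]) * count
    return bytes(out)
```

The decoder raises ValueError instead of guessing when the input has an odd length or a zero count, which is the sense of "strict" assumed in this sketch.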


Installation

pip install simple-compression

Basic Usage

Both modes shown below are expected to become more effective with further testing and tuning, as well as with the future implementation of a Huffman algorithm. The first call passes auto=True, which does a quick pass over the data to gather metrics and then automatically selects the algorithms and their order. The second call passes algorithm names through the sequence argument to select the algorithms and their order manually.

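from simple_compression.compression import SimpleCompression

compress = SimpleCompression()

# The library operates directly on bytearray data.
data = bytearray(b"AAAAAABBBBBCCDSADDDDDSSSCVZCSSSSWEEEFWEWAFZCVAGQWTQL")

# Automatic mode: a quick pass over the data selects the algorithms and their order.
encoded = compress.encode(data, auto=True)
decoded = compress.decode(encoded)

# Manual mode: the listed algorithms are applied in the given order.
encoded = compress.encode(data, sequence=["RLE", "LZ77"])
decoded = compress.decode(encoded)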

The decoder reads header tokens embedded at the start of the bitstream to determine which algorithms were applied and in which order. Because the stream describes itself, decoding needs no external metadata. Combined with the format documentation for each algorithm, this can help when implementing decoders in other languages.
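The idea of a self-describing header can be sketched as follows. This is a hedged illustration, not the library's actual header layout: the one-byte stage count, the tag values, the names encode_pipeline and decode_pipeline, and the toy DELTA and REVERSE stages (stand-ins for real compressors) are all assumptions.

```python
def _delta(data: bytes) -> bytes:
    """Toy stage: replace each byte with its difference from the previous one."""
    prev, out = 0, bytearray()
    for b in data:
        out.append((b - prev) & 0xFF)
        prev = b
    return bytes(out)


def _undelta(data: bytes) -> bytes:
    """Inverse of _delta: running sum modulo 256."""
    prev, out = 0, bytearray()
    for d in data:
        prev = (prev + d) & 0xFF
        out.append(prev)
    return bytes(out)


def _reverse(data: bytes) -> bytes:
    """Toy stage that is its own inverse."""
    return data[::-1]


# Hypothetical stage table: name -> (header tag, forward, inverse).
STAGES = {
    "DELTA": (0x01, _delta, _undelta),
    "REVERSE": (0x02, _reverse, _reverse),
}
INVERSE_BY_TAG = {tag: inv for tag, _fwd, inv in STAGES.values()}


def encode_pipeline(data: bytes, sequence: list) -> bytes:
    """Apply stages in order and prefix the output with [count][tag...]."""
    if len(sequence) > 255:
        raise ValueError("too many stages for a one-byte count")
    payload, tags = bytes(data), bytearray()
    for name in sequence:
        tag, forward, _inverse = STAGES[name]
        payload = forward(payload)
        tags.append(tag)
    return bytes([len(tags)]) + bytes(tags) + payload


def decode_pipeline(encoded: bytes) -> bytes:
    """Read the header, validate every tag, then undo stages in reverse order."""
    if len(encoded) < 1:
        raise ValueError("missing header")
    count = encoded[0]
    if len(encoded) < 1 + count:
        raise ValueError("truncated header")
    tags, payload = encoded[1:1 + count], encoded[1 + count:]
    for tag in tags:
        if tag not in INVERSE_BY_TAG:
            raise ValueError(f"unknown stage tag: {tag:#04x}")
    for tag in reversed(tags):
        payload = INVERSE_BY_TAG[tag](payload)
    return payload
```

Because the stage tags travel with the data, the caller of decode_pipeline does not need to know which sequence was used at encode time.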


Algorithm Formatting

Detailed binary formats for each algorithm are documented below.

  • RLE Format
  • LZ77 Format
  • HUFFMAN Format


Road Map

The library is currently functional and provides a composable pipeline of basic compression algorithms. The probe behind the auto=True argument has been redesigned to be more tunable and to collect more metrics. In testing, it does better at skipping algorithms when the data does not fit their use case. The next phase of development will focus on rewriting the library in C so that it can be used outside of an educational setting.
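To make the idea of a tunable probe concrete, here is a rough sketch of how a quick pass might gather metrics and skip algorithms that do not fit the data. It is not the library's probe: the function name probe, the two metrics, and the default thresholds are assumptions chosen for illustration.

```python
from typing import List


def probe(data: bytes,
          run_threshold: float = 0.3,
          repeat_threshold: float = 0.2) -> List[str]:
    """Suggest a stage sequence from two cheap metrics (hypothetical heuristic).

    run_ratio:    fraction of bytes equal to their predecessor (favours RLE).
    repeat_ratio: fraction of 3-byte windows already seen earlier (favours LZ77).
    """
    n = len(data)
    if n < 4:
        # Too little data to measure anything; apply no algorithm.
        return []

    adjacent_equal = sum(1 for i in range(1, n) if data[i] == data[i - 1])
    run_ratio = adjacent_equal / (n - 1)

    seen = set()
    repeats = 0
    for i in range(n - 2):
        window = bytes(data[i:i + 3])  # bytes() so bytearray input is hashable
        if window in seen:
            repeats += 1
        else:
            seen.add(window)
    repeat_ratio = repeats / (n - 2)

    sequence = []
    if run_ratio >= run_threshold:
        sequence.append("RLE")
    if repeat_ratio >= repeat_threshold:
        sequence.append("LZ77")
    return sequence
```

Exposing the thresholds as parameters is one way to make such a probe tunable. Returning an empty list corresponds to applying no algorithm when the data does not fit any of them.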


License

MIT
