A byte-level compression library implementing RLE and LZ77 with explicit encoding formats, manual pipelines, and automatic algorithm selection.
simple-compression
simple-compression is a small Python library that implements classic byte-oriented compression algorithms with an explicit and composable API.
The library is designed to operate directly on bytearray data and provides both manual and automatic compression pipelines. All encoded outputs are self-describing and can be decoded without external metadata.
Current version: v0.2.0
Scope and goals
This project focuses on:
- Correct, deterministic implementations of classic compression algorithms
- Explicit encoding formats that are easy to inspect and reason about
- A simple API for chaining multiple compression stages
- Safe and strict decoding
This library does not attempt to compete with production compressors in performance. It is intended for correctness, clarity, and control. Because of Python overhead, it is currently very slow and not practical for production use. For now, it is best suited to learning about compression. Eventually it will be rewritten in C and optimized for performance.
Implemented algorithms
- Run-Length Encoding (RLE)
- LZ77
- Huffman
Each algorithm has a fully defined binary format and a strict decoder.
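To illustrate what a fully defined format paired with a strict decoder can look like, here is a minimal sketch of run-length encoding. The (count, value) byte-pair layout and the function names rle_encode and rle_decode are assumptions made for this example. They are not the library's actual RLE format or API.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, value) byte pairs, with count in 1..255.

    Illustrative layout only, not the format used by simple-compression.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # Extend the run while the next byte matches, capped at one count byte.
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Strictly decode (count, value) pairs, rejecting malformed input."""
    if len(encoded) % 2 != 0:
        raise ValueError("truncated RLE stream: dangling count byte")
    out = bytearray()
    for i in range(0, len(encoded), 2):
        count, value = encoded[i], encoded[i + 1]
        if count == 0:
            raise ValueError("invalid RLE stream: zero-length run")
        out += bytes([value]) * count
    return bytes(out)


sample = b"AAAAAABBBBBCCD"
packed = rle_encode(sample)
assert rle_decode(packed) == sample
```

The decoder refuses odd-length input and zero-length runs instead of guessing, which is the kind of behavior "strict" usually implies.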
Installation
pip install simple-compression
Basic Usage
The library can be used in two ways.
The first passes auto=True, which makes a quick pass over the data to gather metrics and then automatically selects the algorithms and their order.
The second passes the algorithm names in the sequence argument to manually select the algorithms and their order.
Both modes are expected to become more effective with further testing and tuning, and with the future implementation of a Huffman algorithm.
from simple_compression.compression import SimpleCompression

compress = SimpleCompression()
data = bytearray(b"AAAAAABBBBBCCDSADDDDDSSSCVZCSSSSWEEEFWEWAFZCVAGQWTQL")

# Automatic: a quick probe of the data selects the algorithms and their order.
encoded = compress.encode(data, auto=True)
decoded = compress.decode(encoded)

# Manual: apply the named algorithms in the order given.
encoded = compress.encode(data, sequence=["RLE", "LZ77"])
decoded = compress.decode(encoded)
The decoder reads header tokens embedded at the start of the bitstream to determine which algorithms were applied and in which order. Because the stream is self-describing, the decoder needs no external metadata. Combined with the format specification for each algorithm, this makes it practical to implement decoders in other languages.
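The idea of a self-describing stream can be sketched as follows. Everything in this example is hypothetical: the header layout (one count byte followed by one token byte per stage), the token values 0x01 and 0x02, and the two stand-in stages (delta coding and byte reversal) are assumptions chosen to keep the example short. They do not reflect the library's real header tokens or algorithms.

```python
from typing import Callable, Dict, List, Tuple

Transform = Callable[[bytes], bytes]


def delta_encode(data: bytes) -> bytes:
    """Stand-in stage: store each byte as its difference from the previous one."""
    prev = 0
    out = bytearray()
    for b in data:
        out.append((b - prev) % 256)
        prev = b
    return bytes(out)


def delta_decode(data: bytes) -> bytes:
    """Inverse of delta_encode: rebuild bytes with a running sum."""
    prev = 0
    out = bytearray()
    for d in data:
        prev = (prev + d) % 256
        out.append(prev)
    return bytes(out)


def reverse_bytes(data: bytes) -> bytes:
    """Stand-in stage: reverse the byte order (its own inverse)."""
    return bytes(data[::-1])


# Hypothetical token -> (encoder, decoder) table.
STAGES: Dict[int, Tuple[Transform, Transform]] = {
    0x01: (delta_encode, delta_decode),
    0x02: (reverse_bytes, reverse_bytes),
}


def encode_pipeline(data: bytes, tokens: List[int]) -> bytes:
    """Apply stages in order and prepend a header naming them."""
    if len(tokens) > 255:
        raise ValueError("too many stages for a one-byte count")
    payload = bytes(data)
    for token in tokens:
        if token not in STAGES:
            raise ValueError(f"unknown stage token {token:#04x}")
        payload = STAGES[token][0](payload)
    return bytes([len(tokens)]) + bytes(tokens) + payload


def decode_pipeline(blob: bytes) -> bytes:
    """Read the header, then undo the stages in reverse order."""
    if not blob:
        raise ValueError("empty stream: missing header")
    count = blob[0]
    if len(blob) < 1 + count:
        raise ValueError("truncated header")
    tokens = blob[1:1 + count]
    for token in tokens:
        if token not in STAGES:
            raise ValueError(f"unknown stage token {token:#04x}")
    payload = blob[1 + count:]
    for token in reversed(tokens):
        payload = STAGES[token][1](payload)
    return payload


blob = encode_pipeline(b"hello world", [0x01, 0x02])
assert decode_pipeline(blob) == b"hello world"
```

Because the header names every stage, the decoder takes only the encoded bytes as input. Decoding must undo the stages in reverse order, which is why the two stand-in stages were chosen so that order matters.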
Algorithm Formatting
Detailed binary formats for each algorithm are documented below.
- RLE Format
- LZ77 Format
- Huffman Format
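For readers new to LZ77, the sketch below shows the general shape of a token-based format. It assumes a hypothetical layout of three bytes per token (offset, length, literal) with an 8-bit window. This layout and the function names lz77_encode and lz77_decode are illustrative assumptions only. The library's real LZ77 format is the one described in its own format documentation.

```python
def lz77_encode(data: bytes, window: int = 255, max_len: int = 255) -> bytes:
    """Encode data as (offset, length, literal) byte triples.

    Hypothetical layout for illustration, using a brute-force match search.
    """
    out = bytearray()
    n = len(data)
    i = 0
    while i < n:
        best_off = best_len = 0
        # Always leave one byte after the match to serve as the literal.
        limit = min(max_len, n - i - 1)
        for j in range(max(0, i - window), i):
            length = 0
            while length < limit and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out += bytes((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return bytes(out)


def lz77_decode(encoded: bytes) -> bytes:
    """Strictly decode (offset, length, literal) triples."""
    if len(encoded) % 3 != 0:
        raise ValueError("truncated LZ77 stream")
    out = bytearray()
    for k in range(0, len(encoded), 3):
        offset, length, literal = encoded[k:k + 3]
        if length and not (0 < offset <= len(out)):
            raise ValueError("back-reference points outside decoded data")
        # Copy one byte at a time so overlapping matches work.
        for _ in range(length):
            out.append(out[-offset])
        out.append(literal)
    return bytes(out)


sample = b"abcabcabcabcX"
assert lz77_decode(lz77_encode(sample)) == sample
```

The decoder rejects any back-reference that reaches before the start of the output, so a corrupted stream fails with an error instead of producing wrong data.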
Road Map
The library is currently functional and provides a composable pipeline of basic compression algorithms.
The probe behind the auto=True argument has been redesigned to be more tunable and to collect more metrics.
In testing, it is better at skipping algorithms when the data does not fit their use case.
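As a rough illustration of how a probe might decide when an algorithm fits the data, the sketch below computes two simple metrics and compares them to thresholds. The metrics, the threshold values, and the names run_fraction, repeat_fraction, and choose_sequence are all assumptions for this example. The library's actual probe may collect different metrics and decide differently.

```python
from typing import List


def run_fraction(data: bytes, min_run: int = 3) -> float:
    """Fraction of bytes that sit inside runs of at least min_run equal bytes."""
    if not data:
        return 0.0
    covered = 0
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        if j - i >= min_run:
            covered += j - i
        i = j
    return covered / len(data)


def repeat_fraction(data: bytes, width: int = 4) -> float:
    """Fraction of width-byte windows that already appeared earlier in the data."""
    windows = len(data) - width + 1
    if windows <= 0:
        return 0.0
    seen = set()
    repeats = 0
    for i in range(windows):
        chunk = bytes(data[i:i + width])
        if chunk in seen:
            repeats += 1
        else:
            seen.add(chunk)
    return repeats / windows


def choose_sequence(
    data: bytes,
    run_threshold: float = 0.5,      # hypothetical tuning knob
    repeat_threshold: float = 0.25,  # hypothetical tuning knob
) -> List[str]:
    """Pick stages whose metric clears its threshold; may return no stages."""
    sequence = []
    if run_fraction(data) >= run_threshold:
        sequence.append("RLE")
    if repeat_fraction(data) >= repeat_threshold:
        sequence.append("LZ77")
    return sequence


assert choose_sequence(b"A" * 100) == ["RLE", "LZ77"]
assert choose_sequence(b"abcdefgh" * 20) == ["LZ77"]
assert choose_sequence(bytes(range(256))) == []
```

Returning an empty sequence for data with no runs and no repeats mirrors the goal of not applying an algorithm where it would not help.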
The next phase of development will focus on rewriting the library in C so that it is usable outside of an educational setting.
License
MIT
Download files
File details
Details for the file simple_compression-0.2.0.tar.gz.
File metadata
- Download URL: simple_compression-0.2.0.tar.gz
- Upload date:
- Size: 10.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c99f255a52975f176e2ed9fe9d1336653d50fc306e2c5620a33043c8b8b5bc86 |
| MD5 | 9d3b6882b1f275b118c321ed87fa49c8 |
| BLAKE2b-256 | 00c34f623bf8f6561dfc0e9eec11c799e1cda6d471c0c1a58454ade337f7617d |
File details
Details for the file simple_compression-0.2.0-py3-none-any.whl.
File metadata
- Download URL: simple_compression-0.2.0-py3-none-any.whl
- Upload date:
- Size: 10.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9ddfe3c245af1c059636176a9333446c6f9c29e0f4c238b2e6868b0cc0c10660 |
| MD5 | 6595522dc19dff2c0b994beea27a7001 |
| BLAKE2b-256 | 381e144c9656c6bb06498c77e851a4f5dd55ceb42076d368a79d265ec36387ae |