A fast quantile estimator for streamed data. Produces a quantile estimate without storing any data points.

Project description

PyQuantile

PyQuantile is a modified implementation of the P² algorithm. It uses an N-Dependent Hybrid approach, that smoothly transitions the quantile marker adjustment from the stable linear estimate (when the sample count N is small, stabilizing initial convergence) to the highly accurate Parabolic P² Interpolation (when N exceeds a specified threshold, maximizing long-term accuracy). The rationale for blending interpolations is to leverage the stability of linear interpolation for small datasets, while transitioning to the higher asymptotic accuracy of parabolic interpolation as the sample size grows larger.

PyQuantile's main goals:

Handle smaller datasets, remaining more stable compared with the traditional P² algorithm.
Provide extremely fast estimation for streaming data, suitable for real-time analytics and large-scale applications.
Low overhead regardless of the size of the data stream, with minimal CPU usage per update.
Achieve strong accuracy for most quantiles and distributions, with error rates that are acceptable for real-world use cases.
Incoming datapoints in the stream cannot be stored, making the estimator ideal for environments with strict memory or privacy constraints.

PyQuantile demonstrates typical P² performance characteristics for streaming quantile estimation. Memory usage remains constant (O(1)) regardless of data volume, using very small MiB of base memory with no growth even after processing millions of values. Processing speed is impressive at 1.2-2.0 million values per second with consistent sub-millisecond latency (0.001-0.002ms per operation).

Initial memory allocation < 2.0 MiB, which is a one-time cost
Different quantile values uses the same memory
Processing 100 values: No additional memory
Processing 1,000 values: No additional memory
Processing 10,000 values: No additional memory
Processing 100,000 values: No additional memory
Processing 1,000,000 values: Only ~0.13 MiB increase

Performance Comparison: PyQuantile vs Traditional P²

Very Early Stage (N ≤ 100)

Distribution	PyQuantile	Traditional P²	Winner
Normal (0.95)	13.17%	14.30%	PyQuantile (8% better)
Normal (0.99)	33.35%	34.77%	PyQuantile (4% better)
Exponential (0.95)	8.54%	16.15%	PyQuantile (47% better)
Log-Normal (0.99)	45.84%	59.88%	PyQuantile (23% better)

PyQuantile is more stable and consistent in early stages, especially dramatic on skewed distributions. Traditional P² can be unstable with limited samples.

Early Stage (100 < N ≤ 1000)

Results are mixed - both approaches have converged significantly, with differences becoming less pronounced.

Late Stage (N > 10,000)

Both approaches achieve excellent accuracy (<1%), proving both converge well.

Final Accuracy (N = 50,000)

Virtually identical - both achieve 0.03-0.37% error.

PyQuantile vs Other Streaming Algorithms

Running PyQuantile against t-digest and DDSketch, both of which use historical data, PyQuantile only maintains 5 markers at 40bytes and remains competitive with accuracy, but much faster.

Performance Results (100,000 samples, q=0.95, Exponential Distribution)

Algorithm	Speed (samples/sec)	Final Error	Trade-off
PyQuantile	1,241K/sec	0.11%	Best balance
t-digest	37K/sec	0.01%	Slow but most accurate
DDSketch	684K/sec	0.10%	Good balance, but slower than PyQuantile

PyQuantile stores no historical data, only 5 markers (40 bytes)
PyQuantile Achieves <0.2% error without historical data
t-digest keeps centroids, DDSketch keeps buckets
PyQuantile is 2x faster than DDSketch, 34x faster than t-digest
PyQuantile is O(1) space complexity vs O(log n) or O(√n) for others

Installation

You can install PyQuantile via pip:

pip install pyquantile

Usage

Using PyQuantile is simple. Here is an example of how to import it and add samples to an estimator:

import pyquantile as pq
estimator = pq.QuantileEstimator(0.75)

Samples are added to an estimator like this:

estimator.add(10)
estimator.add(20)
estimator.add(30)

We can add samples via a loop, and .quantile() can be called at any time to get the latest estimate. The following code will simulate a stream of data for 60 seconds and plot accuracy of quantile estimates over time:

duration_seconds = 60
quantile = 0.75
stream = []
estimator = pq.QuantileEstimator(quantile)
accuracies = []
estimates = []
true_quantiles = []
timestamps = []

start = time.time()
while time.time() - start < duration_seconds:
    x = np.random.normal(loc=0, scale=1)
    stream.append(x)
    estimator.add(x)
    current_estimate = estimator.quantile()
    current_true = np.quantile(stream, quantile)
    accuracy = abs(current_estimate - current_true)
    estimates.append(current_estimate)
    true_quantiles.append(current_true)
    accuracies.append(accuracy)
    timestamps.append(time.time() - start)
    time.sleep(0.01)

plt.figure(figsize=(10, 6))
plt.plot(timestamps, estimates, label=f'Estimated {quantile} Quantile')
plt.plot(timestamps, true_quantiles, label=f'True {quantile} Quantile')
plt.plot(timestamps, accuracies, label='Absolute Error', color='green')
plt.xlabel('Time (s)')
plt.ylabel('Value')
plt.title('Quantile Estimation Accuracy Over Time')
plt.legend()
plt.show()

Kafka Integration Example

You can use PyQuantile in real-time streaming scenarios. For example, to process data from a Kafka topic:

from pyquantile import QuantileEstimator
from kafka import KafkaConsumer
import json

estimator = QuantileEstimator(0.75)
consumer = KafkaConsumer('my_topic', bootstrap_servers='localhost:9092')

for message in consumer:
	data_point = float(json.loads(message.value))
	estimator.add(data_point)
	print("Current estimate:", estimator.quantile())

You can adapt this pattern for other streaming services (RabbitMQ, AWS Kinesis, etc.) by feeding each incoming data point to estimator.add().

Development

To build and install the library locally, clone the repository and run: git clone https://github.com/richardsmythe/pyquantile.git cd pyquantile pip install .

Demo

A python file named test_pyquantile.py is included. This file demonstrates PyQuantile in action and plots the results against the exact quantile. Run it with python .\test_pyquantile.py. This simple script will show how PyQuantile performs against an exact quantile. With more data the absolute errors begin to even out the estimator becomes more accurate.

Contributing

Contributions are welcome! If you would like to contribute, please fork the repository and submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Project details

Release history Release notifications | RSS feed

This version

0.2.2

Nov 12, 2025

0.2.1

Nov 12, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyquantile-0.2.2.tar.gz (8.1 kB view details)

Uploaded Nov 12, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

pyquantile-0.2.2-cp311-cp311-win_amd64.whl (84.7 kB view details)

Uploaded Nov 12, 2025 CPython 3.11Windows x86-64

File details

Details for the file pyquantile-0.2.2.tar.gz.

File metadata

Download URL: pyquantile-0.2.2.tar.gz
Upload date: Nov 12, 2025
Size: 8.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.9

File hashes

Hashes for pyquantile-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`b779d6a2935a6271397b8d9c1060ebbf8b6c6d7dc2f589d53afc9d475f2c263b`
MD5	`ee9a3666f6798749431dce8fbc4363ff`
BLAKE2b-256	`8d5824617cda07ba72fb5e42286e53c851e1bd7a1c55c41b6c35db694e95a6be`

See more details on using hashes here.

File details

Details for the file pyquantile-0.2.2-cp311-cp311-win_amd64.whl.

File metadata

Download URL: pyquantile-0.2.2-cp311-cp311-win_amd64.whl
Upload date: Nov 12, 2025
Size: 84.7 kB
Tags: CPython 3.11, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.9

File hashes

Hashes for pyquantile-0.2.2-cp311-cp311-win_amd64.whl
Algorithm	Hash digest
SHA256	`16cb3de767245a611960f97300900e942aec1ff7ae53cddc6ed42380f89e5e2b`
MD5	`0251d09aa1ca59b0bd572e8e28fffa9c`
BLAKE2b-256	`a2331360928b02e2dc99bd32cfdb703086f0d5e8b0893951d9890734ab0aa8b2`

See more details on using hashes here.

pyquantile 0.2.2

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

PyQuantile

Performance Comparison: PyQuantile vs Traditional P²

Very Early Stage (N ≤ 100)

Early Stage (100 < N ≤ 1000)

Late Stage (N > 10,000)

Final Accuracy (N = 50,000)

PyQuantile vs Other Streaming Algorithms

Performance Results (100,000 samples, q=0.95, Exponential Distribution)

Installation

Usage

Samples are added to an estimator like this:

Kafka Integration Example

Development

Demo

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes