Exness forex tick data preprocessing with a ClickHouse backend: compressed, lossless storage of tick data that can be queried directly with SQL.
exness-data-preprocess v2.0.0
Forex tick data preprocessing with a ClickHouse backend. Downloads, stores, and queries tick data from the Exness public repository.
Data Inventory (Validated 2025-12-28)
| Instrument | raw_spread_ticks | standard_ticks | ohlc_1m | Date Range |
|---|---|---|---|---|
| EURUSD | 56,687,313 | 62,203,144 | 1,466,724 | 2022-01-02 → 2025-12-26 |
| XAUUSD | 166,820,410 | 167,871,871 | 1,401,953 | 2022-01-02 → 2025-12-26 |
Total: 453.6M ticks, 2.87M OHLC bars in 5.99 GiB
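The totals line can be re-derived from the per-table row counts. A quick Python check; the dictionary simply copies the table above, nothing is queried live:

```python
# Row counts copied from the Data Inventory table (not queried from ClickHouse).
inventory = {
    "EURUSD": {"raw_spread_ticks": 56_687_313, "standard_ticks": 62_203_144, "ohlc_1m": 1_466_724},
    "XAUUSD": {"raw_spread_ticks": 166_820_410, "standard_ticks": 167_871_871, "ohlc_1m": 1_401_953},
}

# Ticks = raw spread + standard rows across both instruments.
total_ticks = sum(row["raw_spread_ticks"] + row["standard_ticks"] for row in inventory.values())
total_bars = sum(row["ohlc_1m"] for row in inventory.values())

print(f"{total_ticks / 1e6:.1f}M ticks, {total_bars / 1e6:.2f}M OHLC bars")
# prints "453.6M ticks, 2.87M OHLC bars"
```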
Requirements
- Python 3.11+
- ClickHouse running on localhost:8123
Verify ClickHouse
```bash
# Check ClickHouse is running
clickhouse client --query "SELECT version()"

# Check exness database exists
clickhouse client --query "SELECT count() FROM exness.raw_spread_ticks"
```
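If the clickhouse client binary is not on PATH, the same checks can be run from Python against ClickHouse's HTTP interface on port 8123. This is a minimal standard-library sketch: the helper names are illustrative and not part of exness-data-preprocess, and it assumes the default user needs no password. GET requests are read-only on the ClickHouse side, which suits SELECTs. Values such as instrument names are passed as query parameters ({name:Type} placeholders in the SQL, param_<name> in the URL) instead of being pasted into the SQL string.

```python
from __future__ import annotations

import urllib.parse
import urllib.request

# Assumption: local server, default user, no password.
CLICKHOUSE_URL = "http://localhost:8123/"


def build_query_url(sql: str, params: dict[str, str] | None = None,
                    base_url: str = CLICKHOUSE_URL) -> str:
    """Encode the SQL and any {name:Type} parameter values into a GET URL."""
    query = {"query": sql}
    for name, value in (params or {}).items():
        query[f"param_{name}"] = value  # ClickHouse binds param_<name> to {name:Type}
    return base_url + "?" + urllib.parse.urlencode(query)


def run_query(sql: str, params: dict[str, str] | None = None,
              timeout: float = 10.0) -> str:
    """Run a read-only query and return the raw response text (TabSeparated by default)."""
    with urllib.request.urlopen(build_query_url(sql, params), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, run_query("SELECT version()") should return the server version, and run_query("SELECT count() FROM exness.raw_spread_ticks WHERE instrument = {inst:String}", {"inst": "EURUSD"}) mirrors the second check for one instrument.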
Installation
```bash
# Using uv
uv pip install exness-data-preprocess

# From source
git clone https://github.com/Eon-Labs/exness-data-preprocess.git
cd exness-data-preprocess
uv sync
```
Python API
```python
import exness_data_preprocess as edp

# Context manager (recommended)
with edp.ExnessDataProcessor() as processor:
    # Query ticks
    df = processor.query_ticks('EURUSD', 'raw_spread', '2024-01-01', '2024-01-31')

    # Query OHLC
    df = processor.query_ohlc('EURUSD', '1m', '2024-01-01', '2024-01-31')

    # Get coverage info
    cov = processor.get_data_coverage('EURUSD')

    # Download new data (incremental)
    result = processor.update_data('EURUSD', start_date='2022-01-01')
```
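update_data is described as incremental, and the source publishes one ZIP file per month. The bookkeeping that implies can be sketched as: given the requested start date and the latest date already stored (for example cov.latest_date, parsed into a date; its exact string format is not documented here), list the calendar months still to fetch. This is a hypothetical helper illustrating the idea, not the library's actual logic. Re-fetching the month that contains the latest stored date is a design choice made here, because that month may have been incomplete when it was last downloaded; the ReplacingMergeTree engine described under Schema collapses the resulting duplicate rows.

```python
from __future__ import annotations

from datetime import date


def months_to_fetch(start: date, latest_stored: date | None, today: date) -> list[tuple[int, int]]:
    """Return (year, month) pairs from the first month needing data through today's month.

    Hypothetical helper: the month containing latest_stored is fetched again
    because it may have been partial at the time of the last download.
    """
    first = start if latest_stored is None else max(start, latest_stored.replace(day=1))
    year, month = first.year, first.month
    months: list[tuple[int, int]] = []
    while (year, month) <= (today.year, today.month):
        months.append((year, month))
        # Step to the next calendar month, rolling over the year in December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months
```

A real implementation would also need to skip months whose archive has not been published yet.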
API Methods
| Method | Parameters | Returns |
|---|---|---|
| query_ticks(pair, variant, start_date, end_date) | variant: raw_spread or standard | DataFrame |
| query_ohlc(pair, timeframe, start_date, end_date) | timeframe: 1m, 5m, 15m, 1h, 4h, 1d | DataFrame |
| get_data_coverage(pair) | - | CoverageInfo |
| update_data(pair, start_date, delete_zip=True) | - | UpdateResult |
Return Models
```python
# UpdateResult
result.months_added          # int: months downloaded
result.raw_ticks_added       # int: raw spread ticks added
result.standard_ticks_added  # int: standard ticks added
result.ohlc_bars             # int: OHLC bars generated
result.storage_bytes         # int: storage used

# CoverageInfo
cov.raw_spread_ticks         # int: total raw spread ticks
cov.standard_ticks           # int: total standard ticks
cov.ohlc_bars                # int: total OHLC bars
cov.earliest_date            # str: earliest timestamp
cov.latest_date              # str: latest timestamp
cov.storage_bytes            # int: storage used
```
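The storage_bytes fields are plain byte counts. A small formatting sketch: the helpers are hypothetical, summarize_coverage is duck-typed against the CoverageInfo attributes listed above, and binary (1024-based) units are used so the result is comparable with the GiB figure in the Data Inventory.

```python
def format_bytes(n: int) -> str:
    """Format a byte count with binary units, e.g. 5.99 GiB."""
    if n < 1024:
        return f"{n} B"
    value = n / 1024
    for unit in ("KiB", "MiB", "GiB"):
        if value < 1024:
            return f"{value:.2f} {unit}"
        value /= 1024
    return f"{value:.2f} TiB"


def summarize_coverage(cov) -> str:
    """One-line summary of any object exposing the CoverageInfo fields."""
    return (
        f"{cov.raw_spread_ticks:,} raw / {cov.standard_ticks:,} standard ticks, "
        f"{cov.ohlc_bars:,} bars, {cov.earliest_date} to {cov.latest_date}, "
        f"{format_bytes(cov.storage_bytes)}"
    )
```

The same format_bytes call works on result.storage_bytes from update_data.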
Direct SQL Access
```bash
# Count ticks
clickhouse client --query "SELECT count() FROM exness.raw_spread_ticks WHERE instrument='EURUSD'"

# Query tick data (ORDER BY makes LIMIT return the first rows deterministically)
clickhouse client --query "
SELECT timestamp, bid, ask
FROM exness.raw_spread_ticks
WHERE instrument='EURUSD'
  AND timestamp >= '2024-01-01'
  AND timestamp < '2024-01-02'
ORDER BY timestamp
LIMIT 10"

# Query OHLC
clickhouse client --query "
SELECT timestamp, open, high, low, close
FROM exness.ohlc_1m
WHERE instrument='EURUSD'
  AND timestamp >= '2024-01-01'
ORDER BY timestamp
LIMIT 10"

# Resample to 1h
clickhouse client --query "
SELECT
  toStartOfHour(timestamp) AS ts,
  argMin(open, timestamp) AS open,
  max(high) AS high,
  min(low) AS low,
  argMax(close, timestamp) AS close
FROM exness.ohlc_1m
WHERE instrument='EURUSD' AND timestamp >= '2024-01-01'
GROUP BY ts
ORDER BY ts"
```
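The same aggregation can be done client-side on rows already fetched, for example to build any of the timeframes query_ohlc accepts from 1-minute bars. This standard-library sketch mirrors the SQL: open is the first bar's open and close the last bar's close (the argMin/argMax step), while high and low are the extremes. The Bar tuple and the TIMEFRAMES table are assumptions for illustration, not the library's types. Buckets are aligned to multiples of the interval since the Unix epoch in UTC, which matches toStartOfHour for 1h; how the library aligns its 4h and 1d bars is not stated here.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Bar(NamedTuple):
    ts: datetime  # bar open time, timezone-aware UTC
    open: float
    high: float
    low: float
    close: float


# Timeframes accepted by query_ohlc, expressed as bucket widths.
TIMEFRAMES = {
    "1m": timedelta(minutes=1),
    "5m": timedelta(minutes=5),
    "15m": timedelta(minutes=15),
    "1h": timedelta(hours=1),
    "4h": timedelta(hours=4),
    "1d": timedelta(days=1),
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime, width: timedelta) -> datetime:
    """Floor ts to a multiple of width counted from the Unix epoch."""
    return EPOCH + ((ts - EPOCH) // width) * width


def resample(bars: list[Bar], timeframe: str) -> list[Bar]:
    """Aggregate 1-minute bars: first open, max high, min low, last close."""
    width = TIMEFRAMES[timeframe]
    out: dict[datetime, Bar] = {}  # insertion order follows the sorted input
    for bar in sorted(bars, key=lambda b: b.ts):
        key = bucket_start(bar.ts, width)
        agg = out.get(key)
        if agg is None:
            out[key] = Bar(key, bar.open, bar.high, bar.low, bar.close)
        else:
            out[key] = Bar(key, agg.open, max(agg.high, bar.high),
                           min(agg.low, bar.low), bar.close)
    return list(out.values())
```

Sorting first is what makes "first" and "last" well defined, playing the role of the timestamp argument to argMin and argMax.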
Schema
Tick Tables
exness.raw_spread_ticks, exness.standard_ticks:
| Column | Type | Codec |
|---|---|---|
| instrument | LowCardinality(String) | - |
| timestamp | DateTime64(6, 'UTC') | DoubleDelta, LZ4 |
| bid | Float64 | Gorilla(8), ZSTD(1) |
| ask | Float64 | Gorilla(8), ZSTD(1) |
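Why these codecs suit tick data: DoubleDelta stores the difference between consecutive deltas, which stays small when timestamps arrive at similar intervals, and Gorilla XORs each Float64 with its predecessor, which leaves long runs of zero bits when consecutive prices are close. The sketch below only computes those intermediate quantities to illustrate the idea; it is not ClickHouse's encoder, and the helper names are hypothetical.

```python
import struct


def delta_of_delta(values: list[int]) -> list[int]:
    """Second differences of an integer series (the quantity DoubleDelta encodes)."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]


def float_bits(x: float) -> int:
    """Raw IEEE 754 bit pattern of a Float64 as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_leading_zeros(prev: float, cur: float) -> int:
    """Leading zero bits in prev XOR cur (64 when the values are identical)."""
    xored = float_bits(prev) ^ float_bits(cur)
    return 64 - xored.bit_length()
```

With DateTime64(6) the timestamps are microsecond integers, so evenly spaced ticks yield second differences at or near zero. Two quotes with the same sign and exponent always share at least the top 12 bits, and close quotes share many mantissa bits as well.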
OHLC Table
exness.ohlc_1m (27 columns):
| Category | Columns |
|---|---|
| Core | instrument, timestamp, open, high, low, close |
| Spreads | raw_spread_avg, standard_spread_avg |
| Tick counts | tick_count_raw_spread, tick_count_standard |
| Timezone | ny_hour, london_hour, ny_session, london_session |
| Holidays | is_us_holiday, is_uk_holiday, is_major_holiday |
| Sessions | is_nyse_session, is_lse_session, is_xswx_session, is_xfra_session, is_xtse_session, is_xnze_session, is_xtks_session, is_xasx_session, is_xhkg_session, is_xses_session |
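Engine: ReplacingMergeTree (rows with the same sorting key are deduplicated during background merges; add FINAL to a query to exclude duplicates that have not been merged yet)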
Partition: toYYYYMM(timestamp)
Order: (instrument, timestamp)
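What these settings imply can be modeled in a few lines of Python: each row lands in one partition per calendar month (toYYYYMM), and rows sharing the sorting key (instrument, timestamp) collapse to a single row once merged. Without a version column, ReplacingMergeTree keeps the last row in the selection, in practice the most recently inserted one. Because the partition is derived from timestamp, duplicates of a key always fall in the same partition, so merges can remove them. The helpers are hypothetical and model semantics only, not storage.

```python
from __future__ import annotations

from datetime import datetime


def partition_id(ts: datetime) -> int:
    """Equivalent of toYYYYMM(timestamp): 2024-01-15 -> 202401."""
    return ts.year * 100 + ts.month


def replacing_merge(rows: list[dict]) -> list[dict]:
    """Model a fully merged ReplacingMergeTree with no version column.

    rows are given in insertion order; for each (instrument, timestamp) key
    the last inserted row wins. Output is ordered by the sorting key.
    """
    latest: dict[tuple, dict] = {}
    for row in rows:
        latest[(row["instrument"], row["timestamp"])] = row
    return [latest[key] for key in sorted(latest)]
```

This is why re-importing an overlapping month is harmless once merges complete, or immediately when querying with FINAL.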
Data Source
- URL: https://ticks.ex2archive.com/
- Format: Monthly ZIP files with CSV tick data
- Variants: Raw_Spread (zero spreads) + Standard (market spreads)
- Precision: Microsecond timestamps
Development
```bash
# Setup
uv sync --dev

# Test
uv run pytest

# Lint
uv run ruff check --fix .

# Type check
uv run mypy src/
```
Documentation
- ClickHouse User Guide - Detailed ClickHouse usage
- Database Schema - 27-column OHLC specification
- Module Architecture - 13 modules with SLOs
- ADR: ClickHouse Migration - Architecture decision
License
MIT
Download files
Source Distribution
Built Distribution
File details
Details for the file exness_data_preprocess-2.1.0.tar.gz.
File metadata
- Download URL: exness_data_preprocess-2.1.0.tar.gz
- Size: 1.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4514f06e1155486f352c01a9d27b7024468111e713b69d8d1548069d4fc722ed |
| MD5 | b42738e54b76c1b099a338cb63ccdfd7 |
| BLAKE2b-256 | fdc21a057531c02b038f5b98cabab6cf7aded48fd0fe6ffcbb1f1c7f4af0f692 |
Provenance
The following attestation bundles were made for exness_data_preprocess-2.1.0.tar.gz:
- Publisher: publish.yml on terrylica/exness-data-preprocess
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: exness_data_preprocess-2.1.0.tar.gz
- Subject digest: 4514f06e1155486f352c01a9d27b7024468111e713b69d8d1548069d4fc722ed
- Sigstore transparency entry: 780894423
- Permalink: terrylica/exness-data-preprocess@6630f0a728eaf17d3af31819fee98416a36a4ab1
- Branch / Tag: refs/tags/v2.1.0
- Owner: https://github.com/terrylica
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6630f0a728eaf17d3af31819fee98416a36a4ab1
- Trigger Event: release
File details
Details for the file exness_data_preprocess-2.1.0-py3-none-any.whl.
File metadata
- Download URL: exness_data_preprocess-2.1.0-py3-none-any.whl
- Size: 41.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | af8cff3ab8962e0f8989c8a56227a4a46686028a9e4cc390cf17d26fc9ec4a78 |
| MD5 | 50d522f506ab0840beaf3ee9e143e9cc |
| BLAKE2b-256 | 24e2464c4d9c839456993b74e50f0abfb1744f06e53588b694129dfec2b5027f |
Provenance
The following attestation bundles were made for exness_data_preprocess-2.1.0-py3-none-any.whl:
- Publisher: publish.yml on terrylica/exness-data-preprocess
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: exness_data_preprocess-2.1.0-py3-none-any.whl
- Subject digest: af8cff3ab8962e0f8989c8a56227a4a46686028a9e4cc390cf17d26fc9ec4a78
- Sigstore transparency entry: 780894425
- Permalink: terrylica/exness-data-preprocess@6630f0a728eaf17d3af31819fee98416a36a4ab1
- Branch / Tag: refs/tags/v2.1.0
- Owner: https://github.com/terrylica
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6630f0a728eaf17d3af31819fee98416a36a4ab1
- Trigger Event: release