Professional Exness forex tick data preprocessing with ClickHouse backend. Provides efficient storage with lossless precision and direct queryability.

exness-data-preprocess v2.0.0

Forex tick data preprocessing with a ClickHouse backend. Downloads, stores, and queries tick data from the Exness public repository.

Data Inventory (Validated 2025-12-28)

Instrument  raw_spread_ticks  standard_ticks  ohlc_1m    Date Range
EURUSD      56,687,313        62,203,144      1,466,724  2022-01-02 → 2025-12-26
XAUUSD      166,820,410       167,871,871     1,401,953  2022-01-02 → 2025-12-26

Total: 453.6M ticks, 2.87M OHLC bars in 5.99 GiB

Requirements

  • Python 3.11+
  • ClickHouse running on localhost:8123

Verify ClickHouse

# Check ClickHouse is running
clickhouse client --query "SELECT version()"

# Check that the exness database and its tick table exist
clickhouse client --query "SELECT count() FROM exness.raw_spread_ticks"
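
The same check can be scripted from Python. The sketch below is not part of this package; it assumes the ClickHouse HTTP interface is enabled on localhost:8123 (as listed under Requirements) and uses the server's standard /ping endpoint, which answers "Ok." when the server is healthy. The function names are illustrative.

```python
from http.client import HTTPException
from urllib.request import urlopen


def ping_url(host: str = "localhost", port: int = 8123) -> str:
    """Build the URL of ClickHouse's HTTP health-check endpoint."""
    return f"http://{host}:{port}/ping"


def clickhouse_is_up(host: str = "localhost", port: int = 8123,
                     timeout: float = 2.0) -> bool:
    """Return True if the server answers /ping with 'Ok.', False otherwise."""
    try:
        with urlopen(ping_url(host, port), timeout=timeout) as resp:
            return resp.read().decode().strip() == "Ok."
    except (OSError, HTTPException):
        # Connection refused, timeout, DNS failure, HTTP error, or bad response.
        return False
```

Calling clickhouse_is_up() before opening a processor gives a clear early failure instead of a connection error in the middle of a query.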

Installation

# Using uv
uv pip install exness-data-preprocess

# From source
git clone https://github.com/Eon-Labs/exness-data-preprocess.git
cd exness-data-preprocess
uv sync

Python API

import exness_data_preprocess as edp

# Context manager (recommended)
with edp.ExnessDataProcessor() as processor:
    # Query ticks
    df = processor.query_ticks('EURUSD', 'raw_spread', '2024-01-01', '2024-01-31')

    # Query OHLC
    df = processor.query_ohlc('EURUSD', '1m', '2024-01-01', '2024-01-31')

    # Get coverage info
    cov = processor.get_data_coverage('EURUSD')

    # Download new data (incremental)
    result = processor.update_data('EURUSD', start_date='2022-01-01')

API Methods

Method                                             Parameters                          Returns
query_ticks(pair, variant, start_date, end_date)   variant: raw_spread or standard     DataFrame
query_ohlc(pair, timeframe, start_date, end_date)  timeframe: 1m, 5m, 15m, 1h, 4h, 1d  DataFrame
get_data_coverage(pair)                            -                                   CoverageInfo
update_data(pair, start_date, delete_zip=True)     -                                   UpdateResult
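
The accepted variant and timeframe values above can be checked before a query is sent. This is a minimal sketch, not the library's own validation: check_query_args is a hypothetical helper, and the assumption that dates are ISO strings (YYYY-MM-DD) is taken from the usage example.

```python
from datetime import date

# Values taken from the API Methods table.
VARIANTS = {"raw_spread", "standard"}
TIMEFRAMES = {"1m", "5m", "15m", "1h", "4h", "1d"}


def check_query_args(variant=None, timeframe=None,
                     start_date=None, end_date=None) -> None:
    """Raise ValueError if an argument falls outside the documented values."""
    if variant is not None and variant not in VARIANTS:
        raise ValueError(f"variant must be one of {sorted(VARIANTS)}, got {variant!r}")
    if timeframe is not None and timeframe not in TIMEFRAMES:
        raise ValueError(f"timeframe must be one of {sorted(TIMEFRAMES)}, got {timeframe!r}")
    if start_date is not None and end_date is not None:
        # date.fromisoformat also rejects malformed date strings.
        if date.fromisoformat(start_date) > date.fromisoformat(end_date):
            raise ValueError("start_date must not be after end_date")
```

For example, check_query_args(variant="raw_spread", start_date="2024-01-01", end_date="2024-01-31") returns silently, while timeframe="5s" raises ValueError.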

Return Models

# UpdateResult
result.months_added         # int: months downloaded
result.raw_ticks_added      # int: raw spread ticks added
result.standard_ticks_added # int: standard ticks added
result.ohlc_bars            # int: OHLC bars generated
result.storage_bytes        # int: storage used

# CoverageInfo
cov.raw_spread_ticks       # int: total raw spread ticks
cov.standard_ticks         # int: total standard ticks
cov.ohlc_bars              # int: total OHLC bars
cov.earliest_date          # str: earliest timestamp
cov.latest_date            # str: latest timestamp
cov.storage_bytes          # int: storage used
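
As an illustration of how these fields might be consumed, the sketch below formats a coverage summary. It only assumes an object exposing the CoverageInfo attributes listed above; a SimpleNamespace stands in for the real model, the tick and bar counts come from the Data Inventory, and the storage_bytes value is an arbitrary placeholder.

```python
from types import SimpleNamespace


def format_bytes(n: int) -> str:
    """Render a byte count with binary units (KiB, MiB, GiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(n)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024


def summarize_coverage(cov) -> str:
    """One-line summary built from the documented CoverageInfo fields."""
    total_ticks = cov.raw_spread_ticks + cov.standard_ticks
    return (f"{total_ticks:,} ticks, {cov.ohlc_bars:,} bars, "
            f"{cov.earliest_date} to {cov.latest_date}, "
            f"{format_bytes(cov.storage_bytes)}")


# Stand-in for processor.get_data_coverage('EURUSD'); storage_bytes is made up.
example = SimpleNamespace(
    raw_spread_ticks=56_687_313,
    standard_ticks=62_203_144,
    ohlc_bars=1_466_724,
    earliest_date="2022-01-02",
    latest_date="2025-12-26",
    storage_bytes=2 * 1024**3,
)
print(summarize_coverage(example))
```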

Direct SQL Access

# Count ticks
clickhouse client --query "SELECT count() FROM exness.raw_spread_ticks WHERE instrument='EURUSD'"

# Query tick data
clickhouse client --query "
SELECT timestamp, bid, ask
FROM exness.raw_spread_ticks
WHERE instrument='EURUSD'
  AND timestamp >= '2024-01-01'
  AND timestamp < '2024-01-02'
ORDER BY timestamp
LIMIT 10"

# Query OHLC
clickhouse client --query "
SELECT timestamp, open, high, low, close
FROM exness.ohlc_1m
WHERE instrument='EURUSD'
  AND timestamp >= '2024-01-01'
ORDER BY timestamp
LIMIT 10"

# Resample to 1h
clickhouse client --query "
SELECT
    toStartOfHour(timestamp) AS ts,
    argMin(open, timestamp) AS open,
    max(high) AS high,
    min(low) AS low,
    argMax(close, timestamp) AS close
FROM exness.ohlc_1m
WHERE instrument='EURUSD' AND timestamp >= '2024-01-01'
GROUP BY ts
ORDER BY ts"
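
The resampling query above takes the first open, the highest high, the lowest low, and the last close within each hour. The pure-Python sketch below mirrors that logic on in-memory rows so the aggregation is easy to verify by hand. The tuple layout (timestamp, open, high, low, close) is an assumption made for illustration, not the library's return format.

```python
from datetime import datetime


def resample_to_1h(bars):
    """Aggregate 1-minute (timestamp, open, high, low, close) tuples into hourly bars."""
    hourly = {}
    # Sort by timestamp so "first open" and "last close" are well defined.
    for ts, o, h, l, c in sorted(bars, key=lambda bar: bar[0]):
        key = ts.replace(minute=0, second=0, microsecond=0)  # like toStartOfHour
        if key not in hourly:
            hourly[key] = [o, h, l, c]      # first bar of the hour sets the open
        else:
            agg = hourly[key]
            agg[1] = max(agg[1], h)         # max(high)
            agg[2] = min(agg[2], l)         # min(low)
            agg[3] = c                      # latest bar so far sets the close
    return [(key, *values) for key, values in sorted(hourly.items())]


bars = [
    (datetime(2024, 1, 1, 10, 1), 1.20, 1.30, 1.10, 1.25),
    (datetime(2024, 1, 1, 10, 0), 1.00, 1.15, 0.90, 1.20),
    (datetime(2024, 1, 1, 11, 0), 1.25, 1.40, 1.20, 1.30),
]
for row in resample_to_1h(bars):
    print(row)
```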

Schema

Tick Tables

exness.raw_spread_ticks, exness.standard_ticks:

Column      Type                    Codec
instrument  LowCardinality(String)  -
timestamp   DateTime64(6, 'UTC')    DoubleDelta, LZ4
bid         Float64                 Gorilla(8), ZSTD(1)
ask         Float64                 Gorilla(8), ZSTD(1)

OHLC Table

exness.ohlc_1m (27 columns):

Category     Columns
Core         instrument, timestamp, open, high, low, close
Spreads      raw_spread_avg, standard_spread_avg
Tick counts  tick_count_raw_spread, tick_count_standard
Timezone     ny_hour, london_hour, ny_session, london_session
Holidays     is_us_holiday, is_uk_holiday, is_major_holiday
Sessions     is_nyse_session, is_lse_session, is_xswx_session, is_xfra_session, is_xtse_session, is_xnze_session, is_xtks_session, is_xasx_session, is_xhkg_session, is_xses_session

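Engine: ReplacingMergeTree (rows sharing the same sorting key are deduplicated during background merges, so duplicates can remain visible until a merge runs; add FINAL to a query when exact deduplication is required)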

Partition: toYYYYMM(timestamp)

Order: (instrument, timestamp)
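
To make the engine settings concrete, the following sketch emulates them in plain Python: the partition key is derived the way toYYYYMM does, and rows are collapsed to the last-inserted one per (instrument, timestamp). It is a simplified model of ReplacingMergeTree without a version column, not ClickHouse code, and the function names are illustrative.

```python
from datetime import datetime


def partition_id(ts: datetime) -> int:
    """Mirror toYYYYMM(timestamp): 2024-01-15 maps to 202401."""
    return ts.year * 100 + ts.month


def replacing_merge(rows):
    """Keep the last-inserted row per (instrument, timestamp), in key order.

    rows: iterable of (instrument, timestamp, bid, ask) in insertion order.
    """
    latest = {}
    for instrument, ts, bid, ask in rows:
        # A later insert with the same sorting key replaces the earlier one.
        latest[(instrument, ts)] = (instrument, ts, bid, ask)
    return [latest[key] for key in sorted(latest)]


t0 = datetime(2024, 1, 15, 9, 30, 0, 250000)
t1 = datetime(2024, 1, 15, 9, 30, 1)
rows = [
    ("EURUSD", t1, 1.0901, 1.0902),
    ("EURUSD", t0, 1.0900, 1.0901),
    ("EURUSD", t0, 1.0900, 1.0901),  # same month re-imported: duplicate key
]
merged = replacing_merge(rows)
```

Because the partition key is a function of timestamp, which is itself part of the sorting key, rows with identical keys always land in the same partition and are therefore eligible to be merged together.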

Data Source

  • URL: https://ticks.ex2archive.com/
  • Format: Monthly ZIP files with CSV tick data
  • Variants: Raw_Spread (zero spreads) + Standard (market spreads)
  • Precision: Microsecond timestamps
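
Because the archive is organized as one ZIP file per month, an incremental update has to enumerate the months between a start date and the present. The sketch below shows one way to do that; months_between is a hypothetical helper, not the package's internal implementation, and it deliberately does not construct download URLs since the archive's path layout is not described here.

```python
from datetime import date


def months_between(start: date, end: date):
    """Yield (year, month) pairs from start's month through end's month, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month > 12:
            # Roll over into January of the next year.
            year, month = year + 1, 1


for year, month in months_between(date(2023, 11, 1), date(2024, 2, 15)):
    print(f"{year}-{month:02d}")
```

Comparing this list with the months already present in the database would give the set of files still to download.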

Development

# Setup
uv sync --dev

# Test
uv run pytest

# Lint
uv run ruff check --fix .

# Type check
uv run mypy src/

License

MIT
