A collection of utilities for efficiently working with **large-scale** Parquet datasets.

Project description

parq-tools

Overview

parq-tools is a collection of utilities for efficiently working with large-scale Parquet datasets. A typical use case is asset-based workflows with large scientific datasets.

Note: If your datasets are not large, you might find the pandas library more convenient.

Features

  • Filtering → Efficiently filters large Parquet files.
  • Concatenation → Efficiently combines multiple Parquet files along rows (axis=0) or columns (axis=1).
  • Tokenized Filtering → Converts pandas-style expressions into efficient PyArrow queries.
  • Profiling Enhancements → Extends ydata-profiling to large files by profiling specific columns incrementally and merging the results.
  • DataFrame Enhancements → Provides a LazyParquetDataFrame class that extends pandas.DataFrame with lazy loading from Parquet files.
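
To make the idea behind tokenized filtering concrete, here is a minimal sketch of turning a pandas-style expression into the list-of-tuples filter format that PyArrow's Parquet readers accept. It is not parq-tools' own implementation or API: the helper name `expression_to_filters` is hypothetical, and the sketch assumes the expression contains only `column <op> literal` comparisons joined by `and`.

```python
import ast

# Map Python comparison AST nodes to the operator strings used in
# PyArrow's list-of-tuples `filters` format.
_OPS = {
    ast.Eq: "==",
    ast.NotEq: "!=",
    ast.Lt: "<",
    ast.LtE: "<=",
    ast.Gt: ">",
    ast.GtE: ">=",
}


def expression_to_filters(expression: str) -> list[tuple]:
    """Convert "x > 5 and y == 'a'" into [("x", ">", 5), ("y", "==", "a")].

    Hypothetical helper for illustration only. Anything other than
    `column <op> literal` terms joined by `and` raises ValueError.
    """
    tree = ast.parse(expression, mode="eval").body

    # "a and b and c" parses as one BoolOp holding all three terms;
    # a lone comparison is treated as a one-term conjunction.
    if isinstance(tree, ast.BoolOp) and isinstance(tree.op, ast.And):
        terms = tree.values
    else:
        terms = [tree]

    filters = []
    for term in terms:
        is_simple = (
            isinstance(term, ast.Compare)
            and len(term.ops) == 1
            and isinstance(term.left, ast.Name)
            and type(term.ops[0]) in _OPS
        )
        if not is_simple:
            raise ValueError(f"unsupported term in expression: {expression!r}")
        # literal_eval accepts an AST node and rejects non-literals.
        value = ast.literal_eval(term.comparators[0])
        filters.append((term.left.id, _OPS[type(term.ops[0])], value))
    return filters
```

The returned list can be passed as the `filters` argument of `pyarrow.parquet.read_table`, which treats a flat list of tuples as a conjunction, so that non-matching data is skipped while reading rather than after loading the whole file.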

Download files

Source Distribution

parq_tools-0.3.4.tar.gz (32.7 kB)

Uploaded Source

Built Distribution

parq_tools-0.3.4-py3-none-any.whl (41.2 kB)

Uploaded Python 3

File details

Details for the file parq_tools-0.3.4.tar.gz.

File metadata

  • Download URL: parq_tools-0.3.4.tar.gz
  • Upload date:
  • Size: 32.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for parq_tools-0.3.4.tar.gz

Algorithm    Hash digest
SHA256       b025ba2764fcab1a8719b8e988268ec766ee4bf44ad62f5884444827765ffec1
MD5          6e4580440399b57ba48c23c2df29f06c
BLAKE2b-256  62f6187f17da724de52dcdbc7d10c0c102f7f0ca78ce8d1ee5bb4d14ee1d5ccf
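
A downloaded copy of the source distribution can be checked against the published SHA256 digest. The following is a minimal sketch using only the Python standard library; the function names and the assumption that the file sits at a local path are illustrative, not part of parq-tools.

```python
import hashlib

# SHA256 digest published for parq_tools-0.3.4.tar.gz.
EXPECTED_SHA256 = "b025ba2764fcab1a8719b8e988268ec766ee4bf44ad62f5884444827765ffec1"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

Calling `matches_published_hash("parq_tools-0.3.4.tar.gz")` on a local download returns `True` only if the file's digest equals the value listed for it.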

Provenance

The following attestation bundles were made for parq_tools-0.3.4.tar.gz:

Publisher: publish_to_pypi.yml on elphick/parq-tools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file parq_tools-0.3.4-py3-none-any.whl.

File metadata

  • Download URL: parq_tools-0.3.4-py3-none-any.whl
  • Upload date:
  • Size: 41.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for parq_tools-0.3.4-py3-none-any.whl

Algorithm    Hash digest
SHA256       613aefe59473ff24524641212b9eb78a436d8e52b781ef8dac2154e164dd60a3
MD5          4a2afe0e943af6ad8ad067dc33543db2
BLAKE2b-256  9b035b0cfb3ba85f40dc7791474b8db489cab31e35352e43203894654c4ce197

Provenance

The following attestation bundles were made for parq_tools-0.3.4-py3-none-any.whl:

Publisher: publish_to_pypi.yml on elphick/parq-tools
