parq-tools
Overview
parq-tools is a collection of utilities for efficiently working with large-scale Parquet datasets.
A typical use case is asset-based workflows with large scientific datasets.
:::note
If your datasets are not large, you might find the pandas library more convenient.
:::
Features
- Filtering → Efficiently filters large Parquet files.
- Concatenation → Combines multiple Parquet files efficiently along rows (`axis=0`) or columns (`axis=1`).
- Tokenized Filtering → Converts pandas-style expressions into efficient PyArrow queries.
- Profiling Enhancements → Improves `ydata-profiling` by profiling specific columns incrementally, merging results for large files.
- DataFrame Enhancements → Provides a `LazyParquetDataFrame` class that extends `pandas.DataFrame` with lazy loading from Parquet files.
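
To make the Tokenized Filtering idea more concrete, here is a minimal sketch of how a pandas-style expression could be turned into a filter that PyArrow understands. It is not parq-tools' own implementation or API: the function name `expression_to_dnf` and the column names are hypothetical, and the sketch only handles simple `column <op> literal` comparisons joined by `and` / `or` without parentheses. The output uses the list-of-lists-of-tuples (DNF) form accepted by the `filters` argument of `pyarrow.parquet.read_table`.

```python
import ast

# Map Python comparison AST nodes to the operator strings used in
# PyArrow's DNF filter tuples.
_OPS = {
    ast.Eq: "==",
    ast.NotEq: "!=",
    ast.Lt: "<",
    ast.LtE: "<=",
    ast.Gt: ">",
    ast.GtE: ">=",
}


def _comparison(node):
    """Turn a single `column <op> literal` comparison into a filter tuple."""
    if not (isinstance(node, ast.Compare) and len(node.ops) == 1):
        raise ValueError("only simple comparisons are supported")
    left, op, right = node.left, node.ops[0], node.comparators[0]
    if not isinstance(left, ast.Name) or type(op) not in _OPS:
        raise ValueError("expected `column <op> literal`")
    return (left.id, _OPS[type(op)], ast.literal_eval(right))


def _conjunction(node):
    """Return the comparisons joined by `and` (or a single comparison)."""
    if isinstance(node, ast.BoolOp) and isinstance(node.op, ast.And):
        return [_comparison(value) for value in node.values]
    return [_comparison(node)]


def expression_to_dnf(expression):
    """Convert a pandas-style expression into DNF filter tuples (hypothetical helper)."""
    tree = ast.parse(expression, mode="eval").body
    # `and` binds tighter than `or`, so a top-level `or` separates the
    # conjunctions of the disjunctive normal form.
    if isinstance(tree, ast.BoolOp) and isinstance(tree.op, ast.Or):
        return [_conjunction(value) for value in tree.values]
    return [_conjunction(tree)]


# Hypothetical column names, for illustration only.
filters = expression_to_dnf(
    "depth > 100 and rock_type == 'granite' or grade >= 2.5"
)
print(filters)
```

With PyArrow installed, the resulting `filters` value could be passed as `pyarrow.parquet.read_table(path, filters=filters)`, so that non-matching rows are skipped while reading rather than filtered after the whole file is loaded into memory.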