This package contains a collection of tests to improve your Polars data analysis superpowers
Welcome to pelage!
The goal of this project is to provide a simple way to test your
polars code on the fly, while doing your analysis. The main idea is to
chain a series of meaningful checks on your data so that you can
continue and be more confident about your data quality. Here is how to
use it:
import polars as pl
import pelage as plg
data = pl.DataFrame(
    {
        "a": [1, 2, 3],
        "b": ["a", "b", "c"],
    }
)
validated_data = (
    data.pipe(plg.has_shape, (3, 2))
    .pipe(plg.has_no_nulls)
    .with_columns(
        pl.col("a").cast(str).alias("new_a"),
    )
)
print(validated_data)
shape: (3, 3)
┌─────┬─────┬───────┐
│ a   ┆ b   ┆ new_a │
│ --- ┆ --- ┆ ---   │
│ i64 ┆ str ┆ str   │
╞═════╪═════╪═══════╡
│ 1   ┆ a   ┆ 1     │
│ 2   ┆ b   ┆ 2     │
│ 3   ┆ c   ┆ 3     │
└─────┴─────┴───────┘
Here is an example of the error message raised when a check fails:
try:
    validated_data.pipe(plg.not_accepted_values, {"new_a": ["3"]})
except plg.PolarsAssertError as err:
    print(err)
Details
shape: (1, 1)
┌───────┐
│ new_a │
│ ---   │
│ str   │
╞═══════╡
│ 3     │
└───────┘
Error with the DataFrame passed to the check function:
--> This DataFrame contains values marked as forbidden
Here are the main key points:
- Each pelage check returns the original polars DataFrame if the data is valid. This lets you continue your analysis by chaining additional transformations.
- pelage raises a meaningful error message each time the data does not meet your expectations.
Installation
Install the package directly with pip:
pip install pelage
Main Concepts
Defensive analysis:
The main idea of pelage is to expand your options for defensive
analysis, similarly to other Python packages such as “bulwark” or
“engarde”. However, pelage relies mainly on the ability to pipe
and chain transformations directly, as provided by the fantastic polars
API, rather than on decorators.
Additionally, effort has gone into adding type hints to the provided functions so that your IDE keeps working throughout the chain.
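To make the contrast between piped checks and decorator-based checks concrete, here is a minimal sketch using only the standard library. It does not use pelage or polars: `Frame`, `CheckError`, `has_no_nulls`, and `validated_by` are hypothetical stand-ins written for illustration, and the real pelage implementation may differ.

```python
from functools import wraps
from typing import Any, Callable


class CheckError(Exception):
    """Hypothetical error type, playing the role of plg.PolarsAssertError."""


class Frame:
    """Hypothetical stand-in for a DataFrame: a list of row dicts plus pipe()."""

    def __init__(self, rows: list[dict[str, Any]]) -> None:
        self.rows = rows

    def pipe(self, func: Callable[..., "Frame"], *args: Any, **kwargs: Any) -> "Frame":
        # Same contract as a DataFrame pipe: call func(self, *args, **kwargs).
        return func(self, *args, **kwargs)


def has_no_nulls(frame: Frame) -> Frame:
    """Pipe-style check: raise on failure, otherwise return the input as-is."""
    bad = [row for row in frame.rows if any(v is None for v in row.values())]
    if bad:
        raise CheckError(f"{len(bad)} row(s) contain null values")
    return frame


def validated_by(check: Callable[[Frame], Frame]) -> Callable:
    """Decorator-style alternative: run the check on a function's result."""

    def decorator(func: Callable[..., Frame]) -> Callable[..., Frame]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Frame:
            return check(func(*args, **kwargs))

        return wrapper

    return decorator


# Pipe style: the check sits inline, at the step it guards.
clean = Frame([{"a": 1, "b": "x"}]).pipe(has_no_nulls)


# Decorator style: the check is attached to a function definition.
@validated_by(has_no_nulls)
def load() -> Frame:
    return Frame([{"a": 1, "b": "x"}])


loaded = load()
```

In the piped form the check can be dropped between any two steps of an exploratory chain, whereas the decorator form requires wrapping each step in a named function first. Because the check is annotated as taking and returning the same type, a type checker can keep following the chain after it.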
Leveraging polars blazing speed:
Although it is written in Python, most pelage checks are written in
a way that lets the polars API work its magic. We try to use a
syntax that is compatible with the fast execution and parallelism
provided by polars.
Interoperability:
The polars DSL and syntax have been developed with the idea of making
the transition to SQL much easier. With this in mind, pelage aims to
facilitate the use of tests to ensure data quality while enabling a
possible transition to SQL, where the same tests can be reused. This
is why we implemented most of the checks that have been developed for
the dbt toolbox, notably:
- dbt generic checks
- dbt-utils tests
- (Soon to come: dbt expectations)
We believe that data quality checks should be written as close as
possible to the data exploration phase, and we hope that providing
these checks in a context where it is easier to visualize your data
will be helpful. Similarly, we know that it is sometimes much easier to
industrialize SQL data pipelines. In this respect, the similarity
between pelage and dbt testing capabilities should make the
transition much smoother.
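As an illustration of what the SQL side of such a transition can look like, the sketch below expresses three checks as plain SQL queries against an in-memory SQLite table, following the dbt convention that a test passes when its query returns no rows. The table name `data` and the helper `failing_rows` are assumptions made for this example; the queries are hand-written approximations of the idea, not the SQL that dbt or dbt-utils actually generates.

```python
import sqlite3

# In-memory table mirroring the validated_data example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (a INTEGER, b TEXT, new_a TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [(1, "a", "1"), (2, "b", "2"), (3, "c", "3")],
)


def failing_rows(query: str) -> list[tuple]:
    """Hypothetical helper: run a test query and return the offending rows.

    An empty result means the test passes.
    """
    return conn.execute(query).fetchall()


# not_null-style test on column a.
not_null_failures = failing_rows("SELECT a FROM data WHERE a IS NULL")

# unique-style test on column a.
unique_failures = failing_rows(
    "SELECT a, COUNT(*) FROM data GROUP BY a HAVING COUNT(*) > 1"
)

# Forbidden-values test, the counterpart of
# plg.not_accepted_values with {"new_a": ["3"]}.
forbidden_failures = failing_rows("SELECT new_a FROM data WHERE new_a IN ('3')")

print(not_null_failures)   # []
print(unique_failures)     # []
print(forbidden_failures)  # [('3',)]
```

The last query returns the same offending value that the pelage error message reports above, which is the kind of one-to-one correspondence that makes moving a check from exploration code to a SQL pipeline straightforward.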
Why pelage?
pelage is the French word for an animal's fur. In the case of polar
bears in particular, it shields them from water and temperature
variations and acts as strong camouflage. Together with the skin, it
forms a strong barrier against changes in the outside world, which
makes it a well-suited name for a package designed to help with
defensive analysis.