This package contains a collection of tests to improve your Polars data analysis superpowers

Project description

Welcome to pelage!

The goal of this project is to provide a simple way to test your polars code on the fly while doing your analysis. The main idea is to chain a series of meaningful checks on your data so that you can carry on with more confidence in your data quality. Here is how to use it:

import polars as pl

import pelage as plg

data = pl.DataFrame(
    {
        "a": [1, 2, 3],
        "b": ["a", "b", "c"],
    }
)
validated_data = (
    data.pipe(plg.has_shape, (3, 2))
    .pipe(plg.has_no_nulls)
    .with_columns(
        pl.col("a").cast(str).alias("new_a"),
    )
)

print(validated_data)
shape: (3, 3)
┌─────┬─────┬───────┐
│ a   ┆ b   ┆ new_a │
│ --- ┆ --- ┆ ---   │
│ i64 ┆ str ┆ str   │
╞═════╪═════╪═══════╡
│ 1   ┆ a   ┆ 1     │
│ 2   ┆ b   ┆ 2     │
│ 3   ┆ c   ┆ 3     │
└─────┴─────┴───────┘

Here is an example of the error message raised when a check fails:

try:
    validated_data.pipe(plg.not_accepted_values, {"new_a": ["3"]})
except plg.PolarsAssertError as err:
    print(err)
Details
shape: (1, 1)
┌───────┐
│ new_a │
│ ---   │
│ str   │
╞═══════╡
│ 3     │
└───────┘
Error with the DataFrame passed to the check function:
--> This DataFrame contains values marked as forbidden

Here are the key points:

  • Each pelage check returns the original polars DataFrame if the data is valid. This allows you to continue your analysis by chaining additional transformations.

  • pelage raises a meaningful error message each time the data does not meet your expectations.
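
The two points above describe a simple contract: a check either hands back its input untouched or raises. The sketch below illustrates that contract in plain Python on a list of dictionaries. It is not pelage's implementation: `has_no_nulls_sketch`, `SketchCheckError` and the `pipe` helper are hypothetical names introduced only for this illustration.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[object]]
Rows = List[Row]


class SketchCheckError(Exception):
    """Hypothetical error type standing in for a failed check."""


def has_no_nulls_sketch(rows: Rows) -> Rows:
    """Return `rows` unchanged if no value is None, otherwise raise."""
    bad = [row for row in rows if any(value is None for value in row.values())]
    if bad:
        raise SketchCheckError(f"{len(bad)} row(s) contain null values: {bad}")
    return rows


def pipe(rows: Rows, func: Callable[..., Rows], *args: object) -> Rows:
    """Tiny stand-in for a DataFrame's pipe method: call func(rows, *args)."""
    return func(rows, *args)


clean = [{"a": 1, "b": "a"}, {"a": 2, "b": "b"}]
dirty = [{"a": 1, "b": None}]

# A passing check returns the very same object, so the chain can continue.
result = pipe(clean, has_no_nulls_sketch)

# A failing check raises, and the message describes the offending rows.
try:
    pipe(dirty, has_no_nulls_sketch)
    failure_message = None
except SketchCheckError as err:
    failure_message = str(err)
```

Because a passing check returns its input, it can sit between two transformations without changing the result of the chain.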

Installation

Install the package directly via pip:

pip install pelage

Main Concepts

Defensive analysis:

The main idea of pelage is to expand your options for defensive analysis, similarly to other Python packages such as “bulwark” or “engarde”. However, pelage relies mainly on the ability to directly pipe and chain transformations, as provided by the fantastic polars API, rather than on decorators.

Additionally, some effort has been put into adding type hints to the provided functions in order to ensure full compatibility with your IDE throughout your chains.
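
To make the difference between the two styles concrete, here is a minimal sketch in plain Python. It assumes nothing about the actual APIs of bulwark, engarde or pelage: `not_empty`, `check_not_empty`, `Chain` and `double` are hypothetical names. The decorator attaches a check to a function definition, whereas the chained version uses the same check as an ordinary, typed step between transformations.

```python
from functools import wraps
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def not_empty(values: List[int]) -> List[int]:
    """Hypothetical check: return the input unchanged, or raise if empty."""
    if not values:
        raise ValueError("expected at least one value")
    return values


# Decorator style: the check wraps a whole function and runs on its result.
def check_not_empty(func: Callable[[], List[int]]) -> Callable[[], List[int]]:
    @wraps(func)
    def wrapper() -> List[int]:
        return not_empty(func())

    return wrapper


@check_not_empty
def load_values() -> List[int]:
    return [1, 2, 3]


# Chaining style: the check is one typed step among the transformations.
class Chain(Generic[T]):
    """Tiny stand-in for an object that exposes a pipe method."""

    def __init__(self, value: T) -> None:
        self.value = value

    def pipe(self, func: Callable[[T], U]) -> "Chain[U]":
        # The return type follows the piped function, so an IDE can keep
        # tracking types along the chain.
        return Chain(func(self.value))


def double(values: List[int]) -> List[int]:
    return [v * 2 for v in values]


decorated = load_values()
chained = Chain([1, 2, 3]).pipe(not_empty).pipe(double).pipe(not_empty).value
```

In the chained version, a check can be inserted at any point of the analysis, including between two transformations, without defining a new function around it.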

Leveraging polars blazing speed:

Although it is written in Python, most pelage checks are written in a way that lets the polars API work its magic. We try to use a syntax that is compatible with the fast execution and parallelism provided by polars.

Interoperability:

The polars DSL and syntax have been developed with the idea of making the transition to SQL much easier. With this in mind, pelage aims to facilitate the use of tests to ensure data quality while enabling a possible transition towards SQL, where the same tests can be reused. This is why we implemented most of the checks that have been developed for the dbt toolbox.

We believe that data quality checks should be written as close as possible to the data exploration phase, and we hope that providing these checks in a context where it is easier to visualize your data will be helpful. Similarly, we know that it is sometimes much easier to industrialize SQL data pipelines. From this perspective, the similarity between pelage and dbt testing capabilities should make the transition much smoother.

Why pelage?

pelage is the French word for an animal's fur. In the case of polar bears in particular, it shields them from water and temperature variations, and acts as effective camouflage. Together with the skin, it forms a strong barrier against changes in the outside world, which makes it a well-suited name for a package designed to help with defensive analysis.
