Rust + PyO3 implementation scaffold for FastWOE.

fastwoe

Fast Weight of Evidence (WOE) Encoding and Inference

This repository is scaffolded as a Rust workspace with PyO3 bindings for Python.

Current Status

  • Rust core and PyO3 bindings are active for model and preprocessing paths.
  • Binary and multiclass inference with CI/IV analysis are available.
  • FAISS remains an optional, Python-side binning method. It has not been promoted to the Rust core because current benchmark results do not justify it (docs/performance/FAISS_DECISION_BENCHMARK.md).

Workspace

  • crates/fastwoe-core: pure Rust WOE/statistics engine.
  • crates/fastwoe-py: PyO3 extension module (fastwoe_rs).

Prerequisites

  1. Install Rust (stable) with rustup.
  2. Install Python 3.9+.
  3. Install maturin: python -m pip install maturin

Recommended Environments

  • General development/runtime:
    • Python 3.9+ with project dependencies from pyproject.toml.
  • FAISS benchmarking/runtime (recommended separate env):
    • Use numpy<2 with faiss-cpu to avoid NumPy ABI issues in some FAISS builds.
    • Example: conda create -n fastwoe-faiss -c conda-forge python=3.12 numpy=1.26 pandas faiss-cpu maturin pytest ruff

Local Development

  1. Rust checks:
       cargo fmt --all
       cargo clippy --all-targets --all-features -- -D warnings
       cargo test --all-features
  2. Build/install the Python extension in the active environment:
       maturin develop --release --manifest-path crates/fastwoe-py/Cargo.toml

CI-Equivalent Local Repro (No Index Fetch)

If dependencies are already installed in a conda env (for example fastwoe-faiss), run:

bash scripts/repro_ci_local.sh fastwoe-faiss

This reproduces the CI-critical path without fetching packages from pip indexes:

  • release wheel build + install
  • parity/preprocessor/invariant tests
  • end-to-end latency threshold checks for kmeans and tree

This flow was validated on February 7, 2026.

Python Tooling

Ruff and Python dev settings are configured in pyproject.toml.

Optional FAISS path (Linux): python -m pip install '.[faiss]'

On macOS, install FAISS with conda-forge: conda install -c conda-forge faiss-cpu

Quick Python Usage

from fastwoe import FastWoe

model = FastWoe(smoothing=0.5, default_woe=0.0)
categories = ["A", "A", "B", "C"]
target = [1, 0, 0, 1]

model.fit(categories, target)
woe_values = model.transform(["A", "B", "Z"])
proba = model.predict_proba(["A", "B", "Z"])
mapping = model.get_mapping()

FastWoe accepts Python lists, NumPy arrays, pandas Series, and pandas DataFrames.
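
For orientation, the sketch below computes a textbook Weight of Evidence table in pure Python for the same toy data. The additive smoothing rule and the helper name woe_table are assumptions made for illustration; FastWoe's internal formula is not documented here and may differ, so the numbers need not match model.get_mapping().

```python
import math
from collections import Counter


def woe_table(categories, target, smoothing=0.5):
    """Textbook WOE per category with additive smoothing (illustrative only)."""
    events = Counter(c for c, y in zip(categories, target) if y == 1)
    non_events = Counter(c for c, y in zip(categories, target) if y == 0)
    cats = sorted(set(categories))
    # Smoothed totals, so the per-category shares still sum to 1.
    total_events = sum(events.values()) + smoothing * len(cats)
    total_non_events = sum(non_events.values()) + smoothing * len(cats)
    table = {}
    for c in cats:
        event_share = (events[c] + smoothing) / total_events
        non_event_share = (non_events[c] + smoothing) / total_non_events
        table[c] = math.log(event_share / non_event_share)
    return table


table = woe_table(["A", "A", "B", "C"], [1, 0, 0, 1])
default_woe = 0.0  # fallback for categories never seen during fit
encoded = [table.get(c, default_woe) for c in ["A", "B", "Z"]]
print(encoded)
```

In this sketch the unseen category "Z" simply receives default_woe, which mirrors the role of the default_woe constructor argument above.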

Optional local-build verification:

import fastwoe
import fastwoe.fastwoe_rs as rs

print("fastwoe package:", fastwoe.__file__)
print("extension:", rs.__file__)

Multi-Feature API (Categorical Matrix)

from fastwoe import FastWoe

model = FastWoe()
rows = [
    ["A", "x"],
    ["A", "y"],
    ["B", "x"],
    ["C", "z"],
]
target = [1, 0, 0, 1]

model.fit_matrix(rows, target, feature_names=["cat", "bucket"])
X_woe = model.transform_matrix(rows)
proba = model.predict_proba_matrix(rows)
cat_mapping = model.get_feature_mapping("cat")

Multiclass One-vs-Rest API

from fastwoe import FastWoe

model = FastWoe(smoothing=0.5, default_woe=0.0)

rows = [
    ["A", "x"],
    ["A", "y"],
    ["B", "x"],
    ["C", "z"],
    ["B", "y"],
]
labels = ["c0", "c1", "c2", "c0", "c1"]

model.fit_matrix_multiclass(rows, labels, feature_names=["cat", "bucket"])
all_probs = model.predict_proba_matrix_multiclass(rows)  # shape: (n_rows, n_classes)
c1_probs = model.predict_proba_matrix_class(rows, "c1")
classes = model.get_class_labels()
X_woe_multi = model.transform_matrix_multiclass(rows)
woe_feature_names = model.get_feature_names_multiclass()

# Feature mapping for a specific class (one-vs-rest)
cat_mapping_for_c0 = model.get_feature_mapping_multiclass("c0", "cat")
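
One-vs-rest means each class gets its own binary problem: rows with that label are events, all other rows are non-events. A minimal sketch of that relabeling, assuming classes are ordered alphabetically (the ordering FastWoe uses is not stated here):

```python
labels = ["c0", "c1", "c2", "c0", "c1"]

# One binary target vector per class: 1 where the row has that label, else 0.
classes = sorted(set(labels))
binary_targets = {
    cls: [1 if y == cls else 0 for y in labels]
    for cls in classes
}

for cls, y in binary_targets.items():
    print(cls, y)
```

A per-class mapping such as the one returned for "c0" above can then be read as the WOE table of the corresponding binary problem.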

Confidence Intervals

from fastwoe import FastWoe

model = FastWoe()
model.fit(["A", "B", "A"], [1, 0, 1])
ci = model.predict_ci(["A", "Z"], alpha=0.05)
# [(prediction, lower_ci, upper_ci), ...]

# Matrix APIs
rows = [["A", "x"], ["B", "y"]]
model.fit_matrix(rows, [1, 0], feature_names=["cat", "bucket"])
ci_matrix = model.predict_ci_matrix(rows, alpha=0.05)

# Multiclass APIs
model.fit_matrix_multiclass(rows, ["c0", "c1"], feature_names=["cat", "bucket"])
ci_multi = model.predict_ci_matrix_multiclass(rows, alpha=0.05)
ci_c0 = model.predict_ci_matrix_class(rows, "c0", alpha=0.05)
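
To show what an alpha-level interval means, here is a sketch of a normal-approximation interval around a single WOE value. The standard-error formula sqrt(1/events + 1/non_events) and the helper woe_interval are assumptions for illustration and are not taken from FastWoe's source.

```python
import math
from statistics import NormalDist


def woe_interval(woe, events, non_events, alpha=0.05):
    """Two-sided (1 - alpha) interval under a normal approximation."""
    se = math.sqrt(1.0 / events + 1.0 / non_events)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return woe - z * se, woe + z * se


# Hypothetical category: WOE 0.4 estimated from 30 events and 70 non-events.
lower, upper = woe_interval(woe=0.4, events=30, non_events=70)
print(lower, upper)
```

Smaller counts widen the interval, which is why sparse categories produce less certain predictions.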

Assumption-Risk Diagnostics

predict_proba* and predict_ci* can emit warnings when FastWoe detects strong feature dependence or ultra-sparse categorical patterns in training data.

from fastwoe import FastWoe

rows = [["A", "x"], ["A", "y"], ["B", "x"], ["C", "z"]]
target = [1, 0, 0, 1]

model = FastWoe()
model.fit_matrix(rows, target, feature_names=["f0", "f1"])
diagnostics = model.get_assumption_diagnostics()

# Optional: disable runtime warnings in strict pipelines.
quiet_model = FastWoe(warn_on_assumption_risk=False)
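
Feature dependence matters because WOE scores are typically combined additively, as in naive Bayes: per-feature WOE values are summed onto the prior log-odds, which is exact only when features are conditionally independent given the target. The sketch below uses hypothetical WOE values and a hypothetical helper additive_score to show how a duplicated feature double-counts evidence; it is not FastWoe's internal code.

```python
import math


def additive_score(prior_event_rate, woe_values):
    """Prior log-odds plus summed WOE, mapped to a probability."""
    prior_logit = math.log(prior_event_rate / (1.0 - prior_event_rate))
    logit = prior_logit + sum(woe_values)
    return 1.0 / (1.0 + math.exp(-logit))


p_one = additive_score(0.5, [0.8])       # one informative feature
p_dup = additive_score(0.5, [0.8, 0.8])  # same feature counted twice
print(p_one, p_dup)
```

The second probability is more extreme even though no new information was added, which is the kind of overconfidence the diagnostics are meant to flag.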

IV Analysis (Credit-Scoring Focus)

from fastwoe import FastWoe

rows = [["A", "x"], ["A", "y"], ["B", "x"], ["C", "z"]]
target = [1, 0, 0, 1]

model = FastWoe()
model.fit_matrix(rows, target, feature_names=["cat", "bucket"])

# Per-feature Information Value with standard error + CI
iv_rows = model.get_iv_analysis(alpha=0.05)
iv_cat_only = model.get_iv_analysis(feature_name="cat", alpha=0.05)

# DataFrame output for reporting pipelines
iv_df = model.get_iv_analysis(as_frame=True)

# Multiclass one-vs-rest IV analysis for a specific class label
model.fit_matrix_multiclass(rows, ["c0", "c1", "c2", "c0"], feature_names=["cat", "bucket"])
iv_c0 = model.get_iv_analysis_multiclass("c0", alpha=0.05)
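
For reference, the standard Information Value of a feature is the sum over categories of (event share - non-event share) * WOE. The sketch below implements that textbook definition without smoothing, on hypothetical counts; it assumes every category has at least one event and one non-event, and it does not reproduce the standard errors that get_iv_analysis reports.

```python
import math


def information_value(bins):
    """bins: list of (event_count, non_event_count), one pair per category."""
    total_events = sum(e for e, _ in bins)
    total_non_events = sum(n for _, n in bins)
    iv = 0.0
    for e, n in bins:
        event_share = e / total_events
        non_event_share = n / total_non_events
        woe = math.log(event_share / non_event_share)
        iv += (event_share - non_event_share) * woe
    return iv


iv = information_value([(10, 40), (20, 30), (30, 10)])
print(round(iv, 4))
```

A feature whose categories all share the same event rate has an IV of zero under this definition.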

High-Cardinality Preprocessing

from fastwoe import WoePreprocessor, FastWoe

rows = [
    ["cat_1", "segment_a"],
    ["cat_1", "segment_b"],
    ["cat_2", "segment_a"],
    ["cat_99", "segment_z"],  # rare
]

pre = WoePreprocessor(top_p=0.9, min_count=2, max_categories=20)
rows_reduced = pre.fit_transform(rows)
summary = pre.get_reduction_summary()

model = FastWoe()
model.fit_matrix(rows_reduced, [1, 0, 0, 1], feature_names=["merchant", "segment"])
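
The sketch below shows one plausible way the three reduction parameters could interact on a single column: keep the most frequent categories until top_p of the rows are covered, stop at max_categories, and never keep a category seen fewer than min_count times. The exact rule, the helper reduce_categories, and the "__other__" label are assumptions, not WoePreprocessor's documented behavior.

```python
from collections import Counter


def reduce_categories(values, top_p=0.9, min_count=2, max_categories=20,
                      other="__other__"):
    """Collapse infrequent categories into a single placeholder label."""
    counts = Counter(values)
    total = len(values)
    kept, covered = set(), 0
    # most_common() is sorted by descending count, so we can stop early.
    for cat, count in counts.most_common():
        if (len(kept) >= max_categories
                or count < min_count
                or covered / total >= top_p):
            break
        kept.add(cat)
        covered += count
    return [v if v in kept else other for v in values]


merchants = ["cat_1", "cat_1", "cat_2", "cat_99"]
reduced = reduce_categories(merchants)
print(reduced)
```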

End-to-End DataFrame Workflow (Preprocess + WOE + Mapping)

import numpy as np
import pandas as pd
from fastwoe import FastWoe, WoePreprocessor

np.random.seed(42)
n = 350
data = pd.DataFrame({
    "category": np.random.choice(["A", "B", "C", "D"], size=n, p=[0.35, 0.30, 0.25, 0.10]),
    "high_card_cat": [f"cat_{i}" for i in np.random.randint(0, 50, size=n)],
    "target": np.random.binomial(1, 0.3, size=n),
})

pre = WoePreprocessor(max_categories=10, min_count=5)
X = pre.fit_transform(
    data[["category", "high_card_cat"]],
    cat_features=["high_card_cat"],
)

woe = FastWoe()
X_woe = woe.fit_transform_matrix(
    X,
    data["target"],
    feature_names=["category", "high_card_cat"],
    as_frame=True,
)

rows = woe.get_feature_mapping("category")
mapping_df = pd.DataFrame([
    {
        "category": r.category,
        "event_count": r.event_count,
        "non_event_count": r.non_event_count,
        "woe": r.woe,
        "woe_se": r.woe_se,
    }
    for r in rows
])
mapping_df["count"] = mapping_df["event_count"] + mapping_df["non_event_count"]
mapping_df["event_rate"] = mapping_df["event_count"] / mapping_df["count"]

The categorical reduction path is backed by Rust (PreprocessorCore) when the extension is built. Numerical binning (quantile, uniform, kmeans, tree) is also Rust-backed via NumericBinnerCore; the FAISS path remains optional/Python-backed. For preprocessing, numeric features are marshaled to Rust as numeric values (not full-row strings), which reduces overhead for NumPy/pandas inputs.

Numerical binning is also supported before WOE:

from fastwoe import WoePreprocessor

rows = [[1000.0, "A"], [1200.0, "B"], [1400.0, "C"], [None, "D"]]
pre = WoePreprocessor(n_bins=3, binning_method="quantile")
rows_binned = pre.fit_transform(rows, numerical_features=[0], cat_features=[1])
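
Conceptually, quantile binning picks cut points so that each bin holds roughly the same number of rows. A standard-library sketch, assuming missing values are routed to their own bin (how WoePreprocessor actually labels bins and handles None is not specified here):

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_bins(values, n_bins=3, missing_label="missing"):
    """Assign each value a bin index from quantile cut points."""
    observed = [v for v in values if v is not None]
    edges = quantiles(observed, n=n_bins)  # n_bins - 1 interior cut points
    return [
        missing_label if v is None else bisect_right(edges, v)
        for v in values
    ]


bins = quantile_bins([1000.0, 1200.0, 1400.0, None])
print(bins)
```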

kmeans (KBins-style) numeric binning is also supported:

from fastwoe import WoePreprocessor

rows = [[0.1], [0.2], [0.3], [10.0], [10.2], [20.0]]
pre = WoePreprocessor(n_bins=3, binning_method="kmeans")
rows_binned = pre.fit_transform(rows, numerical_features=[0])
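
As a rough picture of what 1D k-means binning does, the sketch below runs Lloyd's algorithm on the same six values with a deterministic start. The initialization and the helper kmeans_1d are assumptions for illustration; the Rust implementation may initialize and terminate differently.

```python
def kmeans_1d(values, n_bins=3, n_iter=20):
    """Lloyd's algorithm on scalars; returns (labels, centers)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: one center from each evenly spaced slice.
    centers = [ordered[int((i + 0.5) * n / n_bins)] for i in range(n_bins)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest center for every value.
        labels = [
            min(range(n_bins), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(n_bins):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


labels, centers = kmeans_1d([0.1, 0.2, 0.3, 10.0, 10.2, 20.0])
print(labels, centers)
```

In KBins-style binning the bin edges are usually taken as the midpoints between neighboring centers.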

Optional FAISS-backed 1D k-means binning is available when faiss is installed:

from fastwoe import WoePreprocessor

rows = [[0.1], [0.2], [0.3], [10.0], [10.2], [20.0]]
pre = WoePreprocessor(n_bins=3, binning_method="faiss")
rows_binned = pre.fit_transform(rows, numerical_features=[0])

If faiss cannot be imported or fails at runtime (for example, NumPy ABI mismatch), FastWoe falls back to kmeans and emits a RuntimeWarning.
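
The fallback described above follows a common optional-dependency pattern. A minimal sketch, assuming a hypothetical helper resolve_binning_method (FastWoe's real function names and warning text are not shown in this README):

```python
import warnings


def resolve_binning_method(requested):
    """Return the method to use, downgrading 'faiss' if it is unusable."""
    if requested != "faiss":
        return requested
    try:
        import faiss  # noqa: F401  (optional dependency)
    except Exception as exc:  # ImportError, or ABI errors raised at import
        warnings.warn(
            f"faiss unavailable ({exc!r}); falling back to kmeans",
            RuntimeWarning,
        )
        return "kmeans"
    return "faiss"


method = resolve_binning_method("faiss")
print(method)
```

Catching a broad Exception here is deliberate in the sketch, since a NumPy ABI mismatch may not surface as a plain ImportError.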

Current benchmark decision: keep FAISS optional (do not move to Rust-core yet). See docs/performance/FAISS_DECISION_BENCHMARK.md for measured results.

Supervised tree-style numerical binning is available for binary targets:

from fastwoe import WoePreprocessor

rows = [[1000.0], [1100.0], [1200.0], [2000.0], [2100.0], [2200.0]]
y = [0, 0, 0, 1, 1, 1]
pre = WoePreprocessor(n_bins=2, binning_method="tree")
rows_binned = pre.fit_transform(rows, numerical_features=[0], target=y)
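
With n_bins=2, supervised binning reduces to finding one threshold that best separates events from non-events. The sketch below searches for that threshold by weighted Gini impurity, which is a common decision-tree criterion; whether FastWoe uses Gini or another criterion is an assumption, and best_split is a hypothetical helper.

```python
def best_split(values, target):
    """Return the threshold minimizing weighted Gini impurity."""
    pairs = sorted(zip(values, target))
    n = len(pairs)

    def gini(labels):
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)
        return 2.0 * p * (1.0 - p)

    best_threshold, best_impurity = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best_impurity = impurity
    return best_threshold


values = [1000.0, 1100.0, 1200.0, 2000.0, 2100.0, 2200.0]
threshold = best_split(values, [0, 0, 0, 1, 1, 1])
print(threshold)
```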

You can also enforce monotonic event-rate bins on numerical features:

from fastwoe import WoePreprocessor

rows = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 1, 1, 1, 1]
pre = WoePreprocessor(n_bins=4, binning_method="quantile")
rows_binned = pre.fit_transform(
    rows,
    numerical_features=[0],
    target=y,
    monotonic_constraints="increasing",
)
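
One common way to enforce an increasing event rate is to pool adjacent bins whenever a later bin has a lower rate than the one before it. The sketch below applies that pooling to hypothetical (events, count) pairs; it illustrates the idea only and is not necessarily the algorithm behind monotonic_constraints.

```python
def merge_to_increasing(bins):
    """bins: list of (events, count) ordered by feature value."""
    merged = []
    for events, count in bins:
        merged.append([events, count])
        # Pool with the previous bin while the event rate would decrease.
        while (len(merged) > 1
               and merged[-2][0] / merged[-2][1] > merged[-1][0] / merged[-1][1]):
            events_last, count_last = merged.pop()
            merged[-1][0] += events_last
            merged[-1][1] += count_last
    return [tuple(b) for b in merged]


result = merge_to_increasing([(1, 10), (4, 10), (2, 10), (6, 10)])
print(result)
```

Pooling reduces the number of bins, so the final bin count can be lower than the requested n_bins.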

Pandas Output Mode

import pandas as pd
from fastwoe import FastWoe

X = pd.DataFrame({"cat": ["A", "B"], "bucket": ["x", "y"]})
y = [1, 0]

model = FastWoe()
model.fit_matrix(X, y, feature_names=X.columns)

X_woe_df = model.transform_matrix(X, as_frame=True)
ci_df = model.predict_ci_matrix(X, as_frame=True)
model.fit_matrix_multiclass(X, ["c0", "c1"], feature_names=X.columns)
proba_multi_df = model.predict_proba_matrix_multiclass(X, as_frame=True)

Performance Guidance

  • Build extension wheels in optimized mode: python -m maturin build --release --manifest-path crates/fastwoe-py/Cargo.toml
  • Run core performance benchmarks: cargo bench -p fastwoe-core --bench woe_simulation
  • Run FAISS-vs-kmeans decision benchmark: python tools/benchmark_faiss_decision.py --methods kmeans tree faiss --sizes 10000 100000 --output docs/performance/
  • Run preprocessor memory benchmark: python tools/benchmark_preprocessor_memory.py --methods kmeans tree --sizes 10000 --output benchmark-artifacts/
  • Validate end-to-end latency thresholds: python tools/check_preprocessor_latency_thresholds.py --report benchmark-artifacts/FAISS_DECISION_BENCHMARK.md --threshold kmeans:10000:120:180 --threshold tree:10000:120:160
  • Validate end-to-end memory thresholds: python tools/check_preprocessor_memory_thresholds.py --report benchmark-artifacts/PREPROCESSOR_MEMORY_BENCHMARK.md --threshold kmeans:10000:150:190 --threshold tree:10000:150:190
  • Validate FAISS memory soft regression ratios (scheduled benchmark scope): python tools/check_faiss_memory_regression.py --report docs/performance/PREPROCESSOR_MEMORY_BENCHMARK.md --sizes 10000 100000 --max-pre-delta-ratio 1.5 --max-e2e-delta-ratio 1.5
  • Validate on your real credit-scoring CSV: python tools/benchmark_real_dataset.py --input-csv /path/to/credit.csv --target-col default_flag --methods kmeans tree --threshold kmeans:500:900 --threshold tree:500:900 --output benchmark-artifacts/
  • Release profile is tuned for runtime speed (lto=fat, codegen-units=1, stripped symbols).

Latest FAISS decision snapshot (docs/performance/FAISS_DECISION_BENCHMARK.md):

  • 10k rows preprocess best: kmeans 32.126 ms vs faiss 47.869 ms
  • 100k rows preprocess best: kmeans 453.994 ms vs faiss 493.762 ms
  • End-to-end best (preprocess + fit + predict), 10k/100k rows: kmeans 49.710/616.789 ms vs faiss 58.275/650.255 ms
  • Outcome: do not implement Rust-core FAISS yet.

Troubleshooting

  • maturin failed: rustc is not installed: install Rust via rustup and ensure cargo is on PATH.
  • Unable to find maturin script (often in conda/venv mixed setups): add $CONDA_PREFIX/bin to PATH and run maturin CLI directly, or use bash scripts/repro_ci_local.sh <conda-env>.
  • ImportError: numpy.core.multiarray failed to import when importing faiss: use a separate environment with numpy<2 and reinstall FAISS in that env.
  • Extension import problems after Python/env change: rerun python -m maturin develop --release --manifest-path crates/fastwoe-py/Cargo.toml.

CI and Release

  • CI workflow: .github/workflows/ci.yml
  • Wheels workflow: .github/workflows/wheels.yml
  • Benchmark workflow: .github/workflows/benchmarks.yml
  • Release checklist: docs/release/RELEASE_CHECKLIST.md
  • Migration + limitations: docs/release/MIGRATION_AND_LIMITATIONS.md

Publishing flow (Wheels):

  • Pushing a tag matching v* builds Linux/macOS/Windows wheels + sdist, then publishes to PyPI.
  • Manual run with input publish_to:
    • none: build-only (artifact validation, no publish)
    • testpypi: publish to TestPyPI
    • pypi: publish to PyPI (without creating a new tag)

Trusted publishing setup (required once on PyPI/TestPyPI):

  • Repository: Finyasy/fastwoe
  • Workflow: .github/workflows/wheels.yml

Optional fallback (if OIDC trusted publishing is not configured yet):

  • Set GitHub secret PYPI_API_TOKEN for PyPI publish
  • Set GitHub secret TEST_PYPI_API_TOKEN for TestPyPI publish

Download files

Source Distribution

  • fastwoe_rs-0.1.11.tar.gz (46.4 kB): source

Built Distributions

  • fastwoe_rs-0.1.11-cp39-abi3-win_amd64.whl (250.2 kB): CPython 3.9+, Windows x86-64
  • fastwoe_rs-0.1.11-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (351.9 kB): CPython 3.9+, manylinux (glibc 2.17+), x86-64
  • fastwoe_rs-0.1.11-cp39-abi3-macosx_11_0_arm64.whl (311.1 kB): CPython 3.9+, macOS 11.0+, ARM64

