Rust + PyO3 implementation scaffold for FastWOE.
fastwoe
Fast Weight of Evidence (WOE) Encoding and Inference
This repository is scaffolded as a Rust workspace with PyO3 bindings for Python.
Current Status
- Rust core and PyO3 bindings are active for model and preprocessing paths.
- Binary and multiclass inference with CI/IV analysis are available.
- FAISS remains an optional Python-path binning method; it is not promoted to a Rust-core implementation based on current benchmark results (docs/performance/FAISS_DECISION_BENCHMARK.md).
Workspace
- crates/fastwoe-core: pure Rust WOE/statistics engine.
- crates/fastwoe-py: PyO3 extension module (fastwoe_rs).
Prerequisites
- Install Rust (stable) with rustup.
- Install Python 3.9+.
- Install maturin:
python -m pip install maturin
Recommended Environments
- General development/runtime:
  - Python 3.9+ with project dependencies from pyproject.toml.
- FAISS benchmarking/runtime (recommended separate env):
  - Use numpy<2 with faiss-cpu to avoid NumPy ABI issues in some FAISS builds.
  - Example:
conda create -n fastwoe-faiss -c conda-forge python=3.12 numpy=1.26 pandas faiss-cpu maturin pytest ruff
Local Development
- Rust checks:
cargo fmt --all
cargo clippy --all-targets --all-features -- -D warnings
cargo test --all-features
- Build/install Python extension in active environment:
maturin develop --release --manifest-path crates/fastwoe-py/Cargo.toml
CI-Equivalent Local Repro (No Index Fetch)
If dependencies are already installed in a conda env (for example fastwoe-faiss), run:
bash scripts/repro_ci_local.sh fastwoe-faiss
This reproduces the CI-critical path without fetching packages from pip indexes:
- release wheel build + install
- parity/preprocessor/invariant tests
- end-to-end latency threshold checks for kmeans and tree
This flow was validated on February 7, 2026.
Python Tooling
Ruff and Python dev settings are configured in pyproject.toml.
Optional FAISS path (Linux):
python -m pip install '.[faiss]'
On macOS, install FAISS with conda-forge:
conda install -c conda-forge faiss-cpu
Quick Python Usage
from fastwoe import FastWoe
model = FastWoe(smoothing=0.5, default_woe=0.0)
categories = ["A", "A", "B", "C"]
target = [1, 0, 0, 1]
model.fit(categories, target)
woe_values = model.transform(["A", "B", "Z"])
proba = model.predict_proba(["A", "B", "Z"])
mapping = model.get_mapping()
FastWoe accepts Python lists, NumPy arrays, pandas Series, and pandas DataFrames.
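For intuition, the per-category values that fit produces can be approximated in plain Python. This is a minimal sketch under the textbook definition WOE = ln(share of events / share of non-events) with additive smoothing. The helper name woe_table and the exact smoothing formula are assumptions for illustration, not FastWoe's internals.

```python
import math
from collections import Counter


def woe_table(categories, target, smoothing=0.5):
    """Textbook WOE per category with additive smoothing (illustrative only)."""
    events = Counter()
    non_events = Counter()
    for cat, y in zip(categories, target):
        if y == 1:
            events[cat] += 1
        else:
            non_events[cat] += 1

    cats = sorted(set(categories))
    k = len(cats)
    # Smoothed totals so the per-category shares still sum to 1.
    total_events = sum(events.values()) + smoothing * k
    total_non_events = sum(non_events.values()) + smoothing * k

    table = {}
    for cat in cats:
        event_share = (events[cat] + smoothing) / total_events
        non_event_share = (non_events[cat] + smoothing) / total_non_events
        table[cat] = math.log(event_share / non_event_share)
    return table


table = woe_table(["A", "A", "B", "C"], [1, 0, 0, 1])
# Unseen categories fall back to a default value, here 0.0.
unseen = table.get("Z", 0.0)
```

In this sketch a category with a balanced event mix ("A") lands at zero, while categories seen only with events or only with non-events move away from zero in opposite directions. The fallback lookup plays the role that default_woe plays in the constructor.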
Optional local-build verification:
import fastwoe
import fastwoe.fastwoe_rs as rs
print("fastwoe package:", fastwoe.__file__)
print("extension:", rs.__file__)
Multi-Feature API (Categorical Matrix)
from fastwoe import FastWoe
model = FastWoe()
rows = [
    ["A", "x"],
    ["A", "y"],
    ["B", "x"],
    ["C", "z"],
]
target = [1, 0, 0, 1]
model.fit_matrix(rows, target, feature_names=["cat", "bucket"])
X_woe = model.transform_matrix(rows)
proba = model.predict_proba_matrix(rows)
cat_mapping = model.get_feature_mapping("cat")
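One common way to turn per-feature WOE values into a probability is to add them to the prior log-odds and apply the logistic function, which treats the features as conditionally independent. The sketch below assumes that additive combination; predict_proba_from_woe and prior_event_rate are illustrative names, and FastWoe's actual scoring may differ.

```python
import math


def predict_proba_from_woe(woe_row, prior_event_rate):
    """Combine one row of per-feature WOE values into an event probability."""
    prior_log_odds = math.log(prior_event_rate / (1.0 - prior_event_rate))
    # Additive (naive-Bayes-style) combination of evidence across features.
    logit = prior_log_odds + sum(woe_row)
    return 1.0 / (1.0 + math.exp(-logit))


p_neutral = predict_proba_from_woe([0.0, 0.0], prior_event_rate=0.5)
p_high = predict_proba_from_woe([1.0, 0.5], prior_event_rate=0.5)
p_low = predict_proba_from_woe([-1.0, -0.5], prior_event_rate=0.5)
```

With zero evidence the result equals the prior; positive WOE raises the probability and negative WOE lowers it symmetrically.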
Multiclass One-vs-Rest API
from fastwoe import FastWoe
model = FastWoe(smoothing=0.5, default_woe=0.0)
rows = [
    ["A", "x"],
    ["A", "y"],
    ["B", "x"],
    ["C", "z"],
    ["B", "y"],
]
labels = ["c0", "c1", "c2", "c0", "c1"]
model.fit_matrix_multiclass(rows, labels, feature_names=["cat", "bucket"])
all_probs = model.predict_proba_matrix_multiclass(rows) # shape: (n_rows, n_classes)
c1_probs = model.predict_proba_matrix_class(rows, "c1")
classes = model.get_class_labels()
X_woe_multi = model.transform_matrix_multiclass(rows)
woe_feature_names = model.get_feature_names_multiclass()
# Feature mapping for a specific class (one-vs-rest)
cat_mapping_for_c0 = model.get_feature_mapping_multiclass("c0", "cat")
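One-vs-rest reduces a multiclass problem to one binary problem per class: for class c, rows labelled c become events and all other rows become non-events. A minimal sketch of that label transformation follows; one_vs_rest_targets is a hypothetical helper, not part of the fastwoe API.

```python
def one_vs_rest_targets(labels):
    """Build one binary target vector per class label."""
    classes = sorted(set(labels))
    return {
        cls: [1 if label == cls else 0 for label in labels]
        for cls in classes
    }


labels = ["c0", "c1", "c2", "c0", "c1"]
binary_targets = one_vs_rest_targets(labels)
```

Each binary vector can then be fitted like an ordinary binary target, which is why per-class mappings and per-class probabilities are addressed by class label.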
Confidence Intervals
from fastwoe import FastWoe
model = FastWoe()
model.fit(["A", "B", "A"], [1, 0, 1])
ci = model.predict_ci(["A", "Z"], alpha=0.05)
# [(prediction, lower_ci, upper_ci), ...]
# Matrix APIs
rows = [["A", "x"], ["B", "y"]]
model.fit_matrix(rows, [1, 0], feature_names=["cat", "bucket"])
ci_matrix = model.predict_ci_matrix(rows, alpha=0.05)
# Multiclass APIs
model.fit_matrix_multiclass(rows, ["c0", "c1"], feature_names=["cat", "bucket"])
ci_multi = model.predict_ci_matrix_multiclass(rows, alpha=0.05)
ci_c0 = model.predict_ci_matrix_class(rows, "c0", alpha=0.05)
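To illustrate what a (prediction, lower_ci, upper_ci) tuple can represent, the sketch below builds a Wald-style interval on the log-odds scale and maps it to probabilities. It assumes the usual standard error for a log odds ratio, sqrt(1/events + 1/non_events); the function names and the zero prior log-odds default are illustrative assumptions, and FastWoe's own interval method is not documented here.

```python
import math
from statistics import NormalDist


def woe_interval(woe, event_count, non_event_count, alpha=0.05):
    """Wald interval for a WOE value on the log-odds scale."""
    se = math.sqrt(1.0 / event_count + 1.0 / non_event_count)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return woe - z * se, woe + z * se


def to_probability(log_odds):
    return 1.0 / (1.0 + math.exp(-log_odds))


def predict_with_ci(woe, event_count, non_event_count,
                    prior_log_odds=0.0, alpha=0.05):
    """Return (prediction, lower_ci, upper_ci) as probabilities."""
    lower, upper = woe_interval(woe, event_count, non_event_count, alpha)
    return (
        to_probability(prior_log_odds + woe),
        to_probability(prior_log_odds + lower),
        to_probability(prior_log_odds + upper),
    )


prediction, lower_ci, upper_ci = predict_with_ci(0.4, 30, 70)
_, lower_big, upper_big = predict_with_ci(0.4, 300, 700)
```

The interval narrows as the category's counts grow, which is the behaviour to expect from any count-based standard error.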
Assumption-Risk Diagnostics
predict_proba* and predict_ci* can emit warnings when FastWoe detects
strong feature dependence or ultra-sparse categorical patterns in training data.
from fastwoe import FastWoe
rows = [["A", "x"], ["A", "y"], ["B", "x"], ["C", "z"]]
target = [1, 0, 0, 1]
model = FastWoe()
model.fit_matrix(rows, target, feature_names=["f0", "f1"])
diagnostics = model.get_assumption_diagnostics()
# Optional: disable runtime warnings in strict pipelines.
quiet_model = FastWoe(warn_on_assumption_risk=False)
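Dependence between two categorical columns can be quantified with Cramér's V (0 means independent, 1 means perfectly associated). The following sketch computes that statistic only; whether get_assumption_diagnostics uses it is not stated in this document, so treat it as one possible check rather than the library's method.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)

    chi2 = 0.0
    for x, cx in x_counts.items():
        for y, cy in y_counts.items():
            expected = cx * cy / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(x_counts), len(y_counts)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


dependent = cramers_v(["A", "A", "B", "B"], ["x", "x", "y", "y"])
independent = cramers_v(["A", "A", "B", "B"], ["x", "y", "x", "y"])
```

Values near 1 indicate that two features carry largely the same information, so adding their evidence separately counts it twice.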
IV Analysis (Credit-Scoring Focus)
from fastwoe import FastWoe
rows = [["A", "x"], ["A", "y"], ["B", "x"], ["C", "z"]]
target = [1, 0, 0, 1]
model = FastWoe()
model.fit_matrix(rows, target, feature_names=["cat", "bucket"])
# Per-feature Information Value with standard error + CI
iv_rows = model.get_iv_analysis(alpha=0.05)
iv_cat_only = model.get_iv_analysis(feature_name="cat", alpha=0.05)
# DataFrame output for reporting pipelines
iv_df = model.get_iv_analysis(as_frame=True)
# Multiclass one-vs-rest IV analysis for a specific class label
model.fit_matrix_multiclass(rows, ["c0", "c1", "c2", "c0"], feature_names=["cat", "bucket"])
iv_c0 = model.get_iv_analysis_multiclass("c0", alpha=0.05)
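Information Value is conventionally the sum over categories of (event share - non-event share) * WOE. The sketch below computes that textbook quantity from raw counts; information_value and its smoothing default are assumptions for illustration and do not reproduce the standard error or CI that get_iv_analysis reports.

```python
import math


def information_value(counts, smoothing=0.5):
    """Textbook IV from {category: (event_count, non_event_count)}."""
    k = len(counts)
    total_events = sum(e for e, _ in counts.values()) + smoothing * k
    total_non_events = sum(ne for _, ne in counts.values()) + smoothing * k

    iv = 0.0
    for events, non_events in counts.values():
        event_share = (events + smoothing) / total_events
        non_event_share = (non_events + smoothing) / total_non_events
        iv += (event_share - non_event_share) * math.log(event_share / non_event_share)
    return iv


iv_flat = information_value({"A": (10, 10), "B": (20, 20)})
iv_split = information_value({"A": (30, 10), "B": (10, 30)})
```

A feature whose categories all share the same event mix scores zero; the further the categories separate events from non-events, the larger the value.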
High-Cardinality Preprocessing
from fastwoe import WoePreprocessor, FastWoe
rows = [
    ["cat_1", "segment_a"],
    ["cat_1", "segment_b"],
    ["cat_2", "segment_a"],
    ["cat_99", "segment_z"],  # rare
]
pre = WoePreprocessor(top_p=0.9, min_count=2, max_categories=20)
rows_reduced = pre.fit_transform(rows)
summary = pre.get_reduction_summary()
model = FastWoe()
model.fit_matrix(rows_reduced, [1, 0, 0, 1], feature_names=["merchant", "segment"])
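Cardinality reduction of this kind usually keeps the most frequent categories and folds the rest into a single bucket. The sketch below shows one plausible reading of top_p, min_count, and max_categories for a single column; the precise semantics in WoePreprocessor, the helper name reduce_categories, and the "__other__" label are assumptions.

```python
from collections import Counter


def reduce_categories(values, top_p=0.9, min_count=2,
                      max_categories=20, other="__other__"):
    """Keep frequent categories; map everything else to a shared bucket."""
    counts = Counter(values)
    total = len(values)
    kept = set()
    cumulative_share = 0.0
    # most_common() yields categories from most to least frequent.
    for category, count in counts.most_common():
        if (len(kept) >= max_categories
                or count < min_count
                or cumulative_share >= top_p):
            break
        kept.add(category)
        cumulative_share += count / total
    return [v if v in kept else other for v in values]


column = ["cat_1", "cat_1", "cat_2", "cat_99"]
reduced = reduce_categories(column)
```

Here only "cat_1" meets min_count=2, so the two singleton categories collapse into the shared bucket.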
End-to-End DataFrame Workflow (Preprocess + WOE + Mapping)
import numpy as np
import pandas as pd
from fastwoe import FastWoe, WoePreprocessor
np.random.seed(42)
n = 350
data = pd.DataFrame({
    "category": np.random.choice(["A", "B", "C", "D"], size=n, p=[0.35, 0.30, 0.25, 0.10]),
    "high_card_cat": [f"cat_{i}" for i in np.random.randint(0, 50, size=n)],
    "target": np.random.binomial(1, 0.3, size=n),
})
pre = WoePreprocessor(max_categories=10, min_count=5)
X = pre.fit_transform(
    data[["category", "high_card_cat"]],
    cat_features=["high_card_cat"],
)
woe = FastWoe()
X_woe = woe.fit_transform_matrix(
    X,
    data["target"],
    feature_names=["category", "high_card_cat"],
    as_frame=True,
)
rows = woe.get_feature_mapping("category")
mapping_df = pd.DataFrame([
    {
        "category": r.category,
        "event_count": r.event_count,
        "non_event_count": r.non_event_count,
        "woe": r.woe,
        "woe_se": r.woe_se,
    }
    for r in rows
])
mapping_df["count"] = mapping_df["event_count"] + mapping_df["non_event_count"]
mapping_df["event_rate"] = mapping_df["event_count"] / mapping_df["count"]
The categorical reduction path is backed by Rust (PreprocessorCore) when the extension is built.
Numerical binning (quantile, uniform, kmeans, tree) is also Rust-backed via NumericBinnerCore; the FAISS path remains optional/Python-backed.
For preprocessing, numeric features are marshaled to Rust as numeric values (not full-row strings), which reduces overhead for NumPy/pandas inputs.
Numerical binning is also supported before WOE:
from fastwoe import WoePreprocessor
rows = [[1000.0, "A"], [1200.0, "B"], [1400.0, "C"], [None, "D"]]
pre = WoePreprocessor(n_bins=3, binning_method="quantile")
rows_binned = pre.fit_transform(rows, numerical_features=[0], cat_features=[1])
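Quantile binning places cut points so that each bin holds roughly the same number of observed values. The sketch below uses only the standard library and routes None to a separate "missing" label; how WoePreprocessor treats missing values and how it names bins are not specified here, so both are assumptions.

```python
import bisect
import statistics


def fit_quantile_edges(values, n_bins):
    """Return n_bins - 1 cut points computed from the non-missing values."""
    observed = sorted(v for v in values if v is not None)
    return statistics.quantiles(observed, n=n_bins)


def assign_bin(value, edges):
    """Map a value to a bin label; None gets its own label."""
    if value is None:
        return "missing"
    return f"bin_{bisect.bisect_right(edges, value)}"


column = [1000.0, 1200.0, 1400.0, None]
edges = fit_quantile_edges(column, n_bins=3)
labels = [assign_bin(v, edges) for v in column]
```

The resulting labels are ordinary categories, so the binned column can be passed to a WOE encoder like any other categorical feature.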
kmeans (KBins-style) numeric binning is also supported:
from fastwoe import WoePreprocessor
rows = [[0.1], [0.2], [0.3], [10.0], [10.2], [20.0]]
pre = WoePreprocessor(n_bins=3, binning_method="kmeans")
rows_binned = pre.fit_transform(rows, numerical_features=[0])
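In one dimension, k-means binning amounts to running Lloyd's algorithm on the values and cutting at the midpoints between adjacent cluster centers. The following is a small deterministic sketch of that idea; the quantile-position initialisation and the name kmeans_1d are choices made for this example, not a description of NumericBinnerCore.

```python
import bisect


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on scalars; returns (sorted centers, bin edges)."""
    data = sorted(values)
    n = len(data)
    # Deterministic start: one center from the middle of each k-th of the data.
    centers = [data[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        new_centers = [
            sum(c) / len(c) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    centers.sort()
    edges = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    return centers, edges


values = [0.1, 0.2, 0.3, 10.0, 10.2, 20.0]
centers, edges = kmeans_1d(values, k=3)
bin_ids = [bisect.bisect_right(edges, v) for v in values]
```

Unlike quantile bins, these bins follow the gaps in the data: the three small values, the two values near 10, and the single value at 20 each get their own bin.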
Optional FAISS-backed 1D k-means binning is available when faiss is installed:
from fastwoe import WoePreprocessor
rows = [[0.1], [0.2], [0.3], [10.0], [10.2], [20.0]]
pre = WoePreprocessor(n_bins=3, binning_method="faiss")
rows_binned = pre.fit_transform(rows, numerical_features=[0])
If faiss cannot be imported or fails at runtime (for example, NumPy ABI mismatch),
FastWoe falls back to kmeans and emits a RuntimeWarning.
Current benchmark decision: keep FAISS optional (do not move to Rust-core yet).
See docs/performance/FAISS_DECISION_BENCHMARK.md for measured results.
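The fallback behaviour described above follows a common optional-dependency pattern: try the import, and on any failure warn and switch methods. A minimal sketch of that pattern is shown here; resolve_binning_method is a hypothetical helper, not FastWoe's actual code path.

```python
import importlib
import warnings


def resolve_binning_method(requested):
    """Return the method to use, degrading 'faiss' to 'kmeans' if needed."""
    if requested != "faiss":
        return requested
    try:
        importlib.import_module("faiss")
    except Exception as exc:  # ImportError, ABI mismatch, etc.
        warnings.warn(
            f"faiss unavailable ({exc!r}); falling back to kmeans",
            RuntimeWarning,
        )
        return "kmeans"
    return "faiss"


with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    effective_method = resolve_binning_method("faiss")
```

Catching a broad Exception is deliberate in this sketch, because a NumPy ABI mismatch may not surface as a plain ImportError.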
Supervised tree-style numerical binning is available for binary targets:
from fastwoe import WoePreprocessor
rows = [[1000.0], [1100.0], [1200.0], [2000.0], [2100.0], [2200.0]]
y = [0, 0, 0, 1, 1, 1]
pre = WoePreprocessor(n_bins=2, binning_method="tree")
rows_binned = pre.fit_transform(rows, numerical_features=[0], target=y)
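Supervised binning uses the target to choose cut points. For n_bins=2 this reduces to finding the single threshold that best separates events from non-events. The sketch below picks that threshold by minimising weighted Gini impurity, as a decision-tree stump would; the splitting criterion used by the library is not stated here, so Gini is an assumption.

```python
def gini(positives, size):
    """Gini impurity of a binary node with `positives` events out of `size`."""
    p = positives / size
    return 2.0 * p * (1.0 - p)


def best_split(values, target):
    """Return the threshold minimising weighted Gini impurity."""
    pairs = sorted(zip(values, target))
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    best_score, best_threshold = float("inf"), None
    left_pos = 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical values
        right_pos = total_pos - left_pos
        score = (i * gini(left_pos, i) + (n - i) * gini(right_pos, n - i)) / n
        if score < best_score:
            best_score = score
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
    return best_threshold


values = [1000.0, 1100.0, 1200.0, 2000.0, 2100.0, 2200.0]
y = [0, 0, 0, 1, 1, 1]
threshold = best_split(values, y)
bin_ids = [0 if v <= threshold else 1 for v in values]
```

More bins can be obtained by applying the same search recursively to each side of the threshold.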
You can also enforce monotonic event-rate bins on numerical features:
from fastwoe import WoePreprocessor
rows = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 1, 1, 1, 1]
pre = WoePreprocessor(n_bins=4, binning_method="quantile")
rows_binned = pre.fit_transform(
    rows,
    numerical_features=[0],
    target=y,
    monotonic_constraints="increasing",
)
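A standard way to obtain monotonic event rates is to merge adjacent bins whenever they violate the requested order (the pool-adjacent-violators idea). The sketch below does this for an increasing constraint on bins given as (event_count, total_count) pairs; it illustrates the technique and is not necessarily how monotonic_constraints is implemented.

```python
def enforce_increasing(bins):
    """Merge adjacent (events, count) bins until event rates never decrease."""
    merged = []
    for events, count in bins:
        merged.append([events, count])
        # Merge backwards while the previous bin has a higher event rate.
        while (len(merged) > 1
               and merged[-2][0] / merged[-2][1] > merged[-1][0] / merged[-1][1]):
            last_events, last_count = merged.pop()
            merged[-1][0] += last_events
            merged[-1][1] += last_count
    return [tuple(b) for b in merged]


# Event rates 0.1, 0.3, 0.2, 0.6: the middle two bins violate the order.
monotone_bins = enforce_increasing([(1, 10), (3, 10), (2, 10), (6, 10)])
rates = [events / count for events, count in monotone_bins]
```

The price of monotonicity is fewer bins than requested: here four bins become three.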
Pandas Output Mode
import pandas as pd
from fastwoe import FastWoe
X = pd.DataFrame({"cat": ["A", "B"], "bucket": ["x", "y"]})
y = [1, 0]
model = FastWoe()
model.fit_matrix(X, y, feature_names=X.columns)
X_woe_df = model.transform_matrix(X, as_frame=True)
ci_df = model.predict_ci_matrix(X, as_frame=True)
model.fit_matrix_multiclass(X, ["c0", "c1"], feature_names=X.columns)
proba_multi_df = model.predict_proba_matrix_multiclass(X, as_frame=True)
Performance Guidance
- Build extension wheels in optimized mode:
python -m maturin build --release --manifest-path crates/fastwoe-py/Cargo.toml
- Run core performance benchmarks:
cargo bench -p fastwoe-core --bench woe_simulation
- Run FAISS-vs-kmeans decision benchmark:
python tools/benchmark_faiss_decision.py --methods kmeans tree faiss --sizes 10000 100000 --output docs/performance/
- Run preprocessor memory benchmark:
python tools/benchmark_preprocessor_memory.py --methods kmeans tree --sizes 10000 --output benchmark-artifacts/
- Validate end-to-end latency thresholds:
python tools/check_preprocessor_latency_thresholds.py --report benchmark-artifacts/FAISS_DECISION_BENCHMARK.md --threshold kmeans:10000:120:180 --threshold tree:10000:120:160
- Validate end-to-end memory thresholds:
python tools/check_preprocessor_memory_thresholds.py --report benchmark-artifacts/PREPROCESSOR_MEMORY_BENCHMARK.md --threshold kmeans:10000:150:190 --threshold tree:10000:150:190
- Validate FAISS memory soft regression ratios (scheduled benchmark scope):
python tools/check_faiss_memory_regression.py --report docs/performance/PREPROCESSOR_MEMORY_BENCHMARK.md --sizes 10000 100000 --max-pre-delta-ratio 1.5 --max-e2e-delta-ratio 1.5
- Validate on your real credit-scoring CSV:
python tools/benchmark_real_dataset.py --input-csv /path/to/credit.csv --target-col default_flag --methods kmeans tree --threshold kmeans:500:900 --threshold tree:500:900 --output benchmark-artifacts/
- Release profile is tuned for runtime speed (lto=fat, codegen-units=1, stripped symbols).
Latest FAISS decision snapshot (docs/performance/FAISS_DECISION_BENCHMARK.md):
- 10k rows preprocess best: kmeans 32.126 ms vs faiss 47.869 ms
- 100k rows preprocess best: kmeans 453.994 ms vs faiss 493.762 ms
- End-to-end best (preprocess + fit + predict): kmeans 49.710/616.789 ms vs faiss 58.275/650.255 ms
- Outcome: do not implement Rust-core FAISS yet.
Troubleshooting
- maturin failed: rustc is not installed: install Rust via rustup and ensure cargo is on PATH.
- Unable to find maturin script (often in conda/venv mixed setups): add $CONDA_PREFIX/bin to PATH and run the maturin CLI directly, or use bash scripts/repro_ci_local.sh <conda-env>.
- ImportError: numpy.core.multiarray failed to import when importing faiss: use a separate environment with numpy<2 and reinstall FAISS in that env.
- Extension import problems after Python/env change: rerun python -m maturin develop --release --manifest-path crates/fastwoe-py/Cargo.toml.
CI and Release
- CI workflow: .github/workflows/ci.yml
- Wheels workflow: .github/workflows/wheels.yml
- Benchmark workflow: .github/workflows/benchmarks.yml
- Release checklist: docs/release/RELEASE_CHECKLIST.md
- Migration + limitations: docs/release/MIGRATION_AND_LIMITATIONS.md
Publishing flow (Wheels):
- push tag v* builds Linux/macOS/Windows wheels + sdist, then publishes to PyPI.
- Manual run with input publish_to:
  - none: build-only (artifact validation, no publish)
  - testpypi: publish to TestPyPI
  - pypi: publish to PyPI (without creating a new tag)
Trusted publishing setup (required once on PyPI/TestPyPI):
- Repository: Finyasy/fastwoe
- Workflow: .github/workflows/wheels.yml
Optional fallback (if OIDC trusted publishing is not configured yet):
- Set GitHub secret PYPI_API_TOKEN for PyPI publish
- Set GitHub secret TEST_PYPI_API_TOKEN for TestPyPI publish