fintech-analytics
The dbt-powered Python library for financial analytics — schema-aware, compliance-ready, ML-native, and now with drift detection, segment explainability, and revenue forecasting.
From raw transactions to fraud scores, RFM segments, cohort retention, AML compliance, revenue forecasts, and model drift detection — in one library.
Install
pip install fintech-analytics
# With Kaggle integration
pip install "fintech-analytics[kaggle]"
# With dbt engine (runs real .sql models)
pip install "fintech-analytics[dbt]"
# Everything
pip install "fintech-analytics[all]"
Quick Start
from fintech_analytics import Pipeline
p = Pipeline.from_csv("transactions.csv")
p.run()
# Core metrics
print(p.metrics)
# RFM segmentation
print(p.segment.summary())
# Fraud detection with plain-English explanation
p.fraud.detect()
p.fraud.print_explain("txn_123")
# Segment explainability — why is this customer "At Risk"?
p.segment.print_explain("cust_456")
# Revenue + fraud rate forecast (next 3 months)
forecast = p.forecast.forecast(months=3)
forecast.print_report()
# Fraud model drift detection
p_next_month = Pipeline.from_csv("next_month.csv")
p_next_month.run()
drift = p.fraud.drift(p_next_month)
drift.print_report()
# AML compliance checks
p.compliance.print_report()
# Interactive dashboard (opens in browser)
p.dashboard()
# Export everything to CSV
p.export("results/")
Schema Auto-Detection
Works with any CSV column naming — no renaming required:
# Your CSV might have columns named anything:
# trans_amt, ref_number, date, result, description...
p = Pipeline.from_csv("transactions.csv")
# fintech-analytics detects and maps them automatically
# Or override specific columns
p = Pipeline.from_csv(
"transactions.csv",
schema_mapping={
"transaction_id": "ref_number",
"amount": "trans_amt",
"transaction_at": "date",
}
)
Only the amount column is required. Everything else — including transaction_id and timestamps — is auto-generated if missing.
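Under the hood, detection of this kind is essentially fuzzy matching of your headers against known aliases for each canonical field. A minimal sketch of the idea (illustrative only — the alias lists and similarity cutoff here are assumptions, not the library's internals):

```python
from difflib import get_close_matches

# canonical fields the pipeline cares about -> common header aliases (assumed)
CANONICAL = {
    "amount": ["amount", "amt", "trans_amt", "value", "price"],
    "transaction_id": ["transaction_id", "txn_id", "ref_number", "id"],
    "transaction_at": ["transaction_at", "date", "timestamp", "created_at"],
}

def detect_schema(columns):
    """Map each canonical field to the closest-matching CSV header, if any."""
    mapping = {}
    for field, aliases in CANONICAL.items():
        for col in columns:
            # fuzzy match: accept headers at least 80% similar to a known alias
            if get_close_matches(col.lower(), aliases, n=1, cutoff=0.8):
                mapping[field] = col
                break
    return mapping

print(detect_schema(["ref_number", "trans_amt", "date", "result"]))
# {'amount': 'trans_amt', 'transaction_id': 'ref_number', 'transaction_at': 'date'}
```

This reproduces the same mapping as the manual schema_mapping override shown above, without any configuration.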
Kaggle Integration
# Built-in presets for popular datasets — no schema_mapping needed
p = Pipeline.from_kaggle(
"mlg-ulb/creditcardfraud",
kaggle_username="your_username", # or set KAGGLE_USERNAME env var
kaggle_key="your_api_key", # or set KAGGLE_KEY env var
)
p.run()
print(p.fraud.detect(mode="supervised").head())
Supported presets (no schema mapping needed):
- mlg-ulb/creditcardfraud
- kartik2112/fraud-detection
- ybifoundation/credit-card-fraud-detection-prediction
Get credentials: kaggle.com/settings/account
Command Line Interface
# Full pipeline
fintech-analytics run --input transactions.csv --output results/
# Fraud detection
fintech-analytics fraud --input transactions.csv
fintech-analytics fraud --input transactions.csv --explain txn_123
# RFM segmentation
fintech-analytics rfm --input transactions.csv
fintech-analytics rfm --input transactions.csv --export crm.csv --segment "At Risk"
# AML compliance
fintech-analytics compliance --input transactions.csv
# Benchmark against industry
fintech-analytics benchmark --input transactions.csv --industry payments_uk
# Dashboard
fintech-analytics dashboard --input transactions.csv --port 8080
Full API Reference
p.metrics — KPI Summary
{
"total_transactions": 150000,
"unique_customers": 2000,
"unique_merchants": 300,
"total_volume": 10230000.0,
"avg_transaction": 84.39,
"completion_pct": 92.3,
"fraud_rate_pct": 1.96,
"date_from": "2023-01-01",
"date_to": "2024-12-31",
}
p.segment — Customer Segmentation
p.segment.rfm() # full RFM table (customer_id, scores, segment)
p.segment.summary() # per-segment counts, avg spend, total revenue
p.segment.champions # Champions DataFrame
p.segment.at_risk # At Risk + Cannot Lose Them
p.segment.lost # Hibernating + Lost
p.segment.get_segment("Promising") # any segment by name
p.segment.print_summary() # rich table output
Segment Explainability (new in v0.1.2)
# Explain why a customer is in their current segment
p.segment.print_explain("cust_123")
# ┌─────────────────────────────────────────────────────────────────┐
# │ Segment Explanation — cust_123 │
# │ Segment: About to Sleep (R=2 F=3 M=2 Total=7/15) │
# │ Last txn: 47 days ago │
# │ Frequency: 12 transactions │
# │ Lifetime $: $843.20 │
# │ │
# │ Why this segment: │
# │ • Recency 2/5 — 47 days ago (avg: 12 days) — significantly lapsed│
# │ • Frequency 3/5 — 12 txns (avg: 10) — average engagement │
# │ • Monetary 2/5 — $843 spend (avg: $1,240) — below average │
# │ │
# │ What's holding them back: │
# │ • Needs to transact within 20 days to reach top quartile │
# │ • $397 more spend needed to reach top 25% │
# │ │
# │ To move up: │
# │ → Need Attention: Make one transaction in next 60 days │
# │ → Promising: Return within 60 days and make 2+ purchases │
# │ │
# │ Action now: Reconnect. Share relevant products. │
# └─────────────────────────────────────────────────────────────────┘
exp = p.segment.explain("cust_123") # returns SegmentExplanation object
print(exp.segment) # "About to Sleep"
print(exp.to_move_up) # dict of next segments → exact steps
print(exp.recommended_action) # "Reconnect. Share relevant products."
# Batch explain — CRM-ready DataFrame for all At Risk customers
df = p.segment.batch_explain("At Risk")
df.to_csv("at_risk_actions.csv", index=False)
# Columns: customer_id, segment, key_driver, recommended_action, next_segment, next_step
p.fraud — Fraud Detection
p.fraud.detect() # all transactions scored
p.fraud.detect(mode="supervised") # XGBoost (needs is_fraud labels)
p.fraud.detect(mode="unsupervised") # Isolation Forest (no labels)
p.fraud.explain("txn_123") # dict: probability, reasons, action
p.fraud.print_explain("txn_123") # rich formatted output
p.fraud.summary() # aggregate fraud stats
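The core idea behind the unsupervised mode — isolate outliers without labels — can be sketched with scikit-learn's IsolationForest. This is an illustration of the technique, not the package's actual implementation:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# mostly ordinary amounts around 50, plus three extreme outliers
amounts = np.concatenate([rng.normal(50, 10, 500), [5_000, 7_500, 9_900]])
X = amounts.reshape(-1, 1)

model = IsolationForest(contamination=0.01, random_state=0).fit(X)
scores = -model.score_samples(X)  # higher = more anomalous
flags = model.predict(X)          # -1 = anomaly, 1 = normal

# the injected outliers rank among the most anomalous transactions
top3 = np.argsort(scores)[-3:]
print(sorted(int(i) for i in top3))  # [500, 501, 502]
```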
Fraud Model Drift Detection (new in v0.1.2)
Uses Population Stability Index (PSI) — the industry standard for determining when a fraud model needs retraining (Basel II / model risk management).
# Compare last month to this month
p_jan = Pipeline.from_csv("january.csv"); p_jan.run()
p_feb = Pipeline.from_csv("february.csv"); p_feb.run()
report = p_jan.fraud.drift(p_feb)
report.print_report()
# ┌──────────────────────────────────────────────────────────────────┐
# │ Fraud Model Drift Report — MONITOR │
# │ Overall PSI: 0.142 (stable<0.10, monitor<0.20, retrain≥0.20) │
# │ │
# │ Metric Reference Current Change │
# │ Volume 45,000 51,200 +13.8% │
# │ Fraud Rate 1.96% 2.41% +0.45% │
# │ Overall PSI — 0.142 monitor │
# │ │
# │ Per-Feature PSI: │
# │ amount 0.1823 monitor │
# │ amount_log 0.1823 monitor │
# │ hour 0.0041 stable │
# │ │
# │ Recommendations: │
# │ • PSI 0.142 in warning zone — schedule retraining soon │
# │ • Feature 'amount' has high drift — investigate distribution │
# └──────────────────────────────────────────────────────────────────┘
report.should_retrain # True / False
report.status # "stable" | "monitor" | "retrain"
report.overall_psi # 0.142
report.feature_psi # {"amount": 0.1823, "hour": 0.0041, ...}
report.summary() # plain dict
report.to_json("drift_report.json")
# PSI thresholds:
# < 0.10 → stable — no action needed
# 0.10–0.20 → monitor — consider retraining soon
# > 0.20 → retrain — significant drift, retrain immediately
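PSI itself is simple to compute against the thresholds above. A pure-Python sketch — the 10 quantile bins and the epsilon floor (to avoid log of zero) are common conventions assumed here, not necessarily the library's exact choices:

```python
import math

def psi(reference, current, bins=10, eps=1e-4):
    """Population Stability Index between two samples of a numeric feature.

    Bin edges come from the reference sample's quantiles; each bin
    contributes (cur% - ref%) * ln(cur% / ref%).
    """
    ref = sorted(reference)
    # quantile-based bin edges taken from the reference distribution
    edges = [ref[int(i * (len(ref) - 1) / bins)] for i in range(1, bins)]

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            idx = sum(x > e for e in edges)  # which bin x falls into
            counts[idx] += 1
        return [max(c / len(sample), eps) for c in counts]

    r, c = fractions(reference), fractions(current)
    return sum((ci - ri) * math.log(ci / ri) for ri, ci in zip(r, c))

same = list(range(1000))
print(round(psi(same, same), 6))                     # 0.0 -> stable
print(psi(same, [x + 500 for x in same]) > 0.20)     # True -> retrain
```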
p.forecast — Revenue & Fraud Rate Forecasting (new in v0.1.2)
Forecasts the next N months of revenue and fraud rate using Prophet (when installed), falling back to a linear trend otherwise.
forecast = p.forecast.forecast(months=3)
forecast.print_report()
# ┌──────────────────────────────────────────────────────────────────┐
# │ Forecast Report │
# │ 3-month forecast · 24 months history · model: prophet · high │
# │ │
# │ Revenue Forecast │
# │ Month Predicted Low (95%) High (95%) vs History │
# │ 2025-01 $892,400 $841,200 $943,600 +2.1% │
# │ 2025-02 $910,100 $855,400 $964,800 +4.2% │
# │ 2025-03 $934,800 $876,100 $993,500 +6.9% │
# │ │
# │ Fraud Rate Forecast │
# │ Month Predicted Low (95%) High (95%) │
# │ 2025-01 2.103% 1.821% 2.385% │
# │ 2025-02 2.087% 1.805% 2.369% │
# │ 2025-03 2.071% 1.789% 2.353% │
# └──────────────────────────────────────────────────────────────────┘
forecast.revenue # DataFrame: month, predicted, lower_95, upper_95
forecast.fraud_rate # DataFrame: month, predicted, lower_95, upper_95
forecast.summary() # dict with next_month_revenue, next_month_fraud_rate_pct
forecast.model_used # "prophet" or "linear_trend"
forecast.confidence # "high" | "medium" | "low"
# Install Prophet for best accuracy (optional)
# pip install prophet
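The linear-trend fallback amounts to ordinary least squares on monthly totals. A self-contained sketch of that idea (not the package's code):

```python
def linear_forecast(history, months=3):
    """Fit y = a + b*t by least squares and extrapolate `months` steps ahead."""
    n = len(history)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(history) / n
    # slope = covariance(t, y) / variance(t); intercept from the means
    b = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, history)) / \
        sum((ti - t_mean) ** 2 for ti in t)
    a = y_mean - b * t_mean
    return [a + b * (n + k) for k in range(months)]

# 24 months of steadily growing revenue -> the trend keeps climbing
revenue = [800_000 + 5_000 * m for m in range(24)]
print(linear_forecast(revenue, months=3))
# [920000.0, 925000.0, 930000.0]
```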
p.cohorts — Cohort Analytics
p.cohorts.retention() # full retention table
p.cohorts.retention_matrix() # pivot table (heatmap-ready, M0–M12)
p.cohorts.best_cohort() # highest M3 retention cohort
p.cohorts.average_retention() # average across all cohorts per month
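A retention matrix like the one retention_matrix() returns can be built in a few lines of pandas. Shown here as an independent sketch of the computation, not the library's implementation:

```python
import pandas as pd

txns = pd.DataFrame({
    "customer_id": ["a", "a", "a", "b", "b", "c"],
    "month": pd.to_datetime(["2024-01", "2024-02", "2024-03",
                             "2024-01", "2024-03", "2024-02"]).to_period("M"),
})

# cohort = month of first transaction; offset = months since that cohort
txns["cohort"] = txns.groupby("customer_id")["month"].transform("min")
txns["offset"] = (txns["month"] - txns["cohort"]).apply(lambda d: d.n)

# one row per (customer, month offset), then count actives per cohort
active = txns.drop_duplicates(["customer_id", "offset"])
matrix = active.pivot_table(index="cohort", columns="offset",
                            values="customer_id", aggfunc="nunique")
retention = matrix.div(matrix[0], axis=0)  # normalise by cohort size (M0)
print(retention)
```

Here the January cohort (customers a and b) retains 50% in M1 and 100% in M2.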
p.merchants — Merchant Intelligence
p.merchants.risk_scorecard() # all merchants ordered by fraud rate
p.merchants.critical() # fraud rate > 5%
p.merchants.elevated() # fraud rate 2–5%
p.merchants.top_by_volume(10) # top 10 by transaction volume
p.merchants.category_summary() # aggregated by merchant category
p.merchants.velocity_alerts() # merchants with unusual transaction spikes
p.compliance — AML Checks
p.compliance.aml_flags() # all flags as DataFrame
p.compliance.print_report() # rich formatted report
AML checks performed:
- Structuring (smurfing) — multiple transactions just below reporting threshold (default £10,000)
- Velocity anomalies — unusual transaction frequency in short time window
- Round number bias — high proportion of round-number transactions (manual fraud signal)
- Dormant account activity — sudden transactions on previously inactive accounts
- Large cash transactions — ATM/transfer amounts above threshold
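As an illustration, the structuring check boils down to counting near-threshold transactions per customer. A sketch with assumed parameters (within 10% below £10,000, three or more occurrences — illustrative values, not the library's exact rules):

```python
import pandas as pd

THRESHOLD = 10_000
NEAR = 0.90   # "just below" = within 10% of the threshold
MIN_COUNT = 3

def structuring_flags(txns: pd.DataFrame) -> pd.DataFrame:
    """Customers with MIN_COUNT+ transactions in [NEAR*THRESHOLD, THRESHOLD)."""
    near = txns[(txns["amount"] >= NEAR * THRESHOLD)
                & (txns["amount"] < THRESHOLD)]
    counts = near.groupby("customer_id").size()
    return counts[counts >= MIN_COUNT].rename("near_threshold_txns").reset_index()

txns = pd.DataFrame({
    "customer_id": ["a"] * 4 + ["b"] * 2,
    "amount": [9500, 9800, 9200, 9900, 9990, 120],
})
print(structuring_flags(txns))  # flags customer "a" (4 near-threshold txns)
```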
p.query() — Custom SQL
# Run any SQL against the pipeline database
p.query("SELECT * FROM analytics.rfm WHERE segment = 'Champions'")
p.query("SELECT * FROM raw.transactions WHERE amount > 5000")
p.query("""
SELECT segment, count(*) as n, round(avg(monetary), 2) as avg_spend
FROM analytics.rfm
GROUP BY segment
ORDER BY avg_spend DESC
""")
Tables available:
| Table | Contents |
|---|---|
| raw.transactions | Normalised input data |
| analytics.rfm | RFM segments |
| analytics.cohort_retention | Cohort retention matrix |
| analytics.merchant_scorecard | Merchant risk scores |
| analytics.payment_performance | Monthly KPIs |
dbt Engine
The package ships the production dbt SQL models from the fintech-analytics-platform repo. When you use engine="dbt", your data runs through the real .sql files — not a Python approximation.
# engine="dbt" → runs 13 real .sql dbt models (requires dbt-duckdb)
# engine="python" → pure Python fallback, no dbt needed
# engine="auto" → uses dbt if installed, otherwise python (default)
p = Pipeline.from_csv("transactions.csv", engine="dbt")
p.run()
# Running bundled dbt models (production-grade SQL pipeline)
# raw.transactions: 150,000 rows ✓
# Running dbt build (13 models + 30 tests)...
# Done. PASS=43 WARN=0 ERROR=0
Supported Data Formats
| Format | Method |
|---|---|
| CSV file | Pipeline.from_csv("file.csv") |
| pandas DataFrame | Pipeline.from_dataframe(df) |
| Kaggle dataset | Pipeline.from_kaggle("owner/dataset") |
RFM Segments
| Segment | Description | Recommended Action |
|---|---|---|
| Champions | High R, F, and M | Reward them. Ask for reviews. Upsell. |
| Loyal Customers | High F and M | Offer loyalty programme. Upsell tier. |
| Potential Loyalists | Recent, moderate F | Offer membership. Personalise. |
| Recent Customers | High R, low F | Onboarding support. Build habit. |
| Promising | Moderate R and F | Brand awareness. Free trials. |
| Need Attention | Moderate R, low F, high M | Reactivation campaign. Limited offer. |
| About to Sleep | Low R, low F | Reconnect. Share relevant products. |
| At Risk | Low R, high historical F and M | Win-back campaign. Personalised email. |
| Cannot Lose Them | Very low R, very high F | Renew. Call them directly. |
| Hibernating | Very low R and F | Relevant discount offer. |
| Lost | Lowest R and F | Revive with new offering or ignore. |
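The segments above come from quintile R/F/M scores (1 = worst, 5 = best). A toy sketch of the scoring plus a few of the labels — the exact boundary rules here are assumptions for illustration:

```python
import pandas as pd

# toy per-customer aggregates: recency (days), frequency, monetary
rfm = pd.DataFrame({
    "customer_id": [f"c{i}" for i in range(10)],
    "recency": [1, 3, 5, 8, 12, 20, 30, 45, 60, 90],
    "frequency": [50, 40, 35, 30, 25, 20, 10, 8, 4, 2],
    "monetary": [5000, 4200, 3900, 3100, 2500, 2000, 900, 700, 300, 100],
}).set_index("customer_id")

# 1..5 quintile scores; recent / frequent / high-spend customers score 5
rfm["R"] = pd.qcut(rfm["recency"], 5, labels=[5, 4, 3, 2, 1]).astype(int)
rfm["F"] = pd.qcut(rfm["frequency"], 5, labels=[1, 2, 3, 4, 5]).astype(int)
rfm["M"] = pd.qcut(rfm["monetary"], 5, labels=[1, 2, 3, 4, 5]).astype(int)

def label(row):
    # a few of the segments from the table above, with assumed cutoffs
    if row.R >= 4 and row.F >= 4 and row.M >= 4:
        return "Champions"
    if row.R <= 2 and row.F >= 4:
        return "Cannot Lose Them"
    if row.R <= 2 and row.F <= 2:
        return "Hibernating"
    return "Other"

rfm["segment"] = rfm.apply(label, axis=1)
print(rfm.loc["c0", "segment"])  # Champions
```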
Architecture
fintech_analytics/
├── schema/
│ ├── detector.py # fuzzy column name auto-detection + Kaggle presets
│ └── mapping.py # normalisation, UUID generation, numeric timestamp handling
├── pipeline/
│ ├── core.py # Pipeline class — main entry point
│ ├── dbt_engine.py # runs bundled .sql dbt models on user data
│ └── dashboard.py # self-contained HTML dashboard server
├── analytics/
│ ├── segmentation.py # RFM segments + explain() + batch_explain()
│ ├── explain.py # SegmentExplainer — PSI-driven segment explanations
│ ├── cohorts.py # cohort retention matrix
│ ├── merchants.py # merchant risk scoring + velocity alerts
│ └── forecast.py # revenue + fraud rate forecasting (Prophet / linear)
├── ml/
│ ├── fraud.py # XGBoost + Isolation Forest + SHAP explainability
│ └── drift.py # PSI-based fraud model drift detection
└── compliance/
└── checks.py # AML checks: structuring, velocity, round numbers, dormant, cash
Optional Dependencies
# Kaggle dataset download
pip install "fintech-analytics[kaggle]"
# dbt SQL engine (runs real .sql models)
pip install "fintech-analytics[dbt]"
# Prophet for accurate forecasting (otherwise uses linear trend)
pip install prophet
# SHAP explainability for fraud
pip install shap
# Kafka streaming
pip install "fintech-analytics[streaming]"
# Everything
pip install "fintech-analytics[all]"
What's New
v0.1.2
- Fraud drift detection — p.fraud.drift(p2) with PSI scores, per-feature breakdown, and retrain recommendations
- Segment explainability — p.segment.explain(cid) and p.segment.batch_explain() with CRM-ready export
- Revenue forecasting — p.forecast.forecast(months=3) with Prophet + linear trend fallback
v0.1.1
- Kaggle auth via kaggle_username / kaggle_key parameters
- transaction_id and timestamp auto-generated when missing
- Built-in Kaggle dataset presets (creditcardfraud, fraud-detection)
- Only the amount column required — everything else is optional
v0.1.0
- Initial release
Contributing
git clone https://github.com/tanvirpasha21/fintech-analytics-platform
cd fintech-analytics-platform/fintech_analytics_pkg
pip install -e ".[dev]"
pip install dbt-duckdb # for dbt engine tests
pytest tests/ -v
License
MIT — free to use, modify, and distribute.
Author
MD Tanvir Anjum — Founder, Void Studio github.com/tanvirpasha21 · voidstudiotech.co.uk · linkedin.com/in/mdtanviranjum21