insurance-eqrn
Extreme Quantile Regression Neural Networks for insurance pricing — covariate-dependent GPD tail modelling.
The problem
Your EVT model gives you the 1-in-200 claim for the portfolio. EQRN gives you the 1-in-200 claim for the Kensington flat vs the Somerset farmhouse. That difference is your reinsurance margin.
The standard approach to extreme severity modelling — fit a GPD to all claims above a threshold, read off the 99.5th percentile — pools everything together. It gives you one shape parameter and one scale parameter for the whole book. If your TPBI claims have a heavier tail for younger injured parties and lighter for older ones, the pooled model averages those tails away. Your per-segment VaR is wrong and your XL pricing is wrong.
The solution is covariate-dependent GPD parameters: xi(x) and sigma(x) as functions of risk characteristics, not pooled scalars. This is what EQRN does.
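To see what pooling costs, here is a small standalone sketch using scipy's `genpareto` (not this library): two segments are simulated with genuinely different true shape parameters, and per-segment fits are compared against a single pooled fit. The segment labels in the comments are illustrative, and the pooled shape typically lands between the two, so the heavy segment's tail is understated and the light segment's overstated.

```python
import numpy as np
from scipy.stats import genpareto

# Two portfolio segments with genuinely different tail heaviness
rng = np.random.default_rng(0)
heavy = genpareto.rvs(0.6, scale=1.0, size=2000, random_state=rng)  # e.g. heavy-tailed segment
light = genpareto.rvs(0.1, scale=1.0, size=2000, random_state=rng)  # e.g. light-tailed segment

# Per-segment maximum-likelihood fits (threshold fixed at 0)
xi_heavy, _, _ = genpareto.fit(heavy, floc=0)
xi_light, _, _ = genpareto.fit(light, floc=0)

# Pooled fit: one shape for the whole "book"
xi_pooled, _, _ = genpareto.fit(np.concatenate([heavy, light]), floc=0)
# xi_pooled sits between the segment shapes: neither segment's tail is right
```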
EQRN (Pasche & Engelke 2024, Annals of Applied Statistics) is the first method to estimate covariate-dependent GPD parameters using a neural network. This library is the first Python implementation.
What this library provides
- `EQRNModel` — two-step fitting: LightGBM intermediate quantile + GPD neural network
- `EQRNDiagnostics` — QQ plot, threshold stability, calibration, xi scatter
- Out-of-fold intermediate quantile estimation (prevents leakage into the GPD step)
- Orthogonal GPD reparameterisation for stable gradient training
- `predict_quantile` — conditional VaR at any extreme level (0.99, 0.995, ...)
- `predict_tvar` — conditional TVaR / expected shortfall
- `predict_exceedance_prob` — P(claim > threshold | risk profile)
- `predict_xl_layer` — expected loss in a per-risk XL layer (attachment, limit)
Install
```shell
pip install insurance-eqrn
```
PyTorch is required. For CPU-only:
```shell
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install insurance-eqrn
```
Quickstart
```python
import numpy as np
from insurance_eqrn import EQRNModel, EQRNDiagnostics

# X: covariate matrix (e.g. risk characteristics)
# y: claim severity values (above basic threshold)
model = EQRNModel(
    tau_0=0.85,          # intermediate quantile level
    hidden_sizes=(32, 16, 8),
    n_epochs=300,
    shape_fixed=False,   # covariate-dependent xi
    seed=42,
)
model.fit(X_train, y_train, X_val=X_val, y_val=y_val)

# Per-segment 99.5th percentile severity
var_995 = model.predict_quantile(X_test, q=0.995)

# TVaR for reinsurance pricing
tvar_99 = model.predict_tvar(X_test, q=0.99)

# XL layer: £500k xs £500k
xl_loss = model.predict_xl_layer(X_test, attachment=500_000, limit=500_000)

# Fitted GPD parameters per observation
params = model.predict_params(X_test)
# DataFrame with columns: xi, sigma, nu, threshold
```
The two-step method
Step 1: Intermediate quantile (LightGBM, out-of-fold)
Fits a quantile regression at level tau_0 (default 0.8) using K-fold cross-validation. Out-of-fold predictions are mandatory here. If you use in-sample predictions, the GPD network in Step 2 sees artificially clean thresholds and learns the wrong exceedance set.
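A minimal sketch of the out-of-fold scheme, with sklearn's `GradientBoostingRegressor` standing in for LightGBM and simulated data; the variable names are illustrative, not this library's internals:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = np.exp(1.0 + 0.5 * X[:, 0] + rng.normal(size=500))  # lognormal-ish severities

tau_0 = 0.8
oof_q = np.empty(len(y))
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    gbm = GradientBoostingRegressor(loss="quantile", alpha=tau_0, random_state=0)
    gbm.fit(X[train_idx], y[train_idx])
    # each point's threshold is predicted by a model that never saw it
    oof_q[test_idx] = gbm.predict(X[test_idx])

exceed = y > oof_q  # exceedance set for the GPD step, roughly 20% of observations
```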
Step 2: GPD neural network on exceedances
Identifies observations above their predicted threshold (~20% of training data at tau_0=0.8). Trains a feedforward network mapping (X, Q_hat(tau_0)) → (nu(x), xi(x)) using the orthogonal GPD deviance loss.
The orthogonal parameterisation (nu = sigma * (xi + 1)) makes the Fisher information matrix diagonal, which stabilises Adam training substantially compared to the direct (sigma, xi) parameterisation.
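As a numpy illustration of that loss (the library presumably trains it in PyTorch; this sketch only shows the maths), the per-observation GPD deviance under nu = sigma * (1 + xi) is:

```python
import numpy as np

def gpd_deviance(z, nu, xi):
    """Per-observation GPD negative log-likelihood for exceedances z > 0,
    under the orthogonal parameterisation nu = sigma * (1 + xi)."""
    sigma = nu / (1.0 + xi)
    return np.log(sigma) + (1.0 + 1.0 / xi) * np.log1p(xi * z / sigma)

# Sanity check: the average deviance is smallest near the true parameters.
rng = np.random.default_rng(1)
xi_true, sigma_true = 0.3, 2.0
u = rng.uniform(size=5000)
z = sigma_true / xi_true * ((1.0 - u) ** (-xi_true) - 1.0)  # GPD sampling via inverse CDF
nu_true = sigma_true * (1.0 + xi_true)
```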
Prediction
For a new observation x at target level tau > tau_0:
```
Q_x(tau) = Q_hat_x(tau_0) + sigma(x)/xi(x) * [((1-tau_0)/(1-tau))^xi(x) - 1]
```
At xi ≈ 0 (exponential limit), this is Q_hat + sigma * log((1-tau_0)/(1-tau)).
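The same extrapolation as a standalone numpy sketch with illustrative numbers (not the library's own code), including the exponential-limit branch:

```python
import numpy as np

def extrapolate_quantile(q_tau0, sigma, xi, tau, tau_0=0.8):
    """GPD extrapolation from the intermediate quantile Q(tau_0) to Q(tau)."""
    ratio = (1.0 - tau_0) / (1.0 - tau)
    xi = np.asarray(xi, dtype=float)
    xi_safe = np.where(np.abs(xi) < 1e-6, 1.0, xi)  # dummy value; masked out below
    general = q_tau0 + sigma / xi_safe * (ratio ** xi_safe - 1.0)
    exponential = q_tau0 + sigma * np.log(ratio)    # xi -> 0 limit
    return np.where(np.abs(xi) < 1e-6, exponential, general)

# e.g. Q(0.8) = 100k, sigma = 40k, xi = 0.2: extrapolate to the 99.5th percentile
q995 = extrapolate_quantile(100_000, 40_000, 0.2, tau=0.995)
q99 = extrapolate_quantile(100_000, 40_000, 0.2, tau=0.99)
q995_exp = extrapolate_quantile(100_000, 40_000, 0.0, tau=0.995)  # exponential limit
```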
Parameters
| Parameter | Default | Description |
|---|---|---|
| tau_0 | 0.8 | Intermediate quantile level. Increase for smaller datasets |
| hidden_sizes | (32, 16, 8) | Network hidden layer widths |
| n_epochs | 500 | Maximum training epochs |
| patience | 50 | Early stopping patience |
| shape_fixed | False | If True, xi is a scalar. Start here before fitting full model |
| l2_pen | 1e-4 | L2 weight decay |
| shape_penalty | 0 | Penalty on variance of xi(x) — smooths the shape surface |
| p_drop | 0 | Dropout probability. Try 0.1–0.2 for small datasets |
| n_folds | 5 | K-fold folds for OOF intermediate quantile |
| seed | None | Random seed |
Diagnostics
```python
from insurance_eqrn import EQRNDiagnostics

diag = EQRNDiagnostics(model)

# GPD QQ plot — should track the diagonal if the tail model is correct
diag.qq_plot(X_test, y_test)

# Predicted vs empirical coverage at each quantile level
diag.calibration_plot(X_test, y_test, levels=[0.9, 0.95, 0.99, 0.995])

# Mean residual life plot — the onset of linearity shows where the GPD approximation holds
diag.mean_residual_life_plot(y_train)

# Threshold stability — fit shape_fixed models at each tau_0, look for a plateau
diag.threshold_stability_plot(X_train, y_train)

# Summary table: predicted vs empirical exceedance rates
diag.summary_table(X_test, y_test)
```
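What the mean residual life plot computes, as a tiny standalone sketch on simulated Pareto data (empirical mean excess; not this library's code). For a Pareto tail the mean excess e(u) grows linearly in u, which is the linearity the plot looks for:

```python
import numpy as np

def mean_excess(y, thresholds):
    """Empirical mean excess e(u) = E[Y - u | Y > u]; roughly linear in u
    over the region where the GPD approximation holds."""
    return np.array([np.mean(y[y > u] - u) for u in thresholds])

rng = np.random.default_rng(7)
y = rng.pareto(3.0, size=20_000) + 1.0  # Pareto(alpha=3) severities, xi = 1/3
us = np.quantile(y, [0.5, 0.7, 0.9])
me = mean_excess(y, us)  # increases roughly linearly with the threshold
```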
Insurance applications
Motor TPBI (Third-Party Bodily Injury)
Young injured parties have longer annuity streams and heavier tails. EQRN lets you model xi(x) as a function of injured party age, claim type, and solicitor involvement. Output: P(claim > £500k | risk profile) per policy.
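That exceedance probability is the standard peaks-over-threshold tail estimator, obtained by inverting the quantile formula above. A sketch with made-up parameter values (the function name and numbers are illustrative, not this library's API):

```python
import numpy as np

def exceedance_prob(u, q_tau0, sigma, xi, tau_0=0.8):
    """P(Y > u | x) for u above the intermediate quantile Q(tau_0),
    via the fitted GPD tail: (1 - tau_0) * (1 + xi*(u - Q)/sigma)^(-1/xi)."""
    z = (u - q_tau0) / sigma
    return (1.0 - tau_0) * (1.0 + xi * z) ** (-1.0 / xi)

# e.g. P(claim > £500k) for a profile with Q(0.8) = £150k, sigma = £80k, xi = 0.3
p = exceedance_prob(500_000, 150_000, 80_000, 0.3)
```

At u = Q(tau_0) the formula returns exactly 1 - tau_0, consistent with the intermediate quantile's definition.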
Property large loss
Commercial property fire severity varies by construction class, sum insured, sprinkler status. EQRN provides 1-in-200 loss conditional on risk characteristics — input to CAT reinsurance models.
Per-risk XL pricing
```python
# Price layer: £1M xs £500k, conditional on risk
xl = model.predict_xl_layer(X_test, attachment=500_000, limit=1_000_000)
```
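Under the same tail model, a layer's expected loss is the integral of the exceedance probability across the layer. A numerical sketch with hypothetical parameters (assuming the attachment sits above the intermediate quantile; not the library's implementation):

```python
import numpy as np

def xl_expected_loss(attachment, limit, q_tau0, sigma, xi, tau_0=0.8):
    """E[min(max(Y - attachment, 0), limit)] = integral over the layer of
    P(Y > y), with the GPD tail above the intermediate quantile Q(tau_0).
    Assumes attachment >= Q(tau_0)."""
    ys = np.linspace(attachment, attachment + limit, 2001)
    surv = (1.0 - tau_0) * (1.0 + xi * (ys - q_tau0) / sigma) ** (-1.0 / xi)
    # trapezoidal integration of the survival function over the layer
    return float(np.sum(0.5 * (surv[:-1] + surv[1:]) * np.diff(ys)))

# £1M xs £500k for a profile with Q(0.8) = £150k, sigma = £80k, xi = 0.3
el = xl_expected_loss(500_000, 1_000_000, 150_000, 80_000, 0.3)
```

Lowering the attachment (with the same limit) can only increase the expected layer loss, a useful sanity check on any implementation.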
Solvency II SCR
EQRN provides per-segment 99.5th percentile severity, which is the correct input for simulation-based SCR calculations on heterogeneous portfolios. Segment-level conditional VaR is more conservative than pooled EVT for high-risk segments and more accurate for low-risk segments.
When not to use EQRN
- Frequency modelling: EQRN models severity above a threshold. Frequency is a separate model.
- Attritional claims: Claims below tau_0 are not modelled by EQRN.
- Small books (n_exceedances < 200): Set shape_fixed=True as a minimum. Below ~100 exceedances, fall back to marginal EVT.
- No covariates: Use insurance-evt directly.
Reference
Pasche, O.C. & Engelke, S. (2024). "Neural networks for extreme quantile regression with an application to forecasting of flood risk." Annals of Applied Statistics, 18(4), 2818–2839. DOI:10.1214/24-AOAS1907.
R reference implementation: opasche/EQRN (CRAN, March 2025).