Panel Cointegration & Long-Run Relations: SPMG, PMG, PME, MGDL, Breitung, PDOLS, MGMW estimators
multicoint — Panel Cointegration & Multiple Long-Run Relations
A comprehensive Python library for estimating long-run equilibrium relationships in heterogeneous panel data.
Implements 7 state-of-the-art estimators from three foundational papers in modern panel econometrics, with publication-quality tables and visualizations.
📖 Table of Contents
- Installation
- Quick Start
- Estimators
- Diagnostics
- Visualization
- Comparison Tables
- Simulated Datasets
- Full API Reference
- Mathematical Background
- References
- Author
📦 Installation
# From PyPI
pip install multicoint
# From GitHub
pip install git+https://github.com/merwanroudane/multicointt.git
# Local development install
git clone https://github.com/merwanroudane/multicointt.git
cd multicointt
pip install -e .
Dependencies
| Package | Version | Purpose |
|---|---|---|
| numpy | ≥ 1.24 | Core numerical computation |
| scipy | ≥ 1.10 | Statistical distributions, linalg |
| pandas | ≥ 2.0 | Data handling and export |
| matplotlib | ≥ 3.7 | Publication-quality plots |
| seaborn | ≥ 0.12 | Statistical visualization |
| rich | ≥ 13.0 | Beautiful terminal tables |
| tabulate | ≥ 0.9 | Markdown/LaTeX table export |
| openpyxl | ≥ 3.1 | Excel export |
🚀 Quick Start
import numpy as np
import multicoint as mc
# Simulate panel data: consumption/GDP for 15 countries, 143 years
Y, X = mc.datasets.simulate_great_ratios(n=15, T=143, theta=1.0, seed=42)
# Estimate with all five Great Ratios estimators
spmg = mc.SPMG(p=2, bootstrap_reps=500).fit(Y, X, var_names=["Consumption", "GDP"])
pmg = mc.PMG(p=2, bootstrap_reps=500).fit(Y, X)
breit = mc.Breitung(p=2).fit(Y, X)
pdols = mc.PDOLS(leads_lags=4).fit(Y, X)
mgmw = mc.MGMW(q=5).fit(Y, X)
# Print comparison table
mc.comparison_table([spmg, pmg, breit, pdols, mgmw],
                    title="Great Ratios: Consumption/GDP")
# Forest plot of confidence intervals
mc.plot_confidence_intervals([spmg, pmg, breit, pdols, mgmw], theta0=1.0)
📐 Estimators
1. SPMG — System Pooled Mean Group
The flagship estimator. Handles two-way long-run causality, non-cointegrating units, and cross-sectional dependence.
Paper: Chudik, Pesaran & Smith (2023), "Revisiting the Great Ratios Hypothesis", Fed Dallas GI WP 415.
Model:
$$\Delta w_{it} = -\phi_i \beta' w_{i,t-1} + \Upsilon_i q_{it} + u_{it}$$
where $w_{it} = (y_{it}, x_{it})'$, $\beta = (1, -\theta)'$, $\phi_i = (\phi_{yi}, \phi_{xi})'$ is a 2×1 vector (both equations), and $\Sigma_i = \text{Var}(u_{it})$ is 2×2.
Key advantages over PMG:
- φ_i is bivariate → captures two-way long-run causality
- Units with φ_i ≈ 0 contribute negligibly → robust to non-cointegration
- Normalization invariant: θ̂_{y.x} · θ̂_{x.y} = 1
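Written out row by row, the system above is two error-correction equations that share one long-run relation. This is only an expansion of the stated model. The symbols $\upsilon_{yi}'$ and $\upsilon_{xi}'$ (the rows of $\Upsilon_i$) and $u_{y,it}$, $u_{x,it}$ (the elements of $u_{it}$) are notation assumed here for illustration:

$$\Delta y_{it} = -\phi_{yi}\,(y_{i,t-1} - \theta x_{i,t-1}) + \upsilon_{yi}' q_{it} + u_{y,it}$$

$$\Delta x_{it} = -\phi_{xi}\,(y_{i,t-1} - \theta x_{i,t-1}) + \upsilon_{xi}' q_{it} + u_{x,it}$$

Both equations respond to the same equilibrium error $y_{i,t-1} - \theta x_{i,t-1}$, which is why a unit with $\phi_{yi} \approx 0$ and $\phi_{xi} \approx 0$ adds little information about $\theta$. The invariance property reflects that $y - \theta x = 0$ and $x - \theta^{-1} y = 0$ describe the same relation, so the coefficient normalized on $x$ is the reciprocal of the one normalized on $y$.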
Syntax
from multicoint import SPMG
estimator = SPMG(
    p=2,                  # Lag order (ARDL order, default 2)
    max_iter=500,         # Maximum iterations for MLE convergence
    precision=1e-4,       # Convergence criterion |θ^(k) - θ^(k-1)|
    bootstrap_reps=2000,  # Bootstrap replications for robust CI
    seed=1234,            # Random seed for reproducibility
)
result = estimator.fit(
    Y,                           # np.ndarray, shape (T, n) — dependent variable
    X,                           # np.ndarray, shape (T, n) — independent variable
    var_names=["y", "x"],        # Optional variable names
    unit_names=["US", "UK", …],  # Optional unit names
    bootstrap=True,              # Compute bootstrap CI (default True)
)
Result Attributes
result.theta_hat # float — Estimated long-run coefficient θ̂
result.std_error # float — Asymptotic standard error
result.t_ratio # float — t-ratio for H₀: θ = 1
result.p_value # float — Two-sided p-value
result.ci_95 # ConfidenceInterval — 95% asymptotic CI
result.ci_99 # ConfidenceInterval — 99% asymptotic CI
result.ci_bootstrap # ConfidenceInterval — 95% bootstrap CI (if computed)
result.n_units # int — Number of cross-section units
result.n_periods # int — Number of time periods
result.n_iterations # int — Iterations to convergence
result.converged # bool — Whether MLE converged
result.phi_hat # np.ndarray (n, 2) — Unit-specific φ̂_i = (φ̂_{yi}, φ̂_{xi})
result.sigma_hat # np.ndarray (n, 2, 2) — Unit-specific Σ̂_i
result.is_unit_coefficient # bool — Whether θ=1 is within 95% CI
result.summary() # str — Rich-formatted summary table
result.to_dict() # dict — Export as dictionary
result.to_series() # pd.Series — Export as pandas Series
Example
spmg = SPMG(p=2, bootstrap_reps=2000, seed=1234)
res = spmg.fit(Y, X, var_names=["log(C/capita)", "log(GDP/capita)"])
res.summary() # Prints rich table
# Access individual results
print(f"θ̂ = {res.theta_hat:.4f} ± {res.std_error:.4f}")
print(f"Bootstrap 95% CI: {res.ci_bootstrap}")
print(f"Unit coefficient (θ=1)? {res.is_unit_coefficient}")
2. PMG — Pooled Mean Group
Single-equation version. Assumes long-run causality runs from x to y.
Paper: Pesaran, Shin & Smith (1999), JASA, 94, 621–634.
Model (conditional ARDL):
$$\Delta y_{it} = c_i - \phi_i(y_{i,t-1} - \theta x_{i,t-1}) + \text{short-run} + \varepsilon_{it}$$
where $\phi_i$ is scalar (single equation), and $\theta$ is pooled across units.
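To make the pooling idea concrete, here is a minimal NumPy sketch, not the package's implementation. It assumes a stripped-down version of the model with no short-run terms, simulates its own data, and replaces the iterative MLE with a plain grid search over $\theta$. For each candidate $\theta$ the unit-specific $c_i$, $\phi_i$ and $\sigma_i^2$ are profiled out by OLS. All names in the block (`concentrated_loglik`, `grid`, and so on) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, theta_true = 10, 200, 1.0

# Simulate each unit: x is a random walk, y error-corrects towards theta * x.
Y = np.zeros((T, n))
X = np.cumsum(rng.normal(size=(T, n)), axis=0)
phi = rng.uniform(0.2, 0.5, size=n)           # unit-specific adjustment speeds
for t in range(1, T):
    ec = Y[t - 1] - theta_true * X[t - 1]     # lagged equilibrium error
    Y[t] = Y[t - 1] - phi * ec + rng.normal(size=n)

def concentrated_loglik(theta):
    """Gaussian log-likelihood summed over units, with c_i, phi_i and
    sigma_i^2 profiled out by unit-by-unit OLS (constants dropped)."""
    dY = np.diff(Y, axis=0)                   # (T-1, n)
    ec = (Y - theta * X)[:-1]                 # lagged error-correction term
    total = 0.0
    for i in range(n):
        Z = np.column_stack([np.ones(T - 1), ec[:, i]])
        coef, *_ = np.linalg.lstsq(Z, dY[:, i], rcond=None)
        resid = dY[:, i] - Z @ coef
        total += -0.5 * (T - 1) * np.log(resid @ resid / (T - 1))
    return total

# One theta is shared by all units; everything else is unit-specific.
grid = np.linspace(0.5, 1.5, 201)
theta_hat = grid[np.argmax([concentrated_loglik(th) for th in grid])]
print(f"pooled theta_hat = {theta_hat:.3f}")
```

The grid search is used only to keep the sketch short; an iterative scheme such as the one controlled by `max_iter` and `precision` would typically be faster and more precise.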
Syntax
from multicoint import PMG
estimator = PMG(
    p=2,                  # Lag order
    max_iter=500,         # Max iterations
    precision=1e-4,       # Convergence tolerance
    bootstrap_reps=2000,  # Bootstrap replications
    seed=1234,
)
result = estimator.fit(Y, X, bootstrap=True)
Key Differences from SPMG
| Feature | PMG | SPMG |
|---|---|---|
| φ_i dimension | Scalar (1×1) | Vector (2×1) |
| Σ_i dimension | Scalar (σ²ᵢ) | Matrix (2×2) |
| Long-run causality | x→y only | x↔y (both directions) |
| Non-cointegrating | Robust | More robust |
| Normalization | Not invariant | Invariant |
3. Breitung — Two-Step Parametric
Parametric estimator that does not assume known causal direction.
Paper: Breitung (2005), Econometric Reviews, 24, 151–173.
Syntax
from multicoint import Breitung
estimator = Breitung(p=2) # Only lag order needed
result = estimator.fit(Y, X)
Note: The Breitung estimator requires inverting α̂_i' Σ̂_i⁻¹ α̂_i per unit, which can be unstable when α̂_i → 0. SPMG is preferred for robustness.
4. PDOLS — Panel Dynamic OLS
Adds leads and lags of Δx to absorb endogeneity.
Paper: Mark & Sul (2003), Oxford Bulletin of Economics and Statistics.
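The leads-and-lags idea can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the package's code: the data are simulated, each unit gets its own intercept and its own lead/lag coefficients, and the pooled slope is obtained by partialling those out unit by unit. The helper `residualize` and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, p, theta_true = 8, 150, 2, 1.0

# x is a random walk whose innovations also enter y's error (endogeneity).
dX = rng.normal(size=(T, n))
X = np.cumsum(dX, axis=0)
U = 0.5 * dX + rng.normal(scale=0.5, size=(T, n))
Y = theta_true * X + U

def residualize(v, Z):
    """Return the part of v that is orthogonal to the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

num = den = 0.0
for i in range(n):
    dx = np.diff(X[:, i])                     # dx[k] = x[k+1] - x[k]
    t_idx = np.arange(p + 1, T - p)           # periods with all leads and lags available
    # Columns: constant plus dx at t-p, ..., t, ..., t+p (dx for period t is dx[t-1]).
    Z = np.column_stack([np.ones(t_idx.size)] +
                        [dx[t_idx - 1 + j] for j in range(-p, p + 1)])
    y_res = residualize(Y[t_idx, i], Z)
    x_res = residualize(X[t_idx, i], Z)
    num += x_res @ y_res
    den += x_res @ x_res

theta_dols = num / den                        # pooled long-run slope
print(f"theta_dols = {theta_dols:.3f}")
```

Here `p` plays the role of `leads_lags`: a larger value absorbs more of the correlation between the error and Δx, at the cost of `2 * p` observations per unit.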
Syntax
from multicoint import PDOLS
# With 4 leads and lags (standard)
result4 = PDOLS(leads_lags=4).fit(Y, X)
# With 8 leads and lags (more conservative)
result8 = PDOLS(leads_lags=8).fit(Y, X)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| leads_lags | int | 4 | Number of leads AND lags of Δx to include |
5. MGMW — Müller-Watson Mean Group
Temporally aggregates data into q sub-periods, then runs pooled FE.
Paper: Müller & Watson (2018), Econometrica, 86, 775–804.
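As a rough illustration of the one-line description above (sub-period averaging followed by a pooled fixed-effects regression), the sketch below works on simulated data with NumPy only. It is an assumption about what those two steps look like in their simplest form and is not taken from the package or from the MATLAB code it mentions. `subperiod_means` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, q, theta_true = 10, 100, 5, 1.0

X = np.cumsum(rng.normal(size=(T, n)), axis=0)
Y = theta_true * X + rng.normal(size=(T, n))

def subperiod_means(A, q):
    """Average a (T, n) array over q equal, non-overlapping blocks -> (q, n)."""
    T_used = (A.shape[0] // q) * q            # drop trailing periods that do not fill a block
    return A[:T_used].reshape(q, -1, A.shape[1]).mean(axis=1)

Yq, Xq = subperiod_means(Y, q), subperiod_means(X, q)

# Fixed effects: remove each unit's mean across its q sub-period averages.
Yd = Yq - Yq.mean(axis=0)
Xd = Xq - Xq.mean(axis=0)

theta_fe = (Xd * Yd).sum() / (Xd ** 2).sum()  # pooled within estimator
print(f"theta_fe = {theta_fe:.3f}")
```

Averaging within sub-periods filters out short-run noise, so the regression on the `q` averages per unit is driven mainly by low-frequency co-movement.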
Syntax
from multicoint import MGMW
result = MGMW(q=5).fit(Y, X) # q=5 sub-periods (matching MATLAB code)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
q |
int |
5 |
Number of temporal sub-periods |
6. MGDL — Mean Group Distributed Lag
Estimates impulse response functions of common observed shocks in panels with one or two cross-section dimensions.
Paper: Choi & Chudik (2024), "Mean Group Distributed Lag Estimation of IRFs in Large Panels", Fed Dallas GI WP 423.
Model:
$$x_{ijt} = a_{ij} + \sum_{\ell=0}^{h} b_{ij\ell} v_{t-\ell} + \phi'_{hij} g_{hijt} + e_{hijt}$$
where $v_t$ is the common observed shock, and $b_{ij\ell}$ are the IRF coefficients.
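A minimal sketch of the mean-group step, assuming simulated data and omitting the control term $\phi'_{hij} g_{hijt}$: each (i, j) pair gets its own distributed-lag regression on the shock, and the coefficients are then averaged over locations j. The standard error shown is the usual cross-sectional mean-group formula, not the augmented variance that the package documents. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, T, h = 3, 20, 120, 4

shock = rng.normal(size=T)                                  # common observed shock v_t
true_irf = np.array([[1.0, 0.6, 0.3, 0.1, 0.0],
                     [0.5, 0.5, 0.2, 0.0, 0.0],
                     [0.0, 0.2, 0.4, 0.2, 0.1]])            # (M, h+1)

# Regressors: constant, v_t, v_{t-1}, ..., v_{t-h} for t = h, ..., T-1.
V = np.column_stack([np.ones(T - h)] +
                    [shock[h - l: T - l] for l in range(h + 1)])

b_hat = np.empty((M, N, h + 1))
for i in range(M):
    for j in range(N):
        # Unit-specific coefficients scatter around the product-level IRF.
        b_ij = true_irf[i] + 0.2 * rng.normal(size=h + 1)
        x_ij = rng.normal() + V[:, 1:] @ b_ij + rng.normal(size=T - h)
        coef, *_ = np.linalg.lstsq(V, x_ij, rcond=None)
        b_hat[i, j] = coef[1:]                              # drop the intercept

product_irfs = b_hat.mean(axis=1)                           # mean group over locations j
mg_se = b_hat.std(axis=1, ddof=1) / np.sqrt(N)              # simple mean-group standard error
cum_multipliers = product_irfs.sum(axis=1)                  # sum of the IRF over horizons 0..h
print(np.round(product_irfs, 2))
```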
Syntax
from multicoint import MGDL
estimator = MGDL(
    h=4,                 # IRF horizon (quarters)
    augmented_var=True,  # Use augmented variance (eq 9-10) for robust inference
    bonferroni=True,     # Bonferroni correction for family-wise coverage
    seasonal=True,       # Include seasonal dummies
)
result = estimator.fit(
    X_panel,             # np.ndarray, shape (M, N, T) — panel of outcomes
    shock,               # np.ndarray, shape (T,) — common shock series
    product_names=[…],   # Optional: M product names
    location_names=[…],  # Optional: N location names
)
Result Attributes (MGDLResult)
result.product_irfs # (M, h+1) — Mean group IRF per product
result.location_effects # (N, h+1) — Location effects ĉ_j
result.cum_multipliers # (M,) — Cumulative multiplier δ̂_{i,h}
result.cum_ci_lower # (M,) — Lower bound of family-wise 95% CI
result.cum_ci_upper # (M,) — Upper bound of family-wise 95% CI
result.significant # (M,) bool — Whether each product is significant
result.M, result.N, result.T, result.h # Dimensions
Example: Oil Price Pass-Through
import multicoint as mc
# X_panel: (43 products, 41 cities, 104 quarters)
# oil_shock: (104,) first-differenced log WTI crude oil prices
mgdl = mc.MGDL(h=4, augmented_var=True, bonferroni=True)
res = mgdl.fit(X_panel, oil_shock, product_names=product_list)
mgdl.summary() # Rich table of cumulative multipliers
# Plot significant IRFs
mc.plot_impulse_response(res)
7. PME — Pooled Minimum Eigenvalue
Estimates multiple long-run relations in panels where n >> T. No prior method exists for this setting.
Paper: Chudik, Pesaran & Smith (2025), "Analysis of Multiple Long-Run Relations in Panel Data Models", arXiv:2506.02135v3.
Key innovations:
- Works when n >> T (e.g., n=1000 firms, T=20 years)
- Estimates r₀ ≥ 1 long-run relations simultaneously
- No need to model short-run dynamics (semi-parametric)
- No need to specify causal ordering
- Robust to interactive time effects (latent factors)
Algorithm
- Split T observations into q ≥ 2 non-overlapping sub-samples
- Compute pooled covariance matrix Q_{w̄w̄} from sub-sample means
- Eigendecompose Q_{w̄w̄}: the r₀ smallest eigenvalues → long-run relations
- Threshold eigenvalues at T^{-δ} to estimate r₀
- Identify coefficients via exact restrictions
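The steps above can be sketched with NumPy on a simulated bivariate panel. This is a simplified illustration and not the package's implementation: the exact scaling of the pooled matrix, the way eigenvalues are compared with the threshold, and the identification step are assumptions made here, and the paper's definitions may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(4)
n, T, q, delta, theta_true = 500, 20, 2, 0.25, 1.0

# W has shape (n, T, m) with m = 2: y cointegrates with the random walk x.
x = np.cumsum(rng.normal(size=(n, T)), axis=1)
y = theta_true * x + 0.5 * rng.normal(size=(n, T))
W = np.stack([y, x], axis=-1)

# Step 1: means over q non-overlapping sub-samples -> (n, q, m).
T_used = (T // q) * q
sub_means = W[:, :T_used].reshape(n, q, T_used // q, -1).mean(axis=2)

# Step 2: pooled covariance of sub-sample means around each unit's overall mean.
dev = sub_means - sub_means.mean(axis=1, keepdims=True)
Q = np.einsum("isa,isb->ab", dev, dev) / (n * q)

# Step 3: eigendecomposition (eigh returns eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(Q)

# Step 4: count eigenvalues below the threshold T**(-delta).
r_hat = int((eigvals < T ** (-delta)).sum())

# Step 5: normalise the first eigenvector so that y has coefficient 1.
b = eigvecs[:, 0] / eigvecs[0, 0]
theta_hat = -b[1]
print(f"r_hat = {r_hat}, theta_hat = {theta_hat:.3f}")
```

The intuition is that sub-sample means of the stationary combination $y - \theta x$ barely move across sub-samples, while those of the trending directions move a lot, so long-run relations show up as near-zero eigenvalues.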
Syntax
from multicoint import PME
estimator = PME(
    q=2,         # Number of sub-samples (minimum 2, default 2)
    delta=0.25,  # Thresholding exponent for r₀ estimation
    r0=None,     # If known, fix r₀. Otherwise estimated automatically.
)
result = estimator.fit(
    W,                                        # np.ndarray, shape (n, T, m) — panel of m variables
    var_names=["exports", "imports", "GDP"],  # Optional
)
Result Attributes (PMEResult)
result.r_hat # int — Estimated number of long-run relations
result.eigenvalues # (m,) — Ordered eigenvalues of Q_{w̄w̄}
result.B_hat # (m, r_hat) — Eigenvectors (long-run directions)
result.Theta_hat # (m-r_hat, r_hat) — Identified long-run coefficients
result.std_errors # Standard errors of vec(Θ̂)
result.t_ratios # t-ratios for H₀: θ_{jk} = 0
result.p_values # Two-sided p-values
result.ci_95 # List of (lower, upper) 95% CIs
result.Q_ww # (m, m) — Pooled covariance matrix
result.n, result.T, result.m, result.q # Dimensions
Example: Multiple Cointegrating Relations
import multicoint as mc
# W: (200 countries, 50 years, 3 variables: exports, imports, GDP)
pme = mc.PME(q=2, delta=0.25)
res = pme.fit(W, var_names=["Exports", "Imports", "GDP"])
pme.summary() # Prints eigenvalue analysis + coefficient table
print(f"Number of long-run relations: r̂₀ = {res.r_hat}")
print(f"Eigenvalues: {res.eigenvalues}")
print(f"Long-run coefficients:\n{res.Theta_hat}")
# Scree plot of eigenvalues with threshold
mc.plot_eigenvalues(res)
🔬 Diagnostics
Cross-Section Dependence Test
from multicoint import cd_test
# residuals: (T, n) array of panel residuals
result = cd_test(residuals)
print(f"CD statistic: {result.cd_stat:.3f}")
print(f"p-value: {result.p_value:.4f}")
print(f"Average pair-wise correlation: {result.avg_rho:.4f}")
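For reference, the statistic from Pesaran (2004) for a balanced panel is $CD = \sqrt{2T / (n(n-1))} \sum_{i<j} \hat\rho_{ij}$, which is approximately standard normal under the null of no cross-section dependence. The standalone sketch below computes it with NumPy. It is not the package's `cd_test`, and the function name `pesaran_cd` is hypothetical.

```python
import math
import numpy as np

def pesaran_cd(residuals):
    """CD statistic, two-sided normal p-value and average pairwise correlation
    for a (T, n) array of residuals (balanced panel)."""
    T, n = residuals.shape
    corr = np.corrcoef(residuals, rowvar=False)      # (n, n) pairwise correlations
    upper = corr[np.triu_indices(n, k=1)]            # the n(n-1)/2 pairs with i < j
    cd = math.sqrt(2.0 * T / (n * (n - 1))) * upper.sum()
    p_value = math.erfc(abs(cd) / math.sqrt(2.0))    # equals 2 * (1 - Phi(|cd|))
    return cd, p_value, upper.mean()

rng = np.random.default_rng(5)
T, n = 100, 12
independent = rng.normal(size=(T, n))
common = rng.normal(size=(T, 1))                     # one factor shared by all units
dependent = independent + common

cd0, p0, rho0 = pesaran_cd(independent)
cd1, p1, rho1 = pesaran_cd(dependent)
print(f"independent: CD = {cd0:.2f}, avg rho = {rho0:.3f}")
print(f"dependent:   CD = {cd1:.2f}, avg rho = {rho1:.3f}")
```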
Panel Unit Root Tests
from multicoint import panel_adf, panel_kpss
# ADF test (H₀: unit root)
adf_results = panel_adf(data, const=True) # data: (T, n)
for r in adf_results:
    print(f"Unit {r['unit']}: ADF={r['adf_stat']:.3f}, reject={r['reject_5pct']}")
# KPSS test (H₀: stationarity)
kpss_results = panel_kpss(data)
for r in kpss_results:
    print(f"Unit {r['unit']}: KPSS={r['kpss_stat']:.3f}, reject={r['reject_5pct']}")
📊 Visualization
Forest Plot of Confidence Intervals
import multicoint as mc
results = [spmg_result, pmg_result, breitung_result, pdols_result, mgmw_result]
fig, ax = mc.plot_confidence_intervals(
    results,
    theta0=1.0,  # Reference line (unit coefficient)
    figsize=(10, 6),
    save="forest_plot.png"
)
Bar Chart of Long-Run Coefficients
# Compare across multiple ratios
ratio_results = {
    "C/GDP": [spmg_cg, pmg_cg, breit_cg],
    "I/GDP": [spmg_ig, pmg_ig, breit_ig],
    "Debt/GDP": [spmg_dg, pmg_dg, breit_dg],
}
fig, ax = mc.plot_long_run_coefficients(ratio_results, save="ratios.png")
PME Eigenvalue Scree Plot
fig, ax = mc.plot_eigenvalues(pme_result, save="eigenvalues.png")
MGDL Impulse Response Functions
fig, axes = mc.plot_impulse_response(
    mgdl_result,
    product_idx=[0, 5, 25],  # Specific products to plot
    save="irfs.png"
)
📋 Comparison Tables
Single Estimator Summary
result.summary() # Rich-formatted terminal output
Multi-Estimator Comparison
import multicoint as mc
text, df = mc.comparison_table(
    [spmg_result, pmg_result, breitung_result, pdols4_result, mgmw_result],
    title="Great Ratios: Consumption / GDP",
    print_it=True,
)
# Export to various formats
df.to_csv("comparison.csv", index=False)
df.to_excel("comparison.xlsx", index=False)
df.to_latex("comparison.tex", index=False)
🧪 Simulated Datasets
Built-in DGP simulators matching the Monte Carlo designs from all three papers:
1. Bivariate Panel with Single Long-Run Relation
from multicoint.datasets import simulate_great_ratios
Y, X = simulate_great_ratios(
    n=30,                  # Cross-section units
    T=100,                 # Time periods
    theta=1.0,             # True long-run coefficient
    phi_range=(0.1, 0.3),  # Speed of adjustment range
    pi_noncoint=0.2,       # Fraction of non-cointegrating units
    seed=42,
)
# Y, X: np.ndarray of shape (T, n)
2. Multivariate Panel with Multiple Long-Run Relations
from multicoint.datasets import simulate_multiple_lr
W, B0_true = simulate_multiple_lr(
    n=200,  # Cross-section units
    T=50,   # Time periods
    m=3,    # Number of variables
    r0=2,   # Number of long-run relations
    seed=42,
)
# W: (n, T, m), B0_true: (m, r0) true coefficient matrix
3. Panel for MGDL IRF Estimation
from multicoint.datasets import simulate_irf_panel
X_panel, shock, true_irf = simulate_irf_panel(
    M=43,   # Products
    N=41,   # Cities
    T=104,  # Quarters
    h=4,    # IRF horizon
    seed=42,
)
# X_panel: (M, N, T), shock: (T,), true_irf: (M, h+1)
📚 Full API Reference
Estimator Classes
| Class | Paper | Variables | Long-run relations | Causality | n vs T |
|---|---|---|---|---|---|
| SPMG | [1] | 2 | 1 | Both (x↔y) | T ≫ n |
| PMG | [1] | 2 | 1 | One-way (x→y) | T ≫ n |
| Breitung | [1] | 2 | 1 | Both | T ≫ n |
| PDOLS | [1] | 2 | 1 | One-way (x→y) | T ≫ n |
| MGMW | [1] | 2 | 1 | Both | T ≫ n |
| MGDL | [2] | M×N | IRFs | Shock→outcome | N,M,T large |
| PME | [3] | m ≥ 2 | r₀ ≥ 1 | Any | n ≫ T |
Common Parameters
| Parameter | Type | Default | Used by |
|---|---|---|---|
| p | int | 2 | SPMG, PMG, Breitung |
| max_iter | int | 500 | SPMG, PMG |
| precision | float | 1e-4 | SPMG, PMG |
| bootstrap_reps | int | 2000 | SPMG, PMG |
| seed | int | 1234 | SPMG, PMG |
| leads_lags | int | 4 | PDOLS |
| q | int | 5 (MGMW) / 2 (PME) | MGMW, PME |
| h | int | 4 | MGDL |
| delta | float | 0.25 | PME |
| r0 | int | None | PME (auto-estimated if None) |
Data Format
| Estimator | Input Shape | Description |
|---|---|---|
| SPMG/PMG/etc. | Y(T,n), X(T,n) | Bivariate panel: T periods, n units |
| MGDL | X(M,N,T), v(T,) | Two cross-sections + shock series |
| PME | W(n,T,m) | m-variate panel: n units, T periods |
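Because the bivariate estimators take time on the first axis while PME takes units on the first axis, moving between the two layouts requires a transpose. A small NumPy sketch with made-up numbers, assuming the variables are ordered (y, x) on the last axis:

```python
import numpy as np

T, n = 6, 3
Y = np.arange(T * n, dtype=float).reshape(T, n)      # (T, n): rows are periods, columns are units
X = Y + 100.0

# (T, n) pair -> (n, T, m) panel with m = 2 variables, ordered (y, x).
W = np.stack([Y.T, X.T], axis=-1)
print(W.shape)                                        # (3, 6, 2)

# And back again: variable k of unit i at time t is W[i, t, k].
Y_back, X_back = W[..., 0].T, W[..., 1].T
```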
📐 Mathematical Background
Convergence Rates
| Estimator | Rate | Conditions |
|---|---|---|
| SPMG | $T\sqrt{(1-\pi)n}$ | $T \gg n$ |
| PMG | $T\sqrt{n}$ | $T \gg n$, strict exogeneity |
| MGDL | $\sqrt{N}$ and $\sqrt{M}$ | $N/T \to \kappa_1 \geq 0$ |
| PME | $\sqrt{nT}$ | $T \approx n^d$, $d > 1/2$ |
Bootstrap Methods
| Method | Used by | Description |
|---|---|---|
| Wild (Rademacher) | SPMG, PMG | $\kappa_t \sim \pm 1$ with prob 1/2 each |
| Conditional on X | PMG | Holds X fixed, resamples y equation only |
| Unconditional | SPMG | Resamples both y and x equations jointly |
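The Rademacher wild bootstrap named in the table can be illustrated on a plain regression. The sketch below is deliberately much simpler than the panel procedures above (one series, no error-correction dynamics) and only shows the resampling mechanics: residuals keep their magnitude and have their signs flipped at random. In a panel, drawing one $\kappa_t$ per period and applying it to all units would additionally preserve cross-section dependence. That reading of the $\kappa_t$ notation is an assumption.

```python
import numpy as np

rng = np.random.default_rng(6)
T, reps = 200, 999

x = rng.normal(size=T)
y = 1.0 * x + rng.normal(size=T) * (0.5 + np.abs(x))   # heteroskedastic errors

Z = np.column_stack([np.ones(T), x])
beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
fitted = Z @ beta_hat
resid = y - fitted

boot_slopes = np.empty(reps)
for b in range(reps):
    kappa = rng.choice([-1.0, 1.0], size=T)            # Rademacher weights, P(+1) = P(-1) = 1/2
    y_star = fitted + kappa * resid                    # sign-flipped residuals keep their scale
    beta_star, *_ = np.linalg.lstsq(Z, y_star, rcond=None)
    boot_slopes[b] = beta_star[1]

# Percentile 95% interval for the slope.
ci_lower, ci_upper = np.percentile(boot_slopes, [2.5, 97.5])
print(f"slope = {beta_hat[1]:.3f}, 95% CI = [{ci_lower:.3f}, {ci_upper:.3f}]")
```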
📖 References
- [1] Chudik, A., Pesaran, M.H. & Smith, R.P. (2023). "Revisiting the Great Ratios Hypothesis." Federal Reserve Bank of Dallas, Globalization Institute Working Paper No. 415. [SPMG, PMG, Breitung, PDOLS, MGMW]
- [2] Choi, C.-Y. & Chudik, A. (2024). "Mean Group Distributed Lag Estimation of Impulse Response Functions in Large Panels." Federal Reserve Bank of Dallas, Globalization Institute Working Paper No. 423. [MGDL]
- [3] Chudik, A., Pesaran, M.H. & Smith, R.P. (2025). "Analysis of Multiple Long-Run Relations in Panel Data Models." arXiv:2506.02135v3. [PME]
Supporting References
- Pesaran, M.H., Shin, Y. & Smith, R.P. (1999). "Pooled Mean Group Estimation." JASA, 94, 621–634.
- Breitung, J. (2005). "A Parametric Approach to the Estimation of Cointegration Vectors in Panel Data." Econometric Reviews, 24, 151–173.
- Mark, N.C. & Sul, D. (2003). "Cointegration Vector Estimation by Panel DOLS." Oxford Bulletin of Economics and Statistics.
- Müller, U.K. & Watson, M.W. (2018). "Long-Run Covariability." Econometrica, 86, 775–804.
- Pesaran, M.H. (2004). "General Diagnostic Tests for Cross Section Dependence in Panels." CESifo Working Paper 1229.
👨‍🔬 Author
Dr. Merwan Roudane
- 📧 Email: merwanroudane920@gmail.com
- 🐙 GitHub: github.com/merwanroudane
📄 License
MIT License. See LICENSE for details.
Built with ❤️ for the econometrics community