Panel Cointegration & Long-Run Relations: SPMG, PMG, PME, MGDL, Breitung, PDOLS, MGMW estimators

multicoint — Panel Cointegration & Multiple Long-Run Relations

A comprehensive Python library for estimating long-run equilibrium relationships in heterogeneous panel data.

Implements 7 state-of-the-art estimators from three foundational papers in modern panel econometrics, with publication-quality tables and visualizations.


📦 Installation

# From PyPI
pip install multicoint

# From GitHub
pip install git+https://github.com/merwanroudane/multicointt.git

# Local development install
git clone https://github.com/merwanroudane/multicointt.git
cd multicointt
pip install -e .

Dependencies

| Package | Version | Purpose |
|---|---|---|
| numpy | ≥ 1.24 | Core numerical computation |
| scipy | ≥ 1.10 | Statistical distributions, linalg |
| pandas | ≥ 2.0 | Data handling and export |
| matplotlib | ≥ 3.7 | Publication-quality plots |
| seaborn | ≥ 0.12 | Statistical visualization |
| rich | ≥ 13.0 | Beautiful terminal tables |
| tabulate | ≥ 0.9 | Markdown/LaTeX table export |
| openpyxl | ≥ 3.1 | Excel export |

🚀 Quick Start

import numpy as np
import multicoint as mc

# Simulate panel data: consumption/GDP for 15 countries, 143 years
Y, X = mc.datasets.simulate_great_ratios(n=15, T=143, theta=1.0, seed=42)

# Estimate with all five Great Ratios estimators
spmg = mc.SPMG(p=2, bootstrap_reps=500).fit(Y, X, var_names=["Consumption", "GDP"])
pmg  = mc.PMG(p=2, bootstrap_reps=500).fit(Y, X)
breit = mc.Breitung(p=2).fit(Y, X)
pdols = mc.PDOLS(leads_lags=4).fit(Y, X)
mgmw  = mc.MGMW(q=5).fit(Y, X)

# Print comparison table
mc.comparison_table([spmg, pmg, breit, pdols, mgmw],
                    title="Great Ratios: Consumption/GDP")

# Forest plot of confidence intervals
mc.plot_confidence_intervals([spmg, pmg, breit, pdols, mgmw], theta0=1.0)

📐 Estimators

1. SPMG — System Pooled Mean Group

The flagship estimator. Handles two-way long-run causality, non-cointegrating units, and cross-sectional dependence.

Paper: Chudik, Pesaran & Smith (2023), "Revisiting the Great Ratios Hypothesis", Fed Dallas GI WP 415.

Model:

$$\Delta w_{it} = -\phi_i \beta' w_{i,t-1} + \Upsilon_i q_{it} + u_{it}$$

where $w_{it} = (y_{it}, x_{it})'$, $\beta = (1, -\theta)'$, $\phi_i = (\phi_{yi}, \phi_{xi})'$ is a 2×1 vector (both equations), and $\Sigma_i = \text{Var}(u_{it})$ is 2×2.

Key advantages over PMG:

  • φ_i is bivariate → captures two-way long-run causality
  • Units with φ_i ≈ 0 contribute negligibly → robust to non-cointegration
  • Normalization invariant: θ̂_{y.x} · θ̂_{x.y} = 1
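Normalization invariance is not automatic. As a point of contrast, the two single-equation OLS slopes (y on x, and x on y) multiply to the squared sample correlation, which is below 1 whenever the fit is imperfect. The sketch below uses plain NumPy on simulated data to illustrate that identity; it does not call the multicoint API, and the data-generating choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=200))   # I(1) regressor (random walk)
y = x + rng.normal(size=200)          # cointegrated with theta = 1

xc, yc = x - x.mean(), y - y.mean()
slope_y_on_x = (xc @ yc) / (xc @ xc)  # OLS slope of y on x
slope_x_on_y = (xc @ yc) / (yc @ yc)  # OLS slope of x on y

product = slope_y_on_x * slope_x_on_y
r_squared = np.corrcoef(x, y)[0, 1] ** 2
print(f"product of slopes = {product:.4f}, R^2 = {r_squared:.4f}")
```

An estimator for which the product equals exactly 1 gives the same long-run relation whichever variable is placed on the left-hand side.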

Syntax

from multicoint import SPMG

estimator = SPMG(
    p=2,                  # Lag order (ARDL order, default 2)
    max_iter=500,         # Maximum iterations for MLE convergence
    precision=1e-4,       # Convergence criterion |θ^(k) - θ^(k-1)|
    bootstrap_reps=2000,  # Bootstrap replications for robust CI
    seed=1234,            # Random seed for reproducibility
)

result = estimator.fit(
    Y,                    # np.ndarray, shape (T, n) — dependent variable
    X,                    # np.ndarray, shape (T, n) — independent variable
    var_names=["y", "x"],        # Optional variable names
    unit_names=["US", "UK", ],  # Optional unit names
    bootstrap=True,              # Compute bootstrap CI (default True)
)

Result Attributes

result.theta_hat        # float — Estimated long-run coefficient θ̂
result.std_error        # float — Asymptotic standard error
result.t_ratio          # float — t-ratio for H₀: θ = 1
result.p_value          # float — Two-sided p-value
result.ci_95            # ConfidenceInterval — 95% asymptotic CI
result.ci_99            # ConfidenceInterval — 99% asymptotic CI
result.ci_bootstrap     # ConfidenceInterval — 95% bootstrap CI (if computed)
result.n_units          # int — Number of cross-section units
result.n_periods        # int — Number of time periods
result.n_iterations     # int — Iterations to convergence
result.converged        # bool — Whether MLE converged
result.phi_hat          # np.ndarray (n, 2) — Unit-specific φ̂_i = (φ̂_{yi}, φ̂_{xi})
result.sigma_hat        # np.ndarray (n, 2, 2) — Unit-specific Σ̂_i
result.is_unit_coefficient  # bool — Whether θ=1 is within 95% CI
result.summary()        # str — Rich-formatted summary table
result.to_dict()        # dict — Export as dictionary
result.to_series()      # pd.Series — Export as pandas Series
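The inference attributes above are related by standard normal-theory formulas. The following sketch shows how a t-ratio for H₀: θ = 1, a two-sided p-value, and an asymptotic confidence interval could be derived from a point estimate and its standard error. The function name `wald_summary` and the input values are hypothetical, and the package may compute these quantities differently.

```python
from statistics import NormalDist

def wald_summary(theta_hat, std_error, theta0=1.0, level=0.95):
    """Normal-approximation t-ratio, p-value and CI (illustrative helper)."""
    z = NormalDist()
    t_ratio = (theta_hat - theta0) / std_error
    p_value = 2.0 * (1.0 - z.cdf(abs(t_ratio)))
    crit = z.inv_cdf(0.5 + level / 2.0)          # 1.96 for level = 0.95
    ci = (theta_hat - crit * std_error, theta_hat + crit * std_error)
    contains_theta0 = ci[0] <= theta0 <= ci[1]
    return t_ratio, p_value, ci, contains_theta0

# Made-up numbers, not output from any estimator
t_ratio, p_value, ci, is_unit = wald_summary(theta_hat=0.97, std_error=0.02)
print(t_ratio, p_value, ci, is_unit)
```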

Example

spmg = SPMG(p=2, bootstrap_reps=2000, seed=1234)
res = spmg.fit(Y, X, var_names=["log(C/capita)", "log(GDP/capita)"])
res.summary()  # Prints rich table

# Access individual results
print(f"θ̂ = {res.theta_hat:.4f} ± {res.std_error:.4f}")
print(f"Bootstrap 95% CI: {res.ci_bootstrap}")
print(f"Unit coefficient (θ=1)? {res.is_unit_coefficient}")

2. PMG — Pooled Mean Group

Single-equation version. Assumes long-run causality runs from x to y.

Paper: Pesaran, Shin & Smith (1999), JASA, 94, 621–634.

Model (conditional ARDL):

$$\Delta y_{it} = c_i - \phi_i(y_{i,t-1} - \theta x_{i,t-1}) + \text{short-run} + \varepsilon_{it}$$

where $\phi_i$ is scalar (single equation), and $\theta$ is pooled across units.
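To make the error-correction form concrete, the sketch below simulates one unit from a stripped-down version of this model (no short-run terms, θ treated as known) and recovers the speed of adjustment by OLS. It is a NumPy illustration under those assumptions, not the PMG likelihood procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
T, theta, phi = 2000, 1.0, 0.3

x = np.cumsum(rng.normal(size=T))            # I(1) regressor
y = np.zeros(T)
for t in range(1, T):
    ect = y[t - 1] - theta * x[t - 1]        # lagged deviation from equilibrium
    y[t] = y[t - 1] - phi * ect + rng.normal()

# Regress dy_t on a constant and minus the lagged error-correction term,
# so the second coefficient estimates phi directly.
dy = np.diff(y)
ect_lag = y[:-1] - theta * x[:-1]
Z = np.column_stack([np.ones(T - 1), -ect_lag])
c_hat, phi_hat = np.linalg.lstsq(Z, dy, rcond=None)[0]
print(f"phi_hat = {phi_hat:.3f} (true value {phi})")
```

In PMG, θ is not known: it is pooled across units and estimated jointly with the unit-specific φ_i.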

Syntax

from multicoint import PMG

estimator = PMG(
    p=2,                  # Lag order
    max_iter=500,         # Max iterations
    precision=1e-4,       # Convergence tolerance
    bootstrap_reps=2000,  # Bootstrap replications
    seed=1234,
)

result = estimator.fit(Y, X, bootstrap=True)

Key Differences from SPMG

| Feature | PMG | SPMG |
|---|---|---|
| φ_i dimension | Scalar (1×1) | Vector (2×1) |
| Σ_i dimension | Scalar (σ²ᵢ) | Matrix (2×2) |
| Long-run causality | x→y only | x↔y (both directions) |
| Non-cointegrating | Robust | More robust |
| Normalization | Not invariant | Invariant |

3. Breitung — Two-Step Parametric

Parametric estimator that does not assume known causal direction.

Paper: Breitung (2005), Econometric Reviews, 24, 151–173.

Syntax

from multicoint import Breitung

estimator = Breitung(p=2)  # Only lag order needed
result = estimator.fit(Y, X)

Note: The Breitung estimator requires inverting α̂_i' Σ̂_i⁻¹ α̂_i per unit, which can be unstable when α̂_i → 0. SPMG is preferred for robustness.


4. PDOLS — Panel Dynamic OLS

Adds leads and lags of Δx to absorb endogeneity.

Paper: Mark & Sul (2003), Oxford Bulletin of Economics and Statistics.

Syntax

from multicoint import PDOLS

# With 4 leads and lags (standard)
result4 = PDOLS(leads_lags=4).fit(Y, X)

# With 8 leads and lags (more conservative)
result8 = PDOLS(leads_lags=8).fit(Y, X)

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| leads_lags | int | 4 | Number of leads AND lags of Δx to include |
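The idea of adding leads and lags of Δx can be sketched for a single unit as follows. This is a plain NumPy illustration of the dynamic OLS regression, with a hypothetical helper name and a simulated endogenous error; how the package pools across units and computes standard errors is not shown.

```python
import numpy as np

def dols_single_unit(y, x, leads_lags=4):
    """OLS of y_t on [1, x_t, dx_{t-k}, ..., dx_{t+k}]; returns the coefficient on x_t."""
    k = leads_lags
    dx = np.diff(x)                        # dx[j] = x[j+1] - x[j], so dx_t = dx[t-1]
    t_idx = np.arange(k + 1, len(y) - k)   # periods with all leads and lags available
    cols = [np.ones(len(t_idx)), x[t_idx]]
    cols += [dx[t_idx - 1 + s] for s in range(-k, k + 1)]
    Z = np.column_stack(cols)
    coef = np.linalg.lstsq(Z, y[t_idx], rcond=None)[0]
    return coef[1]

rng = np.random.default_rng(2)
T = 400
e = rng.normal(size=T)
x = np.cumsum(e)                           # I(1) regressor
u = 0.5 * e + rng.normal(size=T)           # error correlated with dx (endogeneity)
y = 2.0 + 1.0 * x + u

theta_hat = dols_single_unit(y, x, leads_lags=4)
print(f"theta_hat = {theta_hat:.4f}")
```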

5. MGMW — Müller-Watson Mean Group

Temporally aggregates data into q sub-periods, then runs pooled FE.

Paper: Müller & Watson (2018), Econometrica, 86, 775–804.

Syntax

from multicoint import MGMW

result = MGMW(q=5).fit(Y, X)  # q=5 sub-periods (matching MATLAB code)

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| q | int | 5 | Number of temporal sub-periods |
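A minimal sketch of the two steps named above (averaging within q sub-periods, then a pooled fixed-effects slope) is given below. It follows the one-line description in this section rather than the package's code or the original paper's procedure, and both helper names are hypothetical.

```python
import numpy as np

def subperiod_means(data, q):
    """Average a (T, n) array within q consecutive time blocks -> (q, n)."""
    blocks = np.array_split(data, q, axis=0)
    return np.vstack([b.mean(axis=0) for b in blocks])

def pooled_fe_slope(Y, X, q=5):
    """Pooled slope after removing unit means from the sub-period averages."""
    Yq, Xq = subperiod_means(Y, q), subperiod_means(X, q)
    Yd = Yq - Yq.mean(axis=0)              # within (fixed-effects) transform
    Xd = Xq - Xq.mean(axis=0)
    return float((Xd * Yd).sum() / (Xd * Xd).sum())

demo = np.arange(20, dtype=float).reshape(10, 2)   # T = 10, n = 2
means = subperiod_means(demo, q=5)
slope = pooled_fe_slope(2.0 * demo + np.array([5.0, -3.0]), demo, q=5)
print(means)
print(slope)
```

In the toy data, Y equals 2·X plus a unit-specific constant, so the fixed-effects slope is 2.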

6. MGDL — Mean Group Distributed Lag

Estimates impulse response functions of common observed shocks in panels with one or two cross-section dimensions.

Paper: Choi & Chudik (2024), "Mean Group Distributed Lag Estimation of IRFs in Large Panels", Fed Dallas GI WP 423.

Model:

$$x_{ijt} = a_{ij} + \sum_{\ell=0}^{h} b_{ij\ell} v_{t-\ell} + \phi'_{hij} g_{hijt} + e_{hijt}$$

where $v_t$ is the common observed shock, and $b_{ij\ell}$ are the IRF coefficients.
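For a single (i, j) series, the distributed-lag part of this model is an OLS regression on the current and lagged shock. The sketch below omits the control term $\phi'_{hij} g_{hijt}$ and uses noise-free simulated data so the coefficients are recovered exactly; the helper name and the coefficient values are illustrative assumptions. Averaging such unit-level estimates across locations would give the mean group IRF.

```python
import numpy as np

def distributed_lag_irf(x, v, h):
    """Regress x_t on a constant and v_t, ..., v_{t-h}; return IRF and its sum."""
    T = len(v)
    # Column l holds v_{t-l} for t = h, ..., T-1
    lags = np.column_stack([v[h - l: T - l] for l in range(h + 1)])
    Z = np.column_stack([np.ones(T - h), lags])
    coef = np.linalg.lstsq(Z, x[h:], rcond=None)[0]
    b = coef[1:]
    return b, b.sum()

rng = np.random.default_rng(3)
T, h = 104, 4
v = rng.normal(size=T)                               # common observed shock
true_b = np.array([0.5, 0.3, 0.1, 0.05, 0.0])        # made-up IRF coefficients
x = 1.0 + np.convolve(v, true_b)[:T]                 # exact for t >= h

b_hat, cum_multiplier = distributed_lag_irf(x, v, h)
print(b_hat, cum_multiplier)
```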

Syntax

from multicoint import MGDL

estimator = MGDL(
    h=4,                  # IRF horizon (quarters)
    augmented_var=True,    # Use augmented variance (eq 9-10) for robust inference
    bonferroni=True,       # Bonferroni correction for family-wise coverage
    seasonal=True,         # Include seasonal dummies
)

result = estimator.fit(
    X_panel,              # np.ndarray, shape (M, N, T) — panel of outcomes
    shock,                # np.ndarray, shape (T,) — common shock series
    product_names=[],    # Optional: M product names
    location_names=[],   # Optional: N location names
)

Result Attributes (MGDLResult)

result.product_irfs      # (M, h+1) — Mean group IRF per product
result.location_effects  # (N, h+1) — Location effects ĉ_j
result.cum_multipliers   # (M,) — Cumulative multiplier δ̂_{i,h}
result.cum_ci_lower      # (M,) — Lower bound of family-wise 95% CI
result.cum_ci_upper      # (M,) — Upper bound of family-wise 95% CI
result.significant       # (M,) bool — Whether each product is significant
result.M, result.N, result.T, result.h  # Dimensions
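Family-wise intervals widen the usual critical value so that all M intervals cover jointly with the stated probability. A minimal sketch of a Bonferroni adjustment with normal critical values is shown below; the helper names are hypothetical, and whether the package uses exactly this rule is an assumption.

```python
from statistics import NormalDist

def bonferroni_critical_value(n_tests, alpha=0.05):
    """Two-sided normal critical value with alpha split across n_tests."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_tests))

def familywise_ci(estimate, std_error, n_tests, alpha=0.05):
    crit = bonferroni_critical_value(n_tests, alpha)
    return estimate - crit * std_error, estimate + crit * std_error

crit_single = bonferroni_critical_value(1)    # ordinary 95% critical value
crit_family = bonferroni_critical_value(43)   # e.g. M = 43 products
print(crit_single, crit_family)
print(familywise_ci(0.10, 0.02, n_tests=43))  # made-up estimate and s.e.
```

With 43 products the critical value rises from about 1.96 to roughly 3.25, so a multiplier has to be estimated much more precisely to be flagged as significant.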

Example: Oil Price Pass-Through

import multicoint as mc

# X_panel: (43 products, 41 cities, 104 quarters)
# oil_shock: (104,) first-differenced log WTI crude oil prices

mgdl = mc.MGDL(h=4, augmented_var=True, bonferroni=True)
res = mgdl.fit(X_panel, oil_shock, product_names=product_list)
mgdl.summary()  # Rich table of cumulative multipliers

# Plot significant IRFs
mc.plot_impulse_response(res)

7. PME — Pooled Minimum Eigenvalue

Estimates multiple long-run relations in panels where n >> T. No prior method exists for this setting.

Paper: Chudik, Pesaran & Smith (2025), "Analysis of Multiple Long-Run Relations in Panel Data Models", arXiv:2506.02135v3.

Key innovations:

  • Works when n >> T (e.g., n=1000 firms, T=20 years)
  • Estimates r₀ ≥ 1 long-run relations simultaneously
  • No need to model short-run dynamics (semi-parametric)
  • No need to specify causal ordering
  • Robust to interactive time effects (latent factors)

Algorithm

  1. Split T observations into q ≥ 2 non-overlapping sub-samples
  2. Compute pooled covariance matrix Q_{w̄w̄} from sub-sample means
  3. Eigendecompose Q_{w̄w̄}: the r₀ smallest eigenvalues → long-run relations
  4. Threshold eigenvalues at T^{-δ} to estimate r₀
  5. Identify coefficients via exact restrictions
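The steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the package's implementation: the scaling of the pooled matrix, the use of its correlation-scaled version when thresholding eigenvalues (so that T^{-δ} is scale-free), and the normalization on the first variable are choices made for this example, and `pme_sketch` is a hypothetical name.

```python
import numpy as np

def pme_sketch(W, q=2, delta=0.25):
    n, T, m = W.shape
    # Step 1: split the T observations into q non-overlapping sub-samples
    blocks = np.array_split(np.arange(T), q)
    sub_means = np.stack([W[:, idx, :].mean(axis=1) for idx in blocks], axis=1)  # (n, q, m)
    # Step 2: pooled covariance of sub-sample means, demeaned within each unit
    dev = sub_means - sub_means.mean(axis=1, keepdims=True)
    Q = np.einsum("ilj,ilk->jk", dev, dev) / (n * q)
    # Steps 3-4: eigenvalues (ascending) of the correlation-scaled matrix,
    # counted as long-run relations when below T^(-delta)
    scale = np.sqrt(np.diag(Q))
    eigvals = np.linalg.eigvalsh(Q / np.outer(scale, scale))
    r_hat = int(np.sum(eigvals < T ** (-delta)))
    # Eigenvectors of Q for the r_hat smallest eigenvalues span the long-run relations
    _, vecs = np.linalg.eigh(Q)
    return r_hat, eigvals, vecs[:, :r_hat]

# Simulated panel: y = x1 + x2 + stationary noise, variables ordered (y, x1, x2)
rng = np.random.default_rng(0)
n, T = 200, 50
x = np.cumsum(rng.normal(size=(n, T, 2)), axis=1)
y = x.sum(axis=2) + rng.normal(size=(n, T))
W = np.concatenate([y[..., None], x], axis=2)       # (n, T, 3)

r_hat, eigvals, B_hat = pme_sketch(W, q=2, delta=0.25)
beta = B_hat[:, 0] / B_hat[0, 0]                    # Step 5: normalize on y
print(r_hat, eigvals, beta)
```

In this design the single long-run relation is y − x1 − x2, so the normalized vector should be close to (1, −1, −1).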

Syntax

from multicoint import PME

estimator = PME(
    q=2,          # Number of sub-samples (minimum 2, default 2)
    delta=0.25,   # Thresholding exponent for r₀ estimation
    r0=None,      # If known, fix r₀. Otherwise estimated automatically.
)

result = estimator.fit(
    W,            # np.ndarray, shape (n, T, m) — panel of m variables
    var_names=["exports", "imports", "GDP"],  # Optional
)

Result Attributes (PMEResult)

result.r_hat         # int — Estimated number of long-run relations
result.eigenvalues   # (m,) — Ordered eigenvalues of Q_{w̄w̄}
result.B_hat         # (m, r_hat) — Eigenvectors (long-run directions)
result.Theta_hat     # (m-r_hat, r_hat) — Identified long-run coefficients
result.std_errors    # Standard errors of vec(Θ̂)
result.t_ratios      # t-ratios for H₀: θ_{jk} = 0
result.p_values      # Two-sided p-values
result.ci_95         # List of (lower, upper) 95% CIs
result.Q_ww          # (m, m) — Pooled covariance matrix
result.n, result.T, result.m, result.q  # Dimensions

Example: Multiple Cointegrating Relations

import multicoint as mc

# W: (200 countries, 50 years, 3 variables: exports, imports, GDP)
pme = mc.PME(q=2, delta=0.25)
res = pme.fit(W, var_names=["Exports", "Imports", "GDP"])
pme.summary()  # Prints eigenvalue analysis + coefficient table

print(f"Number of long-run relations: r̂₀ = {res.r_hat}")
print(f"Eigenvalues: {res.eigenvalues}")
print(f"Long-run coefficients:\n{res.Theta_hat}")

# Scree plot of eigenvalues with threshold
mc.plot_eigenvalues(res)

🔬 Diagnostics

Cross-Section Dependence Test

from multicoint import cd_test

# residuals: (T, n) array of panel residuals
result = cd_test(residuals)
print(f"CD statistic: {result.cd_stat:.3f}")
print(f"p-value: {result.p_value:.4f}")
print(f"Average pair-wise correlation: {result.avg_rho:.4f}")
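For reference, the CD statistic of Pesaran (2004) for a balanced panel is $\sqrt{2T/(n(n-1))}\sum_{i<j}\hat\rho_{ij}$, compared against a standard normal distribution. The sketch below computes it directly with NumPy; `pesaran_cd` is a hypothetical helper, not the package's `cd_test`, and the perfectly correlated residuals are an extreme toy case chosen so the result can be checked by hand.

```python
import math
import numpy as np

def pesaran_cd(residuals):
    """CD statistic, two-sided normal p-value, and average pairwise correlation."""
    T, n = residuals.shape
    corr = np.corrcoef(residuals, rowvar=False)     # (n, n) pairwise correlations
    upper = corr[np.triu_indices(n, k=1)]           # pairs with i < j
    cd_stat = math.sqrt(2.0 * T / (n * (n - 1))) * upper.sum()
    p_value = math.erfc(abs(cd_stat) / math.sqrt(2.0))
    return cd_stat, p_value, upper.mean()

rng = np.random.default_rng(0)
common = rng.normal(size=(50, 1))
residuals = np.tile(common, (1, 4))                 # T = 50, n = 4, identical columns

cd_stat, p_value, avg_rho = pesaran_cd(residuals)
print(cd_stat, p_value, avg_rho)
```

With all pairwise correlations equal to 1, the statistic reduces to $\sqrt{T\,n(n-1)/2} = \sqrt{300}$.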

Panel Unit Root Tests

from multicoint import panel_adf, panel_kpss

# ADF test (H₀: unit root)
adf_results = panel_adf(data, const=True)  # data: (T, n)
for r in adf_results:
    print(f"Unit {r['unit']}: ADF={r['adf_stat']:.3f}, reject={r['reject_5pct']}")

# KPSS test (H₀: stationarity)
kpss_results = panel_kpss(data)
for r in kpss_results:
    print(f"Unit {r['unit']}: KPSS={r['kpss_stat']:.3f}, reject={r['reject_5pct']}")
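As a rough guide to what a unit-by-unit test computes, the sketch below implements the simplest Dickey-Fuller regression (constant, no lag augmentation) for each column of a (T, n) array. It is an assumption-laden illustration: `dickey_fuller_stat` is a hypothetical helper, and the package's `panel_adf` may choose lags and critical values differently. The value −2.86 is the asymptotic 5% critical value for the case with a constant.

```python
import numpy as np

def dickey_fuller_stat(series):
    """t-statistic on y_{t-1} in the regression dy_t = c + rho * y_{t-1} + e_t."""
    dy = np.diff(series)
    Z = np.column_stack([np.ones(len(dy)), series[:-1]])
    coef, *_ = np.linalg.lstsq(Z, dy, rcond=None)
    resid = dy - Z @ coef
    sigma2 = resid @ resid / (len(dy) - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return coef[1] / np.sqrt(cov[1, 1])

# Three stationary AR(1) series with coefficient 0.5
rng = np.random.default_rng(4)
T, n = 300, 3
shocks = rng.normal(size=(T, n))
data = np.zeros((T, n))
for t in range(1, T):
    data[t] = 0.5 * data[t - 1] + shocks[t]

stats = [dickey_fuller_stat(data[:, i]) for i in range(n)]
reject_5pct = [s < -2.86 for s in stats]
print(stats, reject_5pct)
```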

📊 Visualization

Forest Plot of Confidence Intervals

import multicoint as mc

results = [spmg_result, pmg_result, breitung_result, pdols_result, mgmw_result]
fig, ax = mc.plot_confidence_intervals(
    results,
    theta0=1.0,           # Reference line (unit coefficient)
    figsize=(10, 6),
    save="forest_plot.png"
)

Bar Chart of Long-Run Coefficients

# Compare across multiple ratios
ratio_results = {
    "C/GDP": [spmg_cg, pmg_cg, breit_cg],
    "I/GDP": [spmg_ig, pmg_ig, breit_ig],
    "Debt/GDP": [spmg_dg, pmg_dg, breit_dg],
}
fig, ax = mc.plot_long_run_coefficients(ratio_results, save="ratios.png")

PME Eigenvalue Scree Plot

fig, ax = mc.plot_eigenvalues(pme_result, save="eigenvalues.png")

MGDL Impulse Response Functions

fig, axes = mc.plot_impulse_response(
    mgdl_result,
    product_idx=[0, 5, 25],  # Specific products to plot
    save="irfs.png"
)

📋 Comparison Tables

Single Estimator Summary

result.summary()  # Rich-formatted terminal output

Multi-Estimator Comparison

import multicoint as mc

text, df = mc.comparison_table(
    [spmg_result, pmg_result, breitung_result, pdols4_result, mgmw_result],
    title="Great Ratios: Consumption / GDP",
    print_it=True,
)

# Export to various formats
df.to_csv("comparison.csv", index=False)
df.to_excel("comparison.xlsx", index=False)
df.to_latex("comparison.tex", index=False)

🧪 Simulated Datasets

Built-in DGP simulators matching the Monte Carlo designs from all three papers:

1. Bivariate Panel with Single Long-Run Relation

from multicoint.datasets import simulate_great_ratios

Y, X = simulate_great_ratios(
    n=30,              # Cross-section units
    T=100,             # Time periods
    theta=1.0,         # True long-run coefficient
    phi_range=(0.1, 0.3),  # Speed of adjustment range
    pi_noncoint=0.2,   # Fraction of non-cointegrating units
    seed=42,
)
# Y, X: np.ndarray of shape (T, n)

2. Multivariate Panel with Multiple Long-Run Relations

from multicoint.datasets import simulate_multiple_lr

W, B0_true = simulate_multiple_lr(
    n=200,     # Cross-section units
    T=50,      # Time periods
    m=3,       # Number of variables
    r0=2,      # Number of long-run relations
    seed=42,
)
# W: (n, T, m), B0_true: (m, r0) true coefficient matrix

3. Panel for MGDL IRF Estimation

from multicoint.datasets import simulate_irf_panel

X_panel, shock, true_irf = simulate_irf_panel(
    M=43,      # Products
    N=41,      # Cities
    T=104,     # Quarters
    h=4,       # IRF horizon
    seed=42,
)
# X_panel: (M, N, T), shock: (T,), true_irf: (M, h+1)

📚 Full API Reference

Estimator Classes

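| Class | Paper | Variables | Long-run relations | Causality | n vs T |
|---|---|---|---|---|---|
| SPMG | [1] | 2 | 1 | Both (x↔y) | T ≫ n |
| PMG | [1] | 2 | 1 | One-way (x→y) | T ≫ n |
| Breitung | [1] | 2 | 1 | Both | T ≫ n |
| PDOLS | [1] | 2 | 1 | One-way (x→y) | T ≫ n |
| MGMW | [1] | 2 | 1 | Both | T ≫ n |
| MGDL | [2] | M×N | IRFs | Shock→outcome | N, M, T large |
| PME | [3] | m ≥ 2 | r₀ ≥ 1 | Any | n ≫ T |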

Common Parameters

| Parameter | Type | Default | Used by |
|---|---|---|---|
| p | int | 2 | SPMG, PMG, Breitung |
| max_iter | int | 500 | SPMG, PMG |
| precision | float | 1e-4 | SPMG, PMG |
| bootstrap_reps | int | 2000 | SPMG, PMG |
| seed | int | 1234 | SPMG, PMG |
| leads_lags | int | 4 | PDOLS |
| q | int | 5 (MGMW), 2 (PME) | MGMW, PME |
| h | int | 4 | MGDL |
| delta | float | 0.25 | PME |
| r0 | int | None | PME (auto-estimated if None) |

Data Format

| Estimator | Input Shape | Description |
|---|---|---|
| SPMG, PMG, Breitung, PDOLS, MGMW | Y(T,n), X(T,n) | Bivariate panel: T periods, n units |
| MGDL | X(M,N,T), v(T,) | Two cross-sections + shock series |
| PME | W(n,T,m) | m-variate panel: n units, T periods |
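Note that the bivariate estimators put time on the first axis, while PME puts units first. A small NumPy sketch of converting a (T, n) pair into the (n, T, m) layout is shown below; placing y before x along the last axis is an assumption made for illustration.

```python
import numpy as np

T, n = 6, 3
Y = np.arange(T * n, dtype=float).reshape(T, n)   # (T, n) toy data
X = Y + 100.0                                     # (T, n) toy data

# Transpose each (T, n) array to (n, T), then stack variables on a new last axis
W = np.stack([Y.T, X.T], axis=2)                  # (n, T, 2)
print(W.shape)
```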

📐 Mathematical Background

Convergence Rates

| Estimator | Rate | Conditions |
|---|---|---|
| SPMG | $T\sqrt{(1-\pi)n}$ | $T \gg n$ |
| PMG | $T\sqrt{n}$ | $T \gg n$, strict exogeneity |
| MGDL | $\sqrt{N}$ and $\sqrt{M}$ | $N/T \to \kappa_1 \geq 0$ |
| PME | $\sqrt{nT}$ | $T \approx n^d$, $d > 1/2$ |

Bootstrap Methods

| Method | Used by | Description |
|---|---|---|
| Wild (Rademacher) | SPMG, PMG | $\kappa_t = \pm 1$ with probability 1/2 each |
| Conditional on X | PMG | Holds X fixed, resamples y equation only |
| Unconditional | SPMG | Resamples both y and x equations jointly |
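A single Rademacher wild-bootstrap draw can be sketched as below. Because the multiplier is indexed by t only, the example applies the same sign to every unit in a given period, which keeps the cross-sectional correlation of the residuals intact; that reading, and the helper name, are assumptions rather than a description of the package internals.

```python
import numpy as np

def rademacher_wild_draw(residuals, rng):
    """Multiply each period's residuals (T, n) by a common random sign."""
    T = residuals.shape[0]
    kappa = rng.choice([-1.0, 1.0], size=T)       # +1 or -1 with probability 1/2
    return residuals * kappa[:, None], kappa

rng = np.random.default_rng(5)
u = rng.normal(size=(8, 3))                       # toy residuals: T = 8, n = 3
u_star, kappa = rademacher_wild_draw(u, rng)
print(kappa)
```

Each bootstrap replication would rebuild the data from such resampled residuals and re-estimate θ; the spread of the re-estimates gives the bootstrap confidence interval.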

📖 References

  1. Chudik, A., Pesaran, M.H. & Smith, R.P. (2023). "Revisiting the Great Ratios Hypothesis." Federal Reserve Bank of Dallas, Globalization Institute Working Paper No. 415. [SPMG, PMG, Breitung, PDOLS, MGMW]

  2. Choi, C.-Y. & Chudik, A. (2024). "Mean Group Distributed Lag Estimation of Impulse Response Functions in Large Panels." Federal Reserve Bank of Dallas, GI Working Paper No. 423. [MGDL]

  3. Chudik, A., Pesaran, M.H. & Smith, R.P. (2025). "Analysis of Multiple Long-Run Relations in Panel Data Models." arXiv:2506.02135v3. [PME]

Supporting References

  • Pesaran, M.H., Shin, Y. & Smith, R.P. (1999). "Pooled Mean Group Estimation." JASA, 94, 621–634.
  • Breitung, J. (2005). "A Parametric Approach to the Estimation of Cointegration Vectors in Panel Data." Econometric Reviews, 24, 151–173.
  • Mark, N.C. & Sul, D. (2003). "Cointegration Vector Estimation by Panel DOLS." Oxford Bulletin of Economics and Statistics.
  • Müller, U.K. & Watson, M.W. (2018). "Long-Run Covariability." Econometrica, 86, 775–804.
  • Pesaran, M.H. (2004). "General Diagnostic Tests for Cross Section Dependence in Panels." CESifo Working Paper 1229.

👨‍🔬 Author

Dr. Merwan Roudane


📄 License

MIT License. See LICENSE for details.


Built with ❤️ for the econometrics community
