AI-powered maintenance risk predictor for git repositories with interactive HTML reports and database integration
MaintSight
AI-powered maintenance degradation predictor for git repositories using XGBoost machine learning
MaintSight analyzes your git repository's commit history and code patterns to predict maintenance degradation at the file level. Using a trained XGBoost model, it identifies code quality trends and helps prioritize refactoring efforts by detecting files that are degrading over time.
Table of Contents
- Features
- Quick Start
- Installation
- Usage
- Output Formats
- Degradation Categories
- Command Reference
- Model Information
- Development
- Testing
- Contributing
- License
Features
- XGBoost ML Predictions: Pre-trained model for maintenance degradation scoring
- Git History Analysis: Analyzes commits, changes, and collaboration patterns
- Multiple Output Formats: JSON, CSV, Markdown, or interactive HTML reports
- Degradation Categorization: Four-level classification (Improved/Stable/Degraded/Severely Degraded)
- Threshold Filtering: Focus on degraded files only
- Interactive HTML Reports: Rich, interactive analysis with visualizations
- Fast & Efficient: Analyzes hundreds of files in seconds
- Easy Integration: Simple CLI interface and pip package
Quick Start
# Install from PyPI
pip install maintsight
# Run predictions on current directory (generates interactive HTML report)
maintsight predict
# Show only degraded files with threshold
maintsight predict --threshold 0.1
# Generate JSON output
maintsight predict --format json
# Analyze specific repository
maintsight predict /path/to/repo
Installation
From PyPI (Recommended)
pip install maintsight
From Source
git clone https://github.com/techdebtgpt/maintsight-pip.git
cd maintsight-pip
pip install -r requirements.txt
# Install in development mode
pip install -e .
Development Installation
pip install -e ".[dev]"
Usage
Basic Prediction
# Analyze current directory (generates HTML report)
maintsight predict
# Analyze specific repository
maintsight predict /path/to/repo
# Show only files above a degradation threshold
maintsight predict --threshold 0.1
Advanced Options
# Analyze specific branch
maintsight predict --branch develop
# Limit commit analysis window
maintsight predict --window-size-days 90
# Limit number of commits
maintsight predict --max-commits 5000
# Generate JSON output
maintsight predict --format json
# All options together
maintsight predict /path/to/repo --branch main --window-size-days 150 --max-commits 1000 --format html
Python API Usage
from maintsight import GitCommitCollector, MockPredictor
from maintsight.utils.html_generator import generate_html_report
# Collect git data
collector = GitCommitCollector(repo_path="./", branch="main")
commit_data = collector.fetch_commit_data()
# Generate predictions
predictor = MockPredictor()
predictions = predictor.predict(commit_data)
# Generate HTML report
html_path = generate_html_report(predictions, commit_data, "./")
Output Formats
JSON
[
{
"module": "src/legacy/parser.ts",
"normalized_score": 0.3456,
"raw_prediction": 0.3456,
"risk_category": "severely_degraded"
},
{
"module": "src/utils/helpers.ts",
"normalized_score": -0.1234,
"raw_prediction": -0.1234,
"risk_category": "improved"
}
]
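Because the JSON output is a flat array of records, it is easy to post-process. The following is a minimal sketch, assuming the field names shown in the sample above; the helper name `load_degraded` and the `SAMPLE` string are illustrative and not part of MaintSight.

```python
import json
from typing import Dict, List


def load_degraded(json_text: str, threshold: float = 0.1) -> List[Dict]:
    """Return records whose normalized_score exceeds threshold, worst first.

    Hypothetical helper: it only relies on the field names in the sample output.
    """
    records = json.loads(json_text)
    flagged = [r for r in records if r["normalized_score"] > threshold]
    return sorted(flagged, key=lambda r: r["normalized_score"], reverse=True)


# Sample data copied from the JSON example above.
SAMPLE = """
[
  {"module": "src/legacy/parser.ts", "normalized_score": 0.3456,
   "raw_prediction": 0.3456, "risk_category": "severely_degraded"},
  {"module": "src/utils/helpers.ts", "normalized_score": -0.1234,
   "raw_prediction": -0.1234, "risk_category": "improved"}
]
"""

if __name__ == "__main__":
    for record in load_degraded(SAMPLE):
        print(f'{record["module"]}: {record["normalized_score"]:+.4f}')
```

In practice the text would come from a file written with `maintsight predict --format json --output results.json`.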
CSV
module,normalized_score,raw_prediction,risk_category
"src/legacy/parser.ts","0.3456","0.3456","severely_degraded"
"src/utils/helpers.ts","-0.1234","-0.1234","improved"
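In the CSV sample every field is quoted, including the numeric ones, so scores arrive as strings and need converting. A small sketch using the standard `csv` module, assuming the header row shown above; `read_predictions_csv` is a hypothetical name.

```python
import csv
import io
from typing import Dict, List


def read_predictions_csv(csv_text: str) -> List[Dict]:
    """Parse MaintSight-style CSV text and convert the score columns to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Quoted numbers are read as strings; convert them explicitly.
        row["normalized_score"] = float(row["normalized_score"])
        row["raw_prediction"] = float(row["raw_prediction"])
        rows.append(row)
    return rows


# Sample data copied from the CSV example above.
SAMPLE_CSV = (
    "module,normalized_score,raw_prediction,risk_category\n"
    '"src/legacy/parser.ts","0.3456","0.3456","severely_degraded"\n'
    '"src/utils/helpers.ts","-0.1234","-0.1234","improved"\n'
)

if __name__ == "__main__":
    for row in read_predictions_csv(SAMPLE_CSV):
        print(row["module"], row["normalized_score"], row["risk_category"])
```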
Markdown Report
Generates a comprehensive report with:
- Degradation distribution summary
- Top 20 most degraded files
- Category breakdown with percentages
- Actionable recommendations
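The first three items of such a report can be derived from the prediction records alone. The sketch below is not MaintSight's report generator; it is an illustration, assuming records shaped like the JSON output, of how a category breakdown with percentages and a "top N most degraded" list could be computed. The function name `summarize` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize(predictions: List[Dict], top_n: int = 20) -> Tuple[Dict[str, float], List[str]]:
    """Return (percentage per risk_category, modules sorted worst first)."""
    counts = Counter(p["risk_category"] for p in predictions)
    total = len(predictions)
    percentages = {cat: 100.0 * n / total for cat, n in counts.items()} if total else {}
    # Higher normalized_score means stronger degradation, so sort descending.
    worst = sorted(predictions, key=lambda p: p["normalized_score"], reverse=True)
    return percentages, [p["module"] for p in worst[:top_n]]


if __name__ == "__main__":
    demo = [
        {"module": "a.py", "normalized_score": 0.30, "risk_category": "severely_degraded"},
        {"module": "b.py", "normalized_score": -0.10, "risk_category": "improved"},
    ]
    print(summarize(demo, top_n=1))
```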
Interactive HTML Report
Generated automatically in the .maintsight/ folder on every run, with:
- Visual degradation trends
- Interactive file explorer
- Detailed metrics per file
- Commit history analysis
Degradation Categories
| Score Range | Category | Description | Action |
|---|---|---|---|
| < 0.0 | 🟢 Improved | Code quality improving over time | Continue good practices |
| 0.0-0.1 | 🔵 Stable | Code quality stable | Regular maintenance |
| 0.1-0.2 | 🟡 Degraded | Code quality declining | Schedule for refactoring |
| > 0.2 | 🔴 Severely Degraded | Rapid quality decline | Immediate attention needed |
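The table can be read as a simple threshold function. The sketch below is one possible interpretation, not MaintSight's own implementation. Two things are assumptions: which category owns the exact boundary values 0.1 and 0.2 (the table does not say), and the strings `"stable"` and `"degraded"`, which are inferred by analogy with `"improved"` and `"severely_degraded"` from the sample output.

```python
def categorize(score: float) -> str:
    """Map a normalized score to a category name, following the table above.

    Boundary handling is an assumption: 0.1 is treated as "degraded" and
    0.2 as "degraded" as well, since the table leaves both ambiguous.
    """
    if score < 0.0:
        return "improved"
    if score < 0.1:
        return "stable"
    if score <= 0.2:
        return "degraded"
    return "severely_degraded"


if __name__ == "__main__":
    for value in (-0.1234, 0.05, 0.15, 0.3456):
        print(value, categorize(value))
```

The two scores from the sample output (0.3456 and -0.1234) map to the same categories that the sample shows.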
Command Reference
maintsight predict
Analyze repository and predict maintenance degradation.
maintsight predict [PATH] [OPTIONS]
Arguments:
- PATH: Repository path (default: current directory)
Options:
- -b, --branch BRANCH: Git branch to analyze (default: "main")
- -n, --max-commits N: Maximum commits to analyze (default: 1000)
- -w, --window-size-days N: Time window in days for analysis (default: 150)
- -f, --format FORMAT: Output format: json|csv|markdown|html (default: "html")
- -t, --threshold FLOAT: Only show files above degradation threshold
- -o, --output PATH: Output file path
- -v, --verbose: Verbose output
- -h, --help: Show help information
Examples
# Generate HTML report with default settings
maintsight predict
# Analyze last 90 days on develop branch
maintsight predict --branch develop --window-size-days 90
# Get JSON output for processing
maintsight predict --format json --output results.json
# Show only degraded files
maintsight predict --threshold 0.1
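When the CLI is driven from a script or CI job, it can help to assemble the argument list in one place. The following is a rough sketch that uses only the flags documented above; the wrapper functions `build_predict_command` and `run_predict` are hypothetical and not part of the package, and `run_predict` assumes `maintsight` is installed and on the PATH.

```python
import subprocess
from typing import List, Optional


def build_predict_command(
    path: str = ".",
    branch: Optional[str] = None,
    window_size_days: Optional[int] = None,
    max_commits: Optional[int] = None,
    fmt: Optional[str] = None,
    threshold: Optional[float] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Build the argv list for `maintsight predict [PATH] [OPTIONS]`."""
    cmd = ["maintsight", "predict", path]
    if branch is not None:
        cmd += ["--branch", branch]
    if window_size_days is not None:
        cmd += ["--window-size-days", str(window_size_days)]
    if max_commits is not None:
        cmd += ["--max-commits", str(max_commits)]
    if fmt is not None:
        cmd += ["--format", fmt]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    if output is not None:
        cmd += ["--output", output]
    return cmd


def run_predict(**kwargs) -> subprocess.CompletedProcess:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(
        build_predict_command(**kwargs), check=True, capture_output=True, text=True
    )


if __name__ == "__main__":
    print(" ".join(build_predict_command(branch="develop", window_size_days=90)))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with repository paths that contain spaces.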
Model Information
MaintSight uses an XGBoost model trained on software maintenance degradation patterns. The model predicts how code quality changes over time by analyzing git commit patterns and code evolution metrics.
Key Features Analyzed
The model considers multiple dimensions of code evolution:
- Commit patterns: Frequency, size, and timing of changes
- Author collaboration: Number of contributors and collaboration patterns
- Code churn: Lines added, removed, and modified over time
- Change consistency: Regularity and predictability of modifications
- Bug indicators: Patterns suggesting defects or fixes
- Temporal factors: File age and time since last modification
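To make the commit-pattern, collaboration, and churn dimensions concrete, here is an illustrative sketch of how such per-file numbers could be derived from git history. It is not MaintSight's feature engineering and the model's real feature set is not documented here. It assumes log text produced by `git log --numstat --format=commit%x09%H%x09%an`, where each commit header is tab-separated and each following line is `added<TAB>deleted<TAB>path`; the function name `churn_by_file` is hypothetical.

```python
from typing import Dict


def churn_by_file(log_text: str) -> Dict[str, Dict[str, int]]:
    """Aggregate commits, lines added/deleted, and distinct authors per file."""
    raw: Dict[str, dict] = {}
    author = None
    for line in log_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank separator lines and anything unexpected
        if parts[0] == "commit":
            author = parts[2]  # header line: commit<TAB>hash<TAB>author
            continue
        added, deleted, path = parts
        if not (added.isdigit() and deleted.isdigit()):
            continue  # binary files are reported as "-" and skipped here
        entry = raw.setdefault(
            path, {"commits": 0, "added": 0, "deleted": 0, "authors": set()}
        )
        entry["commits"] += 1
        entry["added"] += int(added)
        entry["deleted"] += int(deleted)
        entry["authors"].add(author)
    return {
        path: {
            "commits": e["commits"],
            "added": e["added"],
            "deleted": e["deleted"],
            "authors": len(e["authors"]),
        }
        for path, e in raw.items()
    }


# Hand-written sample in the assumed log format.
SAMPLE_LOG = (
    "commit\tabc123\tAlice\n"
    "\n"
    "10\t2\tsrc/app.py\n"
    "3\t0\tREADME.md\n"
    "commit\tdef456\tBob\n"
    "\n"
    "5\t5\tsrc/app.py\n"
    "-\t-\tlogo.png\n"
)

if __name__ == "__main__":
    for path, stats in churn_by_file(SAMPLE_LOG).items():
        print(path, stats)
```

Renamed files appear in numstat output under a combined old/new path and are not merged by this sketch.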
Prediction Output
- normalized_score: Numerical score indicating code quality trend
  - Negative values: Quality improving
  - Positive values: Quality degrading
  - Higher magnitude = stronger trend
- risk_category: Classification based on degradation severity
- raw_prediction: Unprocessed model output
Development
Prerequisites
- Python >= 3.8
- Git
Setup
# Clone repository
git clone https://github.com/techdebtgpt/maintsight-pip.git
cd maintsight-pip
# Install in development mode
pip install -e ".[dev]"
# Or install requirements directly
pip install -r requirements.txt
# Test the CLI
maintsight predict --help
# Run pre-publish validation
python scripts/pre_publish.py
Project Structure
maintsight-pip/
├── maintsight/                          # Python package
│   ├── __init__.py
│   ├── cli.py                           # Click-based CLI
│   ├── models/                          # Data models
│   │   ├── __init__.py
│   │   ├── commit_data.py               # CommitData dataclass
│   │   ├── risk_category.py             # RiskCategory enum
│   │   ├── risk_prediction.py           # RiskPrediction dataclass
│   │   ├── file_stats.py                # FileStats dataclass
│   │   ├── xgboost_model.py             # XGBoost model structures
│   │   ├── xgboost_model.pkl.pkl        # Pre-trained model
│   │   └── xgboost_model_metadata.json  # Model metadata
│   ├── services/                        # Core services
│   │   ├── __init__.py
│   │   ├── git_commit_collector.py
│   │   ├── feature_engineer.py
│   │   └── xgboost_predictor.py
│   └── utils/                           # Utilities
│       ├── __init__.py
│       ├── logger.py                    # Rich-based logger
│       └── html_generator.py            # HTML report generator
├── tests/                               # pytest tests
│   ├── __init__.py
│   └── test_risk_category.py
├── cli.py                               # Main CLI entry point
├── pyproject.toml                       # Modern Python packaging
├── setup.py                             # Legacy setuptools support
├── requirements.txt                     # Runtime dependencies
└── requirements-dev.txt                 # Development dependencies
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=maintsight
# Run specific test file
pytest tests/test_risk_category.py
# Run with verbose output
pytest -v
# Install test dependencies
pip install -e ".[dev]"
Test Coverage Goals
- Services: 80%+
- Utils: 90%+
- CLI: 70%+
Pre-publish Validation
Before publishing to PyPI, run the comprehensive pre-publish validation script:
# Run all quality checks (tests, formatting, linting, building)
python scripts/pre_publish.py
# This will:
# ✅ Validate package configuration
# ✅ Run all tests
# ✅ Auto-format code with ruff
# ✅ Check linting with ruff
# ✅ Verify type hints with mypy (non-blocking)
# ✅ Build and verify package artifacts
The script ensures your code is ready for production by running the same checks as the CI pipeline.
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Quick Start
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Write tests for your changes
- Ensure all tests pass (pytest)
- Run pre-publish validation (python scripts/pre_publish.py)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Code Style
- Use Python 3.8+ features
- Follow PEP 8 style guide
- Use ruff for code formatting and linting
- Use type hints where appropriate
- Write meaningful commit messages
- Add tests for new features
- Update documentation as needed
# Format and fix code issues
ruff format .
ruff check . --fix
# Type checking
mypy .
# Run pre-publish validation (includes all checks)
python scripts/pre_publish.py
Bug Reports
Found a bug? Please open an issue with:
- MaintSight version (maintsight --help)
- Python version
- Operating system
- Steps to reproduce
- Expected vs actual behavior
- Error messages/stack traces
License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
Acknowledgments
- XGBoost community for the excellent gradient boosting framework
- Git community for robust version control
- All contributors who help improve MaintSight
Made with ❤️ by the TechDebtGPT Team