Perform exploratory data analysis on REDCap data
REDCap-EDA
Overview
REDCap-EDA is a command-line tool for performing Exploratory Data Analysis (EDA) on REDCap datasets. It automates data inspection, schema enforcement, statistical analysis, visualization, and report generation.
Features
- Automatic Data Type Enforcement (casts columns based on a predefined or user-defined schema)
- Summary Statistics (mean, median, std dev, outliers, categorical distributions)
- Visualizations (histograms, box plots, categorical distributions, time trends, word clouds)
- Comprehensive PDF Report Generation with UnifiedReport
- Multiprocessing for Faster Execution
- Progress Bars with tqdm
- Exports Reports (JSON, PDF, and saved visualizations)
- Interactive Schema Creation for custom datasets
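To make the summary statistics feature more concrete, here is a minimal sketch of the kind of per-column summary described above. It is not REDCap-EDA's own code: the function names `summarize_numeric` and `summarize_categorical` are hypothetical, the 1.5 × IQR outlier rule is an assumption (the README does not say which rule the tool uses), and only the Python standard library is used.

```python
import statistics
from collections import Counter


def summarize_numeric(values):
    """Return basic summary statistics and IQR-based outliers for a numeric column."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    # Assumed rule: flag values more than 1.5 * IQR outside the quartiles.
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),
        "outliers": [v for v in values if v < lower or v > upper],
    }


def summarize_categorical(values):
    """Return the count of each category, most frequent first."""
    return dict(Counter(values).most_common())


ages = [34, 36, 35, 38, 37, 33, 95]
print(summarize_numeric(ages))
print(summarize_categorical(["yes", "no", "yes", "yes"]))
```

In this toy column the value 95 falls outside the IQR fences and is reported as the only outlier.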
Installation
pip install redcap-eda
Usage
Example Using the Sample Dataset and Interactive Schema Creation
redcap-eda analyze --sample
Example Using the Sample Dataset with a Predefined Schema
redcap-eda analyze --sample --sample-schema
Running EDA on a Custom Dataset with Interactive Schema Creation
redcap-eda analyze --csv path/to/your_data.csv
Running EDA on a Custom Dataset with a Predefined Schema
redcap-eda analyze --csv path/to/your_data.csv --schema path/to/schema.json
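The idea behind schema-based type enforcement can be sketched as follows. This is an illustration only: the schema layout (a JSON object mapping column names to type names), the type names themselves, and the helper `cast_row` are all assumptions, and the real `schema.json` format used by REDCap-EDA may differ.

```python
import json
from datetime import datetime

# Hypothetical type names; the real schema file may use different keys and values.
CASTERS = {
    "numerical": float,
    "datetime": datetime.fromisoformat,
    "categorical": str,
    "text": str,
}


def cast_row(row, schema):
    """Cast each value in a row (dict of strings) to the type named in the schema."""
    cast = {}
    for column, value in row.items():
        if value == "":
            cast[column] = None  # treat empty strings as missing
            continue
        # Columns absent from the schema fall back to plain text.
        caster = CASTERS[schema.get(column, "text")]
        cast[column] = caster(value)
    return cast


# Assumed schema content, inlined here instead of being read from a file.
schema = json.loads('{"age": "numerical", "visit_date": "datetime", "sex": "categorical"}')
row = {"age": "42", "visit_date": "2024-01-15", "sex": "F", "notes": ""}
print(cast_row(row, schema))
```

Casting up front means each column can then be routed to the matching analysis (numerical, categorical, datetime, text) without re-checking types later.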
Running in Debug Mode
redcap-eda --debug analyze --sample
Listing Available Test Cases
redcap-eda list-cases
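When the tool is driven from a script rather than typed by hand, the commands above can be assembled programmatically. The sketch below uses only the flags shown in the usage examples; the helper `build_command` is hypothetical and not part of REDCap-EDA.

```python
import shlex


def build_command(csv_path=None, schema_path=None, sample_schema=False, debug=False):
    """Build a redcap-eda command line from the flags shown in the usage examples."""
    command = ["redcap-eda"]
    if debug:
        command.append("--debug")  # global flag goes before the subcommand
    command.append("analyze")
    if csv_path is None:
        # No CSV given: fall back to the bundled sample dataset.
        command.append("--sample")
        if sample_schema:
            command.append("--sample-schema")
    else:
        command += ["--csv", csv_path]
        if schema_path is not None:
            command += ["--schema", schema_path]
    return command


command = build_command("path/to/your_data.csv", "path/to/schema.json")
print(shlex.join(command))
# To run it (requires redcap-eda to be installed):
#   import subprocess; subprocess.run(command, check=True)
```

Going by the examples above, omitting the schema starts interactive schema creation, so an unattended run would presumably want to pass a predefined schema.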
Project Structure
.
├── Makefile  # Helper commands
├── README.md  # Project documentation
├── dist  # Distribution files for PyPI
├── mypy.ini  # Type checking configuration
├── poetry.lock  # Poetry dependency lock file
├── pyproject.toml  # Poetry project configuration
├── schemas  # Saved schema files
│   └── schema_sample_dataset.json
├── src
│   ├── logs
│   │   └── redcap_eda.log  # Log files
│   └── redcap_eda
│       ├── analysis  # EDA analysis modules
│       │   ├── categorical
│       │   │   └── mixins.py  # Categorical data analysis
│       │   ├── datetime
│       │   │   └── mixins.py  # Datetime data analysis
│       │   ├── eda.py  # Main EDA module
│       │   ├── json_report_handler.py  # JSON export utility
│       │   ├── lib.py  # Shared data structures (e.g., AnalysisResult)
│       │   ├── missing
│       │   │   └── mixins.py  # Missing data analysis
│       │   ├── numerical
│       │   │   └── mixins.py  # Numerical data analysis
│       │   └── text
│       │       └── mixins.py  # Text data analysis
│       ├── cast_schema.py  # Schema enforcement
│       ├── cli.py  # Command-line interface
│       ├── load_case_data.py  # Dataset loader
│       ├── logger.py  # Logging utilities
│       └── unified_report.py  # PDF report generation
└── tests  # Unit tests
    ├── __init__.py
    └── fixtures
        └── toy_data.csv  # Sample test data
Contributing
- Fork the repository and create a feature branch.
- Run tests to ensure code integrity:
poetry run pytest tests/
- Submit a pull request with a detailed description.
License
This project is licensed under the MIT License.
Acknowledgments
- REDCap for enabling structured data collection.
- The Open Source Community for inspiration & contributions!
File details
Details for the file redcap_eda-0.2.1.tar.gz.
File metadata
- Download URL: redcap_eda-0.2.1.tar.gz
- Upload date:
- Size: 18.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.0.1 CPython/3.13.2 Linux/6.13.4-zen1-1-zen
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d4142d963ddf52680f20c3d996c6225d9e329f7f7972e7ee50b327aa7ab25f34 |
| MD5 | 5b1783656f2a171f821faf9290df90b2 |
| BLAKE2b-256 | d9aa50d8fd45f0772dcc8f67fdd6fe1a1c7b5c5e5400348c174fe6474325237d |
File details
Details for the file redcap_eda-0.2.1-py3-none-any.whl.
File metadata
- Download URL: redcap_eda-0.2.1-py3-none-any.whl
- Upload date:
- Size: 26.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.0.1 CPython/3.13.2 Linux/6.13.4-zen1-1-zen
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 366f778a899e42469ecb42ca2268fd1454edc7762363e2c37aea0f27d6bf1350 |
| MD5 | 93c068a948722014af85306e4d278f4b |
| BLAKE2b-256 | 3826e5be179d63ebf7c1903ec9471ad02ce8b557e48589e56e69188f48c40586 |