Perform exploratory data analysis on REDCap data

REDCap-EDA

📌 Overview

REDCap-EDA is a command-line tool for performing Exploratory Data Analysis (EDA) on REDCap datasets. It automates data inspection, schema enforcement, statistical analysis, visualization, and report generation.

🚀 Features

  • ✅ Automatic Data Type Enforcement (casts columns based on a predefined or user-defined schema)
  • 📊 Summary Statistics (mean, median, std dev, outliers, categorical distributions)
  • 📉 Visualizations (histograms, box plots, categorical distributions, time trends, word clouds)
  • 📂 Comprehensive PDF Report Generation with UnifiedReport
  • 🔄 Multiprocessing for Faster Execution
  • 🔍 Progress Bars with tqdm
  • 📂 Exports Reports (JSON, PDF, and saved visualizations)
  • 📝 Interactive Schema Creation for custom datasets
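To make the first two features concrete, here is a minimal standard-library sketch of schema-driven casting followed by a numeric summary with IQR-rule outlier detection. It is not REDCap-EDA's implementation: the `SCHEMA` mapping, the type names, the column names, and the sample rows are all hypothetical, and the tool's real schema file format is not shown in this README.

```python
import csv
import io
import statistics
from datetime import datetime

# Hypothetical schema (column name -> target type). The real schema format
# used by redcap-eda is not documented here, so this layout is an assumption.
SCHEMA = {"record_id": "int", "age": "float", "sex": "category", "visit_date": "datetime"}

CASTERS = {
    "int": int,
    "float": float,
    "category": str,
    "datetime": lambda s: datetime.strptime(s, "%Y-%m-%d"),
}


def cast_rows(rows, schema):
    """Cast each value according to the schema; blank or invalid cells become None."""
    out = []
    for row in rows:
        cast = {}
        for col, kind in schema.items():
            raw = (row.get(col) or "").strip()
            try:
                cast[col] = CASTERS[kind](raw) if raw else None
            except ValueError:
                cast[col] = None
        out.append(cast)
    return out


def numeric_summary(values):
    """Mean, median, sample std dev, and outliers by the 1.5 * IQR rule."""
    data = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    fence = 1.5 * (q3 - q1)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "std": statistics.stdev(data),
        "outliers": [v for v in data if v < q1 - fence or v > q3 + fence],
    }


# Made-up sample data, for illustration only.
SAMPLE_CSV = """record_id,age,sex,visit_date
1,34,F,2024-01-05
2,36,M,2024-01-09
3,,F,2024-02-11
4,35,M,not-a-date
5,33,F,2024-03-02
6,120,M,2024-03-15
"""

rows = cast_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)), SCHEMA)
summary = numeric_summary([r["age"] for r in rows])
print(summary)
```

Casting failures are turned into `None` rather than raised, so a single malformed cell does not abort the whole run; whether REDCap-EDA makes the same choice is not stated here.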

📦 Installation

pip install redcap-eda

🛠️ Usage

🔹 Example Using the Sample Dataset and Interactive Schema Creation

redcap-eda analyze --sample

🔹 Example Using the Sample Dataset with a Predefined Schema

redcap-eda analyze --sample --sample-schema

🔹 Running EDA on a Custom Dataset with Interactive Schema Creation

redcap-eda analyze --csv path/to/your_data.csv

🔹 Running EDA on a Custom Dataset with a Predefined Schema

redcap-eda analyze --csv path/to/your_data.csv --schema path/to/schema.json

🔹 Running in Debug Mode

redcap-eda --debug analyze --sample

🔹 Listing Available Test Cases

redcap-eda list-cases
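For batch jobs, the commands above can also be assembled from Python. The sketch below uses only the flags shown in this section; the helper names `build_analyze_command` and `run` are hypothetical, and combining `--debug` with options other than `--sample` is an assumption.

```python
import shutil
import subprocess


def build_analyze_command(csv_path=None, schema_path=None, sample_schema=False, debug=False):
    """Return the argv list for one `redcap-eda analyze` run."""
    cmd = ["redcap-eda"]
    if debug:
        # --debug is documented before the subcommand; using it together with
        # options other than --sample is an assumption.
        cmd.append("--debug")
    cmd.append("analyze")
    if csv_path is None:
        cmd.append("--sample")
        if sample_schema:
            cmd.append("--sample-schema")
    else:
        cmd += ["--csv", str(csv_path)]
        if schema_path is not None:
            cmd += ["--schema", str(schema_path)]
    return cmd


def run(cmd):
    """Run the command if the CLI is on PATH; return its exit code, else None."""
    if shutil.which(cmd[0]) is None:
        print("redcap-eda is not installed; would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=False).returncode


print(build_analyze_command(sample_schema=True))
print(build_analyze_command("path/to/your_data.csv", "path/to/schema.json"))
```

Going by the headings above, runs without a schema start interactive schema creation, so in unattended scripts `run` is probably best reserved for the predefined-schema variants.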

📂 Project Structure

.
├── Makefile                # Helper commands
├── README.md               # Project documentation
├── dist                    # Distribution files for PyPI
├── mypy.ini                # Type checking configuration
├── poetry.lock             # Poetry dependency lock file
├── pyproject.toml          # Poetry project configuration
├── schemas                 # Saved schema files
│   └── schema_sample_dataset.json
├── src
│   ├── logs
│   │   └── redcap_eda.log  # Log files
│   └── redcap_eda
│       ├── analysis        # EDA analysis modules
│       │   ├── categorical
│       │   │   └── mixins.py # Categorical data analysis
│       │   ├── datetime
│       │   │   └── mixins.py # Datetime data analysis
│       │   ├── eda.py      # Main EDA module
│       │   ├── json_report_handler.py # JSON export utility
│       │   ├── lib.py       # Shared data structures (e.g., AnalysisResult)
│       │   ├── missing
│       │   │   └── mixins.py # Missing data analysis
│       │   ├── numerical
│       │   │   └── mixins.py # Numerical data analysis
│       │   └── text
│       │       └── mixins.py # Text data analysis
│       ├── cast_schema.py  # Schema enforcement
│       ├── cli.py          # Command-line interface
│       ├── load_case_data.py # Dataset loader
│       ├── logger.py       # Logging utilities
│       └── unified_report.py # PDF report generation
└── tests                   # Unit tests
    ├── __init__.py
    └── fixtures
        └── toy_data.csv    # Sample test data
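The layout suggests one mixin module per data type feeding a main EDA module, with results carried in a shared structure. As a rough illustration of that pattern only, the sketch below composes three mixins into one class. Every class, method, and field here is hypothetical, including this stand-in `AnalysisResult`; the actual definitions in `lib.py` and the `mixins.py` files may look quite different.

```python
import statistics
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AnalysisResult:
    """Hypothetical stand-in for the shared structure mentioned for lib.py."""
    column: str
    kind: str
    summary: dict = field(default_factory=dict)


class NumericalMixin:
    def analyze_numerical(self, column, values):
        stats = {"mean": statistics.mean(values), "median": statistics.median(values)}
        return AnalysisResult(column, "numerical", stats)


class CategoricalMixin:
    def analyze_categorical(self, column, values):
        return AnalysisResult(column, "categorical", {"counts": dict(Counter(values))})


class MissingMixin:
    def analyze_missing(self, column, values):
        missing = sum(v is None for v in values)
        return AnalysisResult(column, "missing", {"missing": missing, "total": len(values)})


class ExploratoryAnalysis(NumericalMixin, CategoricalMixin, MissingMixin):
    """Dispatches each column to the mixin method named after its schema type."""

    def __init__(self, schema):
        self.schema = schema

    def run(self, data):
        results = []
        for column, values in data.items():
            results.append(self.analyze_missing(column, values))
            present = [v for v in values if v is not None]
            handler = getattr(self, f"analyze_{self.schema[column]}")
            results.append(handler(column, present))
        return results


# Made-up columns, for illustration only.
schema = {"age": "numerical", "sex": "categorical"}
data = {"age": [34, 36, None, 35], "sex": ["F", "M", "F", None]}
results = ExploratoryAnalysis(schema).run(data)
for result in results:
    print(result)
```

Keeping each data type in its own mixin means a new type can be supported by adding one class and one schema type name, without touching the dispatch loop.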

📝 Contributing

  1. Fork the repository and create a feature branch.
  2. Run tests to ensure code integrity:
    poetry run pytest tests/
    
  3. Submit a pull request with a detailed description.

📜 License

This project is licensed under the MIT License.

🤝 Acknowledgments

  • REDCap for enabling structured data collection.
  • The Open Source Community for inspiration & contributions!

Download files

Source Distribution

redcap_eda-0.2.1.tar.gz (18.3 kB)

Uploaded Source

Built Distribution

redcap_eda-0.2.1-py3-none-any.whl (26.4 kB)

Uploaded Python 3

File details

Details for the file redcap_eda-0.2.1.tar.gz.

File metadata

  • Download URL: redcap_eda-0.2.1.tar.gz
  • Size: 18.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.1 CPython/3.13.2 Linux/6.13.4-zen1-1-zen

File hashes

Hashes for redcap_eda-0.2.1.tar.gz

Algorithm    Hash digest
SHA256       d4142d963ddf52680f20c3d996c6225d9e329f7f7972e7ee50b327aa7ab25f34
MD5          5b1783656f2a171f821faf9290df90b2
BLAKE2b-256  d9aa50d8fd45f0772dcc8f67fdd6fe1a1c7b5c5e5400348c174fe6474325237d

File details

Details for the file redcap_eda-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: redcap_eda-0.2.1-py3-none-any.whl
  • Size: 26.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.1 CPython/3.13.2 Linux/6.13.4-zen1-1-zen

File hashes

Hashes for redcap_eda-0.2.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       366f778a899e42469ecb42ca2268fd1454edc7762363e2c37aea0f27d6bf1350
MD5          93c068a948722014af85306e4d278f4b
BLAKE2b-256  3826e5be179d63ebf7c1903ec9471ad02ce8b557e48589e56e69188f48c40586
