A Python library that compiles scattered docs into a unified knowledge base using LLMs. Crawls multiple documentation sources and synthesizes them into a single coherent reference — no matter where your docs live.

Docodex

An LLM-powered library that compiles scattered documentation into a unified, queryable knowledge base.

Inspired by Andrej Karpathy's approach to building personal knowledge bases with LLMs.

What It Does

Raw docs (GitHub, Azure Wiki, local files)
    ↓
Ingest → Extract & Store
    ↓
Compile → LLM generates summaries, concepts, relationships
    ↓
Wiki (markdown files in Git)
    ↓
Visualize → Graphs, timelines, clusters
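The flow above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `RawDoc`, `WikiPage`, `ingest`, `fake_llm_compile`, `link_related`, and `run_pipeline` are hypothetical names, not Docodex's API, and the "LLM" is replaced by a trivial stand-in so the example runs offline.

```python
from dataclasses import dataclass, field


@dataclass
class RawDoc:
    """A document as ingested, before any LLM processing (hypothetical type)."""
    source: str  # e.g. "github", "azure-wiki", "local"
    path: str
    text: str


@dataclass
class WikiPage:
    """A compiled page: summary, concepts, and related pages (hypothetical type)."""
    title: str
    summary: str
    concepts: list[str] = field(default_factory=list)
    related: list[str] = field(default_factory=list)


def ingest(files: dict[str, str], source: str = "local") -> list[RawDoc]:
    """Ingest step: turn path -> text pairs into RawDoc records, sorted by path."""
    return [RawDoc(source=source, path=p, text=t) for p, t in sorted(files.items())]


def fake_llm_compile(doc: RawDoc) -> WikiPage:
    """Stand-in for the LLM call (assumption): the first non-empty line becomes
    the summary and capitalized words become 'concepts'."""
    lines = [ln.strip() for ln in doc.text.splitlines() if ln.strip()]
    summary = lines[0] if lines else ""
    concepts = sorted({w.strip(".,:;") for w in doc.text.split() if w[:1].isupper()})
    return WikiPage(title=doc.path, summary=summary, concepts=concepts)


def link_related(pages: list[WikiPage]) -> None:
    """Relationship step: two pages are related if they share a concept."""
    for page in pages:
        page.related = sorted(
            other.title
            for other in pages
            if other is not page and set(other.concepts) & set(page.concepts)
        )


def run_pipeline(files: dict[str, str]) -> list[WikiPage]:
    """Ingest, compile each document, then link related pages."""
    pages = [fake_llm_compile(doc) for doc in ingest(files)]
    link_related(pages)
    return pages
```

Swapping `fake_llm_compile` for a function that prompts a real model would leave the rest of the flow unchanged, which is the point of treating the LLM as a compiler stage.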

Philosophy

  • Simple & Transparent: Human-readable markdown, not opaque databases
  • Git-Native: Version control for your knowledge
  • LLM as Compiler: AI extracts and organizes, humans review
  • No Vector DB: LLM-maintained indexes beat RAG at this scale (100-10K docs)
  • Compounding Intelligence: Every query enhances the wiki

Architecture

Clean Architecture with SOLID principles:

  • Domain Layer: Entities, protocols
  • Application Layer: Use cases (Ingest, Compile, Visualize)
  • Infrastructure Layer: Adapters (GitHub, Azure Wiki), LLM providers
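As a rough sketch of how these three layers might relate, the example below defines protocols in a domain layer, a use case that depends only on those protocols, and in-memory adapters standing in for real infrastructure. All names here (`Document`, `DocumentSource`, `DocumentStore`, `IngestUseCase`, `InMemorySource`, `InMemoryStore`) are assumptions for illustration and are not taken from the Docodex codebase.

```python
from dataclasses import dataclass
from typing import Protocol


# Domain layer: entities and protocols, no I/O.
@dataclass(frozen=True)
class Document:
    uri: str
    content: str


class DocumentSource(Protocol):
    """Anything that can yield documents (GitHub, Azure Wiki, local files)."""

    def fetch(self) -> list[Document]: ...


class DocumentStore(Protocol):
    """Anything that can persist a document."""

    def save(self, doc: Document) -> None: ...


# Application layer: a use case that depends only on the protocols above.
class IngestUseCase:
    def __init__(self, source: DocumentSource, store: DocumentStore) -> None:
        self._source = source
        self._store = store

    def run(self) -> int:
        """Fetch every document from the source, store it, and return the count."""
        docs = self._source.fetch()
        for doc in docs:
            self._store.save(doc)
        return len(docs)


# Infrastructure layer: concrete adapters (in-memory stand-ins here).
class InMemorySource:
    def __init__(self, files: dict[str, str]) -> None:
        self._files = files

    def fetch(self) -> list[Document]:
        return [Document(uri=u, content=c) for u, c in self._files.items()]


class InMemoryStore:
    def __init__(self) -> None:
        self.saved: dict[str, Document] = {}

    def save(self, doc: Document) -> None:
        self.saved[doc.uri] = doc
```

Because `IngestUseCase` only sees the protocols, a GitHub or Azure Wiki adapter could replace `InMemorySource` without touching the application layer.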

Phase 1 Goals

✅ Ingest from GitHub repos, Azure Wiki, local files
✅ LLM compilation (summaries, concepts, relationships, timelines)
✅ Git-based storage (markdown with YAML frontmatter)
✅ Visualizations (knowledge graph, timeline, clusters)
✅ Incremental updates (only recompile changed docs)
✅ View in Obsidian or any markdown viewer
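One common way to implement "only recompile changed docs" is to keep a manifest of content hashes and compare against it on each run. The sketch below assumes that approach; the source does not say how Docodex detects changes, and `content_hash` and `plan_recompile` are hypothetical helpers.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_recompile(
    docs: dict[str, str], manifest: dict[str, str]
) -> tuple[list[str], dict[str, str]]:
    """Compare current docs against the previous manifest (path -> hash).

    Returns the sorted paths that are new or changed, plus the manifest to
    persist after compiling. Paths missing from `docs` drop out of the new
    manifest, so deleted documents are forgotten.
    """
    new_manifest = {path: content_hash(text) for path, text in docs.items()}
    stale = sorted(p for p, h in new_manifest.items() if manifest.get(p) != h)
    return stale, new_manifest
```

In a Git-backed wiki the manifest could itself be a small file committed alongside the markdown pages, so the recompile decision stays reproducible from the repository alone.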

Project Structure

docodex/
├── .claude/              # Claude Code context and skills
├── docs/                 # Architecture and design docs
├── src/docodex/          # Main library code
├── tests/                # Test suite
└── examples/             # Usage examples

Getting Started

See QUICK_START.md for development setup.

Tech Stack

  • Python 3.11+
  • Pydantic v2 for models/DTOs
  • Anthropic SDK (with adapters for other LLMs)
  • Git for storage
  • Markdown + YAML frontmatter

License

MIT

Author

Rohit


Note: This project uses AI assistance for development, but all code, commits, and releases are authored solely by the human developer.
