A Python library that uses LLMs to compile scattered docs into a unified knowledge base. It crawls multiple documentation sources and synthesizes them into a single coherent reference, wherever your docs live.
Docodex
An LLM-powered library that compiles scattered documentation into a unified, queryable knowledge base.
Inspired by Andrej Karpathy's approach to building personal knowledge bases with LLMs.
What It Does
Raw docs (GitHub, Azure Wiki, local files)
↓
Ingest → Extract & Store
↓
Compile → LLM generates summaries, concepts, relationships
↓
Wiki (markdown files in Git)
↓
Visualize → Graphs, timelines, clusters
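The flow above can be sketched in a few lines of Python. This is only an illustration of the ingest → compile → wiki stages, not Docodex's actual API: `RawDoc`, `CompiledDoc`, `compile_doc`, and `render_wiki_page` are hypothetical names, and the "LLM" step is replaced by a trivial text heuristic so the example runs offline.

```python
from dataclasses import dataclass, field


@dataclass
class RawDoc:
    """A document as ingested, before any LLM processing (hypothetical type)."""
    source: str  # e.g. "github", "azure-wiki", "local"
    path: str
    text: str


@dataclass
class CompiledDoc:
    """What the compile step produces for one raw document (hypothetical type)."""
    title: str
    summary: str
    concepts: list[str] = field(default_factory=list)


def compile_doc(doc: RawDoc) -> CompiledDoc:
    """Stand-in for the LLM call.

    The first non-empty line becomes the title, the second the summary,
    and words in the summary that start with an uppercase letter become concepts.
    """
    lines = [ln.strip() for ln in doc.text.splitlines() if ln.strip()]
    title = lines[0].lstrip("# ") if lines else doc.path
    summary = lines[1] if len(lines) > 1 else ""
    concepts = sorted({w.strip(".,") for w in summary.split() if w[:1].isupper()})
    return CompiledDoc(title=title, summary=summary, concepts=concepts)


def render_wiki_page(doc: RawDoc, compiled: CompiledDoc) -> str:
    """Render a markdown page with a small YAML frontmatter header.

    No YAML escaping is done here; real code would use a YAML library.
    """
    front = [
        "---",
        f"title: {compiled.title}",
        f"source: {doc.source}",
        f"path: {doc.path}",
    ]
    if compiled.concepts:
        front.append("concepts:")
        front.extend(f"  - {c}" for c in compiled.concepts)
    else:
        front.append("concepts: []")
    front.append("---")
    return "\n".join(front) + "\n\n" + compiled.summary + "\n"


if __name__ == "__main__":
    raw = RawDoc("local", "docs/deploy.md", "# Deploy Guide\nDeploys use Terraform and Kubernetes.\n")
    print(render_wiki_page(raw, compile_doc(raw)))
```

Because the output is plain markdown with frontmatter, it can be committed to Git and opened in any markdown viewer, which matches the storage model described in this README.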
Philosophy
- Simple & Transparent: Human-readable markdown, not opaque databases
- Git-Native: Version control for your knowledge
- LLM as Compiler: AI extracts and organizes, humans review
- No Vector DB: LLM-maintained indexes beat RAG at this scale (100-10K docs)
- Compounding Intelligence: Every query enhances the wiki
Architecture
Clean Architecture with SOLID principles:
- Domain Layer: Entities, protocols
- Application Layer: Use cases (Ingest, Compile, Visualize)
- Infrastructure Layer: Adapters (GitHub, Azure Wiki), LLM providers
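As a rough illustration of how these three layers might fit together, the sketch below uses `typing.Protocol` for the domain interfaces. All class and method names (`Document`, `DocumentSource`, `LLMProvider`, `CompileDocs`, `InMemorySource`, `EchoLLM`) are assumptions made for this example and are not taken from the Docodex source. The adapters are in-memory fakes, so no GitHub, Azure Wiki, or LLM access is needed.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


# Domain layer: entities and protocols, with no I/O.
@dataclass(frozen=True)
class Document:
    uri: str
    text: str


class DocumentSource(Protocol):
    def fetch(self) -> Iterable[Document]: ...


class LLMProvider(Protocol):
    def complete(self, prompt: str) -> str: ...


# Application layer: a use case that depends only on the protocols above.
class CompileDocs:
    def __init__(self, source: DocumentSource, llm: LLMProvider) -> None:
        self._source = source
        self._llm = llm

    def run(self) -> dict[str, str]:
        """Return a mapping of document URI to the provider's summary."""
        return {
            doc.uri: self._llm.complete(f"Summarize:\n{doc.text}")
            for doc in self._source.fetch()
        }


# Infrastructure layer: concrete adapters. These are fakes for illustration;
# real adapters would talk to GitHub, Azure Wiki, or an LLM SDK.
class InMemorySource:
    def __init__(self, docs: list[Document]) -> None:
        self._docs = docs

    def fetch(self) -> Iterable[Document]:
        return iter(self._docs)


class EchoLLM:
    """Fake provider: returns the first line that follows the prompt header."""

    def complete(self, prompt: str) -> str:
        lines = prompt.splitlines()
        return lines[1] if len(lines) > 1 else ""


if __name__ == "__main__":
    use_case = CompileDocs(
        InMemorySource([Document("a.md", "Hello world\nmore text")]),
        EchoLLM(),
    )
    print(use_case.run())
```

The point of the arrangement is that `CompileDocs` never imports a concrete adapter, so swapping one LLM provider or document source for another should not require touching the use case.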
Phase 1 Goals
✅ Ingest from GitHub repos, Azure Wiki, local files
✅ LLM compilation (summaries, concepts, relationships, timelines)
✅ Git-based storage (markdown with YAML frontmatter)
✅ Visualizations (knowledge graph, timeline, clusters)
✅ Incremental updates (only recompile changed docs)
✅ View in Obsidian or any markdown viewer
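The incremental-update goal can be approached with content hashing. The README does not say how Docodex detects changes, so the following is a minimal sketch under that assumption: a JSON manifest (a hypothetical file) records a SHA-256 digest per document, and only documents whose digest differs are recompiled. The function names are illustrative.

```python
import hashlib
import json
from pathlib import Path


def content_hash(text: str) -> str:
    """SHA-256 hex digest of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_changed(docs: dict[str, str], manifest: dict[str, str]) -> list[str]:
    """Return paths that are new or whose hash differs from the manifest."""
    return sorted(
        path for path, text in docs.items()
        if manifest.get(path) != content_hash(text)
    )


def update_manifest(docs: dict[str, str], manifest_path: Path) -> list[str]:
    """Compare docs to the stored manifest, rewrite it, and return changed paths.

    The caller would recompile only the returned paths.
    """
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    changed = find_changed(docs, manifest)
    new_manifest = {path: content_hash(text) for path, text in docs.items()}
    manifest_path.write_text(json.dumps(new_manifest, indent=2, sort_keys=True))
    return changed
```

Keeping the manifest as a sorted JSON file means it diffs cleanly when committed next to the wiki in Git.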
Project Structure
docodex/
├── .claude/ # Claude Code context and skills
├── docs/ # Architecture and design docs
├── src/docodex/ # Main library code
├── tests/ # Test suite
└── examples/ # Usage examples
Getting Started
See QUICK_START.md for development setup.
Documentation
Tech Stack
- Python 3.11+
- Pydantic v2 for models/DTOs
- Anthropic SDK (with adapters for other LLMs)
- Git for storage
- Markdown + YAML frontmatter
License
MIT
Author
Rohit
Note: This project uses AI assistance for development, but all code, commits, and releases are authored solely by the human developer.
File details
Details for the file docodex-0.1.0.tar.gz.
File metadata
- Download URL: docodex-0.1.0.tar.gz
- Upload date:
- Size: 26.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7f71b701d6ece104e510b186ccab95eb1a76cafd302af3a60f90e98e83f2cbd8 |
| MD5 | c9e332a2c62ba0205e6491316e29bb90 |
| BLAKE2b-256 | 0198fab6938c249f871ff3b7e315fd67ce562ef5686b49d4a14e9a67a89c54c1 |
File details
Details for the file docodex-0.1.0-py3-none-any.whl.
File metadata
- Download URL: docodex-0.1.0-py3-none-any.whl
- Upload date:
- Size: 6.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 68d27bdb24e5884dfb74953c5f638dd3ace9b092de3d28d5a4f0351593ba5a6f |
| MD5 | 9de4625c05009593cb4cfd41bd9c720e |
| BLAKE2b-256 | 10b765125ca145ec60cba9b80a57d8840f846c2f0c27d8a6a071641996a403b9 |