A high-performance Python application template

OmniSkill

Requires Python 3.12+. License: Apache-2.0.

omni-skill

A super skill generator that turns CSV and Markdown datasets into ready-to-use Agentic-RAG skills with a single command.

Overview

OmniSkill analyzes your dataset directory and generates:

  • SKILL.md — Skill specification document that tells LLMs how to use the knowledge base
  • search.py — Standalone Python script for BM25-based retrieval against the dataset
  • datasets/ — Symlinked source data for the generated skill

It also provides a CLI for manually creating skill scaffolding and searching knowledge bases.
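
For readers unfamiliar with BM25, the ranking idea behind the generated search.py can be sketched in a few lines. This is a minimal illustration of the standard Okapi BM25 formula with whitespace tokenization, not OmniSkill's actual implementation; the function name `bm25_scores` and the parameter defaults `k1=1.5` and `b=0.75` are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs if n_docs else 0.0

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            # Smoothed IDF; the "+ 1.0" keeps the value non-negative.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are penalized.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Rare query terms get a higher IDF weight, and repeated occurrences of a term saturate rather than grow linearly, which is why a short document that matches two query terms can outrank one that repeats a single term.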

Installation

Using uv (Recommended)

uv add omniskill

Using pip

pip install omniskill

From Source

git clone https://github.com/longcipher/omni-skill.git
cd omni-skill
uv sync --all-groups

Quick Start

Generate a Skill from a Dataset

Point OmniSkill at any directory containing CSV and/or Markdown files:

omniskill generate examples/backend-api-master

This analyzes the dataset files and produces a complete skill under skills/<dataset-name>/:

skills/backend-api-master/
  SKILL.md      # Skill specification for LLMs
  search.py     # Standalone search script
  datasets/     # Symlink to source data
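
A quick way to confirm that a generated skill has the layout shown above is to check for the three entries. The sketch below is a hypothetical helper (`missing_skill_entries` is not part of OmniSkill) that relies only on the entry names listed in the tree.

```python
from pathlib import Path

# Entries taken from the directory tree shown above.
EXPECTED_ENTRIES = ("SKILL.md", "search.py", "datasets")


def missing_skill_entries(skill_dir):
    """Return the expected entries that are absent from skill_dir.

    Path.exists() follows symlinks, so a dangling datasets/ symlink
    is also reported as missing.
    """
    root = Path(skill_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

An empty list means all three entries are present.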

Use the Generated Skill

The generated search.py can be run directly:

python skills/backend-api-master/search.py "API design best practices"

Or use the CLI:

omniskill search "API design best practices" --skill-dir skills/backend-api-master
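
To call the CLI from another Python program, one option is to build the argument list explicitly and run it with `subprocess`. The sketch below uses only flags documented in the CLI Reference (`--skill-dir`, `--format`, `--limit`); the helper names are hypothetical, and it assumes that results are written to standard output.

```python
import subprocess


def build_search_command(query, skill_dir, limit=10, output_format="xml"):
    """Build the argv list for `omniskill search` from documented flags."""
    return [
        "omniskill", "search", query,
        "--skill-dir", skill_dir,
        "--format", output_format,
        "--limit", str(limit),
    ]


def run_search(query, skill_dir):
    """Run the search and return captured stdout.

    Assumes the CLI prints results to stdout. Raises
    subprocess.CalledProcessError on a non-zero exit status.
    """
    cmd = build_search_command(query, skill_dir)
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout
```

Passing the command as a list avoids shell quoting problems when the query contains spaces or quotes.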

Custom Options

# Custom skill name and output directory
omniskill generate my-datasets/ --name my-skill --output out/my-skill/

# Verbose mode shows dataset analysis details
omniskill generate my-datasets/ --verbose

Architecture

graph TD
    subgraph "1. Dataset Input"
        CSV[CSV Files] --> GEN[omniskill generate]
        MD[Markdown Files] --> GEN
    end

    subgraph "2. Generator"
        GEN --> ANALYZE[Analyze Dataset]
        ANALYZE --> GEN_SCRIPT[Generate search.py]
        ANALYZE --> GEN_MD[Generate SKILL.md]
        ANALYZE --> LINK[Symlink datasets/]
    end

    subgraph "3. Generated Skill"
        GEN_SCRIPT --> SEARCH_PY[search.py]
        GEN_MD --> SKILL_MD[SKILL.md]
        LINK --> DATASETS[datasets/]
    end

    subgraph "4. Runtime"
        SEARCH_PY --> ENGINE[SearchEngine]
        ENGINE --> INDEXER[CSV / MD Indexer]
        ENGINE --> BM25[BM25 Searcher]
        BM25 --> ASM[PromptAssembler]
        ASM --> |XML / Markdown / llms.txt| OUTPUT[Formatted Context]
    end

    SKILL_MD -. "instructs LLM" .-> SEARCH_PY

CLI Reference

generate — Generate a Skill from Datasets

omniskill generate <dataset-dir> [options]

Arguments:

  • dataset-dir — Path to a directory containing CSV and/or Markdown files

Options:

  • --name, -n — Skill name (defaults to the dataset directory name)
  • --output, -o — Output directory (defaults to skills/<skill-name>/)
  • --verbose, -v — Show dataset analysis details

Example:

omniskill generate data/api-specs/ --name api-assistant --output skills/api-assistant/
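
The documented defaults for --name and --output can be expressed as a small function. This is a sketch of the rule as stated in the options list, not the CLI's actual code; `resolve_generate_defaults` is a hypothetical name, and the real command may normalize paths differently.

```python
from pathlib import Path


def resolve_generate_defaults(dataset_dir, name=None, output=None):
    """Mirror the documented defaults for --name and --output."""
    # Path("data/api-specs/").name is "api-specs"; the trailing slash is ignored.
    skill_name = name or Path(dataset_dir).name
    # Without --output, the skill lands under skills/<skill-name>/.
    output_dir = Path(output) if output else Path("skills") / skill_name
    return skill_name, output_dir
```

For example, `resolve_generate_defaults("data/api-specs/")` yields the name `api-specs` and the output directory `skills/api-specs`.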

create — Create Skill Scaffolding

Create an empty skill directory with template files:

omniskill create <skill-name> [--force]

Options:

  • --force, -f — Overwrite existing skill directory

search — Search a Knowledge Base

omniskill search <query> --skill-dir <path> [options]

Options:

  • --skill-dir, -d — Path to the skill directory (required)
  • --format, -f — Output format: xml, markdown (default: xml)
  • --limit, -l — Maximum number of results (default: 10)
  • --type, -t — Filter by document type: csv, markdown
  • --tag — Filter by tag (AND logic, repeatable)
  • --metadata — Include BM25 scores in output
  • --verbose, -v — Enable verbose output
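
The AND logic of repeated --tag flags means a result is kept only if it carries every requested tag. A minimal sketch of that rule follows; the document representation (a dict with a `tags` list) and the function name are assumptions made for the example, not OmniSkill's internal data model.

```python
def filter_by_tags(docs, required_tags):
    """Keep documents that carry every required tag (AND logic).

    Each document is assumed to be a dict with a "tags" list.
    """
    required = set(required_tags)
    # Subset test: every required tag must appear in the document's tags.
    return [doc for doc in docs if required <= set(doc.get("tags", []))]
```

With no required tags the subset test is always true, so every document passes through unchanged.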

Python API

Generate a Skill Programmatically

from omniskill.core.generator import generate_skill

analysis = generate_skill(
    dataset_dir="data/my-datasets",
    skill_name="my-skill",
    output_dir="skills/my-skill",
)

print(f"Generated {analysis.skill_name} with {analysis.total_documents} documents")

SearchEngine

Index and search directories of CSV/Markdown files:

from omniskill.core.engine import SearchEngine
from omniskill.core.assembler import OutputFormat, PromptAssembler

engine = SearchEngine()
engine.index_directory("skills/my-skill/datasets")

results = engine.search("API design", limit=10)

assembler = PromptAssembler()
print(assembler.assemble(results, output_format=OutputFormat.XML))

Dataset Analysis

Analyze a dataset directory without generating files:

from omniskill.core.generator import analyze_dataset

analysis = analyze_dataset("data/my-datasets")
print(f"CSV files: {len(analysis.csv_files)}")
print(f"Markdown files: {len(analysis.markdown_files)}")
print(f"Total documents: ~{analysis.total_documents}")
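
To get a rough idea of what such an analysis looks at, the file discovery step can be approximated with the standard library alone. This sketch is not `analyze_dataset`; it assumes a recursive scan and the extensions `.csv` and `.md`, either of which may differ from what OmniSkill does.

```python
from pathlib import Path


def count_dataset_files(dataset_dir):
    """Count CSV and Markdown files under dataset_dir, recursively.

    Hypothetical stand-in for illustration; the extensions and the
    recursive scan are assumptions.
    """
    root = Path(dataset_dir)
    return {
        "csv": sum(1 for p in root.rglob("*.csv") if p.is_file()),
        "markdown": sum(1 for p in root.rglob("*.md") if p.is_file()),
    }
```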

Output Formats

OmniSkill supports three output formats for assembled search results:

Format    CLI Flag                Description
XML       --format xml            Structured <context_injection> with <rules> and <reference> sections
Markdown  --format markdown       Human-readable sections with source attribution
llms.txt  (in generated scripts)  Follows the llms.txt spec for LLM consumption
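
The XML format wraps results in a `<context_injection>` element containing `<rules>` and `<reference>` sections. The sketch below shows one way such a document could be built with the standard library; the child element `<rule>`, the `source` attribute, and the function name are assumptions for illustration, and the real PromptAssembler output may be structured differently.

```python
import xml.etree.ElementTree as ET


def assemble_xml(rules, references):
    """Build a <context_injection> document from rules and references.

    rules: list of strings.
    references: list of (source, text) pairs.
    The <rule> element and "source" attribute are hypothetical.
    """
    root = ET.Element("context_injection")
    rules_el = ET.SubElement(root, "rules")
    for rule in rules:
        ET.SubElement(rules_el, "rule").text = rule
    for source, text in references:
        ref = ET.SubElement(root, "reference", {"source": source})
        ref.text = text  # ElementTree escapes <, > and & when serializing
    return ET.tostring(root, encoding="unicode")
```

Building the tree with ElementTree rather than string concatenation ensures that retrieved text containing characters such as `<` or `&` is escaped and the output stays well-formed.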

Contributing

Development Setup

git clone https://github.com/longcipher/omni-skill.git
cd omni-skill
uv sync --all-groups

Common Commands

just format      # Format code
just lint        # Run linter
just test        # Run unit tests
just bdd         # Run BDD tests
just test-all    # Run all tests
just build       # Build package
just typecheck   # Run type checker

Adding New Features

  1. Write a failing Gherkin scenario in features/*.feature
  2. Write a failing pytest test for the inner domain logic
  3. Implement the feature
  4. Re-run just test and just bdd to verify

License

Apache-2.0
