AutoRubric

A Python library encapsulating best practices for rubric-based evaluation of LLM/VLM outputs against weighted criteria, using LLM-as-a-judge.

If you use AutoRubric in your research, please cite:

  @misc{rao2026autorubric,
        title={Autorubric: A Unified Framework for Rubric-Based LLM Evaluation},
        author={Delip Rao and Chris Callison-Burch},
        year={2026},
        eprint={2603.00077},
        archivePrefix={arXiv},
        primaryClass={cs.CL},
        url={https://arxiv.org/abs/2603.00077},
  }

Installation

pip install autorubric

Quick Example

import asyncio
from autorubric import Rubric, LLMConfig
from autorubric.graders import CriterionGrader

async def main():
    grader = CriterionGrader(llm_config=LLMConfig(model="openai/gpt-5.1-mini"))

    rubric = Rubric.from_dict([
        {"weight": 10.0, "requirement": "States NMC cell-level energy density in the 250-300 Wh/kg range"},
        {"weight": 8.0, "requirement": "Identifies LFP thermal runaway threshold (~270°C) as higher than NMC (~210°C)"},
        {"weight": 6.0, "requirement": "States LFP cycle life advantage (2000-5000 cycles vs 1000-2000 for NMC)"},
        {"weight": -15.0, "requirement": "Incorrectly claims LFP has higher gravimetric energy density than NMC"}
    ])

    result = await rubric.grade(
        to_grade="""NMC cathodes (LiNixMnyCozO2) achieve 250-280 Wh/kg at the cell level,
        while LFP (LiFePO4) typically reaches 150-205 Wh/kg. However, LFP offers superior
        thermal stability with decomposition onset at ~270°C compared to ~210°C for NMC,
        and delivers 2000-5000 charge cycles versus 1000-2000 for NMC.""",
        grader=grader,
        query="Compare NMC and LFP cathode materials for EV battery applications.",
    )

    print(f"Score: {result.score:.2f}")
    for criterion in result.report:
        print(f"  [{criterion.final_verdict}] {criterion.criterion.requirement}")

asyncio.run(main())
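The README does not spell out how the final score is aggregated from per-criterion verdicts. As a rough illustration only (an assumption, not AutoRubric's actual implementation), a common scheme sums the weights of met criteria and normalizes by the total positive weight, so that negative-weight criteria act as penalties:

```python
# Illustrative sketch of weighted rubric scoring -- an assumption,
# not AutoRubric's actual aggregation code.

def weighted_score(criteria, verdicts):
    """criteria: list of (weight, requirement) pairs;
    verdicts: list of booleans (True = criterion met)."""
    max_positive = sum(w for w, _ in criteria if w > 0)
    earned = sum(w for (w, _), met in zip(criteria, verdicts) if met)
    # Clamp so penalty criteria cannot push the score below zero.
    return max(0.0, earned / max_positive)

criteria = [
    (10.0, "States NMC energy density range"),
    (8.0, "Identifies thermal runaway thresholds"),
    (6.0, "States LFP cycle life advantage"),
    (-15.0, "Incorrectly claims LFP has higher energy density"),
]

# All positive criteria met, penalty criterion not triggered:
print(weighted_score(criteria, [True, True, True, False]))  # → 1.0
# Penalty triggered: (10 + 8 + 6 - 15) / 24 = 0.375
print(weighted_score(criteria, [True, True, True, True]))   # → 0.375
```

Under this scheme the example rubric above caps at 1.0, and tripping the -15.0 misconception criterion costs more than any single positive criterion is worth.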

Documentation

Full documentation, API reference, and a cookbook with several dozen recipes are available at autorubric.org.

| Resource | Link |
| --- | --- |
| Project site | autorubric.org |
| API reference | autorubric.org/docs/api |
| Cookbook | autorubric.org/docs/cookbook |

Features

| Feature | Description |
| --- | --- |
| Weighted criteria | Positive and negative weights with explicit requirements |
| Per-criterion explanations | Every verdict includes the judge's reasoning |
| 100+ LLM providers | OpenAI, Anthropic, Google, Azure, Groq, Ollama, and more via LiteLLM |
| Ensemble judging | Combine multiple LLM judges with configurable aggregation strategies |
| Few-shot calibration | Provide labeled examples to improve grading consistency |
| Multi-choice criteria | Ordinal and nominal scales beyond binary met/unmet verdicts |
| Batch evaluation | High-throughput EvalRunner with checkpointing and resumption |
| Metrics & validation | Agreement metrics, bootstrap confidence intervals, distribution analysis |
| Length penalty | Configurable penalty for overly long responses |
| Thinking/reasoning support | Budget-controlled extended thinking for supported models |
| Response caching | Disk-based caching to avoid redundant LLM calls |
| Dataset support | Structured datasets with per-item rubrics, prompts, and ground truth |
| YAML configuration | Define rubrics, LLM configs, and datasets in YAML |
| Meta-rubric evaluation | Evaluate and automatically improve rubric quality |
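The YAML-configuration feature suggests rubrics can also be defined declaratively instead of via `Rubric.from_dict`. A hypothetical file layout, mirroring the `weight`/`requirement` keys from the quick example (the surrounding structure is an assumption, not the library's documented schema):

```yaml
# Hypothetical rubric file: only the `weight` and `requirement` keys are
# taken from the quick example above; the rest is illustrative.
criteria:
  - weight: 10.0
    requirement: States NMC cell-level energy density in the 250-300 Wh/kg range
  - weight: -15.0
    requirement: Incorrectly claims LFP has higher gravimetric energy density than NMC
```

See the cookbook at autorubric.org for the actual YAML schema used by the library.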

License

MIT License - see LICENSE file for details.

Acknowledgments

This research was developed with funding from the Defense Advanced Research Projects Agency's (DARPA) SciFy program (Agreement No. HR00112520300). The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
