
Daily Research Digest

AI-powered research paper digest with LLM-based ranking, quality scoring, and automatic scheduling. It fetches papers from Semantic Scholar and HuggingFace Daily Papers, attaching author-credibility signals along the way.

Features

  • Multi-source paper fetching: Semantic Scholar (with author h-index) and HuggingFace Daily Papers (with upvotes)
  • Quality-aware ranking: Combines LLM relevance scores with author h-index and community upvotes
  • Smart deduplication: Merges quality signals when papers appear in multiple sources (see the sketch after this list)
  • LLM-powered relevance: Rank papers using Anthropic, OpenAI, or Google models
  • Automatic scheduling: Background scheduler for daily digest generation
  • Email delivery: SMTP-based digest emails (GitHub Actions compatible)
  • JSON storage: Date-based organization for easy retrieval
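
To make the deduplication concrete, here is a minimal sketch of title-keyed merging. It is illustrative only: merge_papers is a hypothetical helper operating on plain dicts, and the package's actual matching logic may differ.

def merge_papers(paper_lists):
    # Hypothetical helper: papers are plain dicts keyed by normalized
    # title; the first-seen record wins, and missing quality signals
    # (h-indices, upvotes) are copied over from duplicates.
    merged = {}
    for papers in paper_lists:
        for paper in papers:
            key = paper["title"].strip().lower()
            if key not in merged:
                merged[key] = dict(paper)
                continue
            existing = merged[key]
            for field in ("author_h_indices", "huggingface_upvotes"):
                if existing.get(field) is None and paper.get(field) is not None:
                    existing[field] = paper[field]
    return list(merged.values())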

What's New in v0.2.0

  • Quality Scoring Pipeline: Papers are now ranked using a combined quality score that factors in:
    • LLM relevance score (1-10)
    • Author h-index (from Semantic Scholar)
    • Community upvotes (from HuggingFace)
  • Simplified Sources: Streamlined to Semantic Scholar + HuggingFace (Semantic Scholar indexes arXiv papers)
  • New Paper Fields: author_h_indices, huggingface_upvotes, quality_score

Quality Scoring Formula

final_score = llm_relevance × (1 + quality_boost)

quality_boost = 0.2 × h_factor + 0.2 × upvotes_factor
  • h_factor: Normalized average author h-index (0-1)
  • upvotes_factor: Normalized HuggingFace upvotes (0-1)
  • Maximum 40% boost from quality signals
  • Missing data = no boost (not a penalty)
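
A worked example of the formula in plain Python (the input values are made up; the package applies the formula via compute_quality_scores, described under API Reference):

# Worked example of the scoring formula above (values are illustrative).
llm_relevance = 8.0   # LLM relevance score (1-10)
h_factor = 0.75       # normalized average author h-index (0-1)
upvotes_factor = 0.5  # normalized HuggingFace upvotes (0-1)

quality_boost = 0.2 * h_factor + 0.2 * upvotes_factor  # 0.25 (capped at 0.4)
final_score = llm_relevance * (1 + quality_boost)      # 8.0 * 1.25 = 10.0
print(final_score)  # 10.0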

Installation

# Basic installation
pip install daily-research-digest

# With specific LLM provider
pip install "daily-research-digest[anthropic]"  # For Claude
pip install "daily-research-digest[openai]"     # For GPT
pip install "daily-research-digest[google]"     # For Gemini

# With all providers
pip install "daily-research-digest[all]"

# Development installation
pip install "daily-research-digest[dev]"

Quick Start

import asyncio
from pathlib import Path
from daily_research_digest import (
    DigestConfig,
    DigestGenerator,
    DigestStorage,
)

# Configure digest
config = DigestConfig(
    categories=["cs.AI", "cs.CL", "cs.LG"],  # For reference/filtering
    interests="AI agents, large language models, natural language processing",
    max_papers=50,
    top_n=10,
    llm_provider="anthropic",
    anthropic_api_key="your-api-key-here",
)

# Set up storage
storage = DigestStorage(Path("./digests"))

# Create generator
generator = DigestGenerator(storage)

# Generate digest
async def main():
    result = await generator.generate(config)
    print(f"Status: {result['status']}")

    if result['status'] == 'completed':
        digest = result['digest']
        print(f"Generated digest with {len(digest['papers'])} papers")

        for paper in digest['papers']:
            score = paper.get('quality_score') or paper['relevance_score']
            print(f"\n{score:.1f} - {paper['title']}")
            print(f"  {paper['link']}")

            # Show quality signals
            if paper.get('author_h_indices'):
                avg_h = sum(paper['author_h_indices']) / len(paper['author_h_indices'])
                print(f"  Avg h-index: {avg_h:.0f}")
            if paper.get('huggingface_upvotes'):
                print(f"  HF upvotes: {paper['huggingface_upvotes']}")

            print(f"  Reason: {paper['relevance_reason']}")

asyncio.run(main())

Scheduled Digests

import asyncio
from daily_research_digest import ArxivScheduler, DigestGenerator, DigestStorage, DigestConfig
from pathlib import Path

config = DigestConfig(
    categories=["cs.AI", "cs.LG"],
    interests="machine learning research",
    llm_provider="anthropic",
    anthropic_api_key="your-key",
)

storage = DigestStorage(Path("./digests"))
generator = DigestGenerator(storage)
scheduler = ArxivScheduler(generator, schedule_hour=6)  # 6 AM UTC

async def run_scheduler():
    # Start scheduler (runs daily at 6 AM UTC)
    scheduler.start(config)

    # Keep running
    try:
        while True:
            await asyncio.sleep(3600)
    except KeyboardInterrupt:
        scheduler.stop()

asyncio.run(run_scheduler())

GitHub Actions Cron Usage

Send daily digest emails using GitHub Actions. The digest runner supports:

  • Configurable time windows (24h, 48h, 7d)
  • Idempotent execution (won't re-send on workflow reruns; sketched below)
  • Multiple LLM providers
  • SMTP email delivery
  • Structured JSON logging
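
The runner's idempotency mechanism is internal, but a guard of the following shape illustrates the idea. This is a sketch under assumptions (a per-date marker file next to the stored digests), not the package's actual implementation:

from datetime import date
from pathlib import Path

def claim_todays_send(digest_dir: Path) -> bool:
    """Return True if this run should send (and record the claim)."""
    digest_dir.mkdir(parents=True, exist_ok=True)
    marker = digest_dir / f"{date.today().isoformat()}.sent"
    if marker.exists():
        return False  # a previous run already sent today's digest
    marker.touch()  # record this run before sending
    return True

if claim_todays_send(Path("./digests")):
    print("sending digest...")
else:
    print("already sent today; skipping rerun")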

Quick Setup

  1. Add repository secrets (Settings > Secrets and variables > Actions):

    Secret             Required  Description
    DIGEST_RECIPIENTS  Yes       Comma-separated email addresses
    SMTP_HOST          Yes       SMTP server hostname
    SMTP_USER          No        SMTP username
    SMTP_PASS          No        SMTP password
    ANTHROPIC_API_KEY  Yes*      Anthropic API key
    OPENAI_API_KEY     Alt       OpenAI API key
    GOOGLE_API_KEY     Alt       Google API key

    *Required when using the Anthropic provider (the default). Supply the OpenAI or Google key instead, together with the matching LLM_PROVIDER.

  2. Add repository variables (optional, for customization):

    Variable          Default                         Description
    DIGEST_INTERESTS  machine learning...             Research interests for search & ranking
    DIGEST_SUBJECT    Daily Research Digest - {date}  Email subject
    DIGEST_TZ         UTC                             Timezone
    DIGEST_WINDOW     24h                             Time window
    LLM_PROVIDER      anthropic                       LLM provider
  3. Enable the workflow: The .github/workflows/digest.yml file runs daily at 6 AM UTC.

Manual Trigger

You can manually trigger the digest from the Actions tab using "Run workflow".

CLI Usage

Run the digest sender locally:

# Set required environment variables
export DIGEST_RECIPIENTS="you@example.com"
export DIGEST_INTERESTS="machine learning, AI agents, transformers"
export SMTP_HOST="smtp.gmail.com"
export SMTP_USER="your-email@gmail.com"
export SMTP_PASS="your-app-password"
export ANTHROPIC_API_KEY="your-api-key"

# Run the digest sender
python -m daily_research_digest.digest_send

Environment Variables Reference

Variable           Required  Default                         Description
DIGEST_RECIPIENTS  Yes       -                               Comma-separated email addresses
DIGEST_INTERESTS   Yes       -                               Research interests for search & ranking
DIGEST_SUBJECT     No        Daily Research Digest - {date}  Email subject (supports {date})
DIGEST_FROM        No        noreply@example.com             Sender address
DIGEST_TZ          No        UTC                             Timezone for window calculation
DIGEST_WINDOW      No        24h                             Time window (24h, 1d, 48h, 7d)
DIGEST_MAX_PAPERS  No        50                              Max papers to fetch per source
DIGEST_TOP_N       No        10                              Top papers in digest
SMTP_HOST          Yes       -                               SMTP server hostname
SMTP_PORT          No        587                             SMTP server port
SMTP_USER          No        -                               SMTP username
SMTP_PASS          No        -                               SMTP password
SMTP_TLS           No        true                            Use TLS (true/false)
LLM_PROVIDER       No        anthropic                       anthropic, openai, or google
ANTHROPIC_API_KEY  *         -                               Required for anthropic provider
OPENAI_API_KEY     *         -                               Required for openai provider
GOOGLE_API_KEY     *         -                               Required for google provider
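
A DIGEST_WINDOW value is a number plus a unit (h for hours, d for days). A sketch of turning one into a cutoff timestamp (parse_window is a hypothetical helper; the runner's actual parser may differ):

import re
from datetime import datetime, timedelta, timezone

def parse_window(value: str) -> timedelta:
    # Accepts strings like "24h", "1d", "48h", "7d".
    match = re.fullmatch(r"(\d+)([hd])", value.strip().lower())
    if not match:
        raise ValueError(f"invalid window: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(hours=amount) if unit == "h" else timedelta(days=amount)

cutoff = datetime.now(timezone.utc) - parse_window("24h")
print(f"keep papers published after {cutoff.isoformat()}")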

Configuration

DigestConfig

  • categories: List of category codes for reference (e.g., ["cs.AI", "cs.LG"])
  • interests: Research interests description for search and ranking
  • max_papers: Maximum papers to fetch per source (default: 50)
  • top_n: Number of top papers to include in digest (default: 10)
  • llm_provider: One of "anthropic", "openai", or "google"
  • priority_authors: List of author names to boost in ranking
  • author_boost: Multiplier for priority author papers (default: 1.5)
  • API keys for your chosen provider
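
Putting these options together (the field names come from the list above; the values are illustrative):

from daily_research_digest import DigestConfig

config = DigestConfig(
    categories=["cs.AI", "cs.LG"],
    interests="agentic systems, retrieval-augmented generation",
    max_papers=50,
    top_n=10,
    llm_provider="anthropic",
    anthropic_api_key="your-api-key-here",
    priority_authors=["Jane Doe"],  # names to boost in ranking
    author_boost=1.5,               # multiplier for their papers
)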

LLM Providers

The package supports multiple LLM providers for paper ranking:

Provider   Model                    Package Required
anthropic  claude-3-haiku-20240307  langchain-anthropic
openai     gpt-3.5-turbo            langchain-openai
google     gemini-1.5-flash         langchain-google-genai

Each provider defaults to a fast, low-cost model, which keeps per-paper ranking inexpensive.
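
Switching providers is just a configuration change. For example, with the openai extra installed (the openai_api_key field is assumed here by symmetry with the anthropic_api_key shown earlier):

from daily_research_digest import DigestConfig

config = DigestConfig(
    categories=["cs.CL"],
    interests="speech and language models",
    llm_provider="openai",          # or "anthropic", "google"
    openai_api_key="your-api-key",  # key for the chosen provider
)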

Digest Format

Digests are saved as JSON files with the following structure:

{
  "date": "2024-01-15",
  "generated_at": "2024-01-15T06:00:00Z",
  "categories": ["cs.AI", "cs.CL"],
  "interests": "AI agents, LLMs",
  "total_papers_fetched": 50,
  "papers": [
    {
      "arxiv_id": "2401.12345",
      "title": "Paper Title",
      "abstract": "Abstract text...",
      "authors": ["Author One", "Author Two"],
      "categories": ["cs.AI", "cs.CL"],
      "published": "2024-01-14T00:00:00Z",
      "updated": "2024-01-14T00:00:00Z",
      "link": "https://arxiv.org/abs/2401.12345",
      "relevance_score": 9.0,
      "relevance_reason": "Directly addresses AI agent architectures",
      "author_h_indices": [45, 32, 28],
      "huggingface_upvotes": 156,
      "quality_score": 11.2
    }
  ]
}
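
Because digests are plain JSON, they can also be inspected without the library (the digests/2024-01-15.json path below assumes the date-based layout; DigestStorage.get_digest is the supported interface):

import json
from pathlib import Path

# Path is an assumption based on the date-based layout described above.
digest = json.loads(Path("digests/2024-01-15.json").read_text())

print(digest["date"], "-", digest["total_papers_fetched"], "papers fetched")
for paper in digest["papers"]:
    score = paper.get("quality_score") or paper["relevance_score"]
    print(f"{score:.1f}  {paper['title']}")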

Paper Fields

Field                Type      Description
arxiv_id             string    Paper identifier (arXiv ID or source-specific)
title                string    Paper title
abstract             string    Paper abstract
authors              string[]  List of author names
categories           string[]  Paper categories/fields
published            string    Publication date (ISO format)
updated              string    Last-updated date (ISO format)
link                 string    URL to paper
relevance_score      float     LLM-assigned relevance (1-10)
relevance_reason     string    LLM explanation for score
author_h_indices     int[]     h-index for each author (from Semantic Scholar)
huggingface_upvotes  int       Community upvotes (from HuggingFace)
quality_score        float     Final combined score

API Reference

DigestGenerator

Generates paper digests using the quality scoring pipeline.

storage = DigestStorage(Path("./digests"))
generator = DigestGenerator(storage)
result = await generator.generate(config)

Quality Scoring

Compute quality scores manually:

from daily_research_digest import compute_quality_scores

# papers is a list of Paper objects with relevance_score set
compute_quality_scores(papers)

# Each paper now has quality_score populated
for paper in papers:
    print(f"{paper.quality_score:.1f} - {paper.title}")

SemanticScholarClient

Fetches papers from Semantic Scholar with author h-index.

from daily_research_digest import SemanticScholarClient

client = SemanticScholarClient(api_key="optional-api-key")
papers = await client.fetch_papers(
    query="large language models",
    limit=50,
    fields_of_study=["Computer Science"]
)

for paper in papers:
    if paper.author_h_indices:
        avg_h = sum(paper.author_h_indices) / len(paper.author_h_indices)
        print(f"{paper.title} - avg h-index: {avg_h:.0f}")

HuggingFaceClient

Fetches trending papers from HuggingFace Daily Papers with upvotes.

from daily_research_digest import HuggingFaceClient

client = HuggingFaceClient()
papers = await client.fetch_papers(limit=50)

for paper in papers:
    if paper.huggingface_upvotes:
        print(f"{paper.title} - {paper.huggingface_upvotes} upvotes")

DigestStorage

Manages digest persistence.

storage = DigestStorage(Path("./digests"))
storage.save_digest(digest)
digest = storage.get_digest("2024-01-15")
dates = storage.list_digests(limit=30)

ArxivScheduler

Schedules automated digest generation.

scheduler = ArxivScheduler(generator, schedule_hour=6)
scheduler.start(config)
scheduler.stop()

Development

# Clone repository
git clone https://github.com/LevRoz630/daily-research-digest.git
cd daily-research-digest

# Install with dev dependencies
pip install -e ".[dev,all]"

# Run tests
pytest

# Format code
black daily_research_digest tests

# Lint
ruff check daily_research_digest tests

# Type check
mypy daily_research_digest

License

MIT License - see LICENSE file for details.
