Tracking daily arXiv updates and generating intelligent summaries with LLMs.

arxiv-daily

AI-powered arXiv research assistant - Beautiful terminal interface for tracking arXiv preprints and generating intelligent summaries with LLMs.

Build your own personal RAG (Retrieval-Augmented Generation) knowledge base - track daily papers, generate structured summaries with rich metadata, and export them to Markdown for seamless integration with vector databases, semantic search engines, and note-taking workflows like Obsidian.

Key capabilities:

  • Daily arXiv Updates: Fetch and filter the latest preprints from any arXiv channel.
  • AI-Powered Summaries: Generate structured, organized summaries using LLMs.
  • Paper Metadata: Fetch detailed metadata for any arXiv paper.
  • Beautiful Output: Colorful terminal output, syntax highlighting, and progress bars using the Rich library.
  • Smart Filtering: Filter by arXiv categories and channels for focused research.
  • Obsidian Integration: Export summaries as Markdown with frontmatter for knowledge management.

Quick Start

Install

Install the package from PyPI:

pip install arxiv-daily

Or install from source for development:

git clone https://github.com/GZU-MuTian/arxiv-daily.git
cd arxiv-daily
pip install -e .

Environment Setup (Recommended)

We recommend configuring environment variables. They remove the need for repetitive CLI flags and keep credentials out of your shell history.

# LLM Configuration (required)
export DEEPSEEK_API_KEY="your-deepseek-api-key-here"

# Default arXiv categories (comma-separated)
export ARXIV_CATEGORY="cs.AI,astro-ph.HE,hep-ph"

# Default output directory for summaries (optional)
export ARXIV_SUMMARIZE_OUTPUT="/path/to/your/obsidian/vault"

# Default output directory for knowledge graph concepts (optional)
export ARXIV_EXTRACTOR_OUTPUT="/path/to/your/obsidian/vault/concepts"
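
To illustrate how these variables could be consumed, the sketch below reads them from the environment and splits the comma-separated category list. The helper name `load_settings` and the shape of the returned dictionary are assumptions made for this example. The package itself may read these values differently.

```python
import os


def load_settings(env=None):
    """Collect the arxiv-daily variables from a mapping (defaults to os.environ).

    Hypothetical helper for illustration; not part of the package's API.
    """
    env = os.environ if env is None else env
    raw_categories = env.get("ARXIV_CATEGORY", "")
    # Turn "cs.AI, astro-ph.HE" into a clean list, ignoring blanks and spaces.
    categories = [c.strip() for c in raw_categories.split(",") if c.strip()]
    return {
        "api_key_set": bool(env.get("DEEPSEEK_API_KEY")),
        "categories": categories,
        "summarize_output": env.get("ARXIV_SUMMARIZE_OUTPUT"),
        "extractor_output": env.get("ARXIV_EXTRACTOR_OUTPUT"),
    }
```

Calling `load_settings()` in a shell where the exports above are active is a quick way to confirm that the variables are visible to Python before running the CLI.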

Usage Guide

Command-Line Interface

arxiv-daily includes a CLI named arXiv.

Tip: Run arXiv --help for an overview, or arXiv <command> --help for command-specific options.

Fetch the latest preprints from any arXiv channel with beautiful terminal formatting:

# Get the latest papers from the default channel (Astrophysics)
arXiv new

# Specific channel (e.g., Computer Science - AI)
arXiv new --channel cs.AI

# Filter by multiple categories
arXiv new --channel astro-ph --category astro-ph.HE,astro-ph.IM
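
Category filtering of this kind can be pictured as a set intersection between the requested categories and the categories attached to each paper. The following is a minimal sketch under that assumption. The function name `filter_by_category` and the dictionary layout of a paper record are hypothetical and do not describe the package's internal data model.

```python
def filter_by_category(papers, wanted):
    """Keep papers that carry at least one of the requested categories.

    `papers` is a list of dicts with a "categories" list (assumed layout);
    `wanted` is a comma-separated string such as "astro-ph.HE,astro-ph.IM".
    """
    wanted_set = {w.strip() for w in wanted.split(",") if w.strip()}
    if not wanted_set:
        # No filter requested: return every paper unchanged.
        return list(papers)
    return [p for p in papers if wanted_set & set(p["categories"])]
```

A paper listed under both `astro-ph.HE` and `hep-ph` is kept when either category is requested, which matches how cross-listed preprints are usually treated.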

Fetch Paper Metadata:

# Get metadata for a specific paper
arXiv meta 2401.12345

# Supports various input formats
arXiv meta arXiv:2401.12345
arXiv meta arXiv:2401.12345v1
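
The accepted input formats above differ only in an optional `arXiv:` prefix and an optional version suffix. One way to normalize them is a single regular expression, sketched below. `normalize_arxiv_id` is a hypothetical helper, not a function exposed by the package, and it only covers new-style identifiers (`YYMM.NNNN` or `YYMM.NNNNN`). Whether the CLI also accepts old-style identifiers is not stated here.

```python
import re

# Optional "arXiv:" prefix, a new-style identifier, and an optional version.
_ID_PATTERN = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(v\d+)?$", re.IGNORECASE)


def normalize_arxiv_id(text):
    """Return (base_id, version) where version is None if it was not given."""
    match = _ID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognized arXiv identifier: {text!r}")
    base_id, version = match.groups()
    return base_id, version
```

With this helper, `2401.12345`, `arXiv:2401.12345`, and `arXiv:2401.12345v1` all resolve to the same base identifier, with the version kept separately when present.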

Generate AI Summaries:

# Basic summary with default model (DeepSeek)
arXiv summarize 2401.12345

# Specify model and provider
arXiv summarize 2401.12345 --model deepseek-chat --provider deepseek

# Short form
arXiv summarize 2401.12345 -m deepseek-chat -p deepseek -t 0.5

# Save to the default directory (used when ARXIV_SUMMARIZE_OUTPUT is set)
arXiv summarize 2401.12345

# Save to specific directory
arXiv summarize 2401.12345 -o /path/to/output

Extract Knowledge Graph Relationships:

# Basic extraction with default model (DeepSeek)
arXiv extractor 2401.12345

# Specify model and provider
arXiv extractor 2401.12345 --model deepseek-chat --provider deepseek

# Short form
arXiv extractor 2401.12345 -m deepseek-chat -p deepseek -t 0.5

# Save concept files to the default directory (used when ARXIV_EXTRACTOR_OUTPUT is set)
arXiv extractor 2401.12345

# Save to specific directory
arXiv extractor 2401.12345 -o /path/to/concepts

The extractor command analyzes paper summaries and extracts key concepts together with their relationships, producing a structured knowledge graph. Each concept is assigned a category and linked to its source paper, so the output can serve as the basis of a personal research knowledge base.

Obsidian Integration: When using the -o option, concepts are saved as individual Markdown files with:

  • YAML frontmatter for metadata
  • Obsidian-style links ([[arxiv-id]])
  • Automatic deduplication (same paper won't be added twice)
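
The three properties listed above can be sketched in a few lines of Python. The file naming scheme (`<concept>.md`), the frontmatter field names, and the function `add_paper_to_concept` are all assumptions chosen for illustration. The files the extractor actually writes may be laid out differently.

```python
from pathlib import Path


def add_paper_to_concept(directory, concept, category, arxiv_id):
    """Create or update <directory>/<concept>.md; return True if a link was added."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{concept}.md"
    link = f"[[{arxiv_id}]]"  # Obsidian-style link back to the paper note

    if path.exists():
        text = path.read_text(encoding="utf-8")
        if link in text:
            # Deduplication: this paper is already listed under the concept.
            return False
    else:
        # Minimal YAML frontmatter; the field names are illustrative only.
        text = f"---\nconcept: {concept}\ncategory: {category}\n---\n\n# {concept}\n\n"

    path.write_text(text + f"- {link}\n", encoding="utf-8")
    return True
```

Checking for the full `[[arxiv-id]]` string before appending keeps repeated runs idempotent. This sketch assumes concept names are safe to use as file names.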

Adjust verbosity for debugging or quiet runs:

# Production - errors only (default)
arXiv --log-level ERROR new

# Short form for detailed debugging
arXiv -v DEBUG new

Knowledge Graph Extraction

The arXiv extractor command builds a structured knowledge base by extracting key concepts and relationships from academic papers.

Concept Categories

The extractor classifies concepts into these research domains:

  • galaxy-physics: Galaxy formation, evolution, dynamics
  • cosmology: Dark matter, cosmic microwave background, large-scale structure
  • earth-planetary: Exoplanets, planetary atmospheres, astrobiology
  • high-energy-astrophysics: Black holes, neutron stars, gamma-ray bursts
  • solar-stellar: Stellar evolution, solar physics, star formation
  • statistics-ai: Machine learning, statistical methods, neural networks
  • numerical-simulation: N-body simulations, hydrodynamics, radiative transfer
  • instrumental-design: Telescopes, spectrographs, detectors
  • astronomical-events: Supernovae, gravitational waves, fast radio bursts
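
When post-processing extractor output, it can help to hold this list in code and reject labels outside it, for example if a model returns an unexpected category. The mapping below copies the categories listed above. The names `CONCEPT_CATEGORIES` and `validate_category` are hypothetical and are not taken from the package.

```python
# Category labels and descriptions as listed in this README.
CONCEPT_CATEGORIES = {
    "galaxy-physics": "Galaxy formation, evolution, dynamics",
    "cosmology": "Dark matter, cosmic microwave background, large-scale structure",
    "earth-planetary": "Exoplanets, planetary atmospheres, astrobiology",
    "high-energy-astrophysics": "Black holes, neutron stars, gamma-ray bursts",
    "solar-stellar": "Stellar evolution, solar physics, star formation",
    "statistics-ai": "Machine learning, statistical methods, neural networks",
    "numerical-simulation": "N-body simulations, hydrodynamics, radiative transfer",
    "instrumental-design": "Telescopes, spectrographs, detectors",
    "astronomical-events": "Supernovae, gravitational waves, fast radio bursts",
}


def validate_category(label):
    """Return the normalized label, or raise ValueError if it is unknown."""
    key = label.strip().lower()
    if key not in CONCEPT_CATEGORIES:
        known = ", ".join(sorted(CONCEPT_CATEGORIES))
        raise ValueError(f"Unknown category {label!r}; expected one of: {known}")
    return key
```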

Example Workflow

# 1. Generate summary first
arXiv summarize 2401.12345 -o ./summaries

# 2. Extract knowledge graph
arXiv extractor 2401.12345 -o ./concepts
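
For more than a handful of papers, the two steps can be scripted. The sketch below builds the same two commands shown above for each identifier and, by default, only prints them. The function names and the `dry_run` switch are assumptions for this example. Only the `summarize`, `extractor`, and `-o` options documented in this README are used.

```python
import subprocess


def build_workflow(arxiv_id, summaries_dir="./summaries", concepts_dir="./concepts"):
    """Return the summarize and extractor commands for one paper, in order."""
    return [
        ["arXiv", "summarize", arxiv_id, "-o", summaries_dir],
        ["arXiv", "extractor", arxiv_id, "-o", concepts_dir],
    ]


def run_workflow(arxiv_ids, dry_run=True):
    """Run (or just print, when dry_run is True) the workflow for each paper."""
    for arxiv_id in arxiv_ids:
        for command in build_workflow(arxiv_id):
            if dry_run:
                print(" ".join(command))
            else:
                # check=True stops the batch at the first failing command.
                subprocess.run(command, check=True)
```

Running `run_workflow(["2401.12345"])` prints the two commands. Passing `dry_run=False` executes them and requires the `arXiv` CLI to be installed and configured.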

Integration with Obsidian

The extractor is designed to work seamlessly with Obsidian:

  1. Backlinks: Use [[arxiv-id]] syntax for paper references
  2. Tags: Automatic tagging for easy filtering
  3. Graph View: Visualize connections between papers and concepts
  4. Search: Find all papers mentioning a specific concept
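
Outside Obsidian, the same "which papers mention this concept" lookup can be approximated by scanning the exported Markdown files for `[[...]]` links. This is a sketch that assumes one Markdown file per concept in a single directory. `papers_by_concept` is a hypothetical helper and not part of the package.

```python
import re
from pathlib import Path

# Capture the link target, stopping before an alias ("|"), heading ("#"), or "]]".
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def papers_by_concept(directory):
    """Map each concept file name (without .md) to the sorted link targets it contains."""
    index = {}
    for path in sorted(Path(directory).glob("*.md")):
        text = path.read_text(encoding="utf-8")
        targets = {target.strip() for target in WIKILINK.findall(text)}
        index[path.stem] = sorted(targets)
    return index
```

The resulting dictionary can be inverted to list every concept attached to a given paper, which mirrors what the Obsidian graph view shows visually.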

Project Structure

arxiv_daily/
├── agents.py        # LangGraph agents for complex summarization workflows
├── chains.py        # LangChain chains for LLM interactions (includes KnowledgeGraphExtractor)
├── cli.py           # Command-line interface built with Typer
├── core.py          # Core functions (_run_new, _run_summarize, _run_extractor)
├── llm_client.py    # Unified LLM provider interface
├── utils.py         # Utility functions
└── __init__.py
