Automatically rename academic PDFs using metadata from scholarly databases

Rename Academic PDF Files

Tired of PDFs with cryptic names like hhaf081.pdf or 1-s2.0-S0377221718308774-main.pdf?

rename-academic-pdf automatically renames academic PDF files to meaningful filenames built from the title, author(s), year, and journal. It can also generate BibTeX files and convert PDFs to markdown in one command. For example:

paper.pdf → Author2024-PaperTitle-Journal.pdf

Why use this tool?

  • 📄 Smart renaming — Extracts metadata from DOI, arXiv, or paper content
  • 📚 BibTeX export — Automatically build your bibliography file
  • 📝 Markdown conversion — Convert PDFs to markdown for AI/LLM workflows
  • 🔄 Batch processing — Rename hundreds of papers with one command
  • 🌐 7+ academic APIs — CrossRef, OpenAlex, Semantic Scholar, arXiv, PubMed, and more
  • No API key required — Works out of the box

Quick Start

# Install
pip install -U rename-academic-pdf

# Rename a single PDF
rename-academic-pdf paper.pdf

# Batch rename all PDFs
rename-academic-pdf *.pdf

# Preview changes without renaming
rename-academic-pdf paper.pdf --dry-run

# Export BibTeX entries
rename-academic-pdf *.pdf --bib-file references.bib

# Generate markdown versions along with BibTeX entries
pip install -U "rename-academic-pdf[all]"
rename-academic-pdf *.pdf --markdown-dir ./markdown/ --bib-file references.bib

Installation

From PyPI (recommended)

pip install rename-academic-pdf

# With optional features
pip install "rename-academic-pdf[all]"  # LLM fallback + markdown conversion

From source

git clone https://github.com/maifeng/rename-academic-pdf.git
cd rename-academic-pdf
pip install -e .

Requirements: Python 3.7+


Features

  • Intelligent identifier extraction: DOI, arXiv ID, PMID from filename, PDF text, and metadata
  • Multi-API cascade: Queries 7+ academic databases with smart fallbacks
  • BibTeX export: Fetch or generate BibTeX entries with PDF/markdown paths
  • Markdown conversion: Convert PDFs to markdown using markitdown
  • Journal abbreviations: Built-in abbreviations for 100+ journals and custom overrides
  • Batch processing: Rename multiple PDFs with wildcards (*.pdf, **/*.pdf)
  • LLM fallback: Use GPT models to extract metadata when APIs fail (optional)
  • No API key required: Most APIs are free (optional keys for better rate limits)

Filename Formats

Default Behavior

Default format: AuthorsYear-Title-Journal.pdf

  • ≤ 5 authors: All authors concatenated (e.g., SmithJones2024-...)
  • > 5 authors: First author + "EtAl" (e.g., SmithEtAl2024-...)
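The author rule above can be sketched in a few lines of Python. This is only an illustration of the stated behavior, not the package's actual code: the function name `author_label`, the use of last names, and the "Unknown" placeholder are assumptions.

```python
from typing import List

MAX_LISTED_AUTHORS = 5  # threshold stated in the rule above


def author_label(last_names: List[str]) -> str:
    """Build the author part of a filename from a list of last names."""
    if not last_names:
        return "Unknown"  # assumption: placeholder when no author is found
    if len(last_names) > MAX_LISTED_AUTHORS:
        # More than 5 authors: first author followed by "EtAl"
        return f"{last_names[0]}EtAl"
    # 5 or fewer authors: concatenate all of them
    return "".join(last_names)
```

With this sketch, `author_label(["Smith", "Jones"])` gives `SmithJones`, and a six-author list starting with Smith gives `SmithEtAl`.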

You can override the default format string using command line options or in a config file (see Configuration File section).

Format Presets

Preset         Template                          Example
default        {author}{year}-{title}-{journal}  Author2025-PaperTitle-JournalName.pdf
compact        {author}{year}-{title}            Author2025-PaperTitle.pdf
full           {author}-{year}-{title}-{journal} Author-2025-PaperTitle-JournalName.pdf
minimal        {author}{year}                    Author2025.pdf
year_first     {year}-{author}-{title}           2025-Author-PaperTitle.pdf
journal_first  {journal}-{author}{year}-{title}  JournalName-Author2025-PaperTitle.pdf

rename-academic-pdf paper.pdf --format compact      # No journal
rename-academic-pdf paper.pdf --format minimal      # Author + year only
rename-academic-pdf paper.pdf --format year_first   # Year first

Custom Format Strings

Create your own format using template variables:

  • {author} - Author name(s): all authors if ≤5, FirstAuthorEtAl if >5
  • {year} - Publication year
  • {title} - Paper title
  • {journal} - Journal abbreviation

rename-academic-pdf paper.pdf --format-string '{journal}_{year}_{author}'
rename-academic-pdf paper.pdf --format-string '{author}-{title}'
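To make the template variables concrete, here is a minimal Python sketch of how a format string could be expanded into a filename. It is not taken from the package: `render_filename` is a hypothetical helper, and replacing a missing field with an empty string is an assumption.

```python
import re


def render_filename(template: str, fields: dict) -> str:
    """Expand {author}, {year}, {title}, {journal} in a format string."""

    def substitute(match: "re.Match") -> str:
        # Assumption: a missing field becomes an empty string
        return str(fields.get(match.group(1), ""))

    # Only the four documented placeholders are replaced; other text is kept
    name = re.sub(r"\{(author|year|title|journal)\}", substitute, template)
    return name + ".pdf"


fields = {"author": "Smith", "year": 2024, "title": "PaperTitle", "journal": "JMIS"}
print(render_filename("{journal}_{year}_{author}", fields))  # JMIS_2024_Smith.pdf
```

Restricting the substitution to a fixed set of names keeps literal braces or stray text in a user-supplied template from raising an error, which plain `str.format` would do.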

Additional Options

--first-author-only: Use only first author

rename-academic-pdf paper.pdf --first-author-only
# Output: Smith2024-Title-Journal.pdf (instead of SmithJonesBrown2024-...)

--separator (- or _): Change separator character

rename-academic-pdf paper.pdf --separator _
# Output: Smith2024_Title_Journal.pdf

--journal-abbrev-file: Use custom journal abbreviations file

rename-academic-pdf paper.pdf --journal-abbrev-file ~/my-journals.json
# Uses custom abbreviations from the specified JSON file
# Can be saved in ~/.rename-academic-pdf/journal_abbreviations.json for automatic loading

--max-title-length: Maximum title length in filename (default: 80)

rename-academic-pdf paper.pdf --max-title-length 120
# Longer titles allowed (truncates at word boundary, never mid-word)
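Truncating at a word boundary can be sketched as follows. This is an illustrative Python version under stated assumptions: `truncate_title` is a hypothetical name, the length is measured on the plain title with spaces, and a single word longer than the limit is cut hard because no boundary is available.

```python
def truncate_title(title: str, max_length: int = 80) -> str:
    """Shorten a title to at most max_length characters without splitting a word."""
    if len(title) <= max_length:
        return title
    cut = title[:max_length]
    # If the cut landed inside a word, drop the partial word at the end
    if title[max_length] != " " and " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut.rstrip()
```

For example, `truncate_title("Deep learning for asset pricing", 12)` returns `Deep` rather than `Deep learnin`.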

--bib-file: Append BibTeX entries to a file

rename-academic-pdf paper.pdf --bib-file ~/references.bib
# Fetches BibTeX from DOI.org or arXiv, or generates from metadata
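When no BibTeX record can be fetched, an entry has to be assembled from the extracted metadata. The Python sketch below shows one way this could look. The function name, the metadata keys, the citation-key scheme, and the `file` field are all assumptions for illustration, and the sketch does not escape special LaTeX characters.

```python
def make_bibtex(meta: dict, pdf_path: str = "") -> str:
    """Build a simple @article entry from a metadata dictionary."""
    # Hypothetical citation key: first author's last name + year
    key = f"{meta['authors'][0].split()[-1]}{meta['year']}"
    fields = {
        "author": " and ".join(meta["authors"]),
        "title": meta["title"],
        "journal": meta.get("journal", ""),
        "year": str(meta["year"]),
        "file": pdf_path,  # assumption: field name used for the PDF path
    }
    # Skip empty fields so the entry stays valid
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
    return f"@article{{{key},\n{body}\n}}\n"


entry = make_bibtex(
    {"authors": ["Jane Smith", "John Jones"], "title": "Paper Title", "year": 2024}
)
print(entry)
```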

--markdown-dir: Generate markdown versions of PDFs

rename-academic-pdf paper.pdf --markdown-dir ~/markdown/
# Converts PDFs to markdown using markitdown
# Requires: pip install "rename-academic-pdf[markdown]"

API Coverage

The script tries multiple APIs in cascade order:

Identifier-Based (Primary)

  1. DOI → DOI.org → CrossRef → DataCite → Semantic Scholar
  2. arXiv ID → arXiv API → Semantic Scholar
  3. SSRN ID → Convert to DOI (10.2139/ssrn.{id}) → DOI.org → CrossRef
  4. PMID → PubMed API
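The identifier routing above can be written down as data plus one conversion step. This Python sketch mirrors the listed order only; the names `LOOKUP_ORDER`, `ssrn_to_doi`, and `plan_lookup` are hypothetical, and no network calls are made.

```python
from typing import List, Tuple

# Source order per identifier type, as listed above
LOOKUP_ORDER = {
    "doi": ["DOI.org", "CrossRef", "DataCite", "Semantic Scholar"],
    "arxiv": ["arXiv API", "Semantic Scholar"],
    "pmid": ["PubMed API"],
}


def ssrn_to_doi(ssrn_id: str) -> str:
    """Convert an SSRN ID to its DOI using the 10.2139/ssrn.{id} pattern."""
    return f"10.2139/ssrn.{ssrn_id}"


def plan_lookup(kind: str, value: str) -> Tuple[str, List[str]]:
    """Return the identifier to query and the ordered list of sources to try."""
    if kind == "ssrn":
        # SSRN IDs are first converted to a DOI, then resolved like one
        return ssrn_to_doi(value), ["DOI.org", "CrossRef"]
    return value, LOOKUP_ORDER[kind]


print(plan_lookup("ssrn", "1234567"))
# ('10.2139/ssrn.1234567', ['DOI.org', 'CrossRef'])
```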

Title-Based (Fallback)

  1. Semantic Scholar (200M+ papers, CS/AI focus)
  2. DBLP (Computer science bibliography)
  3. OpenAlex (200M+ papers, all fields)
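A cascade of this kind boils down to "try each source in order and stop at the first hit". The following Python sketch shows the pattern with stand-in fetch functions. It is not the package's implementation: `first_hit` and the stub sources are hypothetical, and treating any exception as a miss is an assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A source is a (name, fetch function) pair; fetch returns metadata or None
Source = Tuple[str, Callable[[str], Optional[Dict]]]


def first_hit(query: str, sources: List[Source]) -> Optional[Dict]:
    """Query sources in order and return the first non-empty result."""
    for name, fetch in sources:
        try:
            result = fetch(query)
        except Exception:
            # Assumption: a failing source is skipped, not fatal
            continue
        if result:
            return {"source": name, **result}
    return None


# Stand-in fetchers for demonstration only (no network access)
def no_match(title: str) -> Optional[Dict]:
    return None


def match(title: str) -> Optional[Dict]:
    return {"title": title, "year": 2024}


print(first_hit("Paper Title", [("Semantic Scholar", no_match), ("DBLP", match)]))
# {'source': 'DBLP', 'title': 'Paper Title', 'year': 2024}
```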

Database Coverage

  • DOI.org: Authoritative DOI resolver (Citeproc JSON)
  • CrossRef: 130M+ journal articles (including SSRN)
  • DataCite: Datasets, conferences, grey literature
  • arXiv: STEM preprints
  • SSRN: Working papers (via DOI lookup)
  • PubMed: Biomedical literature
  • Semantic Scholar: CS/AI papers (optional API key)
  • DBLP: Computer science papers
  • OpenAlex: Comprehensive, free, no API key

Environment Variables (Optional)

# ~/.bashrc or ~/.zshrc
export SEMANTIC_SCHOLAR_API_KEY="your-api-key-here"
export PUBMED_API_KEY="your-api-key-here"  # For faster rate limits
export EMAIL="your@email.com"  # For CrossRef polite pool
export OPENAI_API_KEY="your-api-key-here"  # For --llm flag (OpenAI)
export OPENROUTER_API_KEY="your-api-key-here"  # For --llm flag (OpenRouter)

Get a free Semantic Scholar API key: https://www.semanticscholar.org/product/api

LLM-Based Extraction (Experimental)

When the --llm flag is enabled, the script uses an LLM as a fallback after all API-based methods fail. It extracts metadata from the text of the first 3 pages of the PDF, which can be useful for working papers without a DOI. The default model is gpt-4.1-mini; other OpenAI and OpenRouter models are also supported.

OpenAI (Default)

# Uses OPENAI_API_KEY
rename-academic-pdf *.pdf --llm
rename-academic-pdf *.pdf --llm --llm-model gpt-4o-mini

OpenRouter

Use provider/model format to automatically use OpenRouter:

# Uses OPENROUTER_API_KEY (auto-detected from model format)
rename-academic-pdf *.pdf --llm --llm-model anthropic/claude-3-haiku
rename-academic-pdf *.pdf --llm --llm-model google/gemini-2.0-flash-001

Requirements:

  • pip install openai (or pip install "rename-academic-pdf[llm]")
  • An environment variable with your key: OPENAI_API_KEY for OpenAI models, or OPENROUTER_API_KEY for OpenRouter models.
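The provider auto-detection described above can be sketched as a check for a slash in the model name. This Python example is an assumption about how such detection might work, not the package's code; `select_provider` and `api_key_for` are hypothetical names.

```python
import os
from typing import Mapping, Tuple


def select_provider(model: str) -> Tuple[str, str]:
    """Return (provider, environment variable) for a model name."""
    # Assumption: "provider/model" names go to OpenRouter, plain names to OpenAI
    if "/" in model:
        return "openrouter", "OPENROUTER_API_KEY"
    return "openai", "OPENAI_API_KEY"


def api_key_for(model: str, env: Mapping[str, str] = os.environ) -> str:
    """Look up the API key needed for a model, failing with a clear message."""
    provider, variable = select_provider(model)
    key = env.get(variable)
    if not key:
        raise RuntimeError(f"{variable} is not set (needed for {provider} model {model!r})")
    return key


print(select_provider("anthropic/claude-3-haiku"))  # ('openrouter', 'OPENROUTER_API_KEY')
print(select_provider("gpt-4.1-mini"))              # ('openai', 'OPENAI_API_KEY')
```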

Journal Abbreviations

The package includes built-in abbreviations for 100+ major academic journals. For example:

  • "Journal of Management Information Systems" → "JMIS"
  • "Information Systems Research" → "ISR"
  • "Review of Financial Studies" → "RFS"

Custom Journal Abbreviations

You can provide your own journal abbreviations to override or extend the built-in list. The package searches for custom abbreviation files in the following order:

  1. Command-line argument: --journal-abbrev-file path/to/file.json
  2. User's home directory: ~/.rename-academic-pdf/journal_abbreviations.json
  3. Default bundled file: Built-in abbreviations
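The search order above can be expressed as a short resolution function. This Python sketch is illustrative only: `resolve_abbrev_file` is a hypothetical name, and returning `None` to mean "use the bundled list" is an assumption.

```python
from pathlib import Path
from typing import Optional

# Location from step 2 above
USER_FILE = Path.home() / ".rename-academic-pdf" / "journal_abbreviations.json"


def resolve_abbrev_file(
    cli_path: Optional[str] = None, user_file: Path = USER_FILE
) -> Optional[Path]:
    """Pick the custom abbreviations file, or None to use the built-in list."""
    if cli_path:
        # 1. An explicit --journal-abbrev-file argument wins
        return Path(cli_path).expanduser()
    if user_file.is_file():
        # 2. Otherwise use the file in the user's home directory, if present
        return user_file
    # 3. Nothing custom found: fall back to the bundled abbreviations
    return None
```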

Creating a Custom Abbreviations File

Create a JSON file with the following structure:

{
    "comment": "My custom journal abbreviations",
    "abbreviations": {
        "Journal of Interesting Research": "JIR",
        "Quarterly Review of Examples": "QRE",
        "Proceedings of Example Conference": "PEC"
    }
}
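A file with this structure can be read and layered over the built-in list so that custom entries override or extend it. The Python sketch below shows the idea with an in-memory JSON string. `merge_abbreviations` and the sample built-in dictionary are hypothetical, not the package's internals.

```python
import json


def merge_abbreviations(builtin: dict, custom_json: str) -> dict:
    """Overlay custom abbreviations on top of the built-in ones."""
    data = json.loads(custom_json)
    merged = dict(builtin)  # copy so the built-in mapping is not modified
    # Custom entries replace built-in ones with the same journal name
    merged.update(data.get("abbreviations", {}))
    return merged


builtin = {"Information Systems Research": "ISR"}
custom = '{"comment": "mine", "abbreviations": {"Journal of Interesting Research": "JIR"}}'
print(merge_abbreviations(builtin, custom))
# {'Information Systems Research': 'ISR', 'Journal of Interesting Research': 'JIR'}
```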

Using Custom Abbreviations

Option 1: Command-line argument

rename-academic-pdf paper.pdf --journal-abbrev-file ~/my-journals.json

Option 2: User home directory (automatically loaded)

# Create the directory
mkdir -p ~/.rename-academic-pdf

# Copy or create your custom file
cp my-journals.json ~/.rename-academic-pdf/journal_abbreviations.json

# Run normally - custom abbreviations will be used automatically
rename-academic-pdf paper.pdf

Configuration File

You can set default options by creating a config file at ~/.rename-academic-pdf/config.json:

{
    "format_string": "{author}_{year}_{journal}_{title}",
    "first_author_only": true,
    "max_title_length": 100,
    "llm": true,
    "llm_model": "gpt-4o-mini",
    "bib_file": "~/papers.bib",
    "markdown_dir": "~/paper_markdown"
}

Available Options

Option             Type     Default         Description
format             string   "default"       Format preset (default, compact, full, minimal, year_first, journal_first)
format_string      string   -               Custom format string (overrides format if both set)
separator          string   "-"             Separator character ("-" or "_")
first_author_only  boolean  false           Use only first author
max_title_length   integer  80              Maximum title length in filename (truncates at word boundary)
llm                boolean  false           Enable LLM fallback
llm_model          string   "gpt-4.1-mini"  LLM model for --llm mode
bib_file           string   -               Path to BibTeX file to append entries to
markdown_dir       string   -               Directory to save markdown versions of PDFs

Command-line arguments always override config file settings.
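The precedence rule (built-in defaults, then config file, then command line) can be sketched as a layered dictionary merge. This Python example is an illustration under assumptions: `effective_options` is a hypothetical name, and an unset command-line option is represented as `None`.

```python
# Defaults taken from the options table above
DEFAULTS = {
    "format": "default",
    "separator": "-",
    "first_author_only": False,
    "max_title_length": 80,
    "llm": False,
    "llm_model": "gpt-4.1-mini",
}


def effective_options(config: dict, cli: dict) -> dict:
    """Combine defaults, config file values, and command-line values."""
    options = dict(DEFAULTS)
    options.update(config)  # config file overrides defaults
    # Command-line values override everything, but only when actually given
    options.update({k: v for k, v in cli.items() if v is not None})
    return options


config = {"max_title_length": 100, "llm": True}
cli = {"max_title_length": 120, "llm": None}
print(effective_options(config, cli)["max_title_length"])  # 120
```

Filtering out `None` matters here: without it, an option the user did not pass on the command line would wipe out the value from the config file.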

License

MIT License - see LICENSE file

Author

Created by Feng Mai.

☕ If this tool saved you time, consider buying me a coffee
