A Python wrapper for the ChemRxiv API

Project description

ChemRxiv

A Python wrapper for the ChemRxiv preprint server's Open Engage API. ChemRxiv is a free preprint repository for chemistry, launched in 2017, that provides open access to early research outputs in chemistry and related fields. This package provides a convenient Python interface for searching, downloading, and interacting with ChemRxiv content. Note: Usage is subject to ChemRxiv's licensing terms and attribution requirements, which may restrict both commercial and non-commercial use of the retrieved data. Retrieved publications may not yet be peer-reviewed, and some may have been retracted.

Installation

pip install chemrxiv

Quick Start

In your Python script, include the line:

import chemrxiv

Usage Examples

Basic Search

import chemrxiv

# Create the default API client
client = chemrxiv.Client()

# Search for papers about catalysis
search = chemrxiv.Search(
    term="catalysis",
    limit=10,
    sort=chemrxiv.SortCriterion.PUBLISHED_DATE_DESC
)

results = list(client.results(search))

# Print titles of found papers
for paper in results:
    print(f"Title: {paper.title}")
    print(f"Authors: {', '.join(str(author) for author in paper.authors)}")
    print(f"DOI: {paper.doi}")
    print("---")

Advanced Search with Filters

import chemrxiv

# Create the API client used to run the search
client = chemrxiv.Client()

# Search with date range, license, and category filters
filtered_search = chemrxiv.Search(
    term="ammonia decomposition",
    limit=5,
    search_date_from="2023-01-01T00:00:00.000Z",
    search_date_to="2025-05-19T00:00:00.000Z",
    search_license="CC BY 4.0",
    category_ids=["605c72ef153207001f6470d4"]  # Specific chemistry category
)

results = list(client.results(filtered_search))
print(f"Found {len(results)} papers matching criteria")
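
The date filters above are ISO 8601 timestamps with millisecond precision and a trailing Z. The following is a minimal sketch for producing strings in that shape from datetime objects. Note that to_api_timestamp is a hypothetical helper and not part of this package, and it assumes the API expects UTC.

```python
from datetime import datetime, timezone


def to_api_timestamp(moment: datetime) -> str:
    """Format a datetime like '2023-01-01T00:00:00.000Z' (hypothetical helper)."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    moment = moment.astimezone(timezone.utc)
    # isoformat() renders UTC as "+00:00"; swap it for the "Z" suffix.
    return moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")


date_from = to_api_timestamp(datetime(2023, 1, 1))
date_to = to_api_timestamp(datetime(2025, 5, 19))
print(date_from, date_to)
```

The resulting strings can then be passed as search_date_from and search_date_to.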

Retrieving Papers by ID or DOI

import chemrxiv

client = chemrxiv.Client()

# Get paper by ChemRxiv ID
paper_id = "6826a61850018ac7c5739260"
paper = client.item(paper_id)
print(f"Title: {paper.title}")
print(f"Abstract: {paper.abstract[:200]}...")

# Get paper by DOI
doi = "10.26434/chemrxiv-2025-5j4tn"
paper = client.item_by_doi(doi)
print(f"Title: {paper.title}")
print(f"Authors: {', '.join(str(author) for author in paper.authors)}")
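
DOIs are often copied as resolver URLs or with a "doi:" prefix, while the example above passes the bare form. Whether item_by_doi accepts other forms is not stated here, so a small normalization step may help. This is only a sketch; normalize_doi is a hypothetical helper, not part of the package.

```python
def normalize_doi(raw: str) -> str:
    """Strip common resolver prefixes so only the bare DOI remains."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


bare = normalize_doi("https://doi.org/10.26434/chemrxiv-2025-5j4tn")
print(bare)
```

The result can then be passed to client.item_by_doi(bare).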

Downloading Papers

Download PDFs directly from search results or individual papers:

import chemrxiv

client = chemrxiv.Client()
search = chemrxiv.Search(term="catalysis", limit=1)
paper = next(client.results(search))

# Download PDF with default filename
paper.download_pdf()

# Download PDF with custom filename
paper.download_pdf(filename="my-paper.pdf")

# Download PDF to specific directory
paper.download_pdf(dirpath="./downloads", filename="catalysis-paper.pdf")

# Download supporting information PDF
paper.download_si()
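
When downloading many papers, it can be convenient to derive the custom filename from the paper title. The sketch below shows one way to build a filesystem-friendly name. Here safe_pdf_name is a hypothetical helper and not part of the package, and the title is made up for illustration.

```python
import re


def safe_pdf_name(title: str, max_length: int = 80) -> str:
    """Turn a paper title into a conservative PDF filename."""
    # Collapse every run of characters outside [A-Za-z0-9._-] into one hyphen.
    slug = re.sub(r"[^A-Za-z0-9._-]+", "-", title).strip("-.")
    # Truncate long titles and fall back to a generic name if nothing is left.
    slug = slug[:max_length].rstrip("-.") or "paper"
    return f"{slug}.pdf"


name = safe_pdf_name("Ammonia decomposition: Ru/Al2O3 catalysts?")
print(name)
```

The name can then be supplied through the filename argument shown above, for example paper.download_pdf(dirpath="./downloads", filename=safe_pdf_name(paper.title)).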

Exploring Categories and Licenses

import chemrxiv

client = chemrxiv.Client()

# Get all available categories
categories = client.categories()
print(f"Available categories ({len(categories)}):")
for category in categories[:5]:  # Show first 5
    print(f"- {category.name} (ID: {category.id})")
    if category.description:
        print(f"  Description: {category.description}")
    if category.count:
        print(f"  Papers: {category.count}")

# Get all available licenses
licenses = client.licenses()
print(f"\nAvailable licenses ({len(licenses)}):")
for lic in licenses:  # "lic" avoids shadowing the built-in "license"
    print(f"- {lic.name} (ID: {lic.id})")
    if lic.url:
        print(f"  URL: {lic.url}")
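
Since the category_ids search filter takes IDs rather than names, a name-to-ID lookup built from the category list can be handy. The sketch below uses stand-in objects carrying only the name and id attributes shown above. Both category_ids_for and the sample IDs are hypothetical placeholders, not part of the package or real ChemRxiv IDs.

```python
from types import SimpleNamespace


def category_ids_for(categories, wanted_names):
    """Map category names (case-insensitive) to their IDs."""
    by_name = {c.name.casefold(): c.id for c in categories}
    missing = [n for n in wanted_names if n.casefold() not in by_name]
    if missing:
        raise KeyError(f"Unknown categories: {missing}")
    return [by_name[n.casefold()] for n in wanted_names]


# Stand-ins for the objects returned by client.categories(); IDs are made up.
sample = [
    SimpleNamespace(name="Catalysis", id="id-catalysis"),
    SimpleNamespace(name="Organic Chemistry", id="id-organic"),
]
ids = category_ids_for(sample, ["catalysis"])
print(ids)
```

With the real client, the first argument would be the list returned by client.categories().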

Custom Client Configuration

import chemrxiv

# Create a client with custom settings
custom_client = chemrxiv.Client(
    page_size=50,        # Fetch 50 results per API call
    delay_seconds=1.0,   # Wait 1 second between requests
    num_retries=5        # Retry failed requests up to 5 times
)

# Use custom client for searches
search = chemrxiv.Search(term="organic synthesis", limit=100)
results = list(custom_client.results(search))
print(f"Retrieved {len(results)} papers with custom client")
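
To illustrate how page size, delay, and retries typically interact in a paginated client, here is a rough, self-contained sketch. It is not this package's implementation. The names iter_results, fetch_page, and fake_fetch are hypothetical, and the retry policy (retry on ConnectionError, num_retries retries after the first attempt) is an assumption.

```python
import time
from typing import Callable, Iterator, List


def iter_results(
    fetch_page: Callable[[int, int], List[dict]],
    limit: int,
    page_size: int = 10,
    delay_seconds: float = 0.0,
    num_retries: int = 3,
) -> Iterator[dict]:
    """Yield up to `limit` items, requesting `page_size` items per call."""
    yielded = 0
    skip = 0
    while yielded < limit:
        for attempt in range(num_retries + 1):
            try:
                page = fetch_page(skip, page_size)
                break
            except ConnectionError:
                if attempt == num_retries:
                    raise  # retries exhausted
                time.sleep(delay_seconds)
        if not page:
            return  # backend has no more results
        for item in page:
            yield item
            yielded += 1
            if yielded >= limit:
                return
        skip += len(page)
        time.sleep(delay_seconds)  # pause between page requests


# Fake backend holding 25 records; its first call fails to exercise the retry.
calls = {"n": 0}


def fake_fetch(skip: int, size: int) -> List[dict]:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("simulated failure")
    data = [{"index": i} for i in range(25)]
    return data[skip:skip + size]


items = list(iter_results(fake_fetch, limit=12, page_size=5))
print(len(items), calls["n"])
```

In this sketch a larger page size means fewer requests for the same limit, while the delay applies once per request rather than once per result.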

Debugging with Logging

To inspect network behavior and API interactions:

import logging
import chemrxiv

# Configure DEBUG-level logging
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)

# Set ChemRxiv logger to DEBUG
chemrxiv_logger = logging.getLogger("chemrxiv")
chemrxiv_logger.setLevel(logging.DEBUG)

# Now API calls will show detailed logging
client = chemrxiv.Client()
search = chemrxiv.Search(term="catalysis", limit=5)
results = list(client.results(search))
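
Setting the root logger to DEBUG, as above, also enables debug output from every other library in the process. If that is too noisy, one option is to keep the root logger at WARNING and raise only the "chemrxiv" logger. This is a sketch using the standard logging module and assumes the package logs under the "chemrxiv" name used above.

```python
import logging

# Root logger stays at WARNING so third-party libraries remain quiet.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)

# Only the "chemrxiv" logger (and its children) emit DEBUG records.
chemrxiv_logger = logging.getLogger("chemrxiv")
chemrxiv_logger.setLevel(logging.DEBUG)
```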

API Reference

Client

The Client class provides a reusable interface for making requests to the ChemRxiv API.

client = chemrxiv.Client(
    page_size=10,        # Number of results per API request (default: 10)
    delay_seconds=0,     # Delay between requests in seconds (default: 0)
    num_retries=3        # Number of retry attempts for failed requests (default: 3)
)

Methods:

  • results(search) - Execute a search and return an iterator of Result objects
  • item(item_id) - Retrieve a specific paper by ChemRxiv ID
  • item_by_doi(doi) - Retrieve a specific paper by DOI
  • categories() - Get all available categories
  • licenses() - Get all available licenses

Search

The Search class defines search parameters for querying ChemRxiv.

search = chemrxiv.Search(
    term="search+keywords",                    # Search term (required)
    limit=10,                                 # Maximum number of results
    sort=chemrxiv.SortCriterion.PUBLISHED_DATE_DESC,  # Sort criterion
    search_date_from="2023-01-01T00:00:00.000Z",      # Start date filter
    search_date_to="2025-12-31T00:00:00.000Z",        # End date filter
    search_license="CC BY 4.0",                       # License filter
    category_ids=["category_id_1", "category_id_2"]   # Category filters
)

Sort Criteria:

  • chemrxiv.SortCriterion.PUBLISHED_DATE_DESC - Newest first
  • chemrxiv.SortCriterion.PUBLISHED_DATE_ASC - Oldest first
  • chemrxiv.SortCriterion.RELEVANCE - Most relevant first

Result

The Result class represents a paper from ChemRxiv with metadata and download capabilities.

Properties:

  • title - Paper title
  • authors - List of author objects
  • abstract - Paper abstract
  • doi - Digital Object Identifier
  • id - ChemRxiv paper ID
  • published_date - Publication date
  • pdf_url - Direct URL to PDF
  • categories - List of associated categories
  • license - License information

Methods:

  • download_pdf(filename=None, dirpath=".") - Download the paper as PDF
  • download_source(filename=None, dirpath=".") - Download source files if available
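
As an illustration of working with these properties, the sketch below writes the title, DOI, and authors of a set of results to CSV text. It uses stand-in objects with made-up values in place of real Result instances. papers_to_csv is a hypothetical helper, not part of the package.

```python
import csv
import io
from types import SimpleNamespace


def papers_to_csv(papers) -> str:
    """Serialize title, DOI, and authors of each paper to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "doi", "authors"])
    for paper in papers:
        # Authors are converted with str(), as in the usage examples.
        authors = "; ".join(str(author) for author in paper.authors)
        writer.writerow([paper.title, paper.doi, authors])
    return buffer.getvalue()


# Stand-in for objects yielded by client.results(search); values are made up.
sample = [
    SimpleNamespace(
        title="Example preprint",
        doi="10.0000/example",
        authors=["A. Author", "B. Author"],
    )
]
csv_text = papers_to_csv(sample)
print(csv_text)
```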

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

This library was created by Magdalena Lederbauer (@mlederbauer) and is heavily inspired by the arXiv Python wrapper and the ChemRxiv Dashboard. Thanks to Anamaria Leonescu (@analeonescu) for support with accessing the API, and to fxcoudert, who open-sourced the scripts for accessing ChemRxiv via the Figshare API. Because ChemRxiv migrated to Open Engage in 2021, those scripts are deprecated; this library is intended to replace and streamline them. Contributions are welcome! Please feel free to submit a pull request.

