pyopenalex

A Pydantic-powered Python client for the OpenAlex API

Maintainers

nthomsencph

PyOpenAlex

A Pydantic-powered Python client for the OpenAlex API.

OpenAlex is an open catalog of the global research system: 270M+ scholarly works, 90M+ authors, and 100K+ sources. PyOpenAlex gives you typed access to all of it with an API that follows the patterns of FastAPI and Pydantic.

from pyopenalex import OpenAlex, gt

with OpenAlex() as client:
    for work in client.works.filter(cited_by_count=gt(1000), publication_year=2024).limit(10):
        print(work.to_markdown(limit_abstract=150))

Installation

pip install pyopenalex

Requires Python 3.10+.

Quick Start

from pyopenalex import OpenAlex

client = OpenAlex(api_key="your-key")  # or set OPENALEX_API_KEY env var

# Get a single work by ID
work = client.works.get("W2741809807")
print(work.title)
print(work.doi)
print(work.abstract)  # reconstructed from inverted index

# Search for authors
results = client.authors.search("Einstein").get(5)
for author in results.results:
    print(f"{author.name}: {author.works_count} works")

Entities

PyOpenAlex supports all OpenAlex entity types:

# Core entities
client.works              # Scholarly documents (articles, books, datasets)
client.authors            # Researcher profiles
client.sources            # Journals, repositories, conferences
client.institutions       # Universities, research organizations
client.topics             # Subject classifications
client.keywords           # Extracted keywords
client.publishers         # Publishing organizations
client.funders            # Funding agencies

# Topic hierarchy
client.domains            # Top-level categories (4 total)
client.fields             # Second-level categories (26 total)
client.subfields          # Third-level categories (252 total)

# Reference entities
client.sdgs               # UN Sustainable Development Goals
client.countries          # Countries
client.continents         # Continents
client.languages          # Languages
client.work_types         # Work types (article, book, dataset, etc.)
client.source_types       # Source types (journal, repository, etc.)
client.institution_types  # Institution types (education, company, etc.)
client.licenses           # Open access licenses (CC BY, etc.)
client.awards             # Research grants and funding awards

Every entity is a Pydantic model with fully typed fields and convenience aliases:

work = client.works.get("W2741809807")

work.title                              # str | None
work.year                               # alias for publication_year
work.citations                          # alias for cited_by_count
work.authors                            # list of author name strings
work.name                               # alias for display_name
work.abstract                           # reconstructed from inverted index
work.open_access.is_oa                  # bool
work.open_access.oa_status              # str (gold, green, hybrid, bronze, diamond, closed)
work.authorships[0].author.display_name # str | None
work.authorships[0].institutions        # list[DehydratedInstitution]
work.primary_location.source            # DehydratedSource | None

The .name and .citations aliases are available on all entity types.

Looking Up Entities

By OpenAlex ID

work = client.works.get("W2741809807")
author = client.authors.get("A5023888391")

By External ID

Works accept DOIs, authors accept ORCIDs, institutions accept ROR IDs:

work = client.works.get("https://doi.org/10.7717/peerj.4375")
author = client.authors.get("https://orcid.org/0000-0001-6187-6610")
institution = client.institutions.get("https://ror.org/0161xgx34")

Batch Lookup

Fetch up to 100 entities at once:

works = client.works.get(["W2741809807", "W2100837269", "W1775749144"])
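When you have more than 100 IDs, the list has to be split into batches first. The following is a minimal sketch of that splitting, not PyOpenAlex's own code: `chunked` and `batch_filters` are hypothetical helpers, the IDs are placeholders, and the use of the `ids.openalex` filter with `|`-separated values is an assumption about how such a batch could be expressed against the OpenAlex API.

```python
from typing import Iterator, List


def chunked(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def batch_filters(ids: List[str], size: int = 100) -> List[str]:
    """Build one OR-style filter string per batch (assumed filter name)."""
    return ["ids.openalex:" + "|".join(chunk) for chunk in chunked(ids, size)]


# Placeholder IDs, for illustration only.
ids = [f"W{n}" for n in range(1, 251)]
filters = batch_filters(ids)
print(len(filters))                     # number of batches
print(filters[2].count("|") + 1)        # IDs in the last batch
```

Each resulting string could then be passed to a raw filter call, one request per batch.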

Random Entity

work = client.works.random()

Finding Works by Name

Look up works by author, institution, source, topic, or funder name. PyOpenAlex handles the two-step ID resolution automatically: it first resolves the name to an OpenAlex ID, then filters works by that ID:

# By author
client.works.by_author("Yann LeCun").sort("cited_by_count", desc=True).get(10)

# By institution
client.works.by_institution("MIT").filter(publication_year=2024).get(10)

# By journal
client.works.by_source("Nature").filter(publication_year=2024).count()

# By topic
client.works.by_topic("machine learning").get(10)

# By funder
client.works.by_funder("NIH").filter(is_oa=True).get(10)

Filtering

Pass keyword arguments to .filter(), or chain several .filter() calls, to narrow results. Multiple filters combine with AND:

results = (
    client.works
    .filter(publication_year=2024, is_oa=True)
    .sort("cited_by_count", desc=True)
    .get(10)
)

Filter Expressions

PyOpenAlex provides expression functions for building filters, similar to how FastAPI uses Query(), Path(), and Body():

from pyopenalex import gt, lt, ne, or_, between

# Greater than / less than
client.works.filter(cited_by_count=gt(100))
client.works.filter(publication_year=lt(2020))

# Not equal
client.works.filter(type=ne("paratext"))

# OR (up to 100 values)
client.works.filter(doi=or_(
    "https://doi.org/10.7717/peerj.4375",
    "https://doi.org/10.1038/nature12373",
))

# Range
client.works.filter(publication_year=between(2020, 2024))
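To make the idea concrete, here is a rough sketch of how expression helpers like these could map onto the OpenAlex filter syntax (`>`, `<`, `!` for negation, `|` for OR, commas for AND). The `Expr` class, the `render` function, and the local `gt`, `lt`, `ne`, and `or_` definitions are stand-ins written for this example; they are assumptions, not PyOpenAlex's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """A filter value that already carries its operator prefix."""
    text: str


def gt(value) -> Expr:
    return Expr(f">{value}")


def lt(value) -> Expr:
    return Expr(f"<{value}")


def ne(value) -> Expr:
    return Expr(f"!{value}")


def or_(*values) -> Expr:
    return Expr("|".join(str(v) for v in values))


def render(**filters) -> str:
    """Join keyword filters into a comma-separated (AND) filter string."""
    parts = []
    for key, value in filters.items():
        if isinstance(value, Expr):
            text = value.text
        elif isinstance(value, bool):
            text = str(value).lower()  # the API expects true/false
        else:
            text = str(value)
        parts.append(f"{key}:{text}")
    return ",".join(parts)


print(render(cited_by_count=gt(100), publication_year=2024, is_oa=True))
```

Wrapping the operator in a small value object keeps plain values and expressions distinguishable when the filter string is built.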

Nested Filters

Use dicts for dot-notation filter paths. PyOpenAlex flattens them automatically:

# These are equivalent:
client.works.filter(authorships={"institutions": {"id": "I136199984"}})
client.works.filter_raw("authorships.institutions.id:I136199984")
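The flattening step can be pictured as a small recursive walk over the dict. This is only an illustrative sketch: `flatten_filter` is a hypothetical helper, and the library's internal approach may differ.

```python
def flatten_filter(nested: dict, prefix: str = "") -> dict:
    """Turn {"a": {"b": {"c": 1}}} into {"a.b.c": 1}."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested dicts, extending the dotted path.
            flat.update(flatten_filter(value, path))
        else:
            flat[path] = value
    return flat


flat = flatten_filter({"authorships": {"institutions": {"id": "I136199984"}}})
filter_string = ",".join(f"{key}:{value}" for key, value in flat.items())
print(filter_string)
```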

Raw Filters

For full control, pass the filter string directly:

client.works.filter_raw("publication_year:2024,is_oa:true,cited_by_count:>100")

Searching

Full-Text Search

results = client.works.search("machine learning").get()

Field-Specific Search

results = client.works.search_filter(title="neural networks").get()

Search and filters can be combined:

results = (
    client.works
    .search("CRISPR")
    .filter(publication_year=2024, is_oa=True)
    .sort("cited_by_count", desc=True)
    .get()
)

Sorting

# Ascending (default)
client.works.sort("publication_date")

# Descending
client.works.sort("cited_by_count", desc=True)

Field Selection

Request only the fields you need to reduce response size:

results = client.works.select("id", "title", "doi", "cited_by_count").get()

Fetching Results

Get a specific number of results

Pass a count to .get() and PyOpenAlex handles pagination internally:

# Get exactly 10 results
results = client.works.search("CRISPR").get(10)

# Get 250 results (auto-paginates across multiple API calls)
results = client.works.filter(publication_year=2024).get(250)

# Get all matching results
results = client.works.filter(publication_year=2024, type="dataset").get(all=True)

# Default: single page (25 results)
results = client.works.search("CRISPR").get()

Iteration

Iterate over any query and PyOpenAlex handles cursor pagination automatically:

for work in client.works.filter(publication_year=2024, is_oa=True):
    print(work.title)

Use .limit() to cap the total number of results:

for work in client.works.filter(publication_year=2024).limit(500):
    process(work)
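For readers curious about what cursor pagination involves, the loop below sketches the general pattern: start with the cursor `*`, follow `meta.next_cursor` until it is empty, and stop early once a limit is reached. `iter_results`, `fake_fetch`, and the in-memory pages are hypothetical and stand in for real HTTP calls; this is not the library's code.

```python
from typing import Callable, Iterator, Optional


def iter_results(
    fetch_page: Callable[[str], dict],
    limit: Optional[int] = None,
) -> Iterator[dict]:
    """Yield results page by page, following next_cursor until exhausted."""
    if limit is not None and limit <= 0:
        return
    cursor: Optional[str] = "*"  # "*" requests the first cursor page
    yielded = 0
    while cursor is not None:
        page = fetch_page(cursor)
        for item in page["results"]:
            yield item
            yielded += 1
            if limit is not None and yielded >= limit:
                return  # stop without fetching further pages
        cursor = page["meta"].get("next_cursor")


# In-memory stand-in for the API: two pages, then no further cursor.
FAKE_PAGES = {
    "*": {"results": [{"id": "W1"}, {"id": "W2"}], "meta": {"next_cursor": "abc"}},
    "abc": {"results": [{"id": "W3"}], "meta": {"next_cursor": None}},
}


def fake_fetch(cursor: str) -> dict:
    return FAKE_PAGES[cursor]


ids = [work["id"] for work in iter_results(fake_fetch)]
print(ids)
```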

Manual page control

For cases where you need direct control over pagination:

page1 = client.works.filter(publication_year=2024).page(1).per_page(100).get()
page2 = client.works.filter(publication_year=2024).page(2).per_page(100).get()

Counting

Get the total number of matching results without fetching them:

count = client.works.filter(publication_year=2024, is_oa=True).count()

Grouping

Aggregate results by a field:

response = client.works.filter(publication_year=2024).group_by("type").get()
for group in response.group_by:
    print(f"{group.key_display_name}: {group.count}")

Sampling

Get a random sample of results:

results = client.works.sample(100, seed=42).get()

Autocomplete

Fast typeahead search returning up to 10 results:

results = client.institutions.autocomplete("harvard")
for r in results:
    print(f"{r.display_name} ({r.works_count} works)")

Query Reuse

The query builder is immutable. Each method returns a new instance, so you can safely branch from a base query:

base = client.works.filter(publication_year=2024, is_oa=True)

most_cited = base.sort("cited_by_count", desc=True).get(10)
recent = base.sort("publication_date", desc=True).get(10)
count = base.count()
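One common way to get this kind of immutability in Python is a frozen dataclass whose methods return modified copies. The sketch below illustrates the pattern under that assumption; the `Query` class here is a hypothetical miniature, not PyOpenAlex's query builder.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Query:
    filters: Tuple[Tuple[str, object], ...] = ()
    sort_by: Optional[str] = None

    def filter(self, **kwargs) -> Query:
        # Return a copy with the extra filters appended; self is untouched.
        return replace(self, filters=self.filters + tuple(kwargs.items()))

    def sort(self, key: str, desc: bool = False) -> Query:
        return replace(self, sort_by=f"{key}:desc" if desc else key)


base = Query().filter(publication_year=2024, is_oa=True)
most_cited = base.sort("cited_by_count", desc=True)
recent = base.sort("publication_date", desc=True)

print(base.sort_by)        # the base query is unchanged by branching
print(most_cited.sort_by)
print(recent.sort_by)
```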

Response Objects

List queries return a ListResponse with three parts:

response = client.works.search("CRISPR").get()

response.meta        # Meta: count, page, per_page, cost_usd, ...
response.results     # list[Work]: the entities
response.group_by    # list[GroupByResult]: populated when using group_by

Configuration

API Key

Set your API key in any of these ways (in order of precedence):

# 1. Constructor argument
client = OpenAlex(api_key="your-key")

# 2. Environment variable
# export OPENALEX_API_KEY=your-key
client = OpenAlex()

# 3. .env file (loaded automatically)
# OPENALEX_API_KEY=your-key
client = OpenAlex()

Get a free API key at openalex.org/settings/api.
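The precedence order above amounts to "first non-empty source wins". A minimal sketch of that rule follows; `resolve_api_key` is a hypothetical function, and the environment and .env contents are passed in as plain dicts so the example runs without touching real settings.

```python
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str],
    environ: Mapping[str, str],
    dotenv: Mapping[str, str],
) -> Optional[str]:
    """Return the first non-empty key: argument, then env var, then .env file."""
    candidates = (
        explicit,
        environ.get("OPENALEX_API_KEY"),
        dotenv.get("OPENALEX_API_KEY"),
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return None


print(resolve_api_key("from-arg", {"OPENALEX_API_KEY": "from-env"}, {}))
print(resolve_api_key(None, {"OPENALEX_API_KEY": "from-env"}, {"OPENALEX_API_KEY": "from-file"}))
print(resolve_api_key(None, {}, {"OPENALEX_API_KEY": "from-file"}))
```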

Other Settings

client = OpenAlex(
    api_key="your-key",
    base_url="https://api.openalex.org",  # default
    timeout=30.0,                          # request timeout in seconds
    max_retries=3,                         # retries on 403/5xx errors
)

All settings can be set via environment variables with the OPENALEX_ prefix:

export OPENALEX_API_KEY=your-key
export OPENALEX_TIMEOUT=60
export OPENALEX_MAX_RETRIES=5

Context Manager

The client can be used as a context manager to ensure the HTTP connection is closed:

with OpenAlex() as client:
    work = client.works.get("W2741809807")

Markdown Rendering

All entities can render themselves as clean markdown, useful for LLM tool responses and reports:

work = client.works.get("W2741809807")
print(work.to_markdown())

Output:

## The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles (2018)

- **DOI:** https://doi.org/10.7717/peerj.4375
- **Type:** book-chapter
- **Citations:** 1,163
- **Open Access:** Yes (gold)
- **Source:** PeerJ
- **Authors:** Heather Piwowar, Jason Priem, Vincent Lariviere, ...

Despite growing interest in Open Access (OA) to scholarly literature...

Use limit_abstract to truncate long abstracts:

work.to_markdown(limit_abstract=150)

Special Endpoints

# Check rate limit status (requires API key)
rl = client.rate_limit()
print(f"Remaining: ${rl.remaining_cost_today_usd}")

# List available changefiles for bulk data sync
dates = client.changefiles()
entries = client.changefile("2026-03-18")

# Download a PDF ($0.01 per download)
client.download_pdf("W2741809807", "/tmp/paper.pdf")

Error Handling

PyOpenAlex raises typed exceptions:

from pyopenalex import (
    AuthenticationError,
    NotFoundError,
    RateLimitError,
    APIError,
)

try:
    work = client.works.get("W0000000000")
except NotFoundError:
    print("Work not found")
except AuthenticationError:
    print("Invalid or missing API key")
except RateLimitError:
    print("Rate limit exceeded")
except APIError as e:
    print(f"HTTP {e.status_code}: {e}")

Error responses carry the API's own messages, including suggestions such as "Did you mean: authorships.author.id?".

- 403 (burst rate limit): retries automatically with exponential backoff
- 429 (daily limit exhausted): fails immediately with an actionable message
- 5xx (server error): retries automatically with exponential backoff
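The retry rules listed above can be sketched as a small loop. This is an illustration under assumptions, not the library's implementation: `request_with_retries` is hypothetical, the 0.5-second base delay and the doubling schedule are made up for the example, and `send` stands in for an HTTP call that returns a status code.

```python
import time
from typing import Callable

# 403 and any 5xx status are retried; everything else returns at once.
RETRYABLE = {403} | set(range(500, 600))


def request_with_retries(
    send: Callable[[], int],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until the status is not retryable or retries run out."""
    attempt = 0
    while True:
        status = send()
        if status not in RETRYABLE or attempt >= max_retries:
            return status
        sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
        attempt += 1


# Simulated responses: two retryable failures, then success.
statuses = iter([503, 403, 200])
delays = []
final = request_with_retries(lambda: next(statuses), sleep=delays.append)
print(final, delays)
```

Injecting `sleep` as a parameter keeps the example fast to run and makes the backoff schedule easy to inspect.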

Abstract Reconstruction

OpenAlex stores abstracts as inverted indexes. PyOpenAlex reconstructs them for you:

work = client.works.get("W2741809807")
print(work.abstract)  # full abstract text, or None if unavailable
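An inverted index maps each word to the positions where it occurs, so rebuilding the text means placing every word back at its positions and joining them in order. The function below is a minimal sketch of that idea; `reconstruct_abstract` and the sample index are illustrative and not taken from the library.

```python
from typing import Dict, List, Optional


def reconstruct_abstract(inverted_index: Optional[Dict[str, List[int]]]) -> Optional[str]:
    """Rebuild plain text from a {word: [positions]} mapping."""
    if not inverted_index:
        return None
    by_position = {}
    for word, positions in inverted_index.items():
        for position in positions:
            by_position[position] = word
    # Emit the words in position order.
    return " ".join(by_position[i] for i in sorted(by_position))


sample_index = {"to": [0, 4], "be": [1, 5], "or": [2], "not": [3]}
print(reconstruct_abstract(sample_index))
```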

Examples

See examples/quickstart.py for a runnable script showcasing core features and examples/new_endpoints.py for the topic hierarchy, geographic, and special endpoints.

License

MIT

Release history

- 0.4.0 (this version): Mar 19, 2026
- 0.3.2: Mar 18, 2026
- 0.3.1: Mar 18, 2026
- 0.3.0: Mar 18, 2026
- 0.2.0: Mar 18, 2026
- 0.1.1: Mar 18, 2026
- 0.1.0: Mar 18, 2026

Download files

Source Distribution

pyopenalex-0.4.0.tar.gz (38.9 kB)

Uploaded Mar 19, 2026 Source

Built Distribution

pyopenalex-0.4.0-py3-none-any.whl (34.7 kB)

Uploaded Mar 19, 2026 Python 3

File details

Details for the file pyopenalex-0.4.0.tar.gz.

File metadata

Download URL: pyopenalex-0.4.0.tar.gz
Upload date: Mar 19, 2026
Size: 38.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyopenalex-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`754d61ecec03dfafded238b704bc89eeb8dfdcc82dd07cc16a3cac025c78e5fe`
MD5	`bddf5d6d682c29ba1be007f092f4a233`
BLAKE2b-256	`d4015f41606a442be7b84c80e8fa1a84e00130d63178a4c9b840c05a4101c1c1`

Provenance

The following attestation bundles were made for pyopenalex-0.4.0.tar.gz:

Publisher: publish.yml on nthomsencph/pyopenalex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyopenalex-0.4.0.tar.gz
- Subject digest: 754d61ecec03dfafded238b704bc89eeb8dfdcc82dd07cc16a3cac025c78e5fe
- Sigstore transparency entry: 1138092959
- Sigstore integration time: Mar 19, 2026
Source repository:
- Permalink: nthomsencph/pyopenalex@80b5f907b1bfcecaebfcac632dd1cf64d67171fb
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/nthomsencph
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@80b5f907b1bfcecaebfcac632dd1cf64d67171fb
- Trigger Event: release

File details

Details for the file pyopenalex-0.4.0-py3-none-any.whl.

File metadata

Download URL: pyopenalex-0.4.0-py3-none-any.whl
Upload date: Mar 19, 2026
Size: 34.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyopenalex-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5e2fd6ee22d7d19efbf4a234b795fff10f540634a585894cc2e484edacbf14ac`
MD5	`35be31ca56e1ca04a31235fae2e2714a`
BLAKE2b-256	`6be476bbfff595101a91f810a058f450cb6ef496daa520f72502d1503e81b078`

Provenance

The following attestation bundles were made for pyopenalex-0.4.0-py3-none-any.whl:

Publisher: publish.yml on nthomsencph/pyopenalex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyopenalex-0.4.0-py3-none-any.whl
- Subject digest: 5e2fd6ee22d7d19efbf4a234b795fff10f540634a585894cc2e484edacbf14ac
- Sigstore transparency entry: 1138092993
- Sigstore integration time: Mar 19, 2026
Source repository:
- Permalink: nthomsencph/pyopenalex@80b5f907b1bfcecaebfcac632dd1cf64d67171fb
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/nthomsencph
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@80b5f907b1bfcecaebfcac632dd1cf64d67171fb
- Trigger Event: release
