Skip to main content

Lightweight Python library for searching biomedical databases (PubMed, NCBI Gene, UniProt, ChEMBL, etc.)

Project description

OpenData Research Tools

A lightweight Python library for searching biomedical databases and returning standardized data.

Philosophy

Focus on search, not management:

  • ✅ Search multiple open biomedical databases
  • ✅ Return standardized Python dictionaries
  • ✅ Optional HTTP caching for performance
  • ❌ No data management or storage
  • ❌ No opinionated workflows
  • ❌ Minimal dependencies (only requests)

Let your application decide how to store, process, and manage the data.

Features

  • 10+ Data Sources: PubMed, NCBI Gene, UniProt, ChEMBL, PubChem, Protein Structures, WikiData, Clinical Trials, and more
  • Standardized Output: All tools return consistent dictionary formats
  • HTTP Caching: Optional caching to reduce API calls and improve performance
  • Gene Resolution: Resolve gene symbols across multiple databases
  • Zero Configuration: Works out of the box with sensible defaults
  • Lightweight: Only one dependency (requests)

Installation

pip install opendata-research-tools

Quick Start

Basic Search

from opendata_research_tools.search import PubMedSearchTool

# Create a search tool instance
tool = PubMedSearchTool(enable_cache=True)

# Search PubMed
results = tool.search(query="cancer immunotherapy", max_results=10)

# Results is a list of dictionaries
for article in results:
    print(f"Title: {article['title']}")
    print(f"PMID: {article['pmid']}")
    print(f"Authors: {article['authors']}")
    print(f"Year: {article['year']}")
    print()

Multiple Data Sources

from opendata_research_tools.search import (
    PubMedSearchTool,
    NCBIGeneSearchTool,
    UniProtSearchTool,
    ChEMBLSearchTool
)

# Search literature
pubmed = PubMedSearchTool()
articles = pubmed.search("JAK2 V617F")

# Search gene databases
gene_tool = NCBIGeneSearchTool()
gene_info = gene_tool.search("JAK2")

# Search protein databases
uniprot = UniProtSearchTool()
protein_info = uniprot.search("JAK2")

# Search for compounds
chembl = ChEMBLSearchTool()
compounds = chembl.search(target="JAK2", activity_type="inhibitor")

# All results are dictionaries - use them however you want!
# Save to database, process with pandas, generate reports, etc.

Custom Caching

from opendata_research_tools.search import PubMedSearchTool

# Disable caching
tool = PubMedSearchTool(enable_cache=False)

# Custom cache directory
tool = PubMedSearchTool(enable_cache=True, cache_dir="./my_cache")

Supported Databases

Database Tool Description
PubMed PubMedSearchTool Scientific literature and research papers
NCBI Gene NCBIGeneSearchTool Gene information, sequences, and annotations
UniProt UniProtSearchTool Protein sequences, functions, and structures
ChEMBL ChEMBLSearchTool Bioactive molecules and drug-like compounds
PubChem PubChemSearchTool Chemical compounds and their properties
Protein Structures ProteinStructureSearchTool PDB and AlphaFold protein structures
WikiData WikiDataSearchTool Structured knowledge graph data
Clinical Trials ClinicalTrialsSearchTool Clinical trial information
News NewsSearchTool Industry news and developments
Patents PatentSearchTool Patent information

Return Format

Each tool returns a list of dictionaries with standardized fields. Example for PubMed:

[
    {
        'pmid': '12345678',
        'title': 'Article Title',
        'authors': 'Smith J, Doe J',
        'journal': 'Nature',
        'year': '2023',
        'abstract': 'Article abstract text...',
        'keywords': ['cancer', 'therapy'],
        'doi': '10.1234/example',
        'url': 'https://pubmed.ncbi.nlm.nih.gov/12345678/'
    },
    ...
]

See the API Reference for detailed field descriptions for each data source.

Gene Symbol Resolution

Resolve gene symbols across multiple databases:

from opendata_research_tools.utils import GeneSynonymResolver

resolver = GeneSynonymResolver()
result = resolver.resolve("TP53")

print(f"Canonical Symbol: {result.canonical_symbol}")
print(f"Entrez ID: {result.entrez_id}")
print(f"Aliases: {result.aliases}")
print(f"Organism: {result.organism}")

Documentation

Development

# Clone the repository
git clone https://github.com/xxxxx/opendata-research-tools.git
cd opendata-research-tools

# Install with development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=opendata_research_tools --cov-report=html

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details.

Citation

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

opendata_research_tools-0.1.1.tar.gz (28.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

opendata_research_tools-0.1.1-py3-none-any.whl (37.1 kB view details)

Uploaded Python 3

File details

Details for the file opendata_research_tools-0.1.1.tar.gz.

File metadata

  • Download URL: opendata_research_tools-0.1.1.tar.gz
  • Upload date:
  • Size: 28.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for opendata_research_tools-0.1.1.tar.gz
Algorithm Hash digest
SHA256 74c01c899dad1ca2c1b5bb7b3ea42460462b91f8f9a478f8b18411916ae798cd
MD5 27bfd752afb504fdd52efb4c4a281459
BLAKE2b-256 00a37e2bc228b17009c6d7fb21490f3b230558788db52110e05769e3c6b23f91

See more details on using hashes here.

File details

Details for the file opendata_research_tools-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for opendata_research_tools-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 10ff63d7f9d50fd03a3140105edf131638c58c017d1966facf814a27da3364e8
MD5 6e9bac7cd46dd6fb8ec5e7e2866088cb
BLAKE2b-256 32d3fddf948f5283c03db9415b8f9aaa49d2baf0f2f55a837fadc2a05383e063

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page