VectorChord Python SDK
Installation
```shell
pip install vechord
```
The related Docker images can be found in VectorChord Suite.
- DockerHub: `tensorchord/vchord-suite:pg17-20250815`
- GitHub Packages: `ghcr.io/tensorchord/vchord-suite:pg17-20250815`
Features
- vector search with RaBitQ (powered by VectorChord)
- multivec search with WARP (powered by VectorChord)
- keyword search with BM25 score (powered by VectorChord-bm25)
- reduce boilerplate code by taking full advantage of Python type hints
- provide a decorator to load data from and dump data to the database
- guarantee data consistency with PostgreSQL transactions
- auto-generate the web service
- provide common tools (you can also use any other library):
  - Augmenter for contextual retrieval
  - Chunker to segment the text into chunks
  - Embedding to generate embeddings from the text
  - Evaluator to evaluate the search results with NDCG, MAP, Recall, etc.
  - Extractor to extract the content from PDF, HTML, etc.
  - EntityRecognizer to extract the entities and relations from the text
  - Reranker for hybrid search
  - GroundTruth to generate the ground truth for evaluation
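The Evaluator is described as reporting metrics such as NDCG, MAP, and Recall. As a library-independent reminder of what those numbers measure, here is a minimal plain-Python sketch. `recall_at_k`, `ndcg_at_k`, and `average_precision` are hypothetical helpers, not the Evaluator API, and the exact definitions vechord uses (for example, graded vs. binary relevance) may differ.

```python
import math


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def ndcg_at_k(retrieved: list[str], relevance: dict[str, float], k: int) -> float:
    """Normalized discounted cumulative gain with graded relevance labels."""

    def dcg(gains: list[float]) -> float:
        # the gain at 0-based rank i is discounted by log2(i + 2)
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    actual = [relevance.get(doc_id, 0.0) for doc_id in retrieved[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(actual) / ideal_dcg if ideal_dcg > 0 else 0.0


def average_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Average of precision@i over the ranks i holding a relevant id; MAP is its mean over queries."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


ranked = ["d3", "d1", "d7"]  # search results for one query, best first
print(recall_at_k(ranked, {"d1", "d2"}, k=3))  # 0.5
print(ndcg_at_k(ranked, {"d1": 1.0, "d2": 1.0}, k=3))
print(average_precision(ranked, {"d1", "d2"}))  # 0.25
```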
Examples
- simple.py: for people who are familiar with specialized vector database APIs
- beir.py: the most flexible way to use the library (loading, indexing, querying and evaluation)
- web.py: build a web application from the defined tables and pipeline
- essay.py: extract the content from Paul Graham's essays and evaluate the search results with LLM-generated queries
- contextual.py: contextual retrieval example with a local PDF
- anthropic.py: contextual retrieval with Anthropic's tutorial example
- hybrid.py: hybrid search that reranks the results from vector search with keyword search
- graph.py: graph-like entity-relation retrieval
- dynamic.py: run arbitrary pipelines with dynamic steps
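`hybrid.py` combines vector search results with keyword search results. One common way to merge two ranked lists is Reciprocal Rank Fusion (RRF), sketched below in plain Python with the conventional constant `k = 60`. RRF is swapped in here purely for illustration; it is not necessarily what `hybrid.py` or the `Reranker` does, and `reciprocal_rank_fusion` is a hypothetical helper.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists; each list adds 1 / (k + rank) to an id's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # highest fused score first; ties are broken by id for a stable order
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["c2", "c5", "c1"]   # e.g. chunk ids from a vector search
keyword_hits = ["c5", "c3", "c2"]  # e.g. chunk ids from a BM25 keyword search
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))  # ['c5', 'c2', 'c3', 'c1']
```

Because RRF only looks at ranks, it sidesteps the problem that cosine similarities and BM25 scores live on different scales.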
User Guide
For more details, check our API reference and User Guide.
Define the table
```python
from typing import Annotated, Optional

from vechord.spec import ForeignKey, PrimaryKeyAutoIncrease, Table, Vector

# use a 3072-dimensional dense vector
DenseVector = Vector[3072]


class Document(Table, kw_only=True):
    uid: Optional[PrimaryKeyAutoIncrease] = None  # auto-increase id, no need to set
    link: str = ""
    text: str


class Chunk(Table, kw_only=True):
    uid: Optional[PrimaryKeyAutoIncrease] = None
    doc_id: Annotated[int, ForeignKey[Document.uid]]  # reference to `Document.uid` on DELETE CASCADE
    vector: DenseVector  # this comes with a default vector index
    text: str
```
Inject with decorator
```python
import httpx

from vechord.embedding import GeminiDenseEmbedding
from vechord.extract import SimpleExtractor
from vechord.registry import VechordRegistry

# `Document` and `Chunk` are the tables defined above
vr = VechordRegistry(namespace="test", url="postgresql://postgres:postgres@127.0.0.1:5432/", tables=[Document, Chunk])
extractor = SimpleExtractor()
emb = GeminiDenseEmbedding()


@vr.inject(output=Document)  # dump to the `Document` table
# function parameters are free to define since `inject(input=...)` is not set
async def add_document(url: str) -> Document:  # the return type is `Document`
    async with httpx.AsyncClient() as client:
        resp = await client.get(url)
        text = extractor.extract_html(resp.text)
        return Document(link=url, text=text)


@vr.inject(input=Document, output=Chunk)  # load from the `Document` table and dump to the `Chunk` table
# function parameters are the attributes of the `Document` table; only the defined attributes
# will be loaded from the `Document` table
async def add_chunk(uid: int, text: str) -> list[Chunk]:  # the return type is `list[Chunk]`
    chunks = text.split("\n")
    return [Chunk(doc_id=uid, vector=await emb.vectorize_chunk(t), text=t) for t in chunks]


async def main():
    async with vr, emb:  # handle the connection with a context manager
        await add_document("https://paulgraham.com/best.html")  # add arguments as usual
        await add_chunk()  # omit the arguments since the `input` will be loaded from the `Document` table
        await vr.insert(Document(text="hello world"))  # insert manually
        print(await vr.select_by(Document.partial_init()))  # select all the columns from table `Document`


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
```
Transaction
To guarantee data consistency, users can build a pipeline with `VechordRegistry.create_pipeline` and call its `run` method to execute multiple functions in a single transaction.
Within this transaction, each function only loads the data that was inserted earlier in the same transaction, so users can focus on the data processing without tracking which data has not been processed yet.
```python
# inside an async function such as `main()` above
pipeline = vr.create_pipeline([add_document, add_chunk])
await pipeline.run("https://paulgraham.com/best.html")  # only accepts the arguments for the first function
```
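The all-or-nothing behavior of a transactional pipeline can be illustrated without PostgreSQL. This sketch uses the standard-library `sqlite3` module as a stand-in database; the table layout and the `add_document`, `add_chunk`, and `run_pipeline` functions here are hypothetical and only mirror the shape of the pipeline, not vechord's implementation. If any step raises, everything the earlier steps wrote is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (uid INTEGER PRIMARY KEY, text TEXT)")
conn.execute("CREATE TABLE chunk (doc_id INTEGER, text TEXT)")
conn.commit()


def add_document(cur: sqlite3.Cursor, text: str) -> int:
    cur.execute("INSERT INTO document (text) VALUES (?)", (text,))
    return cur.lastrowid


def add_chunk(cur: sqlite3.Cursor, doc_id: int) -> None:
    # only read the document written by the previous step of this transaction
    (text,) = cur.execute("SELECT text FROM document WHERE uid = ?", (doc_id,)).fetchone()
    for line in text.split("\n"):
        if not line:
            raise ValueError("empty chunk")
        cur.execute("INSERT INTO chunk (doc_id, text) VALUES (?, ?)", (doc_id, line))


def run_pipeline(text: str) -> None:
    # `with conn` commits if the block succeeds and rolls back if any step raises
    with conn:
        cur = conn.cursor()
        doc_id = add_document(cur, text)
        add_chunk(cur, doc_id)


run_pipeline("hello\nworld")
try:
    run_pipeline("bad\n\ninput")  # the empty line makes the second step fail
except ValueError:
    pass
print(conn.execute("SELECT COUNT(*) FROM document").fetchone())  # (1,)
```

The failed run leaves neither its document nor its partial chunk behind, so no half-processed data is visible afterwards.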
Search
```python
print(await vr.search_by_vector(Chunk, await emb.vectorize_query("startup")))
```
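`search_by_vector` relies on an approximate vector index. For intuition about what such an index approximates, here is an exact brute-force top-k search by cosine similarity in plain Python. `cosine_similarity` and `brute_force_search` are hypothetical helpers, not part of vechord, and a real index avoids scoring every row.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def brute_force_search(rows: dict[str, list[float]], query: list[float], topk: int = 3) -> list[str]:
    """Exact top-k: score every stored vector against the query and keep the best."""
    return heapq.nlargest(topk, rows, key=lambda key: cosine_similarity(rows[key], query))


chunks = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 1.0]}
print(brute_force_search(chunks, [1.0, 0.1], topk=2))  # ['a', 'b']
```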
Customized Index Configuration
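```python
from typing import Annotated, Optional

from vechord.spec import PrimaryKeyAutoIncrease, Table, VectorIndex


class Chunk(Table, kw_only=True):
    uid: Optional[PrimaryKeyAutoIncrease] = None
    vector: Annotated[DenseVector, VectorIndex(distance="cos", lists=128)]  # `DenseVector` as defined above
    text: str
```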
Access the underlying database cursor directly
```python
await vr.client.get_cursor().execute("SET vchordrq.probes = 100;")
```
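As we understand VectorChord's `vchordrq` index, `lists` and the `vchordrq.probes` setting follow the inverted-file (IVF) pattern: vectors are partitioned into `lists` clusters at build time, and a query scans only the `probes` clusters whose centroids are closest. The toy class below illustrates that trade-off in plain Python. `ToyIVF` is hypothetical; it uses fixed centroids instead of trained ones, Euclidean distance instead of cosine, and omits quantization (RaBitQ) entirely.

```python
import math


class ToyIVF:
    """Hypothetical inverted-file index: each vector lives in the list of its nearest centroid."""

    def __init__(self, centroids: list[list[float]]) -> None:
        # real IVF indexes learn centroids (e.g. with k-means); here they are given
        self.centroids = centroids
        self.lists: list[list[tuple[str, list[float]]]] = [[] for _ in centroids]

    def _nearest_lists(self, vec: list[float]) -> list[int]:
        return sorted(range(len(self.centroids)), key=lambda i: math.dist(vec, self.centroids[i]))

    def add(self, key: str, vec: list[float]) -> None:
        self.lists[self._nearest_lists(vec)[0]].append((key, vec))

    def search(self, query: list[float], topk: int, probes: int) -> list[str]:
        # scan only the `probes` lists whose centroids are closest to the query
        candidates = [item for i in self._nearest_lists(query)[:probes] for item in self.lists[i]]
        candidates.sort(key=lambda item: math.dist(query, item[1]))
        return [key for key, _ in candidates[:topk]]


index = ToyIVF(centroids=[[0.0, 0.0], [10.0, 10.0]])
for key, vec in {"p": [0.5, 0.0], "q": [4.0, 4.0], "r": [9.0, 9.5]}.items():
    index.add(key, vec)
print(index.search([5.5, 5.5], topk=1, probes=1))  # ['r']: only the list around [10, 10] is scanned
print(index.search([5.5, 5.5], topk=1, probes=2))  # ['q']: both lists are scanned
```

With one probe the search misses the true nearest neighbor `q`, because it sits in a list whose centroid is farther from the query. Scanning more lists raises recall at the cost of more distance computations, which is roughly the trade-off that raising `vchordrq.probes` makes.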
HTTP Service
This creates an ASGI application that can be served by any ASGI server, such as uvicorn.
Open the OpenAPI endpoint to check the API documentation.
```python
import uvicorn

from vechord.service import create_web_app

uvicorn.run(create_web_app(vr))
```
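`uvicorn` is an ASGI server, so the object passed to `uvicorn.run` is an ASGI callable: an async function taking `scope`, `receive`, and `send`. The hypothetical `app` below shows that interface in its smallest form and drives it in-process, the way a server would. It is not the application `create_web_app` generates, which is built from the registered tables and pipelines and also serves the OpenAPI documentation.

```python
import asyncio
import json


async def app(scope, receive, send):
    """Hypothetical minimal ASGI app: answers every HTTP request with its path as JSON."""
    if scope["type"] != "http":
        return  # ignore lifespan and websocket scopes in this sketch
    body = json.dumps({"path": scope["path"]}).encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"application/json")],
    })
    await send({"type": "http.response.body", "body": body})


async def call(path: str) -> list[dict]:
    """Drive the app in-process and collect the messages it sends back."""
    sent: list[dict] = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "method": "GET", "path": path}, receive, send)
    return sent


messages = asyncio.run(call("/docs"))
print(messages[0]["status"], messages[1]["body"])
```

Passing this `app` to `uvicorn.run(app)` would serve it over HTTP instead.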
Development
```shell
docker run --rm -d --name vdb -e POSTGRES_PASSWORD=postgres -p 5432:5432 ghcr.io/tensorchord/vchord-suite:pg17-20250815
envd up
# inside the envd env, sync all the dependencies
make sync
# format the code
make format
```
Download files
Source Distribution: vechord-0.2.4.tar.gz
Built Distribution: vechord-0.2.4-py3-none-any.whl
File details
Details for the file vechord-0.2.4.tar.gz.
File metadata
- Download URL: vechord-0.2.4.tar.gz
- Upload date:
- Size: 55.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.8.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d2a399039023c766ec5f1f3cc4f0fc5ef043c44eabd419d48a7676803b7529ba |
| MD5 | 6161b4f80201fdf60cad0f4f0303db9d |
| BLAKE2b-256 | 9121ce89e7521dd55ec4571a8dd78f2b1e1a450c1e606ae772a5934ae3c333dd |
File details
Details for the file vechord-0.2.4-py3-none-any.whl.
File metadata
- Download URL: vechord-0.2.4-py3-none-any.whl
- Upload date:
- Size: 57.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.8.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d1f9a6112c061ab4217893ec80ab80f7edeb2d44fe09ebf72691213775f69c46 |
| MD5 | 4b87460e5b86780745e2f72cca4f6b9a |
| BLAKE2b-256 | 54257e1edebd4837ee92b764adb5d94b5c436479e7727a4c00b3e2ccef87a311 |
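To check a downloaded file against the SHA256 digests listed above, a short standard-library sketch is enough. `sha256_of`, `verify`, and `EXPECTED_SHA256` are illustrative names; the expected value is the wheel's SHA256 copied from its hash table.

```python
import hashlib
from pathlib import Path

# SHA256 of vechord-0.2.4-py3-none-any.whl, copied from the hash table above
EXPECTED_SHA256 = "d1f9a6112c061ab4217893ec80ab80f7edeb2d44fe09ebf72691213775f69c46"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("vechord-0.2.4-py3-none-any.whl"))` returns `True` only for an untampered wheel. pip can enforce the same check at install time through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).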