# transcriptify

Pluggable, async-first transcription library for Python. Provides a unified interface over transcription backends (currently OpenAI Whisper / GPT-4o) with swappable audio sources (files, raw bytes, microphone).
## Installation

```shell
pip install transcriptify
```
### Optional dependencies

| Extra | Install command | What it adds |
|---|---|---|
| `mic` | `pip install transcriptify[mic]` | Microphone recording via `sounddevice` |
| `dev` | `pip install transcriptify[dev]` | `pytest`, `ruff`, `mypy`, `pre-commit` |
## Quickstart

### Transcribe a file

```python
import asyncio

from transcriptify import OpenAIWhisper
from transcriptify.audio import FileAudioDevice


async def main():
    whisper = OpenAIWhisper()  # uses the OPENAI_API_KEY env var
    async with FileAudioDevice("recording.wav") as device:
        audio = await device.read()
        result = await whisper.transcribe(audio)
        print(result.text)


asyncio.run(main())
```
### Transcribe from the microphone

Requires the `mic` extra (`pip install transcriptify[mic]`).

```python
import asyncio

from transcriptify import OpenAIWhisper
from transcriptify.audio import MicrophoneAudioDevice


async def main():
    whisper = OpenAIWhisper()
    async with MicrophoneAudioDevice(sample_rate=16_000) as mic:
        audio = await mic.record(seconds=5)
        result = await whisper.transcribe(audio)
        print(result.text)


asyncio.run(main())
```
## Streaming transcription

Models that support streaming (`gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, `gpt-4o-transcribe-diarize`) yield incremental deltas:

```python
async with FileAudioDevice("recording.wav") as device:
    audio = await device.read()
    async for delta in whisper.stream(audio):
        print(delta.text, end="", flush=True)
```
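Instead of printing, you will often want to accumulate the deltas into a final transcript. The sketch below shows that pattern in plain Python; `Delta` and `fake_stream` are local stand-ins for the objects yielded by `whisper.stream()` (only the `.text` attribute used above is assumed):

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Delta:
    """Stand-in for the library's streaming delta; only .text is assumed."""
    text: str


async def fake_stream():
    # Stand-in for whisper.stream(audio): yields incremental deltas.
    for piece in ["Hello", ", ", "world", "."]:
        yield Delta(text=piece)


async def collect(stream) -> str:
    # Accumulate deltas into the full transcript instead of printing them.
    parts: list[str] = []
    async for delta in stream:
        parts.append(delta.text)
    return "".join(parts)


print(asyncio.run(collect(fake_stream())))  # Hello, world.
```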
## Transcribe raw bytes

```python
from transcriptify.audio import BytesAudioDevice

device = BytesAudioDevice(wav_bytes, sample_rate=16_000, encoding="wav")
audio = await device.read()
result = await whisper.transcribe(audio)
```
## OpenAIWhisper configuration

```python
from transcriptify import OpenAIWhisper, WhisperModel, ResponseFormat

whisper = OpenAIWhisper(
    api_key="sk-...",                             # or set OPENAI_API_KEY env var
    model=WhisperModel.GPT_4O_TRANSCRIBE,         # default
    language="en",                                # optional ISO-639-1
    response_format=ResponseFormat.VERBOSE_JSON,  # json, text, srt, vtt, verbose_json, diarized_json
    prompt="Technical discussion about Python",   # optional context hint
)
```
### Available models

| Enum | Value | Streaming | Diarization |
|---|---|---|---|
| `WhisperModel.WHISPER_1` | `whisper-1` | No | No |
| `WhisperModel.GPT_4O_TRANSCRIBE` | `gpt-4o-transcribe` | Yes | No |
| `WhisperModel.GPT_4O_MINI_TRANSCRIBE` | `gpt-4o-mini-transcribe` | Yes | No |
| `WhisperModel.GPT_4O_TRANSCRIBE_DIARIZE` | `gpt-4o-transcribe-diarize` | Yes | Yes |
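The capability columns can be re-expressed as a small runtime check, useful when choosing a model dynamically. This is a local sketch that mirrors the table (the enum values are copied from it; the real `WhisperModel` ships with `transcriptify`):

```python
from enum import Enum


class WhisperModel(str, Enum):
    # Values copied from the table above; the real enum ships with transcriptify.
    WHISPER_1 = "whisper-1"
    GPT_4O_TRANSCRIBE = "gpt-4o-transcribe"
    GPT_4O_MINI_TRANSCRIBE = "gpt-4o-mini-transcribe"
    GPT_4O_TRANSCRIBE_DIARIZE = "gpt-4o-transcribe-diarize"


def supports_streaming(model: WhisperModel) -> bool:
    # Per the table, every model except whisper-1 can stream.
    return model is not WhisperModel.WHISPER_1


def supports_diarization(model: WhisperModel) -> bool:
    # Only the -diarize variant labels speakers.
    return model is WhisperModel.GPT_4O_TRANSCRIBE_DIARIZE


print(supports_streaming(WhisperModel.WHISPER_1))  # False
```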
### Azure OpenAI

```python
whisper = OpenAIWhisper(
    azure_endpoint="https://your-resource.openai.azure.com",
    azure_deployment="your-whisper-deployment",
    api_key="your-azure-key",
    api_version="2024-06-01",  # optional, this is the default
)
```
## Audio devices

| Class | Import | Description |
|---|---|---|
| `FileAudioDevice` | `transcriptify.audio` | Reads `.wav` (parsed) or any other format (raw bytes). Supports chunked streaming via `chunk_duration_ms`. |
| `BytesAudioDevice` | `transcriptify.audio` | Wraps raw bytes. Also supports real-time push via `BytesAudioDevice.from_stream()` + `push()`. |
| `MicrophoneAudioDevice` | `transcriptify.audio` | Records from the system mic. Requires `transcriptify[mic]`. Supports `record(seconds=N)` and continuous `stream()`. |
## Implementing custom adapters

### Custom transcription backend

```python
from transcriptify import Transcriber, AudioChunk, TranscriptionResult


class MyTranscriber(Transcriber):
    async def transcribe(self, audio: AudioChunk) -> TranscriptionResult:
        text = await my_backend.recognize(audio.data)
        return TranscriptionResult(text=text)
```
### Custom audio device

```python
from collections.abc import AsyncIterator

from transcriptify import AudioDevice, AudioChunk


class MyAudioDevice(AudioDevice):
    async def read(self) -> AudioChunk:
        data = await my_source.get_audio()
        return AudioChunk(data=data, sample_rate=16_000)

    async def stream(self) -> AsyncIterator[AudioChunk]:
        async for frame in my_source.frames():
            yield AudioChunk(data=frame, sample_rate=16_000)
```
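To see the two adapter roles cooperating end to end, here is a self-contained sketch that wires a toy device into a toy backend. `AudioChunk`, `TranscriptionResult`, `EchoTranscriber`, and `ListAudioDevice` below are local stand-ins so the snippet runs without `transcriptify` installed; in real code you would subclass the library's base classes as shown above:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class AudioChunk:
    """Stand-in for transcriptify's AudioChunk."""
    data: bytes
    sample_rate: int


@dataclass
class TranscriptionResult:
    """Stand-in for transcriptify's TranscriptionResult."""
    text: str


class EchoTranscriber:
    # Toy backend: "recognizes" by decoding the bytes as UTF-8.
    async def transcribe(self, audio: AudioChunk) -> TranscriptionResult:
        return TranscriptionResult(text=audio.data.decode())


class ListAudioDevice:
    # Toy device: serves a fixed list of frames as one chunk.
    def __init__(self, frames: list[bytes]) -> None:
        self._frames = frames

    async def read(self) -> AudioChunk:
        return AudioChunk(data=b"".join(self._frames), sample_rate=16_000)


async def run() -> str:
    device = ListAudioDevice([b"hello ", b"world"])
    audio = await device.read()
    result = await EchoTranscriber().transcribe(audio)
    return result.text


print(asyncio.run(run()))  # hello world
```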
## Requirements

- Python >= 3.11
- pydantic >= 2.12.5
- openai >= 2.29.0
- python-dotenv >= 1.2.2