transcriptify

Pluggable transcription library with audio device abstraction

Pluggable, async-first transcription library for Python. Provides a unified interface over transcription backends (currently OpenAI Whisper / GPT-4o) with swappable audio sources (files, raw bytes, microphone).

Installation

pip install transcriptify

Optional dependencies

| Extra | Install command | What it adds |
| --- | --- | --- |
| `mic` | `pip install transcriptify[mic]` | Microphone recording via `sounddevice` |
| `dev` | `pip install transcriptify[dev]` | `pytest`, `ruff`, `mypy`, `pre-commit` |

Quickstart

Transcribe a file

import asyncio
from transcriptify import OpenAIWhisper
from transcriptify.audio import FileAudioDevice

async def main():
    whisper = OpenAIWhisper()  # uses OPENAI_API_KEY env var

    async with FileAudioDevice("recording.wav") as device:
        audio = await device.read()
        result = await whisper.transcribe(audio)

    print(result.text)

asyncio.run(main())

Transcribe from microphone

Requires the `mic` extra: `pip install "transcriptify[mic]"` (the quotes keep shells like zsh from expanding the brackets).

import asyncio
from transcriptify import OpenAIWhisper
from transcriptify.audio import MicrophoneAudioDevice

async def main():
    whisper = OpenAIWhisper()

    async with MicrophoneAudioDevice(sample_rate=16_000) as mic:
        audio = await mic.record(seconds=5)

    result = await whisper.transcribe(audio)
    print(result.text)

asyncio.run(main())

Streaming transcription

Models that support streaming (gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-transcribe-diarize) yield incremental deltas:

# Inside an async function, reusing the whisper client from above:
async with FileAudioDevice("recording.wav") as device:
    audio = await device.read()
    async for delta in whisper.stream(audio):
        print(delta.text, end="", flush=True)

Transcribe raw bytes

from transcriptify.audio import BytesAudioDevice

# Inside an async function, with wav_bytes holding a complete WAV payload:
device = BytesAudioDevice(wav_bytes, sample_rate=16_000, encoding="wav")
audio = await device.read()
result = await whisper.transcribe(audio)
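
If you just need bytes to experiment with, the standard-library `wave` module can synthesize a valid in-memory WAV payload. This is a self-contained sketch; the helper name is illustrative and not part of transcriptify:

```python
import io
import wave

def make_silent_wav(seconds: float = 1.0, sample_rate: int = 16_000) -> bytes:
    """Build an in-memory mono 16-bit PCM WAV file filled with silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit (2-byte) samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()

wav_bytes = make_silent_wav()
```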

OpenAIWhisper configuration

from transcriptify import OpenAIWhisper, WhisperModel, ResponseFormat

whisper = OpenAIWhisper(
    api_key="sk-...",                            # or set OPENAI_API_KEY env var
    model=WhisperModel.GPT_4O_TRANSCRIBE,        # default
    language="en",                               # optional ISO-639-1
    response_format=ResponseFormat.VERBOSE_JSON,  # json, text, srt, vtt, verbose_json, diarized_json
    prompt="Technical discussion about Python",   # optional context hint
)

Available models

| Enum | Value | Streaming | Diarization |
| --- | --- | --- | --- |
| `WhisperModel.WHISPER_1` | `whisper-1` | No | No |
| `WhisperModel.GPT_4O_TRANSCRIBE` | `gpt-4o-transcribe` | Yes | No |
| `WhisperModel.GPT_4O_MINI_TRANSCRIBE` | `gpt-4o-mini-transcribe` | Yes | No |
| `WhisperModel.GPT_4O_TRANSCRIBE_DIARIZE` | `gpt-4o-transcribe-diarize` | Yes | Yes |
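
When picking a model programmatically, the table above can be mirrored as plain data. This sketch uses the raw model identifier strings; the dict and helper are illustrative, not part of the transcriptify API:

```python
# Capabilities copied from the model table above.
MODEL_CAPABILITIES: dict[str, dict[str, bool]] = {
    "whisper-1":                 {"streaming": False, "diarization": False},
    "gpt-4o-transcribe":         {"streaming": True,  "diarization": False},
    "gpt-4o-mini-transcribe":    {"streaming": True,  "diarization": False},
    "gpt-4o-transcribe-diarize": {"streaming": True,  "diarization": True},
}

def supports_streaming(model: str) -> bool:
    """Unknown models conservatively report no streaming support."""
    return MODEL_CAPABILITIES.get(model, {}).get("streaming", False)
```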

Azure OpenAI

whisper = OpenAIWhisper(
    azure_endpoint="https://your-resource.openai.azure.com",
    azure_deployment="your-whisper-deployment",
    api_key="your-azure-key",
    api_version="2024-06-01",  # optional, this is the default
)

Audio devices

| Class | Import | Description |
| --- | --- | --- |
| `FileAudioDevice` | `transcriptify.audio` | Reads `.wav` (parsed) or any other format (raw bytes). Supports chunked streaming via `chunk_duration_ms`. |
| `BytesAudioDevice` | `transcriptify.audio` | Wraps raw bytes. Also supports real-time push via `BytesAudioDevice.from_stream()` + `push()`. |
| `MicrophoneAudioDevice` | `transcriptify.audio` | Records from the system microphone. Requires `transcriptify[mic]`. Supports `record(seconds=N)` and continuous `stream()`. |
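
The push-based pattern behind `BytesAudioDevice.from_stream()` + `push()` can be illustrated with a plain `asyncio.Queue`, independent of transcriptify: a producer pushes audio frames and a sentinel `None` marks end-of-stream. All names here are illustrative:

```python
import asyncio

async def producer(queue: asyncio.Queue) -> None:
    # Push three fake 10 ms frames, then a sentinel to signal end-of-stream.
    for frame in (b"\x00" * 320, b"\x01" * 320, b"\x02" * 320):
        await queue.put(frame)
    await queue.put(None)

async def consumer(queue: asyncio.Queue) -> list[bytes]:
    # Drain frames until the sentinel arrives.
    frames: list[bytes] = []
    while (frame := await queue.get()) is not None:
        frames.append(frame)
    return frames

async def main() -> list[bytes]:
    queue: asyncio.Queue = asyncio.Queue()
    _, frames = await asyncio.gather(producer(queue), consumer(queue))
    return frames

frames = asyncio.run(main())
```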

Implementing custom adapters

Custom transcription backend

from transcriptify import Transcriber, AudioChunk, TranscriptionResult

class MyTranscriber(Transcriber):
    async def transcribe(self, audio: AudioChunk) -> TranscriptionResult:
        # my_backend stands in for your own recognition client
        text = await my_backend.recognize(audio.data)
        return TranscriptionResult(text=text)

Custom audio device

from transcriptify import AudioDevice, AudioChunk
from collections.abc import AsyncIterator

class MyAudioDevice(AudioDevice):
    async def read(self) -> AudioChunk:
        # my_source stands in for your own audio producer
        data = await my_source.get_audio()
        return AudioChunk(data=data, sample_rate=16_000)

    async def stream(self) -> AsyncIterator[AudioChunk]:
        async for frame in my_source.frames():
            yield AudioChunk(data=frame, sample_rate=16_000)

Requirements

  • Python >= 3.11
  • pydantic >= 2.12.5
  • openai >= 2.29.0
  • python-dotenv >= 1.2.2
