Automatic Speech Recognition Model

Project description

TrorYong ASR Model

TrorYongASR is an Automatic Speech Recognition (ASR) model implemented by KrorngAI.

TrorYong (ត្រយ៉ង) is the Khmer word for the giant ibis, the bird that symbolises Cambodia.

Support My Work

While this work comes truly from the heart, each project represents a significant investment of time -- from deep-dive research and code preparation to the final narrative and editing process. I am incredibly passionate about sharing this knowledge, but maintaining this level of quality is a major undertaking. If you find my work helpful and are in a position to do so, please consider supporting my work with a donation. You can click here to donate or scan the QR code below. Your generosity acts as a huge encouragement and helps ensure that I can continue creating in-depth, valuable content for you.

If you have a Cambodian bank account, you can donate by scanning my ABA QR code here (or click here). Make sure that the receiver's name is 'Khun Kim Ang'.

Installation

You can install tror-yong-asr with pip:

pip install tror-yong-asr

TrorYongASR depends on a few packages: transformers, safetensors, and torchaudio.
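
Before loading a model, it can help to confirm that these packages are importable in the current environment. The snippet below is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and it assumes the import names match the package names listed above.

```python
from importlib.util import find_spec

# Import names of the dependencies listed above
# (assumed to be identical to the package names).
REQUIRED_MODULES = ("transformers", "safetensors", "torchaudio")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing dependencies, try: pip install " + " ".join(missing))
    else:
        print("All dependencies are importable.")
```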

Usage

Get started with the code below:

from transformers import AutoProcessor
from tror_yong_asr import TrorYongASRModel, transcribe, translate, detect_language


model_id = "KrorngAI/TrorYongASR-tiny"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = TrorYongASRModel.from_pretrained(model_id, trust_remote_code=True)

result1 = detect_language('/path/to/audio_file.mp3', model, processor)
print(result1)

result2 = transcribe('/path/to/audio_file.mp3', model, processor, max_tokens=64)
print(result2)

result3 = translate('/path/to/audio_file.mp3', model, processor, max_tokens=64)
print(result3)
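
To process many recordings at once, the single-file calls above can be wrapped in a small loop. The following is only a sketch: `transcribe_folder`, `transcribe_fn`, and `pattern` are hypothetical names that are not part of tror-yong-asr, and because the return type of `transcribe` is not documented here, each result is stored unchanged.

```python
from pathlib import Path


def transcribe_folder(folder, transcribe_fn, pattern="*.mp3"):
    """Apply `transcribe_fn` to every file in `folder` matching `pattern`.

    `transcribe_fn` takes a path string and returns a result of any type.
    A failing file does not stop the batch: its exception is stored instead.
    """
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        try:
            results[path.name] = transcribe_fn(str(path))
        except Exception as exc:  # record the error and continue
            results[path.name] = exc
    return results


# Hypothetical usage with the objects created above:
# results = transcribe_folder(
#     "/path/to/audio_folder",
#     lambda p: transcribe(p, model, processor, max_tokens=64),
# )
```

Passing the transcription call in as a function keeps the helper independent of the model, so the same loop can be reused with `translate` or `detect_language`.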

TrorYongASR provides two sets of pre-trained weights, both supporting Khmer and English:

  • Tiny version with model_id=KrorngAI/TrorYongASR-tiny
  • Small version with model_id=KrorngAI/TrorYongASR-small

Figure 1: TrorYongASR architecture. Dropout layers are omitted due to space constraints. [B], [L], [T], [E], and [P] are the begin-of-sequence, language, task, end-of-sequence, and padding tokens, respectively. This figure presents the case with 16 distinct target prediction positions. The QKV projection is shown explicitly because, in TrorYongASR specifically, a single position basis p is used for each position to directly form the query projection. The last linear layer outputs logits over the vocabulary; these logits are then used to compute the cross-entropy loss.
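
The final step in the caption, turning per-position logits into a cross-entropy loss, can be illustrated in plain Python. This is a generic sketch, not the TrorYongASR training code: the function name `masked_cross_entropy` is hypothetical, and skipping positions whose target is the padding token [P] is an assumption rather than something stated above.

```python
import math


def masked_cross_entropy(logits, targets, pad_id):
    """Mean cross-entropy over non-padding positions.

    logits:  list of per-position score lists, one score per vocabulary entry
    targets: list of target token ids, same length as `logits`
    pad_id:  id of the padding token; those positions are skipped (assumption)
    """
    total, count = 0.0, 0
    for scores, target in zip(logits, targets):
        if target == pad_id:
            continue
        # log-sum-exp with the maximum subtracted for numerical stability
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        # negative log-probability of the target token
        total += log_norm - scores[target]
        count += 1
    return total / count if count else 0.0
```

With uniform logits over a vocabulary of size V, the loss at a single position equals log V, which is a convenient sanity check.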

Evaluation

TrorYongASR was evaluated on the test split of google/fleurs (language code km-kh) for Khmer and on librispeech.clean for English.

WER Comparison with Whisper:

Size   Model        Parameters  Khmer (fleurs)  English (librispeech.clean)
Tiny   TrorYongASR  29M         75.88%          54.33%
Tiny   Whisper      39M         100.6%          7.6%
Small  TrorYongASR  135M        50.46%          21.75%
Small  Whisper      244M        104.4%          3.4%
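
For reference, word error rate (WER) is the word-level edit distance between hypothesis and reference (substitutions, deletions, and insertions) divided by the number of reference words. The sketch below is a generic implementation, not the evaluation script used for the table; the function name `word_error_rate` is hypothetical, and it assumes words are separated by whitespace. Khmer is normally written without spaces between words, so the actual evaluation may have tokenised the text differently.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat", "the cat sat"))  # 0.0
print(word_error_rate("hello", "hello hello hello"))  # 2.0
```

Because insertions count as errors while the denominator is fixed by the reference, WER can exceed 100% when the hypothesis contains many extra words, as in the second call.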

Fine-tune TrorYongASR

To be added

