
Automatic Speech Recognition Model


TrorYong ASR Model

TrorYongASR is an Automatic Speech Recognition model implemented by KrorngAI.

TrorYong (ត្រយ៉ង) is the Khmer word for the giant ibis, the bird that symbolises Cambodia.

Support My Work

While this work comes truly from the heart, each project represents a significant investment of time -- from deep-dive research and code preparation to the final narrative and editing process. I am incredibly passionate about sharing this knowledge, but maintaining this level of quality is a major undertaking. If you find my work helpful and are in a position to do so, please consider supporting my work with a donation. You can click here to donate or scan the QR code below. Your generosity acts as a huge encouragement and helps ensure that I can continue creating in-depth, valuable content for you.

If you have a Cambodian bank account, you can donate by scanning my ABA QR code here (or click here). Make sure that the receiver's name is 'Khun Kim Ang'.

Installation

You can install tror-yong-asr with pip:

pip install tror-yong-asr

TrorYongASR requires a few dependencies: transformers, safetensors, and torchaudio.
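
Before loading a model, it can help to confirm that these packages are importable in the current environment. The following is a minimal sketch, not part of tror-yong-asr: the helper name `missing_dependencies` is hypothetical, and it only checks that each top-level module can be found, not which version is installed.

```python
from importlib.util import find_spec


def missing_dependencies(modules):
    """Return the names in `modules` that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in modules if find_spec(name) is None]


# Import names of the dependencies listed above, plus the package itself.
required = ["transformers", "safetensors", "torchaudio", "tror_yong_asr"]
missing = missing_dependencies(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies are importable.")
```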

Usage

Get started with the code below:

from transformers import AutoProcessor
from tror_yong_asr import TrorYongASRModel, transcribe, translate, detect_language


model_id = "KrorngAI/TrorYongASR-tiny"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = TrorYongASRModel.from_pretrained(model_id)

result1 = detect_language('/path/to/audio_file.mp3', model, processor)
print(result1)

result2 = transcribe('/path/to/audio_file.mp3', model, processor, max_tokens=64)
print(result2)

result3 = translate('/path/to/audio_file.mp3', model, processor, max_tokens=64)
print(result3)
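
To run the same call over many recordings, one option is a small wrapper that walks a folder and applies a function to each audio file. This is only a sketch under assumptions: `transcribe_folder` and `AUDIO_SUFFIXES` are hypothetical names that are not part of tror-yong-asr, the list of extensions is a guess (only .mp3 appears in the example above), and the return value of the wrapped function is stored as-is because its type is not documented here.

```python
from pathlib import Path

# Assumed set of audio extensions; adjust to the formats you actually use.
AUDIO_SUFFIXES = {".mp3", ".wav", ".flac"}


def transcribe_folder(folder, transcribe_fn):
    """Apply `transcribe_fn` to every audio file in `folder`, in sorted order.

    Returns a dict mapping each file name to whatever `transcribe_fn` returns.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip sub-directories and files with other extensions.
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES:
            results[path.name] = transcribe_fn(str(path))
    return results
```

With the model and processor loaded as above, it could be called as `transcribe_folder('/path/to/audio_dir', lambda p: transcribe(p, model, processor, max_tokens=64))`. Passing the function in as an argument keeps the wrapper independent of the library, so the same helper works for translate or detect_language.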

TrorYongASR provides two sets of pre-trained weights, both supporting Khmer and English:

  • Tiny version with model_id=KrorngAI/TrorYongASR-tiny
  • Small version with model_id=KrorngAI/TrorYongASR-small

Figure 1: TrorYongASR architecture. Dropout layers are omitted due to space constraints. [B], [L], [T], [E], and [P] are the begin-of-sequence, language, task, end-of-sequence, and padding tokens, respectively. This figure presents the case of 16 distinct target prediction positions. The QKV projection is shown explicitly because, in TrorYongASR, a single position basis p is used for each position to directly form the query projection. The last linear layer outputs logits over the vocabulary, which are then used to compute the cross-entropy loss.

Evaluation

TrorYongASR was evaluated on the test split of google/fleurs (code km-kh) for Khmer and on librispeech.clean for English.

WER comparison with Whisper:

Size   Model        Parameters  Khmer (fleurs)  English (librispeech.clean)
Tiny   TrorYongASR  29M         75.88%          54.33%
Tiny   Whisper      39M         100.6%          7.6%
Small  TrorYongASR  135M        50.46%          21.75%
Small  Whisper      244M        104.4%          3.4%
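
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference and the predicted transcript, divided by the number of reference words. The sketch below is a generic illustration and not the evaluation script used for the numbers above; the function name `word_error_rate` is hypothetical, and splitting on whitespace is an assumption (Khmer text is usually written without spaces between words, so a real evaluation needs its own word segmentation).

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors while the denominator is fixed by the reference, WER can exceed 100%: `word_error_rate("hello", "hello there again")` has two insertions against one reference word and returns 2.0.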

Fine-tune TrorYongASR

The notebook below is a fine-tuning tutorial.

Open in Colab

If you speak Khmer, you can watch my YouTube video below, which explains each step of the fine-tuning.

Watch the video

Note: from version v.1.1 onward, you can use the functions push_to_hub, save_pretrained, and from_pretrained as with any transformers model.

from transformers import AutoProcessor
from tror_yong_asr import TrorYongASRModel

# Load the processor and the pre-trained weights from the Hugging Face Hub.
original_model_id = "KrorngAI/TrorYongASR-tiny"
processor = AutoProcessor.from_pretrained(original_model_id, trust_remote_code=True)
model = TrorYongASRModel.from_pretrained(original_model_id)

# Push both to your own repository (replace with your repository name).
new_model_id = "your_hf_repo"
processor.push_to_hub(new_model_id)
model.push_to_hub(new_model_id)


