Cantonese ASR
This model is a fine-tuned version of facebook/w2v-bert-2.0. It achieves a character error rate (CER) of 10.27 on the Common Voice 16 (yue) test set, computed with punctuation removed.
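CER counts the character-level edits needed to turn a hypothesis into its reference, divided by the number of reference characters: CER = (S + D + I) / N, where S, D and I are substitutions, deletions and insertions. The model card does not say exactly how the text was normalized before scoring. The sketch below is one plausible reading of "without punctuations". It assumes that Unicode punctuation is stripped, and, as an extra assumption, that whitespace is stripped too. The helper names `strip_punctuation`, `edit_distance` and `cer` are hypothetical and not part of any library.

```python
import unicodedata
from typing import Sequence


def strip_punctuation(text: str) -> str:
    # Assumption: drop Unicode punctuation (categories P*) and whitespace before scoring
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P") and not ch.isspace()
    )


def edit_distance(ref: str, hyp: str) -> int:
    # Character-level Levenshtein distance (substitutions + deletions + insertions)
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference character
                curr[j - 1] + 1,         # insertion of a hypothesis character
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    # Corpus-level CER: total edits divided by total reference characters
    edits, total = 0, 0
    for ref, hyp in zip(references, hypotheses):
        ref, hyp = strip_punctuation(ref), strip_punctuation(hyp)
        edits += edit_distance(ref, hyp)
        total += len(ref)
    if total == 0:
        raise ValueError("references contain no characters after normalization")
    return edits / total


print(cer(["你好，世界"], ["你好世間"]))  # 0.25: one substitution over 4 characters
```

The function returns a fraction. Multiply by 100 to express it as a percentage.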
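For training, three datasets were used:
- zh-HK and yue Train Set

To transcribe an audio file with the `pipeline` API:

```python
from transformers import pipeline

file = "audio.wav"  # path to your audio file
bert_asr = pipeline(
    "automatic-speech-recognition", model="alvanlii/wav2vec2-BERT-cantonese", device="cuda"
)
text = bert_asr(file)["text"]
```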
Or, to run the processor and model directly:
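```python
import torch
import soundfile as sf
from transformers import AutoModelForCTC, Wav2Vec2BertProcessor

model_name = "alvanlii/wav2vec2-BERT-cantonese"
device = "cuda" if torch.cuda.is_available() else "cpu"
file = "audio.wav"  # path to a 16 kHz mono audio file

asr_model = AutoModelForCTC.from_pretrained(model_name).to(device)
processor = Wav2Vec2BertProcessor.from_pretrained(model_name)

# The feature extractor expects 16 kHz input; resample beforehand if needed
audio_input, sample_rate = sf.read(file)
if sample_rate != 16_000:
    raise ValueError(f"expected 16 kHz audio, got {sample_rate} Hz")

inputs = processor([audio_input], sampling_rate=16_000, return_tensors="pt")
features = inputs.input_features.to(device)

with torch.no_grad():
    logits = asr_model(features).logits

# Greedy CTC decoding: pick the most likely token per frame, then decode
predicted_ids = torch.argmax(logits, dim=-1)
predictions = processor.batch_decode(predicted_ids, skip_special_tokens=True)
```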
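The last two lines perform greedy CTC decoding. In the transformers `Wav2Vec2CTCTokenizer`, decoding by default collapses consecutive repeated ids and drops the pad token, which plays the role of the CTC blank. It also maps the word-delimiter token to a space. The pure-Python sketch below shows only the first two steps. The three-token vocabulary and the logits are made up for illustration and are not the model's real vocabulary. The helper names `argmax_per_frame` and `ctc_greedy_decode` are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Sequence


def argmax_per_frame(logits: Sequence[Sequence[float]]) -> List[int]:
    # Same idea as torch.argmax(logits, dim=-1) for a single utterance
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def ctc_greedy_decode(frame_ids: Sequence[int], blank_id: int,
                      id_to_token: Dict[int, str]) -> str:
    # 1. Collapse runs of the same id into one
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Remove the blank token and map the remaining ids to characters
    return "".join(id_to_token[i] for i in collapsed if i != blank_id)


# Hypothetical vocabulary: id 0 is the blank/pad token
vocab = {0: "<pad>", 1: "你", 2: "好"}
logits = [
    [0.1, 0.8, 0.1],    # 你
    [0.2, 0.7, 0.1],    # 你 again: collapsed into the previous frame
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # 好
    [0.6, 0.2, 0.2],    # blank: separates the two 好 frames
    [0.1, 0.2, 0.7],    # 好 kept as a second character
]
ids = argmax_per_frame(logits)
print(ids)                                                   # [1, 1, 0, 2, 0, 2]
print(ctc_greedy_decode(ids, blank_id=0, id_to_token=vocab))  # 你好好
```

The blank is what lets CTC emit the same character twice in a row. Without the blank frame between them, the two 好 frames would collapse into one.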