Enhancing Armenian Automatic Speech Recognition Performance: A Comprehensive Strategy for Speed, Accuracy, and Linguistic Refinement
Varuzhan H. Baghdasaryan
Abstract
This research introduces a comprehensive strategy to enhance the performance of an existing automatic speech recognition (ASR) model documented in previously published articles. The study pursues several objectives. First, it updates the ASR model by retraining it on new data: samples from the latest Common Voice corpus release and data collected independently via the armspeech.com web application. Second, it optimizes the model for near-real-time processing; the proposed adjustments to the model’s architecture aim to balance accuracy against processing speed, which is essential for applications that require prompt speech recognition. Third, it explores integrating Transformer models into the post-processing pipeline to add punctuation and capitalization to the ASR output, which improves the linguistic quality, readability, and usability of the transcriptions. Alongside these advances, the research presents a systematic approach to gathering, annotating, and storing datasets tailored to punctuation and capitalization tasks and suitable for training Transformer models. Together, dataset enrichment, architectural modifications, and post-processing enhancements aim to improve the ASR model’s accuracy, speed, and linguistic refinement, with particular attention to the intricacies of the Armenian language. The research contributes insights into the optimization of ASR systems, addressing both language-specific challenges and broader issues in linguistic post-processing.
Keywords
Armenian ASR; Armenian automatic speech recognition; Armenian speech-to-text; Armenian speech corpus; NVIDIA NeMo; Citrinet; Transformer; DistilBERT; punctuation; capitalization.
Cite This Article
Baghdasaryan, V. H. (2024). Enhancing Armenian Automatic Speech Recognition Performance: A Comprehensive Strategy for Speed, Accuracy, and Linguistic Refinement. International Journal of Scientific Advances (IJSCIA), Volume 5 | Issue 2: Mar-Apr 2024, Pages 281-288, URL: https://www.ijscia.com/wp-content/uploads/2024/03/Volume5-Issue2-Mar-Apr-No.583-281-288.pdf
Volume 5 | Issue 2: Mar-Apr 2024