S2T.ai: Self-Supervised Speech Recognition System

Journal: GRENZE International Journal of Engineering and Technology
Authors: Pradeep Kumar, Ajay Kumar, Kakoli Banerjee, Aparna Sharma, Ashish Katiyar
Volume: 10 Issue: 2
Grenze ID: 01.GIJET.10.2.257 Pages: 4471-4477

Abstract

Speech recognition is a sub-field of computational linguistics focused on enabling computers to accurately transcribe spoken words into text. It drives digital transformation across education, industry, and healthcare, as well as emerging IoT and ML applications. Research in this field is advancing rapidly as scientists work to broaden computers' ability to process spoken language. Feature extraction, one of the steps in the recognition pipeline, transforms raw audio into machine-readable data for analysis and is vital for machine learning and pattern recognition tasks. This paper presents an advance in speech recognition based on the wav2vec 2.0 model, a self-supervised feature extractor. Departing from conventional supervised methods, the model first learns representations from unlabeled speech audio and is then fine-tuned on transcribed speech, which yields superior performance. By masking the speech input in the latent space and solving a contrastive task, the model efficiently learns contextualized representations and demonstrates strong adaptability on the Libri-Light dataset. Even with minimal labeled data, wav2vec 2.0 outperforms previous state-of-the-art semi-supervised approaches, showing its potential for robust speech recognition when labeled data is scarce. This is a significant step toward broader accessibility of speech recognition technology.
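The pretraining objective summarized above (hide some latent time steps, then identify the true target for each hidden step among distractors) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes the context vectors (`contexts`) and targets are already computed and skips the quantization module that produces targets in wav2vec 2.0. All function names are hypothetical, and the defaults (`mask_prob=0.065`, `span=10`, `temperature=0.1`) follow values reported for the original wav2vec 2.0 setup but are assumptions here, not settings taken from this paper. The number of distractors is kept tiny for the toy example.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def compute_mask(num_steps: int, mask_prob: float = 0.065, span: int = 10,
                 rng: Optional[random.Random] = None) -> List[bool]:
    """Each step starts a masked span with probability mask_prob (assumed default).

    Spans of `span` consecutive steps may overlap and are clipped at the end.
    """
    rng = rng if rng is not None else random.Random(0)
    mask = [False] * num_steps
    for start in range(num_steps):
        if rng.random() < mask_prob:
            for t in range(start, min(start + span, num_steps)):
                mask[t] = True
    return mask


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def contrastive_loss(context: Vector, target: Vector,
                     distractors: Sequence[Vector], temperature: float = 0.1) -> float:
    """Negative log of the softmax probability given to the true target."""
    logits = [cosine(context, target) / temperature]
    logits += [cosine(context, d) / temperature for d in distractors]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[0]


def masked_contrastive_loss(contexts: Sequence[Vector], targets: Sequence[Vector],
                            mask: Sequence[bool], num_distractors: int = 2,
                            temperature: float = 0.1,
                            rng: Optional[random.Random] = None) -> float:
    """Average loss over masked steps; distractors are targets of other masked steps."""
    rng = rng if rng is not None else random.Random(0)
    masked = [t for t, hidden in enumerate(mask) if hidden]
    losses = []
    for t in masked:
        others = [u for u in masked if u != t]
        picks = rng.sample(others, min(num_distractors, len(others)))
        losses.append(contrastive_loss(contexts[t], targets[t],
                                       [targets[u] for u in picks], temperature))
    return sum(losses) / len(losses) if losses else 0.0


# Toy usage: when every context matches its own target, the loss is close to zero.
toy_targets = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
toy_mask = compute_mask(len(toy_targets), mask_prob=1.0, span=1)
print(masked_contrastive_loss(toy_targets, toy_targets, toy_mask))
```

The loss rewards a context vector for being more similar (by cosine similarity) to its own target than to targets drawn from other masked steps, so the model must use surrounding, unmasked audio to infer what was hidden. Production systems compute this over batched tensors on a GPU; the list-based version only trades speed for readability.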

