Speech Corpus Development for Speaker Independent Speech Recognition for Indian Languages

Journal: GRENZE International Journal of Computer Theory and Engineering
Authors: Amaresh P Kandagal, V Udayashankara
Volume: 3 Issue: 4
Grenze ID: 01.GIJCTE.3.4.14 Pages: 81-87

Abstract

In this paper, we discuss the development of a speech corpus for speaker-independent speech recognition for Indian airports, and its extension to continuous speech recognition for Indian languages. We collected speech from 801 speakers to build a large-vocabulary ASR engine. The corpus was recorded over both telephone lines and microphones, from speakers aged between 20 and 60 years. The microphone data comprises 4.5 hours recorded from 375 male and 244 female voices. The telephone data comprises 1.3 hours and includes both male and female voices. In total, 6.2 hours of speech was collected. Recording was conducted in office, college and home environments. We also discuss preliminary isolated speech recognition results obtained with acoustic models built on this corpus using the Hidden Markov Model Toolkit (HTK).

