Speech Corpus Development for Speaker Independent
Speech Recognition for Indian Languages
Journal:
GRENZE International Journal of Computer Theory and Engineering
Authors:
Amaresh P Kandagal, V Udayashankara
Volume:
3
Issue:
4
Grenze ID:
01.GIJCTE.3.4.14
Pages:
81-87
Abstract
In this paper, we discuss the development of a speech corpus for speaker-independent
speech recognition for Indian airports and its extension to continuous speech recognition
for Indian languages. We collected speech from 801 speakers to build a large-vocabulary
ASR engine. The corpus was recorded over both telephone lines and microphones, from
speakers aged 20 to 60 years. The microphone data comprise 4.5 hours recorded from
375 male and 244 female speakers. The telephone data comprise 1.3 hours and include
both male and female voices. In total, 6.2 hours of speech were collected. Recordings
were made in office, college, and home environments. We also discuss preliminary
isolated speech recognition results obtained with acoustic models trained on this
corpus using the Hidden Markov Model Toolkit (HTK).