Deep Learning Based Bangla Speech-to-Text Conversion

Published in 2018 5th International Conference on Computational Science/Intelligence and Applied Informatics (CSII), 2018

Recommended citation: Tausif, Md Tahsin, Sayontan Chowdhury, Md Shiplu Hawlader, Md Hasanuzzaman, and Hasnain Heickal. "Deep learning based bangla speech-to-text conversion." In 2018 5th International Conference on Computational Science/Intelligence and Applied Informatics (CSII), pp. 49-54. IEEE, 2018. https://ieeexplore.ieee.org/iel7/8454865/8457084/08457100.pdf

ABSTRACT

Speech-to-text conversion is the process of recognizing speech in audio and producing a text transcript of it. Because speech is such an intuitive medium of communication, this technology can have far-reaching effects in easing interaction between humans and machines. This paper presents a complete speech-to-text conversion system for the Bangla language (also known as Bengali) using deep recurrent neural networks. Possible optimizations such as the Broken Language Format, which is based on properties of the Bangla language, are proposed to reduce the training time of the network. A simple deep recurrent neural network architecture is used for speech recognition. Trained on collected data, it yielded over 95% accuracy on the training data and 50% accuracy on the testing data.