A Proposed Method to Translate Code Blended Web based Text in English Language

Journal: GRENZE International Journal of Engineering and Technology
Authors: Sudeshna Sani, Dipra Mitra, Kumar Gaurav, Kanika Thakur, Srihari Babu Gole
Volume: 10 Issue: 1
Grenze ID: 01.GIJET.10.1.93_1 Pages: 2309-2315

Abstract

Code-blended online text is increasingly prevalent because of multilingual influences. It is most commonly seen in user-generated content on social platforms, especially content written by multilingual users. Such text has informal language features such as non-standard abbreviations, contracted transliterations, and casual grammar. Understanding this phenomenon is crucial for handling and analyzing code-blended data in Natural Language Processing tasks. As the need to translate code-blended content into standard language grows, our study focuses on short utterances gathered from Facebook, Twitter, and WhatsApp. The dataset, which consists of English-Marathi code-blended text drawn from Indian web content, is intended as a resource for translating code-blended content into plain English. We tested the dataset with several machine learning classifiers, which achieved good accuracy in language identification, and then performed translation using an LSTM network. We achieved 91% prediction accuracy when translating code-blended sentences into standard English sentences.
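The abstract does not say which classifiers or features were used for language identification. As a rough illustration only, the sketch below tags each word of a code-blended sentence as English or romanized Marathi using character bigrams and a naive Bayes score. The tiny lexicon, the labels "en" and "mr", and the function names are all hypothetical, and this is not the authors' method.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (assumption): ten English words and ten
# romanized Marathi words. A real system would train on labeled data.
TRAIN = {
    "en": ["the", "movie", "is", "very", "good",
           "today", "want", "going", "home", "what"],
    "mr": ["mala", "aahe", "nahi", "pahije", "tula",
           "kasa", "ghari", "jaycha", "khup", "aaj"],
}


def char_bigrams(word):
    """Return character bigrams of a word, padded with ^ and $ markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def train(lexicon):
    """Count character bigrams per language and collect the shared vocabulary."""
    counts = {lang: Counter() for lang in lexicon}
    for lang, words in lexicon.items():
        for word in words:
            counts[lang].update(char_bigrams(word))
    vocab = set().union(*counts.values())
    return counts, vocab


def identify(word, counts, vocab):
    """Pick the language with the highest add-one-smoothed log-likelihood.

    Class priors are omitted because the toy lexicon is balanced.
    """
    scores = {}
    for lang, bigram_counts in counts.items():
        total = sum(bigram_counts.values())
        scores[lang] = sum(
            math.log((bigram_counts[bg] + 1) / (total + len(vocab)))
            for bg in char_bigrams(word)
        )
    return max(scores, key=scores.get)


def tag_sentence(sentence, counts, vocab):
    """Tag every whitespace-separated token with a language label."""
    return [(tok, identify(tok, counts, vocab)) for tok in sentence.split()]


counts, vocab = train(TRAIN)
tags = tag_sentence("mala movie khup good", counts, vocab)
# tags == [("mala", "mr"), ("movie", "en"), ("khup", "mr"), ("good", "en")]
```

In a pipeline like the one the abstract describes, per-word tags of this kind would come before the translation model. The LSTM translation step is not sketched here because the abstract gives no architectural details.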

