Code-blended online texts are increasingly prevalent due to multilingual influences.
This phenomenon is commonly observed in user-generated content on social platforms, especially
from multilingual users. Such content, including articles and information, possesses informal
language features like non-standard abbreviations, contracted transliterations, and casual
grammar. Understanding this phenomenon is crucial for effectively handling and analyzing
code-blended data in Natural Language Processing tasks. As the need for translating code-blended
content into standard language grows, our study focuses on short utterances gathered
from Facebook, Twitter, and WhatsApp. The dataset, covering the English-Marathi
language pair in Indian web-based text, aims to provide a resource for translating code-blended
content into plain English. We tested the dataset on various machine learning
classifiers, achieving good accuracy in language identification; translation was then
performed using an LSTM network. We achieved 91% prediction accuracy when translating
code-blended sentences into standard English sentences.