GRENZE International Journal of Engineering and Technology
Authors: Milind Rane, Piyush Waghulde, Srushti Yadav, Pranamya Vemula, Om Wagh
Volume: 10
Issue: 1
Grenze ID: 01.GIJET.10.1.562_1
Pages: 1972-1977
Abstract
The rise of social media and global communication has created a need for multilingual image
captioning, which enables cross-cultural understanding and engagement among diverse audiences. By
automatically generating captions in multiple languages, this technology supports seamless
communication and enriches the user experience in an increasingly interconnected world.
Existing image captioning models leverage alternative methodologies, such as transformer-based
models, attention mechanisms, and pre-trained language models, to generate descriptive
captions for images without relying on the CNN-LSTM architecture. These approaches offer
effective solutions for image captioning and showcase advances in natural language
processing and machine learning. However, integrating a CNN-LSTM architecture with
the Flickr8k dataset and machine translation yields more accurate and contextually relevant
multilingual image captions, enhancing versatility and adaptability for real-world applications.
This approach combines deep learning techniques, diverse training data, and machine
translation capabilities for superior multilingual image caption generation. The model's
accuracy is assessed using standard evaluation metrics, and the model generates accurate
captions in multiple languages for the images provided.
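The CNN-LSTM pipeline the abstract describes can be sketched at a high level: a CNN produces an image-feature vector, which conditions an LSTM decoder that emits caption tokens one at a time; the resulting caption can then be passed to a machine-translation step. The toy sketch below, using only NumPy, illustrates the decoding loop with random (untrained) weights. The vocabulary, dimensions, and the `greedy_caption` helper are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

# Hypothetical, untrained sketch of the CNN-LSTM captioning loop: a CNN
# feature vector initializes an LSTM decoder, which greedily emits tokens.
# All weights are random stand-ins for a trained model.

rng = np.random.default_rng(0)

VOCAB = ["<start>", "<end>", "a", "dog", "runs", "on", "grass"]
V, H, F = len(VOCAB), 16, 32   # vocab size, LSTM hidden size, CNN feature size

W_init = rng.normal(0, 0.1, (H, F))   # CNN features -> initial hidden state
Wx = rng.normal(0, 0.1, (4 * H, V))   # input-to-gates weights (one-hot tokens)
Wh = rng.normal(0, 0.1, (4 * H, H))   # hidden-to-gates weights
b = np.zeros(4 * H)
W_out = rng.normal(0, 0.1, (V, H))    # hidden state -> vocabulary logits

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c):
    """One LSTM cell update for input x and state (h, c)."""
    gates = Wx @ x + Wh @ h + b
    i, f, o, g = np.split(gates, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def greedy_caption(cnn_features, max_len=10):
    """Greedily decode a caption from a CNN image-feature vector."""
    h = np.tanh(W_init @ cnn_features)     # image conditions the decoder
    c = np.zeros(H)
    token = VOCAB.index("<start>")
    words = []
    for _ in range(max_len):
        x = np.eye(V)[token]               # one-hot embedding of previous token
        h, c = lstm_step(x, h, c)
        token = int(np.argmax(W_out @ h))  # greedy choice of next word
        if VOCAB[token] == "<end>":
            break
        words.append(VOCAB[token])
    return " ".join(words)

features = rng.normal(size=F)  # stand-in for a CNN image embedding
print(greedy_caption(features))  # weights are untrained, so output is arbitrary
```

In the full system, the decoded English caption would be fed to a translation component to produce the multilingual captions the abstract describes; a trained model would also replace the one-hot inputs with learned word embeddings.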