Efficient Text Extraction and Summarization using
Easyocr and GPT-3
Journal:
GRENZE International Journal of Engineering and Technology
Authors:
A V Sriharsha, Mekala Bhavana, Syamala Tejaswee, Mynasaheb Bushra Ahmed, Peddi Reddi Jaipal Reddy
Volume:
10
Issue:
2
Grenze ID:
01.GIJET.10.2.640_1
Pages:
1842-1848
Abstract
Optical character recognition (OCR) extracts text from images while large language
models (LLMs) understand and generate human-like text. OCR engines like EasyOCR
transform document images into machine-readable text. But this raw extracted text contains
artifacts. LLMs like GPT-3 require clean embedding vectors instead of text. We propose an
integrated pipeline combining EasyOCR and GPT-3 for enhanced text comprehension from
images. EasyOCR optically recognizes text from document images. The extracted text is cleaned
and converted into embeddings that are fed to GPT-3. GPT-3's deep language model then
provides contextual understanding of the extracted text. This enables interpreting complex
concepts, resolving ambiguities, summarizing key ideas, and generating natural language
descriptions. Our pipeline establishes a paradigm for augmenting text understanding through
synergistic combination of OCR and LLMs. It has diverse applications in document analysis,
information retrieval, question answering, and other NLP tasks involving documents. The
integrated model outperforms previous approaches on benchmarks for comprehension of
extracted text. It demonstrates the benefits of complementing optical text extraction with
LLMs' innate language abilities. This paves the way for advanced OCR systems that not just
read but also understand text in documents.