Process of Data Cleansing and handling of Memory in Data Streams

Journal: GRENZE International Journal of Engineering and Technology
Authors: Kavitha N, Y Kalpana, Kumar V
Volume: 10 Issue: 2
Grenze ID: 01.GIJET.10.2.389 Pages: 5171-5177

Abstract

Real-time mining [1][2][3] and streaming of data have grown popular because they give access to the most recent data with minimal delay. Real-time data mining seeks a real-time framework that improves resource efficiency while minimizing environmental impact. In real-time analysis the data changes at a high rate, so it must be processed and updated frequently. Data mining is an interdisciplinary subject that draws on machine learning, statistics, database technology, and artificial intelligence. Its primary aim is to interpret the past and anticipate the future through data exploration and analysis, a process known as Knowledge Discovery in Databases (KDD). Data mining typically stores the data in a local data set hosted on local computers connected to a network. With the advent of data streams, data has become abundant, which raises the question of how to store it. Data received as a stream is also rarely clean. Unclean data should not be stored in the database, because it is not effective for analysis. A database is meant to hold an organized repository of related information; hence, data must be cleaned before storage, and it is cleaned according to the analysis required of it. Once the data is cleaned, the question of the memory needed to store it arises. Storing real-time data is a relatively trivial process in itself, but no data from the streams may go missing. Data cleaning and memory management for real-time data, however, are always challenging. This research proposes a novel methodology for data cleaning and memory management to overcome these issues. A scheduler executes an algorithm at a specific interval to separate the test data from the trial data. The trial data is used for further analysis, and the test data, which is derived from the trial data, is discarded at that interval.
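The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general pattern it describes: records arrive from a stream, a periodic job cleans them, the cleaned records are kept for analysis, and the temporary data is discarded at each interval so memory stays bounded. The names `clean`, `IntervalCleaner`, `receive`, `tick`, and `max_retained`, as well as the record fields `id` and `value`, are hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Optional


def clean(record: dict) -> Optional[dict]:
    """Return a normalized record, or None if it cannot be used.

    Assumption: a usable record has an 'id' and a numeric 'value'.
    """
    value = record.get("value")
    if record.get("id") is None or value is None:
        return None
    try:
        return {"id": record["id"], "value": float(value)}
    except (TypeError, ValueError):
        return None


class IntervalCleaner:
    """Buffers raw stream records and cleans them once per interval."""

    def __init__(self, max_retained: int = 1000):
        # Raw records received since the last interval (temporary data).
        self.staging = []
        # Cleaned records kept for analysis; the oldest entries are
        # evicted automatically once max_retained is reached.
        self.retained = deque(maxlen=max_retained)

    def receive(self, record: dict) -> None:
        """Accept one raw record from the stream."""
        self.staging.append(record)

    def tick(self) -> int:
        """Run one scheduled pass; return how many records passed cleaning."""
        kept = 0
        for raw in self.staging:
            cleaned = clean(raw)
            if cleaned is not None:
                self.retained.append(cleaned)
                kept += 1
        # Discard the temporary data so memory use does not grow.
        self.staging.clear()
        return kept


cleaner = IntervalCleaner(max_retained=2)
stream = [
    {"id": 1, "value": "3.5"},
    {"id": 2, "value": None},   # missing value, dropped
    {"id": 3, "value": "x"},    # not numeric, dropped
    {"id": 4, "value": 7},
    {"id": 5, "value": "8"},
]
for rec in stream:
    cleaner.receive(rec)
kept = cleaner.tick()
```

In this sketch `tick` is called by hand so the behavior is easy to follow; in a running system a scheduler would call it at the chosen interval. The bounded `deque` is one simple way to cap memory, at the cost of evicting the oldest cleaned records, which may or may not match the retention policy the paper proposes.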
