Data Deduplication using Machine Learning

Journal: GRENZE International Journal of Engineering and Technology
Authors: Narasimhula Revanth, Ravuri Varun, Ebbili Harshitha, J. Jane Rubel Angelina, S.J.Subhashini
Volume: 10 Issue: 2
Grenze ID: 01.GIJET.10.2.62 Pages: 3176-3180

Abstract

Reducing data in storage systems is becoming increasingly important as a practical way to cut data center management costs. Current post-deduplication delta-compression techniques combine delta compression with lossless compression and conventional data deduplication to maximize data-reduction efficiency. Unfortunately, we observe that, because of their poor accuracy in identifying similar data blocks, current techniques achieve noticeably lower data-reduction ratios than the optimum. Data deduplication is a computing technique that removes duplicate copies of repeating data. A related and nearly interchangeable term is single-instance data storage. This method can be used to increase storage utilization and to reduce the number of bytes that must be sent during network data transfers. During the deduplication process, distinct data segments (chunks), also known as byte patterns, are identified and stored as the data is analyzed. As the analysis progresses, further chunks are compared with the stored copies, and whenever a match is found, the redundant chunk is replaced with a short reference that points to the stored chunk. Because the same byte pattern may occur hundreds, thousands, or even more times (the match frequency depends on the chunk size), the amount of data that must be stored or transferred can be greatly reduced. Most established data deduplication methods are implemented by comparing data segments to find duplicates.
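The chunk-matching process described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes fixed-size chunks and uses SHA-256 digests as a stand-in for direct chunk comparison, and the names `deduplicate`, `restore`, `store`, and `recipe` are hypothetical, chosen only for illustration.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks and keep one copy of each distinct chunk.

    Assumption: two chunks are treated as identical when their SHA-256
    digests match. Returns the chunk store and the ordered list of
    references (digests) needed to rebuild the original data.
    """
    store = {}   # digest -> the single stored copy of that chunk
    recipe = []  # one short reference per chunk, in original order
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk  # first occurrence: save the chunk itself
        recipe.append(digest)      # every occurrence: record only a reference
    return store, recipe


def restore(store, recipe):
    """Rebuild the original bytes by following the references in order."""
    return b"".join(store[digest] for digest in recipe)


data = b"ABCDABCDABCDXYZ!"
store, recipe = deduplicate(data, chunk_size=4)
print(len(recipe), "chunks referenced,", len(store), "chunks stored")
```

With a 4-byte chunk size, the 16-byte input above is split into four chunks, of which only two are distinct, so two stored chunks plus four references are enough to reconstruct the data. Changing `chunk_size` changes how often chunks match, which is the dependence on chunk size that the abstract notes.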

