Dealing with The Problem of Overlapping Clusters Prior to Datamatching using Clustering Techniques

Conference: Fifth International Conference on Advances in Computer Engineering
Author(s): Somasekhar G, Karthikeyan K
Year: 2014
Grenze ID: 02.ACE.2014.5.546
Page: 234-243

Abstract

Data matching and big data matching are active research topics that have been discussed at length in several studies. When the initial dataset is huge, canopy clustering is the best-suited technique for data matching. However, this technique assigns records (data points) to multiple overlapping clusters, which introduces redundant pair comparisons whenever similar records share more than one cluster. This paper proposes an approach that avoids such redundant pair comparisons prior to the data matching phase. So far, very few studies have addressed redundant-free similarity computation. The new algorithms proposed in the approach serve two major purposes: (i) an incremental procedure for creating clusters that is better suited to the data matching process, and (ii) redundant-free pair selection for data matching. The approach does not require post-processing of the final result. Compared to Kolb’s approach [3], our approach reduces the complexity of the data matching phase.
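
To make the redundancy problem concrete, the following is a minimal sketch of redundant-free pair selection over overlapping clusters. It is not the algorithm proposed in the paper, which the abstract does not spell out. It assumes one common rule: a pair is compared only in the smallest-numbered cluster that both records share. The function name redundant_free_pairs, the cluster ids, and the record ids are all hypothetical.

```python
from itertools import combinations


def redundant_free_pairs(clusters):
    """Return each candidate pair once, even when clusters overlap.

    clusters: dict mapping a sortable cluster id to an iterable of record ids.
    Assumed rule (illustrative only): a pair is emitted in the smallest
    cluster id that both records have in common.
    """
    # For each record, collect the ids of all clusters it belongs to.
    membership = {}
    for cid, records in clusters.items():
        for rec in records:
            membership.setdefault(rec, set()).add(cid)

    pairs = []
    for cid in sorted(clusters):
        for a, b in combinations(sorted(set(clusters[cid])), 2):
            # Skip the pair unless this is the smallest shared cluster.
            if min(membership[a] & membership[b]) == cid:
                pairs.append((a, b))
    return pairs


# Hypothetical overlapping clusters: r2 and r3 appear in both.
clusters = {0: ["r1", "r2", "r3"], 1: ["r2", "r3", "r4"]}

# Naive selection compares every pair inside every cluster.
naive = [p for recs in clusters.values() for p in combinations(sorted(recs), 2)]
unique = redundant_free_pairs(clusters)
```

In this toy input the naive selection produces the pair (r2, r3) twice, once per shared cluster, while the sketch produces it only once, so no post-processing step is needed to remove duplicates.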

