MCMM-LSW: MULTILEVEL CONTENT MINING MODEL FOR LARGE SCALE WEBSITES

Conference: Fifth International Conference on Advances in Information Technology and Mobile Communication
Author(s): Neeraj Raheja, Vijay Kumar Katiyar
Year: 2015
Grenze ID: 02.AIM.2015.5.522
Pages: 90-99

Abstract

As use of the WWW grows, the amount of data available on websites is increasing rapidly, so efficient web data extraction has become a major challenge for large-scale websites. The main requirement for such websites is to extract accurate data efficiently, within a reasonable amount of time. This paper proposes a web content extraction model for extracting content from large-scale websites. The proposed model (MCMM-LSW) builds a link tree of the website and extracts content based on seed pages taken from different levels of the link tree. The results show higher recall, precision and overall accuracy (F-measure) than the techniques reported in the literature. The effect of changing the number of levels of the website is also shown in the results. Finally, a comparison of keyword-based search with the proposed approach is presented.
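The link tree and the evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the site's hyperlinks are already available as an in-memory adjacency map (a hypothetical `links` dictionary from page to linked pages), builds a level-limited link tree by breadth-first traversal, and scores an extracted page set against a hand-labelled relevant set using precision, recall and the balanced F-measure (F1). The function names are illustrative only; this is not the authors' MCMM-LSW implementation, and their seed-page selection and content extraction steps are not modelled.

```python
from collections import deque


def build_link_tree(links, root, max_level):
    """Breadth-first traversal of a hypothetical link map.

    Each page is assigned the level at which it is first reached (root = 0),
    and its parent in the resulting tree. Pages deeper than max_level are
    not expanded, which mimics limiting the number of website levels.
    """
    levels = {root: 0}
    parent = {root: None}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        if levels[page] == max_level:
            continue  # do not follow links beyond the last allowed level
        for child in links.get(page, []):
            if child not in levels:  # first visit fixes the tree position
                levels[child] = levels[page] + 1
                parent[child] = page
                queue.append(child)
    return levels, parent


def pages_by_level(levels):
    """Group pages by tree level, e.g. to pick candidate seed pages per level."""
    grouped = {}
    for page, lvl in levels.items():
        grouped.setdefault(lvl, []).append(page)
    return {lvl: sorted(pages) for lvl, pages in sorted(grouped.items())}


def precision_recall_f1(extracted, relevant):
    """Precision, recall and balanced F-measure of an extracted set."""
    extracted, relevant = set(extracted), set(relevant)
    hits = len(extracted & relevant)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For a toy site `{"/": ["/a", "/b"], "/a": ["/a/1", "/b"], "/b": ["/b/1"]}`, calling `build_link_tree(links, "/", 2)` places `/a` and `/b` on level 1 and `/a/1` and `/b/1` on level 2; lowering `max_level` to 1 drops the second level, which is one simple way to study the effect of the number of levels.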


AIM - 2015