An Efficient Web Search Engine for Noisy Free Information Retrieval

Pradeep Sahoo1 and Rajagopalan Parthasarthy2

1Department of Computer Science and Engineering, Anna University, India

2Department of Computer Science and Engineering, GKM College of Engineering and Technology, India

Abstract: The rapid growth, dynamic nature, and low quality of content on the World Wide Web make it very difficult to retrieve relevant information from the Internet during a query search. Various web mining techniques are used to address this problem. The biggest challenge in web mining is removing noisy or unwanted content from a web page, such as banners, video, audio, images, and hyperlinks that are not related to the user query. To overcome these issues, this paper proposes a novel custom search engine with efficient algorithms. The proposed Uniform Resource Locator (URL) Pattern Extractor (UPE) algorithm extracts all relevant index pages from the web and ranks them based on the user query. The Noisy Data Cleaner (NDC) algorithm is then applied to remove unwanted content from the retrieved web pages. The results show that the combined UPE+NDC approach gives very promising results on different datasets, with high precision and recall compared with existing algorithms.

Keywords: Web content extraction, relevant information, noise data elimination, noisy data cleaner algorithm, URL pattern extractor algorithm.

Received November 27, 2014; accepted June 1, 2015
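
The abstract does not give the rules used by the NDC algorithm, so the following is only a minimal illustrative sketch of the general idea of noise removal, not the authors' method. It assumes that noise can be approximated by a fixed set of HTML tags (the hypothetical NOISE_TAGS set below) and, as a simplification, drops the text of every hyperlink rather than only those unrelated to the query. The names NoiseStripper and clean_page are also hypothetical.

```python
from html.parser import HTMLParser

# Assumption: these tags stand in for "noisy" blocks (navigation, ads,
# media, hyperlinks). The paper's actual NDC criteria are not stated
# in the abstract.
NOISE_TAGS = {"script", "style", "nav", "aside", "footer",
              "video", "audio", "iframe", "a"}


class NoiseStripper(HTMLParser):
    """Collects visible text that lies outside any noise element."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # greater than 0 while inside a noise element
        self.chunks = []    # text fragments that are kept

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth == 0 and text:
            self.chunks.append(text)


def clean_page(html):
    """Return the text of `html` with noise elements removed."""
    parser = NoiseStripper()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    '<html><body><nav><a href="/">Home</a></nav>'
    '<p>Relevant text.</p><img src="banner.png">'
    '<a href="/ad">Buy now</a><script>var x = 1;</script>'
    '<p>More content.</p></body></html>'
)
print(clean_page(page))
```

Images need no special handling in this sketch because an img element carries no text. A real cleaner would likely also need to judge each block against the query, which a fixed tag list cannot do.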
