An Efficient Web Search Engine for Noisy Free Information Retrieval
Pradeep Sahoo1 and Rajagopalan Parthasarthy2
1Department of Computer Science and Engineering, Anna University, India
2Department of Computer Science and Engineering, GKM College of Engineering and Technology, India
Abstract: The vast growth, dynamic nature, and low quality of the World Wide Web
make it very difficult to retrieve relevant information from the internet
during query search. To resolve this issue, various web mining techniques are
being used. The biggest challenge in web mining is to remove noisy data, that
is, unwanted information on a webpage such as banners, video, audio, images,
and hyperlinks that are not associated with the user query. To overcome these
issues, this paper proposes a novel custom search engine with efficient
algorithms. The proposed Uniform Resource Locator (URL) Pattern Extractor (UPE)
algorithm extracts all relevant index pages from the web and ranks them based
on the user query. Then, the Noisy Data Cleaner (NDC) algorithm is applied to
remove the unwanted content from the retrieved web pages. The results show that
the proposed UPE+NDC algorithm achieves high precision and recall rates across
different datasets in comparison with the existing algorithms.
Keywords: Web content extraction, relevant information, noise data elimination,
noisy data cleaner algorithm, URL pattern extractor algorithm.
Received November 27, 2014; accepted June 1, 2015