A MapReduce-based Quick Search Approach on Large Files
Ye-feng Li1, Jia-jin Le2, and Mei Wang2
1College of Computer Science and Technology, Beijing University of Technology, China
2College of Computer Science and Technology, Donghua University, China
Abstract: String search is an important branch of pattern matching for information
retrieval in various fields. Over the past four decades, research has concentrated
on skipping as many unnecessary characters as possible to improve search
performance, but has not taken large-scale data into consideration. This paper
makes two major contributions. First, we propose a Quick Search algorithm for data
Stream (QSS) on a single machine that supports string search in a large text file,
whereas previous research is limited to text that fits in bounded memory. Second,
we implement the search algorithm on the MapReduce framework to speed up the
retrieval of search results. The experiments demonstrate that our approach is fast
and effective for large files.
Keywords: String search, MapReduce, data stream, large file.
Received May 21, 2015; accepted September 24, 2017
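The abstract describes QSS only at a high level. The following is a minimal sketch, not the authors' QSS, of one way a Quick Search (Sunday-style bad-character shift on the byte just past the window) can run over a stream without loading the whole file: each chunk is searched after prepending the last m - 1 bytes of the previous buffer, so matches that straddle a chunk boundary are still found and no match is reported twice. The function names (build_shift_table, quick_search, stream_search) are hypothetical, and the assumption that a MapReduce split would need the same m - 1 byte overlap at its boundaries is ours, not the paper's.

```python
from typing import Dict, Iterable, Iterator, List


def build_shift_table(pattern: bytes) -> Dict[int, int]:
    """Quick Search shift: for each byte value in the pattern, the distance
    from its rightmost occurrence to one position past the pattern's end."""
    m = len(pattern)
    shift: Dict[int, int] = {}
    for i, b in enumerate(pattern):
        shift[b] = m - i  # later occurrences overwrite earlier ones
    return shift


def quick_search(text: bytes, pattern: bytes, shift: Dict[int, int]) -> List[int]:
    """Return every start index of pattern in text."""
    n, m = len(text), len(pattern)
    hits: List[int] = []
    j = 0
    while j <= n - m:
        if text[j:j + m] == pattern:
            hits.append(j)
        if j + m >= n:
            break  # no byte after the window to look up
        # Shift on the byte just past the window; a byte absent
        # from the pattern lets the window jump m + 1 positions.
        j += shift.get(text[j + m], m + 1)
    return hits


def stream_search(chunks: Iterable[bytes], pattern: bytes) -> Iterator[int]:
    """Yield absolute match offsets while reading the input chunk by chunk,
    keeping only m - 1 trailing bytes in memory between chunks."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    shift = build_shift_table(pattern)
    carry = b""
    offset = 0  # absolute stream position of carry[0]
    for chunk in chunks:
        buf = carry + chunk
        for pos in quick_search(buf, pattern, shift):
            yield offset + pos
        # m - 1 bytes cannot hold a full match, so carrying them
        # never produces duplicate reports.
        keep = min(m - 1, len(buf))
        carry = buf[len(buf) - keep:]
        offset += len(buf) - keep
```

For example, feeding b"abracadabra abracadabra" in 5-byte chunks with the pattern b"abra" yields the offsets 0, 7, 12 and 19, the same as searching the whole buffer at once. Memory use per step is bounded by the chunk size plus m - 1 bytes, which is the property the abstract attributes to a stream-oriented search.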