Enhancing Generic Pipeline Model for Code Clone Detection Using Divide and Conquer Approach
1School of Computer Sciences, Universiti Sains Malaysia, 11800 USM Penang, Malaysia
2Faculty of Computing, Universiti Teknologi Malaysia, 81310 UTM Skudai, Johor, Malaysia
Abstract: A code clone is an identical copy of an instance or fragment of source code within a software system. Current code clone research focuses on detecting and analysing code clones so that software developers can identify them in source code and reuse that code to reduce maintenance cost. Many approaches, such as textual-based, token-based, and tree-based comparison, have been used to detect code clones. As software grows and becomes a legacy system, the complexity of these approaches increases, which makes code clones more difficult to detect. The generic pipeline model is the most recent code clone detection model; it comprises five processes: parsing, pre-processing, pooling, comparing, and filtering. This research enhances the generic pipeline model with a divide and conquer approach that involves a concatenation process. The aim of this approach is to produce better input for the generic pipeline model by processing smaller parts of the source code files before handling the large body of source code in a single pipeline. We implement and apply the proposed approach in a tool called Java Code Clone Detector. The results show an improvement in the rate of code clone detection and in overall runtime performance compared with the existing generic pipeline model.
Keywords: Code clone detection, divide and conquer approach, generic pipeline model.
Received August 31, 2012; accepted February 23, 2014