Improving Classification Performance Using Genetic Programming to Evolve String Kernels

Ruba Sultan1, Hashem Tamimi1,2, and Yaqoub Ashhab2

1College of IT and Computer Engineering, Palestine Polytechnic University, Palestine

2Palestine-Korea Biotechnology Center, Palestine Polytechnic University, Palestine

Abstract: This work presents a novel evolutionary approach that uses Genetic Programming to create and optimize powerful string kernels. The proposed model evolves a single kernel expressed as a combination of string kernels, their parameters, and their corresponding weights. As a proof of concept, the classification performance of the evolved kernel was compared with that of a group of conventional single string kernels on a challenging problem from the biology domain: the classification of binder and non-binder peptides to Major Histocompatibility Complex Class II. On a data set of 4794 strings containing 3346 binder and 1448 non-binder peptides, the proposed approach achieved an Area Under the Curve (AUC) of 0.80, while the 11 conventional string kernels tested achieved AUC values ranging from 0.59 to 0.75. This significant improvement of the optimized evolved kernel over all other tested string kernels demonstrates the validity of the approach for enhancing Support Vector Machine classification. The approach is not limited to biological strings. It can be applied to pattern recognition problems on other types of strings, as well as to natural language processing.

Keywords: Support vector machine, string kernels, genetic programming, pattern recognition.

Received October 31, 2015; accepted June 1, 2016
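
The idea stated in the abstract, a kernel expressed as a combination of string kernels, their parameters, and corresponding weights, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the choice of the k-spectrum kernel as the base kernel, the function names, the cosine normalization, and the hand-picked weights. In the described approach the combination is evolved by Genetic Programming rather than fixed by hand.

```python
from collections import Counter
from math import sqrt


def spectrum_kernel(s, t, k):
    """k-spectrum kernel: inner product of the k-mer count vectors of s and t."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return float(sum(n * ct[kmer] for kmer, n in cs.items()))


def normalized(kernel, s, t, **params):
    """Cosine-normalize a kernel so that k(s, s) == 1 whenever k(s, s) > 0."""
    denom = sqrt(kernel(s, s, **params) * kernel(t, t, **params))
    return kernel(s, t, **params) / denom if denom > 0 else 0.0


def combined_kernel(s, t, terms):
    """Weighted sum of normalized base kernels.

    terms is a list of (weight, kernel_function, parameter_dict) triples.
    With non-negative weights, a sum of valid kernels is again a valid kernel.
    """
    return sum(w * normalized(kern, s, t, **p) for w, kern, p in terms)


# Hypothetical combination: weights and k values are chosen by hand here;
# an evolutionary search would propose and score many such combinations.
terms = [
    (0.7, spectrum_kernel, {"k": 1}),
    (0.3, spectrum_kernel, {"k": 2}),
]

# Hypothetical peptide-like strings, used only to exercise the functions.
similarity = combined_kernel("ACDEFA", "ACDKFA", terms)
```

A Gram matrix built from `combined_kernel` over a training set could then be passed to a Support Vector Machine that accepts precomputed kernels, and the resulting AUC could serve as the fitness of that particular combination.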
 