Improving Classification Performance Using
Genetic Programming to Evolve String Kernels
Ruba Sultan1, Hashem Tamimi1,2, and Yaqoub Ashhab2
1College of IT and Computer Engineering, Palestine Polytechnic University, Palestine
2Palestine-Korea Biotechnology Center, Palestine Polytechnic University, Palestine
Abstract: The objective of this work is to present a novel
evolutionary-based approach that can create and optimize powerful string
kernels using Genetic Programming. The proposed model creates and optimizes a
superior kernel, which is expressed as a combination of string kernels, their
parameters, and corresponding weights. As a proof of concept, the
classification performance of the newly evolved kernel was compared with that
of a group of conventional single string kernels on a challenging
classification problem from the biology domain: the classification of binder
and non-binder peptides to Major Histocompatibility Complex Class II. Using
4794 strings containing 3346 binder and 1448 non-binder peptides, the present
approach achieved an Area Under the Curve (AUC) of 0.80, while the 11 tested
conventional string kernels had AUC values ranging from 0.59 to 0.75. This
significant improvement of the optimized evolved kernel over all other tested
string kernels demonstrates the validity of this approach for enhancing Support
Vector Machine classification. The presented approach is not limited to
biological strings; it can be applied to pattern recognition problems on
other types of strings as well as to natural language processing.
Keywords: Support vector machine, string kernels, genetic
programming, pattern recognition.
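
The abstract describes the evolved kernel as a combination of string kernels, their parameters, and corresponding weights. As a rough illustration of that idea only, the sketch below builds a weighted sum of k-spectrum kernels in Python. The choice of the spectrum kernel, the function names `spectrum_kernel` and `combined_kernel`, and the example weights are assumptions made for illustration; they are not taken from the paper, and no Genetic Programming search is performed here.

```python
from collections import Counter


def spectrum_kernel(s, t, k):
    """k-spectrum kernel: inner product of the k-mer count vectors of s and t."""
    counts_s = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    counts_t = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    # Counter returns 0 for k-mers that do not occur in t.
    return sum(count * counts_t[kmer] for kmer, count in counts_s.items())


def combined_kernel(s, t, terms):
    """Weighted sum of spectrum kernels.

    `terms` is a list of (weight, k) pairs. This structure is hypothetical:
    it stands in for one candidate that an evolutionary search might propose.
    With non-negative weights, the sum of valid kernels is again a valid kernel.
    """
    return sum(weight * spectrum_kernel(s, t, k) for weight, k in terms)


# Hypothetical candidate: 0.5 * (1-spectrum) + 2.0 * (2-spectrum).
candidate = [(0.5, 1), (2.0, 2)]
value = combined_kernel("ABAB", "AB", candidate)
```

In a search like the one outlined in the abstract, the list of `(weight, k)` pairs would be the part that is varied and scored by classifier performance; here it is fixed by hand.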