Encoding Gene Expression Using Deep
Autoencoders for Expression Inference
Raju Bhukya
Department of Computer Science and Engineering, National Institute of
Technology, India
Abstract: Gene expression of an organism contains all the
information that characterises its observable traits. Researchers have invested
abundant time and money to quantitatively measure the expressions in
laboratories. Because such techniques are too expensive to be widely
used, the correlation between expressions of certain genes was exploited to
develop statistical solutions. Pioneered by the National Institutes of Health
Library of Integrated Network-Based Cellular Signature (NIH LINCS) program,
expression inference techniques have seen many improvements over the years. The Deep
Learning for Gene Expression (D-GEX) project by the University of California,
Irvine approached the problem from a machine learning perspective, leading to
the development of a multi-layer feedforward neural network to infer target
gene expressions from clinically measured landmark expressions. Still, the huge
number of genes to be inferred from a limited set of known expressions vexed
the researchers. Ignoring possible correlation between target genes, they
partitioned the target genes randomly and built separate networks to infer
their expressions. This paper proposes that the dimensionality of the target
set can be virtually reduced using deep autoencoders. Feedforward networks are
then used to predict the coded representation of target expressions. In spite of
the reconstruction error of the autoencoder, overall prediction error on the
microarray-based Gene Expression Omnibus (GEO) dataset was reduced by 6.6%
compared to D-GEX. An improvement of 16.64% was obtained on cross-platform
normalized data obtained by combining the GEO dataset and an RNA-Seq-based
1000G dataset.
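
The encode, predict, decode pipeline described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: a PCA-style linear codec stands in for the deep autoencoder, an ordinary least-squares map stands in for the feedforward network, and the data are synthetic. All function names, dimensions, and the noise level are assumptions chosen for illustration only.

```python
import numpy as np

def fit_linear_codec(targets, code_dim):
    """Fit a PCA-style linear encoder/decoder (assumed stand-in for a deep autoencoder)."""
    mean = targets.mean(axis=0)
    # Rows of vt are the principal directions of the centered target matrix.
    _, _, vt = np.linalg.svd(targets - mean, full_matrices=False)
    components = vt[:code_dim]  # shape: (code_dim, n_targets)

    def encode(t):
        return (t - mean) @ components.T

    def decode(z):
        return z @ components + mean

    return encode, decode

def fit_code_predictor(landmarks, codes):
    """Least-squares map from landmarks to codes (assumed stand-in for the feedforward net)."""
    design = np.hstack([landmarks, np.ones((landmarks.shape[0], 1))])  # bias column
    weights, *_ = np.linalg.lstsq(design, codes, rcond=None)

    def predict(x):
        return np.hstack([x, np.ones((x.shape[0], 1))]) @ weights

    return predict

def demo(seed=0, n_samples=200, n_landmarks=10, n_targets=50, code_dim=5):
    rng = np.random.default_rng(seed)
    # Synthetic data: targets are a low-rank linear function of landmarks plus noise.
    landmarks = rng.normal(size=(n_samples, n_landmarks))
    mixing = rng.normal(size=(n_landmarks, code_dim)) @ rng.normal(size=(code_dim, n_targets))
    targets = landmarks @ mixing + 0.01 * rng.normal(size=(n_samples, n_targets))

    train, test = slice(0, 150), slice(150, None)
    # Step 1: compress the target set into a low-dimensional code.
    encode, decode = fit_linear_codec(targets[train], code_dim)
    # Step 2: learn to predict the code (not the raw targets) from landmarks.
    predict_code = fit_code_predictor(landmarks[train], encode(targets[train]))
    # Step 3: decode predicted codes back into the full target space.
    predicted = decode(predict_code(landmarks[test]))

    mae = np.mean(np.abs(predicted - targets[test]))
    # Baseline: predict the training mean of every target gene.
    baseline = np.mean(np.abs(targets[train].mean(axis=0) - targets[test]))
    return predicted.shape, mae, baseline

if __name__ == "__main__":
    shape, mae, baseline = demo()
    print(shape, mae, baseline)
```

The point of the sketch is the structure: the predictor only has to output `code_dim` values per sample rather than one value per target gene, and the decoder supplies the correlations among target genes. A nonlinear autoencoder and a multi-layer network would replace the two linear stand-ins in a setup closer to the one described above.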
Keywords: Deep autoencoder, gene expression, internal
covariate shift, machine learning, MLP, PCA.
Received March 5, 2019; accepted April 13, 2020