A New Vector Representation of Short Texts for Classification
Abstract: The shortness and sparsity of short texts, together with synonyms and
homonyms, are the main obstacles to short-text classification. In recent years,
research on short-text classification has focused on expanding short texts but
has rarely guaranteed the validity of the expanded words. This study proposes a
new method to weaken the effects of these obstacles
without external knowledge. The proposed method analyses short texts with a
topic model based on Latent Dirichlet Allocation (LDA), represents each short
text in a vector space model and introduces a new way of adjusting the
short-text vectors. In the experiments, two open short-text data sets, composed
of Google News items and web search snippets, are utilised to evaluate the
classification performance and demonstrate the effectiveness of our method.
Keywords: Text representation, short-text classification, Latent Dirichlet Allocation, topic model.
Received January 30, 2019; accepted July 2, 2019
https://doi.org/10.34028/iajit/17/2/12