A New Vector Representation of Short Texts for Classification

Yangyang Li and Bo Liu

College of Information Science and Technology, Jinan University, China

Abstract: The brevity and sparsity of short texts, together with synonyms and homonyms, are the main obstacles to short-text classification. In recent years, research on short-text classification has focused on expanding short texts but has rarely guaranteed the validity of the expanded words. This study proposes a new method that weakens these effects without external knowledge. The proposed method analyses short texts with a topic model based on Latent Dirichlet Allocation (LDA), represents each short text in a vector space model, and then adjusts the resulting short-text vectors. In the experiments, two open short-text data sets composed of Google News and web search snippets are used to evaluate classification performance and demonstrate the effectiveness of the method.
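
The pipeline outlined in the abstract (fit an LDA topic model, build a vector space representation, adjust the vector) can be illustrated with a minimal standard-library sketch. The abstract does not state the adjustment rule, so `adjusted_vector` below is a hypothetical stand-in that blends each text's term frequencies with the word distribution LDA predicts for it. The names `fit_lda`, `adjusted_vector`, `lam`, the hyperparameter values and the toy corpus are all assumptions made for illustration, not the authors' implementation.

```python
import random
from collections import Counter


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (vocab, theta, phi)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens of word w in topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k
    assign = []                                            # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            k = rng.randrange(n_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assign.append(row)

    # Resample each token's topic from its conditional distribution.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = assign[d][i], index[w]
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1

    # Smoothed estimates: theta[d] is P(topic | doc), phi[k] is P(word | topic).
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[t][w] + beta) / (topic_total[t] + n_words * beta)
         for w in range(n_words)]
        for t in range(n_topics)
    ]
    return vocab, theta, phi


def adjusted_vector(doc, vocab, theta_d, phi, lam=0.5):
    """Hypothetical adjustment: mix term frequency with LDA's predicted word mass."""
    counts = Counter(doc)
    length = max(len(doc), 1)
    vec = []
    for wi, w in enumerate(vocab):
        tf = counts[w] / length
        topic_mass = sum(p_topic * phi_k[wi] for p_topic, phi_k in zip(theta_d, phi))
        vec.append((1 - lam) * tf + lam * topic_mass)
    return vec


# Toy corpus of tokenised short texts (illustrative only).
docs = [
    ["stock", "market", "shares"],
    ["market", "trade", "stock"],
    ["football", "match", "goal"],
    ["goal", "league", "football"],
]
vocab, theta, phi = fit_lda(docs, n_topics=2)
vec = adjusted_vector(docs[0], vocab, theta[0], phi)
```

In this sketch every vocabulary word receives a non-zero weight, so a short text gains some mass on words it does not contain but that share its topics. That is one simple way to soften sparsity without external knowledge. The resulting vectors could then be passed to any standard classifier.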

Keywords: Text representation, short-text classification, Latent Dirichlet Allocation, topic model.

Received January 30, 2019; accepted July 2, 2019
https://doi.org/10.34028/iajit/17/2/12
