Prediction of Part of Speech Tags for Punjabi using Support Vector Machines


Dinesh Kumar1 and Gurpreet Josan2

 1Department of Information Technology, DAV Institute of Engineering and Technology, India

2Department of Computer Science, Punjabi University, India

Abstract: Part-of-Speech (POS) tagging is the task of assigning the appropriate POS, or lexical category, to each word in a natural language sentence. In this paper, we address the automated annotation of POS tags for Punjabi. We collected a corpus of around 27,000 words, drawn from stories, essays, day-to-day conversations, poems, and similar sources, and divided it into files of different sizes for training and testing. Our approach uses a Support Vector Machine (SVM) to tag Punjabi sentences. To the best of our knowledge, SVMs have not previously been used for tagging Punjabi text. The results show that the SVM-based tagger outperforms the existing taggers. In existing POS taggers for Punjabi, tagging accuracy for unknown words is lower than for known words, whereas our proposed tagger achieves high accuracy for unknown and ambiguous words as well. The average accuracy of our tagger is 89.86%, which is better than that of the existing approaches.
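To make the "feature set" and "vectorization" steps concrete, the following is a minimal sketch of how a tagged sentence might be converted into sparse feature vectors before being handed to an SVM. The abstract does not specify the paper's actual features, so the context window, the suffix features, the function names (`token_features`, `build_vocabulary`, `vectorize`), and the romanized toy sentence with its tags are all assumptions made for illustration.

```python
from typing import Dict, List


def token_features(words: List[str], i: int, window: int = 1) -> Dict[str, float]:
    """Binary features for the word at position i (hypothetical feature set)."""
    feats = {"bias": 1.0}
    # Surrounding words inside the window; positions off the sentence are padded.
    for offset in range(-window, window + 1):
        j = i + offset
        token = words[j] if 0 <= j < len(words) else "<PAD>"
        feats[f"w[{offset}]={token}"] = 1.0
    # Suffix features give a classifier some evidence for unknown words.
    for n in (1, 2, 3):
        if len(words[i]) > n:
            feats[f"suffix{n}={words[i][-n:]}"] = 1.0
    return feats


def build_vocabulary(feature_dicts: List[Dict[str, float]]) -> Dict[str, int]:
    """Assign a column index to every feature name seen in training."""
    vocab: Dict[str, int] = {}
    for feats in feature_dicts:
        for name in feats:
            vocab.setdefault(name, len(vocab))
    return vocab


def vectorize(feats: Dict[str, float], vocab: Dict[str, int]) -> Dict[int, float]:
    """Sparse vector as {column index: value}; unseen features are dropped."""
    return {vocab[name]: value for name, value in feats.items() if name in vocab}


# Toy romanized sentence with illustrative tags (not from the paper's corpus).
sentence = ["munda", "kitab", "parhda", "hai"]
tags = ["NN", "NN", "VB", "AUX"]

feature_dicts = [token_features(sentence, i) for i in range(len(sentence))]
vocab = build_vocabulary(feature_dicts)
X = [vectorize(f, vocab) for f in feature_dicts]  # one sparse vector per word
y = tags                                          # one target label per word
```

The pairs in `X` and `y` are the kind of input a multiclass SVM would be trained on. A word absent from the training data still shares padding and suffix columns with known words, which is one plausible way a tagger can remain accurate on unknown words.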


Keywords: POS tagging, SVM, feature set, vectorization, machine learning, tagger, Punjabi, Indian languages.


Received September 18, 2013; accepted February 28, 2014; published online December 23, 2015

 
