A Rule-Based Extensible Stemmer for Information Retrieval with Application to Arabic
Haidar Harmanani¹, Walid Keirouz², and Saeed Raheel¹
¹Computer Science and Mathematics Division, Lebanese American University, Lebanon
²Department of Computer Science, American University of Beirut, Lebanon
Abstract: This paper presents a new, extensible method for information retrieval and content analysis in natural languages (NL). The method is stem-based: stems are extracted according to a set of language-dependent rules that are interpreted by a rule engine. Because the rules are kept separate from the engine, the system can be adapted to another natural language by modifying the NL semantic rules and grammar. The system has been fully tested on Arabic and partially tested on English, Hebrew, and Persian. We have validated the approach using a database-backed prototype.
Keywords: Natural language processing, information retrieval, stemming.
Received February 21, 2005; accepted July 13, 2005
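
To make the idea in the abstract concrete, the following minimal sketch shows one way a rule engine can interpret language-dependent affix rules that are stored as data. It is an illustration under stated assumptions, not the system described in this paper: the `Rule` record, the `apply_rules` and `stem` functions, and the two tiny rule sets are all hypothetical, and real rule sets would be far larger and would also encode grammatical constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hypothetical affix-stripping rule (not taken from the paper)."""
    kind: str      # "prefix" or "suffix"
    affix: str     # the affix to strip
    min_stem: int  # minimum number of characters that must remain


def apply_rules(word, rules):
    """Apply each rule at most once, in order, and return the remaining stem."""
    stem_so_far = word
    for rule in rules:
        remaining = len(stem_so_far) - len(rule.affix)
        if remaining < rule.min_stem:
            continue  # stripping would leave too short a stem
        if rule.kind == "prefix" and stem_so_far.startswith(rule.affix):
            stem_so_far = stem_so_far[len(rule.affix):]
        elif rule.kind == "suffix" and stem_so_far.endswith(rule.affix):
            stem_so_far = stem_so_far[:-len(rule.affix)]
    return stem_so_far


# Illustrative rule sets only. Supporting another language means adding an
# entry here; the engine above stays unchanged.
RULES = {
    "en": [Rule("suffix", "ing", 3), Rule("suffix", "s", 3)],
    "ar": [Rule("prefix", "ال", 3), Rule("suffix", "ات", 3)],
}


def stem(word, language):
    """Look up the rule set for a language and run the engine on one word."""
    return apply_rules(word, RULES[language])
```

In this sketch the engine knows nothing about any particular language, so extending it amounts to editing the `RULES` table, which mirrors the separation between rules and rule engine that the abstract describes.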