A Semantic Framework for Extracting Taxonomic Relations from
Text Corpus
Phuoc Thi Hong Doan, Ngamnij Arch-int, and Somjit Arch-int
Department of Computer Science, Khon Kaen University, Thailand
Abstract: Ontologies are used in many applications because of their ability to
represent knowledge and infer new knowledge. However, the manual
construction of ontologies is tedious and time-consuming. Therefore,
automated ontology construction from text has been investigated. The extraction
of taxonomic relations between concepts is a crucial step in constructing
domain ontologies. To obtain taxonomic relations from a text corpus, especially
when the data are insufficient, a web-based approach, which uses the web as a
source of collective knowledge, is usually applied. The
main challenge of this approach is collecting relevant knowledge from
a large number of web pages. To address this challenge, we propose a framework
that combines Word Sense Disambiguation (WSD) with the web-based approach to
extract taxonomic relations from a domain text corpus. This framework consists of two
main stages: concept extraction and taxonomic-relation extraction. Concepts
acquired in the concept-extraction stage are disambiguated by a WSD module
and then passed to the taxonomic-relation extraction stage. To evaluate
the efficiency of the proposed framework, we conduct experiments on datasets
from two domains, tourism and sport. The results show that the
proposed method is efficient on corpora that have insufficient or no
training data. In addition, the proposed method outperforms the
state-of-the-art method on corpora for which WSD yields good results.
Keywords: Taxonomic relation, ontology construction, word sense
disambiguation, knowledge acquisition.