A Hybrid Technique for Annotating Book Tables

Asima Latif¹, Shah Khusro¹, Irfan Ullah¹, and Nasir Ahmad²

¹Department of Computer Science, University of Peshawar, Pakistan

²Department of Computer Systems Engineering, University of Engineering and Technology Peshawar, Pakistan


Abstract: Table extraction is usually complemented by table annotation to uncover the hidden semantics of a document or book. These semantics are determined by identifying a type for each column, the relationships between columns (if any), and the entities in each cell. Although such approaches have been applied to small documents and web pages, they have not been extended to extracting and annotating tables in books. This paper focuses on detecting, locating, and annotating entities in book tables. More specifically, it contributes algorithms for identifying and locating tables in books and for annotating table entities using the online knowledge source DBpedia Spotlight. Entities that DBpedia Spotlight misses are then annotated using Google Snippets. The combined results were found to give higher accuracy and better performance than DBpedia Spotlight alone. The approach complements existing table annotation approaches, as it makes it possible to discover and annotate entities that are not present in the catalogue. We tested our scheme on Computer Science books and obtained promising results in terms of accuracy and performance.
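
The hybrid idea (annotate each cell with a primary knowledge source, then fall back to a secondary source for the cells it misses) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each knowledge source is modeled as a hypothetical function from cell text to an entity type (or None), so no DBpedia Spotlight or Google calls are made. The names `annotate_table` and `column_types` and the toy dictionaries are hypothetical, and choosing a column's type by majority vote over its cell annotations is an assumed technique, not necessarily the one used in the paper.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical annotator interface: cell text -> entity type, or None if unknown.
# In a real system, `primary` would wrap DBpedia Spotlight and `fallback`
# would classify entities from Google search snippets (both assumed here).
Annotator = Callable[[str], Optional[str]]


def annotate_table(rows: List[List[str]],
                   primary: Annotator,
                   fallback: Annotator) -> List[List[Optional[str]]]:
    """Annotate every cell with the primary source, using the fallback for misses."""
    annotated: List[List[Optional[str]]] = []
    for row in rows:
        out_row: List[Optional[str]] = []
        for cell in row:
            label = primary(cell)
            if label is None:  # entity not found in the primary source
                label = fallback(cell)
            out_row.append(label)  # stays None if neither source knows the cell
        annotated.append(out_row)
    return annotated


def column_types(annotated: List[List[Optional[str]]]) -> List[Optional[str]]:
    """Assumed heuristic: a column's type is the most frequent non-None cell type."""
    n_cols = max((len(r) for r in annotated), default=0)
    types: List[Optional[str]] = []
    for c in range(n_cols):
        votes = Counter(r[c] for r in annotated if c < len(r) and r[c] is not None)
        types.append(votes.most_common(1)[0][0] if votes else None)
    return types


# Toy usage: the dictionaries stand in for the two knowledge sources.
kb: Dict[str, str] = {"Python": "ProgrammingLanguage",
                      "Java": "ProgrammingLanguage",
                      "Guido van Rossum": "Person"}
snippets: Dict[str, str] = {"Rust": "ProgrammingLanguage",
                            "Graydon Hoare": "Person"}
table = [["Python", "Guido van Rossum"],
         ["Java", "James Gosling"],
         ["Rust", "Graydon Hoare"]]

ann = annotate_table(table, kb.get, snippets.get)
print(ann)                # "James Gosling" is in neither source, so it stays None
print(column_types(ann))  # ['ProgrammingLanguage', 'Person']
```

Here "Rust" and "Graydon Hoare" are recovered only by the fallback source, which mirrors the role the abstract assigns to Google Snippets: covering entities the primary knowledge base lacks.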

Keywords: DBpedia Spotlight, Google snippets, table extraction, table annotation, table semantics, knowledge base.

Received February 8, 2015; accepted August 31, 2015
