Role of References in Similarity Estimation of Publications
Abstract: Similarity estimation among publications is central to the classification and clustering techniques used for grouping, indexing, citation matching and Author Name Disambiguation (AND). Publication attributes are the basic sources of information and play an important role in similarity estimation. Most AND approaches use the title, co-author and venue attributes to estimate similarity among publications. Other sources of information, such as self-citations, shared citations and references, publication topics and abstracts, have also been employed to improve similarity estimates. Recently, in the field of Academic Document Clustering (ADC), reference marker contexts have been utilized for this purpose. However, the use of citations and references is less common, since only a few databases include this information. In this paper, we propose using two components of references (the co-authors and the titles of references) as sources of information and investigate the importance of these components in similarity estimation. To the best of our knowledge, this is the first endeavour to exploit components of references as sources of information. Experiments conducted on real publication datasets reveal that these components of references are a significant source of information for similarity estimation among publications.
Keywords: AND, references, vector space model, cosine similarity, citation matching.
Received May 16, 2014; accepted September 11, 2014
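As a rough illustration of the idea summarized in the abstract, the following minimal sketch compares two publications by the titles and co-authors of their references, using a term-frequency vector space model and cosine similarity. It is not the method evaluated in the paper: the field names `ref_titles` and `ref_authors`, the whitespace tokenization, the raw term-frequency weighting and the choice to treat each author name as a single token are all assumptions made for this example.

```python
import math
from collections import Counter


def title_vector(titles):
    """Bag-of-words term-frequency vector over a list of reference titles."""
    counts = Counter()
    for title in titles:
        counts.update(title.lower().split())
    return counts


def author_vector(names):
    """Frequency vector in which each full author name is one token (assumption)."""
    return Counter(name.lower().strip() for name in names)


def cosine(u, v):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty reference list shares nothing
    return dot / (norm_u * norm_v)


def reference_similarity(pub_a, pub_b):
    """Return (title similarity, co-author similarity) over the two reference lists.

    Each publication is a dict with the hypothetical keys
    'ref_titles' and 'ref_authors', both lists of strings.
    """
    title_sim = cosine(title_vector(pub_a["ref_titles"]),
                       title_vector(pub_b["ref_titles"]))
    author_sim = cosine(author_vector(pub_a["ref_authors"]),
                        author_vector(pub_b["ref_authors"]))
    return title_sim, author_sim
```

The two scores are returned separately so that the contribution of each reference component can be examined on its own; how they would be weighted or combined with the title, co-author and venue attributes is left open here.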