An Efficient Algorithm for Extracting Infrequent Itemsets from Weblog
Brijesh Bakariya¹ and Ghanshyam Thakur²
¹Department of Computer Science and Engineering, I.K. Gujral Punjab Technical University, India
²Department of Computer Applications, Maulana Azad National Institute of Technology, India
Abstract: Weblog data contain unstructured information, which makes extracting frequent patterns from weblog databases a challenging task. A power set lattice strategy is adopted to handle this kind of problem: the top level of the lattice contains the full itemset and the bottom level contains the empty set. Most algorithms follow a bottom-up strategy, i.e. they combine smaller sets into larger ones. Efficient lattice traversal techniques have been proposed that quickly identify all long frequent itemsets and, if required, their subsets. This strategy is suitable for discovering frequent itemsets, but it may not be well suited to discovering infrequent itemsets. In this paper, we propose the Infrequent Itemset Mining for Weblog (IIMW) algorithm, a top-down, breadth-first, level-wise algorithm for discovering infrequent itemsets. We have compared IIMW with Apriori-Rare and Apriori-Inverse and report results for different parameters, such as the number of candidate itemsets, the number of frequent itemsets, execution time, the transaction database, and the support threshold.
Keywords: Infrequent itemsets, lattice, frequent itemsets, weblog, support threshold.
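The abstract describes IIMW only at a high level, so the following is not the authors' algorithm. It is a minimal sketch of a generic top-down, breadth-first, level-wise search of the power set lattice for infrequent itemsets, assuming that each transaction is a weblog session represented as a set of page identifiers and that `min_sup` is an absolute support count. The search relies on the anti-monotone property of support: every superset of an infrequent itemset is also infrequent, so a (k-1)-itemset can be infrequent only if all of its k-supersets are. The function names `support` and `top_down_infrequent` and the toy sessions are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Absolute support: number of transactions (sessions) containing every item."""
    return sum(1 for t in transactions if itemset <= t)


def top_down_infrequent(transactions, min_sup):
    """Generic top-down, breadth-first, level-wise search for infrequent itemsets.

    Hypothetical illustration only, NOT the IIMW algorithm from the paper.
    Returns {itemset: support} for every non-empty itemset with support < min_sup.
    """
    items = frozenset().union(*transactions)
    result = {}
    # Top of the power set lattice: the full item set (level k = |items|).
    candidates = {items} if items else set()
    while candidates:
        # Count support for this level's candidates and keep the infrequent ones.
        level_infrequent = set()
        for cand in candidates:
            s = support(cand, transactions)
            if s < min_sup:
                level_infrequent.add(cand)
                result[cand] = s
        # Move one level down: (k-1)-subsets of infrequent k-itemsets.
        # Anti-monotonicity: a (k-1)-itemset can be infrequent only if ALL of its
        # k-supersets are infrequent, so a subset with a frequent parent is pruned.
        next_candidates = set()
        for parent in level_infrequent:
            if len(parent) == 1:
                continue  # do not descend to the empty set at the bottom
            for sub in combinations(parent, len(parent) - 1):
                sub = frozenset(sub)
                if all((sub | {i}) in level_infrequent for i in items - sub):
                    next_candidates.add(sub)
        candidates = next_candidates
    return result


if __name__ == "__main__":
    # Toy weblog sessions: pages visited per session (hypothetical data).
    sessions = [
        frozenset({"A", "B", "C"}),
        frozenset({"A", "B"}),
        frozenset({"A", "C"}),
        frozenset({"B", "D"}),
    ]
    rare = top_down_infrequent(sessions, min_sup=2)
    for itemset, sup in sorted(rare.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), sup)
```

On these toy sessions with `min_sup=2`, the sketch reports ten infrequent itemsets, from {D} up to the full set {A, B, C, D}, while {A, B} and {A, C} are correctly left out as frequent. Because it starts from the full item set, this naive search is exhaustive over the infrequent region of the lattice and is practical only for small item universes; it shows the traversal order, not the efficiency techniques the paper claims for IIMW.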