Mining Recent Maximal Frequent Itemsets Over Data Streams with Sliding Window

Saihua Cai1, Shangbo Hao1, Ruizhi Sun1, and Gang Wu2

1College of Information and Electrical Engineering, China Agricultural University, China

2Secretary of Computer Science Department, Tarim University, China

Abstract: The huge volume of data streams makes it impractical to mine all recent frequent itemsets directly. Because the maximal frequent itemsets compactly imply all the frequent itemsets and are far fewer in number, mining maximal frequent itemsets is much more efficient in both time cost and memory usage. This paper proposes an improved method called Recent Maximal Frequent Itemsets Mining (RMFIsM) to mine recent maximal frequent itemsets over data streams with a sliding window. The RMFIsM method uses two matrices to store the information of the data stream: the first matrix stores the information of each transaction, and the second stores the frequent 1-itemsets. The frequent p-itemsets are mined through an "extension" process applied to the frequent 2-itemsets, and the maximal frequent itemsets are obtained by deleting the sub-itemsets of long frequent itemsets. Finally, the performance of the RMFIsM method is evaluated through a series of experiments; the results show that the proposed method can mine recent maximal frequent itemsets efficiently.
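
To make the sliding-window bookkeeping concrete, here is a minimal Python sketch (not the authors' implementation; all names are illustrative) of the two-matrix idea: a window of transaction bit-rows plus running support counts for the 1-itemsets.

```python
from collections import deque

# Minimal sketch of the two-matrix idea: matrix 1 holds one bit-row per
# transaction in the sliding window, matrix 2 holds the running support
# counts of the 1-itemsets.
class SlidingWindowCounts:
    def __init__(self, window_size, items):
        self.window = deque()
        self.window_size = window_size
        self.items = list(items)
        self.support = {i: 0 for i in self.items}

    def add(self, transaction):
        row = {i: int(i in transaction) for i in self.items}
        self.window.append(row)
        for i in self.items:
            self.support[i] += row[i]
        if len(self.window) > self.window_size:   # slide: retire the oldest row
            old = self.window.popleft()
            for i in self.items:
                self.support[i] -= old[i]

    def frequent_1_itemsets(self, min_sup):
        return [i for i, c in self.support.items() if c >= min_sup]

sw = SlidingWindowCounts(window_size=3, items="abcd")
for t in ({"a", "b"}, {"a", "c"}, {"a", "b", "d"}, {"b", "c"}):
    sw.add(t)
print(sw.frequent_1_itemsets(min_sup=2))   # frequent items in the last 3 transactions
```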

Keywords: Data streams, recent maximal frequent itemsets, sliding window, matrix structure.

Received November 16, 2016; accepted April 25, 2018

Evaluation of Quran Recitation via OWL Ontology Based System

Eman Elsayed and Doaa Fathy

Mathematics and Computer Science Department, Al-Azhar University (Girls Branch), Egypt

Abstract: The linguistic miracle of the Holy Quran poses many challenges for automating the evaluation of Quran recitation. This paper considers one suggestion for how natural language processing can benefit from using ontology. We propose a general automatic system that evaluates Quran recitation according to the Hafs reading by integrating an ontology, as an artificial-intelligence knowledge representation method, with Automatic Speech Recognition (ASR) technology as the means of interaction with the computer. Our proposed system solves the problem of evaluating all intonation (Tajweed) rules at the same time, in addition to evaluating sets of Quran segments in the right order of reading. The system uses Mel-Frequency Cepstral Coefficients (MFCC) and Vector Quantization (VQ) for feature extraction and dimension reduction, respectively, on Arabic speech. We also construct a Quran ontology for Quranic speech and integrate it with an information retrieval system. This ontology is the first version to merge the Quran's meaning ("Tafseer") with its recitation in a universal oral exam, taking advantage of the semantic properties of ontologies. Experimentally, our system gives good accuracy for Quran recitation evaluation.
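
As a rough illustration of the MFCC-plus-VQ front end the abstract describes (not the authors' code; librosa and SciPy stand in for whatever toolkit was used, and the file path is a placeholder):

```python
import librosa
from scipy.cluster.vq import kmeans, vq

# Sketch of an MFCC + VQ front end; "recitation.wav" is a placeholder path.
signal, sr = librosa.load("recitation.wav", sr=16000)
frames = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13).T.astype(float)  # (frames, 13)

# Vector quantization: a small k-means codebook maps each MFCC frame to a
# codeword index, reducing the frame sequence's dimensionality for matching.
codebook, _ = kmeans(frames, k_or_guess=32)
codes, _ = vq(frames, codebook)
print(codes[:10])   # first ten codeword indices
```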

Keywords: Holy Quran, ontology, automatic speech recognition, feature extraction, Information retrieval system.

Received January 2, 2017; accepted March 11, 2018

A Personalized Metasearch Engine Based on Multi-Agent System

Meijia Wang, Qingshan Li, Yishuai Lin, Yingjian Li, and Boyu Zhou

Software Engineering Institute, Xidian University, China

Abstract: By providing unified access to multiple underlying search engines, a metasearch engine is an intuitive way to increase coverage of the WWW. Great progress has been made in this area, but previous studies ignore the perspectives of users. This paper proposes a personalization mechanism for a metasearch engine based on a multi-agent system to improve the precision ratio. The proposed mechanism obtains user interests from click-through data, schedules the appropriate underlying search engines according to an expertness model, and merges results based on the user's interest distribution. Moreover, it can also provide personalized result recommendation. Compared with the baseline, experimental results show that the proposed personalization mechanism achieves better precision. The proposed metasearch engine is thus feasible for providing useful search results more effectively.
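
A hedged sketch of interest-weighted result merging (the paper's exact merging formula is not reproduced; the reciprocal-rank scoring and the expertness weights below are illustrative):

```python
# Each engine's ranked list contributes a reciprocal-rank score, scaled by
# how well that engine matches the user's interest distribution.
def merge_results(engine_results, engine_weights):
    scores = {}
    for engine, results in engine_results.items():
        w = engine_weights.get(engine, 1.0)
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + w / rank
    return sorted(scores, key=scores.get, reverse=True)

merged = merge_results(
    {"engineA": ["u1", "u2", "u3"], "engineB": ["u2", "u4"]},
    {"engineA": 0.7, "engineB": 0.3},   # hypothetical expertness weights
)
print(merged)   # u2 gains from appearing in both engines' lists
```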

Keywords: Metasearch engine, multi-agent system, personalized search.

Received October 29, 2016; accepted August 26, 2018


Parameter Optimization of Single Sample Virtually Expanded Method

Xiaoxia Zhao1, Wenjun Meng1, Jinhu Su1, and Yuxuan Chen2

1School of Mechanical Engineering, Taiyuan University of Science and Technology, China

2Design and Research Institute, North China Municipal Engineering Design and Research Institute Co. Ltd., China

Abstract: For single-sample experiments with sample number n=1, this paper derives the relationship of the parameters when the sample is virtually expanded from n=1 to n=13, and big-sample data are generated by the Bootstrap method. In place of existing methods, an improved particle swarm optimization based on Minimax is put forward. With this method applied to the parameter optimization, the lower confidence limit approaches the lower confidence limit of the Semiempirical Evaluation Method with faster speed and higher precision. In this way, the most suitable augmented parameters for the virtual expansion from n=1 to n=13 are obtained, which provides a better virtual augmentation method for expanding a sample from n=1 to n=13.
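
For readers unfamiliar with the Bootstrap step, a generic Python sketch follows (the paper's virtual expansion from n=1 to n=13 and the Minimax-based PSO are not reproduced; the data here are synthetic):

```python
import numpy as np

# Generic Bootstrap sketch: resample an (already virtually expanded) sample
# of size 13 to estimate a one-sided lower confidence limit of the mean.
rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=1.0, size=13)   # stand-in expanded sample
boot_means = [rng.choice(sample, size=13, replace=True).mean()
              for _ in range(2000)]
print(np.percentile(boot_means, 5))   # 95% one-sided lower confidence limit
```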

Keywords: Bootstrap method, semiempirical evaluation method, single sample, augmented parameter, minimax, particle swarm optimization (PSO).

Received November 14, 2016; accepted July 22, 2018

Classifying Sentiment of Dialectal Arabic Reviews: A Semi-Supervised Approach

Omar Al-Harbi

Computer and Information Department, Jazan University, Saudi Arabia

Abstract: Arab Internet users tend to use dialectal words to express how they feel about products, services, and places. Although the Arabic dialects are derived from the formal Arabic language, they differ from it in several aspects. Arabic sentiment analysis has recently attracted considerable research attention; a large amount of work has addressed Modern Standard Arabic (MSA), but little has focused on dialectal Arabic. The presence of dialect in Arabic texts makes sentiment analysis challenging, because dialect does not usually follow specific rules in writing or speech. In this paper, we implement a semi-supervised approach for sentiment polarity classification of dialectal reviews that also contain MSA. We combine a dialectal sentiment lexicon with four learning algorithms, namely Support Vector Machines (SVM), Naïve Bayes (NB), Random Forest, and K-Nearest Neighbor (K-NN), to perform the polarity classification. To select the features with which the classifiers perform best, we use three feature evaluation methods: Correlation-based Feature Selection, Principal Components Analysis, and SVM Feature Evaluation. In the experiments, we apply the approach to a manually collected data set. The results show that the approach yields the highest classification accuracy, 92.3%, with the SVM algorithm.
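
A minimal scikit-learn sketch of an SVM polarity classifier follows (illustrative only: generic TF-IDF n-gram features stand in for the paper's lexicon-based features, and the two training reviews are placeholders):

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Placeholder reviews; real experiments would use the collected data set.
train_texts = ["خدمة ممتازة وسريعة", "تجربة سيئة جدا ولا أنصح بها"]
train_labels = ["positive", "negative"]

# TF-IDF word n-grams stand in for the paper's lexicon-derived features.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_texts, train_labels)
print(clf.predict(["خدمة سريعة"]))   # expected: positive
```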

Keywords: Arabic sentiment analysis, Opinion mining, Dialectal sentiment analysis, Dialectal lexicon, Dialectal Arabic processing.

Received January 8, 2017; accepted March 6, 2018

A Proactive Caching Scheme Based on Content Concentration in Content-Centric Networks

Xi Luo1,2 and Ying An1

1Institute of Information Security and Big Data, Central South University, China

2Department of Information Technology, Hunan Police Academy, China

Abstract: Content-Centric Networking (CCN) provides a new networking paradigm to overcome the challenges of the current Internet and better support content-oriented services. In-network caching is one of the core technologies for optimizing content distribution and has been attracting ever-increasing attention. In current CCN caching schemes, the caching decision is implemented only during the content response process, after the content that matches the user request is found, so they lack the capacity for proactive content placement. This paper borrows the idea of molecular diffusion and introduces the concept of content concentration into the caching decision to analytically model user demand for different contents in different network regions; a proactive caching scheme based on content concentration is then proposed. In this scheme, content replicas can be proactively pushed from high-concentration regions to low-concentration ones to realize fast deployment of cache replicas and dynamic adjustment of cache locations. Furthermore, a probabilistic content placement method is also implemented by taking content popularity into account. The simulation results show that this scheme can effectively improve the cache hit ratio and shorten the delay involved in content search and response.
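
As a toy illustration of popularity-driven probabilistic placement (an assumption-laden sketch, not the paper's concentration model):

```python
import random

# A node caches a passing content object with probability proportional to
# its observed request popularity (normalized to [0, 1] here by assumption).
def maybe_cache(cache, content_id, popularity, capacity=100, p_max=0.9):
    p = min(p_max, popularity)
    if random.random() < p:
        if len(cache) >= capacity:      # evict the oldest replica to make room
            cache.pop(next(iter(cache)))
        cache[content_id] = True

cache = {}
for cid, pop in [("video/1", 0.8), ("page/2", 0.1), ("video/1", 0.8)]:
    maybe_cache(cache, cid, pop)
print(list(cache))   # popular content is more likely to be cached
```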

Keywords: Content-centric networking, diffusion theory, proactive caching, content popularity.

Received October 13, 2016; accepted November 27, 2017


A New Approach to Improve Association Rules for Big Data in Cloud Environment

Djilali Dahmani, Sidi Ahmed Rahal, and Ghalem Belalem

Department of Computer Science, University of Science and Technology, Algeria

Abstract: The technique of association rules is very useful in data mining, but it generates a huge number of rules, so manual post-processing is required to target only the interesting ones. Several researchers suggest integrating users' knowledge through ontologies and rule patterns, and then automatically selecting the interesting rules after all possible rules have been generated. However, business data are now growing extremely fast, and many companies have already opted for Big Data systems deployed in cloud environments, so the process of generating association rules becomes very hard. To deal with this issue, we propose an approach that uses an ontology with rule patterns to integrate users' knowledge early, in the preprocessing step, before searching for or generating any rule. Thus, only the interesting rules that respect the rule patterns are generated. This approach reduces execution time and minimizes the cost of post-processing, especially for Big Data. To confirm the performance results, experiments are carried out on Not Only Structured Query Language (NoSQL) databases distributed in a cloud environment.
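
A small sketch of the core idea, pushing rule patterns into preprocessing so that pattern-irrelevant items never enter rule generation (item names and patterns are illustrative):

```python
# Items that cannot appear in any user-defined rule pattern are pruned from
# the transactions before any rule generation starts.
rule_patterns = [({"bread"}, {"butter", "jam"})]   # (antecedent items, consequent items)
allowed = set().union(*[a | c for a, c in rule_patterns])

transactions = [{"bread", "butter", "milk"}, {"jam", "eggs"}, {"milk"}]
filtered = [t & allowed for t in transactions if t & allowed]
print(filtered)   # only pattern-relevant items survive, shrinking the search space
```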

Keywords: Big data, association rules, rule patterns, ontology, cloud computing, NoSQL.

Received April 5, 2016; accepted September 23, 2018

PLDL: A Novel Method for Label Distribution Learning

Venkatanareshbabu Kuppili, Mainak Biswas, and Damodar Edla

Computer Science and Engineering, National Institute of Technology Goa, India

Abstract: The nature, volume, and orientation of data have changed greatly in the last few years. This changed situation has prompted data scientists to modify traditional algorithms and invent new methods for processing the new type of high-volume, extremely complex data. One of the challenges is label ambiguity in the data, where the distribution of the significance of the labels matters. In this paper, a new method named Probabilistic Label Distribution Learning (PLDL) is proposed for computing the degree of belongingness. It is based on a new Label Probability Density Function (LPDF) derived from the Parzen estimate. The LPDF is used in Algorithm Adaptation K-Nearest Neighbors (AA-KNN) for Label Distribution Learning (LDL). Probability density estimators are used to estimate this ambiguity for each and every label. The overall degree of belongingness of unseen instances has been evaluated on various real datasets. The prediction accuracy of the proposed PLDL is compared with Algorithm Adaptation KNN, Multilayer Perceptron, Levenberg-Marquardt neural network, and layer recurrent neural network approaches to label distribution learning. It is observed that the increase in prediction accuracy of the proposed PLDL is highly statistically significant for most of the real datasets when compared with the standard LDL algorithms.
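
One plausible reading of the Parzen-based LPDF idea, sketched in Python (our interpretation, not the authors' exact formulation): a Gaussian Parzen window over the k nearest neighbours weights their label distributions into a prediction.

```python
import numpy as np

def predict_label_distribution(x, X_train, D_train, k=5, h=1.0):
    dists = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(dists)[:k]                       # k nearest neighbours
    w = np.exp(-(dists[nn] ** 2) / (2 * h ** 2))     # Parzen window weights
    dist = (w[:, None] * D_train[nn]).sum(axis=0)    # weighted label densities
    return dist / dist.sum()                         # normalize to a distribution

rng = np.random.default_rng(0)
X_train = rng.random((20, 4))
D_train = rng.dirichlet(np.ones(3), size=20)         # per-instance label distributions
print(predict_label_distribution(rng.random(4), X_train, D_train))
```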

Keywords: Multi-label classification, data mining, label distribution learning, probability density function.

Received September 19, 2016; accepted January 16, 2018

A Comprehensive Study of Modern and High Speed TCP-Variant in Linux Kernel: TCP CUBIC

Abrar Khan, Umar Shoaib, and Muhammad Sarfraz

Department of Computer Science, Faculty of Computing and Information Technology, University of Gujrat, Pakistan

Abstract: The Transmission Control Protocol (TCP) is without doubt the most widely used congestion control protocol, designed for highly reliable, end-to-end communication over the Internet. In its standard form, however, TCP is not suitable for modern, high-speed networks; various TCP variants address this issue. CUBIC is a modern TCP variant designed for high-speed and scalable networks, and it is the default congestion control algorithm in the Linux kernel. This survey paper contains a detailed discussion of TCP CUBIC and directions for further improvement. It describes the CUBIC design architecture with pseudocode of the algorithm, TCP support in the Linux kernel and the implementation of CUBIC there, and a Network Simulator 2 and Network Simulator 3 based study of CUBIC along with its class diagram. Finally, the performance of CUBIC is evaluated in both wired and wireless environments in terms of goodput and intra-protocol fairness, alongside TCP NewReno and TCP Compound. The simulation results demonstrate that CUBIC is very suitable for wired, high-speed networks, but its performance degrades in wireless and low-speed networks.
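
CUBIC's window growth function is well documented in the literature (Ha, Rhee, and Xu): W(t) = C(t - K)^3 + W_max, where K = cbrt(W_max * beta / C) is the time needed to regrow to W_max after a loss. A small sketch:

```python
# The CUBIC window growth function from the literature.
C = 0.4     # scaling constant (Linux default)
BETA = 0.2  # multiplicative decrease factor

def cubic_window(t, w_max):
    K = (w_max * BETA / C) ** (1.0 / 3.0)
    return C * (t - K) ** 3 + w_max

# W(0) equals W_max * (1 - BETA), the post-loss window; growth is concave up
# to K, flattens near W_max, then probes convexly beyond it.
for t in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(t, round(cubic_window(t, w_max=100.0), 1))
```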

Keywords: TCP, Congestion control, Linux kernel, CUBIC, NewReno, Compound.

Received July 18, 2016; accepted July 20, 2017

Block Size Analysis for Discrete Wavelet Watermarking and Embedding a Vector Image as a Watermark

Hayri Sever1, Ahmet Şenol1, and Ersin Elbaşı2

1Department of Computer Engineering, Çankaya University, Ankara, Turkey

2College of Engineering and Technology, American University of the Middle East, Kuwait

Abstract: As telecommunication and computer technologies proliferate, most data are stored and transferred in digital format. Content owners, therefore, are searching for new technologies to protect copyrighted products in digital form. Image watermarking emerged as a technique for protecting image copyrights. Early studies on image watermarking used the pixel domain, whereas modern watermarking methods convert a pixel-based image to another domain and embed the watermark in that transform domain. This study uses the Block Discrete Wavelet Transform (BDWT) as the transform domain for embedding and extracting watermarks, and it consists of two parts. The first part investigates the effect of dividing an image into non-overlapping blocks and transforming each block to the DWT domain independently; the relation between block size and watermarking success is then analyzed. The second part investigates embedding a vector image logo as the watermark. Vector images consist of geometric objects such as lines, circles, and splines, and, unlike pixel-based images, they do not lose quality when scaled. However, vector watermarks deteriorate very easily if the watermarked image is processed, for example by compression or filtering. Special care must therefore be taken when the embedded watermark is a vector image, such as adjusting the watermark strength or distributing the watermark data across the image, and the relative importance of the watermark data must be taken into account. To the best of our knowledge, this study is the first to use a vector image as a watermark embedded in a host image.
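
A minimal sketch of block-wise DWT embedding (assumptions: Haar wavelet, 8x8 blocks, additive embedding of one bit per block in the LL subband; the paper's actual scheme and parameters are not reproduced):

```python
import numpy as np
import pywt

def embed_block_dwt(image, bits, alpha=2.0, block=8):
    out = image.astype(float).copy()
    k = 0
    for r in range(0, image.shape[0] - block + 1, block):
        for c in range(0, image.shape[1] - block + 1, block):
            if k >= len(bits):
                return out
            cA, (cH, cV, cD) = pywt.dwt2(out[r:r+block, c:c+block], "haar")
            cA += alpha * (1 if bits[k] else -1)       # embed one bit per block
            out[r:r+block, c:c+block] = pywt.idwt2((cA, (cH, cV, cD)), "haar")
            k += 1
    return out

host = np.random.randint(0, 256, (64, 64))
marked = embed_block_dwt(host, bits=[1, 0, 1, 1])
print(np.abs(marked - host).max())   # small per-pixel distortion from embedding
```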

Keywords: Watermarking, DWT, block, vector, SVG.

Received February 7, 2017; accepted October 17, 2018

Tracking Recurring Concepts from Evolving Data Streams using Ensemble Method

Yange Sun1,2, Zhihai Wang2, Jidong Yuan2, and Wei Zhang2

1School of Computer and Information Technology, Xinyang Normal University, China

2School of Computer and Information Technology, Beijing Jiaotong University, China

Abstract: Ensemble models are the most widely used methods for classifying evolving data streams. However, most existing data stream ensemble classification algorithms do not consider recurring concepts, which commonly exist in real-world applications. Motivated by this challenge, an Ensemble with internal Change Detection (ECD) is proposed to enhance performance by exploiting recurring concepts. It maintains a pool of classifiers, dynamically adding and removing classifiers in response to a change detector. The algorithm adopts a two-window change detection model that uses the Jensen-Shannon divergence to measure the distance between the distributions of old and recent data. When a change is detected, the repository of stored historical concepts is checked for reuse. Experimental results on both synthetic and real-world data streams demonstrate that the proposed algorithm not only outperforms state-of-the-art methods on standard evaluation metrics, but also adapts well to different types of concept drift, especially when concepts reappear.
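
The two-window test can be sketched as follows (window sizes, binning, and the threshold are illustrative, not the paper's settings); note that SciPy's jensenshannon returns the square root of the divergence, which is fine for thresholding:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Compare the feature distribution of old data against recent data.
def drift_detected(old_window, new_window, bins=10, threshold=0.2):
    lo = min(old_window.min(), new_window.min())
    hi = max(old_window.max(), new_window.max())
    p, _ = np.histogram(old_window, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(new_window, bins=bins, range=(lo, hi), density=True)
    return jensenshannon(p, q) > threshold   # JS distance = sqrt(JS divergence)

old = np.random.normal(0.0, 1.0, 500)
new = np.random.normal(2.0, 1.0, 500)   # shifted distribution: a concept change
print(drift_detected(old, new))          # True for a clear shift
```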

Keywords: Data streams, ensemble classification, change detection, recurring concept, Jensen-Shannon divergence.

Received August 10, 2016; accepted July 8, 2018

The Influence of Data Classification Methods on Predictive Accuracy of Kernel Density Estimation Hotspot Maps

Nenad Milic1, Brankica Popovic2, Sasa Mijalkovic1, and Darko Marinkovic1

1Department of Criminalistics, University of Criminal Investigation and Police Studies, Serbia

2Department of Informatics and Computer Science, University of Criminal Investigation and Police Studies, Serbia

Abstract: When it comes to hot spot identification, spatial analysis techniques come to the fore. One such technique, which has gained great popularity among crime analysts, is Kernel Density Estimation (KDE). Small variations in KDE parameters can produce different outputs and hence affect the predictive accuracy of the hotspot map. The influence of these parameters on KDE hotspot output has sparked much research, mostly analyzing the influence of cell size and bandwidth. Yet the influence of the classification method applied to the calculated cell values, including the choice of threshold value, on KDE predictive accuracy has remained neglected. The objective of this research was to assess the influence of different classification methods on KDE predictive accuracy. In each KDE computation, the calculated cell values were divided into five thematic classes using the three most common (default) classification methods provided by the Environmental Systems Research Institute (ESRI) geographic information system ArcGIS (equal interval, quantile, and natural breaks classification) as well as incremental multiples of the grid cells' mean. Based on the calculated hit rates, predictive accuracy indices, and recapture rate indices, and taking into account that the mapping output should satisfy operational requirements as well as statistical rules, this research suggests using the incremental mean approach with a hotspot threshold of three or more multiples of the grid cells' mean.
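
The incremental-mean classification the study recommends is easy to express on a raster of KDE cell values (the surface below is synthetic):

```python
import numpy as np

# Grid cells are binned by multiples of the mean cell value; cells at 3x the
# mean or above are flagged as hotspots.
kde_cells = np.random.gamma(shape=1.0, scale=2.0, size=(50, 50))  # stand-in KDE surface
mean = kde_cells.mean()

# Five thematic classes: [0, m), [m, 2m), [2m, 3m), [3m, 4m), [4m, inf)
classes = np.digitize(kde_cells, bins=[mean, 2 * mean, 3 * mean, 4 * mean])
hotspots = classes >= 3                      # threshold: 3+ multiples of the mean
print(hotspots.sum(), "hotspot cells of", kde_cells.size)
```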

Keywords: Crime mapping, hot spot, kernel density, classification methods.

Received May 20, 2016; accepted May 6, 2018

Correlation Dependencies between Variables in Feature Selection on Boolean Symbolic Objects

Djamal Ziani

College of Business and Administration, Al Yamamah University, Saudi Arabia

Abstract: Feature selection is an important process in data analysis and data mining. The increasing size, complexity, and multi-valued nature of data necessitate the use of Symbolic Data Analysis (SDA), which analyzes symbolic objects instead of classical tables. Symbolic objects are created by applying abstraction or generalization techniques to individuals; they represent concepts or clusters. To improve the description of these objects and to eliminate incoherencies and over-generalization, exploiting dependencies between variables is crucial in SDA. This study shows how correlation dependencies between variables can be handled in feature selection on Boolean Symbolic Objects (BSOs). A new feature selection criterion that considers the dependencies between variables is presented, together with a method for dealing with the computational complexity.

Keywords: Feature selection, dependencies, symbolic data analysis, discrimination criteria.

Received September 13, 2016; accepted July 23, 2017

Towards Achieving Optimal Performance using Stacked Generalization Algorithm: A Case Study of Clinical Diagnosis of Malaria Fever

Abiodun Oguntimilehin1, Olusola Adetunmbi2, and Innocent Osho3

1Department of Computer Science, Afe Babalola University, Ado-Ekiti, Nigeria

2Department of Computer Science, Federal University of Technology, Akure, Nigeria

3Department of Animal Production and Health, Federal University of Technology, Akure, Nigeria

Abstract: The birth of data mining has been a blessing to all fields of endeavour, and numerous data mining algorithms are available today. One of the major problems in mining data is selecting the appropriate algorithm or model for the job at hand; this has led to many comparison experiments by researchers. Stacked generalization is a method of combining multiple models to achieve better accuracy, and it has been shown to be effective by many researchers over the years. This study investigates how optimal performance can be achieved with the stacked generalization algorithm. Six data mining algorithms (PART, REPTree, J48, Random Tree, RIDOR, and JRip), arranged in two different orders, were used as base learners for two meta-learners (Random Forest and NNGE) independently, and the results were compared in terms of classification accuracy. The study shows that the order of arrangement of the base learners and the choice of meta-learner can affect the accuracy of stacked generalization: NNGE outperforms Random Forest as a meta-learner, and, unlike Random Forest, its performance is independent of the order of arrangement of the base learners. Malaria fever datasets collected from reputable hospitals in Ado-Ekiti, Ekiti State, Nigeria were used for this study because malaria is one of the major diseases, killing almost a million people yearly in the tropical regions of Africa; a more accurate malaria fever diagnosis model is therefore also proposed as a result of this study.
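
An illustrative stacked-generalization setup in scikit-learn follows (the paper uses Weka learners such as PART, JRip, and NNGE; the estimators below are stand-in analogues, not the study's configuration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Base learners feed their predictions to a meta-learner that makes the
# final call; the meta-learner here plays the role Random Forest/NNGE
# play in the study.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

base_learners = [
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier()),
]
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=RandomForestClassifier(random_state=0))
print(stack.fit(X[:200], y[:200]).score(X[200:], y[200:]))
```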

Keywords: Data mining, ensemble learning, stacked generalization, malaria, diagnosis.

Received October 13, 2016; accepted October 11, 2017


Optimal Dual Cameras Setup for Motion Recognition in Salat Activity

Nor Azrini Jaafar1, Nor Azman Ismail1, Kamarul Azmi Jasmi2, and Yusman Azimi Yusoff1

1Faculty of Computing, Universiti Teknologi Malaysia, Malaysia

2Faculty of Islamic Civilization, Universiti Teknologi Malaysia, Malaysia

Abstract: Motion recognition has received significant attention in recent years in the area of computer vision because of the wide range of potential applications that can be built on it. A wide variety of algorithms and techniques have been proposed for human motion recognition systems. This paper investigates the optimal dual-sensor setup for motion recognition in salat activity, a multisensor perspective that has remained unexplored. Existing works in the field are able to recognize a few salat movements, but not from the multisensor perspective, which is important for better recognition and analysis. This research proposes a solution for one of the fundamental activities required of every Muslim: salat. Carrying out salat with the right actions not only helps strengthen the relationship with Allah Subhanahu Wa Ta'ala (SWT), but also supports the formation of a positive personality, mental well-being, and physical health. Firstly, this research identified the best position setup of a dual sensor. Then, a Hidden Markov Model was used to classify all movements in salat activity, and the data were trained before the testing phase. This study leads to a new way of learning salat activity that can be further explored and developed, and it contributes a new form of interaction in human-computer interaction. The outcome of this research will be very useful for validating the salat movements of every Muslim.
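
One way to realize per-movement HMM classification, sketched under assumptions (hmmlearn stands in for the authors' toolkit, and the sensor features are synthetic): train one HMM per movement class and pick the class whose model scores an unseen segment highest.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)

# One HMM per movement class, fitted on (frames x features) sensor segments.
bow = GaussianHMM(n_components=3).fit(rng.normal(0, 1, (200, 6)))    # "bowing"
pros = GaussianHMM(n_components=3).fit(rng.normal(2, 1, (200, 6)))   # "prostration"

segment = rng.normal(2, 1, (30, 6))          # unseen sensor segment
label = "prostration" if pros.score(segment) > bow.score(segment) else "bowing"
print(label)                                 # highest log-likelihood wins
```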

Keywords: Motion recognition, Salat activity, Multisensor, Hidden Markov model, Human-Computer Interaction.

Received May 20, 2016; accepted May 6, 2018

Exploitation of ICMP Time Exceeded Packets for A Large-Scale Router Delay Analysis

Ali Gezer1 and Gary Warner2

1Electronic and Telecommunication Technology, Kayseri University, Turkey

2Computer Science, University of Alabama at Birmingham, US

Abstract: Internet Control Message Protocol Time Exceeded (ICMP-TE) packets are control messages that signal the unreachability of nodes due to hop count limitations. With the Internet of Things (IoT) taking up more space in our daily lives, the accessibility, and in some cases inaccessibility, of hosts should be analysed more carefully. ICMP time exceeded packets may serve an attacker, may sometimes be an indicator of compromise for a possible IoT botnet attack, or may be used as a tool for delay measurement. In this study, by exploiting ICMP time exceeded packets, we analyse the Round Trip Time (RTT) delays of randomly distributed IP routers around the globe. We conduct a comprehensive delay analysis based on the delay results of more than one million time exceeded packets received in response to our ICMP requests. To show that ICMP time exceeded packets can also be a signature of a possible IoT botnet attack, we carry out a contained experiment on Mirai IoT botnet scanning and present the indicators that differentiate these two possible usages.
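
The measurement idea can be sketched with Scapy (requires raw-socket privileges; the destination below is a documentation address, not one of the study's targets): an echo request with a small TTL expires at an intermediate router, which answers with an ICMP Time Exceeded message (type 11), and the turnaround time approximates the RTT to that router.

```python
import time
from scapy.all import IP, ICMP, sr1   # sending raw packets needs root privileges

def router_rtt(dst, ttl, timeout=2):
    probe = IP(dst=dst, ttl=ttl) / ICMP()
    start = time.time()
    reply = sr1(probe, timeout=timeout, verbose=False)
    if reply is not None and reply.haslayer(ICMP) and reply[ICMP].type == 11:
        return reply.src, time.time() - start   # router address, RTT in seconds
    return None   # no Time Exceeded answer within the timeout

print(router_rtt("192.0.2.1", ttl=3))   # 192.0.2.1 is a documentation address
```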

Keywords: ICMP time exceeded packet, IoT botnet, Mirai botnet, RTT delay, performance analysis, quality of service.

Received April 19, 2018; accepted June 17, 2019

PatHT: An Efficient Method of Classification over Evolving Data Streams

Meng Han, Jian Ding, and Juan Li

School of Computer Science and Engineering, North Minzu University, China

Abstract: Some existing classifiers need frequent updates to adapt to concept changes in data streams. To solve this problem, an adaptive method, the Pattern-based Hoeffding Tree (PatHT), is proposed for processing evolving data streams. A key issue in training a classification decision tree is improving the efficiency of choosing the optimal splitting attribute; therefore, frequent patterns are used. PatHT discovers constraint-based closed frequent patterns with incremental updates and builds an adaptive, incrementally updated tree based on the frequent pattern set. It uses a sliding window to avoid concept drift when mining patterns and a concept drift detector to handle concept changes while processing training examples. We tested the performance of PatHT against several well-known algorithms using real and synthetic data streams with different widths of concept change. The experimental results show that our approach outperforms traditional classification models.
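
For reference, the Hoeffding bound that Hoeffding-tree learners use to decide when enough examples have been seen to split (a standard result, not PatHT-specific):

```python
import math

# With n examples, the observed mean of a metric with range R is within
# epsilon of the true mean with probability 1 - delta.
def hoeffding_epsilon(R, delta, n):
    return math.sqrt((R ** 2) * math.log(1.0 / delta) / (2.0 * n))

# Split when the gain gap between the two best attributes exceeds epsilon.
gain_best, gain_second = 0.45, 0.30
eps = hoeffding_epsilon(R=1.0, delta=1e-6, n=400)
print(eps, gain_best - gain_second > eps)   # True: safe to split now
```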

Keywords: Data mining, decision tree, data stream classification, closed pattern mining, concept drift.

Received November 13, 2015; accepted April 12, 2018

FBMT: Fuzzy Based Merkle Technique for Detecting and Mitigating Malicious Nodes in Sensor Networks

Ranjeeth Kumar Sundararajan1 and Umamakeswari Arumugam2

1Department of Computer Science and Engineering, Srinivasa Ramanujan Centre, SASTRA Deemed University, India

2Department of Computer Science and Engineering, School of Computing, SASTRA Deemed University, India

Abstract: Wireless sensor networks are prone to many vulnerabilities because they are deployed in unattended environments. Intrusion is one of the most serious issues in wireless networks; since wireless networks are resource-constrained, devising a security mechanism to counter intrusion is a challenging task. This paper focuses on building a lightweight Intrusion Detection System (IDS) to counter routing attacks by identifying malicious nodes at the earliest point. The proposed scheme, the Fuzzy Based Merkle Technique, applies fuzzy logic to identify malicious nodes, builds a lightweight IDS, and adopts the Merkle tree approach for building the network. The proposed scheme identifies malicious nodes with lower energy consumption and less communication overhead than the existing Merkle technique. Network Simulator 2 is used to simulate the IDS and verify the results.
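
A minimal Merkle-root construction for context (generic sketch; what each FBMT leaf actually hashes is not reproduced here):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node if odd
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Each sensor report becomes a leaf; the root lets a verifier detect tampering.
reports = [b"node1:reading", b"node2:reading", b"node3:reading"]
print(merkle_root(reports).hex())
```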

Keywords: Wireless Sensor Networks, Intrusion Detection Systems, Fuzzy logic, Merkle tree.

Received August 12, 2016; accepted October 2, 2018

An Ontology Alignment Hybrid Method Based on Decision Rules

Mohamed Biniz1 and Mohamed Fakir2

1Department of Mathematics and Computer Science, Sultan Moulay Slimane University, Morocco

2Department of Informatics, Sultan Moulay Slimane University, Morocco

Abstract: In this paper, we propose a hybrid approach based on the extraction of decision rules to refine the alignment results produced by three alignment strategies. The approach contains two phases: a training phase, which uses structural similarity, element similarity, instance-based similarity, and the C4.5 algorithm to extract decision rules; and an evaluation phase, which refines the alignments discovered by the three alignment strategies using the extracted decision rules. The approach is compared with the best systems in the OAEI 2016 benchmark: the Framework for Ontology Alignment and Mapping (FOAM), the Dynamic Multistrategy Ontology Alignment Framework (RiMOM), AgreementMakerLight, and Yet Another Matcher-Biomedical Ontologies (YAM-BIO). The proposed method gives good recall, precision, and F-measure, and the experimental results show that it is effective.
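
A toy version of learning decision rules over similarity scores (scikit-learn's CART stands in for C4.5, and the training pairs below are synthetic for illustration): each candidate entity pair is described by its structural, element, and instance similarities and labeled as a correct match or not.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy candidate pairs: [structural, element, instance] similarity scores.
X = [[0.9, 0.8, 0.7], [0.2, 0.3, 0.1], [0.8, 0.9, 0.6], [0.1, 0.2, 0.4]]
y = [1, 0, 1, 0]   # 1 = correct alignment, 0 = incorrect

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
# The learned tree reads directly as refinement rules over the similarities.
print(export_text(tree, feature_names=["structural", "element", "instance"]))
```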

Keywords: Decision rules, alignment, ontology, similarity, similarity flooding.

Received August 24, 2016; accepted May 6, 2018

Efficient Multiple Pattern Matching Algorithm Based on BMH: MP-BMH

Akhtar Rasool1, Gulfishan Firdose Ahmed2, Raju Barskar3, and Nilay Khare1

1Department of Computer Science and Engineering, Maulana Azad National Institute of Technology, India

2Department of Computer Science, College of Agriculture, India

3Department of Computer Science and Engineering, University Institute of Technology RGPV, India

Abstract: String matching plays a key role in a wide range of information-processing applications, and because of its importance a large number of algorithms have been developed over the last 50 years. There are standard single- and multiple-pattern string matching algorithms such as Boyer-Moore (BM), Boyer-Moore-Horspool (BMH), and Aho-Corasick. Multiple-pattern string matching is very useful in real-world applications, but the standard benchmark algorithm Aho-Corasick and many other multiple-pattern algorithms are not memory- or time-efficient for long patterns and large pattern sets over large text datasets, because they rely on concepts such as DFAs and require a full scan of the text; many currently popular string matching tools are based on these algorithms. Using the bad-character principle of BMH, a method for multiple-pattern string matching is developed, and from it a time- and memory-efficient exact multiple-pattern string matching algorithm, Multiple Pattern BMH (MP-BMH), is proposed. Unlike Aho-Corasick, the proposed MP-BMH algorithm skips unnecessary character comparisons in the text while searching, as the BMH algorithm does. It also skips characters in the patterns by exploiting the similarity between patterns, and, along with the shifts made while searching, it performs efficient switches among the patterns. In this paper, the MP-BMH algorithm is described together with its time, memory, and experimental analysis.
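
For context, the BMH bad-character machinery that MP-BMH builds on (standard single-pattern BMH, not the multi-pattern extension itself): on a mismatch, the text character aligned with the pattern's last position tells how far the pattern can safely jump.

```python
def bmh_shift_table(pattern):
    m = len(pattern)
    shift = {}                    # characters absent from the table shift by m
    for i in range(m - 1):        # the last character keeps the default shift
        shift[pattern[i]] = m - 1 - i
    return shift

def bmh_search(text, pattern):
    shift, m = bmh_shift_table(pattern), len(pattern)
    pos, hits = 0, []
    while pos <= len(text) - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        pos += shift.get(text[pos + m - 1], m)   # jump by the bad-character rule
    return hits

print(bmh_search("abacadabrabracabracadabrabrabracad", "abra"))
```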

Keywords: String matching, multiple pattern string matching, Boyer-Moore (BM), BMH, MP-BMH.

Received January 8, 2017; accepted January 16, 2018
