ANN Based Execution Time Prediction Model and
Assessment of Input Parameters through ISM
Anju Shukla, Shishir Kumar, and Harikesh Singh
Department of Computer Science and Engineering, Jaypee University of Engineering and Technology, India
Abstract: Cloud computing is an on-demand network access model that provides dynamic resource provisioning, selection, and scheduling. The performance of these techniques depends extensively on the prediction of various factors, e.g., task execution time and resource trust value. Since the accuracy of a prediction model depends heavily on the input data fed into the network, the selection of suitable inputs also plays a vital role in predicting the appropriate value. Based on the predicted value, the scheduler can choose a suitable resource and perform scheduling for efficient resource utilization and reduced makespan estimates. However, precise prediction of execution time is difficult in a cloud environment due to the heterogeneous nature of resources and varying input data. As each task has different characteristics and execution criteria, the environment must be intelligent enough to select the suitable resource. To solve these issues, an Artificial Neural Network (ANN) based prediction model is proposed to predict the execution time of tasks. First, input parameters are identified and selected through the Interpretive Structural Modeling (ISM) approach. Second, a prediction model is proposed for predicting the task execution time for a varying number of inputs. Third, the proposed model is validated and provides a 21.72% reduction in mean relative error compared to other state-of-the-art methods.
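As a rough illustration of the prediction stage only, the sketch below trains a small feed-forward network on synthetic task features and reports the mean relative error. The feature names and data are placeholder assumptions, not the ISM-selected input parameters of the paper.

```python
# Minimal sketch: ANN regression of task execution time on synthetic features.
# Feature names (task length, input size, resource MIPS, bandwidth) are hypothetical.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 4))                      # [length, size, mips, bandwidth]
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + 1.5 * X[:, 2] + 0.2 * X[:, 3] \
    + rng.normal(0.0, 0.05, 500)                              # synthetic execution times

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
model.fit(X_train, y_train)

pred = model.predict(X_test)
mre = np.mean(np.abs(pred - y_test) / y_test)                 # mean relative error
print(f"Mean relative error: {mre:.4f}")
```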
Keywords: Cloud computing, neural network, Prediction model,
Resource selection.
Received September 20, 2018; accepted
January 28, 2020
https://doi.org/10.34028/iajit/17/5/1
Otsu’s Thresholding Method Based on Plane
Intercept Histogram and Geometric Analysis
Leyi Xiao1, Honglin Ouyang1, and Chaodong Fan2
1College of Electrical and Information Engineering, Hunan University, China
2Foshan Green Intelligent Manufacturing Research Institute, Xiangtan University, China
Abstract: The Three-Dimensional (3-D) Otsu’s method is an effective improvement on the traditional Otsu’s method. However, it not only has high computational complexity but also limited anti-noise ability. This paper presents a new Otsu’s method based on the 3-D histogram. The method transforms the 3-D histogram into a 1-D histogram using a plane perpendicular to the main diagonal of the 3-D histogram, and designs a new maximum-variance criterion for threshold selection. To enhance its anti-noise ability, a geometric-analysis-based method that can correct noise is used for image segmentation. Simulation experiments show that this method has stronger anti-noise ability and lower time consumption compared with the conventional 3-D Otsu’s method, the recursive 3-D Otsu’s method, the 3-D Otsu’s method with SFLA, the equivalent 3-D Otsu’s method, and the improved 3-D Otsu’s method.
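For reference, the sketch below implements only the classical 1-D Otsu criterion (maximising the between-class variance over a grey-level histogram); the plane-intercept 3-D histogram transformation and the geometric noise-correction step of the paper are not reproduced.

```python
# Classical 1-D Otsu threshold: maximise between-class variance over the histogram.
import numpy as np

def otsu_threshold(image: np.ndarray) -> int:
    hist, _ = np.histogram(image.ravel(), bins=256, range=(0, 256))
    p = hist.astype(float) / hist.sum()            # grey-level probabilities
    omega = np.cumsum(p)                           # class-0 probability up to t
    mu = np.cumsum(p * np.arange(256))             # cumulative mean
    mu_t = mu[-1]                                  # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b)))  # threshold with maximum variance

img = np.random.default_rng(1).normal(120, 30, (64, 64)).clip(0, 255).astype(np.uint8)
print("Otsu threshold:", otsu_threshold(img))
```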
Keywords: 3-D Otsu’s method, threshold selection, Otsu’s method, 3-D histogram, image segmentation.
Received October 27, 2018; accepted January 28,
2020
A Deep Learning Based Prediction of Arabic
Manuscripts Handwriting Style
Manal Khayyat1 and Lamiaa Elrefaei2
1Computer Science Department, King Abdulaziz
University, Saudi Arabia
2Electrical Engineering Department, Benha University, Egypt
Abstract: With the increasing amount of unorganized images on the internet today and the need to use them efficiently in various types of applications, there is a critical need for robust models that can classify and predict images successfully and instantaneously. Therefore, this study aims to collect Arabic manuscript images in a dataset and predict their handwriting styles using the most powerful and trending technologies. There are many types of Arabic handwriting styles, including Al-Reqaa, Al-Nask, Al-Thulth, Al-Kufi, Al-Hur, Al-Diwani, Al-Farsi, Al-Ejaza, Al-Maghrabi, and Al-Taqraa. The study classified the collected dataset images according to handwriting style and focused on the six styles present in the collected Arabic manuscripts. To reach our goal, we applied the MobileNet pre-trained deep learning model to our classified dataset images to automatically capture and extract their features. Afterward, we evaluated the performance of the developed model by computing its evaluation metrics. We found that the MobileNet convolutional neural network is a promising technology, as it reached 0.9583 as the highest recorded accuracy and 0.9633 as the average F-score.
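A minimal transfer-learning sketch in the spirit of the described setup is shown below: a frozen MobileNet backbone with a small classification head for six style classes. The input size, head layers and training configuration are assumptions; the paper's exact settings and dataset are not reproduced.

```python
# Hedged sketch: MobileNet as a frozen feature extractor for 6 handwriting styles.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_STYLES = 6  # six handwriting styles, per the abstract

base = tf.keras.applications.MobileNet(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))
base.trainable = False  # keep the ImageNet features fixed

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(NUM_STYLES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(train_ds, validation_data=val_ds, epochs=10)   # datasets are assumed
```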
Keywords: Deep Learning Model, Convolutional
Neural Network, Handwriting Style Prediction, Arabic Manuscript Images.
Received October 6, 2019; accepted April 6, 2020
https://doi.org/10.34028/iajit/17/5/3
Saliency Cuts:
Salient Region Extraction based on Local Adaptive Thresholding for Image Information
Recognition of the Visually Impaired
Mukhriddin Mukhiddinov1, Rag-Gyo Jeong2, and Jinsoo Cho3
1Department of Hardware and Software of Control Systems in
Telecommunications, Tashkent University of Information Technologies named after
Muhammad al-Khwarizmi, Uzbekistan
2Korea Railroad Research Institute, Uiwang, Gyeonggi-do 16105, Republic of
Korea
3Department of Computer Engineering, Gachon University, Republic of Korea
Abstract: In recent years, there has been an increased scope for assistive software and technologies that help the visually impaired perceive and recognize natural scene images. In this article, we propose a novel saliency cuts approach using local adaptive thresholding to obtain four regions from a given saliency map. The saliency cuts approach is an effective tool for salient object detection. First, we produce four regions for image segmentation using a saliency map as an input image and applying an automatic threshold operation. Second, the four regions are used to initialize an iterative version of the GrabCut algorithm and to produce a robust, high-quality binary mask at full resolution. Lastly, based on the binary mask and the extracted salient object, outer boundaries and internal edges are detected by the Canny edge detection method. Extensive experiments demonstrate that, compared with existing salient object segmentation algorithms, the proposed method correctly detects and extracts the main contents of the image sequences for delivering visually salient information to visually impaired people.
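The sketch below shows how a four-region mask derived from a saliency map could initialise OpenCV's GrabCut, in the spirit of the described pipeline. The threshold values are illustrative assumptions rather than the local adaptive thresholds of the paper.

```python
# Hedged sketch: initialise GrabCut from a saliency map split into four regions.
import cv2
import numpy as np

def saliency_cuts(image: np.ndarray, saliency: np.ndarray, iters: int = 5) -> np.ndarray:
    mask = np.full(saliency.shape, cv2.GC_PR_BGD, dtype=np.uint8)  # probable background
    mask[saliency < 64] = cv2.GC_BGD                               # definite background
    mask[saliency >= 128] = cv2.GC_PR_FGD                          # probable foreground
    mask[saliency >= 192] = cv2.GC_FGD                             # definite foreground
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image, mask, None, bgd_model, fgd_model, iters, cv2.GC_INIT_WITH_MASK)
    return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0).astype(np.uint8)

# Edges of the extracted object could then be obtained with cv2.Canny(binary_mask, 100, 200).
```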
Keywords: Saliency region
extraction, saliency map, saliency cuts,
local adaptive thresholding, the visually impaired.
Received February 7, 2018; accepted
January 6, 2020
https://doi.org/10.34028/iajit/17/5/4
A Novel Feature Selection Method Based on
Maximum Likelihood Logistic Regression
for Imbalanced Learning in Software Defect
Prediction
Kamal Bashir1, Tianrui Li1, and Mahama Yahaya2
1School of Information Science and Technology, Southwest Jiaotong University, China
2School of Transport and Logistics Engineering, Southwest Jiaotong University, China
Abstract: The most frequently used machine learning feature ranking approaches fail to present an optimal feature subset for accurate prediction of defective software modules in out-of-sample data. Machine learning Feature Selection (FS) algorithms such as Chi-Square (CS), Information Gain (IG), Gain Ratio (GR), RelieF (RF) and Symmetric Uncertainty (SU) perform relatively poorly at prediction, even after balancing the class distribution in the training data. In this study, we propose a novel FS method based on Maximum Likelihood Logistic Regression (MLLR). We apply this method to six software defect datasets in their sampled and unsampled forms to select useful features for classification in the context of Software Defect Prediction (SDP). The Support Vector Machine (SVM) and Random Forest (RaF) classifiers are applied to the FS subsets based on the sampled and unsampled datasets. The performance of the models, measured using the Area Under the Receiver Operating Characteristic Curve (AUC), is compared for all FS methods considered. The Analysis of Variance (ANOVA) F-test results validate the superiority of the proposed method over all the FS techniques, in both sampled and unsampled data. The results confirm that MLLR can be useful in selecting an optimal feature subset for more accurate prediction of defective modules in the software development process.
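A hedged sketch of the overall idea, on synthetic imbalanced data, is given below: features are ranked by the coefficients of an unpenalised (maximum-likelihood) logistic regression, and SVM and Random Forest are evaluated by AUC on the selected subset. The cut-off of eight features and the data are assumptions, not the paper's experimental setup.

```python
# Hedged sketch: ML logistic-regression feature ranking, then SVM / RaF evaluated by AUC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           weights=[0.85, 0.15], random_state=0)   # imbalanced, SDP-like
X = StandardScaler().fit_transform(X)

# penalty=None keeps the fit a pure maximum-likelihood estimate (scikit-learn >= 1.2).
ranker = LogisticRegression(penalty=None, max_iter=5000).fit(X, y)
top = np.argsort(-np.abs(ranker.coef_[0]))[:8]                      # keep 8 strongest features

X_tr, X_te, y_tr, y_te = train_test_split(X[:, top], y, stratify=y, random_state=0)
for name, clf in [("SVM", SVC(probability=True, random_state=0)),
                  ("RaF", RandomForestClassifier(random_state=0))]:
    clf.fit(X_tr, y_tr)
    print(name, "AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
```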
Keywords: Software defect prediction, machine learning, class imbalance, maximum-likelihood logistic regression.
Received April 30, 2018; accepted January
28, 2020
https://doi.org/10.34028/iajit/17/5/5
Text Similarity Computation Model for Identifying Rumor
Based on Bayesian Network in Microblog
Chengcheng Li, Fengming Liu, and Pu Li
Business School, Shandong Normal University, China
Abstract: This paper studies text similarity, especially for rumor texts: a calculation model is constructed from known rumors and used to compute the similarity of new texts, so that people can recognize rumors in advance and improve their vigilance to effectively block and control rumor dissemination. Based on a Bayesian network, a similarity calculation model for microblog rumor texts was built. Since not only rumor texts but also rumor producers have similar characteristics, a similarity calculation model for rumor text producers was also constructed. Then, the text similarity and the user similarity were integrated to establish the microblog similarity calculation model. Finally, the performance of the proposed model was studied experimentally on a microblog rumor text and user dataset. The experimental results indicate that the proposed similarity algorithm can identify rumor texts and predict user characteristics more accurately and effectively.
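As a simple point of comparison only, the sketch below matches a new microblog post against known rumor texts with TF-IDF cosine similarity; it is an illustrative baseline, not the Bayesian-network text and user similarity model of the paper.

```python
# Illustrative baseline: TF-IDF cosine similarity between a new post and known rumors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

known_rumors = ["drinking hot water cures the virus",
                "the city water supply will be cut off tomorrow"]
new_post = ["officials say hot water can cure the virus"]

vec = TfidfVectorizer()
matrix = vec.fit_transform(known_rumors + new_post)
sims = cosine_similarity(matrix[len(known_rumors):], matrix[:len(known_rumors)])[0]
print("Similarity to each known rumor:", sims.round(3))
```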
Keywords: Microblog Rumor, Similarity, Bayesian Network.
Received May 8, 2018; accepted March 31, 2019
Enhanced Latent Semantic Indexing Using Cosine
Similarity Measures for Medical Application
Fawaz Al-Anzi1 and Dia AbuZeina2
1Department of Computer Engineering, Kuwait University, Kuwait
2Computer Science Department, Palestine Polytechnic
University, Palestine
Abstract: The Vector Space Model (VSM) is widely used in data mining and Information Retrieval (IR) systems as a common document representation model. However, this technique faces challenges such as the high dimensionality of the space and the semantic looseness of the representation. Consequently, Latent Semantic Indexing (LSI) was suggested to reduce the feature dimensions and to generate semantically rich features that can represent conceptual term-document associations. In fact, LSI has been effectively employed in search engines and many other Natural Language Processing (NLP) applications, and researchers continue to seek better performance. In this paper, we propose an innovative method that can be used in search engines to find better-matched content among the retrieved documents. The proposed method introduces a new extension of the LSI technique based on cosine similarity measures. The performance evaluation was carried out using an Arabic language data collection that contains 800 medical-related documents with more than 47,222 unique words. The proposed method was assessed using a small testing set that contains five medical keywords. The results show that the performance of the proposed method is superior to that of the standard LSI.
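The sketch below shows plain LSI (truncated SVD over TF-IDF vectors) with standard cosine similarity for query-document matching; the corpus, query, and number of latent dimensions are placeholders, and the paper's extended cosine measure is not reproduced.

```python
# Standard LSI + cosine similarity sketch (not the paper's extended measure).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = ["chronic headache treatment options",
        "diabetes diet and insulin therapy",
        "migraine and headache medication"]
query = ["headache medication"]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)
lsi = TruncatedSVD(n_components=2, random_state=0)     # latent semantic space
docs_lsi = lsi.fit_transform(X)
query_lsi = lsi.transform(tfidf.transform(query))

scores = cosine_similarity(query_lsi, docs_lsi)[0]
print("Ranking (best first):", scores.argsort()[::-1], scores.round(3))
```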
Keywords: Arabic Text, Latent Semantic Indexing,
Search Engine, Dimensionality Reduction, Text Classification.
Received December 25, 2018;
accepted January 28, 2020
https://doi.org/10.34028/iajit/17/5/7
Hassan Najadat1, Ibrahim Al-Daher2, and Khaled Alkhatib1
1Computer Information Systems Department, Jordan University of Science and Technology, Jordan
2Computer Science Department, Jordan University of Science and Technology, Jordan
Abstract: This study introduces an approach that combines Data Envelopment Analysis (DEA) and ensemble methods in order to classify and predict the efficiency of Decision Making Units (DMUs). The approach applies DEA in the first stage to compute the efficiency score for each DMU; a variable ranker is then utilized to extract the most important variables that affect the DMU's performance, and J48 is adopted to build a classifier whose outcomes are enhanced by the Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples (DECORATE) ensemble method. To examine the approach, this study utilizes a dataset from the financial statements of firms listed on the Amman Stock Exchange. After preprocessing, the dataset includes 53 industrial firms for the years 2012 to 2015, with 11 input variables and 11 output ratios. The examination of financial variables and ratios plays a vital role in financial analysis practice. This paper shows that financial variable and ratio averages are points of reference for evaluating and measuring firms' future financial performance as well as that of other similar firms in the same sector. In addition, the results of this work support comparative analyses of the financial performance of the industrial sector.
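Only the classification stage is sketched below: DEA efficiency scores are assumed to be precomputed, DMUs are labelled efficient or inefficient, variables are ranked, and a decision-tree-based ensemble is cross-validated. The ensemble here is plain bagging as a stand-in; DEA itself, J48, and DECORATE are not reproduced.

```python
# Hedged sketch of the classification stage; DEA scores are placeholder values.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.ensemble import BaggingClassifier        # default base learner is a decision tree
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(size=(53, 22))                         # 53 firms, 11 inputs + 11 output ratios
dea_scores = rng.uniform(0.5, 1.0, size=53)            # placeholder DEA efficiency scores
y = (dea_scores >= 0.9).astype(int)                    # efficient vs. inefficient DMUs

ranking = np.argsort(-mutual_info_classif(X, y, random_state=0))[:8]  # most important variables
ensemble = BaggingClassifier(n_estimators=15, random_state=0)
print("CV accuracy:", cross_val_score(ensemble, X[:, ranking], y, cv=5).mean().round(3))
```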
Keywords: Data Envelopment Analysis, Decision
Trees, Ensemble Methods, Financial Variables, Financial Ratios.
Received October 12, 2019; accepted April 8, 2020
Ontology-Based
Transformation and Verification of UML Class Model
Abdul Hafeez1, Syed Abbas2, and Aqeel-ur-Rehman3
1Department of Computer Science, SMI University, Karachi, Pakistan
2Faculty of Engineering Science and Technology, Indus University, Karachi, Pakistan
3Faculty of Engineering Science and Technology, Hamdard University, Karachi, Pakistan
Abstract: Software models describe the structures, relationships, and features of a software system. In Model Driven Engineering (MDE) in particular, models are considered first-class elements instead of programming code, and all software development activities revolve around them. In MDE, programming code is automatically generated from the models, so model defects can implicitly transfer to the code. Such defects can be harder to discover and rectify. Model verification is a promising solution to this problem. The Unified Modeling Language (UML) class model is an important part of UML and is used in both analysis and design. However, UML only provides graphical elements without any formal foundation. Therefore, verification of formal properties such as consistency, satisfiability, and consequences is not possible in UML. This paper mainly focuses on ontology-based transformation and verification of UML class model elements that have not been addressed in any existing verification methods, e.g., xor association constraints and dependency relationships. We validate the scalability and effectiveness of the proposed solution using various UML class models. The empirical study shows that the proposed approach scales in the presence of large and complex models.
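To make one of the targeted properties concrete, the sketch below checks an {xor} constraint between two associations at the instance level in plain Python; the ontology-based transformation and reasoning of the paper are not reproduced, and all names are illustrative.

```python
# Illustrative xor-association check: an object may be linked via at most one association.
links_account_person = {("acc1", "alice"), ("acc3", "carol")}    # Account-Person links
links_account_company = {("acc2", "acme"), ("acc3", "globex")}   # Account-Company links

def xor_violations(a_links: set, b_links: set) -> set:
    """Return source objects linked through both associations, violating {xor}."""
    return {src for src, _ in a_links} & {src for src, _ in b_links}

print(xor_violations(links_account_person, links_account_company))  # {'acc3'}
```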
Keywords: UML Class Model Verification, Dependency Relationship, XOR
Association Constraints.
Received September 11, 2017; accepted January 28, 2019
Improved Streaming Quotient Filter: A Duplicate Detection Approach for Data
Streams
Shiwei Che, Wu Yang, and Wei Wang
Information Security Research Center, Harbin Engineering University, China
Abstract: The unprecedented development and popularization of the Internet, combined with the emergence of a variety of modern applications such as search engines, online transactions, and climate warning systems, has caused worldwide data storage to grow at an unprecedented rate. Efficient storage, management and processing of such huge amounts of data has become an important academic research topic. The detection and removal of duplicate and redundant data from such multi-trillion-item collections, while ensuring resource and computational efficiency, has constituted a challenging area of research. Because all the data of potentially unbounded data streams cannot be stored, and duplicated data needs to be deleted as accurately as possible, intelligent approximate duplicate detection algorithms are urgently required. Many well-known methods based on the bitmap structure, the Bloom Filter and its variants are listed in the literature. In this paper, we propose a new data structure, the Improved Streaming Quotient Filter (ISQF), to efficiently detect and remove duplicate data in a data stream. ISQF intelligently stores the signatures of elements in a data stream, while using an eviction strategy to provide near-zero error rates. We show that ISQF achieves near optimal performance with fairly low memory requirements, making it an ideal and efficient method for duplicate detection, with a very low error rate. Empirically, we compare ISQF with some existing methods, especially the Streaming Quotient Filter (SQF). The results show that our proposed method outperforms the existing methods in terms of memory usage and accuracy. We also discuss the parallel implementation of ISQF.
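As a much-simplified illustration of the signature-plus-eviction idea, the sketch below keeps short hashes of stream elements in a bounded, insertion-ordered set with FIFO eviction. The quotient/remainder bucketing of ISQF and its error-rate guarantees are not reproduced.

```python
# Simplified duplicate detector: bounded signature set with FIFO eviction (not ISQF itself).
import hashlib
from collections import OrderedDict

class StreamDeduplicator:
    def __init__(self, capacity: int = 100_000, sig_bits: int = 32):
        self.capacity = capacity
        self.mask = (1 << sig_bits) - 1
        self.signatures = OrderedDict()              # insertion order enables FIFO eviction

    def _signature(self, item: str) -> int:
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") & self.mask

    def seen(self, item: str) -> bool:
        """Return True if the item is (probably) a duplicate, otherwise record it."""
        sig = self._signature(item)
        if sig in self.signatures:
            return True
        if len(self.signatures) >= self.capacity:
            self.signatures.popitem(last=False)      # evict the oldest signature
        self.signatures[sig] = True
        return False

dedup = StreamDeduplicator(capacity=1000)
print([dedup.seen(x) for x in ["a", "b", "a", "c", "b"]])  # [False, False, True, False, True]
```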
Keywords: Bloom
filters, Computer Network, Data stream, Duplicate detection, False
positive rates.
Received November 30, 2017; accepted July 21, 2019
A Concept-based Sentiment Analysis Approach for Arabic
Ahmed Nasser1 and Hayri Sever2
1Control and Systems Engineering Department, University of Technology, Iraq
2Department of Computer Engineering, Çankaya University, Etimesgut, Turkey
Abstract: Concept-Based Sentiment Analysis (CBSA) methods are considered more advanced and more accurate than ordinary sentiment analysis methods because they can detect the emotions conveyed by multi-word concepts in language. This paper presents a CBSA system for Arabic that utilizes both machine learning approaches and a concept-based sentiment lexicon. To extract concepts from Arabic text, a rule-based concept extraction algorithm called the semantic parser is proposed. Different types of feature extraction and representation techniques are examined while building the sentiment analysis model of the presented Arabic CBSA system. Comprehensive and comparative experiments using different types of classification methods and classifier fusion models, together with different combinations of our proposed feature sets, are used to evaluate and test the presented CBSA system. The experimental results show that the best performance of the sentiment analysis model is achieved by the combined Support Vector Machine-Logistic Regression (SVM-LR) model, which obtains an F-score of 93.23% using the Concept-Based-Features+Lexicon-Based-Features+Word2vec-Features (CBF+LEX+W2V) feature combination.
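A minimal sketch of the SVM-LR fusion idea is given below, using soft voting over plain TF-IDF features on a toy Arabic sample; the concept-based, lexicon, and word2vec feature sets of the paper are not reproduced.

```python
# Hedged sketch: SVM + Logistic Regression fusion by soft voting on toy Arabic data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline

texts = ["الخدمة ممتازة جدا", "تجربة سيئة للغاية", "المنتج رائع",
         "لن أكرر الشراء", "طرح ممتاز", "خدمة بطيئة ومزعجة"]
labels = [1, 0, 1, 0, 1, 0]          # 1 = positive, 0 = negative (toy data)

fusion = VotingClassifier(
    estimators=[("svm", SVC(probability=True)), ("lr", LogisticRegression(max_iter=1000))],
    voting="soft",                    # average the two classifiers' class probabilities
)
model = make_pipeline(TfidfVectorizer(), fusion)
model.fit(texts, labels)
print(model.predict(["الخدمة رائعة"]))
```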
Keywords: Arabic Sentiment Analysis, Concept-based Sentiment Analysis,
Machine Learning and Ensemble Learning.
Received December 13, 2017; accepted July 29, 2019
An Enhanced Corpus for Arabic Newspapers
Comments
2LIRE Laboratory, University of Constantine 2, Algeria
Abstract: In this paper, we propose an enhanced approach to create a dedicated corpus of Algerian Arabic newspaper comments. The developed approach enhances an existing approach by enriching the available corpus and including the annotation step, following the Model Annotate Train Test Evaluate Revise (MATTER) methodology. A corpus is created by collecting comments from the websites of three well-known Algerian newspapers. Three classifiers, Support Vector Machines, Naive Bayes, and k-Nearest Neighbors, were used to classify comments into positive and negative classes. To identify the influence of stemming on the obtained results, the classification was tested with and without stemming. The obtained results show that stemming does not considerably enhance the classification because of the nature of Algerian comments, which are tied to the Algerian Arabic dialect. The promising results motivate us to improve our approach, particularly in dealing with non-Arabic sentences, especially dialectal and French ones.
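A small sketch of the with/without-stemming comparison is shown below, using NLTK's ISRI Arabic stemmer and a Naive Bayes pipeline on toy comments; the actual newspaper-comment corpus and the SVM and k-NN runs are not reproduced.

```python
# Hedged sketch: compare comment classification with and without ISRI stemming.
from nltk.stem.isri import ISRIStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

comments = ["مقال رائع ومفيد", "تحليل سطحي ومخيب", "شكرا على الموضوع المميز",
            "كلام غير دقيق ابدا", "طرح ممتاز", "اسوأ مقال قرأته"]
labels = [1, 0, 1, 0, 1, 0]                     # 1 = positive, 0 = negative (toy data)

stemmer = ISRIStemmer()
def stem_text(text: str) -> str:
    return " ".join(stemmer.stem(tok) for tok in text.split())

for name, docs in [("without stemming", comments),
                   ("with stemming", [stem_text(c) for c in comments])]:
    pipe = make_pipeline(TfidfVectorizer(), MultinomialNB())
    print(name, "accuracy:", cross_val_score(pipe, docs, labels, cv=3).mean().round(2))
```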
Keywords: Opinion mining, sentiment analysis, K-Nearest Neighbours,
Naïve Bayes, Support Vector Machines, Arabic, comment.
Received December 22, 2017; accepted June
18, 2019
https://doi.org/10.34028/iajit/17/5/12
The Performance of Penalty Methods on Tree-Seed Algorithm for Numerical
Constrained Optimization Problems
Ahmet Cinar1 and Mustafa Kiran2
1Department of Computer Engineering, Selçuk University, Turkey
2Department of Computer Engineering, Konya Technical University, Turkey
Abstract: Constraints are the most important part of many optimization problems. Metaheuristic algorithms were originally designed for solving continuous unconstrained optimization problems, and constraint handling methods are integrated into these algorithms to solve constrained optimization problems. Penalty approaches are not only the simplest way but also as effective as other constraint handling techniques. In the literature, there are many penalty approaches, grouped as static, dynamic, and adaptive. In this study, we collect them and discuss the key benefits and drawbacks of these techniques. The Tree-Seed Algorithm (TSA) is a recently developed metaheuristic algorithm, and in this study nine different penalty approaches are integrated with TSA. The performance of these approaches is analyzed on thirteen well-known constrained benchmark functions. The obtained results are compared with state-of-the-art algorithms such as Differential Evolution (DE), Particle Swarm Optimization (PSO), Artificial Bee Colony (ABC), and the Genetic Algorithm (GA). The experimental results and comparisons show that TSA outperforms all of them on these benchmark functions.
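For illustration, the static penalty scheme, one of the simplest of the surveyed approaches, can be sketched as below: infeasible solutions are penalised in proportion to their squared constraint violation, so any unconstrained metaheuristic (including TSA) can minimise the penalised objective directly. The toy problem and the penalty coefficient are assumptions.

```python
# Static penalty sketch: fixed coefficient times squared constraint violation.
import numpy as np

def objective(x: np.ndarray) -> float:
    return float(np.sum(x ** 2))

def constraints(x: np.ndarray) -> list:
    # g_i(x) <= 0 form; positive values indicate violation (toy constraints).
    return [1.0 - x[0] - x[1],        # x0 + x1 >= 1
            x[0] - 2.0]               # x0 <= 2

def penalized(x: np.ndarray, r: float = 1e3) -> float:
    violation = sum(max(0.0, g) ** 2 for g in constraints(x))
    return objective(x) + r * violation          # static penalty: r is constant

print(penalized(np.array([0.5, 0.5])))   # feasible: objective only
print(penalized(np.array([0.1, 0.1])))   # infeasible: heavily penalised
```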
Keywords: Constrained optimization, penalty
functions, penalty approaches, tree-seed algorithm.
Received January 3, 2019; accepted February 26, 2020
https://doi.org/10.34028/iajit/17/5/13
Advanced Analysis of the Integrity of Access Control
Policies: the Specific Case of Databases
Faouzi Jaidi, Faten Ayachi, and Adel Bouhoula
Digital Security Research Lab, Higher School of Communication of Tunis, University of Carthage, Tunisia
Abstract: Databases are considered one of the most compromised assets according to the 2014-2016 Verizon Data Breach Reports. The reason is that databases are at the heart of Information Systems (IS) and store confidential business or private records. Ensuring the integrity of sensitive records is highly required and even vital in critical systems (e-health, clouds, e-government, big data, e-commerce, etc.). Access control is a key mechanism for ensuring integrity and preserving privacy in large-scale and critical infrastructures. Nonetheless, excessive, unused and abused access privileges are identified among the top ten database security threats according to the 2013-2015 Imperva Application Defense Center reports. To address this issue, we focus in this paper on the analysis of the integrity of access control policies within relational databases. We propose a rigorous and complete solution to help security architects verify the correspondence between the security planning and its concrete implementation. We define a formal framework for detecting non-compliance anomalies in concrete Role Based Access Control (RBAC) policies. We rely on an example to illustrate the relevance of our contribution.
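One non-compliance check of the kind the framework formalizes can be illustrated as below: the planned RBAC policy is compared with the privileges actually granted in the database, reporting hidden (unplanned) and missing assignments. The policy encoding and values are assumptions; the paper's formal framework is considerably richer.

```python
# Illustrative check: planned vs. concrete role-privilege assignments (placeholder data).
planned = {   # role -> set of planned (privilege, object) pairs
    "clerk":   {("SELECT", "orders")},
    "manager": {("SELECT", "orders"), ("UPDATE", "orders")},
}
concrete = {  # as extracted from the DBMS catalog
    "clerk":   {("SELECT", "orders"), ("DELETE", "orders")},
    "manager": {("SELECT", "orders"), ("UPDATE", "orders")},
}

for role in sorted(set(planned) | set(concrete)):
    hidden = concrete.get(role, set()) - planned.get(role, set())    # granted but not planned
    missing = planned.get(role, set()) - concrete.get(role, set())   # planned but not granted
    if hidden:
        print(f"{role}: hidden privileges {sorted(hidden)}")
    if missing:
        print(f"{role}: missing privileges {sorted(missing)}")
```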
Keywords: Access Control, Databases Security, Formal Validation,
Integrity Analysis, Conformity Verification.
Received November 11, 2016;
accepted July 7, 2019
A Sparse Topic Model for Bursty Topic Discovery
in Social Networks
Lei Shi, Junping Du, and Feifei Kou
Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing
Abstract: Bursty topic discovery aims to automatically identify bursty events and continuously keep track of known events. Existing methods focus on the topic model; however, the sparsity of short texts poses a challenge to traditional topic models because the words are too few to learn from the original corpus. To tackle this problem, we propose a Sparse Topic Model (STM) for bursty topic discovery. First, we model the bursty topic and the common topic separately to detect changes in words over time and discover the bursty words. Second, we introduce the “Spike and Slab” prior to decouple the sparsity and smoothness of a distribution. The bursty words are leveraged to achieve automatic discovery of the bursty topics. Finally, to evaluate the effectiveness of the proposed algorithm, we collect a Sina Weibo dataset and conduct various experiments. Both qualitative and quantitative evaluations demonstrate that the proposed STM algorithm performs favorably against several state-of-the-art methods.
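Only the burstiness intuition is sketched below: words whose frequency in the current time window rises sharply relative to a background window are flagged as bursty. The STM model itself, with its “Spike and Slab” prior and sparse topic inference, is not reproduced.

```python
# Illustrative bursty-word detection: smoothed frequency ratio between two time windows.
from collections import Counter

background = "rain rain traffic weather commute".split() * 20                  # earlier window
current = "earthquake earthquake rescue earthquake rain traffic".split() * 10  # current window

bg, cur = Counter(background), Counter(current)
bg_total, cur_total = sum(bg.values()), sum(cur.values())

burstiness = {w: (cur[w] / cur_total) / ((bg[w] + 1) / (bg_total + 1))   # add-one smoothing
              for w in cur}
print(sorted(burstiness.items(), key=lambda kv: -kv[1])[:3])             # most bursty words
```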
Keywords: Bursty topic discovery, topic
model, “Spike and Slab” prior.
Received August 15, 2017; accepted January
28, 2019
https://doi.org/10.34028/iajit/17/5/15
A Fast High Precision Skew Angle Estimation of
Digitized Documents
Merouane Chettat, Djamel Gaceb, and Soumia Belhadi
Laboratory of Computer Science, Modeling, Optimization and Electronic Systems, Faculty of Science, University M’hamed Bougara Boumerdes, Algeria
Abstract: In this paper, we address the problem of automatic skew angle estimation for scanned documents. Document skew occurs very often, due to incorrect positioning of the documents or a manipulation error during scanning, and has negative consequences on the subsequent steps of automatic text analysis and recognition. It is therefore essential to verify, before proceeding to these steps, whether the document to be processed is skewed and to correct it. The difficulty of this verification is associated with the presence of graphic zones, sometimes dominant, that have a considerable impact on the accuracy of the text skew angle estimation. We also note the importance of preprocessing for improving the accuracy and computational cost of skew estimation approaches. These two elements have been taken into consideration in our design and development of a new approach for skew angle estimation and correction. Our approach is based on local binarization, followed by horizontal smoothing with the Run Length Smoothing Algorithm (RLSA), detection of horizontal contours, and the Hierarchical Hough Transform (HHT). The algorithms involved in our approach have been chosen to guarantee skew estimation that is accurate, fast, and robust, especially against graphic dominance and for real-time applications. Experimental tests show the effectiveness of our approach on a representative database from the Document Image Skew Estimation Contest (DISEC) of the International Conference on Document Analysis and Recognition (ICDAR).
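The sketch below follows the same broad pipeline with off-the-shelf OpenCV pieces: binarisation, horizontal smearing in the spirit of RLSA, Canny edges, then the dominant standard Hough line angle. Parameter values are illustrative, and the local binarisation and Hierarchical Hough Transform of the paper are not reproduced.

```python
# Hedged sketch: skew estimation via binarisation, horizontal closing, Canny and Hough lines.
import cv2
import numpy as np

def estimate_skew(gray: np.ndarray) -> float:
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25, 1))
    smeared = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)   # RLSA-like horizontal smearing
    edges = cv2.Canny(smeared, 50, 150)
    lines = cv2.HoughLines(edges, 1, np.pi / 360, 150)
    if lines is None:
        return 0.0
    angles = [np.degrees(theta) - 90.0 for rho, theta in lines[:, 0]]
    return float(np.median(angles))                               # dominant text-line angle

# The document can then be deskewed with cv2.getRotationMatrix2D and cv2.warpAffine.
```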
Keywords: Skew angle estimation, document images,
Hough transform, Binarization, edge detection, RLSA.
Received September 13, 2017; accepted December
24, 2018