Semantic Similarity Analysis for Corpus Development and Paraphrase Detection in Arabic
1University of Monastir, Research Laboratory in Algebra, Numbers Theory and Intelligent Systems RLANTIS, Tunisia
2University of Sousse, Higher Institute of Computer Science and Communication Techniques ISITCom, Tunisia
Abstract: Paraphrase detection determines the extent to which original and suspect documents
convey the same meaning. It has attracted attention from researchers in many Natural
Language Processing (NLP) tasks such as plagiarism detection, question
answering, and information retrieval. Traditional methods (e.g., Term
Frequency-Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA),
and Latent Semantic Analysis (LSA)) cannot efficiently capture hidden semantic
relations when sentences share no common words or when words rarely
co-occur. Therefore, we propose a deep learning model based on
Global Vectors for word representation (GloVe) and a Recurrent Convolutional Neural Network
(RCNN). It efficiently captures more contextual dependencies between
word vectors with precise semantic meanings. Given the lack of publicly available resources
for the Arabic language, we developed a paraphrased corpus automatically. It preserves the
syntactic and semantic structures of Arabic sentences using the word2vec model and
Part-Of-Speech (POS) annotation. Overall, experiments showed that our proposed model
outperformed the state-of-the-art methods in terms of precision and recall.
Keywords: Arabic language processing, word2vec, part-of-speech annotation, paraphrasing,
semantic analysis, recurrent convolutional neural networks.
Received January 24, 2019; accepted February 5, 2020
https://doi.org/10.34028/iajit/18/1/1
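The limitation of word-overlap methods noted in the abstract above can be illustrated with a minimal sketch. It uses plain term-frequency vectors rather than full TF-IDF, and the English sentences and the function name `bow_cosine` are illustrative assumptions, not material from the paper.

```python
import math
from collections import Counter

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Only words present in both sentences contribute to the dot product.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

# Hypothetical paraphrase pair with no shared words.
paraphrase = bow_cosine("the physician examined the child",
                        "a doctor checked this kid")
# Identical sentences, for comparison.
identical = bow_cosine("the physician examined the child",
                       "the physician examined the child")
```

Because the two paraphrased sentences share no words, their similarity is zero even though they mean the same thing. Reweighting the terms (as TF-IDF does) cannot change this, since the dot product has no non-zero terms; this is the gap that embedding-based models aim to close.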
Speech Synthesis System for the Holy Quran Recitation
Nadjla Bettayeb and Mhania Guerti
Department of Electronics, Signal and Communications Laboratory, Ecole Nationale Polytechnique, Algeria
Abstract: This paper aims to develop a Text-To-Speech (TTS) synthesis system for Holy Quran
recitation, to help reciters and facilitate its use. In this work,
the unit selection method is adopted and improved to reach a good speech
quality. The proposed approach consists mainly of two steps. In the first one,
an Expert System (ES) module is integrated by employing Arabic, Quran language,
phonetic and phonological features. This part serves as a preselection step
to optimize the synthesis algorithm's speed. The second step is the final
selection of units by minimizing a concatenation cost function with a forward-backward
dynamic programming search. The system was evaluated by native and non-native
Arabic speakers. The results show that the goal of a correct Quran recitation that
respects its reading rules was reached, with 97% speech intelligibility
and 72.13% naturalness.
Keywords: Speech synthesis, holy Quran, unit selection, expert system, Arabic language
processing, tajweed rules.
Received March 21, 2019; accepted April 10, 2020
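The final selection step described in the abstract above (minimizing a concatenation cost with a dynamic programming search) can be sketched as follows. This is a generic unit-selection search under stated assumptions, not the authors' algorithm: units are represented as hypothetical `(start_pitch, end_pitch)` tuples, the cost is a made-up pitch mismatch at the join, and the candidate lists stand in for the output of the preselection step.

```python
def select_units(candidates, concat_cost):
    """Pick one unit per position so the summed concatenation cost is minimal.

    candidates: list of lists; candidates[i] holds the preselected units for position i.
    concat_cost: function (left_unit, right_unit) -> non-negative number.
    """
    # Forward pass: best[i][j] is the cheapest cost of any path ending in candidates[i][j].
    best = [[0.0] * len(candidates[0])]
    back = []
    for i in range(1, len(candidates)):
        row_cost, row_back = [], []
        for unit in candidates[i]:
            costs = [best[i - 1][k] + concat_cost(prev, unit)
                     for k, prev in enumerate(candidates[i - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row_cost.append(costs[k_min])
            row_back.append(k_min)
        best.append(row_cost)
        back.append(row_back)
    # Backward pass: follow the stored pointers from the cheapest final unit.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for row_back in reversed(back):
        j = row_back[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)], total

# Hypothetical cost: pitch mismatch (Hz) between the end of one unit and the start of the next.
def pitch_mismatch(left, right):
    return abs(left[1] - right[0])

# Hypothetical candidates for three consecutive positions.
candidates = [
    [(100, 120), (100, 110)],
    [(118, 130), (150, 160)],
    [(131, 140), (125, 135)],
]
units, total = select_units(candidates, pitch_mismatch)
```

Restricting each position to a few preselected candidates keeps the inner loop small, which is consistent with the abstract's remark that preselection speeds up synthesis.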
A Distributed Framework of Autonomous Drones for Planning and Execution of Relief Operations during Flood Situations
1Department of Computer Science, University of Engineering and Technology, Pakistan
2Department of Software Engineering, University of Gujrat, Pakistan
Abstract: Every year, floods cost the world economy billions
of dollars, claim thousands of human and animal lives, destroy vast areas of
land and crops, and displace large populations from their homes. Flood-affected
people require time-critical help, and a delay may cause the loss of precious human
lives. Ground rescue operations are difficult to carry out because of the
unavailability of transport infrastructure. However, drones (unmanned vehicles)
can easily navigate to areas where road networks have been destroyed or
become ineffective. The fleet participating in a rescue operation should have
drones with different capabilities in order to make the efforts more
successful. The majority of existing systems in the literature are
centralized. However, the performance of a centralized
system degrades as the required number of tasks increases. This
research is based on the hypothesis that a distributed intelligent method is
more effective than the centralized technique for relief operations performed
by multiple drones. The research aims to propose a distributed method that
allows a fleet of drones with diverse capabilities to communicate and
collaborate, so that the task completion rate of rescue operations could be
increased. The proposed solution consists of three main modules: 1)
communication and message transmission module that enables collaboration
between drones, 2) realignment module that allows drones to negotiate and
occupy the best position in the air to optimize the coverage area, 3) situation
monitoring module that identifies the ground situation and acts accordingly. To
validate the proposed solution, we performed a simulation using the AirSim
simulator and compared the results with the centralized system. The proposed distributed
method performed better than legacy systems. In the future, the work can be
extended using reinforcement learning and other intelligent algorithms.
Keywords: Autonomous drones, flood relief operations, distributed systems, artificial
intelligence, distributed collaboration.
Received October 11, 2019; accepted July 14, 2020
Lean Database: An Interdisciplinary Perspective Combining Lean Thinking and Technology
Jamil Razmak1, Samir Al-Janabi2, Faten Kharbat3, and Charles Bélanger4
1College of Business, Al Ain University, UAE
2Department of Computing and Software, McMaster University, Hamilton, Canada
3College of Engineering, Al Ain University, UAE
4Faculty of Management, Laurentian University, Canada
Abstract: The continuous improvement approach is key to achieving a sustainable competitive
advantage for organizations in their business processes. Nowadays,
organizational business processes are seen through an automated function under
the umbrella of organizational information systems. The huge number of
automated business processes produces data, part of which is messy
and may be corrupt. This study integrates a lean thinking concept
with the data cleaning approach to reduce data waste
according to business requirements and to enhance continuous improvement as
part of a data defect reduction strategy. A new approach to reducing and
cleaning data waste is proposed by combining a data cleaning algorithm with lean thinking
concepts. After testing the quality and scalability of the algorithm, along
with the evaluation of a corrupt dataset, the results showed improvement in the
corrupt dataset reduction, leading to higher organizational performance in
business processes. This integration can help researchers and technologists to
fully understand and benefit from interdisciplinary capabilities while building
bridges between different fields.
Keywords: Lean database, interdisciplinary, lean thinking, data quality, data cleaning.
Received March 3, 2020; accepted July 14, 2020
Reliability-Aware: Task Scheduling in Cloud Computing Using Multi-Agent Reinforcement Learning Algorithm and Neural Fitted Q
Husamelddin Balla, Chen Sheng, and Jing Weipeng
College of Information and Computer Engineering, Northeast Forestry University, China
Abstract: Cloud computing has become the primary alternative platform for most user
applications in recent years. The increasing complexity of the cloud environment,
due to the continuous development of resources and applications, calls for a
concentrated, integrated fault tolerance approach to ensure quality of
service. Focusing on reliability enhancement in an environment with dynamic
changes such as the cloud, we developed a multi-agent scheduler using a
Reinforcement Learning (RL) algorithm and Neural Fitted Q (NFQ)
to schedule user requests effectively. Our approach considers the queue
buffer size of each resource by applying queuing theory to design a
queue model in which each scheduler agent has its own queue that receives
user requests from the global queue. A central learning agent is responsible
for learning from the output of the scheduler agents and directing those
agents through the feedback obtained from the previous step. The dynamicity
problem in the cloud environment is managed in our system by employing a neural
network, which supports the reinforcement learning algorithm through a specified
function. The numerical results demonstrated the efficiency of our proposed
approach and its enhanced reliability.
Keywords: Reinforcement learning,
multi-agent scheduler, neural fitted Q, reliability, cloud computing, queuing
theory.
Received April 5, 2018; accepted January 28, 2020
Formulation of Two-Stage Problem of Structural-Parametric Synthesis of Adaptive Electronic Document Management System
Artem Obukhov1, Mikhail Krasnyanskiy2, and Denis Dedov3
1Department of Automated Decision Support Systems, Tambov State Technical University, Russian Federation
2Department of Administration, Tambov State Technical University, Russian Federation
3Department of Science, Tambov State Technical University, Russian Federation
Abstract: The paper scrutinizes the options for optimizing Electronic Document Management
Systems (EDMS) and adapting them to the individual characteristics of users.
Solving this problem requires further development
of the necessary methods, models and criteria. Therefore, the article considers the problem of the structural-parametric synthesis
of adaptive EDMS. On the basis of previously conducted research in this area, a
new architecture of EDMS is developed, within which a mathematical model of
adaptive EDMS is proposed. It includes the main components of the information system,
as well as a set of estimates for a number of indicators: total discounted
costs, productivity, software quality and, above all, adaptability to the
requirements of the user. Using this mathematical model, a two-stage task of
structural-parametric synthesis of an adaptive EDMS is formulated: at the first stage,
the system is synthesized according to the criterion of economic
efficiency, and at the second stage, it is adapted to
each user. The scientific novelty of the proposed approach
consists in dividing the optimization task into two stages, formalizing
the criteria for adaptive EDMS, and developing a new architecture and
mathematical model of adaptive EDMS. The results can be used to solve problems
of design, modernization and adaptation of various information systems.
Keywords: Adaptability, electronic document
management systems, optimization problem statement, structural-parametric
synthesis.
Received May 6, 2019; accepted May 4, 2020
Secured Data Storage and Retrieval using Elliptic Curve Cryptography in Cloud
Pradeep Suthanthiramani1, Muthurajkumar Sannasy2, Ganapathy Sannasi3, and Kannan Arputharaj1
1Department of Information Science and Technology, Anna University, India
2Department of Computer Technology, Anna University, India
3Research Centre for Cyber-Physical Systems and School of Computer Science and Engineering, Vellore Institute of Technology, India
Abstract: Security of data stored in the cloud databases is
a challenging and complex issue to be addressed due to the presence of
malicious attacks, data breaches and unsecured access points. In the past, many
researchers proposed security mechanisms including access control, intrusion
detection and prevention models, encryption-based storage methods and key
management schemes. However, the role-based access control policies that were
developed to secure data stored in cloud databases according to the
sensitivity of the information are compromised by attackers who
misuse privileges gained from multiple roles. Therefore, it is
necessary to propose more efficient mechanisms for securing sensitive
information through attribute-based encryption by analyzing the associations
between the various attributes. To handle the security issues related to the
large volume of cloud data effectively, this work extends the association rule mining
algorithm with temporal constraints in order to find the
associations among the attributes. The attributes can then be grouped into
public attributes with insensitive data, group attributes
with medium-sensitive data, and owner attributes with highly sensitive data, which
strengthens the attribute-based encryption scheme. Based on the associations
among the attributes and the temporal constraints, it is possible to encrypt the
sensitive data with stronger keys and algorithms. Hence, a new key generation and
encryption algorithm is proposed in this paper by combining the Greatest Common
Divisor (GCD) and the Least Common Multiple (LCM) of the primary key value and the
first numeric non-key attribute, that is, the medium-sensitive attributes and data present
in the cloud database, to provide secured storage through effective attribute-based
encryption. Moreover, a new intelligent algorithm called Elliptic Curve
Cryptography with Base100 Table is also proposed in this paper for
performing encryption and decryption operations over the most sensitive data
for the data owners. The experiments conducted in this work show
that the proposed model enhances data security by more than 5%
compared with other existing secured storage models available for the cloud.
Keywords: Cloud database, secured storage, association
rule mining, greatest common divisor, least common multiple, key generation and
encryption.
Received July 19, 2019; accepted June 17, 2020
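The abstract above states that key generation combines the greatest common divisor and least common multiple of the primary key value and the first numeric non-key attribute, but it does not say how the two are combined into a key. The sketch below therefore computes only the two quantities; the function name `gcd_lcm` and the sample values are illustrative assumptions.

```python
import math

def gcd_lcm(primary_key: int, attribute: int):
    """Return (GCD, LCM) of two non-negative integers."""
    g = math.gcd(primary_key, attribute)
    # lcm(a, b) = a * b / gcd(a, b); defined as 0 when both inputs are 0.
    l = primary_key * attribute // g if g else 0
    return g, l

# Hypothetical record: primary key 1024, first numeric non-key attribute 360.
g, l = gcd_lcm(1024, 360)
```

How these two numbers feed into the key, and how the Base100 table is used, would have to be taken from the full paper.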
Middle Eastern and North African English Speech Corpus (MENAESC): Automatic Identification of MENA English Accents
Abstract: This study aims to explore the English accents in
the Arab world. Although there are limited resources for a speech corpus that
attempts to automatically identify the degree of accent patterns of an Arabic
speaker of English, there is no speech corpus specialized for Arabic speakers
of English in the Middle East and North Africa (MENA). To that end, different
samples were collected in order to create the linguistic resource that we
called the Middle Eastern and North African English Speech Corpus (MENAESC). In addition
to the “accent approach” applied in the field of automatic language/dialect
recognition, we also applied the “macro-accent approach”, employing Mel-Frequency
Cepstral Coefficients (MFCC), Energy and Shifted Delta Cepstra (SDC) features and a Gaussian
Mixture Model-Universal Background Model (GMM-UBM) classifier, to four of the eleven
accents (Egyptian, Qatari, Syrian, and Tunisian), which
were selected based on their high population density in the location where the
experiments were carried out. Using the Equal Error Rate percentage (EER%)
to assess the effectiveness of our system in identifying MENA
English accents with the two approaches on
the MENAESC, we reached 1.5 to 2% for the “accent approach” and 2
to 3.5% for the “macro-accent approach”. The results
also showed that the Qatari accent, of the four accents included, scored the
lowest EER% in all tests performed. Taken together, the system effectiveness
is affected not only by the approaches used, but also by the size and
characteristics of the MENAESC database. Moreover, it is affected by the proficiency of
the Arabic speakers of English and the influence of their mother tongue.
Keywords: MENAESC, MFCC+Energy and SDC features, accent, macro-accent, automatic identification.
Received September 9, 2019; accepted April 8, 2020
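The Equal Error Rate used as the evaluation measure above is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how it can be approximated from classifier scores follows; the threshold sweep, the function name `equal_error_rate`, and the score lists are illustrative assumptions and do not come from the MENAESC experiments.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores.

    genuine: scores of trials that should be accepted (true accent).
    impostor: scores of trials that should be rejected (other accents).
    A trial is accepted when its score is >= the threshold.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)      # genuine trials rejected
        far = sum(s >= t for s in impostor) / len(impostor)   # impostor trials accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Hypothetical scores.
genuine_scores = [0.9, 0.8, 0.7, 0.6, 0.4]
impostor_scores = [0.5, 0.3, 0.2, 0.1, 0.05]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

With these made-up scores, the two error rates meet at a threshold of 0.5, where one genuine trial in five is rejected and one impostor trial in five is accepted. A lower EER indicates better separation between accents.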
Parallel Scalable Approximate Matching Algorithm for Network Intrusion Detection Systems
Adnan Hnaif1, Khalid Jaber1, Mohammad Alia1, and Mohammed Daghbosheh2
1Faculty of Science and Information Technology, Al Zaytoonah University of Jordan, Jordan
2Faculty of Science and Information Technology, Irbid National University of Jordan, Jordan
Abstract: Matching algorithms find exact or approximate matches between a text “T” and a
pattern “P”. Because modern computer processors contain multiple cores,
multiple tasks can be performed simultaneously. This technology allows
these algorithms to work in parallel to improve their matching speed.
Several exact and approximate string matching algorithms have been
developed to work in parallel to find the correspondence between text “T” and
pattern “P”. This paper proposes two models. First, the Direct
Matching Algorithm is parallelized (PDMA) on a multi-core architecture using OpenMP technology.
Second, the PDMA is implemented in Network Intrusion Detection Systems (NIDS) to
enhance the speed of the NIDS detection engine. The PDMA achieved an improvement of more
than 19.7% in processing time compared with sequential matching.
In addition, the performance of the NIDS detection engine improved
by more than 8% compared to the current SNORT-NIDS detection engine.
Keywords: Exact matching algorithms, approximate matching algorithms, parallel
processing, network intrusion detection systems.
Received February 13, 2020; accepted June 17, 2020
Instagram Post Popularity Trend Analysis and Prediction using Hashtag, Image Assessment, and User History Features
Kristo Radion Purba, David Asirvatham, and Raja Kumar Murugesan
School of Computer Science and Engineering, Taylor's University, Malaysia
Abstract: Instagram is one of the most popular social networks for marketing. Predicting the
popularity of a post on Instagram is important for determining the influence of a user for
marketing purposes. Previous studies have addressed popularity prediction on Instagram
using various features and datasets. However, they have not fully addressed the
challenge of data variability in a global dataset, as they either used
local datasets or discretized the output. This research compared several regression
techniques to predict the Engagement Rate (ER) of posts using a global dataset.
The prediction model, coupled with the results of the popularity trend
analysis, will have more utility for a larger audience compared to existing
studies. The features were extracted from hashtags, image analysis, and user
history. It was found that image quality, posting time, and type of image
highly impact ER. The prediction accuracy reached up to 73.1% using the Support
Vector Regression (SVR), which is higher than previous studies on a global
dataset. User history features were useful in the prediction since the data
showed a high variability of ER when compared to a local dataset. The added
manual image assessment values were also among the top predictors.
Keywords: Social media, Instagram, popularity trend, machine
learning, prediction model.
Received February 17, 2020; accepted August 6, 2020
A New Image Encryption Scheme Using Dual Chaotic Map Synchronization
Obaida Al-Hazaimeh1, Mohammad Al-Jamal2, Mohammed Bawaneh1, Nouh Alhindawi3, and Bara’a Hamdoni2
1Department of Computer Science and Information Technology, Al-Balqa Applied University, Jordan
2Department of Mathematics, Yarmouk University, Jordan
3Faculty of Sciences and Information Technology, Jadara University, Jordan
Abstract: The behavior of chaotic systems attracts
many researchers in the field of image encryption. The major advantage of using
chaos as the basis for developing a cryptosystem is its sensitivity to
initial conditions and parameter tuning, as well as its random-like behavior,
which resembles the main ingredients of a good cipher, namely the confusion and
diffusion properties. In this article, we present a new scheme based on the
synchronization of dual chaotic systems, namely the Lorenz and Chen chaotic systems,
and prove that these chaotic maps can be completely synchronized with each other
under suitable conditions and specific parameters, which constitutes a new addition to
chaos-based encryption systems. This addition provides a master-slave
configuration that is utilized to construct the proposed dual synchronized chaos-based
cipher scheme. The common security analyses are performed to validate the
effectiveness of the proposed scheme. Based on all experiments and analyses, we
can conclude that this scheme is secure, efficient, robust, reliable, and can
be directly applied successfully for many practical security applications
in insecure network channels such as the Internet.
Keywords: Chaos, Lorenz systems, Chen systems, synchronization, cryptography.
Received April 7, 2020; accepted August 26, 2020
A Novel Approach to Maximize G-mean in Nonstationary Data with Recurrent Imbalance Shifts
Radhika Kulkarni1, S. Revathy1, and Suhas Patil2
1Department of Computer Science Engineering, Sathyabama Institute of Science and Technology, India
2Department of Computer Science Engineering, Bharati Vidyapeeth’s College of Engineering, India
Abstract: One of the noteworthy difficulties in
the classification of nonstationary data is handling data with class imbalance.
Imbalanced data have many more samples of one
class than of the other. This biases the accuracy of a
classifier in favour of the majority class. Streaming data may have inherent
imbalance resulting from the nature of the dataspace or extrinsic imbalance due to
its nonstationary environment. In streaming data, time-varying class priors
may lead to a shift in the imbalance ratio. Researchers have studied
ensemble learning, online learning, class imbalance and cost-sensitive
algorithms separately, but have rarely addressed all of these
issues jointly to deal with imbalance shifts in nonstationary data. This
paper presents a novel approach that combines these perspectives to maximize
G-mean in nonstationary data with Recurrent Imbalance Shifts (RIS). This
research modifies two state-of-the-art boosting algorithms: 1) AdaC2, to get
G-mean based Online AdaC2 for Recurrent Imbalance Shifts (GOA-RIS) and
Ageing and G-mean based Online AdaC2 for Recurrent Imbalance Shifts (AGOA-RIS),
and 2) CSB2, to get G-mean based Online CSB2 for Recurrent Imbalance Shifts (GOC-RIS)
and Ageing and G-mean based Online CSB2 for Recurrent Imbalance Shifts (AGOC-RIS).
The study empirically and statistically analyses the performance of the
proposed algorithms and of the Online AdaC2 (OA) and Online CSB2 (OC) algorithms using
benchmark datasets. The test outcomes demonstrate that the proposed algorithms
overall outperform OA and OC.
Keywords: Cost-sensitive algorithms,
data stream classification, imbalanced data, online learning, population shift,
skewed data stream.
Received March 23, 2019; accepted April 13, 2020
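The abstract above targets G-mean rather than accuracy because accuracy is biased toward the majority class. The sketch below uses the standard definition of G-mean as the geometric mean of sensitivity and specificity; the confusion-matrix counts are made up for illustration and are not results from the paper.

```python
import math

def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    return (tp + tn) / (tp + fn + tn + fp)

# Hypothetical stream window: 10 minority (positive) and 990 majority (negative) samples.
# Classifier A predicts the majority class for everything.
acc_a, gm_a = accuracy(0, 10, 990, 0), g_mean(0, 10, 990, 0)
# Classifier B finds 8 of 10 minority samples at the cost of 99 false alarms.
acc_b, gm_b = accuracy(8, 2, 891, 99), g_mean(8, 2, 891, 99)
```

Classifier A has the higher accuracy but a G-mean of zero, because it never detects the minority class, while classifier B trades some accuracy for a much higher G-mean. This is why G-mean is a more informative objective when the imbalance ratio shifts over time.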
An Efficient Intrusion Detection System by Using Behaviour Profiling and Statistical Approach Model
Rajagopal Devarajan and Padmanabhan Rao
PG and Research Department of Computer Science and Applications, Vivekanandha College of Arts and Sciences for Women (Autonomous), India
Abstract: Unauthorized access to a personal computer or a single system on a network,
in order to track system access or steal information, is called an attack or hacking. An
Intrusion Detection System (IDS) is an effective security technology that detects,
prevents and possibly reacts to malicious computer-related activities. It is the
mechanism used to protect computer systems and networks from abuse.
The aim of this study is to explore the possibilities
of intrusion detection and a highly efficient and effective prevention technique.
The study identifies the Behaviour Profiling Algorithm as an efficient algorithm for
intrusion detection and performs dynamic analysis with the Statistical
Approach model using log files, which provide vital information about systems
and the activities on them. The implemented model
produced results above 90%, 96% and 98% in wired, wireless and cloud networks,
respectively. The study concludes that the behaviour profiling algorithm is an efficient
algorithm for detecting intrusions and that, when combined with the statistical
approach model, it produces efficient results. Further research could
identify which programming technique should be used to store the activity log in the
database, and which algorithm is best suited to implementing an intrusion
detection and prevention system using big data, whether the network is wired,
wireless or cloud-based.
Keywords: IDS, IPS, behaviour profiling algorithm, statistical
approach model, NIDS, HIDS.
Received September 12, 2019; accepted May 9, 2020
https://doi.org/10.34028/iajit/18/1/13
Pain Detection/Classification Framework including Face Recognition based on the Analysis of Facial Expressions for E-Health Systems
Fatma Elgendy1, Mahmoud Alshewimy2, and Amany Sarhan2
1Kafrelshiekh Higher Institute for Engineering and Technology, Egypt
2Computer and Control Engineering Department, Tanta University, Egypt
Abstract: Facial expressions can reveal
the presence and degree of pain in humans, which is a vital topic in the e-healthcare
domain, especially for elderly people or patients with special needs. This paper
presents a framework for pain detection, pain classification, and face
recognition using feature extraction, feature selection, and classification
techniques. Pain intensity is measured using the Prkachin and Solomon Pain Intensity
(PSPI) scale. Experimental results showed that the proposed framework is
promising compared with previous work. It achieves 91% accuracy in pain detection,
99.89% accuracy in face recognition, and 78%, 92%, and 88% accuracy, respectively, for three levels of pain classification.
Keywords: E-health, Gabor filter, AdaBoost, ReliefF filter, SADE, KNN.
Received January 12, 2020; accepted March 19, 2020
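The abstract above names the Prkachin and Solomon pain intensity scale without giving its formula. As commonly defined in the literature, the score sums facial action unit (AU) intensities: brow lowering (AU4), the stronger of cheek raising and lid tightening (AU6, AU7), the stronger of nose wrinkling and upper lip raising (AU9, AU10), and eye closure (AU43). The sketch below follows that definition; the function name `pspi` and the sample intensities are illustrative, and the thresholds the paper uses for its three pain levels are not stated in the abstract, so none are assumed here.

```python
def pspi(au4: int, au6: int, au7: int, au9: int, au10: int, au43: int) -> int:
    """Prkachin and Solomon Pain Intensity score for one video frame.

    AU4, AU6, AU7, AU9 and AU10 are intensities from 0 to 5;
    AU43 (eyes closed) is 0 or 1, so the score ranges from 0 to 16.
    """
    return au4 + max(au6, au7) + max(au9, au10) + au43

# Hypothetical frame: moderate brow lowering, strong cheek raise, eyes closed.
score = pspi(au4=2, au6=3, au7=1, au9=0, au10=2, au43=1)
no_pain = pspi(0, 0, 0, 0, 0, 0)
```

A frame-level score of this kind can then be binned into discrete pain levels for classification.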