Normalization-based Neighborhood Model for Cold Start Problem in Recommendation System
Aafaq Zahid1, Nurfadhlina Mohd Sharef1, and Aida Mustapha2
1Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Malaysia
2Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Malaysia
Abstract: Existing approaches for Recommendation Systems (RS) are mainly based on users’ past knowledge, and the more popular techniques, such as the neighborhood models, focus on finding similar users when making recommendations. The cold start problem arises when inaccurate recommendations are given to new users because of the lack of past data on those users. To deal with cases where no prior information on the new user is available, this paper proposes a normalization technique that models user involvement or user likings based on the details of the items used in the neighborhood models. The proposed normalization technique was evaluated using two datasets, namely MovieLens and GroupLens. The results showed that the proposed technique improves the accuracy of the neighborhood model, which in turn increases the accuracy of the RS.
Keywords: Recommender system, cold start, collaborative filtering, normalization.
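As a rough illustration of the normalization idea, the sketch below z-score-normalizes a user's ratings so that users with different rating habits become comparable in a neighborhood model. The formula is a standard one chosen for illustration and is an assumption, not necessarily the technique proposed in the paper.

```python
def normalize_ratings(ratings):
    """Mean-center and scale one user's ratings (item id -> raw value)
    so that generous and strict raters become comparable in a
    neighborhood model.  Standard z-score normalization, shown here
    only as an illustrative stand-in for the paper's technique."""
    values = list(ratings.values())
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5 or 1.0  # avoid division by zero for constant raters
    return {item: (v - mean) / std for item, v in ratings.items()}
```

After normalization, every user's ratings have zero mean, so similarity between users reflects their relative preferences rather than their absolute rating scale.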
Incorporating Reverse Search for Friend Recommendation with Random Walk
Qing Yang1, Haiyang Wang1, Mengyang Bian1, Yuming Lin2, and Jingwei Zhang2
1Guangxi Key Laboratory of Automatic Measurement Technology and Instrument, Guilin University of Electronic Technology, China
2Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, China
Abstract: Recommending friends is an important mechanism for social networks to enhance their vitality and attractiveness to users. The huge user base and the sparse user relationships pose great challenges for recommending friends on social networks. Random walk is a classic strategy for recommendation and provides a feasible solution to these challenges. However, most existing recommendation methods based on random walk weigh only the forward search and ignore the significance of reverse social relationships. In this paper, we propose a method to recommend friends by integrating reverse search into random walk. First, we introduce the FP-Growth algorithm to construct both the web graphs of social networks and their corresponding transition probability matrices. Second, we define a reverse search strategy that includes reverse social influences and collaborates with the random walk for recommending friends. The proposed model both optimizes the transition probability matrix and improves the search mode to provide better recommendation performance. Experimental results on real datasets show that the proposed method performs better than the naive random walk method, which considers the forward search mode only.
Keywords: Social networks, friend recommendation, reverse search.
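The random-walk machinery the paper builds on can be sketched as a random walk with restart over a directed social graph, with a `beta` parameter blending forward steps with reverse steps (against edge direction) as a simplified stand-in for the reverse-search strategy. The FP-Growth-based graph construction and the paper's exact weighting are not reproduced; all parameter values here are illustrative.

```python
def walk_scores(adj, seed, alpha=0.85, beta=0.5, iters=50):
    """Random-walk-with-restart friend scores on a directed graph.
    `adj` maps each user to the list of users they follow; `beta`
    blends forward steps with reverse steps (walking against edge
    direction), a simplified stand-in for reverse search."""
    nodes = sorted({u for u in adj} | {v for vs in adj.values() for v in vs})
    followers = {u: [] for u in nodes}          # reverse adjacency
    for u, vs in adj.items():
        for v in vs:
            followers[v].append(u)
    score = {u: 1.0 if u == seed else 0.0 for u in nodes}
    for _ in range(iters):
        # restart mass (1 - alpha) always returns to the seed user
        nxt = {u: (1 - alpha) * (1.0 if u == seed else 0.0) for u in nodes}
        for u in nodes:
            fwd, rev = adj.get(u, []), followers[u]
            for v in fwd:                       # forward step along edges
                nxt[v] += alpha * beta * score[u] / len(fwd)
            for v in rev:                       # reverse step against edges
                nxt[v] += alpha * (1 - beta) * score[u] / len(rev)
            if not fwd and not rev:             # isolated node: restart
                nxt[seed] += alpha * score[u]
        score = nxt
    return score
```

Users not already connected to the seed can then be ranked by their scores to produce friend candidates.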
Received September 2, 2017; accepted April 25, 2018
https://doi.org/10.34028/iajit/17/3/2
A Deep Learning based Arabic Script Recognition System: Benchmark on KHAT
Riaz Ahmad1, Saeeda Naz2, Muhammad Afzal3, Sheikh Rashid4, Marcus Liwicki5, and Andreas Dengel6
1Shaheed Benazir Bhutto University, Sheringal, Pakistan
2Computer Science Department, GGPGC No.1 Abbottabad, Pakistan
3Mindgarage, University of Kaiserslautern, Germany
4Al Khwarizmi Institute of Computer Science, UET Lahore, Pakistan
5Department of Computer Science, Luleå University of Technology, Luleå, Sweden
6German Research Center for Artificial Intelligence (DFKI) in Kaiserslautern, Germany
Abstract: This paper presents a deep learning benchmark on a complex dataset known as KFUPM Handwritten Arabic TexT (KHATT). The KHATT dataset consists of complex patterns of handwritten Arabic text-lines. This paper contributes mainly in three aspects: (1) pre-processing, (2) a deep learning based approach, and (3) data augmentation. The pre-processing step includes pruning extra white spaces and de-skewing the skewed text-lines. We deploy a deep learning approach based on Multi-Dimensional Long Short-Term Memory (MDLSTM) networks and Connectionist Temporal Classification (CTC). The MDLSTM has the advantage of scanning the Arabic text-lines in all directions (horizontal and vertical) to cover dots, diacritics, strokes and fine inflections. Data augmentation combined with the deep learning approach achieves a promising improvement in results, gaining an 80.02% Character Recognition (CR) rate over the 75.08% baseline.
Keywords: Handwritten Arabic text recognition, deep learning, data augmentation.
A Fog Computing-based Framework for Privacy Preserving IoT Environments
Dhiah el Diehn Abou-Tair1, Simon Büchsenstein2, and Ala’ Khalifeh1
1School of Electrical Engineering and Information Technology, German Jordanian University, Jordan
2Embedded Systems Engineering, University of Freiburg, Germany
Abstract: Privacy is becoming an indispensable component in the emerging Internet of Things (IoT) context. However, IoT-based devices and tools are exposed to several security and privacy threats, especially as these devices are mainly used to gather data about users’ habits, vital signs, surrounding environment, etc., which makes them a lucrative target for intruders. To date, conventional security and privacy mechanisms are not well optimized for IoT devices due to their limited energy, storage capacity, communication functionality and computing power, which has prompted researchers to propose new solutions and algorithms to handle these limitations. Fog and cloud computing have recently been integrated into IoT environments to overcome these resource limitations, thus facilitating new life-scenario-oriented applications. In this paper, a security and privacy preserving framework is proposed that utilizes fog and cloud computing in conjunction with IoT devices, aiming to secure the users’ data and protect their privacy. The framework has been implemented and tested using available technologies. Furthermore, a security analysis has been carried out by simulating several hypothetical attack scenarios, which showed the effectiveness of the proposed framework and its capability of protecting the users’ information.
Keywords: Internet of things, cloud computing, fog computing, privacy, security.
Self-Organizing Map vs Initial Centroid Selection Optimization to Enhance K-Means with Genetic Algorithm to Cluster Transcribed Broadcast News Documents
Ahmed Maghawry1, Yasser Omar1, and Amr Badr2
1Department of Computer Science, Arab Academy for Science and Technology, Egypt
2Department of Computer Science, Cairo University, Egypt
Abstract: A compilation of artificial intelligence techniques is employed in this research to enhance the process of clustering transcribed text documents obtained from audio sources. Many clustering techniques suffer from drawbacks that may cause the algorithm to tend towards suboptimal solutions; handling these drawbacks is essential to obtain better clustering results and avoid suboptimal solutions. The main target of our research is to enhance automatic topic clustering of transcribed speech documents. We examine the difference between implementing the K-means algorithm with our Initial Centroid Selection Optimization (ICSO) [16], genetic algorithm optimization and a Chi-square similarity measure to cluster a dataset, and using a self-organizing map to enhance the clustering of the same dataset; both techniques are compared in terms of accuracy. The evaluation showed that K-means with ICSO and the genetic algorithm achieved the highest average accuracy.
Keywords: Clustering, k-means, self-organizing maps, genetic algorithm, speech transcripts, centroid selection.
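To illustrate what initial centroid selection means in this context, the sketch below uses a deterministic farthest-first heuristic: start from the point closest to the data mean, then repeatedly pick the point farthest from all chosen centroids. This is a generic stand-in, not the ICSO or genetic-algorithm procedure evaluated in the paper.

```python
def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def seed_centroids(points, k):
    """Farthest-first initial centroid selection for K-means: a simple
    deterministic illustration of centroid seeding, not the paper's
    ICSO + genetic-algorithm optimization."""
    mean = tuple(sum(c) / len(points) for c in zip(*points))
    centroids = [min(points, key=lambda p: dist(p, mean))]
    while len(centroids) < k:
        # pick the point with the largest distance to its nearest centroid
        centroids.append(max(points,
                             key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids
```

Seeding centroids far apart in this way reduces the chance that K-means collapses two natural clusters into one, which is the kind of suboptimal solution the abstract refers to.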
A Semantic Framework for Extracting Taxonomic Relations from Text Corpus
Phuoc Thi Hong Doan, Ngamnij Arch-int, and Somjit Arch-int
Department of Computer Science, Khon Kaen University, Thailand
Abstract: Nowadays, ontologies are exploited in many applications due to their ability to represent knowledge and to infer new knowledge. However, the manual construction of ontologies is tedious and time-consuming. Therefore, the automated construction of ontologies from text has been investigated. The extraction of taxonomic relations between concepts is a crucial step in constructing domain ontologies. To obtain taxonomic relations from a text corpus, especially when the data is deficient, the approach of using the web as a source of collective knowledge (the web-based approach) is usually applied. The important challenge of this approach is how to collect relevant knowledge from a large number of web pages. To overcome this issue, we propose a framework that combines Word Sense Disambiguation (WSD) and the web-based approach to extract taxonomic relations from a domain-text corpus. This framework consists of two main stages: concept extraction and taxonomic-relation extraction. Concepts acquired from the concept-extraction stage are disambiguated through the WSD module and afterwards passed to the taxonomic-relation extraction stage. To evaluate the efficiency of the proposed framework, we conduct experiments on datasets from two domains, tourism and sport. The obtained results show that the proposed method is effective on corpora that are insufficient or have no training data. Moreover, the proposed method outperforms the state-of-the-art method on corpora with high WSD accuracy.
Keywords: Taxonomic relation, ontology construction, word sense disambiguation, knowledge acquisition.
A Contrivance to Encapsulate Virtual Scaffold with Comments and Notes
Nagarajan Balasubramanaian1, Suguna Jayapal2, and Satheeshkumar Janakiraman3
1Department of Computer Applications, Arunai Engineering College, India
2Department of Computer Science, Vellalar College for Women, India
3Department of Computer Science, Bharathiar University, India
Abstract: CLOUD is an acronym for Common Location-independent Online Utility available on-Demand and is based on Service Oriented Architecture (SOA). Today, many researchers are working towards contrivances based on multi-tenant-aware Software as a Service (SaaS) application development, and a precise, pragmatic solution still remains a challenge among researchers. The first step towards such a solution is to enhance the virtual scaffold and propose it as a System under Test (SuT). The entire work is organized as a Model View Controller (MVC), where tenants log in through the View and write their snippet code for encapsulation. The proposed VirScaff schema acts as the Controller and provides authentication and authorization through role/session assignment for tenants, thus helping them access data from the dashboard (viz., Create, Read, Update and Delete (CRUD)). The SuT supports and accommodates both SQL and Not only Structured Query Language (NoSQL) datasets. Finally, this paper shows that the SuT behaves well for both SQL and NoSQL datasets in terms of time and space complexity. To sum up, the entire work addresses the challenges of multi-tenant-aware SaaS application development and performs particularly well with NoSQL datasets.
Keywords: Virtual scaffold, multi-tenant common gateway, pattern, model view controller, role-based access control, JavaScript object notation, not only structured query language, software as a service.
Received July 14, 2017; accepted July 28, 2019
Using Total Probability in Image Template Matching
Haval Sadeq
College of Engineering, Salahaddin University-Erbil, Erbil, Iraq
Abstract: Image template matching is a main task in photogrammetry and computer vision. Matching can be used to automatically determine the 3D coordinates of a point. One of the earliest image matching methods in photogrammetry and computer vision is area-based matching, which relies on correlation measurement using normalised cross-correlation. However, this method fails at discontinuous edges, in areas of low illumination, and under geometric distortion caused by changes in imaging location; such points are considered outliers. The proposed method measures correlation, based on normalised cross-correlation, at each point using windows of various sizes and then considers the probability of correlation for each window. Thereafter, the determined probability values are integrated. On the basis of a specific threshold value, the point of maximum total probability correlation is recognised as a corresponding point. The algorithm is applied to aerial images for Digital Surface Model (DSM) generation. Results show that the corresponding points are identified successfully at different locations, especially at discontinuous points, and that a high-resolution Digital Surface Model is generated.
Keywords: Digital surface model, template image matching, normalised cross-correlation, probability.
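The multi-window idea can be sketched in one dimension: compute normalised cross-correlation around a candidate position for several window sizes, map each score to a probability, and combine them. Real images would use 2-D windows, and the paper's integration and thresholding details are not reproduced here; this is only an illustrative sketch.

```python
def ncc(a, b):
    """Normalised cross-correlation between two equal-length signals."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5
    db = sum((y - mb) ** 2 for y in b) ** 0.5
    return num / (da * db) if da and db else 0.0

def total_probability_score(template, image, pos, window_sizes):
    """Combine NCC scores from several window sizes at `pos` into one
    matching score: a 1-D sketch of the multi-window total-probability
    idea (the paper works on 2-D image windows)."""
    probs = []
    for w in window_sizes:
        t, patch = template[:w], image[pos:pos + w]
        # map correlation from [-1, 1] to a probability in [0, 1]
        probs.append((ncc(t, patch) + 1) / 2)
    return sum(probs) / len(probs)
```

Averaging evidence from several window sizes is what lets a match survive at a discontinuous edge where any single fixed-size window would fail.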
A Novel Physical Machine Overload Detection Algorithm Combined with Quiescing for Dynamic Virtual Machine Consolidation in Cloud Data Centers
Loiy Alsbatin1, Gürcü Öz1, and Ali Ulusoy2
1Department of Computer Engineering, Eastern Mediterranean University, North Cyprus via Mersin 10, Turkey
2Department of Information Technology, Eastern Mediterranean University, North Cyprus via Mersin 10, Turkey
Abstract: Further growth in computing performance is increasingly limited by the energy consumption of cloud data centers, so it is important to pay attention to resource management. Dynamic virtual machine consolidation is a successful approach to improving resource utilization and energy efficiency in cloud environments. Consequently, optimizing the online energy-performance trade-off directly influences Quality of Service (QoS). In this paper, a novel approach known as Percentage of Overload Time Fraction Threshold (POTFT) is proposed, which decides to migrate a Virtual Machine (VM) if the current Overload Time Fraction (OTF) value of the Physical Machine (PM) exceeds a defined percentage of the maximum allowed OTF value, so that the maximum allowed resulting OTF value is not exceeded after a VM migration decision or during VM migration. The proposed POTFT algorithm is also combined with VM quiescing to maximize the time until migration while meeting the QoS goal. A number of benchmark PM overload detection algorithms are implemented using different parameters for comparison with POTFT with and without VM quiescing. We evaluate the algorithms through simulations with real-world workload traces, and the results show that the proposed approaches outperform the benchmark PM overload detection algorithms. The results also show that the proposed approaches lead to better time until migration by keeping average resulting OTF values below the allowed values. Moreover, the POTFT algorithm with VM quiescing is able to minimize the number of migrations according to QoS requirements and meet the OTF constraint with few quiescings.
Keywords: Distributed systems, cloud computing, dynamic consolidation, overload detection, energy efficiency.
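In its simplest form, the POTFT decision rule described above reduces to a threshold check: migrate when the current OTF exceeds a percentage of the maximum allowed OTF, leaving headroom so the limit is not breached during the migration itself. The parameter values below are illustrative, not those used in the paper.

```python
def should_migrate(overload_time, total_time, max_otf=0.1, percentage=0.8):
    """POTFT-style check: trigger a VM migration when the PM's current
    Overload Time Fraction (OTF) exceeds a percentage of the maximum
    allowed OTF.  `max_otf` and `percentage` are illustrative values."""
    if total_time == 0:
        return False
    otf = overload_time / total_time  # fraction of time spent overloaded
    return otf > percentage * max_otf
```

For example, with a 10% maximum allowed OTF and an 80% trigger percentage, migration starts once the PM has been overloaded for more than 8% of its observed time.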
Issues of Dialectal Saudi Twitter Corpus
Meshrif Alruily
College of Computer and Information Sciences, Jouf University, Saudi Arabia
Abstract: Text mining research relies heavily on the availability of a suitable corpus. This paper presents a dialectal Saudi corpus that contains 207,452 tweets generated by Saudi Twitter users. In addition, a comparison was carried out between the Saudi tweet dataset, an Egyptian Twitter corpus, and the Arabic top news raw corpus (representing Modern Standard Arabic (MSA)) in various aspects, such as the differences between formal and colloquial texts. Moreover, issues and phenomena in this type of dataset, such as shortening, concatenation, colloquial language, compounding, foreign language, spelling errors and neologisms, were investigated.
Keywords: Microblogs, tweets, Saudi colloquial, corpus, modern standard Arabic.
Received January 27, 2018; accepted August 13, 2018
https://doi.org/10.34028/iajit/17/3/10
An Enhanced MSER Pruning Algorithm for Detection and Localization of Bangla Texts from Scene Images
Rashedul Islam, Rafiqul Islam, and Kamrul Talukder
Computer Science and Engineering Discipline, Khulna University, Bangladesh
Abstract: Text detection and localization are of great importance for content-based image analysis and text-based image indexing. The efficiency of text recognition depends on the efficiency of text localization, so the main goal of the proposed method is to detect and localize text regions with high accuracy. To achieve this goal, a new and efficient method has been introduced for localization of Bangla text in scene images. In order to improve precision and recall as well as f-measure, a Maximally Stable Extremal Region (MSER) based method along with double filtering techniques has been used. As the MSER algorithm generates many false positives, we have introduced a double filtering method that removes these false positives and increases the f-measure to a great extent. Our proposed method works at three basic levels. Firstly, MSER regions are generated from the input color image after converting it into a gray-scale image. Secondly, some heuristic features are used to filter out most of the false positives, i.e., non-text regions. Lastly, a Stroke Width Transform (SWT) based filtering method is used to filter out the remaining non-text regions. The remaining components are then grouped into candidate text regions, each marked by a bounding box. As there is no benchmark database for Bangla text, the proposed method was implemented on our own database consisting of 200 scene images of Bangla text and achieved prominent performance. To evaluate the performance of our approach, we also tested the proposed method on the International Conference on Document Analysis and Recognition (ICDAR) 2013 benchmark database and obtained better results than related existing methods.
Keywords: MSER, scene image, ICDAR, aspect ratio, Euler number, Bangla text.
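The first-pass heuristic filtering step can be sketched as a pass over candidate regions described by bounding-box size and pixel count. The thresholds below are illustrative assumptions, and the SWT-based second filter the paper applies is not reproduced.

```python
def filter_regions(regions, min_aspect=0.1, max_aspect=10.0,
                   min_fill=0.1, max_fill=0.95):
    """First-pass heuristic filter over candidate MSER regions.
    Each region is (width, height, pixel_count); regions with extreme
    aspect ratios or implausible bounding-box fill are discarded as
    non-text.  Threshold values are illustrative."""
    kept = []
    for w, h, pixels in regions:
        if w == 0 or h == 0:
            continue
        aspect = w / h
        fill = pixels / (w * h)  # fraction of the bounding box covered
        if min_aspect <= aspect <= max_aspect and min_fill <= fill <= max_fill:
            kept.append((w, h, pixels))
    return kept
```

Very wide or almost fully filled regions (lines, solid blobs) rarely correspond to characters, so cheap geometric checks like these remove many false positives before the more expensive stroke-width analysis.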
Received July 27, 2017; accepted June 19, 2018
https://doi.org/10.34028/iajit/17/3/11
A Smart Card Oriented Secure Electronic Voting Machine Built on NTRU
Safdar Shaheen1, Muhammad Yousaf1, and Mudassar Jalil2
1Riphah Institute of Systems Engineering, Riphah International University, Pakistan
2Department of Mathematics, COMSATS Institute of Information Technology, Pakistan
Abstract: Free and fair elections are indispensable for quantifying the sentiments of the populace when forming a government of representatives in democratic countries. Due to its procedural variation from country to country and its complexity, conducting an election is a challenging task. Since orthodox paper-based electoral systems are slow and error-prone, a secure and efficient electoral system has always remained a key area of research. Although a lot of literature is available on this topic, the reported anomalies and weaknesses in the American and French elections of 2016 have once again made it a pivotal subject of research. In this article, we propose a new secure and efficient electronic voting scheme based on the public key cryptosystem dubbed the Number Theory Research Unit (NTRU). Furthermore, an efficient and robust three-factor authentication protocol based on a personalized memorable password, a smartcard, and BioHash is proposed to validate the legitimacy of a voter casting a vote. NTRU-based blind signatures are used to preserve the anonymity and privacy of votes and voters, whereas secure and efficient counting of votes is achieved through an NTRU-based homomorphic tally. Non-coercibility and individual verifiability are attained through the MarkPledge scheme. The proposed electronic voting scheme is secure, transparent and efficient for large-scale elections.
Keywords: EVM, blind signature, homomorphic tally, smart card, NTRU.
Direct Text Classifier for Thematic Arabic Discourse Documents
Khalid Nahar1, Ra’ed Al-Khatib1, Moy'awiah Al-Shannaq1, Mohammad Daradkeh2, and Rami Malkawi3
1Department of Computer Sciences, Yarmouk University, Jordan
2Department of Management Information System, Yarmouk University, Jordan
3Department of Computer Information System, Yarmouk University, Jordan
Abstract: Maintaining topical coherence while writing a discourse is a major challenge confronting novice and non-novice writers alike. This challenge is even more intense with Arabic discourse because of the complex morphology and the widespread use of synonyms in the Arabic language. In this research, we present a direct classification of Arabic discourse documents during writing. The proposed framework consists of the following stages: data collection, pre-processing, construction of a Language Model (LM), topic identification, topic classification, and topic notification. To demonstrate the proposed framework, we designed a system and applied it to a corpus of 2800 Arabic discourse documents organized into four predefined topics: Culture, Economy, Sport, and Religion. System performance was analysed in terms of accuracy, recall, precision, and F-measure. The results demonstrated that the proposed topic modeling-based decision framework is able to classify topics while writing a discourse with an accuracy of 91.0%.
Keywords: Text mining, Arabic discourse, text classification, topic modeling, n-gram language model, topical coherence.
A Novel Image Retrieval Technique using Automatic and Interactive Segmentation
Asjad Amin and Muhammad Qureshi
Department of Telecommunication Engineering, The Islamia University of Bahawalpur, Pakistan
Abstract: In this paper, we present a new region-based image retrieval technique based on robust image segmentation. Traditional content-based image retrieval deals with the global description of a query image. We combine state-of-the-art segmentation algorithms with the traditional approach to narrow the area of interest to a specific region within a query image. In the case of automatic segmentation, the algorithm divides a query image automatically and computes Zernike moments for each region. For interactive segmentation, our proposed scheme takes as input a query image and some information regarding the region of interest. The proposed scheme then computes a Geodesic-based segmentation of the query image. The segmented image is our region of interest, which is then used for computing the Zernike moments. The Euclidean distance is then used to retrieve relevant images. The experimental results clearly show that the proposed scheme works efficiently and produces excellent results.
Keywords: CBIR, information retrieval, image segmentation, multimedia image retrieval.
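The final retrieval step, ranking database images by Euclidean distance between feature vectors, can be sketched as follows. The feature vectors stand in for the Zernike moments of the segmented region; their computation is not reproduced here.

```python
def retrieve(query_features, database, top_k=3):
    """Rank database entries (image_id, feature_vector) by Euclidean
    distance to the query's feature vector (e.g., Zernike moments of
    the segmented region) and return the `top_k` closest image ids."""
    def euclidean(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    ranked = sorted(database, key=lambda item: euclidean(query_features, item[1]))
    return [image_id for image_id, _ in ranked[:top_k]]
```

Because the features are computed only over the segmented region of interest, two images rank as similar when their regions match, even if their backgrounds differ entirely.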
A New Metric for Class Cohesion for Object Oriented Software
Anjana Gosain1 and Ganga Sharma2
1University School of Information, Communication and Technology, Guru Gobind Singh Indraprastha University, India
2School of Engineering, G D Goenka University, India
Abstract: Various class cohesion metrics exist in the literature, both at design level and at source code level, to assess the quality of Object Oriented (OO) software. However, the idea of cohesive interactions (or relationships) between instance variables (i.e., attributes) and methods of a class for measuring cohesion varies from one metric to another. Some authors have used instance variable usage by the methods of the class to measure class cohesion, while others focus on the similarity of methods based on sharing of instance variables. However, researchers believe that such metrics still do not properly capture the cohesiveness of classes; therefore, measures based on different perspectives on the idea of cohesive interactions should be developed. Consequently, in this paper, we propose a source code level class cohesion metric based on instance variable usage by methods. We first formalize three types of cohesive interactions and then categorize these cohesive interactions by assigning them ranks and weights in order to compute the proposed measure. To determine the usefulness of the proposed measure, theoretical validation using a property-based axiomatic framework has been carried out. For empirical validation, we have used Pearson correlation analysis and logistic regression in an experimental study conducted on 28 Java classes to determine the relationship between the proposed measure and the maintenance-effort of classes. The results indicate that the proposed cohesion measure is strongly correlated with maintenance-effort and can serve as a good predictor of it.
Keywords: Class cohesion, metrics, OO software, maintenance-effort, metric validation.
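A generic usage-based cohesion measure in the spirit of the metrics the paper builds on can be sketched as the average pairwise overlap of the instance-variable sets used by each method. This is a simple Jaccard-style illustration, not the authors' ranked and weighted interaction metric.

```python
def cohesion(method_vars):
    """Cohesion of a class from instance-variable usage: the average,
    over all method pairs, of the Jaccard overlap of the sets of
    variables each method uses.  `method_vars` maps method names to
    sets of instance-variable names.  An illustrative measure only."""
    methods = list(method_vars.values())
    n = len(methods)
    if n < 2:
        return 1.0  # a single method is trivially cohesive
    total = pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            union = methods[i] | methods[j]
            total += len(methods[i] & methods[j]) / len(union) if union else 0.0
            pairs += 1
    return total / pairs
```

A value near 1 means the methods work on the same state; a value near 0 suggests the class bundles unrelated responsibilities, which is exactly the property such metrics try to correlate with maintenance effort.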
Received June 18, 2017; accepted March 11, 2018
https://doi.org/10.34028/iajit/17/3/15
Gene Expression Prediction Using Deep Neural Networks
Raju Bhukya and Achyuth Ashok
Department of Computer Science and Engineering, National Institute of Technology, India
Abstract: In the field of molecular biology, gene expression is a term that encompasses all the information contained in an organism’s genome. Although researchers have developed several clinical techniques to quantitatively measure the expression of an organism’s genes, these are too costly to be used extensively. The NIH LINCS program revealed that human gene expressions are highly correlated. Further research at the University of California, Irvine (UCI) led to the development of D-GEX, a Multi-Layer Perceptron (MLP) model trained to predict unknown target expressions from previously identified landmark expressions. However, owing to hardware limitations, the authors had to split the target genes into different sets and construct separate models to profile the whole genome. This paper proposes an alternative solution using a combination of a deep autoencoder and an MLP to overcome this bottleneck and improve prediction performance. The microarray-based Gene Expression Omnibus (GEO) dataset was employed to train the neural networks. Experimental results show that this new model, abbreviated as E-GEX, outperforms D-GEX by 16.64% in terms of overall prediction accuracy on the GEO dataset. The models were further tested on an RNA-Seq based 1000G dataset, on which E-GEX was found to be 49.23% more accurate than D-GEX.
Keywords: Gene expression, regression, deep learning, autoencoder, multilayer perceptron.
Received April 25, 2018; accepted October 28, 2018
https://doi.org/10.34028/iajit/17/3/16