Middle Eastern and North African English Speech Corpus (MENAESC):
Automatic Identification of MENA English Accents
Abstract: This study explores English accents in the Arab world. Although
limited resources exist for automatically identifying the degree of accent
patterns of Arabic speakers of English, no speech corpus is dedicated to Arabic
speakers of English in the Middle East and North Africa (MENA). To fill this
gap, we collected speech samples to create a linguistic resource that we call
the Middle Eastern and North African English Speech Corpus (MENAESC). In addition
to the “accent approach” applied in the field of automatic language/dialect
recognition, we also applied the “macro-accent approach”, employing
Mel-Frequency Cepstral Coefficients (MFCC), Energy, and Shifted Delta Cepstra
(SDC) features with a Gaussian Mixture Model-Universal Background Model
(GMM-UBM) classifier, to four accents (Egyptian, Qatari, Syrian, and Tunisian)
among the eleven accents that were selected based on their high population
density in the location where the experiments were carried out. We used the
Equal Error Rate percentage (EER%) to assess the effectiveness of our system in
identifying MENA English accents on the MENAESC with both approaches. The
system reached an EER of 1.5 to 2% with the “accent approach” and 2 to 3.5%
with the “macro-accent approach”. The results also showed that, of the four
accents included, the Qatari accent scored the lowest EER% in all tests
performed. Taken together, the results indicate that system effectiveness is
affected not only by the approach used but also by the size and characteristics
of the MENAESC database. It is also affected by the proficiency of the Arabic
speakers of English and the influence of their mother tongue.
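As an illustration of the EER% metric reported above, the following is a minimal sketch of how an equal error rate can be estimated from trial scores. It assumes that each trial yields one score and that a higher score means the claimed accent is more likely. The function name `equal_error_rate` and the toy scores are hypothetical and are not taken from the study.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER, as a percentage, from two lists of trial scores.

    target_scores: scores of trials where the claimed accent is correct.
    nontarget_scores: scores of trials where the claimed accent is wrong.
    Assumption: a trial is accepted when its score is >= the threshold.
    """
    target = list(target_scores)
    nontarget = list(nontarget_scores)
    if not target or not nontarget:
        raise ValueError("both score lists must be non-empty")

    best_gap = None
    eer = None
    # Try every observed score as a decision threshold.
    for threshold in sorted(set(target + nontarget)):
        # False rejection rate: correct claims scored below the threshold.
        frr = sum(s < threshold for s in target) / len(target)
        # False acceptance rate: wrong claims scored at or above it.
        far = sum(s >= threshold for s in nontarget) / len(nontarget)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2.0
    return 100.0 * eer


# Toy scores for illustration only (not data from the study).
toy_target = [0.9, 0.8, 0.4, 0.7]
toy_nontarget = [0.1, 0.5, 0.3, 0.2]
toy_eer = equal_error_rate(toy_target, toy_nontarget)
```

With only a handful of trials the two error rates may never be exactly equal, so this sketch averages them at the closest threshold; evaluation toolkits often interpolate between thresholds instead.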
Keywords: MENAESC, MFCC+Energy and SDC features, accent, macro-accent,
automatic identification.
Received September 9, 2019; accepted April 8, 2020