Assessing the efficacy of benchmarks for automatic speech accent recognition

Journal Title: EAI Endorsed Transactions on Creative Technologies - Year 2015, Vol 2, Issue 4

Abstract

Speech accents can carry valuable information about the speaker and can be used in intelligent multimedia-based human-computer interfaces. The performance of algorithms for automatic accent classification is often evaluated using audio datasets that contain recordings of different people representing different accents. Here we describe a method that can detect bias in accent datasets, and apply it to two accent identification datasets to reveal the existence of dataset bias, meaning that the datasets can be classified with accuracy higher than random even if the tested algorithm has no ability to analyze speech accent. We separated one second of silence from the beginning of each audio sample, such that the one-second segment contained no voice and therefore no information about the accent. An audio classification method was then applied to the datasets of silent audio samples and provided classification accuracy significantly higher than random. These results indicate that the performance of accent classification algorithms measured on some accent classification benchmarks can be biased, and can be driven by differences in background noise rather than by the auditory features of the accents.
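
The bias test described in the abstract can be illustrated with a minimal sketch. The abstract does not name the audio classification method, so the sketch swaps in a nearest-centroid classifier on two simple noise descriptors (RMS level and zero-crossing rate). It runs on synthetic recordings rather than the two datasets studied. The sampling rate, noise levels, class labels, and all function names are illustrative assumptions, not the authors' implementation.

```python
import math
import random

SAMPLE_RATE = 4000  # assumed sampling rate in Hz; not stated in the abstract


def synthetic_recording(noise_level, rng, sample_rate=SAMPLE_RATE):
    """Hypothetical recording: 1 s of background noise, then 1 s of a noisy tone."""
    silence = [rng.gauss(0.0, noise_level) for _ in range(sample_rate)]
    voiced = [
        0.5 * math.sin(2 * math.pi * 200 * t / sample_rate) + rng.gauss(0.0, noise_level)
        for t in range(sample_rate)
    ]
    return silence + voiced


def leading_second(samples, sample_rate=SAMPLE_RATE):
    """Return the first second of a recording, assumed to contain no voice."""
    return samples[:sample_rate]


def features(segment):
    """Two simple noise descriptors: RMS level and zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in segment) / len(segment))
    crossings = sum(1 for a, b in zip(segment, segment[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (len(segment) - 1))


def nearest_centroid_accuracy(train, test):
    """Fit one feature centroid per class on train; return accuracy on test."""
    sums = {}
    for label, feat in train:
        acc = sums.setdefault(label, [0.0, 0.0, 0])
        acc[0] += feat[0]
        acc[1] += feat[1]
        acc[2] += 1
    centroids = {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items()}

    def distance(feat, lab):
        c = centroids[lab]
        return math.hypot(feat[0] - c[0], feat[1] - c[1])

    correct = 0
    for label, feat in test:
        predicted = min(centroids, key=lambda lab: distance(feat, lab))
        correct += predicted == label
    return correct / len(test)


def run_bias_check(seed=0, per_class=20):
    """Classify silent segments only and compare accuracy with chance level."""
    rng = random.Random(seed)
    # Made-up recording conditions: each class has its own background noise level.
    noise_by_class = {"accent_a": 0.01, "accent_b": 0.03}
    train, test = [], []
    for label, level in noise_by_class.items():
        for i in range(per_class):
            recording = synthetic_recording(level, rng)
            item = (label, features(leading_second(recording)))
            # Alternate recordings between the training and test sets.
            (train if i % 2 == 0 else test).append(item)
    accuracy = nearest_centroid_accuracy(train, test)
    chance = 1 / len(noise_by_class)
    return accuracy, chance


if __name__ == "__main__":
    accuracy, chance = run_bias_check()
    print(f"accuracy on silent segments: {accuracy:.2f} (chance: {chance:.2f})")
```

If accuracy on the silent segments is well above chance, the class labels are predictable from recording conditions alone. In that case, accuracy reported on the full recordings cannot be attributed to accent features without further controls.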

Authors and Affiliations

Benjamin Bock, Lior Shamir

  • EP ID EP45835
  • DOI http://dx.doi.org/10.4108/icst.mobimedia.2015.259033

How To Cite

Benjamin Bock, Lior Shamir (2015). Assessing the efficacy of benchmarks for automatic speech accent recognition. EAI Endorsed Transactions on Creative Technologies, 2(4), -. https://www.europub.co.uk/articles/-A-45835