The trend towards innovative artificial intelligence applications and deep learning technologies continues unabated, creating a tremendous need for large-scale labelled training data to adequately train newly developed systems and their underlying machine learning models.
In particular, for audio classification, training data must come from large pools of speakers in order for models to generalise well. Current technologies make it possible to collect masses of new data via the Internet, making use of the ubiquitous embedded microphones in laptop PCs, tablets, and smartphones. This technological progress enables the collection of speech data under real-life conditions (e.g., different microphone types, devices, background noises, and reverberations, to name but a few) from a large number of speakers with different geographic origins, languages, dialects, cultural backgrounds, and age groups. Speech samples collected in the wild may also contain various types of environmental noise, such as crowd noise at events, traffic noise, and other city noise. This makes them ideally suited for research areas dealing with noise cancellation or source separation, e.g., modern speech recognition tasks.
Unfortunately, this mass of data is generally unstructured and lacks reliable labels, and high-quality annotation procedures are costly, time-consuming, and tedious. An answer to these big data needs has come in the form of a technique called “crowdsourcing”. Recently, numerous scientific research projects have moved away from collecting annotations only from groups of experts in a controlled laboratory setting and have started to recruit annotators by outsourcing the labelling tasks to an open, unspecified, and mostly untrained group of individuals on the web. Crowdsourcing can thus be harnessed for many different application areas and offers immediate access to a wide and diverse range of individuals with different backgrounds, knowledge, and skills, everywhere and at any time.
Hence, crowdsourcing has emerged as a collaborative approach highly applicable to the area of language and speech processing, offering a fast and effective way to gather a large number of labels that are of the same quality as those determined by small groups of experts, but at lower cost.
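To illustrate how labels from many untrained annotators can be combined into a single label per recording, the following is a minimal sketch that assumes simple majority voting. The text does not state which aggregation method iHEARu-PLAY uses, so the function `aggregate_labels`, the clip identifiers, and the emotion labels are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def aggregate_labels(annotations: Dict[str, List[str]]) -> Dict[str, Tuple[str, float]]:
    """Majority-vote aggregation (an assumed method, not the platform's documented one).

    Maps each clip id to (winning label, share of annotators who chose it).
    Clips without any annotation are skipped. On a tie, the label that was
    submitted first wins, because Counter preserves insertion order.
    """
    aggregated = {}
    for clip_id, labels in annotations.items():
        if not labels:
            continue
        label, votes = Counter(labels).most_common(1)[0]
        aggregated[clip_id] = (label, votes / len(labels))
    return aggregated


# Hypothetical crowd annotations for two recordings.
crowd = {
    "clip_001": ["happy", "happy", "neutral"],
    "clip_002": ["angry", "sad", "angry", "angry"],
}
aggregated = aggregate_labels(crowd)
for clip_id, (label, agreement) in aggregated.items():
    print(f"{clip_id}: {label} (agreement {agreement:.2f})")
```

The agreement share gives a rough indication of how reliable each aggregated label is; recordings with low agreement could be sent to further annotators before being used for training.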
In this context, we developed iHEARu-PLAY, an online, crowdsourcing-based, gamified platform for data collection, annotation, and analysis, together with its integrated web-based speech classification tool, the voice analysis application (VoiLA). Together, they encourage players to provide large-scale labelled speech data efficiently and on a voluntary basis while playing a game and supporting science.
Large-scale Data Collection and Analysis via a Gamified Intelligent Crowdsourcing Platform
Simone Hantke, Tobias Olenyi, Christoph Hausner, Tobias Appel, Björn Schuller.
Conclusions and outlook
Within this contribution, the browser-based crowdsourcing platform iHEARu-PLAY and its web-based speech classification tool VoiLA were introduced, with VoiLA following a unique approach by leveraging iHEARu-PLAY for speech annotation to obtain required training data.
In detail, VoiLA encourages people who helped annotate data, and anyone else, to try and evaluate the trained system by having their own voice analysed. It allows visitors to record and upload their voice directly from a website in their browser. On the backend, the uploaded speech data is run through a classification pipeline using a set of pre-trained models that target different kinds of speaker states and traits, such as gender, dominance, 24 kinds of emotions, arousal, and valence. The analysis results are then sent back to the player and visualised in the browser, giving players unique, objective insights into how their voice sounds.
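The backend flow described above (one uploaded recording, one shared feature extraction step, several task-specific models, one combined result) can be sketched as follows. This is a minimal illustration under stated assumptions, not VoiLA's implementation: `extract_features`, `analyse_recording`, and the toy models are hypothetical stand-ins, and the two features used here are far simpler than those of a real speech classifier.

```python
from typing import Callable, Dict, List, Sequence

# A "model" is any callable that maps a feature vector to a prediction.
Model = Callable[[Sequence[float]], object]


def extract_features(samples: Sequence[float]) -> List[float]:
    """Toy features: mean absolute amplitude and zero-crossing rate."""
    n = len(samples)
    if n == 0:
        raise ValueError("empty recording")
    energy = sum(abs(s) for s in samples) / n
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    zero_crossing_rate = crossings / max(n - 1, 1)
    return [energy, zero_crossing_rate]


def analyse_recording(samples: Sequence[float], models: Dict[str, Model]) -> Dict[str, object]:
    """Extract features once, then run every task-specific model on them."""
    features = extract_features(samples)
    return {task: model(features) for task, model in models.items()}


# Placeholder rules standing in for pre-trained models (assumptions only).
toy_models: Dict[str, Model] = {
    "arousal": lambda f: round(min(1.0, f[0]), 2),
    "valence": lambda f: round(1.0 - f[1], 2),
}

result = analyse_recording([0.1, -0.2, 0.3, -0.4], toy_models)
print(result)  # one entry per task, ready to be serialised and sent to the browser
```

Keeping the models in a dictionary keyed by task name means that a new speaker state or trait can be added without changing the pipeline code itself.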
Finally, an extensive player assessment and evaluation of the proposed first-of-its-kind platform and the introduced methods was performed. The player evaluation survey showed that the proposed system has excellent usability, bordering on the best imaginable, and that the task system proposed for voice recording is well accepted by the players. Additional player comments indicated that the accuracy of the emotion classification could be improved.
In the future, it is planned to integrate the concept of transfer learning, allowing for the adaptation of existing models to an unseen topic. Hence, the goal is to maximise the knowledge transfer from an existing task and obtain new knowledge relevant to a new task. This adaptive learning strategy could also allow for continuous improvement of the models of VoiLA. In addition, we will further improve the classifiers by retraining them with already collected and annotated player data within VoiLA. In this context, a future idea is to give players the possibility to train their own classifier, which in turn would help to improve the overall system. From a player's point of view, the performed recording and annotation tasks are handled in a gamified way and could be seen as a way of feeding their own “Tamagotchi”™ (i.e., the classifier), which can only grow with good care by performing annotation or recording tasks on a daily basis.
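One common way to realise the transfer learning idea mentioned above is to keep a pre-trained feature extractor fixed and train only a small new output layer on data from the new task. The sketch below assumes this particular strategy; the text does not specify which transfer learning method is planned. `frozen_encoder`, `train_head`, and the four training examples are hypothetical and chosen only to keep the example small.

```python
import math
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]
Encoder = Callable[[Sequence[float]], List[float]]


def frozen_encoder(x: Sequence[float]) -> List[float]:
    """Stand-in for a pre-trained feature extractor; it is never updated."""
    return [x[0] + x[1], x[0] - x[1]]


def train_head(data: List[Example], encoder: Encoder,
               epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Train a logistic-regression head on top of the frozen encoder with SGD."""
    weights = [0.0] * len(encoder(data[0][0]))
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            z = encoder(x)
            score = sum(w * zi for w, zi in zip(weights, z)) + bias
            prob = 1.0 / (1.0 + math.exp(-score))
            error = prob - y  # gradient of the log loss w.r.t. the score
            weights = [w - lr * error * zi for w, zi in zip(weights, z)]
            bias -= lr * error
    return weights, bias


def predict(x: Sequence[float], encoder: Encoder,
            weights: List[float], bias: float) -> int:
    z = encoder(x)
    return 1 if sum(w * zi for w, zi in zip(weights, z)) + bias > 0 else 0


# A handful of labelled examples for the new task (hypothetical data).
new_task: List[Example] = [
    ((0.9, 0.1), 1), ((0.8, 0.2), 1),
    ((0.1, 0.9), 0), ((0.2, 0.8), 0),
]
weights, bias = train_head(new_task, frozen_encoder)
preds = [predict(x, frozen_encoder, weights, bias) for x, _ in new_task]
print(preds)
```

Because only the head is trained, a small amount of newly annotated player data can be enough to adapt to a new task, which fits the idea of players gradually improving their own classifier.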
Other potential additions to VoiLA include giving players the possibility to have their voice analysed not only by machine learning but also by human annotators. We see our platform iHEARu-PLAY as an ideal platform to collect these manual labels and plan a tighter integration with VoiLA. Additionally, we are currently integrating the player feedback from the conducted evaluation survey.
Finally, a long-term goal is to develop and integrate a classifier that is capable of presenting the results to players in real time while they are speaking. Overall, VoiLA has the potential to popularise the science behind voice analysis and the annotation process of iHEARu-PLAY.