Despite their successes, all the GWAP described earlier were designed with a specific aim and target a single-modal labelling task. Furthermore, these GWAP can only be applied for their intended purpose and, owing to the way they are implemented, cannot easily be adapted to other labelling or data collection procedures. In this regard, the authors proposed the crowdsourcing platform iHEARu-PLAY. The platform is accessible on any standard PC or smartphone and offers audio, video, and image labelling for a diverse range of annotation tasks. It also supports audio-visual or audio-only data collection and analysis, drawing on a range of novel annotator trustability-based machine learning algorithms to reduce the amount of manual annotation work [14, 15, 48].
In detail, iHEARu-PLAY was realised with the free and open-source Python web framework Django, using a free HTML5 theme that ensures compatibility with all current browsers on standard PCs and mobile devices. Data recordings and annotations can therefore be collected at any time and anywhere, as long as audio can be played back and/or a microphone is available.
iHEARu-PLAY's primary intended use is the collection and annotation of audio datasets. It is, however, modality-independent, i.e., images and videos can also be collected and annotated via the platform. iHEARu-PLAY offers a wide range of multi-task annotation options, including discrete (single-choice and multiple-choice), discrete numeric, continuous numeric, continuous numeric 2D, time-continuous numeric, self-assessment manikin, pairwise comparison, and free-input labels.
An overview of the different components in iHEARu-PLAY is given in Fig. 1. Shown are the data collection options, the pre-processing and intelligent audio analysis components, the integrated machine learning component, and the annotator trustability score calculation[14, 15]:
Data collection. New data can be collected either directly with the built-in recording feature of iHEARu-PLAY or via VoiLA, which is described in more detail in the following sections.
Pre-processing. The recorded speech of each player automatically passes through a pre-processing step that ensures good audio quality by applying voice and event activity detection and volume normalisation to the recordings.
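The paper does not give code for this step; as an illustration, a minimal sketch of the two operations named above, i.e., energy-based activity detection and peak normalisation. The frame length and energy threshold are chosen purely for illustration and are not the platform's actual parameters:

```python
import math

def rms(frame):
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def trim_silence(samples, frame_len=160, threshold=0.01):
    """Energy-based activity detection: keep only frames whose
    RMS energy exceeds the threshold; silent frames are dropped."""
    voiced = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        if rms(frame) > threshold:
            voiced.extend(frame)
    return voiced

def normalise_volume(samples, target_peak=0.9):
    """Scale the recording so its absolute peak reaches target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return samples
    gain = target_peak / peak
    return [s * gain for s in samples]
```

A production system would of course operate on real audio buffers and use a more robust voice activity detector than a fixed energy threshold.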
Intelligent audio analysis. For the annotation part, data owners and researchers upload their audio data to iHEARu-PLAY, where it automatically passes through the intelligent audio analysis (IAA) component. After one of several available feature sets has been chosen (e.g., IS09 with 384 features or IS16 ComParE with 6 373 features), the acoustic features are automatically extracted with the integrated openSMILE toolkit. A classifier is then trained on the small amount of pre-labelled training data available on iHEARu-PLAY, and the results are passed to the trustability-based machine learning component.
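openSMILE is typically driven through its SMILExtract command-line tool. As a sketch, this is how a feature-extraction call for one recording might be assembled; the binary name, configuration file, and paths are placeholders, and whether iHEARu-PLAY shells out in exactly this way is an assumption:

```python
def build_smile_command(wav_path, config_path, out_csv,
                        binary="SMILExtract"):
    """Assemble an openSMILE command line that extracts one feature
    vector (e.g., the IS09 or ComParE set, depending on config_path)
    from wav_path into out_csv. All paths here are placeholders."""
    return [binary, "-C", config_path, "-I", wav_path, "-O", out_csv]

# On a machine with openSMILE installed, one would then run:
# subprocess.run(build_smile_command("clip.wav", "IS09_emotion.conf",
#                                    "features.csv"), check=True)
```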
Machine learning. A range of selectable machine learning algorithms sort the unlabelled instances from highest to lowest prediction confidence and extract a subset of instances based on these confidence values. The low- and medium-confidence subset is then removed from the unlabelled data and automatically passed on for manual labelling, taking the annotators' trustability scores into account.
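The routing step described above can be sketched as follows; the confidence threshold is an illustrative assumption, not a value from the paper:

```python
def route_by_confidence(predictions, high=0.85):
    """Split model predictions into an auto-label set (high confidence)
    and a manual-annotation queue (low/medium confidence).
    `predictions` maps instance id -> (label, confidence in [0, 1])."""
    ranked = sorted(predictions.items(),
                    key=lambda kv: kv[1][1], reverse=True)
    auto = [(inst, label) for inst, (label, conf) in ranked if conf >= high]
    manual = [inst for inst, (label, conf) in ranked if conf < high]
    return auto, manual
```

Instances in `manual` would then be shown to human players, while `auto` labels are accepted without further annotation effort.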
Trustability score and annotation reduction. Low-quality annotations can result in the model being trained on incorrectly labelled data, which may reduce the accuracy of the trained classifier. One of the goals of iHEARu-PLAY is therefore to obtain annotations from non-expert annotators that come qualitatively close to gold-standard annotations created by experts. To this end, several data quality mechanisms are applied, such as pre-time quality checks and tracking of the annotators' behaviour.
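The paper does not spell out how trustability scores enter the label aggregation; one common scheme, shown here purely as an assumed sketch rather than the published algorithm, is a majority vote in which each annotator's vote is weighted by their trustability score:

```python
def weighted_majority(annotations, trust):
    """Aggregate one instance's crowd labels into a gold-standard
    estimate, weighting each vote by the annotator's trustability.
    `annotations` maps annotator id -> label;
    `trust` maps annotator id -> score in [0, 1]."""
    votes = {}
    for annotator, label in annotations.items():
        votes[label] = votes.get(label, 0.0) + trust.get(annotator, 0.0)
    return max(votes, key=votes.get)
```

Under such a scheme, a single highly trusted annotator can outweigh several low-trust ones, which is exactly what protects the gold standard from careless labelling.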
For more information on the outlined components, the reader is referred to [14, 15]. In summary, iHEARu-PLAY has many advantages over conventional crowdsourcing platforms and is unique in that it provides volunteers with a game-like environment in which to record and annotate speech, i.e., work is presented to players in an interesting and accessible way by incorporating elements that are typically found only in games.
In this context, just as humans differ from each other, player types can differ greatly as well. Not every player reacts to or experiences game design elements in the same way, which makes it difficult to anticipate how well certain design decisions will be received. People have different interests and preferences: some enjoy the narrative of a game or play to compete with others, whereas others find competition or collecting points irrelevant and instead enjoy socialising with others or interacting with the world. Therefore, within the literature, people are categorised into four different types of players: achievers, explorers, socialisers, and killers (Table 1). It should be noted, though, that these player types are theoretical extremes of players and their behaviours. In practice, players usually exhibit characteristics of all player types, with only one or two playing styles and behaviours being predominant.
Player type: Characteristics
Achievers: Enjoy completing challenges → collecting achievements, points, or items and levelling up
Explorers: Have a thirst for adventure → exploring the game and striving to discover new features
Socialisers: Like to stay in the social circle → socialising or interacting with other players
Killers: Look for open battles → using the virtual construct to cause distress to other players
Table 1. General overview of four different possible player types and their typical characteristics. Note that these types apply to players in general and are not specific to iHEARu-PLAY.
Therefore, to include as many players as possible, iHEARu-PLAY provides a specially designed gamification concept, awarding collectable points for each annotation or recording handed in by a player according to different mechanisms. In this context, the platform takes the interests of the different player types into account and utilises a combination of points, leaderboards, badges, and a social platform. These gamification elements are also the most widely used, since, if applied right, they are known to be powerful, practical, and relevant, and potentially able to turn mundane labelling work into a more enjoyable and motivating task. For more details on iHEARu-PLAY's gamification concept, the reader is referred to the work presented in .
An evaluation study was conducted to assess the effectiveness of the current system, to determine what could be improved, and to identify the needs and wishes of the players for new features. We evaluated the iHEARu-PLAY platform and VoiLA to answer the following questions:
● How do players feel about the design and content of the platform?
● What is the usability of the current prototype? What are possible usability improvements?
● How interesting are the different recording tasks?
● How well are the current features accepted?
● What would players like to see added to the platform experience?
● What do players dislike about the platform and how can these issues be improved?
To ensure that all data necessary to answer these questions could be collected, an evaluation survey② was tailored specifically to iHEARu-PLAY. In addition, the system usability scale (SUS) by Brooke was included to evaluate the usability of the platform in a comparable manner.
Over the course of two months, 157 players participated in the online survey describing their iHEARu-PLAY experience. Of the 157 annotators, 131 provided their complete metadata (Table 2). Among these participants were 72 male and 59 female volunteers. Altogether, we reached a variety of ages from 18 to 57 years (mean: 31.3, standard deviation: 9.6). A large majority of participants were students (78.2%), followed by people employed for wages (18.7%) and self-employed participants (3.1%). Most players held a high school degree or equivalent (51.7%) as their highest academic degree, followed by a bachelor's degree (18.3%), a master's degree (12.3%), and other qualifications (8.4%). Only a few had attended college without obtaining a degree (3.4%) or held no qualification (5.9%).
Gender: 72 male / 59 female
Age: range = 18–57 years; mean = 31.3 years; standard deviation (SD) = 9.6 years
Occupation: students (78.2%), employed for wages (18.7%), self-employed (3.1%)
Education: high school degree or equivalent (51.7%), bachelor's degree (18.3%), master's degree (12.3%), other qualifications (8.4%), college without degree (3.4%), no qualification (5.9%)
Table 2. Statistics of the participants of the user evaluation. Note: Of 157 overall annotators, only 131 provided their complete metadata.
Concerning the usability of iHEARu-PLAY, evaluation of the collected data shows that the platform reaches an 87.9% SUS usability score (Table 3). According to Bangor et al., who divided this scale into categories, this suggests that iHEARu-PLAY has an excellent, bordering on best imaginable, usability. iHEARu-PLAY's content (86.6%) and design (83.5%) were rated similarly positively, followed by ratings of the platform as interesting (79.8%) and fun to use (68.1%). Independent of these results, we are aware that there is still room for improvement, such as optimising the usability of our mobile version.
Topic                     Rating %   SD
General    Content          86.6     1.8
           Design           83.5     2.3
           Usability        87.9     1.6
           Fun              68.1     2.9
           Interest         79.8     1.7
Tasks      Annotation       81.5     2.1
Recording  Acting           84.6     1.4
           Image            85.2     1.6
           Text             76.1     2.3
Results    Acceptance       72.4     1.9
           Alteration       68.3     1.4
           Presentation     73.6     0.9
Table 3. Results of the evaluation survey. Overall results are displayed as star ratings (intervals incrementing in 20% steps), followed by absolute numbers (0–100%) and standard deviations
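The SUS score reported above follows Brooke's standard scoring of ten five-point Likert items: odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is scaled by 2.5 onto a 0–100 range. A compact sketch of that computation:

```python
def sus_score(responses):
    """System Usability Scale score from ten 1-5 Likert responses
    (item 1 is responses[0]). Odd-numbered items contribute (r - 1),
    even-numbered items (5 - r); the sum is scaled by 2.5 to 0-100."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten item responses")
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5
```

Averaging this score over all survey participants yields the single percentage figure reported in Table 3.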
The acceptance rates of our annotation and recording features with their different tasks were measured individually on a five-point Likert scale. While the annotation feature achieved an acceptance rate of 81.5%, among the recording tasks the image task reached the highest acceptance rate (85.2%), followed by the game/acting task (84.6%) and the text task (76.1%). This suggests that players generally prefer visual or interactive tasks over the less demanding text task.
The analysed results for VoiLA show an acceptance rate of 72.4%, while its presentation was rated at 73.6% and the alteration feature at 68.3% by the players. The obtained evaluation results are summarised in Fig. 5.
To gather insights into the opinions of players, and to receive more detailed feedback and feature requests, we encouraged survey participants to submit free-text comments in which they could explain their survey choices, request features, or emphasise positive aspects. Among other things, participants mostly reported on the VoiLA feature. A representative example was a blurriness of the classifier near the edges of emotion classes, i.e., emotions that lie close together, such as irritation and anger, being confused. This issue will be addressed in a future release of VoiLA, in which we will publish an improved classification system based on the label corrections that players can already perform today. Another common player request was the introduction of more diverse recording tasks to make the recording procedure even more interesting and fun. This feature is already under development and is being implemented as an additional task in which players play a small game while recording their speech.
Overall, the system predominantly received positive feedback: players stated that iHEARu-PLAY and VoiLA were easy to use and that it was interesting to see an automatic analysis demonstrated on their own voice. Additionally, the analysis increased interest in the science behind voice analysis and the willingness to participate in improving the system by performing annotation tasks. This feedback allows the conclusion that iHEARu-PLAY is broadly accepted among players.
Large-scale Data Collection and Analysis via a Gamified Intelligent Crowdsourcing Platform
- Received: 2018-09-08
- Accepted: 2019-03-26
- Published Online: 2019-06-06
Abstract: In this contribution, we present iHEARu-PLAY, an online, multi-player platform for crowdsourced database collection and labelling, including the voice analysis application (VoiLA), a free web-based speech classification tool designed to educate iHEARu-PLAY users about state-of-the-art speech analysis paradigms. Via this associated speech analysis web interface, in addition, VoiLA encourages users to take an active role in improving the service by providing labelled speech data. The platform allows users to record and upload voice samples directly from their browser, which are then analysed in a state-of-the-art classification pipeline. A set of pre-trained models targeting a range of speaker states and traits such as gender, valence, arousal, dominance, and 24 different discrete emotions is employed. The analysis results are visualised in a way that they are easily interpretable by laymen, giving users unique insights into how their voice sounds. We assess the effectiveness of iHEARu-PLAY and its integrated VoiLA feature via a series of user evaluations which indicate that it is fun and easy to use, and that it provides accurate and informative results.
Citation: Simone Hantke, Tobias Olenyi, Christoph Hausner, Tobias Appel and Björn Schuller. Large-scale Data Collection and Analysis via a Gamified Intelligent Crowdsourcing Platform. International Journal of Automation and Computing, vol. 16, no. 4, pp. 427-436, 2019. doi: 10.1007/s11633-019-1180-0