Introduction

This article introduces Audio Soup, an open-source tool designed for interactive machine learning (IML) on audio datasets. By supporting sample review and feature selection, Audio Soup aims to enable non-expert users to participate actively in the development of algorithmic systems. The sections below give an overview of Audio Soup’s functionality and its potential to improve transparency, fairness, and accountability in machine learning.

Audio Soup Tool: Key Features and Functionality

Audio Soup is a browser-based interface built in Python and backed by a PostgreSQL database. It is packaged as a Docker container, so it can be deployed easily across operating systems. The tool ships with command-line utilities for loading datasets, and a demo dataset derived from the Google Speech Commands dataset is available for experimentation.
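
Audio Soup’s own loading utilities are not reproduced here, but the following minimal sketch illustrates the general idea under a few assumptions: a hypothetical `samples` table, Postgres accessed via psycopg2, and a directory of WAV files laid out like the Google Speech Commands dataset (the parent folder name serves as the label).

```python
# Illustrative sketch only: Audio Soup's real CLI, schema, and table names may
# differ. Assumes psycopg2 and soundfile are installed and Postgres is running.
from pathlib import Path

import psycopg2
import soundfile as sf


def load_dataset(audio_dir, dsn="dbname=audiosoup user=postgres"):
    """Walk a directory of WAV files and register each one in Postgres."""
    conn = psycopg2.connect(dsn)
    with conn, conn.cursor() as cur:
        cur.execute(
            """CREATE TABLE IF NOT EXISTS samples (
                   id SERIAL PRIMARY KEY,
                   path TEXT UNIQUE,
                   label TEXT,
                   sample_rate INTEGER,
                   duration_s REAL
               )"""
        )
        for wav in sorted(Path(audio_dir).rglob("*.wav")):
            info = sf.info(str(wav))
            # In the Google Speech Commands layout the parent folder is the
            # label, e.g. "yes/0a7c2a8d_nohash_0.wav" -> label "yes".
            cur.execute(
                "INSERT INTO samples (path, label, sample_rate, duration_s) "
                "VALUES (%s, %s, %s, %s) ON CONFLICT (path) DO NOTHING",
                (str(wav), wav.parent.name, info.samplerate, info.duration),
            )
    conn.close()


if __name__ == "__main__":
    load_dataset("speech_commands_demo/")
```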

Sample Review (Grid View)

The primary view of Audio Soup is the “grid view,” which presents audio samples in a paginated card layout built with the Bulma CSS framework. Users can filter samples by label or browse the entire dataset, and open a modal card to review an individual sample. Each sample card contains a waveform image generated dynamically at page load, the sample’s metadata, and an audio player for playback. From a card, users can edit the metadata or navigate to the feature selection view for that sample.
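
Audio Soup’s rendering code is not shown here, but as a rough illustration of how a waveform image can be produced server-side at page load, the sketch below uses librosa and matplotlib (these libraries are assumptions for the example, not confirmed implementation details).

```python
# Illustrative sketch only: not Audio Soup's actual rendering code.
import io

import librosa
import librosa.display
import matplotlib
matplotlib.use("Agg")  # headless backend for server-side rendering
import matplotlib.pyplot as plt


def waveform_png(path):
    """Render a sample's waveform to PNG bytes for embedding in a card."""
    y, sr = librosa.load(path, sr=None)      # keep the native sample rate
    fig, ax = plt.subplots(figsize=(4, 1.5))
    librosa.display.waveshow(y, sr=sr, ax=ax)
    ax.set_axis_off()                        # the card only needs the shape
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)
    return buf.getvalue()
```

Rendering into an in-memory buffer rather than a file on disk keeps the “generated at page load” behaviour simple and avoids serving stale images when a sample or its metadata changes.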

Feature Selection

The feature selection view in Audio Soup enables users to explore the feature space of an audio sample. It presents three categories of features: spectral, rhythmic, and delta. Spectral features, such as the Mel Spectrogram, Tonal Centroid (Tonnetz), and Spectral Contrast, describe frequency and pitch content over time. Rhythmic features, including the Tempogram and Fourier Tempogram, capture tempo-related information such as the pattern of onsets within a sample. Delta features are derived from existing features, for example the first- and second-order derivatives of the Mel Spectrogram, and reveal how those features change over time. Each feature is accompanied by a brief explanation and links to relevant resources such as Wikipedia articles or research papers.
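
As a concrete reference point, all of the features listed above are available in the librosa library; the sketch below shows how they might be computed for a single sample. Parameters and post-processing are illustrative and may not match Audio Soup’s actual pipeline.

```python
# Illustrative sketch only: standard librosa APIs, default parameters.
import librosa
import numpy as np


def extract_features(path):
    y, sr = librosa.load(path, sr=None)

    # Spectral features: frequency and pitch content over time.
    mel = librosa.feature.melspectrogram(y=y, sr=sr)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    tonnetz = librosa.feature.tonnetz(y=y, sr=sr)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)

    # Rhythmic features: onset and tempo structure over time.
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    tempogram = librosa.feature.tempogram(onset_envelope=onset_env, sr=sr)
    fourier_tgram = librosa.feature.fourier_tempogram(onset_envelope=onset_env, sr=sr)

    # Delta features: first- and second-order derivatives of the Mel spectrogram.
    mel_delta = librosa.feature.delta(mel_db)
    mel_delta2 = librosa.feature.delta(mel_db, order=2)

    return {
        "mel_spectrogram": mel_db,
        "tonnetz": tonnetz,
        "spectral_contrast": contrast,
        "tempogram": tempogram,
        "fourier_tempogram": np.abs(fourier_tgram),  # complex-valued; keep magnitude
        "mel_delta": mel_delta,
        "mel_delta2": mel_delta2,
    }
```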

Next Steps and Future Development

While Audio Soup is currently available as a prototype (v0.1.0) in a Docker container, there are several planned enhancements for future releases. The development roadmap includes features like cross-sample feature comparison, support for exporting features in various file formats (e.g., CSV, YAML, plain text), feature manipulation or augmentation filters, expanded annotations for sample review, and semantic text representation to improve annotation quality. User testing will be conducted to gather feedback and prioritize development efforts based on the needs and preferences of both non-expert users and those with domain expertise.
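
As a rough sketch of what the planned feature export could look like, the snippet below writes a feature matrix to CSV with a small YAML metadata sidecar. The function name and file layout are purely hypothetical and are not part of v0.1.0.

```python
# Purely illustrative: the export feature is a roadmap item, not an existing
# Audio Soup capability. Assumes numpy and PyYAML.
import numpy as np
import yaml


def export_feature(name, matrix, stem):
    """Write a 2-D feature matrix to CSV plus a small YAML metadata sidecar."""
    np.savetxt(f"{stem}_{name}.csv", matrix, delimiter=",")  # one row per feature bin
    with open(f"{stem}_{name}.yaml", "w") as fh:
        yaml.safe_dump({"feature": name, "shape": list(matrix.shape)}, fh)
```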

Conclusion

Audio Soup represents a valuable contribution to the field of interactive machine learning, specifically in the analysis of audio datasets. By providing non-expert users with accessible tools for sample review and feature selection, Audio Soup aims to democratize machine learning development and increase transparency, fairness, and accountability. The tool’s open-source nature and the ongoing development efforts ensure that it can be utilized and improved by researchers and enthusiasts alike.
