Audio data publisher
General Description
The Audio ZMQ component is a high-performance audio data processing system that captures audio from input devices and makes it available to other components via ZeroMQ (ZMQ) sockets. It provides two services, audio capture and audio playback, both configurable through HTTP APIs. The capture service reads from an audio input device, converts the captured audio into a standardized PCM format and publishes it on a ZMQ PUB socket. The playback service receives audio data from ZMQ sockets or via file upload and routes it to an output device. Each service offers a RESTful API for configuring audio parameters (sample rate, channels, buffer size) and selecting the device, with interactive documentation available at http://localhost:8000/docs (capture) and http://localhost:8010/docs (playback).
| Resource | Link |
|---|---|
| Source code | https://gitc.piap.lukasiewicz.gov.pl/ai-prism/wp3/ambient-digitalization/audio-zmq |
| Demo Video | |
Contact
The following table includes contact information of the main developers in charge of the component:
| Name | Organisation | Email |
|---|---|---|
| Dorin Clisu | | dorin.clisu@nttdata.com |
License
Proprietary.
Technical Foundations
The Audio ZMQ component is built on asynchronous programming with AnyIO, which provides a unified interface over the asyncio and trio backends. Audio capture and playback are implemented with the SoundDevice library, which provides Python bindings for PortAudio. ZeroMQ handles high-performance message passing between services, using a publisher-subscriber pattern to stream audio data. FastAPI exposes RESTful endpoints for configuration and control.
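The publisher-subscriber pattern can be illustrated with a minimal sketch using pyzmq, the Python binding for ZeroMQ (assumed to be installed). This is not the component's code: the endpoint name, the in-process `inproc://` transport (chosen so publisher and subscriber run in one process) and the single-frame message layout are assumptions made for illustration.

```python
import zmq


def pubsub_roundtrip(payload: bytes, endpoint: str = "inproc://audio-demo",
                     attempts: int = 100) -> bytes:
    """Publish `payload` on a PUB socket and return what a SUB socket receives.

    The endpoint is hypothetical; a real consumer would connect to the
    component's published tcp:// address instead.
    """
    ctx = zmq.Context()
    pub = ctx.socket(zmq.PUB)
    sub = ctx.socket(zmq.SUB)
    try:
        pub.bind(endpoint)                 # publisher owns the endpoint
        sub.connect(endpoint)              # subscribers connect to it
        sub.setsockopt(zmq.SUBSCRIBE, b"")  # empty prefix = receive every message
        poller = zmq.Poller()
        poller.register(sub, zmq.POLLIN)
        # PUB drops messages until the subscription has propagated
        # (the "slow joiner" effect), so resend until one arrives.
        for _ in range(attempts):
            pub.send(payload)
            if poller.poll(timeout=10):    # timeout in milliseconds
                return sub.recv()
        raise TimeoutError("subscriber received nothing")
    finally:
        pub.close(linger=0)
        sub.close(linger=0)
        ctx.term()
```

Because a PUB socket silently drops messages that no subscriber is ready to receive, a consumer that connects late misses earlier audio blocks and should expect to join the stream mid-flow.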
Integrated and Open Source Components
Overview
The Audio ZMQ component builds on several open-source libraries from Python's ecosystem for audio processing, asynchronous programming and API development. The core dependencies are audio processing libraries (sounddevice, soundfile, numpy), an asynchronous framework (anyio), HTTP API tools (fastapi, uvicorn, python-multipart) and a custom utility (runner-with-api). Together, these components allow the system to capture, process and distribute audio data across services.
Pre-existing Components
AnyIO
Source
Open source: https://github.com/agronholm/anyio
Description
AnyIO is an asynchronous I/O library that provides a unified interface for asynchronous programming, working with either asyncio or trio backends.
Modifications
None.
Purpose in AI-PRISM
AnyIO serves as the foundation for asynchronous operations in the Audio ZMQ component, enabling non-blocking audio processing and API handling.
License
MIT License: https://github.com/agronholm/anyio/blob/master/LICENSE
FastAPI
Source
Open source: https://github.com/tiangolo/fastapi
Description
FastAPI is a modern, high-performance web framework for building APIs with Python based on standard type hints.
Modifications
None.
Purpose in AI-PRISM
FastAPI powers the HTTP API layer of the Audio ZMQ component, allowing other services to configure audio parameters and control capture/playback operations.
License
MIT License: https://github.com/tiangolo/fastapi/blob/master/LICENSE
NumPy
Source
Open source: https://github.com/numpy/numpy
Description
NumPy is the fundamental package for scientific computing in Python, providing support for large, multi-dimensional arrays and matrices.
Modifications
None.
Purpose in AI-PRISM
NumPy is used for efficient manipulation and processing of audio data arrays, particularly for formatting PCM audio data.
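Formatting floating-point samples as PCM typically comes down to a clip, scale and cast sequence in NumPy. The sketch below assumes 16-bit signed little-endian PCM and the `(frames, channels)` array layout used by SoundDevice; the component's actual sample format is configurable and may differ, and the function names are hypothetical.

```python
import numpy as np

PCM16_SCALE = 32767.0  # full-scale value for signed 16-bit samples


def float_to_pcm16(samples: np.ndarray) -> bytes:
    """Convert floats in [-1.0, 1.0], shape (frames, channels), to interleaved little-endian int16 bytes."""
    clipped = np.clip(samples, -1.0, 1.0)    # guard against out-of-range values before scaling
    scaled = np.rint(clipped * PCM16_SCALE)  # round to the nearest integer sample value
    return scaled.astype("<i2").tobytes()    # C order yields frame-interleaved output


def pcm16_to_float(data: bytes, channels: int) -> np.ndarray:
    """Inverse of float_to_pcm16: decode interleaved int16 bytes into a (frames, channels) float32 array."""
    ints = np.frombuffer(data, dtype="<i2").reshape(-1, channels)
    return ints.astype(np.float32) / np.float32(PCM16_SCALE)
```

Because `tobytes()` serializes a `(frames, channels)` array row by row, the output is interleaved (L0 R0 L1 R1 ...), which is the usual layout for raw PCM streams.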
License
BSD 3-Clause License: https://github.com/numpy/numpy/blob/main/LICENSE.txt
Python-Multipart
Source
Open source: https://github.com/andrew-d/python-multipart
Description
Python-Multipart is a streaming multipart parser for Python that supports file uploads.
Modifications
None.
Purpose in AI-PRISM
Enables file uploads in the playback API, so that audio files can be sent to the service for playback.
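On the client side, an audio clip can be uploaded to the playback API as `multipart/form-data`, the encoding that Python-Multipart parses on the server. The sketch below uses only the standard library; the upload URL, the use of POST and the form field name `file` are assumptions, so check http://localhost:8010/docs for the real endpoint.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple


def encode_multipart_file(field: str, filename: str, data: bytes,
                          content_type: str = "application/octet-stream") -> Tuple[bytes, str]:
    """Build a multipart/form-data body holding a single file part."""
    boundary = uuid.uuid4().hex  # random, so very unlikely to occur inside the payload
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_clip(url: str, path: str, field: str = "file", timeout: float = 30.0) -> int:
    """POST an audio file to `url` (hypothetical endpoint) and return the HTTP status code."""
    file_path = Path(path)
    content_type = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
    body, header = encode_multipart_file(field, file_path.name, file_path.read_bytes(), content_type)
    request = urllib.request.Request(url, data=body, headers={"Content-Type": header}, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

In practice a library such as `requests` builds the same body with less code; the hand-written encoder only makes the wire format visible.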
License
Apache License 2.0: https://github.com/andrew-d/python-multipart/blob/master/LICENSE.txt
Runner-with-API
Source
Open source: https://github.com/dorinclisu/runner-with-api.git
Description
A utility library that combines process management with API functionality, enabling services to be controlled via HTTP endpoints.
Modifications
None.
Purpose in AI-PRISM
Provides the foundational structure for running the audio services as background processes while exposing control APIs.
License
MIT License: https://github.com/dorinclisu/runner-with-api/blob/master/LICENSE
SoundDevice
Source
Open source: https://github.com/spatialaudio/python-sounddevice
Description
SoundDevice provides Python bindings for the PortAudio library, allowing access to audio input and output devices.
Modifications
None.
Purpose in AI-PRISM
Provides the core functionality for interfacing with audio hardware, handling device discovery, configuration, and audio data streaming.
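Device discovery with SoundDevice usually starts from `sounddevice.query_devices()`, which returns one dictionary per device with keys such as `name`, `max_input_channels` and `default_samplerate`. The sketch below filters that list down to capture-capable devices and opens an `InputStream` whose callback forwards each block. The helper names and the `int16` sample type are assumptions, and the filtering function works on plain dictionaries so it can be exercised without audio hardware.

```python
from typing import Any, Callable, List, Mapping, Sequence, Tuple

DeviceInfo = Tuple[int, str, int, float]


def input_devices(devices: Sequence[Mapping[str, Any]]) -> List[DeviceInfo]:
    """Return (index, name, max_input_channels, default_samplerate) for capture-capable devices."""
    return [
        (index, dev["name"], dev["max_input_channels"], dev["default_samplerate"])
        for index, dev in enumerate(devices)  # list position is the PortAudio device index
        if dev["max_input_channels"] > 0
    ]


def list_input_devices() -> List[DeviceInfo]:
    import sounddevice as sd  # imported lazily so input_devices() works without PortAudio
    return input_devices(sd.query_devices())


def open_capture(device: int, channels: int, samplerate: float, blocksize: int,
                 on_block: Callable[[Any], None]):
    """Create (but do not start) an int16 input stream that forwards each captured block."""
    import sounddevice as sd

    def callback(indata, frames, time, status):
        # A real implementation would log `status` flags such as input overflow.
        on_block(indata.copy())  # PortAudio reuses the buffer, so copy it

    return sd.InputStream(device=device, channels=channels, samplerate=samplerate,
                          blocksize=blocksize, dtype="int16", callback=callback)
```

Using the returned stream as a context manager (`with open_capture(...) as stream:`) starts capture on entry and stops and closes the stream on exit.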
License
MIT License: https://github.com/spatialaudio/python-sounddevice/blob/master/LICENSE
SoundFile
Source
Open source: https://github.com/bastibe/python-soundfile
Description
SoundFile is a library for reading and writing audio files in various formats using the libsndfile C library.
Modifications
None.
Purpose in AI-PRISM
Enables the component to read from and write to audio files in different formats, supporting the playback service's file handling capabilities.
License
BSD 3-Clause License: https://github.com/bastibe/python-soundfile/blob/master/LICENSE
Uvicorn
Source
Open source: https://github.com/encode/uvicorn
Description
Uvicorn is a fast ASGI web server implementation for Python that can use uvloop and httptools for higher performance.
Modifications
None.
Purpose in AI-PRISM
Serves as the ASGI web server that runs the FastAPI applications, providing the HTTP interface for the capture and playback services.
License
BSD 3-Clause License: https://github.com/encode/uvicorn/blob/master/LICENSE.md
How to install
Every AI-PRISM component is installed using the Cluster management service. During the installation process, the user needs to configure a set of high-level parameters.
How to use
The Audio ZMQ component provides two main services, each with its own HTTP API: audio capture and audio playback. The capture API runs on port 8000 and allows listing devices, configuring audio parameters (device, channels, sample rate, format, buffer) and stopping capture. The playback API runs on port 8010 and provides similar device listing and configuration, plus file upload for playing audio clips.

For both services, interactive documentation is available at the /docs endpoint (http://localhost:8000/docs for capture and http://localhost:8010/docs for playback). Audio data is automatically published to a ZeroMQ socket, which other components can access using the AsyncZMQSubscriber from the data-middleware library.
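Since both services are FastAPI applications, their routes can also be discovered programmatically: by default FastAPI serves the OpenAPI schema behind the /docs page at /openapi.json. The sketch below assumes that default has not been changed and uses only the standard library. It does not cover the ZeroMQ side, where the AsyncZMQSubscriber from data-middleware remains the documented way to consume audio.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def list_operations(spec: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI document, sorted by path."""
    operations = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for key in item:
            if key in HTTP_METHODS:  # skip non-operation keys such as "parameters"
                operations.append((key.upper(), path))
    return operations


def fetch_openapi(base_url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """Download the schema, assuming FastAPI's default /openapi.json location."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=timeout) as response:
        return json.load(response)
```

For example, `list_operations(fetch_openapi("http://localhost:8010"))` lists the methods and paths exposed by the playback API.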