Audio data publisher

General Description

The Audio ZMQ component captures audio from input devices and makes it available to other components over ZeroMQ (ZMQ) sockets. It provides two services, audio capture and audio playback, both configurable through HTTP APIs. The capture service reads from an audio input device, converts the captured data to standardized PCM format, and publishes it on a ZMQ PUB socket. The playback service receives audio data from ZMQ sockets or via file upload and routes it to an output device. Each service offers a RESTful API for device selection and for configuring audio parameters (sample rate, channels, buffer size), with interactive documentation at its /docs endpoint (for example, http://localhost:8000/docs for the capture service).

Resource | Link
Source code | https://gitc.piap.lukasiewicz.gov.pl/ai-prism/wp3/ambient-digitalization/audio-zmq
Demo Video |

Contact

The following table includes contact information of the main developers in charge of the component:

Name | Email | Organisation
Dorin Clisu | dorin.clisu@nttdata.com | NTT Data Romania

License

Proprietary.

Technical Foundations

The Audio ZMQ component is built around asynchronous programming with AnyIO, which provides a unified interface over the asyncio and Trio backends. Audio capture and playback are implemented with the SoundDevice library, which provides Python bindings for PortAudio. The component uses ZeroMQ for message passing between services, following a publisher-subscriber pattern for streaming audio data. FastAPI exposes the RESTful endpoints for configuration and control.
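The publisher-subscriber pattern described above can be sketched as follows. This is a minimal illustration, not the component's actual wire format: the topic name, the two-part message layout, and all function names are assumptions made for the example, and the ZeroMQ calls assume the pyzmq binding.

```python
import array

TOPIC = b"audio"  # hypothetical topic name, not taken from the component


def build_frames(samples, topic=TOPIC):
    """Pack int16 samples into a two-part message: [topic, raw PCM bytes]."""
    pcm = array.array("h", samples)  # "h" = signed 16-bit, native byte order
    return [topic, pcm.tobytes()]


def parse_frames(frames):
    """Inverse of build_frames: return (topic, list of int16 samples)."""
    topic, payload = frames
    pcm = array.array("h")
    pcm.frombytes(payload)
    return topic, pcm.tolist()


def open_publisher(endpoint):
    """Bind a PUB socket, e.g. on "tcp://*:5555" (port chosen arbitrarily)."""
    import zmq  # pyzmq, imported lazily so the helpers above work without it

    sock = zmq.Context.instance().socket(zmq.PUB)
    sock.bind(endpoint)
    return sock


def open_subscriber(endpoint, topic=TOPIC):
    """Connect a SUB socket and subscribe to messages starting with topic."""
    import zmq

    sock = zmq.Context.instance().socket(zmq.SUB)
    sock.connect(endpoint)
    sock.setsockopt(zmq.SUBSCRIBE, topic)
    return sock
```

A publisher would call sock.send_multipart(build_frames(chunk)) for each captured chunk, and a subscriber would call parse_frames(sock.recv_multipart()). PUB sockets drop messages sent before a subscriber has connected, so subscribers only see audio published after they join.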

Integrated and Open Source Components

Overview

The Audio ZMQ component integrates several open-source Python libraries for audio processing, asynchronous programming, and API development. The core dependencies are audio libraries (sounddevice, soundfile, numpy), an asynchronous framework (anyio), HTTP API tools (fastapi, uvicorn, python-multipart), and a custom utility (runner-with-api). Together, these libraries let the component capture, process, and distribute audio data across services.

Pre-existing Components

AnyIO

Source

Open source: https://github.com/agronholm/anyio

Description

AnyIO is an asynchronous I/O library that provides a unified interface for asynchronous programming, working with either asyncio or trio backends.

Modifications

None.

Purpose in AI-PRISM

AnyIO serves as the foundation for asynchronous operations in the Audio ZMQ component, enabling non-blocking audio processing and API handling.

License

MIT License: https://github.com/agronholm/anyio/blob/master/LICENSE

FastAPI

Source

Open source: https://github.com/tiangolo/fastapi

Description

FastAPI is a modern, high-performance web framework for building APIs with Python based on standard type hints.

Modifications

None.

Purpose in AI-PRISM

FastAPI powers the HTTP API layer of the Audio ZMQ component, allowing other services to configure audio parameters and control capture/playback operations.

License

MIT License: https://github.com/tiangolo/fastapi/blob/master/LICENSE

NumPy

Source

Open source: https://github.com/numpy/numpy

Description

NumPy is the fundamental package for scientific computing in Python, providing support for large, multi-dimensional arrays and matrices.

Modifications

None.

Purpose in AI-PRISM

NumPy is used for efficient manipulation and processing of audio data arrays, particularly for formatting PCM audio data.
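As an illustration of this kind of PCM formatting, the sketch below converts floating-point samples to 16-bit little-endian PCM bytes and back. The function names, the 16-bit sample format, and the scaling by 32767 are assumptions for the example; the component's actual format is selected through its API.

```python
import numpy as np


def float_to_pcm16(samples: np.ndarray) -> bytes:
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    clipped = np.clip(samples, -1.0, 1.0)  # out-of-range values would wrap around
    return (clipped * 32767.0).astype("<i2").tobytes()


def pcm16_to_float(data: bytes, channels: int = 1) -> np.ndarray:
    """Convert interleaved 16-bit PCM bytes to a (frames, channels) float array."""
    ints = np.frombuffer(data, dtype="<i2")
    return ints.reshape(-1, channels).astype(np.float32) / 32767.0
```

Clipping before the integer conversion matters because a value slightly above 1.0 would otherwise overflow the 16-bit range and produce an audible click.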

License

BSD 3-Clause License: https://github.com/numpy/numpy/blob/main/LICENSE.txt

Python-Multipart

Source

Open source: https://github.com/andrew-d/python-multipart

Description

Python-Multipart is a streaming multipart parser for Python that supports file uploads.

Modifications

None.

Purpose in AI-PRISM

Python-Multipart enables file upload in the playback API, so that audio files can be sent to the service for playback.

License

Apache License 2.0: https://github.com/andrew-d/python-multipart/blob/master/LICENSE.txt

Runner-with-API

Source

Open source: https://github.com/dorinclisu/runner-with-api.git

Description

A utility library that combines process management with API functionality, enabling services to be controlled via HTTP endpoints.

Modifications

None.

Purpose in AI-PRISM

Runner-with-API provides the structure for running the audio services as background processes while exposing control APIs.

License

MIT License: https://github.com/dorinclisu/runner-with-api/blob/master/LICENSE

SoundDevice

Source

Open source: https://github.com/spatialaudio/python-sounddevice

Description

SoundDevice provides Python bindings for the PortAudio library, allowing access to audio input and output devices.

Modifications

None.

Purpose in AI-PRISM

SoundDevice provides the core functionality for interfacing with audio hardware: device discovery, stream configuration, and audio data streaming.
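A minimal capture sketch using SoundDevice's callback interface is shown below. It is an assumption about how such a capture loop could look, not the component's implementation: the function names, the default parameter values, and the queue-based hand-off are all hypothetical.

```python
import queue


def make_callback(chunks):
    """Build a stream callback that copies each audio block into a queue."""

    def callback(indata, frames, time, status):
        if status:
            print(status)  # reports input overflows and similar conditions
        chunks.put(bytes(indata))  # copy: the buffer is reused after returning

    return callback


def capture_chunks(num_chunks, samplerate=16000, channels=1,
                   blocksize=1024, device=None):
    """Record num_chunks blocks of 16-bit PCM from an input device."""
    import sounddevice as sd  # imported lazily: needs PortAudio at import time

    chunks = queue.Queue()
    with sd.RawInputStream(samplerate=samplerate, channels=channels,
                           dtype="int16", blocksize=blocksize, device=device,
                           callback=make_callback(chunks)):
        return [chunks.get() for _ in range(num_chunks)]
```

Passing device=None selects the default input device; the available devices can be listed with sounddevice.query_devices().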

License

MIT License: https://github.com/spatialaudio/python-sounddevice/blob/master/LICENSE

SoundFile

Source

Open source: https://github.com/bastibe/python-soundfile

Description

SoundFile is a library for reading and writing audio files in various formats using the libsndfile C library.

Modifications

None.

Purpose in AI-PRISM

SoundFile enables the component to read and write audio files in different formats, supporting the playback service's file handling.

License

BSD 3-Clause License: https://github.com/bastibe/python-soundfile/blob/master/LICENSE

Uvicorn

Source

Open source: https://github.com/encode/uvicorn

Description

Uvicorn is an ASGI web server implementation for Python that can optionally use uvloop and httptools for higher performance.

Modifications

None.

Purpose in AI-PRISM

Uvicorn serves as the ASGI web server that runs the FastAPI applications, providing the HTTP interface for the capture and playback services.

License

BSD 3-Clause License: https://github.com/encode/uvicorn/blob/master/LICENSE.md

How to install

Every AI-PRISM component is installed using the Cluster management service. During the installation process, the user needs to configure a set of high-level parameters.

How to use

The Audio ZMQ component provides two services, audio capture and audio playback, each with its own HTTP API. The capture API runs on port 8000 and supports listing devices, configuring audio parameters (device, channels, sample rate, format, buffer), and stopping capture. The playback API runs on port 8010 and provides similar device listing and configuration, plus file upload for playing audio clips. Interactive documentation for each service is available at its /docs endpoint (http://localhost:8000/docs and http://localhost:8010/docs). Captured audio is published automatically on a ZeroMQ socket, which other components can read using the AsyncZMQSubscriber from the data-middleware library.
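Because the exact endpoint paths are not listed here, one way to discover them programmatically is to read the OpenAPI schema that FastAPI serves alongside the /docs page. The sketch below assumes the services keep FastAPI's default schema location (/openapi.json); the function names are hypothetical.

```python
import json
import urllib.request

CAPTURE_URL = "http://localhost:8000"   # capture API, as described above
PLAYBACK_URL = "http://localhost:8010"  # playback API, as described above

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def fetch_schema(base_url, timeout=5.0):
    """Download the OpenAPI schema (assumes the default /openapi.json path)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
        return json.load(resp)


def list_operations(schema):
    """Return sorted "METHOD /path" strings found in an OpenAPI schema dict."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            if key in HTTP_METHODS:  # skip non-method keys such as "parameters"
                operations.append(f"{key.upper()} {path}")
    return sorted(operations)


if __name__ == "__main__":
    for base_url in (CAPTURE_URL, PLAYBACK_URL):
        print(base_url)
        for operation in list_operations(fetch_schema(base_url)):
            print("  ", operation)
```

Running the script against a live deployment prints the operations each service actually exposes, which can then be called with any HTTP client.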