Voice Control
General Description
Voice Control is an orchestration system that integrates several AI components to enable voice-based robot control. It processes audio input through a pipeline of speech recognition, natural language understanding, dialog management, and text-to-speech to support human-robot interaction. Users issue voice commands, which the system transforms into structured robot commands. It features configurable activation words, feedback mechanisms, and a flexible intent-slot architecture for command recognition.
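To make the activation-word and intent-slot ideas concrete, here is a minimal sketch. It is not the component's actual implementation: the activation word `robot`, the intent names, the regular-expression patterns, and the `RobotCommand` type are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

ACTIVATION_WORD = "robot"  # hypothetical; the real activation word is configurable

# Hypothetical intent patterns; named groups play the role of slots.
INTENT_PATTERNS = {
    "move": re.compile(r"^(?:move|go) (?P<direction>forward|backward|left|right)$"),
    "follow": re.compile(r"^follow (?P<target>me)$"),
    "stop": re.compile(r"^stop$"),
}


@dataclass
class RobotCommand:
    intent: str
    slots: dict = field(default_factory=dict)


def parse_utterance(text: str) -> Optional[RobotCommand]:
    """Turn a transcribed utterance into a structured command, or None."""
    # Lowercase and keep only word characters, dropping punctuation.
    words = re.findall(r"[a-z']+", text.lower())
    # Ignore anything that does not start with the activation word.
    if not words or words[0] != ACTIVATION_WORD:
        return None
    remainder = " ".join(words[1:])
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.match(remainder)
        if match:
            return RobotCommand(intent=intent, slots=match.groupdict())
    return None  # activation word heard, but no known intent
```

In this sketch, `parse_utterance("Robot, move forward")` yields a `move` command with the slot `direction` set to `forward`, while an utterance without the activation word is ignored.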
| Resource | Link |
|---|---|
| Source code | https://gitc.piap.lukasiewicz.gov.pl/ai-prism/wp4/human-robot-interaction/voice-control |
| Demo Video | KEBA%20AGV%20FollowMe/demo/voice/2025-02-22%2014-15-42.mkv |
Contact
The following table includes contact information of the main developers in charge of the component:
| Name | Organisation | E-mail |
|---|---|---|
| Dorin Clisu | | dorin.clisu@nttdata.com |
License
Proprietary.
Technical Foundations
Integrated and Open Source Components
Overview
The Voice Control system integrates several open-source components: FastAPI for the REST API, Pydantic for data validation and settings management, aiomqtt for MQTT messaging, and sse-starlette for server-sent events that deliver real-time updates. It also relies on custom-built components for audio processing, speech recognition, NLP, and text-to-speech synthesis.
Pre-existing Components
aiomqtt
Source
Description
aiomqtt is an async Python client for MQTT, a lightweight publish/subscribe messaging protocol designed for IoT applications.
Modifications
None.
Purpose in AI-PRISM
Provides an alternative way to interact with robotic systems that are not implemented in ROS2.
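As an illustration of how a structured command could reach a non-ROS2 system over MQTT, the sketch below publishes one JSON-encoded command with aiomqtt. The payload layout and the `encode_command` and `publish_command` helpers are assumptions made for this example; the message format used by the actual MQTT bridge is not documented here.

```python
import json


def encode_command(intent: str, slots: dict) -> bytes:
    """Serialize a command as UTF-8 JSON (hypothetical payload layout)."""
    return json.dumps({"intent": intent, "slots": slots}, sort_keys=True).encode("utf-8")


async def publish_command(host: str, topic: str, intent: str, slots: dict) -> None:
    """Publish one command to an MQTT broker using aiomqtt."""
    # Imported lazily so that encode_command can be used without the library.
    import aiomqtt

    # The client connects on entering the context and disconnects on exit.
    async with aiomqtt.Client(host) as client:
        await client.publish(topic, payload=encode_command(intent, slots), qos=1)
```

Running `asyncio.run(publish_command("localhost", "robot/commands", "move", {"direction": "forward"}))` requires a reachable broker; the host and topic name here are placeholders.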
License
BSD 3-Clause License
FastAPI
Source
Description
FastAPI is a modern, fast web framework for building APIs with Python, based on standard Python type hints.
Modifications
None.
Purpose in AI-PRISM
Provides the REST API interface for the Voice Control system, allowing interaction with the voice processing pipeline.
License
MIT License
Pydantic-Settings
Source
Pydantic-Settings GitHub repository
Description
Pydantic-Settings is a configuration management library built on top of Pydantic, providing settings management with environment variable support.
Modifications
None.
Purpose in AI-PRISM
Used to manage configuration settings for the Voice Control and MQTT components.
License
MIT License
PyYAML
Source
Description
PyYAML is a YAML parser and emitter for Python.
Modifications
None.
Purpose in AI-PRISM
Used for reading and writing YAML-formatted configuration files, particularly for the NLP specification.
License
MIT License
runner-with-api
Source
runner-with-api GitHub repository
Description
A utility library for creating long-running Python services with integrated API capabilities.
Modifications
None.
Purpose in AI-PRISM
Provides the foundation for the Voice Control service, combining background processing with a REST API.
License
MIT License
sse-starlette
Source
sse-starlette GitHub repository
Description
Server-Sent Events implementation for Starlette and FastAPI.
Modifications
None.
Purpose in AI-PRISM
Used to provide real-time status updates from the voice control pipeline to clients.
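For readers unfamiliar with server-sent events, the following sketch shows the wire format a client receives on a `text/event-stream` connection. sse-starlette produces this framing itself; the `status` event name and the payload fields are hypothetical and do not describe the component's actual messages.

```python
import json
from typing import Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Encode one message in the text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload fits on one data line.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def parse_sse(chunk: str) -> dict:
    """Decode a single event produced by format_sse (not a full SSE parser)."""
    event, data = None, None
    for line in chunk.splitlines():
        if line.startswith("event: "):
            event = line[len("event: "):]
        elif line.startswith("data: "):
            data = json.loads(line[len("data: "):])
    return {"event": event, "data": data}
```

A client can therefore split the stream on blank lines and decode each `data:` field as JSON to follow the pipeline state.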
License
BSD 3-Clause License
How to install
Every AI-PRISM component is installed using the Cluster management service. During the installation process, the user needs to configure a set of high-level parameters.
How to use
The Voice Control component can be used through its REST API or by integrating with the provided MQTT bridge. For direct API usage, you can use the VoiceControlAPI client from the data-middleware library. To get voice commands:
```python
from data_middleware.clients.voice_control import VoiceControlAPI

# Connect to the Voice Control REST API.
vc = VoiceControlAPI('http://localhost:8000')

# Fetch and print voice commands in a loop.
while True:
    cmd = vc.get_output()
    print(cmd)
```