Voice Control

General Description

Voice Control is an orchestration system that integrates several AI components to enable voice-based robot control. It processes audio input through a pipeline of speech recognition, natural language understanding, dialog management, and text-to-speech, turning spoken user commands into structured robot commands. It features configurable activation words, feedback mechanisms, and a flexible intent-slot architecture for command recognition.
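
To make the intent-slot idea concrete, the following minimal sketch maps an already transcribed utterance to a structured command. The activation word `robot`, the intent names, the regular-expression patterns, and the `RobotCommand` type are all illustrative assumptions. They are not the component's actual specification, and the real system relies on its own speech recognition and NLP components rather than pattern matching.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical activation word and intent patterns, for illustration only.
ACTIVATION_WORD = "robot"
INTENT_PATTERNS = {
    "follow": re.compile(r"^follow (?P<target>\w+)$"),
    "move": re.compile(r"^move (?P<direction>forward|backward|left|right)$"),
    "stop": re.compile(r"^stop$"),
}


@dataclass
class RobotCommand:
    """Structured command: one intent plus its named slot values."""
    intent: str
    slots: dict = field(default_factory=dict)


def parse_utterance(text: str) -> Optional[RobotCommand]:
    """Return a command, or None if the activation word or a known intent is missing."""
    words = text.lower().strip().split()
    if not words or words[0] != ACTIVATION_WORD:
        return None  # utterance is not addressed to the robot
    remainder = " ".join(words[1:])
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.match(remainder)
        if match:
            # Named groups in the pattern become the slots of the command.
            return RobotCommand(intent=intent, slots=match.groupdict())
    return None
```

For example, `parse_utterance("Robot move forward")` yields a `move` intent with a `direction` slot, while an utterance without the activation word is ignored.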

| Resource | Link |
| --- | --- |
| Source code | https://gitc.piap.lukasiewicz.gov.pl/ai-prism/wp4/human-robot-interaction/voice-control |
| Demo Video | KEBA%20AGV%20FollowMe/demo/voice/2025-02-22%2014-15-42.mkv |

Contact

The following table includes contact information of the main developers in charge of the component:

| Name | Email | Organisation |
| --- | --- | --- |
| Dorin Clisu | dorin.clisu@nttdata.com | NTT Data Romania |

License

Proprietary.

Technical Foundations

Integrated and Open Source Components

Overview

The Voice Control system integrates several open-source components: FastAPI for the REST API, Pydantic-Settings (built on Pydantic) for validated configuration, PyYAML for YAML configuration files, aiomqtt for MQTT messaging, sse-starlette for server-sent events that deliver real-time updates, and runner-with-api as the service foundation. The system also relies on custom-built components for audio processing, speech recognition, NLP, and text-to-speech synthesis.

Pre-existing Components

aiomqtt

Source

aiomqtt GitHub repository

Description

aiomqtt is an async Python client for MQTT, a lightweight publish/subscribe messaging protocol designed for IoT applications.

Modifications

None.

Purpose in AI-PRISM

Provides an alternative way to interact with robotic systems that are not implemented in ROS2.
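
As a rough illustration of this path, the sketch below serialises a recognised command to JSON and publishes it over MQTT with aiomqtt. The broker host, the topic name `robot/commands`, and the payload fields are assumptions made for the example; the actual MQTT bridge may use different topics and message formats.

```python
import json


def build_payload(intent: str, slots: dict) -> str:
    """Serialise a command as JSON (hypothetical message layout)."""
    return json.dumps({"intent": intent, "slots": slots}, sort_keys=True)


async def publish_command(intent: str, slots: dict,
                          host: str = "localhost",
                          topic: str = "robot/commands") -> None:
    """Publish one command to the assumed topic on the assumed broker."""
    # Imported here so build_payload stays usable without aiomqtt installed.
    import aiomqtt

    # The client connects on entering the context and disconnects on exit.
    async with aiomqtt.Client(host) as client:
        await client.publish(topic, payload=build_payload(intent, slots))
```

With a broker reachable on the assumed host, a command could be sent with `asyncio.run(publish_command("move", {"direction": "forward"}))`.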

License

BSD 3-Clause License

FastAPI

Source

FastAPI GitHub repository

Description

FastAPI is a modern, fast web framework for building APIs with Python, based on standard Python type hints.

Modifications

None.

Purpose in AI-PRISM

Provides the REST API interface for the Voice Control system, allowing interaction with the voice processing pipeline.

License

MIT License

Pydantic-Settings

Source

Pydantic-Settings GitHub repository

Description

Pydantic-Settings is a configuration management library built on top of Pydantic, providing settings management with environment variable support.

Modifications

None.

Purpose in AI-PRISM

Used to manage configuration settings for the Voice Control and MQTT components.

License

MIT License

PyYAML

Source

PyYAML GitHub repository

Description

PyYAML is a YAML parser and emitter for Python.

Modifications

None.

Purpose in AI-PRISM

Used for reading and writing YAML-formatted configuration files, particularly for the NLP specification.
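
The snippet below sketches how such a specification could be loaded with PyYAML. The YAML layout shown here (the `activation_words` and `intents` keys and their contents) is a hypothetical schema invented for illustration; the actual NLP specification format of the component may differ.

```python
import yaml

# Hypothetical NLP specification; the real schema may differ.
SPEC_YAML = """
activation_words:
  - robot
intents:
  move:
    slots:
      direction: [forward, backward, left, right]
  stop:
    slots: {}
"""


def load_spec(text: str) -> dict:
    """Parse YAML text with the safe loader, which builds only plain Python types."""
    return yaml.safe_load(text)


spec = load_spec(SPEC_YAML)
for intent, definition in spec["intents"].items():
    # List each intent together with the names of its slots.
    print(intent, sorted(definition["slots"]))
```

`yaml.safe_load` is used instead of `yaml.load` so that a configuration file cannot trigger construction of arbitrary Python objects.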

License

MIT License

runner-with-api

Source

runner-with-api GitHub repository

Description

A utility library for creating long-running Python services with integrated API capabilities.

Modifications

None.

Purpose in AI-PRISM

Provides the foundation for the Voice Control service, combining background processing with a REST API.

License

MIT License

sse-starlette

Source

sse-starlette GitHub repository

Description

Server-Sent Events implementation for Starlette and FastAPI.

Modifications

None.

Purpose in AI-PRISM

Used to provide real-time status updates from the voice control pipeline to clients.
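
On the client side, server-sent events arrive as text lines of the form `field: value`, with a blank line closing each event. The following simplified sketch parses that wire format using only the standard library. The event name `status` and the payloads in the usage note are assumptions, not the events the Voice Control service actually emits, and the parser skips parts of the SSE specification such as `retry` handling.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per event from an iterable of SSE-formatted text lines."""
    event = {}
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data or event:
                event["data"] = "\n".join(data)
                yield event
            event, data = {}, []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            name, _, value = line.partition(":")
            # A single leading space after the colon is not part of the value.
            value = value[1:] if value.startswith(" ") else value
            if name == "data":
                data.append(value)  # multiple data lines are joined with newlines
            else:
                event[name] = value
```

Feeding it the lines `event: status`, `data: listening`, and an empty line produces a single event whose `event` field is `status` and whose `data` field is `listening`.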

License

BSD 3-Clause License

How to install

Every AI-PRISM component is installed using the Cluster management service. During the installation process, the user needs to configure a set of high-level parameters.

How to use

The Voice Control component can be used through its REST API or by integrating with the provided MQTT bridge. For direct API usage, you can use the VoiceControlAPI client from the data-middleware library. To get voice commands:

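```python
from data_middleware.clients.voice_control import VoiceControlAPI

# Base URL of the Voice Control REST API; adjust host and port to your deployment.
vc = VoiceControlAPI('http://localhost:8000')

# Repeatedly fetch the next voice command and print it.
while True:
    cmd = vc.get_output()
    print(cmd)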