AI-BASED PERCEPTION ENHANCING MODULES

Introduction

The Perception Enhancement (PE) component consists of a suite of AI-based methods capable of processing the workplace digital reconstruction performed by Ambient Digitalization (AD) to accomplish several tasks of interest. Examples of such tasks include: (1) Object Detection and/or Segmentation, (2) Object 6D Pose Estimation, (3) Human action recognition, (4) Human health-status monitoring, and (5) Human postural analysis.

Technical Specifications

Software and Hardware Requirements

PE consists of software modules that take as input the digitalized environment created by AD and process it to accomplish several perception-related tasks. The PE modules include software to (see the sketch after this list):

Get data from the Ambient Digitalization modules describing, in digital form, the current environment and the actors present in it.

Get data from a database, including the trained weights of the neural network models in use, together with prior information such as CAD models, object detections, or semantic segmentations.

Perform inference with the neural network created for the specific perception-related task at hand.

Send PE results to the AI-PRISM modules performing tasks 4.3, 4.4, and 4.5.

Store PE results in a database for later inspection and for continual learning purposes.
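As an illustration of this flow, the sketch below outlines one possible orchestration of a PE module in Python. All interfaces and names (ad_source, weight_store, result_sink, result_db, _build_network) are hypothetical placeholders introduced for this example and do not correspond to actual AI-PRISM APIs.

```python
# Sketch of the PE processing loop described above. All interfaces
# (ad_source, weight_store, result_sink, result_db) are illustrative
# placeholders, not actual AI-PRISM APIs.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

import numpy as np


@dataclass
class PEResult:
    """Output of one perception-related task on one frame or clip."""
    task_name: str
    predictions: Dict[str, Any] = field(default_factory=dict)


class PerceptionEnhancementModule:
    def __init__(self, ad_source, weight_store, result_sink, result_db):
        self.ad_source = ad_source        # AD module: provides the digitalized environment
        self.weight_store = weight_store  # database: trained weights and priors (CAD models, ...)
        self.result_sink = result_sink    # forwards results to the task 4.3/4.4/4.5 modules
        self.result_db = result_db        # stores results for inspection and continual learning
        self.model: Optional[Callable[[np.ndarray], Dict[str, Any]]] = None

    def load_model(self, task_name: str) -> None:
        # Fetch the trained network weights together with prior information.
        weights, priors = self.weight_store.fetch(task_name)
        self.model = self._build_network(task_name, weights, priors)

    def _build_network(self, task_name, weights, priors):
        # Placeholder: a real module would instantiate the task-specific
        # neural network here and load its trained weights.
        return lambda frame: {"task": task_name, "num_points": len(frame)}

    def step(self, task_name: str) -> PEResult:
        if self.model is None:
            self.load_model(task_name)
        # 1. Get the current digital reconstruction from AD (an (N, 6) coloured point cloud).
        frame = self.ad_source.get_current_frame()
        # 2. Perform inference with the task-specific neural network.
        predictions = self.model(frame)
        result = PEResult(task_name=task_name, predictions=predictions)
        # 3. Send the result to the downstream AI-PRISM modules.
        self.result_sink.publish(result)
        # 4. Store the result for later inspection and continual learning.
        self.result_db.insert(result)
        return result
```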

The main actors interacting with the PE modules are the expert system administrator, who develops them, and the modules of tasks 4.3, 4.4, and 4.5.

You can find a block diagram representing the PE technical specifications below.

PE technical block diagram

Usage Manual

You can find the PE usage model below.

PE usage model

Functional Specifications

Functional Block Diagram

The PE functional block diagram is reported below.

PE functional block diagram

A PE module takes as input the digital reconstruction of the working environment performed by AD and addresses the perception-related task at hand. At a given time frame, input data comes as coloured point clouds representing the working scene as a collection of points with associated colour information. Each point has a six-dimensional representation containing both geometric information, in the form of its 3D coordinates (x, y, z), and photometric information, in the form of its RGB colour channels. PE modules can operate either at frame level, by producing predictions based only on static information present in the current time frame, or at video level, leveraging dynamic information accumulated across multiple time frames. To collect a video clip enabling the latter family of PE modules, a circular buffer is used to accumulate several consecutive time frames.
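A minimal sketch of this data representation and of the frame-accumulation step is given below, assuming the coloured point clouds arrive as NumPy arrays of shape (N, 6); the point count and clip length are illustrative values, not project parameters.

```python
# Sketch of a coloured point-cloud frame and of the circular buffer used to
# accumulate consecutive frames into a clip. Sizes are illustrative only.
from collections import deque
from typing import Optional

import numpy as np

# One time frame: N points, each described by (x, y, z, R, G, B).
num_points = 2048
frame = np.empty((num_points, 6), dtype=np.float32)
frame[:, :3] = np.random.uniform(-1.0, 1.0, size=(num_points, 3))  # geometric part: 3D coordinates
frame[:, 3:] = np.random.uniform(0.0, 1.0, size=(num_points, 3))   # photometric part: RGB channels

# Circular buffer holding the most recent T frames for video-level PE modules;
# deque(maxlen=T) discards the oldest frame automatically when a new one arrives.
clip_length = 16
clip_buffer: deque = deque(maxlen=clip_length)


def on_new_frame(new_frame: np.ndarray) -> Optional[np.ndarray]:
    """Append the latest AD frame; return a (T, N, 6) clip once the buffer is full."""
    clip_buffer.append(new_frame)
    if len(clip_buffer) == clip_length:
        return np.stack(clip_buffer)  # oldest-to-newest clip for video-level inference
    return None  # frame-level modules can still operate on `new_frame` alone
```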