Leveraging Foundation Models for Selecting the Most Effective Behavior Tree Action in Obstacle Avoidance
Author
Moriconi, Michele <2000>
Date
2024-10-15
Date available
2024-11-07
Abstract
The aim of this thesis is to integrate a Foundation Model-based semantic scene understanding
pipeline into the Behavior Tree data structure utilized by the ROS2 navigation stack, Nav2.
Obstacle avoidance is a crucial challenge in the field of autonomous navigation, especially in
warehouse environments where the robot has to efficiently and safely transport goods from
one location to another. This integration lets the robot understand the environment
and the obstacles in it semantically, so that it can select the best action for
the given scenario.
The pipeline is composed of two modules, the perception module and the reasoning
module. The perception module is responsible for processing the sensor data and, using
a Vision Language Model, generating a description of the obstacles present in the robot’s
path. The reasoning module is responsible for processing the description generated by the
perception module and selecting the best action for the robot to take. The whole pipeline is
integrated into a Behavior Tree that handles the navigation of the robot and the selection of
the correct sub-tree to execute based on the response generated by the reasoning module.
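The two-module flow described above can be illustrated with a minimal sketch. It is not the thesis implementation: the class name, the action labels, and the fallback rule are hypothetical, and the two models are replaced by plain callables so the example runs without any model or ROS2 installation.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action labels; the abstract does not name the actual sub-trees.
ACTIONS = ("wait", "replan", "ask_for_help")


@dataclass
class SemanticPipeline:
    # Stand-in for the perception module (a Vision Language Model in the thesis).
    describe_obstacle: Callable[[bytes], str]
    # Stand-in for the reasoning module that maps a description to an action.
    choose_action: Callable[[str], str]

    def select_subtree(self, image: bytes) -> str:
        """Return the label of the sub-tree the Behavior Tree should execute."""
        description = self.describe_obstacle(image)
        action = self.choose_action(description).strip().lower()
        # Assumption: fall back to a conservative default on an unknown answer.
        return action if action in ACTIONS else "wait"


# Toy stand-ins so the sketch is runnable end to end.
def fake_vlm(image: bytes) -> str:
    return "A person is standing in the aisle."


def fake_reasoner(description: str) -> str:
    return "Wait" if "person" in description else "replan"


pipeline = SemanticPipeline(fake_vlm, fake_reasoner)
print(pipeline.select_subtree(b""))
```

Keeping the two modules behind simple callables mirrors the separation in the text: either module can be swapped or evaluated on its own, for example by feeding the reasoning step a human-written description instead of a generated one.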
A novel dataset was created to evaluate the performance of the pipeline. The dataset
consists of fifty scenarios, each associated with the correct action to be selected. The pipeline
was evaluated in an end-to-end manner, showing that it is able to correctly select the
action 74% of the time using the obstacle description generated by the perception module
and 92% of the time using human descriptions.
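As a rough illustration of how such a success rate can be computed, the sketch below scores predicted actions against the expected ones. The function name and the placeholder labels are hypothetical; only the dataset size (fifty scenarios) and the reported rates come from the text. On fifty scenarios, 74% corresponds to 37 correct selections and 92% to 46.

```python
from typing import Sequence


def selection_accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of scenarios in which the selected action matches the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one scenario is required")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Placeholder labels only: 37 matches out of 50 scenarios.
expected = ["a"] * 50
predicted = ["a"] * 37 + ["b"] * 13
print(f"{selection_accuracy(predicted, expected):.0%}")
```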
Future work will focus on improving the performance of the pipeline by fine-tuning
the models used in the perception and reasoning modules, designing more modules to be
integrated into the pipeline, and creating a richer dataset to include more scenarios during
the evaluation process.
Type
info:eu-repo/semantics/masterThesis
Collections
- Laurea Magistrale [4822]