Verification of neural networks for safe reinforcement learning

Polaka, Surendra Kumar Reddy <1996>

tesi28975612.pdf (2.284Mb)

Date

2024-07-19

Date available

2024-07-25

Abstract

The goal of the thesis is to approach safe reinforcement learning (RL) by first learning a policy as a neural network, without safety guarantees, and then verifying it. Starting from the motivations and objectives, the thesis traces the transition from tabular methods to neural networks, specifically Convolutional Neural Networks and the Actor-Critic architecture, with a focus on architectural elegance and an introduction to Soft Actor-Critic (SAC). It then turns to practical implementations using tools and frameworks such as OpenAI Gym, PyTorch, TensorFlow, and the Never2 Tool. The Never2 Tool's architectural design, installation process, and procedures for building models, defining properties, and handling models through a command-line interface are outlined. The tool's functionalities extend to training networks, verification strategies, and output visualization. Experimental results in the Classic Control environment are detailed, evaluating different methods and neural network approaches. The network verification process is emphasized, ensuring the robustness of the tool. The thesis concludes by contributing a detailed perspective on RL that combines theoretical foundations with practical applications, paving the way for future advances in RL research and real-world implementations.
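The "train first, verify afterwards" idea can be illustrated with a minimal sketch. It does not reproduce the Never2 Tool or the verification procedure used in the thesis. Instead it uses interval bound propagation, a simple sound but incomplete technique, on a tiny fully connected ReLU policy whose weights are made up for illustration. The function names (`output_bounds`, `verify_prefers_action`), the two-dimensional input box (read loosely as pole angle and angular velocity), and the property "action 1 is always preferred over action 0" are all hypothetical. A result of `True` means the property holds for every state in the box. A result of `False` only means it could not be proven.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases); hypothetical format


def affine_bounds(weights: List[List[float]], biases: List[float],
                  inp: List[Interval]) -> List[Interval]:
    """Propagate input intervals through y = W x + b, one output neuron per row."""
    out = []
    for row, b in zip(weights, biases):
        lo = hi = b
        for w, (l, u) in zip(row, inp):
            # A positive weight maps lower to lower; a negative weight swaps the ends.
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out.append((lo, hi))
    return out


def relu_bounds(inp: List[Interval]) -> List[Interval]:
    """ReLU is monotone, so it can be applied to each interval endpoint."""
    return [(max(0.0, l), max(0.0, u)) for l, u in inp]


def output_bounds(layers: List[Layer], box: List[Interval]) -> List[Interval]:
    """Affine + ReLU on every hidden layer; the last layer stays linear (logits)."""
    bounds = box
    for i, (w, b) in enumerate(layers):
        bounds = affine_bounds(w, b, bounds)
        if i < len(layers) - 1:
            bounds = relu_bounds(bounds)
    return bounds


def verify_prefers_action(layers: List[Layer], box: List[Interval],
                          better: int, worse: int) -> bool:
    """Try to prove logit[better] > logit[worse] for every input in the box.

    The property is folded into the last layer as one linear output
    (logit[better] - logit[worse]), which gives tighter bounds than
    subtracting two independently computed intervals.
    """
    *hidden, (w_last, b_last) = layers
    diff_w = [[wb - ww for wb, ww in zip(w_last[better], w_last[worse])]]
    diff_b = [b_last[better] - b_last[worse]]
    lo, _hi = output_bounds(hidden + [(diff_w, diff_b)], box)[0]
    return lo > 0.0


# Hypothetical 2-2-2 policy: logit1 grows with (angle + 0.5 * velocity).
policy: List[Layer] = [
    ([[1.0, 0.5], [-1.0, -0.5]], [0.0, 0.0]),  # hidden layer
    ([[0.0, 1.0], [1.0, 0.0]], [0.0, 0.0]),    # output logits for actions 0 and 1
]

box_tilted_right = [(0.05, 0.2), (0.0, 0.5)]
box_around_zero = [(-0.1, 0.2), (0.0, 0.5)]

print(verify_prefers_action(policy, box_tilted_right, better=1, worse=0))
print(verify_prefers_action(policy, box_around_zero, better=1, worse=0))
```

On the first box the lower bound of the logit difference is positive (0.05), so the property is proven. On the second box the bound is negative, and in this case a real counterexample exists (angle -0.1, velocity 0). Complete verifiers such as those discussed in the thesis are needed when interval bounds are too loose to decide.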

Type

info:eu-repo/semantics/masterThesis