Un robot sociale conversazionale con consapevolezza visiva e contestuale (A visually and contextually aware conversational social robot)

Hong, Zhouyang <1997>

dc.contributor.advisor	Recchiuto, Carmine <1984>
dc.contributor.advisor	Sgorbissa, Antonio <1970>
dc.contributor.author	Hong, Zhouyang <1997>
dc.date.accessioned	2023-09-07T14:14:35Z
dc.date.available	2023-09-07T14:14:35Z
dc.date.issued	2023-08-31
dc.identifier.uri	https://unire.unige.it/handle/123456789/6208
dc.description.abstract	Il termine ChatGPT è molto popolare nel 2023. Recuperare informazioni ed elaborare testi non è mai stato così conveniente con ChatGPT, e ciò che è più impressionante è che capisce il contesto durante le conversazioni su più round. Tuttavia, l'ultimo modello di OpenAI, GPT-4, non ha ancora la capacità di svolgere conversazioni multi-modali. Considerando le sue potenti capacità conversazionali testuali, in particolare nella comprensione del testo e nel ragionamento, è nata un'idea: se GPT-4 potesse utilizzare naturalmente frammenti di descrizioni testuali di un'immagine e fingere di poter vedere durante la conversazione. Seguendo questa idea, è stato progettato e implementato un sistema che integra la descrizione densa delle immagini con GPT-4. Per valutare la fattibilità dell'idea e le prestazioni del sistema, abbiamo progettato un esperimento. Questo esperimento ha utilizzato questionari per misurare le differenze nelle interazioni con un robot – confrontando quelli dotati di visione con quelli senza. I dati tecnici sono stati raccolti durante l'esperimento, e alla fine, tutti i dati raccolti sono stati analizzati. Parole chiave: GPT-4, multi-modale, conversazione situata, conversazione fondata, descrizione densa, prestazioni del sistema.	it_IT
dc.description.abstract	ChatGPT is a widely discussed term in 2023. Retrieving information and processing text has never been more convenient than with ChatGPT and, most impressively, it keeps track of context across multi-round conversations. However, OpenAI’s latest model, GPT-4, still lacks the ability to hold multi-modal conversations. Given its strong textual conversational ability, especially in text understanding and reasoning, an idea emerged: could GPT-4 naturally use pieces of textual description of an image and pretend to be able to see during the conversation? Following this idea, a system integrating dense image captioning with GPT-4 was designed and implemented. To evaluate the idea’s feasibility and the system’s performance, we designed an experiment. The experiment used questionnaires to measure differences in interactions with a robot, comparing robots equipped with vision to those without. Technical data were collected throughout the experiment, and in the end all the gathered data were analyzed. Keywords: GPT-4, multi-modal, situated conversation, grounded conversation, dense captioning, system performance	en_UK
dc.language.iso	en
dc.rights	info:eu-repo/semantics/restrictedAccess
dc.title	Un robot sociale conversazionale con consapevolezza visiva e contestuale	it_IT
dc.title.alternative	A visually and contextually aware conversational social robot	en_UK
dc.type	info:eu-repo/semantics/masterThesis
dc.subject.miur	ING-INF/05 - SISTEMI DI ELABORAZIONE DELLE INFORMAZIONI
dc.publisher.name	Università degli studi di Genova
dc.date.academicyear	2022/2023
dc.description.corsolaurea	10635 - ROBOTICS ENGINEERING
dc.description.area	9 - INGEGNERIA
dc.description.department	100023 - DIPARTIMENTO DI INFORMATICA, BIOINGEGNERIA, ROBOTICA E INGEGNERIA DEI SISTEMI

Files in this item

Name: tesi25034479.pdf
Size: 6.508Mb
Format: PDF

This item appears in the following Collection(s)

Laurea Magistrale [6789]