Da Gherkin a Selenium: Valutazione Empirica e Prototipo di uno Strumento per la Generazione di Test End-to-End basata su LLM

Akimbetova, Ayaulym <2002>

dc.contributor.advisor	Leotta, Maurizio <1983>
dc.contributor.advisor	Ricca, Filippo <1969>
dc.contributor.advisor	Reggio, Gianna <1957>
dc.contributor.author	Akimbetova, Ayaulym <2002>
dc.date.accessioned	2026-04-02T14:24:39Z
dc.date.available	2026-04-02T14:24:39Z
dc.date.issued	2026-03-25
dc.identifier.uri	https://unire.unige.it/handle/123456789/15544
dc.description.abstract	Il testing End-to-End (E2E) è fondamentale per garantire l’affidabilità delle moderne applicazioni web. Tuttavia, lo sviluppo e la manutenzione delle suite di test E2E rimangono attività costose e dispendiose in termini di tempo. Gli script di test presentano spesso locator fragili, problemi di sincronizzazione e frequenti modifiche dell’interfaccia utente. I recenti progressi nei Large Language Models (LLM) hanno introdotto nuove opportunità per supportare lo sviluppo software tramite generazione di codice assistita dall’intelligenza artificiale. Questa tesi analizza se gli strumenti basati su LLM possano ridurre lo sforzo necessario per implementare test End-to-End automatizzati utilizzando Selenium WebDriver a partire da specifiche Gherkin. Per questo scopo è stato condotto un esperimento comparativo utilizzando otto applicazioni web e scenari di test predefiniti. Sono state valutate quattro strategie di sviluppo: implementazione manuale, ChatGPT Lite con generazione del codice in un unico passaggio seguita da rifinitura manuale, ChatGPT Max basato su interazione iterativa con il modello, e GitHub Copilot come assistente integrato nell’IDE. Lo sforzo di sviluppo è stato misurato tramite il Weighted Development Time (WDT), definito come il tempo necessario per produrre una soluzione di test funzionante normalizzato rispetto alla dimensione dello scenario in linee Gherkin. I risultati mostrano che lo sviluppo assistito da LLM riduce significativamente il tempo necessario per implementare i test rispetto allo sviluppo manuale, con ChatGPT Max che ha ottenuto il tempo medio più basso. È stata inoltre sviluppata un’estensione per Visual Studio Code per integrare la generazione di test assistita da AI nel flusso di lavoro di sviluppo. I risultati indicano che gli strumenti basati su LLM possono migliorare la produttività nello sviluppo di test web automatizzati.	it_IT
dc.description.abstract	End-to-End (E2E) testing is essential for ensuring the reliability of modern web applications. However, developing and maintaining E2E test suites remains costly and time-consuming. Test scripts often suffer from fragile element locators, synchronization issues, and frequent user interface changes. At the same time, recent advances in Large Language Models (LLMs) have created new opportunities to support software development through AI-assisted code generation. This thesis investigates whether LLM-based tools can reduce the effort required to implement automated End-to-End tests using Selenium WebDriver from Gherkin specifications. To address this question, a comparative experiment was conducted using eight web applications and predefined testing scenarios. Four development strategies were evaluated: manual implementation, ChatGPT Lite with single-step code generation followed by manual refinement, ChatGPT Max with iterative interaction to improve generated code, and GitHub Copilot as an IDE-integrated assistant providing contextual code suggestions. Development effort was measured using Weighted Development Time (WDT), defined as the time required to produce a working test solution normalized by the size of the scenario in Gherkin lines. The experiment evaluated how these strategies influence the effort required to implement automated test scripts. Results show that LLM-assisted development significantly reduces the time needed to implement tests compared to manual development, with ChatGPT Max achieving the lowest average development time. In addition, a Visual Studio Code extension was developed to integrate AI-assisted test generation directly into the development workflow. Internal validation showed reduced development time for extension-generated tests. Overall, the results indicate that LLM-based tools can improve productivity in automated web testing and have strong potential for integration into everyday software testing workflows.	en_UK
dc.language.iso	en
dc.rights	info:eu-repo/semantics/restrictedAccess
dc.title	Da Gherkin a Selenium: Valutazione Empirica e Prototipo di uno Strumento per la Generazione di Test End-to-End basata su LLM	it_IT
dc.title.alternative	From Gherkin to Selenium: Empirical Evaluation and a Prototype Tool for LLM-Based End-to-End Test Generation	en_UK
dc.type	info:eu-repo/semantics/masterThesis
dc.subject.miur	INF/01 - INFORMATICA
dc.publisher.name	Università degli studi di Genova
dc.date.academicyear	2024/2025
dc.description.corsolaurea	10852 - COMPUTER SCIENCE
dc.description.area	7 - SCIENZE MAT.FIS.NAT.
dc.description.department	100023 - DIPARTIMENTO DI INFORMATICA, BIOINGEGNERIA, ROBOTICA E INGEGNERIA DEI SISTEMI

Files in this item

Name: tesi37257908.pdf
Size: 7.365Mb
Format: PDF

This item appears in the following Collection(s)

Laurea Magistrale [7492]