Esplorazione di Large Language Model open source per l'insegnamento delle lingue nelle professioni tecniche [Exploring Open-Source Large Language Models for Language Teaching in Technical Professions]

Kazemi, Zahra <1995>

Author

Kazemi, Zahra <1995>

Date

2025-10-24

Date available

2025-10-30

Abstract

This thesis investigates how open-source Large Language Models (LLMs) can support teachers in teaching technical language. While proprietary systems such as Google Gemini currently lead in accuracy, the study examines whether open-source models can offer practical, flexible, and cost-effective alternatives. The research combined automated evaluation methods (ROUGE-L, BERTScore, Text Similarity, Exact Match, and Response Time) with expert reviews. Proprietary models served as benchmarks, and the main focus was on open-source systems. The automated tests showed that Gemini often achieved the highest accuracy, but the open-source models were competitive: LLaMA 2 delivered the strongest overall performance, Phi-2 occasionally reached the top BERTScore despite its small size, and Mistral produced stable but less complete results. The expert evaluations revealed an issue common to all models: although the responses were generally correct, they were not "classroom-ready". Reviewers emphasized the need to adapt the outputs to different learner levels, add scaffolding, and align them with instructional goals. Among the open-source models, LLaMA 2 was the most effective in terms of clarity, consistency, and adaptability, while Neural Chat provided explanatory feedback that enhanced its pedagogical value. Mistral was concise but less practical, and Phi-2 produced uneven outputs.
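
As a rough illustration of two of the automated metrics named in the abstract, the following minimal Python sketch computes a ROUGE-L F1 score (based on the longest common subsequence of tokens) and an Exact Match check. It is not the thesis's evaluation code: the function names, the lowercase whitespace tokenization, the equal weighting of precision and recall, and the example sentences are all assumptions made for illustration only.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 on lowercase, whitespace-split tokens (assumed tokenization)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


def exact_match(reference, candidate):
    """True when the two strings are identical after trimming and lowercasing."""
    return reference.strip().lower() == candidate.strip().lower()


if __name__ == "__main__":
    # Made-up example sentences, not data from the thesis.
    reference = "the valve controls the flow"
    candidate = "the valve regulates the flow"
    print(round(rouge_l_f1(reference, candidate), 2))
    print(exact_match(reference, candidate))
```

The pairing shows why the two metrics are complementary: a response that differs from the reference by a single word fails Exact Match outright but still earns a high ROUGE-L score.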

Type

info:eu-repo/semantics/masterThesis