Apprendimento statistico con catene di Markov a stati finiti (Statistical learning with finite-state Markov chains)

Mulder, Nicolo' <2003>

View/Open

tesi35849901.pdf (843.0Kb)

Author

Mulder, Nicolo' <2003>

Date

2025-11-17

Date available

2025-11-20

Abstract

The motivation for this thesis is to study a simple yet mathematically rigorous model of the problem of predicting the next token (a linguistic unit) in the language models known as large language models (LLMs), such as ChatGPT. These models receive an input text consisting of a sequence of words and symbols, called tokens, for example "it was a dark and ... night". In response, the system returns the most probable word (e.g., "stormy"), based on examples of similar texts previously supplied to the system. The complete mathematical model is highly complex, since it must account for all syntactic, semantic, and contextual relationships, including long-range ones. In this thesis, we simplify the problem by assuming that the next token depends only on the previous one, and we model it as a discrete-time, finite-state, homogeneous Markov chain.
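As a rough illustration of the simplification described in the abstract, the sketch below estimates the transition probabilities of a first-order Markov chain by counting adjacent token pairs, then returns the most probable successor of a given token. It is not taken from the thesis: the function names `fit_transition_probs` and `predict_next`, the whitespace tokenization, and the toy corpus are all assumptions made for this example, and the counting estimator is simply the standard empirical-frequency estimate.

```python
from collections import Counter, defaultdict


def fit_transition_probs(tokens):
    """Estimate P(next | current) from relative frequencies of adjacent pairs.

    Hypothetical helper: assumes a homogeneous first-order chain, so the
    same transition probabilities apply at every position in the text.
    """
    counts = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1

    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {tok: n / total for tok, n in followers.items()}
    return probs


def predict_next(probs, current):
    """Return the most probable successor of `current`, or None if unseen."""
    followers = probs.get(current)
    if not followers:
        return None
    return max(followers, key=followers.get)


# Toy corpus invented for illustration; tokens are whitespace-separated words.
corpus = "it was a dark and stormy night and stormy seas and calm skies".split()
probs = fit_transition_probs(corpus)
prediction = predict_next(probs, "and")
```

In this toy corpus "and" is followed by "stormy" twice and by "calm" once, so the estimated probability of "stormy" given "and" is 2/3 and `prediction` is "stormy". A token that never appears with a successor, such as the final "skies", has no estimated row, and `predict_next` returns `None` for it.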

Type

info:eu-repo/semantics/bachelorThesis