Una prospettiva Matematica su NLP

Author
Fresu, Luca <2000>
Date
2026-03-25
Available from
2026-04-02
Abstract
The thesis is structured as follows.
Chapter 1 is devoted to discrete-time Markov chains, where we introduce their fundamental properties, including state classification, recurrence and transience, invariant measures, and asymptotic behavior.
Chapter 2 presents the foundations of supervised learning within the framework of statistical learning theory. We introduce empirical risk minimization and reproducing kernel Hilbert spaces (RKHS). We derive the closed-form solution of Kernel Ridge Regression.
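For orientation, the closed-form solution mentioned above can be written as follows. This is the standard textbook statement; the symbols $n$, $\lambda$, $K$, $\alpha$ and the $1/n$ normalization of the empirical risk are assumptions made here, and the thesis may use a different convention.

```latex
% Training data (x_i, y_i), i = 1..n; kernel k with Gram matrix K_{ij} = k(x_i, x_j).
\[
  \hat f = \arg\min_{f \in \mathcal{H}}
    \frac{1}{n} \sum_{i=1}^{n} \bigl(f(x_i) - y_i\bigr)^2
    + \lambda \,\lVert f \rVert_{\mathcal{H}}^2 ,
  \qquad \lambda > 0 .
\]
% By the representer theorem the minimizer is a finite kernel expansion:
\[
  \hat f(x) = \sum_{i=1}^{n} \alpha_i \, k(x, x_i),
  \qquad
  \alpha = (K + n \lambda I)^{-1} y .
\]
```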
Chapter 3 formulates text generation as a Markov chain problem: tokens from a vocabulary are mapped to states, and the sequence of words in a corpus is modeled as a discrete-time Markov chain. The transition matrix is estimated by maximum likelihood from empirical transition counts; the ergodic theorem supports this estimate by ensuring the consistency of the estimator. Higher-order language models are implemented by lifting the state space, so that an order-k chain becomes a first-order chain on k-tuples of tokens.
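The estimation step described in Chapter 3 can be illustrated with a minimal sketch. It is not the thesis code: the function name `fit_markov_chain`, the toy corpus, and the dictionary-based representation of the transition matrix are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def fit_markov_chain(tokens, order=1):
    """Estimate transition probabilities by maximum likelihood.

    States are tuples of `order` consecutive tokens, so an order-k chain
    is handled as a first-order chain on k-tuples (the "lifted" state space).
    """
    counts = defaultdict(Counter)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])
        next_state = tuple(tokens[i + 1:i + order + 1])
        counts[state][next_state] += 1

    # MLE: P(s -> s') = N(s, s') / sum over t of N(s, t)
    probs = {}
    for state, successors in counts.items():
        total = sum(successors.values())
        probs[state] = {s: c / total for s, c in successors.items()}
    return probs


# Hypothetical toy corpus, whitespace-tokenized.
corpus = "the cat sat on the mat and the cat ran".split()
P1 = fit_markov_chain(corpus, order=1)  # first-order chain on single tokens
P2 = fit_markov_chain(corpus, order=2)  # second-order chain lifted to pairs
```

Each row of the estimated matrix is stored as a dictionary of successor states, which keeps the sketch sparse; a dense matrix indexed by state would be an equally valid representation for a small vocabulary.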
Chapter 4 combines these two perspectives by applying Kernel Ridge Regression to next-word prediction in natural language. We model language as a first-order Markov chain and show how the ergodic theorem resolves the violation of the i.i.d. assumption, enabling rigorous generalization guarantees. We present a complete Python implementation using the TinyStories dataset, including:
- Word embedding via pre-trained Word2Vec models
- Kernel matrix computation with Linear and Gaussian kernels
- Training via gradient descent with early stopping
- Quantitative evaluation and text generation experiments
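The kernel and training steps listed above might look roughly like the following sketch. It is an illustration under stated assumptions, not the thesis implementation: it uses toy scalar inputs in place of Word2Vec embeddings and the TinyStories data, pure Python in place of a numerical library, and hypothetical names (`gaussian_kernel`, `linear_kernel`, `kernel_matrix`, `train_krr_gd`) and hyperparameters (`lam`, `lr`, `patience`). The objective is assumed to be the mean squared error plus a squared RKHS-norm penalty.

```python
import math


def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def gaussian_kernel(x, z, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def kernel_matrix(X, Z, kernel):
    """Matrix with entries kernel(x, z) for x in X (rows) and z in Z (columns)."""
    return [[kernel(x, z) for z in Z] for x in X]


def matvec(M, v):
    return [sum(m * a for m, a in zip(row, v)) for row in M]


def mse(K, alpha, y):
    pred = matvec(K, alpha)
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def train_krr_gd(K_tr, y_tr, K_val, y_val, lam=1e-3, lr=0.05,
                 max_iter=2000, patience=20):
    """Gradient descent on (1/n)||K a - y||^2 + lam * a^T K a.

    Stops early once the validation error has not improved for
    `patience` consecutive iterations and returns the best coefficients.
    """
    n = len(y_tr)
    alpha = [0.0] * n
    best_alpha, best_val, stale = alpha[:], mse(K_val, alpha, y_val), 0
    for _ in range(max_iter):
        resid = [p - t for p, t in zip(matvec(K_tr, alpha), y_tr)]
        # Gradient is 2 K ((K a - y) / n + lam a), using the symmetry of K.
        inner = [r / n + lam * a for r, a in zip(resid, alpha)]
        grad = [2 * g for g in matvec(K_tr, inner)]
        alpha = [a - lr * g for a, g in zip(alpha, grad)]
        val = mse(K_val, alpha, y_val)
        if val < best_val - 1e-12:
            best_alpha, best_val, stale = alpha[:], val, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_alpha, best_val


# Toy regression problem standing in for embedded (context, next-word) pairs.
X_train = [[i / 4] for i in range(8)]
y_train = [math.sin(x[0]) for x in X_train]
X_val = [[0.4], [0.9], [1.4]]
y_val = [math.sin(x[0]) for x in X_val]


def kern(x, z):
    return gaussian_kernel(x, z, sigma=0.5)


K_train = kernel_matrix(X_train, X_train, kern)
K_val = kernel_matrix(X_val, X_train, kern)
alpha, best_val = train_krr_gd(K_train, y_train, K_val, y_val)
baseline = mse(K_val, [0.0] * len(y_train), y_val)  # error of the zero predictor
```

The step size `lr` has to stay below a bound set by the largest eigenvalue of the Gram matrix, so the value used here is only suitable for a problem of this small size. Passing `linear_kernel` to `kernel_matrix` instead of `kern` gives the linear-kernel variant.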
Type
info:eu-repo/semantics/masterThesis
Collections
- Laurea Magistrale [7402]

