CREDIT RISK CLASSIFICATION IN CREDIT SCORING: A COMPARISON OF LOGISTIC REGRESSION, DECISION TREE AND LOGIC LEARNING MACHINE

Costanzo, Alberto <1999>

Date

2025-10-20

Available from

2025-10-23

Abstract

Credit scoring is an essential tool for credit risk management in financial institutions. This thesis presents a comparative analysis of three machine learning models, Logistic Regression, Decision Tree (CART), and Logic Learning Machine (LLM), applied to the classification of defaulted loans. For each model, feature rankings were derived to assess which characteristics had the greatest impact on predicting default. The study used a large real-world dataset from LendingClub, a Peer-to-Peer lending platform, which was then split according to three hypothetical severity scenarios for classifying loans as defaulted. The results show strong predictive performance for all three models, with AUC values consistently above 0.95 in every scenario. Each model showed specific strengths: the LLM achieved robust overall accuracy and generated interpretable rules with high coverage rates; the Decision Tree stood out for Precision, identifying more nuanced risk profiles; and Logistic Regression provided the most reliable probability estimates. A particularly relevant result was the convergence of the three models' feature rankings, which identified the borrower's most recent behavior, in particular the last FICO score and the last payment amount, as the most significant aspects to monitor when predicting the probability of default. In conclusion, no single model is clearly superior to the others, which suggests that a hybrid approach, integrating the strengths of the different methodologies, could lead to more effective credit risk management strategies.
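
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal Python sketch of how AUC and Precision can be computed from predicted default scores. It is not the thesis's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for illustration, and none of the numbers come from the LendingClub data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen defaulted loan (label 1)
    receives a higher score than a randomly chosen non-defaulted loan (label 0).
    Ties count as half a win (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision(labels: Sequence[int], scores: Sequence[float],
              threshold: float = 0.5) -> float:
    """Share of loans flagged as default (score >= threshold) that truly defaulted."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for y, f in zip(labels, flagged) if f and y == 1)
    fp = sum(1 for y, f in zip(labels, flagged) if f and y == 0)
    if tp + fp == 0:
        raise ValueError("no loan was flagged at this threshold")
    return tp / (tp + fp)


# Hypothetical toy data: 1 = defaulted, 0 = repaid; scores are model outputs.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]

if __name__ == "__main__":
    print("AUC:", roc_auc(toy_labels, toy_scores))
    print("Precision:", precision(toy_labels, toy_scores))
```

AUC does not depend on a decision threshold, whereas Precision does, which is one reason two models can have similar AUC values and still differ in Precision.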

Type

info:eu-repo/semantics/masterThesis