Impact of evasion attacks on the explainability of malware detectors

Lozza, Ludovico <1998>

Author

Lozza, Ludovico <1998>

Date

2026-03-23

Available from

2026-03-26

Abstract

The spread of artificial intelligence in recent years has paved the way for integrating machine learning and deep learning models into malware detection. However, these technologies are vulnerable to Adversarial EXEmples, namely carefully crafted perturbations of the input data that induce the misclassification of malicious software. These vulnerabilities raise concerns about how feature attribution techniques, employed to explain the model's decision process, are influenced by such input manipulations. This thesis investigates the effects of adversarial attacks on the explanations for Windows Portable Executable (PE) files across different malware detection models. A four-stage methodology is implemented to systematically assess the reliability and robustness of two feature attribution methods under two families of perturbations. In particular, the study analyzes MalConv, explained with Integrated Gradients and targeted by two types of DOS Header attacks, and Ember GBDT, explained with SHAP values and subjected to Section Injection attacks. The evaluation is conducted on a balanced dataset of malware samples from multiple categories. The results show a significant shift and marked redistribution of attributions after the manipulations are applied, suggesting that the attacks introduce patterns that effectively influence the models' predictions and lead to misclassification.
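As a rough illustration of the kind of comparison the abstract describes, the sketch below computes Integrated Gradients for a toy logistic model and checks how the attributions redistribute once an editable "header" region is perturbed. It is not the thesis's code and does not use MalConv: the weights `w`, the eight-element feature vector, the two-element header slice, and the helper names are all hypothetical choices made only to keep the example self-contained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint-rule approximation of Integrated Gradients for
    the toy model f(x) = sigmoid(w . x + b)."""
    alphas = (np.arange(steps) + 0.5) / steps            # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)   # points on the straight-line path
    p = sigmoid(path @ w + b)
    grads = (p * (1.0 - p))[:, None] * w                 # df/dx at each path point
    return (x - baseline) * grads.mean(axis=0)

def header_share(attr, header_len=2):
    """Fraction of total absolute attribution that falls in the header slice."""
    return np.abs(attr[:header_len]).sum() / np.abs(attr).sum()

# Hypothetical toy detector: the first two features stand in for an
# editable header region, the rest for the malicious payload.
w = np.array([-2.0, -1.5, 1.0, 0.8, 0.6, 0.5, 0.4, 0.3])
b = 0.0
baseline = np.zeros_like(w)

x_orig = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
x_adv = x_orig.copy()
x_adv[:2] = 1.5  # hypothetical header manipulation; payload left untouched

score_orig = sigmoid(x_orig @ w + b)
score_adv = sigmoid(x_adv @ w + b)

attr_orig = integrated_gradients(x_orig, baseline, w, b)
attr_adv = integrated_gradients(x_adv, baseline, w, b)

print("score before/after:", score_orig, score_adv)
print("header share before/after:", header_share(attr_orig), header_share(attr_adv))
```

In this toy setting the perturbed header features absorb most of the attribution mass while the payload features are unchanged, which mirrors, in a very simplified form, the redistribution of attributions reported in the abstract. Comparing attribution shares per region is only one possible way to quantify such a shift.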

Type

info:eu-repo/semantics/masterThesis