Empirical Quantification of Spurious Correlations in Malware Detection (Quantificazione empirica delle correlazioni spurie nella malware detection)

Perasso, Bianca <2001>

Date

2025-07-18

Date available

2025-07-24

Abstract

In recent years, end-to-end deep learning models have emerged as powerful tools for malware detection, learning directly from raw binary files without manual feature engineering. However, their effectiveness is often undermined by spurious correlations: non-semantic patterns in the input that models may rely on to make predictions. These shortcuts compromise the robustness and reliability of malware detectors and make them vulnerable to adversarial attacks. This thesis investigates how susceptible end-to-end malware detectors are to spurious correlations. We develop a systematic methodology based on the Integrated Gradients attribution technique to quantify how much neural networks rely on non-informative regions of Windows Portable Executable (PE) files, such as the DOS header, slack space, and overlay. We introduce a spurious correlation score that compares the relevance assigned to semantically meaningful sections, such as the code section, with the relevance assigned to irrelevant ones. The comparison is carried out on a balanced dataset of real-world malware and goodware samples using three state-of-the-art models: MalConv, BBDNN, and AvastStyleConv. Results show that, while all models assign significant relevance to the code section, they also exhibit non-negligible reliance on spurious features.
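The abstract does not give the exact formulas, so what follows is only a minimal sketch under stated assumptions, not the thesis's implementation. Integrated Gradients attributes a prediction F(x) to input features by integrating gradients along the straight path from a baseline x' to the input x: IG_i(x) = (x_i - x'_i) * integral over alpha in [0, 1] of dF(x' + alpha(x - x'))/dx_i, approximated here with a midpoint Riemann sum. Raw bytes are discrete, and byte-level detectors such as MalConv first map each byte to a learned embedding. For that reason the sketch integrates over the embedding and sums over the embedding dimension to obtain one score per byte. Several pieces are hypothetical and were chosen only for illustration: `ToyByteModel` (an embedding followed by a linear layer and a sigmoid, standing in for a real convolutional detector), the all-zero embedding baseline, and the hard-coded region offsets (a real pipeline would obtain them by parsing the PE headers). The score is also an assumption. Here it is defined as the share of absolute attribution that falls in the spurious regions, measured against the code and spurious regions together.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ToyByteModel:
    """Hypothetical stand-in for an end-to-end detector: a byte embedding
    followed by a linear layer and a sigmoid (no convolutions)."""

    def __init__(self, emb_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.embedding = rng.normal(size=(256, emb_dim))  # one vector per byte value
        self.w = rng.normal(size=emb_dim) * 0.02
        self.b = -0.5

    def embed(self, data: bytes) -> np.ndarray:
        # raw bytes -> (length, emb_dim) embedding matrix
        return self.embedding[np.frombuffer(data, dtype=np.uint8)]

    def score(self, emb: np.ndarray) -> float:
        # probability of "malware" for an embedded input
        return float(sigmoid(self.b + np.sum(emb @ self.w)))

    def grad(self, emb: np.ndarray) -> np.ndarray:
        # analytic gradient of score() w.r.t. every embedding entry
        s = self.score(emb)
        return np.broadcast_to(s * (1.0 - s) * self.w, emb.shape)


def integrated_gradients(model, emb, baseline, steps=128):
    """Per-byte Integrated Gradients via a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(emb)
    for a in alphas:
        total += model.grad(baseline + a * (emb - baseline))
    avg_grad = total / steps
    # attribution per embedding entry, then summed over the embedding dimension
    return ((emb - baseline) * avg_grad).sum(axis=1)


def region_attribution(attr, regions):
    """Total absolute attribution inside each [start, end) byte range."""
    return {name: float(np.abs(attr[start:end]).sum())
            for name, (start, end) in regions.items()}


def spurious_score(region_attr, code=("code",),
                   spurious=("dos_header", "slack", "overlay")):
    """Hypothetical score: share of attribution in spurious regions (0 to 1)."""
    a_code = sum(region_attr[r] for r in code)
    a_spur = sum(region_attr[r] for r in spurious)
    return a_spur / (a_code + a_spur)


# Usage on a random 512-byte "file" with hypothetical region offsets
# (the 64-byte DOS header size is real; the other boundaries are made up).
rng = np.random.default_rng(1)
data = rng.integers(0, 256, size=512).astype(np.uint8).tobytes()
regions = {"dos_header": (0, 64), "code": (64, 320),
           "slack": (320, 384), "overlay": (384, 512)}

model = ToyByteModel()
emb = model.embed(data)
baseline = np.zeros_like(emb)  # assumed all-zero embedding baseline
attr = integrated_gradients(model, emb, baseline)
per_region = region_attribution(attr, regions)
print(per_region, spurious_score(per_region))
```

Summing absolute attributions gives large regions more weight. If region sizes differ a lot, dividing each region's total by its length (mean attribution per byte) is one alternative. The completeness property of Integrated Gradients, under which attributions sum to F(x) - F(x'), provides a cheap check that the number of integration steps is sufficient.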

Type

Master's thesis (info:eu-repo/semantics/masterThesis)