Flexible Clustering Parallelo

Pastorino, Edoardo <1999>

dc.contributor.advisor	Pastore, Vito Paolo <1989>
dc.contributor.advisor	Dell'Amico, Matteo <1979>
dc.contributor.advisor	D'Agostino, Daniele <1976>
dc.contributor.author	Pastorino, Edoardo <1999>
dc.date.accessioned	2023-12-21T15:17:43Z
dc.date.available	2023-12-21T15:17:43Z
dc.date.issued	2023-12-13
dc.identifier.uri	https://unire.unige.it/handle/123456789/7200
dc.description.abstract	La tesi presenta la parallelizzazione di un algoritmo di clustering all'avanguardia, il FISHDBC. Questo obiettivo è stato raggiunto migliorando la creazione delle principali strutture dati e componenti dell'algoritmo: l'HNSW, una struttura dati basata su grafo utilizzata nella ricerca approssimativa dei nearest neighbors; l'MST, un albero che attraversa tutti i vertici nel grafo minimizzando il peso totale degli archi; il clustering HDBSCAN, progettato per eseguire il clustering robusto dei punti dati in base alla loro densità. Il mio contributo si basa su un'implementazione parallela con memoria condivisa e senza lock, resa possibile perché FISHDBC fornisce una soluzione approssimata e offre buone prestazioni. È importante notare che l'algoritmo di Flexible Clustering Parallelo è completamente scritto in Python, senza dipendenze da altri linguaggi. Questa rappresenta una caratteristica importante che lo rende facile da usare e altamente personalizzabile, considerando che le metriche di distanza definite dall'utente, per calcolare la similarità tra i dati, sono per lo più scritte in questo linguaggio.	it_IT
dc.description.abstract	The thesis presents the parallelization of FISHDBC, a state-of-the-art clustering algorithm. This goal was achieved by improving the construction of the algorithm's main data structures and components: HNSW, a graph-based data structure used in approximate nearest neighbor search; the MST (minimum spanning tree), a tree that spans all the vertices of the graph while minimizing the total weight of the edges; and HDBSCAN clustering, designed to cluster data points robustly based on their density. My contribution is a lock-free, shared-memory parallel implementation, made feasible because FISHDBC provides an approximate solution, and provides good performance. It is worth noting that the Parallel Flexible Clustering algorithm is written entirely in Python, with no dependencies on other languages. This is an important feature that makes it user-friendly and highly customizable, since user-defined distance metrics for computing similarity among data are mostly written in Python.	en_UK
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.title	Flexible Clustering Parallelo	it_IT
dc.title.alternative	Parallel Flexible Clustering	en_UK
dc.type	info:eu-repo/semantics/masterThesis
dc.subject.miur	INF/01 - INFORMATICA
dc.publisher.name	Università degli studi di Genova
dc.date.academicyear	2022/2023
dc.description.corsolaurea	10852 - COMPUTER SCIENCE
dc.description.area	7 - SCIENZE MAT.FIS.NAT.
dc.description.department	100023 - DIPARTIMENTO DI INFORMATICA, BIOINGEGNERIA, ROBOTICA E INGEGNERIA DEI SISTEMI

Files in this item

Name: tesi26654510.pdf
Size: 6.491Mb
Format: PDF

This item appears in the following Collection(s)

Laurea Magistrale [6583]
