INTEGRATION OF BACKEND SOFTWARE MODULES, PIPELINE CONSTRUCTION WITH AIRFLOW AND DOCKER
Author
Boriassi, Tommaso <2001>
Date
2025-03-25
Date available
2025-03-27
Abstract
During my internship at STAM S.r.l., I designed and developed a complete backend infrastructure for acquiring, managing, and storing data from various sources. This system is part of a broader company effort to develop methods for training models with limited data.
The architecture is modular, with well-isolated components and clear interfaces. I implemented a multi-protocol download system that supports HTTP/REST, used to acquire satellite imagery through the Sentinel Hub APIs, and SFTP, used to retrieve annotated datasets.
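One way to keep the protocols behind a single interface is sketched below in Python. This is a minimal illustration under stated assumptions, not the thesis code: the names Downloader, HttpDownloader, SftpDownloader, DOWNLOADERS and make_downloader are hypothetical, the HTTP variant uses only the standard library, and the SFTP variant assumes an already-connected client with a paramiko-style open() method.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Type
import urllib.request


class Downloader(ABC):
    """Common interface: each protocol returns the raw bytes of one resource."""

    @abstractmethod
    def fetch(self, location: str) -> bytes:
        """Download the resource identified by `location`."""


class HttpDownloader(Downloader):
    """HTTP/REST download, e.g. a request to an imagery API endpoint."""

    def __init__(self, headers: Optional[Dict[str, str]] = None, timeout: float = 30.0):
        self.headers = headers or {}  # e.g. an authorization header
        self.timeout = timeout

    def fetch(self, location: str) -> bytes:
        request = urllib.request.Request(location, headers=self.headers)
        with urllib.request.urlopen(request, timeout=self.timeout) as response:
            return response.read()


class SftpDownloader(Downloader):
    """SFTP download through an already-connected client object."""

    def __init__(self, client):
        # Assumption: a paramiko-style SFTP client whose open() returns a file-like object.
        self.client = client

    def fetch(self, location: str) -> bytes:
        with self.client.open(location, "rb") as remote_file:
            return remote_file.read()


# Hypothetical registry mapping a protocol name (e.g. a workflow parameter) to a class.
DOWNLOADERS: Dict[str, Type[Downloader]] = {"http": HttpDownloader, "sftp": SftpDownloader}


def make_downloader(protocol: str, **kwargs) -> Downloader:
    """Build the downloader for `protocol`, rejecting unknown protocol names."""
    if protocol not in DOWNLOADERS:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return DOWNLOADERS[protocol](**kwargs)
```

Because callers depend only on fetch(), adding a protocol means adding one class and one registry entry, without touching the code that consumes the downloaded bytes.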
To reduce bandwidth usage and processing time, I developed a caching mechanism based on SHA-256 hashes that avoids redundant downloads. The system supports three storage destinations: the local filesystem for development and testing, Google Cloud Storage for long-term storage, and MinIO as an S3-compatible alternative.
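The abstract does not say exactly what is hashed. The Python sketch below assumes the SHA-256 key is computed over a canonical description of the request (for example area and date), so that a repeated request is detected before any network traffic happens. The names request_key and DownloadCache are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable, Dict


def request_key(request: Dict[str, Any]) -> str:
    """SHA-256 of a canonical JSON encoding: equal requests give equal keys."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DownloadCache:
    """Content cache on disk, one file per request hash."""

    def __init__(self, cache_dir: Path):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def get_or_download(
        self,
        request: Dict[str, Any],
        download: Callable[[Dict[str, Any]], bytes],
    ) -> bytes:
        path = self.cache_dir / request_key(request)
        if path.exists():
            return path.read_bytes()  # cache hit: no download
        data = download(request)      # cache miss: download once
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)             # rename so readers never see a partial file
        return data
```

Sorting the keys before hashing makes the key independent of dictionary order, so two logically identical requests share one cache entry.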
The system handles errors through failure isolation: an operation can still complete partially when some storage systems are unavailable. The whole workflow is orchestrated by Apache Airflow through a parameterized DAG that lets users select the download protocol and the storage systems to use.
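Failure isolation across storage destinations can be sketched in Python as follows. This is a minimal illustration, not the thesis implementation: StorageBackend, LocalStorage, StoreReport and store_everywhere are hypothetical names. Only the local-filesystem backend is shown. Google Cloud Storage and MinIO backends are assumed to wrap their respective client libraries behind the same put() method. Each backend's exception is caught and recorded, and the operation fails only if no backend succeeds.

```python
import logging
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Iterable, List, Protocol


class StorageBackend(Protocol):
    name: str

    def put(self, key: str, data: bytes) -> None: ...


class LocalStorage:
    """Filesystem backend for development and tests."""

    def __init__(self, root: Path, name: str = "local"):
        self.root = Path(root)
        self.name = name

    def put(self, key: str, data: bytes) -> None:
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)


@dataclass
class StoreReport:
    succeeded: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)


def store_everywhere(backends: Iterable[StorageBackend], key: str, data: bytes) -> StoreReport:
    """Write to every backend; one unavailable backend does not stop the others."""
    report = StoreReport()
    for backend in backends:
        try:
            backend.put(key, data)
            report.succeeded.append(backend.name)
        except Exception as exc:  # isolate the failure to this backend only
            logging.warning("storage %s failed for %s: %s", backend.name, key, exc)
            report.failed[backend.name] = str(exc)
    if not report.succeeded:
        raise RuntimeError(f"all storage backends failed for {key!r}: {report.failed}")
    return report
```

Returning a report instead of raising on the first error lets a later workflow step decide whether a partial result, such as data saved locally but not to MinIO, is acceptable.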
I containerized the entire environment with Docker.
Type
info:eu-repo/semantics/bachelorThesis
Collections
- Laurea Triennale (Bachelor's Degree)