Gianmarco Russo


Milan, Italy
sezc.gianmarco.russo@gmail.com
Resume | LinkedIn | GitHub


25 y/o | M. Sc. in Data Science | B. Sc. in Computer Engineering

Portfolio


Data Science for Cybersecurity

Master's Thesis: Unsupervised Machine Learning for Intrusion Detection Systems.

View on GitHub | Open PDF

ML Intrusion Detection implemented using Autoencoders, One-Class SVM, and Isolation Forest.

This thesis explores the detection of web-based attacks on microservice-based applications by modeling application performance metrics and service logs. The general idea is that a profile of normal activity can be built from the (simulated) normal activity on the web application, so that anomalies such as web attacks can be detected as behaviour that deviates from that profile. To do this, a dataset containing only normal activity is generated and machine learning models are trained on it to distinguish the learnt behaviour from any other behaviour.
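The idea of learning a normal profile and flagging deviations can be sketched as follows. This is only an illustration, not the thesis implementation: a simple per-feature z-score profile stands in for the Autoencoder, One-Class SVM, and Isolation Forest models, and the metric names, values, and the 0.99 quantile are hypothetical.

```python
import random
import statistics

def fit_normal_profile(normal_rows):
    """Learn per-feature mean and standard deviation from normal-only data."""
    columns = list(zip(*normal_rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 so a constant feature does not cause a division by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stdevs

def anomaly_score(row, profile):
    """Largest absolute z-score across features: distance from the normal profile."""
    means, stdevs = profile
    return max(abs(x - m) / s for x, m, s in zip(row, means, stdevs))

def fit_threshold(normal_rows, profile, quantile=0.99):
    """Score below which the given fraction of the normal training rows falls."""
    scores = sorted(anomaly_score(row, profile) for row in normal_rows)
    index = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[index]

# Hypothetical metrics per time window: (requests/s, mean latency in ms, error rate).
rng = random.Random(0)
normal_activity = [
    (rng.gauss(100, 10), rng.gauss(50, 5), rng.gauss(0.01, 0.002))
    for _ in range(1000)
]

profile = fit_normal_profile(normal_activity)
threshold = fit_threshold(normal_activity, profile)

def is_anomalous(row):
    """Flag a window whose score exceeds what was seen on normal activity."""
    return anomaly_score(row, profile) > threshold

typical_window = (100.0, 50.0, 0.01)
attack_window = (400.0, 250.0, 0.30)  # made-up values for an attack-like burst
```

The training step never sees an attack: the threshold is derived from normal activity alone, which is the property the unsupervised models in the thesis share.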



Adversarial Attacks in Deep Learning Systems.

View on GitHub | Open PDF

Adversarial Attack detection using SVD as a proactive measure.

Machine learning algorithms are used to create mathematical models for data-driven systems. Such models can make accurate predictions without being explicitly programmed to do so. These techniques have a wide range of applications, from the digital economy to artificial intelligence, including critical areas such as autonomous driving and Intelligent Security Detection Systems (ISDS), which are designed to identify and mitigate malicious activity. Anomaly detection is a common example: a machine learning model learns a reliable profile of normal behavior and then classifies deviations from it as potentially malicious activity. These security systems are, however, vulnerable to adversarial attacks.




Computer Vision and Signals

Image and Audio Super-Resolution using CNNs and GANs.

View on GitHub | Open PDF

CNNs and GANs for super-resolution.

The main objective is to train Convolutional Neural Networks and Generative Adversarial Networks for the task of super-resolution: the enhancement of 1D (audio) and 2D (image) signals. This repository contains the demo that uses our trained models to apply super-resolution to images and audio.




Pill Quality Control / Classification and Augmentation.

View on GitHub | Open PDF

Image classification and augmentation using traditional techniques and Generative Adversarial Neural Networks.

The first objective of this project is to classify pills, specifically to determine whether, in a quality control scenario, it is possible to detect pills with cosmetic defects such as chips or dirt. This task was carried out by training CNNs from scratch and comparing them with pre-trained networks. The second objective is to compensate for the lack of training data by using generative adversarial networks (GANs) combined with traditional data augmentation.




Natural Language Processing

NIPS Papers: Topic Modelling and Text Summarization.

View on GitHub | Project Report

The Conference on Neural Information Processing Systems (NIPS) is a machine learning and computational neuroscience conference held every year since 1987. We worked on a dataset containing all the papers submitted from 1987 to 2017, applying unsupervised NLP models for topic modelling (LDA/PLSA) and a supervised approach for extractive text summarization (topic representation and indicator representation).




7 Sins Diachronical Analysis

View on GitHub | Open PDF

The aim of this work is to perform a diachronic analysis of the 7 deadly sins to find out how their meaning and use have changed from the 19th century to the 21st century.
We used two word embedding techniques, Word2Vec and GloVe, together with CADE to align the text corpora and analyze the semantic difference of the words between the 1800s and the 2000s. Additionally, we implemented a geometrical comparison technique to evaluate how differently Word2Vec and GloVe build their embeddings.
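One common way to quantify the semantic difference of a word between two periods is the cosine distance between its two embeddings. The sketch below is an illustration under assumptions: the project's actual metric is not stated here, the vectors are toy values made up for the example, and they are assumed to already live in a shared space (which is what an alignment method such as CADE is meant to provide).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semantic_shift(word, old_vectors, new_vectors):
    """1 - cosine similarity: larger values mean the word's usage changed more."""
    return 1.0 - cosine_similarity(old_vectors[word], new_vectors[word])

# Toy 3-dimensional embeddings, invented for illustration only.
vectors_1800s = {"pride": [0.9, 0.1, 0.0], "envy": [0.2, 0.8, 0.1]}
vectors_2000s = {"pride": [0.1, 0.2, 0.9], "envy": [0.25, 0.75, 0.1]}

shifts = {
    word: semantic_shift(word, vectors_1800s, vectors_2000s)
    for word in vectors_1800s
}
most_changed = max(shifts, key=shifts.get)
```

With real embeddings the same comparison would only be meaningful after alignment, since independently trained embedding spaces are not directly comparable.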




Time Series Analysis

Restaurant’s Revenue Loss during the first COVID-19 pandemic lockdown (ITA).

View on GitHub | Open PDF

Time Series Analysis and Forecasting (using ARIMA and Mixture models) of a restaurant's revenue during the first lockdown of the COVID-19 pandemic in Italy, to estimate the loss incurred.

The restaurant industry has been one of the sectors most affected by the COVID-19 pandemic. Due to the related restrictions, restaurant owners saw their revenues plummet. In such a period, it can be very useful to analyze historical data to study and predict daily or weekly revenues, so that the supply of raw materials can be adjusted accordingly. In this paper, models from the ARIMA family and the Cluster-Weighted Model, with and without cross-validation, were used to forecast the time series.




Energy Consumption Forecast (ITA).

View on GitHub | Open PDF

Time Series Analysis and Forecasting (using ARIMA, UCM, and LSTM models) of energy consumption.

Energy consumption forecasts are crucial for various purposes, from purchasing energy from the producer to managing overloads. This paper analyzes a univariate time series of energy consumption sampled every 10 minutes. The provided period spans from 01/01/2017 to 30/11/2017, with the aim of estimating December's consumption. The dataset, in total, consists of 48,096 observations. There are no additional details such as weather, holidays, and other typical location-specific characteristics.
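The observation count follows directly from the sampling rate and the date range, and a short check makes the arithmetic explicit. The sketch assumes the series covers every 10-minute slot from 00:00 on 01/01/2017 through 23:50 on 30/11/2017 with no gaps; the December figure is derived here under the same assumption and is not a number taken from the paper.

```python
from datetime import datetime, timedelta

SAMPLING_STEP = timedelta(minutes=10)

def count_samples(start, end_exclusive, step=SAMPLING_STEP):
    """Number of regularly spaced samples in the half-open range [start, end)."""
    return (end_exclusive - start) // step

samples_per_day = timedelta(days=1) // SAMPLING_STEP

# Provided period: 1 January 2017 up to and including 30 November 2017.
train_samples = count_samples(datetime(2017, 1, 1), datetime(2017, 12, 1))

# Forecast horizon: the whole of December 2017 (derived, see the assumption above).
december_samples = count_samples(datetime(2017, 12, 1), datetime(2018, 1, 1))

print(samples_per_day, train_samples, december_samples)
```

That is 144 samples per day over 334 days, which matches the 48,096 observations stated above, and a horizon of 31 days of 10-minute steps for December.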




Data Visualization

Netflix Top 10 Quality: Data Analysis & Interactive Visualization (ITA).

View on GitHub | View on Tableau

Netflix calculates the "Weekly Top 10" simply by ordering movies or TV series based on the hours viewed in the last 7 days in descending order. But does this ranking truly reward titles of higher quality or the most popular ones? Are movies and TV series treated equally? Is the Top 10 genuinely helpful for users in selecting the best titles, or does it feature lower-quality content compared to what is available in Netflix's catalog?
Netflix is one of the most widely used streaming platforms, boasting over 214 million accounts. The platform employs one of the most effective recommendation systems, featuring on each user's homepage the most popular movies and TV series that align with the subscriber's preferences. A common feature among all accounts is the "Weekly Top 10," which appears at the top of the homepage and is updated every Sunday. Millions of people see this list of the top ten movies or TV series every day, inevitably influencing users' choices. Furthermore, it serves as an invaluable showcase for every actor and director, both emerging and established. The objective of this research is to analyze data related to each movie and TV series that made it into the Top 10 in the last six months to answer these research questions.
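The ranking rule described above, ordering titles by hours viewed over the last 7 days in descending order, can be sketched in a few lines. The function name, the input shape, and the sample titles are hypothetical; nothing here reflects how Netflix computes the list internally beyond that description.

```python
from collections import defaultdict

def weekly_top10(view_events, top_n=10):
    """Rank titles by total hours viewed, highest first.

    view_events: iterable of (title, hours_viewed) pairs, assumed to be
    already restricted to the last 7 days.
    """
    totals = defaultdict(float)
    for title, hours in view_events:
        totals[title] += hours
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [title for title, _ in ranked[:top_n]]

# Made-up viewing data for illustration.
events = [
    ("Title A", 120.0),
    ("Title B", 300.5),
    ("Title A", 200.0),
    ("Title C", 50.0),
]
top = weekly_top10(events, top_n=2)
```

Because the only input is hours viewed, a ranking of this kind carries no information about ratings or reviews, which is why the research questions compare it against external quality data.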




Infographics: PROM score and the possible relationship with weather conditions (ITA).

PROMs are patient-reported outcome measures following an operation or health treatment, often used to assess the quality of health care.
Using infographics built in Python with the matplotlib and seaborn libraries, we evaluated whether the outcomes of post-surgery mental and physical health assessments in a sample of patients are related to gender and to the time of day (day or night) at which the questionnaire was completed.




Data Management

Data Acquisition and Modeling: Movies and TV series using Netflix and IMDb Data (ITA).

Open PDF

Data Acquisition and Modeling: Document-based database containing information related to IMDb and Netflix, scraped from various sources and obtained via API.

This study focuses on the acquisition, aggregation, integration, cleaning, and storage in MongoDB of a series of datasets related to the Netflix streaming platform. Various data acquisition techniques were employed, including web scraping and APIs. Once the data needed to answer the research questions had been obtained, they were cleaned and enriched: dozens of attributes for the titles that made it into Netflix's weekly Top 10 were derived from data provided by IMDb. The result is an enriched dataset with a more "flexible" structure, stored in the document-based MongoDB database.



© 2023 Russo Gianmarco. Powered by Jekyll and the Minimal Theme.