Data-Science-Portfolio

Data Science Projects Portfolio

This portfolio contains my projects in data science, data analysis, SQL databases and Python programming, which document my self-study progress.

The projects fall into a few categories:

Projects:

Machine learning:

ML supervised & unsupervised:

The project concerns predicting advertisement clicks using machine learning. The main aim is to predict who is going to click an ad on a website in the future. The analysis includes data analysis, data preparation and model building with different machine learning algorithms.

Churn prediction

The project concerns churn prediction for bank customers. It includes data analysis, data preparation and building models with different machine learning algorithms to predict whether a client is going to leave the bank.

Books Recommendation System

The project concerns a book recommendation system. It includes data analysis, data preparation and building a model with collaborative filtering and matrix factorization to generate book recommendations.

Customer segmentation

The project covers customer segmentation using the RFM method (RFM score) and K-Means clustering to create customer segments from the data provided.
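To illustrate the RFM idea, here is a minimal sketch using only the standard library. It is not the project's code: the order tuples, the function names `rfm_table` and `score`, and the three-bin ranking are all assumptions made for the example. It derives recency, frequency and monetary values per customer and turns each into a rank-based score.

```python
from collections import defaultdict
from datetime import date


def rfm_table(orders, today):
    """Return {customer: (recency_days, frequency, monetary)}.

    `orders` is a hypothetical iterable of (customer_id, order_date, amount).
    """
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, order_date, amount in orders:
        if customer not in last_order or order_date > last_order[customer]:
            last_order[customer] = order_date
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((today - last_order[c]).days, frequency[c], monetary[c])
        for c in last_order
    }


def score(values, higher_is_better=True, bins=3):
    """Assign each customer a rank-based score from 1 (worst) to `bins` (best)."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ordered)
    return {c: 1 + (i * bins) // n for i, c in enumerate(ordered)}
```

For recency, a lower value is better, so it would be scored with `higher_is_better=False`. The three scores per customer could then be combined into an RFM score or passed to a clustering step.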

Fraud Detection

The project concerns anomaly detection in credit card transactions using machine learning models and autoencoders. The main aim is to predict whether a given transaction is fraudulent.

Sales forecasting

The project concerns sales forecasting with a time series model. It includes sales data analysis and a forecast of the number of orders using the Prophet library.

Real Estate price prediction

The project concerns real estate price prediction using linear regression models. I have built a model which predicts real estate prices based on historical data.

Natural Language Processing:

Product Categorization

The project concerns the categorization of make-up products based on their descriptions. I have built multi-class text classification models (ML algorithms, MLP, CNN and a DistilBERT model) to predict the category (type) of a product. I have also trained Word2vec and Doc2vec models on the data and carried out topic modeling and EDA.

Text Summarization

Text summarization with extractive and abstractive methods in Python. The analysis includes summarization by word frequency with the spaCy library, a TF-IDF vectorizer implementation, automatic text summarization with the gensim library and abstractive techniques using the Hugging Face library.
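The word-frequency approach to extractive summarization can be sketched roughly as follows. This is an illustration only: it uses the standard library instead of spaCy, and the tiny `STOPWORDS` set, the regex-based sentence splitting and the `summarize` function are assumptions made for the example.

```python
import re
from collections import Counter

# Deliberately tiny stop-word list, for illustration only.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it", "on"}


def _content_words(text):
    """Lower-case word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, n_sentences=1):
    """Keep the n sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(_content_words(text))

    def sentence_score(sentence):
        return sum(freq[w] for w in _content_words(sentence))

    top = sorted(sentences, key=sentence_score, reverse=True)[:n_sentences]
    # Emit the selected sentences in their original order.
    return " ".join(s for s in sentences if s in top)
```

Summing raw frequencies favours long sentences; a fuller version might normalise each score by sentence length.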

Spam detection

The project concerns spam detection in SMS messages, to determine whether a message is spam or not. I have built models using a pre-trained BERT model and different machine learning algorithms. The analysis also includes text mining with NLP methods to prepare and clean the data.

Sentiment analysis reviews

The project concerns sentiment analysis of women’s clothing reviews. I have built a model to predict whether a review is positive or negative. I have used different machine learning algorithms and pre-trained GloVe word embeddings with a Bidirectional LSTM. The project also includes EDA and sentiment analysis using the VADER and TextBlob methods.

Computer vision/Image processing:

Plant pathology

The project concerns the recognition of diseases on apple leaves based on their images. The solution includes data analysis, data preparation, and a CNN model with data augmentation and transfer learning to recognize leaf diseases.

Waste Classification

The project concerns waste classification to determine whether an item can be recycled or not. In the analysis I have used a Convolutional Neural Network (CNN) model with data augmentation and transfer learning with a pre-trained MobileNetV2 model.

Face Detection

In the project I have used the OpenCV library to detect faces, eyes and smiles in an image.

Data analysis:

Market Basket analysis

The project concerns market basket analysis and product recommendation using association rule methods. I have built a model using the Apriori algorithm to recommend products based on the data.
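For readers unfamiliar with Apriori, the following is a small, unoptimised sketch of its level-wise search for frequent itemsets, plus the confidence measure used to rank rules. It is not the project's implementation: the function names and the toy baskets mentioned in the note below are assumptions for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    singles = {frozenset([item]) for t in transactions for item in t}
    current = {s for s in singles if support(s) >= min_support}
    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(sub) in result for sub in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return result


def confidence(itemsets, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return itemsets[a | frozenset(consequent)] / itemsets[a]
```

With four toy baskets such as `{"bread", "milk"}`, `{"bread", "butter"}`, `{"bread", "milk", "butter"}` and `{"milk"}`, and a minimum support of 0.5, the pair milk and butter is dropped, so the three-item set is pruned without ever being counted.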

IT job market analysis

The project concerns the analysis of the IT job market using data from GitHub, Stack Overflow and web scraping. I have used SQL, Google BigQuery and Python (pandas, numpy, matplotlib, seaborn) to analyze the data.

Sales analysis - SQL Data Analysis

The project contains an analysis of example sales data with SQL. It showcases my knowledge and skills in SQL, such as data manipulation, analysis and querying.

HR Analytics Dashboard

The project contains an analysis of employee attrition data and an interactive dashboard created in Power BI.

Air quality analysis

The project includes data analysis and outlier detection for air quality data. Outliers were detected with a few methods, such as Tukey’s method (IQR) and the Isolation Forest algorithm.
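Tukey's method flags values that lie more than k times the interquartile range below the first quartile or above the third. Here is a minimal standard-library sketch; the function name `tukey_outliers`, the conventional default of k = 1.5 and the choice of the "inclusive" quartile method are assumptions, not details taken from the project.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

Different quartile conventions give slightly different fences, so results on small samples may not match those from pandas or NumPy exactly.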

World happiness reports analysis

The project includes an analysis of world happiness over 5 years (2015-2019). For the analysis I have used SQL (SQLite) and Python.

Sales Dashboard

The project builds an interactive dashboard from sales data using the pandas-bokeh library.

Python projects

Sentiment analysis app

A REST API web app for sentiment analysis of clothing reviews, built with Flask and a machine learning model.

Waste app

A Streamlit application that uses a deep learning model to determine whether a given piece of waste is recyclable or organic. I have used a previously trained CNN (Convolutional Neural Network) model to classify the waste.

Excel report

Automating an Excel report with Python and the openpyxl library.

CSV Report Processing

This Python script reads a CSV file specified by the user, transforms the data in it and returns the transformed data as a new CSV file.

Extracting data using API

In the project I have used an API to get data and create a dataset. I have created two examples of getting data from an API. The data received was saved in JSON format and then exported to a CSV file.

SQL and Python projects

ETL in python and SQLite

The project includes a simple ETL process using Python and an SQLite database. The pipeline matches reported chargebacks (Excel file) with transactions from the database.
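The matching step of such a pipeline might look roughly like the sketch below. The schema is hypothetical: a `transactions` table with `id` and `amount` columns, and chargebacks supplied as `(transaction_id, reason)` tuples standing in for rows already read from the Excel file. None of these names come from the project.

```python
import sqlite3


def match_chargebacks(conn, chargebacks):
    """Stage chargeback rows, then join them to transactions.

    Unmatched chargebacks are kept, with None as the amount.
    """
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS chargebacks "
        "(transaction_id TEXT, reason TEXT)"
    )
    conn.execute("DELETE FROM chargebacks")  # start from an empty staging table
    conn.executemany("INSERT INTO chargebacks VALUES (?, ?)", chargebacks)
    return conn.execute(
        """
        SELECT c.transaction_id, t.amount, c.reason
        FROM chargebacks AS c
        LEFT JOIN transactions AS t ON t.id = c.transaction_id
        ORDER BY c.transaction_id
        """
    ).fetchall()
```

A `LEFT JOIN` is used here so that chargebacks with no matching transaction remain visible for review instead of being dropped silently.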

CRUD in python and SQLite

The script performs basic CRUD operations using Python and SQLite3.
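As a minimal illustration of the four CRUD operations with the standard `sqlite3` module, consider the sketch below. The `users` table, its columns and the function names are assumptions made for the example and do not reflect the script's actual schema.

```python
import sqlite3


def create_table(conn):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
        )


def create_user(conn, name, email):
    """Create: insert a row and return its id."""
    with conn:
        cur = conn.execute(
            "INSERT INTO users (name, email) VALUES (?, ?)", (name, email)
        )
    return cur.lastrowid


def read_user(conn, user_id):
    """Read: return (id, name, email), or None if the row does not exist."""
    return conn.execute(
        "SELECT id, name, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()


def update_email(conn, user_id, email):
    """Update: return the number of rows changed."""
    with conn:
        cur = conn.execute(
            "UPDATE users SET email = ? WHERE id = ?", (email, user_id)
        )
    return cur.rowcount


def delete_user(conn, user_id):
    """Delete: return the number of rows removed."""
    with conn:
        cur = conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
    return cur.rowcount
```

Parameter placeholders (`?`) are used throughout in place of string formatting, which keeps user input out of the SQL text.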