Data science beginner projects

Description

Study projects developed during data science courses.

module_0

A function guesses a hidden number and prints the number of attempts it took.
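A minimal sketch of such a guesser, assuming the classic setup where the target is drawn from 1–100 and the function narrows the range with binary search (the exact course task may differ):

```python
import numpy as np

def guess_number(target: int, low: int = 1, high: int = 100) -> int:
    """Guess `target` by bisecting the [low, high] range; return the attempt count."""
    attempts = 0
    while True:
        attempts += 1
        guess = (low + high) // 2
        if guess == target:
            return attempts
        if guess < target:
            low = guess + 1
        else:
            high = guess - 1

# Average number of attempts over 1000 random targets
targets = np.random.randint(1, 101, size=1000)
print(f"Mean attempts: {np.mean([guess_number(t) for t in targets]):.2f}")
```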

module_1

Studying the provided data using pandas.

EDA: preparing the data for machine learning (a sketch of these steps follows the list)

  • Filter outliers
  • Perform correlation analysis on the quantitative data
  • Analyse the nominative (categorical) variables
  • Select columns for the machine learning step
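A minimal sketch of these preparation steps with pandas; the file name and column names ("score", "school", etc.) are hypothetical placeholders, not the actual course dataset:

```python
import pandas as pd

df = pd.read_csv("students.csv")  # hypothetical file name

# Filter outliers in a quantitative column with the IQR rule
q1, q3 = df["score"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["score"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Correlation analysis of the quantitative columns
print(df.select_dtypes("number").corr())

# Analysis of a nominative (categorical) variable
print(df.groupby("school")["score"].agg(["mean", "count"]))

# Keep only the columns chosen for the machine learning step
model_columns = ["score", "studytime", "absences", "school"]
df_model = df[model_columns]
```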

Predict TripAdvisor restaurant ratings (see the sketch after the list).

  • Data cleaning
  • Filling NA
  • Outlier removal
  • Feature Engineering
  • EDA
  • First use of ML with default parameters

This was the first full data-preprocessing pipeline with EDA and feature engineering.
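A minimal sketch of that first pass, assuming a random forest with default parameters; the file name and the "Rating" / "Number of Reviews" columns are assumptions for illustration:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("main_task.csv")  # hypothetical file name

# Minimal preprocessing: fill missing numeric values, keep only numeric features
df["Number of Reviews"] = df["Number of Reviews"].fillna(0)
X = df.select_dtypes("number").drop(columns=["Rating"])
y = df["Rating"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# First pass: default hyperparameters
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```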

Bank score prediction project (a sketch of the PCA and tuning steps follows the list)

  • Data cleaning
  • Filling NA
  • Outlier removal
  • Feature Engineering
  • EDA
  • ML
  • Naive model
  • PCA and SVD to reduce the feature-matrix dimensionality
  • Hyperparameter tuning
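A minimal sketch of how PCA and hyperparameter tuning could be combined in one pipeline; the file name, the "default" target column, and the parameter grid are illustrative assumptions, not the project's actual configuration:

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("bank_clients.csv")  # hypothetical file name
X = df.drop(columns=["default"])      # "default" = target flag, hypothetical
y = df["default"]

# Scale, reduce dimensionality with PCA, then classify
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hyperparameter tuning over the number of components and regularization strength
param_grid = {"pca__n_components": [5, 10, 20], "clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```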

Predict car classes from pictures using deep learning (a transfer-learning sketch follows the list)

  • Six types of augmentation
  • Different image sizes, from 512 down to 224
  • Different numbers of epochs
  • Different batch sizes
  • All backbone architectures available in tf.keras.applications
  • Fine-tuning and transfer learning
  • Learning rate adjusted with ReduceLROnPlateau
  • Different optimizers
  • Batch Normalization
  • Different Keras callbacks
  • TTA (test-time augmentation)
  • Different head architectures
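A minimal transfer-learning sketch using one backbone from tf.keras.applications, a custom head with batch normalization, and ReduceLROnPlateau; the directory name, image size, class count, and choice of EfficientNetB0 are assumptions for illustration:

```python
import tensorflow as tf

IMG_SIZE, NUM_CLASSES = 224, 10  # assumed values

# Pretrained backbone from tf.keras.applications, frozen for the transfer-learning stage
base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(IMG_SIZE, IMG_SIZE, 3))
base.trainable = False

# Custom head with batch normalization
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Lower the learning rate when validation loss stops improving
callbacks = [tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                                  factor=0.5, patience=2)]

# "car_images/" is a hypothetical directory with one subfolder per class
train_ds = tf.keras.utils.image_dataset_from_directory(
    "car_images/", validation_split=0.2, subset="training", seed=42,
    image_size=(IMG_SIZE, IMG_SIZE), batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "car_images/", validation_split=0.2, subset="validation", seed=42,
    image_size=(IMG_SIZE, IMG_SIZE), batch_size=32)

model.fit(train_ds, validation_data=val_ds, epochs=5, callbacks=callbacks)
```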

Analysis of vacancies from HeadHunter using SQL queries in a Jupyter notebook (see the sketch below).
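A minimal sketch of running SQL from a notebook and landing the result in pandas; the database file, table, and column names are hypothetical, and the actual project may use a different database engine:

```python
import sqlite3
import pandas as pd

# Connect to the vacancy database (file name is hypothetical)
conn = sqlite3.connect("hh_vacancies.db")

query = """
SELECT employer, COUNT(*) AS n_vacancies
FROM vacancies
GROUP BY employer
ORDER BY n_vacancies DESC
LIMIT 10;
"""
# The SQL result arrives directly as a pandas DataFrame
top_employers = pd.read_sql_query(query, conn)
print(top_employers)
conn.close()
```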

Property price prediction
The data contain many outliers, mistakes, input errors, and slang abbreviations, so the project was split into two parts: data_cleaning.ipynb and eda_ml.ipynb. A sketch of the outlier-removal and feature-selection steps follows the list below.

  • Data cleaning
  • Data Enrichment
  • EDA
  • Feature Engineering
  • ML
  • Outlier removal using different models: IsolationForest, EllipticEnvelope, LocalOutlierFactor
  • Feature selection using different methods: RFE, SelectFromModel, and model feature importances
  • Testing of linear models as a baseline
  • Testing of five advanced models: Random Forest, CatBoost, Gradient Boosting, XGBoost, and LightGBM; bagging and stacking were also tested
  • Hyperparameter tuning
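A minimal sketch of one possible combination of these steps (IsolationForest for outlier removal, RFE for feature selection, Random Forest as the model); the file name, the "price" target column, and all parameter values are illustrative assumptions:

```python
import pandas as pd
from sklearn.ensemble import IsolationForest, RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("property_clean.csv")  # hypothetical output of data_cleaning.ipynb
X = df.drop(columns=["price"])          # "price" = target, hypothetical
y = df["price"]

# Remove outliers with IsolationForest (one of the three methods tried)
mask = IsolationForest(contamination=0.05, random_state=42).fit_predict(X) == 1
X, y = X[mask], y[mask]

# Select the most informative features with RFE
selector = RFE(RandomForestRegressor(n_estimators=50, random_state=42),
               n_features_to_select=15)
X_sel = selector.fit_transform(X, y)

# One of the advanced models, trained on the selected features
X_train, X_test, y_train, y_test = train_test_split(
    X_sel, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(random_state=42).fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```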