Machine Learning

Below is a brief summary of the topics covered in this course. This is a 150-hour course, and it is suggested that you complete it in 2 months. In addition to my classroom sessions, you will be given exercises, which will take another 100 hours to complete within the course duration.

Introduction to Machine Learning and Supervised Learning

  • What is Machine Learning?
  • Supervised vs Unsupervised Learning
  • Types of ML problems
  • High-level view of the ML project lifecycle

Linear Regression

  • Introduction to regression – equation, limitations
  • Types of regressions
  • Simple linear regression – Best-fit line, OLS, goodness of fit, Assumptions
  • Model building
  • Model Evaluation (regression parameters), Residual analysis and prediction, model interpretation
  • The Mathematics of regression (parameter estimation using OLS, the gradient descent algorithm, ANOVA)
  • Transformation of variables: Scaling and Standardization
  • Polynomial regression
  • Ordinary Least Squares
  • Linear Regression
  • Gradient Descent
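
The closed-form OLS estimate and gradient descent listed above can be compared on a tiny dataset. The following is a minimal Python sketch, assuming a single feature, made-up data points, and a hand-picked learning rate (0.05) and epoch count; the function names `ols_fit` and `gd_fit` are illustrative, not from any library.

```python
# Minimal sketch: fit y = b0 + b1 * x on toy data by closed-form OLS and by
# batch gradient descent. Data, learning rate and epoch count are illustrative.


def ols_fit(xs, ys):
    """Closed-form OLS estimates (b0, b1) for simple linear regression."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def gd_fit(xs, ys, lr=0.05, epochs=5000):
    """Batch gradient descent on MSE = (1/n) * sum((b0 + b1*x - y)**2)."""
    n = len(xs)
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        # Residuals of the current predictions.
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to b0 and b1.
        grad_b0 = 2 * sum(errors) / n
        grad_b1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]  # made-up points scattered around y = 1 + 2x
print(ols_fit(xs, ys))  # intercept about 1.05, slope about 1.99
print(gd_fit(xs, ys))   # should agree with OLS to many decimal places
```

With this data, a learning rate above roughly 0.085 makes the updates diverge instead of converging. This is one practical reason to scale features before running gradient descent.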

Multiple linear regression

  • SLR vs MLR
  • Multicollinearity
  • Dummy Variables
  • Polynomial regression
  • Feature Selection
  • Model Building: backward, forward and stepwise selection
  • R Square and Adjusted R Square
  • Loss: RMSE, MSE and MAE comparison
  • Interpreting coefficients of MLR
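
The R Square, Adjusted R Square and RMSE/MSE/MAE comparison above each reduce to a few lines of arithmetic. This Python sketch assumes you already have predictions from a model with `n_features` predictors; the predictions are invented for illustration. Adjusted R Square is computed as 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of samples and p the number of predictors.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """MSE, RMSE, MAE, R^2 and adjusted R^2 for one set of predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mae = sum(abs(r) for r in residuals) / n
    y_mean = sum(y_true) / n
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R^2 penalises extra predictors: n samples, p = n_features.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2, "adj_r2": adj_r2}


# Hypothetical predictions from a model with two predictors.
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.5, 5.5, 7.0, 8.0, 11.5]
print(regression_metrics(y_true, y_pred, n_features=2))
```

Here the single residual of 1.0 lifts RMSE (about 0.59) above MAE (0.5), which shows how squared-error metrics weight large errors more heavily. Adjusted R Square (0.9125) sits below R Square (0.95625) because it charges for the two predictors.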

Regularization

  • Introduction to Regularization
  • Regularized linear models
  • Ridge regression
  • Lasso regression
  • Elastic net
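
Ridge, lasso and elastic net differ only in the penalty added to the squared-error loss. With a single centered feature, each has a closed form, which makes the difference easy to see. This is an illustrative Python sketch, assuming the objective 0.5·RSS + l1·|b| + 0.5·l2·b² and made-up data. Library implementations scale these terms differently, so their penalty values are not directly comparable. `penalized_slope` is a hypothetical helper.

```python
# Illustrative one-feature penalised regression on centered data.
# Objective (an assumed scaling, chosen for clean closed forms):
#   0.5 * RSS(b) + l1 * |b| + 0.5 * l2 * b**2


def soft_threshold(z, t):
    """Shrink z toward zero by t; exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def penalized_slope(xs, ys, l1=0.0, l2=0.0):
    """l1 = l2 = 0: OLS; only l2: ridge; only l1: lasso; both: elastic net."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return soft_threshold(sxy, l1) / (sxx + l2)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]
for name, l1, l2 in [("OLS", 0, 0), ("ridge", 0, 10), ("lasso", 5, 0),
                     ("lasso, large penalty", 25, 0), ("elastic net", 5, 10)]:
    print(name, penalized_slope(xs, ys, l1, l2))
```

Ridge shrinks the slope smoothly (1.99 to 0.995 here). The lasso's soft-thresholding, by contrast, sets the slope exactly to zero once l1 exceeds |Sxy| (about 19.9 here), which is the mechanism behind the lasso's built-in feature selection. The intercept is left unpenalised and can be recovered as mean(y) - b·mean(x). With several correlated features there is no such simple closed form, and coordinate descent is a common solver.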

Classification

  • Introduction: Regression vs classification, types of classification, evaluating classification models
  • Logistic Regression: Best-fit sigmoid curve, odds & log odds, multivariate logistic regression
  • Building a Logistic Regression Model
  • Model Evaluation: Confusion matrix and accuracy, sensitivity & specificity, precision & recall, trade-offs, ROC-AUC, predictions
  • Transformation of variables: Scaling and Standardization (optional)
  • Decision Trees: Descriptive vs Discriminative classification, the decision tree algorithm, measuring purity (Gini index, Entropy, Information gain)
  • Building a Decision Tree Model
  • K-Nearest Neighbor Model
  • Telecom Churn Case Study
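
Two ideas from the list above can be sketched in plain Python: the sigmoid/log-odds link used by logistic regression, and the purity measures used to split decision trees. The labels and the candidate split are made up. Entropy is measured in bits (log base 2), which is one common convention.

```python
import math
from collections import Counter


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Inverse of the sigmoid: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4
print(sigmoid(0.0), log_odds(0.8))
print(gini(parent), entropy(parent))
print(information_gain(parent, [left, right]))
```

The split sends 4 of the 5 "yes" labels left, so each child has an entropy of about 0.72 bits and the information gain is about 0.28 bits. A tree-building algorithm would compare this against the gain of every other candidate split.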

Ensemble Models

  • Introduction to Ensemble Modelling
  • Bagging (Bootstrap Aggregation) Model Introduction
  • Random Forest
  • Boosting Model Introduction
  • AdaBoost, Gradient Boosting, XGBoost, LightGBM, CatBoost
  • Stacking
  • Blending
  • Out of Bag (OOB)
  • Feature importance in random forests
  • Building a Random Forest Model
  • Building a Boosting-Based Model
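
Bagging trains each model on a bootstrap sample, that is, rows drawn with replacement. The rows a sample happens to miss form its out-of-bag (OOB) set, which random forests can use as a built-in validation set. The Python sketch below only illustrates the sampling and a simple majority vote; the seed, the sample size and the tie-breaking rule (smallest label wins) are arbitrary choices.

```python
import random


def bootstrap_sample(n, rng):
    """Draw n row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag(n, in_bag):
    """Rows never drawn into this bootstrap sample (the OOB rows)."""
    return sorted(set(range(n)) - set(in_bag))


def majority_vote(predictions):
    """Combine class predictions from several models; ties go to the smallest label."""
    counts = {}
    for p in predictions:
        counts[p] = counts.get(p, 0) + 1
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)


rng = random.Random(42)
n = 1000
in_bag = bootstrap_sample(n, rng)
oob = out_of_bag(n, in_bag)
print(len(oob) / n)                       # close to (1 - 1/n)**n, roughly 0.37
print(majority_vote([1, 0, 1, 1, 0]))     # 1
```

On average, a fraction (1 - 1/n)^n of the rows (about 36.8% for large n) is left out of each bootstrap sample. Each model therefore has OOB rows that it never saw during training.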

Support Vector Machine (SVM)

  • Linear SVM classification
  • Mathematical/geometrical intuition
  • In-depth geometrical intuition
  • Soft margin classification
  • Nonlinear SVM classification
  • Polynomial kernel
  • Gaussian RBF kernel
  • Data leakage
  • SVM Regression
  • Mathematical/geometrical intuition

Naïve Bayes

  • Introduction to Bayes theorem
  • Multinomial naïve Bayes
  • Gaussian naïve Bayes
  • Various types of naïve Bayes classifiers and their intuition
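
Naïve Bayes applies Bayes' theorem, P(class | x) = P(x | class)·P(class) / P(x), and assumes the features are conditionally independent given the class. Below is a minimal multinomial naïve Bayes sketch in Python for bag-of-words text. It assumes add-one (Laplace) smoothing and toy spam/ham documents, and it ignores words outside the training vocabulary. `train_multinomial_nb` and `predict` are hypothetical helper names.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """docs: list of token lists. Returns log priors, log likelihoods and vocabulary."""
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    log_prior = {c: math.log(k / len(labels)) for c, k in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        # Laplace (add-alpha) smoothing so unseen words do not zero out a class.
        log_lik[c] = {
            w: math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
            for w in vocab
        }
    return log_prior, log_lik, vocab


def predict(model, doc):
    """Bayes' rule in log space: argmax_c log P(c) + sum_w log P(w | c).
    P(doc) is shared by every class, so it can be dropped."""
    log_prior, log_lik, vocab = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)


docs = [["win", "money", "now"], ["win", "prize"],
        ["meeting", "tomorrow"], ["project", "meeting", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_multinomial_nb(docs, labels)
print(predict(model, ["win", "money"]))      # spam
print(predict(model, ["meeting", "notes"]))  # ham
```

Working in log space avoids multiplying many small probabilities. A Gaussian naïve Bayes model follows the same recipe but replaces the smoothed word frequencies with a per-class normal density for each continuous feature.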

Clustering & Market Basket Analysis

  • Introduction to clustering, types of clustering, Euclidean distance & centroid
  • K-means clustering algorithm
  • Transformation of variables: Scaling and Standardization (Optional)
  • Building K-means model
  • Introduction to market basket analysis, cross-selling & upselling, bag vs basket of products, the Apriori algorithm
  • Market Basket Analysis
  • Gaussian Mixture Model
  • K-Means
  • K-Means++
  • Batch K-Means
  • Hierarchical Clustering
  • DBSCAN
  • Evaluation of clustering
  • Homogeneity, completeness and V-measure
  • Silhouette coefficient
  • Davies-Bouldin index
  • Contingency matrix
  • Confusion matrix
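
The K-means loop alternates between two steps. In the assignment step, each point joins its nearest centroid by Euclidean distance; in the update step, each centroid moves to the mean of its points. This Python sketch assumes 2-D toy points and initialises the centroids with the first k points purely for reproducibility. K-Means++ would instead choose the initial centroids at random, favouring points far from those already chosen.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Initial centroids are simply the first k points,
    an illustrative choice (K-Means++ would pick them more carefully)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)
```

Because the result depends on the starting centroids, practical implementations run several initialisations and keep the one with the lowest within-cluster sum of squares.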

Model Evaluation & Model Selection

  • Principles of model selection – model & learning algorithm
  • Simplicity, complexity & overfitting, bias-variance trade-off
  • Tuning Complexity and Regularization
  • Regularization, hyperparameters, and cross validation
  • Model building & Model evaluation
  • Hyperparameter tuning using grid-search and randomized-search CV
  • Handling class imbalance
  • Model Selection
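
Cross-validation and grid search can be combined to tune a hyperparameter. The Python sketch below assumes one-feature ridge regression fitted with a hypothetical `fit_ridge_1d` helper, contiguous (unshuffled) folds and made-up data. It scores each candidate alpha by its pooled validation MSE and keeps the lowest.

```python
def kfold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds (no shuffling).
    The first n % k folds get one extra row."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def fit_ridge_1d(xs, ys, alpha):
    """Hypothetical helper: one-feature ridge minimising 0.5*RSS + 0.5*alpha*b1**2,
    with an unpenalised intercept."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / (sxx + alpha)
    return y_mean - b1 * x_mean, b1


def cv_mse(xs, ys, alpha, k):
    """Mean squared validation error pooled over all k folds."""
    sq_errors = []
    for train, val in kfold_indices(len(xs), k):
        b0, b1 = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], alpha)
        sq_errors += [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in val]
    return sum(sq_errors) / len(sq_errors)


def grid_search(xs, ys, alphas, k=5):
    """Try every alpha and keep the one with the lowest cross-validated MSE."""
    scores = {a: cv_mse(xs, ys, a, k) for a in alphas}
    best = min(scores, key=scores.get)
    return best, scores


xs = [float(x) for x in range(1, 11)]
ys = [3.2, 4.8, 7.1, 9.3, 10.9, 13.2, 14.8, 17.1, 19.0, 21.2]  # made-up data
best_alpha, scores = grid_search(xs, ys, alphas=[0.0, 1.0, 10.0, 100.0])
print(best_alpha, scores)
```

Randomized search follows the same pattern but evaluates a random sample of candidate values instead of the full grid. Real datasets are usually shuffled (or, for classification, stratified) before folding, because contiguous folds on sorted data force the model to extrapolate.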

Feature Engineering

  • Feature engineering – introduction
  • Handling numeric features, handling categorical features, handling time-based features
  • Feature selection using CV
  • Feature selection
  • Recursive feature elimination
  • Backward elimination
  • Forward selection
  • Handling missing data
  • Handling outliers
  • Filter method
  • Wrapper method
  • Embedded methods
  • Feature scaling
  • Standardization
  • Mean normalization
  • Min-max scaling
  • Unit vector
  • Feature extraction
  • PCA (principal component analysis)
  • Introduction to Data encoding
  • Nominal encoding
  • One hot encoding
  • One hot encoding with multiple categories
  • Mean encoding
  • Ordinal encoding
  • Label encoding
  • Target guided ordinal encoding
  • Covariance
  • Correlation check
  • Pearson correlation coefficient
  • Spearman’s rank correlation
  • VIF (variance inflation factor)
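
The four feature-scaling variants listed above differ only in what they subtract and what they divide by. Here is a minimal Python sketch on a single made-up column, plus one made-up row for unit-vector scaling. The standard deviation is the population version (divide by n), which is one of two common conventions.

```python
import math


def standardize(xs):
    """z-score: (x - mean) / std, using the population std (divide by n)."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / std for x in xs]


def mean_normalize(xs):
    """(x - mean) / (max - min): centred on zero, spread over a range of width 1."""
    mean = sum(xs) / len(xs)
    spread = max(xs) - min(xs)
    return [(x - mean) / spread for x in xs]


def min_max(xs):
    """(x - min) / (max - min): maps the column onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def unit_vector(row):
    """Divide one sample (a row, not a column) by its L2 norm."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row]


column = [2.0, 4.0, 6.0, 8.0]
print(standardize(column))
print(mean_normalize(column))  # [-0.5, -0.1666..., 0.1666..., 0.5]
print(min_max(column))         # [0.0, 0.333..., 0.666..., 1.0]
print(unit_vector([3.0, 4.0])) # [0.6, 0.8]
```

Unit-vector scaling works per sample (row), whereas the other three work per feature (column). Whichever scaler you use, compute its statistics on the training split only and reuse them on validation and test data; otherwise information leaks from the held-out data.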

Handling Imbalanced Data

  • Introduction to Data Imbalance
  • Up-sampling
  • Down-sampling
  • Undersampling using Tomek Links
  • K-Fold Cross Validation
  • Stratified K-Fold
  • Synthetic Minority Oversampling Technique (SMOTE)
  • Adjusting Class Weight
  • Random Oversampling
  • Data interpolation
  • Choosing the Right Evaluation Metric
  • Treating the Problem as Anomaly Detection
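
Two of the simplest remedies above, adjusting class weights and random oversampling, fit in a few lines. This Python sketch uses the weighting n_samples / (n_classes * count_c), which is the formula scikit-learn documents for `class_weight="balanced"`. The toy data, the seed and the helper names are illustrative assumptions.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c): rarer classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every class
    matches the largest one. Intended for the training split only."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, cnt in counts.items():
        idx = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - cnt):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(cls)
    return X_out, y_out


X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
print(balanced_class_weights(y))  # {0: 0.625, 1: 2.5}
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))             # 8 rows of each class
```

Oversampling (random or SMOTE) should be applied after the train/test split and only to the training data. Otherwise, duplicated minority rows can land in both splits and inflate the evaluation scores.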

Model Evaluation Metrics

  • Confusion Matrix
  • Accuracy, Recall (Sensitivity/TPR), Precision, F1, ROC, AUC
  • Error Rate, Specificity, FPR, Prevalence
  • RMSE, MAE, MSE
  • R Square, Adjusted R Square
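
All of the classification metrics above are derived from the four confusion-matrix counts. The Python sketch below assumes binary labels with 1 as the positive class, and it returns 0.0 whenever a ratio's denominator is zero (a convention chosen here). It computes ROC AUC through its rank interpretation rather than by plotting the curve.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_div(a, b):
    """Division that returns 0.0 instead of raising when b == 0."""
    return a / b if b else 0.0


def binary_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    n = tp + fp + tn + fn
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # sensitivity / TPR
    return {
        "accuracy": (tp + tn) / n,
        "error_rate": (fp + fn) / n,
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "fpr": safe_div(fp, fp + tn),
        "f1": safe_div(2 * precision * recall, precision + recall),
        "prevalence": (tp + fn) / n,
    }


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative
    (ties count half); this equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

For the example labels, TP = 3, FP = 1, TN = 5 and FN = 1. This gives an accuracy of 0.8, a precision and recall of 0.75, and a specificity of about 0.83.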

Loss Function

  • Introduction to Regression and Classification Loss Function
  • Root Mean Square Error (RMSE)
  • Mean Square Error (MSE)
  • Mean Absolute Error (MAE)
  • Huber Loss
  • Maximum Likelihood Estimation
  • Binary Cross Entropy Loss
  • Hinge Loss
  • Multi Class Cross Entropy Loss
  • KL (Kullback Leibler) Divergence Loss
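
The loss functions above can be written directly from their definitions. This Python sketch makes the following assumptions: Huber loss uses delta = 1, probabilities are clipped away from 0 and 1 before taking logs, hinge-loss labels are in {-1, +1} with raw (unsquashed) scores, and all logarithms are natural. The input values are arbitrary.

```python
import math


def huber(y_true, y_pred, delta=1.0):
    """Quadratic for |residual| <= delta, linear beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean negative log-likelihood for labels in {0, 1} and predicted P(y = 1)."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def hinge(y_true, scores):
    """Mean hinge loss for labels in {-1, +1} and raw decision scores."""
    return sum(max(0.0, 1 - t * s) for t, s in zip(y_true, scores)) / len(y_true)


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Cross entropy for one sample: -sum_k y_k * log(p_k)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


print(huber([0.0, 0.0], [0.5, 3.0]))                          # (0.125 + 2.5) / 2 = 1.3125
print(binary_cross_entropy([1, 0], [0.9, 0.1]))               # -ln(0.9), about 0.105
print(hinge([1, -1, 1], [2.0, 0.5, 0.3]))                     # (0 + 1.5 + 0.7) / 3
print(categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))  # -ln(0.7)
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

Binary cross entropy is the negative mean log-likelihood of a Bernoulli model. This is why minimising it is equivalent to maximum likelihood estimation for logistic regression.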

Model Monitoring

  • Introduction to model monitoring
  • Model drift
  • What to monitor?
  • How frequently to evaluate?
  • How to make decisions?

Model Retraining

  • Introduction to model retraining
  • Retraining the same algorithm on new data
  • Trying new features
  • Trying new algorithms

Dimensionality reduction

  • The curse of dimensionality
  • Dimensionality reduction techniques
  • PCA (principal component analysis) Introduction & Maths
  • Scree plots
  • Eigen-decomposition approach
  • t-SNE
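
For two features, the eigen-decomposition approach to PCA can be done by hand, because a 2×2 covariance matrix has closed-form eigenvalues. The Python sketch below assumes 2-D input and uses the sample covariance (divide by n - 1). The explained-variance ratio it returns is the quantity a scree plot displays. For more dimensions you would use a numerical eigen-solver or SVD instead.

```python
import math


def pca_2d(points):
    """PCA for 2-D points via the closed-form eigen-decomposition of the
    2x2 sample covariance matrix [[a, b], [b, c]]."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    xs = [p[0] - mean_x for p in points]
    ys = [p[1] - mean_y for p in points]
    a = sum(x * x for x in xs) / (n - 1)
    b = sum(x * y for x, y in zip(xs, ys)) / (n - 1)
    c = sum(y * y for y in ys) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix.
    half_trace = (a + c) / 2
    root = math.sqrt(((a - c) / 2) ** 2 + b * b)
    lam1, lam2 = half_trace + root, half_trace - root
    # Eigenvector for the largest eigenvalue (first principal component).
    if b != 0:
        v = (b, lam1 - a)
    elif a >= c:
        v = (1.0, 0.0)
    else:
        v = (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    v = (v[0] / norm, v[1] / norm)
    explained_ratio = lam1 / (lam1 + lam2)
    # Project the centred data onto the first component.
    scores = [x * v[0] + y * v[1] for x, y in zip(xs, ys)]
    return (lam1, lam2), v, explained_ratio, scores


eigenvalues, pc1, ratio, scores = pca_2d([(1, 1), (2, 2), (3, 3)])
print(eigenvalues, pc1, ratio)  # points on y = x: all variance on one axis
```

The sign of a principal component is arbitrary, so a library may return the same axis pointing the opposite way.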

Decision Tree-Based ML

  • Decision Tree
  • Definition of Ensemble techniques
  • Bagging technique
  • Bootstrap aggregation
  • Random forest (bagging technique)
  • Random forest regressor
  • Random forest classifier
  • Complete end-to-end project with deployment
  • AdaBoost, LGBM, XGBoost
  • Gradient Boosting

Recommendation Systems

  • Introduction to Recommendation Systems
  • Application of Recommendation Systems
  • Collaborative Filtering
  • Content Based Filtering

Multilayer Perceptron

Hidden Markov Models (HMM)

ML Libraries / Algorithms

  • SciPy ecosystem (pandas, NumPy, Matplotlib, SymPy, scikit-learn, scikit-image)
  • scikit-learn, scikit-image, statsmodels