Statistics for Data Science

The topics covered in this course are summarized below. This is a 25-hour course, and it is suggested that you complete it in 3 weeks.

Introduction to Statistics for Data Scientists

  1. Introduction to basic statistical terms
  2. Types of statistics
  3. Types of data
  4. Levels of measurement (nominal, ordinal, and interval/ratio)
  5. Measures of central tendency
  6. Measures of dispersion
  7. Random variables
  8. Concept of sets
  9. Skewness and kurtosis
  10. Covariance and correlation
  11. Data Visualization
  12. Data summarization methods
  13. Tables, graphs, charts, and histograms
  14. Frequency distributions
  15. Box Plot
  16. Chebyshev’s inequality

Descriptive & Inferential Statistics for Data Scientists

  1. Types of probability distributions – discrete vs. continuous distributions
  2. Cumulative probabilities, normal and standard normal distributions
  3. Discrete Distributions
  4. Binomial Distribution
  5. Poisson Distribution
  6. Continuous Distributions
  7. Uniform Distribution
  8. Normal Distribution
  9. Standard Normal Distribution
  10. Exponential Distribution
  11. Sampling methods
  12. Interval Estimation
  13. Central limit theorem – sampling, sampling distribution, properties of sampling distribution, central limit theorem, estimating mean using CLT

Hypothesis Testing for Data Scientists

  1. Concepts of hypothesis testing – business relevance, framing hypotheses, hypothesis testing process and p-value
  2. Types of hypothesis tests – left- and right-tailed tests, two-tailed tests, types of errors, hypothesis testing using T-distribution
  3. Industry demos on hypothesis testing (Excel) – two-sample mean test, two-sample proportion test, A/B testing
  4. Z-test, standard normal distribution
  5. T-test, t-statistic, Student’s t-distribution
  6. T-statistic vs. z-statistic
  7. Type I and Type II errors
  8. Bayesian statistics (Bayes’ theorem)
  9. Confidence interval (CI), margin of error
  10. Interpreting confidence levels and confidence intervals
  11. Chi-square test
  12. Chi-square distribution using Python
  13. Chi-square goodness-of-fit test
  14. When to use which statistical distribution?
  15. Analysis of variance (ANOVA)
  16. Assumptions for using ANOVA
  17. Three types of ANOVA
  18. Partitioning of variance in the ANOVA
  19. Calculating ANOVA using Python
  20. F-distribution
  21. F-test (variance ratio test)
  22. Determining the values of F
  23. F-distribution using Python

Project & Resources

  1. Resources for practice
  2. Final assignment