Data Engineering

Exploratory Data Analysis (EDA)

  1. Data sourcing – public and private data
  2. Data cleaning – handling missing values, handling invalid data, filtering data, standardization, etc.
  3. Univariate analysis – unordered vs ordered categorical variables, quantitative variables & measures of central tendency
  4. Segmented univariate analysis – basis of segmentation, comparison of averages and other metrics
  5. Bivariate analysis – relationships between pairs of quantitative and categorical variables, correlation
  6. Derived metrics – introduction, type-driven metrics, business-driven metrics, data-driven metrics
  7. Examples
    • Feature engineering and selection.
    • Analyzing bike sharing trends.
    • Analyzing movie reviews sentiment.
    • Customer segmentation and effective cross-selling.
    • Analyzing wine types and quality.
    • Analyzing music trends and recommendations.
    • Forecasting stock and commodity prices.
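
The first six EDA steps above can be sketched on a tiny dataset. This is a minimal illustration using only the Python standard library, not a prescribed workflow: the records, the field names (`season`, `temp_c`, `rentals`), the median-fill rule, and the 15 °C threshold for the derived `temp_band` column are all assumptions made up for the example.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median

# Hypothetical bike-rental records; None marks a missing value.
rows = [
    {"season": "summer", "temp_c": 30.0, "rentals": 120},
    {"season": "summer", "temp_c": None, "rentals": 110},
    {"season": "summer", "temp_c": 26.0, "rentals": 100},
    {"season": "winter", "temp_c": 5.0, "rentals": 40},
    {"season": "winter", "temp_c": 7.0, "rentals": 30},
    {"season": "winter", "temp_c": 6.0, "rentals": -1},  # invalid count
]

# Data cleaning: drop rows with an invalid count, then fill missing
# temperatures with the overall median (a crude but common choice).
valid = [dict(r) for r in rows if r["rentals"] >= 0]
temp_median = median([r["temp_c"] for r in valid if r["temp_c"] is not None])
for r in valid:
    if r["temp_c"] is None:
        r["temp_c"] = temp_median

# Univariate analysis: central tendency of one quantitative variable.
rentals = [r["rentals"] for r in valid]
mean_rentals = mean(rentals)
median_rentals = median(rentals)

# Segmented univariate analysis: compare averages across a categorical variable.
by_season = defaultdict(list)
for r in valid:
    by_season[r["season"]].append(r["rentals"])
segment_means = {season: mean(vals) for season, vals in by_season.items()}


# Bivariate analysis: Pearson correlation between two quantitative variables.
def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


r_temp_rentals = pearson([r["temp_c"] for r in valid], rentals)

# Derived metric: bucket a quantitative variable into an ordered category.
# The 15 degree threshold is an arbitrary assumption for this sketch.
for r in valid:
    r["temp_band"] = "warm" if r["temp_c"] >= 15 else "cold"

print(mean_rentals, median_rentals, segment_means, round(r_temp_rentals, 3))
```

Filling with the overall median is only one option; filling per segment (for example per season) often preserves more of the structure that the later steps are trying to reveal.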

Data Mining

  1. Introduction: Definitions, Activities, Process, Key challenges
  2. Understanding Data: Data Types/Attribute Types, Statistical Descriptions of Data
  3. Measuring Data Similarity and Dissimilarity
  4. Data Pre-processing – Overview
  5. Association Rule Mining
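
Items 3 and 5 above lend themselves to a short sketch. The code below shows three common (dis)similarity measures and the two basic association-rule quantities, support and confidence. It uses only the Python standard library; the function names and the toy `transactions` list are assumptions for illustration, and it does not implement a full rule-mining algorithm such as Apriori.

```python
from math import sqrt


def euclidean(a, b):
    """Dissimilarity between two numeric vectors (0 means identical)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Similarity of direction between two non-zero numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def jaccard(a, b):
    """Similarity between two sets: shared items over all items."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Hypothetical market-basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)


print(euclidean((0, 0), (3, 4)))
print(jaccard({"bread", "milk"}, {"bread", "butter"}))
print(support({"bread", "milk"}), confidence({"bread"}, {"milk"}))
```

Which measure fits depends on the attribute type: Euclidean distance and cosine similarity suit numeric attributes, while Jaccard suits sets or asymmetric binary attributes.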

Data Modelling

Data Cleaning

Data Visualization & Interpretation

  1. Information overload and issues in decision making
  2. Design of visual encoding schemes to improve comprehension of data and their use in decision making
  3. Presentation and visualization of data for effective communication
  4. Elementary graphics programming, charts, graphs, animations, user interactivity, hierarchical layouts, and techniques for visualizing high-dimensional data and discovered patterns

EDA Tools

  1. Pandas Profiling
  2. Lux
