Data Engineering
Exploratory Data Analysis (EDA)
- Data sourcing – public and private data
- Data cleaning – handling missing values, handling invalid data, filtering data, standardization, etc.
- Univariate analysis – unordered vs ordered categorical variables, quantitative variables & measures of central tendency
- Segmented univariate analysis – basis of segmentation, comparison of averages and other metrics
- Bivariate analysis – bivariate on quantitative and categorical variables, correlation
- Derived metrics – introduction, type-driven metrics, business-driven metrics, data-driven metrics
- Examples
- Feature engineering and selection
- Analyzing bike sharing trends
- Analyzing movie reviews sentiment
- Customer segmentation and effective cross-selling
- Analyzing wine types and quality
- Analyzing music trends and recommendations
- Forecasting stock and commodity prices
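
As a rough illustration of the EDA steps listed above (cleaning, univariate, segmented univariate, and bivariate analysis), here is a minimal sketch using only the Python standard library. The records, the column meanings (day type, temperature, rentals), and the choice of median imputation are all assumptions made up for this example; they are not part of the course material.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median

# Hypothetical toy records: (day_type, temperature, rentals).
# None marks a missing temperature reading.
rows = [
    ("weekday", 10.0, 120),
    ("weekday", 15.0, 180),
    ("weekend", None, 200),
    ("weekend", 25.0, 300),
    ("weekday", 20.0, 240),
]

# Data cleaning: fill the missing temperature with the median of the
# observed values (one of several possible strategies).
observed = [t for _, t, _ in rows if t is not None]
fill_value = median(observed)
clean = [(d, t if t is not None else fill_value, r) for d, t, r in rows]

# Univariate analysis: measures of central tendency for one variable.
temps = [t for _, t, _ in clean]
rentals = [r for _, _, r in clean]
mean_rentals = mean(rentals)
median_rentals = median(rentals)

# Segmented univariate analysis: compare averages across segments.
by_segment = defaultdict(list)
for day_type, _, r in clean:
    by_segment[day_type].append(r)
segment_means = {seg: mean(vals) for seg, vals in by_segment.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Bivariate analysis: correlation between two quantitative variables.
corr = pearson(temps, rentals)

print(fill_value, mean_rentals, median_rentals, segment_means, round(corr, 3))
```

In practice these steps are usually done with a dataframe library; the standard-library version is used here only to keep the sketch self-contained.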
Data Mining
- Introduction: Definitions, Activities, Process, Key challenges
- Understanding Data: Data Types / Attribute Types, Statistical Descriptions of Data
- Measuring Data Similarity and Dissimilarity
- Data Pre-processing – Overview
- Association Rule Mining
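
Two of the topics above, measuring similarity and dissimilarity and association rule mining, reduce to short formulas. The sketch below shows Jaccard similarity, Euclidean distance, and the support, confidence, and lift of a rule on a hypothetical set of market-basket transactions. The item names, the transactions, and the helper function names are assumptions for illustration only.

```python
from math import sqrt

# Hypothetical market-basket transactions, each a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def jaccard_similarity(a, b):
    """Similarity of two sets: size of intersection over size of union."""
    return len(a & b) / len(a | b)


def euclidean_distance(p, q):
    """Dissimilarity of two numeric vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    return support(antecedent | consequent) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent) / support(consequent)


sim = jaccard_similarity(transactions[0], transactions[2])
dist = euclidean_distance((0, 0), (3, 4))
rule_support = support({"bread", "milk"})
rule_confidence = confidence({"bread"}, {"milk"})
rule_lift = lift({"bread"}, {"milk"})

print(sim, dist, rule_support, rule_confidence, rule_lift)
```

A lift below 1, as for the rule bread -> milk in this toy data, suggests the two items appear together less often than they would if they were independent.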
Data Modelling
Data Cleaning
Data Visualization & Interpretation
- Information overload and issues in decision making
- Design of visual encoding schemes to improve comprehension of data and their use in decision making
- Presentation and visualization of data for effective communication
- Elementary graphics programming, charts, graphs, animations, user interactivity, hierarchical layouts, and techniques for visualization of high dimensional data & discovered patterns
EDA Tools
- Pandas Profiling (now published as ydata-profiling)
- Lux