Reinforcement Learning
Classical Reinforcement Learning
Markov Decision Process
- Introduction
- What is Reinforcement Learning?
- Agent-Environment Interaction
- State Vectors
- Objective of RL Agent
- Actions & Policy
- Exploration vs Exploitation
- Markov State
- Markov Decision Process (MDP)
- Value Function
- Optimal Policy
- Model of the Environment
- RL vs Supervised Learning
- Inventory Management (MDP)
Fundamental Equations in RL
- Introduction
- RL Equations – State Value Function
- RL Equations – Action Value Function
- Understanding the RL Equations
- Bellman Equations of Optimality
- Policy Improvement
Model-Based Method – Dynamic Programming
- Introduction
- Dynamic Programming
- Policy Iteration – Algorithm
- Policy Evaluation – Prediction
- Policy Improvement – Control
- Policy Iteration – GridWorld
- Value Iteration
- Generalised Policy Iteration (GPI)
- Ad Placement Optimization (Demo)
Model-Free Methods
- Introduction
- Intuition behind Monte-Carlo Methods
- Monte-Carlo Prediction & Demo
- Monte-Carlo Control
- Off Policy
- Temporal Difference
- Q-Learning with Pseudocode
- Cliff Walking Demo
- Ad Placement Optimization Demo – Q-Learning
- OpenAI Gym – Taxi-v2
Inventory Management Demo
- Introduction
- Problem Statement
- MDP code
- Q-Learning code
- Results
Assignment – Classical Reinforcement Learning
Assignment – Tic-Tac-Toe
Deep Reinforcement Learning
Introduction
Want to build your own Atari game-playing agent? Learn the Q-function or the policy using Deep Reinforcement Learning algorithms: Deep Q Learning, Policy Gradient Methods, and Actor-Critic Methods.
Architectures of Deep Q Learning
- Architectures of Deep Q Network
- DQN Architecture II – Visualisation
- DQN Demo – Cartpole Environment
- Double DQN – A DQN Variation
Deep Q Learning
- Introduction
- Why Deep Reinforcement Learning?
- Parameterised Representation
- Generalizability in Deep RL
- Deep Q Learning
- Training in Deep Reinforcement Learning
- Replay Buffer
- Generate Data for Training
- Target in DQN
- When to stop training?
- Atari Game
- Introduction
Policy Gradient Methods
- Introduction to Policy Gradient Methods
- The Intuition of Policy-Based Methods
- Comparing DQN and Policy-Based Methods
- Path Probability
- Objective Function
- Gradient of the Objective Function
- The Update Rule
- Step-by-Step Update
Actor-Critic Methods
- Introduction
- The Need for Actor-Critic Methods
- Addressing the Problem of Variance
- Justification for Adding the Baseline
- Reducing Variance Using the Baseline
- Appropriate Choice of the Baseline
- Policy Gradient (REINFORCE)
- Actor-Critic Methods: Training
- Training Process: Summary
- Illustration: Defining the State Space
Reinforcement Learning Project
Problem Statement
Improve ride recommendations for cab drivers by building an RL-based algorithm with vanilla Deep Q-Learning (DQN) that maximizes the drivers' profits and, in turn, helps the cab aggregator service retain its drivers.