Reinforcement Learning

Classical Reinforcement Learning

Markov Decision Process

  • Introduction
  • What is Reinforcement Learning?
  • Agent-Environment Interaction
  • State Vectors
  • Objective of RL Agent
  • Actions & Policy
  • Exploration vs Exploitation
  • Markov State
  • Markov Decision Process (MDP)
  • Value Function
  • Optimal Policy
  • Model of the Environment
  • RL vs Supervised Learning
  • Inventory Management (MDP)

Fundamental Equations in RL

  • Introduction
  • RL Equations – State Value Function
  • RL Equations – Action Value Function
  • Understanding the RL Equations
  • Bellman Equations of Optimality
  • Policy Improvement
  • Introduction

Model-Based Method – Dynamic Programming

  • Dynamic Programming
  • Policy Iteration – Algorithm
  • Policy Evaluation – Prediction
  • Policy Improvement – Control
  • Policy Iteration – GridWorld
  • Value Iteration
  • Generalised Policy Iteration (GPI)
  • Ad Placement Optimization (Demo)

Model-Free Methods

  • Introduction
  • Intuition behind Monte-Carlo Methods
  • Monte-Carlo Prediction & Demo
  • Monte-Carlo Control
  • Off Policy
  • Temporal Difference
  • Q-Learning with Pseudocode
  • Cliff Walking Demo
  • Ad Placement Optimization Demo – Q-Learning
  • OpenAI Gym – Taxi-v2

Inventory Management Demo

  • Introduction
  • Problem Statement
  • MDP code
  • Q-Learning code
  • Results

Assignment – Classical Reinforcement Learning

Assignment – Tic-Tac-Toe

Deep Reinforcement Learning

Introduction

Want to build your own Atari game-playing agent? Learn the Q-function or the policy using Deep Reinforcement Learning algorithms: Deep Q-Learning, Policy Gradient methods, and Actor-Critic methods.

Architectures of Deep Q Learning

  • Architectures of Deep Q Network
  • DQN Architecture II – Visualisation
  • DQN Demo – Cartpole Environment
  • Double DQN – A DQN Variation

Deep Q Learning

  • Introduction
  • Why Deep Reinforcement Learning?
  • Parameterised Representation
  • Generalizability in Deep RL
  • Deep Q Learning
  • Training in Deep Reinforcement Learning
  • Replay Buffer
  • Generate Data for Training
  • Target in DQN
  • When to stop training?
  • Atari Game
  • Introduction

Policy Gradient Methods

  • Introduction to Policy Gradient Methods
  • The Intuition of Policy-Based Methods
  • Comparing DQN and Policy-Based Methods
  • Path Probability
  • Objective Function
  • Gradient of the Objective Function
  • The Update Rule
  • Step-by-Step Update

Actor-Critic Methods

  • Introduction
  • The Need for Actor-Critic Methods
  • Addressing the Problem of Variance
  • Justification for Adding the Baseline
  • Reducing Variance Using the Baseline
  • Appropriate Choice of the Baseline
  • Policy Gradient (REINFORCE)
  • Actor-Critic Methods: Training
  • Training Process: Summary
  • Illustration: Defining the State Space

Reinforcement Learning Project

Problem Statement

Improve ride recommendations for cab drivers by building an RL-based algorithm with vanilla Deep Q-Learning (DQN) that maximizes each driver's profits and, in turn, helps the cab aggregator service retain its drivers.
