Reinforcement Learning

Classical Reinforcement Learning

Markov Decision Process

Introduction
What is Reinforcement Learning?
Agent-Environment Interaction
State Vectors
Objective of RL Agent
Actions & Policy
Exploration vs Exploitation
Markov State
Markov Decision Process (MDP)
Value Function
Optimal Policy
Model of the Environment
RL vs Supervised Learning
Inventory Management (MDP)
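
To make the MDP vocabulary above concrete, here is a minimal sketch of an inventory problem written as a state, action, reward, next-state transition. Everything in it is an assumption made for illustration (the capacity, prices, costs, demand range, and the names `step` and `run_episode`); it is not the course's own demo code.

```python
import random

CAPACITY = 5  # assumed maximum stock level (hypothetical)


def step(stock, order, demand, price=2.0, unit_cost=1.0, holding_cost=0.1):
    """One MDP transition for a hypothetical single-item inventory problem.

    State  : current stock level
    Action : number of units ordered
    Reward : revenue minus ordering cost minus holding cost
    """
    order = min(order, CAPACITY - stock)      # an order cannot exceed capacity
    available = stock + order
    sold = min(available, demand)             # unmet demand is simply lost
    next_stock = available - sold
    reward = price * sold - unit_cost * order - holding_cost * next_stock
    return next_stock, reward


def run_episode(policy, steps=10, seed=0):
    """Agent-environment loop: the policy maps a state to an action."""
    rng = random.Random(seed)
    stock, total_reward = 0, 0.0
    for _ in range(steps):
        order = policy(stock)
        demand = rng.randint(0, 3)            # assumed random demand
        stock, reward = step(stock, order, demand)
        total_reward += reward
    return total_reward


# Example policy: always top the stock back up to 3 units.
total = run_episode(lambda stock: max(0, 3 - stock))
```

The next stock level depends only on the current stock, the order, and the demand drawn at this step, which is what makes the stock level a Markov state in this sketch.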

Fundamental Equations in RL

Introduction
RL Equations – State Value Function
RL Equations – Action Value Function
Understanding the RL Equations
Bellman Equations of Optimality
Policy Improvement
Introduction
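
For reference, the equations named in this list are usually written as below. The outline itself does not fix a notation, so the symbols are an assumption: $\pi(a \mid s)$ is the policy, $p(s', r \mid s, a)$ the environment dynamics, and $\gamma$ the discount factor.

```latex
% State value and action value functions under a policy \pi
v_\pi(s)    = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma\, v_\pi(s') \bigr]
q_\pi(s, a) = \sum_{s', r} p(s', r \mid s, a)\,\Bigl[ r + \gamma \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a') \Bigr]

% Bellman equations of optimality
v_*(s)    = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma\, v_*(s') \bigr]
q_*(s, a) = \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma \max_{a'} q_*(s', a') \bigr]

% Policy improvement: act greedily with respect to the current action values
\pi'(s) = \arg\max_{a} q_\pi(s, a)
```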

Model-Based Method – Dynamic Programming

Dynamic Programming
Policy Iteration – Algorithm
Policy Evaluation – Prediction
Policy Improvement – Control
Policy Iteration – GridWorld
Value Iteration
Generalised Policy Iteration (GPI)
Ad Placement Optimization (Demo)
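
As an illustration of the model-based setting, the sketch below runs value iteration on a tiny hand-made MDP whose transition model is fully known. The three-state chain, the `P[s][a]` layout, and the function name `value_iteration` are assumptions chosen for brevity, not the GridWorld or ad-placement demos from the course.

```python
def value_iteration(P, gamma=0.9, tol=1e-8):
    """Value iteration for a known model.

    P[s][a] is a list of (probability, next_state, reward, done) tuples.
    Terminal states do not appear as keys of P; their value is taken as 0.
    """
    V = {s: 0.0 for s in P}

    def q_value(s, a):
        # Expected one-step return of taking action a in state s.
        return sum(p * (r + (0.0 if done else gamma * V[s2]))
                   for p, s2, r, done in P[s][a])

    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(s, a) for a in P[s])  # Bellman optimality backup
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(s, a)) for s in P}
    return V, policy


# Hypothetical chain: states 0 and 1, terminal state 2.
# Action 0 moves left, action 1 moves right; reaching state 2 pays reward 1.
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
}
V, policy = value_iteration(P)
```

With these assumed dynamics the values settle at 0.9 for state 0 and 1.0 for state 1, and the greedy policy moves right in both states.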

Model-Free Methods

Introduction
Intuition behind Monte-Carlo Methods
Monte-Carlo Prediction & Demo
Monte-Carlo Control
Off Policy
Temporal Difference
Q-Learning with Pseudocode
Cliff Walking Demo
Ad Placement Optimization Demo – Q-Learning
OpenAI Gym – Taxi-v2
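
The Q-learning update can be sketched without any external library. The example below uses a hypothetical three-state chain environment rather than the Cliff Walking or Taxi environments listed above; the hyperparameters and the names `step` and `q_learning` are likewise assumptions.

```python
import random

N_STATES, GOAL = 3, 2      # states 0 and 1, plus terminal state 2
ACTIONS = (0, 1)           # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical chain environment: reward 1.0 only on reaching GOAL."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):                       # cap on episode length
            # Epsilon-greedy behaviour policy (exploration vs exploitation).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[state][a])
            next_state, reward, done = step(state, action)
            # Off-policy target: bootstrap from the greedy action in next_state.
            target = reward + (0.0 if done else gamma * max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
            if done:
                break
    return Q


Q = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(GOAL)]
```

No transition model is used here: the agent learns only from sampled `(state, action, reward, next_state)` tuples, which is what distinguishes this from the dynamic programming methods in the previous module.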

Inventory Management Demo

Introduction
Problem Statement
MDP code
Q-Learning code
Results

Assignment – Classical Reinforcement Learning

Assignment – Tic-Tac-Toe

Deep Reinforcement Learning

Introduction

Want to build an agent that plays Atari games? Learn the Q-function or the policy with Deep Reinforcement Learning algorithms: Deep Q-Learning, Policy Gradient methods, and Actor-Critic methods.

Architectures of Deep Q Learning

Architectures of Deep Q Network
DQN Architecture II – Visualisation
DQN Demo – Cartpole Environment
Double DQN – A DQN Variation
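
The difference between the vanilla DQN target and the Double DQN target fits in a few lines. This is a sketch on plain Python lists, assuming the Q-values of the next state have already been computed by an online network and a target network; the function and argument names are hypothetical.

```python
def dqn_target(reward, next_q_target, done, gamma=0.99):
    """Vanilla DQN: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Made-up Q-values for the next state, one entry per action.
online_q = [1.0, 3.0, 2.0]
target_q = [5.0, 0.5, 4.0]
y_dqn = dqn_target(1.0, target_q, done=False, gamma=0.9)
y_double = double_dqn_target(1.0, online_q, target_q, done=False, gamma=0.9)
```

Decoupling action selection from action evaluation is the usual motivation for this variation, since taking a max over noisy estimates tends to overestimate the value.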

Deep Q Learning

Introduction
Why Deep Reinforcement Learning?
Parameterised Representation
Generalizability in Deep RL
Deep Q Learning
Training in Deep Reinforcement Learning
Replay Buffer
Generate Data for Training
Target in DQN
When to stop training?
Atari Game
Introduction
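
A replay buffer and the regression targets built from it can be sketched with the standard library alone. The class below, the transition layout `(state, action, reward, next_state, done)`, and the `target_q` callable standing in for a frozen target network are all assumptions for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, gamma=0.99):
    """Targets r + gamma * max_a' Q_target(s', a') for a batch of transitions.

    target_q(state) must return a list of Q-values, one per action; it is a
    hypothetical stand-in for a forward pass through the target network.
    """
    return [r if done else r + gamma * max(target_q(s2))
            for (_, _, r, s2, done) in batch]


buffer = ReplayBuffer(capacity=3, seed=0)
for i in range(5):
    buffer.push(i, 0, float(i), i + 1, i == 4)
minibatch = buffer.sample(2)
```

In a full DQN the online network would then be trained to regress its prediction for each sampled `(state, action)` pair onto the corresponding target.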

Policy Gradient Methods

Introduction to Policy Gradient Methods
The Intuition of Policy-Based Methods
Comparing DQN and Policy-Based Methods
Path Probability
Objective Function
Gradient of the Objective Function
The Update Rule
Step-by-Step Update
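
The chain from path probability to update rule listed above is commonly written as follows. The notation is an assumption, since the outline does not state one: $\tau$ is a trajectory, $\mu$ the initial-state distribution, $R(\tau)$ the return of the trajectory, and $\alpha$ the learning rate.

```latex
% Path probability of a trajectory \tau = (s_0, a_0, s_1, a_1, \dots, s_T)
P(\tau; \theta) = \mu(s_0) \prod_{t=0}^{T-1} \pi_\theta(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t)

% Objective function: expected return over trajectories
J(\theta) = \mathbb{E}_{\tau \sim P(\cdot;\theta)}\bigl[ R(\tau) \bigr] = \sum_{\tau} P(\tau; \theta)\, R(\tau)

% Gradient of the objective (the dynamics p do not depend on \theta and drop out)
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim P(\cdot;\theta)}\Bigl[ R(\tau) \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \Bigr]

% Update rule: gradient ascent on J
\theta \leftarrow \theta + \alpha\, \nabla_\theta J(\theta)
```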

Actor-Critic Methods

Introduction
The Need for Actor-Critic Methods
Addressing the Problem of Variance
Justification for Adding the Baseline
Reducing Variance Using the Baseline
Appropriate Choice of the Baseline
Policy Gradient (REINFORCE)
Actor-Critic Methods: Training
Training Process: Summary
Illustration: Defining the State Space
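
The effect of a baseline can be checked exactly on a toy problem. The sketch below assumes a two-action softmax policy with made-up rewards and enumerates both actions instead of sampling, so the mean and variance of the single-sample policy-gradient estimator are computed without noise. All names and numbers are hypothetical.

```python
import math


def softmax(theta):
    m = max(theta)                               # subtract max for stability
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def grad_stats(theta, rewards, baseline=0.0, k=0):
    """Exact mean and variance of (R(a) - baseline) * d/d theta_k log pi(a).

    For a softmax policy, d/d theta_k log pi(a) = 1[a == k] - pi_k.
    """
    pi = softmax(theta)
    samples = [(rewards[a] - baseline) * ((1.0 if a == k else 0.0) - pi[k])
               for a in range(len(pi))]
    mean = sum(p * g for p, g in zip(pi, samples))
    var = sum(p * (g - mean) ** 2 for p, g in zip(pi, samples))
    return mean, var


theta = [0.0, 0.0]            # uniform policy over two actions
rewards = [10.0, 11.0]        # assumed rewards
pi = softmax(theta)
expected_reward = sum(p * r for p, r in zip(pi, rewards))

mean_plain, var_plain = grad_stats(theta, rewards)
mean_base, var_base = grad_stats(theta, rewards, baseline=expected_reward)
```

Both estimators have the same mean, so the baseline adds no bias, while the variance drops sharply once the expected reward is subtracted. Actor-critic methods follow the same idea, with a learned critic supplying the baseline.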

Reinforcement Learning Project

Problem Statement

Improve ride recommendations for cab drivers by building an RL-based algorithm with vanilla Deep Q-Learning (DQN) that maximizes the driver’s profits and, in turn, helps retain drivers on the cab aggregator service.

Dr. Hari Thapliyaal

Dr. Hari Thapliyaal is a seasoned professional and prolific blogger with a multifaceted background that spans the realms of Data Science, Project Management, and Advait-Vedanta Philosophy. Holding a Doctorate in AI/NLP from SSBM (Geneva, Switzerland), Hari has earned Master's degrees in Computers, Business Management, Data Science, and Economics, reflecting his dedication to continuous learning and a diverse skill set. With over three decades of experience in management and leadership, Hari has proven expertise in training, consulting, and coaching within the technology sector. His extensive 16+ years in all phases of software product development are complemented by a decade-long focus on course design, training, coaching, and consulting in Project Management. In the dynamic field of Data Science, Hari stands out with more than three years of hands-on experience in software development, training course development, training, and mentoring professionals. His areas of specialization include Data Science, AI, Computer Vision, NLP, complex machine learning algorithms, statistical modeling, pattern identification, and extraction of valuable insights. Hari's professional journey showcases his diverse experience in planning and executing multiple types of projects. He excels in driving stakeholders to identify and resolve business problems, consistently delivering excellent results. Beyond the professional sphere, Hari finds solace in long meditation, often seeking secluded places or immersing himself in the embrace of nature.
