Updated: 2021-08-27 18:52:42
Cover
Title Page
Copyright and Credits
Reinforcement Learning with TensorFlow
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Deep Learning – Architectures and Frameworks
Deep learning
Activation functions for deep learning
The sigmoid function
The tanh function
The softmax function
The rectified linear unit function
How to choose the right activation function
Logistic regression as a neural network
Notation
Objective
The cost function
The gradient descent algorithm
The computational graph
Steps to solve logistic regression using gradient descent
What is Xavier initialization?
Why do we use Xavier initialization?
The neural network model
Recurrent neural networks
Long Short-Term Memory networks
Convolutional neural networks
The LeNet-5 convolutional neural network
The AlexNet model
The VGG-Net model
The Inception model
Limitations of deep learning
The vanishing gradient problem
The exploding gradient problem
Overcoming the limitations of deep learning
Reinforcement learning
Basic terminologies and conventions
Optimality criteria
The value function for optimality
The policy model for optimality
The Q-learning approach to reinforcement learning
Asynchronous advantage actor-critic
Introduction to TensorFlow and OpenAI Gym
Basic computations in TensorFlow
An introduction to OpenAI Gym
The pioneers and breakthroughs in reinforcement learning
David Silver
Pieter Abbeel
Google DeepMind
The AlphaGo program
Libratus
Summary
Training Reinforcement Learning Agents Using OpenAI Gym
The OpenAI Gym
Understanding an OpenAI Gym environment
Programming an agent using an OpenAI Gym environment
Q-Learning
The Epsilon-Greedy approach
Using the Q-Network for real-world applications
Markov Decision Process
Markov decision processes
The Markov property
The S state set
Actions
Transition model
Rewards
Policy
The sequence of rewards - assumptions
The infinite horizons
Utility of sequences
The Bellman equations
Solving the Bellman equation to find policies
An example of value iteration using the Bellman equation
Policy iteration
Partially observable Markov decision processes
State estimation
Value iteration in POMDPs
Training the FrozenLake-v0 environment using MDP
Policy Gradients
The policy optimization method
Why policy optimization methods?
Why stochastic policy?
Example 1 - rock paper scissors
Example 2 - state aliased grid-world