Updated: 2021-08-27 18:52:42
Cover
Title Page
Copyright and Credits
Reinforcement Learning with TensorFlow
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Deep Learning – Architectures and Frameworks
Deep learning
Activation functions for deep learning
The sigmoid function
The tanh function
The softmax function
The rectified linear unit function
How to choose the right activation function
Logistic regression as a neural network
Notation
Objective
The cost function
The gradient descent algorithm
The computational graph
Steps to solve logistic regression using gradient descent
What is Xavier initialization?
Why do we use Xavier initialization?
The neural network model
Recurrent neural networks
Long Short-Term Memory networks
Convolutional neural networks
The LeNet-5 convolutional neural network
The AlexNet model
The VGG-Net model
The Inception model
Limitations of deep learning
The vanishing gradient problem
The exploding gradient problem
Overcoming the limitations of deep learning
Reinforcement learning
Basic terminologies and conventions
Optimality criteria
The value function for optimality
The policy model for optimality
The Q-learning approach to reinforcement learning
Asynchronous advantage actor-critic
Introduction to TensorFlow and OpenAI Gym
Basic computations in TensorFlow
An introduction to OpenAI Gym
The pioneers and breakthroughs in reinforcement learning
David Silver
Pieter Abbeel
Google DeepMind
The AlphaGo program
Libratus
Summary
Training Reinforcement Learning Agents Using OpenAI Gym
The OpenAI Gym
Understanding an OpenAI Gym environment
Programming an agent using an OpenAI Gym environment
Q-Learning
The Epsilon-Greedy approach
Using the Q-Network for real-world applications
Markov Decision Process
Markov decision processes
The Markov property
The S state set
Actions
Transition model
Rewards
Policy
The sequence of rewards - assumptions
The infinite horizons
Utility of sequences
The Bellman equations
Solving the Bellman equation to find policies
An example of value iteration using the Bellman equation
Policy iteration
Partially observable Markov decision processes
State estimation
Value iteration in POMDPs
Training the FrozenLake-v0 environment using MDP
Policy Gradients
The policy optimization method
Why policy optimization methods?
Why stochastic policy?
Example 1 - rock paper scissors
Example 2 - state aliased grid-world