![Hands-On Intelligent Agents with OpenAI Gym](https://wfqqreader-1252317822.image.myqcloud.com/cover/567/36699567/b_36699567.jpg)
Markov Decision Process
A Markov Decision Process (MDP) provides a formal framework for reinforcement learning. It describes a fully observable environment in which outcomes are partly random and partly dependent on the actions taken by the agent (the decision maker). The following diagram shows the progression from a Markov Process to a Markov Decision Process by way of the Markov Reward Process:
![](https://epubservercos.yuewen.com/F87B1A/19470390008867106/epubprivate/OEBPS/Images/477e4f60-bc2f-4bfa-bbc5-cfbd6b932f43.jpg?sign=1738805895-uod3Wt4m3MguaJBg1udyIts9vx0n4YxN-0-ffc32361414d017aaea6c4d75922baf4)
These stages can be described as follows:
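- A Markov Process (or Markov chain) is a sequence of random states s1, s2, ... that obeys the Markov property: the next state depends only on the current state, not on the states that came before it. In simple terms, it is a random process with no memory of its history.
- A Markov Reward Process (MRP) is a Markov Process with values: rewards are attached to the states, and a discount factor weights rewards that arrive later.
- A Markov Decision Process is a Markov Reward Process with decisions: the agent chooses an action at each step, and the next state and reward depend on that action as well as on the current state.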
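The three stages above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the two-state weather example, the action names, and every probability and reward value are hypothetical and do not come from the book. The sketch also assumes that a reward is collected on entering a state.

```python
import random

# Hypothetical two-state example; all names and numbers are illustrative.
STATES = ["sunny", "rainy"]

# Markov Process: P[s][s2] is the probability of moving from s to s2.
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def sample_next(dist, rng):
    """Draw a next state from a {state: probability} mapping."""
    states = list(dist)
    weights = [dist[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def sample_chain(start, steps, rng):
    """Markov Process: the next state depends only on the current state."""
    state, path = start, [start]
    for _ in range(steps):
        state = sample_next(P[state], rng)
        path.append(state)
    return path


# Markov Reward Process: add a reward per state and a discount factor.
R = {"sunny": 1.0, "rainy": -1.0}  # reward collected on entering a state
GAMMA = 0.9


def discounted_return(path, gamma=GAMMA):
    """Sum the discounted rewards of the states entered after the start."""
    return sum((gamma ** t) * R[s] for t, s in enumerate(path[1:]))


# Markov Decision Process: transitions now depend on the chosen action too.
ACTIONS = ["stay", "travel"]
P_MDP = {
    ("sunny", "stay"): {"sunny": 0.8, "rainy": 0.2},
    ("sunny", "travel"): {"sunny": 0.5, "rainy": 0.5},
    ("rainy", "stay"): {"sunny": 0.4, "rainy": 0.6},
    ("rainy", "travel"): {"sunny": 0.7, "rainy": 0.3},
}


def mdp_step(state, action, rng):
    """Apply an action; return the sampled next state and its reward."""
    next_state = sample_next(P_MDP[(state, action)], rng)
    return next_state, R[next_state]


if __name__ == "__main__":
    rng = random.Random(0)
    path = sample_chain("sunny", 5, rng)
    print(path, discounted_return(path))
    print(mdp_step("rainy", "travel", rng))
```

Note that the only structural change between the MRP and the MDP is the key of the transition table: it goes from a state to a (state, action) pair, which is where the agent's decisions enter.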