As beginners, the definitions are very important for us. Only once we are clear about these complex concepts can we understand the later material much more easily.

Note: Lowercase letters denote observed values of the variables; uppercase letters denote the corresponding random variables.


  • Agent
  • Environment
  • State $S$
  • Action $a$
  • Reward $r$
  • Policy $\pi(a|s)$
  • State transition $p(s'|s,a)$
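To make these definitions concrete, here is a minimal sketch of the agent–environment loop. The toy environment, its transition rule, and the random policy below are all hypothetical, invented only for illustration.

```python
import random

class ToyEnv:
    """A made-up environment with states 0, 1, 2; state 2 is terminal."""

    def reset(self):
        self.state = 0          # initial state s
        return self.state

    def step(self, action):
        # state transition p(s'|s, a): action 1 moves right, action 0 stays
        self.state = min(self.state + action, 2)
        reward = 1.0 if self.state == 2 else 0.0   # reward r
        done = self.state == 2
        return self.state, reward, done

def random_policy(state):
    # pi(a|s): this toy agent ignores s and picks an action uniformly
    return random.choice([0, 1])

env = ToyEnv()
s = env.reset()
total_reward = 0.0
for _ in range(10):
    a = random_policy(s)          # agent selects action a ~ pi(a|s)
    s, r, done = env.step(a)      # environment returns s', r
    total_reward += r
    if done:
        break
```

The loop shows how each concept fits together: the agent observes state $s$, samples action $a$ from the policy, and the environment responds with the next state $s'$ and reward $r$.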

Return and Value

  • Return: $$ U_t = R_t+\gamma R_{t+1}+\gamma^2 R_{t+2}+ \dots $$

  • Action-value function: $$ Q_\pi(s_t,a_t) = \mathbb{E}[U_t|s_t,a_t] $$

  • Optimal action-value function: $$ Q^*(s_t,a_t) = \max_{\pi} Q_\pi(s_t,a_t) $$

  • State-value function:

$$ V_\pi(s_t) = \mathbb{E}_{A}[Q_{\pi}(s_t,A)] $$
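The return above can be computed directly for a finite reward sequence. The rewards in this sketch are made-up values, used only to show how the discount factor $\gamma$ weights future rewards.

```python
def discounted_return(rewards, gamma):
    """Compute U_t = R_t + gamma*R_{t+1} + gamma^2*R_{t+2} + ...
    for a finite list of rewards."""
    u = 0.0
    for k, r in enumerate(rewards):
        u += (gamma ** k) * r
    return u

# With rewards [1, 1, 1] and gamma = 0.9:
# U = 1 + 0.9 + 0.81 = 2.71
print(discounted_return([1.0, 1.0, 1.0], 0.9))
```

Note how $\gamma < 1$ makes rewards further in the future count for less, which is exactly what the definition of $U_t$ expresses.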


During the interaction process, the agent can be controlled either by sampling actions from the policy $\pi(a|s)$ or by acting greedily with respect to $Q^*(s,a)$.
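These two ways of controlling the agent can be sketched as follows. The probability and Q-value lists below are hypothetical placeholders, standing in for a learned policy or Q-function.

```python
import random

def act_with_policy(pi_probs):
    """Sample an action a ~ pi(a|s), given the probabilities
    pi(a|s) for each action in state s."""
    return random.choices(range(len(pi_probs)), weights=pi_probs)[0]

def act_with_q(q_values):
    """Act greedily with respect to Q*(s, a): pick the action
    with the largest Q-value in state s."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With made-up Q-values for three actions, the greedy choice is action 1:
print(act_with_q([0.2, 1.5, -0.3]))
```

Policy-based control is stochastic (actions are sampled), while $Q^*$-based control is deterministic (always the argmax), which is why estimating either one is enough to control the agent.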

So these two quantities are the targets we need to estimate, and we will learn methods for estimating them in later lessons.