# Definitions

As beginners, these definitions are very important for us. Only once we are clear about these basic concepts can we more easily understand the material that comes later.

**Note**: Lowercase letters denote observed values; uppercase letters denote the corresponding random variables.

## Terminologies

- Agent
- Environment
- State $S$
- Action $a$
- Reward $r$
- Policy $\pi(a|s)$
- State transition $p(s'|s,a)$
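To see how these terms fit together, here is a minimal sketch of the agent-environment interaction loop. The environment, its two states, two actions, and reward rule are all invented for illustration.

```python
import random

def step(state, action):
    """Toy state transition p(s'|s, a): the chosen action moves the
    agent to the matching state 80% of the time; reward is 1 in state 1."""
    next_state = action if random.random() < 0.8 else 1 - action
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward

def policy(state):
    """A uniform random policy pi(a|s): pick each action with probability 0.5."""
    return random.choice([0, 1])

state = 0
for t in range(5):
    action = policy(state)               # agent samples a ~ pi(a|s)
    state, reward = step(state, action)  # environment returns s', r
    print(t, state, reward)
```

The loop alternates between the agent choosing an action from its policy and the environment answering with the next state and a reward; every term in the list above appears in it.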

## Return and Value

Return: $$ U_t = R_t+\gamma R_{t+1}+\gamma^2 R_{t+2}+ \dots $$
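The discounted sum above can be computed directly from a finite list of rewards; this small sketch (function name and discount value are our own choices) does exactly that.

```python
def discounted_return(rewards, gamma=0.9):
    """Compute U_t = R_t + gamma*R_{t+1} + gamma^2*R_{t+2} + ...
    for a finite reward sequence."""
    u = 0.0
    for k, r in enumerate(rewards):
        u += (gamma ** k) * r
    return u

# With gamma = 0.5 and three unit rewards: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```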

Action-value function: $$ Q_\pi(s_t,a_t) = \mathbb{E}[U_t|s_t,a_t] $$

Optimal action-value function: $$ Q^*(s_t,a_t) = \max_{\pi} Q_\pi(s_t,a_t) $$

State-value function:

$$ V_\pi(s_t) = \mathbb{E}_{A}[Q_{\pi}(s_t,A)] $$
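Since the actions are discrete, this expectation is just a weighted sum: $V_\pi(s) = \sum_a \pi(a|s)\,Q_\pi(s,a)$. A tiny sketch with made-up Q values and a uniform policy:

```python
def state_value(q_values, action_probs):
    """V_pi(s) = sum over a of pi(a|s) * Q_pi(s, a)."""
    return sum(p * q for p, q in zip(action_probs, q_values))

# Two actions with Q_pi(s, a) = 2.0 and 4.0 under a uniform policy:
print(state_value([2.0, 4.0], [0.5, 0.5]))  # 3.0
```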

## Core

During the iteration process, the agent can be controlled by either $\pi(a|s)$ or $Q^*(s,a)$.
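Control with $Q^*$ means acting greedily: in each state, pick the action $\arg\max_a Q^*(s,a)$. A sketch with an invented Q* table:

```python
# Hypothetical Q*(s, a) table: two states, two actions each.
q_star = {
    0: [1.0, 3.0],
    1: [2.0, 0.5],
}

def greedy_action(state):
    """Choose argmax over a of Q*(s, a)."""
    values = q_star[state]
    return max(range(len(values)), key=lambda a: values[a])

print(greedy_action(0))  # 1, since Q*(0, 1) = 3.0 is the largest value
```

Controlling the agent with $\pi(a|s)$ instead would mean sampling actions from the policy, as in the interaction loop shown earlier.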

So these two quantities are the targets we need to estimate, and we will learn methods for estimating them in later lessons.