Faculty Contact: Ambuj Singh
Markov Decision Processes (MDPs) model the state of the world and how an agent receives rewards in that world: the agent receives a quantified reward for being in a specific state. An MDP can be thought of as a graph whose nodes are the states and whose edges, labeled by actions from the set A, connect each state S to its possible successors S'. Each edge weight is the probability of moving from S to S' under the chosen action, so the weights lie between zero and one, and the outgoing edge weights of every node (for a given action) sum to one. A policy specifies the action the agent takes in every state S so that its cumulative reward is maximized.

With a robot, a state can be defined by specific velocity and position readings of its sensors and actuators. A single sensor can take on many values, and each combination of readings corresponds to a distinct state. In reinforcement learning, the agent knows which state it is in and which actions it can take; however, it does not initially know how one state relates to another, nor the reward that comes from being in a state. These unknowns must be learned, for example through Q-learning.

Because robots have multiple actuators, their motion has several degrees of freedom (e.g. an arm with 5 joints has many ways to reach an object in space). The resulting MDP therefore lives in a very high-dimensional space, and learning produces many distinct states that could be consolidated into far fewer. By using Poincaré maps, we can achieve this reduction in dimensionality and end up with a smaller, less complicated MDP graph.
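As a concrete illustration (a minimal sketch, not the project's actual robot model), the code below runs tabular Q-learning on a hypothetical three-state MDP. The transition probabilities and rewards are invented for the example; note that each state-action pair's outgoing probabilities sum to one, and that the agent recovers a greedy policy purely from sampled experience, never reading the transition table directly.

```python
import random

# Toy MDP as nested dicts (illustrative only). transitions[s][a] is a
# list of (next_state, probability) pairs; probabilities sum to one.
transitions = {
    "s0": {"left": [("s0", 1.0)], "right": [("s1", 0.8), ("s0", 0.2)]},
    "s1": {"left": [("s0", 1.0)], "right": [("s2", 1.0)]},
    "s2": {},  # terminal goal state: no outgoing actions
}
rewards = {"s2": 10.0}  # reward for arriving in the goal state

def step(state, action, rng):
    """Sample the next state from the transition distribution."""
    draw, cumulative = rng.random(), 0.0
    for nxt, p in transitions[state][action]:
        cumulative += p
        if draw <= cumulative:
            break
    return nxt, rewards.get(nxt, 0.0)

def q_learning(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # One Q-value per state-action pair, initialized to zero.
    Q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items() if acts}
    for _ in range(episodes):
        state = "s0"
        while state in Q:  # run until a terminal state is reached
            # Epsilon-greedy: explore occasionally, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.choice(list(Q[state]))
            else:
                action = max(Q[state], key=Q[state].get)
            nxt, reward = step(state, action, rng)
            # Q-learning update: transition structure and rewards are
            # learned from experience alone.
            best_next = max(Q[nxt].values()) if nxt in Q else 0.0
            Q[state][action] += alpha * (reward + gamma * best_next
                                         - Q[state][action])
            state = nxt
    return Q

Q = q_learning()
policy = {s: max(acts, key=acts.get) for s, acts in Q.items()}
print(policy)  # the learned greedy policy: move right toward the goal
```

In this toy world the optimal policy is to move right in both non-terminal states, since only s2 yields reward; the learned greedy policy recovers exactly that.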
- Spring 2018: Roman Aguilera