Learning is the essence of artificial intelligence (AI). Most people associate machine learning (ML) and AI with two fundamental learning models: supervised and unsupervised. While those two models do constitute the main groups of AI learning techniques, there are many variations in between. Reinforcement learning is one of the AI learning techniques that has been gaining a lot of traction in recent months.
From a conceptual standpoint, reinforcement learning addresses some of the limitations of supervised learning methods by introducing a reward or reinforcement system based on expert feedback. Let's consider the example of training an AI agent on a specific chess opening (e.g., the Ruy Lopez or the Sicilian Defense). Using a traditional supervised learning model requires labeled data that teaches the agent every possible variation of every possible position. That volume of detailed training data is seldom available in real-world scenarios. Alternatively, the AI agent can collect continuous feedback or reinforcements from experts whenever it achieves a favorable position. That model allows the AI agent to progressively learn from its own experiences.
In strategy games such as chess, poker, or Go, reinforcements are only received after a series of moves that take the game to a favorable or unfavorable state for a specific participant. Other games such as ping-pong require more frequent reinforcements, as each point is considered a potential reward.
In reinforcement learning theory, rewards can be seen as policies that change the state of the AI environment. The goal of reinforcement learning is to discover a policy that maximizes the reward for a participant. Frameworks such as Markov Decision Processes (MDPs) are particularly well suited to modeling this problem.
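To make the MDP framing concrete, here is a minimal value-iteration sketch on a toy two-state environment. The states, actions, transition probabilities, and rewards are illustrative assumptions, not drawn from any real application:

```python
# transitions[state][action] = list of (probability, next_state, reward)
# A toy two-state MDP: "move" from s0 usually reaches s1, which pays
# a steady reward for staying put.
transitions = {
    "s0": {
        "stay": [(1.0, "s0", 0.0)],
        "move": [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
    },
    "s1": {
        "stay": [(1.0, "s1", 2.0)],
        "move": [(1.0, "s0", 0.0)],
    },
}

def value_iteration(transitions, gamma=0.9, tol=1e-6):
    # Start with zero utility for every state.
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # Bellman update: best expected discounted return over actions.
            best = max(
                sum(p * (r + gamma * values[ns]) for p, ns, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Extract the greedy policy: the action maximizing expected value.
    policy = {
        s: max(
            actions,
            key=lambda a: sum(p * (r + gamma * values[ns]) for p, ns, r in actions[a]),
        )
        for s, actions in transitions.items()
    }
    return values, policy
```

Running `value_iteration(transitions)` yields the policy that moves toward `s1` and then stays there, which is exactly the "discover a policy that maximizes the reward" idea described above.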
Reinforcement learning techniques can be classified into two main groups: active and passive. In a passive model, the policies for an AI agent are fixed and the goal of the model is to learn how good those policies are, as well as to better understand the environment. Techniques such as Direct Utility Estimation, Adaptive Dynamic Programming, or Temporal Difference Learning are typically used to infer the "utility" of a policy (i.e., how good the policy is).
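As a sketch of the passive setting, the snippet below uses Temporal Difference learning (TD(0)) to estimate the utility of a fixed policy on a tiny three-state chain. The environment, its rewards, and the learning-rate settings are all illustrative assumptions:

```python
# Fixed policy: always move "right" along the chain s0 -> s1 -> s2 (terminal).
# Only the terminal state pays a reward.
rewards = {"s0": 0.0, "s1": 0.0, "s2": 1.0}
next_state = {"s0": "s1", "s1": "s2"}

def td0_evaluate(episodes=500, alpha=0.1, gamma=0.9):
    # Utilities start at zero and are refined from experience.
    utilities = {s: 0.0 for s in rewards}
    for _ in range(episodes):
        state = "s0"
        while state != "s2":
            nxt = next_state[state]
            # TD(0) update: nudge U(state) toward the observed reward
            # plus the discounted utility of the successor state.
            target = rewards[state] + gamma * utilities[nxt]
            utilities[state] += alpha * (target - utilities[state])
            state = nxt
        # The terminal state's utility is simply its reward.
        utilities["s2"] = rewards["s2"]
    return utilities
```

After enough episodes the utilities converge toward the discounted distance to the reward (roughly 0.9 for `s1` and 0.81 for `s0` with this discount factor), without the agent ever changing its fixed policy.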
Active reinforcement learning doesn't operate with fixed policies. Instead, the AI agent must decide which actions to take at any given state. One fascinating aspect of active reinforcement learning is the friction between maximizing the reward for a specific state and the potential of learning new information. This is commonly known as the exploration-exploitation trade-off.
Reinforcement learning's action-reward model seems like an obvious way to train AI agents, but it is not applicable to all AI scenarios. AI environments that deal with incomplete information are not well suited to reinforcement learning models.
The theory behind reinforcement learning goes all the way back to 1940s computer scientists such as John von Neumann and Alan Turing, but only recently has it achieved practical applicability. Some of the tools and frameworks released by OpenAI (Gym) or DeepMind are great examples of how reinforcement learning can be incorporated in real-world AI solutions.