In artificial intelligence, machines learn in three basic ways: supervised learning, unsupervised learning, and reinforcement learning. Together, these are called learning paradigms.


The choice of learning paradigm is determined by the task. Supervised learning is used for classification and regression. Cluster identification and anomaly detection are typical tasks for unsupervised learning. The primary goal of reinforcement learning is to create software agents that interact with an environment autonomously, learn from it, and determine the behavior that optimizes their performance. In this article, we consider the reinforcement learning paradigm in detail.


Basic Concepts of Reinforcement Learning

Reinforcement learning is a goal-oriented approach. Let’s briefly describe how it works. The machine has an explicit goal, for example, checkmate in chess. We also set the rules of the game (piece movements, castling, and so on). The machine then has to choose actions to win the game. During learning, the computer explores an environment (in our case, a chessboard with pieces) and tries different actions that influence it (piece movements). Each action produces a certain reward: for example, the machine might receive a high reward for capturing an opponent’s piece and a small reward for a regular move. In this way, the machine learns a strategy that dictates the best move in the current state. So the machine interacts with the environment, observes the results of its actions, learns from them, and changes its behavior in response to the rewards it receives. We can say that a reinforcement learning machine learns from its mistakes.
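
The interaction loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a chess engine: the `Corridor` environment, its reward values, and the two policies are hypothetical stand-ins chosen to keep the example short. The agent starts at position 0 and is rewarded for reaching position 4.

```python
import random


class Corridor:
    """Hypothetical toy environment: positions 0..4, with the goal at position 4."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the agent cannot leave the corridor
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        # assumed reward scheme: small penalty per move, bonus on reaching the goal
        reward = 1.0 if done else -0.1
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the agent act until the goal is reached or max_steps is exhausted."""
    state = env.reset()
    total_reward = 0.0
    steps = 0
    while steps < max_steps:
        action = policy(state)                   # the agent chooses an action
        state, reward, done = env.step(action)   # the environment responds
        total_reward += reward                   # the agent observes the reward
        steps += 1
        if done:
            break
    return total_reward, steps


def random_policy(state):
    return random.choice([-1, 1])


def always_right(state):
    return 1
```

Calling `run_episode(Corridor(), always_right)` reaches the goal in four moves, while `random_policy` usually needs more moves and collects a lower total reward. A learning agent would use the observed rewards to move from the second kind of behavior toward the first.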


Reinforcement learning has roots in behavioral psychology. It closely resembles the way humans and animals learn.


Compared with the other learning paradigms, reinforcement learning sits between supervised and unsupervised learning. Unlike in supervised learning, the machine is not told which actions are correct or incorrect, but the reward tells it whether it is doing a good job. Reinforcement learning also differs from unsupervised learning: it is not trying to find hidden patterns but to maximize a reward. In this sense it is loosely comparable to a semi-supervised method, although semi-supervised learning in the strict sense is a separate technique that combines labeled and unlabeled data.


Elements of Reinforcement Learning

Let’s identify the main elements of a reinforcement learning system.


The central elements are an agent that takes actions and learns (e.g., a robot or a computer program) and an environment (a physical or virtual world). Sometimes the agent interacts with a model that mimics the environment's behavior instead of the environment itself.


The other elements of reinforcement learning are a policy, a reward, and a value function.


A policy is a strategy that the agent learns while interacting with the environment. It maps each state to the action the agent should take. The optimal policy selects the actions that promise the highest reward.


A reward is the feedback the agent receives after each action. The agent tries to maximize the reward it collects over time. Some models also use negative rewards (penalties) to discourage unwanted actions.


A value function estimates the expected long-term return, that is, the total reward the agent can expect to collect starting from a given state.
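
To make "long-term return" concrete, the sketch below computes the return of one episode from its sequence of rewards. It assumes the common discounted formulation, in which a discount factor `gamma` (a parameter not mentioned in the text above) makes later rewards count less than immediate ones. The function name and default value are illustrative.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step t is weighted by gamma**t.

    gamma is an assumed discount factor between 0 and 1; with gamma=1.0
    the result is the plain sum of all rewards.
    """
    g = 0.0
    # Work backwards so that each step applies: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, `discounted_return([1, 1, 1], gamma=0.5)` weights the three rewards by 1, 0.5, and 0.25. A value function can then be estimated, in the simplest case, by averaging such returns over many episodes that start from the same state.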


Challenges in Reinforcement Learning

Reinforcement learning is a difficult task that requires solving several problems:

  1. First, the agent receives only the reward as a learning signal. It must find the best policy through trial-and-error interaction with the environment, guided by this feedback alone.
  2. Second, the agent's observations depend on its own actions, so consecutive observations are strongly correlated in time.
  3. Third, the agent often needs many actions before it can tell whether its strategy was good. For example, in indoor robot navigation, a robot can travel a long way toward its goal and still end up in a dead end.

Many algorithms have been developed to address these problems. Reinforcement learning algorithms are commonly divided into two main types: approaches based on value functions and approaches based on policy search. Well-known examples include Monte Carlo methods, Q-learning, and deep reinforcement learning, among many others.
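
As an illustration of a value-based method, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. The five-position corridor task, its reward values, and all hyperparameters (`alpha`, `gamma`, `epsilon`, the number of episodes) are assumptions chosen for this example, not settings taken from any particular system.

```python
import random

ACTIONS = [-1, +1]   # move left or move right
N_STATES = 5         # positions 0..4; position 4 is the goal


def step(state, action):
    """Hypothetical corridor dynamics: -0.1 per move, +1.0 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.1
    return next_state, reward, done


def train_q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)                    # fixed seed for repeatable runs
    q = [[0.0, 0.0] for _ in range(N_STATES)]    # q[state][action_index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):                     # cap on the episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore: random action
            else:
                # exploit: action with the highest current estimate
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best value of the next state
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action for every non-terminal state according to the learned table."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q[:-1]]
```

After training, `greedy_policy(train_q_learning())` returns the action the agent prefers in each of the four non-terminal positions. Because only the goal gives a positive reward, the learned estimates propagate backwards from the goal, which is one way a value-based method deals with feedback that arrives many actions later.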


Applications of Reinforcement Learning

Software that uses reinforcement learning is the next level in building autonomous systems. Some algorithms based on reinforcement learning have already been applied in robotics, video games, and navigation. The examples below illustrate two of these applications.


Robotics. Using reinforcement learning in robotics is ambitious and complex. In one approach, the robot reads raw video images from its camera and processes them with a deep neural network whose outputs are the motor torques. The robot thus learns a policy that maps video images directly to actions.


Games. Reinforcement learning algorithms are used to play games. The biggest success has been achieved in Go: AlphaGo and AlphaGo Zero, which are based on value networks and Monte Carlo tree search, have reached and surpassed the level of the best human players.