Reinforcement Learning

Machine learning, a core part of artificial intelligence, relies on three basic approaches: supervised learning, unsupervised learning, and reinforcement learning. These approaches are called learning paradigms, and the choice among them depends on the task at hand. Supervised learning suits classification and regression tasks. Cluster identification and anomaly detection are typical tasks for unsupervised learning. The primary goal of reinforcement learning is to create software agents that interact with an environment, learn from it, and determine the behavior that optimizes their performance. In this article, we discuss the reinforcement learning paradigm in detail.

Basic Concepts of Reinforcement Learning

Reinforcement learning is a goal-oriented approach. Let’s briefly describe how it works. The machine has an explicit goal, for example, checkmate in chess. We also set the rules of the game (how the pieces move, castling, and so on). The machine then has to choose actions that win the game. During learning, it explores an environment (in our case, a chessboard with pieces) and tries different actions that change the environment (moving pieces). Each action produces a certain reward: for example, the machine might receive a high reward for capturing an opponent’s piece and a small reward for an ordinary move. In this way, the machine learns a strategy that dictates the best move in the current situation. In short, the machine interacts with the environment, observes the results of its actions, and changes its behavior in response to the rewards received. We can say that a reinforcement learning machine learns by trial and error, including from its mistakes.
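The interaction loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a chess engine: the `Corridor` environment, its reward values, and the `run_episode` helper are all hypothetical names invented for this example. The agent starts at one end of a short corridor, receives a small penalty for every move, and receives a bonus for reaching the far end.

```python
import random


class Corridor:
    """Hypothetical toy environment: positions 0..4, goal at position 4."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Put the agent back at the start and return the initial state."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); walls clamp the position."""
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        # Assumed reward scheme: bonus at the goal, small penalty otherwise.
        reward = 1.0 if done else -0.1
        return self.position, reward, done


def run_episode(env, choose_action, max_steps=100):
    """Run one episode and return the total reward the agent collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = choose_action(state)            # the agent acts
        state, reward, done = env.step(action)   # the environment responds
        total_reward += reward                   # feedback accumulates
        if done:
            break
    return total_reward


rng = random.Random(0)  # fixed seed so the run is repeatable
random_total = run_episode(Corridor(), lambda state: rng.choice([-1, 1]))
always_right_total = run_episode(Corridor(), lambda state: 1)
print(random_total, always_right_total)
```

In this toy setting, an agent that always moves right collects the best possible total reward, and a randomly acting agent can never do better. A learning agent has to discover the better behavior from the rewards alone.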

Reinforcement learning has its roots in behavioral psychology. This kind of learning closely resembles the way humans and animals learn.

Compared with the other learning paradigms, reinforcement learning sits between supervised and unsupervised learning. The machine is not told which actions are correct or incorrect (as it would be in supervised learning), but it “knows” whether it is doing a good job from the reward it receives. Reinforcement learning also differs from unsupervised learning, since it does not try to find hidden patterns but tries to maximize a reward. In that sense it occupies a middle ground between the two. It should not be confused with semi-supervised learning, which trains on a mix of labeled and unlabeled data.

Elements of Reinforcement Learning

Let’s identify the main elements of a reinforcement learning system.

The two central elements are an agent that takes actions and learns (e.g., a robot or a computer program) and an environment (a physical or virtual world). Sometimes the agent learns from a model that mimics the environment's behavior instead of from the environment itself.

The other elements of reinforcement learning include a policy, a reward, and a value function.

A policy is the agent's strategy: a mapping from the situations it encounters to the actions it takes, learned through interaction with the environment. The optimal policy selects the actions that promise the highest long-term reward.

A reward is the feedback an agent receives after each action. The agent tries to accumulate as much reward as possible. Some models also use negative rewards (penalties) to discourage unwanted actions.

A value function estimates the expected long-term return, that is, the total reward the agent can expect to accumulate from a given situation onward.
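To make the idea of a long-term return concrete, here is a minimal sketch of how it is commonly computed. It assumes a discount factor `gamma`, which the article does not mention: a number between 0 and 1 that makes near-term rewards count more than distant ones. The function names are hypothetical and chosen only for this example.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by gamma ** (number of steps ahead)."""
    total = 0.0
    # Work backwards so that each step computes G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_value(sampled_reward_sequences, gamma=0.9):
    """Estimate a state's value by averaging returns over sampled episodes.

    Each element is the list of rewards observed in one episode that
    started from the state being evaluated.
    """
    returns = [discounted_return(r, gamma) for r in sampled_reward_sequences]
    return sum(returns) / len(returns)


# Three rewards of 1 with gamma = 0.5: 1 + 0.5 * 1 + 0.25 * 1
print(discounted_return([1, 1, 1], gamma=0.5))  # 1.75
```

Averaging sampled returns in this way is one simple way to approximate the expectation in a value function. Practical algorithms usually update such estimates incrementally instead of storing whole episodes.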

Challenges in Reinforcement Learning

Reinforcement learning is difficult because several problems have to be solved at once:

First, the agent receives only the reward as a learning signal. It must find the best policy through trial-and-error interaction with the environment, guided by this feedback alone.
Second, the agent's observations are strongly correlated in time, because what it observes depends on its own previous actions.
Third, the agent may need many actions before it can tell whether its strategy was good. For example, in indoor robot navigation, a robot can travel a long distance toward its goal only to end up in a dead end.

Many algorithms have been developed to address these problems. Most reinforcement learning algorithms fall into two main families: those based on a value function and those based on policy search. Well-known examples include Monte Carlo methods, Q-learning, and deep reinforcement learning.
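As an illustration of the value-function family, the following is a minimal sketch of tabular Q-learning on a hypothetical five-cell corridor. Everything specific here is an assumption made for the example: the environment, the reward values, the step cap, and the hyperparameters `alpha` (learning rate), `gamma` (discount factor), and `epsilon` (exploration rate). The agent keeps a table of estimated returns for each state and action, and nudges each entry toward the reward it received plus the discounted best estimate for the next state.

```python
import random

N_STATES = 5        # corridor positions 0..4; position 4 is the goal
ACTIONS = [-1, 1]   # move left or move right


def step(state, action):
    """Hypothetical environment dynamics and rewards."""
    next_state = max(0, min(N_STATES - 1, state + action))
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.1
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # q[state][i] estimates the return of taking ACTIONS[i] in that state.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # step cap so every episode terminates
            # Epsilon-greedy: mostly exploit, occasionally explore.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = q[state].index(max(q[state]))
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best next estimate.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy action for each non-goal state, read off the learned table.
greedy_policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(greedy_policy)
```

After training, reading the highest-valued action in each row gives the learned policy, which in this corridor should be to move right from every cell. A policy-search method would instead adjust the parameters of the policy directly, without going through a value table.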

Applications of Reinforcement Learning

Software based on reinforcement learning is a step forward in building autonomous systems. Reinforcement learning algorithms have already been applied in robotics, video games, and navigation. Two examples of applied reinforcement learning follow.

Robotics. Using reinforcement learning in robotics is ambitious and complex. In one approach, a robot reads raw video images from its camera and processes them with a deep neural network whose outputs are the motor torques. The robot thus learns a policy that maps video images directly to actions.

Games. Reinforcement learning algorithms are used to play games. The biggest success so far has been achieved in Go. The AlphaGo and AlphaGo Zero algorithms, which combine deep neural networks (including a value network) with Monte Carlo tree search, have defeated top human players.
