Reinforcement learning is an area of machine learning concerned with an agent that interacts with an environment and learns an optimal policy by trial and error. It addresses sequential decision-making problems in a wide range of fields in engineering and in both the natural and social sciences. Machine learning is usually categorized as supervised, unsupervised, and reinforcement learning. In supervised learning, there are labeled data; in unsupervised learning, there are no labeled data; and in reinforcement learning, there is evaluative feedback but no supervised signal.
In addition, with the recent achievements of deep learning, which benefit from big data, high-performance computing, new algorithmic techniques, and mature software packages and architectures, reinforcement learning combined with deep neural networks is gaining considerable importance in the literature. Representation learning with deep learning enables automatic feature engineering and end-to-end learning through gradient descent, so that reliance on domain knowledge is significantly reduced or even removed.
Deep learning and reinforcement learning were selected among the MIT Technology Review 10 Breakthrough Technologies in 2013 and 2017 respectively, and both are expected to play a crucial role in achieving artificial general intelligence. David Silver, a major contributor to AlphaGo (https://deepmind.com/research/alphago/), even proposed a formula: artificial intelligence = reinforcement learning + deep learning. In a sentence: deep reinforcement learning is artificial intelligence.
Basic reinforcement learning is modeled as a Markov decision process:
- a set of environment and agent states St;
- a set of actions At of the agent;
- the probability of transition between states (St, St+1) because of an action At;
- the reward Rt after a transition between states (St, St+1) because of an action At;
- a set of rules that describe what the agent observes.
The rules are often stochastic, and the observation typically includes the scalar, immediate reward associated with the last transition. An agent may observe the environment fully or only partially, and the set of actions available to the agent may be restricted.
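The components listed above can be written down directly as a small data structure. The following is only a minimal sketch: the two states, the two actions, and every probability and reward value are hypothetical and chosen purely for illustration.

```python
import random

# Hypothetical two-state MDP, for illustration only.
STATES = ["idle", "busy"]
ACTIONS = ["wait", "work"]

# TRANSITIONS[(state, action)] -> list of (next_state, probability, reward)
TRANSITIONS = {
    ("idle", "wait"): [("idle", 1.0, 0.0)],
    ("idle", "work"): [("busy", 0.9, 1.0), ("idle", 0.1, 0.0)],
    ("busy", "wait"): [("idle", 1.0, 0.0)],
    ("busy", "work"): [("busy", 0.8, 2.0), ("idle", 0.2, -1.0)],
}


def step(state, action, rng):
    """Sample (next_state, reward) for one stochastic transition."""
    outcomes = TRANSITIONS[(state, action)]
    weights = [prob for _, prob, _ in outcomes]
    next_state, _, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def check_probabilities():
    """Each (state, action) pair must define a probability distribution."""
    return all(
        abs(sum(prob for _, prob, _ in outcomes) - 1.0) < 1e-9
        for outcomes in TRANSITIONS.values()
    )
```

Calling `step("idle", "work", random.Random(0))` samples one transition. Because the table stores probabilities, the same call can return different next states and rewards on different draws.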
Agents in reinforcement learning interact with the environment in discrete time steps. At each time step t, the agent receives an observation Ot, which typically includes a reward Rt. The agent then chooses an action At from the set of available actions, and this action is sent to the environment. The environment then moves to a new state St+1, and the next reward Rt+1 associated with the transition (St, At, St+1) is determined. The aim of a reinforcement learning agent is to collect as much reward as possible.
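The interaction loop described above can be sketched in a few lines. The `Corridor` environment and the `run_episode` helper are hypothetical names invented for this example, and the environment is fully observable so the observation is simply the state.

```python
import random


class Corridor:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def available_actions(self):
        return [-1, +1]  # step left or step right

    def step(self, action):
        # Move, clipping at the corridor walls.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, choose_action, max_steps=100):
    """Run one episode and return the total reward collected."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        # The agent picks an action from the available set ...
        action = choose_action(observation, env.available_actions())
        # ... and the environment answers with a new state and a reward.
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

A random agent can be plugged in with `run_episode(Corridor(), lambda obs, actions: random.choice(actions))`. Any function with the same two arguments works as the agent's decision rule.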
Learning is one of the main challenges in reinforcement learning, as it requires smart exploration mechanisms. The agent's action selection is modeled as a map called the policy π, which gives the probability of taking an action A in a state S. Learning approaches attempt to find a policy that maximizes the return, typically by maintaining a set of estimates of the expected returns for some policy. There are different algorithms for learning control, but the current literature is focused on deep learning models (deep reinforcement learning).
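One common way to combine a policy map with exploration is the epsilon-greedy rule, sketched below. It is only one possible exploration mechanism, not necessarily the one any particular algorithm uses. The function names and the value estimates in the usage note are hypothetical.

```python
import random


def epsilon_greedy_policy(value_estimates, epsilon):
    """Return {action: probability} for one state.

    value_estimates maps each action to its estimated expected return.
    With probability epsilon the agent explores uniformly at random;
    otherwise it takes the action with the highest estimate.
    """
    actions = list(value_estimates)
    best = max(actions, key=lambda a: value_estimates[a])
    probs = {a: epsilon / len(actions) for a in actions}
    probs[best] += 1.0 - epsilon
    return probs


def sample_action(policy_probs, rng):
    """Draw one action according to the policy's probabilities."""
    actions = list(policy_probs)
    weights = [policy_probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

For example, `epsilon_greedy_policy({"left": 0.2, "right": 0.7}, epsilon=0.1)` assigns most of the probability to `"right"` while leaving a small chance of trying `"left"`.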
Deep reinforcement learning builds on the Q-learning technique, which maintains a q-value that represents how good a state-action pair is. This technique does not require a model of the environment and can handle problems with stochastic transitions and rewards without requiring adaptations. The deep learning variant is called Deep Q-Learning (DQL); its network, the Deep Q-Network (DQN), uses a deep convolutional neural network as a function approximator to represent the q-value. This technique will be explained in more detail in future posts, along with a more recent modification called Double Deep Q-Learning (Double DQN), which has been reported to outperform the original DQN algorithm.
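Before moving to neural networks, it may help to see plain tabular Q-learning, where the q-value is stored in a table rather than approximated. The sketch below is not DQN. The corridor task, the learning rate `alpha`, the discount factor `gamma`, and the purely random behavior policy are all assumptions made for this illustration.

```python
import random

LENGTH = 5           # cells 0..4; cell 4 is the goal (hypothetical toy task)
ACTIONS = (-1, +1)   # step left or step right


def env_step(state, action):
    """Deterministic corridor: reward 1.0 only when the goal is reached."""
    next_state = min(max(state + action, 0), LENGTH - 1)
    done = next_state == LENGTH - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(LENGTH) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Q-learning is off-policy, so a random behavior policy suffices.
            action = rng.choice(ACTIONS)
            next_state, reward, done = env_step(state, action)
            # No bootstrapping from the terminal state.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            # Move the estimate a fraction alpha toward the target.
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_action(q, state):
    """Action with the highest learned q-value in the given state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])
```

Note that `env_step` is only used to generate experience: the update itself never reads transition probabilities, which is what "does not require a model of the environment" means in practice. Deep Q-Learning replaces the table `q` with a neural network trained toward the same kind of target.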
Enterprises can leverage reinforcement learning and simulation techniques to solve real-world industrial AI problems. One of the main applications of reinforcement learning is resource optimization (others include robotics, healthcare, games, natural language processing, machine translation, finance, computer vision, etc.), which includes:
- Process planning
- Job scheduling
- Yield management
- Supply chain
- Demand forecasting
- Warehouse operations
- Production coordination
- Fleet logistics
- Product design
- Facilities location
- Search ordering
- Service availability
- Predictive maintenance
- Inventory monitoring
- Quality control
- Fault detection and isolation
Resource optimization and the features listed above are key elements of intelligent engines, so we will go deeper into this powerful area of machine learning and how it can be applied to companies' needs.