The Brains Behind AI: How Pavlov’s Dogs & Weight Loss Tips Influenced Reinforcement Learning

Artificial intelligence has, in essence, implemented many psychological concepts in digital form. Fittingly, one of the biggest parts of human intelligence is the ability to learn from past attempts at a task and improve on them.

While this has been extended into AI as machine learning, there exists a specific type of ML that borrows heavily from psychology. Reinforcement learning is based on the concept of conditioning in psychology and applies it in a unique way to facilitate dependable learning.


What Is Conditioning?

‘Conditioning’ is a general term used to describe a phenomenon where a previously unconnected stimulus and response are linked by learning. One of the earliest, and most famous, types of conditioning is classical conditioning, also known as Pavlovian conditioning.

Classical Conditioning:

First described by Russian physiologist Ivan Pavlov, this method of conditioning focuses on pairing a neutral stimulus with a biologically potent stimulus, so that the neutral stimulus alone comes to trigger the response. This can be seen in the example of Pavlov’s dogs.

The physiologist discovered this phenomenon when he was studying digestion in dogs. When food was brought in, the dogs salivated, an involuntary biological response to food. He then experimented with ringing a bell every time the food was brought in, thus creating a connection between the sound of the bell and the food.

This resulted in the dogs salivating whenever they heard the bell ring. They had been ‘conditioned’ to respond to the bell (the conditioned stimulus) in the same way they would to food (the unconditioned stimulus), even when no food was present. In other words, they had ‘learned’ that the sound of the bell meant food was coming.

Today, classical conditioning has found applications in diet watch gadgets. These gadgets give the user a mild electric shock when they exhibit an unfavourable behaviour, usually binge eating. A connection is formed between the unpleasant stimulus of the electric shock and the act of eating, which gradually cuts down the eating habits of the wearer.

Operant Conditioning:

Another type of conditioning is operant conditioning, which builds on classical conditioning principles and was the inspiration for RL. Pioneered by psychologist B. F. Skinner, it was developed to explain more complex human behaviours that could not be explained by classical conditioning.

Operant conditioning takes a more in-depth look into the process of conditioning, and also offers a way to influence behaviour by attaching consequences to actions. The process features three main principles: reinforcement, punishment, and extinction.

Operant conditioning functions on the idea that behaviour is shaped by its consequences: encouraging positive behaviour makes it more frequent, and discouraging negative behaviour makes it less frequent. Encouraging positive behaviour through favourable changes to the environment is known as reinforcement, while discouraging negative behaviour through unfavourable changes is known as punishment.

Extinction is the weakening of a connection between a stimulus and a response after a long period of neither punishment nor reinforcement. Over time, this can result in the behaviour being eliminated altogether.

Reinforcement and its subcategories form the basis of reinforcement learning.

How Psychology Is Implemented In RL

Instead of using both reinforcement and punishment, RL utilizes two forms of reinforcement. These are positive reinforcement and negative reinforcement, and both are seen in the reward systems of a reinforcement learning workflow. Positive reinforcement is when a reward is given to encourage a behaviour. Negative reinforcement is when something unpleasant is taken away to encourage the behaviour.

While it is not this black and white in RL, these concepts are applied as a gradient to ensure that the system continues on its path of self-improvement. More effective solutions are given larger rewards, while less effective solutions are given smaller ones.

This conditions the algorithm to associate more effective solutions with a higher chance of obtaining rewards, leading the agent to try to pick the solution that gives the maximum reward.
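The effect of graded rewards can be sketched with a small multi-armed bandit, one of the simplest RL settings. This is an illustrative sketch and not something taken from the article: the function name `run_bandit`, the three reward levels, the Gaussian noise, and the epsilon-greedy selection rule are all assumptions chosen for the example.

```python
import random

def run_bandit(reward_means, steps=3000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent choosing between actions with different average rewards.

    All names and parameter values here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    n = len(reward_means)
    estimates = [0.0] * n  # running average of the reward seen for each action
    counts = [0] * n       # how many times each action has been tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Occasionally explore a random action.
            action = rng.randrange(n)
        else:
            # Otherwise pick the action with the highest estimated reward.
            action = max(range(n), key=lambda a: estimates[a])

        # More effective actions pay out more on average (noisy reward).
        reward = rng.gauss(reward_means[action], 0.5)

        # Reinforce: move the running average towards the observed reward.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts

estimates, counts = run_bandit([0.1, 0.5, 0.9])
print("estimated rewards:", [round(e, 2) for e in estimates])
print("times each action was chosen:", counts)
```

Under these assumptions, the action with the largest average reward ends up being chosen far more often than the others, which is the conditioning effect described above.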

The concept of extinction also finds a use in this approach, as older, less effective paths to a solution are effectively weeded out due to a lack of reinforcement.
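A rough way to picture this extinction effect is to track the estimated value of a single action whose reward stops arriving. The sketch below is an assumption-laden illustration, not the article's method: `update_value`, the step size of 0.2, and the 20-step reward schedule are hypothetical choices.

```python
def update_value(value, reward, step_size=0.2):
    """Move the value estimate a fraction of the way towards the latest reward."""
    return value + step_size * (reward - value)

value = 0.0
history = []
for step in range(40):
    # The action is rewarded for the first 20 steps, then never again.
    reward = 1.0 if step < 20 else 0.0
    value = update_value(value, reward)
    history.append(value)

print("value after the rewarded phase:", round(history[19], 3))
print("value after rewards stop:", round(history[-1], 3))
```

The estimate climbs while the action is reinforced and decays back towards zero once reinforcement stops, so an agent that prefers high-valued actions would gradually stop choosing it.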

Conditioning In Reinforcement Learning

RL is a direct representation of the concept of reinforcement used for learning. In a typical RL workflow, an agent (algorithm) performs its designated function in the environment. The result is then passed on to an interpreter, which decodes both the state of the environment and the reward to be given to the algorithm.

The reward given to the system depends on the degree of success or efficiency with which the problem is solved. The algorithm therefore tries different approaches, which solve the problem with varying degrees of effectiveness. On the first iteration, the system will most likely come up with a poor solution.

However, as more effective solutions are found and reinforced by offering rewards to the system, the solution itself moves towards being more efficient. This then creates a self-learning algorithm that improves itself using the feedback given to it by the interpreter.
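The loop described above (agent acts, the interpreter returns a state and a reward, the agent improves) can be sketched with tabular Q-learning on a toy problem. Q-learning is one common RL algorithm chosen here for illustration; the article does not name a specific one. The five-cell corridor, the `step` and `train` functions, and every parameter value are assumptions made for this example.

```python
import random

N_STATES = 5        # corridor cells 0..4; the last cell is the goal
ACTIONS = (-1, +1)  # action 0 moves left, action 1 moves right

def step(state, action):
    """Plays the 'interpreter' role: returns the new state, the reward, and a done flag."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # value estimate per (state, action)
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # safety cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: reinforce the action in proportion to the
            # reward plus the discounted value of the best follow-up action.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q

q_table = train()
# Greedy policy for the non-goal cells: 1 means "move right".
policy = [max(range(2), key=lambda i: q_table[s][i]) for s in range(N_STATES - 1)]
print("learned policy:", policy)
```

The agent is never told that moving right is the answer. It only receives the reward from `step`, and the feedback alone is enough for the greedy policy to settle on the shortest route to the goal in this toy setting.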

Reinforcement learning is different from other machine learning methodologies, as it does not need to be told exactly how to solve the problem. Rather than being shown correct answers, it only receives feedback on how well it did. In this way, it borrows methods from psychology to simulate human learning processes.

This is just one of the many psychological concepts applied in AI, and a plausible way forward is to apply more complicated theories to machines. The rise of a true artificial intelligence could therefore come from a deeper, psychological understanding of human consciousness.

Anirudh VK
I am an AI enthusiast and love keeping up with the latest events in the space. I love video games and pizza.
