The basics of behavior analysis and clicker training

Everything we do in this world is governed by different “laws”. It might be the law of gravity (an apple falls to the ground), or the three laws of motion (our bodies move in a certain way depending on how we’re built, i.e. biomechanics). The same applies to behaviour, where different “laws” control what we do, why we do it, and how it can change.

These things shape our behaviours, and our horses’ behaviours.
It can therefore be a good idea to know a little bit about this, to help us in our training and everyday life with our friends, the horses.
Here are some of the concepts/“laws” that will influence our time together.

Classical/Respondent conditioning
Something happens at the same time as something else that is completely unrelated to the first thing. If this happens a few times we will start to associate the one with the other, even though they were not connected to begin with. One example of this is a friend of mine who listened to a certain music album when she was pregnant and had morning sickness. It was one of her favourite bands and albums, but now every time she listens to it she gets nauseous.
This process is what we use when we teach the horse the meaning of the clicker. It will learn to associate the sound of the clicker/marker with an appetitive stimulus (like treats), even though they usually have nothing to do with each other. This will make the clicker elicit the same response (salivation etc.) as the sight of the treats would.

Operant conditioning
Our behaviours are also influenced by the consequences that follow them. Some consequences will make a behaviour more likely to occur again in the future, and some will make it less likely.
This is what happens in all our training (and also when just being with our horse without any intention of “training”). Depending on the consequences we and the environment impose on the behaviours the horse exhibits, those behaviours will be reinforced (become more likely in future) or punished (become less likely in future).

ABC
A key element in Applied Behavior Analysis (ABA) is our “ABC”. This is an analysis of what is actually happening. We look at the behaviour, what happens before it, and the consequences that follow it and influence how often the behaviour appears in the future. A stands for Antecedent, B for Behaviour, and C for Consequence.

Antecedent, and arranging the environment to set up for success
What happens before a behaviour, i.e. the situation in which a certain behaviour might occur. We put on sunglasses when the sky is blue and the sun is strong. We probably won’t put on sunglasses when the sky is full of clouds and it’s raining. So, the sun is an antecedent to the behaviour of putting sunglasses on.
With our horses we can use this to arrange the environment in such a way that it is very likely that the horse will show wanted behaviours, and we can think about avoiding situations where unwanted behaviours are more likely to occur.
Our cues/signals also fall under this category. We teach the horse that a certain cue will “open up” for a certain consequence if the horse does a certain behaviour.

Consequences
The things that happen at the same time as, or straight after, a behaviour and that influence whether that behaviour will become more likely (=reinforced) or less likely (=punished) to occur in the future.
Something can either be ADDED into the equation or SUBTRACTED/REMOVED from the equation.
These are purely mathematical terms and say nothing about good or bad: Add = Positive = +, and Subtract/remove = Negative = –
This gives us:
Positive Reinforcement (R+) = Something is added that will make the behaviour more likely to occur in the future. The thing that is added will be an appetitive stimulus that the individual wants more of.
Negative Punishment (P-) = Something is removed that will make the behaviour less likely to occur in the future. The thing that is removed will be an appetitive stimulus that the individual wants more of.
Negative Reinforcement (R-) = Something is removed that will make the behaviour more likely to occur in the future. The thing that is removed will be an aversive stimulus that the individual wants less of.
Positive Punishment (P+) = Something is added that will make the behaviour less likely to occur in the future. The thing that is added will be an aversive stimulus that the individual wants less of.
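
For readers who like to see things written out, the four combinations above can be sketched as a small Python lookup. This is only an illustration: the function name `classify_consequence` and the labels "added", "removed", "more likely" and "less likely" are made up for this example, and the mapping is nothing more than the four definitions above.

```python
def classify_consequence(stimulus_change, behaviour_trend):
    """Name the quadrant for one observed consequence.

    stimulus_change: "added" or "removed" (hypothetical labels)
    behaviour_trend: "more likely" or "less likely" (hypothetical labels)
    """
    # Added = positive (+), removed = negative (-).
    sign = {"added": "+", "removed": "-"}[stimulus_change]
    # More likely in future = reinforcement (R), less likely = punishment (P).
    kind = {"more likely": "R", "less likely": "P"}[behaviour_trend]
    names = {
        "R+": "Positive Reinforcement",
        "R-": "Negative Reinforcement",
        "P+": "Positive Punishment",
        "P-": "Negative Punishment",
    }
    code = kind + sign
    return code, names[code]


# A treat is added and the behaviour becomes more likely.
print(classify_consequence("added", "more likely"))
# Pressure is removed and the behaviour becomes more likely.
print(classify_consequence("removed", "more likely"))
```

Note that the second argument is something we can only know by looking at how the behaviour develops over time, not something we decide in advance.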

What will be a reinforcer and what will be a punisher might change from situation to situation. If you’re really hungry, candy might be a reinforcer, but if you’re really full, candy might even be a punisher. The different consequences just describe what happens, and we cannot be completely sure until we’re in the future and can look back at whether the behaviour has increased or decreased. So, even if our intention is to reinforce or punish a behaviour, we might see in the future that what has actually happened is the complete opposite! We then need to go back to our ABC and look at what things have influenced the direction of the behaviour.

Usually, though, we can make pretty good guesses at what is appetitive and what is aversive. A treat/lolly or play is usually appetitive, and pressure or pain is usually aversive.

The quadrant
To make it a little bit easier to understand the different consequences, they can be put into a quadrant.
[Figure: the operant conditioning quadrant]

When we use pressure/release we are using Negative Reinforcement (R-). The horse is working to AVOID the pressure. This is the most common way of teaching horses today, even in Natural Horsemanship.
When we use clicker training we are using Positive Reinforcement (R+). The horse is working to ACHIEVE the reward. We focus on what we want the horse to do, and reward it for it. So if it does something unwanted, we focus on what we want it to do instead and reinforce that (e.g. standing relaxed with head forward, instead of mugging our pockets for treats).

Extinction
Another aspect of learning is the times when there is “no consequence” after the behaviour. If there is nothing that reinforces the behaviour it will simply die out over time. “New” behaviours will usually vanish quietly, but behaviours that have earlier been reinforced will sometimes die out with a bang, causing an extinction burst. Extinction is a natural process in learning, but depending on how it’s done it can be utterly frustrating. This is why we focus on what we WANT and reinforce that, instead of focusing too much on what we don’t want and trying to get rid of it. If done in a good way the unwanted behaviour will die out over time anyway, and the wanted behaviour will take its place.

Emotions
All animals share the same core emotions. These core emotions will shape the way we experience the world, and they will also shape our behaviours. Jaak Panksepp has studied these emotions and where they’re located in the brain. He has identified 7 different core emotions:
SEEKING
PLAY
CARE
LUST

FEAR
RAGE
GRIEF/PANIC

The first four feel “good”, whereas the last three feel “bad”. In our life we experience all of them, but if the majority of our emotions are the last three, welfare is low. To increase welfare we want to make sure the first four take over in our life. This is also connected to what things might act as reinforcers and what might act as punishers. Things that feel good will work as positive reinforcers, while things that feel bad can act as positive punishers (and as negative reinforcers once you learn you can avoid them). Hence, by working with positive reinforcement we increase the likelihood of the good feelings and increase the welfare of the horse.
You can learn more about Core Affect Space from Illis ABC, or about the different core emotions from Connection Training.

The marker – “click”
When we are working with shaping a new behaviour, timing is everything. With classical/respondent conditioning we have taught our horse that click = “good, a treat is on its way”. We can then use that association to get our timing better. We can mark the desired behaviour with a “click” (or whatever we use as a marker) at the exact moment it happens, and a few seconds later give the horse its treat. This makes it easier for us to be precise, and helps the horse to understand the details we’re after.

Shaping, “guessing game”, timing and frustration
When we start with “nothing” and our goal is a quite complex behaviour (e.g. standing still for 1 min, sidepass, or whatever it might be) we need to shape that behaviour. We do that by picking out the smallest attempt towards the final goal and reinforcing that. Along the way we can increase the difficulty and reinforce steps closer and closer to the final goal. It’s like a guessing game for the horse, and our timing of the reinforcement makes it easier or harder for the horse to guess what it has to do to get its next reinforcement.
If the steps are too big or the timing is off a little too often, it’s harder for the horse to guess what we’re after, and the horse might get frustrated and angry. The way around this is not to punish the horse (with either P+ or P-), but to work on our timing and to reinforce the horse more often for the desired behaviour.
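
The shaping loop described above can be sketched in a few lines of Python, purely as a thinking aid. Everything specific here is an assumption made for the example: the function name `shape`, the made-up attempt numbers, and the rules “raise the criterion after 3 clicks in a row” and “lower it again after 2 misses in a row”. Real training decisions depend on the horse in front of you, not on fixed counts.

```python
def shape(attempts, goal, start=1, step=1, raise_after=3, lower_after=2):
    """Return a list of (attempt, criterion, clicked) tuples.

    attempts: how close each try came to the goal (made-up numbers,
              e.g. seconds of standing still).
    All thresholds are illustrative assumptions, not training advice.
    """
    criterion = start
    hits = misses = 0
    log = []
    for attempt in attempts:
        # Click (and then treat) every attempt that meets the current criterion.
        clicked = attempt >= criterion
        log.append((attempt, criterion, clicked))
        if clicked:
            hits, misses = hits + 1, 0
            # Several clicks in a row: ask for a little more next time.
            if hits == raise_after and criterion < goal:
                criterion = min(goal, criterion + step)
                hits = 0
        else:
            misses, hits = misses + 1, 0
            # The step was too big: make it easier so the horse
            # gets reinforced more often again.
            if misses == lower_after and criterion > start:
                criterion = max(start, criterion - step)
                misses = 0
    return log


for row in shape([1, 1, 1, 2, 1, 1, 2, 2, 2], goal=3):
    print(row)
```

In this run the criterion goes up after three clicked attempts, drops back after two misses in a row, and the horse is immediately clicked again at the easier level. Nothing in the loop punishes a miss; it only changes what is asked for next.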

From theory to practice
When you understand some of the concepts it might be time to start putting them into practice. It’s always a good idea to get someone experienced to help you with these first steps. What you will start with is teaching the horse to stand relaxed with its head forward. This will then be the “default” behaviour the horse returns to when it doesn’t know what it’s supposed to do. After that you can use the clicker and the treats to teach your horse just about everything!