Tuesday 3 February 2015

What is reinforcement in psychology?


Introduction

The casual, everyday use of the word “reinforcement” generally refers to the granting of a reward for some behavior. While the use of this term by psychologists is more formal, a great deal of research has been dedicated to studying the effects of rewards on behavior. The most influential of the early studies were those done in the 1890s by American psychologist Edward L. Thorndike. Thorndike created a problem box from which a hungry cat could escape by performing a specific action, such as pulling on a wire or stepping on a pedal, thereby gaining access to food. From these studies Thorndike proposed his famous law of effect: actions that are followed by satisfying events are more likely to recur, while actions that are followed by discomfort become less likely. The greater the satisfaction or the discomfort, the greater the effect on subsequent behavior.











Not all psychologists have used the word “reinforcement” to describe the same processes. In research where he conditioned dogs to salivate upon hearing certain tones, the Russian physiologist Ivan Petrovich Pavlov described as reinforcing the pairing of a stimulus (food) that automatically elicits a response (salivation) with a second stimulus (a tone); that is, the food reinforced the ability of the tone to generate the same response. This process has come to be known as Pavlovian conditioning. Unlike Thorndike, who was referring to consequences after the organism emitted some specific behavior, Pavlov was describing an effect that occurred during the presentation of stimuli before the organism responded. Another difference was that Thorndike studied an animal’s voluntary behavior while Pavlov studied a reflexive, glandular response.


Most psychologists followed Thorndike and reserved the term “reinforcer” for voluntary behavior and its consequences. For many, though, it meant any consequence to a behavior, whether it increased or decreased the behavior’s future probability. In this usage, a reinforcer could mean any kind of motivation, whether it was to seek a pleasant or to avoid an unpleasant set of circumstances. To be sure, there were modifying words for these specific situations. Thus, if a behavior resulted in the acquisition of some desired commodity (such as food), reducing a need or a drive state, it was said to be positive reinforcement. On the other hand, if the behavior caused an unpleasant situation to be terminated or avoided, it was called negative reinforcement. Both of these consequences would increase the rate of the behavior.


To make matters more confusing, some psychologists employed the term “reinforcement” even when the consequence reduced the likelihood of a specific behavior. In the 1960s, American psychologist Gregory A. Kimble described omission training as withholding a positive reinforcer when a specified response occurs. Conversely, Kimble said that if a negative reinforcer is given when the response occurs, this is punishment.




Modern Definitions

To maintain a reasonable degree of consistency, most psychologists use the term “reinforcement” exclusively for a process of using rewards to increase voluntary behavior. The field of study most associated with this technique is instrumental conditioning. In this context, the formal definition states that a reinforcer is any consequence to a behavior that is emitted in a specified situation that has the effect of increasing that behavior in the future. It must be emphasized that the behavior itself is not sufficient for the consequence to be delivered. The circumstances in which the behavior occurs are also important. Thus, standing and cheering at a basketball game will likely lead to approval (social reinforcement), whereas this same response is not likely to yield acceptance if it occurs at a funeral.


A punisher is likewise defined as any consequence that reduces the probability of a behavior, with the same qualifications as for reinforcers. A behavior that occurs in response to a specified situation may receive a consequence that reduces the likelihood that it will occur in that situation in the future, but the same behavior in another situation would not generate the same consequence. For example, drawing on the walls of a freshly painted room would usually result in an unpleasant consequence, whereas the same behavior (drawing) in one’s coloring book would not.


The terms “positive” and “negative” are also much more tightly defined. Former use confused these with the emotional values of good or bad, thereby requiring the counterintuitive and confusing claim that a positive reinforcer is withheld or a negative reinforcer presented when there is clearly no reward, and, in fact, the intent is to reduce the probability of that response (such as described by Kimble). A better, less confusing definition is to consider “positive” and “negative” as arithmetic symbols, as for adding or subtracting. They therefore are the methods of supplying reinforcement (or punishment) rather than descriptions of the reinforcer itself. Thus, if a behavior occurs, and as a consequence something is given that will result in an increase in the rate of the behavior, this is positive reinforcement. Giving a dog a treat for executing a trick is a good example. One can also increase the rate of a behavior by removing something on its production. This is called negative reinforcement. A good example might be when a child who eats his or her vegetables does not have to wash the dinner dishes. Another example is the annoying seat belt buzzer in cars. Many people comply with the rules of safety simply to terminate that aversive sound.


The descriptors “positive” and “negative” can be applied to punishment as well. If something is added when a behavior is performed and the behavior subsequently decreases, that is positive punishment. If, on the other hand, the behavior causes something to be removed and the response rate falls as a result, that is negative punishment. A dog collar that provides an electric shock when the dog strays too close to the property line is an example of a device that delivers positive punishment. Loss of television privileges for rudeness is an example of negative punishment.
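The two-by-two scheme described above (something is added or removed; the behavior increases or decreases) can be written as a small lookup. The following is only an illustrative sketch: the function name classify_consequence and its string arguments are assumptions made for this example, not standard terminology from the literature.

```python
def classify_consequence(stimulus_change: str, behavior_change: str) -> str:
    """Name the procedure from what happens to the stimulus and to the behavior.

    stimulus_change: "added" or "removed" (what follows the behavior)
    behavior_change: "increases" or "decreases" (effect on the future response rate)
    """
    # "Positive" and "negative" act like arithmetic signs: add or subtract.
    sign = {"added": "positive", "removed": "negative"}
    # The effect on the behavior decides reinforcement versus punishment.
    effect = {"increases": "reinforcement", "decreases": "punishment"}
    if stimulus_change not in sign or behavior_change not in effect:
        raise ValueError("expected added/removed and increases/decreases")
    return f"{sign[stimulus_change]} {effect[behavior_change]}"


# The four examples used in the text.
examples = {
    "treat for a dog's trick": classify_consequence("added", "increases"),
    "seat belt buzzer stops": classify_consequence("removed", "increases"),
    "shock from a dog collar": classify_consequence("added", "decreases"),
    "loss of television privileges": classify_consequence("removed", "decreases"),
}

for situation, label in examples.items():
    print(f"{situation}: {label}")
```

Note that the label depends only on the operation and its measured effect on the behavior, never on whether the stimulus seems pleasant or unpleasant.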




Types of Reinforcers

The range of possible consequences that can function as reinforcers is enormous. To make sense of this assortment, psychologists tend to place them into two main categories: primary reinforcers and secondary reinforcers. Primary reinforcers are those that require little, if any, experience to be effective. Food, drink, and sex are common examples. While it is true that experience will influence what would be considered desirable for food, drink, or an appropriate sex partner, there is little argument that these items, themselves, are natural reinforcers. Another kind of reinforcer that does not require experience is called a social reinforcer. Examples are social contact and social approval. Even newborns show a desire for social reinforcers. Psychologists have discovered that newborns prefer to look at pictures of human faces more than practically any other stimulus pattern, and this preference is stronger if that face is smiling. Like the other primary reinforcers, experience will modify the type of social recognition that is desired. Still, it is clear that most people will go to great lengths to be noticed by others or to gain their acceptance and approval.


Though these reinforcers are likely to be effective, most human behavior is not motivated directly by primary reinforcers. Money, entertainment, clothes, cars, and computer games are all effective rewards, yet none of these would qualify as natural or primary reinforcers. Because they must be acquired, they are called secondary reinforcers. These become effective because they are paired with primary reinforcers. The famous American psychologist B. F. Skinner found that the sound of food being delivered was sufficient to maintain a high rate of bar pressing in experienced rats. Obviously, under normal circumstances the sound of the food only occurred if food was truly being delivered.


The account of how a secondary reinforcer becomes effective is called two-factor theory; it generally explains the process through a combination of instrumental and Pavlovian conditioning (hence the label “two-factor”). For example, when a rat receives food for pressing a bar (positive reinforcement), a neutral stimulus is presented at the same time: the sound of the food dropping into the food dish. The sound is thus paired with a stimulus that naturally elicits a reflexive response; that is, food elicits satisfaction. Because the sound is paired consistently with food over many trials, it is conditioned via Pavlovian methods to elicit the same response as the food. This Pavlovian pairing takes place during the instrumental conditioning of bar pressing, in which food serves as the reinforcer.


This same process works for most everyday activities. For most humans, money is an extremely powerful reinforcer. Money itself, though, is not very attractive. It does not taste good, does not reduce any biological drives, and does not, on its own, satisfy any needs. However, it is reliably paired with all of these things and therefore becomes as effective as these primary reinforcers. In a similar way, popular fashion in clothing, hair styles, and personal adornment, popular art or music, even behaving according to the moral values of one’s family or church group (or one’s gang) can all come to be effective reinforcers because they are reliably paired with an important primary reinforcer, namely, social approval. The person who will function most effectively as the approving agent changes throughout life. One’s parents, friends, classmates, teachers, teammates, coaches, spouse, children, and colleagues at work all provide effective social approval opportunities.




Why Reinforcers Work

Reinforcers (and punishers) are effective at influencing an organism’s willingness to respond because they influence the way in which an organism acquires something that is desired, or avoids something that is not desired. For primary reinforcers, this concerns health and survival. Secondary reinforcers are learned through experience and do not directly affect one’s health or survival, yet they are adaptive because they are relevant to those situations that are related to well-being and an improved quality of life. Certainly learning where food, drink, receptive sex partners, or social acceptance can be located is useful for an organism. Coming to enjoy being in such situations is very useful, too.


An American psychologist, David Premack, has argued that it is the opportunity to engage in activity, and not the reinforcer itself, that is important; that is, it is not the food, but the opportunity to eat that matters. For example, he has shown that rats will work very hard to gain access to a running wheel. The activity of running in the wheel is apparently reinforcing. Other researchers have demonstrated that monkeys will perform numerous boring, repetitive tasks to open a window just to see into another room. This phenomenon has come to be known as the Premack principle. Premack explains that any high-probability activity can be used to reinforce a lower-probability behavior. This approach works for secondary reinforcers, too. The opportunity to spend money may be the reinforcer, not the money itself. Eating, being entertained, and being with others who compliment one’s taste are all highly probable behaviors; thus, access to them reinforces the work for which one may be paid.


According to Premack’s position, a child might eat vegetables to gain access to apple pie, but not vice versa. Obviously, for most children getting apple pie is a far more effective reward than getting vegetables. Nonetheless, as unexpected as this is, such a reversal is possible. For this to work two conditions must be met: The child must truly enjoy eating the vegetables (though apple pie could still be preferred), and the child must have been deprived of these vegetables for a fair amount of time. This may make more sense when considering what happens to a child who overindulges in a favored treat. The happy child who is allowed to dive into a bag of Halloween candy, after having polished off a few pounds of sweets, would not find candy all that attractive.


A newer view of Premack’s position that incorporates situations such as these is called the bliss point. That is, for each organism there is a particular level of each activity that is most desirable (the bliss point). If one is below that level, that activity has become more probable and can be used as a reinforcer for other behaviors, even those that normally have a higher probability. Thus, if a child has not had vegetables in quite a while and has become tired of apple pie, the vegetables would be effective as reinforcers to increase pie eating, though only temporarily. Once the child has reached the bliss point for vegetable eating (which is likely to happen fairly quickly), the vegetables stop being effective as a reinforcer.
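The bliss point rule can be sketched as a simple comparison. This is a minimal illustration under stated assumptions, not a formal model from the literature: the function name can_reinforce, the "servings per week" unit, and all of the numbers are hypothetical.

```python
def can_reinforce(current_level: float, bliss_point: float) -> bool:
    """Treat an activity as a usable reinforcer while the organism is below its bliss point."""
    return current_level < bliss_point


# Hypothetical child: deprived of vegetables, overindulged in apple pie.
# Levels are made-up servings per week, used only for illustration.
child = {
    "vegetables": {"current": 0, "bliss": 3},
    "apple pie": {"current": 7, "bliss": 4},
}

usable = [
    activity
    for activity, level in child.items()
    if can_reinforce(level["current"], level["bliss"])
]
print(usable)
```

With these made-up numbers only the vegetables qualify, which mirrors the reversal described in the text; once the current level reaches the bliss point, the function returns False and the activity no longer qualifies.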


The bliss point idea addresses some of the confusion about positive and negative reinforcers as well. Intuitively, it seems that positive reinforcement should be the addition of a pleasant stimulus, and that negative reinforcement would be the removal of an unpleasant stimulus. However, as anyone who has overindulged in some favored activity knows, there are times when what is normally very pleasant becomes distinctly unpleasant. Thus, adding this stimulus would not be reinforcing, even though in general it seems that it should be. It is as if the organism conducts a cost-benefit analysis concerning its current state. If the consequence is preferable to the alternative, even one that is not particularly attractive, it will function as a reinforcer. Therefore, adding what would normally be an unpleasant stimulus is positively reinforcing if it is better than going without.


Another useful idea about what makes a particular situation reinforcing is called the establishing operation. This concept describes the process of creating a need for the particular stimulus. After a large meal, food is not an effective reinforcer, but after a period of not eating, it is. Denying an organism food establishes food as an effective reinforcer. The organism is below its bliss point. Secondary reinforcers can be explained by this concept as well. By pairing neutral stimuli with primary reinforcers, one is establishing their effectiveness. Finally, that different organisms find different situations or stimuli satisfying is no surprise. Ducks find the opportunity to swim satisfying; chickens do not. A species’ natural history establishes what will be effective as well.




Patterns of Reinforcer Delivery

It is not necessary to deliver a reinforcer on every occurrence of a behavior to have the desired effect. In fact, intermittent reinforcement has a stronger effect on the stability of the response rate than reinforcing every response. If the organism expects every response to be reinforced, suspending reinforcement will cause the response to disappear very quickly. If, however, the organism is familiar with occasions of responding without reinforcement, responding will continue for much longer after reinforcers are terminated.


There are two basic patterns of intermittent reinforcement: ratio and interval. Ratio schedules are based on the number of responses required to receive the reinforcer. Interval schedules are based on the amount of time that must pass before a reinforcer is available. Both schedules have fixed and variable types. On fixed schedules, whatever the rule is, it stays that way. If five responses are required to earn a reinforcer (a fixed ratio 5, or FR 5), every fifth response is reinforced. A fixed interval of ten seconds (FI 10) means that the first response after ten seconds has elapsed is reinforced, and this is true every time (responding during the interval is irrelevant). Variable schedules change the rule in unpredictable ways. A VR 5 (variable ratio 5) is one in which, on the average, the fifth response is reinforced, but it would vary over a series of trials. A variable interval of ten seconds (VI 10) is similar. The required amount of time is an average of ten seconds, but on any given trial it could be different.
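The four schedule rules can be made concrete with a short simulation. This is a sketch under several assumptions rather than a standard implementation: all function names are invented for the example, each interval is timed from the previous reinforcer (with the session starting at time 0), and the variable schedules draw their requirement from a uniform distribution whose mean equals the nominal value.

```python
import random


def fixed_ratio(n):
    """FR n: every n-th response is reinforced."""
    count = 0

    def respond(now):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """VR: the required count changes after each reinforcer, averaging mean_n."""
    required = rng.randint(1, 2 * mean_n - 1)  # uniform, mean = mean_n (assumption)
    count = 0

    def respond(now):
        nonlocal count, required
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean_n - 1)
            return True
        return False

    return respond


def fixed_interval(seconds):
    """FI: the first response after `seconds` have elapsed is reinforced."""
    last = 0.0  # time of the previous reinforcer (session starts at 0)

    def respond(now):
        nonlocal last
        if now - last >= seconds:
            last = now
            return True
        return False  # responding during the interval earns nothing

    return respond


def variable_interval(mean_seconds, rng):
    """VI: like FI, but the required wait varies around mean_seconds."""
    last = 0.0
    wait = rng.uniform(0, 2 * mean_seconds)  # uniform, mean = mean_seconds (assumption)

    def respond(now):
        nonlocal last, wait
        if now - last >= wait:
            last = now
            wait = rng.uniform(0, 2 * mean_seconds)
            return True
        return False

    return respond


def count_reinforcers(schedule, response_times):
    """Feed a series of response times to a schedule and count the reinforcers earned."""
    return sum(schedule(t) for t in response_times)


# One minute of responding at two different rates.
one_per_second = [float(t) for t in range(1, 61)]
two_per_second = [t / 2 for t in range(1, 121)]

fr5_slow = count_reinforcers(fixed_ratio(5), one_per_second)
fr5_fast = count_reinforcers(fixed_ratio(5), two_per_second)
fi10_slow = count_reinforcers(fixed_interval(10), one_per_second)
fi10_fast = count_reinforcers(fixed_interval(10), two_per_second)

print("FR 5: ", fr5_slow, "vs", fr5_fast)    # doubling the response rate doubles the payoff
print("FI 10:", fi10_slow, "vs", fi10_fast)  # doubling the response rate changes nothing
```

Under these assumptions, responding twice as fast doubles the reinforcers earned on the ratio schedule but earns nothing extra on the interval schedule, which is what is meant by saying that responding during the interval is irrelevant.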


An example of an FR schedule is pay for a specific amount of work, such as stuffing envelopes. The pay is always the same; stuffing a certain number of envelopes always equals the same pay. An example of an FI is receiving the daily mail. Checking the mailbox before the mail is delivered will not result in reinforcement. One must wait until the appropriate time. A VR example is a slot machine. The more attempts, the more times the player wins, but in an unpredictable pattern. A VI example would be telephoning a friend whose line is busy. Continued attempts will be unsuccessful until the friend hangs up the phone, but when this will happen is unknown.


Response rates for fixed schedules follow a fairly specific pattern. Fixed ratio schedules tend to have a steady rate until the reinforcer is delivered; then there is a short rest, followed by the same rate. A fixed interval is slightly different. The closer one gets to the required time, the faster the response rate. On receiving the reinforcer there will be a short rest, then a gradual return to responding, becoming quicker and quicker over time. This is called a “scalloped” pattern. Studying for exams illustrates the phenomenon nicely; though not strictly an FI schedule, it does have a temporal component. Students are much more likely to study during the last few days before a test and very little during the days immediately after the test. As time passes, study behavior gradually begins again, becoming more concentrated the closer the next exam date comes.




Bibliography


Flora, Stephen Ray. The Power of Reinforcement. Albany: State U of New York P, 2004. Print.



Hilgard, Ernest Ropiequet. Psychology in America: A Historical Review. San Diego: Harcourt, 1987. Print.



Kimble, Gregory A. Hilgard and Marquis’ Conditioning and Learning. 2nd ed. New York: Appleton-Century-Crofts, 1968. Print.



Kimble, Gregory A., Michael Wertheimer, and Charlotte L. White, eds. Portraits of Pioneers in Psychology. Washington: APA, 1991. Print.



Lieberman, David A. Learning: Behavior and Cognition. 3rd ed. Belmont: Wadsworth, 2000. Print.



Ormrod, Jeanne E. Human Learning. 6th ed. Boston: Pearson, 2012. Print.



"Positive Reinforcement: A Self-Instructional Exercise." Athabasca University. Athabasca University, 2013. Web. 7 July 2014.
