
[|Internal **Validity** Tutorial]
 * Gives examples of **threats to validity**, then challenges students to identify problems with hypothetical studies.


http://psych.athabascau.ca/html/Validity/

http://psych.athabascau.ca/html/Validity/concept.shtml

Concept Definition: Internal Validity
> Experimental psychologists select or manipulate one or more conditions in order to determine their effects on a measure of the behavior of a subject. For example, the smell of delicious food may be presented periodically to subjects in order to assess its effect on their salivation response.
> The manipulated condition is referred to as the **independent variable,** and the behavioral measure is called the **dependent variable.** In our example, the smell of delicious food would be the independent variable and the salivation response would be the dependent variable.
> An experiment is **internally valid** to the extent that it shows a cause-effect relationship between the independent and dependent variables. Suppose the experimenter observes an increase in the probability of salivating after presenting the smell of delicious food, leading him or her to conclude that the food smell produces salivation. For this conclusion to be internally valid, the experiment must be designed so that conditions other than the food smell are ruled out as potential causes for the behavior change. For example, if the sight of the food is presented along with its smell, then an alternative explanation could be that the sight of the delicious food, and not its smell, is responsible for the increased probability of salivating.
> There are nine sources of threat to internal validity. They are:
 * 1) [|Selection]
 * 2) [|History]
 * 3) [|Maturation]
 * 4) [|Repeated Testing]
 * 5) [|Instrumentation]
 * 6) [|Regression to the Mean]
 * 7) [|Experimental Mortality]
 * 8) [|Selection-Maturation Interaction]
 * 9) [|Experimenter Bias]
> Before learning more about each one, first read the following background information for a hypothetical experiment.

Item 1 of 36
//Decide whether or not the following experiment is internally valid. If not, identify the source of threat to its internal validity.//

Jack had a bad habit of biting his nails. Having had some training in psychology, he felt confident enough to conduct a self-experiment. First, he needed to know how often he typically bit his nails as a comparison measure. Every day for a week, each time he caught himself doing so, he made a tick mark in a small daily calendar he kept in his pocket (Baseline). At the end of the week, he tallied the tick marks and was surprised to find that the problem was even worse than he thought: he bit his nails about a hundred times a day! Then, while continuing to monitor himself, he started a week-long self-punishment program (Treatment). Whenever he found himself in the middle of biting his nails, he snapped an elastic band around his wrist. He tried to schedule things so that no other significant changes in his life coincided with the start of his treatment program. To see the data that he plotted, click here. Jack concluded from looking at this graph that his self-punishment procedure was effective.

 * This experiment is internally valid.
 * **No.** This experiment is not internally valid, and the source of threat is: Selection, History, Maturation

--

a. Selection
Subjects bring with them into the investigation unique characteristics, some learned and some inherent. Examples include sex, height, weight, color, attitude, personality, motor ability, and mental ability. If assigning subjects to comparison groups results in unequal distribution of these subject-related variables, then there is a possible threat to internal validity. Suppose that subjects in two comparison groups are unlike with respect to the independent variable and one of these subject-related variables. If scores on the dependent measure differ between the groups, the discrepancy may be due to the independent variable **or** to the subject-related variable.

[|Background Information]

 * **Example**
>> It so happened that there were an equal number of boys and girls in the classes, so for convenience the boys were assigned to the Control Group and the girls to the Experimental Group. One day at school, the boys were told to go to one room and the girls to another room, where they were exposed to their respective conditions. Two days later, the Generalization Probe was conducted. The mean score for children in the Control Group was 1.2 and the mean score for children in the Experimental Group was 3.4. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

 * **Nonexample**
>> The name of each child in the classes was written on a separate slip of paper. All the slips were put in a bowl and mixed up thoroughly. Students were assigned to the Experimental Group and to the Control Group alternately as their names were pulled out of the bowl one at a time. One day at school, the children in the Control Group were told to go to one room and the children in the Experimental Group to another room, where they were exposed to their respective conditions. Two days later, the Generalization Probe was conducted. The mean score for children in the Control Group was 1.2 and the mean score for children in the Experimental Group was 3.4. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

 * **Analysis**
>> The first item is an example in which [|selection] is a threat to internal validity. The children in the two comparison groups are unlike with respect to whether or not they viewed the interactive video and with respect to gender. The higher Generalization Probe score by the Experimental Group may be due to exposure to the interactive video **or** to the sex of each child.
>> In the second item, by randomly allocating subjects to conditions, the only way sex (and other subject-related variables) can be unevenly distributed between the two comparison groups is through chance. Thus, the children in the two groups appear to be unlike only with respect to whether or not they viewed the interactive video. We can be more confident that the better Generalization Probe score by the Experimental Group was not the result of [|selection].
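
The slips-in-a-bowl procedure described in the nonexample can be sketched in Python. This is only an illustration and not part of the original tutorial: the function name `assign_groups`, the roster of names, and the fixed seed are all assumptions made for the example.

```python
import random


def assign_groups(names, seed=None):
    """Shuffle the slips (the bowl), then deal them alternately to two groups."""
    rng = random.Random(seed)   # a seed is used only to make the sketch repeatable
    slips = list(names)         # copy so the caller's roster is left untouched
    rng.shuffle(slips)          # "mixed up thoroughly"
    experimental = slips[0::2]  # 1st, 3rd, 5th, ... slip drawn
    control = slips[1::2]       # 2nd, 4th, 6th, ... slip drawn
    return experimental, control


# Hypothetical class roster, invented for illustration.
roster = ["Ana", "Ben", "Cho", "Dev", "Eli", "Fay", "Gus", "Hana"]
experimental, control = assign_groups(roster, seed=0)
print("Experimental Group:", experimental)
print("Control Group:", control)
```

Because group membership depends only on the shuffle, sex and other subject-related variables can differ between the groups only by chance, which is the point the analysis makes.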

b. History
Outside events may influence subjects in the course of the experiment or between repeated measures of the dependent variable. Suppose that the dependent variable is measured twice for a group of subjects, once at Time A and later at Time B, and that the independent variable is introduced in the interim. Suppose also that Event A occurs between Time A and Time B. If scores on the dependent measure differ at these two times, the discrepancy may be due to the independent variable **or** to Event A.

[|Background Information]

 * **Example**
>> The name of each child in the classes was written on a separate slip of paper. All the slips were put in a bowl and mixed up thoroughly. Students were assigned to the Experimental Group and to the Control Group alternately as their names were pulled out of the bowl one at a time. One day at school, the children in the Control Group were told to go to one room and the children in the Experimental Group to another room, where they were exposed to their respective conditions. Immediately afterwards, while walking back to their regular classroom, all the children in the Control Group saw a man laughing and joking with their school principal. Two days later, the Generalization Probe was conducted, during which many of the Control Group children recognized the stranger as the man who made their principal laugh. The mean score for children in the Control Group was 1.2 and the mean score for children in the Experimental Group was 3.4. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

 * **Nonexample**
>> The name of each child in the classes was written on a separate slip of paper. All the slips were put in a bowl and mixed up thoroughly. Students were assigned to the Experimental Group and to the Control Group alternately as their names were pulled out of the bowl one at a time. One day at school, the children in the Control Group were told to go to one room and the children in the Experimental Group to another room, where they were exposed to their respective conditions. The rooms were adjacent to each other, and, when the special class was over, the two groups left their rooms at exactly the same time. Immediately afterwards, while walking back to their regular room, some of the children saw a man laughing and joking with their school principal. Two days later, the Generalization Probe was conducted, during which some of the children recognized the stranger as the man who made their principal laugh. It appears that the number who did so was equally proportioned between the two groups. The mean score for children in the Control Group was 1.2 and the mean score for children in the Experimental Group was 3.4. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

 * **Analysis**
>> The first item is an example in which [|history] is a threat to internal validity. The children in the two comparison groups are unlike with respect to whether or not they viewed the interactive video and with respect to another event encountered in the course of the experiment, that being whether or not they saw the confederate laughing and joking with their school principal. The higher Generalization Probe score by the Experimental Group may be due to exposure to the interactive video **or** to the fact that only the Control Group subjects witnessed this other event. Seeing the confederate laugh and joke with their school principal may have made him a less intimidating figure for the Control Group subjects, which, in turn, may have caused them to be less likely to verbally refuse and run away from him on the subsequent Generalization Probe.
>> In the second item, the experimenter attempted to control for differential exposure to outside influences by treating the two comparison groups as equally as possible other than which video she showed them. While some of the children saw the confederate laughing and joking with their school principal, in this case it appears that the number who did so was equally proportioned between the two groups. If this is true, then we can be more confident that the better Generalization Probe score by the Experimental Group was not the result of [|history].

c. Maturation
Subjects may change in the course of the experiment or between repeated measures of the dependent variable due to the passage of time //per se.// Some of these changes are permanent (e.g., biological growth), while others are temporary (e.g., fatigue). Suppose that the dependent variable is measured twice for a group of subjects, once at Time A and later at Time B, and that the independent variable is introduced in the interim. If scores on the dependent measure differ at these two times, the discrepancy may be due to the independent variable **or** to naturally occurring developmental processes.

[|Background Information]

 * **Example**
>> During a class early in the school year, the children viewed the 20-minute cartoon (Control condition). Two days later, the Generalization Probe was conducted. The experimenter fell ill soon afterwards, and so it wasn't until a class late in the school year that the children viewed the 20-minute interactive video (Experimental condition). Two days after that, a second Generalization Probe was conducted. The mean score for the children on the first Generalization Probe was 1.2 and their mean score on the second Generalization Probe was 3.4. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

 * **Nonexample**
>> The name of each child in the classes was written on a separate slip of paper. All the slips were put in a bowl and mixed up thoroughly. Students were assigned to the Experimental Group and to the Control Group alternately as their names were pulled out of the bowl one at a time. During a class early in the school year, a Generalization Probe was conducted for all children. The experimenter fell ill soon afterwards, and so it wasn't until a class late in the school year that children in the comparison groups were separated, with the Control Group children viewing the 20-minute cartoon and the Experimental Group children viewing the 20-minute interactive video. Two days after that, a second Generalization Probe was conducted. To see the results, [|click here (1)]. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

 * **Analysis**
>> The first item is an example in which [|maturation] is a threat to internal validity. Almost a full school year separated the two Generalization Probes. Thus, the children at the time of the second probe differ from themselves at the time of the first probe in two ways: they viewed the interactive video two days earlier and they were ten months older. The improved score across the two Generalization Probes may be due to intervening exposure to the interactive video **or** to normal psychological development during those ten months in a child's life. For example, first grade children may naturally learn to become more assertive over the school year.
>> In the second item, if the improvement across the two Generalization Probes was simply a function of the passage of the school year for the Experimental Group, then we would expect to see a similar trend for the Control Group. Because we do not observe this, we can be more confident that the improved score for the Experimental Group was not the result of [|maturation].
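
The reasoning in the nonexample analysis (compare each group's change between the two probes) can be written out as a small calculation. The sketch below is illustrative only: the scores are invented, since the tutorial's own results sit behind a link that is not reproduced here, and the helper name `mean_change` is an assumption.

```python
from statistics import mean


def mean_change(first_probe, second_probe):
    """Mean score on the second probe minus mean score on the first probe."""
    return mean(second_probe) - mean(first_probe)


# Hypothetical Generalization Probe scores, invented for illustration.
control_first = [1, 1, 2, 1]
control_second = [1, 2, 1, 1]
experimental_first = [1, 2, 1, 1]
experimental_second = [3, 4, 3, 4]

control_change = mean_change(control_first, control_second)
experimental_change = mean_change(experimental_first, experimental_second)

# If maturation alone drove the improvement, both groups should change by a
# similar amount, so the difference between the two changes would be near zero.
difference_in_change = experimental_change - control_change
print(control_change, experimental_change, difference_in_change)
```

With these made-up numbers the Control Group does not change while the Experimental Group does, which is the pattern that makes maturation a less plausible explanation.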

d. Repeated Testing
The prior measurement of the dependent variable may affect the results obtained from subsequent measurements. Suppose that the dependent variable is recorded twice for a group of subjects, once at Time A and later at Time B, and that the independent variable is introduced in the interim. If scores on the dependent measure differ at these two times, the discrepancy may be due to the independent variable **or** to the procedure involved in measuring the dependent variable at Time A.

[|Background Information]

 * **Example**
>> Due to time constraints, the experiment was run over four consecutive days. On Day 1, children viewed the 20-minute cartoon (Control condition). On Day 2, the Generalization Probe was conducted. On Day 3, the children were exposed to the 20-minute interactive video (Experimental condition). Finally, on Day 4, a second Generalization Probe was conducted. The mean score for children on the first Generalization Probe was 1.2 and their mean score on the second Generalization Probe was 3.4. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

 * **Nonexample**
>> Due to time constraints, the experiment was run over three consecutive days. The name of each child in the classes was written on a separate slip of paper. All the slips were put in a bowl and mixed up thoroughly. Students were assigned to the Experimental Group and to the Control Group alternately as their names were pulled out of the bowl one at a time. On Day 1, a Generalization Probe was conducted for all children. On Day 2, the children in the comparison groups were separated, with the Control Group children viewing the 20-minute cartoon and the Experimental Group children viewing the 20-minute interactive video. On Day 3, a second Generalization Probe was conducted. To see the results, [|click here (1)]. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

 * **Analysis**
>> The first item is an example in which [|repeated testing] could be a threat to internal validity. A reasonable assumption is that few, if any, of the children in the experiment had experienced a potential abduction situation before the study started. On Day 2, they find themselves in this situation, and then again two days later. This unlikely state of affairs may have led at least some of them to believe that the second Generalization Probe was not real, but rather a test, especially if the probe procedure and the confederates running it were exactly the same. As a result, these children may have done what the teacher wanted them to do. The improvement across the two Generalization Probes may be due to intervening exposure to the interactive video **or** to the effects of prior experience with the Generalization Probe procedure on the second Generalization Probe score.
>> In the second item, if the improvement observed for the Experimental Group was a function of multiple exposure to the probe procedure, then we would also expect to see similar improvement for the Control Group. Because we do not observe this, we can be more confident that the improved score for the Experimental Group was not the result of [|repeated testing].

e. Instrumentation
The reliability of the instrument used to gauge the dependent variable or manipulate the independent variable may change in the course of an experiment. Examples include changes in the calibration of a mechanical measuring device as well as the proficiency of a human observer or interviewer. Suppose that the dependent variable is measured twice for a group of subjects, once at Time A and later at Time B, and that the independent variable is introduced in the interim. Suppose also that the ability of a recording device to detect instances of the target behavior improves (declines) as the experiment progresses. If scores on the dependent measure differ at these two times, the discrepancy may be due to the independent variable **or** to more (less) sensitive recordings of the target behavior at Time B relative to Time A.

[|Background Information]

 * **Example**
>> The name of each child in the classes was written on a separate slip of paper. All the slips were put in a bowl and mixed up thoroughly. Students were assigned to the Experimental Group and to the Control Group alternately as their names were pulled out of the bowl one at a time. One day at school, the children in the Control Group were told to go to one room and children in the Experimental Group to another room, where they were exposed to their respective conditions. Two days later, the Generalization Probe was conducted. For ease of record keeping, all Control Group children were tested first, then all the Experimental Group children. The student teacher scored children's responses to the confederate's lures. In the beginning, he hid indoors and strained to see and hear through an open window; later on, he discovered he could see and hear better by hiding outside and peeking around a corner. The mean score for children in the Control Group was 1.2 and the mean score for children in the Experimental Group was 3.4. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

 * **Nonexample**
>> The name of each child in the classes was written on a separate slip of paper. All the slips were put in a bowl and mixed up thoroughly. Students were assigned to the Experimental Group and to the Control Group alternately as their names were pulled out of the bowl one at a time. One day at school, the children in the Control Group were told to go to one room and children in the Experimental Group to another room, where they were exposed to their respective conditions. Two days later, the Generalization Probe was conducted, in which children were selected from class to be tested in random order. The student teacher scored each child's response to the confederate's lures. Pilot research at the same school revealed that the best observation procedure was to hide outside and peek around a corner, which the student teacher did consistently throughout testing. The mean score for children in the Control Group was 1.2 and the mean score for children in the Experimental Group was 3.4. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

 * **Analysis**
>> The first item is an example in which [|instrumentation] is a threat to internal validity. Two factors compound the problem. First, the student teacher's ability to detect instances of the target behaviors improved over time. Second, children in the Control Group were tested first. The higher Generalization Probe score by the Experimental Group may be due to exposure to the interactive video **or** to fewer missed observations of the target behaviors for the Experimental Group children than for the Control Group children.
>> In the second item, because the recording location remained constant throughout probe testing, we expect the number of missed observations also to be constant. However, even if observer proficiency did change over time for other reasons, perhaps the result of fatigue, the random ordering during the Generalization Probe ensured approximately equal numbers of students in the two groups both early and late in testing. Thus, while missed observations may increase with the number of students tested, they would be equally distributed between the two groups. With these two procedural changes, we can be more confident that the better Generalization Probe score for the Experimental Group was not the result of instrumentation.

f. Regression to the Mean
Subjects with extreme scores on a first measure of the dependent variable tend to have scores closer to the mean on a second measure. According to [|Campbell] (1969, p. 414): "Take any dependent measure that is repeatedly sampled, move along it as in a time dimension, and pick a point that is the 'highest (lowest) so far.' On the average, the next point will be lower (higher), nearer the general trend." Suppose that the dependent variable is measured twice for a group of subjects, once at Time A and later at Time B, and that the independent variable is introduced in the interim. Suppose also that the value observed for subjects at Time A is considerably higher (lower) than would typically be the case. If scores on the dependent measure differ at these two times, it may be due to the independent variable **or** to a regression artifact.

[|Background Information]

 * **Example**
>> One day at school, the children viewed the 20-minute cartoon (Control condition). Two days later, the Generalization Probe was conducted. Then, in a class the following week, the children viewed the 20-minute interactive video (Experimental condition). The plan was to administer a second Generalization Probe two days after that. However, at this point, the experimenter realized that she had insufficient funding to complete the study and would only be able to retest ten children. She selected the ten poorest performing children on the first Generalization Probe, whose mean score was 0.1. Their mean score on the second Generalization Probe was 2.5. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

 * **Nonexample**
>> One day at school, the children viewed the 20-minute cartoon (Control condition). Two days later, the Generalization Probe was conducted. Then, in a class the following week, the children viewed the 20-minute interactive video (Experimental condition). The plan was to administer a second Generalization Probe two days later. However, at this point, the experimenter realized that she had insufficient funding to complete the study and would only be able to retest ten children. She wrote the name of each child on a separate slip of paper, put all the slips in a bowl, and the first ten names she pulled out were selected for the second Generalization Probe. Their mean score on the first Generalization Probe was 1.1 and their mean score on the second Generalization Probe was 2.5. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

 * **Analysis**
>> The first item is an example in which [|regression to the mean] is a threat to internal validity. The children were selected for retesting on the basis of their extremely low scores on the first Generalization Probe. Based on the principle of statistical regression, these children will tend to score higher on the second Generalization Probe. Their improvement across the two Generalization Probes may be due to intervening exposure to the interactive video **or** to a regression artifact.
>> In the second item, because the children were selected for retesting on the basis of chance, their mean score on the first Generalization Probe more likely represents a value closer to the true mean of the population of children in the Control condition. We can be more confident that the improved score across the two Generalization Probes was not the result of [|regression to the mean].
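
A short simulation can make the regression artifact concrete. The sketch below is not from the tutorial: it assumes a simple model in which each observed score is a stable skill level plus independent measurement noise, and the distribution parameters, sample size, seed, and function name `simulate_probe_scores` are all invented for illustration. No treatment of any kind is applied between the two simulated probes.

```python
import random
from statistics import mean


def simulate_probe_scores(n_children=500, seed=1):
    """Return two noisy measurements of the same unchanged skill per child."""
    rng = random.Random(seed)
    true_skill = [rng.gauss(2.0, 0.5) for _ in range(n_children)]
    # Each probe = stable skill + fresh noise; nothing happens in between.
    first = [skill + rng.gauss(0.0, 1.0) for skill in true_skill]
    second = [skill + rng.gauss(0.0, 1.0) for skill in true_skill]
    return first, second


first, second = simulate_probe_scores()

# Pick the ten poorest performers on the first probe, as in the example item.
lowest_ten = sorted(range(len(first)), key=lambda i: first[i])[:10]
first_mean = mean(first[i] for i in lowest_ten)
retest_mean = mean(second[i] for i in lowest_ten)

print("Whole class:", round(mean(first), 2), "then", round(mean(second), 2))
print("Lowest ten: ", round(first_mean, 2), "then", round(retest_mean, 2))
```

Under these assumptions the whole class scores about the same on both probes, while the ten children chosen for their extremely low first scores score noticeably higher on the retest even though nothing was done to them.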

g. Experimental Mortality
In the course of an experiment, some subjects may drop out before it is completed. Suppose that subjects in two comparison groups differ with respect to the independent variable. Suppose also that subjects in one group are more likely to discontinue their participation part way through an experiment than subjects in another group and that the dependent variable is measured at the end of the experiment. If scores on the dependent measure differ between those subjects remaining in the two groups, the discrepancy may be due to the independent variable **or** to a unique characteristic of subjects able to endure a particular condition, a subject-related variable that would be disproportionately present in each group.

[|Background Information]

 * **Example**
>> The name of each child in the classes was written on a separate slip of paper. All the slips were put in a bowl and mixed up thoroughly. Students were assigned to the Experimental Group and to the Control Group alternately as their names were pulled out of the bowl one at a time. One day at school, the children in the Control Group were told to go to one room and children in the Experimental Group to another room, where they were exposed to their respective conditions. Some of the children in the Experimental Group appeared bored by the interactive video, became disruptive, and were removed from the room. Two days later, the Generalization Probe was conducted. The mean score for children in the Control Group was 1.2 and the mean score for the remaining children in the Experimental Group was 3.4. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

 * **Nonexample**
>> One day at school, the children viewed the 20-minute cartoon (Control condition). Two days later, the Generalization Probe was conducted. Then, in a class the following week, the children viewed the 20-minute interactive video (Experimental condition). Some of the children appeared bored by the interactive video, became disruptive, and were removed from the room. Two days after that, a second Generalization Probe was conducted. The data for the children who left the room during the interactive video were discarded. For the remaining children, their mean score on the first Generalization Probe was 1.2 and their mean score on the second Generalization Probe was 3.4. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

 * **Analysis**
>> The first item is an example in which [|experimental mortality] is a threat to internal validity. Children in the Experimental Group who were unable to watch the entire interactive video may possess a unique characteristic, such as a poor attention span. Children having this trait were excluded from the Experimental Group but not from the Control Group. The higher Generalization Probe score by the Experimental Group may be due to exposure to the interactive video **or** to a subject-related variable such as attention span, which would be unequally distributed between the two comparison groups.
>> In the second item, any subject-related variable pertinent to the inability to watch the entire interactive video was excluded from both comparison groups, and thus would be equally distributed between the two of them. We can be confident that the better Generalization Probe score for the Experimental Group was not the result of [|experimental mortality]. While this experiment appears to be internally valid, discarding the data of those children who did not watch the entire video lowers the external validity of the study, a concept that is beyond the scope of this tutorial.
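
The way differential dropout can manufacture a group difference can also be simulated. The sketch below is an illustration under invented assumptions, not the tutorial's data: probe scores are assumed to depend on attention span plus noise, the video is given no effect at all, and only children with a low attention span are removed from the Experimental Group. The function name `simulate_child`, the 0.3 cutoff, the group sizes, and the seed are hypothetical.

```python
import random
from statistics import mean


def simulate_child(rng):
    """Return (attention_span, probe_score) for one simulated child."""
    attention = rng.random()                        # uniform between 0 and 1
    score = 4.0 * attention + rng.gauss(0.0, 0.5)   # no treatment effect anywhere
    return attention, score


rng = random.Random(2)
control = [simulate_child(rng) for _ in range(1000)]
experimental = [simulate_child(rng) for _ in range(1000)]

# Children with a short attention span leave the Experimental Group only.
remaining = [(a, s) for a, s in experimental if a >= 0.3]

control_mean = mean(s for _, s in control)
full_mean = mean(s for _, s in experimental)
remaining_mean = mean(s for _, s in remaining)

print("Control Group mean:          ", round(control_mean, 2))
print("Experimental Group, everyone:", round(full_mean, 2))
print("Experimental Group, remaining:", round(remaining_mean, 2))
```

In this setup the remaining Experimental Group children outscore the Control Group purely because of who dropped out, which mirrors the alternative explanation given in the analysis of the first item.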

h. Selection-Maturation Interaction
Subject-related variables and time-related variables may interact. Suppose that subjects in two comparison groups differ with respect to the independent variable and a subject-related variable such as age. Suppose also that the dependent variable is measured twice for each group, once at Time A and later at Time B, and that the independent variable is introduced in the interim. If the change in scores on the dependent measure from Time A to Time B differs between the two groups, this discrepancy may be due to the independent variable **or** to distinctive naturally occurring developmental processes for the two age categories that make up the two comparison groups. [|Background Information]

**Example**
> It so happened that there were an equal number of boys and girls, so for convenience the boys were assigned to the Control Group and the girls to the Experimental Group. During a class early in the school year, a Generalization Probe was conducted for all children. The experimenter fell ill soon afterwards, and so it wasn't until a class late in the school year that the children in the comparison groups were separated, with the Control Group children viewing the 20-minute cartoon and the Experimental Group children viewing the 20-minute interactive video. Two days after that, a second Generalization Probe was conducted. To see the results, [|click here (1)]. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

**Nonexample**
> The name of each child in the classes was written on a separate slip of paper. All the slips were put in a bowl and mixed up thoroughly. Students were assigned to the Experimental Group and to the Control Group alternately as their names were pulled out of the bowl one at a time. During a class early in the school year, a Generalization Probe was conducted for all children. The experimenter fell ill soon afterwards, and so it wasn't until a class late in the school year that the children in the comparison groups were separated, with the Control Group children viewing the 20-minute cartoon and the Experimental Group children viewing the 20-minute interactive video. Two days after that, a second Generalization Probe was conducted. To see the results, [|click here (1)]. We conclude that the 20-minute interactive video improved the children's self-protection skills in a potential abduction situation.

**Analysis**
> The first item is an example in which a [|selection-maturation interaction] is a threat to internal validity. If the Experimental Group's improvement across the two Generalization Probes were solely a function of the passage of the school year, then we would expect to see similar improvement for the Control Group. Because we do not observe this, we might erroneously conclude that the enhanced score for the Experimental Group was not the result of maturation. However, the children in the two comparison groups differ both in whether they viewed the interactive video two days prior to the second Generalization Probe and in gender. Perhaps relevant assertion skills are naturally learned sometime during first grade for girls and later, during second grade, for boys. The better second Generalization Probe score by the Experimental Group relative to the Control Group may be due to exposure to the interactive video **or** to the differential development of boys and girls.
> In the second item, because subjects were randomly allocated to conditions, the only way gender (and other subject-related variables) can be unevenly distributed between the two comparison groups is through chance. Thus, the children in the two groups appear to differ only in whether they viewed the interactive video two days prior to the second Generalization Probe. Because of this and the fact that the Generalization Probe scores increased for the Experimental Group and not for the Control Group, we can be more confident that the improvement was not the result of gender, maturation, or an interaction between the two.
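
The random-assignment procedure described above (names drawn from a bowl and dealt alternately to the two groups) can be sketched in a few lines of Python. This is only an illustrative sketch: the roster, the function name `assign_alternately`, and the seed are assumptions made for the example and are not part of the tutorial's hypothetical study.

```python
import random


def assign_alternately(names, seed=None):
    """Mix the 'slips' thoroughly, then deal them alternately to two groups.

    The first name drawn goes to the Experimental Group, the second to the
    Control Group, and so on, mirroring the bowl procedure in the text.
    """
    rng = random.Random(seed)      # seeded only so the sketch is repeatable
    drawn = list(names)            # copy so the caller's roster is untouched
    rng.shuffle(drawn)             # "mixed up thoroughly"
    experimental = drawn[0::2]     # 1st, 3rd, 5th, ... slip drawn
    control = drawn[1::2]          # 2nd, 4th, 6th, ... slip drawn
    return experimental, control


# Hypothetical roster of (name, gender) pairs, invented for illustration.
roster = [(f"girl_{i}", "F") for i in range(10)] + \
         [(f"boy_{i}", "M") for i in range(10)]

experimental, control = assign_alternately(roster, seed=1)

# Gender now varies between groups only by chance, not by design.
girls_in_experimental = sum(1 for _, gender in experimental if gender == "F")
print(len(experimental), len(control), girls_in_experimental)
```

Because group membership depends only on the order of the shuffle, any imbalance in gender (or in any other subject-related variable) between the groups arises by chance rather than from the experimenter's choices.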

i. Experimenter Bias
Expectations of an outcome by persons running an experiment may significantly influence that outcome. As with instrumentation, the reliability of the instrument used to gauge the dependent variable or deliver the independent variable is suspect, but here the reason for that unreliability is the partiality of persons in direct contact with the subjects or the data. Suppose that subjects in two comparison groups differ with respect to the independent variable. Suppose also that the experimenter is responsible for administering the appropriate condition to each group and measuring the dependent variable. If scores on the dependent measure differ between the two groups, the discrepancy may be due to the independent variable **or** to differential treatment of the two groups by the experimenter, who is under the influence of his or her hypothesis. [|Background Information]

**Example**
> The name of each child in the classes was written on a separate slip of paper. All the slips were put in a bowl and mixed up thoroughly. Students were assigned to the Experimental Group and to the Control Group alternately as their names were pulled out of the bowl one at a time. One day at school, children in the Control Group were told to go to one room and children in the Experimental Group to another room, where they were exposed to their respective conditions. Two days later, the Generalization Probe was conducted. For ease of record keeping, all the Control Group children were tested first, then all the Experimental Group children. Both the student teacher, who recorded how the children responded to the confederate's lures, and the confederate, who presented the lures, were heavily involved in the production of the interactive video, and both of them strongly believed in its efficacy. The mean score for children in the Control Group was 1.2 and the mean score for children in the Experimental Group was 3.4. We can conclude that the 20-minute interactive video was effective in changing what the children did in a potential abduction situation.

**Nonexample**
> The name of each child in the classes was written on a separate slip of paper. All the slips were put in a bowl and mixed up thoroughly. Students were assigned to the Experimental Group and to the Control Group alternately as their names were pulled out of the bowl one at a time. One day at school, children in the Control Group were told to go to one room and children in the Experimental Group to another room, where they were exposed to their respective conditions. Two days later, the Generalization Probe was conducted, with children selected from class to be tested in random order. Both the student teacher, who recorded how the children responded to the confederate's lures, and the confederate, who presented the lures, were heavily involved in the production of the interactive video, and both of them strongly believed in its efficacy. Only the experimenter knew which child was exposed to which condition. The mean score for children in the Control Group was 1.2 and the mean score for children in the Experimental Group was 3.4. We can conclude that the 20-minute interactive video was effective in changing what the children did in a potential abduction situation.

**Analysis**
> The first item is an example in which [|experimenter bias] is a threat to internal validity. We assume that both the student teacher and the confederate knew the experimental status of each child, given that they tested the Control Group first. Because both the student teacher and the confederate had a stake in the outcome, they may, inadvertently or not, have treated the two comparison groups differently, which in turn would affect the results. For example, faced with a tough choice between a score of two and a score of three for any particular child, the student teacher may be more likely to assign a score of two to a Control Group child and a score of three to an Experimental Group child. The confederate may be more persistent when attempting to lure Control Group children than when luring Experimental Group children. The higher Generalization Probe score by the Experimental Group may be due to exposure to the interactive video **or** to differential treatment of children in the two groups by the student teacher and the confederate because of their biases.
> In the second item, given that neither the student teacher nor the confederate was aware of whether a child was in the Control Group or the Experimental Group, they could not treat the children in the two groups differently, despite their expectations. We can be more confident that the higher Generalization Probe score by the Experimental Group was not the result of the biases of the persons running the experiment. (Of course, it is possible that each child may have given subtle clues as to his or her experimental status. As a further safeguard against experimenter bias, the roles of the student teacher and the confederate should be played by persons who have no knowledge of or stake in an expected outcome.)
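
The safeguard described above (testing children in random order while only the experimenter knows each child's condition) can be sketched as follows. This is a minimal illustration under stated assumptions: the child names, the scores, and the helper names `blind_test_order` and `group_means` are all hypothetical and are not taken from the tutorial.

```python
import random
from statistics import mean


def blind_test_order(assignments, seed=None):
    """Return child names in random testing order, with no condition labels.

    This list is all the testers receive, so they cannot tell from the
    schedule which group a child belongs to.
    """
    rng = random.Random(seed)
    order = list(assignments)      # names only; conditions stay behind
    rng.shuffle(order)
    return order


def group_means(scores, assignments):
    """Unblind after scoring: join scores to conditions and average them."""
    by_group = {}
    for child, score in scores.items():
        by_group.setdefault(assignments[child], []).append(score)
    return {group: mean(values) for group, values in by_group.items()}


# Held by the experimenter only (hypothetical children and conditions).
assignments = {
    "child_a": "experimental",
    "child_b": "control",
    "child_c": "experimental",
    "child_d": "control",
}

order = blind_test_order(assignments, seed=7)   # handed to the testers

# Made-up scores recorded by the blind tester, for illustration only.
scores = {"child_a": 3, "child_b": 1, "child_c": 4, "child_d": 2}

print(order)
print(group_means(scores, assignments))
```

Keeping the condition labels out of the schedule is the design choice that matters here: scores are joined to conditions only after every child has been tested, so the testers' expectations cannot be applied selectively to one group.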

> You have reached the end of Part 1 of this tutorial. You may now proceed to Part 2, a 36-item exercise that will give you practice assessing the internal validity of hypothetical experiments. For those experiments that are not internally valid, you will be asked to go one step further and identify the source of the threat. > [|Proceed to the exercise...]

Item 1 of 36
//Decide whether or not the following experiment is internally valid. If not, identify the source of threat to its internal validity.// The potential of disulfiram medication for treating problem drinking was investigated. Severe adverse physical reactions result from drinking alcohol after taking disulfiram. Outpatients of a rural community alcoholism treatment clinic were randomly assigned to either a Traditional Group or a Disulfiram Group. Patients in the Traditional Group were given five structured sessions devoted to education about alcoholism through films and discussion of printed materials. Patients in the Disulfiram Group were treated similarly; in addition, their intervention stressed the importance of taking disulfiram at a set time and place and in the company of another person, and they practiced doing this through role playing and communication training with a significant other. Ninety-five percent of the patients in the Traditional Group completed the program, compared to 60% of the patients in the Disulfiram Group. Six months following treatment, patients who completed the Traditional treatment reported being abstinent on 15 of the previous 30 days, while patients who completed the Disulfiram treatment reported being abstinent on 25 of these days. We can conclude that the Disulfiram treatment was more effective than the Traditional treatment.

This experiment is internally valid.
**No.** This experiment is not internally valid, and the source of threat is:
 * Selection
 * History
 * Maturation
 * Repeated Testing
 * Instrumentation
 * Regression to the Mean
 * Experimental Mortality
 * Selection-Maturation Interaction
 * Experimenter Bias
[|Review Definitions]


Item 1 of 36
//Decide whether or not the following experiment is internally valid. If not, identify the source of threat to its internal validity.// Mrs. Grayson, a high school teacher, believed that her Dramatic Arts course improved not only her students' ability to act but also their ability to interact socially. She decided to put this hunch to the test. She hired Sam, an experimental psychologist. At the beginning of the school year, Sam administered a questionnaire to two classes of similarly aged students taught by Mrs. Grayson, one in English and the other in Dramatic Arts. The questionnaire was designed to measure social adjustment. Then, at the end of the school year, having finished their respective courses, the students once again completed the questionnaire. A few students were enrolled in both courses, so they were excluded from the data analysis. On the first test, both groups scored in the "normally adjusted" range; on the second test, the English class retained its "normally adjusted" status while the Dramatic Arts class improved to within the "well adjusted" range. Looking over these results, Mrs. Grayson was even more convinced that her course in Dramatic Arts enhances her students' social skills.

This experiment is internally valid.
**No.** This experiment is not internally valid, and the source of threat is:
 * Selection
 * History

Item 1 of 36
//Decide whether or not the following experiment is internally valid. If not, identify the source of threat to its internal validity.// Mary, a psychology student hoping to enroll in law school, was interested in the malleability of people's recollections. She conducted the following experiment. She randomly divided subjects from her volunteer list into an Experimental Group and a Control Group. On Day 1, Experimental Group subjects were shown a videotape of a husband and wife squabbling, after which they answered questions about what they saw. One of these questions was phrased: "Did you hear **the** profanity?" In fact, the argument did not contain any swearing. On Day 2, they viewed the same videotape. This time the wording of that one question was altered to: "Did you hear **any** profanity?" Also on Day 2, Control Group subjects watched the videotape and were asked: "Did you hear **the** profanity?" Consistent with Mary's expectations, Experimental Group subjects were less likely to recall hearing profanity in response to the second question than to the first question; and, on Day 2, reports of hearing profanity by Experimental Group subjects were fewer than by Control Group subjects. Mary concluded that what her subjects remembered was influenced by how she worded the leading question.

This experiment is internally valid.
**No.** This experiment is not internally valid, and the source of threat is:
 * Selection
 * History
 * Maturation
 * Repeated Testing
 * Instrumentation
 * Regression to the Mean
 * Experimental Mortality
 * Selection-Maturation Interaction
 * Experimenter Bias
[|Review Definitions]
