A LESSON IN RESAMPLING STATISTICS
Julian L. Simon
Teacher ("T"): Good morning. Let's talk about poker. What is the chance of getting a pair -- two cards of the same denomination, such as two fives or two queens -- in a hand of five cards dealt to you?
Student Abel: 1 in 5.
T: What do you mean by "1 in 5"?
Students: [Silence]
T: You mean that every single time you deal five hands you can expect to get a pair?
Doug: One in five times on the average.
T: How sure are you that it's one in five?
Abel: Well, it seems to me that I usually get a pair about every five times.
T: What would you say if I told you it's not one in five, but instead the chances are 1 in 2?
Becky: I'd say "Prove it".
T: Who said "Prove it"?
Becky: Me -- I say it's about one in twenty.
T: So we've got a variety of views here -- one in twenty, one in five, one in two. How would you go about finding out who's right?
Becky: Ask an expert.
T: Well, that's one possibility. Getting advice from people who know a lot about a subject is always a wise first tactic. But how would you know for sure whether the so-called expert knows what she or he is talking about? Finding an expert who is really an expert is not easy unless you are an expert yourself. Let's assume that you don't have a tested expert handy. How would you go about finding a reliable answer on your own?
Charlie: Calculate from how many cards are in the deck, and how many cards you have. Use a formula.
T: Okay, how exactly should we calculate? Does anyone here know what the right formula is?
[Silence]
T: Does that mean that we are stuck? Is there anything we can do if we don't know the formula? And by the way, people often think they know the right formula but don't, and therefore calculate the wrong answer. That is a very big danger unless you are a skilled mathematician. Is there anything we can do now?
Charlie: Deal some hands.
T: Deal some hands? That's a wild and radical idea. [Laughter] What do you mean?
Charlie: Deal some cards.
T: Give us an example of what you mean.
Charlie: Play poker and keep track.
T: Let's be more specific. How would you do it?
Charlie: Okay, deal five cards --
T: Well, just by coincidence I brought a few cards with me. [Dumps thirty decks of cards on the table.] Pass them around. Charlie, tell us exactly what to do with the cards. You're the boss. Stand up here in front and give us instructions.
[Charlie gets up and comes up front]
Charlie: Okay, you students [laughter] this is what we're going to do. Everybody deal out five cards.
Becky: Do we shuffle the deck first?
T: Good question. Should they shuffle, Charlie?
Charlie: First shuffle the deck and then deal five cards.
[Students shuffle and deal a hand.]
Charlie: How many of you have a pair?
[Students raise their hands if they have a pair.]
T: [To Charlie] Now what?
Charlie: We can say that the chances are 7 out of 12 [the number who have a pair among the 12 students] that you get a pair.
T: [To the class] Does that do it? Is that our answer?
Abel: Next time we might get a different number of pairs.
T: Why is that?
Abel: Because the results differ from deal to deal.
T: Very important. Very very important. The difference from trial to trial is one of the key ideas in probability and statistics -- the idea of random variability. The results vary from one event to the next. A large proportion of the world's mistakes in business, sports, and politics occur because people do not recognize random variability for what it is, and instead attach some meaning to the pattern in one particular trial. So what should we do about the random variability?
Becky: Deal the cards again and again, and mark down the results.
T: So we must keep track of the results. Alright, Becky, you're in charge now, tell us what to do.
Becky: Everybody shuffle your cards.
Doug: Do we have to shuffle the cards? How about just dealing a second hand from the deck?
Would it make a difference whether we do that, or instead shuffle the deck and deal out five cards from the entire shuffled deck?
[Becky is silent.]
T: What do you all think? Does it make a difference whether we simply deal a second hand from the unshuffled deck, or shuffle and start again?
[Some hubbub, various voices and opinions]
T: So there is a difference of opinion. How should we settle the difference of opinion?
[Silence]
T: We can't answer every question at once. Let's assume for the moment that it doesn't matter, but let's also agree that we will settle the question later by the best possible method -- that is, try it out both ways. May I have your permission to postpone? Of course if we do replace the five cards in the hand we deal, and use the entire deck, Doug's comment is very important, because if you replace the cards and don't shuffle them you have a big problem. Now what, Becky?
Becky: Deal another hand.
Charlie: Wait a minute. How many people are playing in this game? You could have like five people playing, or three people playing. Wouldn't that make a difference?
T: You say the chances might be different if you had five people playing or three people. That's a very interesting question. But let's put that aside for the moment, and go on with what we were doing.
Essie: Shuffle them up and do it again.
T: How many times are we going to do this, Essie?
Essie: Everyone should deal ten hands.
T: You're the boss, Becky, tell people what to do.
Becky: Everybody, ten times, deal a hand, see if you have a pair, write down what you get. Do the whole thing ten times.
[Much dealing and writing]
Becky: Each of you tell me how many pairs you got. [Gets the results and writes them on the board.]
T: So what's the answer, Becky?
Becky: The chances are 55 out of 120.
T: What's that as a fraction, and as a probability?
Abel: Eleven twenty-fourths, or about 46 per cent.
T: Are 120 hands enough?
Foxey: Yeah.
T: Well, 120 hands might be enough.
Obviously it depends on how accurate you want to be, right? If we had more time, we could deal out another 120 hands, and compare the result. If there wasn't much difference we could be satisfied. Or we could do it again and again. And sooner or later we would get enough accuracy to safely play poker with, which is what we are interested in here. So that's how you could go about finding out the chances of getting one pair or two pair or a royal flush in poker. If you tried to figure it out mathematically it might take you a lot longer to learn what you need to know. You might have to wait a few years until you go to college and then take two courses or six courses in probability theory, then work out the formula, and even then there would still be a fair chance you would wind up with the wrong formula. But with the method you all have just worked out, you're going to get a very good answer.
Now, what are the chances of getting a seven with a throw of two dice? Of course you've all lived very sheltered lives and none of you have ever seen a pair of dice before, right? [laughter] So what are the chances of throwing a seven?
[Silence]
Doug: Throw the dice and see.
T: Good move, Doug. Throw the dice once, and then what?
Doug: Write down what happens.
T: Then what?
Doug: Do it again.
T: Alright, Doug, you're the boss. You get it done.
Narrator: Doug runs the class experiment, which we won't show to save time.
T: Now let's consider a different kind of problem. Let's say that somebody comes along and says, what are the chances if I have four children that three of those children will be girls? How would you go about finding that out?
Foxey: Shuffle up a bunch of kids and deal out four. [Laughter]
T: Sounds fine in theory, but it might be a bit difficult to actually carry out... How about some other suggestions?
Essie: Have four kids and see what you get.
T: Sounds good. But let's say you have four children once. Is that going to be enough to give you a decent answer?
Charlie: No. You need more families.
T: How many families do we need?
Charlie: How about a hundred families?
T: So you're going to produce a hundred families. That's reasonable. But it could take you a little while to have a hundred families, a little strength and energy and money. So we scratch our heads and say, hold on here. Producing a hundred families is a very sensible idea, but it doesn't seem to be practical at the moment. Another suggestion?
Doug: Take a survey.
T: What do you mean by "take a survey"?
Doug: You go around and ask people who have four children how many are girls.
T: Super idea. Absolutely super. A survey is a terrific idea because it focuses us on trying to get an answer to a problem like this one by going out and looking at the world instead of just trying to do mathematics. Nothing wrong with mathematics, but there's always a great deal to be said for trying to get the answer by going out into the world and looking. How many families are you going to survey, Doug?
Doug: A hundred.
T: Any particular families?
Doug: Families with four children.
T: What are you going to ask the hundred families?
Doug: How many of your children are girls?
T: You're going to find a hundred families that have four kids, and ask each one how many are girls. Sounds good. Any problems?
Essie: It's going to take a lot of time to find a hundred families with four kids.
T: Yes, but it's a lot quicker than growing a hundred families. I'll bet if the twelve of you went out now, by the end of the day you could find a hundred families with four children and you could get a pretty good answer to this.
T: Let's try it. Okay, teacher?
Regular class teacher: We have some other things we have to do today, unfortunately.
T: Okay, but let's remember that we could try it, and as scientists that would be an excellent way to do it.
T: Is there another way we can tackle the problem? What else can we do?
Let's say that some businessperson comes in here and says, "I'm going to give you a thousand dollars if you can come up with a pretty good answer inside of one hour." You don't have time to take a survey. What would you do? Think about it for a few minutes. Keep in mind that a good solution might be worth a thousand bucks. That should be enough to make you think.
Foxey: You can think about your friends' families that have four kids, and count how many of them have three girls.
T: Terrific idea. That's like taking a survey, but a lot faster. Maybe that will get you the thousand dollars. Without in any way being critical of that terrific idea, let's ask how else you might go about it. Think back to the first problems we solved with poker and dice.
Charlie: Simulation.
T: Simulation? What's a simulation?
Charlie: You take something like a four-sided die or something like that.
T: In other words, you want to do something here in the classroom which is like having kids. Can somebody get more specific?
Essie: We could put an equal number of red and black balls in a pot, and pull four of them out. That would be like a family.
T: Does that make sense?
Several students: Yeah.
T: Essie, how many balls are you going to put in the pot?
Essie: Four of each.
T: How about if we put in two of each -- two red and two black -- and you reach in and you mush them around and take out four.
Essie: That wouldn't work.
T: Why not?
Essie: Because you'd have to have at least three red ones.
T: Exactly. So you couldn't possibly get three red ones if you only had four balls, two red and two black. How about if you only had six balls in there?
Essie: That wouldn't work, either.
T: Why wouldn't it work?
Essie: Because you couldn't have a combination of all girls.
T: That's right. If every combination isn't possible, there obviously is something wrong. Now what about four red and four black?
George: The chance of getting four girls would still be pretty small.
T: Let's see what is going on when we only have a few balls in the pot. What is the chance of having a girl the first time you have a child?
Class voices: Fifty-fifty. One in two. Fifty percent. etc.
T: If you have four red and four black balls, what is the chance of getting one red one? Becky?
Becky: Fifty per cent.
T: What is the chance of having a girl the second time a real family has a child?
Becky: Fifty per cent again, I guess.
T: Now, what is the chance of drawing a red ball from a pot that starts with four red and four black, after you draw a red ball?
Doug: Three in seven, which is less than fifty percent.
T: Right you are, Doug. So you can see why we can't have a pot with just three red and three black, or 4 and 4, or 10 and 10, for the same sort of reason.
Foxey: But if we have a big pot of both red and black balls, it would almost be okay, wouldn't it?
T: You're right, Foxey. That would be a very satisfactory approximation. But we would need a lot of balls. Is there some other method we could use to get around this problem? Let's try someone we haven't heard from lately. George, what would you do? How would you go about it? What are you going to put into the pot and how are you going to deal with it?
George: How about putting just two balls in, one red and one black, and put the ball back after you draw it?
T: Bingo. You've got it exactly. We call this "sampling with replacement", meaning that we put the ball back each time to keep the chance of drawing a red one the same. George, tell us exactly how we would go about making an estimate of the chances of getting three girls in four children using just the two balls.
George: Draw a ball, and write down what color it is. Repeat that four times. Count the number of red balls. If the number is "3", write down "yes", otherwise write down "no".
T: Is once through enough?
George: Do the whole operation about a hundred times.
T: Does that make sense, class?
Class voices: Yeah, yes, okay...
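Doug's three-in-seven answer, Foxey's point about a big pot, and George's two-ball pot with replacement can be checked side by side. The Python sketch below is an illustration added for this purpose, not part of the lesson; the function name chance_second_red and its parameters are assumptions made up for the example. It works out the chance that the second ball is red when the first ball drawn was red.

```python
from fractions import Fraction

def chance_second_red(red, black, replace):
    """Chance the second ball is red, given that the first ball drawn was red."""
    if replace:
        # The first ball goes back in, so the pot is unchanged.
        return Fraction(red, red + black)
    # The first red ball stays out, leaving one fewer red ball in the pot.
    return Fraction(red - 1, red + black - 1)

# Small and large pots without replacement (Doug's and Foxey's cases).
for red, black in [(4, 4), (10, 10), (1000, 1000)]:
    p = chance_second_red(red, black, replace=False)
    print(f"{red} red, {black} black, no replacement: {p} = {float(p):.4f}")

# George's pot: one red, one black, ball put back after each draw.
p = chance_second_red(1, 1, replace=True)
print(f"1 red, 1 black, with replacement: {p} = {float(p):.4f}")
```

Without replacement the chance creeps toward one half only as the pot grows, while George's two balls with replacement give exactly one half on every draw, which is why so small a pot is enough.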
T: That procedure would work quite well. But we don't have any balls. Essie, you suggested the balls. Is there any way that we could use this thing instead? [Holds up a quarter.]
Essie: I suppose we could flip a coin and the head could be like red, like a girl, and the tail like black.
T: Absolutely. And a coin will be easier to think about later on. So -- how would we do it with a coin?
George: Flip the coin, Teach.
T: [Flips]. Heads. Now what?
George: Record it.
T: You do it, George. Now what are we going to do next?
George: Do it four times.
T: OK, do it, George.
[Does it]
T: What happened?
George: Two and two.
T: What does that mean?
George: It means we didn't get three girls.
T: Now what?
George: We've got to do it a lot of times.
T: Can you get the class to help you, George? Yes? Then go ahead and do it. Come on up here and do it. I suggest you put the results on the blackboard.
George [comes up to front]: Everybody take a coin, flip it, write down what you get, and do that four times.
[All do it]
George: What did you get? Abel? [Writes on board] Becky? [Etc.]
T: What do the results say, George?
George: The results say that 2 out of 12 times we get three girls.
Charlie: What happens if we get four girls? Do we count that?
Narrator: Here there is discussion about whether four girls should be counted. T ends by emphasizing that the decision should be made with an eye to the purpose for which the estimate is being developed.
T: Let's continue. Do we have enough trials?
Essie: With only 12, we might get different results next time.
T: Okay, how many more trials should we do?
Essie: Let's do a hundred altogether.
T: Okay, let's let George do it. [A couple of students groan at the joke.]
Narrator: George presides over a hundred trials and compiles the results from each student on the blackboard.
T: What do we do with the results, Essie?
Essie: We count the number of yes's and make a ratio.
T: A ratio of what?
Essie: The ratio of yes's to yes's plus no's, because we want to know what proportion of all the times we get yes, right? So we compute the ratio of the yes's to all the times we tried, all the families we had. And that will be our answer.
T: Sounds good to me. When the guy with the thousand bucks comes storming in here and says, "Have you got my answer?," we can say, "Ah yes," very coolly. And we'll be a thousand dollars richer.
Foxey: I have a question. Do an equal number of boys and girls get born? Are boys fifty per cent?
T: That is an important question. And the answer is "No." About 105 boys are born for every 100 girls, or 106 or 104, depending on the country. Now I ask you, Foxey, is the fact that the ratio is, say, 105 to 100, rather than 100 to 100, a difference big enough to spoil our method here?
Foxey: No.
T: Why not?
Foxey: Because 100 to 100 might be close enough.
T: Yes, you are right that we're interested in getting an answer which we can consider close enough for what we want to do. In practical life we're never interested in getting a perfectly accurate answer, because there is never a perfectly accurate answer. That is, the question is only whether 100 to 100 rather than 105 to 100 is good enough for our purposes here. But that means we've got to ask what our purposes are here. Maybe we should ask the person who's offering to give you a thousand dollars, "What do you want this estimate for?" And if this person says, "Well, I want to go into business making boys' clothes and girls' clothes," then probably an answer which is off by as much as would be caused by 100-100 instead of 105-100 wouldn't cause much harm. If we were trying to aim a rocket at the moon, however, this procedure might cause us to be off target by thousands of miles. In that case we would be sensible to pay more attention to the accuracy and carry out the procedure a bit differently. So it is crucial always to know just how much accuracy we need.
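How much the 105-to-100 birth ratio shifts the answer can be checked by listing all sixteen possible birth orders for four children and adding up the chances of those with exactly three girls. The Python sketch below is an added illustration, not the lesson's own method: it enumerates outcomes exactly instead of simulating, and the function name chance_of_girls and the figure 100/205 for the chance of a girl are assumptions chosen for the example.

```python
from itertools import product

def chance_of_girls(k, p_girl, children=4):
    """Add up the chances of every birth order that has exactly k girls."""
    total = 0.0
    for family in product("GB", repeat=children):
        if family.count("G") == k:
            chance = 1.0
            for child in family:
                chance *= p_girl if child == "G" else 1 - p_girl
            total += chance
    return total

even = chance_of_girls(3, 0.5)           # fifty-fifty assumption
skewed = chance_of_girls(3, 100 / 205)   # 105 boys for every 100 girls
print(f"fifty-fifty: {even:.4f}")
print(f"105 to 100:  {skewed:.4f}")
```

Under these assumptions the two answers differ by a little over one percentage point, which is the kind of gap the clothing maker could probably ignore and the rocket engineer could not.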
Let's say that the 105 to 100 isn't all that much of a problem for our purposes, and assume it's fifty-fifty for convenience. We're doing terrifically. The only problem is that this card-shuffling and coin-flipping takes time, and in more complex problems it would take even more time. So let's speed up the work with a handy-dandy card-dealer and coin-flipper called a computer, this machine here. We're going to make this machine do the same thing that we did with our coins. But we've got to tell this machine some special words to get it to do what we want it to do, because it is not as smart as you kids are.
Let's get the computer to flip coins for us, or rather, to do something which "simulates" flipping coins, which in turn simulates having children. Of course the machine doesn't really flip coins. Rather, it only deals with symbols like numbers and letters. So let's let "1" be a girl, and "2" be a boy. Before we begin to write a program, we've got to do the really hard stuff, like figuring out how to turn the machine on.
Narrator: Here we briefly show how to insert a floppy disk, find the "On" switch, and call up the program RESAMPLING STATS with the command "Stats". The students also are shown how to begin with the main menu [show] and get a file [show] and then edit a file [show cursor movement] and afterwards how to run the file from the main menu. They are also shown that there is a tutorial for them to study when they are alone.
T: We first give the computer a command that tells it to make numbers. The command we use to make numbers is "generate." [show GENERATE on screen] You must spell each of these commands exactly, and provide exactly the information it requires. If you write "yenerate" or "venerate" the machine isn't going to understand you, although if we wanted to, we could write a program that would correctly read most of our errors. But ordinarily the computer is very, very specific. You've got to get it right.
But if you get the commands right, the computer won't make a mistake. So it's a pretty good deal -- you do your part correctly, and the machine will do its part correctly.
We want to generate four numbers, "1"s and "2"s, chosen randomly just like flipping a coin. So we look in the Manual, or on this "Quick List," which tells us that the first number we write after "generate" specifies to the computer how many numbers to generate randomly, using a random-number device inside the computer that works like a lottery. How many numbers do we need?
Doug: A hundred.
T: That might be the number of families we want to create. But first we must tell the computer how many children in one family, just as in our first step when working with coins we decided how many times to flip a coin to get one family.
Foxey: Four numbers.
T: Okay, we write "GENERATE (4)". The Manual tells us that the next part of the GENERATE command is the numbers the computer is going to make for us. Let's make it one's and two's, but it could be "zero's" and "one's" or whatever. So we're going to randomly generate four numbers that are either "1" or "2". Now we must put these numbers someplace so that we can keep track of them. We tell the computer to put them in a little slot someplace, and we'll call that slot "A", a special location in the computer. So we write "GENERATE (4) (1,2) (A)".
Up until now I have been putting parentheses around what we call the "parameters" of the command. The Apple program requires that we do that. But for the IBM program the parentheses are not necessary, and a space between the parameters is sufficient to do what we call "delimit" each parameter. From here on I'll leave off the parentheses for convenience.
Now we must tell the computer to count how many girls are born. The next command logically is called "count". The Manual says that we must first tell the computer where to count.
So we tell the computer to look in location A where we had put the result from the previous step. Next we tell the computer what to count in A -- the number of "1"s for girls -- and where to put the result of the COUNT, which we decide will be location J. The command then is COUNT A 1 J. These actions by the computer simulate what we do with coins. We have now constructed one family with those two commands.
We must keep a record of this result, so we put it on a scoreboard inside the computer with the command SCORE. We must tell the computer where to put the score. (I always call the scoreboard Z.) We've also got to tell the computer where to look for the result -- the Scalar J where we had stashed the result. So -- SCORE J Z.
You said we need not just one trial "family" but a hundred families. So we've got to tell the computer to carry out this whole operation a bunch of times. We order it to REPEAT a hundred times the steps that make one family. We put the REPEAT command at the beginning of the commands for a single trial, along with the number of repetitions we want, and then we use the command END to finish a repetition. You don't need to know this word, but just for the fun of it we have just completed a "loop", which makes sense because the machine goes round and round that loop a hundred times between REPEAT and END. When the machine finishes going around the loop it stops, because we told it how many times to go around: a hundred times. Okay? So now we've got the results of a hundred families. Right?
After we have completed our hundred families we need to check the record on our scoreboard. We COUNT among the hundred yes's (that is, 3's) and no's (that is, numbers other than 3) how many yes's there are. We put the answer in K. Now we can extract our result from the machine, so we tell the machine to PRINT it. In this case the word PRINT tells the machine to show the result on the screen. We could also print on paper.
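For readers without RESAMPLING STATS at hand, the steps just described can be sketched in Python. This is a loose, illustrative translation rather than the lesson's program: the function name simulate_families, its parameters, and the optional seed are assumptions, and each comment notes which RESAMPLING STATS step the line is meant to mirror.

```python
import random

def simulate_families(trials=100, children=4, seed=None):
    """Build `trials` four-child families and tally the girls in each."""
    rng = random.Random(seed)  # seed is only there to make a run repeatable
    z = []                     # the scoreboard Z
    for _ in range(trials):    # REPEAT: one pass per family
        a = [rng.choice((1, 2)) for _ in range(children)]  # GENERATE 4 1,2 A
        j = a.count(1)         # COUNT the 1's (girls) in A, result in J
        z.append(j)            # SCORE J Z
    k = z.count(3)             # COUNT the 3's in Z, result in K
    return z, k

z, k = simulate_families(trials=100)
print(z)  # girls in each of the hundred families
print(k, "families out of", len(z), "had exactly three girls")
```

Because the numbers are drawn at random, K will differ from run to run, just as the class's coin flips did. Raising `trials` narrows that run-to-run variation.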
So let's actually print. [show PRINT.] We want to know if we got three girls. So we have our scoreboard show the number of girls -- zero, one, two, three, or four -- in each and every family. Of course we especially want to know how many families had three girls.
Now we must tell the computer to RUN the program. Let's. The program is doing it, it's going through the loops right now. Now we can look at Z to see the number of girls in each family. And we can look at K to see how many families out of the hundred had exactly three girls.
So far we have worked problems in "probability". Let's now consider a problem in the sub-field of probability called "statistics". First I'm going to tell you something you won't believe. Professional baseball players do not suffer from slumps, and professional basketball players do not have "hot hands". Anybody here ever hear of Larry Bird? Well, in the first three games of the 1988 NBA playoff series between Boston and Detroit, Larry Bird got baskets on only 20 of the 57 shots he attempted. Everybody agreed that Bird was in a slump. As the Washington Post said (May 30, 1988, p. D4):
    Larry Bird is so cold he couldn't throw a beach ball in the ocean... They fully expect Bird to come out of his horrendous shooting slump... It is safe to assume that if Bird doesn't shake out of his slump Monday, it will be difficult and probably even impossible for Boston [continue]
What does "slump" mean? If it means anything it means that the chance of Bird scoring a basket at the end of that period is lower than usual. And coaches and players usually conclude that the player should take fewer shots than usual because he does not have a "hot hand".
Narrator: In a regular class, the following ideas would be drawn from the students by the instructor. For lack of time, the instructor will simply lecture.
T: But did Bird really have a "cool" hand?
That is, was his shooting eye less good during this period than it usually is? Or could that sequence of events have occurred just by chance, just as if he were a coin, which cannot have a hot hand? The coin's chance of success and failure stays the same from flip to flip, even though gamblers feel that a coin or a set of dice is hot or cold when it shows a long run of hits or misses.
Therefore, let's see just how unusual it would be for a coin that "succeeds" 48 percent of the time to show a "slump" like Bird's. First we generate 57 numbers between 1 and 100.
    GENERATE 57 1,100 A
[show on screen, or printout]
Next, we count how many of those 57 shots were "baskets", that is, were between 1 and 48 (remember that Bird is a 48 percent shooter on the average).
    COUNT A 1,48 J
Next we score the result.
    SCORE J Z
Then we repeat those operations 1000 times by putting a REPEAT statement in front of those three operations that make up one trial, and an END statement after them. Our program now looks like this:
    REPEAT 1000
    GENERATE 57 1,100 A
    COUNT A 1,48 J
    SCORE J Z
    END
Afterwards, we count the number of trials in which the result is fewer than 21 baskets.
    COUNT Z < 21 K
Then we PRINT the result from K, and the results for the separate trials in Z.
    REPEAT 1000
    GENERATE 57 1,100 A
    COUNT A 1,48 J
    SCORE J Z
    END
    COUNT Z < 21 K
    PRINT Z K
Now let's run our program and see what we get. [Program runs. Show program]
The results suggest that in about four trials out of a hundred, our simulated Larry Bird gets 20 or fewer baskets in 57 shots. That means that even if nothing changes in his shooting, during one in every 25 series of 57 shots, on average, he would shoot that poorly or worse. (This does not mean that the chances are 24 in 25 that such an event did not happen by chance. Rather, it means that in every hundred sets of 57 shots, we can expect four to be that poor.
Similarly, we can expect some series to seem terrific when they also are occurring just by chance without a change in the system.)
It would seem, then, that it would be a mistake for the Celtics to tell Bird to do anything different after this cold streak than ordinarily. Bird should take just as many shots as usual, in his usual style, just as one continues to use a coin even after it has come down heads a bunch of times in a row. In other words, if it ain't broke, don't fix it.
Here we note the importance of the context in which we get the data. The reason we are not impressed with a 4-in-100 probability, and continue to expect that in his upcoming games Bird will have shooting success at his long-run average of 48 percent, is that Bird shoots hundreds and hundreds of shots each year, and sooner or later he will have a set of 57 shots with very poor results, a set of shots with very good results, and a variety of other outcomes. But if this were a person for whom we had no other information -- say, a high school basketball player at the beginning of his first season -- then our best guess would be that in the future he would shoot baskets at the rate of 20 in 57.
Understanding variability of this kind is the key to Japanese quality control, taught to the Japanese by an American statistician named W. Edwards Deming. And resampling is a remarkably effective and easy tool to use in studying such quality control in practical situations.
lesson 9-175 dir statwork August 9, 1992
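The Larry Bird program from the lesson can be approximated in Python for readers who want to try it themselves. This is a sketch under stated assumptions, not the RESAMPLING STATS implementation: the function name simulate_slumps, its parameters, and the optional seed are invented for the example, and the comments show the RESAMPLING STATS line each step is meant to mirror.

```python
import random

def simulate_slumps(trials=1000, shots=57, pct=48, cutoff=21, seed=None):
    """Count how often a steady 48 percent shooter makes fewer than 21 of 57."""
    rng = random.Random(seed)  # seed is only there to make a run repeatable
    z = []                                               # scoreboard Z
    for _ in range(trials):                              # REPEAT 1000
        a = [rng.randint(1, 100) for _ in range(shots)]  # GENERATE 57 1,100 A
        j = sum(1 for x in a if 1 <= x <= pct)           # COUNT A 1,48 J
        z.append(j)                                      # SCORE J Z
    k = sum(1 for baskets in z if baskets < cutoff)      # COUNT Z < 21 K
    return z, k

z, k = simulate_slumps()
print(k, "of", len(z), "simulated series had 20 or fewer baskets")
```

The count K varies from run to run, so a single run will not reproduce the lesson's figure exactly. Running it several times, or with more trials, shows how stable the estimate is.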