CHAPTER III-4
THE PEDAGOGICAL USES OF PUZZLES

Students mostly come into an introductory statistics course with fear, and sometimes with loathing. When the course is taught the resampling way, the instructor's first task is to reassure the students that they need not be afraid. Though the course is difficult because inference is difficult, and though it will require hard thinking, they will be able to understand all that is taught, and most of them will finish the course having enjoyed it and glad that they took it. The instructor can honestly say these things because they are demonstrably true, as Chapter I-1 shows.

Baffling-though-simple probability puzzles can help students see that simulation works. Our practice is to begin a course by telling students that the most important thing to learn in the course is the habit of saying "Try it" when faced with a problem in probability or statistics, and then actually trying it. We then continue with the famous two-heads puzzle, in connection with a one-question test that can be said - only half in jest - to increase the class's IQ. Later come the puzzles of the three ships and of Monty Hall.

HOW IQ'S DOUBLE IN TEN MINUTES

On Tuesday, September 12, 1995, the Style section of The Washington Post presented an IQ test, many of whose questions were mathematical:

"Ever wonder just how smart you are? Gary Gruber...has constructed this short test - just 12 questions - to challenge your intelligence and help you determine whether you really have the smarts. There is no time limit."

The first question was as follows (except that I substitute 20 blue and 20 brown socks for the original 40 and 40):

"Suppose I have 20 blue socks and 20 brown socks in a drawer. If I reach into the drawer without looking at the socks, what is the smallest number of socks I must take out to make sure that I have a pair of socks of the same color?"

On September 13 I read that question to an introductory statistics class at the University of Maryland and asked the students to write down and hand in their answers.
The distribution of answers was as follows: 5 answered "1"; 7, "2"; 12, "3"; 1, "4"; and 9, "40". Even before we know the right answer, the great dispersion among the answers proves conclusively that many of the students got it wrong.

Without further ado, I (a) took out a deck of 80 playing cards, half red and half black, (b) shuffled them, and (c) began to lay them out one by one on the light table so that they showed on the screen. "Tell me to stop when we get a pair," I said. I dealt a red, then another red. "Stop," someone hollered. So I replaced the two cards, shuffled again, and dealt a red, a black, then another red. "Stop," I heard. I repeated the operation again and again, ten times in all.

At this point I again asked the class for the answer to the original question. All except one confused kid said "Three," and he quickly changed his mind. A miracle! We had moved from 12 correct answers out of 34 to 33 correct answers out of 34. Speaking analogically, we might say that the collective IQ of the class had more than doubled in less than ten minutes.

Please note that the experiment not only led people to reach the correct answer inductively - which is plenty of benefit by itself - but also, as a bonus, led most of them to understand why the correct answer is what it is.

Now consider what would have happened if, immediately after reading the question to the class, I had given the following instruction: "Before answering the question, test out any answer by experimenting a number of times with a set of cards or other devices that can be likened to a drawer full of socks, or even with an actual drawer of 80 socks," and had made available a variety of materials, including playing cards.

Since then this experiment has been repeated with other classes, using a slightly more sophisticated research design. The numbers of socks are changed to 20 and 20 so that one deck of cards will suffice.
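The card-dealing demonstration can also be run on a computer. The sketch below is in Python rather than RESAMPLING STATS, chosen only so that the example runs anywhere; the function names `socks_needed` and `run_trials`, the seed, and the number of trials are illustrative assumptions, not part of the original lesson. It shuffles a drawer of 20 blue and 20 brown socks, draws until two socks match, and records how many draws that took.

```python
import random


def socks_needed(n_blue=20, n_brown=20, rng=random):
    """Draw socks one at a time, without replacement, until two share a color.

    Returns how many socks were drawn. `rng` is anything with a shuffle()
    method: the random module itself or a random.Random instance.
    """
    drawer = ["blue"] * n_blue + ["brown"] * n_brown
    rng.shuffle(drawer)              # like shuffling the red/black deck
    colors_seen = set()
    for drawn, sock in enumerate(drawer, start=1):
        if sock in colors_seen:      # this sock completes a pair
            return drawn
        colors_seen.add(sock)
    raise ValueError("this drawer cannot produce a pair")


def run_trials(trials=10_000, seed=1):
    """Repeat the experiment many times, as with the repeated deals in class."""
    rng = random.Random(seed)        # seeded so the run is reproducible
    return [socks_needed(rng=rng) for _ in range(trials)]


results = run_trials()
print("Fewest socks ever needed:", min(results))
print("Most socks ever needed:  ", max(results))
```

Across many trials the count is always 2 or 3 and never more: with only two colors, the third sock must match one of the first two, so three socks are enough to make sure of a pair.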
The questions are written and presented on paper or by overhead projector, so that no one can claim that s/he misheard or that the problem was wrongly stated. Index cards are given out with spaces numbered 1-4 for students to write their answers on. The instructor opens with "We're now going to double the smarts of this class," echoing the language with which the Post introduced its test. Then the socks problem is presented, and the students are instructed to write down their answers on line 1. Then, without further discussion, the following problem is presented:

I toss two coins into the air and catch them on my palm. I look at the two of them and say, "One of the coins shows a head. What is the probability that the other shows a head, too?"

The students are asked to write down their answers on line 2. Then the class is asked for its answers orally, and when most or all have given answers - perhaps by raising their hands, in a big class - the instructor tells them that most or all are wrong, and ends with "Now what are you going to do?"

Usually silence follows. So the instructor says: "What would you do if you knew that in ten minutes someone is going to come through that door and give a thousand dollars to everyone who can tell her or him the right answer?" After some banter about bribing the instructor for the right answer, either someone spontaneously says "Try it," or the instructor induces that response. The instructor distributes coins and tells the students to write down their new answers on line 3. When they have finished, s/he polls the students and shows that most have now reached the correct answer.

Incidentally, Marilyn vos Savant published the same problem in Parade, as follows:

A shopkeeper says she has two new baby beagles to show you, but she doesn't know whether they're male, female, or a pair. You tell her that you want only a male, and she telephones the fellow who's giving them a bath. "Is at least one a male?" she asks him. "Yes!"
she informs you with a smile. What is the probability that the other one is [also] a male?

Vos Savant gave the answer as one in three, and many PhDs wrote her to say - with great confidence - that she was all wrong. A crucial part of the lesson we want the students to learn is that with the simulation method they can obtain correct answers to problems that baffle highly trained professionals who attempt to address the problems with reason and mathematical deduction alone.

Another way to do this problem by simulation is with a random-number table; this, too, is shown to the students, as follows. Consider a two-digit column of random numbers in Table 3, letting odd digits stand for females and even digits for males. The first forty lines are sufficient to suggest the correct probability, and also to make clear the mechanism: two-female pairs, a fourth of the cases, are excluded from the sample. Mixed pairs - which give a "no" answer - are two-thirds of the remaining pairs, whereas the only pairs that give "yes" answers - two males - are only a third of the remaining pairs. So once more simulation gets it right quickly and easily, whereas the deductive method of mathematical logic results in much confusion.

Next the instructor shows the group that the experiment can also be done with two pairs of playing cards, each pair containing one red card ("heads") and one black card ("tails"), choosing one card at random from each pair. Then the instructor says: "Now think again about the socks problem, and write down an answer on line 4 of the index card."

The results of a typical class were as follows:

[INSERT RESULTS]

If we call zero answers correct "No smarts" and all answers correct "Total smarts," can we not say that raising the score from 2 of 15 correct to 10 of 15 correct - as was the case in my spring class - more than doubles smarts?
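For readers who want to check the two-coin (or two-beagle) answer on a computer, here is a minimal Python sketch of the random-digit method described above; Python is used instead of RESAMPLING STATS purely for convenience. As in the table version, an even digit stands for a male and an odd digit for a female, and pairs with no male are thrown out. The function name `prob_other_is_male`, the seed, and the number of pairs are illustrative assumptions.

```python
import random


def prob_other_is_male(pairs=100_000, seed=3):
    """Estimate P(both pups male | at least one male) with random digits."""
    rng = random.Random(seed)
    kept = 0        # pairs for which "Is at least one a male?" gets "Yes!"
    two_males = 0   # kept pairs in which the other pup is male, too
    for _ in range(pairs):
        # One random digit per pup: even = male, odd = female.
        is_male = [rng.randrange(10) % 2 == 0 for _ in range(2)]
        if not any(is_male):
            continue            # two-female pairs are excluded from the sample
        kept += 1
        if all(is_male):
            two_males += 1
    return two_males / kept


estimate = prob_other_is_male()
print(f"Estimated probability: {estimate:.3f}")
```

The estimate should land close to 1/3, for the reason given above: two-female pairs (about a quarter of all pairs) are discarded, and two-male pairs make up only about a third of what remains. Reading "male" as "head" turns this into the two-coin puzzle unchanged.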
More soberly, is it not legitimate to say that one can raise a group's IQ by giving a specific instruction, or even the more general instruction to answer any mathematical question that permits it (and many or most questions do) by experimenting with actual physical objects - that is, by simulation?

At this point someone says, "But you haven't actually raised people's IQ," or "You haven't really made them smarter." Is that so? IQ is defined as the score on an IQ test. All attempts to find some "real" entity that IQ supposedly represents have been fruitless. So if one can raise a test score in this fashion, one can claim to have raised IQ just as legitimately as one can for special training of young children. But there is no need to argue this linguistic point. The key finding is that this procedure greatly increases people's ability to reach sound solutions to problems in probability.

If one can raise IQs as markedly as this device does, two important questions arise:

1. If it can be done this way for these questions, why not in this or other ways for other questions?

2. If educating people to remember and practice the simple instruction "Try it" can increase the proportion of correct answers to this question - and to the entire range of questions in probability and statistics (as it does; see Simon, Atkinson, and Shevokas, 1976; Simon and Bruce, 1995) - why do we not teach people this method in addition to, if not as a substitute for, conventional formulaic methods in statistics and probability?

SOME OTHER CLASSIC PUZZLES

The Problem of Three Chests

Here is another problem that shows the power of simulation:

A Spanish treasure fleet of three ships was sunk at sea off Mexico. One ship had a trunk of gold forward and another aft; another ship had a trunk of gold forward and a trunk of silver aft; and a third ship had a trunk of silver forward and another trunk of silver aft. Divers have just found one of the ships, and a trunk of silver in it.
They are now taking bets on whether the other trunk on the same ship will contain silver or gold. What are fair odds?

This is a restatement of a problem that Joseph Bertrand posed late in the 19th century. In the Goldberg variation: "Three identical boxes each contain two coins. In one box both are pennies, in the second both are nickels, and in the third there is one penny and one nickel. A man chooses a box at random and takes out a coin. If the coin is a penny, what is the probability that the other coin in the box is also a penny?"

The following simulation, letting "7" stand for a penny and "8" for a nickel, arrives at the correct answer:

1. Construct three urns containing the numbers "7,7", "7,8", and "8,8" respectively.
2. Choose an urn at random, and shuffle the numbers in it.
3. Choose the first element in the chosen urn's vector. If it is "8", stop the trial and make no further record. If it is "7", continue.
4. Record the second element in the chosen urn's vector on the scoreboard.
5. Repeat steps 2-4 many times, and calculate the proportion of "7's" on the scoreboard. (The answer should be about 2/3.)

The three-door problem

The great-granddaddy of baffling-though-simple puzzles is the famous problem of the three doors, long known to statisticians but recently popularized as the Monty Hall game-show problem by vos Savant in Parade:

The player faces three closed containers, one containing a prize and two empty. After the player chooses, s/he is shown that one of the other two containers is empty. The player is now given the option of switching from her/his original choice to the other closed container. Should s/he do so? Answer: Switching doubles the chances of winning.

When this problem was published in Sunday newspapers across the U.S., the thousands of letters it drew - including a good many from Ph.D.'s in mathematics - showed that logical mathematical deduction fails badly in this case. Most people - both laypersons and statisticians - arrive at the wrong answer.
Simulation, however - and hands-on simulation with physical symbols rather than computer simulation - is a surefire way of obtaining and displaying the correct solution. Table 6-1 shows such a simple simulation with a random-number table. Column 1 represents the box you choose, and column 2 the box where the prize is. Based on columns 1 and 2, column 3 indicates the box that the "host" would then open and show to be empty. Lastly, column 4 scores whether the "switch" or the "remain" strategy wins in that trial. A count of the winning cases for "switch" and for "remain" gives the result sought.

Table 6-1

Not only is the best choice obvious with this simulation method, but you are likely to understand quickly why switching is better. No other mode of explanation or solution brings out this intuition so well. And it is much the same with other problems in probability and statistics: simulation can provide not only answers but also insight into why the process works as it does. In contrast, formulas frequently produce obfuscation and confusion for most non-mathematicians.

The Birthday Problem

We then move from the pure brain-teasers to a famous examination question used in probability courses: What is the probability that two or more people among a roomful of (say) twenty-five people will have the same birthday? To obtain an answer we need simply examine the first twenty-five numbers from the random-number table that fall between "001" and "365" (the number of days in the year), record whether or not there is a duplication among the twenty-five, and repeat the process often enough to obtain a reasonably stable probability estimate.

Pose the question to a mathematical friend of yours, watch her or him sweat for a while, and afterwards compare your answer to hers or his. I think you will find the correct answer very surprising. It is not unheard of for people who know how this problem works to take advantage of their knowledge by making and winning big bets on it.
(See how a bit of knowledge of probability can immediately be profitable to you, by helping you avoid such unfortunate occurrences?)

More specifically, these steps answer the question for the case of twenty-five people in the room:

Step 1. Let three-digit random numbers "001-365" stand for the 365 days in the year. (Ignore leap years for simplicity.)

Step 2. Examine the first twenty-five random numbers chosen from "001-365" for duplication. (Triplicates or higher-order repeats are counted as duplicates here.) If there are one or more duplicates, record "yes." Otherwise record "no."

Step 3. Repeat perhaps a thousand times, and calculate the proportion of trials in which at least two of the twenty-five people share a birthday.

Here is the first experiment from a random-number table, starting at the top left of the page of numbers:

021, 158, 116, 066, 353, 164, 019, 080, 312, 020, 353...

The number 353 has already appeared twice, so this trial is scored "yes."

This leads us into showing how one can handle problems like the birthday problem with the computer. A program in the language RESAMPLING STATS is amazingly simple. With the command GENERATE, produce 25 numbers between "1" and "365" and put them in a location we can call A. Then determine whether any two people have the same birthday with the MULTIPLES command, which checks whether the same number came up more than once, and put the result in a location we can call B. Next, SCORE this result from B into a vector we may call Z. REPEAT, say, 1000 times. After the END of the loop, COUNT in the scoreboard Z the number of samples out of the 1000 trials that had at least one birthday shared by two or more people, and place this result in K. We then try the program, written as follows:

REPEAT 1000
    Do 1000 trials (experiments).
GENERATE 25 1,365 A
    Generate 25 numbers randomly between 1 and 365, and put them in A.
MULTIPLES A > 1 B
    Looking in A, count the number of multiples and put the result in B. We request multiples > 1 because we are interested in any multiple, whether it is a duplicate, triplicate, etc. Had we been interested only in duplicates, we would have put in MULTIPLES A = 2 B.
SCORE B Z
    Score the result of each trial to Z.
END
    End the loop for the trial; go back and repeat the trial until all 1000 are complete, then proceed.
COUNT Z > 0 K
    Determine how many trials had at least one multiple.
DIVIDE K 1000 KK
    Convert to a proportion.
PRINT KK
    Print the result.

Three Daughters Among Four Children

Now we are ready to demonstrate a realistic though simple problem: What is the probability that exactly three of the four children in a four-child family will be daughters?

The first step is to state that the approximate probability that a single birth will produce a daughter is 50-50 (1 in 2). This estimate is not strictly correct, because roughly 106 male children are born for every 100 female children. But the approximation is close enough for most purposes, and the 50-50 split simplifies the job considerably. (Such "false" approximations are part of the everyday work of the scientist. The appropriate question is not whether a statement is "only" an approximation, but whether it is a good enough approximation for your purposes.)

The probability that a fair coin will turn up heads is .50, or 50-50, close to the probability of having a daughter. Therefore, flip a coin in groups of four flips, and count how often three of the flips produce heads. (You must decide in advance whether three heads means three girls or three boys.) It is as simple as that.

In resampling estimation it is of the highest importance to work in a careful, step-by-step fashion - to write down the steps in the estimation, and then to do the experiments just as described in the steps. Here is a set of steps that will lead to a correct answer about the probability of getting three daughters among four children:

Step 1. Using coins, let "heads" equal "boy" and "tails" equal "girl."

Step 2. Throw four coins.

Step 3. Examine whether the four coins fall with exactly three tails up.
If so, write "yes" on a record sheet; otherwise write "no."

Step 4. Repeat steps 2 and 3 perhaps two hundred times.

Step 5. Count the proportion "yes." This proportion is an estimate of the probability of obtaining exactly 3 daughters in 4 children.

The first few experimental trials might appear in the record sheet as follows:

Number of Tails    Yes or No
       1              No
       0              No
       3              Yes
       2              No
       1              No
       2              No
       .              .
       .              .
       .              .

The probability of getting three daughters in four births could also be found with a deck of cards, a random-number table, a die, or RESAMPLING STATS. For example, half the cards in a deck are black, so the probability of getting a black card ("daughter") from a full deck is 1 in 2. Therefore, deal a card, record "daughter" or "son," replace the card, shuffle, deal again, and so forth for 200 sets of four cards. Then count the proportion of groups of four cards in which you got exactly three daughters.

A RESAMPLING STATS computer solution to the "3Girls" problem mimics the above steps:

REPEAT 1000
    Do 1000 trials.
GENERATE 4 1,2 A
    Generate 4 numbers at random, either 1 or 2. This is analogous to flipping a coin 4 times to generate 4 heads or tails. We keep these numbers in A, letting "1" represent girls.
COUNT A = 1 B
    Count the number of girls and put the result in B.
SCORE B Z
    Keep track of each trial result in Z.
END
    End this trial; repeat the experiment until 1000 trials are complete, then proceed.
COUNT Z = 3 K
    Count the number of experiments in which we got exactly 3 girls, and put this result in K.
DIVIDE K 1000 KK
    Convert to a proportion.
PRINT KK
    Print the results.

Notice that the procedure outlined in the steps above would have been different (though almost identical) had we asked about the probability of three or more daughters rather than exactly three daughters among four children. For three or more daughters we would have scored "yes" on our record sheet for either three or four tails, rather than for just three tails.
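The same steps can be written in Python for readers without RESAMPLING STATS. This is a sketch under stated assumptions, not the book's program: `prob_three_girls` and its `at_least` flag are hypothetical names introduced here, with `at_least=True` giving the "three or more daughters" variant just described. As in the RESAMPLING STATS version, 1 stands for a girl and 2 for a boy, and each child is drawn independently, that is, with replacement.

```python
import random


def prob_three_girls(trials=1000, at_least=False, seed=4):
    """Estimate P(exactly 3 girls among 4 children), or P(3 or more) if at_least."""
    rng = random.Random(seed)        # seeded so the run is reproducible
    hits = 0
    for _ in range(trials):
        # Like GENERATE 4 1,2 A: four independent draws, 1 = girl, 2 = boy.
        family = [rng.randint(1, 2) for _ in range(4)]
        girls = family.count(1)      # like COUNT A = 1 B
        hit = (girls >= 3) if at_least else (girls == 3)
        if hit:
            hits += 1
    return hits / trials             # like DIVIDE K 1000 KK


print("Exactly 3 girls:", prob_three_girls())
print("3 or more girls:", prob_three_girls(at_least=True))
```

Rather than keeping a scoreboard Z and counting it afterwards, the sketch tallies the hits as it goes; the resulting proportion is the same. For comparison, exact counting gives 4/16 = 0.25 for exactly three girls (four of the sixteen equally likely girl/boy sequences) and 5/16 = 0.3125 for three or more; with 1000 trials the estimates will typically fall within a few hundredths of these values.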
Likewise, in the RESAMPLING STATS solution we would have used the command COUNT Z >= 3 K.

It is important that in this case, in contrast to what we did in Example 6-1 (the introductory poker example), the card is replaced each time, so that each card is dealt from a full deck. This method is known as sampling with replacement. One samples with replacement whenever the successive events are independent; in this case we assume that the chance of having a daughter remains the same (1 girl in 2 births) regardless of the sexes of the previous births [2]. But if the first card dealt were black and were not replaced, the chance of the second card being black would no longer be 26 in 52 (.50) but rather 25 in 51 (.49). And if the first three cards were black and were not replaced, the chance of the fourth card's being black would sink to 23 in 49 (.47).

To push the illustration further, consider what would happen if we used a deck of only six cards, half (3 of 6) black and half (3 of 6) red, instead of a deck of 52 cards. If the chosen card is replaced each time, the 6-card deck produces the same results as a 52-card deck; in fact, a two-card deck would do as well. But if the sampling is done without replacement, it is impossible to obtain 4 "daughters" with the 6-card deck, because there are only 3 "daughters" in the deck. To repeat, then: whenever you want to estimate the probability of some series of events in which each event is independent of the others, you must sample with replacement.

REFERENCES

Simon, Julian L., Atkinson, David T., and Shevokas, Carolyn, "Probability and Statistics: Experimental Results of a Radically Different Teaching Method," American Mathematical Monthly, vol. 83, no. 9, Nov. 1976, pp. 733-739.

Simon, Julian L., and Bruce, Peter C., "Evaluations of Teaching Introductory Statistics via Resampling," xerox, 1995.

The Washington Post, September 12, 1995, "Brain Teaser or No-Brainer," no author, p. D5.

teachbk III-4puz May 9, 1996