HOW TO TEACH RESAMPLING STATS ALONG WITH A STANDARD TEXT

Julian L. Simon and Peter Bruce

INTRODUCTION

A simple and effective way to teach the resampling method at the introductory level is to use your usual text and course outline, and to present the resampling method immediately after the conventional method for every problem that you demonstrate in class.

This tactic may be illustrated with the text Introductory Statistics for Business and Economics (Wiley, 1990), by Thomas H. Wonnacott and Ronald J. Wonnacott. This text was chosen for illustration because one of us expects to use it for a class soon, and is also acquainted with Tom Wonnacott. It was not chosen because it lends itself particularly well to the resampling approach; resampling fits other texts just about as well.

Notes to teachers are either indented or in brackets. Other material is intended to be read by students.

CONFIDENCE INTERVALS

W and W begin their book, and their first chapter, "The Nature of Statistics," with an example of the reliability of a 1988 presidential election poll. The poll is a simple random sample showing 840 votes for Bush and 660 votes for Dukakis out of 1500. W and W estimate 95% confidence limits for the population proportion of Bush supporters in the conventional fashion. After showing this, the teacher may proceed by lecturing as follows:

One can also estimate the confidence interval in a way that differs from the classical approach just shown. The resampling method works by experimentally drawing samples from a population like the one you wish to investigate. Let's see how it is done. We draw samples of size 1500 from a population whose proportion we estimate from the survey results, which showed a proportion of .56 (840/1500) Bush supporters. (One makes the same assumption when using the classical method.) [Note to the teacher: Spend some time here on the logic of this assumption, or else postpone the discussion until later.]
Then we examine the results of those samples to see how much they vary from one another. We can do this with an urn containing 56 red balls and 44 black balls (or 5600 red and 4400 black balls), putting the ball back every time we draw one. [The class can actually do this by hand, noting that the procedure is perfectly satisfactory but gets tedious, and then go on to the computer procedure below. Or the teacher can skip straight to the computer procedure after just describing the urn procedure. So we move on to:]

Let's do this with the computer program RESAMPLING STATS. We first draw a single sample of 1500 "voters" with these commands:(1)

GENERATE 1500 1,100 A
This command draws 1500 balls randomly, with numbers between 1 and 100, and puts them in a location we'll call A. We will let 1-56 = red (Bush), 57-100 = black (Dukakis).

COUNT A between 1 56 B
This command counts the number of red balls in the sample of 1500, and puts the count in location B.

-----------------------------------------------------------------
(1) [Technical note to teacher: To conserve memory, Resampling Stats limits vectors to 1000 elements unless you specify otherwise. Therefore, this program needs the following command:

MAXSIZE A 1500

This increases the size allowed for vector A to accommodate our 1500 "voters".]
-----------------------------------------------------------------

Please recall our purpose, which is to find out how much the sample results vary from one another. So, to get the results from a good many samples, we next repeat the process (say) 100 times, keep score of the result each time, and end the process when 100 trials are completed. Then, after the 100 simulated samples have been drawn, we construct a histogram of the results.
We do all this by adding a few commands to the one-sample program we wrote above, as follows:

REPEAT 100              Take 100 samples from our simulated population.
GENERATE 1500 1,100 A   Draw 1500 balls randomly, numbered between 1 and 100, and put them in a location we'll call A. Let 1-56 = red (Bush), 57-100 = black (Dukakis).
COUNT A between 1 56 B  Count the number of red balls and put the count in location B.
SCORE B Z               Record the result of this trial on the "scoreboard" Z.
END                     End the above experiment loop; go back to the beginning and repeat until 100 trials have been completed.
HISTOGRAM Z             Diagram the results of the 100 trials, and show the mean.

The results may be seen in Figure W1. In the histogram we see that sample results range all the way from 786 (52.4%) favoring Bush to 888 (59.2%) favoring Bush. The results clearly vary greatly from one trial sample to another, which teaches the crucial lesson of variability. The spread between the lowest and highest results, about 7 percentage points, gives us a first rough sense of the sampling "margin of error."

If we were to draw a thousand more samples, or ten thousand, however, we would expect the range of sample results to be wider, because a few "far out" samples are more likely to be generated by chance in a thousand samples than in a hundred. We solve this dilemma by specifying a "confidence interval" that includes the vast majority (say 95%) of our sample results. In this case, the range 801 (53.4%) to 871 (58.1%) includes 95% of the trial results and is therefore our estimate of a "95% confidence interval." You will learn later how to get RESAMPLING STATS to examine all your trial results and find the endpoints of this interval for you.

It is important to note that resampling, without any further ado, gives an intellectually complete answer to the question W and W raise on their very first pages, a question they cannot answer there in a meaningful fashion.
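For readers who want to check the poll simulation outside RESAMPLING STATS, here is a minimal Python sketch of the same experiment. It rests on a few assumptions:

- Each simulated voter is independently a Bush supporter with probability .56, which matches drawing from the urn with replacement.
- It uses 1000 trials rather than 100.
- It finds the 95% interval by sorting the trial counts and trimming 2.5% from each end.

The function names are illustrative and are not part of RESAMPLING STATS.

```python
import random

def simulate_polls(n_voters=1500, p_bush=0.56, n_trials=1000, seed=1):
    """Return the number of Bush supporters in each simulated poll."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = []
    for _ in range(n_trials):
        # Each draw is "red" (Bush) with probability p_bush, drawn with
        # replacement, like GENERATE 1500 1,100 A with 1-56 = Bush.
        bush = sum(1 for _ in range(n_voters) if rng.random() < p_bush)
        counts.append(bush)
    return counts

def percentile_interval(values, level=0.95):
    """Return the endpoints of the central `level` share of the values."""
    ordered = sorted(values)
    n = len(ordered)
    cut = round(n * (1 - level) / 2)  # number of results trimmed from each end
    return ordered[cut], ordered[n - 1 - cut]

counts = simulate_polls()
low, high = percentile_interval(counts)
print(f"range of results: {min(counts)} to {max(counts)}")
print(f"95% interval: {low} ({low / 1500:.1%}) to {high} ({high / 1500:.1%})")
```

With 1000 trials, 25 results are trimmed from each end, so the reported interval holds the central 950 results. This is the same "includes 95% of the trial results" rule used in the text.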
W and W must throw at the reader a formula that the reader cannot possibly understand at that point, and indeed may never fully understand, even after waiting many chapters for the classical methods to supply the answer. W and W are anxious to get the reader swimming in the waters of inferential statistics immediately, rather than postponing that entry for several chapters, so they are forced to provide a baffling formula. In contrast, resampling can give, in the very first pages, a procedure and an answer to the problem at hand that students can follow and understand in its entirety. Resampling would thus let W and W satisfy their desire to introduce inferential statistics immediately, without paying the price of baffling and scaring the reader.

The instructor might try to write a program in BASIC to handle the resampling procedure. But it will soon be clear, even to a person adept with that language, that the program is not simple to write. And the program will certainly be obscure to students who do not already understand BASIC, whereas the RESAMPLING STATS program above can be understood without prior programming experience in any language. Showing a conventional solution with Minitab at this point would be entirely meaningless to the beginning student, another point in favor of resampling and of RESAMPLING STATS.

PROBABILITY THEORY

W and W next present a lovely opportunity to show what resampling can do in the context of probability theory. On page 83 they show how to calculate the probability of not getting a boy in five children, using the multiplication rule. The teacher can then continue and ask: What is the probability of getting exactly four girls in five children? No simple rule gives the answer. You could work this problem the same way as the earlier boy-girl-boy problem, by constructing the entire sample space (W and W examples 3-2 to 3-4), but this would obviously be tedious.
And if the problem were 14 girls out of 19 children, sample-space analysis would be hopelessly impractical: there are 2^19 = 524,288 possible birth sequences to list.

Another way to estimate the chances of getting four girls in five children is by resampling (or Monte Carlo) experimentation. As a first approximation, you might assume that a girl is as likely to be born as a boy. You could then use coins to stand for children, a head for a girl and a tail for a boy. Continue as follows:

1. Toss a coin 5 times, letting heads = girl, tails = boy.
2. Count how many heads you got.
3. Record "yes" if 4 heads, "no" if not.
4. Repeat steps 1-3, say, 50 times.
5. Count how many of the 50 trials had a "yes".

Instead of using coins, we can do the simulation on the computer with RESAMPLING STATS. This time we'll be more realistic and assume that the probability of a girl is 48%, and of a boy 52%. A program to arrive at an estimate is:

REPEAT 1000        Do the experiment 1000 times.
GENERATE 5 1,100 A Generate five random numbers between 1 and 100 and put them in a location called A. Let 1-48 = girl, 49-100 = boy.
COUNT A <=48 B     Count the number of girls; put the result in B.
SCORE B Z          Keep score of the result of each trial.
END                End one trial; go back and repeat until all 1000 are complete, then proceed.
HISTOGRAM Z        Produce a histogram of the trial results.

The proportion of the 1000 trials in which exactly four girls were counted estimates the probability we seek.

BINOMIAL DISTRIBUTION

When W and W discuss the binomial distribution, they show how to calculate the probability that a sample of 10 microwave ovens will be half perfect and half imperfect, when 80% of the ovens in the population are perfect (p. 119). After that deductive calculation, the teacher may show the resampling procedure, which works just like the program for four girls out of five children above. Students may be told that they can choose how to handle problems in real life and on exams: with the binomial formula, or with the RESAMPLING STATS program. If correctly done, both methods will arrive at essentially the same result; the simulation estimate comes closer to the exact answer as the number of trials grows.
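The four-girls program and the microwave-oven problem can both be checked with a short Python sketch that puts the simulation side by side with the binomial formula. A few choices here are assumptions:

- The helper names are hypothetical.
- The 10,000 trials and the seed are arbitrary choices.
- Each simulated birth or oven is treated as independent of the others, as in the RESAMPLING STATS program.

```python
import math
import random

def simulate_counts(n_events, p_success, n_trials=10000, seed=2):
    """For each trial, count the 'successes' among n_events independent events."""
    rng = random.Random(seed)
    tallies = []
    for _ in range(n_trials):
        successes = sum(1 for _ in range(n_events) if rng.random() < p_success)
        tallies.append(successes)
    return tallies

def binomial_prob(k, n, p):
    """Exact binomial probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly four girls in five children, assuming P(girl) = .48
girls = simulate_counts(5, 0.48)
est_girls = girls.count(4) / len(girls)
exact_girls = binomial_prob(4, 5, 0.48)

# Exactly five perfect ovens in a sample of 10, assuming P(perfect) = .80
ovens = simulate_counts(10, 0.80)
est_ovens = ovens.count(5) / len(ovens)
exact_ovens = binomial_prob(5, 10, 0.80)

print(f"4 girls in 5: simulated {est_girls:.3f}, exact {exact_girls:.3f}")
print(f"5 of 10 ovens perfect: simulated {est_ovens:.3f}, exact {exact_ovens:.3f}")
```

The exact values are 0.138 and 0.026. With this many trials the simulated proportions typically land within about one percentage point of them, and more trials narrow the gap further.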
If experience holds, most students will tend to opt for simulation.

Some students will feel that there is something illegitimate about simulation, perhaps because it is not "exact". It sometimes helps to point out that any probability formula, such as the binomial, is itself only a mathematical shortcut to the full procedure of specifying the entire sample space. The use of the t-distribution in a two-sample problem is an excellent example. It is a mathematically convenient way of describing what happens in a randomization procedure, developed in an era when lack of computing power kept people from carrying out randomizations for all but the smallest data sets. Simulation is simply another shortcut. When students see that the formula method and the simulation method are on the same footing in this respect, resampling is more likely to seem legitimate.

W and W then (p. 120) tell the students that instead of the formula, they can use a table in the back of the book. At this point the student's intuition is of course shut off, because the logic of a table is impenetrable to all. Once again the RESAMPLING STATS procedure can be shown, and the students can see for themselves that they understand completely everything that is happening.

Here again one may wish to compare a BASIC program with RESAMPLING STATS in performing the resampling procedure. This is the program that Gnanadesikan et al. (The Art and Technique of Simulation, Dale Seymour, 1987) use to simulate repeated coin tosses:

80 INPUT "ENTER THE NUMBER OF KEY COMPONENTS";N
100 INPUT "ENTER THE NUMBER OF TRIALS";NT
120 DIM T$(NT,N),C(2*N)
140 FOR I = 1 TO NT
150 LET NH = 0
160 FOR J = 1 TO N
170 LET X = RND(1)
180 IF X < .5 THEN 220
190 T$(I,J) = "H"
200 NH = NH + 1
210 GOTO 230
220 T$(I,J) = "T"
230 IF J = N THEN 260
250 GOTO 270
270 NEXT J
280 C(NH + 1) = C(NH + 1) + 1
290 NEXT I
330 FOR K = 1 TO N + 1
350 NEXT K
360 END

The above BASIC program is written in general form and does not specify a particular number of coins and heads, as the RESAMPLING STATS program does. (We have simplified the program by removing its many PRINT statements. This is why some line numbers, such as the 260 referred to in line 230, no longer appear, and why the loop in lines 330-350 is now empty.) Note that the RESAMPLING STATS program listed above does the same job, for a sample of 5 coins.

INTERLUDE: THE GENERAL PROCEDURE

The procedural steps taken in solving the particular problem above were chosen to fit the specific facts. We can also describe the steps in a more general fashion. The general procedure describes what we do when we estimate a probability using resampling operations.

Step A. Construct a simulated population or "universe" of random numbers, cards, dice, or another randomizing mechanism whose composition is similar to the universe whose behavior we wish to describe and investigate. The term "universe" refers to the system that is relevant for a single simple event. For example, suppose we are estimating the probability of four girls in the first five children. Then a coin with two sides, or two sets of random numbers "1-52" and "53-100", simulates the system that produces a single male or female birth. Notice that in this universe the probability of a girl remains the same from trial event to trial event (that is, the trials are independent), demonstrating a universe from which we sample with replacement. Hard thinking is required in order to determine the appropriate "real" universe whose properties interest you.
Step(s) B. Specify the procedure that produces a pseudo-sample which simulates the real-life sample in which we are interested. That is, specify the procedural rules by which the sample is drawn from the simulated universe. These rules must correspond to the behavior of the real universe in which you are interested. To put it another way, the simulation procedure must produce simple experimental events with the same probabilities that the simple events have in the real world. For example, in the case of four daughters in five children, if you are using a deck of red and black cards, you can draw a card and then replace it. If you are using a random-numbers table, the random numbers automatically simulate replacement. The chances of having a boy or a girl do not change depending on the sex of the preceding child. In the same way, we want replacement to ensure that the chances do not change each time we choose from the deck of cards. Recording the outcome of the sampling must be indicated as part of this step, e.g., record "yes" if a girl, "no" if a boy.

Step(s) C. If several simple events must be combined into a composite event, and if the composite event was not described in the procedure in step B, describe it now. For example, for four girls in five children, the procedure for each simple event of a single birth was described in step B. Now we must specify repeating the simple event five times, and determine whether or not the outcome is four girls. Recording "four girls" or "not four girls" is part of this step. This record indicates the results of all the trials and is the basis for a tabulation of the final result.

Step(s) D. Calculate the desired probability from the tabulation of outcomes of the resampling trials. For example, the proportion of "yes" outcomes (four girls) estimates the probability of the composite event specified in step C.

RANDOM SAMPLING AND THE DISTRIBUTION OF THE MEAN

W and W pose the following problem (p.
202): "A population of men on a large midwestern campus has a mean height of mu = 69 inches, and a standard deviation sigma = 3.22 inches. If a random sample of n = 10 men is drawn, what is the chance the sample mean X-bar will be within 2 inches of the population mean mu?"

The framing of this question reveals the unrealistic fashion in which classical statistics poses most questions. The data for the population necessarily arise as discrete observations, and the standard deviation is a parameter computed from them. Beginning the discussion with the standard deviation given as a datum immediately removes the problem from a realistic setting.

Luckily, W and W earlier present data on the heights of 200 men (p. 28). We take those observations as our supposed population, that is, as our best estimate of what the population is like. We now draw samples of 10 from this collection. Whether we draw them with or without replacement depends on what we assume the collection to be: the entire population (draw without replacement), or a sample from it (draw with replacement). If the latter, we must discuss why it is reasonable to consider it our best estimate of the population, and then draw from it.

[It is unfortunate for pedagogical purposes that W and W present the data in grouped format. The student may therefore leap to an unsound conclusion: that the appropriate procedure is to rearrange the raw data into bins to produce a frequency histogram, and then to do a bootstrap confidence interval using not the original data we collected, but the values of the bin centers and their frequencies. The teacher should forestall that possibility.]

Programs for the two different situations are as follows:

SAMPLING WITHOUT REPLACEMENT:

READ file "heights" A   Read the height data from an ASCII file called "heights" located in the same directory as RESAMPLING STATS. The heights should be listed in a column; they will become vector A.
REPEAT 100              Repeat the following trial 100 times.
SHUFFLE A A             Shuffle the height vector A; keep calling it A.
TAKE A 1,10 B           Take the first 10 (without replacement); put them in B.
MEAN B C                Calculate their mean; put it in C.
SCORE C Z               Keep score of the mean.
END                     End one trial; go back and repeat until all 100 are complete, then proceed to the next step.
HISTOGRAM Z             Produce a histogram of the "resample" means.

SAMPLING WITH REPLACEMENT:

READ file "heights" A   Read the height data from an ASCII file called "heights" located in the same directory as RESAMPLING STATS. The heights should be listed in a column; they will become vector A.
REPEAT 100              Repeat the following trial 100 times.
SAMPLE 10 A B           Take a sample of size 10, with replacement; put it in B.
MEAN B C                Calculate its mean; put it in C.
SCORE C Z               Keep score of the mean.
END                     End one trial; go back and repeat until all 100 are complete, then proceed to the next step.
HISTOGRAM Z             Produce a histogram of the "resample" means.

W and W show a Monte Carlo simulation for their height problem (p. 222). The teacher may compare the clarity of the bootstrap-like RESAMPLING STATS treatment with the treatment using the normal distribution and the computer.

THE BOOTSTRAP

Happily, W and W provide an introduction to the bootstrap in the context of confidence intervals. They suggest, however, that it is for use "in situations too complex for standard theory to handle" (p. 277). Here the teacher may recall how a very similar technique was used successfully right at the start of the course (see above), and remind students how easy it is to do with RESAMPLING STATS. So how about doing a bootstrap right here, using the 200 heights as a sample, not a population? Here's the program:

BOOTSTRAP SAMPLING:

READ file "heights" A   Read the height data from an ASCII file called "heights" located in the same directory as RESAMPLING STATS. The heights should be listed in a column; they will become vector A.
REPEAT 100              Repeat the following trial 100 times.
SAMPLE 200 A B          Take a sample of size 200, selected randomly and with replacement, from our original sample.
MEAN B C                Calculate the mean of the resample; put it in C.
SCORE C Z               Keep score of the mean.
END                     End one trial; go back and repeat until all 100 are complete, then proceed to the next step.
HISTOGRAM Z             Produce a histogram of the "resample" means.

HYPOTHESIS TESTING

W and W begin their discussion of hypothesis testing (p. 288) with samples of 10 men's salaries and 5 women's salaries (in thousands of dollars), and they ask whether there is a difference between the groups. (The actual difference in means is $5,000: $16,000 for the men versus $11,000 for the women.) They deal with the problem using the t test. Minitab or other software may also be presented at this point.

After completing the demonstration with the t test (and perhaps standard software), the teacher may proceed as follows, with a modified randomization test that samples with replacement:

COPY (13 11 19 15 22 20 14 17 14 15) A   Copy the data for the men's salaries.
COPY (9 12 8 10 16) B                     Copy the data for the women's salaries.
CONCAT A B C            Put all the data together in the same vector.
REPEAT 100              Repeat the following procedure 100 times.
SAMPLE 10 C D           Select 10 salaries at random and with replacement (our original sample was assumed to be from a larger population), and put them in a vector called D.
MEAN D DD               Calculate the mean salary in this group.
SAMPLE 5 C E            Select 5 salaries at random and with replacement, and put them in E.
MEAN E EE               Calculate the mean salary in this group.
SUBTRACT DD EE F        Find out by how much the "male" average exceeds the "female" average.
SCORE F Z               Keep score of the difference.
END                     End one trial; go back and repeat until all 100 are complete.
HISTOGRAM Z             Produce a histogram of trial differences.

In the histogram we see that randomly drawn samples produced differences in average salary that were generally less than $4,000; only once was there a difference greater than $5,000. The class may then discuss the pros and cons of the classical and the resampling approaches for this problem.
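The salary test can also be sketched in Python. This minimal version assumes the same design as the RESAMPLING STATS program:

1. Pool the 15 salaries.
2. Repeatedly draw a fake "male" group of 10 and a fake "female" group of 5 from the pool, with replacement.
3. Record the difference in their means.

The last step counts how often a resampled difference reaches the observed 5 (thousand dollars). The function name, seed, and choice of 1000 trials are illustrative.

```python
import random
from statistics import mean

men = [13, 11, 19, 15, 22, 20, 14, 17, 14, 15]  # salaries, thousands of dollars
women = [9, 12, 8, 10, 16]

def resample_differences(group_a, group_b, n_trials=1000, seed=3):
    """Pool both groups, draw fake groups of the original sizes with
    replacement, and record the difference in their means each time."""
    rng = random.Random(seed)
    pooled = group_a + group_b                      # like CONCAT A B C
    diffs = []
    for _ in range(n_trials):
        fake_a = rng.choices(pooled, k=len(group_a))  # like SAMPLE 10 C D
        fake_b = rng.choices(pooled, k=len(group_b))  # like SAMPLE 5 C E
        diffs.append(mean(fake_a) - mean(fake_b))     # like SUBTRACT DD EE F
    return diffs

observed = mean(men) - mean(women)  # 16 - 11 = 5
diffs = resample_differences(men, women)
share = sum(d >= observed for d in diffs) / len(diffs)
print(f"observed difference: {observed}; share of resamples >= observed: {share:.3f}")
```

Counting the share of resampled differences at least as large as the observed one turns the histogram into a single number. Students can then compare that number with the result of the t test.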
Again, students may be told that they may use either method on examinations. As long as the data are given in full, students are likely to opt for the resampling method.

DISCUSSION

We have presented only a very few illustrative problems. But even with this small set, the teacher should get a good idea of the place of resampling when it is taught in parallel with the classical methods. And even this small a sample of problems is enough to give a reasonable sense of how the general resampling method deals with the garden variety of statistical and probabilistic problems.

A definition of resampling, a bit of its history, and other background materials that may be used at one point or another in the course may be found in the enclosed article from Chance.

howteach statwork disk 1-210 May 14, 1991