CHAPTER II-3

THE SAGA OF RESAMPLING STATISTICS

I'll try to liven up the story a bit by telling it as a drama.

Too much book-learning, too little understanding. The students had swallowed but not digested a bundle of statistical ideas which now misled them, taught by professors who valued fancy mathematics even if useless or wrong.

It was the spring of 1967 at the University of Illinois, a class in research methods in business with four graduate students working toward the PhD degree. I required each student to start and finish an empirical research project as a class project. Now the students were presenting their work in class. Each used wildly wrong statistical tests to analyze their data.

"Why do you use the technique of seemingly-unrelated regressions?" I asked Moe Taher (names here are fictitious).

"I want to be up-to-date," said Taher.

"How much statistics have you studied?" I asked.

"Two undergraduate and three graduate courses," Taher answered proudly.

I cradled my head in my hands, frustrated because a simple count of the positive and negative cases in Taher's sample was enough to reveal a clear-cut conclusion. The fancy method the student used was window-dressing, and wrong at that.

It was the same story with the other three students. All had had several courses in statistics. But when the time came to apply even the simplest statistical ideas and tests in their research projects, they were lost. Their courses had plainly failed to equip them with the simplest usable statistical tools.

I wondered: How could I teach the students to distill the meaning from their data? Simple statistical methods suffice in most cases. But by chasing after the latest sophisticated fashions the students overlook these simple methods, and instead use unsound methods. I remembered trying to teach a friend a concept in elementary statistics by illustrating it with some coin flips. I wondered: Given that the students' data had a random element, could not the data be "modeled" with coins or cards or random numbers, doing away with any need for complicated formulas?

Next class I shelved the scheduled topics, and tried out some problems using the resampling method (though that label had not yet been invented). First I had the students estimate the chance of getting two pairs in a poker hand by dealing out hands. Then I asked them the chances of getting three girls in a four-child family. After they recognized that they did not know the correct formula, I demanded an answer anyway. After some other interesting ideas -- of the sort illustrated later -- one of the students eventually suggested flipping coins. With that the class was off to the races. Soon the students were inventing ingenious ways to get answers -- and sound answers -- to very subtle questions in probability and statistics by flipping coins and using random numbers. The students were excited, and so was I.
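A minimal modern sketch of that coin-flip scheme, in Python rather than pennies (the code and its names are mine, not anything the class wrote; a head stands for a girl, a tail for a boy):

```python
import random

# Simulate many four-child families by flipping four fair "coins" each,
# and count how often exactly three of the four flips come up heads (girls).
trials = 10000
hits = 0
for _ in range(trials):
    girls = sum(random.choice("HT") == "H" for _ in range(4))
    if girls == 3:
        hits += 1

print(hits / trials)  # hovers near the exact binomial answer, 4/16 = .25
```

No formula is consulted at any point; the model of the physical process does all the work.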
Then it was natural to wonder: Could even children learn this powerful way of dealing with the world's uncertainty? And might it be possible that young people who had not yet been influenced by formula-type methods would pick up these simulation methods even faster than the graduate students? Max Beberman, the inventive guru of the "new math", then headed the mathematics department in the University High School. In literally four minutes, Beberman agreed that the method had promise, and asked me if I would be willing to spend some hours with a class of volunteer juniors and seniors.

This quick acceptance surprised me, because in the prior weeks I had shown the method to several colleagues in various departments (including mathematics) but had been received with a thundering lack of enthusiasm. This was to be repeated again and again: the most creative mathematicians and scientists respond favorably to the method, whereas the more humdrum professors tend to be unenthusiastic or hostile.

The dozen high-school kids in Uni High's special math course had a ball. In six class hours they were able to discover solutions and generate correct numerical answers for the entire range of problems ordinarily found in a semester-long university introductory statistics class. Furthermore, the students loved the work. Together with Allen Holmes, the regular teacher of the class, I published the results in The Mathematics Teacher. The article generated a bit of discussion, but it petered out over the next few years.

Burning with the zealot's fire, I presented the new method to any group or class that would listen. The response was generally cool. The most curious experience came in the spring of 1969, when I was teaching at Hebrew University in Jerusalem. Louis Guttman, the famous psychometrician, found the concept interesting, and invited me to lecture on it to a statistics workshop. The first part of the lecture was a disaster. The audience looked blank and uncomprehending, and I broke into a cold sweat. Later it came out that the Israeli audience did not know the game of poker, from which I drew several examples, which is why they did not understand what I was saying.

Over the objections of the Random House editor (one of the few such battles I've ever won), my 1969 Basic Research Methods in Social Science included five chapters detailing various applications. Not only did I hope to reach some working researchers who teach statistics (as distinguished from mathematical statisticians), but I wanted to stake out the ground for the future moment when the statistics profession would finally come to these methods - as I believed it inevitably would.

The method did not sweep into the high schools and universities and research laboratories like a tidal wave - or even a trickle. Of course, entirely new scientific ideas often take decades to penetrate people's thinking. And there were some signs of progress. Nevertheless, progress was almost imperceptible.

After developing one of the applications of resampling -- a powerful substitute for the t-test in the tradition of the "exact" permutation test (also called the "randomization" test), based on an idea by R. A. Fisher and worked out by E. J. G. Pitman -- I sent it to a journal for publication. The editor referred me to two papers as predecessors. Neither Fisher nor Pitman apparently had thought of sampling among the permutations, but Meyer Dwass in 1957, and J. H. Chung and D. Fraser in 1958, had discovered that development. One could argue that those two papers could claim the main discovery of resampling, though they had limited themselves to just that single application. Years later I found that Alvan Feinstein had re-discovered the Monte Carlo version of Fisher's permutation test in 1962 (???), though he did not make much fuss about it, and much later Brian Manly (1991) also rediscovered it and made it the centerpiece of a book on statistical inference in biology.
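For readers who want to see what "sampling among the permutations" amounts to, here is a sketch in Python (the data are invented and the function is mine; the logic is the Monte Carlo variant of the Fisher-Pitman test): pool the two samples, shuffle, re-deal, and ask how often the re-dealt difference is at least as large as the one observed.

```python
import random

def permutation_test(a, b, trials=10000):
    """Approximate two-sided permutation test for a difference in means,
    sampling among the permutations instead of enumerating them all."""
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(trials):
        random.shuffle(pooled)                    # re-deal the pooled data
        new_a, new_b = pooled[:len(a)], pooled[len(a):]
        if abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b)) >= observed:
            count += 1
    return count / trials                         # approximate p-value

# Two small invented samples:
print(permutation_test([7, 9, 8, 10], [4, 6, 5, 5]))
```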
I did not bother to write up any more specific applications for technical journals because I figured that they would not be perceived as fundamentally new, given that the resampling idea is the key discovery and the rest is elaboration. And the idea of using the method across the board - which I saw as the central idea - was not something that one could present in a technical paper.

At about the same time there was a lengthy correspondence with William Kruskal. I argued to him (and also in journal publications) that resampling methods could fill all statistical needs, and are sufficient as a body of knowledge, even if traditional analytic methods have advantages for the professional mathematician in providing additional insights. This observation I consider the most important proposition about resampling as a method. Kruskal did not accept those claims. In the course of the correspondence, Kruskal asked if resampling could handle confidence intervals, and I proceeded to show how it could be done with what is now known as the "bootstrap"; a closely-related example was in the 1969 book. This was the idea that, ten years later, when independently stated by Bradley Efron, took the world of mathematical statistics by storm and is now regarded as one of the handful of great twentieth-century discoveries in statistics.

In the textbook on research methods in social science which I published in 1969, I included five (?) chapters on resampling methods, intending the chapters to be a basic compendium as well as a device for staking out the field. The series editor who worked closely on the research methods text with me was Hanan Selvin, a sociologist who was also an accomplished statistician. Amiable and broad-minded though Hanan was, I could never get even him enthusiastic about resampling. He loved his formal mathematics, though he never tried to dragoon me into offering a conventional treatment of statistics. He would have preferred, however, that I omit the resampling material.

My reason for using the bootstrap device only in the context of sample size rather than confidence intervals (except in the correspondence with Kruskal) was mainly that confidence intervals were at that time (and perhaps still to this date) almost never seen in practice in the fields to which the book was mainly addressed - sociology, business, and economics. I may also have been deterred by the difficulty of interpreting the concept of confidence intervals in an introductory text. (Hanan Selvin wrote perhaps the first - and still well-known - paper criticizing the common use of significance tests. And he was even less sympathetic toward confidence intervals, as I remember, which I think also contributed toward my leaving a treatment of them out of the book.)

It seemed to me then - and seems to me still - that the idea of considering first a resampling test for all situations is the radical idea, and the bootstrap itself is rather obvious once one develops the resampling propensity. That is why I did not consider it a huge discovery when I used it in the context of choosing a sample size (1969, p. 000) or when I set it out in detail in the context of confidence intervals in my correspondence with William Kruskal.
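The bootstrap device itself can likewise be sketched in a few lines of modern Python. This is an illustration of the general idea (a percentile interval for the mean, with invented data), not a transcription of what passed between Kruskal and me:

```python
import random

def bootstrap_ci(data, trials=10000, level=0.90):
    """Percentile bootstrap interval for the mean: resample the data with
    replacement many times and read off the middle `level` of the means."""
    means = []
    for _ in range(trials):
        resample = [random.choice(data) for _ in data]
        means.append(sum(resample) / len(resample))
    means.sort()
    lo = means[int(trials * (1 - level) / 2)]
    hi = means[int(trials * (1 + level) / 2)]
    return lo, hi

# An invented sample of eight observations:
print(bootstrap_ci([12, 15, 9, 14, 11, 18, 13, 10]))
```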
After a presentation of the resampling method in 1972 or so to the University of Illinois mathematics department seminar of Joseph Doob -- by general agreement as good a probabilist as there is on the face of the earth -- Doob said not a word. At the end of the seminar I asked him: "Are you silent because you find problems with the method?" Doob answered: "No theoretical problems. My only question is whether you can teach teachers to teach it." A prophetic statement.

An early difficulty with resampling had been that users and students complained that dealing cards, flipping coins, and consulting tables of random numbers gets tiresome. Therefore, in 1973, with the programming assistance of Dan Weidenfeld, I developed the computer language called RESAMPLING STATS (earlier called SIMPLE STATS). My method was to work through a series of problems one by one, write down the steps needed to handle the problem with non-computer methods, and then design a computer command that would mimic the non-computer operation; by the time I had worked through fifteen or so types of problems, I figured that I had covered most of the necessary operations, at least for a start. (A sketch of this command-mimicry appears a few paragraphs below.) We then published a letter about it in American Statistician.

Early in the 1970s I got in touch with Kenneth Travers, who was responsible for secondary mathematics at the College of Education at the University of Illinois. He liked the idea, and we agreed that together we would organize systematic controlled experimental tests of the method. Over the next several years Travers served as PhD adviser to several students who did just that. He also organized summer workshops at the University of Illinois for high school teachers, and wrote texts with co-authors. He melded the resampling material with the conventional approach as a tactical device, and kept clear of sharp statements of the method. (He even refused to be a co-author with me and his two PhD students of an article mentioned below, though he properly could share the credit.)

Carolyn Shevokas's thesis (see Chapter 00) studied junior college students who had little aptitude for mathematics. She taught the resampling approach to two groups of students (one with and one without computer), and taught the conventional approach to a control group. She then tested the groups on problems that could be done either analytically or by resampling. Students taught with the resampling method were able to solve more than twice as many problems correctly as students who were taught the conventional approach.

David Atkinson taught the resampling approach and the conventional approach to matched classes in general mathematics at a small college (see Chapter 00). The students who learned the resampling method did better on the final exam questions about general statistical understanding. They also did much better at solving actual problems, producing 73 percent more correct answers than the conventionally-taught control group.

These experiments were (and are) strong evidence that students who learn the resampling method are able to solve problems better than conventionally-taught students. And since then we have acquired a mess of corroborating evidence.

A book describing a range of applications seemed a possible way to get the message out. So I wrote such a book in about 1973, using the chapters in the 1969 research methods text as the base. But though I sent the typescript to dozens of publishers, I could not find a taker - right up to 1992.
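That command-mimicry can be suggested in Python, though the actual RESAMPLING STATS vocabulary is not reproduced here; the names urn, sample, and count below are hypothetical stand-ins for commands of the same flavor:

```python
import random

def urn(spec):
    """Build an 'urn' of elements, e.g. urn({'girl': 1, 'boy': 1})."""
    return [item for item, n in spec.items() for _ in range(n)]

def sample(container, n):
    """Draw n elements with replacement, as from a well-shuffled urn."""
    return [random.choice(container) for _ in range(n)]

def count(draws, item):
    """Count how often an item appears among the draws."""
    return draws.count(item)

# The three-girls problem again, written as mimicry of the physical steps:
family_urn = urn({'girl': 1, 'boy': 1})
hits = sum(1 for _ in range(10000)
           if count(sample(family_urn, 4), 'girl') == 3)
print(hits / 10000)
```

Each command corresponds to something one can do with one's hands; the computer merely does it tirelessly.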
When doing empirical work I found a resampling approach useful again and again. For example, when comparing elasticities of consumption of cigarettes with respect to price (Lyon and Simon, 1968), it was natural to do a resampling test comparing states with higher and lower income levels. And in a complex econometric paper on the effect of advertising expenditures on sales (1969), I used a bootstrap procedure to decide whether successive variables were likely to be meaningful. But - an experience shared by many researchers in the 1980s - referees did not comprehend the procedure, and therefore I found it prudent or necessary not to mention the resampling test in the final texts.

Over the years, I sent various materials to many of the notables in statistics. Kruskal responded with extended thoughtful correspondence. But others - such as Frederick Mosteller, who now says about the bootstrap that "There's no question but that it's very, very important" (New York Times, Nov. 8, 1988, C1, C6) - did not even acknowledge my letters.

The mainframe computer program Dan Weidenfeld wrote was not interactive, and therefore an erroneous comma could force the user to wait another day for another try, at which time another comma might be out of order. No professional programmer seemed able or willing to produce an interactive program until a bright high school kid, Derek Kumar, came along. By this time it was 1981, so Kumar wrote a lovely little program for the Apple computer. The lack of readily available computing power and tools had been an additional obstacle; the advent of the PC has changed that. (Later on, at the University of Maryland, an interactive program for the IBM PC was developed with the help of Chad McDaniels and others; Carlos Puig has brought the program to the state of the art.)

Then, in the late 1970s, a great wave of work followed Efron's initial publications on the bootstrap. Efron and other mathematical statisticians focused on studying the properties of the bootstrap, especially for advanced applications. In contrast, my main point is that resampling could and perhaps should be used for all (or almost all) probabilistic-statistical applications, simple as well as complex.

Eric Noreen apparently reinvented the entire resampling idea for himself in the context of accounting and related business problems (1989), and has attracted attention to the idea with his emphasis on the role of the computer, calling these "computer-intensive methods". Among applied statisticians interest has recently exploded, in conjunction with the availability of easy, fast, and inexpensive computer simulations. The bootstrap excited the most interest at first, but across-the-board use of these methods now seems at hand. An entire book has now appeared with this message:

Basically, there is a computer-intensive alternative to just about every conventional parametric and nonparametric test. If the significance of a test statistic can be assessed using conventional techniques, then its significance can almost always be assessed using computer-intensive techniques. The reverse is not true, however. (Eric Noreen, Computer Intensive Methods for Testing Hypotheses, Wiley, 1989.)

Leading mathematical statisticians - starting with Doob, as mentioned earlier - agree that the resampling method is logically flawless and intellectually beyond reproach. But still there is enormous resistance to introducing the method for everyday use, and in teaching.
In response I have made this still-standing public offer: I will stake $5,000 in a contest against any teacher of conventional statistics, with the winner to be decided by whose students get the larger number of both simple and complex realistic numerical problems correct, when teaching similar groups of students for a limited number of class hours -- say, six or ten. And if I should win, as I am confident that I will, I will contribute the winnings to the effort to promulgate this teaching method. (Here it should be noted that I am far from being the world's best teacher, and I certainly am not among the more charming. It is the material that I have going for me, and not my personality or teaching skills.) Alas, no takers. (This is not the sort of talk heard in academia every day, and it turns off some conservative types, so I add, "The intellectual history of probability and statistics began with gambling games and betting. Therefore, perhaps a lighthearted but very serious offer would not seem inappropriate here.")

Though statisticians are busily exploring the properties of the bootstrap, and applying it regularly to problems that are difficult with conventional analysis, interest has been concentrated on the method's properties rather than on its instruction and use. Almost no one has made it his or her business to take these ideas to teachers and students and introduce resampling into the regular curriculum in high schools and colleges. The American Statistical Association has recently moved in this direction, as has the National Council of Teachers of Mathematics, which urges that simulation be given as much attention as analytics in teaching probability.

Like all innovations, this one has encountered massive resistance. Many factors always militate against the adoption of new technology, including the accumulated intellectual and emotional investment in existing methods. Early on, leading statisticians either did not accept the idea, or ignored it; now they say that the method is a great breakthrough, but should not be taught to introductory students. Numerous technical journals rejected articles on the method because it is too simple and lacks "real mathematics"; for years publishers have turned down my book about resampling on the grounds that the ideas are sound but that there would be no market because instructors would not accept them -- and the publishers may be right. The National Science Foundation has rejected applications for grants in several categories, on assorted grounds. School systems have simply been too preoccupied with their usual business to be willing to develop new curricula. The American Statistical Association has invested large amounts of money and effort in developing a video series and printed materials to try to teach the old ways more effectively. There is no conspiracy. But individually, just about every channel has been closed. This is despite the fact that no one any longer denies the basic validity or the practical usefulness of these ideas. Resistance stems from many roots.

ROOTS OF RESISTANCE TO RESAMPLING

Legions of instructors have an investment in their stocks of conventional knowledge, their reputations, and their lecture notes, which it is costly for them to replace with an unfamiliar method. Some lack conviction that resampling is better than "real" mathematics.
Others reject simulation methods because they find "real math" more aesthetic, and cannot or will not recognize that most people do not share their mathematical aptitudes and aesthetic tastes. Still others won't teach the method because they feel that it is difficult to do eye-catching "sophisticated" research with it that will be published well and advance their careers. And some applied departments use analytic statistics as a tool to weed out students who do not care for mathematics.

Furthermore, resampling requires more spontaneity on the part of the instructor than do conventional formulas. The instructor must interact with the students as they invent anew the appropriate methods for particular problems. Many instructors are more comfortable simply handing down formulae from on high, with the students scrambling to keep up and too frantic to ask hard questions such as "Where do the data come from?" In many schools, too, there has been the logistical problem that computers are absent.

(Another difficulty is that my central scholarly interest is the economics of population, which has absorbed most of my energies over the years. And I have not been a card-carrying statistician, which inevitably puts off the statistical establishment.)

Over the years I have made a vast number of attempts, along a great number of lines, to interest people in the subject. The major jump came when I was joined by Peter Bruce, a former foreign service officer and recent MBA, who is now promoting the method full time. Commercial distribution of the computer program has been one of the dissemination methods we have worked on, not primarily to make money but rather to use the power of the market mechanism to reach persons who may be interested. To date, however, marketing initiatives have not produced the revenue necessary to get the enterprise flying. I have been financing this operation out of savings, simply because there has seemed no other way to give these ideas a chance to be used.

Another reason for the lack of penetration into the curriculum is the usual barrier against innovation -- the conservatism of the instructors who have a huge investment in their stock of conventional knowledge, their reputations, and their lecture notes; this is discussed at greater length in Chapter 00.

THE RELATIONSHIP OF RESAMPLING TO THE HISTORY OF STATISTICS

Resampling returns to a very old tradition. In ancient times, mathematics in general, and statistics in particular, developed from the needs of governments and rich persons to number armies and flocks, and especially to count the taxpayers and their possessions. Up until the beginning of the twentieth century, the term "statistic" meant the number of something the "state" was interested in -- soldiers, births, or what-have-you. In many cases the term "statistic" still means the number of something; the most important statistics for the United States are in the Statistical Abstract of the United States. These numbers are now known as "descriptive statistics."

Another stream of thought appeared by way of gambling in France in the 17th century. Throughout history people had learned about the odds in gambling games by trial-and-error experience. But in the year 1654, the French nobleman Chevalier de Mere asked the great mathematician and philosopher Pascal to help him determine what the odds ought to be in some gambling games. Pascal, the famous Fermat, and others went on from there to develop modern probability theory.
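The question commonly attributed to de Mere -- whether at least one six in four throws of one die is a better bet than at least one double-six in twenty-four throws of two dice -- happens to be a perfect resampling exercise, and three centuries later one can settle it by brute simulation. A Python sketch of this textbook version of the question (the historical details of what de Mere actually asked are not vouched for here):

```python
import random

trials = 100000

# Bet 1: at least one six in four throws of a single die.
one_six = sum(
    any(random.randint(1, 6) == 6 for _ in range(4))
    for _ in range(trials)) / trials

# Bet 2: at least one double-six in twenty-four throws of two dice.
double_six = sum(
    any(random.randint(1, 6) == 6 and random.randint(1, 6) == 6
        for _ in range(24))
    for _ in range(trials)) / trials

print(one_six, double_six)  # roughly .52 versus .49
```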
Later on these two streams of thought came together. People wanted to know the accuracy of their descriptive statistics, not only the descriptive statistics originating from sample surveys but also the numbers arising from experiments. Therefore, statisticians applied the theory of probability to the accuracy of the data arising from sample surveys and experiments; this is the theory of inferential statistics.

Later, probability theory began to be developed for another context in which there is uncertainty -- decision-making. Descriptive statistics like those used by insurance companies -- for example, the number of people per thousand in each age bracket who die in a five-year period -- have for centuries been used in deciding how much to charge for insurance policies. The likelihoods usually can be estimated on the basis of a great many observations with rather good precision and without complex calculation, and the main statistical task is gathering this information. In business and political decision-making situations, however, one usually works with likelihoods that are based on very limited information, often little better than guesses. The question is how best to combine these guesses about various likelihoods into an overall likelihood estimate. Therefore, in the modern probabilistic theory of decision-making in business, politics, and war, the emphasis is on methods of combining estimates of probabilities which depend upon each other in complicated ways in order to arrive at a desirable decision -- similar to the gambling games which were the origin of probability and statistics.

Estimating probabilities with conventional mathematical methods is often so complex that the process scares many people. And properly so, because the difficulties lead to errors. The statistical profession has expressed grave concern about the widespread use of conventional tests whose foundations are poorly understood. The ready availability of statistical computer packages that can easily perform these tests with a single command, irrespective of whether the user understands what is going on or whether the test is appropriate, has exacerbated this problem.

Probabilistic analysis is crucial, however. Judgments about whether to allow a new medicine on the market, or whether to readjust a screw machine, require more than eyeballing the data to assess chance variability. But until now, the practice and teaching of probabilistic statistics, with its abstruse structure of mathematical formulas, tables of values, and restrictive assumptions concerning data distributions -- all of which separate the user from the actual data or physical process under consideration -- has not kept pace with recent developments in the practice and teaching of descriptive statistics.

Beneath every formal statistical procedure there lies a physical process. Resampling methods allow one to work directly with the underlying physical model by simulating it. The term "resampling" refers to the use of the given data, or a data-generating mechanism such as a die, to produce new samples, the results of which can then be examined. The term "computer-intensive methods" is also used to refer to techniques such as these. The resampling method enables people to obtain the benefits of statistics and predictability without the shortcomings of conventional methods, because it is free of mathematical formulas and restrictive assumptions.
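How simulation combines the rough, interdependent likelihood estimates of the decision problems described above can be shown with a small sketch (every number and scenario below is invented purely for illustration):

```python
import random

# A hypothetical venture: its success depends on the economy, on whether a
# rival enters (itself more likely in a strong economy), and on a win
# probability that shifts with both. Simulate the whole chain of guesses.
trials = 100000
successes = 0
for _ in range(trials):
    strong_economy = random.random() < 0.60      # guess: 60% chance
    if strong_economy:
        rival_enters = random.random() < 0.50    # rivals likelier in good times
        win_chance = 0.40 if rival_enters else 0.80
    else:
        rival_enters = random.random() < 0.20
        win_chance = 0.25 if rival_enters else 0.55
    if random.random() < win_chance:
        successes += 1

print(successes / trials)  # overall estimated chance of success
```

No formula for combining the dependent probabilities need ever be derived; the structure of the guesses is simply acted out.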
Hence the method seemed to have extraordinary promise. Not only did I think so, but so did some of my readers.

THE FUTURE?

What will happen? Eventually progress will win out, as always. The question is: How long will it take? How many more bad analyses will be done in science and business because black-box formulae are misused? How many more students will suffer and be turned off this extraordinarily valuable tool of thinking and acting? More about this in Chapter 00.

AFTERNOTE: IMAGINED DIALOGUE WITH A STATISTICIAN RE RESAMPLING STATS

U: Is resampling theoretically acceptable?

S: It has its practical drawbacks, but there is nothing wrong with it theoretically.

U: Was that always the opinion of the profession?

S: [Laughs.] To tell the truth, earlier it was the opposite. People said it was OK practically, but no good theoretically.

U: What about Simon's claim to some priority with resampling and the bootstrap?

S: Ridiculous. Efron invented the bootstrap way back before 1979.

U: What about Simon's claim to have written and taught this stuff back in the 1960s?

S: That wasn't bootstrap, just Monte Carlo probability, which is old stuff.

U: Have you looked at Simon's writings?

S: No, I don't need to.

U: How come you don't need to?

S: No one cites it. If it were really original it would have been picked up, used, and now cited.

U: Then how come you know that it is just old-stuff Monte Carlo probability?

S: It must be....

And on and on.