Volume 2 Number 1 - September 2004

Correlation Research Statistical Retorts

Introduction: Statistical Predictions

Predictions based on statistical analysis can prove remarkably accurate when the phenomena under study are relatively stable. In this issue, we focus on two well-publicized issues to illustrate the strengths and limitations of statistical predictions. While not directly applicable to litigation, these examples illustrate general principles that can be relevant in a legal context.

The first article pertains to the recently concluded 2004 Olympic Games held in Athens. Prior to the Olympics, two economists developed statistical models to predict the medal totals for all major participating nations. We will see how well their predictions fared. The second article is a follow-up to the last issue's discussion of political polling, and explains the important but confusing concept of the "margin of error."

Olympian Predictions

Prior to this year's Olympic Games in Athens, two intrepid economists once again attempted to predict the number of total medals, and of gold medals, for each of the major competing nations. Four years ago, in the initial exercise of this kind, Profs. Andrew Bernard and Meghan Busse amazed the sporting world with their remarkably accurate prognostications. For example, their prediction of 97 medals for the U.S. was exactly right.

According to these economists, three major factors dominate their statistical predictions for any country: population, per capita income and past record of success. It turns out that these three factors provide very accurate forecasts. Applying their approach to 2004, the U.S. was expected to snag 93 medals, including 37 gold, followed by Russia (83), China (57), Australia (54) and Germany (55). Interestingly, the predictions also implied that the top ten nations would garner less than the 55% of total medals they won in 2000, continuing a long-term trend in which the big, rich countries have become less dominant since the 1960s.

So what actually happened in Athens? The good news (statistically speaking) is that overall the predictions were highly accurate. The correlation between predicted and actual medal counts was a remarkable .98 for total medals and .94 for gold medals. However, some of the details did not go exactly according to form. For example, instead of falling to the predicted 93, the U.S. total actually increased from 97 in Sydney to 103 in Athens. Moreover, rather than declining, the top ten finishers' share rose to 65% of all medals, up 10 percentage points from 2000.
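For readers curious about what the correlation figure measures, here is a minimal Python sketch of the Pearson correlation between predicted and actual medal totals. The published .98 was computed across all major nations; this sketch uses only the six nations whose predicted and actual totals are quoted in this issue, so its result differs from the published figure. The names PREDICTED, ACTUAL and pearson_correlation are our own illustrative choices.

```python
import math

# Predicted vs. actual 2004 total medals for the six nations quoted in this
# issue (a small subset, not the full data behind the published .98).
PREDICTED = {"U.S.": 93, "Russia": 83, "China": 57,
             "Australia": 54, "Germany": 55, "Japan": 19}
ACTUAL = {"U.S.": 103, "Russia": 92, "China": 63,
          "Australia": 49, "Germany": 48, "Japan": 37}


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


countries = list(PREDICTED)
r = pearson_correlation([PREDICTED[c] for c in countries],
                        [ACTUAL[c] for c in countries])
print(f"correlation over {len(countries)} nations: {r:.2f}")
```

On these six nations the sketch gives about 0.93. A correlation near 1 means the nations line up in nearly the same order, and with nearly proportional gaps, in the predictions and in the actual results.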

Among other major nations, Russia achieved 92, China 63, Australia 49 and Germany 48, all quite close to their predictions. However, a big surprise was Japan, whose team won 37 medals, compared with only 19 predicted. This haul included 16 golds, when only 6 were forecast by the statistical model. Such a large excess over the model predictions suggests that something highly unusual was going on. It might behoove some of Japan's future competitors to find out the secret of Japan's non-standard deviation from the norm.
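The comparison behind this observation is simple arithmetic: set each nation's actual total against the model's prediction. The Python sketch below does this for the six nations whose figures appear in this issue; the names MEDALS, compare_to_prediction and biggest_surprise are our own. Raw totals alone would put the U.S. and Russia at the top, but measured against the prediction, Japan stands out.

```python
# Predicted and actual 2004 total medals, as quoted in this issue.
# Only these six nations are included; the model covered many more.
MEDALS = {  # nation: (predicted total, actual total)
    "U.S.": (93, 103),
    "Russia": (83, 92),
    "China": (57, 63),
    "Australia": (54, 49),
    "Germany": (55, 48),
    "Japan": (19, 37),
}


def compare_to_prediction(medals):
    """Map each nation to (actual - predicted, actual / predicted)."""
    return {
        nation: (actual - predicted, actual / predicted)
        for nation, (predicted, actual) in medals.items()
    }


def biggest_surprise(medals):
    """Nation whose actual total most exceeds its prediction, in relative terms."""
    comparison = compare_to_prediction(medals)
    return max(comparison, key=lambda nation: comparison[nation][1])


comparison = compare_to_prediction(MEDALS)
# List nations from the largest to the smallest actual/predicted ratio.
for nation, (excess, ratio) in sorted(comparison.items(),
                                      key=lambda item: item[1][1],
                                      reverse=True):
    print(f"{nation:10s} excess {excess:+4d}  actual/predicted {ratio:.2f}")
print("biggest surprise:", biggest_surprise(MEDALS))
```

Japan's 37 medals against 19 predicted is an excess of 18, nearly double the forecast, while every other nation listed lands within about 13% of its prediction.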

From a statistical perspective, one lesson here concerns the importance of a proper standard of comparison. In the Olympic context, for example, Japan's medal total would not in itself raise eyebrows. But relative to what would normally be expected given its size, economic status and past history, something highly unusual occurred.

Similarly, such a perspective can be important in a legal context. Suppose, for example, that a firm is accused of discriminating against women in hiring. Its hiring rate for female applicants is only 10%, compared with 20% for male applicants. Is this evidence of discrimination? The answer depends largely on the qualifications of the male and female candidates. Factors such as prior education, work experience and specific training must be taken into account to predict the expected hiring rate for each group. Only after accounting for all such plausible factors is a meaningful assessment of the apparent disparity possible.
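One common way to build such an expected rate is indirect standardization: within groups of applicants with similar qualifications, apply the men's hiring rate to the women who applied, and add up the female hires that rate would produce. The Python sketch below uses invented numbers, chosen only so that the overall rates match the 10% and 20% in the example; they do not come from any real case, and the names are our own. In these made-up figures the gap vanishes once qualifications are considered, but with other data it could persist or even widen. When many factors must be handled at once, analysts often use a regression model instead.

```python
# HYPOTHETICAL applicant counts, invented purely for illustration.
# Each qualification stratum records (applicants, hires) for men and women.
HYPOTHETICAL_STRATA = {
    "higher qualifications": {"men": (120, 36), "women": (40, 12)},
    "lower qualifications": {"men": (80, 4), "women": (160, 8)},
}


def crude_rate(strata, group):
    """Overall hiring rate for a group, ignoring qualifications."""
    applicants = sum(s[group][0] for s in strata.values())
    hires = sum(s[group][1] for s in strata.values())
    return hires / applicants


def expected_female_hires(strata):
    """Female hires expected if women were hired at the men's rate in each stratum."""
    expected = 0.0
    for s in strata.values():
        men_applicants, men_hires = s["men"]
        women_applicants, _ = s["women"]
        expected += women_applicants * (men_hires / men_applicants)
    return expected


observed = sum(s["women"][1] for s in HYPOTHETICAL_STRATA.values())
expected = expected_female_hires(HYPOTHETICAL_STRATA)
print(f"crude hiring rate, men:   {crude_rate(HYPOTHETICAL_STRATA, 'men'):.0%}")
print(f"crude hiring rate, women: {crude_rate(HYPOTHETICAL_STRATA, 'women'):.0%}")
print(f"female hires: observed {observed}, expected {expected:.1f}")
print(f"observed / expected: {observed / expected:.2f}")
```

A ratio of observed to expected hires well below 1 would point to a shortfall that the qualification factors do not explain; a ratio near 1, as in this invented example, would not.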

Margin of Error

This discussion was excerpted with permission from the website www.electoral-vote.com, produced by a friend of mine whose "nom de net" is The Votemaster. This non-profit website has become the premier source summarizing U.S. Presidential polling data, drawing a quarter million hits per day. Here is the Votemaster's excellent layman's explanation of what the margin of error really means:

There is no concept as confusing as 'Margin of Error.' It is used a lot, but few people understand it. Suppose a polling company calls 1000 randomly selected people in a state that is truly divided 50-50. Simply by accident, it may happen to call 520 Democrats and 480 Republicans and announce that Kerry is ahead 52% to 48%. But another company on the same day may happen to get 510 Republicans and 490 Democrats and announce that Bush is ahead 51% to 49%. The variation caused by having such a small sample is measured by the margin of error, which is usually between 2% and 4% for the sample sizes used in state polling. This means that with a margin of error of, say, 3%, a reported 51% really means that there is a 95% chance that the correct number is between 48% and 54% (and a 5% chance that it is outside this range).
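The Votemaster does not give a formula, but the figures quoted are consistent with the standard textbook approximation for the 95% margin of error of a proportion from a simple random sample: MoE = 1.96 * sqrt(p * (1 - p) / n), which is largest at p = 0.5. The Python sketch below assumes that formula and a simple random sample. It computes the margin for a 1000-person poll and simulates many such polls of a state that is truly split 50-50, to show how much the reported share bounces around. The function names and the random seed are our own choices.

```python
import math
import random


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    p = 0.5 gives the widest (most conservative) margin.
    """
    return z * math.sqrt(p * (1 - p) / n)


def simulate_polls(n=1000, true_share=0.5, trials=1000, seed=2004):
    """Return the Democratic share reported by each of `trials` random polls."""
    rng = random.Random(seed)
    shares = []
    for _ in range(trials):
        # Each respondent is a Democrat with probability true_share.
        democrats = sum(rng.random() < true_share for _ in range(n))
        shares.append(democrats / n)
    return shares


moe = margin_of_error(1000)
shares = simulate_polls()
within = sum(abs(s - 0.5) <= moe for s in shares) / len(shares)
print(f"margin of error for n=1000: {moe:.1%}")
print(f"lowest and highest simulated shares: {min(shares):.1%}, {max(shares):.1%}")
print(f"fraction of polls within the margin: {within:.1%}")
```

For 1000 respondents the formula gives about 3.1%, and the fraction of simulated polls landing within that margin of the true 50% should come out close to 95%, even though no voter's preference ever changes between polls.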

In the first example above, with a 3% MoE, the 95% confidence interval for Kerry is 49% to 55% and for Bush 45% to 51%. Since these overlap, we cannot be 95% certain that Kerry is really ahead, so this is called a statistical tie. Nevertheless, Kerry is more likely than Bush to be ahead; we just cannot be very sure of that conclusion. When the candidates' ranges do not overlap (i.e., the difference between them is at least twice the margin of error), we can be 95% certain that the leader is really ahead.
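The overlap rule can be written down directly. The Python sketch below works in whole percentage points, as the example does, and reproduces the Kerry and Bush intervals. The function names are our own, and the code encodes the rule of thumb exactly as stated above; it is not a formal significance test of the difference between two candidates' shares.

```python
def confidence_interval(share, moe):
    """95% interval share +/- moe (all values in percentage points)."""
    return (share - moe, share + moe)


def intervals_overlap(a, b):
    """True if intervals a and b overlap.

    Strict inequalities mean intervals that merely touch do not count as
    overlapping, matching the rule "at least twice the margin of error".
    """
    return a[0] < b[1] and b[0] < a[1]


def is_statistical_tie(share_a, share_b, moe):
    """Apply the overlap rule of thumb to two candidates' shares."""
    return intervals_overlap(confidence_interval(share_a, moe),
                             confidence_interval(share_b, moe))


kerry = confidence_interval(52, 3)
bush = confidence_interval(48, 3)
print("Kerry:", kerry, "Bush:", bush)
print("52-48 is a statistical tie:", is_statistical_tie(52, 48, 3))
print("55-45 is a statistical tie:", is_statistical_tie(55, 45, 3))
```

With equal margins, the intervals overlap exactly when the lead is less than twice the margin of error, so the 4-point lead is a tie while a 10-point lead is not.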

For this reason, the white states in our maps should be regarded as tossups no matter who is currently slightly ahead; the results could easily flip in the next poll without a single voter changing his or her mind. Of course, the margin of error can be reduced by using a bigger sample, but that takes longer and costs more money, so most clients opt for 500 to 1000 respondents.
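The trade-off between cost and precision follows from the same approximate formula. Assuming a simple random sample and a 95% level, solving MoE = 1.96 * sqrt(p * (1 - p) / n) for n, with the worst case p = 0.5, gives the sample needed for a target margin. The Python sketch below does this; the function name is our own.

```python
import math


def required_sample_size(target_moe, p=0.5, z=1.96):
    """Smallest n whose approximate 95% margin of error is <= target_moe.

    Inverts MoE = z * sqrt(p * (1 - p) / n), assuming a simple random sample.
    """
    n = (z / target_moe) ** 2 * p * (1 - p)
    # Round first so floating-point noise (e.g. 2401.0000000001)
    # does not push the answer up by one respondent.
    return math.ceil(round(n, 6))


for moe in (0.04, 0.03, 0.02):
    print(f"MoE {moe:.0%}: about {required_sample_size(moe)} respondents")
```

It returns 601 respondents for a 4% margin, 1068 for 3% and 2401 for 2%. Because the margin shrinks only with the square root of the sample size, halving it takes roughly four times as many interviews, which helps explain why most clients settle for 500 to 1000 respondents.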

Contact Correlation Research at:

Herbert I. Weisberg, Ph.D.
61 Pheasant Landing Road
Needham, MA 02192-1000

Phone (781) 455-6850

Fax (781) 444-9563

email: hweisberg@correlation.com
Please let us know what you think about our newsletter. Thanks.
