The Deception of Statistics

I have a deep appreciation for statistics, at least partially due to my background in mathematics. When I read an article, my attention is drawn first to the charts and tables, and only secondarily to the text. I'm especially fond of statistical analysis of the stock market, political polls, and election coverage. I enjoy reading a world almanac, and especially the "Statistical Abstract of the United States."

However, I have also discovered how profoundly true it is that almost any set of statistics can be manipulated, often quite easily, to defend both sides of an argument. In the hands of a skilled debater, statistics can be made to support whatever point needs to be made. By considering a few examples, we can quickly learn some of the deceptive techniques used in this art form.

Sample Sets

Suppose that a political campaign is underway, and the Democratic and Republican candidates are engaged in a debate over which party is truly the modern-day party of the people. Both politicians are trying to show that their own party is the party of choice, and they each begin citing statistics to prove their case:

- The Democrat argues that statistics show that the Democratic Party has won 60% of the most recent presidential elections.

- The Republican argues that the Republican Party has won 62% of the most recent presidential elections.

At first glance, most of us would assume that (at least) one of the candidates is incorrect. After all, anyone with an elementary understanding of percentages would know that only one side can hold a majority. We would probably want to investigate these claims to find out which candidate is right, and which one is wrong. Then we would want to further investigate the candidate who is wrong, in order to determine whether or not he is actually lying, or just mistaken. Yes, we tell ourselves, we will soon get to the bottom of this, and clearly expose the culprit. Our research reveals the following winners of the most recent presidential elections:

  1- 2008 - Obama - Democrat
  2- 2004 - Bush 43 - Republican
  3- 2000 - Bush 43 - Republican
  4- 1996 - Clinton - Democrat
  5- 1992 - Clinton - Democrat
  6- 1988 - Bush 41 - Republican
  7- 1984 - Reagan - Republican
  8- 1980 - Reagan - Republican
  9- 1976 - Carter - Democrat
10- 1972 - Nixon - Republican
11- 1968 - Nixon - Republican
12- 1964 - Johnson - Democrat
13- 1960 - Kennedy - Democrat
14- 1956 - Eisenhower - Republican
15- 1952 - Eisenhower - Republican
16- 1948 - Truman - Democrat
17- 1944 - F. Roosevelt - Democrat
18- 1940 - F. Roosevelt - Democrat
19- 1936 - F. Roosevelt - Democrat
20- 1932 - F. Roosevelt - Democrat
21- 1928 - Hoover - Republican
22- 1924 - Coolidge - Republican
23- 1920 - Harding - Republican
24- 1916 - Wilson - Democrat
25- 1912 - Wilson - Democrat
26- 1908 - Taft - Republican
27- 1904 - T. Roosevelt - Republican
28- 1900 - McKinley - Republican
29- 1896 - McKinley - Republican
30- 1892 - Cleveland - Democrat
31- 1888 - Harrison - Republican
32- 1884 - Cleveland - Democrat
33- 1880 - Garfield - Republican
34- 1876 - Hayes - Republican
35- 1872 - Grant - Republican
36- 1868 - Grant - Republican
37- 1864 - Lincoln - Republican
38- 1860 - Lincoln - Republican

Well, we quickly see that both candidates were indeed correct. What seems impossible turns out to be factual, because each candidate manipulated the statistics in his own favor. The Democrat was correct because three of the last five elections (60%) were won by Democrats. However, the Republican was also correct, because five of the last eight elections (62.5%) were won by Republicans. So, the first trick we've learned is that of manipulating what is called the sample set. The Democrat used the last five elections as his sample set, while the Republican used the last eight elections. Clearly, each chose a particular sample set that was more favorable to his own cause.

Sample Set Size

So, one might think that all we need to do in order to find the real truth is to increase the size of the sample set. Unfortunately, this only leads to more contradictions, as follows:

- The last 11 elections - Republicans have won 64% (7 to 4).
- The last 20 elections - Democrats have won 55% (11 to 9).
- The last 38 elections - Republicans have won 61% (23 to 15).

Pressing this line of reasoning any further would be pointless, because we've already reached the decade during which the Republican Party overtook the Whig Party as the primary challenger to the Democratic Party, so further analysis would no longer result in an apples-to-apples comparison (see below).

Definitions

However, this is related to the next technique: the manipulation of the definition of terms. In this case, returning to the initial arguments above, each side might believe that it has the best definition of what it means to be the "modern-day" party of the people. Here's how that debate might unfold:

- The Democrat might argue that "modern day" refers to the era since the end of the Cold War, so we should go back only to 1992.

- The Republican would then argue that the first term of Ronald Reagan, the "modern-day" hero of the Republican Party, actually marked the beginning of the end of the Cold War, so we should go back to 1980.

- Then the Democrat would counter that if the Republican gets to go back to the time of his party's hero, then the Democrats should get to go back to the time of their hero, Franklin Roosevelt, so we should start with the year 1932.

- The Republican rejects this argument based upon the "logical" meaning of "modern-day" as our post-World War II era, so we shouldn't go back beyond 1948.

- But when the Democrat is unwilling to bend on this point, the Republican then says, "OK. If you can have your FDR, then we can have our Lincoln," so he wants to start with the year 1860. It's easy to see how this just gets sillier and sillier.
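Each proposed definition of "modern-day" amounts to picking a starting year. The sketch below, assuming a hypothetical `WINNERS` mapping transcribed from the election table above, counts the wins for each party from a given start year through 2008.

```python
# Hypothetical mapping of election year -> winning party, transcribed from the table above.
WINNERS = dict(zip(
    range(2008, 1859, -4),  # 2008, 2004, ..., 1860 (38 elections)
    "DRRDDRRRDR" "RDDRRDDDDD" "RRRDDRRRRD" "RDRRRRRR",
))


def record_since(start_year):
    """Return (Democratic wins, Republican wins) for every election from start_year through 2008."""
    parties = [party for year, party in WINNERS.items() if year >= start_year]
    return parties.count("D"), parties.count("R")


# Each start year proposed in the debate above, in the order it was proposed.
for start in (1992, 1980, 1932, 1948, 1860):
    d, r = record_since(start)
    print(f"Since {start}: {d} Democratic, {r} Republican")
```

Each start year in the exchange gives the edge to the party of the person proposing it: 1992 and 1932 favor the Democrats, while 1980, 1948, and 1860 favor the Republicans.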

Categories

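Another favorite trick is to manipulate statistics by changing the categories of the entities being compared. For instance, in the political example above, one might be able to gain an advantage by quoting statistics about "presidents" instead of "presidential elections." Here's how this works (the list below counts only presidents who won an election, which is why Gerald Ford, Chester Arthur, and Andrew Johnson do not appear, and it counts Grover Cleveland's two nonconsecutive terms separately):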

  1- Obama - Democrat
  2- Bush 43 - Republican
  3- Clinton - Democrat
  4- Bush 41 - Republican
  5- Reagan - Republican
  6- Carter - Democrat
  7- Nixon - Republican
  8- Johnson - Democrat
  9- Kennedy - Democrat
10- Eisenhower - Republican
11- Truman - Democrat
12- F. Roosevelt - Democrat
13- Hoover - Republican
14- Coolidge - Republican
15- Harding - Republican
16- Wilson - Democrat
17- Taft - Republican
18- T. Roosevelt - Republican
19- McKinley - Republican
20- Cleveland - Democrat
21- Harrison - Republican
22- Cleveland - Democrat
23- Garfield - Republican
24- Hayes - Republican
25- Grant - Republican
26- Lincoln - Republican

Let's see how this argument plays out as we increase the sample size:

- 67% of the last three presidents (2 to 1) have been Democrats.

- 60% of the last five presidents (3 to 2) have been Republicans.

- 58% of the last 12 presidents (7 to 5) have been Democrats.

- 62% of the last 26 presidents (16 to 10) have been Republicans.

Again, this results in more silliness. As one might expect, the "category pool" is overflowing with possibilities that might play in favor of one side or the other (or both!). These additional categories might include electoral votes, popular votes, the dominant party in Congress (or perhaps just the Senate, or just the House), etc.

One More

For an additional illustration of how sample size can be an issue, consider the above statistic that 58% of the last 12 presidents (7 to 5) have been Democrats. Whenever we hear a statistic like this, it is worth asking whether the result would be less convincing if the sample size had been increased by just one. Instead of quoting the percentage for the last 12 presidents, look what happens as we increase the sample size:

- 54% of the last 13 presidents (7 to 6) have been Democrats.

- 50% of the last 14 presidents (7 to 7) have been Democrats.

- 47% of the last 15 presidents (7 to 8) have been Democrats.

Apples-to-Apples

We're all familiar with the old adage about not being able to compare apples to oranges. In its simplest form, the analogy goes like this: Which are the better-tasting apples, the apples in the first box or the oranges in the second box? Obviously, the apples in the first box are the better-tasting apples, because the oranges in the second box don't taste much like apples at all, good or bad. Even if the oranges are fresh and sweet and the apples are rotten, the apples are still "better-tasting apples" than the oranges, simply because the oranges are not apples at all.

I have found that this deceptive technique is particularly useful in the business world. I am amused almost every time I read a quarterly earnings report from a publicly traded company. The company always wants to make the report sound as promising as possible, but sometimes the reporter covering the story (who, say, was once fired by this company) would like to make the report sound as dismal as possible, as would the stock analyst who didn't recommend the company's stock and is hoping to look like a hero. Here's how this sometimes proceeds:

- The company reports strong earnings (profit) of $1.00 per share, on revenue of $1B, signifying year-over-year growth of 12%; all in all, a very respectable report of a growing and money-making company.

- The reporter's headline reads, "Company's Profit plunges 50%."

Well, this is somewhat factual, because the company's profit for the same quarter in the previous year was $1.90 per share, so this period's $1.00 is 47% less. However, what this reporter knew, but didn't report, was that the company spent $1.10 per share on an acquisition that will make it much more competitive and increase profits in as little as one year from now. Without this acquisition, profits would have been $2.10 per share, an increase of about 10.5% over the same quarter in the previous year.
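The headline figure and the fuller picture come from the same three numbers. This minimal sketch uses only the per-share figures in the example; the helper name `pct_change` is illustrative.

```python
# Illustrative helper (hypothetical name): percentage change from an old value to a new one.
def pct_change(old, new):
    """Year-over-year percentage change from old to new."""
    return 100 * (new - old) / old


prior_eps = 1.90         # same quarter, previous year (per share)
reported_eps = 1.00      # this quarter, as reported (per share)
acquisition_cost = 1.10  # one-time acquisition cost (per share)

headline = pct_change(prior_eps, reported_eps)
excluding_acquisition = pct_change(prior_eps, reported_eps + acquisition_cost)

print(f"As reported: {headline:.1f}%")  # the figure the headline calls a "50%" plunge
print(f"Excluding the acquisition: {excluding_acquisition:+.1f}%")
```

The only difference between a 47% "plunge" and roughly 10.5% growth is whether the one-time acquisition cost is counted.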

Again, there are many categories that can be manipulated in this example as well, including profit, revenue, growth, forecasts, the impact of foreign currency, etc.

Embellishment

There are actually a couple of other deceptive techniques in this example as well. The first one is embellishment. The reporter embellished his story with a falsely negative slant: instead of comparing apples to apples, he reported a positive accomplishment (the acquisition) in a negative light, and he failed to report the company's many other positive accomplishments at all.

Exaggeration

The second additional deceptive technique is that of exaggeration. The reporter claimed that profits fell by 50% when they actually fell by only 47%. A number makes a much bigger impression on readers when it crosses a key numerical boundary, such as the 50% barrier; after all, why report a number in the 40s when you can report one in the 50s?

Expectations

Continuing with the above example from the business world, the people who are buying and selling that company's stock are, of course, fully aware of the acquisition and its benefits, so they tend to ignore the deceitful reporter (and stock analyst). Instead, they closely monitor the report relative to "expectations"; that is, what a bunch of smart boys on Wall Street expected the company to earn. If the average expectation was that the company would report a profit of $1.02, then the news of a profit of $1.00 is likely to be taken as disappointing. As a result, the crowd overrides logic: a good company with good top-line revenue, good bottom-line profit, and an improved outlook for the future sees the price of its stock drop by 20% as soon as the news is announced, and the stock may take years to recover this loss, despite the increased profits coming only one year down the road.
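Markets tend to react to the gap between reported and expected earnings, often called the earnings surprise. One common way to express it is as a percentage of the consensus estimate, sketched below with the hypothetical figures from this example; `surprise_pct` is an illustrative name, and the 20% price drop is the example's assumption rather than anything the code predicts.

```python
# Illustrative helper (hypothetical name): earnings surprise as a percentage of the estimate.
def surprise_pct(actual, expected):
    """Return how far actual earnings per share landed from the consensus expectation, in percent."""
    return 100 * (actual - expected) / abs(expected)


miss = surprise_pct(actual=1.00, expected=1.02)
price_move = -20.0  # the example's assumed drop in the share price, in percent

print(f"Earnings surprise: {miss:.1f}%")
print(f"Price reaction: {price_move:.0f}%, roughly {price_move / miss:.0f} times the size of the miss")
```

A two-cent miss is a surprise of only about 2%, yet in the example it triggers a price drop roughly ten times as large.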

Changing the Rules

Another favorite deceptive practice is to change the rules. Periodically, when a company reports earnings, it will simultaneously announce that it has changed its accounting methods and practices since its last quarterly statement. However, no matter how hard the company tries to convince the public that this is a good thing, its stock often gets pummeled, simply because even the Wall Street experts have trouble figuring out all of the implications of the accounting change.

Risk Factors

In his book Worried Sick, Nortin M. Hadler points out some significant techniques of deception involving the way risk factors are discussed. One of the primary reasons that doctors treat high cholesterol so aggressively with statin drugs is a study called the West of Scotland Study. This five-year study of men with high cholesterol was published in 1995. It found that 38 of 3,302 patients (1.2%) who took a statin drug subsequently had fatal heart attacks, compared to 52 of 3,293 patients (1.6%) who took a placebo. There are two distinctly different ways that these results can be presented: using absolute risk, or using relative risk.

Absolute Risk

The absolute difference between the two groups of patients is about 0.4 percentage points (1.58% - 1.15%, before rounding). This absolute risk reduction implies that the statin drug lowered the risk of a fatal heart attack by only about 0.4 percentage points.

Margin of Error

First of all, this difference is within the margin of error. We have all seen polls whose results were not definitive because the gap was within the margin of error (often around ±2%). A study's margin of error depends on its own sample sizes and event rates rather than on a typical poll's; for these figures it works out to roughly ±0.6 percentage points at the usual 95% confidence level, and the 0.4-point difference falls inside it. This suggests that the difference is not statistically significant, and it would be premature to place a high degree of confidence in the results.

Relative Risk Reduction

The other way to interpret these results is to consider the relative risk. The relative risk reduction is the percentage by which the risk was reduced, calculated like this:

0.43% / 1.58% ≈ 27%

So, which of the following sounds better?

- The effect of the statin drug was statistically insignificant.

or

- Those taking the statin drug were 27% less likely to have a fatal heart attack.

Well, the drug companies selling the statin drugs would certainly use the latter for marketing purposes. However, the objective reader might well question the supposed benefit of about 4 fewer fatal heart attacks per 1,000 men, especially if it meant exposing many "well" men to unnecessary drugs and surgeries.

Bias

The interpretation of statistics is often subject to bias, whether intended or unintended, like that of the drug companies noted above. I apologize, but the above illustration is actually biased, because of the particular statistics that I chose. I could have used the statistics for the men who suffered nonfatal heart attacks: the study found that 143 of 3,302 patients (4.3%) who took a statin drug had nonfatal heart attacks, compared to 204 of 3,293 patients (6.2%) who took a placebo. This absolute risk reduction of about 1.9 percentage points would suggest that 19 of 1,000 men would benefit from the statin drugs, instead of only about 4 of 1,000. Unlike the fatal heart attack figures, this difference also falls outside its margin of error (roughly ±1.1 percentage points), so it is statistically significant by the usual standard.
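The absolute and relative figures, and the margin of error, can all be recomputed from the raw counts quoted above. The sketch below is a simplification: it uses the normal-approximation (Wald) interval for a difference of two proportions at roughly 95% confidence, which is not necessarily how the study's authors analyzed their data, and the function name `compare_groups` is hypothetical.

```python
import math


def compare_groups(events_treated, n_treated, events_placebo, n_placebo, z=1.96):
    """Summarize a two-arm trial outcome (hypothetical helper).

    Returns absolute and relative risk reduction plus a confidence interval for
    the absolute difference, using the simple normal (Wald) approximation;
    z = 1.96 gives roughly 95% confidence.
    """
    risk_treated = events_treated / n_treated
    risk_placebo = events_placebo / n_placebo
    arr = risk_placebo - risk_treated  # absolute risk reduction
    rrr = arr / risk_placebo           # relative risk reduction
    se = math.sqrt(risk_treated * (1 - risk_treated) / n_treated
                   + risk_placebo * (1 - risk_placebo) / n_placebo)
    margin = z * se                    # the "margin of error" for the difference
    return {
        "risk_treated": risk_treated,
        "risk_placebo": risk_placebo,
        "arr": arr,
        "rrr": rrr,
        "per_1000": 1000 * arr,
        "ci": (arr - margin, arr + margin),
    }


for label, counts in [("Fatal", (38, 3302, 52, 3293)), ("Nonfatal", (143, 3302, 204, 3293))]:
    r = compare_groups(*counts)
    low, high = r["ci"]
    print(f"{label}: {100 * r['risk_treated']:.2f}% vs {100 * r['risk_placebo']:.2f}%, "
          f"ARR {100 * r['arr']:.2f} points (95% CI {100 * low:.2f} to {100 * high:.2f}), "
          f"RRR {100 * r['rrr']:.0f}%, about {r['per_1000']:.0f} per 1,000")
```

For fatal heart attacks the interval straddles zero, while for nonfatal heart attacks it lies entirely above zero, which is exactly the kind of difference a selective presentation can hide.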

Hype

Another deceptive technique used to interpret statistics is that of hype: creating excitement, or a call to action. We all remember how the swine flu "pandemic" was initially hyped by the statistics coming out of Mexico. Schools were closed, and Vice President Biden even questioned the safety of riding on an airplane. Then, one week later, the panic was called off and schools reopened. Suddenly, a large share of the few, yet tragic, deaths attributed to swine flu in the U.S. had to be qualified with statements such as, "... but the victim lived next to the Mexican border, and already had many chronic health problems."

The Great Deceiver

The Bible says that Satan is the great deceiver (2 Corinthians 11:3, Revelation 13:14, Revelation 20:8, and Revelation 20:10). Statistics can also be deceiving, and they often are. Whenever we hear statistics quoted in order to promote a cause, we should check them out carefully.

Owen Weber 2009