The Deception of Statistics
I have a deep appreciation for statistics, at least partially due to my
background in mathematics. When I read an article, my
attention is drawn first to the charts and tables, and only secondarily
to the text. I'm especially fond of statistical analysis of
the stock market, political polls, and election coverage. I
enjoy reading a world almanac, and especially the "Statistical Abstract of the United States."
However, I have also discovered how profoundly true it is that any set
of statistics can usually be easily manipulated to defend both sides of
an argument. In the hands of a skilled debater, statistics
can be made to illuminate whatever needs to be illustrated.
By considering a few examples, we can quickly learn some of the
deceptive techniques used in this art form.
Sample Sets
Suppose that a political campaign is underway, and the Democratic and
Republican candidates are engaged in a debate concerning which party is
truly the modern-day party of the people. Both politicians
are trying to show that their own party is the party of choice, and
they each begin citing statistics to prove their case:
- The Democrat argues that statistics show that the Democratic Party
has won 60% of the most recent presidential elections.
- The Republican argues that the Republican Party has won 62% of the
most recent presidential elections.
At first glance, most of us would assume that (at least) one of the
candidates is incorrect. After all, anyone with an elementary
understanding of percentages would know that only one side can hold a
majority. We would probably want to investigate these claims
to find out which candidate is right, and which one is wrong.
Then we would want to further investigate the candidate who is wrong,
in order to determine whether or not he is actually lying, or just
mistaken. Yes, we tell ourselves, we will soon get to the
bottom of this, and clearly expose the culprit. Our research
reveals the following winners of the most recent presidential elections:
1- 2008 - Obama - Democrat
2- 2004 - Bush 43 - Republican
3- 2000 - Bush 43 - Republican
4- 1996 - Clinton - Democrat
5- 1992 - Clinton - Democrat
6- 1988 - Bush 41 - Republican
7- 1984 - Reagan - Republican
8- 1980 - Reagan - Republican
9- 1976 - Carter - Democrat
10- 1972 - Nixon - Republican
11- 1968 - Nixon - Republican
12- 1964 - Johnson - Democrat
13- 1960 - Kennedy - Democrat
14- 1956 - Eisenhower - Republican
15- 1952 - Eisenhower - Republican
16- 1948 - Truman - Democrat
17- 1944 - F. Roosevelt - Democrat
18- 1940 - F. Roosevelt - Democrat
19- 1936 - F. Roosevelt - Democrat
20- 1932 - F. Roosevelt - Democrat
21- 1928 - Hoover - Republican
22- 1924 - Coolidge - Republican
23- 1920 - Harding - Republican
24- 1916 - Wilson - Democrat
25- 1912 - Wilson - Democrat
26- 1908 - Taft - Republican
27- 1904 - T. Roosevelt - Republican
28- 1900 - McKinley - Republican
29- 1896 - McKinley - Republican
30- 1892 - Cleveland - Democrat
31- 1888 - Harrison - Republican
32- 1884 - Cleveland - Democrat
33- 1880 - Garfield - Republican
34- 1876 - Hayes - Republican
35- 1872 - Grant - Republican
36- 1868 - Grant - Republican
37- 1864 - Lincoln - Republican
38- 1860 - Lincoln - Republican
Well, we quickly see that both candidates were indeed
correct. What seems impossible turns out to be factual,
because each candidate manipulated the statistics in his own
favor. The Democrat was correct because three of the last
five elections (60%) were won by Democrats. However, the
Republican was also correct, because five of the last eight elections
(62.5%) were won by Republicans. So, the first trick we've
learned is that of manipulating what is called the sample
set. The Democrat used the last five elections as his sample
set, while the Republican used the last eight elections.
Clearly, each chose a particular sample set that was more favorable to
his own cause.
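Both claims can be checked with a few lines of code. What follows is
only a minimal Python sketch; the names WINNERS and share are invented
here for illustration, and the string simply encodes the list above
with the most recent election first.

```python
# Winners of the 38 elections listed above, most recent (2008) first.
# "D" = Democrat, "R" = Republican.
WINNERS = "DRRDDRRRDRRDDRRDDDDDRRRDDRRRRDRDRRRRRR"


def share(party, last_n):
    """Percentage of the last_n elections won by the given party."""
    window = WINNERS[:last_n]
    return 100 * window.count(party) / len(window)


# Each candidate quietly picks the sample set that suits him.
print(f"Democrat's claim:   {share('D', 5):.1f}% of the last 5 elections")
print(f"Republican's claim: {share('R', 8):.1f}% of the last 8 elections")
```

The data never changes between the two lines; only the size of the
slice does.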
Sample Set Size
So, one might think that all we need to do in order to find the real
truth is to increase the size of the sample set.
Unfortunately, this only leads to more contradictions, as
follows:
- The last 11 elections - Republicans have won 64% (7 to 4).
- The last 20 elections - Democrats have won 55% (11 to 9).
- The last 38 elections - Republicans have won 61% (23 to
15).
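The flip-flopping can be made systematic by trying every possible
sample size. The sketch below is an illustration only; the names
WINNERS, leader, and favors are made up for this example, and the
string encodes the election list given earlier.

```python
# Winners of the 38 elections listed earlier, 2008 first ("D" or "R").
WINNERS = "DRRDDRRRDRRDDRRDDDDDRRRDDRRRRDRDRRRRRR"


def leader(last_n):
    """Return (party, wins, losses) for the party ahead in the last_n elections.

    Returns ("tie", count, count) when both parties have the same number of wins.
    """
    window = WINNERS[:last_n]
    dem, rep = window.count("D"), window.count("R")
    if dem == rep:
        return ("tie", dem, rep)
    return ("D", dem, rep) if dem > rep else ("R", rep, dem)


# Group every possible sample size by the party it favors.
favors = {"D": [], "R": [], "tie": []}
for n in range(1, len(WINNERS) + 1):
    favors[leader(n)[0]].append(n)

for party, sizes in favors.items():
    print(party, sizes)
```

Either debater can look up a sample size in his own list and quote it
with a straight face.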
Pressing this line of reasoning any further would be pointless, because
we've already reached the decade during which the Republican Party
overtook the Whig Party as the primary challenger to the Democratic
Party, so further analysis would no longer result in an
apples-to-apples comparison (see below).
Definitions
However, this is related to the next technique--that of the
manipulation of the definition of terms. In this case,
returning to the initial arguments above, each side might believe that
it has the best definition of what it means to be the "modern-day"
party of the people. Here's how that
debate might ensue:
- The Democrat might argue that "modern day" refers to the
era since the end of the Cold War, so we should go back only to 1992.
- The Republican would then argue that the first term of
Ronald Reagan, the "modern-day" hero of the Republican Party, actually
marked the beginning of the end of the Cold War, so we should go back
to 1980.
- Then the Democrat would counter that if the Republican gets
to go back to the time of his party's hero, then the Democrats should
get to go back to the time of their hero, Franklin Roosevelt, so we
should start with the year 1932.
- The Republican rejects this argument based upon the
"logical" meaning of "modern-day" being our post-World War II era, so
we shouldn't go back beyond 1948.
- But when the Democrat is unwilling to bend on this point,
the Republican then says, "OK. If you can have your FDR, then
we can have our Lincoln," so they want to start with the year
1860. It's easy to see how this just gets sillier and
sillier.
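To see why each debater insists on his own starting year, the same
election list can be tallied from each proposed cutoff. This is a
hedged sketch: WINNERS, LATEST_YEAR, and record_since are names
invented for the illustration, and the labels are just shorthand for
the arguments above.

```python
# Winners of the 38 elections listed earlier, 2008 first ("D" or "R").
WINNERS = "DRRDDRRRDRRDDRRDDDDDRRRDDRRRRDRDRRRRRR"
LATEST_YEAR = 2008  # the elections in the list are four years apart


def record_since(start_year):
    """Return (democratic_wins, republican_wins) from start_year through 2008."""
    count = (LATEST_YEAR - start_year) // 4 + 1
    window = WINNERS[:count]
    return window.count("D"), window.count("R")


# Each proposed meaning of "modern day" from the debate above.
proposals = [("end of the Cold War", 1992), ("Reagan", 1980),
             ("FDR", 1932), ("post-war era", 1948), ("Lincoln", 1860)]
for label, year in proposals:
    dem, rep = record_since(year)
    print(f"since {year} ({label}): Democrats {dem}, Republicans {rep}")
```

Every cutoff favors the party that proposed it, which is exactly why
it was proposed.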
Categories
Another favorite trick is to manipulate statistics by changing the
categories of the entities being compared. For example, in
the above political example, one might be able to draw advantage by
quoting statistics of "presidents" instead of "presidential
elections." Here's how this works, listing each election winner once
per continuous stay in office:
1- Obama - Democrat
2- Bush 43 - Republican
3- Clinton - Democrat
4- Bush 41 - Republican
5- Reagan - Republican
6- Carter - Democrat
7- Nixon - Republican
8- Johnson - Democrat
9- Kennedy - Democrat
10- Eisenhower - Republican
11- Truman - Democrat
12- F. Roosevelt - Democrat
13- Hoover - Republican
14- Coolidge - Republican
15- Harding - Republican
16- Wilson - Democrat
17- Taft - Republican
18- T. Roosevelt - Republican
19- McKinley - Republican
20- Cleveland - Democrat
21- Harrison - Republican
22- Cleveland - Democrat
23- Garfield - Republican
24- Hayes - Republican
25- Grant - Republican
26- Lincoln - Republican
Let's see how this argument plays out as we increase the sample
size:
- 67% of the last three presidents (2 to 1) have been Democrats.
- 60% of the last five presidents (3 to 2) have been Republicans.
- 58% of the last 12 presidents (7 to 5) have been Democrats.
- 62% of the last 26 presidents (16 to 10) have been Republicans.
Again this results in more silliness. As one might expect,
the "category pool" is running over with an infinite number of
possibilities that might play in the favor of one side or the other (or
both!). These additional categories might include electoral
votes, popular votes, the dominant party in the Congress (or perhaps
just the Senate, or just the House), etc.
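The effect of switching categories shows up clearly when the same
sample size is applied to both lists. The sketch below is illustrative
only; ELECTIONS, PRESIDENTS, and democratic_share are hypothetical
names, and the two strings encode the two lists above from most recent
to oldest.

```python
# Both sequences run from most recent to oldest ("D" or "R").
ELECTIONS = "DRRDDRRRDRRDDRRDDDDDRRRDDRRRRDRDRRRRRR"  # 38 elections, 2008 first
PRESIDENTS = "DRDRRDRDDRDDRRRDRRRDRDRRRR"             # 26 presidents, Obama first


def democratic_share(sequence, last_n):
    """Percentage of the last_n entries in the sequence that are Democrats."""
    window = sequence[:last_n]
    return 100 * window.count("D") / len(window)


for n in (3, 5, 12, 26):
    by_election = democratic_share(ELECTIONS, n)
    by_president = democratic_share(PRESIDENTS, n)
    print(f"last {n:>2}: Democrats won {by_election:.1f}% of elections "
          f"and were {by_president:.1f}% of presidents")
```

For the last five, Democrats hold the majority of elections but the
minority of presidents; for the last 12, it is the other way around.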
One More
For an additional illustration of how sample size can be an issue,
consider the above statistic that 58% of the last 12 presidents (7 to
5) have been Democrats. Whenever we hear a statistic like this, we can
bet that the result would have been less convincing if the sample size
had been increased by just one. Instead of quoting the percentage for
only the last 12 presidents, look what happens as we increase the
sample size:
- 54% of the last 13 presidents (7 to 6) have been Democrats.
- 50% of the last 14 presidents (7 to 7) have been Democrats.
- 47% of the last 15 presidents (7 to 8) have been Democrats.
Apples-to-Apples
We're all familiar with the old adage about not being able to compare
apples to oranges. In its simplest form, the analogy goes
like this: Which are the better-tasting apples: the
apples in the first box, or the oranges in the second box?
Obviously, the apples in the first box are probably the better tasting
apples, because the oranges in the second box don't taste much at all
like apples--good or bad. Even if the oranges are fresh and
sweet, and the apples are rotten, the apples are probably still
"better-tasting apples" than the oranges, simply because the oranges
are not apples at all.
I have found that this deceptive technique seems to be particularly
useful in the business world. I am amused almost every time I
read a quarterly earnings report from a publicly traded
company. The company always wants to make the report sound as
promising as possible, but sometimes the reporter covering the story
(who was fired by this company in the past) would like to make the
report sound as dismal as possible, as would the stock analyst, who
hopes to look like a hero since he didn't recommend that company's
stock. Here's how this sometimes proceeds:
- The company reports strong earnings (profit) of $1.00 per share, on
revenue of $1B, signifying year-over-year growth of 12%; all-in-all, a
very respectable report of a growing and money-making company.
- The reporter's headline reads, "Company's Profit Plunges
50%."
Well, this is somewhat factual, because the company's profit for the
same quarter in the previous year was $1.90 per share, so this
period's $1.00 is 47% less. However, what this reporter knew, but
didn't report, was that the company spent $1.10 per share on an
acquisition that will make this company much more competitive, and
increase profits in as little as one year from now. Without this
acquisition, profits would have been $2.10 per share, an increase of
about 10.5% over the same quarter in the previous year.
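The two competing stories come from the same percent-change formula
applied to different numbers. This is a small sketch under the figures
given above; the function and constant names are invented for the
example.

```python
def percent_change(old, new):
    """Percentage change from old to new (negative means a decline)."""
    return 100 * (new - old) / old


PRIOR_YEAR_EPS = 1.90    # profit per share, same quarter last year
REPORTED_EPS = 1.00      # profit per share as reported this quarter
ACQUISITION_COST = 1.10  # acquisition spending per share this quarter

as_reported = percent_change(PRIOR_YEAR_EPS, REPORTED_EPS)
excluding_acquisition = percent_change(PRIOR_YEAR_EPS,
                                       REPORTED_EPS + ACQUISITION_COST)

print(f"as reported:           {as_reported:+.1f}%")
print(f"excluding acquisition: {excluding_acquisition:+.1f}%")
```

The reporter quotes the first line and the company would rather quote
the second; neither number is invented.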
Again, there are many categories that can be manipulated in this
analogy as well, including profit, revenue, growth, forecast,
impact of foreign currency, etc.
Exaggeration / Embellishment
There are actually a couple of other deceptive techniques in this
example as well. The first one is embellishment.
This was done when the reporter embellished his story with a falsely
negative slant by not comparing apples-to-apples, by reporting a
positive accomplishment in a negative light, and by failing to report
any of the many other positive accomplishments.
Exaggeration
The second additional deceptive technique is that of
exaggeration. This was done when the reporter cited that
profits fell by 50% when they actually fell only 47%. Reporters
can really make an impression upon their readers when a key numerical
boundary, such as the 50% barrier, can be crossed; i.e., why
report a number in the 40s when you can report one in the
50s?
Expectations
Continuing with the above example from the business world, of course
the people who are buying and selling that company's stock are fully
aware of the acquisition and its benefits. So, they tend to
ignore the deceitful reporter (and stock analyst). Instead,
they closely monitor the report relative to "expectations";
i.e., what a bunch of smart boys on
Wall Street expected the company to earn. If the average
expectations were that the company would report a profit of $1.02, then
the news of a profit of $1.00 is likely to be taken as disappointing
news. As a result, logic is overcome by the masses, and a
good company with good top-line revenue, good bottom-line profit, and
an improved outlook for the future sees the price of its stock drop by
20% as soon as this news is announced, and it is likely not to recover
this loss for years, despite the increased profits coming only one year
down the road.
Changing the Rules
Another favorite deceptive practice is to change the rules.
Periodically, when a company reports earnings, they will simultaneously
announce that they have changed accounting methods and practices since
their last quarterly statement. However, no matter how hard
they try to convince the public that this is a good thing, their stock
often gets pummeled, simply
because even the Wall Street experts have trouble figuring out all of
the implications of the accounting change.
Risk Factors
In the book, Worried Sick, Nortin M. Hadler points out some significant
techniques of deception involving the way risk factors are
discussed. One of the primary reasons that doctors treat high
cholesterol so aggressively with statin drugs is because of a study
called the West of Scotland Study. This five-year study of
men with high cholesterol was published in 1995. It found
that 38 of 3,302 patients (1.2%) who took a statin drug subsequently
had fatal heart attacks, compared to 52 of 3,293 patients (1.7%) who
took a placebo. Well, there are two
distinctly different ways that these results can be presented--using
absolute risk, or using relative risk:
Absolute Risk
The absolute difference between the two groups of patients is 0.5%
(1.7% - 1.2%). This absolute risk reduction implies that the statin
drug reduced the risk of a fatal heart attack by 0.5 percentage points.
Margin of Error
First of all, this 0.5% difference is within the margin of
error. We have all seen polls where the results were not
definitive because they were within the margin of error (usually +/-
2%). This suggests that the 0.5% difference is not
statistically significant, and it would be premature to
place a high degree of confidence in the results.
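The poll-style margin of error is only an informal yardstick here. One
common way to check a difference like this is a two-proportion z-test,
which this article does not itself use; the sketch below swaps it in
purely as an illustration. It works from the raw patient counts quoted
above, and the function name is made up for the example.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Return (z, two_sided_p) for the difference between two proportions.

    Uses the pooled standard error and a normal approximation.
    """
    rate_a, rate_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value


# Fatal heart attacks: 38 of 3,302 on the statin vs. 52 of 3,293 on placebo.
z, p_value = two_proportion_z_test(38, 3302, 52, 3293)
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
```

By the usual convention, a p-value above 0.05 is read as "not
statistically significant," and the value computed here lands above
that threshold.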
Relative Risk Reduction
The other way to interpret these results is to consider the relative
risk. This relative risk reduction would be the percentage
that the risk would be reduced, calculated like this:
0.5% / 1.7% = 29%
So, which of the following sounds better?
- The effect of the statin drug was statistically
insignificant.
or
- Those taking the statin drug were 29% less likely to have a fatal
heart attack.
Well, the drug companies selling the statin drugs would certainly use
the latter for marketing purposes. However, the objective
reader might well question the supposed benefit of 5 fewer heart
attacks in 1,000 men, especially if it meant exposing many "well" men
to unnecessary drugs and surgeries.
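The two presentations are just two ways of dividing the same pair of
numbers. The following sketch uses the percentages exactly as quoted
above (it does not recompute them from the patient counts), and its
constant and variable names are invented for the illustration.

```python
# Fatal heart attack rates quoted above, in percent.
PLACEBO_RATE = 1.7
STATIN_RATE = 1.2

absolute_reduction = PLACEBO_RATE - STATIN_RATE               # percentage points
relative_reduction = 100 * absolute_reduction / PLACEBO_RATE  # percent of baseline
fewer_per_1000 = absolute_reduction * 10                      # 1 point = 10 per 1,000

print(f"absolute risk reduction: {absolute_reduction:.1f} percentage points")
print(f"relative risk reduction: {relative_reduction:.0f}%")
print(f"fewer fatal heart attacks per 1,000 men: {fewer_per_1000:.0f}")
```

The relative figure looks large only because the baseline risk it is
divided by is small.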
Bias
The interpretation of statistics is often subject to bias, whether
intended or unintended, as with the drug companies noted above. I
apologize, but the above example is itself biased, because of the
particular statistics that I chose as an illustration. I
could have used the statistics for the men who suffered nonfatal heart
attacks: The study found that 143 of 3,302 patients (4.6%)
who took a statin drug had nonfatal heart attacks, compared to 204 of 3,293
patients (6.5%) who took a placebo. This absolute risk reduction of
1.9% would suggest that 19 of 1,000 men would benefit from the statin
drugs, instead of only 5 of 1,000 (although still within the margin of
error).
Hype
Another deceptive technique used to interpret statistics is that of
hype--creating an excitement, or a call to action. We all
remember how the swine flu "pandemic" was initially hyped by the
statistics of those in Mexico. Schools were closed, and Vice
President Biden even questioned the safety of riding on an
airplane. Then, one week later, the panic was canceled and
schools reopened. Suddenly, of the few, yet tragic, deaths
attributed to the swine flu in the U.S., a major percentage of them had
to be qualified with statements such as, "... but the victim lived next
to the Mexican border, and already had many chronic health
problems."
The Great Deceiver
The Bible says that Satan is the great deceiver (2 Corinthians 11:3,
Revelation 13:14, Revelation 20:8, and Revelation 20:10).
Statistics can also be deceiving, and they often are.
Whenever we hear statistics quoted in order to promote a cause, we
should check them out carefully.
Owen Weber 2009