How to Lie with Statistics
Chad Orzel
Physics and Astronomy
10/5/04

What’s This All About?
Statistics are commonly used to deceive
Technically true, but deceptive
Preys on fear of numbers
“Math is hard!” --Barbie
False impression of accuracy
“Figures never lie, but liars figure.”
Need to know how to lie with statistics, to keep from being lied to with statistics.
“There are three kinds of lies: Lies, Damned Lies, and Statistics.” --attributed to Benjamin Disraeli

Ways to Lie to Voters
0) Fabrication
Just make things up…
Can be very effective:
Lyndon Johnson: “Make the son of a bitch deny it.”
Swift Boat Veterans for “Truth”
Not what we’re talking about today
Talking about ways to say things that are true, but misleading…

Example:
A typical person in this class:
1) Is Male
2) Plans to Vote for Kerry
3) Has two siblings
4) Is 26 years old
5) Made $18,000 last year
All true statements, based on survey results!

Ways to Lie to Voters
1) Omission
Leave Things Out
Previous slide: What does “typical” mean?
Specify what kind of average you’re using:
Mean: Add ‘em up, divide by total number
Median: value in middle (half higher, half lower)
Not the same

Mean and Median
Nearly identical for random variables
“Normal Distribution,” “Bell Curve”
Very different for skewed data:
Mean affected by extreme values
Diverse populations
Median less sensitive to extremes
Usually better for economic data
[Figure: Physics Data. Histogram of # Measurements vs. Height. Mean: 190.1, Median: 190]

Example 1: Siblings
Most people have 0, 1, 2
Few people with huge families
Pull mean up
Limited range
Can’t have < 0 siblings
[Figure: Sibling Distribution. # Respondents vs. Number of Siblings, with Median and Mean marked]

Example 2: Age
Diverse Population Problem
Students, mostly 19-22
(Much) older faculty
Nobody at mean age
Very bad description
[Figure: Age Distribution. # of Respondents vs. Age, with Median and Mean marked]

Example 3: Income
Sort of silly, really…
Usually where this lie comes up:
“The average family will save $2,000 under my tax plan…”
What kind of average?
Remember: The mean includes Bill Gates…
[Figure: Income Distribution. Number of Respondents vs. Income ($1,000's), with Median and Mean marked]

Campaign Examples

Bush Tax Cut
“111 million taxpayers will save, on average, $1,586 off their taxes.”
Facts:
1) 25% receive NO cut (drops mean to $1,217)
2) Median cut: $470
Half of all taxpayers get $470 or less
(http://www.factcheck.org/article.aspx?docID=145)

Kerry’s $9,000
“We're told that jobs that pay $9,000 less than the jobs that have been lost is the best that we can do.”
Fact: Based on comparison of broad categories
Lost: Manufacturing jobs
Gained: “Service” jobs
Includes burger flippers
(http://www.factcheck.org/article.aspx?docID=228)

Ways to Lie to Voters
1) Omission (Continued)
The Fifth Dentist Problem
“Four out of five dentists surveyed…”
How many dentists total?
5 total: not a good sample
Leave out the sample size, and you can prove just about anything…
“Four out of five cards drawn from this deck were black!”

Campaign Example
“And that's what people are seeing now is happening in Afghanistan. Ten million citizens have registered to vote. It's a phenomenal statistic. That if given a chance to be free they will show up at the polls. Forty-one percent of those 10 million are women.” --G.W. Bush, 1st Presidential Debate
• Ratio of men registered to women registered: 58.6 to 41.4 percent
• Estimated eligible voting population in Afghanistan: 9.8 million
• Registered voters in Afghanistan, as of August 21: 10.3 million
• Reported number of registration cards a single Afghan has been able to obtain: from 2 to 40
• Percent of the estimated eligible male population that is now registered to vote: 120 percent
• Number of provinces that are over-registered: 13 (out of 30)
• Number of provinces in which registered voters exceed the population by 40% or more: 4
(http://www.tcf.org/afghanistanwatch/main.htm#voterregistrationfraud)

Ways to Lie to Voters
2) Exaggeration
Make Something of Nothing
Fear of big numbers:
“My opponent wants to spend $2 million on [something]…”
Sounds bad…
$2 million = 1/1,000,000th of the budget = chump change
Need to put big numbers in context

Example: Guys Rule!
More Survey Data…
Nothing false in graph
Creates false impression
Scale axes to blow up small differences
[Figure: Gender Distribution. % of Respondents, Male vs. Female, vertical axis running only from 44 to 56]

Example: Guys Rule!
Honest presentation:
Bars same width, color
Full scale shown
Slightly more male students
Not that big a difference
[Figure: Gender Distribution. % of Respondents, Male vs. Female, vertical axis running from 0 to 100]

Example
(http://www.pollkatz.homestead.com/)

Campaign Example
“According to the first post-debate poll, from Newsweek, John Kerry leads President Bush by a margin of 49% to 46%.
Put Nader in the mix and Kerry's margin drops from 3 to 2.” --Josh Marshall, Talking Points Memo (weblog)
“In the first national telephone poll using a fresh sample, NEWSWEEK found the race now statistically tied among all registered voters, 47 percent of whom say they would vote for Kerry and 45 percent for George W. Bush in a three-way race.” --MSNBC
(1,013 voters surveyed, Margin of Error +/- 4%)
What does margin of error really mean?
(http://www.washingtonmonthly.com/archives/individual/2004_08/004536.php)

Other Ways to Lie
3) Misdirection
True, but Irrelevant
Quote impressive statistics about side issues
Creates false impression of real support
4) False Correlation
Post Hoc Fallacy
Homicide rates peak in summer
Ice cream sales peak in summer
Therefore, ice cream leads to murder?
Correlation is not Causation

What to Do?
Questions to ask about any statistic:
1) Who created it? Do they have an agenda?
2) Why was it created? Research or politics?
3) How was it created? Methodology

What to Do? (continued)
Questions to ask about any statistic:
4) What’s missing? Is there hidden context?
5) Is it relevant? Avoid misdirection
6) Does it make sense? If it sounds ridiculous, it probably is…
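
Worked example: checking the Newsweek poll
As one way of applying "Does it make sense?" to the Newsweek poll quoted earlier (47% Kerry, 45% Bush, 1,013 voters surveyed, margin of error +/- 4%), here is a minimal Python sketch. It is not from the original slides. It assumes the textbook formula for a simple random sample at 95% confidence, which is probably not exactly how Newsweek arrived at its figure, and the function names are made up for illustration.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Textbook margin of error for a share p measured in a simple random
    sample of n people. z=1.96 corresponds to 95% confidence. (Assumption:
    real polls use weighting, so their published margins can differ.)"""
    return z * math.sqrt(p * (1.0 - p) / n)


def interval(share, moe):
    """Range of values consistent with a reported share, given a margin."""
    return (share - moe, share + moe)


def overlaps(a, b):
    """True if two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


N_SURVEYED = 1013     # from the slide
REPORTED_MOE = 0.04   # "+/- 4%" as reported on the slide

textbook_moe = margin_of_error(N_SURVEYED)
kerry = interval(0.47, REPORTED_MOE)
bush = interval(0.45, REPORTED_MOE)
tied = overlaps(kerry, bush)

print(f"Textbook margin for n={N_SURVEYED}: +/- {textbook_moe:.1%}")
print(f"Kerry: {kerry[0]:.0%} to {kerry[1]:.0%}")
print(f"Bush:  {bush[0]:.0%} to {bush[1]:.0%}")
print("Ranges overlap (statistically tied)" if tied else "Ranges do not overlap")
```

Under these assumptions the textbook margin comes out a little above 3 points, somewhat smaller than the reported 4, and with either value the two candidates' ranges overlap. That is the sense in which a 2- or 3-point "lead" is a statistical tie. Comparing ranges like this is only a rough check; the uncertainty on the gap between two candidates is larger than the margin on either share alone.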