Tag Archives: Mathematical Malpractice

Mathematical Malpractice Watch: Inequality

Politico has a silly article up from Michael Lind, claiming that the South skews all the statistics for the country and that, without the South, our country would be in awesome shape. A typical example:

Economic inequality? Apart from California and New York, where statistics reflect the wealth of Wall Street, Hollywood and Silicon Valley, the South is the region with the greatest income inequality. Southern exceptionalism has helped to ensure that the American Dream is more likely to be realized in the Old World than in the New.

Yes … when we eliminate 60 million people from consideration, the statistics look good for our side! And let’s just ignore that whole Texas oil thing.

The thing is, the difference in inequality is something that you can measure. I looked at a population-weighted mean Gini index for various subsets of states. States that voted for Romney in 2012 have a weighted Gini index of .460. States that voted for Obama are way more equal at uh … um, actually, they’re higher at .464.

If you restrict that analysis to Southern States, you do get a higher index of .467. But then again, if you restrict that analysis to coastal blue states, you get .469.

God damn those coastal elites, ruining the country!

Crime is a little different, being higher in the red states (384 per 100,000 vs. 359 per 100,000) and higher in the South specifically (402), but the difference is not that dramatic and even the blue state crime rates would be very high (although both would be lower than violent crime rates in the UK). And all regions have seen a huge drop in violent crime rates over the last two decades.

He goes on to point out that the South is much more religious (which … is a bad thing?) and has a higher rate of gun ownership (although even the North would still have one of the highest rates in the world. And again … gun ownership is not ipso facto a bad thing). He is right that the South has more executions but then fumbles the ball again, arguing that Obama’s poor showing in Southern states (except for Virginia, North Carolina and Florida, plus strong showings in Missouri and Georgia) proves racial animus and not that the South votes Republican no matter who is running.

He also ignores basically anything that might favor the South, such as 40% of our military hailing from the South. Or that most of the job growth in Obama’s presidency has been in the South (specifically in Texas). Or the fact that, over the last few decades, the American people have been voting with their feet, moving South by the millions.

South-bashing is in these days, I guess, so I can’t blame Lind for trying. Better luck next time.

Mathematical Malpractice Watch: IUDs and Teens

Right now, the liberal blogosphere is erupting over Republican plans to not fund a program to give free IUDs to low income women:

Republican legislators in Colorado will not authorize funding for a program that gives free IUDs to low-income women — an effort that many believe was responsible for hugely driving down teen births.

Colorado has recently experienced a stunning decline in its teen birth rate. Between 2007 and 2012, federal data shows that births declined 40 percent — faster than any other state in the country.

State officials attributed part of this success to the Colorado Family Planning Initiative, which provided free IUDs to low-income women seen at 68 family planning clinics across the state. Last year, state officials estimated that young women served by those family planning clinics accounted for about three-fourths of the overall decline in Colorado’s teen birth rate.

I disagree with the Republicans on this. But the idea that the free IUD program cut Colorado’s teen birth rate by 40% or 3/4 of 40% or anywhere close to 40% is high-test nonsense.

Here is the data from the CDC on teen birth rates. From the first graph, you’ll see that teen birth rates have been steadily falling for seventy years. Like most positive social trends, it has many, um, parents, each of which is flogged by whomever supports that particular issue. Availability of contraception has certainly played a role. The legalization of abortion played a role (although abortion rates peaked in the early 80’s). As social and professional barriers have fallen, many more women are delaying pregnancy for college and jobs. And there is some evidence that teenagers are waiting longer to have sex (that would be the dreaded “abstinence”).

Since 2007, however, the teen birth rate has fallen off a cliff. But not just in Colorado. It’s fallen everywhere, by an average of 30%. If anything, it’s fallen faster in red states than in blue ones (see Figure 9 of the CDC’s report). Colorado has seen the steepest decline (39%), but just behind it are the red states of Arizona (37%), Georgia (37%), North Carolina (34%), Utah (34%) and Virginia (33%).

Is Colorado’s IUD program so awesome that it dropped the teen birth rate for the entire country?

Given the extent of the program and Colorado having the largest reduction, it’s very probable that the IUD program did play a role here. But I would ballpark it at maybe 10% at the most.1 That’s not nothing and it’s probably worth continuing the program. But let’s not pretend the reduction is due only to that.

So what is causing the large reduction? Availability of contraception is playing a role, yes, but there’s something else going on. Birth rates have fallen for all women since 2007, not just teenagers. I don’t think it’s coincidence (and neither does the CDC) that the teen birth rate plunged when we hit the worst recession since the Great Depression. If you look at historical birth rates, you’ll see a similar plunge during the 1930’s. And that was long before almost the entirety of modern birth control, least of all free birth control.

I think that’s the story here. Colorado’s program was fortuitously timed in that regard and there is likely some synergy between the economic downturn and the IUD program (i.e., the program kicked in right when a bunch of women were more eager for birth control).

One of the difficult things about Mathematical Malpractice Watch is that I frequently end up attacking people I fundamentally agree with. I think Colorado should extend their IUD program (although I’m old enough to remember, in the 90’s, when Republican governors offering incentives for low-income women to use Norplant was denounced as eugenics). But the claim that it has produced a “huge” reduction in the teen birth rate is just not true.

1. Actually, there is a chance that the effect is 0%. Colorado had the sharpest reduction in teen pregnancy rates. It’s easy to go in, post facto, and identify a pet policy to pin it on while ignoring the thousand other factors occurring in fifty states. It’s called the Texas Sharpshooter Fallacy. Colorado might just be a statistical outlier and we’re crediting a policy for that outlierness because we like the policy. Colorado’s barely two standard deviations from the mean. I think it’s likely the IUD fund had an effect, but I’d be hard-pressed to prove it statistically.

How Many Women?

Campus sexual violence continues to be a topic of discussion, as it should be. I have a post going up on the other site about the kangaroo court system that calls itself campus justice.

But in the course of this discussion, a bunch of statistical BS has emerged. This centers on just how common sexual violence is on college campuses, with estimates ranging from the one-in-five stat that has been touted, in various forms, since the 1980’s, to a 0.2 percent rate touted in a recent op-ed.

Let’s tackle that last one first.

According to the FBI “[t]he rate of forcible rapes in 2012 was estimated at 52.9 per 100,000 female inhabitants.”

Assuming that all American women are uniformly at risk, this means the average American woman has a 0.0529 percent chance of being raped each year, or a 99.9471 percent chance of not being raped each year. That means the probability the average American woman is never raped over a 50-year period is 97.4 percent (0.999471 raised to the power 50). Over 4 years of college, it is 99.8 percent.

Thus the probability that an American woman is raped in her lifetime is 2.6 percent and in college 0.2 percent — 5 to 100 times less than the estimates broadcast by the media and public officials.

This estimate is way too low. It is based on taking one number and applying high school math to it. It misses the mark because it uses the wrong numbers and some poor assumptions.

First of all, the FBI’s stats are on documented forcible rape; they do not account for under-reporting and do not include sexual assault. The better comparison is the National Crime Victimization Survey, which estimates about 300,000 rapes or sexual assaults in 2013 for an incidence rate of 1.1 per thousand. But even that number needs some correction because about 2/3 of sexual violence is visited upon women between the ages of 12 and 30 and about a third among college-age women. The NCVS rate indicates about a 10% lifetime risk or about a 3% college-age risk for American women. This is lower than the 1-in-5 stat but much higher than 1-in-500.
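
To make the arithmetic concrete, here is a minimal sketch of both back-of-envelope calculations. The per-year rates are the ones quoted above; the 80-year lifetime window and the use of the one-third college-age share are my own illustrative assumptions, so the outputs land near, not exactly on, the figures in the text.

    # Rough sketch of the two risk calculations. Rates come from the text above;
    # the 80-year lifetime window and the one-third college-age share are
    # illustrative assumptions, not anyone's published model.

    def p_at_least_once(annual_rate, years):
        # Chance of at least one incident, assuming a constant independent annual rate
        return 1.0 - (1.0 - annual_rate) ** years

    # Op-ed version: FBI forcible-rape rate of 52.9 per 100,000 women per year
    fbi = 52.9 / 100_000
    print(f"op-ed lifetime (50 yr): {p_at_least_once(fbi, 50):.1%}")  # ~2.6%
    print(f"op-ed college (4 yr):   {p_at_least_once(fbi, 4):.2%}")   # ~0.2%

    # NCVS version: ~1.1 rapes/sexual assaults per 1,000 women per year
    ncvs = 1.1 / 1_000
    lifetime = p_at_least_once(ncvs, 80)   # roughly 8%, i.e. "about 10%"
    college = (ncvs * 80) / 3              # ~1/3 of incidents hit college-age women
                                           # (expected incidents ~ probability when small)
    print(f"NCVS lifetime risk:    {lifetime:.1%}")
    print(f"NCVS college-age risk: {college:.1%}")  # ~3%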

(The NCVS survey shows a jump in sexual violence in the 2000’s. That’s not because sexual violence surged; it’s because they changed their methodology, which increased their estimates by about 20%.)

So what about 1-in-5? I’ve talked about this before, but it’s worth going over again: the one-in-five stat is almost certainly a wild overestimate:

The statistic comes from a 2007 Campus Sexual Assault study conducted by the National Institute of Justice, a division of the Justice Department. The researchers made clear that the study consisted of students from just two universities, but some politicians ignored that for their talking point, choosing instead to apply the small sample across all U.S. college campuses.

The CSA study was actually an online survey that took 15 minutes to complete, and the 5,446 undergraduate women who participated were provided a $10 Amazon gift card. Men participated too, but their answers weren’t included in the one-in-five statistic.

If 5,446 sounds like a high number, it’s not — the researchers acknowledged that it was actually a low response rate.

But a lot of those responses have to do with how the questions were worded. For example, the CSA study asked women whether they had sexual contact with someone while they were “unable to provide consent or stop what was happening because you were passed out, drugged, drunk, incapacitated or asleep?”

The survey also asked the same question “about events that you think (but are not certain) happened.”

That’s open to a lot of interpretation, as exemplified by a 2010 survey conducted by the U.S. Centers for Disease Control and Prevention, which found similar results.

I’ve talked about the CDC study before and its deep flaws. Schow points out that the victimization rate they are claiming is way higher than the National Crime Victimization Survey (NCVS), the FBI and the Rape, Abuse and Incest National Network (RAINN) estimates. All three of those organizations use much more rigorous data collection methods. NCVS does interviews and asks the question straight up: have you been raped or sexually assaulted? I would trust the research methods of these organizations, which have been doing this for decades, over a web survey of two colleges.

Another survey recently emerged from MIT which claimed 1-in-6 women are sexually assaulted. But not only does this suffer from the same flaws as the CSA study (a web survey with voluntary participation), its own numbers don’t even support the headline claim:

When it comes to experiences of sexual assault since starting at MIT:

  • 1 in 20 female undergraduates, 1 in 100 female graduate students, and zero male students reported being the victim of forced sexual penetration
  • 3 percent of female undergraduates, 1 percent of male undergraduates, and 1 percent of female grad students reported being forced to perform oral sex
  • 15 percent of female undergraduates, 4 percent of male undergraduates, 4 percent of female graduate students, and 1 percent of male graduate students reported having experienced “unwanted sexual touching or kissing”
    All of these experiences are lumped together under the school’s definition of sexual assault.

    When students were asked to define their own experiences, 10 percent of female undergraduates, 2 percent of male undergraduates, three percent of female graduate students, and 1 percent of male graduate students said they had been sexually assaulted since coming to MIT. One percent of female graduate students, one percent of male undergraduates, and 5 percent of female undergraduates said they had been raped.

    Note that even with a biased study, the result is 1-in-10, not 1-in-5 or 1-in-6.

    OK, so web surveys are a bad way to do this. What is a good way? Mark Perry points out that the one-in-five stat is inconsistent with another number claimed by advocates of new policies: a reporting rate of 12%. If you assume a reporting rate near that and use the actual number of reported assaults on major campuses, you get a rate of around 3%.

    Hmmm.

    Further research is consistent with this rate. For example, here, we see that UT Austin has 21 reported incidents of sexual violence. That’s one in a thousand enrolled women. Texas A&M reported nine, one in three thousand women. Houston reported 11, one in 2000 women. If we are to believe the 1-in-5 stat, that’s a reporting rate of half a percent. A reporting rate of 10%, which is what most people accept, would mean … a 3-5% risk for five years of enrollment.
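
    A quick sketch of that arithmetic, using the UT Austin numbers above. The roughly 21,000-women enrollment figure is only implied by the text’s “one in a thousand,” and the reporting rates are the ones discussed above, so treat this as an illustration rather than a precise estimate.

        # Back-of-envelope check of the reporting-rate argument, using the UT Austin
        # numbers quoted above. The enrollment figure is implied by "one in a thousand."

        reported_per_year = 21          # reported incidents of sexual violence
        women_enrolled = 21 * 1000      # implied by "one in a thousand enrolled women"
        years_enrolled = 5              # the text uses five years of enrollment

        for reporting_rate in (0.10, 0.12):
            true_per_year = reported_per_year / reporting_rate
            risk = true_per_year * years_enrolled / women_enrolled
            print(f"{reporting_rate:.0%} reporting -> ~{risk:.0%} risk over {years_enrolled} years")
        # ~5% at a 10% reporting rate, ~4% at 12% -- the 3-5% range in the text.

        # And the reporting rate implied by 1-in-5 (one year of reports against the
        # cohort's total victimizations, as the text frames it):
        print(f"implied by 1-in-5: {reported_per_year / (0.20 * women_enrolled):.1%}")  # ~0.5%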

    So … Mark Perry finds 3%. Texas schools show 3-5%. NCVS and RAINN stats indicate 2-5%. Basically, any time we use actual numbers based on objectives surveys, we find the number of women who are in danger of sexual violence during their time on campus is 1-in-20, not 1-in-5.

    One other reason to disbelieve the 1-in-5 stat. Sexual violence in our society is down — way down. According to the Bureau of Justice Statistics, rape has fallen from 2.5 per 1000 to 0.5 per thousand, an 80% decline. The FBI’s data show a decline from 40 to about 25 per hundred thousand, a 40% decline (they don’t account for reporting rate, which is likely to have risen). RAINN estimates that the rate has fallen 50% in just the last twenty years. That means 10 million fewer sexual assaults.

    Yet, for some reason, sexual assault rates on campus have not fallen, at least according to the favored research. They were claiming 1-in-5 in the 80’s and they are claiming 1-in-5 now. The sexual violence rate on campus might fall a little more slowly than the overall society because campus populations aren’t aging the way the general population is and sexual violence victims are mostly under 30. But it defies belief that the huge dramatic drops in violence and sexual violence everywhere in the world would somehow not be reflected on college campuses.

    Interestingly, the decline in sexual violence does appear if you polish the wax fruit a bit. The seminal Koss study of the 1980’s claimed that one-in-four women were assaulted or raped on college campuses. As Christina Hoff Sommers and Maggie McNeill pointed out, the actual rate was something like 8%. A current rate of 3-5% would indicate that sexual violence on campus has dropped in proportion to that of sexual violence in the broader society.

    It goes without saying, of course, that 3-5% of women experiencing sexual violence during their time at college is 3-5% too many. As institutions of enlightenment (supposedly), our college campuses should be safer than the rest of society. I support efforts to clamp down on campus sexual violence, although not in the form that it is currently taking, which I will address on the other site.

    But the 1-in-5 stat isn’t reality. It’s a poll-tested number. It’s a number picked to be large enough to be scary but not so large as to be unbelievable. It is being used to advance an agenda that I believe will not really address the problem of sexual violence.

    Numbers mean things. As I’ve argued before, if one in five women on college campuses are being sexually assaulted, this suggests a much more radical course of action than one-in-twenty. It would suggest that we should shut down every college in the country since they are the most dangerous places for women in the entire United States. But 1-in-20 suggests that an overhaul of campus judiciary systems, better support for victims and expulsion of serial predators would do a lot to help.

    In other words, let’s keep on with the policies that have dropped sexual violence 50-80% in the last few decades.

    Mathematical Malpractice Watch: Non-Citizen Voters

    Hmmm:

    How many non-citizens participate in U.S. elections? More than 14 percent of non-citizens in both the 2008 and 2010 samples indicated that they were registered to vote. Furthermore, some of these non-citizens voted. Our best guess, based upon extrapolations from the portion of the sample with a verified vote, is that 6.4 percent of non-citizens voted in 2008 and 2.2 percent of non-citizens voted in 2010.

    The authors go on to speculate that non-citizen voting could have been common enough to swing Al Franken’s 2008 election and possibly even North Carolina for Obama in 2008. Non-citizens vote overwhelmingly Democrat.

    I do think there is a point here which is that non-citizens may be voting in our elections, which they are not supposed to do. Interestingly, photo ID — the current policy favored by Republicans — would do little to address this as most of the illegal voters had ID. The real solution … to all our voting problems … would be to create a national voter registration database that states could easily consult to verify someone’s identity, citizenship, residence and eligibility status. But this would be expensive, might not work and would very likely require a national ID card, which many people vehemently oppose.

    However …

    The sample is very small: 21 non-citizens voting in 2008 and 8 in 2010. This is intriguing but hardly indicative. It could be a minor statistical blip. And there have been critiques that have pointed out that this is based on a … wait for it … web survey. So the results are highly suspect. It’s likely that a fair number of these non-citizen voters are, in fact, non-correctly-filling-out-a-web-survey voters.

    To their credit, the authors acknowledge this and say that while it is possible non-citizens swung the Franken Election (only 0.65% would have had to vote), speculating on other races is … well, speculation.

    So far, so good.

    The problem is how the blogosphere is reacting to it. Conservative sites are naturally jumping on this while liberals are talking about the small number statistics. But those liberal sites are happy to tout small numbers when it’s, say, a supposed rise in mass shootings.

    In general, I lean toward the conservatives on this. While I don’t think voter fraud is occurring on the massive scale they presume, I do think it’s more common than the single-digit or double-digit numbers liberals like to hawk. Those numbers are themselves based on small studies in environments where voter ID is not required. We know how many people have been caught. But assuming that represents the limit of the problem is like assuming the number of speeders on a highway is equal to the number of tickets that are given out. One of the oft-cited studies is from the President’s Commission on Election Administration, which was mostly concerned with expanding access, not tracking down fraud.

    Here’s the thing. While I’m convinced the number of fraudulent votes is low, I note that, every time we discuss this, that number goes up. It used to be a handful. Now it’s a few dozen. This study hints it could be hundreds, possibly thousands. There are 11 million non-citizens living in this country (including my wife). What these researchers are indicating is that, nationally, their study could mean many thousands of extra votes for Democrats. Again, their study is very small and likely subject to significant error (as all web surveys are). It’s also likely the errors bias high. But even if they have overestimated the non-citizen voting by a factor of a hundred, that still means a few thousand incidents of voter fraud. That’s getting to the point where this may be a concern, no?

    Do I think this justifies policy change? I don’t think a web-survey of a few hundred people justifies anything. I do think this indicates the issue should be studied properly and not just dismissed out of hand because only a few dozen fake voters have actually been caught.

    Mother Jones Revisited

    A couple of years ago, Mother Jones did a study of mass shootings which attempted to characterize these awful events. Some of their conclusions were robust — such as the finding that most mass shooters acquire their guns legally. However, their big finding — that mass shootings are on the rise — was highly suspect.

    Recently, they doubled down on this, proclaiming that Harvard researchers have confirmed their analysis1. The researchers use an interval analysis to look at the time differences between mass shootings and claim that the recent run of short intervals proves that the mass shootings have tripled since 2011.2

    Fundamentally, there’s nothing wrong with the article. But practically, there is: they have applied a sophisticated technique to suspect data. This technique does not remove the problems of the original dataset. If anything, it exacerbates them.

    As I noted before, the principal problem with Mother Jones’ claim that mass shootings were increasing was the database. It had a small number of incidents and was based on media reports rather than on taking a complete data set and paring it down to a consistent sample. Incidents were left out or included based on arbitrary criteria. As a result, there may be mass shootings missing from the data, especially in the pre-internet era. This would bias the results.

    And that’s why the interval analysis is problematic. Interval analysis itself is useful. I’ve used it myself on variable stars. But there is one fundamental requirement: you have to have consistent data and you have to account for potential gaps in the data.

    Let’s say, for example, that I use interval analysis on my car-manufacturing company to see if we’re slowing down in our production of cars. That’s a good way of figuring out any problems. But I have to account for the days when the plant is closed and no cars are being made. Another example: let’s say I’m measuring the intervals between brightness peaks of a variable star. It will work well … if I account for those times when the telescope isn’t pointed at the star.

    Their interval analysis assumes that the data are complete. But I find that suspect given the way the data were collected and the huge gaps and massive dispersion of the early intervals. The early data are all over the place, with gaps as long as 500-800 days. Are we to believe that between 1984 and 1987, a time when violent crime was surging, there was only one mass shooting? The more recent data are far more consistent, with no gap greater than 200 days (and note how the data get really consistent when Mother Jones began tracking these events as they happened, rather than relying on archived media reports).

    Note that they also compare this to the average of 172 days. This is the basis of their claim that the rate of mass shootings has “tripled”. But the distribution of gaps is very skewed with a long tail of long intervals. The median gap is 94 days. Using the median would reduce their slew of 14 straight below-average points to 11 below-median points. It would also mean that mass shootings have increased by only 50%. Since 1999, the median is 60 days (and the average 130). Using that would reduce their slew of 14 straight short intervals to four and mean that mass shootings have been basically flat.
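
    Here is a small illustration of why the choice of mean versus median matters for gap data like this. The intervals below are invented stand-ins with a long right tail, not the Mother Jones data; the point is only how a skewed distribution drags the mean upward.

        # Made-up, right-skewed interval data -- not the Mother Jones dataset.
        import statistics

        gaps = [15, 20, 30, 45, 60, 60, 75, 90, 90, 100, 120, 150, 200, 350, 500, 800]

        mean_gap = statistics.mean(gaps)      # pulled upward by the 350/500/800-day tail
        median_gap = statistics.median(gaps)  # the typical gap, insensitive to the tail

        print(f"mean gap:   {mean_gap:.0f} days")    # ~169
        print(f"median gap: {median_gap:.0f} days")  # 90
        print("below mean:  ", sum(g < mean_gap for g in gaps), "of", len(gaps))    # 12 of 16
        print("below median:", sum(g < median_gap for g in gaps), "of", len(gaps))  # 7 of 16
        # With a long right tail, most intervals sit below the mean, so a run of
        # "below average" gaps is much less surprising than it sounds.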

    The analysis I did two years ago was very simplistic — I looked at victims per year. That approach has its flaws but it has one big strength — it is less likely to be fooled by gaps in the data. Huge awful shootings dominate the number of victims and those are unlikely to have been missed in Mother Jones’ sample.

    Here is what you should do if you want to do this study properly. Start with a uniform database of shootings such as those provided by law enforcement agencies. Then go through the incidents, one by one, to see which ones meet your criteria.

    In Jesse Walker’s response to Mother Jones, in which he graciously quotes me at length, he notes that a study like this has been done:

    The best alternative measurement that I’m aware of comes from Grant Duwe, a criminologist at the Minnesota Department of Corrections. His definition of mass public shootings does not make the various one-time exceptions and other jerry-riggings that Siegel criticizes in the Mother Jones list; he simply keeps track of mass shootings that took place in public and were not a byproduct of some other crime, such as a robbery. And rather than beginning with a search of news accounts, with all the gaps and distortions that entails, he starts with the FBI’s Supplementary Homicide Reports to find out when and where mass killings happened, then looks for news reports to fill in the details. According to Duwe, the annual number of mass public shootings declined from 1999 to 2011, spiked in 2012, then regressed to the mean.

    (Walker’s article is one of those “you really should read the whole thing” things.)

    This doesn’t really change anything I said two years ago. In 2012, we had an awful spate of mass shootings. But you can’t draw the kind of conclusions Mother Jones wants to from rare and awful incidents. And it really doesn’t matter what analysis technique you use.


    1. That these researchers are from Harvard is apparently a big deal to Mother Jones. As one of my colleagues used to say, “Well, if Harvard says it, it must be true.”

    2. This is less alarming than it sounds. Even if we take their analysis at face value, we’re talking about six incidents a year instead of two for a total of about 30 extra deaths or about 0.2% of this country’s murder victims or about the same number of people that are crushed to death by their furniture. We’re also talking about two years of data and a dozen total incidents.

    Mathematical Malpractice Watch: Torturing the Data

    There’s been a kerfuffle recently about a supposed CDC whistleblower who has revealed malfeasance in the primary CDC study that refuted the connection between vaccines and autism. Let’s put aside that the now-retracted Lancet study the anti-vaxxers tout as the smoking gun was a complete fraud. Let’s put aside that other studies have reached the same conclusion. Let’s just address the allegations at hand, which include a supposed cover up. These allegations are in a published paper (now under further review) and a truly revolting video from Andrew Wakefield — the disgraced author of the fraudulent Lancet study that set off this mess — that compares this “cover-up” to the Tuskegee experiments.

    According to the whistle-blower, his analysis shows that while most children do not have an increased risk of autism (which, incidentally, discredits Wakefield’s study), black males vaccinated before 36 months show a 240% increased risk (not 340, as has been claimed). You can catch the latest from Orac. Here’s the most important part:

    So is Hooker’s result valid? Was there really a 3.36-fold increased risk for autism in African-American males who received MMR vaccination before the age of 36 months in this dataset? Who knows? Hooker analyzed a dataset collected to be analyzed by a case-control method using a cohort design. Then he did multiple subset analyses, which, of course, are prone to false positives. As we also say, if you slice and dice the evidence more and more finely, eventually you will find apparent correlations that might or might not be real.

    In other words, what he did was slice and dice the sample to see if one of those slices would show a correlation. But by pure chance, one of those slices would show a correlation, even if there wasn’t one. As best illustrated in this cartoon, if you run twenty tests for something that has no correlation, odds are that at least one of those will show a spurious correlation at the 95% confidence level. This is one of the reasons many scientists, especially geneticists, are turning to Bayesian analysis, which can account for this.
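
    A quick simulation makes the point. This is not Hooker’s analysis or the CDC data, just twenty subgroup tests run on pure noise, repeated many times, to show how often at least one of them comes up “significant” at the 95% level.

        # Simulate the multiple-comparisons problem: 20 subgroup tests on data with
        # no real effect, repeated many times. Under the null, a p-value is uniform.
        import random

        random.seed(0)
        N_SUBGROUPS = 20
        N_TRIALS = 2000

        hits = 0
        for _ in range(N_TRIALS):
            pvals = [random.random() for _ in range(N_SUBGROUPS)]
            if min(pvals) < 0.05:
                hits += 1

        print(f"At least one 'significant' subgroup: {hits / N_TRIALS:.0%} of trials")
        # Expected: 1 - 0.95**20, about 64% -- even though nothing real is going on.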

    If you did a study of just a few African-American boys and found a connection between vaccination and autism, it would be the sort of preliminary shaky result you would use to justify looking at a larger sample … such as the full CDC study that the crackpot’s own analysis shows refutes such a connection. To take a large comprehensive study, narrow it down to a small sample and then claim that the results of this small sample override those of the large one is ridiculous. It’s the opposite of how epidemiology works (and there is no suggestion that there is something about African-American males that makes them more susceptible to vaccine-induced autism).

    This sort of ridiculous cherry-picking happens a lot, mostly in political contexts. Education reformers will pore over test results until they find that fifth graders slightly improved their reading scores and claim their reform is working. When the scores revert back the next year, they ignore it. Drug warriors will pore over drug stats and claim that a small drop in heroin use among people in their 20’s indicates that the War on Drugs is finally working. When it reverts back to normal, they ignore it.

    You can’t pick and choose little bits of data to support your theory. You have to be able to account for all of it. And you have to be aware of how often spurious results pop up even in the most objective and well-designed studies, especially when you parse the data finer and finer.

    But the anti-vaxxers don’t care about that. What they care about is proving that evil vaccines and Big Pharma are poisoning us. And however they have to torture the data to get there, that’s what they’ll do.

    Mathematical Malpractice Watch: Hurricanes

    There’s a new paper out that claims that hurricanes with female names tend to be deadlier than ones with male names based on hurricane data going back to 1950. They attribute this to gender bias, the idea that people don’t take hurricanes with female-names seriously.

    No, this is not The Onion.

    I immediately suspected a bias. For one thing, even with their database, we’re talking about 92 events, many of which killed zero people. More important, all hurricanes had female names until 1979. What else was true before 1979? We had a lot less advanced warning of hurricanes. In fact, if you look up the deadliest hurricanes in history, they are all either from times before we named them or when hurricanes all had female names. In other words, they may just be measuring the decline in hurricane deadliness.

    Now it’s possible that the authors use some sophisticated model that also accounts for hurricane strength. If so, that might mitigate my analysis. But I’m dubious. I downloaded their spreadsheet, which is available from the journal website. Here is what I found:

    Hurricanes before 1979 averaged 27 people killed.

    Hurricanes since 1979 averaged 16 people killed.

    Hurricanes since 1979 with male names averaged … 16 people killed.

    Hurricanes since 1979 with female names averaged … 16 people killed.

    Maybe I’m missing something. How did this get past a referee?

    Update: Ed Yong raises similar points here. The authors say that cutting the sample at 1979 made the numbers too small, so they instead used an index of how feminine or masculine the names were. I find that dubious when a plain and simple average will give you an answer. Moreover, they offer this qualifier in the comments:

    What’s more, looking only at severe hurricanes that hit in 1979 and afterwards (those above $1.65B median damage), 16 male-named hurricane each caused 23 deaths on average whereas 14 female-named hurricanes each caused 29 deaths on average. This is looking at male/female as a simple binary category in the years since the names started alternating. So even in that shorter time window since 1979, severe female-named storms killed more people than did severe male-named storms.

    You be the judge. I average 54 post-1978 storms totaling 1,200 deaths and get even numbers. They narrow it to 30 totaling 800 deaths and claim a bias based on 84 excess deaths. That really comes across as stretching to make a point.

    Update: My friend Peter Yoachim did a K-S test of the data and found a 97% chance that the male- and female-named hurricanes were drawn from the same distribution. This is a standard test of the null hypothesis, one the authors apparently never ran. Ridiculous.
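
    For reference, here is what that two-sample K-S test looks like in practice, using SciPy. The death counts below are placeholders, not the paper’s spreadsheet data, so the exact numbers are purely illustrative.

        # Two-sample Kolmogorov-Smirnov test, as described above. The two lists are
        # hypothetical stand-ins for the per-storm death counts, not the actual data.
        from scipy.stats import ks_2samp

        male_named_deaths = [0, 0, 1, 1, 2, 3, 5, 5, 8, 15, 21, 62]     # hypothetical
        female_named_deaths = [0, 1, 1, 2, 2, 4, 5, 9, 12, 17, 20, 55]  # hypothetical

        stat, p_value = ks_2samp(male_named_deaths, female_named_deaths)
        print(f"KS statistic = {stat:.3f}, p-value = {p_value:.2f}")
        # A large p-value means the test finds no evidence that the two samples come
        # from different distributions -- it fails to reject the null hypothesis.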

    The Return of Linkorama

    Linkoramas are getting rarer these days mostly because I tweet most articles. But I will still be occasionally posting something more long-form.

    To wit:

  • A fascinating article about how Vermeer used a camera obscura to enable his paintings. Yet another example of how people were pretty damn clever in the supposedly unenlightened past.
  • This is a couple of months late, but someone posted up Truman Capote’s Christmas story. The recent death of Philip Seymour Hoffman reminded me of this little gem.
  • This is the second and by far the largest study yet to show that routine mammography is basically a gigantic waste of money, being just as likely to precipitate unnecessary treatment as to discover a tumor that a breast exam wouldn’t. Do you think our “evidence-based” government will embrace this? No way. They already mandated mammogram coverage when the first study showed it to be a waste.
  • I don’t even know if this counts as mathematical malpractice. There’s no math at all. It’s just “Marijuana! RUN!”. Simply appalling reporting by the MSM.
  • This on the other hand, does count as mathematical malpractice. The gun control advocates are hyping a Missouri study that shows a rise in murder rate after a change in the gun control laws. However, in doing so they are ignoring data from 17 other states, data on all other forms of violent crime and data from Missouri that showed a steep rise in the murder rate before the laws were changed. They are picking a tiny slice of data to make a huge claim. Disgraceful. And completely expected from the gun-grabbers.
  • I love color photos from history. Just love them.
  • This is old but worth reposting: one of the biggest feminist texts out there is loaded with garbage data, easily checked facts that are completely wrong. This was a big reason I distanced myself from third-wave feminism in college: it had been taken over by crackpots who would believe any statistic as long as it was bad. In college, we were told that one in three women are raped (they aren’t), that abuse is the leading cause of admission to ER’s (it isn’t), that violence erupts every Super Bowl (it doesn’t). I even had one radical tell me, with no apparent self-awareness, that murder was the second leading cause of death among women (it’s not even close). As I seem to say about everything: reality is bad enough; we don’t need to invent stuff.

    Mathematical Malpractice Watch: A Trilogy of Error

    Three rather ugly instances of mathematical malpractice have caught my attention in the last month. Let’s check them out.

    The Death of Facebook or How to Have Fun With Out of Sample Data

    Last month, Princeton researchers came out with the rather spectacular claim that the social network Facebook would be basically dead within a few years. The quick version is that they fit an epidemiological model to the rise and fall of MySpace. They then used that same model, varying the parameters, to fit Google Trends data on searches for Facebook. They concluded that Facebook would lose 80% of its customers by 2017.

    This was obviously nonsense, as detailed here and here. It suffered from many flaws, notably assuming that the rise and fall of MySpace was necessarily a model for all social networks and the dubious method of using Google searches instead of publicly available traffic data as their metric.

    But there was a deeper flaw. The authors fit a model of a sharp rise and fall. They then proclaim that this model works because Facebook’s Google data follow the first half of that trend and a little bit of the second. But while the decline in Facebook Google searches is consistent with their model, it is also consistent with hundreds of others. It would be perfectly consistent with a model that predicts a sharp rise and then a leveling off as the social network saturates. Their data are consistent with, but cannot discriminate among, just about any model.

    The critical part of the data — the predicted sharp fall in Facebook traffic — is out of sample (meaning it hasn’t happened yet). But based on a tiny sliver of data, they have drawn a gigantic conclusion. It’s Mark Twain and the length of the Mississippi River all over again.

    We see this a lot in science, unfortunately. Global warming models often predict very sharp rises in temperature — out of sample. Models of the stock market predict crashes or runs — out of sample. Sports twerps put together models that predict Derek Jeter will get 4000 hits — out of sample.

    Anyone who does data fitting for a living knows this danger. The other day, I fit a light curve to a variable star. Because of an odd intersection of Fourier parameters, the model predicted a huge rise in brightness in the middle of its decay phase because there were no data to constrain it there. So it fit a small uptick in the decay phase as though it were the small beginning of a massive re-brightening.

    The more complicated the model, the more danger there is of drawing massive conclusions from tiny amounts of data or small trends. If the model is anything other than a straight line, be very very wary at out-of-sample predictions, especially when they are predicting order-of-magnitude changes.

    A Rape Epidemic or How to Reframe Data:

    The CDC recently released a study that claimed that 1.3 million women were raped and 12.6 million more were subject to sexual violence in 2010. This is six or more times the estimate from the extremely rigorous National Crime Victimization Survey (NCVS). Christina Hoff Sommers has a breakdown of why the number is so massive:

    It found them by defining sexual violence in impossibly elastic ways and then letting the surveyors, rather than subjects, determine what counted as an assault. Consider: In a telephone survey with a 30 percent response rate, interviewers did not ask participants whether they had been raped. Instead of such straightforward questions, the CDC researchers described a series of sexual encounters and then they determined whether the responses indicated sexual violation. A sample of 9,086 women was asked, for example, “When you were drunk, high, drugged, or passed out and unable to consent, how many people ever had vaginal sex with you?” A majority of the 1.3 million women (61.5 percent) the CDC projected as rape victims in 2010 experienced this sort of “alcohol or drug facilitated penetration.”

    What does that mean? If a woman was unconscious or severely incapacitated, everyone would call it rape. But what about sex while inebriated? Few people would say that intoxicated sex alone constitutes rape — indeed, a nontrivial percentage of all customary sexual intercourse, including marital intercourse, probably falls under that definition (and is therefore criminal according to the CDC).

    Other survey questions were equally ambiguous. Participants were asked if they had ever had sex because someone pressured them by “telling you lies, making promises about the future they knew were untrue?” All affirmative answers were counted as “sexual violence.” Anyone who consented to sex because a suitor wore her or him down by “repeatedly asking” or “showing they were unhappy” was similarly classified as a victim of violence. The CDC effectively set a stage where each step of physical intimacy required a notarized testament of sober consent.

    In short, they did what is called “reframing”. They took someone’s experiences, threw away that person’s definition of them and substituted their own definition.

    This isn’t the first time this has happened with rape stats nor the first time Sommers has uncovered this sort of reframing. Here is an account of how researchers decided that women who didn’t think they had been raped were, in fact, raped, so they could claim a victimization rate of one in four.

    Scientists have to classify things all the time based on a variety of criteria. The universe is a messy continuum; to understand it, we have to sort things into boxes. I classify stars for a living based on certain characteristics. The problem with doing that here is that women are not inanimate objects. Nor are they lab animals. They can have opinions of their own about what happened to them.

    I understand that some victims may reframe their experiences to try to lessen the trauma of what happened to them. I understand that a woman can be raped but convince herself it was a misunderstanding or that it was somehow her fault. But to reframe, a priori, any woman’s experience is to treat women like lab rats, not human beings capable of making judgements of their own.

    But it also illustrates a mathematical malpractice problem: changing definitions. This is how 10,000 underage prostitutes in the United States becomes 200,000 girls “at risk”. This is how small changes in drug use stats become an “epidemic”. If you dig deep into the studies, you will find the truth. But the banner headline — the one the media talk about — is hopelessly and deliberately muddled.

    Sometimes you have to change definitions. The Bureau of Justice Statistics changed the NCVS methodology on rape statistics a few years ago and saw a significant increase in its estimates. But it’s one thing to hone; it’s another to completely redefine.

    (The CDC, as my friend Kevin Wilson pointed out, mostly does outstanding work. But they have a tendency to jump with both feet into moral panics. In this case, it’s the current debate about rape culture. Ten years ago, it was obesity. They put out a deeply flawed study that overestimated obesity deaths by a factor of 14. They quickly admitted their screwup but … guess which number has been quoted for the last decade on obesity policy?)

    You might ask why I’m on about this. Surely any number of rapes is too many. The reason I wanted to talk about this, apart from my hatred of bogus studies, is that data influences policy. If you claim that 1.3 million women are being raped every year, that’s going to result in a set of policy decisions that are likely to be very damaging and do very little to address the real problem.

    If you want a stat that means something, try this one: the incidence of sexual violence has fallen 85% over the last 30 years. That is from the NCVS data, so even if it is over- or under-estimating the amount of sexual violence, the differential is meaningful. That data tells you something useful: that whatever we are doing to fight rape culture, it is working. Greater awareness, pushing back against blaming the victim, changes to federal and state laws, changes to the emphasis of attorneys general’s offices and the rise of internet pornography have all been cited as contributors to this trend.

    That’s why it’s important to push back against bogus stats on rape. Because they conceal the most important stat: the one that is the most useful guide for future policy and points the way toward ending rape culture.

    The Pending Crash or How to Play with Scales:

    Yesterday morning, I saw a chart claiming that the recent stock market trends are an eerie parallel of the run-up to the 1929 crash. I was immediately suspicious because, even if the data were accurate, we see this sort of crap all the time. There are a million people who have made a million bucks on Wall Street claiming to pattern match trends in the stock market. They make huge predictions, just like the Facebook study above. And those predictions are always wrong. Because, again, the out of sample data contains the real leverage.

    This graph is even worse than that, though. As Quartz points out, the graph makers used two different y-axes. In one, the 1928-29 rise of the stock market was a near doubling. In the other, the 2013-14 rise was an increase of about 25%. When you scale them appropriately, the similarity vanishes. Or, alternatively, the pending “crash” would be just an erasure of that 25% gain.
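
    A tiny sketch of the fix: put both series on a common indexed scale and the “parallel” disappears. The level numbers below are invented stand-ins (one series roughly doubling, the other rising about 25%), not the actual Dow values from the chart.

        # Index two series to 100 at their starting points so equal vertical moves
        # mean equal percentage changes. The levels below are invented, not Dow data.
        series_1928 = [200, 230, 260, 300, 340, 380]               # roughly doubles
        series_2013 = [13000, 13600, 14200, 14900, 15600, 16200]   # rises ~25%

        def indexed(series):
            return [round(100 * x / series[0], 1) for x in series]

        print(indexed(series_1928))  # ~[100, 115, 130, 150, 170, 190]
        print(indexed(series_2013))  # ~[100, 105, 109, 115, 120, 125]
        # On two independently stretched y-axes these can be made to overlap;
        # on a shared indexed scale, one rises ~90% and the other ~25%.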

    I’ve seen this quite a bit and it’s beginning to annoy me. Zoomed-in graphs of narrow ranges of the y-axis are used to draw dramatic conclusions about … whatever you want. This week, it’s the stock market. Next week, it’s global warming skeptics looking at little spikes on a 10-year temperature plot instead of big trends on a 150-year one. The week after, it will be inequality data. Here is one from Piketty and Saez, which tracks wealth gains for the rich against everyone else. Their conclusion might be accurate but the plot is useless because it is scaled to intervals of $5 million. So even if the bottom 90% were doing better, even if their income was doubling, it wouldn’t show up on the graph.

    Halloween Linkorama

    Three stories today:

  • Bill James once said that, when politics is functioning well, elections should have razor thin margins. The reason is that the parties will align themselves to best exploit divisions in the electorate. If one party is only getting 40% of the vote, they will quickly re-align to get higher vote totals. The other party will respond and they will reach a natural equilibrium near 50%. I think that is the missing key to understanding why so many governments are divided. The Information Age has not only given political parties more information to align themselves with the electorate, it has made the electorate more responsive. The South was utterly loyal to the Democrats for 120 years. Nowadays, that kind of political loyalty is fading.
  • I love this piece about how an accepted piece of sociology turned out to be complete gobbledygook.
  • Speaking of gobbledygook, here is a review of the article about men ogling women. It sounds like the authors misquoted their own study.

    Mathematical Malpractice: Food Stamps

    I’m sorry, but I’m going to have to call out my favorite website again.

    One of the things that drives budget hawks nuts is baseline spending. In baseline spending, government program X is projected to grow in the future and any slice of that growth that is removed by budget-cutters is called a “cut” even though it really isn’t.

    Let’s say you have a government program that pays people to think about how wonderful our government is. Call it the Positive Thinking Initiative and fund it at $1 billion. Future spending for PTI will be projected to grow a few percent a year for cost of living, a few percent for increased utilization, etc. so that, in FY2014, it’s a $1.2 billion program. And by FY2023, it’s a $6 billion program.

    Congress will then “cut” the funding a little bit so that, by FY2023 it’s “only” a $4 billion program. They’ll then claim a few billion in spending cuts and go off for tea and medals.
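
    A toy version of that arithmetic, to make the baseline trick concrete. The growth rate here is an invented illustration (compounding fast enough to land near the $6 billion figure above), not a CBO assumption.

        # Baseline budgeting in miniature. The 20%/yr growth rate is an invented
        # illustration chosen to roughly hit the $6B FY2023 figure in the example.
        starting_budget = 1.0      # $1 billion "Positive Thinking Initiative"
        growth = 0.20              # assumed annual growth baked into the baseline
        years = 9                  # FY2014 through FY2023

        baseline_2023 = starting_budget * (1 + growth) ** years
        cut_2023 = baseline_2023 * 2 / 3          # Congress trims the projection

        print(f"baseline FY2023:   ${baseline_2023:.1f}B")   # ~$5.2B
        print(f"'cut' FY2023:      ${cut_2023:.1f}B")        # ~$3.4B -- still 3x today
        print(f"claimed 'savings': ${baseline_2023 - cut_2023:.1f}B")
        # Spending more than triples, yet the difference between the two projections
        # gets reported as billions in "cuts."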

    This drives budget hawks nuts because it changes the language. It makes spending increases into spending “cuts” and makes actual spending cuts (or just level spending) into “savage brutal cuts”. This is one of the reasons the sequester didn’t draw as much opposition as opponents thought it would. The sequester actually did cut spending for programs but everyone was so used to the distorted language of Washington that they couldn’t distinguish a real cut from a faux cut.

    So I can understand where Ira Stoll is coming from when he claims that the cuts to the food stamp program aren’t actually cuts. The problem is that he’s not comparing apples to apples:

    The non-partisan Congressional Budget Office estimates that the House bill would spend $725 billion on food stamps over the years 2014 to 2023. The Department of Agriculture’s web site offers a summary of spending on the program that reports spending totaling $461.7 billion over the years 2003 to 2012, a period that included a dramatic economic downturn.

    This is a great example of how and why it is so difficult to cut government spending, and how warped the debate over spending has become. The Republicans want to increase food stamp spending 57 percent. The Democrats had previously planned to increase it by 65 percent (to $764 billion over 10 years instead of the $725 billion in the Republican bill), so they depict the Republicans as “meanspirited class warriors” seeking “deep cuts.”

    Stoll acknowledges the economic downturn but ignores that the time period he’s talking about includes five years of non-downturn time. Food stamp spending tracks unemployment; the economy is the biggest reason food stamp spending has exploded in recent years. So this isn’t really a spending “hike” so much as the CBO estimating that unemployment will be a bigger problem in the next decade than it was in the last one.

    Here is the CBO’s report. Pay particular attention to Figure 2, which clearly shows that food stamp spending will decline every year for the next decade (a little more sharply in inflation-adjusted terms). It will be a very long time before it is back to pre-recessionary levels, but it is, in fact, declining, even in nominal dollars. This isn’t a baseline trick; this is an actual decline.

    Spending (mostly for benefits and administrative costs) on SNAP in 2022 will be about $73 billion, CBO projects. In inflation-adjusted dollars, spending in 2022 is projected to be about 23 percent less than it was in 2011 but still about 60 percent higher than it was in 2007.

    In fact, long-term projections of food stamp spending are very problematic since they depend heavily on the state of the economy. If the economy is better than the CBO anticipates, food stamp spending could be down to pre-recession levels by the end of the decade.

    So with a program like food stamps, you really can’t play with decade-long projections the way Stoll does. That’s mathematical malpractice: comparing two completely different sets of budgets. The CBO does decade-long projections because it is obligated to. But the only thing you can really judge is year-to-year spending.

    Food stamp spending in FY2012 was $78 billion. FY2014 spending, under the Republican bill, will be lower than that (how much lower is difficult to pin down).

    That’s a cut, not an increase. Even by Washington standards.

    Mathematical Malpractice Watch: Cherry-Picking

    Probably one of the most frustrating mathematical practices is the tendency of politicos to cherry-pick data: only take the data points that are favorable to their point of view and ignore all the others. I’ve talked about this before but two stories circling the drain of the blogosphere illustrated this practice perfectly.

    The first is on the subject of global warming. Global warming skeptics have recently been crowing about two pieces of data that supposedly contradict the theory of global warming: a slow-down in temperature rise over the last decade and a “60% recovery” in Arctic sea ice.

    The Guardian, with two really nice animated gifs, shows clearly why these claims are lacking. Sea ice levels vary from year to year. The long-term trend, however, has been a dramatic fall, with current sea ice levels being a third of what they were a few decades ago (and that’s just area: in terms of volume, it’s much worse, with sea ice levels being a fifth of what they were). The 60% uptick is mainly because ice levels were so absurdly low last year that the natural year-to-year variation is equal to almost half the total area of ice. In other words, the variation in yearly sea ice levels has not changed — the baseline has shrunk so dramatically that the variations look big in comparison. This could easily be — and likely will be — matched by a 60% decline. Of course, that decline will be ignored by the very people hyping the “recovery”.
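
    Some toy numbers make the baseline point obvious. These extents are invented for illustration, not actual sea ice measurements; the only thing that matters is that the year-to-year wobble stays the same while the baseline shrinks.

        # Invented figures illustrating the shrinking-baseline effect, not real data.
        old_baseline = 7.0    # million km^2, decades ago (made-up)
        new_baseline = 2.4    # million km^2, after the long-term decline (made-up)
        wobble = 1.4          # a typical year-to-year swing (made-up)

        print(f"swing vs old baseline: {wobble / old_baseline:.0%}")   # ~20%
        print(f"swing vs new baseline: {wobble / new_baseline:.0%}")   # ~58% -- "recovery!"
        # The same absolute variation looks three times bigger against the shrunken
        # baseline, and an equal-sized downswing the next year reads as a "60% decline."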

    Temperature does the same thing. If you look at the second gif, you’ll see the steady rise in temperature over the last 40 years. But, like sea ice levels, planetary temperatures vary from year to year. The rise is not perfect. But each time it levels or even falls a little, the skeptics ignore forty years worth of data.

    (That having been said, temperatures have been rising much slower for the last decade than they were for the previous three. A number of climate scientists now think we have overestimated climate sensitivity).

    But lest you think this sort of thing is only confined to the Right …

    Many people are tweeting and linking this article which claims that Louis Gohmert spouted 12 lies about Obamacare in two minutes. Some of the things Gohmert said were not true. But others were, and still others cannot really be assessed at this stage. To take on the lies one-by-one:

    Was Obamacare passed against the will of the people?

    Nope. It was passed by a president who won the largest landslide in two decades and a Democratic House and Senate with huge majorities. It was passed with more support than the Bush tax cuts and Medicare Part D, both of which were entirely unfunded. And the law had a mostly favorable perception in 2010 before Republicans spent hundreds of millions of dollars spreading misinformation about it.

    The first bits of that are true but somewhat irrelevant: the Iraq War had massive support at first, but became very unpopular. The second is cherry-picked. Here is the Kaiser Foundation’s tracking poll on Obamacare (panel 6). Obamacare barely crested 50% support for a brief period, well within the noise. Since then, it has had higher unfavorables. If anything, those unfavorables have actually fallen slightly, not risen in response to “Republican lies”.

    Supporters of the law have devised a catch-22 on the PPACA: if support falls, it’s because of Republican money; if it rises it’s because people are learning to love the law. But the idea that there could be opposition to it? Perish the thought!

    Is Obamacare still against the will of American people?

    Actually, most Americans want it implemented. Only 6 percent said they wanted to defund or delay it in a recent poll.

    That is extremely deceptive. Here is the poll. Only 6% want to delay or defund the law because 30% want it completely repealed. Another 31% think it needs to be improved. Only 33% think the law should be allowed to take effect or be expanded.

    (That 6% should really jump out at you since it’s completely at variance with any political reality. The second I saw it, I knew it was garbage. Maybe they should have focus-group-tested it first to come up with some piece of bullshit that was at least believable.)

    Of the remaining questions, many are judgement calls on things that have yet to happen. National Memo asserts that Obamacare does not take away your decisions about health care, does not put the government between you and your doctor and will not keep seniors from getting the services they need. All of these are judgement calls about things that have yet to happen. There are numerous people — people who are not batshit crazy like Gohmert — who think that Obamacare and especially the IPAB will eventually create government interference in healthcare. Gohmert might be wrong about this. But to call it a lie when someone makes a prediction about what will happen is absurd. Let’s imagine this playing out in 2002:

    We rate Senator Liberal’s claim that we will be in Iraq for a decade and it will cost 5000 lives and $800 billion to be a lie. The Bush Administration has claimed that US troops will be on the ground for only a few years and expect less than a thousand casualties and about $2 billion per month. In fact, some experts predict it will pay for itself.

    See what I did there?

    Obamacare is a big law with a lot of moving parts. There are claims about how it is going to work but we won’t really know for a long time. Maybe the government won’t interfere with your health care. But that’s a big maybe to bet trillions of dollars on.

    The article correctly notes that the government will not have access to medical records. But then it asserts that any information will be safe. This point was overtaken by events this week when an Obamacare site leaked 2,400 Social Security numbers.

    See what I mean about “fact-checking” things that have yet to happen?

    Then there’s this:

    Under Obamacare, will young people be saddled with the cost of everybody else?

    No. Thanks to the coverage for students, tax credits, Medicaid expansion and the fact that most young people don’t earn that much, most young people won’t be paying anything or very much for health care. And nearly everyone in their twenties will see premiums far less than people in their 40s and 50s. If you’re young, out of school and earning more than 400 percent of the poverty level, you may be paying a bit more, but for better insurance.

    This is incorrect. Many young people are being coerced into buying insurance that they wouldn't have bought before. As Avik Roy has pointed out, cheap high-deductible plans have been effectively outlawed. Many colleges and universities are seeing astronomical rises in health insurance premiums, including my own. The explosion of invasive wellness programs, like UVA's, has been explicitly tied to the PPACA. Gohmert is absolutely right on this one.

    The entire point of Obamacare was to get healthy people to buy insurance so that sick people could get more affordable insurance. That is how this whole thing works. It’s too late to back away from that reality now.

    Does Obamacare prevent the free exercise of your religious beliefs?

    No. But it does stop you from forcing your beliefs on others. Employers that provide insurance have to offer policies that provide birth control to women. Religious organizations have been exempted from paying for this coverage but no one will ever be required to take birth control if their religion restricts it — they just can’t keep people from having access to this crucial, cost-saving medication for free.

    This is a matter of philosophy. Many liberals think that if an employer will not provide birth control coverage to his employees, he is “forcing” his religious views upon them (these liberals being under the impression that free birth control pills are a right). I, like many libertarians and conservatives (and independents), see it differently: that forcing someone to pay for something with which they have a moral qualm is violating their religious freedom. The Courts have yet to decide on this.

    I am reluctant to call something a "lie" when it's a difference of opinion. Our government has made numerous allowances for religious beliefs in the past, including exemptions from vaccinations, the draft, taxes and anti-discrimination laws. We are still having a debate over how this applies to healthcare. Sorry, National Memo, that debate isn't over yet.

    So let’s review. Of Gohmert’s 12 “lies”, the breakdown is like so:

    Lies: 4
    Debatable or TBD: 5
    Correct: 3
    Redundant: 1

    (You’ll note that’s 13 “lies”; apparently National Memo can’t count).

    So only 4 out of 13 are lies. Hey, even Ty Cobb only hit .366.

    Mathematical Malpractice: Focus Tested Numbers

    One of the things I keep encountering in news, culture and politics is numbers that appear to be pulled out of thin air. Concrete numbers, based on actual data, are dangerous enough in the wrong hands. But a scarcity of data doesn't seem to deter advocates and some social scientists. They will simply commission a "study" that produces, in essence, any number they want.

    What is striking is that the numbers seem to be selected with the diligent care and skill that the methods lack.

    The first time I became aware of this was with Bill Clinton. According to his critics — and I can’t find a link on this so it’s possibly apocryphal — when Bill Clinton initiated competency tests for Arkansas teachers, a massive fraction failed. He knew the union would blow their stack if the true numbers were released so he had focus groups convened to figure out what percentage of failures was expected, then had the test curved so that the results met the expectation.

    As I said, I can't find a reference for that. I seem to remember hearing it from Limbaugh, so it may be a garbled version of events (I can find lawsuits about racial discrimination in the testing, so it's possibly a mangled version of that). But the story stuck with me to the point where I remember it twenty years later. And the reason it stuck is because:

  • It sounds like the sort of thing politicians and political activists would do.
  • It would be amazingly easy to do.
  • Our media are so lazy that you could probably get away with it.
    Since then, I've seen other numbers which I call "focus tested numbers" even though they may not have been run by focus groups. But they strike me as numbers derived by someone coming up with the number first and then devising the methodology second. The first part is the critical one. Whatever the issue is, you have to come up with a number that is plausible and alarming without being ridiculous. Then you figure out the methods to get the number.

    Let's just take an example. The first time I became aware of the work of Maggie McNeill was her thorough debunking of the claim that 200,000 underage girls are trafficked for sex in the United States. You should read that article, which comes to an estimate of about 15,000 total underage prostitutes (most of whom are 16 or 17) and only a few hundred to a few thousand who are trafficked in any meaningful sense of that word. That does not make the problem less important, but it does make it less panic-inducing.

    But the 200,000 number jumped out at me. Here’s my very first comment on Maggie’s blog and her response:

    Me: Does anyone know where the 100,000 estimate comes from? What research it’s based on?

    It’s so close to 1% [of total underage girls] that I suspect it may be as simple as that. We saw a similar thing in the 1980′s when Mitch Snyder claimed (and the media mindlessly repeated) that three million Americans were homeless (5-10 times the estimates from people who’d done their homework). It turned out the entire basis of that claim was that three million was 1% of the population.

    This is typical of the media. The most hysterical claim gets the most attention. If ten researchers estimates there are maybe 20,000 underage prostitutes and one big-mouth estimates there are 300,000, guess who gets a guest spot on CNN?

    —–

    Maggie: Honestly, I think 100,000 is just a good large number which sounds impressive and is too large for most people to really comprehend as a whole. The 300,000 figure appears to be a modification of a figure from a government report which claimed that something like 287,000 minors were “at risk” from “sexual exploitation” (though neither term was clearly defined and no study was produced to justify the wild-ass guess). It’s like that game “gossip” we played as children; 287,000 becomes 300,000, “at risk” becomes “currently involved” and “sexual exploitation” becomes “sex trafficking”. 🙁

    The study claimed that 100-300,000 girls were “at risk” of exploitation but defined “at risk” so loosely that simply living near a border put someone at risk. With such methods, the authors could basically claim any number they wanted. After reading that analysis and picking my jaw up off of the floor, I wondered why anyone would do it that way.

    And then it struck me: because the method wasn’t the point; the result was. Even the result wasn’t the point; the issue they wanted to advocate was. The care was not in the method: it was in the number. If they had said that there were a couple of thousand underage children in danger, people would have said, “Oh, OK. That sounds like something we can deal with using existing policies and smarter policing.” Or even worse, they might have said, “Well, why don’t we legalize sex work for adults and concentrate on saving these children?” If they had claimed a million children were in danger, people would have laughed. But claim 100-300,000? That’s enough to alarm people into action without making them laugh. It’s in the sweet spot between the “Oh, is that all?” number of a couple thousand and the “Oh, that’s bullshit” number of a million.

    Another great example was the number SOPA supporters bruited about to support their vile legislation. Julian Sanchez details the mathematical malpractice here. At first, they claimed that $250 billion was lost to piracy every year. That number — based on complete garbage — was so ridiculous they had to revise it down to $58 billion. Again, notice how well-picked that number is. At $250 billion, people laughed. If they had gone with a more realistic estimate — a few billion, most likely — no one would have supported such draconian legislation. But $58 billion? That's enough to alarm people, not enough to make them laugh and — most importantly — not enough to make the media do their damn job and check it out.

    I encountered it again today. The EU is proposing to put speed limiters on cars. Their claim is this will cut traffic deaths by a third. Now, we actually do have some data on this. When the national speed limit was introduced in America, traffic fatalities initially fell about 20%, but then slowly returned to normal. They began falling again, bumped up a bit when Congress loosened the law, then leveled out in the 90’s and early 00’s after Congress completely repealed the national speed limit. The fatality rate has plunged over the last few years and is currently 40% below the 1970’s peak — without a speed limit.

    That's just raw numbers, of course. In real terms — per million vehicle miles driven — fatalities have plunged almost 75% over the last forty years, with no apparent effect from the speed limit law. Of course, more cars contain single drivers than ever before. But even on a per capita basis, car fatalities are half of what they once were.
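    As a rough check on those percentages, here is a back-of-the-envelope sketch. The input figures are approximations of NHTSA's published numbers that I am supplying for illustration, not values taken from the post above:

```python
# Approximate US traffic fatality figures (my assumptions, roughly consistent
# with NHTSA data): raw deaths near the early-1970s peak vs. circa 2012,
# and the fatality rate per 100 million vehicle miles traveled.
deaths_peak_1970s = 54_600
deaths_recent = 33_600

rate_peak_1970s = 4.3    # deaths per 100 million vehicle miles
rate_recent = 1.1

raw_decline = 1 - deaths_recent / deaths_peak_1970s
per_mile_decline = 1 - rate_recent / rate_peak_1970s

print(f"Raw fatalities down roughly {raw_decline:.0%} from the peak")   # ~38%
print(f"Per-mile fatality rate down roughly {per_mile_decline:.0%}")    # ~74%
```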

    That's real, measurable progress. Unfortunately for the speed limiters, it's the result of improved technology and better enforcement of drunk driving laws.

    So the claim that deaths from road accidents will plunge by a third because of speed limits is simply not supported by data in the United States. They might plunge as technology, better roads and laws against drunk driving spread to Eastern Europe. And I’m sure one of the reasons they are pushing for speed limits is that they can claim credit for that inevitable improvement. But a one-third decline is just not realistic.

    No, I suspect that this is a focus tested number. If they claimed fatalities would plunge by half, people would laugh. If they claimed 1-2%, no one would care. But one-third? That’s in the sweet spot.

    August Linkorama

    Time to clear out a few things I don’t have time to write lengthy posts about.

  • I’m tickled that Netflix garnered Emmy nominations. Notice that none of the nominated dramas are from the major networks. Their reign of terror is ending.
  • This look at Stand Your Ground laws goes state by state to see if murder rates went up. I find this far more convincing than the confusing principal component analysis being cited. Also, check out this analysis of the complicated relationship these laws have with race.
  • Speaking of guns, we have yet another case of Mathematical Malpractice. Business Insider claims California’s gun laws have dramatically dropped the rate of gun violence. But their lead graphic shows California’s rate of gun violence has fallen … about as much as the rest of the country’s.
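    The sanity check for a claim like that is to compare California's decline against the national decline over the same period. A minimal sketch, using hypothetical placeholder rates rather than the actual figures from the Business Insider graphic:

```python
# Hypothetical gun-death rates per 100,000 people, start vs. end of the period.
# Substitute the real figures from the graphic being criticized.
california = {"start": 9.0, "end": 7.0}
national = {"start": 10.5, "end": 8.3}

def pct_decline(rates):
    """Fractional decline from the start of the period to the end."""
    return 1 - rates["end"] / rates["start"]

ca_drop = pct_decline(california)
us_drop = pct_decline(national)

# If the two declines are about the same size, the graphic isn't evidence that
# the gun laws worked; it shows California tracking the national trend.
print(f"California: down {ca_drop:.0%}; nationally: down {us_drop:.0%}")
```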