Linkoramas are getting rarer these days mostly because I tweet most articles. But I will still be occasionally posting something more long-form.
Three rather ugly instances of mathematical malpractice have caught my attention in the last month. Let’s check them out.
The Death of Facebook or How to Have Fun With Out of Sample Data
Last month, Princeton researchers came out with the rather spectacular claim that the social network Facebook would be basically dead within a few years. The quick version is that they fit an epidemiological model to the rise and fall of MySpace. They then used that same model, varying the parameters, to fit Google trends on searches for Facebook. They concluded that Facebook would lose 80% of its customers by 2017.
This was obviously nonsense, as detailed here and here. It suffered from many flaws, notably assuming that the rise and fall of MySpace was necessarily a model for all social networks, and the dubious method of using Google searches, rather than publicly available traffic data, as their metric.
But there was a deeper flaw. The authors fit a model of a sharp rise and fall. They then proclaim that this model works because Facebook’s Google data follow the first half of that trend and a little bit of the second. But while the decline in Facebook Google searches is consistent with their model, it is also consistent with hundreds of others. It would be perfectly consistent with a model that predicts a sharp rise and then a leveling off as the social network saturates. Their data are consistent with just about any model; they discriminate among almost none.
The critical part of the data — the predicted sharp fall in Facebook traffic — is out of sample (meaning it hasn’t happened yet). But based on a tiny sliver of data, they have drawn a gigantic conclusion. It’s Mark Twain and the length of the Mississippi River all over again.
We see this a lot in science, unfortunately. Global warming models often predict very sharp rises in temperature — out of sample. Models of the stock market predict crashes or runs — out of sample. Sports twerps put together models that predict Derek Jeter will get 4000 hits — out of sample.
Anyone who does data fitting for a living knows this danger. The other day, I fit a light curve to a variable star. Because of an odd intersection of Fourier parameters, the model predicted a huge rise in brightness in the middle of its decay phase because there were no data to constrain it there. So it fit a small uptick in the decay phase as though it were the small beginning of a massive re-brightening.
The more complicated the model, the more danger there is of drawing massive conclusions from tiny amounts of data or small trends. If the model is anything other than a straight line, be very very wary at out-of-sample predictions, especially when they are predicting order-of-magnitude changes.
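To see why the observed data carry so little weight, here is a deliberately stylized sketch (these are toy curves I made up, not the Princeton model): two models built to agree perfectly on the observed rise, yet to disagree completely about what happens next.

```python
# Two toy usage curves, constructed to agree on the observed window.
# This mimics how a "death of Facebook" fit and a simple saturation
# model can both match the observed rise equally well.

def saturation_model(t):
    # sharp rise, then a plateau as the network saturates
    return min(t, 1.0)

def rise_and_fall_model(t):
    # the same sharp rise, followed by a collapse (the "death" model)
    return t if t <= 1.0 else max(2.0 - t, 0.0)

observed = [i / 50 for i in range(51)]     # the data we actually have: the rise
in_sample_gap = max(abs(saturation_model(t) - rise_and_fall_model(t))
                    for t in observed)

t_future = 2.0                             # the out-of-sample region
out_of_sample_gap = abs(saturation_model(t_future) - rise_and_fall_model(t_future))

print(in_sample_gap)       # 0.0 -- the observed data cannot tell the models apart
print(out_of_sample_gap)   # 1.0 -- all the leverage is out of sample
```

The observed window gives zero evidence for one model over the other; the entire difference lives in data that does not exist yet.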
A Rape Epidemic or How to Reframe Data:
The CDC recently released a study claiming that 1.3 million women were raped and 12.6 million more were subject to sexual violence in 2010. This is six or more times the estimate from the Justice Department’s extremely rigorous National Crime Victimization Survey (NCVS). Christina Hoff Sommers has a breakdown of why the number is so massive:
It found them by defining sexual violence in impossibly elastic ways and then letting the surveyors, rather than subjects, determine what counted as an assault. Consider: In a telephone survey with a 30 percent response rate, interviewers did not ask participants whether they had been raped. Instead of such straightforward questions, the CDC researchers described a series of sexual encounters and then they determined whether the responses indicated sexual violation. A sample of 9,086 women was asked, for example, “When you were drunk, high, drugged, or passed out and unable to consent, how many people ever had vaginal sex with you?” A majority of the 1.3 million women (61.5 percent) the CDC projected as rape victims in 2010 experienced this sort of “alcohol or drug facilitated penetration.”
What does that mean? If a woman was unconscious or severely incapacitated, everyone would call it rape. But what about sex while inebriated? Few people would say that intoxicated sex alone constitutes rape — indeed, a nontrivial percentage of all customary sexual intercourse, including marital intercourse, probably falls under that definition (and is therefore criminal according to the CDC).
Other survey questions were equally ambiguous. Participants were asked if they had ever had sex because someone pressured them by “telling you lies, making promises about the future they knew were untrue?” All affirmative answers were counted as “sexual violence.” Anyone who consented to sex because a suitor wore her or him down by “repeatedly asking” or “showing they were unhappy” was similarly classified as a victim of violence. The CDC effectively set a stage where each step of physical intimacy required a notarized testament of sober consent.
In short, they did what is called “reframing”: they took someone’s experiences, threw away that person’s definition of those experiences and substituted their own.
This isn’t the first time this has happened with rape statistics, nor the first time Sommers has uncovered this sort of reframing. Here is an account of how researchers decided that women who didn’t think they had been raped were, in fact, raped, so that they could claim a victimization rate of one in four.
Scientists have to classify things all the time based on a variety of criteria. The universe is a messy continuum; to understand it, we have to sort things into boxes. I classify stars for a living based on certain characteristics. The problem with doing that here is that women are not inanimate objects. Nor are they lab animals. They can have opinions of their own about what happened to them.
I understand that some victims may reframe their experiences to lessen the trauma of what happened to them. I understand that a woman can be raped but convince herself it was a misunderstanding or somehow her fault. But to reframe any woman’s experience a priori is to treat her like a lab rat, not a human being capable of making judgements of her own.
But it also illustrates a mathematical malpractice problem: changing definitions. This is how 10,000 underage prostitutes in the United States becomes 200,000 girls “at risk”. This is how small changes in drug use stats become an “epidemic”. If you dig deep into the studies, you will find the truth. But the banner headline — the one the media talk about — is hopelessly and deliberately muddled.
Sometimes you do have to change definitions. The Bureau of Justice Statistics changed its NCVS methodology on rape statistics some years ago and saw a significant increase in its estimates. But it’s one thing to hone a definition; it’s another to completely redefine it.
(The CDC, as my friend Kevin Wilson pointed out, mostly does outstanding work. But they have a tendency to jump with both feet into moral panics. In this case, it’s the current debate about rape culture. Ten years ago, it was obesity. They put out a deeply flawed study that overestimated obesity deaths by a factor of 14. They quickly admitted their screwup but … guess which number has been quoted for the last decade on obesity policy?)
You might ask why I’m on about this. Surely any number of rapes is too many. The reason I wanted to talk about this, apart from my hatred of bogus studies, is that data influences policy. If you claim that 1.3 million women are being raped every year, that’s going to result in a set of policy decisions that are likely to be very damaging and do very little to address the real problem.
If you want a stat that means something, try this one: the incidence of sexual violence has fallen 85% over the last 30 years. That comes from the NCVS data, so even if it over- or under-estimates the amount of sexual violence, the differential is meaningful. That data tell you something useful: whatever we are doing to fight rape culture, it is working. Greater awareness, pushing back against blaming the victim, changes to federal and state laws, changes to the emphasis of attorneys general’s offices and the rise of internet pornography have all been cited as contributors to this trend.
That’s why it’s important to push back against bogus stats on rape: they conceal the most important stat, the one that is the most useful guide for future policy and points the way toward ending rape culture.
The Pending Crash or How to Play with Scales:
Yesterday morning, I saw a chart claiming that the recent stock market trends are an eerie parallel of the run-up to the 1929 crash. I was immediately suspicious because, even if the data were accurate, we see this sort of crap all the time. There are a million people who have made a million bucks on Wall Street claiming to pattern-match trends in the stock market. They make huge predictions, just like the Facebook study above. And those predictions are always wrong. Because, again, the out-of-sample data contains the real leverage.
This graph is even worse than that, though. As Quartz points out, the graph makers used two different y-axes. On one, the 1928–29 rise of the stock market was a near doubling. On the other, the 2013–14 rise was an increase of about 25%. When you scale them consistently, the similarity vanishes. Or, alternatively, the pending “crash” would merely erase that 25% gain.
I’ve seen this quite a bit and it’s beginning to annoy me. Zoomed-in graphs of narrow ranges of the y-axis are used to draw dramatic conclusions about … whatever you want. This week, it’s the stock market. Next week, it’s global warming skeptics looking at little spikes on a 10-year temperature plot instead of big trends on a 150-year one. The week after, it will be inequality data. Here is one from Piketty and Saez, which tracks wealth gains for the rich against everyone else. Their conclusion might be accurate but the plot is useless because it is scaled to intervals of $5 million. So even if the bottom 90% were doing better, even if their income was doubling, it wouldn’t show up on the graph.
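A quick way to see the dual-axis trick with numbers (illustrative index levels, not the actual Dow or S&P series): two ramps that look identical when each gets its own y-axis are nothing alike once both are expressed as percent change from a common starting point.

```python
# Made-up index levels standing in for the two run-ups.
dow_1928 = [100, 120, 145, 170, 195]    # a near doubling, like 1928-29
spx_2013 = [100, 106, 112, 119, 125]    # about a 25% gain, like 2013-14

def pct_change(series):
    # put any series on a common footing: percent gain from its start
    start = series[0]
    return [100.0 * (x - start) / start for x in series]

print(pct_change(dow_1928)[-1])   # 95.0 -- one series nearly doubled
print(pct_change(spx_2013)[-1])   # 25.0 -- the other gained a quarter
```

Plotted on separate y-axes, both lists trace the same-looking upward ramp; on a single percent-change scale, the “eerie parallel” disappears.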
I’m sorry, but I’m going to have to call out my favorite website again.
One of the things that drives budget hawks nuts is baseline spending. In baseline spending, government program X is projected to grow in the future and any slice of that growth that is removed by budget-cutters is called a “cut” even though it really isn’t.
Let’s say you have a government program that pays people to think about how wonderful our government is. Call it the Positive Thinking Initiative and fund it at $1 billion. Future spending for PTI will be projected to grow a few percent a year for cost of living, a few percent for increased utilization, and so on, so that, by FY2014, it’s a $1.2 billion program. And by FY2023, it’s a $6 billion program.
Congress will then “cut” the funding a little bit so that, by FY2023, it’s “only” a $4 billion program. They’ll then claim a few billion in spending cuts and go off for tea and medals.
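The arithmetic of the hypothetical program above (a sketch using the post’s own made-up numbers) shows how a large increase gets reported as a cut:

```python
# Baseline budgeting with the hypothetical Positive Thinking Initiative.
current = 1.0          # $1B funding today
baseline_2023 = 6.0    # projected FY2023 baseline, $6B
enacted_2023 = 4.0     # what Congress actually funds in FY2023, $4B

claimed_cut = baseline_2023 - enacted_2023   # "we cut $2 billion!"
actual_change = enacted_2023 - current       # spending actually quadrupled

print(claimed_cut)     # 2.0 -- the headline "cut"
print(actual_change)   # 3.0 -- a $3B increase billed as a cut
```

The “cut” is measured against a projection that never existed as actual spending, while the real year-over-year change is a fourfold increase.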
This drives budget hawks nuts because it changes the language. It turns spending increases into spending “cuts” and actual spending cuts (or even level spending) into “savage, brutal cuts”. This is one of the reasons the sequester didn’t draw as much opposition as its opponents expected: it actually did cut spending for programs, but everyone was so used to the distorted language of Washington that they couldn’t distinguish a real cut from a faux cut.
So I can understand where Ira Stoll is coming from when he claims that the cuts to the food stamp program aren’t actually cuts. The problem is that he’s not comparing apples to apples:
The non-partisan Congressional Budget Office estimates that the House bill would spend $725 billion on food stamps over the years 2014 to 2023. The Department of Agriculture’s web site offers a summary of spending on the program that reports spending totaling $461.7 billion over the years 2003 to 2012, a period that included a dramatic economic downturn.
This is a great example of how and why it is so difficult to cut government spending, and how warped the debate over spending has become. The Republicans want to increase food stamp spending 57 percent. The Democrats had previously planned to increase it by 65 percent (to $764 billion over 10 years instead of the $725 billion in the Republican bill), so they depict the Republicans as “meanspirited class warriors” seeking “deep cuts.”
Stoll acknowledges the economic downturn but ignores that the time period he’s talking about includes five years of non-downturn time. Food stamp spending tracks unemployment; the economy is the biggest reason food stamp spending has exploded in recent years. So this isn’t really a spending “hike” so much as the CBO estimating that unemployment will be a bigger problem in the next decade than it was in the last one.
Here is the CBO’s report. Pay particular attention to Figure 2, which clearly shows that food stamp spending will decline every year for the next decade (a little more sharply in inflation-adjusted terms). It will be a very long time before it is back to pre-recessionary levels, but it is, in fact, declining, even in nominal dollars. This isn’t a baseline trick; this is an actual decline.
Spending (mostly for benefits and administrative costs) on SNAP in 2022 will be about $73 billion, CBO projects. In inflation-adjusted dollars, spending in 2022 is projected to be about 23 percent less than it was in 2011 but still about 60 percent higher than it was in 2007.
In fact, long-term projections of food stamp spending are very problematic since they depend heavily on the state of the economy. If the economy is better than the CBO anticipates, food stamp spending could be down to pre-recession levels by the end of the decade.
So with a program like food stamps, you really can’t play with decade-long projections the way Stoll does. That’s mathematical malpractice: comparing two completely different sets of budgets. The CBO does decade-long projections because it is obligated to. But the only thing you can really judge is year-to-year spending.
Food stamp spending in FY2012 was $78 billion. FY2014 spending, under the Republican bill, will be lower than that (how much lower is difficult to pin down).
That’s a cut, not an increase. Even by Washington standards.
Probably one of the most frustrating forms of mathematical malpractice is the tendency of politicos to cherry-pick data: taking only the data points that favor their point of view and ignoring all the others. I’ve talked about this before, but two stories circling the drain of the blogosphere illustrate this practice perfectly.
The first is on the subject of global warming. Global warming skeptics have recently been crowing about two pieces of data that supposedly contradict the theory of global warming: a slow-down in temperature rise over the last decade and a “60% recovery” in Arctic sea ice.
The Guardian, with two really nice animated GIFs, shows clearly why these claims are lacking. Sea ice levels vary from year to year. The long-term trend, however, has been a dramatic fall, with current sea ice levels a third of what they were a few decades ago (and that’s just area; in terms of volume, it’s much worse, with sea ice levels a fifth of what they were). The 60% uptick is mainly because ice levels were so absurdly low last year that the natural year-to-year variation is equal to almost half the total area of ice. In other words, the year-to-year variation in sea ice has not changed — the baseline has shrunk so dramatically that the variations look big in comparison. This could easily be — and likely will be — matched by a 60% decline. Of course, that decline will be ignored by the very people hyping the “recovery”.
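The mechanism is easy to reproduce with made-up numbers (these are illustrative, not actual ice-extent figures): hold the absolute year-to-year swing fixed, shrink the baseline, and the same ordinary bounce becomes a headline percentage.

```python
# Illustrative minimum-extent numbers, not real satellite data.
old_mean = 7.0      # typical extent decades ago (made up), million km^2
last_year = 2.5     # an extreme low year
this_year = 4.0     # a bounce within ordinary year-to-year variability

uptick = 100 * (this_year - last_year) / last_year   # the headline "recovery"
vs_old_mean = 100 * this_year / old_mean             # but still far below normal

print(uptick)        # 60.0 -- a "60% recovery!"
print(vs_old_mean)   # ~57 -- little more than half the old baseline
```

A 1.5-unit bounce against the old 7-unit baseline would have read as noise; the same bounce against a collapsed baseline reads as a dramatic recovery.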
Temperature does the same thing. If you look at the second gif, you’ll see the steady rise in temperature over the last 40 years. But, like sea ice levels, planetary temperatures vary from year to year. The rise is not perfect. But each time it levels or even falls a little, the skeptics ignore forty years worth of data.
(That having been said, temperatures have risen much more slowly over the last decade than over the previous three. A number of climate scientists now think we have overestimated climate sensitivity.)
But lest you think this sort of thing is only confined to the Right …
Many people are tweeting and linking this article, which claims that Louie Gohmert spouted 12 lies about Obamacare in two minutes. Some of the things Gohmert said were not true. But others were true, and still others cannot really be assessed at this stage. To take on the “lies” one by one:
Was Obamacare passed against the will of the people?
Nope. It was passed by a president who won the largest landslide in two decades and a Democratic House and Senate with huge majorities. It was passed with more support than the Bush tax cuts and Medicare Part D, both of which were entirely unfunded. And the law had a mostly favorable perception in 2010 before Republicans spent hundreds of millions of dollars spreading misinformation about it.
The first bits of that are true but somewhat irrelevant: the Iraq War had massive support at first but became very unpopular. The second part is cherry-picked. Here is the Kaiser Foundation’s tracking poll on Obamacare (panel 6). Obamacare barely crested 50% support for a brief period, well within the noise. Since then, it has had higher unfavorables. If anything, those unfavorables have fallen slightly, not risen in response to “Republican lies”.
Supporters of the law have devised a catch-22 on the PPACA: if support falls, it’s because of Republican money; if it rises it’s because people are learning to love the law. But the idea that there could be opposition to it? Perish the thought!
Is Obamacare still against the will of American people?
Actually, most Americans want it implemented. Only 6 percent said they wanted to defund or delay it in a recent poll.
That is extremely deceptive. Here is the poll. Only 6% chose “delay or defund” because a separate 30% want the law completely repealed. Another 31% think it needs to be improved. Only 33% think the law should be allowed to take effect or be expanded.
(That 6% should really jump out at you since it’s completely at variance with any political reality. The second I saw it, I knew it was garbage. Maybe they should have focus-group-tested it first to come up with some piece of bullshit that was at least believable.)
Of the remaining questions, many are judgement calls on things that have yet to happen. National Memo asserts that Obamacare does not take away your decisions about health care, does not put the government between you and your doctor and will not keep seniors from getting the services they need. All of these are judgement calls about things that have yet to happen. There are numerous people — people who are not batshit crazy like Gohmert — who think that Obamacare and especially the IPAB will eventually create government interference in healthcare. Gohmert might be wrong about this. But to call it a lie when someone makes a prediction about what will happen is absurd. Let’s imagine this playing out in 2002:
We rate Senator Liberal’s claim that we will be in Iraq for a decade and it will cost 5000 lives and $800 billion to be a lie. The Bush Administration has claimed that US troops will be on the ground for only a few years and expect less than a thousand casualties and about $2 billion per month. In fact, some experts predict it will pay for itself.
See what I did there?
Obamacare is a big law with a lot of moving parts. There are claims about how it is going to work but we won’t really know for a long time. Maybe the government won’t interfere with your health care. But that’s a big maybe to bet trillions of dollars on.
The article correctly notes that the government will not have access to medical records. But then it asserts that any information will be safe. This point was overtaken by events this week when an Obamacare site leaked 2,400 Social Security numbers.
See what I mean about “fact-checking” things that have yet to happen?
Then there’s this:
Under Obamacare, will young people be saddled with the cost of everybody else?
No. Thanks to the coverage for students, tax credits, Medicaid expansion and the fact that most young people don’t earn that much, most young people won’t be paying anything or very much for health care. And nearly everyone in their twenties will see premiums far less than people in their 40s and 50s. If you’re young, out of school and earning more than 400 percent of the poverty level, you may be paying a bit more, but for better insurance.
This is incorrect. Many young people are being coerced into buying insurance they wouldn’t have bought before. As Avik Roy has pointed out, cheap high-deductible plans have been effectively outlawed. Many colleges and universities are seeing astronomical rises in health insurance premiums, including my own. The explosion of invasive wellness programs, like UVA’s, has been explicitly tied to the PPACA. Gohmert is absolutely right on this one.
The entire point of Obamacare was to get healthy people to buy insurance so that sick people could get more affordable insurance. That is how this whole thing works. It’s too late to back away from that reality now.
Does Obamacare prevent the free exercise of your religious beliefs?
No. But it does stop you from forcing your beliefs on others. Employers that provide insurance have to offer policies that provide birth control to women. Religious organizations have been exempted from paying for this coverage but no one will ever be required to take birth control if their religion restricts it — they just can’t keep people from having access to this crucial, cost-saving medication for free.
This is a matter of philosophy. Many liberals think that if an employer will not provide birth control coverage to his employees, he is “forcing” his religious views upon them (these liberals being under the impression that free birth control pills are a right). I, like many libertarians and conservatives (and independents), see it differently: that forcing someone to pay for something with which they have a moral qualm is violating their religious freedom. The Courts have yet to decide on this.
I am reluctant to call something a “lie” when it’s a difference of opinion. Our government has made numerous allowances for religious beliefs in the past, including exemptions from vaccinations, the draft, taxes and anti-discrimination laws. We are still having a debate over how this applies to healthcare. Sorry, National Memo, that debate isn’t over yet.
So let’s review. Of Gohmert’s 12 “lies”, the breakdown is like so:
Debatable or TBD: 5
(You’ll note that’s 13 “lies”; apparently National Memo can’t count).
So only 4 out of 13 are lies. Hey, even Ty Cobb only hit .366.
Time to clear out a few things I don’t have time to write lengthy posts about.
You know, you could probably carve out a career responding to Mother Jones’ twisting and distorting of gun death data. Today brings another wonderful example. Hopping on the rather hysterical claim that gun deaths are close to exceeding traffic deaths, they look at it at a state-by-state level and conclude that “It’s little surprise that many of these states—including Alaska, Arizona, Colorado, Indiana, Utah, and Virginia—are notorious for lax gun laws.”
Look at the map. Then look at this one, which shows the Brady Campaign’s scorecard for state gun laws. The states where gun deaths exceed traffic deaths are Alaska (Brady score 0), Washington (48), Oregon (38), California (81!!), Nevada (5), Utah (0), Arizona (0), Colorado (15), Missouri (4), Illinois (35), Louisiana (2), Michigan (25), Ohio (7) and Virginia (12). Of the 14 states, nearly half have Brady scores over 12, and California has the most restrictive gun laws in the nation.
Going by rate of gun ownership, the states are Alaska (3rd highest gun ownership rate in the nation), Washington (33), Oregon (28), California (44), Nevada (38), Utah (16), Arizona (32), Colorado (36), Missouri (15), Illinois (43), Louisiana (13), Michigan (27), Ohio (37) and Virginia (35). In other words, the states where gun deaths exceed traffic deaths are just as likely to have a low gun ownership rate as a high one.
Moreover, the entire “guns are killing more than cars” meme is garbage to begin with. Gun deaths, as I have said in every single post on this subject, have fallen over the last twenty years. The thing is that traffic deaths have fallen even faster. The gun grabbers might have had a point back in 1991, when we had a spike in gun deaths that caused them to almost exceed traffic deaths. But they don’t now, because both rates are down, way down. Traffic fatalities, in particular, plunged dramatically in the mid-2000s.
A real analysis of the data would look at both factors to see whether better drunk driving laws or seatbelt laws or whatever are also playing a role here. But Mother Jones isn’t interested in that (for the moment). What they are interested in is stoking panic about guns.
(Notice also that MJ illustrates their graph with a picture of an assault rifle, even though these are responsible for a tiny fraction of gun deaths.)
Oh, no, not you, Best Magazine on the Planet:
The growth of federal regulations over the past six decades has cut U.S. economic growth by an average of 2 percentage points per year, according to a new study in the Journal of Economic Growth. As a result, the average American household receives about $277,000 less annually than it would have gotten in the absence of six decades of accumulated regulations—a median household income of $330,000 instead of the $53,000 we get now.
You know, I hate it when people play games with numbers and I won’t put up with it from my side. I agree with Reason’s general point that we are over-regulated and badly regulated and that it is hurting our economy. Even the most conservative estimates indicate that bad regulation is sucking hundreds of billions out of the economy — and that’s accounting for the positive effects of regulation.
But the claim that we would be four times richer if it weren’t for regulation is garbage. As Bailey notes in the article, growth in the US economy over the last half century has averaged about 3.2 percent. Without regulation, according to this study, it would have been 5.2 percent, which is far higher than the US has ever sustained over any extended period, even before the Progressive Era. And because that wild over-estimate compounds, it produces an economy four times the size of what we have now; four times what any large country would have now. The hypothetical US would be as wealthy, relative to the real US, as the real US is to Serbia. Does anyone really think that without regulation we would be producing four times the goods and services?
Even if we assume that we could produce an ideally regulated society, regulation is not the only limit on the economy. Other factors — birth rate, immigration, war, business cycles, education, technological progress, social unrest and the economic success of other countries — all play a role. A perfectly regulated society would most likely move from a position where its growth is limited by regulation to one where its growth is limited by other factors (assuming this is not already the case).
The paper is very long and complicated so I can’t dissect where their economic model goes wrong. But I will point out that no country in history, including the United States, has ever had half a century of 5% economic growth. Even countries with far less regulation and far more economic freedom than we have do not show the kind of explosive growth they project. In the absence of any real-life example showing that regulatory restraint can produce this kind of growth, we can’t accept numbers that are so ridiculous.
Other studies, as Reason notes, estimate the impact of regulation as being something like 10-20% of our economy. That would require that regulation knock down our economic growth by 0.3% per year, which seems much more reasonable.
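Compound growth is what makes the Reason number so implausible, and a back-of-the-envelope check shows it (the growth rates come from the article; the 62-year span is my approximation of the study’s six decades):

```python
def total_growth(annual_rate, years):
    # cumulative growth factor from steady annual compounding
    return (1 + annual_rate) ** years

years = 62                                   # roughly six decades (approximate)

actual = total_growth(0.032, years)          # the US's actual ~3.2%/year
counterfactual = total_growth(0.052, years)  # the study's regulation-free 5.2%
study_ratio = counterfactual / actual        # roughly 3-4x richer, as claimed

modest = total_growth(0.035, years)          # losing only 0.3 pp/year instead
modest_ratio = modest / actual               # roughly 20% richer: the 10-20% range

print(round(study_ratio, 1))
print(round(modest_ratio, 2))
```

A 2-percentage-point annual gap compounds into a severalfold difference in the level of the economy, while a 0.3-point gap compounds into the far more plausible 10–20% range.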
(H/T: Maggie McNeill, although she might not like where I went with this one.)
Mother Jones, not content with having run one of the more bogus studies on mass shootings (for which they boast about winning an award from Ithaca College), is crowing again about a new study out of Texas State. They claim the study shows that mass shootings are rising, that available guns are the reason and that civilians never stop shootings.
It’s too bad they didn’t read the paper very carefully, because it supports none of those conclusions.
So, yeah. They’re still playing with tiny numbers and tiny ranges of data to draw unsupportable conclusions. To be fair, the authors of the study are a bit more circumspect in their analysis, which is focused on training for law enforcement in dealing with active shooter situations. But Mother Jones never feels under any compulsion to question their conclusions.
(H/T: Christopher Mason)
Update: You might wonder why I keep on about this subject. The reason is that I think almost any analysis of mass shootings is deliberately misleading. Over the last twenty years, gun homicides have declined 40% (PDF) and gun violence by 70%. This is the real data. This is what we should be paying attention to. By diverting our attention to these horrific mass killings, Mother Jones and their ilk are focusing on about one-thousandth of the problem of gun violence, because that’s the only way they can make it seem that we are in imminent danger.
The thing is, Mother Jones does acknowledge the decline in violence in other contexts, such as claiming that the crackdown on lead has been responsible for the decline in violence. So when it suits them, they’ll freely acknowledge that violent crime has plunged. But when it comes to gun control, they pick a tiny sliver of gun violence to try to pretend that it’s not. And the tell, as I noted before, is that in their gun-control articles, they do not acknowledge the overall decline of violence.
Using a fact when it suits your purposes and ignoring it when it doesn’t is pretty much the definition of hackery.
A few weeks ago, Mother Jones, having not learned the lesson of their absurd article claiming mass shootings are on the rise, published a list of 10 myths about guns and gun control from Dave Gilson. And I’m going to debunk their debunking again, because the article represents what I believe is one of the worst sins in the field of mathematical malpractice: cherry-picking. As I went through it, it became obvious that MJ was not really interested in the facts. What was motivating them was the argument. And so they picked any study — no matter how small, how biased or how old — to support their point. They frequently ignore obvious objections and biases. And they sometimes ignore larger, more detailed studies in favor of smaller ones that support their contention.
We see this a lot in the punditocracy, unfortunately. As Bill James said, most people use studies the way a drunk uses a lamppost — for support, not illumination. In any sufficiently advanced but difficult field of study, you will find multiple studies examining an issue. Let’s say it’s a supposed connection between watching Glee and having a heart attack. If there is, in reality, no connection between the two, you might find eight studies that show no connection, one that shows an anti-correlation and one that shows a correlation. This is fine. This is science. There are always outlier studies even if all the researchers are completely ethical and honest. The outliers fall away when your interest is the question and you look at all the evidence. But the outliers dominate the discussion from those who have an agenda.
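The outlier phenomenon requires no dishonesty at all; it falls straight out of sampling noise. Here is a sketch (hypothetical "studies" of an effect that, by construction, does not exist):

```python
import random

random.seed(12345)

def null_study(n=200):
    # two groups drawn from the SAME population; any "effect" is pure noise
    group_a = [random.gauss(0, 1) for _ in range(n)]
    group_b = [random.gauss(0, 1) for _ in range(n)]
    return sum(group_b) / n - sum(group_a) / n

effects = [null_study() for _ in range(10)]   # ten honest studies of nothing

pooled = sum(effects) / len(effects)          # all the evidence together
cherry = max(effects, key=abs)                # the single most dramatic study

print(round(pooled, 3))    # hovers near zero
print(round(cherry, 3))    # can look like a real finding
```

Look at all ten studies and the effect washes out; quote only the most dramatic one and you can "prove" whatever you like.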
This happens a lot in the gun debate. On both sides, really. But Mother Jones’ article is a particularly putrid example of this because that’s basically all it does: collect the cherry-picked nonsensical studies that support their anti-gun agenda. It’s quite remarkable actually; almost a clinic in how not to do research.
But here’s the one thing that really tips you off. There is one myth that Mother Jones does not debunk. It’s a myth that’s really independent of what you think of gun ownership … unless you’ve already staked part of your reputation and agenda on the myth that gun violence is increasing. In fact, all forms of violent crime have been falling for twenty years. This is, in my mind, the single most important fact in debates over crime and violence and the single most important myth to debunk.
MJ does not address this myth. They don’t even talk about it. That is a huge tell.
This analysis, which claims that the US has more school spree killings than 36 nations combined, is getting a lot of play. It shouldn’t. It is extremely bad mathematical malpractice.
The basic reason it is mathematical malpractice is the same reason the Mother Jones study was: it is difficult to analyze extremely rare events. When you narrow your investigation to events that happen maybe once a decade and are compiled haphazardly, you are simply going to be dominated by small number statistics and selection bias. You can therefore use those numbers to say, basically, anything you want.
Let’s break down just how bad the numbers are being twisted here.
1) The sample ends in 2009. That excludes the recent spate of knife attacks in Chinese schools that have left 21 dead. If you had done this analysis a week ago, you would have had to drop China from the right column.
2) The sample excludes acts of terror or war. But if Islamists shoot up a school because they don’t want girls to read, are those kids any less dead? If a drone strike misses its targets and kills a classroom, are those kids less dead? Why must we exclude the Beslan attack that left 186 kids dead?
3) The sample excludes single homicides, which amount to 302 deaths in the United States over the time involved and God knows how many in other countries. So you are literally excluding 90% of the problem and focusing just on a tiny subset of killings.
4) Comparing us to 36 other countries is ridiculous when some of those countries are places like Bosnia-Herzegovina (population 4 million). We have more population, period, than 30 of the countries on that list combined. Also included in that list of countries are England, Scotland and Northern Ireland, which are, technically speaking, not countries.
5) The problem of small number statistics can be best illustrated by playing with the data a bit. If I include the knife attacks and move China into the left column, suddenly China has more violent deaths than 30 other countries. If I move Germany onto the left side, suddenly they have more spree killings than a bunch of other countries. If I define the sample in the 1990s, suddenly Australia dominates the statistics. You simply cannot draw conclusions from samples that are that sensitive to single events.
6) Combining points 4 and 5, if you look at spree killing rates rather than the deliberate mathematical malpractice of comparing absolute numbers, the situation is very different. In 2009, the Winnenden shooting killed 15. Scaled up to the population of the United States, that would be the equivalent of 60 people dead, more than the worst year the United States has ever had. In 1996, 35 people were killed in Port Arthur, Australia. Scaled up to the US population, that would almost equal 500 dead. It is an event that is seared into the memories of Australians. My point is not that these countries are worse than we are. My point is that these are rare and horrible events and you can manipulate the numbers to prove anything you want.
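The scaling in point 6 is just per-capita arithmetic. A minimal sketch (the population figures are my approximations for the years in question, not numbers from the original analysis):

```python
def scale_to_us(deaths, country_pop, us_pop):
    """Scale a death toll to the US population for a per-capita comparison."""
    return deaths * us_pop / country_pop

# Winnenden, Germany, 2009: 15 dead; Germany ~82M, US ~307M (approximate)
winnenden_scaled = round(scale_to_us(15, 82e6, 307e6))   # roughly 56

# Port Arthur, Australia, 1996: 35 dead; Australia ~18M, US ~270M (approximate)
port_arthur_scaled = round(scale_to_us(35, 18e6, 270e6))  # roughly 525

print(winnenden_scaled, port_arthur_scaled)
```

The exact outputs shift with whichever population estimates you plug in, which is itself part of the point: with events this rare, small changes in the denominator swing the comparison dramatically.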
7) The biggest thing missing here is a sense of time. Is the rate of school killings going up or down? The answer, of course, is down. Check out chart one at Ezra Klein’s blog that shows that the rate of assault death has fallen by over half since 1970. Check out the NCES page I link above which shows a significant decline in on-campus homicides, from 40/year in the ’90s to 30/year in the ’00s. That decline is a hundred more kids running in the sunshine. The NCES data, based on a complete sample of over 600 incidents, is useful. This … isn’t.
I’m not trying to downplay the horror that unfolded on Friday. However, I don’t think any debate can proceed unless we have a good grasp of the problem we are trying to solve. Far too many children are murdered in school in this country — that was as true on Thursday as it is today. But to be useful, the debate needs to be on honest terms. Committing mathematical malpractice by deceptively comparing the United States to 36 other countries as though there were something to be learned from that is not an honest debate and is likely to produce a panicky and ill-considered response.