Posts Tagged ‘Crime’

Mathematical Malpractice Watch: A Trilogy of Error

Wednesday, February 12th, 2014

Three rather ugly instances of mathematical malpractice have caught my attention in the last month. Let’s check them out.

The Death of Facebook or How to Have Fun With Out of Sample Data

Last month, Princeton researchers came out with the rather spectacular claim that the social network Facebook would be basically dead within a few years. The quick version is that they fit an epidemiological model to the rise and fall of MySpace. They then used that same model, varying the parameters, to fit Google Trends data on searches for Facebook. They concluded that Facebook would lose 80% of its users by 2017.

This was obviously nonsense, as detailed here and here. It suffered from many flaws, notably the assumption that the rise and fall of MySpace was necessarily a model for all social networks, and the dubious choice of Google searches, rather than publicly available traffic data, as the metric.

But there was a deeper flaw. The authors fit a model of a sharp rise and fall. They then proclaimed that this model works because Facebook’s Google data follow the first half of that trend and a little bit of the second. But while the decline in Facebook Google searches is consistent with their model, it is also consistent with hundreds of others. It would be perfectly consistent with a model that predicts a sharp rise and then a leveling off as the social network saturates. Their data are consistent with just about any model, and they discriminate among almost none of them.

The critical part of the data — the predicted sharp fall in Facebook traffic — is out of sample (meaning it hasn’t happened yet). But based on a tiny sliver of data, they have drawn a gigantic conclusion. It’s Mark Twain and the length of the Mississippi River all over again.

We see this a lot in science, unfortunately. Global warming models often predict very sharp rises in temperature — out of sample. Models of the stock market predict crashes or runs — out of sample. Sports twerps put together models that predict Derek Jeter will get 4000 hits — out of sample.

Anyone who does data fitting for a living knows this danger. The other day, I fit a light curve to a variable star. Because of an odd intersection of Fourier parameters, the model predicted a huge rise in brightness in the middle of its decay phase because there were no data to constrain it there. So it fit a small uptick in the decay phase as though it were the small beginning of a massive re-brightening.

The more complicated the model, the greater the danger of drawing massive conclusions from tiny amounts of data or small trends. If the model is anything other than a straight line, be very, very wary of out-of-sample predictions, especially when they predict order-of-magnitude changes.
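Here is a toy sketch of that trap. It is not a reconstruction of the Princeton model. It assumes a hypothetical adoption curve that rises and then simply levels off (a logistic), observes only the rising half, and fits a flexible model to it. The flexible model here is an exact-interpolating polynomial, swapped in purely for illustration; the names `logistic` and `lagrange_fit` are my own.

```python
import math

def logistic(t, midpoint=5.0, rate=1.0):
    """Hypothetical adoption curve: rises, then saturates at 1.0 and never falls."""
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))

def lagrange_fit(xs, ys):
    """Return the unique polynomial of degree len(xs) - 1 through every data point."""
    def p(t):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (t - xj) / (xi - xj)
            total += yi * basis
        return total
    return p

# In sample: only the rising half of the curve has been observed.
observed_t = [0, 1, 2, 3, 4, 5]
observed_y = [logistic(t) for t in observed_t]
model = lagrange_fit(observed_t, observed_y)

# The polynomial reproduces every observed point exactly...
in_sample_error = max(abs(model(t) - y) for t, y in zip(observed_t, observed_y))
print("worst in-sample error:", in_sample_error)

# ...but out of sample it tells a completely different story.
for t in (6, 8, 12):
    print(f"t={t:2d}  saturating curve={logistic(t):6.3f}  polynomial={model(t):8.3f}")
```

By my rough hand check, the polynomial tracks the true curve just past the data (t=6), has already turned over by t=8, and goes strongly negative by t=12, while the real curve has merely leveled off near 1. A perfect in-sample fit predicted a "collapse" that never happens.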

A Rape Epidemic or How to Reframe Data:

The CDC recently released a study that claimed that 1.3 million women were raped and 12.6 million more were subject to sexual violence in 2010. This is six or more times the estimate from the Bureau of Justice Statistics’ extremely rigorous National Crime Victimization Survey (NCVS). Christina Hoff Sommers has a breakdown of why the number is so massive:

It found them by defining sexual violence in impossibly elastic ways and then letting the surveyors, rather than subjects, determine what counted as an assault. Consider: In a telephone survey with a 30 percent response rate, interviewers did not ask participants whether they had been raped. Instead of such straightforward questions, the CDC researchers described a series of sexual encounters and then they determined whether the responses indicated sexual violation. A sample of 9,086 women was asked, for example, “When you were drunk, high, drugged, or passed out and unable to consent, how many people ever had vaginal sex with you?” A majority of the 1.3 million women (61.5 percent) the CDC projected as rape victims in 2010 experienced this sort of “alcohol or drug facilitated penetration.”

What does that mean? If a woman was unconscious or severely incapacitated, everyone would call it rape. But what about sex while inebriated? Few people would say that intoxicated sex alone constitutes rape — indeed, a nontrivial percentage of all customary sexual intercourse, including marital intercourse, probably falls under that definition (and is therefore criminal according to the CDC).

Other survey questions were equally ambiguous. Participants were asked if they had ever had sex because someone pressured them by “telling you lies, making promises about the future they knew were untrue?” All affirmative answers were counted as “sexual violence.” Anyone who consented to sex because a suitor wore her or him down by “repeatedly asking” or “showing they were unhappy” was similarly classified as a victim of violence. The CDC effectively set a stage where each step of physical intimacy required a notarized testament of sober consent.

In short, they did what is called “reframing”. They took someone’s experiences, threw away that person’s definition of them and substituted their own definition.

This isn’t the first time this has happened with rape stats, nor the first time Sommers has uncovered this sort of reframing. Here is an account of how researchers decided that women who didn’t think they had been raped were, in fact, raped, so they could claim a victimization rate of one in four.

Scientists have to classify things all the time based on a variety of criteria. The universe is a messy continuum; to understand it, we have to sort things into boxes. I classify stars for a living based on certain characteristics. The problem with doing that here is that women are not inanimate objects. Nor are they lab animals. They can have opinions of their own about what happened to them.

I understand that some victims may reframe their experiences to try to lessen the trauma of what happened to them. I understand that a woman can be raped but convince herself it was a misunderstanding or that it was somehow her fault. But to a priori reframe any woman’s experience is to treat them like lab rats, not human beings capable of making judgements of their own.

But it also illustrates a mathematical malpractice problem: changing definitions. This is how 10,000 underage prostitutes in the United States becomes 200,000 girls “at risk”. This is how small changes in drug use stats become an “epidemic”. If you dig deep into the studies, you will find the truth. But the banner headline — the one the media talk about — is hopelessly and deliberately muddled.

Sometimes you have to change definitions. The NCVS methodology on rape statistics was changed a few years ago, and the change produced a significant increase in the estimates. But it’s one thing to hone; it’s another to completely redefine.

(The CDC, as my friend Kevin Wilson pointed out, mostly does outstanding work. But they have a tendency to jump with both feet into moral panics. In this case, it’s the current debate about rape culture. Ten years ago, it was obesity. They put out a deeply flawed study that overestimated obesity deaths by a factor of 14. They quickly admitted their screwup but … guess which number has been quoted for the last decade on obesity policy?)

You might ask why I’m on about this. Surely any number of rapes is too many. The reason I wanted to talk about this, apart from my hatred of bogus studies, is that data influences policy. If you claim that 1.3 million women are being raped every year, that’s going to result in a set of policy decisions that are likely to be very damaging and do very little to address the real problem.

If you want a stat that means something, try this one: the incidence of sexual violence has fallen 85% over the last 30 years. That is from the NCVS data, so even if the survey over- or under-estimates the amount of sexual violence, the differential is meaningful as long as that bias stays roughly constant over time. That data tells you something useful: whatever we are doing to fight rape culture, it is working. Greater awareness, pushing back against blaming the victim, changes to federal and state laws, changes to the emphasis of attorneys general’s offices and the rise of internet pornography have all been cited as contributors to this trend.
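Why a biased survey can still give a trustworthy trend: a minimal sketch, assuming the bias is a constant multiplicative factor. The incidence numbers below are made up, chosen only to reproduce an 85% drop, and `percent_change` is a hypothetical helper.

```python
def percent_change(start, end):
    """Relative change from start to end, in percent."""
    return 100.0 * (end - start) / start

# Hypothetical "true" incidence per 1,000 people, made up to give an 85% drop.
true_then, true_now = 4.0, 0.6

# A survey that over- or under-counts by the SAME factor in both years.
for bias in (0.5, 1.0, 3.0):
    change = percent_change(bias * true_then, bias * true_now)
    print(f"multiplicative bias x{bias}: measured change {change:.1f}%")

# A fixed number of phantom cases added every year does NOT cancel out.
change = percent_change(true_then + 1.0, true_now + 1.0)
print(f"additive bias +1.0: measured change {change:.1f}%")
```

A proportional bias cancels in the ratio, so every biased survey reports the same -85%. A fixed additive bias does not cancel (here the measured drop shrinks to -68%), which is why the "roughly constant" assumption matters.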

That’s why it’s important to push back against bogus stats on rape: they conceal the most important stat, the one that is the most useful guide for future policy and points the way toward ending rape culture.

The Pending Crash or How to Play with Scales:

Yesterday morning, I saw a chart claiming that the recent stock market trends are an eerie parallel of the run-up to the 1929 crash. I was immediately suspicious because, even if the data were accurate, we see this sort of crap all the time. There are a million people who have made a million bucks on Wall Street claiming to pattern match trends in the stock market. They make huge predictions, just like the Facebook study above. And those predictions are always wrong. Because, again, the out of sample data contains the real leverage.

This graph is even worse than that, though. As Quartz points out, the graph makers used two different y-axes. On one, the 1928-29 rise of the stock market was a near doubling. On the other, the 2013-14 rise was an increase of about 25%. When you scale them appropriately, the similarity vanishes. Or, alternatively, the pending “crash” would be just an erasure of that 25% gain.
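A minimal sketch of the two-axis trick, using hypothetical index levels rather than real market data. Giving each series its own y-axis is roughly equivalent to stretching each one to fill the chart (`min_max` below). Putting both on a common percent-change scale (`rebase`) is the honest comparison. Both helper names are my own.

```python
def rebase(series):
    """Percent change of every point relative to the first point."""
    base = series[0]
    return [100.0 * (v - base) / base for v in series]

def min_max(series):
    """Stretch a series to fill 0..1, which is roughly what giving each
    series its own y-axis does visually."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series]

# Hypothetical index levels, NOT real market data: one run-up that nearly
# doubles and one that gains about 25%.
near_doubling = [200, 240, 290, 340, 390]
modest_gain = [1500, 1580, 1680, 1780, 1875]

# On independent axes the two curves look almost identical...
print(min_max(near_doubling))
print(min_max(modest_gain))
# ...while a common percent-change scale shows the real difference.
print(rebase(near_doubling)[-1], rebase(modest_gain)[-1])
```

Stretched to fill the chart, the two series differ by less than 0.01 of the chart height at every point. Rebased to a common scale, one is up 95% and the other 25%.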

I’ve seen this quite a bit and it’s beginning to annoy me. Zoomed-in graphs of narrow ranges of the y-axis are used to draw dramatic conclusions about … whatever you want. This week, it’s the stock market. Next week, it’s global warming skeptics looking at little spikes on a 10-year temperature plot instead of big trends on a 150-year one. The week after, it will be inequality data. Here is one from Piketty and Saez, which tracks wealth gains for the rich against everyone else. Their conclusion might be accurate but the plot is useless because it is scaled to intervals of $5 million. So even if the bottom 90% were doing better, even if their income was doubling, it wouldn’t show up on the graph.
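The arithmetic behind zoomed and over-wide axes is just the change divided by the plotted range. Here is a small sketch with hypothetical numbers. The $30,000 income is an assumption for illustration, not a figure from Piketty and Saez, and `fraction_of_axis` is my own name.

```python
def fraction_of_axis(change, axis_min, axis_max):
    """Share of the plotted vertical range that a given change occupies."""
    return change / (axis_max - axis_min)

# Hypothetical numbers for illustration only.
# A 2-point move in an index near 100, on an axis zoomed to 99..103:
print(fraction_of_axis(2, 99, 103))   # 0.5, half the chart height
# The same move on an axis that starts at zero and runs to 120:
print(fraction_of_axis(2, 0, 120))
# A $30,000 income doubling, measured against one $5 million grid interval:
print(fraction_of_axis(30_000, 0, 5_000_000))   # 0.006 of a single interval
```

The same 2-point move fills half the chart when zoomed in and under 2% of it on a zero-based axis. A doubling of a modest income barely moves the line at all against $5 million gridlines.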

Mother Jones Again. Actually Texas State

Wednesday, May 22nd, 2013

Mother Jones, not content with having run one of the more bogus studies on mass shootings (for which they boast about winning an award from Ithaca College), is crowing again about a new study out of Texas State. They claim that the study shows that mass shootings are rising, that available guns are the reason and that civilians never stop shootings.

It’s too bad they didn’t read the paper more carefully, because it supports none of those conclusions.

  • The Texas State study covers only 84 incidents. Their “trend” is that about half of these incidents happened in the last two years of the study. That is, again, an awfully small number to be drawing conclusions from.
  • The data are based on LexisNexis searches. That is not nearly as thorough as James Alan Fox’s use of FBI crime stats and may measure media coverage more than actual events. They seem to have been reasonably thorough, but they confirm their data from … other compilations.
  • Their analysis only covers the years 2000-2010. This conveniently leaves out 2011 (which had few incidents) and the entirety of the ’80s and ’90s, when crime rates were nearly twice what they are now. The word for this is “cherry picking”. Consider what their narrow year range means. If the next decade has fewer incidents, the “trend” becomes a spike. Had you done a similar study covering the years 1990-2000, using MJ’s graph, you would have concluded that mass shootings were rising then. But this would have been followed by five years with very few active shooter events. Look at Mother Jones’ graph again. You can see that mass shootings fell dramatically in the early 2000s, then spiked up again. That looks like noise in a flat trend over a 30-year baseline. But when you analyze it the way the Blair study does, it looks like a trend. You know what this reminds me of? The bad version of global warming skepticism. Global warming “skeptics” will often show temperature graphs that start in 1998 (an unusually warm year) and go to the present to claim that there is no global warming. But if you look at the data for the last century, the long-term trend becomes readily apparent. As James Alan Fox has shown, the long-term trend in mass shootings is flat. What Mother Jones has done is jump on a study that really wasn’t intended to look at long-term trends and claim it confirms long-term trends.
  • Mother Jones says: “The unprecedented spike in these shootings came during the same four-year period, from 2009-12, that saw a wave of nearly 100 state laws making it easier to obtain, carry, and conceal firearms.” They ignore that the wave of gun law liberalization began in the ’90s, before the time span of this study.
  • MJ also notes that only three of the 84 attacks were stopped by the victims using guns. Ignored in their smugness is that a) that’s three times what Mother Jones earlier claimed over a much longer time baseline; b) the number of incidents stopped by the victims was actually 16, of which only three involved guns; c) at least 1/3 of the incidents happened in schools, where guns are forbidden.
So, yeah. They’re still playing with tiny numbers and tiny ranges of data to draw unsupportable conclusions. To be fair, the authors of the study are a bit more circumspect in their analysis, which is focused on training law enforcement to deal with active shooter situations. But Mother Jones never feels under any compulsion to question their conclusions.
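How much the choice of window matters can be shown with a few lines of ordinary least squares. This is a sketch on hypothetical yearly counts (not the Texas State or Mother Jones data) that dip and recover with no long-run trend at all. `ols_slope` is my own helper.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Hypothetical yearly counts (NOT the Texas State or Mother Jones data):
# they dip and recover, so there is no long-run trend at all.
years = list(range(10))
counts = [6, 5, 4, 3, 2, 2, 3, 4, 5, 6]

print("slope, full window:", ols_slope(years, counts))             # 0.0
print("slope, last five years:", ols_slope(years[5:], counts[5:]))  # 1.0
```

Over the whole baseline the fitted slope is exactly zero. Start the window at the bottom of the dip and the same data show a steady rise of one incident per year.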

    (H/T: Christopher Mason)

    Update: You might wonder why I’m on about this subject. The reason is that I think almost any analysis of mass shootings is deliberately misleading. Over the last twenty years, gun homicides have declined 40% and gun violence by 70%. This is the real data. This is what we should be paying attention to. By diverting our attention to these horrific mass killings, Mother Jones and their ilk are focusing on about one-thousandth of the problem of gun violence, because that’s the only way they can make it seem that we are in imminent danger.

    The thing is, Mother Jones does acknowledge the decline in violence in other contexts, such as claiming that the crackdown on lead has been responsible for the decline in violence. So when it suits them, they’ll freely acknowledge that violent crime has plunged. But when it comes to gun control, they pick a tiny sliver of gun violence to try to pretend that it’s not. And the tell, as I noted before, is that in their gun-control articles, they do not acknowledge the overall decline of violence.

    Using a fact when it suits your purposes and ignoring it when it doesn’t is pretty much the definition of hackery.

    Mathematical Malpractice Watch: Guns

    Saturday, September 29th, 2012

    A few weeks ago, Mother Jones did a timeline of mass shootings in response to the spate of summer shootings. They defined their criteria, listed 61 incidents and pointed out, correctly, that most of them were committed with legal firearms.

    The highlight is a map of mass shootings over the last thirty years. The map has some resemblance to Radley Balko’s famous map of botched law enforcement raids. But the use of a map and dots is where the resemblance ends. Balko was very clear that his list of incidents was not, in any way, definitive. And he did not try to parse his incomplete data to draw sketchy conclusions.

    Mother Jones felt under no such compulsion.

    This week, they’ve published an “analysis” of their data and drawn the conclusion that our society has more guns than ever and, perhaps related, more mass shootings. Below, I’ll detail why I think their “analysis” — and yes, I will keep using quotation marks for this — is useless, uninformative and flat-out wrong.


    The “Liberal” Me

    Tuesday, March 20th, 2012

    Am I a liberal? Have I become one?

    That may seem like a ridiculous question to the three people who read this blog and are, on balance, to the left of me. But it’s been on my mind a bit lately. I am constantly accused of being a RINO or an out-and-out liberal on conservative sites. Friends and family often describe me as “so liberal”. And every time Obama screws up (about once a week), I get a message or an e-mail or a comment asking if I’m happy that I voted for him (which I didn’t; I voted for Barr). The current GOP primary race — in which none of the candidates really appeal to me — has only exacerbated this since I spend most of my time pointing out why each of the candidates is a terrible choice.

    Thinking about it for a while, however, there may be something to the criticism. There are a handful of issues on which I’ve moved “left” in the last decade or so. But I do not see these as some sudden wellspring of liberalism. They are my fundamental conservatism and libertarianism refined. As I become more aware of the complexity and debate over certain issues, I find my libertarian/conservative philosophy leading me to views that I consider to be fundamentally conservative, but are no longer considered dogma by the GOP, least of all their collection of media dog washers.


    Wednesday Linkorama

    Thursday, June 2nd, 2011

    Thanks to Twitter siphoning off my political rants, you’re getting more … non-political links:

  • Cracked debunks the Twitter revolution. I’m forced to mostly agree. Social networking may have played a minor role in the upheavals in the Middle East, at best. But real activism involves risking your life, not turning your Facebook profile green.
  • I really really like this idea of the Billion Price Index as a complement to traditional inflation metrics.
  • Do you know … do either of you have any idea of how fucking glad I am I don’t have a big ass commute anymore? I can’t imagine how I did it for so long.
  • I really hope the anti-homework agenda catches on. What’s being done to kids these days is absurd busy work bullshit.
  • So do you think studies like this will, in any way, slow down those who want to ban fatty foods?
  • Political links:

  • Experts are once again stunned that poverty does not cause crime. They seem to be stunned by this quite a lot.
  • Want to stimulate the economy? Wonder how America can lead the world in innovation again? Repeal SOX.
    Mathematical Malpractice Watch: Why NationMaster Sucks

    Thursday, April 21st, 2011

    Graphjam ran a graphic today apparently showing all the awful things the US leads the world in.

    It’s crap. It’s clearly produced by someone who spent a few minutes browsing nationmaster.com. Nationmaster is convenient but its accuracy is, at best, suspect. There is no uniformity of data and many of the samples are incomplete or old. To be honest, you’re better off going to Wikipedia. Much better off.

    But beyond that, they just haven’t thought very hard. For example, the graphic has the US as #1 in crime. This is true, but only because we are a large country and a transparent one. The UK has half as many crimes but a fifth of our population. Germany has half as many crimes but a quarter of our population. The crime rate in the US is high but not tops. Same goes with rape, which they have as #1. Scandinavian countries lead the civilized world in that (although likely because they measure their rape stats differently).
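The fix for the "#1 in crime" ranking is to divide by population before comparing. Here is a minimal sketch that uses only the rough ratios quoted above (not independently checked). `relative_rate` is a hypothetical helper, and exact fractions are used so the arithmetic is not muddied by rounding.

```python
from fractions import Fraction

def relative_rate(crimes_ratio, population_ratio):
    """Crime rate of another country relative to the US, given the ratios of
    its crime count and its population to the US figures."""
    return crimes_ratio / population_ratio

# Ratios as stated in the post: half the crimes, with a fifth or a quarter
# of the population.
uk = relative_rate(Fraction(1, 2), Fraction(1, 5))
germany = relative_rate(Fraction(1, 2), Fraction(1, 4))
print(f"UK rate is {float(uk)}x the US rate; Germany is {float(germany)}x")
```

On those figures the UK's per-capita rate is 2.5 times ours and Germany's is twice ours, which is exactly why raw totals say more about country size than about crime.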

    But a lot of this is the nationmaster problem. They have the US as #1 in CO2 emissions. This is actually wrong as China is #1. US emissions have actually been flat over the last few decades. The nationmaster data are 10 years old — way too far out of date. They also have the US as #1 in divorce rate. This is wrong. Russia is #1.

    Teen birth rate? The US is #1 among developed nations. But you have to exclude almost every developing nation in the world to get that ranking. Nationmaster’s data is selective and based on 1994 data. The teen birth rate has plunged since then.

    Heart attacks? I haven’t the faintest clue what they’re showing here. But heart attack survival rates have been growing massively in the US.

    We do lead the world in McDonald’s restaurants and plastic surgery. That tends to come from being the richest country on Earth. We also, unfortunately, lead the world in both prison population and incarceration rate — yet another wonderful effect of our stupid war on drugs.