# How Many Women?

Campus sexual violence continues to be a topic of discussion, as it should be. I have a post going up on the other site about the kangaroo court system that calls itself campus justice.

But in the course of this discussion, a bunch of statistical BS has emerged. This centers on just how common sexual violence is on college campuses, with estimates ranging from the one-in-five stat that has been touted, in various forms, since the 1980’s, to a 0.2 percent rate touted in a recent op-ed.

Let’s tackle that last one first.

According to the FBI “[t]he rate of forcible rapes in 2012 was estimated at 52.9 per 100,000 female inhabitants.”

Assuming that all American women are uniformly at risk, this means the average American woman has a 0.0529 percent chance of being raped each year, or a 99.9471 percent chance of not being raped each year. That means the probability the average American woman is never raped over a 50-year period is 97.4 percent (0.999471 raised to the power 50). Over 4 years of college, it is 99.8 percent.

Thus the probability that an American woman is raped in her lifetime is 2.6 percent and in college 0.2 percent — 5 to 100 times less than the estimates broadcast by the media and public officials.
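
On its own terms, the op-ed's arithmetic checks out; the problem is the inputs. Here is a minimal Python sketch of the calculation. Like the op-ed, it assumes a constant annual risk that is the same for every woman and independent from year to year. The function name is mine, for illustration only:

```python
# FBI figure quoted above: 52.9 forcible rapes per 100,000 female inhabitants (2012).
ANNUAL_RATE = 52.9 / 100_000


def prob_never(annual_rate: float, years: int) -> float:
    """Chance of no incident in `years` years, assuming a constant,
    independent annual risk (the op-ed's uniform-risk assumption)."""
    return (1 - annual_rate) ** years


lifetime_risk = 1 - prob_never(ANNUAL_RATE, 50)  # 50-year window used in the op-ed
college_risk = 1 - prob_never(ANNUAL_RATE, 4)    # four years of college

print(f"50-year risk: {lifetime_risk:.1%}")  # 2.6%
print(f"4-year risk:  {college_risk:.1%}")   # 0.2%
```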

This estimate is way too low. It is based on taking one number and applying high school math to it. It misses the mark because it uses the wrong numbers and some poor assumptions.

First of all, the FBI’s stats cover only documented forcible rape; they do not account for under-reporting and do not include sexual assault. The better comparison is the National Crime Victimization Survey*, which estimates about 300,000 rapes or sexual assaults in 2013, for an incidence rate of 1.1 per thousand. But even that number needs some correction, because about 2/3 of sexual violence is visited upon women between the ages of 12 and 30 and about a third upon college-age women. The NCVS rate indicates about a 10% lifetime risk, or about a 3% college-age risk, for American women. This is lower than the 1-in-5 stat but much higher than 1-in-500.

(*The NCVS survey shows a jump in sexual violence in the 2000’s. That’s not because sexual violence surged; it’s because they changed their methodology, which increased their estimates by about 20%.)

So what about 1-in-5? I’ve talked about this before, but it’s worth going over again: the one-in-five stat is almost certainly a wild overestimate:

The statistic comes from a 2007 Campus Sexual Assault study conducted by the National Institute of Justice, a division of the Justice Department. The researchers made clear that the study consisted of students from just two universities, but some politicians ignored that for their talking point, choosing instead to apply the small sample across all U.S. college campuses.

The CSA study was actually an online survey that took 15 minutes to complete, and the 5,446 undergraduate women who participated were provided a \$10 Amazon gift card. Men participated too, but their answers weren’t included in the one-in-five statistic.

If 5,446 sounds like a high number, it’s not — the researchers acknowledged that it was actually a low response rate.

But a lot of those responses have to do with how the questions were worded. For example, the CSA study asked women whether they had sexual contact with someone while they were “unable to provide consent or stop what was happening because you were passed out, drugged, drunk, incapacitated or asleep?”

The survey also asked the same question “about events that you think (but are not certain) happened.”

That’s open to a lot of interpretation, as exemplified by a 2010 survey conducted by the U.S. Centers for Disease Control and Prevention, which found similar results.

I’ve talked before about the CDC study and its deep flaws. Schow points out that the victimization rate they are claiming is far higher than the estimates from the National Crime Victimization Survey (NCVS), the FBI and the Rape, Abuse and Incest National Network (RAINN). All three of those organizations use much more rigorous data collection methods. NCVS does interviews and asks the question straight up: have you been raped or sexually assaulted? I would trust the research methods of these organizations, which have been doing this for decades, over a web survey of two colleges.

Another survey recently emerged from MIT which claimed 1-in-6 women are sexually assaulted. But not only does this suffer from the same flaws as the CSA study (a web survey with voluntary participation), it doesn’t even show what it is claimed to show:

When it comes to experiences of sexual assault since starting at MIT:

• 1 in 20 female undergraduates, 1 in 100 female graduate students, and zero male students reported being the victim of forced sexual penetration
• 3 percent of female undergraduates, 1 percent of male undergraduates, and 1 percent of female grad students reported being forced to perform oral sex
• 15 percent of female undergraduates, 4 percent of male undergraduates, 4 percent of female graduate students, and 1 percent of male graduate students reported having experienced “unwanted sexual touching or kissing”
• All of these experiences are lumped together under the school’s definition of sexual assault.

When students were asked to define their own experiences, 10 percent of female undergraduates, 2 percent of male undergraduates, 3 percent of female graduate students, and 1 percent of male graduate students said they had been sexually assaulted since coming to MIT. One percent of female graduate students, 1 percent of male undergraduates, and 5 percent of female undergraduates said they had been raped.

Note that even with a biased study, the result is 1-in-10, not 1-in-5 or 1-in-6.

OK, so web surveys are a bad way to do this. What is a good way? Mark Perry points out that the one-in-five stat is inconsistent with another number claimed by advocates of new policies: a reporting rate of 12%. If you assume a reporting rate near that and use the actual number of reported assaults on major campuses, you get a rate of around 3%.

Hmmm.

Further research is consistent with this rate. For example, here, we see that UT Austin has 21 reported incidents of sexual violence. That’s one in a thousand enrolled women. Texas A&M reported nine, one in three thousand women. Houston reported 11, one in 2000 women. If we are to believe the 1-in-5 stat, that’s a reporting rate of half a percent. A reporting rate of 10%, which is what most people accept, would mean … a 3-5% risk for five years of enrollment.
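
The reporting-rate correction above is simple enough to sketch in Python. This assumes the quoted per-school figures are annual, rounded report rates, that a flat 10% reporting rate applies everywhere, and that risk is independent from year to year. The function name is hypothetical, for illustration:

```python
def enrollment_risk(reported_per_woman: float, reporting_rate: float, years: int) -> float:
    """Estimated chance of at least one incident over `years` of enrollment.

    reported_per_woman: reported incidents per enrolled woman per year
    reporting_rate:     assumed fraction of incidents that get reported
    """
    true_annual = reported_per_woman / reporting_rate  # scale reports up to estimated incidents
    return 1 - (1 - true_annual) ** years


# Rounded annual report rates quoted above; 10% reporting rate; five years.
schools = {
    "UT Austin": 1 / 1000,
    "Texas A&M": 1 / 3000,
    "Houston": 1 / 2000,
}
for name, rate in schools.items():
    print(f"{name}: {enrollment_risk(rate, 0.10, 5):.1%}")
```

Under these assumptions the three schools range from under 2% (Texas A&M) to about 5% (UT Austin) over five years, nowhere near 20%.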

So … Mark Perry finds 3%. Texas schools show 3-5%. NCVS and RAINN stats indicate 2-5%. Basically, any time we use actual numbers based on objective surveys, we find the number of women who are in danger of sexual violence during their time on campus is 1-in-20, not 1-in-5.

One other reason to disbelieve the 1-in-5 stat. Sexual violence in our society is down — way down. According to the Bureau of Justice Statistics, rape has fallen from 2.5 per 1000 to 0.5 per thousand, an 80% decline. The FBI’s data show a decline from 40 to about 25 per hundred thousand, a 40% decline (they don’t account for reporting rate, which is likely to have risen). RAINN estimates that the rate has fallen 50% in just the last twenty years. That means 10 million fewer sexual assaults.
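
For anyone checking the percentages, a decline is the drop divided by the starting value. A quick Python sketch with the rates quoted above (the helper name is just for illustration) shows the FBI figure works out to 37.5%, which the paragraph rounds to 40%:

```python
def percent_decline(old: float, new: float) -> float:
    """Percentage drop from `old` to `new`."""
    return (old - new) / old * 100


# Rates quoted above.
bjs = percent_decline(2.5, 0.5)  # rapes per 1,000 (Bureau of Justice Statistics)
fbi = percent_decline(40, 25)    # forcible rapes per 100,000 (FBI)
print(f"BJS: {bjs:.1f}% decline")
print(f"FBI: {fbi:.1f}% decline")
```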

Yet, for some reason, sexual assault rates on campus have not fallen, at least according to the favored research. They were claiming 1-in-5 in the 80’s and they are claiming 1-in-5 now. The sexual violence rate on campus might fall a little more slowly than in society overall, because campus populations aren’t aging the way the general population is and sexual violence victims are mostly under 30. But it defies belief that the huge, dramatic drops in violence and sexual violence elsewhere in society would somehow not be reflected on college campuses.

Interestingly, the decline in sexual violence does appear if you polish the wax fruit a bit. The seminal Koss study of the 1980’s claimed that one-in-four women were assaulted or raped on college campuses. As Christina Hoff Sommers and Maggie McNeill pointed out, the actual rate was something like 8%. A current rate of 3-5% would indicate that sexual violence on campus has dropped in proportion to sexual violence in the broader society.

It goes without saying, of course, that 3-5% of women experiencing sexual violence during their time at college is 3-5% too many. As institutions of enlightenment (supposedly), our college campuses should be safer than the rest of society. I support efforts to clamp down on campus sexual violence, although not in the form they are currently taking, which I will address on the other site.

But the 1-in-5 stat isn’t reality. It’s a poll-tested number. It’s a number picked to be large enough to be scary but not so large as to be unbelievable. It is being used to advance an agenda that I believe will not really address the problem of sexual violence.

Numbers mean things. As I’ve argued before, if one in five women on college campuses is being sexually assaulted, this suggests a much more radical course of action than one-in-twenty does. It would suggest that we should shut down every college in the country, since they would be the most dangerous places for women in the entire United States. But 1-in-20 suggests that an overhaul of campus judiciary systems, better support for victims and expulsion of serial predators would do a lot to help.

In other words, let’s keep on with the policies that have dropped sexual violence 50-80% in the last few decades.

# Mathematical Malpractice Watch: A Trilogy of Error

Three rather ugly instances of mathematical malpractice have caught my attention in the last month. Let’s check them out.

## The Death of Facebook, or How to Have Fun With Out-of-Sample Data

Last month, Princeton researchers came out with the rather spectacular claim that the social network Facebook would be basically dead within a few years. The quick version is that they fit an epidemiological model to the rise and fall of MySpace. They then used that same model, varying the parameters, to fit Google trends on searches for Facebook. They concluded that Facebook would lose 80% of its customers by 2017.

This was obviously nonsense, as detailed here and here. It suffered from many flaws, notably the assumption that the rise and fall of MySpace was necessarily a model for all social networks, and the dubious choice of Google searches, rather than publicly available traffic data, as their metric.

But there was a deeper flaw. The authors fit a model of a sharp rise and fall. They then proclaim that this model works because Facebook’s Google data follow the first half of that trend and a little bit of the second. But while the decline in Facebook Google searches is consistent with their model, it is also consistent with hundreds of others. It would be perfectly consistent with a model that predicts a sharp rise and then a leveling off as the social network saturates. Their data are consistent with just about any model and rule out almost none of them.

The critical part of the data — the predicted sharp fall in Facebook traffic — is out of sample (meaning it hasn’t happened yet). But based on a tiny sliver of data, they have drawn a gigantic conclusion. It’s Mark Twain and the length of the Mississippi River all over again.

We see this a lot in science, unfortunately. Global warming models often predict very sharp rises in temperature — out of sample. Models of the stock market predict crashes or runs — out of sample. Sports twerps put together models that predict Derek Jeter will get 4000 hits — out of sample.

Anyone who does data fitting for a living knows this danger. The other day, I fit a light curve to a variable star. Because of an odd intersection of Fourier parameters, and because there were no data to constrain it there, the model predicted a huge rise in brightness in the middle of the decay phase. It fit a small uptick in the decay phase as though it were the beginning of a massive re-brightening.

The more complicated the model, the more danger there is of drawing massive conclusions from tiny amounts of data or small trends. If the model is anything other than a straight line, be very, very wary of out-of-sample predictions, especially when they predict order-of-magnitude changes.
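
Here is a toy Python version of that danger. It is a sketch under made-up assumptions, not the Princeton model or my variable-star fit. The "true" curve is a hypothetical logistic that rises and then levels off (the saturation scenario), and the "model" is a degree-6 polynomial forced through seven observed points by Lagrange interpolation. The fit is perfect in sample and goes haywire out of sample:

```python
import math


def logistic(t: float) -> float:
    """Hypothetical 'true' curve: a sharp rise that saturates near 1."""
    return 1 / (1 + math.exp(-(t - 3)))


def lagrange(xs, ys):
    """Return the unique polynomial through every (x, y) point."""
    def p(x: float) -> float:
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


xs = list(range(7))             # "observed" data: t = 0..6
ys = [logistic(t) for t in xs]
poly = lagrange(xs, ys)         # degree-6 fit that hits every point exactly

in_sample_error = max(abs(poly(x) - y) for x, y in zip(xs, ys))
print(f"max in-sample error: {in_sample_error:.1e}")
# Out of sample, the polynomial keeps climbing while the true curve flattens near 1.
for t in (7, 8, 10):
    print(f"t={t}: true {logistic(t):.3f}, polynomial {poly(t):.3f}")
```

Both curves agree on every observed point; only data that hasn't happened yet can tell them apart. That is the Facebook study's problem in miniature.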

## A Rape Epidemic, or How to Reframe Data

The CDC recently released a study that claimed that 1.3 million women were raped and 12.6 million more were subject to sexual violence in 2010. This is six or more times the estimate from the extremely rigorous NCVS. Christina Hoff Sommers has a breakdown of why the number is so massive:

It found them by defining sexual violence in impossibly elastic ways and then letting the surveyors, rather than subjects, determine what counted as an assault. Consider: In a telephone survey with a 30 percent response rate, interviewers did not ask participants whether they had been raped. Instead of such straightforward questions, the CDC researchers described a series of sexual encounters and then they determined whether the responses indicated sexual violation. A sample of 9,086 women was asked, for example, “When you were drunk, high, drugged, or passed out and unable to consent, how many people ever had vaginal sex with you?” A majority of the 1.3 million women (61.5 percent) the CDC projected as rape victims in 2010 experienced this sort of “alcohol or drug facilitated penetration.”

What does that mean? If a woman was unconscious or severely incapacitated, everyone would call it rape. But what about sex while inebriated? Few people would say that intoxicated sex alone constitutes rape — indeed, a nontrivial percentage of all customary sexual intercourse, including marital intercourse, probably falls under that definition (and is therefore criminal according to the CDC).

Other survey questions were equally ambiguous. Participants were asked if they had ever had sex because someone pressured them by “telling you lies, making promises about the future they knew were untrue?” All affirmative answers were counted as “sexual violence.” Anyone who consented to sex because a suitor wore her or him down by “repeatedly asking” or “showing they were unhappy” was similarly classified as a victim of violence. The CDC effectively set a stage where each step of physical intimacy required a notarized testament of sober consent.

In short, they did what is called “reframing”. They took someone’s experiences, threw away that person’s definition of them and substituted their own definition.

This isn’t the first time this has happened with rape stats, nor the first time Sommers has uncovered this sort of reframing. Here is an account of how researchers decided that women who didn’t think they had been raped were, in fact, raped, so they could claim a victimization rate of one in four.

Scientists have to classify things all the time based on a variety of criteria. The universe is a messy continuum; to understand it, we have to sort things into boxes. I classify stars for a living based on certain characteristics. The problem with doing that here is that women are not inanimate objects. Nor are they lab animals. They can have opinions of their own about what happened to them.

I understand that some victims may reframe their experiences to try to lessen the trauma of what happened to them. I understand that a woman can be raped but convince herself it was a misunderstanding or that it was somehow her fault. But to reframe every woman’s experience a priori is to treat women like lab rats, not human beings capable of making judgements of their own.

But it also illustrates a mathematical malpractice problem: changing definitions. This is how 10,000 underage prostitutes in the United States becomes 200,000 girls “at risk”. This is how small changes in drug use stats become an “epidemic”. If you dig deep into the studies, you will find the truth. But the banner headline — the one the media talk about — is hopelessly and deliberately muddled.

Sometimes you have to change definitions. The Bureau of Justice Statistics changed its NCVS methodology a few years ago and saw a significant increase in its rape estimates. But it’s one thing to hone; it’s another to completely redefine.

(The CDC, as my friend Kevin Wilson pointed out, mostly does outstanding work. But they have a tendency to jump with both feet into moral panics. In this case, it’s the current debate about rape culture. Ten years ago, it was obesity. They put out a deeply flawed study that overestimated obesity deaths by a factor of 14. They quickly admitted their screwup but … guess which number has been quoted for the last decade on obesity policy?)

You might ask why I’m on about this. Surely any number of rapes is too many. The reason I wanted to talk about this, apart from my hatred of bogus studies, is that data influences policy. If you claim that 1.3 million women are being raped every year, that’s going to result in a set of policy decisions that are likely to be very damaging and do very little to address the real problem.

If you want a stat that means something, try this one: the incidence of sexual violence has fallen 85% over the last 30 years. That is from the NCVS data, so even if the survey is over- or underestimating the amount of sexual violence, the change over time is meaningful. That data tells you something useful: that whatever we are doing to fight rape culture, it is working. Greater awareness, pushing back against blaming the victim, changes to federal and state laws, changes to the emphasis of attorneys general’s offices and the rise of internet pornography have all been cited as contributors to this trend.

That’s why it’s important to push back against bogus stats on rape: they conceal the most important stat, the one that is the most useful guide for future policy and points the way toward ending rape culture.

## The Pending Crash, or How to Play with Scales

Yesterday morning, I saw a chart claiming that the recent stock market trends are an eerie parallel of the run-up to the 1929 crash. I was immediately suspicious because, even if the data were accurate, we see this sort of crap all the time. There are a million people who have made a million bucks on Wall Street claiming to pattern match trends in the stock market. They make huge predictions, just like the Facebook study above. And those predictions are always wrong. Because, again, the out of sample data contains the real leverage.

This graph is even worse than that, though. As Quartz points out, the graph makers used two different y-axes. On one, the 1928-29 rise of the stock market was a near doubling. On the other, the 2013-14 rise was an increase of about 25%. When you scale them appropriately, the similarity vanishes. Or, alternatively, the pending “crash” would be just an erasure of that 25% gain.
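
One way to put the two run-ups on the same footing is to work in percentages rather than raw index points. Here is a minimal Python sketch using the approximate gains above, taking the near doubling as 100% against about 25%. The drop needed to erase a gain g is g / (1 + g):

```python
def drop_to_erase(gain: float) -> float:
    """Fractional decline that takes a value back to where it started
    after rising by `gain` (e.g. 0.25 for a 25% rise)."""
    return gain / (1 + gain)


for label, gain in [("1928-29 run-up", 1.00), ("2013-14 run-up", 0.25)]:
    print(f"{label}: +{gain:.0%} rise, erased by a {drop_to_erase(gain):.0%} fall")
```

So a full "repeat" of the pattern would mean a 20% correction, not the 50% drop it takes to undo a doubling.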

I’ve seen this quite a bit and it’s beginning to annoy me. Zoomed-in graphs of narrow ranges of the y-axis are used to draw dramatic conclusions about … whatever you want. This week, it’s the stock market. Next week, it’s global warming skeptics looking at little spikes on a 10-year temperature plot instead of big trends on a 150-year one. The week after, it will be inequality data. Here is one from Piketty and Saez, which tracks wealth gains for the rich against everyone else. Their conclusion might be accurate but the plot is useless because it is scaled to intervals of \$5 million. So even if the bottom 90% were doing better, even if their income was doubling, it wouldn’t show up on the graph.

# Mother Jones Again. Actually Texas State

Mother Jones, not content with having run one of the more bogus studies on mass shootings (for which they boast about winning an award from Ithaca College), is crowing again about a new study out of Texas State. They claim that the study shows that mass shootings are rising, that available guns are the reason and that civilians never stop shootings.

It’s too bad they didn’t read the paper more carefully, because it supports none of those conclusions.

• The Texas State study covers only 84 incidents. Their “trend” is that about half of these incidents happened in the last two years of the study. That is, again, an awfully small number to be drawing conclusions from.
• The data are based on LexisNexis searches. That is not nearly as thorough as James Alan Fox’s use of FBI crime stats and may measure media coverage more than actual events. They seem to have been reasonably thorough, but they confirm their data from … other compilations.
• Their analysis only covers the years 2000-2010. This conveniently leaves out 2011 (which had few incidents) and the entirety of the 80’s and 90’s, when crime rates were nearly twice what they are now. The word for this is “cherry picking”. Consider what their narrow year range means. If the next decade has fewer incidents, the “trend” becomes a spike. Had you done a similar study covering the years 1990-2000, using MJ’s graph, you would have concluded that mass shootings were rising then. But this would have been followed by five years with very few active shooter events. Look at Mother Jones’ graph again. You can see that mass shootings fell dramatically in the early 2000’s, then spiked up again. That looks like noise in a flat trend over a 30-year baseline. But when you analyze it the way the Blair study does, it looks like a trend. You know what this reminds me of? The bad version of global warming skepticism. Global warming “skeptics” will often show temperature graphs that start in 1998 (an unusually warm year) and go to the present to claim that there is no global warming. But if you look at the data for the last century, the long-term trend becomes readily apparent. As James Alan Fox has shown, the long-term trend in mass shootings is flat. What Mother Jones has done is jump on a study that really wasn’t intended to look at long-term trends and claim it confirms long-term trends.
• Mother Jones says: “The unprecedented spike in these shootings came during the same four-year period, from 2009-12, that saw a wave of nearly 100 state laws making it easier to obtain, carry, and conceal firearms.” They ignore that the wave of gun law liberalization began in the 90’s, before the time span of this study.
• MJ also notes that only three of the 84 attacks were stopped by the victims using guns. Ignored in their smugness is that a) that’s three times what Mother Jones earlier claimed over a much longer time baseline; b) the number of incidents stopped by the victims was actually 16, only three of which involved guns; c) at least 1/3 of the incidents happened in schools, where guns are forbidden.

So, yeah. They’re still playing with tiny numbers and tiny ranges of data to draw unsupportable conclusions. To be fair, the authors of the study are a bit more circumspect in their analysis, which is focused on training for law enforcement in dealing with active shooter situations. But Mother Jones never feels under any compulsion to question their conclusions.
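
To see how the choice of start year can manufacture a trend, here is a Python sketch with invented counts (not Mother Jones' or the Texas State data). The series falls and then rises back to where it began, so there is no net trend over the full span. An ordinary least-squares slope over the whole series is zero; start the window at the trough and you get a steep "rise":

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical yearly incident counts: a dip followed by a recovery.
years = list(range(10))
counts = [9, 7, 5, 3, 1, 1, 3, 5, 7, 9]

full = ols_slope(years, counts)
window = ols_slope(years[4:], counts[4:])  # start the analysis at the trough
print(f"full-period slope:  {full:+.2f} per year")
print(f"trough-start slope: {window:+.2f} per year")
```

Same data, two different stories, depending only on where you start the clock.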

(H/T: Christopher Mason)

Update: You might wonder why I’m on about this subject. The reason is that I think almost any analysis of mass shootings is deliberately misleading. Over the last twenty years, gun homicides have declined 40% and gun violence by 70%. This is the real data. This is what we should be paying attention to. By diverting our attention to these horrific mass killings, Mother Jones and their ilk are focusing on about one-thousandth of the problem of gun violence, because that’s the only way they can make it seem that we are in imminent danger.

The thing is, Mother Jones does acknowledge the decline in violence in other contexts, such as claiming that the crackdown on lead has been responsible for the decline in violence. So when it suits them, they’ll freely acknowledge that violent crime has plunged. But when it comes to gun control, they pick a tiny sliver of gun violence to try to pretend that it’s not. And the tell, as I noted before, is that in their gun-control articles, they do not acknowledge the overall decline of violence.

Using a fact when it suits your purposes and ignoring it when it doesn’t is pretty much the definition of hackery.

# Mathematical Malpractice Watch: Guns

A few weeks ago, Mother Jones did a timeline of mass shootings in response to the spate of summer shootings. They defined their criteria, listed 61 incidents and pointed out, correctly, that most of them were committed with legal firearms.

The highlight is a map of mass shootings over the last thirty years. The map has some resemblance to Radley Balko’s famous map of botched law enforcement raids. But the use of a map and dots is where the resemblance ends. Balko was very clear that his list of incidents was not, in any way, definitive. And he did not try to parse his incomplete data to draw sketchy conclusions.

Mother Jones felt under no such compulsion.

This week, they’ve published an “analysis” of their data and drawn the conclusion that our society has more guns than ever and, perhaps related, more mass shootings. Below, I’ll detail why I think their “analysis” — and yes, I will keep using quotation marks for this — is useless, uninformative and flat-out wrong.

# The “Liberal” Me

Am I a liberal? Have I become one?

That may seem like a ridiculous question to the three people who read this blog and are, on balance, to the left of me. But it’s been on my mind a bit lately. I am constantly accused of being a RINO or an out-and-out liberal on conservative sites. Friends and family often describe me as “so liberal”. And every time Obama screws up (about once a week), I get a message or an e-mail or a comment asking if I’m happy that I voted for him (which I didn’t; I voted for Barr). The current GOP primary race — in which none of the candidates really appeal to me — has only exacerbated this since I spend most of my time pointing out why each of the candidates is a terrible choice.

Thinking about it for a while, however, there may be something to the criticism. There are a handful of issues on which I’ve moved “left” in the last decade or so. But I do not see these as some sudden wellspring of liberalism. They are my fundamental conservatism and libertarianism refined. As I become more aware of the complexity and debate over certain issues, I find my libertarian/conservative philosophy leading me to views that I consider to be fundamentally conservative, but are no longer considered dogma by the GOP, least of all their collection of media dog washers.

Thanks to Twitter siphoning off my political rants, you’re getting more … non-political links:

• Cracked debunks the Twitter revolution. I’m forced to mostly agree. Social networking may have played a minor role in the upheavals in the Middle East, at best. But real activism involves risking your life, not turning your Facebook profile green.
• I really really like this idea of the Billion Price Index as a complement to traditional inflation metrics.
• Do you know … do either of you have any idea of how fucking glad I am I don’t have a big ass commute anymore? I can’t imagine how I did it for so long.
• I really hope the anti-homework agenda catches on. What’s being done to kids these days is absurd busy work bullshit.
• So do you think studies like this will, in any way, slow down those who want to ban fatty foods?

• Experts are once again stunned that poverty does not cause crime. They seem to be stunned by this quite a lot.
• Want to stimulate the economy? Wonder how America can lead the world in innovation again? Repeal SOX.

# Mathematical Malpractice Watch: Why NationMaster Sucks

Graphjam ran a graphic today apparently showing all the awful things the US leads the world in.

It’s crap. It’s clearly produced by someone who spent a few minutes browsing nationmaster.com. Nationmaster is convenient, but its accuracy is, at best, suspect. There is no uniformity of data, and many of the samples are incomplete or old. To be honest, you’re better off going to Wikipedia. Much better off.

But beyond that, they just haven’t thought it through. For example, the graphic has the US as #1 in crime. This is true, but only because we are a large country and a transparent one. The UK has half as many crimes but a fifth of our population. Germany has half as many crimes but a quarter of our population. The crime rate in the US is high but not tops. The same goes for rape, which they have as #1. Scandinavian countries lead the civilized world in that (although likely because they measure their rape stats differently).
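
The difference between totals and rates is one line of arithmetic. Here is a Python sketch using the rough ratios above: relative to the US, the UK has half the crimes and a fifth of the population, and Germany has half the crimes and a quarter of the population. The helper name is hypothetical:

```python
from fractions import Fraction


def relative_rate(crime_ratio: Fraction, population_ratio: Fraction) -> Fraction:
    """Per-capita crime rate relative to the US, given a country's crime
    count and population as fractions of the US figures."""
    return crime_ratio / population_ratio


uk = relative_rate(Fraction(1, 2), Fraction(1, 5))
germany = relative_rate(Fraction(1, 2), Fraction(1, 4))
print(f"UK per-capita rate: {float(uk)}x the US")
print(f"Germany per-capita rate: {float(germany)}x the US")
```

By this measure both countries sit above the US, which is the point: a #1 ranking on raw counts says as much about population size as about crime.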

But a lot of this is the nationmaster problem. They have the US as #1 in CO2 emissions. This is actually wrong as China is #1. US emissions have actually been flat over the last few decades. The nationmaster data are 10 years old — way too far out of date. They also have the US as #1 in divorce rate. This is wrong. Russia is #1.

Teen birth rate? The US is #1 among developed nations. But you have to exclude almost every developing nation in the world to get that ranking. Nationmaster’s data is selective and based on 1994 data. The teen birth rate has plunged since then.

Heart attacks? I haven’t the faintest clue what they’re showing here. But heart attack survival rates have been growing massively in the US.

We do lead the world in McDonald’s restaurants and plastic surgery. That tends to come from being the richest country on Earth. We also, unfortunately, lead the world in both prison population and incarceration rate — yet another wonderful effect of our stupid war on drugs.