Q&A with Richard A. Berk

Prison beds are expensive. Housing an inmate in a high-security facility can cost more than a year of tuition at Penn.

For example, the South Carolina Department of Corrections' recent capital improvements plan estimates, in 2006 dollars, $50 million in construction costs for a 500-bed, maximum-security institution, or about $100,000 per bed, and $99 million in construction costs for a 1,500-bed, high-security facility, or around $66,000 per bed.

For fiscal year 2011-12, Pennsylvania Gov. Tom Corbett requested $1.88 billion to incarcerate slightly more than 51,000 prisoners, or about $37,000 per inmate.
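The per-bed and per-inmate figures above follow from simple division, which the short Python sketch below reproduces. The function and variable names are illustrative only, and the Pennsylvania calculation assumes exactly 51,000 prisoners, although the request covers "slightly more" than that. Note that the South Carolina figures are one-time construction costs, while the Pennsylvania figure is a single fiscal year's request.

```python
# Back-of-the-envelope check of the cost figures quoted in the article.
# Names are illustrative; dollar amounts and counts come from the text above.

def cost_per_unit(total_dollars: float, units: int) -> float:
    """Return the average cost per bed (or per inmate)."""
    return total_dollars / units

# South Carolina capital plan (2006 dollars, construction only)
max_security_per_bed = cost_per_unit(50_000_000, 500)
high_security_per_bed = cost_per_unit(99_000_000, 1_500)

# Pennsylvania FY 2011-12 request; 51,000 is an assumed round count,
# since the article says "slightly more than 51,000"
pa_per_inmate = cost_per_unit(1_880_000_000, 51_000)

print(f"Maximum-security construction: ${max_security_per_bed:,.0f} per bed")
print(f"High-security construction:    ${high_security_per_bed:,.0f} per bed")
print(f"Pennsylvania request:          ${pa_per_inmate:,.0f} per inmate")
```

Because the true prisoner count is a little above 51,000, the actual per-inmate figure would be slightly lower than the one computed here, which is consistent with the rounded "about $37,000" in the text.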

Richard A. Berk, a professor in the Wharton School and the Department of Criminology in the School of Arts and Sciences, is an expert on inmate classification and has developed procedures that law enforcement officials use to determine where prisoners are placed.

To cut costs, Berk says prison beds should be used more efficiently, with only high-risk individuals—those who pose a risk to themselves or others—placed in the most expensive, secure settings. “You don’t want to waste those resources on individuals who could be perfectly safe living in the equivalent of a dormitory,” he says.

A statistician by trade, Berk has designed crime prediction software to help anticipate when people on probation, or on parole, are most likely to commit murder or be murdered. Using machine-learning models, he says computers can determine which individual profiles are associated with a particular level of risk.

“We can forecast for people on probation and parole, substantially more accurately than ever before, who’s going to be murdered and who’s going to try to murder somebody,” he says. “[The software is] used to help determine what kind of supervision is going to be imposed on the individual. High-risk folks obviously get tighter supervision. We don’t want them going around killing people.”

The Current sat down with Berk in the McNeil Building on Locust Walk to discuss crime, punishment and how many aspects of human misconduct can be predicted and solved by math.

Q. You studied psychology at Yale and sociology at Johns Hopkins. How did you become interested in statistics?
A. What basically happened was while I was getting my Ph.D., I took a lot of math and stats. It was encouraged in my department and I just got hooked on it. The fact that I got a Ph.D. in sociology really doesn’t reflect very well what I did. Most of what I did was statistics and applied math. My Ph.D. is in sociology but the training I got was as much in statistics as it was in sociology.

Q. What do you teach in the Department of Criminology and Wharton?
A. They’re sort of overlapping. In Wharton, I’m in the Statistics Department, and that’s a world-class department. It’s fabulous and there are a lot of people there doing interesting things, and several [people] doing things that overlap with my interests, so it’s a really good fit for me.
In the Criminology Department, it turns out that what I teach is effectively statistics also, but it’s applied very explicitly to policy matters and criminal justice. For example, how good is DNA identification? How good are fingerprinting procedures? Or how do you forecast crime hot spots? So it’s still statistics, but the difference is in the Criminology Department I emphasize the subject matter, whereas in the Statistics Department, of course, I emphasize the tools, the procedures.

Q. Is the justice system currently doing a good job of classifying inmates?
A. Pretty much so. No forecasting is perfect, you’re going to make mistakes, but you’re making many, many fewer mistakes than you would be if you used the old procedures that had been in place 10 years ago.

Q. How are your procedures different from the processes used a decade ago?
A. Mine are based on what’s called machine learning, these new procedures that allow you to work with very large databases. The way I say it is, if you’re looking for a needle in a haystack, first of all you have to have the haystack, and procedures that can look through a haystack. Conventional statistical procedures don’t do that very effectively. A lot of the new algorithms that are out there, particularly machine learning, allow you to work with really large databases—hundreds of thousands of cases—in an efficient fashion.

Q. Your bio says you’ve investigated law enforcement strategies for reducing domestic violence. What did your research involve?
A. We were working with the Los Angeles County Sheriff’s Department. When [police officers] come to the scene of a domestic violence incident, they have to make a decision about what to do with the perpetrator. There are a lot of things they can do: They can arrest the guy, they can order him out of the house, or they can volunteer, for example, to take the victim to a shelter. One of the reasons they make these decisions is to prevent subsequent events. If they’re able to forecast the likelihood of another domestic violence incident in the near future, they’re probably going to take a more intrusive approach to the incident, but they have to forecast. We developed a checklist that they can go through to help them forecast more accurately whether there’s going to be a repeat incident, let’s say, within the next several weeks. It turned out to be pretty effective.

Q. You have also studied the role of race in capital punishment. Is race a factor in capital punishment?
A. The issue there is technical. ... With the kinds of data that we currently have available and the kinds of tools, is it possible to find evidence that race matters once you adjust for the other things that might affect whether a person is charged with a capital crime? The question is not whether there is [a racial factor] or there isn’t, the question is, can you tell? And the answer pretty much is no, that these decisions are too complicated. The decision-making outstrips the ability of our data to really arrive at any kind of definitive assessment. So the answer is no, you can’t tell. People claim both ways. Some people claim yes, some people claim no. The truth is you can’t tell.

Q. Can you talk about your work detecting violations of environmental regulations? How is this related to your other work?
A. This was a white-collar crime problem. This research was centered off the coast of South America in the international tuna fisheries down there. One of the regulations written into the international treaties is that when you catch tuna, you’re not supposed to catch dolphins. When you go to the grocery store and you buy a can of tuna, it says, ‘dolphin safe’ ... It says that these fishing boats are abiding by the regulations and are not killing dolphins when they catch tuna. The problem is, how do you tell? A fishing boat is 200 miles off the shore in the ocean. Nobody’s around. They net a bunch of tuna, who’s to say whether they caught a dolphin or not? What the treaty provides for is putting observers on those boats who are supposed to record if there are any dolphins when the tuna are caught. The problem is, who’s to say that the observer can’t be bribed by the captain? A net full of tuna could be worth tens of thousands of dollars, so you can give the observer a thousand bucks to somehow not report that the dolphins were there. The trick here was to find out which observers were cheating. That would be a violation of the regulations, and so we developed machine learning statistical models to find the bad guys on the tuna boats who were taking bribes to underreport dolphin deaths. And that worked real well, too.

Q. It sounds like a lot of human wrongdoing can be predicted or solved with mathematical models.
A. More so than you’d think. There’s more information if you know how to squeeze it out of the data set. There’s more information there than you think. Conventional statistical tools don’t necessarily get at it very well. Some of these new procedures get at it much better. It’s surprising how well you can forecast and find malfeasance if you just know how to look.

Q. You have also investigated claims that the death penalty serves as a general deterrent. Does it?
A. The answer is like the race question: You really can’t tell. This is another one of these circumstances where ideological positions are on both sides. Some people say the death penalty’s great, it deters homicide. Other people say no way. And sometimes they try to bring data to bear. The fact of the matter is the data that’s available just can’t give you a definitive answer, and will not give the definitive answer anytime soon. You may be publishing in journals, but you’re not advancing the understanding any because you’re just producing noise.

Q. Do you have a personal opinion anyway?
A. I don’t know whether the death penalty deters. I guess I’m a little skeptical but, again, as people say, you’re entitled to your own opinions but not your own facts. I don’t have my own facts about this, or anybody else’s. It’s hard to tell.

Q. The United States has only 5 percent of the world’s population but 25 percent of the world’s inmates, with nearly 2 million prisoners. Why do you think the incarceration rate in America is so high?
A. I’m a statistician, not a criminologist. The obvious answer is because we write our laws that way. We have this system in which individuals are released on parole and we watch them like a hawk, and if they do anything—even if it’s not a threat to public safety—we throw them back in prison. So we have this revolving-door phenomenon. But it comes down, basically, to the fact that we just write these draconian laws. A European democracy might incarcerate someone for five years; we might incarcerate them for 20. Same crime, same risk, we just like tougher laws. Of course now we have a problem because we don’t want to pay for it. It’s expensive.

Q. Is there anything the United States can do to lower the cost of incarceration?
A. Oh, there’s lots. We incarcerate many people for far too long, given their threat to public safety. We have people who are really no threat to public safety who are serving substantial prison terms. For example, California is now under court order to do something about overcrowding. I’ve been working with them a little and they could probably release a third of their inmates without having a significant effect on violent crime in California. I’m not saying that’s politically tenable, but that is an empirical matter. You’re putting them away far, far too long if what you’re concerned about is the risk they pose to public safety. You could basically solve the problem by just being more intelligent in how you mete out your sentences. You give long sentences to people who are really a threat, and you don’t do that for individuals who, you may not like what they do, but they’re not a threat to public safety.

Q. Is part of this the result of the War on Drugs?
A. That’s a very good example of the problem. If you wind up in prison for possession of small amounts of drugs, it’s not a good thing that you’re using drugs, but a large fraction of those individuals are no threat to public safety. They may be doing stupid things that are ill advised, but they’re not a threat. Some of them are, but we can tell which are which, or at least we can tell better than others historically have been able to do.

Q. Am I correct that your crime prediction software examines roughly two dozen variables, such as criminal record and geographic location?
A. There are no surprises. This is something that always mystifies people when I talk about it. The predictors that we use are by and large what people have known about for a hundred years. There’s no mystery.
What we’re doing is two things: One is we’re squeezing more information out of those predictors by having the computer document what the relationships are rather than imposing them a priori. The computer’s a lot smarter than we are in finding these things. The other thing, and it’s a more technical point, is that you can set up this procedure in such a way that [the computer finds] things that you don’t understand. ... I borrowed a term from physics called dark structure. There’s structure out there that helps us forecast, but we don’t yet know what it is, or how it works, or what its mechanisms are. ... When you allow the computer to find it, you predict a lot better. These new techniques do a much better job. I hope my physics friends don’t mind us using the term dark structure. They talk about dark energy.

Q. Is age an important variable in this prediction process?
A. Just what you’d expect. Young men with long criminal records who started their lives of crime at a young age, and who have a continuous record of getting in trouble up to the point where they’re convicted of a felony—they’re the bad guys that you’d expect. And they are guys, by and large. Violent crime is committed by men, almost exclusively, and young men almost exclusively. People age out of crime. The high-risk groups are between about 16 and 20, and after that it starts to drop dramatically, and by 40, they’re pussycats.

Q. If there were no surprises, why haven’t researchers come to this conclusion sooner or utilized these predictors before?
A. The technology that we’re using, the computer technology, the algorithm, the statistics, are quite new. I would say within the last five years. Also, I think criminologists, sociologists, economists, political scientists, or anybody who studies crime, they’re sort of blinded by their own theory. They think they know and so they look at the variables that their theory says they should look at, which means they don’t look elsewhere, where there might be interesting things. It’s sort of the arrogance of theory. If you let the computer just snoop around in the dataset, it finds things that are unexpected by existing theory and that work really well to help forecast. Theory’s a great thing, but you don’t want to let it be a straitjacket, and most of the people who work on this kind of stuff use theory as a straitjacket. They miss lots of things because they believe their theory.

Q. Where is your software being used?
A. It’s being used in Philadelphia, it’s being used in Baltimore and we’re in the process of implementing it statewide in Pennsylvania. We’re trying to implement it for juveniles in Maryland. We’re not there yet, but that’ll happen too. And we’re just starting a new project for Maryland. We’re looking at victimization so that when a family comes into children’s protective services because of concerns about neglect or abuse, we want to forecast which kids are likely to be either murdered or subjected to really grisly, dangerous child abuse. We want to find it before it happens and prevent it. We’re not so much interested in the perpetration of crime. In this circumstance, we’re trying to focus on the kids who are going to be victims and get there before these kids are in terrible trouble.

Q. So you may be able to predict which children may be victimized?
A. We’re just starting that project. It’ll be probably a year before we know if we can pull it off or not. It depends on the quality of the data.

Q. Have you heard back from the City of Philadelphia about the software?
A. They love it. We’re also working with the local district attorney’s office to help them decide how to charge [criminals]. If someone’s a real bad guy, you want to make sure you don’t plea bargain away the capacity to incarcerate somebody. If they’re going to go out and kill somebody, you better not let them. But that means forecasting in part so that when they charge, they charge in a way that can incapacitate the individual. So we’re working with them on that, and also working with them on developing recommendations for bail. You don’t want to release somebody on bail if they’re going to go out and kill somebody. You want to be able to forecast that. In those circumstances, you either deny bail or you make the bail very, very high.

Q. What do you say to people who say human beings all have free will and even though someone is predicted to act a certain way, he or she could choose differently?
A. They certainly can. We predict based on probability. We don’t say with certainty anybody will do anything.

Q. What research are you currently working on?
A. I have a book right now that Springer is going to put out, and it’s basically about these procedures, which is how you do crime forecasting with large databases and machine learning. It’s in the proofreading stages now. It will come out in 2012. I’m guessing maybe spring/summer.
All of these projects continue because this is so new. As people put the software on their servers and they run it for a year or so, they find new ways to tweak and make it better. They’ll come back and I’ll see if I can help them out. In all these cases, I’m still in touch with the folks who are using this stuff and we are continuing to refine the methods.
Also, criminal justice agencies are collecting better and better data and more and more data, so a model that I might have built five years ago can be rebuilt now with new information that’ll be even better.

Originally published on December 15, 2011