Lesson 1: The Purpose of Statistics in Criminology and Criminal Justice

Lesson Overview

The opening lesson of this course outlines the basic purposes and principles of using quantitative techniques to analyze data produced by scientific methods. It discusses the rationales for using statistics, as well as paradigmatic schemes for viewing quantification within the overall process of social scientific research.

Lesson Objectives

By the end of this lesson, you should be able to:

Explain why statistical analysis is utilized in social scientific research;
Recognize the principles for using social statistics;
Understand statistics as a 'language' for explaining social phenomena.

A Brief Introduction to Statistics in Criminal Justice

Criminological research is developing by leaps and bounds. Huge amounts of data are collected both by scientists studying issues of crime and justice and by those working in the day-to-day administration of the criminal justice system. It would be impossible to summarize all of this information in a timely fashion without the help of statistical analysis.

Statistics, particularly inferential statistics, are built on probability theory, the branch of mathematics concerned with how likely an event is to occur. Many statistics are therefore mathematical expressions of likelihood, such as the likelihood that inner-city children will become gang members or that probationers will recidivate.
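
To make "likelihood" concrete, the short sketch below estimates the probability of recidivism as a relative frequency (events divided by cases) and then converts it to odds. Probability and odds are related but not interchangeable: odds are p / (1 - p). The counts and the helper names probability() and odds() are hypothetical, invented for illustration only.

```python
# Hypothetical counts, invented for illustration (not real recidivism data).
probationers = 250
recidivated = 75

def probability(events: int, total: int) -> float:
    """Relative frequency: the share of cases in which the event occurred."""
    return events / total

def odds(p: float) -> float:
    """Odds restate a probability as a ratio of 'occurs' to 'does not occur'."""
    return p / (1 - p)

p = probability(recidivated, probationers)
print(f"Probability of recidivism: {p:.2f}")  # 0.30
print(f"Odds of recidivism: {odds(p):.2f}")   # 0.43 (about 3 to 7)
```

A probability of 0.30 means 30 of every 100 probationers are expected to recidivate; odds of 0.43 mean about 3 recidivate for every 7 who do not.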

So what, specifically, are statistics able to do?

First, statistics serve as the "answers" to research questions. These questions are often formally worded as hypotheses (e.g., 'Level of education and violent behavior are significantly related'). A hypothesis receives scientific support only when a researcher, using scientific methods, can produce convincing empirical evidence for it. As you learned in research methods, the best studies are those that use entire populations or large, randomly drawn samples. Statistics summarize large volumes of information and, in doing so, allow research hypotheses to be supported or refuted.
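
The lesson does not name a particular test, but for two categorical variables such as education level and violent behavior, Pearson's chi-square test of independence is one common choice. The sketch below is illustrative only: the 2x2 table is invented, the category labels are assumptions, and rather than computing an exact p-value it compares the statistic with the tabled critical value for 1 degree of freedom at the .05 level (3.841).

```python
# Hypothetical 2x2 table, invented for illustration.
# Rows: education level; columns: violent offense (yes, no).
observed = [
    [30, 70],  # no high school diploma
    [15, 85],  # high school diploma or more
]

def chi_square(table: list[list[int]]) -> float:
    """Pearson chi-square statistic for a contingency table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Count we would expect in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

CRITICAL_05_DF1 = 3.841  # chi-square critical value, df = (2-1)*(2-1) = 1, alpha = .05
stat = chi_square(observed)
print(f"chi-square = {stat:.2f}")  # 6.45
print("significantly related" if stat > CRITICAL_05_DF1 else "no significant relationship")
```

With these invented counts the statistic exceeds the critical value, so the data would support the hypothesis that the two variables are related; with real data, the answer depends entirely on what was observed.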

Second, statistics seek to clarify, not confuse, issues. They are a 'language' for communicating results. Even though this language is mathematically based, it is no different in terms of meaning from any other language, such as English, Japanese, or Greek. Students are often encouraged to approach a class such as quantitative analysis as they would a foreign language, both because the analogy is apt and because focusing on meaning and outcomes de-emphasizes the computational side of statistics that students frequently dread. While mathematical computation is a necessary part of any statistics course, the true goal is to understand what the numbers are 'telling' you. The figures we calculate make a statement, just as a paragraph of text does.

Different statistics are used in different circumstances, just as certain words or phrases are. Statistics are logical, rational, and complete; they follow consistent patterns and share common meanings. Once we work past any math 'phobias' you may have, learning quantitative analysis is much like learning any new language.

Five Principles of Statistics

Before delving into the world of quantitative analysis, it helps to outline a few basic rules, or principles, of statistical reasoning. These rules apply to most, if not all, situations in which statistics are utilized and cover the range of statistics from the most basic to the most complex. You should keep these principles in the back of your mind throughout the course as points of reference.

Statistics seek to reduce the amount of error in research as much as possible.

Error, or invalidity, is the bane of any research project. If the results of your research do not paint an accurate portrait of social reality, the research itself becomes a wasted effort. Statistical reasoning not only gives you empirical support but also lets you gauge the accuracy of your results.
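
One common way statistics let you gauge accuracy is a confidence interval around an estimate. The sketch below uses the simple normal-approximation (Wald) interval for a proportion; the rearrest counts are invented, and the helper name proportion_ci is hypothetical.

```python
import math

def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    return p - z * se, p + z * se

# Hypothetical sample: 120 of 400 sampled probationers were rearrested.
rearrested, sampled = 120, 400
low, high = proportion_ci(rearrested, sampled)
print(f"Estimate {rearrested / sampled:.2f}, 95% CI {low:.3f} to {high:.3f}")
# prints: Estimate 0.30, 95% CI 0.255 to 0.345
```

Read this way, the sample estimates a 30% rearrest rate, and under these assumptions the population rate plausibly lies between about 25.5% and 34.5%. A narrower interval signals a more precise estimate.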

Statistics based on more information are preferable to those based on less.

This principle is primarily concerned with sampling and sample size. In general, the larger the sample, the more confidence one may have in the results derived from the data analysis. Sampling error drops sharply as sample sizes grow from very small numbers, then levels off gradually once samples surpass one or two thousand cases. Sample sizes between 100 and 5,000 are typical; very few studies use more.
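
A rough way to see this leveling off is the approximate 95% margin of error for a proportion from a simple random sample, 1.96 * sqrt(p * (1 - p) / n). The sketch below assumes the worst case p = 0.5 and the normal approximation; the list of sample sizes is illustrative, and margin_of_error is a hypothetical helper name.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from a simple random sample.

    p = 0.5 gives the widest (most conservative) margin.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (25, 100, 500, 1000, 2000, 5000):
    print(f"n = {n:>5}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions, moving from 100 to 1,000 cases cuts the margin from about 9.8 to about 3.1 percentage points, while moving from 2,000 to 5,000 trims it only from about 2.2 to about 1.4, because the margin shrinks with the square root of n.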

Intuitively, it should make sense that the more subjects taken from a population and put into a sample, the less likely it is that invalidity will creep into the results. Of course, if every element in a population provides data, there is no sampling error at all. Thus, the closer the sample size is to the population size, the more confidence you may have in the statistics that come from that sample.
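
When a sample is drawn without replacement from a finite population, one standard way to capture this is the finite population correction, sqrt((N - n) / (N - 1)), which multiplies the usual margin of error and falls to zero when the sample is the whole population. The sketch below assumes a hypothetical population of 2,000 inmates and the worst case p = 0.5; margin_of_error_fpc is a hypothetical helper name.

```python
import math

def margin_of_error_fpc(n: int, population: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion, sampling without replacement.

    Multiplies the usual margin by the finite population correction,
    sqrt((N - n) / (N - 1)), which falls to 0 when n equals N.
    """
    base = z * math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return base * fpc

POPULATION = 2_000  # hypothetical: every inmate in one facility
for n in (200, 1_000, 1_900, 2_000):
    print(f"n = {n:>5}: +/- {margin_of_error_fpc(n, POPULATION):.3f}")
```

The last line prints a margin of 0.000: once every member of the population is included, no sampling error remains, although other kinds of error (such as poor measurement) still can.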

Lastly, we assume that random sampling techniques are used whenever possible. You may have heard the term 'fudging' statistics. Manipulation of research results typically happens before any mathematical operations are performed. The most common form of improper data manipulation is to skew the composition of the sample before data are collected. For example, if you want the average GPA of a group of students to be high, make the sample consist mostly of Dean's List students. Data are more difficult to manipulate after they have been collected.
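
The Dean's List example can be simulated. The sketch below builds an invented population of student GPAs, then compares a simple random sample with a "fudged" sample drawn mostly from high achievers. The population parameters, the 3.5 cutoff, and the sample sizes are all hypothetical choices for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical population of 1,000 student GPAs, invented and clipped to 0.0-4.0.
population = [min(4.0, max(0.0, random.gauss(2.9, 0.5))) for _ in range(1_000)]

# Honest approach: every student has the same chance of selection.
random_sample = random.sample(population, 50)

# "Fudged" approach: 40 students at or above a Dean's List-style cutoff, 10 from anyone.
DEANS_LIST_CUTOFF = 3.5  # hypothetical threshold; real cutoffs vary by school
high_achievers = [gpa for gpa in population if gpa >= DEANS_LIST_CUTOFF]
biased_sample = random.sample(high_achievers, 40) + random.sample(population, 10)

print(f"Population mean:    {statistics.mean(population):.2f}")
print(f"Random sample mean: {statistics.mean(random_sample):.2f}")
print(f"Biased sample mean: {statistics.mean(biased_sample):.2f}")
```

The arithmetic applied to both samples is identical; only the way the sample was assembled differs, which is why the biased mean lands well above the population mean.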

Outliers can have an extreme effect on statistical validity.

An outlier is a deviant case that lies far from the mean, or 'average'; a common rule of thumb flags cases more than 3 standard deviations from the mean. Including outliers in statistical analysis may present a very skewed picture of social reality. For example, say there are 10 homes on a residential street. Nine of those homes are valued at $200,000, and the last home is a mansion at the end of the street valued at $3,000,000. The total value of all the homes on the street is $4,800,000, so the mean value of a home is $480,000. Would you say this average is representative of a typical home on this street? Clearly the single high-priced home is 'pulling' the average up considerably, since the vast majority of homes on this street are worth $200,000. This skewed average could have serious ramifications; what might those ramifications be?
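
The home-value arithmetic can be reproduced with Python's standard statistics module. The sketch below also reports the median (the middle value, a common companion to the mean) and the mansion's distance from the mean in population standard deviations (its z-score).

```python
import statistics

# The ten home values from the residential-street example.
homes = [200_000] * 9 + [3_000_000]

mean_value = statistics.mean(homes)      # 480,000: pulled up by the mansion
median_value = statistics.median(homes)  # 200,000: the typical home
print(f"Mean:   ${mean_value:,.0f}")
print(f"Median: ${median_value:,.0f}")

# Distance of the mansion from the mean, in population standard deviations.
z = (3_000_000 - mean_value) / statistics.pstdev(homes)
print(f"Mansion z-score: {z:.2f}")  # 3.00
```

In a group this small, the mansion sits exactly 3 population standard deviations above the mean, so a strict "more than 3" cutoff would not flag it even though it clearly distorts the average; this is one reason the 3-standard-deviation rule is best treated as a rule of thumb.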

The easiest way to deal with outliers is simply to omit them from statistical analysis. However, a substantial number of outliers may hint at something important, and outlying values should always be discussed in a report.

The data from which statistics are calculated must be collected in a logical, systematic, methodologically sound manner.

Research methods are the foundation upon which statistics are based. If data (or pieces of information) are not collected from subjects in a scientifically rigorous manner, the conclusions drawn from that information will have questionable validity. Proper data collection methods are the focus of the graduate research methods course (CRIMJ 450W), and without a sound basis, any subsequent conclusions should be viewed skeptically.

Statistics are MEANINGLESS without background information about how the data were collected.

One of the biggest problems in statistics, and in social scientific research generally, is that statistics gleaned from good data are indistinguishable from statistics based on bad information. Good and bad (that is, valid and invalid) statistics look identical; the only difference is whether the methodologies used to generate the figures are sound. Solid scientific information details the methodology used to collect the data and allows the research to be replicated. One of the easiest ways to spot potential problems with statistics is to notice when needed background information on the methodology is missing. As a rule of thumb, only those with something to hide will withhold information about the collection process. Ethical scientists pursue the truth and therefore lay bare their research processes.

Weekly Assignments

Discussion Forums

Please participate in the discussions of the following questions in the Lesson 1 Common Statistics in Criminal Justice online discussion forum:

What are some common statistics used in the field of criminology/criminal justice? How are they used and what do they mean to you?

For those of you who have experience working in the CJ field, what types of data are collected by criminal justice agencies in the system? What becomes of this information? How is it used by the agency itself?