The opening lesson of this course outlines the basic purposes and principles of using quantitative techniques to analyze data produced by scientific methods. The rationales for using statistics are discussed, as are paradigmatic schemes for viewing quantification within the overall process of social scientific research.
By the end of this lesson, you should be able to:
Criminological research is developing by leaps and bounds. Huge amounts of data are collected both by scientists studying issues of crime and justice and by those working in the day-to-day administration of the criminal justice system. It would be impossible to summarize all of this information in a timely fashion without the help of statistical analysis.
Statistics are built on a branch of mathematics known as probability theory, which is concerned with expressing the odds, or probability, of something occurring. Statistics are therefore mathematical expressions of likelihood: the likelihood of inner-city children becoming gang members, the likelihood that probationers will recidivate, and so on.
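In its simplest form, a probability is just the proportion of times an outcome occurs. The short Python sketch below uses entirely hypothetical counts (not data from any real study) to express a likelihood of recidivism as such a proportion.

```python
# A minimal sketch: probability as a proportion, using hypothetical counts.
recidivated = 40   # hypothetical number of probationers who reoffended
total = 200        # hypothetical number of probationers followed

probability = recidivated / total
print(f"Estimated likelihood of recidivism: {probability:.2f}")  # 0.20, i.e., 20%
```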
So what, specifically, are statistics able to do?
Different statistics are used in different circumstances, just as certain words or phrases are used in different circumstances. Statistics are logical, rational, and complete: there are patterns to be followed and commonalities in meaning. Once we work past any math 'phobias' you may have, learning quantitative analysis is much the same as learning any new language.
Before delving into the world of quantitative analysis, it helps to outline a few basic rules, or principles, of statistical reasoning. These rules apply to most, if not all, situations in which statistics are utilized and cover the range of statistics from the most basic to the most complex. You should keep these principles in the back of your mind throughout the course as points of reference.
Error, or invalidity, is the bane of any research project. If the results of your research do not paint an accurate portrait of social reality, then the research itself becomes a wasted effort. With statistical reasoning, not only do you gain the advantage of empirical support, but you are also able to gauge the accuracy of your results.
This principle is primarily concerned with sampling and sample size. In general, the larger the sample size, the more confidence one may have in the results derived from the data analysis. Error decreases dramatically as sample sizes rise from zero, and the improvement levels off gradually once sample sizes surpass one or two thousand. It is typical to see sample sizes between 100 and 5,000; samples much larger than that are rare.
Intuitively, it should make sense that the more subjects taken from a population and put into a sample, the less likely it is for invalidity to creep into the results. Of course, if every element in a population provides data, there is no sampling error at all. Thus, the closer the sample size is to the population size, the more confidence you may have in the statistics which come from that sample.
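As a rough illustration of this principle (not part of the lesson itself), the following Python sketch draws repeated random samples of increasing size from a simulated population and reports how far the sample mean typically strays from the true population mean. The simulated 'GPA' population and all of the numbers are assumptions made purely for demonstration.

```python
# A rough simulation of how sampling error shrinks as sample size grows.
import random
import statistics

random.seed(42)
# Assumed, simulated population of 100,000 GPAs (roughly normal around 3.0).
population = [random.gauss(3.0, 0.5) for _ in range(100_000)]
true_mean = statistics.mean(population)

for n in [25, 100, 400, 1_600, 6_400]:
    # Average absolute error of the sample mean across 200 random samples.
    errors = [abs(statistics.mean(random.sample(population, n)) - true_mean)
              for _ in range(200)]
    print(f"n = {n:>5}: typical error ~ {statistics.mean(errors):.4f}")
```

Running a sketch like this shows the error falling quickly at first and then flattening out, which mirrors the leveling-off described above.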
Lastly, we assume that random sampling techniques are utilized when possible. You may have heard the term 'fudging' statistics. The manipulation of research results most typically happens before any mathematical operations are performed. The most common method of improper data manipulation is to affect the composition of the sample prior to data being collected. For example, if you want the average GPA of a group of students to be high, make the sample consist mostly of Dean's List students. It is more difficult to manipulate data after it has been collected.
An outlier is a deviant case: a value more than three standard deviations away from the mean, or 'average,' value. Including outliers in statistical analysis may present a very skewed picture of social reality. For example, let's say there are 10 homes on a residential street. Nine of those homes are valued at $200,000, and the last home is a mansion at the end of the street valued at $3,000,000. The total value of all the homes on the street is $4,800,000, so the mean value per home is $480,000. Would you say this average is indicative of the typical home value on this street? Obviously the presence of one highly priced home is 'pulling' property values up considerably, as the vast majority of homes on this street are worth $200,000. This skewed average could have serious ramifications; what might those ramifications be?
The easiest way to deal with outliers is to simply omit them from statistical analysis. However, a significant number of outliers may hint at something important, and outlying values should always be discussed in a report.
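To make the home-value example concrete, here is a short Python sketch using the same figures. It shows how the single mansion pulls the mean upward, how the median resists that pull, and what happens when the outlier is simply omitted.

```python
# The street of ten homes from the example above.
import statistics

homes = [200_000] * 9 + [3_000_000]      # nine $200,000 homes plus one mansion

print(statistics.mean(homes))            # 480000 -- pulled up by the outlier
print(statistics.median(homes))          # 200000 -- unaffected by the outlier

# The simplest remedy discussed above: omit the outlier and recompute.
without_outlier = [value for value in homes if value != 3_000_000]
print(statistics.mean(without_outlier))  # 200000
```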
Research methods are the foundation upon which statistics are built. If data (or pieces of information) are not collected from subjects in a scientifically rigorous manner, the conclusions drawn from that information will have questionable validity. Proper data collection methods are the focus of the graduate research methods course (CRIMJ 450W); without that sound basis, any subsequent conclusions should be viewed skeptically.
One of the biggest confounds in statistics, and in social scientific research generally, is that statistics gleaned from good data are indistinguishable from statistics based on bad information. Good and bad (that is, valid and invalid statistics) appear identical, and the only way to differentiate them is to ask whether the methodologies used to generate the figures are sound. Solid scientific reporting details the methodology used to collect the data and allows for replication of the research at hand. One of the easiest ways to spot potential problems with statistics, then, is to check whether this background information on methodology has been provided. As a rule of thumb, only those with something to hide will withhold information about the collection process. Ethical scientists pursue the truth, and therefore lay bare their research processes.