Lesson 1: Decision Making Under Uncertainty
Types of Variables
Data (at least for the purposes of statistics) fall into two main groups: categorical and quantitative.
- Variable
- A variable is a characteristic of the chosen sample that needs to be analyzed for decision-making. Examples include age, gender, household income, number of children, average sale, and time spent on social media.
Classifying Variables
- Quantitative
- Numerical values with magnitudes that can be placed in a meaningful order with consistent intervals. Also known as numerical or measurement variables.
- Discrete
- Numerical data that can be counted:
- age (when recorded in whole years)
- number of production plants
- number of employees
- Continuous
- Numerical data that is a continuous measurement:
- salary (dollar amounts are usually treated as continuous)
- experience (may also be considered discrete; depends on precision in measurement)
- Categorical
- Names or labels (i.e., categories) with no logical order, or with a logical order but inconsistent differences between groups. Also known as qualitative variables.
- For example, responses to questions about marital status, coded as: Single = 1, Married = 2, Divorced = 3, Widowed = 4
- Nominal Data
- Nominal data are qualitative responses coded as numbers. Arithmetic operations on the codes make no sense (e.g., does Widowed ÷ 2 = Married?).
- Ordinal Data
- Ordinal data are categorical, but their values have an order or ranking.
- For example, Amazon reviews: Poor = 1, Fair = 2, Good = 3, Very Good = 4, Excellent = 5
- Although it is still not meaningful to do arithmetic on these data (e.g., does 2 × Fair = Very Good?), we can say things like Excellent > Poor or Fair < Very Good. That is, the order is maintained no matter which numeric values are assigned to the categories, as long as the values increase from Poor to Excellent.
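The distinction between nominal and ordinal codes can be sketched in a few lines of Python. This is only an illustration: the codings come from the examples above, but the response lists and the alternative recoding are made up. It suggests that counts and the mode suit nominal data, that the median suits ordinal data, and that the median category survives a recoding that keeps the order.

```python
from collections import Counter
from statistics import mean, median_low, mode

# Codings taken from the examples above; the response lists are made up.
MARITAL = {1: "Single", 2: "Married", 3: "Divorced", 4: "Widowed"}          # nominal
REVIEW = {1: "Poor", 2: "Fair", 3: "Good", 4: "Very Good", 5: "Excellent"}  # ordinal

marital_codes = [1, 2, 2, 4, 2, 3, 1]   # hypothetical responses
review_codes = [5, 3, 4, 1, 3, 5, 3]    # hypothetical responses

# Nominal: counts and the most common category are meaningful.
marital_counts = Counter(MARITAL[c] for c in marital_codes)
marital_mode = MARITAL[mode(marital_codes)]

# The mean of the codes computes without error, but it describes nothing.
meaningless_mean = mean(marital_codes)

# Ordinal: order-based summaries such as the median are meaningful.
review_median = REVIEW[median_low(review_codes)]

# Recode the reviews with different numbers that keep the same order
# (an arbitrary, hypothetical recoding).
recode = {1: 10, 2: 20, 3: 30, 4: 70, 5: 100}
label_for = {new: REVIEW[old] for old, new in recode.items()}
recoded_median = label_for[median_low([recode[c] for c in review_codes])]

print(marital_counts)
print(marital_mode, review_median, recoded_median)
```

Here `median_low` is used so that the result is always one of the observed codes and can be mapped back to a label.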
Choice of Variables
One important aspect of statistical analysis is identifying variables that may be relevant to your study and collecting data about them. In the end, not all the variables you identified may be relevant. But you should start by exploring all of them (within financial, time, and resource constraints) and then choose the most important ones. For example, what are the variables that determine salary? Probably education level, years of experience, age, and maybe gender. After our analysis, we may discover that gender does not have any effect on salary, but it is important to include this in our initial analysis if only to test whether it affects our variable of interest (salary in this case). More on this later in Regression Analysis.
Classification Issues
The type of data does not always readily indicate how information should be classified. For example, is the number of employees numerical or categorical? Although this number appears to be numerical, the variable can be treated as categorical if the analysis focuses on the size of the firm (for example, grouping firms as small, medium, or large). Other variables may cause classification problems too. Suppose each employee's file contains a performance evaluation in the form of a number that could range from 0 to 100. No employee has received an evaluation of 0, or even less than 40. Is this numerical data? Is an evaluation of 80 truly twice as good as an evaluation of 40? Maybe, maybe not; it depends on the scale used in the evaluation instrument. Is this ratio data, that is, data with a true zero for which ratios are meaningful? If so, what does 0 mean?
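One way to probe the "twice as good" question is to check whether the ratio survives a change in where zero sits. The sketch below is illustrative only: the two scores come from the example above, while the floor of 30 is a purely hypothetical assumption about the evaluation instrument. If the zero point is arbitrary, ratios change with it, while differences do not.

```python
def compare(low, high):
    """Return (difference, ratio) of the higher score relative to the lower one."""
    return high - low, high / low

score_a, score_b = 40, 80   # the two evaluations from the example

# Treat the scores as reported, with zero at 0.
diff_raw, ratio_raw = compare(score_a, score_b)

# Hypothetical assumption: the instrument effectively starts at 30, not 0.
ASSUMED_FLOOR = 30
diff_shifted, ratio_shifted = compare(score_a - ASSUMED_FLOOR,
                                      score_b - ASSUMED_FLOOR)

print(diff_raw, ratio_raw)          # difference and ratio on the reported scale
print(diff_shifted, ratio_shifted)  # same difference, different ratio
```

The difference between the two employees is the same on both scales, but the ratio is not, so a statement like "twice as good" is only defensible when 0 on the scale has a real meaning.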
Practical Concerns About Interpreting Data Types
The purpose of introducing you to data types is to alert you to the distinctions that are natural and possible in the way data may appear. Although the classification of data may not always be clear, it is important to reflect on the available data and what they measure. Why? Although we know that it makes no sense to find the average of categorical data, you will undoubtedly see someone do it. It is easy to assign numbers to categorical data and then start adding and dividing, but ease is no justification for doing so. For example, MBA programs often ask students for current salary information. When they do so, they often use surveys with salary groupings, which are categorical data. Then they report a salary average for students. This is meaningless! If we accept what is presented unquestioningly, we become part of the problem. Inappropriate use of data undermines the opportunity to see the information data can provide. We must always ask questions about data type to determine the meaningfulness of any data analysis.
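The salary-survey problem can be illustrated with a minimal sketch. The bracket labels, cut-offs, and responses below are all hypothetical. The mean of the bracket codes computes without complaint, but it is neither a dollar amount nor a bracket; counts and the median bracket are summaries that respect the data type.

```python
from collections import Counter
from statistics import mean, median_low

# Hypothetical survey brackets; labels and cut-offs are illustrative only.
BRACKETS = {1: "under $50k", 2: "$50k-$75k", 3: "$75k-$100k", 4: "over $100k"}
responses = [1, 2, 2, 3, 4, 4, 2, 3]   # hypothetical coded answers

# Averaging the codes runs fine, but the result is not a salary.
code_mean = mean(responses)

# Summaries that fit grouped (categorical) data.
counts = Counter(BRACKETS[r] for r in responses)
modal_bracket, modal_count = counts.most_common(1)[0]
median_bracket = BRACKETS[median_low(responses)]

print(code_mean)
print(modal_bracket, modal_count)
print(median_bracket)
```

Note that the top bracket in this sketch is open-ended, so it has no midpoint that could stand in for a dollar value.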