Methods of Organizing Data
• Organize data in a frequency distribution.
• Organize data in a class interval frequency distribution.
• Graph data in a bar graph.
• Graph data in a histogram.
• Graph data in a frequency polygon.
We will discuss two methods of organizing data: frequency distributions and graphs.
To illustrate the processes of organizing and describing data, let’s use the data set presented in Table 3.1. These data represent the scores of 30 students on an introductory psychology exam. One reason for organizing data and using statistics is so that meaningful conclusions can be drawn. As you can see from Table 3.1, our list of exam scores is simply that—a list in no particular order. As shown here, the data are not especially meaningful. One of the first steps in organizing these data might be to rearrange them from highest to lowest or lowest to highest.
Once this is accomplished (see Table 3.2), we can try to condense the data into a frequency distribution: a table in which all of the scores are listed along with the frequency with which each occurs. We can also show a relative frequency distribution, which indicates the proportion of the total observations that fall at each score. When a relative frequency is multiplied by 100, it is read as a percentage. A frequency distribution and a relative frequency distribution of our exam data are presented in Table 3.3.
frequency distribution A table in which all of the scores are listed along with the frequency with which each occurs.
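As a minimal sketch of how a frequency distribution and a relative frequency distribution can be tallied, the Python snippet below counts how often each score occurs and divides each count by the total number of observations. The ten scores are hypothetical stand-ins, not the data from Table 3.1, and the function name `frequency_distribution` is an illustrative choice.

```python
from collections import Counter

def frequency_distribution(scores):
    """Return (score, frequency, relative frequency) rows, lowest score first."""
    counts = Counter(scores)   # frequency of each distinct score
    n = len(scores)            # total number of observations
    return [(score, f, f / n) for score, f in sorted(counts.items())]

# Hypothetical scores for illustration only (not the Table 3.1 data).
scores = [45, 60, 60, 75, 75, 75, 80, 80, 95, 95]

rows = frequency_distribution(scores)
for score, f, rf in rows:
    # Multiplying the relative frequency by 100 gives a percentage.
    print(f"{score:>5} {f:>3} {rf:.2f} {rf * 100:.0f}%")
```

The relative frequencies should sum to 1 (100%), which is a quick check that no observation was dropped.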
The frequency distribution is a way of presenting data that makes the pattern of the data easier to see. We can make the data set even easier to read (especially desirable with large data sets) if we group the scores and create a class interval frequency distribution. We can combine individual scores into categories, or intervals, and list them along with the frequency of scores in each interval. In our exam score example, the scores range from 45 to 95—a 50-point range. A rule of thumb when creating class intervals is to have between 10 and 20 categories (Hinkle, Wiersma, & Jurs, 1988). A quick method of calculating what the width of the interval should be is to subtract the smallest score from the largest score and then divide by the number of intervals you would like (Schweigert, 1994). If we wanted 10 intervals in our example, we would proceed as follows to determine the width of each interval:
$$\frac{95 - 45}{10} = \frac{50}{10} = 5$$
class interval frequency distribution A table in which the scores are grouped into intervals and listed along with the frequency of scores in each interval.
The frequency distribution using the class intervals with a width of 5 is provided in Table 3.4. Notice how much more compact the data appear when presented in a class interval frequency distribution. Although such distributions have the advantage of reducing the number of categories, they have the disadvantage of not providing as much information as a regular frequency distribution. For example, although we can see from the class interval frequency distribution that five people scored between 75 and 79, we do not know their exact scores within the interval.
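The grouping procedure can be sketched in Python as follows. This is one possible approach under stated assumptions: scores are whole numbers, intervals start at the lowest score, and the width is the range divided by the desired number of intervals, rounded to a whole number. The scores are hypothetical (they are not the Table 3.1 data), and `class_interval_distribution` is an illustrative name.

```python
from collections import Counter

def class_interval_distribution(scores, n_intervals):
    """Group whole-number scores into equal-width intervals.

    Returns the interval width and a list of (lower, upper, frequency) rows.
    """
    low, high = min(scores), max(scores)
    # Width = (largest - smallest) / desired number of intervals, at least 1.
    width = max(1, round((high - low) / n_intervals))
    # Map each score to the index of the interval it falls in.
    counts = Counter((s - low) // width for s in scores)
    rows = []
    for i in range((high - low) // width + 1):
        lower = low + i * width
        upper = lower + width - 1   # inclusive upper limit, e.g. 75-79
        rows.append((lower, upper, counts[i]))
    return width, rows

# Hypothetical scores for illustration only (not the Table 3.1 data).
scores = [45, 47, 59, 60, 63, 70, 74, 75, 76, 78, 79, 79, 82, 85, 88, 90, 95]

width, rows = class_interval_distribution(scores, n_intervals=10)
for lower, upper, f in rows:
    print(f"{lower}-{upper}: {f}")
```

With these inclusive whole-number intervals, a top score of 95 begins an eleventh interval (95-99), so the number of intervals produced can exceed the number requested by one. The output also shows the loss of detail described above: an interval's frequency is reported, but the individual scores inside it are not.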
Frequency distributions can provide valuable information, but sometimes a picture is of greater value. Several types of pictorial representations can be used to represent data. The choice depends on the type of data collected and what the researcher hopes to emphasize or illustrate. The most common graphs used by psychologists are bar graphs, histograms, and frequency polygons (line graphs). Graphs typically have two coordinate axes, the x-axis (the horizontal axis) and the y-axis (the vertical axis). Most commonly, the y-axis is shorter than the x-axis, typically 60% to 75% of the length of the x-axis.
Bar graphs and histograms are frequently confused. When the data collected are on a nominal scale, or if the variable is a qualitative variable (a categorical variable for which each value represents a discrete category), then a bar graph is most appropriate. A bar graph is a graphical representation of a frequency distribution in which vertical bars are centered above each category along the x-axis and are separated from each other by a space, indicating that the levels of the variable represent distinct, unrelated categories.
qualitative variable A categorical variable for which each value represents a discrete category.
bar graph A graphical representation of a frequency distribution in which vertical bars are centered above each category along the x-axis and are separated from each other by a space, indicating that the levels of the variable represent distinct, unrelated categories.
If the variable is a quantitative variable (the scores represent a change in quantity), or if the data collected are ordinal, interval, or ratio in scale, then a histogram can be used. A histogram is also a graphical representation of a frequency distribution in which vertical bars are centered above scores on the x-axis, but in a histogram the bars touch each other to indicate that the scores on the variable represent related, increasing values.
quantitative variable A variable for which the scores represent a change in quantity.
histogram A graphical representation of a frequency distribution in which vertical bars centered above scores on the x-axis touch each other to indicate that the scores on the variable represent related, increasing values.
In both a bar graph and a histogram, the height of each bar indicates the frequency for that level of the variable on the x-axis. The spaces between the bars on the bar graph indicate not only the qualitative differences among the categories but also that the order of the values of the variable on the x-axis is arbitrary. In other words, the categories on the x-axis in a bar graph can be placed in any order. The fact that the bars are contiguous in a histogram indicates not only the increasing quantity of the variable but also that the variable has a definite order that cannot be changed.
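The choice between the two graph types can be summarized as a small decision rule. The sketch below is illustrative only: the function name `choose_graph` and the dictionary keys are assumptions introduced here, and the rule simply encodes the distinctions described above.

```python
def choose_graph(scale):
    """Suggest a graph type from the scale of measurement."""
    scale = scale.lower()
    if scale == "nominal":
        # Qualitative categories: bars are separated, and order is arbitrary.
        return {"graph": "bar graph", "bars_touch": False, "order_fixed": False}
    if scale in ("ordinal", "interval", "ratio"):
        # Quantitative scores: bars touch, and order cannot be rearranged.
        return {"graph": "histogram", "bars_touch": True, "order_fixed": True}
    raise ValueError(f"unknown scale of measurement: {scale!r}")

print(choose_graph("nominal")["graph"])
print(choose_graph("ratio")["graph"])
```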
A bar graph is illustrated in Figure 3.1. For a hypothetical distribution, the frequencies of individuals who affiliate with various political parties are indicated. Notice that the different political parties are listed on the x-axis, whereas frequency is recorded on the y-axis. Although the political parties are presented in a certain order, this order could be rearranged because the variable is qualitative.
Figure 3.2 illustrates a histogram. In this figure, the frequencies of intelligence test scores from a hypothetical distribution are indicated. A histogram is appropriate because the IQ score variable is quantitative. The variable has a specific order that cannot be rearranged. You can see how to use Excel and SPSS to create both bar graphs and histograms in the Statistical Software Resources section at the end of this chapter. If you are unfamiliar with Excel or SPSS, see Appendix C to get started with these tools.