This example uses data collected from a study of students enrolled in a registered nurse to bachelor of science in nursing (RN to BSN) program (Mancini, Ashwill, & Cipher, 2014). The predictor in this example is the number of academic degrees the student obtained prior to enrollment, and the dependent variable is the number of months it took the student to complete the RN to BSN program. The null hypothesis is “Number of degrees does not predict the number of months until completion of an RN to BSN program.”
The data are presented in Table 29-1. A simulated subset of 20 students was selected for this example so that the computations would be small and manageable. In practice, studies involving linear regression need to be adequately powered (Aberson, 2010; Cohen, 1988). Observe that the data in Table 29-1 are arranged in columns that correspond to the elements of the formula. The summed values in the last row of Table 29-1 are inserted into the appropriate places in the formula for b.
TABLE 29-1
NUMBER OF DEGREES AND MONTHS TO COMPLETION IN AN RN TO BSN PROGRAM
Student ID | x (Number of Degrees) | y (Months to Completion) | x² | xy |
1 | 1 | 17 | 1 | 17 |
2 | 2 | 9 | 4 | 18 |
3 | 0 | 17 | 0 | 0 |
4 | 1 | 9 | 1 | 9 |
5 | 0 | 16 | 0 | 0 |
6 | 1 | 11 | 1 | 11 |
7 | 0 | 15 | 0 | 0 |
8 | 0 | 12 | 0 | 0 |
9 | 1 | 15 | 1 | 15 |
10 | 1 | 12 | 1 | 12 |
11 | 1 | 14 | 1 | 14 |
12 | 1 | 10 | 1 | 10 |
13 | 1 | 17 | 1 | 17 |
14 | 0 | 20 | 0 | 0 |
15 | 2 | 9 | 4 | 18 |
16 | 2 | 12 | 4 | 24 |
17 | 1 | 14 | 1 | 14 |
18 | 2 | 10 | 4 | 20 |
19 | 1 | 17 | 1 | 17 |
20 | 2 | 11 | 4 | 22 |
Σ (Sum) | 20 | 267 | 30 | 238 |
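The totals in the Σ row can be double-checked with a short script. The Python sketch below re-enters the 20 rows of Table 29-1 (the list names `x` and `y` are illustrative) and recomputes each column total; it is a checking aid, not part of the original exercise.

```python
# Table 29-1 data: x = number of degrees, y = months to completion
x = [1, 2, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 2, 2, 1, 2, 1, 2]
y = [17, 9, 17, 9, 16, 11, 15, 12, 15, 12, 14, 10, 17, 20, 9, 12, 14, 10, 17, 11]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(xi ** 2 for xi in x)               # total of the x² column
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # total of the xy column

print(n, sum_x, sum_y, sum_x2, sum_xy)  # 20 20 267 30 238
```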
The computations for b and α are as follows:
Step 1: Calculate b.
From the values in Table 29-1, we know that n = 20, Σx = 20, Σy = 267, Σx² = 30, and Σxy = 238. These values are inserted into the formula for b, as follows:
$$b = \frac{20(238) - (20)(267)}{20(30) - 20^2} = \frac{4760 - 5340}{600 - 400} = \frac{-580}{200}$$
$$b = -2.9$$
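The Step 1 arithmetic can be reproduced in a few lines. This minimal sketch starts from the summary values listed above rather than the raw data, and evaluates the same numerator and denominator used in the equation for b; the variable names are illustrative.

```python
# Summary values from Table 29-1
n, sum_x, sum_y, sum_x2, sum_xy = 20, 20, 267, 30, 238

# Least-squares slope: b = [nΣxy − (Σx)(Σy)] / [nΣx² − (Σx)²]
numerator = n * sum_xy - sum_x * sum_y      # 4760 − 5340 = −580
denominator = n * sum_x2 - sum_x ** 2       # 600 − 400 = 200
b = numerator / denominator

print(b)  # -2.9
```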
Step 2: Calculate α.
From Step 1, we now know that b = −2.9, and we plug this value into the formula for α.
$$\alpha = \frac{267 - (-2.9)(20)}{20} = \frac{325}{20}$$
$$\alpha = 16.25$$
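A minimal sketch of Step 2, assuming the slope b = −2.9 from Step 1. Note that here α is the intercept of the regression line, not the significance level used later in Step 5; the variable is named `alpha` only to match the text.

```python
n, sum_x, sum_y = 20, 20, 267
b = -2.9  # slope from Step 1

# Intercept: α = (Σy − bΣx) / n
alpha = (sum_y - b * sum_x) / n

print(round(alpha, 2))  # 16.25
```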
Step 3: Write the new regression equation:
$$\hat{y} = -2.9x + 16.25$$
Step 4: Calculate R.
The multiple R is defined as the correlation between the actual y values and the y values predicted by the new regression equation. The predicted y value is represented by the symbol ŷ to distinguish it from y, which represents the actual y values in the data set. We can use the regression equation from Step 3 to compute the predicted program completion time in months for each student from that student’s number of academic degrees prior to enrollment in the RN to BSN program. For example, Student 1 had earned 1 academic degree prior to enrollment, so the predicted months to completion for Student 1 is calculated as:
$$\hat{y} = -2.9(1) + 16.25$$
$$\hat{y} = 13.35$$
Thus, the predicted ŷ for Student 1 is 13.35 months. This procedure is repeated for the remaining students, and the Pearson correlation between the actual months to completion (y) and the predicted months to completion (ŷ) yields the multiple R value. In this example, R = 0.638. The higher the R, the more accurately the regression equation predicts y, because a higher correlation means that the actual y values lie closer to the predicted ŷ values. Figure 29-1 displays the regression line, where the x axis represents possible numbers of degrees and the y axis represents the predicted months to program completion (ŷ values).
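Steps 3 and 4 can be sketched in Python as follows, assuming the data from Table 29-1 and the rounded coefficients b = −2.9 and α = 16.25. The helper `pearson_r` is a hypothetical name; it writes out the ordinary Pearson formula rather than calling a library.

```python
from math import sqrt

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    ss_a = sum((ai - mean_a) ** 2 for ai in a)
    ss_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / sqrt(ss_a * ss_b)

# Table 29-1 data: x = number of degrees, y = months to completion
x = [1, 2, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 2, 2, 1, 2, 1, 2]
y = [17, 9, 17, 9, 16, 11, 15, 12, 15, 12, 14, 10, 17, 20, 9, 12, 14, 10, 17, 11]
b, alpha = -2.9, 16.25  # slope and intercept from Steps 1 and 2

# Predicted months to completion (ŷ) for every student
y_hat = [b * xi + alpha for xi in x]

# Multiple R: correlation between actual y and predicted ŷ
R = pearson_r(y, y_hat)

print(round(y_hat[0], 2))  # 13.35 (Student 1)
print(round(R, 3))         # 0.638
```

Because ŷ is a straight-line function of x, correlating y with ŷ gives the same magnitude as correlating y with x; the sign is positive here because a negative slope flips the direction.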
Step 5: Determine whether the predictor significantly predicts y.
To know whether the predictor significantly predicts y, the beta must be tested against zero. In simple regression, this is most easily accomplished by using the R value from Step 4:
$$t = R\sqrt{\frac{n - 2}{1 - R^2}}$$
$$t = .638\sqrt{\frac{20 - 2}{1 - .407}}$$
$$t = 3.52$$
The t value is then compared to the t probability distribution table (see Appendix A). The df for this t statistic is n − 2. The critical t value at alpha (α) = 0.05, df = 18 is 2.10 for a two-tailed test. Our obtained t was 3.52, which exceeds the critical value in the table, thereby indicating a significant association between the predictor (x) and outcome (y).
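The Step 5 arithmetic and decision rule are sketched below. Python’s standard library has no t distribution, so the critical value of 2.10 is taken from the t table as stated in the text rather than computed; the variable names are illustrative.

```python
from math import sqrt

R, n = 0.638, 20
df = n - 2  # degrees of freedom for simple regression

# t = R * sqrt((n − 2) / (1 − R²))
t = R * sqrt(df / (1 - R ** 2))

t_critical = 2.10  # two-tailed, alpha = 0.05, df = 18 (from the t table)
print(round(t, 2), df, t > t_critical)  # 3.52 18 True
```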
Step 6: Calculate R².
After establishing the statistical significance of the R value, it must subsequently be examined for clinical importance. This is accomplished by obtaining the coefficient of determination for regression, which simply involves squaring the R value. The R² represents the proportion of variance in y explained by the predictor. Cohen describes R² values of 0.02 as small, 0.15 as moderate, and 0.26 or higher as large effect sizes (Cohen, 1988). In our example, R was 0.638, and, therefore, R² was 0.407. Multiplying 0.407 × 100% indicates that 40.7% of the variance in months to program completion can be explained by knowing the student’s number of earned academic degrees at admission (Cohen & Cohen, 1983).
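A minimal sketch of Step 6: square R, express it as a percentage, and attach an effect-size label using the cutoffs quoted above. The function `cohen_label` is a hypothetical helper written for illustration.

```python
R = 0.638
R2 = R ** 2  # coefficient of determination

def cohen_label(r2):
    # Cutoffs as quoted in the text: 0.02 small, 0.15 moderate, 0.26 large
    if r2 >= 0.26:
        return "large"
    if r2 >= 0.15:
        return "moderate"
    if r2 >= 0.02:
        return "small"
    return "below small"

print(round(R2, 3), f"{R2:.1%}", cohen_label(R2))  # 0.407 40.7% large
```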
The R² can be very helpful in testing more than one predictor in a regression model. Unlike R, the R² for one regression model can be compared with that of another regression model that contains additional predictors (Cohen & Cohen, 1983). The R² is discussed further in Exercise 30.
The standardized beta (β) is another statistic that represents the magnitude of the association between x and y. In simple linear regression, β has the same limits as a Pearson r, meaning that the standardized β cannot be lower than −1.00 or higher than 1.00. This value can be calculated by hand but is best computed with statistical software. The standardized beta (β) is calculated by converting the x and y values to z scores and then correlating the x and y values using the Pearson r formula. The standardized beta (β) is often reported in the literature instead of the unstandardized b, because b has no lower or upper limits, so its magnitude cannot be judged on a common scale. β, on the other hand, is interpreted like a Pearson r, and the descriptions of magnitude recommended by Cohen (1988) can be applied to it. In this example, the standardized beta (β) is −0.638. Thus, the magnitude of the association between x and y in this example is considered a large predictive association (Cohen, 1988).
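The z-score route to β described above can be sketched as follows, assuming sample standard deviations (n − 1 in the denominator) for the z scores. The cross-check β = b · (s_x / s_y) is a standard identity for simple regression and is included here only as a consistency check.

```python
from statistics import mean, stdev

# Table 29-1 data: x = number of degrees, y = months to completion
x = [1, 2, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 2, 2, 1, 2, 1, 2]
y = [17, 9, 17, 9, 16, 11, 15, 12, 15, 12, 14, 10, 17, 20, 9, 12, 14, 10, 17, 11]
n = len(x)

mx, sx = mean(x), stdev(x)  # sample mean and SD of x
my, sy = mean(y), stdev(y)  # sample mean and SD of y

# Convert the raw scores to z scores
zx = [(xi - mx) / sx for xi in x]
zy = [(yi - my) / sy for yi in y]

# Pearson r on z scores reduces to Σ(zx·zy) / (n − 1),
# because z scores have mean 0 and sample SD 1
beta = sum(zxi * zyi for zxi, zyi in zip(zx, zy)) / (n - 1)

# Cross-check: β = b · (s_x / s_y), with b = −2.9 from Step 1
beta_from_b = -2.9 * sx / sy

print(round(beta, 3), round(beta_from_b, 3))  # -0.638 -0.638
```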
This is how our data set looks in SPSS.
Step 1: From the “Analyze” menu, choose “Regression” and “Linear.”
Step 2: Move the predictor, Number of Degrees, to the space labeled “Independent(s).” Move the dependent variable, Number of Months to Completion, to the space labeled “Dependent.” Click “OK.”