Transcript Chapter 11
Chapter Eleven: Simple Linear Regression Analysis
(McGraw-Hill/Irwin, Copyright © 2004 by The McGraw-Hill Companies, Inc.)

Chapter outline:
11.1 The Simple Linear Regression Model
11.2 The Least Squares Point Estimates
11.3 Model Assumptions, Mean Squared Error, Std. Error
11.4 Testing Significance of Slope and y-Intercept
11.5 Confidence Intervals and Prediction Intervals
11.6 The Coefficient of Determination and Correlation
11.7 An F Test for the Simple Linear Regression Model
*11.8 Checking Regression Assumptions by Residuals
*11.9 Some Shortcut Formulas

11.1 The Simple Linear Regression Model

The model is

    y = \mu_{y|x} + \varepsilon = \beta_0 + \beta_1 x + \varepsilon

where
- \mu_{y|x} = \beta_0 + \beta_1 x is the mean value of the dependent variable y when the value of the independent variable is x,
- \beta_0 is the y-intercept, the mean of y when x is 0,
- \beta_1 is the slope, the change in the mean of y per unit change in x, and
- \varepsilon is an error term that describes the effect on y of all factors other than x.

The fuel consumption data used throughout the chapter:

    Week   Average Hourly Temperature x (deg F)   Weekly Fuel Consumption y (MMcf)
    1      28.0                                   12.4
    2      28.0                                   11.7
    3      32.5                                   12.4
    4      39.0                                   10.8
    5      45.9                                    9.4
    6      57.8                                    9.5
    7      58.1                                    8.0
    8      62.5                                    7.5

[Figure: The Simple Linear Regression Model Illustrated]

11.2 The Least Squares Point Estimates

Estimation/prediction equation:

    \hat{y} = b_0 + b_1 x

Least squares point estimate of the slope \beta_1:

    b_1 = SS_{xy} / SS_{xx}

where

    SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - (\sum x_i)(\sum y_i)/n
    SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - (\sum x_i)^2/n

Least squares point estimate of the y-intercept \beta_0:

    b_0 = \bar{y} - b_1 \bar{x},   where   \bar{y} = \sum y_i / n   and   \bar{x} = \sum x_i / n

Example: The Least Squares Point Estimates (the fuel consumption case)

    x        y        x^2         xy
    28.0    12.4      784.00     347.20
    28.0    11.7      784.00     327.60
    32.5    12.4     1056.25     403.00
    39.0    10.8     1521.00     421.20
    45.9     9.4     2106.81     431.46
    57.8     9.5     3340.84     549.10
    58.1     8.0     3375.61     464.80
    62.5     7.5     3906.25     468.75
    -----   ----    --------    -------
    351.8   81.7    16874.76    3413.11

Slope:

    SS_{xy} = 3413.11 - (351.8)(81.7)/8 = -179.6475
    SS_{xx} = 16874.76 - (351.8)^2/8 = 1404.355
    b_1 = SS_{xy}/SS_{xx} = -179.6475/1404.355 = -0.1279

y-intercept:

    \bar{y} = 81.7/8 = 10.2125,   \bar{x} = 351.8/8 = 43.98
    b_0 = \bar{y} - b_1\bar{x} = 10.2125 - (-0.1279)(43.98) = 15.84

Prediction at x = 40:

    \hat{y} = b_0 + b_1 x = 15.84 - 0.1279(40) = 10.72 MMcf of gas

11.3 The Regression Model Assumptions

Model: y = \mu_{y|x} + \varepsilon = \beta_0 + \beta_1 x + \varepsilon

Assumptions about the model error terms, the \varepsilon's:
- Mean zero: the mean of the error terms is equal to 0.
- Constant variance: the variance of the error terms, \sigma^2, is the same for all values of x.
- Normality: the error terms follow a normal distribution for all values of x.
- Independence: the values of the error terms are statistically independent of each other.

[Figure: Regression Model Assumptions Illustrated]

Mean Square Error and Standard Error

    SSE = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2     (sum of squared errors)
    s^2 = MSE = SSE/(n - 2)                         (mean square error, point estimate of \sigma^2)
    s = \sqrt{MSE} = \sqrt{SSE/(n - 2)}             (standard error, point estimate of \sigma)

Example 11.6 The Fuel Consumption Case

    y       x      \hat{y}    y - \hat{y}    (y - \hat{y})^2
    12.4   28.0    12.2588       0.1412        0.019937
    11.7   28.0    12.2588      -0.5588        0.312257
    12.4   32.5    11.6833       0.7168        0.513731
    10.8   39.0    10.8519      -0.0519        0.002694
     9.4   45.9     9.9694      -0.5694        0.324205
     9.5   57.8     8.4474       1.0526        1.108009
     8.0   58.1     8.4090      -0.4090        0.167289
     7.5   62.5     7.8463      -0.3462        0.119889
                                 SSE =         2.568011

    s^2 = MSE = SSE/(n - 2) = 2.568/6 = 0.428
    s = \sqrt{0.428} = 0.6542
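The hand calculations in Sections 11.2 and 11.3 can be checked with a short script. The following is a minimal sketch (not part of the original slides) in plain Python that reproduces the least squares point estimates, the SSE, the mean square error, and the standard error for the fuel consumption data.

# Sketch (not from the textbook): least squares and standard error
# calculations for the fuel consumption data, done "by hand".
from math import sqrt

x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]   # avg hourly temperature (deg F)
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]       # weekly fuel consumption (MMcf)
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

SS_xy = sum_xy - sum_x * sum_y / n      # about -179.6475
SS_xx = sum_xx - sum_x ** 2 / n         # about  1404.355

b1 = SS_xy / SS_xx                      # slope estimate, about -0.1279
b0 = sum_y / n - b1 * (sum_x / n)       # intercept estimate, about 15.84

y_hat = [b0 + b1 * xi for xi in x]      # fitted values
SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # about 2.568
MSE = SSE / (n - 2)                     # about 0.428
s = sqrt(MSE)                           # standard error, about 0.654

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
print(f"prediction at x = 40: {b0 + b1 * 40:.3f} MMcf")
print(f"SSE = {SSE:.4f}, MSE = {MSE:.4f}, s = {s:.4f}")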
11.4 Significance Test and Estimation for the Slope

If the regression assumptions hold, we can reject H0: \beta_1 = 0 at the \alpha level of significance (probability of Type I error equal to \alpha) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than \alpha.

Test statistic:

    t = b_1 / s_{b_1},   where   s_{b_1} = s / \sqrt{SS_{xx}}

    Alternative            Reject H0 if:                               p-value
    Ha: \beta_1 > 0        t > t_\alpha                                area under the t distribution to the right of t
    Ha: \beta_1 < 0        t < -t_\alpha                               area under the t distribution to the left of t
    Ha: \beta_1 \ne 0      |t| > t_{\alpha/2}, that is,                twice the area under the t distribution
                           t > t_{\alpha/2} or t < -t_{\alpha/2}       to the right of |t|

100(1 - \alpha)% confidence interval for \beta_1:

    [b_1 \pm t_{\alpha/2} s_{b_1}]

Here t_\alpha, t_{\alpha/2}, and the p-values are based on n - 2 degrees of freedom.

Significance Test and Estimation for the y-Intercept

If the regression assumptions hold, we can reject H0: \beta_0 = 0 at the \alpha level of significance (probability of Type I error equal to \alpha) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than \alpha.

Test statistic:

    t = b_0 / s_{b_0},   where   s_{b_0} = s \sqrt{ 1/n + \bar{x}^2 / SS_{xx} }

    Alternative            Reject H0 if:                               p-value
    Ha: \beta_0 > 0        t > t_\alpha                                area under the t distribution to the right of t
    Ha: \beta_0 < 0        t < -t_\alpha                               area under the t distribution to the left of t
    Ha: \beta_0 \ne 0      |t| > t_{\alpha/2}, that is,                twice the area under the t distribution
                           t > t_{\alpha/2} or t < -t_{\alpha/2}       to the right of |t|

100(1 - \alpha)% confidence interval for \beta_0:

    [b_0 \pm t_{\alpha/2} s_{b_0}]

Again, t_\alpha, t_{\alpha/2}, and the p-values are based on n - 2 degrees of freedom.

Example 11.7 The Fuel Consumption Case: Inferences About the Slope and y-Intercept (Excel output)

    Regression Statistics
    Multiple R           0.948413871
    R Square             0.899488871
    Adjusted R Square    0.882737016
    Standard Error       0.654208646
    Observations         8

    ANOVA         df       SS           MS           F            Significance F
    Regression     1    22.980816    22.980816    53.694882      0.000330052
    Residual       6     2.567934     0.427989
    Total          7    25.548750

                 Coefficients     Standard Error   t Stat          P-value        Lower 95%       Upper 95%
    Intercept    15.83785741      0.801773385      19.75353349     0.000001092    13.87598718     17.79972765
    Temp         -0.127921715     0.01745733       -7.327679169    0.000330052    -0.170638294    -0.085205136

The t Stat and P-value columns give the significance tests; the Lower 95% and Upper 95% columns give the confidence intervals.

11.5 Confidence and Prediction Intervals

Point prediction at x = x_0:

    \hat{y} = b_0 + b_1 x_0

Distance value:

    \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}

If the regression assumptions hold:

100(1 - \alpha)% confidence interval for the mean value of y, \mu_{y|x_0}:

    [\hat{y} \pm t_{\alpha/2}\, s \sqrt{\text{distance value}}]

100(1 - \alpha)% prediction interval for an individual value of y:

    [\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \text{distance value}}]

t_{\alpha/2} is based on n - 2 degrees of freedom.

Example 11.7 The Fuel Consumption Case (Minitab output, predicted FuelCons when Temp x = 40)

    Predicted Values
    Fit      StDev Fit    95.0% CI            95.0% PI
    10.721   0.241        (10.130, 11.312)    (9.014, 12.428)

11.6 The Simple Coefficient of Determination

The simple coefficient of determination r^2 is

    r^2 = \frac{\text{explained variation}}{\text{total variation}}

r^2 is the proportion of the total variation in y explained by the simple linear regression model, where

    Total variation       = \sum (y_i - \bar{y})^2        (total sum of squares, SSTO)
    Explained variation   = \sum (\hat{y}_i - \bar{y})^2  (regression sum of squares, SSR)
    Unexplained variation = \sum (y_i - \hat{y}_i)^2      (error sum of squares, SSE)

and total variation = explained variation + unexplained variation.

The Simple Correlation Coefficient

The simple correlation coefficient, denoted r, measures the strength of the linear relationship between y and x:

    r = +\sqrt{r^2}  if b_1 is positive,   and   r = -\sqrt{r^2}  if b_1 is negative,

where b_1 is the slope of the least squares line.
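The inferences of Sections 11.4 and 11.5 can be reproduced numerically. Below is a minimal sketch (not part of the original slides; it assumes SciPy is available for the t distribution) that recomputes the t statistics, two-sided p-values, 95% confidence intervals for the slope and intercept, and the 95% confidence and prediction intervals at x0 = 40 shown in the Excel and Minitab output above.

# Sketch (not from the slides): t tests, confidence intervals, and the
# interval estimates at x0 = 40 for the fuel consumption data.
from math import sqrt
from scipy.stats import t as t_dist

x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

SS_xx = sum((xi - xbar) ** 2 for xi in x)
SS_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = SS_xy / SS_xx
b0 = ybar - b1 * xbar

SSE = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = sqrt(SSE / (n - 2))                                 # standard error

# Standard errors of the point estimates
s_b1 = s / sqrt(SS_xx)                                  # about 0.01746
s_b0 = s * sqrt(1 / n + xbar ** 2 / SS_xx)              # about 0.8018

# t statistics and two-sided p-values (n - 2 = 6 degrees of freedom)
t_b1 = b1 / s_b1                                        # about -7.33
t_b0 = b0 / s_b0                                        # about 19.75
p_b1 = 2 * t_dist.sf(abs(t_b1), n - 2)                  # about 0.00033
p_b0 = 2 * t_dist.sf(abs(t_b0), n - 2)                  # about 0.0000011

# 95% confidence intervals for beta1 and beta0
t_crit = t_dist.ppf(0.975, n - 2)                       # about 2.447
ci_b1 = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)        # about (-0.171, -0.085)
ci_b0 = (b0 - t_crit * s_b0, b0 + t_crit * s_b0)        # about (13.88, 17.80)

# 95% CI for the mean of y and 95% PI for an individual y at x0 = 40
x0 = 40.0
y_hat0 = b0 + b1 * x0                                   # about 10.721
dist = 1 / n + (x0 - xbar) ** 2 / SS_xx                 # distance value
half_ci = t_crit * s * sqrt(dist)
half_pi = t_crit * s * sqrt(1 + dist)
ci_mean = (y_hat0 - half_ci, y_hat0 + half_ci)          # about (10.130, 11.312)
pi_indiv = (y_hat0 - half_pi, y_hat0 + half_pi)         # about (9.014, 12.428)

print(ci_b1, ci_b0, ci_mean, pi_indiv, sep="\n")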
Example 11.15 The Fuel Consumption Case

From the ANOVA table and R Square value in the Excel output above:

    r^2 = 22.980816 / 25.548750 = 0.899489
    r = -\sqrt{0.899489} = -0.948414   (negative because b_1 is negative)

[Figure: Different Values of the Correlation Coefficient]

11.7 F Test for the Simple Linear Regression Model

To test H0: \beta_1 = 0 versus Ha: \beta_1 \ne 0 at the \alpha level of significance, use the test statistic

    F(model) = \frac{\text{explained variation}}{(\text{unexplained variation})/(n - 2)}

Reject H0 if F(model) > F_\alpha or if the p-value < \alpha. F_\alpha is based on 1 numerator and n - 2 denominator degrees of freedom.

Example 11.17 The Fuel Consumption Case

Using the ANOVA values from the Excel output above, the F test at the \alpha = 0.05 level of significance:

    F(model) = 22.980816 / (2.567934/(8 - 2)) = 53.695

Reject H0 at the 0.05 level of significance, since F(model) = 53.695 > 5.99 = F_{.05} and p-value = 0.00033 < 0.05. Here F_{.05} is based on 1 numerator and 6 denominator degrees of freedom.

*11.8 Checking the Regression Assumptions by Residual Analysis

For an observed value of y, the residual is

    e = y - \hat{y}   (observed y minus predicted y),

where the predicted value of y is calculated as \hat{y} = b_0 + b_1 x. If the regression assumptions hold, the residuals should look like a random sample from a normal distribution with mean 0 and variance \sigma^2.

Residual plots:
- residuals versus the independent variable(s)
- residuals versus the predicted y's
- residuals in time order (if the response is a time series)
- histogram of the residuals
- normal plot of the residuals

Checking the constant variance assumption (Example 11.18, the QHIC case): plot the residuals versus x and versus the predicted responses.

Checking the normality assumption (Example 11.18, the QHIC case): histogram and normal plot of the residuals.

Checking the independence assumption: residuals versus fits (to check functional form, not shown) and residuals versus time order.

Combination residual plots (Example 11.18, the QHIC case, Minitab output): histogram and normal plot of the residuals, residuals in observation order (I chart), and residuals versus fits.

[Figure: Minitab Residual Model Diagnostics for the QHIC case -- normal plot of residuals, I chart of residuals, histogram of residuals, and residuals versus fits]

*11.9 Some Shortcut Formulas

    Total variation       = SSTO = SS_{yy}
    Explained variation   = SSR = SS_{xy}^2 / SS_{xx}
    Unexplained variation = SSE = SS_{yy} - SS_{xy}^2 / SS_{xx}

where

    SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - (\sum x_i)(\sum y_i)/n
    SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - (\sum x_i)^2/n
    SS_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - (\sum y_i)^2/n

(A numerical check of the F test and these shortcut formulas appears in the sketch after the chapter summary.)

Simple Linear Regression Summary:
11.1 The Simple Linear Regression Model
11.2 The Least Squares Point Estimates
11.3 Model Assumptions, Mean Squared Error, Std. Error
11.4 Testing Significance of Slope and y-Intercept
11.5 Confidence Intervals and Prediction Intervals
11.6 The Coefficient of Determination and Correlation
11.7 An F Test for the Simple Linear Regression Model
*11.8 Checking Regression Assumptions by Residuals
*11.9 Some Shortcut Formulas
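To close the chapter, here is a minimal sketch (not part of the original slides; it assumes SciPy is available for the F distribution) that applies the Section 11.9 shortcut formulas to the fuel consumption data and carries out the Section 11.7 F test, reproducing the values of r^2, r, F(model), F_.05, and the p-value reported above.

# Sketch (not from the slides): shortcut sums of squares, r^2, r, and the
# F test of H0: beta1 = 0 for the fuel consumption data.
from scipy.stats import f as f_dist

x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Shortcut sums of squares
SS_xx = sum((xi - xbar) ** 2 for xi in x)
SS_yy = sum((yi - ybar) ** 2 for yi in y)                # SSTO, about 25.549
SS_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

SSR = SS_xy ** 2 / SS_xx                                 # explained variation, about 22.981
SSE = SS_yy - SS_xy ** 2 / SS_xx                         # unexplained variation, about 2.568

r_squared = SSR / SS_yy                                  # about 0.8995
b1 = SS_xy / SS_xx
r = (1 if b1 > 0 else -1) * r_squared ** 0.5             # about -0.9484 (b1 is negative)

# F test of H0: beta1 = 0 with 1 and n - 2 degrees of freedom
F_model = SSR / (SSE / (n - 2))                          # about 53.69
F_crit = f_dist.ppf(0.95, 1, n - 2)                      # about 5.99
p_value = f_dist.sf(F_model, 1, n - 2)                   # about 0.00033

print(f"r^2 = {r_squared:.6f}, r = {r:.6f}")
print(f"F = {F_model:.3f}, F_.05 = {F_crit:.2f}, p = {p_value:.5f}")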