Transcript Chapter 11
Chapter 11
Analysis of Variance and Chi-Square Applications
© 2002 Thomson / South-Western Slide 11-1

Learning Objectives
• Understand the differences between various experimental designs and when to use them.
• Compute and interpret the results of a one-way ANOVA.
• Compute and interpret the results of a randomized block design.

Learning Objectives, continued
• Compute and interpret the results of a two-way ANOVA.
• Understand and interpret interaction.
• Understand the chi-square goodness-of-fit test and how to use it.
• Analyze data by using the chi-square test of independence.

Introduction to Design of Experiments
• An experimental design is a plan and a structure to test hypotheses in which the business analyst controls or manipulates one or more variables. It contains independent and dependent variables.
• Factors is another name for the independent variables of an experimental design.

Design of Experiments, continued
• A treatment variable is an independent variable that the experimenter either controls or modifies.
• A classification variable is an independent variable that was present prior to the experiment and is not a result of the experimenter’s manipulations or control.

Design of Experiments, continued
• Levels or classifications are the subcategories of the independent variable used by the business analyst in the experimental design.
• The dependent variable is the response to the different levels of the independent variables.

Three Types of Experimental Designs
• Completely Randomized Design
• Randomized Block Design
• Factorial Experiments

Completely Randomized Design
[Diagram: the independent variable, Machine Operator, has three levels (1, 2, 3); the valve opening measurements are listed in a column under each operator.]

Example: Number of Foreign Freighters Docking in Each Port per Day

Long Beach   Houston   New York   New Orleans
    5           2          8           3
    7           3          4           5
    4           5          6           3
    2           4          7           4
                6          9           2
                           8

Analysis of Variance (ANOVA): Assumptions
• Observations are drawn from normally distributed populations.
• Observations represent random samples from the populations.
• Variances of the populations are equal.

One-Way ANOVA: Procedural Overview
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_a$: At least one of the means is different from the others.
$$F = \frac{MSC}{MSE}$$
If $F > F_c$, reject $H_0$.
If $F \le F_c$, do not reject $H_0$.

Partitioning Total Sum of Squares of Variation
SST (Total Sum of Squares) is partitioned into SSC (Treatment Sum of Squares) and SSE (Error Sum of Squares).

One-Way ANOVA: Sums of Squares Definitions
Total sum of squares = between sum of squares + error sum of squares
SST = SSC + SSE
$$\sum_{j=1}^{C}\sum_{i=1}^{n_j}\left(X_{ij}-\bar{X}\right)^2 = \sum_{j=1}^{C} n_j\left(\bar{X}_j-\bar{X}\right)^2 + \sum_{j=1}^{C}\sum_{i=1}^{n_j}\left(X_{ij}-\bar{X}_j\right)^2$$
where:
i = a particular member of a treatment level
j = a treatment level
C = number of treatment levels
$n_j$ = number of observations in a given treatment level
$\bar{X}$ = grand mean
$\bar{X}_j$ = mean of a treatment group or level
$X_{ij}$ = individual value

One-Way ANOVA: Computational Formulas
$$SSC = \sum_{j=1}^{C} n_j\left(\bar{X}_j-\bar{X}\right)^2 \qquad df_C = C-1 \qquad MSC = \frac{SSC}{df_C}$$
$$SSE = \sum_{j=1}^{C}\sum_{i=1}^{n_j}\left(X_{ij}-\bar{X}_j\right)^2 \qquad df_E = N-C \qquad MSE = \frac{SSE}{df_E}$$
$$SST = \sum_{j=1}^{C}\sum_{i=1}^{n_j}\left(X_{ij}-\bar{X}\right)^2 \qquad df_T = N-1$$
$$F = \frac{MSC}{MSE}$$
where:
i = a particular member of a treatment level
j = a treatment level
C = number of treatment levels
$n_j$ = number of observations in a given treatment level
$\bar{X}$ = grand mean
$\bar{X}_j$ = column mean
$X_{ij}$ = individual value

Freighter One-Way ANOVA: Preliminary Calculations

Long Beach   Houston   New York   New Orleans
    5           2          8           3
    7           3          4           5
    4           5          6           3
    2           4          7           4
                6          9           2
                           8
T1 = 18      T2 = 20   T3 = 42    T4 = 17       T = 97
n1 = 4       n2 = 5    n3 = 6     n4 = 5        N = 20
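
The computational formulas can be checked against the freighter data with a few lines of Python. This is a minimal sketch using only the standard library; the names `freighters` and `one_way_anova` are illustrative and do not come from the slides.

```python
# One-way ANOVA on the freighter data from the slides.
# Names (freighters, one_way_anova) are illustrative assumptions.

freighters = {
    "Long Beach": [5, 7, 4, 2],
    "Houston": [2, 3, 5, 4, 6],
    "New York": [8, 4, 6, 7, 9, 8],
    "New Orleans": [3, 5, 3, 4, 2],
}


def one_way_anova(samples):
    """Return sums of squares, mean squares and F for a list of samples."""
    values = [x for sample in samples for x in sample]
    n_total = len(values)      # N
    n_levels = len(samples)    # C
    grand_mean = sum(values) / n_total
    level_means = [sum(s) / len(s) for s in samples]

    # SSC: variation of treatment means around the grand mean,
    # weighted by the number of observations in each level.
    ssc = sum(len(s) * (m - grand_mean) ** 2
              for s, m in zip(samples, level_means))
    # SSE: variation of each observation around its own treatment mean.
    sse = sum((x - m) ** 2
              for s, m in zip(samples, level_means) for x in s)
    # SST: variation of each observation around the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in values)

    msc = ssc / (n_levels - 1)          # df_C = C - 1
    mse = sse / (n_total - n_levels)    # df_E = N - C
    return {"SSC": ssc, "SSE": sse, "SST": sst,
            "MSC": msc, "MSE": mse, "F": msc / mse}


anova = one_way_anova(list(freighters.values()))
for name, value in anova.items():
    print(f"{name}: {value:.4f}")
```

Because the sketch divides the unrounded mean squares, its F ratio is about 5.11; dividing the rounded values 14.12 and 2.76 gives 5.12.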

Freighter One-Way ANOVA: Sum of Squares Calculations

j    T_j   n_j   Mean (T_j / n_j)
1    18    4     4.5
2    20    5     4.0
3    42    6     7.0
4    17    5     3.4
T = 97, N = 20, grand mean = 97 / 20 = 4.85

Freighter One-Way ANOVA: Sum of Squares Calculations, continued
$$SSC = \sum_{j=1}^{C} n_j\left(\bar{X}_j-\bar{X}\right)^2 = 4(4.5-4.85)^2 + 5(4.0-4.85)^2 + 6(7.0-4.85)^2 + 5(3.4-4.85)^2 = 42.35$$
$$SSE = \sum_{j=1}^{C}\sum_{i=1}^{n_j}\left(X_{ij}-\bar{X}_j\right)^2 = (5-4.5)^2 + (7-4.5)^2 + (4-4.5)^2 + (2-4.5)^2 + (2-4.0)^2 + (3-4.0)^2 + \cdots + (4-3.4)^2 + (2-3.4)^2 = 44.20$$
$$SST = \sum_{j=1}^{C}\sum_{i=1}^{n_j}\left(X_{ij}-\bar{X}\right)^2 = (5-4.85)^2 + (7-4.85)^2 + (4-4.85)^2 + \cdots + (4-4.85)^2 + (2-4.85)^2 = 86.55$$

Freighter One-Way ANOVA: Mean Square and F Calculations
$$df_C = C-1 = 4-1 = 3 \qquad df_E = N-C = 20-4 = 16 \qquad df_T = N-1 = 20-1 = 19$$
$$MSC = \frac{SSC}{df_C} = \frac{42.35}{3} = 14.12 \qquad MSE = \frac{SSE}{df_E} = \frac{44.20}{16} = 2.76$$
$$F = \frac{MSC}{MSE} = \frac{14.12}{2.76} = 5.12$$

Freighter Example: Analysis of Variance

Source of Variance   df   SS      MS      F
Between (Factor)      3   42.35   14.12   5.12
Error                16   44.20    2.76
Total                19   86.55

A Portion of the F Table for α = 0.05

Denominator            Numerator Degrees of Freedom
Degrees of
Freedom         1        2        3        4        5        6        7        8        9
1            161.45   199.50   215.71   224.58   230.16   233.99   236.77   238.88   240.54
...            ...      ...      ...      ...      ...      ...      ...      ...      ...
15             4.54     3.68     3.29     3.06     2.90     2.79     2.71     2.64     2.59
16             4.49     3.63     3.24     3.01     2.85     2.74     2.66     2.59     2.54
17             4.45     3.59     3.20     2.96     2.81     2.70     2.61     2.55     2.49

$F_{.05,3,16} = 3.24$

Freighter One-Way ANOVA: Procedural Summary
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least one of the means is different from the others.
Numerator degrees of freedom: $\nu_1 = 3$. Denominator degrees of freedom: $\nu_2 = 16$.
If $F > F_c = 3.24$, reject $H_0$.
If $F \le F_c = 3.24$, do not reject $H_0$.
Since $F = 5.12 > F_c = 3.24$, reject $H_0$.
[Figure: F distribution showing the non-rejection region and the rejection region, separated at $F_{.05,3,16} = 3.24$.]

Excel Output for the Freighter Example

Anova: Single Factor

SUMMARY
Groups         Count   Sum   Average   Variance
Long Beach       4      18    4.5      4.3333
Houston          5      20    4        2.5
New York         6      42    7        3.2
New Orleans      5      17    3.4      1.3

ANOVA
Source of Variation   SS      df   MS       F        P-value   F crit
Between Groups        42.35    3   14.117   5.1101   0.0114    3.2389
Within Groups         44.2    16   2.7625
Total                 86.55   19

Multiple Comparison Tests
• An analysis of variance (ANOVA) test is an overall test of differences among groups.
• Multiple comparison techniques are used to identify which pairs of means are significantly different, given that the ANOVA test reveals overall significance.

Randomized Block Design
• A randomized block design is an experimental design in which there is one independent variable and a second variable, known as a blocking variable, that is used to control for confounding or concomitant variables.
• Confounding or concomitant variables are not being controlled by the business analyst but can have an effect on the outcome of the treatment being studied.

Randomized Block Design, continued
• A blocking variable is a variable that the business analyst wants to control but is not the treatment variable of interest.
• A repeated measures design is a randomized block design in which each block level is an individual item or person, and that person or item is measured across all treatments.

Partitioning the Total Sum of Squares in the Randomized Block Design
SST (total sum of squares) is partitioned into SSC (treatment sum of squares) and SSE (error sum of squares).
SSE (error sum of squares) is further partitioned into SSR (sum of squares blocks) and SSE’ (sum of squares error).

A Randomized Block Design
[Diagram: a grid with the levels of the single independent variable as columns and the levels of the blocking variable as rows; each cell holds an individual observation.]

Randomized Block Design Treatment Effects: Procedural Overview
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_a$: At least one of the means is different from the others.
$$F = \frac{MSC}{MSE}$$
If $F > F_c$, reject $H_0$.
If $F \le F_c$, do not reject $H_0$.

Randomized Block Design: Computational Formulas
$$SSC = n\sum_{j=1}^{C}\left(\bar{X}_j-\bar{X}\right)^2 \qquad df_C = C-1$$
$$SSR = C\sum_{i=1}^{n}\left(\bar{X}_i-\bar{X}\right)^2 \qquad df_R = n-1$$
$$SSE = \sum_{j=1}^{C}\sum_{i=1}^{n}\left(X_{ij}-\bar{X}_j-\bar{X}_i+\bar{X}\right)^2 \qquad df_E = (C-1)(n-1) = N-n-C+1$$
$$SST = \sum_{j=1}^{C}\sum_{i=1}^{n}\left(X_{ij}-\bar{X}\right)^2 \qquad df_T = N-1$$
$$MSC = \frac{SSC}{C-1} \qquad MSR = \frac{SSR}{n-1} \qquad MSE = \frac{SSE}{N-n-C+1}$$
$$F_{treatments} = \frac{MSC}{MSE} \qquad F_{blocks} = \frac{MSR}{MSE}$$
where:
i = block group (row)
j = a treatment level (column)
C = number of treatment levels (columns)
n = number of observations in each treatment level (number of blocks, rows)
$X_{ij}$ = individual observation
$\bar{X}_j$ = treatment (column) mean
$\bar{X}_i$ = block (row) mean
$\bar{X}$ = grand mean
N = total number of observations
SSC = sum of squares columns (treatment)
SSR = sum of squares rows (blocking)
SSE = sum of squares error
SST = sum of squares total

Tread-Wear Example: Randomized Block Design

                       Speed
Supplier          Slow   Medium   Fast   Block Means
1                 3.7    4.5      3.1    3.77
2                 3.4    3.9      2.8    3.37
3                 3.5    4.1      3.0    3.53
4                 3.2    3.5      2.6    3.10
5                 3.9    4.8      3.4    4.03
Treatment Means   3.54   4.16     2.98   3.56 (grand mean)

n = 5, C = 3, N = 15

Tread-wear Randomized Block Design: Sum of Squares Calculations (Part 1)
$$SSC = n\sum_{j=1}^{C}\left(\bar{X}_j-\bar{X}\right)^2 = 5\left[(3.54-3.56)^2 + (4.16-3.56)^2 + (2.98-3.56)^2\right] = 3.484$$
$$SSR = C\sum_{i=1}^{n}\left(\bar{X}_i-\bar{X}\right)^2 = 3\left[(3.77-3.56)^2 + (3.37-3.56)^2 + (3.53-3.56)^2 + (3.10-3.56)^2 + (4.03-3.56)^2\right] = 1.549$$

Tread-wear Randomized Block Design: Sum of Squares Calculations (Part 2)
$$SSE = \sum_{j=1}^{C}\sum_{i=1}^{n}\left(X_{ij}-\bar{X}_j-\bar{X}_i+\bar{X}\right)^2 = (3.7-3.54-3.77+3.56)^2 + (3.4-3.54-3.37+3.56)^2 + \cdots + (2.6-2.98-3.10+3.56)^2 + (3.4-2.98-4.03+3.56)^2 = 0.143$$
$$SST = \sum_{j=1}^{C}\sum_{i=1}^{n}\left(X_{ij}-\bar{X}\right)^2 = (3.7-3.56)^2 + (3.4-3.56)^2 + \cdots + (2.6-3.56)^2 + (3.4-3.56)^2 = 5.176$$
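
The randomized block sums of squares can be reproduced from the tread-wear data as follows. This is a sketch in plain Python, assuming the data layout of one row per supplier (block) and one column per speed (treatment); the names `tread_wear` and `randomized_block_anova` are illustrative.

```python
# Randomized block ANOVA on the tread-wear data from the slides.
# Rows are suppliers (blocks); columns are speeds: slow, medium, fast.
# Names (tread_wear, randomized_block_anova) are illustrative assumptions.

tread_wear = [
    [3.7, 4.5, 3.1],
    [3.4, 3.9, 2.8],
    [3.5, 4.1, 3.0],
    [3.2, 3.5, 2.6],
    [3.9, 4.8, 3.4],
]


def randomized_block_anova(rows):
    """Return sums of squares and F ratios for a blocks-by-treatments table."""
    n = len(rows)        # number of blocks (rows)
    c = len(rows[0])     # number of treatment levels (columns)
    grand_mean = sum(sum(r) for r in rows) / (n * c)
    block_means = [sum(r) / c for r in rows]
    treatment_means = [sum(r[j] for r in rows) / n for j in range(c)]

    ssc = n * sum((m - grand_mean) ** 2 for m in treatment_means)
    ssr = c * sum((m - grand_mean) ** 2 for m in block_means)
    # Error: what is left after removing treatment and block effects.
    sse = sum(
        (rows[i][j] - treatment_means[j] - block_means[i] + grand_mean) ** 2
        for i in range(n) for j in range(c)
    )
    sst = sum((x - grand_mean) ** 2 for r in rows for x in r)

    msc = ssc / (c - 1)
    msr = ssr / (n - 1)
    mse = sse / ((c - 1) * (n - 1))
    return {"SSC": ssc, "SSR": ssr, "SSE": sse, "SST": sst,
            "F_treatments": msc / mse, "F_blocks": msr / mse}


block_anova = randomized_block_anova(tread_wear)
for name, value in block_anova.items():
    print(f"{name}: {value:.4f}")
```

The sketch keeps full precision throughout, so its F ratios can differ slightly from hand calculations that round the mean squares first.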

Tread-wear Randomized Block Design: Mean Square Calculations
$$MSC = \frac{SSC}{C-1} = \frac{3.484}{2} = 1.742$$
$$MSR = \frac{SSR}{n-1} = \frac{1.549}{4} = 0.387$$
$$MSE = \frac{SSE}{N-n-C+1} = \frac{0.143}{8} = 0.018$$
$$F = \frac{MSC}{MSE} = \frac{1.742}{0.018} = 96.78$$

Analysis of Variance for the Tread-Wear Example

Source of Variance   SS      df   MS      F
Treatment            3.484    2   1.742   96.78
Block                1.549    4   0.387
Error                0.143    8   0.018
Total                5.176   14

Tread-wear Randomized Block Design Treatment Effects: Procedural Summary
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least one of the means is different from the others.
$$F = \frac{MSC}{MSE} = \frac{1.742}{0.018} = 96.78$$
$F = 96.78 > F_{.01,2,8} = 8.65$, reject $H_0$.

Excel Output for Tread-Wear Randomized Block Design

Anova: Two-Factor Without Replication

SUMMARY   Count   Sum    Average     Variance
1           3     11.3   3.7666667   0.4933333
2           3     10.1   3.3666667   0.3033333
3           3     10.6   3.5333333   0.3033333
4           3      9.3   3.1         0.21
5           3     12.1   4.0333333   0.5033333
Slow        5     17.7   3.54        0.073
Medium      5     20.8   4.16        0.258
Fast        5     14.9   2.98        0.092

ANOVA
Source of Variation   SS          df   MS          F           P-value     F crit
Rows                  1.5493333    4   0.3873333   21.719626   0.0002357   7.0060651
Columns               3.484        2   1.742       97.682243   2.395E-06   8.6490672
Error                 0.1426667    8   0.0178333
Total                 5.176       14

Two-Way Factorial Design
• A two-way factorial design is an experimental design in which two or more independent variables are studied simultaneously and every level of treatment is studied under the conditions of every level of all other treatments.
• Also called a factorial experiment.

Two-Way Factorial Design
[Diagram: a grid with the levels of the column treatment across the top and the levels of the row treatment down the side; each cell holds the observations for one row-column combination.]

Two-Way ANOVA: Hypotheses
Row Effects:
$H_0$: Row means are all equal.
$H_a$: At least one row mean is different from the others.
Column Effects:
$H_0$: Column means are all equal.
$H_a$: At least one column mean is different from the others.
Interaction Effects:
$H_0$: The interaction effects are zero.
$H_a$: There is an interaction effect.

Formulas for Computing a Two-Way ANOVA
$$SSR = nC\sum_{i=1}^{R}\left(\bar{X}_i-\bar{X}\right)^2 \qquad df_R = R-1 \qquad MSR = \frac{SSR}{R-1}$$
$$SSC = nR\sum_{j=1}^{C}\left(\bar{X}_j-\bar{X}\right)^2 \qquad df_C = C-1 \qquad MSC = \frac{SSC}{C-1}$$
$$SSI = n\sum_{i=1}^{R}\sum_{j=1}^{C}\left(\bar{X}_{ij}-\bar{X}_i-\bar{X}_j+\bar{X}\right)^2 \qquad df_I = (R-1)(C-1) \qquad MSI = \frac{SSI}{(R-1)(C-1)}$$
$$SSE = \sum_{i=1}^{R}\sum_{j=1}^{C}\sum_{k=1}^{n}\left(X_{ijk}-\bar{X}_{ij}\right)^2 \qquad df_E = RC(n-1) \qquad MSE = \frac{SSE}{RC(n-1)}$$
$$SST = \sum_{i=1}^{R}\sum_{j=1}^{C}\sum_{k=1}^{n}\left(X_{ijk}-\bar{X}\right)^2 \qquad df_T = N-1$$
$$F_R = \frac{MSR}{MSE} \qquad F_C = \frac{MSC}{MSE} \qquad F_I = \frac{MSI}{MSE}$$
where:
n = number of observations per cell
C = number of column treatments
R = number of row treatments
i = row treatment level
j = column treatment level
k = cell member
$X_{ijk}$ = individual observation
$\bar{X}_{ij}$ = cell mean
$\bar{X}_i$ = row mean
$\bar{X}_j$ = column mean
$\bar{X}$ = grand mean

A 2 × 3 Factorial Design with Interaction
[Figure: cell means plotted against the column levels C1, C2, C3, with one line for each row level (R1, R2).]

A 2 × 3 Factorial Design with Some Interaction
[Figure: cell means plotted against the column levels C1, C2, C3, with one line for each row level (R1, R2).]

A 2 × 3 Factorial Design with No Interaction
[Figure: cell means plotted against the column levels C1, C2, C3, with one line for each row level (R1, R2).]

CEO Dividend 2 × 3 Factorial Design: Data and Measurements

                              Location Where Company Stock Is Traded
How Stockholders Are
Informed of Dividends         NYSE               AMEX               OTC                Row Mean
Annual/Quarterly Reports      2, 1, 2, 1         2, 3, 3, 2         4, 3, 4, 3         2.5
                              (cell mean 1.5)    (cell mean 2.5)    (cell mean 3.5)
Presentations to Analysts     2, 3, 1, 2         3, 3, 2, 4         4, 4, 3, 4         2.9167
                              (cell mean 2.0)    (cell mean 3.0)    (cell mean 3.75)
Column Mean                   1.75               2.75               3.625              2.7083 (grand mean)

N = 24, n = 4

CEO Dividend 2 × 3 Factorial Design: Calculations (Part 1)
$$SSR = nC\sum_{i=1}^{R}\left(\bar{X}_i-\bar{X}\right)^2 = (4)(3)\left[(2.5-2.7083)^2 + (2.9167-2.7083)^2\right] = 1.0418$$
$$SSC = nR\sum_{j=1}^{C}\left(\bar{X}_j-\bar{X}\right)^2 = (4)(2)\left[(1.75-2.7083)^2 + (2.75-2.7083)^2 + (3.625-2.7083)^2\right] = 14.0833$$
$$SSI = n\sum_{i=1}^{R}\sum_{j=1}^{C}\left(\bar{X}_{ij}-\bar{X}_i-\bar{X}_j+\bar{X}\right)^2$$
$$= 4\left[(1.5-2.5-1.75+2.7083)^2 + (2.5-2.5-2.75+2.7083)^2 + (3.5-2.5-3.625+2.7083)^2 + (2.0-2.9167-1.75+2.7083)^2 + (3.0-2.9167-2.75+2.7083)^2 + (3.75-2.9167-3.625+2.7083)^2\right] = 0.0833$$

CEO Dividend 2 × 3 Factorial Design: Calculations (Part 2)
$$SSE = \sum_{i=1}^{R}\sum_{j=1}^{C}\sum_{k=1}^{n}\left(X_{ijk}-\bar{X}_{ij}\right)^2 = (2-1.5)^2 + (1-1.5)^2 + \cdots + (3-3.75)^2 + (4-3.75)^2 = 7.7500$$
$$SST = \sum_{i=1}^{R}\sum_{j=1}^{C}\sum_{k=1}^{n}\left(X_{ijk}-\bar{X}\right)^2 = (2-2.7083)^2 + (1-2.7083)^2 + \cdots + (3-2.7083)^2 + (4-2.7083)^2 = 22.9583$$

CEO Dividend 2 × 3 Factorial Design: Calculations (Part 3)
$$MSR = \frac{SSR}{R-1} = \frac{1.0418}{1} = 1.0418 \qquad F_R = \frac{MSR}{MSE} = \frac{1.0418}{0.4306} = 2.42$$
$$MSC = \frac{SSC}{C-1} = \frac{14.0833}{2} = 7.0417 \qquad F_C = \frac{MSC}{MSE} = \frac{7.0417}{0.4306} = 16.35$$
$$MSI = \frac{SSI}{(R-1)(C-1)} = \frac{0.0833}{2} = 0.0417 \qquad F_I = \frac{MSI}{MSE} = \frac{0.0417}{0.4306} = 0.10$$
$$MSE = \frac{SSE}{RC(n-1)} = \frac{7.7500}{18} = 0.4306$$

CEO Dividend: Analysis of Variance

Source of Variance   SS        df   MS       F
Row                   1.0418    1   1.0418    2.42
Column               14.0833    2   7.0417   16.35*
Interaction           0.0833    2   0.0417    0.10
Error                 7.7500   18   0.4306
Total                22.9583   23

*Denotes significance at α = .01.
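
The two-way ANOVA formulas can be applied to the CEO dividend data in the same way. The sketch below uses plain Python and assumes a nested-list layout (rows, then columns, then the n = 4 observations per cell); the names `ceo_dividend`, `mean` and `two_way_anova` are illustrative.

```python
# Two-way ANOVA with replication on the CEO dividend data from the slides.
# cells[i][j] holds the n observations for row level i and column level j.
# Names (ceo_dividend, mean, two_way_anova) are illustrative assumptions.

ceo_dividend = [
    # NYSE          AMEX          OTC
    [[2, 1, 2, 1], [2, 3, 3, 2], [4, 3, 4, 3]],  # Annual/quarterly reports
    [[2, 3, 1, 2], [3, 3, 2, 4], [4, 4, 3, 4]],  # Presentations to analysts
]


def mean(values):
    return sum(values) / len(values)


def two_way_anova(cells):
    """Return sums of squares and F ratios for a balanced R x C design."""
    r, c, n = len(cells), len(cells[0]), len(cells[0][0])
    values = [x for row in cells for cell in row for x in cell]
    grand = mean(values)
    row_means = [mean([x for cell in row for x in cell]) for row in cells]
    col_means = [mean([x for row in cells for x in row[j]]) for j in range(c)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ssr = n * c * sum((m - grand) ** 2 for m in row_means)
    ssc = n * r * sum((m - grand) ** 2 for m in col_means)
    ssi = n * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(r) for j in range(c)
    )
    sse = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(r) for j in range(c) for x in cells[i][j]
    )
    sst = sum((x - grand) ** 2 for x in values)

    mse = sse / (r * c * (n - 1))
    return {
        "SSR": ssr, "SSC": ssc, "SSI": ssi, "SSE": sse, "SST": sst,
        "F_R": (ssr / (r - 1)) / mse,
        "F_C": (ssc / (c - 1)) / mse,
        "F_I": (ssi / ((r - 1) * (c - 1))) / mse,
    }


factorial = two_way_anova(ceo_dividend)
for name, value in factorial.items():
    print(f"{name}: {value:.4f}")
```

The sketch assumes a balanced design, meaning every cell has the same number of observations, as in this example.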

CEO Dividend Excel Output (Part 1)

Anova: Two-Factor With Replication

SUMMARY         NYSE     ASE      OTC      Total
Reports
Count           4        4        4        12
Sum             6        10       14       30
Average         1.5      2.5      3.5      2.5
Variance        0.3333   0.3333   0.3333   1

Presentation
Count           4        4        4        12
Sum             8        12       15       35
Average         2        3        3.75     2.9167
Variance        0.6667   0.6667   0.25     0.9924

Total
Count           8        8        8
Sum             14       22       29
Average         1.75     2.75     3.625
Variance        0.5      0.5      0.2679

CEO Dividend Excel Output (Part 2)

ANOVA
Source of Variation   SS       df   MS       F        P-value   F crit
Sample                1.0417    1   1.0417   2.4194   0.1373    4.4139
Columns               14.083    2   7.0417   16.355   9E-05     3.5546
Interaction           0.0833    2   0.0417   0.0968   0.9082    3.5546
Within                7.75     18   0.4306
Total                 22.958   23

χ² Goodness-of-Fit Test
The χ² goodness-of-fit test compares the expected (theoretical) frequencies of categories from a population distribution to the observed (actual) frequencies from a distribution to determine whether there is a difference between what was expected and what was observed.

χ² Goodness-of-Fit Test
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \qquad df = k - 1 - c$$
where:
$f_o$ = frequency of observed values
$f_e$ = frequency of expected values
k = number of categories
c = number of parameters estimated from the sample data

Milk Sales Data for Demonstration Problem 11.4

Month       Gallons
January      1,553
February     1,585
March        1,649
April        1,590
May          1,497
June         1,443
July         1,410
August       1,450
September    1,495
October      1,564
November     1,602
December     1,609
Total       18,447

Demonstration Problem 11.4: Hypotheses and Decision Rules
$H_0$: The monthly figures for milk sales are uniformly distributed.
$H_a$: The monthly figures for milk sales are not uniformly distributed.
$\alpha = .01$
$df = k - 1 - c = 12 - 1 - 0 = 11$
$\chi^2_{.01,11} = 24.725$
If $\chi^2_{Cal} > 24.725$, reject $H_0$.
If $\chi^2_{Cal} \le 24.725$, do not reject $H_0$.
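
The goodness-of-fit statistic for the milk sales data can be computed directly from the formula. This is a minimal sketch in plain Python; the variable names are illustrative, and the critical value 24.725 is taken from the decision rule rather than computed.

```python
# Chi-square goodness-of-fit test for the monthly milk sales data,
# testing against a uniform distribution across the 12 months.
# Variable names are illustrative assumptions, not from the slides.

milk_gallons = {
    "January": 1553, "February": 1585, "March": 1649, "April": 1590,
    "May": 1497, "June": 1443, "July": 1410, "August": 1450,
    "September": 1495, "October": 1564, "November": 1602, "December": 1609,
}

observed = list(milk_gallons.values())
k = len(observed)                     # number of categories
expected = sum(observed) / k          # uniform: same expected count each month

# Sum of (fo - fe)^2 / fe over all categories.
chi_square = sum((fo - expected) ** 2 / expected for fo in observed)

c = 0                                 # no parameters estimated from the sample
df = k - 1 - c
critical_value = 24.725               # chi-square(.01, 11), from the decision rule

reject_h0 = chi_square > critical_value
print(f"expected per month = {expected:.2f}")
print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
```

The statistic comes out at about 41.6, well above the critical value, so the uniform hypothesis is rejected.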
[Figure: chi-square distribution with the critical value 24.725 marked.]

Demonstration Problem 11.4: Calculations

Month       fo       fe          (fo - fe)²/fe
January      1,553    1,537.25    0.16
February     1,585    1,537.25    1.48
March        1,649    1,537.25    8.12
April        1,590    1,537.25    1.81
May          1,497    1,537.25    1.05
June         1,443    1,537.25    5.78
July         1,410    1,537.25   10.53
August       1,450    1,537.25    4.95
September    1,495    1,537.25    1.16
October      1,564    1,537.25    0.47
November     1,602    1,537.25    2.73
December     1,609    1,537.25    3.35
Total       18,447   18,447.00   41.59

Observed chi-square = 41.59

Demonstration Problem 11.4: Conclusion
df = 11, α = 0.01
[Figure: chi-square distribution showing the non-rejection region below the critical value 24.725.]
$\chi^2_{Cal} = 41.59 > 24.725$, reject $H_0$.

Defects Example: Using a χ² Goodness-of-Fit Test to Test a Population Proportion
$H_0: P = .08$
$H_a: P \ne .08$
$\alpha = .05$
$df = k - 1 - c = 2 - 1 - 0 = 1$
$\chi^2_{.05,1} = 3.841$
If $\chi^2_{Cal} > 3.841$, reject $H_0$.
If $\chi^2_{Cal} \le 3.841$, do not reject $H_0$.

Defects Example: Calculations

              fo     fe
Defects        33     16
Nondefects    167    184
n =           200    200

Defects: $f_e = n \cdot P = 200(.08) = 16$
Nondefects: $f_e = n(1 - P) = 200(.92) = 184$
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(33-16)^2}{16} + \frac{(167-184)^2}{184} = 18.06 + 1.57 = 19.63$$

Defects Example: Conclusion
df = 1, α = 0.05
[Figure: chi-square distribution showing the non-rejection region below the critical value 3.841.]
$\chi^2_{Cal} = 19.63 > 3.841$, reject $H_0$.

Contingency Analysis: χ² Test of Independence
A statistical test used to analyze the frequencies of two variables with multiple categories to determine whether the two variables are independent.
• Qualitative variables
• Nominal data

Investment Example: χ² Test of Independence
• In which region of the country do you reside?
A. Northeast
B. Midwest
C. South
D. West
• Which type of financial investment are you most likely to make today?
E. Stocks
F. Bonds
G. Treasury bills

Investment Example: χ² Test of Independence

Contingency Table
                     Type of Financial Investment
Geographic Region    E      F      G      Total
A                                  O13    nA
B                                         nB
C                                         nC
D                                         nD
Total                nE     nF     nG     N

Investment Example: χ² Test of Independence
If A and F are independent,
$$P(A \cap F) = P(A) \cdot P(F), \qquad P(A) = \frac{n_A}{N}, \qquad P(F) = \frac{n_F}{N}$$
$$P(A \cap F) = \frac{n_A}{N} \cdot \frac{n_F}{N}$$
$$e_{AF} = N \cdot P(A \cap F) = N\left(\frac{n_A}{N}\right)\left(\frac{n_F}{N}\right) = \frac{n_A \, n_F}{N}$$

Contingency Table
                     Type of Financial Investment
Geographic Region    E      F      G      Total
A                           e12           nA
B                                         nB
C                                         nC
D                                         nD
Total                nE     nF     nG     N

χ² Test of Independence: Formulas
Expected frequencies:
$$e_{ij} = \frac{n_i \, n_j}{N}$$
where:
i = the row
j = the column
$n_i$ = the total of row i
$n_j$ = the total of column j
N = the total of all frequencies

Calculated (observed) χ²:
$$\chi^2 = \sum\sum \frac{(f_o - f_e)^2}{f_e}$$
where:
df = (r - 1)(c - 1)
r = the number of rows
c = the number of columns
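
The expected-frequency and χ² formulas can be sketched in a few lines of Python. The slides give no observed counts for the investment example, so the 4 × 3 table below is entirely hypothetical and serves only to show the mechanics; the function name `chi_square_independence` is also an assumption.

```python
# Chi-square test of independence: expected frequencies and test statistic.
# The observed counts are HYPOTHETICAL (the slides show only a symbolic
# table); rows stand for regions A-D and columns for investments E-G.

def chi_square_independence(observed):
    """Return (chi_square, df, expected) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # e_ij = (total of row i) * (total of column j) / N
    expected = [[ni * nj / grand_total for nj in col_totals]
                for ni in row_totals]

    chi_square = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_square, df, expected


# Hypothetical survey counts, for illustration only.
hypothetical_counts = [
    [20, 10, 10],   # A. Northeast
    [15, 15, 10],   # B. Midwest
    [10, 20, 10],   # C. South
    [5, 5, 10],     # D. West
]

chi_square, df, expected = chi_square_independence(hypothetical_counts)
print(f"chi-square = {chi_square:.3f}, df = {df}")
```

The calculated value would then be compared with the table value of χ² for the chosen α and df = (r - 1)(c - 1), which is 6 for a 4 × 3 table.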