RESEARCH METHODOLOGY & STATISTICS
TESTING DIFFERENCES IN PROPORTIONS
Addictions Department
MSc (Addictions)

Testing differences between 2 Proportions
• Another common problem in addiction (& other) research
• Consider dichotomous (Yes/No) measures
• ‘Exposure’
  • Gender
  • Treated/Not treated
• ‘Outcome’
  • Meeting diagnosis for e.g., alcohol dependence
  • Successful treatment outcome
• Note that a construct can be measured using different metrics, e.g., dichotomous (dependence: yes/no) vs continuous (number of dependence symptoms)

Relapse in treated vs. controls
• A clinical researcher hypothesises that the addition of a new treatment component will significantly improve treatment outcome among individuals receiving inpatient detox for alcohol dependence.
• She recruits 50 consecutive patients admitted to the detox unit. Half of them receive additional training in meditation and half do not.
• Patients are followed up (100% re-contact rate) at three months and assessed for alcohol consumption/dependence.
• Null Hypothesis (H0): There will be no difference between the two groups in rates of relapse at 3 months

Proportions/Percentages/Rates
Percentage = (number of people with ‘x’ / total number) × 100
% Treated = 25/50 = 50%
Proportion = %/100 (i.e., proportions across categories add to 1)
Proportion treated = .5
‘Risk’: especially when discussing outcomes (e.g., relapse) we often talk about ‘risk’
• In this example the ‘risk’ of relapse might be expressed as either a percentage or a proportion.

Relative Risk
• At follow-up 12 of the treatment group have remained abstinent compared with 3 of the control group
• Relative risk is the ratio of the two risks (proportions)
• So, in the above example (taking the proportion ‘cured’ in each group): 12/25 = .48; 3/25 = .12
• Relative risk = .48/.12 = 4.0

            ‘Cured’   Relapse   Total   ‘Risk’
Treated        12        13       25     .48
Controls        3        22       25     .12

NOT in the exam (!)
Formula for Relative Risk (again):
RR = [a/(a+b)] / [c/(c+d)]
Standard Error of the Relative Risk (calculated on the log scale):
SE(ln RR) = sqrt[1/a - 1/(a+b) + 1/c - 1/(c+d)]
95% confidence interval around the Relative Risk:
exp[ln(RR) ± 1.96 × SE(ln RR)]

              +      -     Total    ‘Risk’
Case          a      b     (a+b)    a/(a+b)
Controls      c      d     (c+d)    c/(c+d)

Take-home message
Most/all statistical packages, and numerous computer programs, automatically provide confidence intervals for estimates of relative risks (and other statistics)
There is no need to memorize the formulae used
BUT:
Important to recognize that there is a strong mathematical foundation to these calculations
It IS possible to calculate them by calculator if you are provided with individual level data
Standard error (& therefore CI) is partially determined by sample size

Some Examples
Use this website: http://www.medcalc.org/calc/relative_risk.php
To:
Calculate/confirm RR & CI for the above example
Calculate RR for this example:

                3 Month Outcome
Treatment     ‘Sober’   Drinking   Total
Control          54        537      591
Meditation      345        293      638
Total           399        830     1229

Odds Ratio
• Anyone gamble?
• The odds of an event are the chance of the event happening divided by the chance of that event not happening
• SO: if rolling a die, the odds of rolling a ‘1’ are 1/5 (one way to roll a ‘1’, five ways not to).
• In the treatment example above the odds of not drinking at 3 months in the treated/meditation group are 12/13
• In the control group: 3/22
• The odds ratio is simply the ratio of these odds
• = (12/13)/(3/22)
• = .9231/.1364
• = 6.77

More Formally
Odds ratio = (a*d)/(b*c)

Example

                3 Month Outcome
Treatment     ‘Sober’   Drinking   Total
Control           3         22       25
Meditation       12         13       25
Total            15         35       50

Odds ratio = (a*d)/(b*c) = (3 * 13)/(22 * 12) = 39/264 = .1477

Odds Ratios are ‘symmetric’
.1477 X 6.77 ≈ 1. Why? Swapping which group is listed first gives the reciprocal odds ratio: 1/.1477 = 6.77, the value obtained above with the meditation group listed first.
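
The relative risk and odds ratio calculations above can be sketched in a few lines of Python, which may help when checking the output of the online calculators. This is only an illustrative sketch: the function names and the z = 1.96 normal approximation are assumptions made here, not part of the lecture, and statistical packages may use slightly different interval methods.

```python
import math

def relative_risk(a, b, c, d, z=1.96):
    """Relative risk and approximate 95% CI for a 2 x 2 table.

    a, b = group 1 with / without the outcome
    c, d = group 2 with / without the outcome
    (hypothetical helper; the interval is the usual log-scale approximation)
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio (a*d)/(b*c) and approximate 95% CI, same cell layout."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Meditation example: 12 'cured' / 13 relapsed vs 3 'cured' / 22 relapsed
rr, rr_lo, rr_hi = relative_risk(12, 13, 3, 22)
print(f"RR = {rr:.2f}, 95% CI {rr_lo:.2f} to {rr_hi:.2f}")   # RR = 4.00, 95% CI 1.28 to 12.47

or_, or_lo, or_hi = odds_ratio(12, 13, 3, 22)
print(f"OR = {or_:.2f}, 95% CI {or_lo:.2f} to {or_hi:.2f}")  # OR = 6.77, 95% CI 1.61 to 28.54

# 'Symmetry': listing the control group first gives the reciprocal odds ratio
or_swapped, _, _ = odds_ratio(3, 22, 12, 13)
print(f"{or_swapped:.4f} x {or_:.4f} = {or_swapped * or_:.2f}")
```

Both intervals are wide and asymmetric around the estimate because they are built on the log scale from a sample of only 50 patients.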
Some Features of Odds Ratios
Can range from 0 to (theoretically) infinity
An odds ratio of 1 = no difference between groups
Odds ratios less than one represent reduced odds of the outcome
Odds ratios greater than one represent increased odds of the outcome
Are ‘symmetric’

Some Features of Odds Ratios (continued)
Can calculate 95% (& other) confidence intervals around the estimate
Note that confidence intervals are calculated on a logarithmic scale so are not symmetrical around the estimate
If the confidence interval includes ‘1’ then the estimate is not significantly different from one: the difference between treatments is not significant & we cannot reject the null hypothesis.

Some Examples
Use this website: http://www.medcalc.org/calc/odds_ratio.php
To:
Calculate/confirm OR & CI for the above example
Calculate OR for these examples:

                3 Month Outcome
Treatment     ‘Sober’   Drinking   Total
Control          54        537      591
Meditation      345        293      638
Total           399        830     1229

                     Gender
Alcohol Abuse    Male    Female   Total
Yes               675      416     1091
No               2570     3460     6030
Total            3245     3876     7121

Chi Square
• The Chi square statistic (χ²) tests whether the distribution of a categorical variable differs between groups/categories
• Examples of categorical variables
  • Primary drug (tobacco vs alcohol vs cannabis vs cocaine vs heroin)
  • Occupational status (student vs unemployed vs retired etc.)
  • Status at follow-up of e.g., a 20 year longitudinal study (interviewed vs located but refused vs not located vs dead)
• χ² compares counts of categorical responses between 2+ independent groups

Contingency Table
In the simplest case we can continue to consider the 2 X 2 contingency table

Calculating χ²
Each cell of the contingency table can be assigned an ‘expected’ value, assuming that there is no association
The expected (E) value for ‘a’ would be: ((a+b)*(a+c))/(a+b+c+d)
The value of the test statistic is
χ² = Σ (O - E)² / E
where O = an observed frequency; E = an expected (theoretical) frequency, asserted by the null hypothesis; and the sum is taken over all cells of the table

Some Features of χ²
• This calculation produces a Chi square statistic (χ²) which, like the t-test (& others), has a known distribution
• There is also a ‘degrees of freedom’ associated with this statistic, calculated:
• D.f. = (number of columns - 1)*(number of rows - 1)
• So, in our example, it would be (2-1)*(2-1) = 1
• Using the estimated χ² and its degrees of freedom, we can look this up in tables to find the significance level associated with a specific test value & d.f.

Why not stick with Odds Ratios?
• The example I’ve used could just as easily be tested using an OR & its 95% CI
• BUT: Odds ratios are (generally) only useful for dichotomous variables
• Chi square can be used for categorical variables with 3 or more categories

An Example
• A researcher has conducted a 10 year follow-up on individuals entering treatment for alcohol/drug addiction
• In 2003 everyone entering treatment for addiction in her clinic was enrolled into a research study and baseline information was collected (e.g., patterns of drug use)
• In this sample, there were three categories of drug users: those who primarily used alcohol, those who primarily used heroin and those who primarily used cocaine
• At follow-up, after 10 years, she was able to interview only about 60% of those who had participated in 2003
• Reasons for loss to follow-up included death, unable to locate, refusal

Data

               Heroin   Alcohol   Cocaine
Interviewed      50        70        60
Refused           5        12         8
Not Located      25        16        24
Died             20        12         8

An Example (continued)
• Examine whether follow-up status differed between groups (primary drug)
• Use this calculator to calculate: http://www.quantpsy.org/chisq/chisq.htm
• Firstly: How many degrees of freedom?

Correlation
• Correlations (r) are used to test whether there is a significant association between continuous variables
• Examples of continuous variables
  • Height (cm), weight (kg)
  • Number of drinks
  • Scores on a personality dimension
• One web site: http://www.mathsisfun.com/data/correlation.html

Correlation (continued)
• Can range between -1 and 1: 0 indicates no association and 1 (+ or -) indicates perfect association.
• Significance is sample size dependent
• Important also to graph
• Correlation is NOT causation; we’ll be discussing this more this afternoon.
• Calculator: http://www.socscistatistics.com/tests/pearson/Default2.aspx
• (will need r and n to click through to the p value calculator)
• Exercise: try adding some outliers

Summary
• Comparing means on a continuous variable across subpopulations (e.g., gender)
  • T-test
• Comparing rates of a dichotomous outcome across two groups
  • Relative risk
  • Odds ratio
  • Chi square
• Comparing frequency of a categorical outcome across 2 or more groups
  • Chi square
• Examining associations between two continuous measures
  • Correlation

Best way to test an association

Gender   Number of cigarettes/day
  1          3
  2         15
  2         25
  2         25
  1         19
  1         40
  1         10
  1         20
  2         20

Gender   Ever used cannabis
  1          0
  2          1
  1          0
  2          1
  1          1
  2          0
  1          1
  1          0
  2          1

Best way to test an association (continued)

Region (1=NZ; 2=UK; 3=AUS)   Preferred drug (1=Alc; 2=Cig; 3=THC)
  1                             1
  2                             3
  3                             2
  2                             1
  1                             1
  3                             2
  1                             1
  3                             3
  2                             1

Novelty seeking   Max drinks/month
  30                 5
  28                99
  33                 0
  21                25
  28                67
  32                12
  30                30
  31                16
  29                44
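
For readers who want to see the arithmetic behind the chi square and correlation calculators, here is a minimal Python sketch. The function names are hypothetical, the chi square is the plain Pearson statistic with no continuity correction, and the height/weight values are made-up illustration data, not data from the lecture. The chi square is run on the earlier 2 x 2 meditation table (12/13 vs 3/22) rather than on the exercise datasets.

```python
import math

def chi_square(table):
    """Pearson chi square statistic and degrees of freedom.

    table is a list of rows of observed counts. Expected counts assume
    no association: (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# 2 x 2 meditation example: rows = meditation, control; columns = 'sober', drinking
stat, df = chi_square([[12, 13], [3, 22]])
print(f"chi square = {stat:.2f}, df = {df}")  # chi square = 7.71, df = 1
# The tabled 5% critical value for df = 1 is 3.84, so 7.71 is significant at p < .05

# Hypothetical heights (cm) and weights (kg), invented for illustration only
heights = [160, 165, 170, 175, 180]
weights = [55, 62, 66, 72, 80]
print(f"r = {pearson_r(heights, weights):.2f}")  # r = 0.99

# Adding a single outlier (a short, very heavy person) reverses the sign of r
r_outlier = pearson_r(heights + [150], weights + [110])
print(f"r with outlier = {r_outlier:.2f}")
```

The outlier example shows why it is important to graph the data as well as compute r: one extreme point in a small sample can dominate the coefficient.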