Identifying Threats to Validity
Critical Appraisal Skills depend upon identifying threats to validity and whether appropriate remedies were employed
Al Best, PhD
Perkinson 3100B
[email protected]
Virginia Commonwealth University

Goals
Be able to answer four questions:
– Based on the study design, what is the level of evidence?
– How were threats to validity addressed?
– Based on the goals of the study, how do you describe the results?
– To justify the conclusions, were comparisons done appropriately?

Overview
Threats to validity
– Bias
– Confounding
– Chance
– Multiplicity
Some solutions
– Study design
– Randomization
– Masking (AKA blinding)
– Analysis
Analysis
– Descriptive stats
– SD vs SE
– T-test and ANOVA
– Statistical significance vs Clinical importance
– Ordinal data and nonparametric stats
– Correlation
– Survival analysis
Did the paper do the right stats?

Bias
Definition, Bias:
– Systematic distortion of the estimated intervention effect away from the “truth”
– Caused by inadequacies in the design, conduct, or analysis of a trial
Selection bias
Measurement bias

Selection bias
Definition: Bias from the use of a nonrepresentative group as the basis of generalization to a broader population
Example: Estimate prognosis from patients newly diagnosed and infer to patients hospitalized with the disease
– Newly diagnosed patients have a much broader spectrum of outcomes

Selection bias
Selection bias?
– How are patients allocated to intervention groups?
– How are exposure groups identified?
Patients across time:
– Groups comparable at baseline?
– Similar follow-up? Similar dropout?
– ALL subjects analyzed? (NOT only the completers!)
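To see why the last question on the selection bias slide matters, here is a minimal sketch in Python. The toy trial, its scores, and the names `patients` and `mean_difference` are all invented for illustration; the sketch also pretends we know the outcomes of the dropouts, which a real trial usually would not.

```python
from statistics import mean

# Hypothetical toy trial: (arm, improvement score, completed follow-up?).
# All numbers are invented; the two treated patients who did poorly dropped out.
patients = [
    ("treatment", 8, True), ("treatment", 7, True), ("treatment", 6, True),
    ("treatment", 1, False), ("treatment", 0, False),
    ("control", 5, True), ("control", 4, True), ("control", 3, True),
    ("control", 6, True), ("control", 4, True),
]

def mean_difference(records):
    """Mean improvement in the treatment arm minus the control arm."""
    treated = [score for arm, score, _ in records if arm == "treatment"]
    control = [score for arm, score, _ in records if arm == "control"]
    return mean(treated) - mean(control)

# Analysis of everyone who was allocated to a group.
all_subjects = mean_difference(patients)

# Analysis restricted to the completers only.
completers_only = mean_difference([p for p in patients if p[2]])

print(f"All subjects:    {all_subjects:.1f}")
print(f"Completers only: {completers_only:.1f}")
```

In this made-up data the arms do not differ when all subjects are analyzed, yet the completers-only comparison shows an apparent benefit, because dropout was related to outcome in one arm.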
Measurement (information) bias
Definition, Measurement bias:
– Systematic failure of a measurement process to accurately represent the measurement target
Examples:
– Different approaches to questioning when determining past exposures in a case-control study
– More complete medical history and physical examination of subjects who have been exposed to an agent suspected of causing a disease than of those who have not been exposed to the agent

Measurement bias?
NHANES III: “We estimate that at least 35% of the dentate US adults aged 30 to 90 have periodontitis” [1]
– Mesial and buccal surfaces
– Two randomly selected quadrants
– CAL≥3mm
Or: Full mouth prevalence = 65% [2]
1 JM Albandar, JA Brunelle, A Kingman (1999) “Destructive periodontal disease in adults 30 years of age and older in the United States, 1988-1994.” Journal of Periodontology 70(1): 13–29.
2 A Kingman & JM Albandar (2002) “Methodological aspects of epidemiological studies of periodontal diseases.” Periodontology 2000 29: 11–30.

Confounding
Informal definition: distortion of the true biologic relation between an exposure and a disease outcome of interest
Usually due to a research design and analysis that fail to account for additional variables associated with both
– Such variables are referred to as confounders or as lurking variables
– Look for factors associated with the outcome and with the exposure

Confounding examples
Misidentified carcinogen
Prior to the discovery of HPV, HSV-2 was associated with cervical cancer
It is now well established that HPV is central to the pathogenesis of invasive cervical cancer.
And HSV-2 appears to increase the risk
Hawes & Kiviat (2002) Are Genital Infections and Inflammation Cofactors in the Pathogenesis of Invasive Cervical Cancer? JNCI 94(21): 1592-159

Perio and CVD
Cigarette smoking is associated with adult perio and CVD
This produces an association between perio and CVD
Control for smoking to see the perio-CVD relationship clearly
Scannapieco et al. (2003) Associations Between Periodontal Disease and Risk for Atherosclerosis, Cardiovascular Disease, and Stroke. A Systematic Review. Annals of Periodontology (8)38-53

Chance
Begin by assuming: No relationship. No difference. No change. The intervention has no effect. The exposure changes nothing.
Ask: “I assume no effect, do the data support this?”
– The p-value answers this question.
Decision rule: p-value < 0.05 means data like these would be unlikely if there were truly no effect.
– A license to make up a story
P-value > 0.05 means there is no story
– It does NOT mean that the study demonstrated no relationship, no difference, no change.

Multiplicity
Outcome: Caries
– visual exam
– x-ray interpretation
– Fiber optic transillumination
– Electrical caries meter
– DiagnoDent
Outcome: Periodontology
– alveolar bone loss
– clinical attachment level
– pocket depth
Clustered data in Independent Subjects
– Teeth
– Tooth surfaces
– Restorations
– Implants
Hannigan A, Lynch CD. Statistical methodology in oral and dental research: pitfalls and recommendations. J Dent. 2013 May;41(5):385-92. pubmed/23459191

Multiplicity effects
N=47 perio, N=20 healthy
Analyzed for the presence of 300 species

Identify multiplicity effects
Multiple outcomes: The proliferation of possible comparisons in a trial.
Common sources of multiplicity are:
– multiple outcome measures, assessment at several time points, subgroup analyses, or multiple intervention groups
Multiple comparisons: Performance of multiple analyses on the same data.
Multiple statistical comparisons increase the probability of a type I error: “finding” an association when there is none.

Identify multiplicity effects
Analysis of the same variable at multiple time points after treatment initiation
Periodic analysis of accumulating partial results
Post hoc subgroup comparisons are especially likely not to be confirmed in following studies
Bottom line: With every additional comparison, the chance of at least one false positive climbs quickly toward certainty.

Stats 3: Overview
Threats to validity
– Bias
– Confounding
– Chance
– Multiplicity
Some solutions
– Study design
– Randomization
– Masking (AKA blinding)
– Analysis

Study Design
“A justification for the sample size used in the study should be given. Baseline characteristics of the study groups should be compared and information given on non-response and dropouts.”
Hannigan A, Lynch CD. Statistical methodology in oral and dental research: pitfalls and recommendations. J Dent. 2013 May;41(5):385-92. pubmed/23459191

Power and Sample Size
But first, backing up.
What is the definition of significance level (alpha)?
– It is the probability of rejecting a true null hypothesis.
What is the definition of a p-value?
– The p-value is the probability of seeing data at least as extreme as those observed, assuming the null hypothesis is true.
– The p-value is NOT the probability that the null-hypothesis is true.
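The alpha defined above applies to one test at a time. As a rough sketch of the multiplicity problem from the earlier slides, the Python snippet below computes the chance of at least one type I error across k comparisons. It assumes the comparisons are independent and that every null hypothesis is true, which real outcomes measured on the same patients will not satisfy exactly; the function name `familywise_error_rate` is ours, not from the slides.

```python
def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive in k independent tests,
    each run at significance level alpha, when no true effect exists."""
    # P(no false positive in one test) = 1 - alpha, so for k tests it is (1 - alpha)**k.
    return 1.0 - (1.0 - alpha) ** k

# 300 echoes the "300 species" example on the multiplicity slide.
for k in (1, 5, 10, 20, 300):
    rate = familywise_error_rate(k)
    print(f"{k:>3} comparisons: P(at least one false positive) = {rate:.3f}")
```

With a single test the rate equals alpha; by ten comparisons it is already about 40%, which is why a paper reporting many comparisons should say how multiplicity was handled.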
Trade offs
Truth | Conclusion: Do not reject null-hypothesis (p-value > .05) | Conclusion: Reject null-hypothesis (p-value < .05)
Null-hypothesis (no difference) | correct | Type I error
Alternative hypothesis (difference) | Type II error | correct
Alpha = Type I error = prob. of rejecting a true null hypothesis
Beta = Type II error = prob. of not finding a true difference
Power = probability of rejecting H0 when it is false.
Power = probability of finding a true difference.

Power
Power = probability of finding a true difference.
Power depends upon:
– The size of the difference
– Measurement variability
– Sample size

Randomization: What it is
Randomization: assignment of treatments to patients (equivalently, patients to treatments) based on chance
Can take many different forms, all acceptable
– The simplest is a coin-flip for each patient
Look for exactly HOW randomization happened
– An explicit description is required
– If the paper does not SAY random assignment was done, it wasn’t.
Note: Don’t confuse “random selection of subjects” with “random assignment to treatments”

Randomization: Why?
What it accomplishes
– Virtually eliminates opportunities for intentional or inadvertent skewing of patient allocation to favor a treatment
– Eliminates other selection biases of all sorts affecting treatment comparisons, period!
– Tends to protect against confounding
But
– Cannot assure comparable groups
– Randomize after recruitment and consent
– No effect on measurement bias or placebo effect

Blinding, AKA: Masking
Masking and Blinding refer to concealment of the randomized intervention received by a patient.
Who may be blind:
– Case/patients/participants
– Interventionists, those treating participants
– Those measuring outcomes: Clinicians and technicians who do not treat case/patients, but are involved in evaluating their outcomes
– Investigators involved in decision-making about policies during the trial, and about statistical analyses to interpret the resulting data

Double? Blind
“Blinding is intended to prevent bias on the part of study personnel. The most common application is double-blinding, in which participants, caregivers, and outcome assessors are blinded to intervention assignment.”
Altman, et al. (2001) The revised CONSORT statement for reporting randomized trials: Explanation and elaboration. Annals of Internal Medicine, 134(8), 663-694.

Does the paper say blinding occurred?
Needs to be explicit
– Which of the trial participants were masked, and how treatment was concealed
– Understand what the blinding accomplished
Blinded measurement directly and totally protects against “diagnostic suspicion bias,” a skewing by treatment-influenced expectations
Look for differential dropouts
– as “uncooperative” patients get less social support for returning for follow-up

Analysis: Overview
Threats to validity
– Bias
– Confounding
– Chance
– Multiplicity
Some solutions
– Study design
– Randomization
– Masking (AKA blinding)
– Analysis
Analysis
– Descriptive stats
– SD vs SE
– T-test and ANOVA
– Statistical significance vs Clinical importance
– Ordinal data and nonparametric stats
– Correlation
– Survival analysis
Did the paper do the right stats?

Quantitative Data
Continuous Type
– Age, duration of disease, roughness, level, color change
Discrete Type (count data)
– dmfs, dmft, # involved surfaces, # bleaching treatments

Describe Quantitative Data
We describe numeric data by:
Measures of Centrality
– AKA: typical value, location
– Mean, median
Measures of Spread
– Standard deviation, range
Shape of distribution
– Normal
– Skewed

Descriptive Statistics
Both a measure of centrality and a measure of variability are required to describe a set of numeric data, e.g. mean (SD=) or median (first quartile, third quartile).
The standard deviation is only appropriate for use with the mean.
The mean and the median should be routinely compared to investigate the impact of outliers.
Interpretations
– For approximately Normal data, about 95% of the individuals are within 2 SD of the mean
– 50% of the individuals are between the 25th%tile and the 75th%tile
SD = square root (average squared deviations from the mean)
Hannigan A, Lynch CD.
Statistical methodology in oral and dental research: pitfalls and recommendations. J Dent. 2013 May;41(5):385-92. pubmed/23459191

The median is little affected by extreme observations
[Figure: three distributions]
The three distributions above have the same median, but different means.

Example: Henson, et al.
The purpose of this study was to determine whether dental esthetics influenced the perceptions of teens when judging a peer’s athletic, social, leadership, and academic abilities.
Methods: The frontal-face smiling photographs of 10 teenage volunteers were each altered to create 1 image with an ideal arrangement of teeth and 1 with a nonideal arrangement. Two parallel surveys were constructed with 1 photo displaying either an ideal or a nonideal smile image of each subject. If the ideal smile image appeared in one survey, then the nonideal smile appeared in the other. N=221 peer evaluators rated the pictures.
Influence of dental esthetics on social perceptions of adolescents judged by peers
ST Henson, SJ Lindauer, WG Gardner, B Shroff, E Tufekci, and AM Best

SE=standard error of the estimate
SE=SD/√n

Describing Average Data
Fig 2. Ratings for perceived social characteristics between ideal and non-ideal smiles.

Describing Numeric Data
Boxplot: 75th%tile, Median, 25th%tile, whiskers
Distribution of data from the two parallel surveys.
Visual analog scale; 50=neutral, 0=disagree, 100=agree
“This person is a leader”

Test Statistic
A test statistic compares what we expect under the null hypothesis to what we actually observe.
T = Mean Difference / SE(Difference)
“I presume that the null hypothesis is true, do the data support this?”

Student’s T
The T distribution was discovered by the mathematician William Gosset, who was employed by the Guinness brewery. He used the pseudonym “Student” in his paper describing his result because of the company policy prohibiting publication.

ANOVA
A t-test is used when comparing (only) two groups.
When more than two groups are compared, or comparisons are using multiple classification variables, use Analysis of Variance.
Example: in the AJODO paper we tested whether the mean VAS was different across:
– Evaluator’s sex, and race, and
– Picture’s sex, race, and “ideal smile vs. nonideal”

Statistical Significance vs. Clinical Importance
Stats: The difference is larger than expected by chance alone.
Clinical: The difference is large enough to matter.
– Look at the CIs

Parametric testing vs. Nonparametric testing
Recall: Populations have parameters and we use sample data to estimate them
Parametric tests assume that the data are Normally distributed.
Nonparametric tests do not make this assumption. The data are reduced to ranks (ordinal data) and the distributions are compared.

Normal Distribution?
Not every measure has a Normal distribution
Some are highly skewed (i.e., a few very large values)
Restricted range (e.g., no zero values)
Examples:
– Triglyceride
– Microbial counts
– dmfs/DMFS scores
– Shear strength (breaking strength)

Normal Distribution?
[Figure: CFU/ml, Enterococcus faecalis; panels: Control; Sodium Hypochlorite, 1min]
Green=Normal distribution, Red=log normal
JP Coudron (2012) MSD Thesis

How we Measure Statistical Associations
Associations are what we observe, as
– Differences or ratios of: means or medians, proportions, odds, rates
– Correlation, regression coefficients
– Slopes of trends in statistical models
Causation → association, but not the other way around
No measure of association, in itself, implies causation

Relationships
We visualize the relationship between two numeric variables using a scatterplot
We summarize the strength and direction of a linear relationship using a correlation
– Pearson’s correlation coefficient, r
– r = 0 means no linear relationship
– r = +1 means a perfect positive linear relationship
– r = –1 means a perfect negative linear relationship
– r has no units.

Survival Analysis
OBJECTIVE: To assess the predictors of implant failure after grafted maxillary sinus (GMS).
METHODS: A total of 1045 implants were inserted in 224 patients/347 GMS during a period of 14 years.
Kaplan-Meier and Cox proportional hazards analysis were used to assess the following variates: …, auto/allo/xenogenic bone grafts, …
RESULTS: Significant implant failure predictors were the graft material (HR = 4.7), with superior results for autogenic bone, … In highly atrophic situations, autogenic bone grafts showed superiority; however, in less atrophic cases, nonautogenic bone-grafts are equivalent.
Zinser, et al. The predictors of implant failure after maxillary sinus floor augmentation and reconstruction: a retrospective study of 1045 consecutive implants. OOOO. (2013)115(5):571-82. pubmed/23246225.

Survival after auto/allo/xenogenic bone grafts
“In highly atrophic situations, autogenic bone grafts showed superiority; however, in less atrophic cases, nonautogenic bone-grafts are equivalent.”

Recall: Data classification
Type of data | Distinguishing Characteristics | Examples
Discrete or qualitative | Observations grouped into distinct classes |
Nominal | Classes without a natural order or rank | Sex, treatment group, presence or absence
Ordinal | Classes with a predetermined or natural order | Disease severity, bone density, plaque accumulation, bleeding

Data classification
Type of data | Distinguishing Characteristics | Examples
Continuous or quantitative (numeric) | Observation may assume any value on a continuous scale |
Interval | Numeric value with equal unit differences; arbitrary zero | Temperature, GPA
Time to event | Survival analysis, Censored observations | Restoration survival time, Implant success

Which statistical method?
Summary of Statistical Analysis Methods
How to decide if the correct statistical test was used?
Questions are of the form:
For ___ response variable, is there a relationship with ___ predictor variable?
For ___ response variable, is there a difference between the groups identified by the ___ predictor variable?
See the “decision matrix” and presentation online.

Dependent or Response Variable: Quantitative (Continuous or Discrete) | Qualitative (Nominal or Ordinal) | Time to event

Description (Summary), with 95% Confidence Intervals
– Quantitative response: Mean and either the SD (for the spread of the data) or the SE (for the precision of the estimate). Skewed data? Median and percentiles
– Qualitative response: Counts and Percentages
– Time to event: Median survival time

Testing, by type of Independent or Predictor Variable
Quantitative (Continuous or Discrete) predictor: Prediction and Association
– Quantitative response: Linear regression; Correlation – Pearson’s r
– Qualitative response: Logistic regression (probability of outcome)
– Time to event: Survival analysis, Proportional hazards
Nominal or Ordinal predictor: Comparisons of Independent Groups
– Quantitative response: Two Groups: t-test (AKA “two group t-test”); > Two Groups: ANOVA & multiple comparison tests
– Qualitative response: Two or More Groups: Chi-square
– Time to event: Two or More Groups: Kaplan-Meier survival analysis
Nominal or Ordinal predictor: Comparisons Across Time or Occasions within One Group
– Quantitative response: Two Times: Paired t-test; > Two Times: Repeated measures ANOVA
– Qualitative response: Two or More Times: McNemar’s chi-square
– Time to event: NA

Goals
Be able to answer four questions:
– Based on the study design, what is the level of evidence?
– How were threats to validity addressed?
– Based on the goals of the study, how do you describe the results?
– To justify the conclusions, were comparisons done appropriately?

“… significant linear correlation between chocolate consumption per capita and the number of Nobel laureates per 10 million persons …”
Messerli FH. Chocolate consumption, cognitive function, and Nobel laureates. N Engl J Med. 2012 Oct 18;367(16):1562-4. PubMed:23050509
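A correlation like the one quoted above is an association, and the Confounding slides explain how a third variable can produce one. The Python sketch below simulates the perio and CVD example under invented assumptions: smoking raises both scores by the same made-up amount, and perio has no effect on CVD at all. The effect sizes, the sample size, and the names `pearson_r` and `r_for` are illustrative only.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Simulated subjects: smoking raises BOTH scores; perio never enters the CVD formula.
subjects = []
for _ in range(4000):
    smoker = rng.random() < 0.5
    perio = 2.0 * smoker + rng.gauss(0, 1)
    cvd = 2.0 * smoker + rng.gauss(0, 1)
    subjects.append((smoker, perio, cvd))

def r_for(rows):
    """Correlation between the perio and CVD scores for the given subjects."""
    return pearson_r([p for _, p, _ in rows], [c for _, _, c in rows])

r_overall = r_for(subjects)                             # ignores smoking
r_smokers = r_for([s for s in subjects if s[0]])        # smokers only
r_nonsmokers = r_for([s for s in subjects if not s[0]]) # non-smokers only

print(f"Overall r:        {r_overall:.2f}")
print(f"Smokers only:     {r_smokers:.2f}")
print(f"Non-smokers only: {r_nonsmokers:.2f}")
```

Ignoring smoking gives a clearly positive r, while within each smoking stratum r is close to zero. Stratifying is one simple way to “control for smoking”; it only works for confounders that were measured.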