
Inferential Statistics (Notes)

At present, this page is just a collection of notes on various topics related to inferential statistics.

Page Contents: ANCOVA; ANOVA; ANOVA table; Causation; Central Limit Theorem; Chi-Square; Confidence Intervals; Correlation; Dr. Fox effect; Effect Size; Factorial Designs and Analysis; Gain scores; Glass & Hopkins (1984); Homogeneity of Variance; Hypotheses; Hypothesis testing steps; Independent Groups t-Test; Inferential Statistics; Median Test of Association; Multiple Comparisons; Multiple Regression (MRC); Parametric tests; Power; Power of the test; Reliability; Sampling; Sampling Distribution; Skewness; Standard Error; t-Test; Validity; Variables; Variance

ANCOVA
- an additional covariate is needed, at least in a true experimental design
- ANCOVA has more assumptions than ANOVA
- the adjustment in mean square error in ANCOVA depends primarily on the size of the correlation between the covariate and the dependent variable
- for true experimental designs, the covariate only serves to increase power
- the covariate data should be collected at the start of the study (before treatment conditions are implemented)

ANOVA
- a test of means for more than two groups
- use when J, the number of groups, is 2 or more (some books use "K")
- with 2 groups, a t-test and ANOVA give the same result
- Fisher found that multiple t-tests are inefficient: alpha gets inflated (greater probability of Type I error)
- the F statistic is named for Fisher
- the variance involving just the (4) group means is the between-groups variance; we want it big relative to the within-groups variance in order to reject the null
- compare the variance of the group means to the average variance within each of the groups
- conclusion form: there is a significant difference among the mean ___ (kind of test) scores of ___ groups of subjects
- between-groups variance: the variance of the group means from the grand mean
- alternate hypothesis (in words): there is at least one pair of means that differ significantly
- if the overall F (ANOVA) is significant, then go for the details next (if the null is rejected)
- variance = mean square (same thing, new term)
- when the Ns are equal, simply average all the group variances
- if the ANOVA is significant (F > CV), then use multiple comparisons to see where the differences are (ex: 7 groups = 21 possible comparisons [J(J-1)/2 = 21])
- if the obtained F statistic is larger than the critical value of F, then reject the null (conclude, e.g., that the population means of the three groups are not equal)
- to determine whether or not differences between pairs of means are statistically significant, do further INFERENTIAL tests (called "multiple comparison" techniques)
- ONE-WAY ANOVA: any number of levels of one independent variable (THE NUMBER OF LEVELS OF THE IV IS THE NUMBER OF GROUPS)
- the F ratio carries the significance; significance must be less than alpha, otherwise the result is not significant
- the dependent variable is the question asked
- don't run post-hoc multiple comparisons when F is not significant

ANOVA table

SV          SS      df      MS      F
Between
Within
_____________________________________
*p < .05

SV: Source of Variation; SS: Sum of Squares; df: degrees of freedom; MS: Mean Squares; F: obtained F statistic

Causation
- correlations should never be treated as causal or perfect (they are relationships, not causation)
- causal questions: Does this work? Does this work better than that?

Central Limit Theorem
- the sampling distribution of the mean
  + is normal
  + has a mean equal to µ ("mu")
  + has a standard deviation called the "standard error of the mean"
- the Standard Error is the standard deviation of a sampling distribution

Chi-Square
- the "Chi-Square" test of association
- AKA "contingency table analysis"
- DOES NOT ANALYZE MEANS
- looks at the relationship between two variables
- the variables are NOT interval or ratio; the variables are measured on a NOMINAL scale!
  (or one of the variables is nominal)
- can turn an interval variable into a nominal variable
  + low-mid-high SES is usually ordinal, but can be used with a nominal variable such as gender
- never want to conclude a causal link, even though that's what we're trying to get at
- a lot of data from surveys, yes/no tests, and demographics use this test
- a nonparametric test: it makes no assumptions about the population (the population assumptions are normality and homogeneity of variance)
- less powerful than parametric tests
- can do a Pearson correlation here, or categorize the scores into groups (you lose precision going from r to X-squared (chi-square))
- proportions or frequencies will yield some results
- phi coefficient (doesn't square like Pearson r to get a %)
  + dichotomous = two categories: use only if the table is 2 x 2
  + "phi" gives the SIZE of the relationship between two dichotomous variables
  + interpret like Pearson (e.g. .16 is not very strong)
- contingency coefficient (C): size of the relationship for a table of any size (dichotomous or greater) [e.g. 4 x 5]
- null example: there is no association between "community type" and the response
- see the worksheets for "expected frequencies": the ef for a cell is the frequency (number of cases) that would fall in that cell if the null hypothesis were true (i.e., there was no association in the population between "community type" and response)
- it doesn't make sense to say the relationship is positive/negative, because the variables are typically nominal or ordinal

Confidence Intervals
- it is important to report confidence intervals to give the range of probability (e.g. a wide interval will result from a small sample)
- estimating confidence intervals around a sample mean (see worksheet)
  + when a sample is sufficiently large, the means from repeated random samples will be normally distributed around the population mean
  + 95% of the area under the curve (approx. 2 standard errors of the mean in both directions from the mean) will include µ [95% of the intervals would contain mu]
  + we can construct a confidence interval around the mean to try to capture µ [IT'S THE PROBABILITY OF HITS AND MISSES, THE 'MARGIN OF ERROR': ANY ONE CONFIDENCE INTERVAL COULD BE A RARE MISS, but we know the probability of hits and misses]
- the Standard Error is the standard deviation of a sampling distribution

Correlation
- refers to the degree of relationship between two variables
- if two variables are related such that high values on one go with low values on the other, a negative correlation exists
- if high relates to high, or low to low, a positive correlation exists

Dr. Fox effect
- a causal link between the entertainment factor in teaching and student ratings of teachers: an entertaining lecturer drew higher ratings from students despite bogus methods

Effect Size
- use % scores rather than raw scores on cognitive and psychomotor measures
- % scores are not very useful on attitude scales and many other kinds of scales where right/wrong scoring usually makes no sense

Factorial Designs and Analysis
- a single independent variable is rare
- 2-way design: 2 independent variables
- dependent variable: the thing getting analyzed; more than one dependent variable calls for multivariate analysis of variance (MANOVA, MANCOVA)
- disordinal interaction: the lines cross; non-parallel; an interaction
- ordinal interaction: the lines don't cross but aren't parallel
- IF THE INTERACTION IS SIGNIFICANT, DON'T PAY MUCH ATTENTION TO THE MAIN EFFECTS; PAY ATTENTION TO THE INTERACTION
- the overall N is most important for power, not cell size
- one-way design: general linear model
- two-way factorial design: almost always increases power
- leftover effect for individuals who differed from their group (cell variance is the denominator of the F ratios)
- a new factor should be something new that contributes a new interaction with group (e.g.
  SES, income; choose one or compile them)
- too many interactions make it confusing; keep your own studies to 3 or 4 factors
- code the variables (male = 1, female = 2) and enter raw scores
- three-way factorial designs and interactions; example:
  + Factor A: Gender (2 levels, male/female)
  + Factor B: Age (1 = 20-30, 2 = 30-40)
  + Factor C: Group (experimental/control)
  + main effects for gender, age, or group
  + A x B interaction? A x C interaction? B x C interaction? A x B x C interaction?
- the inclusion of new factors increases power if they interact with another factor and in the A x B x C interaction (accounting for variance)

Gain scores
- NEVER DO A STUDY WITH GAIN SCORES; MEASUREMENT ERROR ENTERS EACH TIME A MEASURE IS TAKEN

Glass & Hopkins (1984)
Glass, G. V., & Hopkins, K. D. (1984). Statistical Methods in Education and Psychology (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.

Homogeneity of Variance
- the t-test is robust to violations of the normality assumption
- remember: the t-test uses sample variances
- if the larger group has the larger variance, the t-test will be conservative (the ACTUAL alpha is smaller than the NOMINAL alpha)
- if the larger group has the smaller variance, the t-test will be liberal
- "the most powerful test is Bartlett's (also the most complicated)"
- can state (acknowledge) that the t-test is liberal (some inaccuracy in the results)
- if the Ns are equal there's no need to test for homogeneity
- see handout, 2.10.92

Hypotheses
- scientific or research hypothesis: a guess (e.g., the average SAT verbal score of American Indian students is not 500 [the norm] {nondirectional})
- alternate hypothesis:
- null hypothesis: the one we want to reject (usually the opposite of the research hypothesis)
- example: "field-dependent learners will use more (on average) program features than field-independent learners" [not correlational; it compares the means on the dependent variable]
- decision diagram:

                 Null True                Null False
  Accept Null    correct: most likely     Type II Error
                 the null is true
  Reject Null    Type I Error             correct: statistical power
                                          (rejecting the null if it is really false)

- "reject" or "don't reject" the null is the proper way to say it; "fail to reject" = "accept"
- Type I Error: rejecting the null when it is true
- Type II Error: not rejecting the null when it is false
- ß (beta): the probability of making a Type II Error
- alpha {ex.: willing to take a 5% chance of a Type I error}
- Power: the probability of rejecting the null hypothesis if it is really false; example: if ß = 60%, power = 40% (the two are inversely related)
- a statistical power estimate of .9 is really good (big differences are the most important)
- null: states that any difference in two means is attributable to chance (sampling error)
- if p < alpha, we reject the null and conclude that...

Hypothesis testing steps
1. state the null and research hypotheses
2. set alpha (the probability of rejecting the null when the null is true)
3. choose the test statistic
4. descriptive stats
5. sampling distribution (* assuming the null is true)
6. critical value(s)
7. compute the test statistic
8. decide: accept or reject the null

Independent Groups t-Test
- assumptions
  + the populations are normally distributed
  + independence (a matter of design, not statistics): the 2 groups must be independent; someone can't be in both groups
  + homogeneity of variance
- if the sigmas are known, do a Z-test

Inferential Statistics
- all intermediate stats are inferential; they infer about a population
- descriptive stats describe the sample only

Median Test of Association
[see also: Chi-Square]
- the "Median Test of Association" is a particular application of the Chi-Square test of association

Multiple Comparisons
- pairwise (simple) comparisons inflate alpha
- family-based = experiment-based
- Tukey: more conservative (family-based); controls Type I error better than N-K; generally more powerful than N-K; if a family-based alpha is desired,
  this procedure would be the most appropriate and most powerful to test all possible pairwise comparisons
- Student Newman-Keuls (N-K): more powerful and liberal; for cases where you are not worried about alpha; contrast-based; most powerful for making ALL possible pairwise comparisons among a set of J means; the method of choice for accuracy and power for pairwise comparisons; start with the largest difference and end when a comparison is not significant
- Dunn: powerful for a few groups; choose as the most powerful for a small number of simple and complex but nonorthogonal contrasts
- Scheffe: you get to look at everything you want to after the fact; choose it to conduct many post-hoc simple and complex comparisons; least powerful for making ALL possible pairwise comparisons among a set of J means
- r: the inclusive RANGE, or number of means separating the two being compared
- Dunnett: choose to compare three experimental groups to the control group mean (when no other comparisons are to be made)
- stair-step approach (e.g. N-K): compare the biggest mean difference to the CV (go to the table); if significant go on, else stop completely; then the next biggest difference
- Planned Orthogonal Comparisons (POC): the most powerful and most restrictive in terms of which comparisons can be made (RARE, too restrictive); more power to reject the null if it is really false; J-1 comparisons, but they must meet the restriction of orthogonality (see glossary)
- N-K & Tukey: "always good choices"; NO complex comparisons
- F would be the same as, say, Scheffe, but the critical value would be lower
- q statistic: the studentized range statistic
- techniques that are planned (a priori): Dunn, Dunnett, POC
- techniques that are UNplanned (post hoc): Multiple t, Duncan, Tukey, N-K, Scheffe
- techniques with a contrast-based error rate: Multiple t, Duncan, N-K, POC
- techniques with a family (experiment)-based error rate: Dunn, Dunnett, Tukey, Scheffe
- NONE OF THESE TECHNIQUES IS ROBUST TO VIOLATIONS OF THE ANOVA ASSUMPTIONS

Multiple Regression (MRC)
- adding more variables to the equation to test the strength of the predictor (IF THINGS DON'T VARY, THEY CAN'T CORRELATE)

Parametric tests
- parametric tests make assumptions about the population (as opposed to nonparametric tests)
- N means sample or population in some studies

Power
- the probability of not making a Type II error (that is, 1 - ß)
- Power: the probability of rejecting the null hypothesis if it is really false; example: if ß = 60%, power = 40% (the two are inversely related)
- a statistical power estimate of .9 is really good (big differences are the most important)
- estimating power: see worksheet #1, REM 7110
- if the standard deviation is known (e.g. 15), then the standard error of the mean is sigma(sub mean) = sigma divided by the square root of n
  + if the critical values = ±2 SD, SD = 15, and Mean = 100, then CV = 70 to 130 [see also: Power of the test]
  + convert to a Z score, go to the Z table, and find the area under the normal curve that lies to the right of a z-score of zero (the size of n always matters)
- it is important to report confidence intervals to give the range of probability (e.g. a wide interval will result from a small sample)
- APRONS: strategies that increase power
  + A: relax alpha (make it bigger)
  + P: parametric test on the population
  + R: increase the reliability of the measure of the dependent variable
  + O: one-tailed (directional) test
  + N: increase N
  + S: use a more sensitive design or analysis (e.g.
  ANCOVA, adding more variables such as gender or ethnicity)
- there is an "eyeball method" of estimating power (see notes: 2.3.92)
- can estimate power (best guess) before and after a study (after: especially if the null has been accepted)
- POWER IS ALWAYS EXPRESSED IN PERCENTAGES

Power of the test
- compute the power of the test for the following values of the parameter:
  + the largest value of the parameter that is reasonable
  + the smallest value of the parameter that is reasonable
  + an intermediate value of the parameter that represents the best guess
  + additional values to construct a power curve
- the power of the test increases as the sample size (n) increases
- power increases as alpha (the probability of rejecting a true null hypothesis) is increased
- power increases as the true value of the parameter being tested deviates further from the value hypothesized for it in the null
- when the parameter (sigma sub mean) is not known and an estimate (S sub mean) must be used, the ratio is termed a t-ratio rather than a z-ratio

Reliability
Five types of reliability:
1. inter-rater
2. stability (test-retest)
3. internal consistency
4. parallel forms
5. parallel/stability combination

Sampling
- simple random: everyone has a chance to be selected
- stratified random: equal numbers from each section, or numbers according to characteristics
- cluster sampling (random or nonrandom): e.g. N = 3 classrooms
- there is always some sample bias in all types of samples
- sampling problems and measurement problems are the biggest problems in our field (more than the types of analysis)
- systematic sampling: an essentially random sample (e.g. every 10th name)
- biased, volunteer, convenience samples: self-selecting
  + random assignment from a volunteer sample (a design consideration)
  + R X O (random, variable, observation)
- how the sample is selected is the most important thing (100 may be better than 5000)

Sampling Distribution
- the distribution of a statistic (e.g. sample means) [see: Central Limit Theorem]
- Sampling Distribution of the Mean: used to set confidence intervals
- sampling distributions underlie hypothesis testing and confidence intervals in parametric inferential stats

Skewness
- if only a few students receive high scores, the distribution of scores would be positively skewed

Standard Error
- the sampling distribution of the mean has a standard deviation called the "standard error of the mean"; the Standard Error is the standard deviation of a sampling distribution of means
- Standard Error = "margin of error", usually a percentage, not a mean
  + common: .68 CI (confidence interval) = mean ± 1 S(sub mean)
  + common: .95 CI = mean ± 2 S(sub mean); ex: 10 ± 2 = 8 to 12 (95% chance of falling between 8 and 12)

t-Test
[see also: Independent Groups t-Test]
- allows one to determine the probability of observing a difference in means as large or larger than that observed when indeed the null is true
- 3 assumptions: 1. normality, 2. homogeneity of variance, 3. independence
- the t-test is robust to violating the assumptions of normality and homogeneity when the Ns are equal
- DEPENDENT (groups) t-test
  + AKA paired-groups t-test or correlated-groups t-test
  + no assumption of independence (e.g. pretest-posttest design; measures change)

Validity
- something can't be valid unless it's reliable

Variables
- "active IV": the researcher has some control (through random assignment, for example)
- "attribute IV": the researcher can't control it
- dependent variable: the thing getting analyzed; more than one dependent variable calls for multivariate analysis of variance (MANOVA, MANCOVA)
- example: a method effect is enhanced with little or no significance for the attribute variable

Variance
- the average of the squared deviations from the mean
- the difference from the mean, squared
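The ANOVA notes above (between-groups versus within-groups mean squares) can be made concrete with a short worked example. This is a minimal sketch, not from the original notes; the three groups of scores are made up for illustration:

```python
# One-way ANOVA by hand: F = MS(between) / MS(within).
# The groups below are hypothetical data for illustration only.

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)                          # J, the number of groups
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups SS: deviations of group means from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups SS: deviations of scores from their own group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between     # mean square = variance
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

F, df1, df2 = one_way_anova([[2, 3, 4], [5, 6, 7], [8, 9, 10]])  # F = 27.0
```

If the obtained F exceeds the critical value for (df1, df2), reject the null and follow up with multiple comparisons, as the notes describe.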
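The "expected frequencies" idea in the Chi-Square notes can be sketched as code: the expected frequency for a cell is (row total × column total) / N, the count that would fall there if the null of no association were true. The 2 x 2 table below is hypothetical, not the worksheet's data:

```python
# Chi-square test of association for a contingency table of counts.
# The 2 x 2 table below (e.g. community type by yes/no response) is made up.

def chi_square(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # ef under the null
            chi2 += (observed - expected) ** 2 / expected
    return chi2

stat = chi_square([[30, 10],
                   [20, 40]])
```

For a 2 x 2 table, the phi coefficient mentioned in the notes is sqrt(chi²/N), giving the size of the relationship on a Pearson-like scale.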
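The confidence-interval construction in the notes (mean ± approx. 2 standard errors for a .95 CI) can be sketched as follows; the scores and function name are hypothetical, for illustration only:

```python
import math

# 95% confidence interval around a sample mean: mean ± 2 standard errors,
# following the notes' rule of thumb. The scores below are made up.

def confidence_interval(scores, z=2.0):
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)  # sample variance
    se = math.sqrt(variance / n)             # standard error of the mean
    return mean - z * se, mean + z * se

low, high = confidence_interval([8, 9, 10, 11, 12, 9, 10, 11, 10, 10])
```

A smaller sample makes the standard error, and therefore the interval, wider, which is why the notes stress reporting the interval and not just the mean.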
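The independent-groups t-test notes can likewise be illustrated. This sketch uses the equal-variance (pooled) form, consistent with the homogeneity-of-variance assumption listed above; the two groups of scores are hypothetical:

```python
import math

# Independent-groups t-test, pooled-variance form: each sample variance is
# weighted by its degrees of freedom. The data below are made up.

def independent_t(a, b):
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)      # pooled variance estimate
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean_a - mean_b) / se_diff, na + nb - 2  # t and its df

t, df = independent_t([4, 5, 6], [7, 8, 9])
```

If the population sigmas were known, the notes say to do a Z-test instead; the t-ratio is used because S must estimate sigma.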
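Finally, the Central Limit Theorem and Standard Error notes reduce to one formula: the standard error of the mean is sigma / sqrt(n). This sketch reuses the worksheet's sigma = 15 and mean = 100, with a hypothetical n of 25 (n is my assumption, not from the notes):

```python
import math

# Standard error of the mean per the Central Limit Theorem:
# the sampling distribution of the mean has standard deviation sigma / sqrt(n).
# sigma = 15 and mean = 100 echo the worksheet example; n = 25 is hypothetical.

def standard_error(sigma, n):
    return sigma / math.sqrt(n)

se = standard_error(15, 25)                  # 15 / 5 = 3.0
interval = (100 - 2 * se, 100 + 2 * se)      # approx. 95% interval: (94.0, 106.0)
```

Quadrupling n only halves the standard error, which is why increasing N (the "N" in APRONS) raises power with diminishing returns.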

Update: 2006-04-18T10:00:44-07:00