Discussion Week Nine: Chapter 18 Multivariate Statistics
Hope Enechukwu, Rudolf Ezemba, Julie Pelletier, Brittany Provenza, Deanna Roper and Valencia Suggs
July 31, 2015
Multivariate statistics- statistical procedures for analyzing the relationships among three or more variables simultaneously (i.e., more than one independent variable (IV) and/or dependent variable (DV)), rather than a single IV and a single DV.
Regression analysis- is used to make predictions about the values of a variable
- Multiple correlation/multiple regression- build on the close link between correlation and regression
- In simple regression, one IV (X) is used to predict a dependent variable (Y)
- The higher the correlation between the variables, the more accurate the prediction
- Prediction errors occur because the correlation between X and Y is rarely perfect
Correlation coefficients express the degree to which variables are related to one another
- The stronger the correlation, the better the prediction and the greater the percentage of variance explained
Correlations between two variables are rarely perfect. Researchers try to improve predictions of Y by adding multiple IVs (predictor variables) using multiple regression.
As more variables are added to the equation, redundancy increases: new predictors tend to be correlated with those already entered, so each contributes less unique predictive power.
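To illustrate how adding a second predictor can raise R², here is a minimal sketch using ordinary least squares on simulated data (the variables and data-generating model are hypothetical, not from the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)                        # predictor 1
x2 = rng.normal(size=n)                        # predictor 2
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)   # outcome with random error

def r_squared(X, y):
    """Proportion of variance in y explained by a least-squares fit on X."""
    X = np.column_stack([np.ones(len(y)), X])      # add intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_one = r_squared(x1.reshape(-1, 1), y)           # simple regression: one IV
r2_two = r_squared(np.column_stack([x1, x2]), y)   # multiple regression: two IVs

print(f"R^2 with one predictor:  {r2_one:.3f}")
print(f"R^2 with two predictors: {r2_two:.3f}")
```

Because x2 carries information about y that x1 does not, R² rises when it is added; an uncorrelated or redundant predictor would add little.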
Tests of Significance
Several tests of significance illuminate different aspects of a multiple regression analysis.
Tests of the Overall Equation and R evaluate the null hypothesis that R equals zero, i.e., that the IVs taken together do not predict the DV.
Tests for Adding Predictors show whether adding more variables, or a specific IV, significantly increases the prediction of the dependent variable (i.e., increases R²).
Tests of the Regression Coefficients evaluate each individual predictor; a coefficient that is significant in simple regression may not remain significant in multiple regression, because each coefficient is adjusted for the other predictors.
In multiple regression, holding extraneous variables constant strengthens internal validity.
Simultaneous Multiple Regression- all predictors are entered into the equation at the same time; appropriate when no predictor is causally or theoretically prior to the others and all are of equal importance.
Hierarchical Multiple Regression- predictors are entered in a series of steps determined by theory; often used to examine the effect of a key IV after the effects of extraneous variables have been removed.
Stepwise Multiple Regression: the 1st step selects the IV that best correlates with the DV; the 2nd variable entered is the one that produces the largest increase in R² when used together with the variable from the first step. This continues until no remaining IV significantly increases R². Remember that variance shared among the IVs is attributed to the first variable entered into the analysis.
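The stepwise procedure described above can be sketched as a greedy forward-selection loop. This is an illustrative stand-in: a simple R²-gain threshold replaces the formal significance test that statistical packages use, and the data are simulated.

```python
import numpy as np

def forward_stepwise(X, y, tol=0.01):
    """Greedy forward selection: at each step, add the predictor that most
    increases R^2; stop when no predictor adds more than `tol`."""
    def r2(cols):
        Z = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        return 1 - resid.var() / y.var()

    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        gains = [(r2(selected + [c]), c) for c in remaining]
        new_r2, best_c = max(gains)          # candidate with the largest R^2
        if new_r2 - best_r2 <= tol:          # no predictor adds enough
            break
        selected.append(best_c)
        remaining.remove(best_c)
        best_r2 = new_r2
    return selected, best_r2

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 4))
y = 3 * X[:, 2] + 1.5 * X[:, 0] + rng.normal(size=n)  # only columns 2 and 0 matter

order, final_r2 = forward_stepwise(X, y)
print(order, round(final_r2, 3))
```

Column 2 enters first because it correlates best with y; shared variance is credited to it, exactly as the note above warns.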
Power Analysis for Multiple Regression
Because small samples can lead to Type II errors and inaccurate estimates, power analysis is a better way to estimate sample size needs.
- The number of participants needed to reject the null hypothesis that R equals zero is based on effect size, number of predictors, desired power, and significance criterion.
- In multiple regression, estimated effect size is a function of the value of R². It can be predicted from earlier research or set using the conventions of small (R² = .02), moderate (R² = .13), or large (R² = .30).
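The R² conventions above map onto Cohen's effect size f², which power-analysis tables and software typically take as input, via f² = R² / (1 − R²):

```python
def f_squared(r2):
    """Cohen's effect size f^2 for multiple regression, computed from R^2."""
    return r2 / (1.0 - r2)

# Conventional R^2 values from the text: small .02, moderate .13, large .30
for label, r2 in [("small", 0.02), ("moderate", 0.13), ("large", 0.30)]:
    print(f"{label}: R^2 = {r2:.2f} -> f^2 = {f_squared(r2):.3f}")
```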
Analysis of Covariance (ANCOVA)
- Combines features of multiple regression and ANOVA; like ANOVA, it compares the means of two or more groups, and the central question for both is the same.
- Allows researchers to control confounding variables.
- ANCOVA works best after randomization; when randomization is not possible, it serves as an after-the-fact attempt to improve validity through statistical control.
Example: Effectiveness of biofeedback therapy on patients’ anxiety.
- Group in hospital A is exposed, Group in hospital B is not exposed
- Anxiety is measured before and after treatment.
- Pretest anxiety score is controlled through ANCOVA
- DV=posttest anxiety scores
- IV=experimental/comparison group status
- Covariate (a continuous variable)= pretest anxiety scores
Selection of Covariates
- Background demographics such as age and education. Covariates should correlate with the DV as strongly as possible. Control is especially important when comparison groups differ on confounding demographic characteristics.
- A pretest measure (i.e., an early measure of the DV).
Adjusted Means- group means on the DV after the effects of the covariates have been removed; comparing adjusted means allows researchers to estimate net group differences and decide whether to reject or retain the null hypothesis.
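A minimal sketch of how ANCOVA-style adjusted means can be computed, framing ANCOVA as a regression of the posttest on group status plus the pretest covariate (the anxiety data here are simulated, not from the text; a built-in treatment effect of −5 points is assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
group = np.repeat([0, 1], n // 2)          # 0 = comparison, 1 = biofeedback
pre = rng.normal(50, 10, size=n)           # pretest anxiety (covariate)
# Simulated truth: treatment lowers posttest anxiety by ~5 points
post = 0.6 * pre - 5.0 * group + 15 + rng.normal(0, 3, size=n)

# ANCOVA as regression: posttest ~ intercept + group + pretest
X = np.column_stack([np.ones(n), group, pre])
b, *_ = np.linalg.lstsq(X, post, rcond=None)

# Adjusted means: predicted posttest for each group at the grand mean of the covariate
grand_pre = pre.mean()
adj_comparison = b[0] + b[1] * 0 + b[2] * grand_pre
adj_treatment  = b[0] + b[1] * 1 + b[2] * grand_pre
print(f"adjusted means: comparison {adj_comparison:.1f}, treatment {adj_treatment:.1f}")
```

The difference between the adjusted means is the net treatment effect with pretest anxiety held constant, recovering roughly the −5 points built into the simulation.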
Other Least Squares Multivariate Techniques
- Analysis of variance (ANOVA) and multiple regression are very similar
- Both analyze total variability in a continuous dependent measure and contrast variability due to IVs with that attributable to individual differences or error.
- Experimental data is typically analyzed by ANOVA
- Correlational data is analyzed by regression
- Any data for which ANOVA is appropriate can be analyzed by multiple regression
- General Linear Model (GLM) is a broad class of statistical techniques that fit linear solutions; it is the foundational procedure underlying the t-test, ANOVA, & multiple regression.
Repeated measures ANOVA for mixed designs: used when data are collected from the same participants three or more times
Multivariate Analysis of Variance (MANOVA): is the extension of ANOVA to more than one dependent variable.
Multivariate Analysis of Covariance (MANCOVA): allows for the control of confounding variables (covariates) when there are two or more dependent variables
Discriminant Analysis: makes predictions about membership in groups. For example, to predict membership in groups such as compliant versus noncompliant cancer patients.
Discriminant function: for a categorical dependent variable, with independent variables that are either dichotomous or continuous.
Wilks’ lambda (λ): indicates the proportion of variance unaccounted for by predictors, or λ = 1 – R2
Logistic Regression: a widely used multivariate technique. Like multiple regression, it analyzes relationships between multiple independent variables and a dependent variable and yields a predictive equation; like discriminant analysis, it is used to predict a categorical dependent variable.
It relies on an estimation procedure with less restrictive assumptions than the multivariate procedures within the GLM.
Basic Concepts for Logistic Regression
Logistic regression uses maximum likelihood estimation (MLE) – estimate the parameters most likely to have generated the observed data.
Logistic Regression: models the probability of an outcome rather than predicting group membership
Odds: reflect the ratio of two probabilities (the probability of an event occurring, to the probability that it will not occur)
Example: if 40% of women practice breast self-exams, the odds would be 0.40 divided by 0.60, or 0.667.
Logit: short for logistic probability unit; the logit is the natural log of the odds and can range from minus to plus infinity
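The odds and logit definitions above, applied to the text's 40% breast self-exam example:

```python
import math

def odds(p):
    """Odds of an event: the probability it occurs over the probability it does not."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds (logit): ranges over the whole real line (-inf to +inf)."""
    return math.log(odds(p))

# Example from the text: 40% of women practice breast self-exams
p = 0.40
print(f"odds  = {odds(p):.3f}")   # 0.40 / 0.60
print(f"logit = {logit(p):.3f}")  # negative, because p < 0.5
```

A probability of 0.5 gives odds of 1 and a logit of exactly 0; probabilities below 0.5 give negative logits.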
The Odds Ratio: the ratio of the odds for one group to the odds for another; in logistic regression, the odds ratio for a predictor is obtained by exponentiating its regression coefficient (e^b)
Variables in Logistic Regression
Dependent variable is typically coded 1 (to represent an event or a characteristic) and 0 (for its absence)
Significance Tests in Logistic Regression
Likelihood Index: the probability of the observed results, given the parameters estimated in the analysis. If the overall model fits the data perfectly, the likelihood index is 1.0. In practice, −2 times the log likelihood (−2LL) is reported, and a chi-square statistic is used to test the null hypothesis – called the likelihood ratio test
Goodness-of-fit Statistic: is the analog of the overall F test in multiple regression.
Hosmer-Lemeshow Test: compares the prediction model to a hypothetically “perfect” model.
Tested against the perfect model by computing differences between observed frequencies and expected frequencies. A nonsignificant chi-square is desired: it indicates that the model being tested is not reliably different from the perfect model
Wald Statistic: a chi-square statistic used to test the significance of individual predictors
Effect Size in Logistic Regression
Nagelkerke R2: is the most frequently reported pseudo R2 index
Survival and Event History Analysis
Survival analysis estimates survival rates (the proportion of participants who have not yet experienced the event at each point in time). Survival curves for experimental and control groups can be compared with a test statistic to evaluate the null hypothesis of no group difference.
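A minimal sketch of the product-limit (Kaplan-Meier) estimator that survival analysis builds on, using hypothetical follow-up data (1 = event occurred, 0 = censored, i.e., follow-up ended without the event):

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.
    times: follow-up time per participant; events: 1 = event, 0 = censored."""
    times, events = np.asarray(times), np.asarray(events)
    surv = 1.0
    curve = []
    for t in np.unique(times[events == 1]):      # each distinct event time
        at_risk = np.sum(times >= t)             # still being followed at t
        died = np.sum((times == t) & (events == 1))
        surv *= 1.0 - died / at_risk             # product-limit update
        curve.append((t, surv))
    return curve

# Hypothetical data: months of follow-up for 8 participants, 3 censored
times  = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.3f}")
```

Censored participants count toward the at-risk group until their follow-up ends but never as events, which is what distinguishes survival analysis from a simple proportion.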
Causal modeling examines relationships among three or more variables. Path analysis and structural equation modeling are two types of causal modeling.
Path Analysis is used to test a hypothesized causal pattern among variables, not to discover causes. A path diagram illustrates the hypothesized impact of one variable on another. In a recursive model, causal flow is in one direction: for example, variable x may have a causal effect on variable y, but y is not causal to x. Variables are classified as exogenous, endogenous, or residual.
Path coefficients indicate significant determinants. As standardized partial regression coefficients (beta weights), they represent the expected change in the outcome, in standard deviation (SD) units, for a 1-SD change in the predictor with the other variables held constant.
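Because path coefficients are standardized partial regression coefficients, they can be obtained by regressing z-scores on z-scores. A sketch with a simulated three-variable recursive model (x → m → y plus a direct x → y path; all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)                      # exogenous variable
m = 0.5 * x + rng.normal(size=n)            # endogenous mediator: x -> m
y = 0.4 * m + 0.3 * x + rng.normal(size=n)  # endogenous outcome: m -> y and x -> y

def beta_weights(X, y):
    """Standardized partial regression coefficients (path coefficients):
    least-squares regression of z-scored outcome on z-scored predictors."""
    z = lambda a: (a - a.mean(axis=0)) / a.std(axis=0)
    b, *_ = np.linalg.lstsq(z(X), z(y), rcond=None)
    return b

paths = beta_weights(np.column_stack([x, m]), y)
print(f"path x -> y: {paths[0]:.2f}, path m -> y: {paths[1]:.2f}")
```

Each coefficient is in SD units and is partial: the x → y path reflects x's effect on y with m held constant.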
Structural Equations Modeling (SEM) addresses the drawbacks of path analysis, including its inability to incorporate unmeasured (latent) variables. The overall fit of the causal model to the research data is tested with fit indexes such as the Goodness of Fit Index (GFI); an index value > .90 (90%) indicates a good fit.
Computer and multivariate statistics
Multiple regression enhances researchers’ ability to make predictions by adding predictor variables, but the computations are complex, so the analyses are done by computer. An example of a multiple regression equation: birth weight = (3.119 x age) + 48.040. Logistic regression and analysis of covariance are likewise computed by computer.
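The example equation can be applied directly to make a prediction; a tiny sketch (the age units and coefficients are taken as given in the text):

```python
def predicted_birth_weight(age):
    """Prediction from the chapter's example regression equation:
    birth weight = (3.119 x age) + 48.040, units as given in the text."""
    return 3.119 * age + 48.040

# Predicted birth weight for age = 20
print(f"{predicted_birth_weight(20):.2f}")
```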
Polit, D. F., & Beck, C. T. (2012). Nursing research: Generating and assessing evidence for nursing practice (Laureate Education, Inc., custom ed.). Philadelphia, PA: Lippincott Williams & Wilkins.