
Perplexities, Paradoxes, and Power of Factor Analysis
December 6th @ 1:00 pm - 1:50 pm

Department of Epidemiology Seminar Series
Speaker:
Tyler J. VanderWeele, PhD
John L. Loeb and Frances Lehman Professor of Epidemiology
Department of Epidemiology, Harvard T.H. Chan School of Public Health
Open to the public
Factor analysis is often employed to evaluate the extent to which a single factor suffices to explain the variation in the individual indicators, or alternatively to identify clusters of indicators that are strongly correlated with one another. However, the conclusions drawn from factor analysis often extend beyond what statistical analyses have in fact established. Often the resulting factors are each interpreted as corresponding to a structural univariate latent variable that is itself causally efficacious. I show that this assumption is in fact so strong that it has empirically testable implications, even though the supposed latent variable is unobserved; statistical tests are proposed that can often reject this underlying assumption. Factor analysis also cannot distinguish between associations arising from causal relations and those arising from conceptual relations. Moreover, if two supposed factors causally affect one another, then in many settings the process will converge over time to a factor model in which only a single factor can be detected from a single wave of data. Factor analysis further suffers from the problem that if different indicators are used to assess different portions of the distribution of an underlying univariate latent variable (as might arise from the use of negatively worded items in surveys), then factor analysis can suggest that two factors are present even though the data are in fact generated by only one. Examples of each of these phenomena are given from the psychology and biomedical literature concerning causal relations between depression and anxiety, differential associations with mortality of various indicators of life satisfaction, and supposedly different factors corresponding to optimism and pessimism.
Despite these severe limitations, factor analyses, perhaps paradoxically, can nevertheless often be very informative, but the phenomena above require an appropriate reinterpretation of factor analysis results as reflecting a combination of causal, conceptual, and distributional relations.