Secondary analyses of survey data sets collected from large probability samples of persons or establishments further scientific progress in many academic fields. The samples underlying these data sets, while enabling inferences about population characteristics or relationships between variables of interest, are often "complex" in nature, employing sampling strategies such as stratification of the population and cluster sampling. These complex sample design features improve data collection efficiency, but they also complicate secondary analyses, because the estimation methods must account statistically for the complex sampling. Many of the survey data sets collected in the large survey research projects sponsored by the National Center for Science and Engineering Statistics (NCSES), and subsequently made available for secondary analysis, arise from samples with complex designs, so secondary users of these data sets must employ appropriate estimation methods. Unfortunately, many secondary analysts of these data sets do not have formal training in survey statistics and ultimately apply incorrect analytic methods. Applying standard statistical methods that ignore the sample design can lead to incorrect population inferences, which effectively negates the federal resources dedicated to the survey data collection. The proposed research project will review published studies of NCSES data sets to understand the statistical approaches that users of these data currently employ, review the survey statistics literature on alternative design-based and model-based approaches that are appropriate for complex samples, and then apply these alternative approaches to several NCSES data sets, comparing the resulting inferences for a variety of statistical problems and educating data users about appropriate analytic methods.
Funding:
National Science Foundation
Funding Period:
09/15/2014 to 08/31/2017