Overview. The proposed research addresses the problem of accounting for complex sample designs in Bayesian inference. Blending the Bayesian paradigm, with its emphasis on complex modeling, with the survey sampling paradigm, with its emphasis on nonparametric methods and robustness, has proven difficult. Incorporating sampling weights into the analysis has been particularly problematic. Weights may be unnecessary, may help avoid magnifying the effects of model misspecification, or may be required when sampling is informative. The principal investigators have recently developed methods that incorporate complex sample designs into a weighted finite population Bayesian bootstrap procedure. These methods have been applied to combining data from multiple surveys and to accounting for complex sample designs in multiple imputation in a relatively straightforward and convenient fashion. The methods we propose are broadly applicable, and we will consider three specific applications: (1) accounting for the complex sample design in a joint longitudinal model of mean and variance trajectories to predict the onset of senility from short memory tests in the Health and Retirement Study; (2) small area estimation, combining data from the National Health Interview Survey and the Behavioral Risk Factor Surveillance System to develop county-level estimates of risky behavior; and (3) missing data, accommodating both the sample design and measurement error when imputing diet and biomarker measures in the National Health Interview Survey using observed data from the National Health and Nutrition Examination Survey.
Intellectual Merit. Much scientific research relies heavily on probability samples of the US population, or of specific subgroups, for information about the health of the US population. Increasingly, researchers use such data not only to produce baseline or descriptive population statistics but also to develop and assess statistical models that better predict health outcomes and/or clarify their antecedents. These models are increasingly complex, involving multiple levels of hierarchy, mixture components, and other forms of latent variables that are most easily understood and fit in a Bayesian setting. This proposal will develop methods for incorporating complex sample designs into Bayesian models through a relatively straightforward post-processing step based on importance sampling. The proposed methods will be assessed in a variety of simulation settings and applied to major scientific and health problems of interest. We will also incorporate our methodological developments into the general-purpose IVEware survey and missing data software.
Broader Impact. Analysts of survey data, whether they work in clinical, epidemiological, public policy, economic, or agricultural research, typically have two choices: apply Bayesian methods that ignore complex sample design features, or forgo complex Bayesian models in favor of methods that readily incorporate complex sample design features into inference. The proposed methods will eliminate the need to make this choice by providing a general approach to incorporating complex sample designs in Bayesian inference. In particular, this work will allow easy incorporation of design effects in small area estimation, in the treatment of item and unit nonresponse, and in hierarchical regression models and other complex multilevel models. Two Ph.D. students in survey methodology will assist the PIs with the objectives of the proposed research, giving them substantial training in statistical approaches to these types of problems. Our software developments will be freely shared through IVEware. Finally, we will disseminate results from this project in top-tier journals in statistics and survey methodology and incorporate them into teaching materials.