Respondent-driven sampling (RDS) is a new method proposed specifically for sampling rare or hidden populations. Traditional probability sampling methods are considered impractical for rare populations because they require extensive screening, and impossible when the population wishes to remain hidden. RDS starts with members of the target population and traces their social networks as well as the networks of those connected to them. Through chain referrals stimulated by incentivized recruitment coupons, RDS exploits social networks for sampling purposes without screening. Given the pressing need to study rare and hidden populations and the increasing expense and difficulty of sampling them, RDS has gained popularity rapidly, and its use is expected to increase in the foreseeable future. However, in contrast to the vast volume of research using RDS data, methodological assessments of RDS are very limited, and statistical inference for RDS is underdeveloped because its formal basis rests on strong assumptions that are easily violated in practice. The scarcity, if not near absence, of publicly available RDS data makes objective methodological assessment even more challenging, leaving the level of population representation achieved through RDS under-scrutinized.
The proposed study begins an overdue empirical investigation into the realities of RDS data collection through the Total Survey Error (TSE) framework and into RDS inference that takes those realities into account. While TSE is a fundamental framework in survey methodology that allows error sources and properties to be examined systematically, to our knowledge no one has attempted to examine RDS empirically from a perspective that integrates the TSE framework with key RDS assumptions. The study is designed to evaluate the realities of RDS data collection with respect to sampling productivity, error properties, and replicability; to improve current data collection and inference practices; and to promote further methodological work by making the data products and software from this study publicly available. The study will be carried out by a team of scientists who collectively represent all aspects of survey research, including survey design, data collection, estimation, and data use for policy and program development.
Funding:
National Science Foundation
Funding Period:
09/15/2015 to 08/31/2019