There has been substantial interest in whether and how analyses of social media data can add value to social research and the production of official statistics. Some researchers hope that survey findings can be augmented with social media content, while others hope that costs might be reduced and timeliness improved by replacing at least some survey data (e.g., some variables, or data from some waves of longitudinal data collection) with social media analyses. Skeptics worry that there is no evident rationale for expecting this approach to succeed and that purported alignment between analyses of social media posts and survey responses is not compelling. We have four objectives, each with associated research activities:
- Explore conditions under which alignment between surveys and social media is most likely to occur. As a calibration exercise, we will first analyze the correspondence between answers to a single question on the Census tracking poll and Twitter posts. We will then extend the analysis to other questions from the tracking poll, to an additional social media platform (Reddit), and to other Census surveys that address other topics, e.g., the American Community Survey.
- Mine social media for qualitative insights. We will apply Natural Language Processing techniques such as Topic Modelling to social media corpora to explore the promise of this content for developing survey instruments. The content would be used much as focus groups are currently used, e.g., to identify vocabulary used by target groups, but in an automated way and on a much larger scale.
- Improve statistical products by including social media analytics. Political scientists have shown that incorporating certain variables derived from social media use into survey data sets can improve the predictive ability of models. In the same way, we will explore the possibility of providing variables derived from social media in Census statistical products.
- Exploit the interconnections in social media. Most attempts to identify relationships between social media posts and survey data treat posts as independent texts roughly analogous to survey responses. Yet social media posts are at least potentially social in the sense that they are threaded exchanges in which prior posts lead to subsequent posts. Using Natural Language Processing and discourse analytic techniques designed to extract meaning from extended texts, we will explore whether there is a connection between survey data and social media threads that analyses of isolated posts do not detect.
We will discuss emerging results with Census Bureau partners so that this research is as valuable as possible to the agency’s goals and mission.
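
As an illustration of the calibration exercise in the first objective, the following minimal sketch shows one way correspondence between a survey time series and a social media time series might be measured: a Pearson correlation computed at several lags, with the social media series leading. This is not the project's method. The function names, the choice of Pearson correlation, and the weekly aggregation are assumptions made for illustration, and the numbers in the usage example are invented toy values, not results.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_alignment(survey, social, max_lag):
    """Correlate survey[t] with social[t - lag] for lag = 0..max_lag.

    Hypothetical helper: `survey` might hold the weekly share of
    respondents giving a particular answer, and `social` a weekly
    summary of posts on the same topic (both are assumptions).
    """
    if len(survey) != len(social):
        raise ValueError("series must cover the same periods")
    n = len(survey)
    return {
        lag: pearson(survey[lag:], social[: n - lag])
        for lag in range(max_lag + 1)
    }


if __name__ == "__main__":
    # Invented toy values, for illustration only.
    social_weekly = [1, 3, 2, 5, 4, 6]
    survey_weekly = [0, 1, 3, 2, 5, 4]  # follows social_weekly by one week
    for lag, r in lagged_alignment(survey_weekly, social_weekly, 2).items():
        print(f"lag={lag} r={r:.3f}")
```

A high correlation at some lag would only suggest alignment for that question and period. It would not by itself show that social media content could substitute for the survey measure.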