Behavioral scientists aim to understand human behavior at the level of the individual (psychology), the level of the group (sociology), and within specific political and economic contexts (economics, political science). Social media data provide a potentially important avenue for learning about human behavior in real time by tapping into multiple aspects of an individual’s beliefs, behaviors, emotions, and social networks. Such “data in the wild” provide a lens into human behavior and attitudes that is not available through traditional survey research. However, it is often unclear how to reconcile the behavioral processes that generate the millions of available data points with the carefully designed measurement strategies that produce typical social science data. It is equally unclear how these data can yield the clear metrics of validity and reliability necessary for behavioral research. Such data are also rarely covered by the kinds of ethical safeguards afforded to survey respondents. Hence, although these data may offer important new avenues for understanding human beliefs and behavior, harnessing them in ways useful to social scientists is a major scientific challenge.
Computer scientists know how to “tame” these data: to clean them, mine them, and make sense of them. They are leaders in using social media to investigate current events, such as tracking behavior related to social movements and virus outbreaks across the globe. Their approach, however, typically neither adheres to established social science research designs nor incorporates rigorous checks of the data’s validity or reliability with respect to the individuals providing the data or the constructs measured. Rather than assessing how well the data generalize to the population at large, they target specific descriptive and learning tasks, which makes some of their methods, algorithms, and results less useful to the broader research community. To extract significant research value from social media data, computer scientists and social scientists must integrate their expertise, or converge, to create and adapt computer science algorithms and data mining methods in ways that adhere to the design structures, measurement rigor, and ethical protections of social science.