BSSR Resources, Methods, and Measures

If we want the field to take heterogeneous effects of behavioral interventions seriously, we need shared infrastructure

The past decade has seen a surge of enthusiasm for the potential of behavioral science (e.g., "nudges," "wise interventions," and more) to inform policy innovations and ameliorate persistent health problems. Over the same period, however, the behavioral sciences have been rocked by a replicability crisis that has undermined confidence in the rigor of the field's empirical methods and the reliability of its basic findings. This has raised serious questions about how much potential behavioral interventions really have to make meaningful contributions to societal well-being, and about how well we understand our basic phenomena.

However, a nascent scientific revolution has been building in parts of the behavioral science community, stemming from an increasing appreciation of the importance of heterogeneity in treatment effects. We argue that unless the field leans into this "heterogeneity revolution," its approach to hypothesis generation, hypothesis testing, and theory development will produce a perpetual cycle of promising initial findings that appear not to hold up at scale and therefore fail to have a meaningful impact in the world.

What would leaning in entail? Currently there is a near-total absence of research infrastructure that would make it feasible to study heterogeneous intervention effects scientifically. Instead, PIs recruit sites or partners for studies one by one, sign data-sharing agreements with agencies on a case-by-case basis, and settle for hand-picked samples and low response rates. It is exceedingly rare to conduct longitudinal, randomized trials in representative samples of sites or populations with high (>90%) response rates. Yet random samples with high response rates are crucial for learning from heterogeneous effects, because they ensure that the variability in the qualities of people or places that could moderate behavioral interventions also appears in our samples.
Without knowledge of the full population, we will continue the field's pattern of failed replications without an explanation. Of course, the logistical demands of research that takes heterogeneity seriously, most notably the need for scientific samples, are simply too formidable for individual scholars to take on by themselves with any regularity. But shared infrastructure can help solve daunting collective problems.

One example of what such shared infrastructure might look like is the NSF-funded Time-sharing Experiments for the Social Sciences (TESS), which allows researchers to conduct online experiments in a professionally managed, nationally representative panel of U.S. adults. The same logic could be applied to social and behavioral field experiments. Indeed, if NIH, NSF, and IES put just a portion of what they currently spend on participant recruitment for each intervention evaluation grant into shared infrastructure that PIs could apply for time on, the infrastructure could pay for itself many times over in cost savings within just a few years. If we built infrastructure expressly for the purpose of understanding heterogeneous effects, we could slow or stop the perpetual cycle of promising interventions that emerge from basic science but fail to meet their potential when scaled to new contexts or populations. And we could make huge gains in another major area of interest: reproducibility, replication, and transparency.

Idea No. 339