1. A Backtesting Protocol in the Era of Machine Learning by Robert D. Arnott (Research Affiliates, LLC) and Campbell R. Harvey (Duke University – Fuqua School of Business) and Harry Markowitz (University of California at San Diego)
2. The Economics of Cryptocurrency Pump and Dump Schemes by JT Hamrick (University of Tulsa – Tandy School of Computer Science) and Farhang Rouhi (University of New Mexico – Computer Science Department) and Arghya Mukherjee (University of Tulsa – Tandy School of Computer Science) and Amir Feder (Technion-Israel Institute of Technology) and Neil Gandal (Berglas School of Economics, Tel Aviv University) and Tyler Moore (University of Tulsa – Tandy School of Computer Science) and Marie Vasek (University of New Mexico – Computer Science Department)
3. p-Hacking and False Discovery in A/B Testing by Ron Berman (University of Pennsylvania – The Wharton School) and Leonid Pekelis (OpenDoor) and Aisling Scott (Independent) and Christophe Van den Bulte (University of Pennsylvania – Marketing Department)
“p-Hacking and A/B testing have received a lot of attention recently, with many attempts to devise methods to help experimenters make better decisions and avoid false discoveries. p-Hacking is an umbrella term for engaging in behaviors that generate statistically significant effects where none exist, and is known to generate false discoveries in academic and biomedical research. Commercial experimenters may also engage in p-hacking, and the resulting false discoveries are likely to result in bad business decisions. It is therefore important to understand the extent and consequences of p-hacking among experimenters who run A/B tests.
We were excited when Optimizely, the leading A/B testing platform, generously allowed us to collect and analyze data on 2,101 experiments. The data stem from just before Optimizely put in place protections against p-hacking. One key finding is that p-hacking is quite common; more specifically, when running a fixed horizon experiment, slightly more than half of experimenters stop their experiments based on the level of the p-value reached. Another key finding is that about 70% of experiments (whether p-hacked or not) involve treatments that generate no effect at all. Finally, the data allowed us to estimate, conservatively, the substantial economic value of the damage caused by p-hacking through increasing the number of false discoveries.
These findings have already proven useful for Optimizely, which now uses always-valid tests that adapt to stopping behavior, and we hope that they will be equally useful for researchers as well as practitioners, whether they are running A/B tests or researching and designing A/B testing platforms.” – Ron Berman
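The stopping behavior described in the quote can be illustrated with a small simulation. The sketch below is not the authors' method or Optimizely's implementation. It assumes an A/A test (both arms share the same true conversion rate, so every significant result is a false discovery), a plain two-proportion z-test, and hypothetical parameters (`rate`, `horizon`, `peek_every`, `alpha`) chosen only for illustration. It compares an experimenter who stops the first time an interim p-value drops below `alpha` with one who looks only once, at the planned horizon.

```python
import math
import random


def two_sided_p(conv_a, conv_b, n):
    """Two-sided p-value of a pooled two-proportion z-test, n visitors per arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0  # no variation observed yet, so nothing to reject
    z = (conv_a - conv_b) / n / se
    return math.erfc(abs(z) / math.sqrt(2))


def run_aa_test(rng, rate=0.1, horizon=2000, peek_every=100, alpha=0.05):
    """Run one A/A test; return (significant at any peek, significant at horizon).

    All parameter values are illustrative assumptions, not figures from the paper.
    """
    conv_a = conv_b = 0
    peeked_sig = False
    for i in range(1, horizon + 1):
        conv_a += int(rng.random() < rate)
        conv_b += int(rng.random() < rate)
        # An experimenter who stops on the p-value would declare a winner here.
        if i % peek_every == 0 and two_sided_p(conv_a, conv_b, i) < alpha:
            peeked_sig = True
    fixed_sig = two_sided_p(conv_a, conv_b, horizon) < alpha
    return peeked_sig, fixed_sig


def simulate(n_tests=400, seed=0):
    """Return (false-positive rate with peeking, false-positive rate at fixed horizon)."""
    rng = random.Random(seed)
    results = [run_aa_test(rng) for _ in range(n_tests)]
    peek_rate = sum(p for p, _ in results) / n_tests
    fixed_rate = sum(f for _, f in results) / n_tests
    return peek_rate, fixed_rate


if __name__ == "__main__":
    peek_rate, fixed_rate = simulate()
    print(f"false-positive rate when stopping on the p-value: {peek_rate:.3f}")
    print(f"false-positive rate at the fixed horizon:         {fixed_rate:.3f}")
```

Because the final look is also one of the peeks, the peeking rate can never fall below the fixed-horizon rate in this setup; the gap between the two is the inflation that always-valid tests are designed to remove.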
4. Beyond WEIRD Psychology: Measuring and Mapping Scales of Cultural and Psychological Distance by Michael Muthukrishna (London School of Economics and Political Science) and Adrian V Bell (University of Utah) and Joseph Henrich (Harvard University) and Cameron M Curtin (Harvard University) and Alexander Gedranovich (London School of Economics & Political Science (LSE)) and Jason McInerney (Iowa State University) and Braden Thue (Harvard University)