One of my roles at Aberdeen University is to coordinate research ethics approvals in Computer Science. Usually ethics is pretty straightforward: we either approve proposals as is or subject to relatively minor changes. However, once in a while we get a more difficult case.
This happened last week, when we were asked to approve a proposal to effectively do A/B testing. That is, the researcher, who was working with a real-world service provider, essentially wanted to try different messaging strategies with different clients of the service provider, and evaluate the impact of these strategies on the clients, including analysing how different types of clients reacted (eg, was one strategy better for women and another for men).
As a CS researcher, I think this kind of real-world study is great, and indeed in a recent blog I suggested that the NLP community consider using A/B testing to evaluate NLP systems. But when I look at this from an ethics perspective, there are some troubling issues, and there doesn't seem to be a consensus on how to deal with them (or at least I couldn't find one).
If readers have suggestions or thoughts about this, please let me know, especially since my colleagues on the research ethics committee and I will need to decide how we respond to this proposal.
The biggest problem is informed consent, which is the heart of modern research ethics. In other words, we should explain to potential subjects what we are doing and any potential side effects, in a way which they can understand (not 10 pages of legalistic fine print), and then get their consent to be in the study, without applying any pressure or coercion. Of course there are “edge cases”, for example when using children as subjects we need to get informed consent from parents/guardians, although we should also ask the children themselves if they are old enough to understand.
Unfortunately, informed consent is difficult in A/B testing, partly because asking for it may bias the subjects, and partly because subjects may not care enough to engage sufficiently to provide informed consent. For example, a stereotypical corporate A/B test is to change the shade of blue on a website and see if this increases click-throughs. Suppose we set things up so that if someone tries to look at the experimental website, they are first directed to an Informed Consent page which explains the study and asks for informed consent; the user clicks yes or no and proceeds to the actual website. Most subjects probably won't bother to read the information; they'll just click through to the website as fast as possible. And if subjects do read about the study and realise we are evaluating the impact of different shades of blue, this will almost certainly change how they react to colours when they get to the website, and hence bias the results of the study.
Companies generally address this problem by not asking for informed consent. For example, if Amazon wants to use A/B testing to check the impact of a different shade of blue, or indeed whether a new messaging strategy will increase sales, it doesn't seek ethical approval or try to obtain informed consent; it just goes ahead and runs the experiment. And we accept this, partially on the basis that Amazon is free to do whatever it wants on its website as long as it doesn't break the law or rules on advertising standards.
However, academic researchers are held to higher ethical standards than companies; we are expected to behave ethically, not just avoid breaking the law. This is especially true if our research is publicly funded. So the fact that Amazon doesn't ask for informed consent for A/B studies does not automatically imply that researchers don't need to ask for informed consent.
I don't have a good answer to this question. Perhaps (??) we can take inspiration from ethical guidelines on studies which involve deception, where upfront informed consent is also not possible. The BPS guidelines say that subjects in such experiments need to be informed about the true nature of the experiment after they have finished their participation (or earlier), and also that deception is not acceptable if subjects may be uncomfortable, angry, or otherwise have objections when they are debriefed. Perhaps we could do something similar for A/B testing? For example, we could inform all users about the study after the event (which in this case is straightforward, since the service provider knows who the users are and has email addresses for them). And maybe also make it clear to the researchers that if subjects are angry or complain when they are debriefed, then they will not be allowed to do further experiments of this type?
A related issue is data protection. As researchers, we are expected to get people’s agreement to hold data about them, but we cannot get this ahead of time in A/B testing, for the reasons described above. Probably the best approach is to ask the service provider to gather and analyse the data, so all the researchers see are aggregates and the results of hypothesis tests. Of course, this means that the researchers will need to be very explicit about data analysis and hypotheses up-front, since they won't be able to “play” with the raw data afterwards; this may be a good discipline in any case! It also means that the service provider will need to be able to calculate the aggregates and run the hypothesis tests, which requires its staff to be familiar with statistics.
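To make this concrete, here is a rough sketch of the kind of pre-specified analysis script the researchers might hand to the service provider. Everything in it is my own assumption for illustration: the record format (a strategy label plus a yes/no "responded" outcome), the function names, and the choice of a two-sided two-proportion z-test are all hypothetical, and a real study would fix its own outcome measure and test in advance. The point is only that the script reads raw records on the provider's side and returns nothing but counts, rates and test results.

```python
import math
from collections import Counter


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups have identical all-or-nothing outcomes: no evidence of a difference.
        return 0.0, 1.0
    z = (successes_a / n_a - successes_b / n_b) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def aggregate_report(records):
    """Summarise raw records without exposing any individual row.

    `records` is an iterable of (strategy, responded) pairs, where strategy
    is "A" or "B" and responded is a bool. This format is hypothetical.
    """
    n = Counter()
    successes = Counter()
    for strategy, responded in records:
        n[strategy] += 1
        successes[strategy] += int(responded)
    if n["A"] == 0 or n["B"] == 0:
        raise ValueError("both strategies need at least one record")
    z, p_value = two_proportion_z_test(successes["A"], n["A"], successes["B"], n["B"])
    # Only aggregates leave the service provider.
    return {
        "n_A": n["A"],
        "n_B": n["B"],
        "rate_A": successes["A"] / n["A"],
        "rate_B": successes["B"] / n["B"],
        "z": z,
        "p_value": p_value,
    }
```

The service provider would call `aggregate_report` on its own data and send back only the resulting dictionary. Comparing subgroups (eg, women versus men) would mean running the same function on each pre-agreed subset, which is another reason the hypotheses need to be written down before any data is collected.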
We could also ask people for permission afterwards, when we explain our experiment to them (as described above). The problem here is that most people won't read the explanation. Because of this, an opt-out strategy (where someone is included unless they specifically ask to be excluded) is not ethically acceptable. An opt-in strategy (where someone is excluded unless they specifically ask to be included) is ethically OK but runs the risk that only a handful of people will opt in.