Examples of evaluating real-world impact
I describe several papers that measure the real-world impact of NLP systems using different methodologies (A/B test, before/after evaluation, clinical trial, observational study). I hope these examples inspire and encourage more people to consider evaluating real-world impact!