Evaluation Grand Challenge: Is an NLP System Good Enough for a Use Case?
Someone recently asked me whether it is possible to easily determine if an NLP system is good enough for a specific use case. Currently this is very hard. Making it easy could be a “grand challenge” for evaluation!