Let's use error annotations to evaluate systems!
I am excited by the idea of using error annotation to evaluate NLG systems, where domain experts or other knowledgeable people mark up individual errors in generated texts. I think this is usually more meaningful and gives better insights than asking crowdworkers to rate or rank texts, which is how most human evaluations are currently done.
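To make the idea a little more concrete, here is a minimal sketch (in Python, with hypothetical field names and error categories of my own choosing, not any standard annotation scheme) of how individual error annotations could be recorded and then aggregated into per-system counts of each error type:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class ErrorAnnotation:
    """One error marked up by an annotator in a generated text."""
    text_id: str            # which generated text the error was found in
    system: str             # which NLG system produced the text
    span: tuple             # (start, end) character offsets of the error
    category: str           # e.g. "incorrect number", "contradiction", "omission"

def error_counts_per_system(annotations):
    """Aggregate annotations into per-system counts of each error category."""
    counts = {}
    for ann in annotations:
        counts.setdefault(ann.system, Counter())[ann.category] += 1
    return counts

# Hypothetical example: three errors annotated across two systems
annotations = [
    ErrorAnnotation("t1", "systemA", (10, 18), "incorrect number"),
    ErrorAnnotation("t1", "systemA", (40, 55), "omission"),
    ErrorAnnotation("t2", "systemB", (5, 12), "contradiction"),
]
print(error_counts_per_system(annotations))
# {'systemA': Counter({'incorrect number': 1, 'omission': 1}),
#  'systemB': Counter({'contradiction': 1})}
```

The point of the sketch is that the output is a breakdown of what kinds of errors each system makes and how often, which tells us far more about where a system goes wrong than a single rating or ranking does.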