How to do an NLG Evaluation: Task-Based (Extrinsic) Performance in Real-World Context
Advice on how to evaluate an NLG system by getting people to use it in the real world, and then measuring how effective the system was.
Advice on how to evaluate an NLG system by asking human subjects to rate the system while they are using it in a real-world context.
A high-level discussion of the different ways of evaluating NLG systems.
Advice on how to evaluate an NLG system by asking human subjects to rate the system in an artificial experiment context (i.e., not real-world usage). This is the most common type of NLG evaluation.
Evaluation in NLP/NLG has a long way to go before it reaches the standards of hypothesis testing in clinical medicine.