How to do an NLG Evaluation: Human Ratings in Artificial Context
Advice on how to evaluate an NLG system by asking human subjects to rate the system in an artificial experimental context (i.e., not real-world usage). This is the most common type of NLG evaluation.