BLEU works much better for MT systems than for NLG systems. In this blog post I present some speculations as to why this is the case.
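To make the discussion concrete, here is a minimal, stdlib-only sketch of single-reference BLEU (clipped n-gram precision plus a brevity penalty, no smoothing). The function name and the weather-report example sentences are my own illustrations, not from the post; note how a meaning-equivalent NLG rewording scores near zero because it shares few n-grams with the reference.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU: geometric mean of clipped
    n-gram precisions, scaled by a brevity penalty (no smoothing)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0  # any zero precision collapses the geometric mean
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # brevity penalty: only punishes candidates shorter than the reference
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * geo_mean

reference = "heavy rain is expected this afternoon".split()
paraphrase = "expect a lot of rain after midday".split()

print(bleu(reference, reference))   # identical output: 1.0
print(bleu(paraphrase, reference))  # valid rewording: 0.0
```

An MT system tends to stay close to the reference wording, so its n-grams overlap; a good NLG system may legitimately say the same thing in completely different words, which BLEU punishes.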
Some comments on how different components in the NLG pipeline can “add value” by contributing to the ultimate goal of generating texts that are easy for people to read and understand.
I think surface realisation becomes especially challenging when syntax depends on semantics or pragmatics. From an engineering perspective, handling phenomena that only occur in a few languages can be painful.
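A toy example of the syntax-depends-on-semantics point: even in English, verb agreement is a syntactic choice driven by the semantics of the message (here, a count). The function below is my own illustrative sketch, not from the post; real surface realisers such as SimpleNLG handle vastly more morphology than this.

```python
def realise_quantity(noun: str, count: int) -> str:
    """Toy surface realiser: the verb form (syntax) is chosen from
    the count in the underlying message (semantics)."""
    if count == 1:
        return f"one {noun} is available"
    # naive "-s" pluralisation; real morphology needs irregular forms
    return f"{count} {noun}s are available"

print(realise_quantity("room", 1))  # one room is available
print(realise_quantity("room", 3))  # 3 rooms are available
```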
Good software engineering is critical when building NLG systems, including requirements analysis, design, testing, and support.
Some suggestions for people who are new to NLG and want to learn more about it.
I think we should use rules to make simple high-value decisions, and machine learning to make complex low-value decisions, within an architecture where ML decision makers are embedded in a rules-based framework.
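A minimal sketch of what such an architecture might look like. All names here are hypothetical: the rules-based framework makes the simple, high-value decision (a safety warning) with a hard rule, and delegates the complex, low-value decision (lexical choice) to a pluggable learned component, stubbed below as a plain callable.

```python
from typing import Callable

def describe_temperature(temp_c: float,
                         word_chooser: Callable[[float], str]) -> str:
    """Rules-based NLG framework with an embedded ML decision point."""
    # Rule: simple, high-value safety decision -- never left to ML.
    if temp_c <= 0:
        return "Warning: freezing temperatures expected."
    # ML: complex, low-value lexical choice. word_chooser is a
    # stand-in for a trained classifier slotted into the framework.
    word = word_chooser(temp_c)
    return f"It will be {word} today."

# Placeholder "model": any learned word-choice component could go here.
def toy_chooser(temp_c: float) -> str:
    return "mild" if temp_c < 18 else "warm"

print(describe_temperature(-2, toy_chooser))  # Warning: freezing temperatures expected.
print(describe_temperature(25, toy_chooser))  # It will be warm today.
```

The design point is that the framework, not the model, owns the control flow: the ML component can be swapped or removed without risking the high-value rule.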
I am concerned that some people seem to ignore quality issues in training data.