There is a military saying that “amateurs discuss tactics, professionals discuss logistics”. Similarly, I think AI professionals should focus on data more than models. I suggest four simple initial questions to ask about your data if you want to build an ML system.
It can be very exciting to apply powerful analytics and ML techniques to analyse data sets, but we need to be careful, otherwise we will make mistakes.
I’m beginning to think that in some ways the NLP community *encourages* researchers to use poor-quality or otherwise inappropriate data sets. Which is a truly depressing thought…
15 years ago, I said a grand challenge for CS/AI/NLG was to help the general public effectively understand and use data. Progress on this has been less than I hoped, but this remains a worthwhile and important challenge!
Perhaps the most common reason for bad NLG output texts is low-quality input data. I.e., Garbage In, Garbage Out is true regardless of our technology.