I was surprised to find out that some institutions require PhD students to publish a certain number of papers before they can graduate. I disagree with this approach: my goal as a supervisor is to train students to be good scientists, and rigid publication targets do not serve this goal.
I’m a strong proponent of human evaluations, but they need to be high quality in order to give meaningful results; a quick/cheap/sloppy human evaluation may not be very useful.
Texts produced by NLG systems need to communicate valuable, useful, and accurate information. I would love to see more research on content production and selection in NLG.
If we want to use NLG to communicate information to all sorts of different people, then it would be really helpful if the NLG system could adapt its language to the reading skill, domain knowledge, emotional state, etc. of the user. I think this kind of user adaptation is essential to achieving my vision of using NLG to humanise data.
I think NLG can help humanise and democratise data and AI reasoning. If so, this would provide huge benefits to society in a world which will increasingly be driven by data and data-based reasoning.
I would like neural NLG researchers to focus on more challenging datasets, and I make some suggestions in this post.
Users want to be able to modify and customise NLG systems on their own, without needing to ask developers to make changes. Academic researchers mostly ignore this, which is a shame, since this area presents a lot of interesting and important challenges.
This is a personal blog, about how Covid lockdown has affected me. In practical terms I’m much better off than many people I know, but I still find that lockdown life has lost a lot of its “fizz” and become “flat”.
A few observations (not recommendations!) about what it is like to work as a researcher in university and corporate contexts.
I was impressed by a recent paper by Läubli et al. which experimentally compared the results of different human evaluations in MT (eg, how results differ between expert and non-expert human raters), in the context of understanding when MT systems are “better” than human translators. It would be great to see more experimental comparisons of different human evaluations in NLG!