A few people have recently asked me what advice I would give to PhD students. Of course there are numerous websites that give advice to PhD students, such as https://www.findaphd.com/advice/doing/phd-problems.aspx and https://www.nature.com/articles/d41586-018-07332-x . But these tend to be generic; are there things I would specifically recommend for students in NLG or NLP? What do I wish I had done differently in my PhD?
Work with real-world data, users, and experiments
I think one such piece of advice is that students (and postdocs) should “get their hands dirty”. By this, I mean that people should work with real-world data, understand what real-world users care about and need, and/or try their systems out with real people to evaluate how well they work. The real world is a messy place, and we need to understand and appreciate this messiness if we are going to make serious contributions to either enhancing our scientific understanding of language as it is used in the real world, or developing useful real-world language technology.
My PhD (1985-1990) was quite theoretical. Probably the best-known part of it was my work on algorithms for generating referring expressions. After my PhD I started collaborating with Robert Dale, who was also very interested in this topic, and we eventually published this research in our paper Computational Interpretations of the Gricean Maxims in the Generation of Referring Expressions. This was well received and cited (currently 850 cites on Google Scholar), but it was somewhat abstract work which was not solidly grounded in real-world use of referring expressions (although we did look at psycholinguistic findings). When I look back at this now, I wish I had “gotten my hands dirty” and focused more on real-world usage of referring expressions; I think this would have been both more interesting scientifically, and more useful practically.
Moving forward to 2020, I see a lot of PhD students and postdocs who pick up someone else’s dataset and evaluation measure, and then focus on developing better algorithms and models in this context. In all honesty, I suspect much of this work is not very useful, especially if (as is often the case) the data sets are biased and unrepresentative, and the evaluation measures say nothing about real-world utility. Get your hands dirty, by collecting at least some real-world data yourself (gives you a much better understanding of data issues), talking to users and domain experts, and/or evaluating real-world utility! You’re much more likely to discover interesting scientific facts about language and make good contributions to useful NL technology if you do this.
I realise that collecting data, working with real users, and conducting rigorous human evaluation are time-consuming and difficult. This means, to be completely honest, that people who take this approach may publish fewer papers than people who focus on tweaking models to get a 1% improvement on some existing data set and evaluation metric. But I think people who engage with the real world, as above, are more likely to make genuine, long-lasting scientific and technological contributions.
Listen to what the real world is telling you
It’s also really important to listen to what real-world data, users, and experiments are telling you. If the data, users, or experiments don’t fit your approach, theory, or model, you need to acknowledge this, try to understand what is going on, and then use this understanding to improve your approach, theory, or model.
By this, I do *NOT* mean tweaking a model until it gets better results on a test set. I see a lot of this, where people try a huge number of variants to see which one gives the best result, while ignoring multiple hypothesis issues and, in many cases, effectively training on test data (in the sense of updating a model based on where it failed on the test data, and trying again). It’s of course easy to get good results if you cheat (which is what the above amounts to), but this does not advance NLP science or technology.
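The multiple hypothesis problem is easy to demonstrate with a small simulation (a minimal sketch; the numbers here are illustrative, not taken from any real study). Each “variant” below is a classifier that guesses labels completely at random, so its true accuracy is exactly 50%. Yet if you try enough variants on the same small test set and report only the best one, that best score looks like a real improvement.

```python
import random

def random_variant_accuracy(n_examples, rng):
    # Accuracy of a pure coin-flip classifier on a balanced binary
    # test set: each guess is correct with probability 0.5.
    correct = sum(rng.random() < 0.5 for _ in range(n_examples))
    return correct / n_examples

rng = random.Random(0)
n_examples = 100   # a small test set
n_variants = 200   # number of "tweaks" tried on the same test set

accuracies = [random_variant_accuracy(n_examples, rng)
              for _ in range(n_variants)]

mean_acc = sum(accuracies) / len(accuracies)
best_acc = max(accuracies)

# The average variant is ~50% (no real skill), but the best of 200
# looks clearly "better than chance" purely through selection.
print(f"mean accuracy: {mean_acc:.2f}")
print(f"best accuracy: {best_acc:.2f}")
```

The average accuracy sits near 0.50, but the maximum over 200 tries is typically around 0.60 or higher; reporting only that maximum is exactly the kind of “cheating” described above, unless you correct for the number of comparisons or confirm the result on fresh held-out data.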
What I do mean is carefully analysing data, user/expert feedback, and experimental results, from a qualitative as well as quantitative perspective, to get insights into what worked and what didn’t work. Insights such as “users want more narrative structure in the NLG texts”, “experts are concerned about accuracy problems”, or “the data is full of gaps, so we need a better way of dealing with missing data”. NOT “insights” such as “BLEU scores are a bit higher than ROUGE scores”. These insights will guide your research, and are also valuable contributions to the research community.
“Getting your hands dirty” is not an easy path, and it’s probably not for everyone. It’s also not an all-or-nothing choice; for example, you can mostly use a pre-existing data set, but try to collect a few extra samples yourself. But I think it is the way we make real progress in NLG and NLP, and I encourage students to involve real-world data, users, and experiments in their research where this is feasible.