I seem to be getting involved in projects which use NLG to explain the reasoning of AI systems. I’m currently helping Ingrid Zukerman with such a project in Australia, and I’m also part of a new EU Marie-Curie project which starts later in 2019. Plus a few other possibilities which may or may not materialise. So I thought I’d write a few words on what I see as the underlying challenges in this area.
I hope this doesn't come across as naive! I like to think that I know something about NLG, but I am definitely still learning about explainable AI. Comments and suggestions from readers are welcome!
Purpose and Evaluation
As my former student Nava Tintarev and others have pointed out, explanations can have many purposes, and evaluation should depend on the purpose. If the goal is to help users detect mistakes in AI reasoning (scrutability), then we need to give people AI recommendations plus explanations, and measure how successful they are at detecting mistaken recommendations. If the goal is to increase users' confidence (trust), then we either ask users how much they trust a system, or (better but much harder) measure how explanations affect long-term usage of a system. Etc.
I am still new to the explanation field, but in all honesty much of what I read suggests that evaluation of explanations could often be done better.
A few years ago I helped Jose Alonso write a paper on explaining fuzzy rule-based systems. As part of this project, Jose developed a system which explained rules (learnt from data) used for classifying leaves. Anyway, at my suggestion, we asked a specialist (a Professor of Ecology at Aberdeen University) to look at the explanations produced by Jose's system. Amongst other things, he said that interpreting them was difficult because the features, concepts, and terminology used in the explanations were not the ones which professional botanists use. In other words, he was used to thinking about leaves in a certain way, and he expected the explanations to respect and align with this perspective.
I think this is a fundamental point. Domain experts spend years learning to communicate about their domain, largely by learning concepts, features, ontologies, and lexicons (terminology) which they share with other domain experts. So if we want an AI system to explain its reasoning to domain experts, it should use these concepts (etc). So either we design the underlying AI engine to use features and ontologies which make sense to domain experts, or we try to express the AI’s features and concepts using “domain expert” terminology.
I think this relates to the classic NLG task of lexical choice, where we look for words and phrases which communicate data or knowledge-base concepts in an understandable manner to users. This is something I've been interested in throughout my career; indeed my PhD thesis was partially about choosing words to communicate concepts from a knowledge base.
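At its simplest, this kind of lexical choice can be sketched as a lookup from the AI system's internal feature names to the terms a domain expert would actually use. The feature names and botanical glosses below are entirely invented for illustration; a real system would need a lexicon built with domain experts.

```python
# Hypothetical mapping from internal ML feature names to expert terminology.
# All names and glosses here are made up for illustration.
LEXICON = {
    "edge_serration_ratio": "margin serration",
    "length_width_ratio": "blade elongation",
    "vein_angle_mean": "venation angle",
}

def express_feature(internal_name: str) -> str:
    """Return the domain-expert term for an internal feature name,
    falling back to a readable version of the raw name."""
    return LEXICON.get(internal_name, internal_name.replace("_", " "))

print(express_feature("edge_serration_ratio"))  # -> margin serration
print(express_feature("petiole_len"))           # -> petiole len (fallback)
```

Of course, real lexical choice is much harder than a table lookup: the right term can depend on the user, the context, and whether the AI's feature even corresponds to a concept the expert recognises at all. But the sketch shows where the mapping has to live.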
Causal and Narrative Explanations
I suspect that people want explanations which are narratives, and have a causal structure. Eg, "the patient's blood test results suggest either Reditis or Greenitis. Greenitis usually leads to high temperature, which we don't see, so Reditis is the most likely diagnosis". However, AI and ML techniques often use more holistic reasoning, which does not map easily onto a narrative. This is not a new problem, by the way. The Nobel Prize-winning psychologist Daniel Kahneman pointed out in Thinking, Fast and Slow that in the 1950s we had linear regression models which did a decent job of diagnosing some diseases but were not used, in part because (in my terminology) regression models holistically combined many features instead of constructing a causal narrative around a few key features. And these regression models were a lot simpler (and used many fewer features) than deep learning models!
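For reasoning that genuinely is rule-based, the Reditis/Greenitis example above can be generated quite directly: eliminate each candidate whose expected sign is absent, and narrate each elimination as a step. The sketch below is a toy, with invented disease names and signs, and the hard part (which this sidesteps) is producing such narratives when the underlying reasoner is holistic rather than rule-based.

```python
# Toy sketch: narrate a differential diagnosis as a causal story.
# Disease names and expected signs are invented for illustration.
def explain_diagnosis(candidates, expected_signs, observed):
    """Eliminate candidates whose expected sign is absent,
    and narrate each elimination step."""
    steps = [f"The test results suggest {' or '.join(candidates)}."]
    remaining = list(candidates)
    for disease in candidates:
        sign = expected_signs.get(disease)
        if sign and sign not in observed:
            remaining.remove(disease)
            steps.append(f"{disease} usually leads to {sign}, "
                         f"which we don't see.")
    if len(remaining) == 1:
        steps.append(f"So {remaining[0]} is the most likely diagnosis.")
    return " ".join(steps)

print(explain_diagnosis(
    ["Reditis", "Greenitis"],
    {"Greenitis": "high temperature"},
    observed={"rash"},
))
```

This works precisely because the reasoner's own structure (rules over named signs) is already narrative-shaped; a regression or neural model offers no such steps to narrate, which is the problem Kahneman's example points at.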
I find it interesting that I personally am very good at a complex holistic cognitive task, face recognition, which I find impossible to explain. I do a great job of identifying hundreds of people from very noisy visual data, but I cannot explain it, perhaps because I cannot construct a causal/narrative explanation for a process which fundamentally is about combining large amounts of data, perhaps in a neural-net like manner. I can write a description of my son to help people who have never met him identify him, but this description is not the way I identify him!
Challenges for NLG
I think all of the above issues lead to NLG challenges, which I look forward to working on:
- Creating good evaluation techniques for NLG explanations. Hopefully we can at least start by adapting existing NLG evaluation techniques.
- Communicating features and concepts used by the AI reasoner to people using terminology and ontologies which they understand. Which is partially lexical choice, but I suspect will also impact content determination.
- Presenting the explanation as a causal narrative, even if the underlying AI reasoner does not work this way. This may be impossible in some cases (like explaining how I recognise my son), but let's see how far we can get!
If anyone is interested in working with me on NLG techniques for explaining AI reasoning, please let me know! I will definitely be looking for a PhD student to work on my new EU project, and may possibly have funding for post-docs as well.
I'm also very keen to collaborate with people who are interested in this area, so please feel free to contact me.