I am teaching an MSc course on Evaluating AI Systems. At the end of the last lecture (today), I gave the students a few simple guidelines ("words of wisdom") to keep in mind; hopefully the students will remember these even when they've forgotten all of the fancier things we have discussed! Anyway, I have reproduced these guidelines here.
Guideline 1: Keep it Simple!
Of course we can do very complex evaluations, and sometimes complexity is warranted and indeed necessary. But remember that a complex evaluation will take more time, be harder to interpret, and have a higher chance of something going wrong. So do your best to keep things simple (the KISS principle), and look hard for simpler alternatives before embarking on a complex evaluation.
Guideline 2: Keep it Ethical!
Getting ethical approval is a nuisance, and sometimes people (especially in companies) don't bother. But ethical research is really important, for AI as a whole as well as for individual researchers; unethical research tars the whole field. So take ethics seriously, ensure that your research is ethical, and also ensure that you go through your institution's appropriate ethical approval procedures.
Guideline 3: Be Careful!
Good evaluation is all about getting the details right and careful execution. If you don't get the details right, are sloppy in your execution, and/or lose key data, then your results will be meaningless, no matter how clever your experimental design is. I've seen months of work go down the drain because people got careless; don't let this happen to you!
Guideline 4: Do Proper Stats!
Statistics is dry, but it's essential to most AI evaluation, and garbage stats lead to garbage results. So ensure that your stats make sense, that your data is carefully and correctly entered into the statistical software, and that issues such as outliers and multiple hypothesis corrections are handled in a consistent and clearly reported fashion. It's easy to get meaningless numbers out of a stats package, but remember that your goal is to get the right number.
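To make the multiple hypothesis point concrete, here is a minimal sketch (not part of the lecture) of the Holm-Bonferroni step-down correction, using hypothetical p-values. If you test several hypotheses at once and don't correct, you will routinely "discover" effects that aren't there.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is
    rejected after Holm-Bonferroni correction at family-wise error
    rate alpha."""
    m = len(p_values)
    # Sort p-values ascending, remembering their original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the k-th smallest p-value against alpha / (m - k).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down procedure stops at the first failure
    return reject

# Hypothetical p-values from four separate comparisons.
p = [0.001, 0.02, 0.04, 0.30]
print(holm_bonferroni(p))  # [True, False, False, False]
```

Note that the 0.02 and 0.04 results, which look "significant" at the usual 0.05 threshold, no longer survive once we account for having run four tests; in practice you would use a well-tested library routine (e.g. statsmodels' `multipletests`) rather than rolling your own.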
Guideline 5: Be Skeptical!
Unfortunately, there are a lot of papers, even in prestigious venues, which present garbage and worthless evaluations. So when reading a paper, check the evaluation to make sure it makes sense; don't simply assume it is sensible because the paper appears in a good venue. I once wasted a lot of effort because I took a reported result in a paper at face value, without checking; don't make my mistake!
Evaluation is how we test scientific hypotheses about AI, so good evaluation is the key to scientific progress in AI. Good luck!