other

What LLMs cannot do

I summarise a few papers I have recently read on what LLMs can and cannot do. One (not surprising) finding is that LLMs' skill profile is very different from that of humans. This is good: it means that a human and an LLM working together can do things that neither could do alone. It also means that it makes little sense to evaluate LLMs using tests and techniques designed to evaluate people.

evaluation

There are many types of human evaluation!

Many people assume that “human evaluation” means asking people to rate or rank outputs. However, there are many other types of human evaluation, most of which give more meaningful results than rating or ranking! I discuss some of these, including task-based evaluation, annotation-based evaluation, and real-world evaluation.