Comparing the performance of LLMs is not very interesting
Quantitative comparisons of different LLMs are not very interesting in research papers, because the LLMs in question will probably be out of date by the time the paper is published. However, looking for behaviour that is shared by several LLMs is definitely interesting and worthwhile.