I’m very worried about data contamination
Data contamination (testing and evaluating LLMs using test data which is known the the LLM) may be a huge problem in NLP, leading to a lot of invalid scientific claims. Unfortunately, many NLP researchers ignore the problem, which is really worrying.