An ongoing research project investigating over-refusal behaviour in LLMs (cases where models refuse legitimate queries) and how to measure and mitigate it.
Jan 1, 2025