The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment
This paper introduces the Arbiter, an agent that monitors multi-agent conversations under an inspection budget and uses active inspection tools to detect misaligned agents earlier and more accurately.