The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment

This paper introduces the Arbiter, an agent that monitors multi-agent conversations under an inspection budget and uses active inspection tools to detect misaligned agents earlier and more accurately.

June 2026 · F. Tonini, F. Torrielli, A. D. Lautrup, P. Schneider-Kamp, M. M. Çelikok, L. G. Poech