A Multi-Agent Conversational Bandit Approach to Online Evaluation and Selection of User-Aligned LLM Responses
Published in AAAI 2026
Recommended citation: Xiangxiang Dai, Yuejin Xie, Maoli Liu, Xuchuang Wang, Zhuohua Li, Huanyu Wang, John C.S. Lui. (2026). "A Multi-Agent Conversational Bandit Approach to Online Evaluation and Selection of User-Aligned LLM Responses." Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 37323-37331. https://doi.org/10.1609/aaai.v40i44.41064
