Multi-Agent Conversational Online Learning for Adaptive LLM Response Identification
Published in arXiv preprint, 2025
We develop a novel bandit algorithm that rapidly identifies user preferences to improve LLM responses.
Recommended citation: Xiangxiang Dai, Yuejin Xie, Maoli Liu, Xuchuang Wang, Zhuohua Li, Huanyu Wang, John C.S. Lui. (2025). "Multi-Agent Conversational Online Learning for Adaptive LLM Response Identification." arXiv preprint arXiv:2501.01849. https://arxiv.org/abs/2501.01849
