Sitemap
A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.
Pages
Posts
Portfolio
Publications
Towards Evaluating Proactive Risk Awareness of Multimodal Language Models
Published in NeurIPS 2025 D&B Track, 2025
We propose a benchmark for evaluating proactive risk awareness in multimodal language models.
Recommended citation: Youliang Yuan, Wenxiang Jiao, Yuejin Xie, Chihao Shen, Menghan Tian, Wenxuan Wang, Jen-tse Huang, Pinjia He. (2025). "Towards Evaluating Proactive Risk Awareness of Multimodal Language Models." NeurIPS 2025 Datasets and Benchmarks Track. https://arxiv.org/abs/2505.17455
ToolSafety: A Comprehensive Dataset for Enhancing Safety in LLM-Based Agent Tool Invocations
Published in EMNLP 2025, 2025
We introduce ToolSafety, a safety fine-tuning dataset containing 5,668 direct harm samples, 4,311 indirect harm samples, and 4,311 multi-step samples to address safety vulnerabilities in tool-using AI systems.
Recommended citation: Yuejin Xie, Youliang Yuan, Wenxuan Wang, Fan Mo, Jianmin Guo, Pinjia He. (2025). "ToolSafety: A Comprehensive Dataset for Enhancing Safety in LLM-Based Agent Tool Invocations." Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP). https://aclanthology.org/2025.emnlp-main.714/
A Multi-Agent Conversational Bandit Approach to Online Evaluation and Selection of User-Aligned LLM Responses
Published in AAAI 2026, 2026
We develop a novel bandit algorithm for rapidly identifying user preferences to improve LLM responses.
Recommended citation: Xiangxiang Dai, Yuejin Xie, Maoli Liu, Xuchuang Wang, Zhuohua Li, Huanyu Wang, John C.S. Lui. (2026). "A Multi-Agent Conversational Bandit Approach to Online Evaluation and Selection of User-Aligned LLM Responses." Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 37323-37331. https://doi.org/10.1609/aaai.v40i44.41064
AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
Published in arXiv preprint, 2026
We propose AgentDoG, a diagnostic guardrail framework that provides fine-grained and contextual monitoring across agent trajectories, diagnosing root causes of unsafe actions.
Recommended citation: Dongrui Liu, ..., Yuejin Xie, et al. (2026). "AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security." arXiv preprint arXiv:2601.18491. https://arxiv.org/abs/2601.18491
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5
Published in arXiv preprint, 2026
We present an updated technical risk analysis for frontier AI, covering cyber offense, persuasion and manipulation, strategic deception, uncontrolled AI R&D, and self-replication.
Recommended citation: Dongrui Liu, Yi Yu, Jie Zhang, Guanxu Chen, Qihao Lin, Hanxi Zhu, Lige Huang, Yijin Zhou, Peng Wang, Shuai Shao, Boxuan Zhang, Zicheng Liu, Jingwei Sun, Yu Li, Yuejin Xie, et al. (2026). "Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5." arXiv preprint arXiv:2602.14457. https://arxiv.org/abs/2602.14457
Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?
Published in arXiv preprint, 2026
We propose a multi-agent framework that leverages code agents to autonomously evolve existing math problems into more complex variants while validating solvability and increased difficulty.
Recommended citation: Dadi Guo*, Yuejin Xie*, Qingyu Liu, Jiayu Liu, Zhiyuan Fan, Qihan Ren, Shuai Shao, Tianyi Zhou, Dongrui Liu, Yi R. Fung. (2026). "Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?" arXiv preprint arXiv:2603.03202. https://arxiv.org/abs/2603.03202
ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis
Published in arXiv preprint, 2026
We introduce ATBench, a trajectory-level benchmark for structured, diverse, and realistic evaluation of LLM-based agent safety.
Recommended citation: Yu Li*, Haoyu Luo*, Yuejin Xie*, Yuqian Fu, Zhonghao Yang, Shuai Shao, Qihan Ren, Wanying Qu, Yanwei Fu, Yujiu Yang, Jing Shao, Xia Hu, Dongrui Liu. (2026). "ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis." arXiv preprint arXiv:2604.02022. https://arxiv.org/abs/2604.02022
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability
Published in arXiv preprint, 2026
We revisit whether reasoning SFT generalizes, showing that cross-domain gains depend on optimization dynamics, data quality, and base-model capability.
Recommended citation: Qihan Ren, Peng Wang, Ruikun Cai, Shuai Shao, Dadi Guo, Yuejin Xie, Yafu Li, Quanshi Zhang, Xia Hu, Jing Shao, Dongrui Liu. (2026). "Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability." arXiv preprint arXiv:2604.06628. https://arxiv.org/abs/2604.06628
Benchmarks for Trajectory Safety Evaluation and Diagnosis in OpenClaw and Codex: ATBench-Claw and ATBench-Codex
Published in arXiv preprint, 2026
We extend ATBench to OpenClaw and OpenAI Codex / Codex-runtime settings for trajectory-level safety evaluation and diagnosis.
Recommended citation: Zhonghao Yang, Yu Li, Yanxu Zhu, Tianyi Zhou, Yuejin Xie, Haoyu Luo, Jing Shao, Xia Hu, Dongrui Liu. (2026). "Benchmarks for Trajectory Safety Evaluation and Diagnosis in OpenClaw and Codex: ATBench-Claw and ATBench-Codex." arXiv preprint arXiv:2604.14858. https://arxiv.org/abs/2604.14858
AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security
Published in arXiv preprint, 2026
We propose AgentDoG 1.5, a lightweight and scalable agent safety alignment framework for real-time moderation across diverse interactive agentic scenarios.
Recommended citation: Dongrui Liu, Yu Li, Zhonghao Yang, Peng Wang, Guanxu Chen, Yuejin Xie, Qinghua Mao, Wanying Qu, Yanxu Zhu, Tianyi Zhou, et al. (2026). "AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security." arXiv preprint arXiv:2605.29801. https://arxiv.org/abs/2605.29801
PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning
Published in arXiv preprint, 2026
We introduce PaSBench-Video, a 740-video streaming benchmark for proactive safety warning with frame-level risk onset and accident boundary annotations.
Recommended citation: Yusong Zhao*, Yuejin Xie*, Youliang Yuan, Junjie Hu, Jitian Guo, Yujiu Yang, Pinjia He. (2026). "PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning." arXiv preprint arXiv:2606.02443. https://arxiv.org/abs/2606.02443
