Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.

Pages

Posts

Portfolio

Publications

Towards Evaluating Proactive Risk Awareness of Multimodal Language Models

Published in NeurIPS 2025 D&B Track, 2025

We propose a benchmark for evaluating proactive risk awareness in multimodal language models.

Recommended citation: Youliang Yuan, Wenxiang Jiao, Yuejin Xie, Chihao Shen, Menghan Tian, Wenxuan Wang, Jen-tse Huang, Pinjia He. (2025). "Towards Evaluating Proactive Risk Awareness of Multimodal Language Models." NeurIPS 2025 Datasets and Benchmarks Track. https://arxiv.org/abs/2505.17455

ToolSafety: A Comprehensive Dataset for Enhancing Safety in LLM-Based Agent Tool Invocations

Published in EMNLP 2025, 2025

We introduce ToolSafety, a safety fine-tuning dataset containing 5,668 direct harm samples, 4,311 indirect harm samples, and 4,311 multi-step samples to address safety vulnerabilities in tool-using AI systems.

Recommended citation: Yuejin Xie, Youliang Yuan, Wenxuan Wang, Fan Mo, Jianmin Guo, Pinjia He. (2025). "ToolSafety: A Comprehensive Dataset for Enhancing Safety in LLM-Based Agent Tool Invocations." Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP). https://aclanthology.org/2025.emnlp-main.714/

A Multi-Agent Conversational Bandit Approach to Online Evaluation and Selection of User-Aligned LLM Responses

Published in AAAI 2026, 2026

We develop a novel bandit algorithm for rapidly identifying user preferences to improve LLM responses.

Recommended citation: Xiangxiang Dai, Yuejin Xie, Maoli Liu, Xuchuang Wang, Zhuohua Li, Huanyu Wang, John C.S. Lui. (2026). "A Multi-Agent Conversational Bandit Approach to Online Evaluation and Selection of User-Aligned LLM Responses." Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 37323-37331. https://doi.org/10.1609/aaai.v40i44.41064

Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5

Published in arXiv preprint, 2026

We present an updated technical risk analysis for frontier AI, covering cyber offense, persuasion and manipulation, strategic deception, uncontrolled AI R&D, and self-replication.

Recommended citation: Dongrui Liu, Yi Yu, Jie Zhang, Guanxu Chen, Qihao Lin, Hanxi Zhu, Lige Huang, Yijin Zhou, Peng Wang, Shuai Shao, Boxuan Zhang, Zicheng Liu, Jingwei Sun, Yu Li, Yuejin Xie, et al. (2026). "Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5." arXiv preprint arXiv:2602.14457. https://arxiv.org/abs/2602.14457

Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

Published in arXiv preprint, 2026

We propose a multi-agent framework that leverages code agents to autonomously evolve existing math problems into more complex variants while validating solvability and increased difficulty.

Recommended citation: Dadi Guo*, Yuejin Xie*, Qingyu Liu, Jiayu Liu, Zhiyuan Fan, Qihan Ren, Shuai Shao, Tianyi Zhou, Dongrui Liu, Yi R. Fung. (2026). "Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?" arXiv preprint arXiv:2603.03202. https://arxiv.org/abs/2603.03202

ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis

Published in arXiv preprint, 2026

We introduce ATBench, a trajectory-level benchmark for structured, diverse, and realistic evaluation of LLM-based agent safety.

Recommended citation: Yu Li*, Haoyu Luo*, Yuejin Xie*, Yuqian Fu, Zhonghao Yang, Shuai Shao, Qihan Ren, Wanying Qu, Yanwei Fu, Yujiu Yang, Jing Shao, Xia Hu, Dongrui Liu. (2026). "ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis." arXiv preprint arXiv:2604.02022. https://arxiv.org/abs/2604.02022

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

Published in arXiv preprint, 2026

We revisit whether reasoning SFT generalizes, showing that cross-domain gains depend on optimization dynamics, data quality, and base-model capability.

Recommended citation: Qihan Ren, Peng Wang, Ruikun Cai, Shuai Shao, Dadi Guo, Yuejin Xie, Yafu Li, Quanshi Zhang, Xia Hu, Jing Shao, Dongrui Liu. (2026). "Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability." arXiv preprint arXiv:2604.06628. https://arxiv.org/abs/2604.06628

Benchmarks for Trajectory Safety Evaluation and Diagnosis in OpenClaw and Codex: ATBench-Claw and ATBench-Codex

Published in arXiv preprint, 2026

We extend ATBench to OpenClaw and OpenAI Codex / Codex-runtime settings for trajectory-level safety evaluation and diagnosis.

Recommended citation: Zhonghao Yang, Yu Li, Yanxu Zhu, Tianyi Zhou, Yuejin Xie, Haoyu Luo, Jing Shao, Xia Hu, Dongrui Liu. (2026). "Benchmarks for Trajectory Safety Evaluation and Diagnosis in OpenClaw and Codex: ATBench-Claw and ATBench-Codex." arXiv preprint arXiv:2604.14858. https://arxiv.org/abs/2604.14858

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

Published in arXiv preprint, 2026

We propose AgentDoG 1.5, a lightweight and scalable agent safety alignment framework for real-time moderation across diverse interactive agentic scenarios.

Recommended citation: Dongrui Liu, Yu Li, Zhonghao Yang, Peng Wang, Guanxu Chen, Yuejin Xie, Qinghua Mao, Wanying Qu, Yanxu Zhu, Tianyi Zhou, et al. (2026). "AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security." arXiv preprint arXiv:2605.29801. https://arxiv.org/abs/2605.29801

PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning

Published in arXiv preprint, 2026

We introduce PaSBench-Video, a 740-video streaming benchmark for proactive safety warning with frame-level risk onset and accident boundary annotations.

Recommended citation: Yusong Zhao*, Yuejin Xie*, Youliang Yuan, Junjie Hu, Jitian Guo, Yujiu Yang, Pinjia He. (2026). "PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning." arXiv preprint arXiv:2606.02443. https://arxiv.org/abs/2606.02443

Talks

Teaching