Sitemap

A list of all the posts and pages found on the site. For you robots out there is an XML version available for digesting as well.

Pages

Posts

portfolio

publications

Towards Evaluating Proactive Risk Awareness of Multimodal Language Models

Published in NeurIPS 2025 D&B Track, 2025

We propose a benchmark for evaluating proactive risk awareness in multimodal language models.

Recommended citation: Youliang Yuan, Wenxiang Jiao, Yuejin Xie, Chihao Shen, Menghan Tian, Wenxuan Wang, Jen-tse Huang, Pinjia He. (2025). "Towards Evaluating Proactive Risk Awareness of Multimodal Language Models." NeurIPS 2025 Datasets and Benchmarks Track. https://arxiv.org/abs/2505.17455

ToolSafety: A Comprehensive Dataset for Enhancing Safety in LLM-Based Agent Tool Invocations

Published in EMNLP 2025, 2025

We introduce ToolSafety, a safety fine-tuning dataset containing 5,668 direct harm samples, 4,311 indirect harm samples, and 4,311 multi-step samples to address safety vulnerabilities in tool-using AI systems.

Recommended citation: Yuejin Xie, Youliang Yuan, Wenxuan Wang, Fan Mo, Jianmin Guo, Pinjia He. (2025). "ToolSafety: A Comprehensive Dataset for Enhancing Safety in LLM-Based Agent Tool Invocations." Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP). https://aclanthology.org/2025.emnlp-main.714/

Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

Published in arXiv preprint, 2026

We propose a multi-agent framework that leverages code agents to autonomously evolve existing math problems into more complex variants while validating solvability and increased difficulty.

Recommended citation: Dadi Guo*, Yuejin Xie*, Qingyu Liu, Jiayu Liu, Zhiyuan Fan, Qihan Ren, Shuai Shao, Tianyi Zhou, Dongrui Liu, Yi R. Fung. (2026). "Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?" arXiv preprint arXiv:2603.03202. https://arxiv.org/abs/2603.03202

talks

teaching