Benchmarks for Trajectory Safety Evaluation and Diagnosis in OpenClaw and Codex: ATBench-Claw and ATBench-Codex
Published as an arXiv preprint, 2026
This report presents ATBench-Claw and ATBench-Codex, two domain-customized benchmark extensions for trajectory-level safety evaluation and diagnosis in OpenClaw and OpenAI Codex / Codex-runtime settings.
Recommended citation: Zhonghao Yang, Yu Li, Yanxu Zhu, Tianyi Zhou, Yuejin Xie, Haoyu Luo, Jing Shao, Xia Hu, Dongrui Liu. (2026). "Benchmarks for Trajectory Safety Evaluation and Diagnosis in OpenClaw and Codex: ATBench-Claw and ATBench-Codex." arXiv preprint arXiv:2604.14858. https://arxiv.org/abs/2604.14858
