AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

Published as an arXiv preprint, 2026

Core Contributor

We present a unified three-dimensional taxonomy that categorizes agentic risks along three orthogonal axes: source (where the risk originates), failure mode (how the agent fails), and consequence (what harm results). Building on this taxonomy, we introduce ATBench, a fine-grained agentic safety benchmark, and AgentDoG, a diagnostic guardrail framework that monitors agent trajectories at a fine-grained, context-aware level and diagnoses the root causes of unsafe behavior. AgentDoG variants are available in three sizes (4B, 7B, and 8B parameters) across the Qwen and Llama model families.
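As a rough illustration of what a three-axis label could look like in practice, the sketch below models one risk annotation as a record with a source, a failure mode, and a consequence, and tallies annotations along any single axis. This is only a minimal sketch under stated assumptions: the class `RiskLabel`, the helper `tally_by_axis`, and every category string are hypothetical placeholders and are not taken from the paper, ATBench, or the AgentDoG code.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical record type: the three field names mirror the axes named in
# the abstract, but the class itself is an assumption for illustration.
@dataclass(frozen=True)
class RiskLabel:
    source: str        # where the risk originates
    failure_mode: str  # how the agent fails
    consequence: str   # what harm results

AXES = ("source", "failure_mode", "consequence")

def tally_by_axis(labels, axis):
    """Count labels along one axis, independently of the other two."""
    if axis not in AXES:
        raise ValueError(f"unknown axis: {axis!r}")
    return Counter(getattr(label, axis) for label in labels)

# Placeholder category values, invented for this example only.
labels = [
    RiskLabel("tool_output", "unsafe_action", "data_leak"),
    RiskLabel("user_prompt", "unsafe_action", "financial_loss"),
    RiskLabel("tool_output", "over_trust", "data_leak"),
]

by_source = tally_by_axis(labels, "source")
by_failure = tally_by_axis(labels, "failure_mode")
```

Because the axes are orthogonal, each one can be counted or filtered without reference to the other two, which is the property the sketch is meant to show.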

[Code]

Recommended citation: Dongrui Liu, ..., Yuejin Xie, et al. (2026). "AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security." arXiv preprint arXiv:2601.18491. https://arxiv.org/abs/2601.18491