Abstract
AI agents combining LLMs with external tools face security risks from improper permissions, which AgenTRIM addresses through offline reconstruction and online enforcement of least-privilege tool access.
AI agents are autonomous systems that combine LLMs with external tools to solve complex tasks. While such tools extend capability, improper tool permissions introduce security risks such as indirect prompt injection and tool misuse. We characterize these failures as unbalanced tool-driven agency. Agents may retain unnecessary permissions (excessive agency) or fail to invoke required tools (insufficient agency), amplifying the attack surface and reducing performance. We introduce AgenTRIM, a framework for detecting and mitigating tool-driven agency risks without altering an agent's internal reasoning. AgenTRIM addresses these risks through complementary offline and online phases. Offline, AgenTRIM reconstructs and verifies the agent's tool interface from code and execution traces. At runtime, it enforces per-step least-privilege tool access through adaptive filtering and status-aware validation of tool calls. Evaluated on the AgentDojo benchmark, AgenTRIM substantially reduces attack success rates while maintaining high task performance. Additional experiments show robustness to description-based attacks and effective enforcement of explicit safety policies. Together, these results demonstrate that AgenTRIM provides a practical, capability-preserving approach to safer tool use in LLM-based agents.
Community
The “unbalanced tool-driven agency” framing is very practical. In deployed agents, the hard problem is not only detecting malicious text; it is deciding whether a specific tool should be available for this task, with these inputs, at this moment.
We have been building Armorer Guard as a complementary local runtime signal for that decision point. It is Rust-native and returns JSON scores/reasons for prompt injection, sensitive-data requests, exfiltration-style text, destructive-command risk, safety bypass, and system-prompt extraction. The useful pattern seems to be: least-privilege tool scopes first, deterministic policy for hard boundaries, semantic risk scoring for gray areas, and traceable verdicts for replay/evals.
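For concreteness, here is a minimal Rust sketch of that pattern. The verdict fields, thresholds, and the `decide` function are illustrative assumptions on my part, not Armorer Guard's actual schema or API:

```rust
// Layered decision sketch, assuming a hypothetical scanner verdict schema;
// field names and thresholds are illustrative, NOT Armorer Guard's real output.
// Cargo deps: serde = { version = "1", features = ["derive"] }, serde_json = "1"
use serde::Deserialize;

#[derive(Deserialize)]
struct Verdict {
    prompt_injection: f64,
    data_exfiltration: f64,
    destructive_command: f64,
    reasons: Vec<String>,
}

#[derive(Debug)]
enum Decision {
    Allow,
    Deny(String),
    Review(String),
}

fn decide(tool: &str, allowed_tools: &[&str], verdict_json: &str) -> Decision {
    // Layer 1: least-privilege scope -- the tool must be in the task's allow-list.
    if !allowed_tools.contains(&tool) {
        return Decision::Deny(format!("tool '{tool}' outside task scope"));
    }

    // Fail closed if the scanner output cannot be parsed.
    let v: Verdict = match serde_json::from_str(verdict_json) {
        Ok(v) => v,
        Err(e) => return Decision::Deny(format!("unreadable verdict: {e}")),
    };

    // Layer 2: deterministic policy for hard boundaries.
    if v.destructive_command > 0.9 {
        return Decision::Deny("destructive-command risk above hard limit".into());
    }

    // Layer 3: semantic risk scoring for the gray areas.
    if v.prompt_injection.max(v.data_exfiltration) > 0.5 {
        return Decision::Review(v.reasons.join("; "));
    }

    Decision::Allow
}

fn main() {
    let verdict = r#"{"prompt_injection":0.7,"data_exfiltration":0.1,"destructive_command":0.0,"reasons":["injection-style phrasing in tool args"]}"#;
    match decide("send_email", &["read_inbox", "send_email"], verdict) {
        Decision::Allow => println!("allow"),
        Decision::Deny(r) => println!("deny: {r}"),
        Decision::Review(r) => println!("review: {r}"),
    }
}
```

The ordering is the point: the deterministic checks are cheap and non-bypassable, so the semantic scorer only ever adjudicates calls that are already in scope.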
Demo: https://huggingface.co/spaces/armorer-labs/armorer-guard-demo
Repo: https://github.com/ArmorerLabs/Armorer-Guard
AgenTRIM plus a fast scanner at the action boundary feels like a strong layered shape: shrink the available tool surface, then inspect the few actions that still remain possible.
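One way to sketch that layering, with the caveat that the `ActionScanner` trait, `StepGate`, and threshold here are hypothetical placeholders rather than the AgenTRIM or Armorer Guard APIs: shrink the per-step tool scope first, then hand only the surviving calls to a scanner.

```rust
// Illustrative composition of the two layers; names and types are assumptions.
use std::collections::HashSet;

trait ActionScanner {
    fn risk_score(&self, raw_call: &str) -> f64;
}

struct StepGate<'a, S: ActionScanner> {
    step_scope: HashSet<&'a str>, // tools this step may use at all
    scanner: S,                   // inspects the few calls that remain possible
    block_above: f64,             // illustrative risk threshold
}

impl<'a, S: ActionScanner> StepGate<'a, S> {
    fn permit(&self, tool: &str, raw_call: &str) -> Result<(), String> {
        // Layer 1: shrink the surface -- out-of-scope tools never reach the scanner.
        if !self.step_scope.contains(tool) {
            return Err(format!("'{tool}' not in this step's scope"));
        }
        // Layer 2: inspect what remains.
        let risk = self.scanner.risk_score(raw_call);
        if risk > self.block_above {
            return Err(format!("semantic risk {risk:.2} over threshold"));
        }
        Ok(())
    }
}

// Toy keyword heuristic standing in for a real scanner, just for the demo.
struct KeywordScanner;
impl ActionScanner for KeywordScanner {
    fn risk_score(&self, raw_call: &str) -> f64 {
        if raw_call.contains("ignore previous instructions") { 0.95 } else { 0.05 }
    }
}

fn main() {
    let gate = StepGate {
        step_scope: ["send_email"].into_iter().collect(),
        scanner: KeywordScanner,
        block_above: 0.8,
    };
    println!("{:?}", gate.permit("delete_files", "delete_files(path=...)"));
    println!("{:?}", gate.permit("send_email", "send_email(to=bob@example.com)"));
}
```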