Abstract
AI agents combining LLMs with external tools face security risks from improper permissions, which AgenTRIM addresses through offline reconstruction and online enforcement of least-privilege tool access.
AI agents are autonomous systems that combine LLMs with external tools to solve complex tasks. While such tools extend capability, improper tool permissions introduce security risks such as indirect prompt injection and tool misuse. We characterize these failures as unbalanced tool-driven agency. Agents may retain unnecessary permissions (excessive agency) or fail to invoke required tools (insufficient agency), amplifying the attack surface and reducing performance. We introduce AgenTRIM, a framework for detecting and mitigating tool-driven agency risks without altering an agent's internal reasoning. AgenTRIM addresses these risks through complementary offline and online phases. Offline, AgenTRIM reconstructs and verifies the agent's tool interface from code and execution traces. At runtime, it enforces per-step least-privilege tool access through adaptive filtering and status-aware validation of tool calls. Evaluated on the AgentDojo benchmark, AgenTRIM substantially reduces attack success rates while maintaining high task performance. Additional experiments show robustness to description-based attacks and effective enforcement of explicit safety policies. Together, these results demonstrate that AgenTRIM provides a practical, capability-preserving approach to safer tool use in LLM-based agents.
Community
The “unbalanced tool-driven agency” framing is very practical. In deployed agents, the hard problem is not only detecting malicious text; it is deciding whether a specific tool should be available for this task, with these inputs, at this moment.
We have been building Armorer Guard as a complementary local runtime signal for that decision point. It is Rust-native and returns JSON scores/reasons for prompt injection, sensitive-data requests, exfiltration-style text, destructive-command risk, safety bypass, and system-prompt extraction. The useful pattern seems to be: least-privilege tool scopes first, deterministic policy for hard boundaries, semantic risk scoring for gray areas, and traceable verdicts for replay/evals.
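For concreteness, here is a minimal Rust sketch of that pattern. The verdict fields, thresholds, and the `decide` function are illustrative assumptions on my part, not Armorer Guard's actual schema or API:

```rust
// Layered decision sketch, assuming a hypothetical scanner verdict schema;
// field names and thresholds are illustrative, NOT Armorer Guard's real output.
// Cargo deps: serde = { version = "1", features = ["derive"] }, serde_json = "1"
use serde::Deserialize;

#[derive(Deserialize)]
struct Verdict {
    prompt_injection: f64,
    data_exfiltration: f64,
    destructive_command: f64,
    reasons: Vec<String>,
}

#[derive(Debug)]
enum Decision {
    Allow,
    Deny(String),
    Review(String),
}

fn decide(tool: &str, allowed_tools: &[&str], verdict_json: &str) -> Decision {
    // Layer 1: least-privilege scope -- the tool must be in the task's allow-list.
    if !allowed_tools.contains(&tool) {
        return Decision::Deny(format!("tool '{tool}' outside task scope"));
    }

    // Fail closed if the scanner output cannot be parsed.
    let v: Verdict = match serde_json::from_str(verdict_json) {
        Ok(v) => v,
        Err(e) => return Decision::Deny(format!("unreadable verdict: {e}")),
    };

    // Layer 2: deterministic policy for hard boundaries.
    if v.destructive_command > 0.9 {
        return Decision::Deny("destructive-command risk above hard limit".into());
    }

    // Layer 3: semantic risk scoring for the gray areas.
    if v.prompt_injection.max(v.data_exfiltration) > 0.5 {
        return Decision::Review(v.reasons.join("; "));
    }

    Decision::Allow
}

fn main() {
    let verdict = r#"{"prompt_injection":0.7,"data_exfiltration":0.1,"destructive_command":0.0,"reasons":["injection-style phrasing in tool args"]}"#;
    match decide("send_email", &["read_inbox", "send_email"], verdict) {
        Decision::Allow => println!("allow"),
        Decision::Deny(r) => println!("deny: {r}"),
        Decision::Review(r) => println!("review: {r}"),
    }
}
```

The ordering is the point: the deterministic checks are cheap and non-bypassable, so the semantic scorer only ever adjudicates calls that are already in scope.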
Demo: https://huggingface.co/spaces/armorer-labs/armorer-guard-demo
Repo: https://github.com/ArmorerLabs/Armorer-Guard
AgenTRIM plus a fast scanner at the action boundary feels like a strong layered shape: shrink the available tool surface, then inspect the few actions that still remain possible.
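One way to sketch that layering, with the caveat that the `ActionScanner` trait, `StepGate`, and threshold here are hypothetical placeholders rather than the AgenTRIM or Armorer Guard APIs: shrink the per-step tool scope first, then hand only the surviving calls to a scanner.

```rust
// Illustrative composition of the two layers; names and types are assumptions.
use std::collections::HashSet;

trait ActionScanner {
    fn risk_score(&self, raw_call: &str) -> f64;
}

struct StepGate<'a, S: ActionScanner> {
    step_scope: HashSet<&'a str>, // tools this step may use at all
    scanner: S,                   // inspects the few calls that remain possible
    block_above: f64,             // illustrative risk threshold
}

impl<'a, S: ActionScanner> StepGate<'a, S> {
    fn permit(&self, tool: &str, raw_call: &str) -> Result<(), String> {
        // Layer 1: shrink the surface -- out-of-scope tools never reach the scanner.
        if !self.step_scope.contains(tool) {
            return Err(format!("'{tool}' not in this step's scope"));
        }
        // Layer 2: inspect what remains.
        let risk = self.scanner.risk_score(raw_call);
        if risk > self.block_above {
            return Err(format!("semantic risk {risk:.2} over threshold"));
        }
        Ok(())
    }
}

// Toy keyword heuristic standing in for a real scanner, just for the demo.
struct KeywordScanner;
impl ActionScanner for KeywordScanner {
    fn risk_score(&self, raw_call: &str) -> f64 {
        if raw_call.contains("ignore previous instructions") { 0.95 } else { 0.05 }
    }
}

fn main() {
    let gate = StepGate {
        step_scope: ["send_email"].into_iter().collect(),
        scanner: KeywordScanner,
        block_above: 0.8,
    };
    println!("{:?}", gate.permit("delete_files", "delete_files(path=...)"));
    println!("{:?}", gate.permit("send_email", "send_email(to=bob@example.com)"));
}
```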