ImplicitMemBench: Measuring Unconscious Behavioral Adaptation in Large Language Models
Abstract
ImplicitMemBench is a benchmark for evaluating implicit memory in LLM agents through procedural memory, priming, and classical conditioning constructs, revealing significant performance gaps compared to human baselines.
Existing memory benchmarks for LLM agents evaluate explicit recall of facts, yet overlook implicit memory where experience becomes automated behavior without conscious retrieval. This gap is critical: effective assistants must automatically apply learned procedures or avoid failed actions without explicit reminders. We introduce ImplicitMemBench, the first systematic benchmark evaluating implicit memory through three cognitively grounded constructs drawn from standard cognitive-science accounts of non-declarative memory: Procedural Memory (one-shot skill acquisition after interference), Priming (theme-driven bias via paired experimental/control instances), and Classical Conditioning (Conditioned Stimulus--Unconditioned Stimulus (CS--US) associations shaping first decisions). Our 300-item suite employs a unified Learning/Priming-Interfere-Test protocol with first-attempt scoring. Evaluation of 17 models reveals severe limitations: no model exceeds 66% overall, with top performers DeepSeek-R1 (65.3%), Qwen3-32B (64.1%), and GPT-5 (63.0%) far below human baselines. Analysis uncovers dramatic asymmetries (inhibition 17.6% vs. preference 75.0%) and universal bottlenecks requiring architectural innovations beyond parameter scaling. ImplicitMemBench reframes evaluation from "what agents recall" to "what they automatically enact".
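For concreteness, here is a minimal sketch of how the unified Learning/Priming-Interfere-Test protocol with first-attempt scoring could be run against a chat model. The `ImplicitMemItem` fields and the `run_turn` interface are hypothetical illustrations assumed for this sketch; they are not the benchmark's released code.

```python
# Hypothetical sketch of the Learning/Priming -> Interfere -> Test protocol
# with first-attempt scoring, as described in the abstract. Names such as
# ImplicitMemItem and run_turn are illustrative assumptions, not the
# benchmark's actual API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ImplicitMemItem:
    learning_turns: List[str]      # experience phase: skill demo, priming pairs, or CS-US pairings
    interference_turns: List[str]  # unrelated filler placed between learning and test
    test_prompt: str               # probe with no explicit reminder of the learned material
    is_correct: Callable[[str], bool]  # checks whether the learned behavior was enacted


def evaluate(items: List[ImplicitMemItem],
             run_turn: Callable[[List[str]], str]) -> float:
    """Score each item on the model's *first* test response only (no retries)."""
    hits = 0
    for item in items:
        history: List[str] = []
        # Learning/Priming phase followed by interference, accumulated in one context.
        for turn in item.learning_turns + item.interference_turns:
            reply = run_turn(history + [turn])
            history += [turn, reply]
        # Test phase: only the first attempt counts toward the score.
        first_attempt = run_turn(history + [item.test_prompt])
        hits += int(item.is_correct(first_attempt))
    return hits / len(items)
```

A usage harness would construct 300 such items across the three constructs and report per-construct and overall first-attempt accuracy.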
Community
Beyond explicit recall: we benchmark whether LLMs can learn from experience and adapt behavior without conscious retrieval.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Locomo-Plus: Beyond-Factual Cognitive Memory Evaluation Framework for LLM Agents (2026)
- Anatomy of Agentic Memory: Taxonomy and Empirical Analysis of Evaluation and System Limitations (2026)
- MemoryCD: Benchmarking Long-Context User Memory of LLM Agents for Lifelong Cross-Domain Personalization (2026)
- Oblivion: Self-Adaptive Agentic Memory Control through Decay-Driven Activation (2026)
- AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment (2026)
- Memory-Driven Role-Playing: Evaluation and Enhancement of Persona Knowledge Utilization in LLMs (2026)
- MemoryArena: Benchmarking Agent Memory in Interdependent Multi-Session Agentic Tasks (2026)