Gemma 4 E4B — Abliterated

Google's Gemma 4 E4B with refusal behavior largely removed via directional ablation using Heretic. No fine-tuning was involved: the weights were modified directly to suppress the refusal direction in activation space.
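As a rough illustration of what directional ablation does to a single weight matrix, the sketch below removes the part of the matrix's output that lies along a given direction. This is a minimal pure-Python sketch, not Heretic's implementation: the helper names (`normalize`, `ablate_direction`, `apply`, `dot`), the toy matrix, and the example direction are all hypothetical, and it assumes the matrix writes into the residual stream with shape (d_out, d_in).

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def apply(W, x):
    """Matrix-vector product W @ x, with W given as a list of rows."""
    return [dot(row, x) for row in W]

def ablate_direction(W, direction, weight=1.0):
    """Return W - weight * r (r^T W), where r is the unit-normalized direction.

    W has shape (d_out, d_in); direction has length d_out.
    """
    r = normalize(direction)
    d_out, d_in = len(W), len(W[0])
    # r^T W: how strongly each input column writes along r.
    proj = [sum(r[i] * W[i][j] for i in range(d_out)) for j in range(d_in)]
    return [[W[i][j] - weight * r[i] * proj[j] for j in range(d_in)]
            for i in range(d_out)]

# Toy example (hypothetical values, chosen only for illustration).
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
refusal_dir = [1.0, 1.0, 0.0]
x = [0.5, -2.0]

r = normalize(refusal_dir)
before = dot(r, apply(W, x))
after = dot(r, apply(ablate_direction(W, refusal_dir, weight=1.0), x))
print(before, after)
```

With `weight=1.0` the output component along the direction is removed entirely. More generally the remaining component is `(1 - weight)` times the original, so a weight above 1 flips its sign rather than just zeroing it.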

Results

Metric          Base Gemma 4 E4B    Abliterated
Refusal rate    41.2% (40/97)       2.0% (1/50)
KL divergence   n/a                 0.034
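For readers unfamiliar with the two metrics, here is a small sketch of how such numbers can be computed. It assumes the KL figure compares the base model's next-token probability distribution with the modified model's on the same prompts; the direction of the divergence and the toy distributions `p_base` and `p_ablated` are assumptions made for illustration, not values from this evaluation.

```python
import math

def refusal_rate(refusals, total):
    """Percentage of prompts that were refused."""
    return 100.0 * refusals / total

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same tokens.

    Assumes q is nonzero wherever p is nonzero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Refusal rates recomputed from the counts reported in the table.
print(round(refusal_rate(40, 97), 1))
print(round(refusal_rate(1, 50), 1))

# Hypothetical next-token distributions over a 4-token vocabulary.
p_base = [0.70, 0.20, 0.05, 0.05]
p_ablated = [0.65, 0.22, 0.08, 0.05]
print(kl_divergence(p_base, p_ablated))
```

A KL divergence of 0 means the two distributions are identical, so a small value suggests the modification changed the model's output distribution only slightly on the prompts measured.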

Abliteration Details

  • Tool: Heretic v1.1.0
  • Hardware: NVIDIA DGX Spark (GB10 Blackwell, 128GB unified memory)
  • Time: ~6 hours
  • Trials: 30 (10 random + 20 TPE-guided via Optuna)
  • Best trial: #21 — 9/100 refusals, 0.034 KL divergence
  • Components modified: attn.o_proj + mlp.down_proj across all 42 layers
  • Parameters:
    • Direction index: per layer
    • attn.o_proj weights: 0.73–1.49 (peak at layer position 24.83)
    • mlp.down_proj weights: 0.48–1.11 (peak at layer position 35.18)
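One way to read the parameters above is as a per-layer weight profile: each component gets its maximum ablation weight at the (fractional) peak position and smaller weights further away. The sketch below assumes a linear falloff clamped at the minimum weight. The falloff distance is not reported here, so `falloff=20.0` is a hypothetical value, and the function `layer_weight` is illustrative rather than Heretic's actual kernel.

```python
NUM_LAYERS = 42

def layer_weight(layer, peak_pos, max_w, min_w, falloff):
    """Ablation weight for one layer under an assumed linear falloff.

    The weight equals max_w at peak_pos and decreases linearly with distance,
    reaching min_w at `falloff` layers away and staying there beyond that.
    """
    distance = abs(layer - peak_pos)
    if distance >= falloff:
        return min_w
    return max_w - (max_w - min_w) * distance / falloff

# attn.o_proj values from the list above; falloff is a hypothetical choice.
o_proj_profile = [
    layer_weight(i, peak_pos=24.83, max_w=1.49, min_w=0.73, falloff=20.0)
    for i in range(NUM_LAYERS)
]

# mlp.down_proj values from the list above; falloff is again hypothetical.
down_proj_profile = [
    layer_weight(i, peak_pos=35.18, max_w=1.11, min_w=0.48, falloff=20.0)
    for i in range(NUM_LAYERS)
]

for i in (0, 12, 25, 35, 41):
    print(i, round(o_proj_profile[i], 3), round(down_proj_profile[i], 3))
```

Because the peak position is fractional, no layer receives exactly the maximum weight; the nearest integer layer gets the largest value in the profile.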

Acknowledgments
