Open to Collab

s3nh PRO

s3nh

s3nhxx
s3nh

AI & ML interests

Quantization, LLMs, Deep Learning for good. Follow me if you like my work. Patreon.com/s3nh

Recent Activity

liked a model 3 days ago

merve/rf-detr-mobile-ui

replied to their post 3 days ago

Existing methods (GPTQ, AWQ, llama.cpp's k-quants) minimize empirical loss heuristically, and none of them is proven optimal in any information-theoretic sense. ICRB-Q builds a quantization scheme that is provably optimal via the Cramér-Rao lower bound (CRB): no unbiased estimator of the weights θ can have covariance below [F(θ)]⁻¹ (in the positive-semidefinite order), where F is the Fisher information matrix.
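For reference, this is the standard multivariate form of the bound cited above. The notation is assumed here rather than taken from the post: \hat{\theta} is any unbiased estimator of θ, and p(x; θ) is the model likelihood.

```latex
\operatorname{Cov}_\theta(\hat{\theta}) \succeq F(\theta)^{-1},
\qquad
F(\theta) = \mathbb{E}_{x \sim p(\cdot;\theta)}\!\left[
  \nabla_\theta \log p(x;\theta)\,
  \nabla_\theta \log p(x;\theta)^{\top}
\right]
```

Here ⪰ means the difference of the two sides is positive semidefinite. For a single scalar parameter it reduces to Var(\hat{\theta}) ≥ 1 / F(θ).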

posted an update 3 days ago


liked a model 3 days ago

merve/rf-detr-mobile-ui

Object Detection • 33.4M • Updated 5 days ago • 37 • 1

replied to their post 3 days ago

Standard quantization places levels on a uniform grid. ICRB-Q places them on geodesics of the Fisher-Rao statistical manifold — the Riemannian manifold (M, g_F) where the metric tensor is the Fisher information. This means:

- High-Fisher-curvature regions (where small weight changes cause large output changes) get exponentially denser levels.
- Low-curvature, "flat" regions (e.g. many heads in early transformer layers) get coarse 2-bit or 3-bit quantization automatically.
- The codebook construction reduces to solving: place 2^b points in parameter space to minimize expected geodesic distance from any weight to its nearest level.

This strictly generalizes AWQ's per-channel scaling (which is a zero-order approximation to this manifold geometry) and GPTQ's second-order correction (which is a local linearization).
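The post does not spell out how the codebook is computed, so the following is only a rough sketch under stated assumptions, not ICRB-Q itself. It assumes a diagonal Fisher matrix and small distances, in which case the squared geodesic distance between a weight and a level is approximately the Fisher value times the squared difference. Minimizing the expected squared distance (squared, unlike the post's wording) then becomes a Fisher-weighted 1-D k-means (Lloyd) problem. The function name `fisher_weighted_codebook` and the toy Fisher values are hypothetical.

```python
import numpy as np

def fisher_weighted_codebook(weights, fisher, bits, iters=50):
    """Weighted Lloyd iteration in 1-D.

    Minimizes sum_i fisher[i] * (weights[i] - nearest_level)^2 over 2**bits levels.
    This is a local, diagonal-Fisher approximation (an assumption of this sketch).
    """
    w = np.asarray(weights, dtype=np.float64).ravel()
    f = np.asarray(fisher, dtype=np.float64).ravel()
    k = 2 ** bits
    # Initialize levels at evenly spaced quantiles of the weights.
    levels = np.quantile(w, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        # Assignment step: each weight goes to its nearest level.
        idx = np.abs(w[:, None] - levels[None, :]).argmin(axis=1)
        # Update step: each level becomes the Fisher-weighted mean of its cluster.
        mass = np.bincount(idx, weights=f, minlength=k)
        moment = np.bincount(idx, weights=f * w, minlength=k)
        nonempty = mass > 0
        new_levels = levels.copy()  # empty clusters keep their old level
        new_levels[nonempty] = moment[nonempty] / mass[nonempty]
        if np.allclose(new_levels, levels):
            break
        levels = new_levels
    levels = np.sort(levels)
    idx = np.abs(w[:, None] - levels[None, :]).argmin(axis=1)
    return levels, idx

# Toy usage: weights near zero are marked as highly sensitive (made-up values).
rng = np.random.default_rng(0)
w = rng.normal(size=4096)
fisher = 1.0 + 100.0 * (np.abs(w) < 0.1)
levels, idx = fisher_weighted_codebook(w, fisher, bits=3)
w_hat = levels[idx]  # dequantized weights
```

In this toy setup the levels tend to crowd toward the high-Fisher region around zero, which is the qualitative behavior the post describes. A faithful implementation would need the actual Fisher estimates and the true geodesic distance rather than this local quadratic stand-in.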

posted an update 3 days ago

Post

108

1 reply

liked 3 models 22 days ago

updated a model 23 days ago

s3nh/OProver-8B-Base-GGUF

8B • Updated 23 days ago • 73

published a model 23 days ago

s3nh/OProver-8B-Base-GGUF

8B • Updated 23 days ago • 73

updated a model 24 days ago

s3nh/BitCPM4-CANN-3B-GGUF

4B • Updated 24 days ago • 182 • 1

published a model 24 days ago

s3nh/BitCPM4-CANN-3B-GGUF

4B • Updated 24 days ago • 182 • 1

liked a model 24 days ago

crogers2287/Intern-S2-Preview-FP8-GGUF

Text Generation • Updated 25 days ago • 1

liked a model 25 days ago

OmerHagage/ltx2-ume-pixelart-lora

Text-to-Video • Updated 30 days ago • 2

reacted to codelion's post with 🔥 25 days ago

Post

3415

Scaling Pedagogical Pre-training to 10 Billion Tokens

New blog post exploring what happens when you take optimal data mixing insights and scale up the data generation itself.

We built Sutra, a multi-stage framework for generating pedagogical pre-training data guided by a knowledge graph of ~2,000 concepts across 9 domains. The pipeline includes structured content generation, six-dimension quality evaluation, diversity management across 20 content styles, and a cleaning stage to prevent collapse.

The result is codelion/sutra-10B, a 10.2 billion token pedagogical dataset with rich metadata (domain, complexity, prerequisites, quality scores) on every entry.

We trained codelion/SmolLM2-70M on it for 3 full epochs (30.6B tokens) on a single A10 GPU in ~78 hours.

Key finding: perplexity kept improving across epochs, but benchmark gains plateaued fast. At 70M parameters, the model hits a representational ceiling that more data alone can't break through.

Full writeup with comparisons against 7 other datasets, detailed benchmark breakdowns, and connections to recent work on synthetic data scaling, curriculum learning, and data mixing laws: https://huggingface.co/blog/codelion/scaling-pedagogical-pretraining-10-billion-tokens

All datasets at multiple scales (10M, 100M, 1B, 10B) plus seed concepts and an SFT variant are in the Sutra Pedagogical Datasets collection.

2 replies

reacted to blanchon's post with ❤️ 28 days ago

Post

2649

I'm releasing OpenCS2, an 11 TB dataset of around 5,000 hours of Counter-Strike gameplay recordings.
- HD resolution - 1280×720 · 32 fps
- For each frame: keyboard and mouse input plus world state (player position, velocity, weapon, ...)
- HD stereo audio
- All 10 players' perspectives

https://huggingface.co/collections/blanchon/opencs2