Aleksei Dorkin PRO

adorkin

AI & ML interests

Computational Linguistics

Recent Activity

liked a model about 16 hours ago

kyutai/helium-1-2b

liked a model about 16 hours ago

speakleash/Bielik-11B-v3.0-Instruct

liked a Space about 16 hours ago

ZurichNLP/subword-tokenization

upvoted 2 articles 3 days ago

Article

Community Evals: Because we're done trusting black-box leaderboards over the community

5 days ago • 44

Article

🇵🇭 FilBench - Can LLMs Understand and Generate Filipino?

Aug 12, 2025 • 23

upvoted a changelog 3 days ago

Changelog

Community Evals and Benchmark Repositories

4 days ago • 32

upvoted a collection 4 days ago

Instruction Pretrained Experiments

Experiments associated with the paper 'Continued Pretraining and Interpretability-Based Evaluation for Low-Resource Languages: A Galician Case Study' • 3 items • Updated Dec 11, 2025 • 1

upvoted a paper 7 days ago

Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text

Paper • 2601.22975 • Published 10 days ago • 87

upvoted 3 collections 9 days ago

Open Coding Agents

11 items • Updated 6 days ago • 45

MMFineReason

High-quality STEM reasoning dataset for Multimodal LLM post-training. • 14 items • Updated 6 days ago • 20

Continually pre-trained models

Language-specific LLMs continually pre-trained from fully open English base models • 2 items • Updated 19 days ago • 1

upvoted a changelog 10 days ago

Changelog

View Running Jobs Count from the User Menu

11 days ago • 42

upvoted a collection 11 days ago

MS MARCO Mined Triplets

These datasets contain MS MARCO Triplets gathered by mining hard negatives using various models. Each dataset has various subsets. • 16 items • Updated 11 days ago • 13

upvoted a collection 12 days ago

Trinity-Large

5 items • Updated 5 days ago • 39

upvoted an article 14 days ago

Article

Why Your AI Strategy Needs Hugging Face Storage

14 days ago • 12

upvoted a changelog 17 days ago

Changelog

Sort Datasets by Size

17 days ago • 79

upvoted 2 changelogs 18 days ago

Changelog

MLX Hardware Compatibility

18 days ago • 44

Changelog

Sort Models by Parameter Size

18 days ago • 33

upvoted a paper 18 days ago

Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles

Paper • 2505.19914 • Published May 26, 2025 • 46

upvoted a collection 21 days ago

Jan 12 Releases

36 items • Updated 21 days ago • 2

upvoted 3 collections 24 days ago

MetricX-24

A collection of MetricX-24 models (https://aclanthology.org/2024.wmt-1.35/) • 6 items • Updated Jul 10, 2025 • 11

Llama-Embed-Nemotron-8B

State-of-the-Art Text Embedding Model • 3 items • Updated 5 days ago • 4

TranslateGemma

3 items • Updated 25 days ago • 206