bluelightai-dev/clt-eval-modernbert-tokenized
Viewer
• Updated • 328k • 60
bluelightai-dev/clt-train-modernbert-tokenized
Viewer
• Updated • 1.94M • 74
bluelightai-dev/clt-pretrain-data-v3-eval-tokenized-Qwen3-256
Viewer
• Updated • 212k • 17
bluelightai-dev/clt-pretrain-data-v3-tokenized-Qwen3-max-1024
Viewer
• Updated • 4.04M • 43
bluelightai-dev/clt-pretrain-data-v3-tokenized-qwen3
Viewer
• Updated • 1.81M • 33
bluelightai-dev/clt-pretrain-data-v3
Viewer
• Updated • 2.99M • 36
bluelightai-dev/dolma3_dolmino_mix-100B-1125-sample
Viewer
• Updated • 6.32M • 19
bluelightai-dev/dolma3_mix-150B-1025-sample
Viewer
• Updated • 4.97M • 30
bluelightai-dev/clt-mixed-eval-data-tokenized-Qwen3
Viewer
• Updated • 115k • 41
bluelightai-dev/clt-mixed-eval-data
Viewer
• Updated • 60k • 8
bluelightai-dev/clt-mixed-data-tokenized-Qwen3
Viewer
• Updated • 2.6M • 25
bluelightai-dev/clt-pretrain-eval-data-tokenized-Qwen3-256
Viewer
• Updated • 194k • 45
bluelightai-dev/clt-pretrain-data-dedup-tokenized-Qwen3-1024
Viewer
• Updated • 2.52M • 43
bluelightai-dev/clt-pretrain-data-v2-dedup
Preview
• Updated • 24
bluelightai-dev/clt-pretrain-data-tokenized-Qwen3-1024
Viewer
• Updated • 2.44M • 45
bluelightai-dev/clt-pretrain-data-v2
Preview
• Updated • 40
bluelightai-dev/MathPile_Commercial-formatted
Viewer
• Updated • 389k • 27
bluelightai-dev/clt_posttrain_data_tokenized
Viewer
• Updated • 1.34M • 59
bluelightai-dev/common-corpus-sample-open-web
Viewer
• Updated • 4.8M • 43
bluelightai-dev/common-corpus-sample-open-source
Viewer
• Updated • 2.02M • 14
bluelightai-dev/common-corpus-sample-open-science
Viewer
• Updated • 284k • 10
bluelightai-dev/common-corpus-sample-open-government
Viewer
• Updated • 373k • 39 • 1
bluelightai-dev/common-corpus-sample-open-culture
Viewer
• Updated • 462k • 22
bluelightai-dev/clt_posttrain_data_tokenized_test_1000
Viewer
• Updated • 1.22k • 6
bluelightai-dev/dclm-full-deduped-sample
Viewer
• Updated • 4.92M • 53
bluelightai-dev/the-stack-dedup-sample
Viewer
• Updated • 474k • 31
bluelightai-dev/pythia_clt_pretrain_data_tokenized
Viewer
• Updated • 3.5M • 62
bluelightai-dev/clt_eval_data_qwen3_tokenized_256
Viewer
• Updated • 245k • 18
bluelightai-dev/clt_pretrain_data_qwen_tokenized
Viewer
• Updated • 16.7M • 115
bluelightai-dev/clt_posttrain_data_qwen_tokenized
Viewer
• Updated • 1.34M • 27