arxiv:2405.07863
Wei Xiong
weqweasdas
AI & ML interests
Machine learning, RLHF
Models (23)
weqweasdas/zephyr-7b-dpo-full • Text Generation • 7B • 86
weqweasdas/zephyr-7b-gemma-dpo
weqweasdas/zephyr-7b-sft-full
weqweasdas/zephyr-7b-dpo-qlora
weqweasdas/gpt2-cpt-dutch • Text Generation • 0.1B • 7
weqweasdas/zephyr-7b-gemma-sft
weqweasdas/raft_baseline_zephyr_packing_model6_1_4_e6_weight085 • Text Generation • 7B
weqweasdas/raft_baseline_zephyr_packing_model6_1_4_e6 • Text Generation • 7B
weqweasdas/raft_baseline_zephyr_packing_model6 • Text Generation • 7B
weqweasdas/raft_baseline_openchat_llama13b_model1 • Text Generation • 7B
Datasets (261)
weqweasdas/qwen15b_train_simple_subset5k_for_difficulty_transition • 5k • 9
weqweasdas/ultrafeedback_binarized_processed • 61.1k • 4
weqweasdas/qwen7b_prompt_difficult • 15.7k • 7
weqweasdas/qwen7b_openr1_with_scores_sub • 57.7k • 5
weqweasdas/qwen7b_openr1_with_scores_filtered_0375 • 24.3k • 7
weqweasdas/qwen7b_openr1_with_scores • 75k • 4
weqweasdas/from_default_filtered_openr1_with_scores_filtered_05_and_filtered_allwrong • 25k • 9
weqweasdas/validate • 1.68k • 45
weqweasdas/dapo_with_scores • 13k • 5
weqweasdas/dapo_and_openr1_can_be_evaluated_by_daporm_deduplicate_with_scores • 34.1k • 3