xbench

community

https://xbench.org/

Recent Activity

huxueyu

submitted a paper to Daily Papers 6 days ago

AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios

Paper • 2601.20613 • Published 12 days ago • 10

Lucky2022

updated a dataset 11 days ago

xbench/AgentIF-OneDay

Viewer • Updated 11 days ago • 58 • 244 • 3

Lucky2022

published a dataset 21 days ago

xbench/AgentIF-OneDay

Viewer • Updated 11 days ago • 58 • 244 • 3

huxueyu

updated a dataset 24 days ago

xbench/AgentIF-OneDay

Viewer • Updated 11 days ago • 58 • 244 • 3

huxueyu

in xbench/AgentIF-OneDay 24 days ago

Update README.md

#8 opened 24 days ago

huxueyu

in xbench/AgentIF-OneDay 25 days ago

Update README.md

#7 opened 25 days ago

Create README.md

#6 opened 25 days ago

Delete README.md

#5 opened 25 days ago

huxueyu

in xbench/AgentIF-OneDay 26 days ago

Upload data.jsonl

#4 opened 26 days ago

huxueyu

in xbench/AgentIF-OneDay about 1 month ago

Upload 132 files

#3 opened about 1 month ago

Upload 132 files

#2 opened about 1 month ago

Upload data.jsonl

#1 opened about 1 month ago

Lucky2022

authored a paper 3 months ago

Virtual Width Networks

Paper • 2511.11238 • Published Nov 14, 2025 • 38

lyangpku

published a dataset 4 months ago

xbench/DeepSearch-2510

Viewer • Updated Oct 24, 2025 • 100 • 191 • 2

lyangpku

updated a dataset 4 months ago

xbench/DeepSearch-2510

Viewer • Updated Oct 24, 2025 • 100 • 191 • 2

Lucky2022

authored a paper 8 months ago

xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations

Paper • 2506.13651 • Published Jun 16, 2025 • 8

lyangpku

updated 2 datasets 8 months ago

xbench/ScienceQA

Viewer • Updated Jun 18, 2025 • 100 • 36 • 8

xbench/DeepSearch

Viewer • Updated Jun 18, 2025 • 100 • 273 • 12

lyangpku

published 2 datasets 9 months ago

xbench/DeepSearch

Viewer • Updated Jun 18, 2025 • 100 • 273 • 12

xbench/ScienceQA

Viewer • Updated Jun 18, 2025 • 100 • 36 • 8