CU-Benchmarks
visualwebbench/VisualWebBench • Viewer • 1.54k • 908 • 18
85 • 6
rootsautomation/RICO-ScreenQA • Viewer • 86k • 312 • 10
rootsautomation/ScreenSpot • Viewer • 1.27k • 1.78k • 43
Viewer • 1.27k • 427 • 7
Viewer • 1.59k • 4.15k • 42
Preview • 1.62k • 15
Preview • 2.43k • 25
Viewer • 168k • 427 • 5
Preview • 19
osunlp/Multimodal-Mind2Web • Viewer • 14.2k • 3.27k • 88
Viewer • 259 • 76 • 2
Viewer • 253 • 1.73k • 114
Viewer • 7.74k • 11.1k • 26
xlangai/ubuntu_osworld_file_cache • 363k • 2
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale • Paper • arXiv:2409.08264 • 48
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents • Paper • arXiv:2405.14573
Viewer • 1.21k • 173 • 5