arxiv:2501.02506
Zhiheng Xi
WooooDyy
AI & ML interests
None yet
Recent Activity
authored a paper 8 days ago
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use
updated a dataset about 2 months ago
MathCritique/MathCritique-76k
authored a paper 3 months ago
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models
Organizations
Papers: 17
Models: None public yet
Datasets: None public yet