Yotam Perlitz

per

AI & ML interests

None yet

Recent Activity

authored a paper 28 days ago

Holmes: Benchmark the Linguistic Competence of Language Models

authored a paper 28 days ago

JuStRank: Benchmarking LLM Judges for System Ranking

liked a Space about 1 month ago

aialliance/safetybat

Articles

Bamba: Inference-Efficient Hybrid Mamba2 Model

per's activity

authored 2 papers 28 days ago

Holmes: Benchmark the Linguistic Competence of Language Models

Paper • 2404.18923 • Published Apr 29, 2024

JuStRank: Benchmarking LLM Judges for System Ranking

Paper • 2412.09569 • Published Dec 12, 2024 • 19

liked a Space about 1 month ago

🏋️‍♂️

Safety BAT

updated a Space about 1 month ago

🧑🏻‍⚖️

JuStRank

commented on a paper about 1 month ago

JuStRank: Benchmarking LLM Judges for System Ranking

Paper • 2412.09569 • Published Dec 12, 2024 • 19

upvoted a paper about 1 month ago

JuStRank: Benchmarking LLM Judges for System Ranking

Paper • 2412.09569 • Published Dec 12, 2024 • 19

liked a Space about 1 month ago

🧑🏻‍⚖️

JuStRank

liked a Space about 2 months ago

Open LLM Leaderboard

Track, rank and evaluate open LLMs and chatbots

updated a Space 2 months ago

🏋️‍♂️

BenchBench Leaderboad

liked a Space 3 months ago

🏋️‍♂️

BenchBench Leaderboad

updated a Space 3 months ago

🏋️‍♂️

BenchBench Leaderboad

upvoted a paper 5 months ago

Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation

Paper • 2407.13696 • Published Jul 18, 2024 • 5

authored 2 papers 6 months ago

Efficient Benchmarking (of Language Models)

Paper • 2308.11696 • Published Aug 22, 2023

Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation

Paper • 2407.13696 • Published Jul 18, 2024 • 5

New activity in SEACrowd/flores200 6 months ago

fix small bug in instructions

#1 opened 6 months ago by

updated a collection 6 months ago

✨ Highlights

4 items • Updated Aug 15, 2024 • 1

New activity in per/benchbench 6 months ago

Update README.md

#1 opened 6 months ago by

liked a Space 6 months ago

🏋️‍♂️

BenchBench Leaderboad

upvoted a paper 12 months ago

Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI

Paper • 2401.14019 • Published Jan 25, 2024 • 21

authored a paper 12 months ago

Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI

Paper • 2401.14019 • Published Jan 25, 2024 • 21