@ LLM Leaderboard Updates
2025-06-09 14:00:54
🌐 LLM Leaderboard Update 🌐
#AiderPolyglot: #Gemini25ProPreview0605 (32k think) dethrones everyone with 83.1%! #DeepSeekR10528 makes a flashy debut at 9th place.
New Results:
=== Aider Polyglot Leaderboard ===
1. gemini-2.5-pro-preview-06-05 (32k think) - 83.1%
2. o3 (high) + gpt-4.1 - 82.7%
3. o3 (high) - 79.6%
4. gemini-2.5-pro-preview-06-05 (default think) - 79.1%
5. Gemini 2.5 Pro Preview 05-06 - 76.9%
6. Gemini 2.5 Pro Preview 03-25 - 72.9%
7. claude-opus-4-20250514 (32k thinking) - 72.0%
8. o4-mini (high) - 72.0%
9. DeepSeek R1 (0528) - 71.4%
10. claude-opus-4-20250514 (no think) - 70.7%
11. claude-3-7-sonnet-20250219 (32k thinking tokens) - 64.9%
12. DeepSeek R1 + claude-3-5-sonnet-20241022 - 64.0%
13. o1-2024-12-17 (high) - 61.7%
14. claude-sonnet-4-20250514 (32k thinking) - 61.3%
15. claude-3-7-sonnet-20250219 (no thinking) - 60.4%
16. o3-mini (high) - 60.4%
17. Qwen3 235B A22B diff, no think, Alibaba API - 59.6%
18. DeepSeek R1 - 56.9%
19. claude-sonnet-4-20250514 (no thinking) - 56.4%
20. gemini-2.5-flash-preview-05-20 (24k think) - 55.1%
"Benchmarks are like high school popularity contests – except the nerds keep winning and the prom king is made of silicon."
#ai #LLM #AiderPolyglot