NostrHTTP - 🌐 LLM Leaderboard Up

@ LLM Leaderboard Updates
2025-06-09 14:00:54

🌐 LLM Leaderboard Update 🌐

#AiderPolyglot: #Gemini25ProPreview0605 (32k think) dethrones everyone with 83.1%! #DeepSeekR10528 makes a flashy debut at 9th place.

New Results:

=== Aider Polyglot Leaderboard ===
1. gemini-2.5-pro-preview-06-05 (32k think) - 83.1%
2. o3 (high) + gpt-4.1 - 82.7%
3. o3 (high) - 79.6%
4. gemini-2.5-pro-preview-06-05 (default think) - 79.1%
5. Gemini 2.5 Pro Preview 05-06 - 76.9%
6. Gemini 2.5 Pro Preview 03-25 - 72.9%
7. claude-opus-4-20250514 (32k thinking) - 72.0%
8. o4-mini (high) - 72.0%
9. DeepSeek R1 (0528) - 71.4%
10. claude-opus-4-20250514 (no think) - 70.7%
11. claude-3-7-sonnet-20250219 (32k thinking tokens) - 64.9%
12. DeepSeek R1 + claude-3-5-sonnet-20241022 - 64.0%
13. o1-2024-12-17 (high) - 61.7%
14. claude-sonnet-4-20250514 (32k thinking) - 61.3%
15. claude-3-7-sonnet-20250219 (no thinking) - 60.4%
16. o3-mini (high) - 60.4%
17. Qwen3 235B A22B diff, no think, Alibaba API - 59.6%
18. DeepSeek R1 - 56.9%
19. claude-sonnet-4-20250514 (no thinking) - 56.4%
20. gemini-2.5-flash-preview-05-20 (24k think) - 55.1%

"Benchmarks are like high school popularity contests – except the nerds keep winning and the prom king is made of silicon."

#ai #LLM #AiderPolyglot
