Latest Evaluation Results
Real-time performance dashboard for AI models tested on the Pre-Flight Aviation AI Benchmark
[Chart: Pre-Flight Performance Over Time. Series by organization: OpenAI, Alibaba, Anthropic, Meta AI, NVIDIA]
- Models Tested: 16
- Top Score: 75.0%
- Organizations: 6
Model Rankings
| Rank | Model | Organization | Score | Release Date |
|---|---|---|---|---|
| 1 | NVIDIA Llama 3.1 Nemotron | NVIDIA | 75.0% ± 2.5% | Apr 2025 |
| 2 | Claude 3.7 Sonnet | Anthropic | 74.7% ± 2.5% | Feb 2025 |
| 3 | GPT-4o Mini | OpenAI | 73.3% ± 2.6% | Jul 2024 |
| 4 | GPT-4o (Nov 2024) | OpenAI | 73.3% ± 2.6% | Nov 2024 |
| 5 | Claude 4.5 Sonnet | Anthropic | 72.0% ± 2.6% | Sep 2025 |
| 6 | Llama 4 Scout | Meta AI | 71.7% ± 2.6% | Apr 2025 |
| 7 | Claude 3.5 Sonnet (Oct 2024) | Anthropic | 71.3% ± 2.6% | Oct 2024 |
| 8 | Llama 3 70B (Groq) | Meta AI | 70.7% ± 2.6% | Apr 2024 |
| 9 | Claude 3 Haiku | Anthropic | 68.3% ± 2.7% | Mar 2024 |
| 10 | Claude 3.5 Haiku | Anthropic | 66.7% ± 2.7% | Oct 2024 |
| 11 | GPT-4 Preview (Jan 2024) | OpenAI | 66.0% ± 2.7% | Jan 2024 |
| 12 | Llama 3 8B (Groq) | Meta AI | 66.0% ± 2.7% | Apr 2024 |
| 13 | GPT-3.5 Turbo | OpenAI | 64.0% ± 2.8% | Jan 2024 |
| 14 | Gemma 2 9B | Google | 62.3% ± 2.8% | Jul 2024 |
| 15 | Qwen QWQ 32B | Alibaba | 58.7% ± 2.8% | Mar 2024 |
| 16 | Llama 3.1 8B | Meta AI | 55.7% ± 2.9% | Sep 2024 |
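
The page does not say how the ± values in the Score column are computed. As a rough sketch, the snippet below assumes they are one binomial standard error over a fixed question set. The question count of 300 is an assumption inferred from the published numbers, not a figure stated by the benchmark, and the helper names `standard_error` and `intervals_overlap` are hypothetical. Under that assumption the top two scores sit well within each other's error bars, so the ordering of closely ranked models should be read with caution.

```python
import math

# Assumption: the benchmark does not publish its question count here;
# 300 is inferred from the published scores and error bars.
N_QUESTIONS = 300


def standard_error(score_pct: float, n_questions: int = N_QUESTIONS) -> float:
    """Binomial standard error of an accuracy score, in percentage points."""
    p = score_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_questions)


def intervals_overlap(score_a: float, score_b: float,
                      n_questions: int = N_QUESTIONS) -> bool:
    """True if the one-standard-error bands around two scores overlap."""
    gap = abs(score_a - score_b)
    return gap <= standard_error(score_a, n_questions) + standard_error(score_b, n_questions)


if __name__ == "__main__":
    # Scores copied from the Model Rankings table.
    scores = {
        "NVIDIA Llama 3.1 Nemotron": 75.0,
        "Claude 3.7 Sonnet": 74.7,
        "Llama 3.1 8B": 55.7,
    }
    for name, score in scores.items():
        print(f"{name}: {score:.1f}% ± {standard_error(score):.1f}%")
    print("Top two overlap:", intervals_overlap(75.0, 74.7))
```

If the real error bars come from a different method (for example bootstrap resampling or a 95% interval), the numbers will differ and the overlap check should be adjusted accordingly.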
