Latest Evaluation Results
Real-time performance dashboard for AI models tested on the Pre-Flight Aviation AI Benchmark
Pre-flight Performance Over Time
Chart legend (organizations plotted): OpenAI, Alibaba, Anthropic, Meta AI, NVIDIA
Models Tested: 18
Top Score: 75.0%
Organizations: 6
Model Rankings
| Rank | Model | Organization | Score | Release Date |
|---|---|---|---|---|
| 1 | NVIDIA Llama 3.1 Nemotron | NVIDIA | 75.0% ± 2.5% | Apr 2025 |
| 2 | Claude 3.7 Sonnet | Anthropic | 74.7% ± 2.5% | Feb 2025 |
| 3 | GPT-4o Mini | OpenAI | 73.3% ± 2.6% | Jul 2024 |
| 4 | GPT-4o (Nov 2024) | OpenAI | 73.3% ± 2.6% | Nov 2024 |
| 5 | Llama 3.3 70B | Meta AI | 72.3% ± 2.6% | Dec 2024 |
| 6 | Claude 4.5 Sonnet | Anthropic | 72.0% ± 2.6% | Sep 2025 |
| 7 | Llama 4 Scout | Meta AI | 71.7% ± 2.6% | Apr 2025 |
| 8 | Claude 3.5 Sonnet (Oct 2024) | Anthropic | 71.3% ± 2.6% | Oct 2024 |
| 9 | Llama 3 70B (Groq) | Meta AI | 70.7% ± 2.6% | Apr 2024 |
| 10 | GPT-OSS 120B | OpenAI | 69.0% ± 2.7% | May 2025 |
| 11 | Claude 3 Haiku | Anthropic | 68.3% ± 2.7% | Mar 2024 |
| 12 | Claude 3.5 Haiku | Anthropic | 66.7% ± 2.7% | Oct 2024 |
| 13 | GPT-4 Preview (Jan 2024) | OpenAI | 66.0% ± 2.7% | Jan 2024 |
| 14 | Llama 3 8B (Groq) | Meta AI | 66.0% ± 2.7% | Apr 2024 |
| 15 | GPT-3.5 Turbo | OpenAI | 64.0% ± 2.8% | Jan 2024 |
| 16 | Gemma 2 9B | Google | 62.3% ± 2.8% | Jul 2024 |
| 17 | Qwen QWQ 32B | Alibaba | 58.7% ± 2.8% | Mar 2024 |
| 18 | Llama 3.1 8B | Meta AI | 55.7% ± 2.9% | Sep 2024 |
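
The dashboard does not say how the ± margins next to each score are computed. One plausible reading is a binomial standard error on the fraction of questions answered correctly. The sketch below illustrates that reading only; the function name `binomial_standard_error` and the question count `N_QUESTIONS = 300` are assumptions introduced here, not values published by the benchmark.

```python
import math


def binomial_standard_error(score_pct: float, n_questions: int) -> float:
    """Return one binomial standard error, in percentage points.

    score_pct is an accuracy in percent (e.g. 75.0); n_questions is the
    number of independently graded questions behind that accuracy.
    """
    p = score_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_questions)


# ASSUMPTION: the benchmark's question count is not stated on the page.
# 300 is a hypothetical value chosen for illustration.
N_QUESTIONS = 300

# Scores copied from the first and last rows of the rankings table.
for name, score in [("NVIDIA Llama 3.1 Nemotron", 75.0), ("Llama 3.1 8B", 55.7)]:
    margin = binomial_standard_error(score, N_QUESTIONS)
    print(f"{name}: {score:.1f}% ± {margin:.1f}%")
```

If the margins are indeed of this kind, adjacent rows whose scores differ by less than their margins (for example ranks 1 and 2) should be read as statistically indistinguishable rather than strictly ordered.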
