Latest Evaluation Results

    Real-time performance dashboard for AI models tested on the Pre-Flight Aviation AI Benchmark

    Pre-Flight Performance Over Time

    [Chart: score (0% to 100%) by release date (Jan 2024 to Sep 2025), one series per organization: OpenAI, Alibaba, Anthropic, Meta AI, Google, NVIDIA. Reference line: Human Expert Baseline (95%).]

    Models Tested: 16
    Top Score: 75.0%
    Organizations: 6

    Model Rankings

    Rank  Model                         Organization  Score          Release Date
    1     NVIDIA Llama 3.1 Nemotron     NVIDIA        75.0% ± 2.5%   Apr 2025
    2     Claude 3.7 Sonnet             Anthropic     74.7% ± 2.5%   Feb 2025
    3     GPT-4o Mini                   OpenAI        73.3% ± 2.6%   Jul 2024
    4     GPT-4o (Nov 2024)             OpenAI        73.3% ± 2.6%   Nov 2024
    5     Claude 4.5 Sonnet             Anthropic     72.0% ± 2.6%   Sep 2025
    6     Llama 4 Scout                 Meta AI       71.7% ± 2.6%   Apr 2025
    7     Claude 3.5 Sonnet (Oct 2024)  Anthropic     71.3% ± 2.6%   Oct 2024
    8     Llama 3 70B (Groq)            Meta AI       70.7% ± 2.6%   Apr 2024
    9     Claude 3 Haiku                Anthropic     68.3% ± 2.7%   Mar 2024
    10    Claude 3.5 Haiku              Anthropic     66.7% ± 2.7%   Oct 2024
    11    GPT-4 Preview (Jan 2024)      OpenAI        66.0% ± 2.7%   Jan 2024
    12    Llama 3 8B (Groq)             Meta AI       66.0% ± 2.7%   Apr 2024
    13    GPT-3.5 Turbo                 OpenAI        64.0% ± 2.8%   Jan 2024
    14    Gemma 2 9B                    Google        62.3% ± 2.8%   Jul 2024
    15    Qwen QWQ 32B                  Alibaba       58.7% ± 2.8%   Mar 2024
    16    Llama 3.1 8B                  Meta AI       55.7% ± 2.9%   Sep 2024
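
The page does not say how the ± margins are computed. One reading that reproduces the listed values is a one-standard-error binomial margin, sqrt(p * (1 - p) / n), with n around 300 questions. Both the formula and the question count are assumptions inferred from the numbers, not stated by the benchmark; the function name below is illustrative. A minimal sketch under those assumptions:

```python
import math


def binomial_standard_error(score: float, n_questions: int) -> float:
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(score * (1.0 - score) / n_questions)


# ASSUMPTION: the benchmark has about 300 questions. This count is inferred
# from the published margins and is not stated on the dashboard.
N_QUESTIONS = 300

# Two rows taken from the rankings: the top and bottom entries.
rows = [
    ("NVIDIA Llama 3.1 Nemotron", 0.750),
    ("Llama 3.1 8B", 0.557),
]

for name, score in rows:
    margin = binomial_standard_error(score, N_QUESTIONS)
    print(f"{name}: {score:.1%} ± {margin:.1%}")
```

If this reading is right, the 0.3-point gap between ranks 1 and 2 is well inside a single margin, so adjacent ranks near the top may not reflect a meaningful difference.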