Question 1

What is the Pre-Flight aviation AI benchmark?

Accepted Answer

Pre-Flight is an open-source benchmark suite with 300+ questions designed to evaluate AI models on aviation-specific knowledge. It tests comprehension of ICAO standards, airport operations, safety procedures, emergency response, regulatory compliance, and complex reasoning required for aviation applications. The benchmark is available on GitHub and integrated into the UK AISI community evaluations package.

Question 2

How can I use the Pre-Flight benchmark for my aviation AI?

Accepted Answer

Pre-Flight is open source and available on GitHub. Download the question set, run it against your AI model, and compare your accuracy with the published baselines for leading LLMs. The benchmark supports automated evaluation pipelines and is compatible with standard AI testing frameworks. Documentation and implementation guides are provided in the repository.
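The download, run, and compare loop described above can be sketched roughly as follows. This is a minimal illustration, not the repository's own harness: it assumes single-answer multiple-choice items, and the field names (`question`, `choices`, `answer`), the `evaluate` helper, and the `always_first_choice` stand-in model are all hypothetical. The sample items are placeholders, not real benchmark questions.

```python
from typing import Callable, Dict, List

# Hypothetical item layout; the real Pre-Flight files may use different field names.
Question = Dict[str, object]


def evaluate(
    questions: List[Question],
    answer_fn: Callable[[str, List[str]], str],
) -> Dict[str, float]:
    """Score a model on single-answer multiple-choice items."""
    correct = 0
    for item in questions:
        predicted = answer_fn(item["question"], item["choices"])
        # Compare answer letters case-insensitively, ignoring surrounding whitespace.
        if predicted.strip().upper() == str(item["answer"]).strip().upper():
            correct += 1
    total = len(questions)
    accuracy = correct / total if total else 0.0
    return {"correct": correct, "total": total, "accuracy": accuracy}


# Placeholder items for illustration only, not taken from the benchmark.
SAMPLE_QUESTIONS: List[Question] = [
    {"question": "Placeholder question 1", "choices": ["A", "B", "C", "D"], "answer": "B"},
    {"question": "Placeholder question 2", "choices": ["A", "B", "C", "D"], "answer": "A"},
    {"question": "Placeholder question 3", "choices": ["A", "B", "C", "D"], "answer": "a"},
]


def always_first_choice(question: str, choices: List[str]) -> str:
    """Stand-in for a real model call: always picks the first option."""
    return choices[0]


if __name__ == "__main__":
    print(evaluate(SAMPLE_QUESTIONS, always_first_choice))
```

To evaluate a real model, `always_first_choice` would be replaced with a function that sends the question and choices to the model and returns the letter it selects.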

Question 3

What aviation domains does Pre-Flight cover?

Accepted Answer

Pre-Flight covers eight aviation domains: ICAO Annex standards and recommended practices; airport operations and ground handling; air traffic control procedures; flight safety and emergency response; aircraft systems and performance; meteorology and weather operations; regulatory compliance and documentation; and maintenance and engineering practices.

Question 4

How does Pre-Flight compare to general AI benchmarks?

Accepted Answer

Unlike general benchmarks such as MMLU, which test broad knowledge, Pre-Flight focuses exclusively on aviation domain expertise. AI models that score 90%+ on general benchmarks often do markedly worse on Pre-Flight: the best of the 13 models in the published results reaches 74.7% accuracy. Aviation requires specialized terminology, regulatory knowledge, safety-critical reasoning, and practical operational understanding that general training does not provide.

Question 5

Is Pre-Flight suitable for safety-critical aviation AI validation?

Accepted Answer

Not on its own. Pre-Flight validates essential domain knowledge, but it is only one component of comprehensive aviation AI testing. Safety-critical systems also require adversarial testing, failure mode analysis, regulatory compliance validation, and expert review. For aviation AI deployed in operational environments, Pre-Flight should be combined with specialized safety testing and certification processes.

Stand	Current/Next Departure	Next Arrival
A	BA23 with TOBT 07:55	BA24 at 08:45
B	AA12 with TOBT 08:10	AA14 at 09:00
C	Currently vacant	ETD112 at 08:15
D	ETD17 with TOBT 08:40	ETD234 at 09:30

Model	Accuracy	Correct Answers	Total Samples
anthropic/claude-3-7-sonnet-20250219	0.747	224	300
openai/gpt-4o-2024-11-20	0.733	220	300
openai/gpt-4o-mini-2024-07-18	0.733	220	300
anthropic/claude-3-5-sonnet-20241022	0.713	214	300
groq/llama3-70b-8192	0.707	212	300
anthropic/claude-3-haiku-20240307	0.683	205	300
anthropic/claude-3-5-haiku-20241022	0.667	200	300
groq/llama3-8b-8192	0.660	198	300
openai/gpt-4-0125-preview	0.660	198	300
openai/gpt-3.5-turbo-0125	0.640	192	300
groq/gemma2-9b-it	0.623	187	300
groq/qwen-qwq-32b	0.587	176	300
groq/llama-3.1-8b-instant	0.557	167	300
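The accuracy column appears to be the number of correct answers divided by the 300 samples, rounded to three decimal places. The short check below recomputes it from the counts in the table. The `RESULTS` mapping and the `accuracy` helper are names introduced here for illustration; the figures are copied from the table above.

```python
# (correct answers, total samples, reported accuracy) copied from the results table.
RESULTS = {
    "anthropic/claude-3-7-sonnet-20250219": (224, 300, 0.747),
    "openai/gpt-4o-2024-11-20": (220, 300, 0.733),
    "openai/gpt-4o-mini-2024-07-18": (220, 300, 0.733),
    "anthropic/claude-3-5-sonnet-20241022": (214, 300, 0.713),
    "groq/llama3-70b-8192": (212, 300, 0.707),
    "anthropic/claude-3-haiku-20240307": (205, 300, 0.683),
    "anthropic/claude-3-5-haiku-20241022": (200, 300, 0.667),
    "groq/llama3-8b-8192": (198, 300, 0.660),
    "openai/gpt-4-0125-preview": (198, 300, 0.660),
    "openai/gpt-3.5-turbo-0125": (192, 300, 0.640),
    "groq/gemma2-9b-it": (187, 300, 0.623),
    "groq/qwen-qwq-32b": (176, 300, 0.587),
    "groq/llama-3.1-8b-instant": (167, 300, 0.557),
}


def accuracy(correct: int, total: int) -> float:
    """Fraction of correct answers, rounded to three decimals as in the table."""
    return round(correct / total, 3)


# Collect any rows whose reported accuracy disagrees with correct / total.
mismatches = [
    model
    for model, (correct, total, reported) in RESULTS.items()
    if abs(accuracy(correct, total) - reported) > 1e-9
]

if __name__ == "__main__":
    for model, (correct, total, _) in RESULTS.items():
        print(f"{model}: {accuracy(correct, total):.3f}")
    print("rows that disagree with the table:", mismatches)
```

Because every model was scored on the same 300 samples, a gap of 0.003 in accuracy corresponds to roughly one question.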

Pre-Flight Aviation AI Benchmark

Dataset Overview

Topics Covered

Dataset Sections

Example Questions

Example 1: Basic Aviation Safety

Example 2: Complex Reasoning Scenario

Scoring Methodology

Evaluation Report
