Introducing the Pre-flight AI Benchmark for Aviation Safety
At Airside Labs, we're committed to advancing aviation technology through innovative AI solutions while maintaining the industry's paramount focus on safety. Recently, we had the opportunity to respond to the UK Civil Aviation Authority's (CAA) "Call for Insights - AI in Aerospace" initiative, sharing our expertise and vision for the future of AI evaluation in aviation.
Our Core Focus: The Pre-flight AI Benchmark
The centrepiece of our submission was the Pre-flight AI Benchmark, an open-source evaluation framework designed to systematically assess the capabilities and limitations of AI systems in aviation contexts. Announced at the Royal Aeronautical Society AI summit in late 2024, this initiative brings together experts from across the aviation industry to develop rigorous testing methods for AI applications.
What is Pre-flight?
Pre-flight is a test suite with hundreds of aviation-specific questions and scenarios derived from ICAO and other international regulations and standards. The benchmark evaluates AI understanding across multiple dimensions:
- Regulatory knowledge and compliance awareness
- Operational procedures and best practices
- Spatial reasoning for navigation and flight planning
- Decision-making in standard and non-standard situations
- Recognition of safety boundaries and limitations
Our approach includes both a public open-source benchmark for baseline testing and a private benchmark for more rigorous evaluation, allowing aviation stakeholders to track AI model performance improvements across iterations.
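As a rough illustration of how results could be broken down across the dimensions listed above, here is a minimal Python sketch. The `Question` schema, the dimension labels, and the sample items are hypothetical and made up for this example; they are not taken from the actual Pre-flight dataset or from inspect_evals.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """One multiple-choice benchmark item (hypothetical schema)."""
    qid: str
    dimension: str  # e.g. "regulatory", "operational", "spatial"
    answer: str     # label of the correct option, e.g. "B"


def accuracy_by_dimension(questions, responses):
    """Return {dimension: fraction correct}. A missing response counts as wrong."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for q in questions:
        total[q.dimension] += 1
        # Normalise whitespace and case before comparing option labels.
        given = responses.get(q.qid, "").strip().upper()
        if given == q.answer.upper():
            correct[q.dimension] += 1
    return {dim: correct[dim] / total[dim] for dim in total}


# Made-up items for illustration only; not real benchmark content.
QUESTIONS = [
    Question("q1", "regulatory", "A"),
    Question("q2", "regulatory", "C"),
    Question("q3", "spatial", "B"),
    Question("q4", "operational", "D"),
]
# Hypothetical model answers keyed by question id.
RESPONSES = {"q1": "A", "q2": "B", "q3": "b ", "q4": "D"}

scores = accuracy_by_dimension(QUESTIONS, RESPONSES)
for dimension, score in sorted(scores.items()):
    print(f"{dimension}: {score:.0%}")
```

Reporting a score per dimension rather than a single aggregate makes it easier to see where a model is weak, for example strong regulatory recall alongside poor spatial reasoning.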
Benefits of Standardised AI Evaluation
In our submission to the CAA, we highlighted several key benefits of the Pre-flight benchmark:
Safety Improvements
- Provides a standardised methodology, and a framework to run it in (inspect_evals), for identifying potential AI failures before deployment
- Gives developers a starting set of tests, reducing the risk of AI systems providing incorrect information in operational settings
- Creates a starting point for continuous improvement of AI applications that is available to all aviation stakeholders
- Enables proactive identification of edge cases where AI systems might perform poorly
Economic Benefits
- Accelerates safe AI adoption in aviation by providing clear evaluation criteria
- Reduces time-to-market for aviation AI applications
- Lowers regulatory compliance costs through standardised testing methodologies
Regulatory Efficiency
- Offers regulators data on AI system limitations
- Supports risk-based approaches by identifying systems that require closer scrutiny
- Reduces regulatory burden through standardised methodologies
Challenges and Risks
We also acknowledged several challenges that need to be addressed:
Technical Challenges
- Ensuring comprehensive coverage of aviation scenarios
- Preventing AI systems from optimising specifically for the benchmark rather than real-world performance
- Distinguishing between pattern matching and genuine understanding
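One way to watch for benchmark-specific optimisation is to compare a model's score on the public question set with its score on a held-out private set. The sketch below assumes both scores are accuracies between 0 and 1. The function name and the 0.05 tolerance are illustrative assumptions, not values used by the Pre-flight project.

```python
def overfitting_gap(public_acc, private_acc, tolerance=0.05):
    """Compare public and private benchmark accuracy.

    Returns (gap, flagged), where gap = public_acc - private_acc and
    flagged is True when the gap exceeds the tolerance.
    The default tolerance is an arbitrary illustrative value.
    """
    for name, value in (("public_acc", public_acc), ("private_acc", private_acc)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    gap = public_acc - private_acc
    return gap, gap > tolerance


# Hypothetical scores for two model versions.
gap_v1, flagged_v1 = overfitting_gap(0.92, 0.78)
gap_v2, flagged_v2 = overfitting_gap(0.80, 0.79)
print(f"v1 gap: {gap_v1:.2f}, flagged: {flagged_v1}")
print(f"v2 gap: {gap_v2:.2f}, flagged: {flagged_v2}")
```

A large gap does not prove that a model was tuned to the public questions, but it is a reasonable signal that the result deserves a closer look.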
Regulatory Challenges
We identified the following challenges impeding the adoption of standardised tests and evaluations for AI in aerospace:
- Absence of established standards for evaluating AI systems in aviation
- Uncertainty around acceptable performance metrics and safety thresholds
- Classification of AI evaluation tools within the regulatory landscape
- Cross-border recognition of evaluation results
- Friction that slows innovation when standards are imposed on commercial products
Collaborating with the CAA
We believe that close collaboration between industry and regulators is essential for the safe integration of AI into aviation. In our submission, we proposed several opportunities for working with the CAA:
- Establishing a joint working group to develop standardised evaluation criteria
- Participating in CAA sandbox initiatives to test our benchmark methodology and feed the results back into the open-source initiative
- Contributing to assessment guidelines for non-safety-critical AI applications
- Co-authoring white papers on proportionate regulation of AI in aviation
- Developing training materials for CAA staff on evaluating AI systems
Final Thoughts
We commended the CAA's principles-led approach to AI regulation, which focuses on outcomes rather than prescriptive technical requirements. This approach allows for innovation while maintaining safety standards.
Looking ahead, we suggested that the CAA consider:
- Developing a tiered regulatory framework based on the criticality of AI applications
- Prioritising cross-border harmonisation to prevent regulatory fragmentation, e.g. by engaging with CANSO, ACI, the Airports AI Alliance and others
- Leveraging industry expertise through regular consultation, seeking input from a wide range of experts rather than relying on a select few
At Airside Labs, we're committed to advancing the safe adoption of AI in aviation through initiatives like the Pre-flight benchmark. By providing standardised evaluation methods, we can help ensure that AI technologies are adopted responsibly across aerospace and aviation.
If you're interested in learning more about the Pre-flight benchmark or collaborating with us, please visit our benchmark page or contact us directly.