Airside Pre-Flight Benchmark Joins AISI Evaluations Package

Aviation AI Benchmark Now Available Through the UK AI Security Institute's inspect_evals Framework
LONDON, UK - Airside Labs is pleased to announce that its Pre-Flight benchmark for aviation AI systems has been accepted into the community evaluations package included with inspect_evals on GitHub, making the specialized aviation testing tool more accessible to AI developers and researchers worldwide.
This integration means that Pre-Flight can now be run through the widely used Inspect framework, allowing organizations to evaluate AI systems against aviation-specific criteria using familiar tools and methodologies.
"Having Pre-Flight included in the community evaluations package represents an important step forward for aviation AI safety testing," said Alexis Brooker, founder of Airside Labs. "This integration makes it significantly easier for developers to include aviation-specific testing in their AI evaluation pipelines."
The inspect_evals framework provides standardized tools for AI system evaluation across various domains. With Pre-Flight now available as part of the community package, developers can run aviation-specific tests alongside other evaluations using a consistent interface and methodology.
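As a rough illustration of what running the benchmark through that consistent interface might look like, the sketch below assembles an `inspect eval` command line in Python. The task name `inspect_evals/pre_flight` is an assumption inferred from the benchmark's directory in the repository, the model name is only a placeholder, and `build_inspect_command` is a hypothetical helper rather than part of any package. The sketch prints the command instead of executing it, since a real run needs the Inspect tooling installed and model provider credentials configured.

```python
import shlex
from typing import List, Optional


def build_inspect_command(task: str, model: str, limit: Optional[int] = None) -> List[str]:
    """Build the argument list for an `inspect eval` invocation (hypothetical helper)."""
    command = ["inspect", "eval", task, "--model", model]
    if limit is not None:
        # --limit caps the number of samples, which is handy for a quick smoke test.
        command += ["--limit", str(limit)]
    return command


# Assumed task name (taken from the repository directory) and placeholder model.
command = build_inspect_command("inspect_evals/pre_flight", "openai/gpt-4o", limit=10)

# Print the shell-ready command rather than running it.
print(shlex.join(command))
```

The printed command can then be pasted into a shell where Inspect is installed. Swapping the task argument for another evaluation in the package would reuse the same pattern.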
Pre-Flight's test suite includes hundreds of validated questions covering air traffic management, flight operations, airport procedures, and regulatory compliance. These tests are designed to determine whether AI systems can accurately reason through complex aviation scenarios and identify potentially dangerous misunderstandings before deployment.
"As AI systems become more prevalent in aviation applications, the need for accessible, domain-specific testing becomes increasingly important," noted Brooker. "The inclusion in the community evaluations package means more developers can incorporate aviation safety testing into their workflows with minimal additional setup."
Developers and organizations can access Pre-Flight through the inspect_evals repository on GitHub. Airside Labs will continue to maintain and expand the aviation benchmark, with regular updates based on industry feedback and evolving AI capabilities.
For more information about Pre-Flight and how to use it within the inspect_evals framework, visit the src/inspect_evals/pre_flight directory on the main branch of the UKGovernmentBEIS/inspect_evals repository on GitHub, or access the helper repository directly.

Airside Labs Team
Research & Development
The Airside Labs team comprises aviation experts, AI researchers, and safety-critical systems engineers dedicated to advancing AI evaluation methodologies. Our collective expertise spans air traffic management, ground operations, commercial aviation, and AI security.