Airside Labs - AI Security Testing and Compliance
    Security

    Prompt Injection Risk in Aviation

    2026-03-12Alex Brooker
    Prompt Injection Risk in Aviation

    1,776 adversarial test cases against LLMs processing standard aviation data formats reveal systematic vulnerabilities in how large language models handle prompt injection in safety-critical contexts.

    The aviation industry is adopting AI faster than it's securing it. Large Language Models are being deployed to process NOTAMs, assist with pre-flight briefings, accelerate maintenance intelligence, and support regulatory compliance — and for good reason. The operational benefits are too significant to ignore. But this adoption introduces a vulnerability that traditional cybersecurity frameworks weren't built to handle.

    We ran 1,776 adversarial test cases against LLMs processing standard aviation data formats. The results highlight significant risks that organisations deploying AI in aviation operations need to understand.

    What Is Prompt Injection?

    Prompt injection exploits how AI systems process natural language. Unlike traditional software, LLMs cannot reliably distinguish between three types of input: legitimate operational data they should process, system instructions they should follow, and adversarial instructions they should reject. When an AI system ingests aviation data containing embedded malicious instructions, it may treat that adversarial content as authoritative — and act on it.

    This isn't a theoretical concern. It's a fundamental property of how these systems work.

    Why Aviation Is Uniquely Vulnerable

    Aviation's data architecture is, unfortunately, an ideal attack surface for this kind of exploit. The industry relies on extensive free-text fields designed for human interpretation — the E) section of a NOTAM, flight plan remarks, eAIP procedure descriptions. Data flows through multi-stakeholder chains with varying levels of security oversight. Legacy formats constrain what input validation is even possible. And decades of high-trust operational culture means data from legitimate sources is assumed to be trustworthy.

    The critical insight is this: an adversary doesn't need to break into a system. They just need to craft data that legitimate systems will process.

    Consider a NOTAM with a valid header, valid ICAO location indicator, valid date range — but with adversarial instructions embedded in the free-text body. A human controller would recognise this as nonsense. An AI system processing thousands of NOTAMs to generate a pre-flight briefing might not.

    What We Found

    We designed a systematic testing programme across 1,776 aviation-specific scenarios, evaluating how commercial AI models handle adversarial inputs embedded in standard aviation data formats. The full methodology is detailed in our white paper.

    The headline findings:

    43% of prompt injection attacks produced incorrect safety-critical outputs. AI systems processing NOTAMs with embedded malicious instructions provided wrong safety assessments in nearly half of all test cases.

    Individual attack strategies varied in effectiveness, with success rates ranging from 24% to 65% depending on the technique used. Across all attack types, the models exhibited systematic overconfidence — expressing high certainty even when providing manipulated, incorrect answers.

    Three patterns stood out:

    The models struggle with constraint-based reasoning. Aviation operations depend on rules, restrictions, and conditional logic ("RWY 09R closed between 0600-1400 except emergency"). AI systems frequently failed to maintain these constraints when adversarial inputs contradicted them.

    Overconfidence is the norm, not the exception. A system that says "I'm not sure" is manageable. A system that confidently tells you a closed runway is available is dangerous. The tested models consistently presented manipulated outputs with the same certainty as legitimate ones.

    The highest-risk scenarios are exactly where AI seems most useful. Time-constrained operational decisions using complex, multi-source data — that's where AI assistance is most valuable, where trust in AI outputs is highest, and where prompt injection is most dangerous.

    What This Means for the Industry

    Our risk assessment indicates medium to high attack likelihood as AI deployment expands and adversary capabilities mature. Severity depends on deployment context: low for customer-facing chatbots, potentially critical if AI systems are deployed in air traffic management decision support.

    It's worth being precise about scope. These experiments were conducted against isolated model instances. Actual success rates in operational deployments with human oversight, validation layers, and defence-in-depth would differ. The point isn't that deployed systems are currently being exploited — it's that the vulnerability exists at the model level, and mitigation needs to be designed in, not bolted on.

    This is particularly timely. EASA Level 1 AI approvals were expected in 2025, and FAA Certification Position Papers are anticipated in Q1 2026. The regulatory frameworks are being written now. Prompt injection needs to be part of that conversation.

    This Doesn't Mean We Should Stop

    This vulnerability need not halt aviation's AI adoption. It means we need to adopt AI with our eyes open.

    The white paper includes actionable recommendations for each stakeholder group:

    Air Navigation Service Providers should inventory their AI deployments, establish operational guidelines that acknowledge AI limitations, and implement input validation on data pipelines feeding AI systems.

    Technology vendors should adopt secure development lifecycles that include adversarial testing, deploy layered defences rather than relying on model-level guardrails alone, and provide operational transparency about what their systems can and cannot withstand.

    Standards bodies should develop aviation-specific AI security frameworks and facilitate collective testing programmes so the industry isn't duplicating effort.

    Regulators should establish AI governance frameworks that balance safety requirements with innovation — because this technology is coming regardless, and well-governed adoption is safer than ungoverned adoption.

    Get the Full Research

    The complete white paper includes our full methodology, the attack taxonomy, detailed results across all 1,776 test cases, and the mitigation framework.

    Download the white paper (PDF) →

    For qualified organisations, we also provide access to the adversarial test cases and example attacks used in this research, under responsible disclosure terms.

    Request access to adversarial examples →


    Airside Labs provides independent AI red-teaming, benchmarking, and validation for safety-critical aviation systems. If you're deploying AI in aviation and want to know how it holds up under adversarial conditions, get in touch.

    Alex Brooker

    Alex Brooker

    Founder & CEO, Airside Labs

    Alex Brooker brings over 25 years of experience in aviation, defense, and safety-critical systems. Former VP of R&D at Cirium (RELX PLC) and Principal Engineer at BAE Systems. Award winner (IHC Janes Innovation 2016, Technology 2015). Alex founded Airside Labs to ensure AI systems truly understand their domains through rigorous evaluation and testing.

    Connect on LinkedIn

    Ready to enhance your AI testing?

    Contact us to learn how AirsideLabs can help ensure your AI systems are reliable, compliant, and ready for production.

    Book A Demo