21 CFR Part 11 was written in 1997, and generative AI was not on the FDA's mind at the time. The regulation governs electronic records and electronic signatures, and it applies to AI systems that create, modify, or sign off records in any GxP context, even though the language of the rule predates LLMs by roughly 25 years. The good news: with the right scaffolding, validating an AI feature for Part 11 is mostly engineering work, not legal work.
This is the checklist we run before code review on every regulated pharma engagement. It's not legal advice, and your QA team has the final say. But if you tick every box, you'll walk into your validation review with very few open issues.
The three things the FDA actually cares about
Strip away the legalese and Part 11 boils down to three principles:
- Authenticity — the record came from the person or system claimed
- Integrity — the record hasn't been changed, or any changes are themselves recorded
- Confidentiality — access is limited to authorised personnel
For an AI feature in a regulated workflow, you have to demonstrate all three for every record the AI touches — inputs, model outputs, human reviews, downstream actions.
The pre-build checklist
- Confirm the GxP scope: is this feature touching GMP, GLP, GCP, or GPvP data?
- Document the intended use, with specific user roles and decisions the AI informs
- Risk-classify the AI's role: advisory only, decision support, or autonomous action
- Identify all electronic records the AI will produce, consume or modify
- Agree the validation strategy with QA before development starts, not after
The last point is the one teams skip and regret. If your QA function joins after the build is done, you'll spend three months retrofitting documentation that should have been written as you went.
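One way to keep the pre-build checklist from being skipped is to capture it as a structured record that the team fills in before development starts. The sketch below is illustrative only: `PreBuildRecord`, `AIRole`, and the field names are hypothetical, and a real project would keep this in its QA system rather than in code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class AIRole(Enum):
    """The three risk classes named in the checklist."""
    ADVISORY = "advisory only"
    DECISION_SUPPORT = "decision support"
    AUTONOMOUS = "autonomous action"


@dataclass
class PreBuildRecord:
    """Hypothetical intake record mirroring the five checklist items."""
    gxp_scope: List[str] = field(default_factory=list)   # e.g. ["GMP", "GCP"]
    intended_use: str = ""
    user_roles: List[str] = field(default_factory=list)
    ai_role: Optional[AIRole] = None
    electronic_records: List[str] = field(default_factory=list)
    qa_strategy_agreed: bool = False

    def open_items(self) -> List[str]:
        """Return the checklist items that are still unresolved."""
        checks = [
            (bool(self.gxp_scope), "GxP scope not confirmed"),
            (bool(self.intended_use.strip()) and bool(self.user_roles),
             "intended use or user roles not documented"),
            (self.ai_role is not None, "AI role not risk-classified"),
            (bool(self.electronic_records), "electronic records not identified"),
            (self.qa_strategy_agreed, "validation strategy not agreed with QA"),
        ]
        return [message for ok, message in checks if not ok]


# Example values are made up for illustration.
record = PreBuildRecord(
    gxp_scope=["GMP"],
    intended_use="Draft deviation summaries for QA review",
    user_roles=["qa_reviewer"],
    ai_role=AIRole.ADVISORY,
    electronic_records=["draft summary", "review decision"],
)
print(record.open_items())  # ['validation strategy not agreed with QA']
```

Gating the first development ticket on an empty `open_items()` list is one simple way to make sure QA is in the room before the build begins.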
Identity and access (§11.10(d), (g), (h))
Every action against the AI system must be tied to an authenticated user. Service accounts are allowed for system-to-system communication, but human-initiated actions need a real identity.
- Unique user IDs, no shared accounts
- Role-based access control with documented role definitions
- Password complexity and rotation policy aligned with your IT baseline
- MFA for any privileged action (prompt edits, model version pinning, eval changes)
- Session timeout that matches your IT policy; 15 minutes is a common setting for clinical systems
For AI workloads specifically, treat prompt-editing as a privileged action. Anyone who can change a system prompt can change the system's behaviour — that's the equivalent of editing production code in a non-AI system.
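The access rules above can be sketched as a single authorisation check. This is a minimal illustration, not a real access-control layer: the role names, action names, and the `Session` shape are assumptions, and the 15-minute timeout is just the typical value mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Hypothetical action and role names, for illustration only.
PRIVILEGED_ACTIONS = {"edit_prompt", "pin_model_version", "change_eval"}
ROLE_PERMISSIONS = {
    "reviewer": {"view_output", "review_output"},
    "ai_admin": {"view_output", "edit_prompt", "pin_model_version", "change_eval"},
}
SESSION_TIMEOUT = timedelta(minutes=15)  # align with your own IT policy


@dataclass
class Session:
    user_id: str            # unique per person, never a shared account
    role: str
    mfa_verified: bool
    last_activity: datetime


def authorize(session: Session, action: str, now: datetime) -> Tuple[bool, str]:
    """Return (allowed, reason) for one action attempted in one session."""
    if now - session.last_activity > SESSION_TIMEOUT:
        return False, "session expired"
    if action not in ROLE_PERMISSIONS.get(session.role, set()):
        return False, "role not permitted"
    if action in PRIVILEGED_ACTIONS and not session.mfa_verified:
        return False, "MFA required"
    return True, "ok"


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
admin = Session("u-1001", "ai_admin", mfa_verified=False,
                last_activity=now - timedelta(minutes=2))
print(authorize(admin, "edit_prompt", now))   # (False, 'MFA required')
print(authorize(admin, "view_output", now))   # (True, 'ok')
```

Returning a reason alongside the decision makes it straightforward to log every denied attempt, which is useful evidence for the authority checks an inspector will ask about.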
Audit trails (§11.10(e))
This is the meatiest section for AI. Every record the system creates, modifies or deletes needs an immutable audit trail capturing what happened, who did it, and when.
For AI-specific records, your audit trail needs at minimum:
- The exact input the model received — raw and after any pre-processing
- The model version (the provider's snapshot ID, not just "GPT-4")
- The system prompt and any retrieved context — referenced by content hash
- The model's full response, including any intermediate reasoning
- The decision or output that was actually used downstream
- The human reviewer who saw the output, what they did, and why
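Write this trail to a write-once data store. S3 with Object Lock enabled, an immutable database, or a purpose-built validation system all work. Don't store it in the same database as the application data: when the trail lives apart from the records it describes, an application-level write can't silently alter it, and that separation is easy to demonstrate during inspection.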
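The fields listed above map naturally onto a structured audit entry. The sketch below is one possible shape, with hypothetical field and function names. The hash chaining is an addition of ours rather than something the checklist requires: it makes tampering evident on top of whatever write-once storage you choose, and it does not replace that storage.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _canonical(body: Dict) -> str:
    # Sorted keys give a stable serialisation, so the hash is reproducible.
    return json.dumps(body, sort_keys=True, ensure_ascii=False)


def build_audit_entry(prev_hash: str, *, user_id: str, raw_input: str,
                      processed_input: str, model_version: str,
                      system_prompt: str, retrieved_context: List[str],
                      response: str, used_output: str, reviewer_id: str,
                      reviewer_action: str, reviewer_reason: str,
                      timestamp: Optional[str] = None) -> Dict:
    """Build one audit entry linked to the previous entry by its hash."""
    body = {
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "raw_input": raw_input,
        "processed_input": processed_input,
        "model_version": model_version,  # provider snapshot ID, not a family name
        "system_prompt_sha256": sha256_hex(system_prompt),
        "retrieved_context_sha256": [sha256_hex(c) for c in retrieved_context],
        "response": response,
        "used_output": used_output,
        "review": {"reviewer_id": reviewer_id, "action": reviewer_action,
                   "reason": reviewer_reason},
        "prev_hash": prev_hash,
    }
    return {**body, "entry_hash": sha256_hex(_canonical(body))}


def verify_chain(entries: List[Dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS_HASH
    for entry in entries:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body["prev_hash"] != prev:
            return False
        if sha256_hex(_canonical(body)) != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True
```

Each entry would be appended to the write-once store as it is created, and `verify_chain` can be run as part of periodic review to show the trail is intact.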
Electronic signatures (§11.50, §11.70, §11.100, §11.200)
If the AI produces an output that a person signs off on, that signature is regulated. The bar is high:
- At least two distinct identification components for non-biometric signatures (typically a user ID and a password)
- The signature is linked to the specific record it signs, so it can't be excised, copied, or transferred to another record
- The signature manifestation must show: signer's printed name, the date and time, the meaning of the signature (e.g., "reviewed and approved")
- The manifestation must appear in the human-readable form of the record itself, not only in a separate audit log
For AI features, the signature pattern that works is: AI generates a draft → human reviews → human signs to approve. Never let an AI output stand as approved without a human signature for any GxP decision.
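The draft, review, sign pattern can be sketched as follows. This is a simplified illustration under stated assumptions: `verified_components` stands in for whatever your authentication layer has actually checked, the names are hypothetical, and a content hash is only one way to link a signature to its record.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Set


@dataclass(frozen=True)
class Signature:
    signer_name: str
    signed_at: str       # ISO 8601 date and time
    meaning: str         # e.g. "reviewed and approved"
    record_sha256: str   # links the signature to exactly one record version


def _hash(record_text: str) -> str:
    return hashlib.sha256(record_text.encode("utf-8")).hexdigest()


def sign_record(record_text: str, signer_name: str, meaning: str,
                verified_components: Set[str],
                signed_at: Optional[str] = None) -> Signature:
    """Create a signature only if two distinct components were verified."""
    if len(verified_components) < 2:
        raise PermissionError("two distinct identification components required")
    when = signed_at or datetime.now(timezone.utc).isoformat()
    return Signature(signer_name, when, meaning, _hash(record_text))


def signature_is_valid_for(record_text: str, signature: Signature) -> bool:
    """A signature made for one record version does not validate another."""
    return signature.record_sha256 == _hash(record_text)


def render_signed_record(record_text: str, signature: Signature) -> str:
    """Put the manifestation in the human-readable record itself."""
    if not signature_is_valid_for(record_text, signature):
        raise ValueError("signature does not belong to this record")
    return (f"{record_text}\n\n"
            f"Signed by: {signature.signer_name}\n"
            f"Date/time: {signature.signed_at}\n"
            f"Meaning: {signature.meaning}")


draft = "AI-generated draft summary, reviewed by a human."
sig = sign_record(draft, "A. Reviewer", "reviewed and approved",
                  {"user_id", "password"}, signed_at="2024-01-01T12:00:00+00:00")
print(render_signed_record(draft, sig))
```

Because the signature carries the record's hash, editing the draft after sign-off invalidates the signature and forces a fresh review.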
Validation (§11.10(a))
"Validation" in Part 11 means demonstrating that the system performs accurately, reliably, and consistently as intended. For traditional software this is straightforward. For AI it's harder, because AI systems can behave non-deterministically and their behaviour changes when underlying models are updated.
The validation strategy we use has four layers:
- Functional validation — the system meets its documented requirements. Standard test scripts.
- Performance validation — the model meets accuracy / precision / recall thresholds on a golden dataset, with documented sampling methodology.
- Boundary validation — the system behaves correctly on edge cases, including adversarial inputs and out-of-distribution data.
- Change-control validation — any change to model, prompt, or retrieval triggers a documented re-validation, with delta testing against the previous baseline.
Your golden dataset is the single most important asset for validation. Build it deliberately. Version it. Don't let it leak into training data. Treat it the way you'd treat a control sample in a clinical trial.
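Performance validation against a golden dataset can be sketched as a small scoring function. The example assumes a binary classification task and a hypothetical `predict` callable that wraps the AI feature; the thresholds and the four toy cases are placeholders, not recommended values or real data.

```python
from typing import Callable, Dict, List, Tuple


def evaluate_on_golden_set(golden: List[Tuple[str, bool]],
                           predict: Callable[[str], bool],
                           thresholds: Dict[str, float]) -> Dict:
    """Score predictions against expected labels and compare to thresholds."""
    tp = fp = fn = tn = 0
    for text, expected in golden:
        predicted = predict(text)
        if predicted and expected:
            tp += 1
        elif predicted and not expected:
            fp += 1
        elif not predicted and expected:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    metrics = {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
    failures = [name for name, minimum in thresholds.items()
                if metrics[name] < minimum]
    return {"metrics": metrics, "failures": failures, "passed": not failures}


# Toy golden set and a stand-in model, for illustration only.
golden = [("case a", True), ("case b", True), ("case c", False), ("case d", False)]
flagged = {"case a", "case c"}
result = evaluate_on_golden_set(golden, lambda text: text in flagged,
                                {"recall": 0.9})
print(result["metrics"])   # {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.5}
print(result["passed"])    # False
```

Storing the returned dictionary together with the golden dataset version and the model snapshot ID gives you the baseline that later delta testing is compared against.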
Change control (§11.10(k))
This is where AI systems trip up teams that have only validated traditional software. Models change. Provider snapshots get deprecated. Prompts get tweaked. RAG indexes get refreshed. Each of these is a change that needs control.
Our rule of thumb: anything that can change the system's output is a controlled artifact. That means:
- Model versions are pinned, and version bumps go through change control
- System prompts are content-addressable artifacts — referenced by hash, stored immutably
- RAG indexes have versioning and a documented re-indexing procedure
- Tool definitions and external API integrations are documented and version-pinned
Yes, this slows you down. Yes, it's the right way to do it. The teams that skip this step end up explaining to an inspector why a regulated decision made yesterday can't be replayed today.
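The controlled artifacts above can be captured in a release manifest, so that any difference from the validated baseline is detected mechanically. This is a minimal sketch with hypothetical names and placeholder version strings; how a detected change is routed into your change-control procedure is up to your SOPs.

```python
import hashlib
from typing import Dict, List


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_manifest(model_version: str, system_prompt: str,
                   rag_index_version: str,
                   tool_versions: Dict[str, str]) -> Dict[str, str]:
    """Describe everything that can change the system's output."""
    manifest = {
        "model_version": model_version,               # pinned snapshot ID
        "system_prompt_sha256": content_hash(system_prompt),
        "rag_index_version": rag_index_version,
    }
    for name, version in sorted(tool_versions.items()):
        manifest[f"tool:{name}"] = version
    return manifest


def changed_artifacts(baseline: Dict[str, str],
                      candidate: Dict[str, str]) -> List[str]:
    """List every artifact that differs, including added or removed ones."""
    keys = sorted(set(baseline) | set(candidate))
    return [key for key in keys if baseline.get(key) != candidate.get(key)]


# Placeholder values, for illustration only.
baseline = build_manifest("provider-snapshot-id", "You summarise deviations.",
                          "index-v1", {"lims_lookup": "1.0"})
candidate = build_manifest("provider-snapshot-id", "You summarise deviations concisely.",
                           "index-v1", {"lims_lookup": "1.0"})
print(changed_artifacts(baseline, candidate))  # ['system_prompt_sha256']
```

A non-empty result is the trigger for re-validation. An empty one is evidence that the deployed configuration still matches what was validated, which is also what makes yesterday's decision replayable today.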
System documentation
Inspectors expect a documentation set. The minimum we deliver on every regulated AI build:
- System overview and intended use document
- Risk assessment with mitigation plan
- Functional and design specifications
- Validation plan, scripts, and reports
- Standard operating procedures for: model updates, prompt changes, incident response, periodic review
- User training materials with sign-off log
This documentation lives with the QA team, not just in your engineering wiki. If an inspector asks for the validation report, you should be able to hand it over the same day.
The first-pass test
Here's the question we ask ourselves at the end of every regulated AI build: if an FDA inspector walked in tomorrow and asked us to demonstrate how this AI feature works, can we produce — within four hours — the documentation, audit trail, validation evidence, and replay capability to walk them through it end-to-end?
If the answer is yes, you're in good shape. If it's "give us a week," you're not.
The good news: getting from "give us a week" to "yes, within four hours" is engineering work — versioning, immutable storage, structured logging, and discipline. None of it is research-grade. All of it is teachable. And it's the difference between passing validation on first submission and spending a quarter answering remediation requests.
We've shipped GxP-aware AI to multiple pharma clients.
Book a 30-minute discovery call — we'll walk through your validation strategy and flag any gaps.