21 CFR Part 11 for AI: a practical checklist

21 CFR Part 11 was written in 1997. Generative AI was not on the FDA's mind at the time. The regulation governs electronic records and electronic signatures, and it applies to AI systems that create, modify or maintain those records in a GxP context, even though the language of the rule predates LLMs by roughly 25 years. The good news: with the right scaffolding, validating an AI feature for Part 11 is mostly engineering work, not legal work.

This is the checklist we run before code review on every regulated pharma engagement. It's not legal advice, and your QA team has the final say. But if you tick every box, you'll walk into your validation review with very few open issues.

The three things the FDA actually cares about

Strip away the legalese and Part 11 boils down to three principles:

Authenticity: the record came from the person or system claimed
Integrity: the record hasn't been changed, or any changes are themselves recorded
Confidentiality: where the record calls for it, access is limited to authorised personnel

For an AI feature in a regulated workflow, you have to demonstrate all three for every record the AI touches — inputs, model outputs, human reviews, downstream actions.

The pre-build checklist

Before you write code

Confirm the GxP scope: is this feature touching GMP, GLP, GCP, or GPvP data?
Document the intended use, with specific user roles and decisions the AI informs
Risk-classify the AI's role: advisory only, decision support, or autonomous action
Identify all electronic records the AI will produce, consume or modify
Agree the validation strategy with QA before development starts, not after

The last point is the one teams skip and regret. If your QA function joins after the build is done, you'll spend three months retrofitting documentation that should have been written as you went.

Identity and access (§11.10 d, g, h; §11.300)

Every action against the AI system must be tied to an authenticated user. Service accounts are allowed for system-to-system communication, but human-initiated actions need a real identity.

Unique user IDs, no shared accounts
Role-based access control with documented role definitions
Password complexity and rotation policy aligned with your IT baseline
MFA for any privileged action (prompt edits, model version pinning, eval changes)
Session timeout that matches your IT policy (15 minutes is a common default for clinical systems)

For AI workloads specifically, treat prompt-editing as a privileged action. Anyone who can change a system prompt can change the system's behaviour — that's the equivalent of editing production code in a non-AI system.
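
A minimal sketch of how that rule might look in code, in Python. The role names, the `Action` enum, the `Session` fields and the `is_allowed` helper are all illustrative assumptions, not something Part 11 or any particular framework prescribes; real role definitions would come from your documented RBAC matrix.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RUN_INFERENCE = "run_inference"
    EDIT_PROMPT = "edit_prompt"
    PIN_MODEL_VERSION = "pin_model_version"
    CHANGE_EVAL = "change_eval"


# Assumed policy: anything that can change system behaviour is privileged.
PRIVILEGED_ACTIONS = {Action.EDIT_PROMPT, Action.PIN_MODEL_VERSION, Action.CHANGE_EVAL}

# Hypothetical role definitions, for illustration only.
ROLE_PERMISSIONS = {
    "reviewer": {Action.RUN_INFERENCE},
    "prompt_admin": {Action.RUN_INFERENCE, Action.EDIT_PROMPT},
    "ml_release_manager": set(Action),
}


@dataclass(frozen=True)
class Session:
    user_id: str                      # unique per person, never shared
    role: str
    mfa_verified: bool
    is_service_account: bool = False


def is_allowed(session: Session, action: Action) -> bool:
    """Allow an action only if the role grants it; privileged actions also
    require a human identity and a completed MFA challenge."""
    if action not in ROLE_PERMISSIONS.get(session.role, set()):
        return False
    if action in PRIVILEGED_ACTIONS:
        return session.mfa_verified and not session.is_service_account
    return True
```

Under these assumptions, a `prompt_admin` without a fresh MFA challenge can still run inference but cannot edit a prompt, and a service account can never perform a privileged action regardless of its role.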

Audit trails (§11.10 e)

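This is the meatiest section for AI. Every record the system creates, modifies or deletes needs a secure, computer-generated, time-stamped audit trail capturing what happened, who did it, and when. A change must never overwrite or obscure what was recorded before it.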

For AI-specific records, your audit trail needs at minimum:

The exact input the model received — raw and after any pre-processing
The model version (the provider's snapshot ID, not just "GPT-4")
The system prompt and any retrieved context — referenced by content hash
The model's full response, including any intermediate reasoning
The decision or output that was actually used downstream
The human reviewer who saw the output, what they did, and why

Write this trail to a write-once data store. S3 with Object Lock, an immutable database, or a purpose-built validation system all work. Don't store it in the same database as the application data: keeping the two apart makes it much easier to show an inspector that the trail could not have been edited through the application.
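
As a rough illustration of the fields listed above, here is a Python sketch of one audit record. The hash chain (each record stores the digest of the previous one) is our own addition for tamper evidence on top of write-once storage, not something the regulation requires. The field names, `AuditRecord`, `GENESIS` and `verify_chain` are assumptions; Part 11 does not prescribe a schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

GENESIS = "0" * 64  # assumed placeholder "previous digest" for the first record


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def utc_now_iso() -> str:
    """Timezone-aware UTC timestamp in ISO 8601 form."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class AuditRecord:
    timestamp_utc: str
    user_id: str
    raw_input: str
    processed_input: str
    model_snapshot_id: str       # provider snapshot ID, not a marketing name
    system_prompt_sha256: str    # prompt referenced by content hash
    context_sha256: tuple        # one hash per retrieved context chunk
    model_response: str
    output_used: str             # what actually went downstream
    reviewer_id: str
    reviewer_action: str
    reviewer_reason: str
    prev_record_sha256: str      # digest of the previous record in the trail

    def digest(self) -> str:
        # Canonical JSON so the same content always hashes to the same value.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return sha256_hex(canonical)


def verify_chain(records: list) -> bool:
    """Return True if every record points at the digest of its predecessor."""
    expected_prev = GENESIS
    for record in records:
        if record.prev_record_sha256 != expected_prev:
            return False
        expected_prev = record.digest()
    return True
```

Editing any field of an earlier record changes its digest, so `verify_chain` fails from that point on. The chain complements the write-once store; it does not replace it.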

Electronic signatures (§11.50, §11.70, §11.100, §11.200)

If the AI produces an output that a person signs off on, that signature is regulated. The bar is high:

At least two distinct identification components for non-biometric signatures (typically a user ID and password)
The signature is bound to the specific record it signs, so it can't be copied or transferred to another record
The signature manifestation must show the signer's printed name, the date and time of signing, and the meaning of the signature (e.g., "reviewed and approved")
Those three items must appear in the human-readable form of the record itself, not only in a separate audit log

For AI features, the signature pattern that works is: AI generates a draft → human reviews → human signs to approve. Never let an AI output count as approved for any GxP decision without a human signature.
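
The record binding and the manifestation can be sketched in a few lines of Python. This is a simplified illustration, not a compliant signature implementation: it assumes the signer has already been re-authenticated elsewhere, and the names `SignatureManifestation`, `sign_record` and `render_signed_record` are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SignatureManifestation:
    signer_name: str     # printed name of the signer
    signed_at_utc: str   # date and time of signing, ISO 8601
    meaning: str         # e.g. "reviewed and approved"
    record_sha256: str   # ties the signature to one exact record


def _digest(record_text: str) -> str:
    return hashlib.sha256(record_text.encode("utf-8")).hexdigest()


def sign_record(record_text: str, signer_name: str,
                signed_at_utc: str, meaning: str) -> SignatureManifestation:
    # Assumption: the caller has already checked both identification components.
    return SignatureManifestation(signer_name, signed_at_utc, meaning,
                                  _digest(record_text))


def signature_matches(record_text: str, sig: SignatureManifestation) -> bool:
    """A signature is only valid for the exact text it was created against."""
    return _digest(record_text) == sig.record_sha256


def render_signed_record(record_text: str, sig: SignatureManifestation) -> str:
    """Human-readable record with the manifestation embedded in it."""
    if not signature_matches(record_text, sig):
        raise ValueError("signature does not belong to this record")
    return (
        f"{record_text}\n"
        f"---\n"
        f"Signed by: {sig.signer_name}\n"
        f"Date/time (UTC): {sig.signed_at_utc}\n"
        f"Meaning: {sig.meaning}\n"
    )
```

If the AI regenerates the draft after sign-off, the hash no longer matches and the old signature cannot be attached to the new text, which forces a fresh review.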

Validation (§11.10 a)

"Validation" in Part 11 means demonstrating that the system does what it's supposed to do, accurately and consistently. For traditional software this is straightforward. For AI it's harder, because AI systems can behave non-deterministically and their behaviour changes when the underlying models are updated.

The validation strategy we use has four layers:

Functional validation — the system meets its documented requirements. Standard test scripts.
Performance validation — the model meets accuracy / precision / recall thresholds on a golden dataset, with documented sampling methodology.
Boundary validation — the system behaves correctly on edge cases, including adversarial inputs and out-of-distribution data.
Change-control validation — any change to model, prompt, or retrieval triggers a documented re-validation, with delta testing against the previous baseline.

Your golden dataset is the single most important asset for validation. Build it deliberately. Version it. Don't let it leak into training data. Treat it the way you'd treat a control sample in a clinical trial.
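
To make the performance and change-control layers concrete, here is a small Python sketch that scores binary predictions against golden labels, checks them against acceptance thresholds, and optionally compares them with a previous baseline. The function names, the report shape and the `max_regression` tolerance are assumptions for illustration; your validation plan defines the real metrics and thresholds.

```python
def precision_recall(labels, predictions):
    """Precision and recall for binary labels (truthy = positive)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def validate_against_golden(labels, predictions, min_precision, min_recall,
                            baseline=None, max_regression=0.0):
    """Check thresholds and, if a (precision, recall) baseline is given,
    flag any drop larger than max_regression (delta testing)."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    precision, recall = precision_recall(labels, predictions)
    failures = []
    if precision < min_precision:
        failures.append("precision below threshold")
    if recall < min_recall:
        failures.append("recall below threshold")
    if baseline is not None:
        base_precision, base_recall = baseline
        if base_precision - precision > max_regression:
            failures.append("precision regressed against baseline")
        if base_recall - recall > max_regression:
            failures.append("recall regressed against baseline")
    return {"precision": precision, "recall": recall,
            "passed": not failures, "failures": failures}
```

The returned report is the kind of artifact that belongs in the validation record alongside the golden dataset version and the model snapshot it was run against.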

Change control (§11.10 k)

This is where AI systems trip up teams that have only validated traditional software. Models change. Provider snapshots get deprecated. Prompts get tweaked. RAG indexes get refreshed. Each of these is a change that needs control.

Our rule of thumb: anything that can change the system's output is a controlled artifact. That means:

Model versions are pinned, and version bumps go through change control
System prompts are content-addressable artifacts — referenced by hash, stored immutably
RAG indexes have versioning and a documented re-indexing procedure
Tool definitions and external API integrations are documented and version-pinned

Yes, this slows you down. Yes, it's the right way to do it. The teams that skip this step end up explaining to an inspector why a regulated decision made yesterday can't be replayed today.
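
One way to enforce the "controlled artifact" rule is to collapse everything that can change the output into a single manifest and compare it with the last approved one. The Python sketch below assumes hypothetical names (`build_manifest`, `manifest_id`, `changed_artifacts`) and a made-up manifest layout; it illustrates the idea and is not a description of any specific tool.

```python
import hashlib
import json


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_manifest(model_snapshot_id: str, system_prompt: str,
                   rag_index_version: str, tool_versions: dict) -> dict:
    """Collect every output-affecting artifact into one record."""
    return {
        "model_snapshot_id": model_snapshot_id,
        "system_prompt_sha256": sha256_hex(system_prompt),  # content-addressed
        "rag_index_version": rag_index_version,
        "tool_versions": dict(sorted(tool_versions.items())),
    }


def manifest_id(manifest: dict) -> str:
    """Stable identifier: hash of the canonical JSON form of the manifest."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return sha256_hex(canonical)


def changed_artifacts(approved: dict, candidate: dict) -> list:
    """Names of artifacts that differ and therefore need change control."""
    keys = set(approved) | set(candidate)
    return sorted(k for k in keys if approved.get(k) != candidate.get(k))
```

A deployment gate can refuse to start if `changed_artifacts` is non-empty and no approved change record exists for the new `manifest_id`. Logging the manifest ID with each audit record is also what makes yesterday's decision replayable today.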

System documentation

Inspectors expect a documentation set. The minimum we deliver on every regulated AI build:

System overview and intended use document
Risk assessment with mitigation plan
Functional and design specifications
Validation plan, scripts, and reports
Standard operating procedures for: model updates, prompt changes, incident response, periodic review
User training materials with sign-off log

This documentation lives with the QA team, not just in your engineering wiki. If an inspector asks for the validation report, you should be able to hand it over the same day.

The first-pass test

Here's the question we ask ourselves at the end of every regulated AI build: if an FDA inspector walked in tomorrow and asked us to demonstrate how this AI feature works, can we produce — within four hours — the documentation, audit trail, validation evidence, and replay capability to walk them through it end-to-end?

If the answer is yes, you're in good shape. If it's "give us a week," you're not.

The good news: getting from "give us a week" to "yes, within four hours" is engineering work — versioning, immutable storage, structured logging, and discipline. None of it is research-grade. All of it is teachable. And it's the difference between passing validation on first submission and spending a quarter answering remediation requests.
