They're Using AI to License Reactors. Is Anyone Accounting for the AI?
The nuclear industry invented material control and accounting (MC&A). It is now using AI to write reactor license applications with no MC&A for the AI. This piece is about that gap and why it matters.
Last week, the Department of Energy published a milestone announcement. Working with Idaho National Laboratory, Argonne, Microsoft, and a startup called Everstar, DOE used an AI system called Gordian to convert a Preliminary Documented Safety Analysis for a Generic High Temperature Gas Reactor into sections equivalent to an NRC license application.
The output: 208 pages. The time: one day. Typical timeline for a human team doing the same task: four to six weeks.
The headline is real. Licensing document preparation is a genuine bottleneck. And expert capacity is genuinely constrained. AI tools that can compress weeks into days (hopefully, with human validation) are worth having, and DOE is right to pursue them.
But the announcement contains a single sentence that deserves more attention than it has received:
“A reviewing agent will evaluate AI-generated documents against NRC guidance to validate that they are ready for submittal.”
Note the word: an agent, not a person. An AI system that evaluates the output of another AI system, then produces a judgment about whether a nuclear licensing document is ready to go to the Nuclear Regulatory Commission (NRC).
This is probably fine. Gordian appears to be a bounded, well-engineered tool: narrow scope, human expert validation at the output stage, and no indication of persistent memory or autonomous execution. The reviewing agent is likely similarly constrained. The human expert who reads the final output and signs off is doing real work, not theater.
But here is the question: how do we know? What is the formal basis for the claim that Gordian is a bounded tool rather than something with broader operational scope? Where is the declared architecture? What are the controls on the reviewing agent's persistence, memory, and tool access? And what happens if logging fails, that is, if the chain of verifiable knowledge about what the AI system did and why is interrupted?
In nuclear material accountancy, loss of continuity of knowledge is not an inconvenience. It is a trigger condition: operations pause or halt, controls are re-established, and the material balance is reconciled before anyone proceeds. That, at least, is what we teach in class.
For the AI systems now being used to generate that material balance’s documentation, there is no equivalent protocol. There is a human who reads the output. That is not a safeguards regime.
The Structural Irony
Nuclear regulation exists because the industry learned, at considerable cost, that expert review and good intentions are not sufficient controls for high-consequence systems. The entire architecture of International Atomic Energy Agency (IAEA, the nuclear watchdog) safeguards, from material balance areas and key measurement points to continuity of knowledge, shipper-receiver reconciliation, and independent inspection, was built on a single premise: trust is not a control. Verification is a control.
The NRC operates on the same logic. Reactor safety cases are not accepted on the basis that the engineers seemed careful. They are accepted on the basis of documented analysis, independent review, and a chain of evidence that an external party can audit.
We are now using AI systems to generate that chain of evidence. And the AI systems doing the generating are not subject to any analogous accountability framework.
This is not a criticism of DOE or Everstar. The deployment described in the announcement appears thoughtful. The semantic ontology approach, in which outputs are computed and verified rather than inferred, reflects the right instinct. The human validation step matters.
The gap is not in this specific deployment. The gap is that the field has no formal basis for distinguishing this deployment from a riskier one. The controls that make Gordian acceptable are implicit and undeclared. The architectural properties that keep it within manageable scope are not documented, not inspected, and not accountable to any external party. If a future system has broader memory, more autonomous execution, or less rigorous human oversight, there is currently no framework that would detect the difference before the license application is submitted.
That is how fields accumulate risk: not through bad actors, but through the gradual normalization of controls that were never formalized in the first place.
What Accountancy for AI Would Require
Nuclear material accountancy works because it is built around a small number of conserved quantities and a defined set of events that govern how those quantities change. The logic is simple and somewhat brutal: define what you are controlling, define the boundaries within which you control it, define the events that move it across those boundaries, measure the discrepancy between what you declared and what you observed, and set alarm thresholds based on that discrepancy. Then inspect.
The same structure can be applied to AI systems. But it requires defining the right primitives, and the right primitives are not model weights or benchmark scores.
The controlled quantity is the combination of three things: a system's persistent identity (its verifiable, cryptographically bound operational handle across sessions and reboots), its capability state (what it can currently do: memory scope, tool access, replication authority), and its operational lineage (the complete parent-child history of how it came to exist and how its state evolved). These three objects, together, are what an accountancy regime needs to track.
The controlled events are equally specific: a new agent is created (genesis), one agent becomes two (fork), two agents are retired and one new one is created (merge), or an agent is retired with no successors (termination). Every other state transition is either a declared event or an anomaly.
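For concreteness, here is a minimal sketch, in Python, of what those primitives could look like as a data model. The class names, fields, and the idea of hashing the lineage are illustrative assumptions for this piece; they are not drawn from Gordian, the DOE announcement, or any existing standard.

```python
from dataclasses import dataclass, field
from enum import Enum
import hashlib

class EventType(Enum):
    GENESIS = "genesis"          # a new agent is created
    FORK = "fork"                # one agent becomes two
    MERGE = "merge"              # two agents are retired, one successor is created
    TERMINATION = "termination"  # an agent is retired with no successors

@dataclass(frozen=True)
class CapabilityState:
    memory_scope: str                 # e.g. "session-only" or "persistent"
    tool_access: tuple[str, ...]      # declared tools the agent may invoke
    replication_authority: bool       # may it spawn successors on its own?

@dataclass
class AgentRecord:
    agent_id: str                     # persistent, verifiable operational handle
    capabilities: CapabilityState     # what the agent can currently do
    lineage: list[str] = field(default_factory=list)  # ordered parent IDs

    def lineage_digest(self) -> str:
        """Digest of the parent-child history, so lineage can be attested
        and compared without shipping the full record."""
        return hashlib.sha256("/".join(self.lineage).encode()).hexdigest()
```

Every declared event appends to or closes one of these records; anything that changes an agent's identity, capabilities, or lineage outside a declared event is, by definition, an anomaly.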
From these primitives, a material balance equation can be written. Measurement uncertainty can be propagated through it. Statistically grounded alarm thresholds (the equivalent of the Shipper-Receiver Difference in nuclear accountancy) can be derived, specifying when a discrepancy is large enough to require investigation rather than routine reconciliation.
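A toy version of that balance, translated directly from the nuclear form (beginning inventory plus additions, minus removals and ending inventory), might look like the sketch below. The three-sigma alarm and the assumption that measurement uncertainties are independent are simplifications for illustration, not a proposed standard.

```python
import math

def agent_balance(beginning: int, created: int, retired: int, ending: int) -> int:
    """Discrepancy between the declared ledger and the observed inventory:
    the agent-accountancy analogue of Material Unaccounted For (MUF)."""
    return (beginning + created) - (retired + ending)

def alarm(discrepancy: int, sigmas: list[float], k: float = 3.0) -> bool:
    """Flag the balance period if the discrepancy exceeds k standard
    deviations of the combined measurement uncertainty (uncertainties
    assumed independent, so they add in quadrature)."""
    combined_sigma = math.sqrt(sum(s ** 2 for s in sigmas))
    return abs(discrepancy) > k * combined_sigma

# Example: the ledger says 12 agents at the start of the period, 5 genesis/fork
# events, 4 terminations; the end-of-period inventory finds only 12.
d = agent_balance(beginning=12, created=5, retired=4, ending=12)   # -> 1
print(alarm(d, sigmas=[0.2, 0.2, 0.1]))  # True: one agent is unaccounted for
```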
Continuity of Knowledge becomes a formal, measurable quantity: a function that decays when logging fails and resets when independent attestation occurs. Below a defined threshold, the system's operational trust state degrades automatically, not because an operator decided to restrict it but because the control architecture enforces it.
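One way to make that concrete, again as an illustrative assumption rather than an established definition, is to treat Continuity of Knowledge as a score that decays exponentially during logging gaps, resets on independent attestation, and triggers an automatic trust downgrade below a fixed threshold:

```python
import math

COK_THRESHOLD = 0.5   # below this, the trust state degrades automatically
DECAY_RATE = 0.1      # per hour of unlogged, unattested operation (assumed)

def continuity_of_knowledge(hours_since_last_attestation: float,
                            logging_intact: bool) -> float:
    """Illustrative CoK score: 1.0 while logging is intact or immediately
    after independent attestation, decaying while logging is broken."""
    if logging_intact:
        return 1.0
    return math.exp(-DECAY_RATE * hours_since_last_attestation)

def trust_state(cok: float) -> str:
    """Degrade operational trust as a function of CoK, not operator judgment."""
    return "nominal" if cok >= COK_THRESHOLD else "degraded: re-attestation required"

# An 8-hour logging gap drops CoK to roughly 0.45, tripping the automatic downgrade.
print(trust_state(continuity_of_knowledge(8.0, logging_intact=False)))
```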
This is the framework the nuclear field gave the world for material, and it can apply, with translation, to AI systems. What does not yet exist is an institution that has declared such a framework a standard and an industry that has committed to implementing it.
Why Nuclear Specifically Should Know Better
The Genesis Mission framing in the DOE announcement — “move boldly,” “transform how industry prepares regulatory submissions” — is a political signal, not a technical one. Speed is the objective. Accountancy is not mentioned.
This is the condition that produces serious incidents. Not necessarily because anyone is acting in bad faith, but because controls that were never formalized cannot be audited, or inspected, or enforced until something goes wrong and people work backward to find out what was missing.
The nuclear industry is, of all industries, the one that should recognize this pattern. It has experienced it with physical materials. It has spent decades building the frameworks that prevent recurrence. It now finds itself deploying AI into its most consequential document processes, licensing applications, safety analyses, and regulatory submissions, while treating the AI systems doing that work as outside the scope of those frameworks. And we already see people using LLM agents to generate regulatory compliance documents without any real understanding of their facility's details or specifics of operation.
At some point, an NRC reviewer will ask about the provenance of an AI-generated license section. They will want to know which version of the model produced it, whether the tool’s configuration has changed since the document was generated, and whether the reviewing agent’s outputs can be independently verified. The current answer to all three questions is: we don’t have a formal basis for knowing.
That answer will not survive contact with a serious licensing dispute, a safety finding, or a congressional inquiry. The question is only whether the framework exists before that moment or after it.
The nuclear industry invented material control and accounting because it had to. It will develop AI accountancy for the same reason, but probably reactively, expensively, and later than it should have.