L7 | CHATS

Orchestrating the 98%: The Hidden Architecture of Trustworthy AI

by Vasu Rangadass, Ph.D. | posted on June 29, 2026

When researchers recently took apart one of the most capable AI agents in production, Anthropic’s Claude Code, their analysis pointed to something that runs against how most of us picture artificial intelligence. The part that reasons, the AI decision logic, was estimated at roughly 1.6 percent of the codebase. The remaining 98.4 percent was something else: the permission structures, context management, tool routing, audit, and recovery logic that surround the model and govern what it is permitted to do.

I find that ratio clarifying, because it gives precise shape to something those of us building software for regulated science have understood intuitively for a long time. The intelligence in a working system is rarely the difficult part. The architecture that makes that intelligence reliable, repeatable, and safe to trust is where the real engineering lives. In our field, that architecture deserves the name “operational harness.”

The deeper lesson in that 98/2 split is about where reliability comes from. A capable model, on its own, is a brilliant improviser. What turns improvisation into something an organization can depend on is the scaffolding around it: the logic that decides what the model is allowed to see, what it is allowed to do, how its work is recorded, and what happens when it makes a mistake. The intelligence proposes. The architecture disposes. Once you see a production AI system that way, a great deal that looked puzzling about enterprise AI starts to make sense.

 

why the operational harness carries more weight in regulated science

In a regulated environment, that surrounding 98 percent takes on a character it has almost nowhere else. It has to comply with 21 CFR Part 11 and EU GMP Annex 11. It has to carry validated SOPs, an unbroken audit trail, sample lineage, and the instrument connectivity that turns a benchtop reading into a record an inspector will trust. A foundation model can read a chromatogram and propose a deviation response. What it cannot do on its own is guarantee that the response conforms to a validated procedure, or that the data it reasoned over has a provenance that holds up under audit. Those guarantees live in the operational harness, and in our world the harness is asked to do more than in almost any other industry.

This helps explain a gap that has perplexed many observers. McKinsey’s 2025 State of AI survey found that while nearly nine in ten organizations now use AI regularly, only 7 percent have scaled it across the enterprise, and 94 percent report no significant value yet. Gartner expects more than 40 percent of agentic AI projects to be canceled before the end of 2027. Read through the 98/2 lens, the pattern becomes legible. The shortfall is rarely a shortfall of intelligence. In most of these cases, a capable model simply arrived before the architecture capable of putting it to work.

 

reasoning and enforcement are two different jobs

Here is the idea I most want to leave with anyone deploying AI in a regulated setting, because it resolves what first looks like a contradiction. A large language model is probabilistic by design. Ask it the same question twice, and it may answer slightly differently each time. That variability is a virtue in a research assistant and a genuine liability in a system that releases a batch or clears a deviation gate. The tempting response is to try to discipline the model into determinism, which leads nowhere, because the property is intrinsic to how these models work.

The more productive move is to stop fighting the model’s nature and instead divide the labor. Let the model do what it is uniquely good at: reasoning, drafting, proposing, noticing the anomaly a tired human might miss. Then place every consequential action behind a deterministic layer that checks each proposal against validated rules before anything executes. Reasoning and enforcement become two different jobs, performed by two different kinds of system. The model reasons. The harness decides what is allowed to happen.
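A minimal sketch may make that division of labor concrete. Everything named here is an assumption made for illustration: the `Proposal` record, the `adjust_ph` action, and the pH limits are hypothetical and do not come from any real product or SOP. The point is only the shape: the proposals stand in for model output that may vary from run to run, while `enforce` is a pure function of the proposal and a fixed rule table, so the same proposal always receives the same verdict.

```python
from dataclasses import dataclass

# Hypothetical validated limits; in practice these would come from an approved SOP.
VALIDATED_LIMITS = {"adjust_ph": (6.8, 7.4)}


@dataclass(frozen=True)
class Proposal:
    """What the reasoning layer suggests. It has no authority of its own."""
    action: str
    value: float
    rationale: str


def enforce(proposal: Proposal) -> tuple[bool, str]:
    """Deterministic gate: identical input always yields an identical verdict."""
    limits = VALIDATED_LIMITS.get(proposal.action)
    if limits is None:
        return False, f"action '{proposal.action}' is not in the validated set"
    low, high = limits
    if not low <= proposal.value <= high:
        return False, f"value {proposal.value} is outside validated range {low}-{high}"
    return True, "within validated parameters"


def execute_if_allowed(proposal: Proposal, executed: list[Proposal]) -> str:
    """Only proposals the gate has cleared ever reach the execution list."""
    allowed, reason = enforce(proposal)
    if allowed:
        executed.append(proposal)
    return reason


executed: list[Proposal] = []
# Stand-ins for model output; a real model might phrase or size these differently per run.
in_range = Proposal("adjust_ph", 7.1, "drift observed in last three readings")
out_of_range = Proposal("adjust_ph", 8.2, "aggressive correction")
unknown = Proposal("release_batch", 1.0, "looks fine")

results = [execute_if_allowed(p, executed) for p in (in_range, out_of_range, unknown)]
```

In this sketch only the in-range proposal reaches `executed`. The other two are turned away with a stated reason, including the action the rule table has never heard of, which is the default a gate of this kind should have.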

This separation is the principle on which we built L7|ESP®. L7|ESP is the deterministic layer that holds the validated workflows, the compliance gates, and the audit trail, generated as a natural byproduct of doing the work rather than assembled afterward. L7|SYNAPSE™ is the reasoning layer, where foundation models assemble protocols from validated templates, propose deviation responses, and surface anomalies for a person to weigh. Nothing the reasoning layer produces crosses a compliance threshold until the deterministic layer has cleared it. There is a quiet elegance in the arrangement: the same knowledge graph that produces a continuous, inspector-ready audit trail is the structure that gives the model clean, contextualized data to reason over. One investment serves both the regulator and the machine.

 

the quiet economics of validating once

The same architecture resolves an economic puzzle, and this is the part I find most remarkable. If you ask an AI model to make operational decisions directly, in real time, two costs climb in ways that quickly become unsustainable. The first is straightforward: running a model at production scale is expensive, and the cost rises with every decision it handles. The second cost is the one that matters most to anyone in quality. Because a model’s output can vary from one run to the next, its behavior must be validated, and in a GxP setting validation is rigorous, formal work. If the model itself is making regulated decisions, then in principle each of those decisions becomes something you have to validate. That is a burden no organization can carry at scale.

Concentrating the reasoning in a thin layer and the control in a deterministic harness inverts the problem. The harness is deterministic, so it can be validated once, using the computer system validation methods quality teams have refined for decades under frameworks like GAMP 5. A validated harness running validated workflows does not need re-validation each time an agent acts within it, as long as the agent’s output is checked against the harness’s rules first. Validate once, execute millions. The same rigor that makes regulated software demanding becomes the very thing that makes regulated AI affordable at scale.
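To illustrate why determinism is what makes "validate once" possible, here is a small sketch under stated assumptions: the `gate` rule, the 2.0 to 8.0 degree range, and the boundary cases are all hypothetical, and a seeded random generator stands in for the variability of a model. It is not a computer system validation procedure, only a picture of the asymmetry: the rule is exercised once at its boundaries, then applied to many proposals without being re-examined.

```python
import random

LOW, HIGH = 2.0, 8.0  # hypothetical validated storage range, degrees C


def gate(temperature_c: float) -> bool:
    """Deterministic rule: pure function, no hidden state, no randomness."""
    return LOW <= temperature_c <= HIGH


def validate_gate() -> bool:
    """One-time validation: exercise the rule at and around its boundaries."""
    cases = {1.9: False, 2.0: True, 5.0: True, 8.0: True, 8.1: False}
    return all(gate(value) is expected for value, expected in cases.items())


# Validation happens once, before any agent is allowed to act.
validation_runs = 0
gate_is_validated = validate_gate()
validation_runs += 1

# Simulated agent proposals; the seeded generator stands in for model variability.
rng = random.Random(0)
proposals = [rng.uniform(0.0, 10.0) for _ in range(10_000)]

# Every proposal is checked by the already-validated rule; none triggers re-validation.
accepted = [t for t in proposals if gate_is_validated and gate(t)]
rejected = len(proposals) - len(accepted)
```

Ten thousand proposals pass through the gate while `validation_runs` stays at one. The argument only holds because `gate` has no randomness and no hidden state; a rule that consulted the model to reach its verdict would lose that property.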

 

how autonomy is earned

I want to close on the idea underneath all of this, because it is the one that matters most and the one that is easiest to overlook. We tend to talk about AI autonomy as a dial to be turned up as the technology improves. In a regulated system, autonomy works differently. It is earned, incrementally, through demonstrated accountability. The question a regulator, a quality leader, or ultimately a patient needs answered about any consequential decision is simple and unforgiving: how do we know it was right, and if it was not, how do we trace exactly what happened? A benchmark score cannot answer that question. A system that records every action it takes, completely and verifiably, can.
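One common way to make a record "verifiable" in this sense is a hash chain, in which each entry's hash covers both its own content and the hash of the entry before it. The sketch below is an illustration of that general technique, not a description of how any particular product stores its audit trail. The field names and actors are hypothetical, and it leaves out things a real GxP audit trail needs, such as timestamps and electronic signatures.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(log: list[dict], actor: str, action: str) -> None:
    """Append an entry whose hash covers its content and the previous hash."""
    body = {
        "seq": len(log),
        "actor": actor,
        "action": action,
        "prev": log[-1]["hash"] if log else GENESIS,
    }
    log.append({**body, "hash": _digest(body)})


def verify(log: list[dict]) -> bool:
    """Recompute every hash; an edited, reordered, or mid-chain-deleted entry fails."""
    prev = GENESIS
    for seq, entry in enumerate(log):
        body = {k: entry[k] for k in ("seq", "actor", "action", "prev")}
        if entry["seq"] != seq or entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append_entry(log, "model", "proposed deviation response")
append_entry(log, "harness", "cleared proposal against validated rule")
append_entry(log, "analyst", "approved execution")

intact = verify(log)

# Simulate someone quietly rewriting history in a copy of the log.
tampered = [dict(entry) for entry in log]
tampered[1]["action"] = "cleared proposal without checking"
tamper_detected = not verify(tampered)
```

Changing a single word in the second entry makes verification fail, which is the property that lets a reviewer trust the trace. A chain like this does not by itself detect entries trimmed from the end of the log; that requires anchoring the latest hash somewhere the writer cannot alter.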

That is why the 98 percent deserves to be seen as the substance of the work rather than its overhead. The model will keep improving on its own, for everyone, year after year. The architecture of trust around it is what each organization builds for itself, and it is what will separate the science that earns the right to move faster from the science that waits.

If you want to go deeper, I develop the full architecture, the economics, and the regulatory foundations in this white paper: Orchestrating the 98%: Why the Operational Harness Will Define Pharma’s Agentic Era.

 

Sources

  1. Shen, Z. et al. (2026). Dive into Claude Code: The Design Space of Today’s and Future AI Agent Systems. arXiv:2604.14228. Available at: https://arxiv.org/html/2604.14228v1. Also: VILA-Lab/Dive-into-Claude-Code, GitHub. Analysis of Claude Code v2.1.88 (~512K lines TypeScript); community analysis estimates 1.6% AI decision logic, 98.4% operational infrastructure.
  2. McKinsey & Company. “The State of AI in 2025: Agents, Innovation, and Transformation.” November 5, 2025. Survey of 1,993 participants across 105 nations. Key finding: nearly 9 in 10 organizations use AI regularly, but 94% report not seeing significant value from those investments, and only 7% have fully scaled AI across their organizations. 
  3. Gartner. “Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027.” Press release, June 25, 2025. gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027
  4. U.S. Food and Drug Administration. 21 CFR Part 11: Electronic Records; Electronic Signatures. Code of Federal Regulations, Title 21, Part 11. Washington, D.C.: FDA.
  5. European Medicines Agency / European Commission. EudraLex Volume 4, GMP Guidelines, Annex 11: Computerised Systems. Brussels: European Commission. Also: EMA/FDA Joint Release (January 2026). Guiding Principles of Good AI Practice in Drug Development.
  6. ISPE. GAMP 5: A Risk-Based Approach to Compliant GxP Computerized Systems, Second Edition. July 2022. Includes Appendix D11 on AI/ML. ispe.org/publications/guidance-documents/gamp-5-guide-2nd-edition

 

 

FAQs

What is the 98/2 principle in pharmaceutical AI?

It comes from an architectural analysis of Anthropic’s Claude Code, which estimated that only about 1.6 percent of that system’s code is AI decision logic, while roughly 98.4 percent is operational infrastructure such as permissions, context management, audit, and recovery. The principle offers a useful lens: the model is a small part of the system, and the surrounding architecture is what makes it reliable. In regulated pharma, that architecture carries even more weight, because it must also satisfy GxP requirements such as 21 CFR Part 11 and EU GMP Annex 11.

Does the choice of AI model matter for success in pharma?

It matters, though usually less than the architecture around it. Foundation models are highly capable and largely interchangeable, and they improve continuously regardless of which one an organization selects. What most determines whether a pharmaceutical AI deployment succeeds is the operational harness: the validated workflows, governed data, audit trails, and compliance gates that let a model act safely inside a GxP environment. The model is the most replaceable part of the system. The harness is the part an organization builds and owns.

What is an operational harness in a GxP environment?

An operational harness is the deterministic layer of infrastructure that surrounds an AI model and makes it safe to run under regulation. It enforces validated SOPs, maintains audit trails and sample lineage, manages instrument connectivity, and confirms that every action the model proposes falls within validated parameters before it executes. At L7 Informatics, L7|ESP provides this operational harness and L7|SYNAPSE provides the bounded reasoning layer that operates within it.

How does deterministic control work with a probabilistic AI model?

A foundation model is probabilistic, so it can produce slightly different outputs from the same input. The harness keeps that variability away from consequential decisions by separating reasoning from enforcement. The model reasons and proposes a response, and the deterministic harness verifies that response against validated rules before any execution touches a GxP record. The model never gains direct control over actions that carry regulatory or patient-safety weight.

What does “validate once, execute millions” mean?

It describes the economic advantage of concentrating validation in a deterministic harness rather than in the AI itself. Because the harness is deterministic, it can be validated using established computer system validation approaches under frameworks like GAMP 5, and it does not require re-validation each time an agent operates within it, as long as the agent’s outputs are checked against the harness’s rules before execution. This is what makes enterprise-scale AI economically viable in regulated environments.

ABOUT THE AUTHOR

Vasu Rangadass, Ph.D., President & CEO

Vasu Rangadass, Ph.D., is the President and CEO of L7 Informatics, Inc., a leader in life sciences workflow and data management. Previously, Dr. Rangadass was the Chief Strategy Officer at NantHealth, following its acquisition of Net.Orange, the company he founded to provide an enterprise-wide platform that simplifies and optimizes care delivery processes in health systems. Before Net.Orange, he was the first employee of i2 Technologies (now Blue Yonder), which grew into a global company that revolutionized the supply chain market through innovative approaches based on the principles of Six Sigma, operations research, and process optimization.