What an AI Readiness Framework Actually Demands: Sean Blanchfield Explains
Enterprise AI is entering its operational phase.
The conversation has shifted from dazzling demos and benchmark scores to something far less glamorous, and far more consequential. Infrastructure. Governance. Operational control. The hard questions about what happens when AI stops experimenting and starts executing.
We wanted to go deeper into what real enterprise readiness actually looks like. Not in theory, but in production. What separates organizations running pilots from those building durable AI capability? What does it take to move from isolated agents to structured, accountable autonomy?
Sean Blanchfield, Co-Founder and CEO of Jentic, argues that the answer lies in developing a formal AI readiness framework. In this discussion, he explains why scaling AI is less about adding agents and more about standardizing workflows, encoding intent, and governing execution at the systems level.
The result is a reframing of AI from innovation theater to infrastructure strategy, and a roadmap for leaders determined to move beyond pilots toward production-grade autonomy.
The concept of pilot purgatory
You describe ‘stalled AI pilots’ as diagnostic gold mines rather than failures. Can you elaborate more on this?
Blanchfield: Most stalled AI pilots are not failures of ambition, intelligence, or effort. They are early-warning systems.
A pilot that stalls has usually progressed far enough to collide with reality. It has moved beyond toy demos and prompt engineering, and has tried to touch real systems, real data, and real permissions. That is where the truth comes out: integration friction, security boundaries, governance requirements, and reliability expectations. These are not surprises to a CTO, but AI pilots have a way of surfacing them very quickly and very bluntly.
What these pilots tend to reveal is important. Integration is still hard. Security and auditability are non-negotiable. Reliability and maintainability matter more than novelty. AI does not make these constraints disappear. In practice, it amplifies them. Organizations are placing probabilistic systems inside deterministic environments, and the mismatch becomes apparent immediately.
The key insight is that pilot purgatory is almost never a model problem – it is an infrastructure problem. If you want durable value from agents, you need a safe, governed way to connect AI into existing systems. You need workflows that enable auditing, replay, and trust. And you need to move away from brittle point integrations toward a platform that can own and evolve this logic over time.
Seen through that lens, stalled pilots are not dead ends – they are telling you exactly where your infrastructure is not yet fit for autonomous systems. That is valuable information, if you choose to treat it that way.
Based on your experience, where in the AI pilot lifecycle does an infrastructure mismatch hurt the most? And why?
Blanchfield: The most damaging point is the transition from isolated demos to real system interaction.
Teams make early pilots look impressive by keeping them tightly constrained. One system. Clean or synthetic data. A human ready to step in when something goes wrong. As soon as the agent is asked to operate across multiple internal systems with varying schemas, permissions, and failure modes, complexity begins to unravel the pilot.
A common misconception at this stage is that API access equals capability.
It does not. Raw API access mostly exposes complexity. LLMs can sometimes reason their way through it, but not reliably, not cheaply, and not fast enough for production. The lesson here is an old one: anything that can be deterministic should be deterministic.
For agents, that means shifting away from improvisation and toward intent-oriented, machine-readable workflows that already encode how work is supposed to be done. This includes authentication, permissions, and failure handling.
The symptoms are predictable. Pilots that worked in test environments become flaky in production. Latency and cost spike as brute-force reasoning is applied to routine operations. Security teams intervene because credentials are scattered, access boundaries are unclear, and there is no audit trail.
Deterministic workflows are the unlock that makes agents fast, reliable, auditable, and affordable at scale. The hard problem then becomes how you create and evolve those workflows quickly enough, without turning the process back into a slow, developer-only bottleneck. That is the gap we focus on: using AI to discover, simulate, and formalise successful behaviour into governed execution paths.
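The routing pattern described above can be sketched in a few lines. This is an illustrative example only, with hypothetical names throughout; it shows the shape of the idea (an agent resolves intent to a pre-validated workflow, or escalates) rather than any actual implementation:

```python
# Illustrative sketch: an agent maps a high-level intent to a
# pre-validated, deterministic workflow instead of improvising
# API calls on every request. All names are hypothetical.

WORKFLOW_REGISTRY = {
    # intent -> ordered, governed steps with auth and failure
    # handling already encoded
    "refund_order": ["authenticate", "verify_order", "issue_refund", "notify_customer"],
    "cancel_order": ["authenticate", "verify_order", "cancel", "notify_customer"],
}

def execute(intent: str, context: dict) -> dict:
    """Run a governed workflow; refuse intents with no validated path."""
    steps = WORKFLOW_REGISTRY.get(intent)
    if steps is None:
        # No deterministic path exists: escalate to a human rather
        # than letting the agent improvise against raw APIs.
        return {"status": "escalated", "reason": f"no validated workflow for {intent!r}"}
    audit_trail = []
    for step in steps:
        audit_trail.append(step)  # each step is recorded for replay and audit
    return {"status": "ok", "steps": audit_trail}
```

The key design choice is the explicit refusal branch: an unknown intent is a governance event, not a prompt for on-the-fly reasoning.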
Critical failure points
You’ve said you have identified critical failure points in AI pilots. As a leader, could you walk us through them?
Blanchfield: We analyzed over 1,500 well-known APIs and found the same pattern repeatedly: classic integration problems that become unavoidable when the consumer is a machine rather than a human.
- First, foundational compliance. Many APIs are inconsistent, loosely specified, or structurally invalid. Humans can work around this. Automation cannot. If the interface itself is unreliable, things fail before intelligence even comes into play.
- Second, weak developer signals. Specifications often lack examples, clear descriptions, or consistent naming. Human developers fill in the gaps through experience and conversation. Agents do not have that context, and are expected to get it right the first time.
- Third, poor semantic clarity. Even technically correct APIs often fail to express intent, boundaries, or guarantees. Without explicit semantics, agents struggle to plan or choose correctly.
- Fourth, low agent usability. Many APIs are correct but awkward. Overly chatty, brittle in edge cases – that might be tolerable with a human in the loop, but it makes autonomous execution fragile and hard to trust.
- Fifth, security and authentication ambiguity. Permissions, scopes, and safe usage boundaries are frequently implicit or handled out-of-band. With agents, this becomes unacceptable very quickly, and pilots rightly get stopped – especially given the ease with which LLMs can be manipulated into unintended behaviour.
- Finally, discoverability at scale. Large enterprises have hundreds or thousands of APIs. Even when the right capability exists, agents often cannot find it, or cannot tell which option is appropriate.
Taken together, these six points explain most stalled pilots. None of them are failures of models or prompts; they are symptoms of infrastructure that was built for human developers being stressed by autonomous systems that need precision, clarity, and governance by default. We developed a six-dimensional API readiness scoring framework to move the conversation from opinions and guesswork to measurement, turning ‘is our infrastructure ready?’ into a concrete assessment with a clear remediation path.
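A scorecard over the six dimensions above might be sketched as follows. The dimension names mirror the list, but the weighting, scale, and remediation threshold are invented for illustration and are not the actual scoring methodology:

```python
# Hypothetical six-dimension API readiness scorecard. Scores are
# 0-100 per dimension; the 60-point remediation cutoff is an
# assumption made for this sketch.

DIMENSIONS = [
    "foundational_compliance",
    "developer_signals",
    "semantic_clarity",
    "agent_usability",
    "security_clarity",
    "discoverability",
]

def readiness_score(scores: dict) -> tuple[float, list]:
    """Return an overall score and the dimensions needing remediation."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"unscored dimensions: {missing}")
    overall = sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
    remediate = [d for d in DIMENSIONS if scores[d] < 60]
    return overall, remediate
```

The point of such a structure is the one made in the interview: the output is not an opinion but a number plus a concrete remediation list.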
Given the diversity of enterprise software stacks, how universal are these failure points across enterprises?
Blanchfield: They are close to universal.
We have not seen a large organization that does not encounter these issues once agents move beyond experimentation. The details vary by industry and stack, but the underlying problems are remarkably consistent.
The reason is simple. Most enterprise software estates evolved to serve human developers. Documentation gaps, inconsistent patterns, and implicit assumptions accumulated because humans could compensate for them. That technical debt remained hidden until the consumer changed from a developer to a machine.
Agents expose those weaknesses immediately.
What used to be “good enough” documentation becomes ambiguous. Informal authentication flows become governance risks. Workflows that lived in people’s heads or scattered scripts become impossible to reason about.
Any organization serious about deploying agents needs to audit for this. At a minimum: API readiness, workflow clarity, sandbox availability, and a path toward centralized execution and governance. Without that, pilots tend to remain brittle and isolated.
This is why readiness scoring and sandbox-based evaluation are foundational. They turn vague enthusiasm into a concrete, staged plan grounded in the realities of existing systems.
Infrastructure built for humans vs machines
You argue that most enterprise software was designed for human operators rather than machine agents. What are the most common assumptions in legacy systems that break down in an AI-driven world?
Blanchfield: The core assumption is that the client is a piece of static code written by a well-informed human.
Developers were expected to understand organisational context, intent, and correct usage, even when it was not fully documented. As a result, much of the business logic ended up in client code rather than being centralized.
Agents do not work that way. They do not have institutional memory.
They do not intuit intent. Instead, they rely entirely on what is expressed in machine-readable form. Agents also explore combinations and edge cases that humans never would.
There is a second mismatch around intent. Humans think in outcomes. “Cancel an order.” “Issue a refund.” Historically, that intent was implemented by bespoke client orchestration. With agents, you cannot reliably or economically recreate that orchestration logic on every call.
The architectural shift is to move intent into the platform. Expose higher-level capabilities backed by explicit API workflows, capture correct behaviour once, reuse it many times, and govern it centrally. To make this sustainable, those workflows need to live in open, portable standards.
The bottom line is simple: agents do not need more APIs; they need explicit intent, clear semantics, and validated workflows.
From diagnosis to strategic action
According to you, how can leaders extract value from failed pilots without creating blame?
Blanchfield: The first step is framing. AI pilots are experiments, not delivery milestones. The goal is not to ship an agent; it is to learn where the system holds up and where it does not.
Safe reproduction is important. If an investigation requires live systems, teams become cautious and defensive. Sandboxes that mirror real APIs allow failures to be replayed and explored safely. That shifts the culture from blame to curiosity.
The message from leadership must be consistent: failure is acceptable; unexamined failure is not. When insights lead to concrete improvements, organisations compound learning instead of becoming more hesitant.
The real winners in enterprise AI won’t be the loudest about their pilots. They’ll be the quiet ones who fix workflows, connect systems, create safe experimentation environments, and systematically address infrastructure gaps agents expose.
How should leaders shift from AI capability to an AI readiness framework?
Blanchfield: Capability is about what models can do in isolation. Readiness is about what the organization can safely allow an agent to do inside real systems.
That shift changes the questions leaders ask: not “what can this model do?” but “what can we put on the critical path without waking up the CISO?” It forces attention onto governance, repeatability, and long-term ownership.
Most organizations approach this backwards.
They bolt governance onto AI pilots after the fact, and by then fixing it means rebuilding everything. The shift is from ‘AI-first’ to ‘governance-first’: start with your business needs – security, compliance, and control – and build AI capabilities on that foundation. Governance must be designed in from day one, not retrofitted.
Readiness is fundamentally an infrastructure problem. Enterprises do not need to rip out their stacks; they need a unified enablement layer above them. A managed gateway where workflows are orchestrated, observed, and controlled. An API sandbox so workflows can be continuously developed, tested, adapted, and improved (in due course, at scale, by autonomous agents).
The organizations that get this right will insist on owning their workflows as first-class assets, expressed in open standards. That avoids lock-in, reduces risk, and ensures the core logic of how the business runs does not leak into opaque systems or prompts. CTOs lean in when we explain that you don’t need to replace your existing tools. You need something that sits on top of your existing stack.
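The gateway idea can be made concrete with a minimal sketch. Everything here is hypothetical naming, not a real product API; the point is the structure: agents never hold credentials, every invocation is scope-checked against central grants, and every decision lands in an append-only audit log:

```python
# Minimal control-plane sketch (hypothetical names): centralized
# permission checks, credential injection, and an audit trail.

import datetime

class Gateway:
    def __init__(self, grants: dict):
        self.grants = grants      # agent_id -> set of allowed workflow IDs
        self.audit_log = []       # append-only record for replay and audit

    def invoke(self, agent_id: str, workflow: str) -> str:
        allowed = workflow in self.grants.get(agent_id, set())
        self.audit_log.append({
            "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "agent": agent_id,
            "workflow": workflow,
            "allowed": allowed,
        })
        if not allowed:
            return "denied"
        # Credentials would be injected here, inside the control plane,
        # so secrets never enter the agent's context window.
        return "executed"
```

Both the allow and deny paths write to the log before returning, which is what makes the execution auditable rather than merely observable.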
Broader perspective
How do you see the role of AI advisory councils or industry standards evolving to help organizations avoid pilot purgatory?
Blanchfield: Standards are moving from abstract best practice to practical risk management.
As agents automate more work, workflows become how work is done. For an agent, a workflow is the equivalent of a standard operating procedure. Without standards here, organisations are forced into incredibly risky lock-in.
There is also a fragmentation risk. Different vendors embedding their own workflow engines across the enterprise leads to logic scattered across proprietary stacks. That makes end-to-end automation and governance almost impossible.
Workflows encode business process capability, one of the last competitive moats.
They are core intellectual property. If everything runs through ChatGPT, Claude, or Copilot, you pay rent forever as migration costs become prohibitive and your company becomes an expensive AI wrapper. Sensible leaders want autonomy to evolve them, and certainty that they won’t be used to train their vendor’s AI.
Standards like Arazzo aren’t just about interoperability; they’re about sovereignty. They give organizations inspectable, portable representations of how work gets done, supported by a broad ecosystem. With open standards, every on-ramp comes with a matching off-ramp. We didn’t just adopt these standards; we hired the people who wrote them. We’re authors of the Arazzo specification and architects of the modern web. It’s our engineering DNA.
The strategic point is this: you must maintain ownership of the business logic that differentiates you. Open standards ensure you can evolve your workflows without vendor permission and without being locked into a single AI provider’s ecosystem.
Mentorship/Advice on AI readiness framework
If you had to advise a CTO or a leader who’s about to start their first AI pilot, what are the three most overlooked infrastructure factors to address upfront?
Blanchfield: First, API readiness. Most organizations already have more capability than they realize. The question is whether those APIs are usable by agents. Clear semantics, consistent structure, explicit authentication. Measure it – do not guess. API readiness scoring gives you measurement, not opinions. It turns ‘we think we’re ready’ into ‘here’s our score across six dimensions, and here’s the remediation path.’
Second, feasibility testing before ambition. Sandboxes and readiness scorecards allow teams to identify what is possible today versus what needs investment. Let agents explore safely, and reality will assert itself very quickly.
Third, centralized execution and security. Agents multiply fast. Ad hoc integrations do not scale. Credentials, permissions, and observability need to be managed by a single control plane.
The common thread is restraint. Make your existing infrastructure agent-ready, put the right controls around it, and let agents earn their way into production through evidence, not optimism. High-stakes data needs high-stakes architecture. We built Jentic for production environments serving regulated industries such as finance, healthcare, and government.
Looking ahead
How do you see agent-driven architectures reshaping enterprise software design over the next 3 – 5 years?
Blanchfield: Enterprise software will not be replaced by agents. It will be reshaped.
At design time, agents are already transformative. In production, free-running reasoning is usually too slow, too expensive, and too unreliable. We believe that most production agents will plan and route back to deterministic workflows, not improvise in front of customers or with customer data.
That drives a shift from code to workflows. Any logic that can be deterministic should be. APIs expose primitives. Workflows capture intent. Agents discover and invoke them safely.
Practically, this means APIs first, workflows on top. Greenfield systems should expose primitive capabilities through well-defined OpenAPI interfaces. Higher-level, intent-oriented behaviour should be captured as workflows (expressed in open formats like Arazzo) rather than embedded in custom clients. Those workflows become reusable building blocks that agents can discover, invoke, and compose safely.
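For readers unfamiliar with the format, a workflow captured this way might look like the following fragment. It follows the published Arazzo 1.0 structure (sourceDescriptions, workflows, steps, runtime expressions), but the operation IDs, file paths, and field values are invented for illustration:

```yaml
# Illustrative Arazzo 1.0 fragment (operation IDs and paths hypothetical)
arazzo: 1.0.0
info:
  title: Refund an order
  version: 1.0.0
sourceDescriptions:
  - name: ordersApi
    url: ./openapi/orders.yaml
    type: openapi
workflows:
  - workflowId: refundOrder
    summary: Look up an order, then issue a refund for it
    inputs:
      type: object
      properties:
        orderId:
          type: string
    steps:
      - stepId: getOrder
        operationId: getOrderById
        parameters:
          - name: orderId
            in: path
            value: $inputs.orderId
        successCriteria:
          - condition: $statusCode == 200
        outputs:
          orderId: $response.body#/id
      - stepId: issueRefund
        operationId: createRefund
        parameters:
          - name: orderId
            in: path
            value: $steps.getOrder.outputs.orderId
        successCriteria:
          - condition: $statusCode == 201
```

Because the intent, step ordering, and success criteria live in this portable document rather than in client code or prompts, any compliant runtime can execute, inspect, or audit the workflow.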
The long-term effect is that process knowledge moves out of application code and into data. Enterprises accumulate a growing ecosystem of workflows around their APIs, and agents continuously enrich that ecosystem by identifying new patterns and refining existing ones. You get a compounding loop of better reliability, lower cost, and faster adaptation, without giving up determinism or control.
If you were writing this op-ed again in two years, what do you think would have changed? And what stubborn problems would still remain?
Blanchfield: The tone will be more operational. AI automation and agents will be normal. The question will be where they are allowed to act, and under what controls.
One clear change will be how agents are used across the lifecycle. We expect “full-fat” agents to be ubiquitous at design time: generating and maintaining APIs, authoring and repairing workflows, and exploring feasibility inside sandboxes. In production, by contrast, most agents will be “low-fat”: handing off execution to deterministic or semi-deterministic workflows rather than reasoning step by step on the critical path.
Workflows themselves will become the core unit of progress. Success will be measured less by the number of agents deployed and more by workflow coverage, reuse, and promotion rates, and by how much execution has moved from ad hoc LLM reasoning into validated, deterministic paths. Business logic will increasingly be captured as data rather than scattered across application code.
We also expect much of today’s standards churn to settle. OpenAPI, Arazzo, and MCP will likely be widely accepted defaults, and proprietary workflow formats will be viewed as a strategic risk rather than a convenience. The debate will move away from whether to adopt standards and toward how well they’re implemented in practice. Organizations that maintain sovereignty over their workflows, through open standards, will have the strategic flexibility to evolve with the market. Those locked into proprietary formats will be paying permanent rent to their infrastructure vendors. Alongside that, explicit pipelines from sandbox to production, like simulation, validation, and promotion, will become first-class infrastructure rather than bespoke processes.
What won’t change is just as important. Integration will remain the bottleneck. Legacy APIs, undocumented behaviour, and brittle contracts will still dominate timelines. AI will expose that integration debt faster, but it won’t make it disappear. Security and governance will also remain non-negotiable. Credentials, permissions, and auditability will continue to gate what reaches production, and centralised control planes will remain essential.
The tension between determinism and autonomy will persist as well. There will always be pressure to “just let the agent handle it,” and a countervailing need to draw clear architectural boundaries. The principle will hold that anything that can be deterministic should be deterministic, and deciding where that line sits will remain a leadership responsibility.
The enduring risk is that leaders still underestimate what true readiness requires. Many organizations will still start with models and demos and only confront infrastructure and workflow ownership late in the process.
The bottom line won’t change. AI will reshape how software is built and evolved. But production value will accrue to organizations that own their workflows, treat AI as infrastructure rather than magic, and apply agents where they add leverage and not unnecessary risk.
In essence
Through this conversation, Sean Blanchfield makes one thing unmistakably clear: the future of enterprise AI will not be defined by how many agents an organization deploys, but by how solid and well-governed the foundation beneath those agents truly is. Integration debt, unclear workflows, and governance gaps will remain the real bottlenecks to scale.
The enterprises that win will be those that treat AI as core infrastructure, not just another tool. They will need to own their workflows, enforce clear standards, and build AI systems that are responsible, secure, auditable, and designed for long-term control – not short-term experimentation.