Scaling Agentic AI: When AI Takes Action, the Real Challenge Begins
For the last few years, enterprise AI has mostly lived in two worlds: prediction and generation. Prediction helped organizations forecast demand, detect fraud, and optimize pricing.
Generation accelerated knowledge work through drafting, summarizing, and creating content. Now a third phase is moving from hype to reality: agentic AI, where systems can plan, decide, and execute through production tools and workflows. This shift fundamentally changes what leaders must evaluate. The question is no longer whether a model produces a strong response. It is whether an enterprise can safely authorize a system to take action – and then demonstrate that those actions were appropriate, controlled, auditable, and measurable.
In this conversation, Tejas Gajjar, a technology leader at Macy’s, Inc., explains where agentic AI exposes hidden operational gaps and what leaders should build to scale with confidence. For readers, this interview is not a speculative discussion – it’s a practical lens into what’s coming next.
Whether you are a CTO, an IT architect, or a business leader navigating AI adoption, the insights offer a rare look at how leading enterprises can prepare for AI that doesn’t just assist, but acts.
The hidden gaps agentic AI exposes
Many organizations have moved from predictive AI to generative AI. What operational gaps become most visible when enterprises begin experimenting with agentic AI that can take action rather than just generate output?
Gajjar: The first gap is that many enterprises still treat AI like a feature, while agentic AI behaves like an operator. A generative system can be wrong and still be manageable. A draft gets edited. A summary gets corrected. Agentic systems differ because an incorrect action can trigger downstream impacts across production systems, customer experiences, or financial workflows.
The biggest gaps show up in the enterprise control plane:
- Identity and entitlement hygiene, including who the agent acts as and what it is permitted to do
- End-to-end auditability, including the ability to reconstruct intent, decision, tool calls, and outcomes
- Operational telemetry that provides real-time visibility into actions, errors, and abnormal behavior
- Change management maturity, including approvals, rollback design, and incident response readiness
Organizations often underestimate tool risk. The model is only one part of the decision chain. The real exposure comes from the tools and APIs the agent can call. If those are loosely governed, the agent becomes privileged automation moving faster than human oversight can keep up.
“Agentic AI does not just stress models. It stress-tests the enterprise control plane.”
Autonomy, reversibility, and risk in mission-critical systems
As AI systems gain autonomy, how should CTOs think about reversibility, controls, and risk, especially in customer-facing and mission-critical environments?
Gajjar: I would frame the problem in terms of reversibility. In mission-critical environments, autonomy should never be introduced without the ability to quickly stop, contain, and unwind actions.
I recommend layered controls that look familiar to teams with strong operational discipline:
- Circuit breakers and kill switches that can halt actions immediately when signals go abnormal
- Least-privilege permissions scoped to task, time, and environment
- Approval gates for high-impact actions, including financial commitments, customer-facing changes, and production modifications
- Canary execution and simulation, so actions are tested in lower-risk environments before reaching production scale
- Immutable audit trails that capture inputs, tool calls, approvals, and outcomes
In my view, reversibility is not something to add later. It is a design requirement. Without it, organizations trade short-term speed for long-term fragility.
“Autonomy without reversibility is not innovation. It is accumulated risk.”
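For engineering readers, the layered controls Gajjar describes – a circuit breaker that trips on abnormal signals, plus an approval gate for high-impact actions – can be sketched in a few lines. This is a minimal illustration, not any specific platform’s API; the class and action names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CircuitBreaker:
    """Trips (kill switch) after too many consecutive failures."""
    max_failures: int = 3
    tripped: bool = False
    _failures: int = 0

    def record(self, success: bool) -> None:
        # Reset on success; trip the breaker once failures cross the threshold.
        self._failures = 0 if success else self._failures + 1
        if self._failures >= self.max_failures:
            self.tripped = True

    def allow(self) -> bool:
        return not self.tripped

# Hypothetical examples of actions that require a human approval gate.
HIGH_IMPACT = {"refund_customer", "change_price", "deploy_to_prod"}

def execute_action(action: str, breaker: CircuitBreaker,
                   approved: bool = False) -> str:
    """Gate every agent action behind the breaker and approval rules."""
    if not breaker.allow():
        return "blocked:circuit_open"      # kill switch engaged
    if action in HIGH_IMPACT and not approved:
        return "blocked:needs_approval"    # approval gate for high-impact work
    return "executed"
```

A real deployment would add time-boxed permissions and canary execution around the same chokepoint, but the key design choice is visible here: every action passes through one gate that can say no.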
Designing an effective AI operating model
What does an effective AI operating model look like in practice when balancing speed, governance, and measurable business value?
Gajjar: I would describe an effective AI operating model as a system that makes the right work easy and repeatable. I would break it into three layers that must move together.
1. Data and technology foundation
Agentic AI requires reliable data, secure access, and strong observability. If data quality is inconsistent and telemetry is incomplete, autonomy turns into uncertainty.
2. Portfolio and prioritization discipline
Leaders need a clear method to select use cases based on business value, feasibility, risk class, and time-to-impact. The operating model should enforce stage gates and stop low-value projects early.
3. People, adoption, and governance
Governance should be built into delivery through reusable patterns, reference architectures, and pre-approved controls. When guardrails are standardized, teams move faster because they no longer have to debate the same risk questions repeatedly.
“Governance should feel like a paved road, not a roadblock.”
Why AI pilots fail, and how to scale what works
Why do so many AI pilots stall at the demo stage, and what portfolio and prioritization disciplines help organizations scale AI successfully?
Gajjar: Many pilots are optimized to impress rather than endure. Demos often succeed with ideal inputs, relaxed controls, and limited accountability. Scaling requires production-grade integration, monitoring, and ownership.
Four common reasons why pilots stall:
- No production ownership, meaning no one is accountable for reliability, drift, or ongoing performance
- No measurable business metric, which makes it impossible to justify broader investment
- Hidden integration debt in data pipelines, workflow orchestration, and downstream system dependencies
- Unclear risk posture, where leaders hesitate because controls and approvals were never defined
My recommendation is to run AI as a portfolio with stage gates:
Prototype, controlled pilot, limited rollout, then enterprise scale. Each stage should earn the next through measurable performance, cost realism, and safety evidence.
I also emphasize that cost models must include more than inference. Scaling agentic AI includes orchestration overhead, monitoring and compliance, data movement, and incident response design.
“Scaling AI is not a model upgrade. It is an operating model upgrade.”
Trust, accountability, and responsible AI adoption
From a leadership perspective, how should CTOs prepare teams for AI adoption so trust, accountability, and responsible use are built in from the start?
Gajjar: I believe trust is the core adoption constraint. Teams adopt tools quickly, but they adopt accountability more slowly. Leaders need to build clarity into the program from the beginning.
I recommend:
- Role-based AI literacy so that engineers, analysts, product leaders, and operators have guidance that fits their responsibilities
- Clear ownership for production behavior, including who is accountable when the system makes a mistake
- Defined escalation paths for failures, policy violations, and customer-impacting issues
- Responsible-use playbooks by domain, such as customer service, pricing, supply chain, and security operations
- A culture of verification where AI actions are validated like any other system behavior
I also stress a leadership habit that matters in practice: disciplined measurement. When leaders insist on outcomes, traceability, and transparent controls, teams gain confidence, and adoption becomes sustainable.
Making Agentic AI production-ready
What does production-ready observability look like for agentic AI?
Gajjar: Observability must cover the full chain, not just model performance. Teams should be able to trace prompts, context, tool calls, policy decisions, approvals, and downstream outcomes.
I recommend tracking:
- Action success rates and rollback frequency
- Policy violations and blocked actions
- Latency and cost per workflow
- Quality signals tied to business outcomes
- Drift indicators for data, behavior, and performance
Agentic AI introduces failure modes that can appear plausible on the surface. Without traceability and real-time signals, organizations are forced to guess, and guessing is not an operating strategy.
“If you cannot trace it end-to-end, you cannot trust it in production.”
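To make the traceability idea concrete, here is a hedged sketch of an append-only agent trace – intent, tool calls, rollbacks – and one of the metrics above (rollback frequency) computed from it. The structure and names are illustrative assumptions, not a reference to any particular observability product.

```python
import json
import time
from typing import Any

class AgentTrace:
    """Append-only record of one agent workflow: intent, tool calls, outcome."""

    def __init__(self, intent: str):
        self.events: list[dict[str, Any]] = []
        self.log("intent", detail=intent)

    def log(self, kind: str, **detail: Any) -> None:
        # Each event is timestamped so the chain can be reconstructed later.
        self.events.append({"ts": time.time(), "kind": kind, **detail})

    def to_json(self) -> str:
        return json.dumps(self.events)  # e.g. shipped to an immutable audit store

def rollback_rate(traces: list[AgentTrace]) -> float:
    """Share of workflows that ended in a rollback -- one KPI from the list above."""
    rolled = sum(any(e["kind"] == "rollback" for e in t.events) for t in traces)
    return rolled / len(traces) if traces else 0.0
```

The same event stream can feed policy-violation counts, per-workflow latency and cost, and drift indicators; the point is that every signal derives from one end-to-end trace rather than from model metrics alone.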
Architectural patterns for scalable, non-fragmented AI
What architectural patterns help CTOs scale AI across the enterprise without creating fragmentation?
Gajjar: I would support a unified platform approach so teams can reuse consistent components and guardrails, rather than rebuilding them in silos.
Patterns I would like to highlight:
- Domain-owned data products with clear contracts and stewardship
- Reusable agent frameworks with policy enforcement embedded
- Central identity and entitlement integration to enforce least-privilege execution
- A common telemetry and audit layer across teams and environments
- Platform-led enablement so product teams focus on outcomes rather than rebuilding foundational controls
I see fragmentation as both expensive and risky. Standardization is how speed becomes sustainable across the enterprise.
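The least-privilege pattern above can also be sketched briefly: a central policy check that every agent tool call passes through, with grants scoped to a tool, an environment, and a time window. The `Entitlement` shape and tool names are hypothetical, shown only to illustrate the design.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Entitlement:
    """A least-privilege grant: one tool, one environment, time-boxed."""
    tool: str
    environment: str   # e.g. "staging" or "prod"
    expires_at: float  # unix timestamp when the grant lapses

def is_permitted(grants: list[Entitlement], tool: str, env: str,
                 now: Optional[float] = None) -> bool:
    """Central policy check: deny unless an unexpired grant matches exactly."""
    now = time.time() if now is None else now
    return any(g.tool == tool and g.environment == env and now < g.expires_at
               for g in grants)
```

Because the check is deny-by-default and centralized, product teams inherit the same enforcement everywhere instead of rebuilding entitlement logic per agent, which is the standardization Gajjar argues for.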
The future of enterprise AI
How do you see the AI landscape evolving? What are your predictions for the future?
Gajjar: I expect enterprises to move from standalone assistants to coordinated agents embedded into core workflows. I believe the differentiator will not be access to a specific model. It will be the ability to operationalize AI safely, consistently, and measurably.
My predictions include:
- Agentic systems become a workflow layer across operations and customer experiences, tightly integrated with enterprise systems
- Multi-modal capabilities become standard because real work spans text, images, voice, and structured signals
- Policy-driven execution expands as organizations automate compliance checks and enforce controls through tooling
- Infrastructure becomes more important as latency, cost, and throughput determine what is feasible at scale
- Trust becomes a competitive advantage, especially in regulated and customer-facing environments
What advice would you like to give to future tech or business leaders?
Gajjar: My advice is grounded in fundamentals:
- Build systems thinking so that strategy, architecture, operations, risk, and people are connected
- Measure value relentlessly and tie innovation to business outcomes
- Invest in data quality, security, reliability, and observability because these fundamentals compound
- Lead adoption as a change program, not only a technology rollout
- Treat responsibility as a design requirement so accountability and transparency are built into delivery
In brief:
The organizations that scale successfully won’t be those with the most advanced models, but those that build the strongest operational foundation beneath them.
As Gajjar makes clear, the real work begins when AI takes action. What separates experimentation from enterprise readiness is the discipline to design for reversibility, the rigor to measure what matters, and the maturity to treat governance as infrastructure, not oversight. In production environments where reliability is essential, that discipline becomes a catalyst, turning AI from experimentation into sustained, trustworthy impact.