Agentic AI has moved from pilot programs into routine clinical production. This article examines what that shift looks like across diagnostics, medical devices, and care delivery. From compressing antibiotic testing timelines to portable life-support devices that make autonomous real-time decisions, it outlines the technical, governance, and workflow foundations healthcare leaders need to scale AI that does not just advise, but acts.
If HIMSS 2026 demonstrated anything, it is that agentic AI has officially moved from the pitch deck into clinical workflows. In healthcare, where administrative burden is high and the cost of delayed clinical decisions is measured in patient outcomes, that transition is already visible on the ground. For years, the central question for healthcare technology leaders was whether to adopt AI at all. That question has largely been answered. The harder question that has replaced it is how to make AI do something that materially matters — at scale, in production, without breaking the very clinical workflows it was supposed to improve.
The answer the industry is converging on is agentic AI: systems that do not just surface a recommendation and wait, but autonomously execute the next step, and the one after that. In this paradigm, AI functions as an actor, not merely an advisor. In Medtech and diagnostics, engineering teams at the intersection of product development and clinical AI are already building production systems that show what this looks like in practice — compressing antibiotic susceptibility testing timelines from 16 hours to four, stratifying patient populations for urgent procedures using existing blood data, and embedding autonomous decision-making into portable life-support devices.
The difference between AI that advises and AI that executes may sound incremental. In practice, it changes almost everything about how healthcare organizations need to think about deployment, data infrastructure, governance, and what clinical transformation looks like when it is actually working. As of mid-2024, 950 AI/ML-enabled medical devices had been cleared or approved by the FDA for clinical use, according to a peer-reviewed analysis published in JAMA Health Forum. That figure reflects not only regulatory momentum but also a market that has decisively moved past the question of whether AI belongs in clinical settings. The more difficult question now is how to build agentic AI that executes reliably, integrates without friction, and scales without creating governance exposure. Organizations that understand what this requires are already pulling ahead of those still waiting to find out.
Where Agentic AI Is Already Changing Clinical Outcomes
The data was there all along
One of the most instructive cases in AI-enabled diagnostics begins with a problem that sounds almost deceptively simple: a backlog of colonoscopies.
After a prolonged period of disruption to routine care, a large segment of the population missed recommended colorectal cancer screening. The issue was not whether colonoscopies were clinically necessary, but how to allocate limited procedural capacity to the patients who needed it most urgently, without adding new infrastructure or collecting new data.
The solution emerged from blood tests already performed in everyday practice. By applying AI to longitudinal hematology data routinely captured during care — specifically in patients over 50 who had undergone at least two blood tests within a twelve-month window — clinicians could stratify that population by clinical urgency. Some patients did not require immediate colonoscopy, while others needed it as soon as possible. AI made it possible to distinguish between these groups with precision and to direct scarce endoscopy capacity where it would have the greatest impact.
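The cohort rule described above (patients over 50 with at least two blood tests inside a twelve-month window) can be sketched as a simple eligibility filter. This is a minimal illustration, not the production logic: the function name `is_eligible`, the 365-day approximation of twelve months, and the input shapes are assumptions, and the model that ranks eligible patients by urgency is out of scope here.

```python
from datetime import date


def is_eligible(age, test_dates, window_days=365):
    """Return True if the patient is over 50 and has at least two
    blood tests that fall within window_days of each other.

    `window_days=365` is an assumed stand-in for "twelve months".
    """
    if age <= 50:
        return False
    ordered = sorted(test_dates)
    # Checking adjacent pairs is enough: if any two tests fall inside
    # the window, the tests between them do as well.
    return any((later - earlier).days <= window_days
               for earlier, later in zip(ordered, ordered[1:]))


# Hypothetical patients: (age, dates of routine blood tests)
patients = {
    "p1": (62, [date(2023, 1, 10), date(2023, 9, 1)]),
    "p2": (62, [date(2021, 1, 10), date(2023, 9, 1)]),
    "p3": (45, [date(2023, 1, 10), date(2023, 2, 1)]),
}
cohort = [pid for pid, (age, dates) in patients.items()
          if is_eligible(age, dates)]
```

Only patients who pass this filter have enough longitudinal data to be scored; the filter itself makes no statement about clinical urgency.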
This illustrates the core principle that makes agentic AI so compelling in diagnostics: the data is already there, continuously generated through tests and encounters that would occur anyway. The real question is whether sufficient intelligence exists to convert that data into concrete clinical action — and whether that intelligence is embedded in the workflow at the point where action must occur.
The same principle applies in higher-acuity settings. Antimicrobial susceptibility testing — determining which antibiotics will be effective against a specific infection — has traditionally taken up to 16 hours. This is not merely a laboratory inefficiency; it is a clinical problem. Peer-reviewed research published in the American Journal of Respiratory and Critical Care Medicine confirms that hourly delays in antibiotic administration are associated with increased hospital mortality in sepsis patients, with the greatest risk concentrated in those with septic shock.
AI-enabled diagnostics have already demonstrated the ability to compress that 16-hour timeline to roughly four hours, a reduction of about 75%. Applied consistently across a sepsis population, a result that arrives roughly 12 hours sooner allows appropriate therapy to begin earlier, and the mortality evidence on hourly delays indicates why that matters clinically.
Life Support Used to Need Three Specialists. Now It Fits in a Bag
A second frontier in healthcare is the convergence of AI with medical device design. Here, intelligence is not a software layer sitting on top of existing hardware, but something embedded directly into the device so it can respond to clinical conditions in real time.
Consider extracorporeal membrane oxygenation (ECMO), a life-support system that circulates blood outside the body, oxygenates it, and returns it to the patient. In its traditional form, ECMO is a large, ICU-bound apparatus that typically requires two or three specialists to set up, monitor, and manage. A neurosurgeon-turned-medical device innovator posed a simple but radical question: what if it did not have to be this way? What if the system could be miniaturized, made wearable by the patient, and equipped with AI that reads continuous physiological signals and adjusts pump speed in real time as the patient lies down, sits up, or moves, without a specialist at every step?
The engineering answer is a portable ECMO device that fits in a bag, can be deployed in the field by a single clinician, and is controlled by software making microsecond adjustments based on physiological inputs. In this configuration, the AI controller is not issuing recommendations; it is making decisions continuously and autonomously as the patient's clinical state changes. This is what AI as infrastructure looks like in medical devices: not a bolt-on feature, but the device's operating logic.
The Best AI in the World Fails If It Requires a New Login
Across clinical domains, healthcare practitioners who are embedding AI into care delivery point to the same non-negotiable design requirement: AI must live inside the workflows clinicians already use, not sit adjacent to them or reside in a separate interface.
When a system demands a new login, a separate screen, or a different behavioral pattern, adoption breaks at the point of care, regardless of how powerful the underlying model may be. Accenture research cited by HFMA finds that 39% of clinicians do not believe digital health tools are effectively integrated into their current workflows — a statistic that explains much of the variation in AI adoption outcomes among organizations deploying similar technologies. In other words, the differentiator is not the model itself; it is the quality of the integration.
When this principle is applied correctly, the impact is measurable. Seeking to reduce documentation burdens on clinical staff, a leading Southeast Asian distributor of hospital medical systems partnered with FPT to develop AI Scribe — an AI-powered clinical documentation solution designed to plug into existing workflows rather than force clinicians to operate a parallel system. The results were clear: medical documentation time was reduced by up to 50%, clinician burnout and fatigue fell by up to 70%, and documentation quality improved for 75% of physicians. These outcomes were a direct function of workflow fit, not model sophistication alone.
Clinical transparency is equally critical. Clinicians will not rely on recommendations they cannot explain to patients, defend in documentation, or stand behind professionally. To earn clinical trust, a system must surface not only a recommendation, but also the reasoning behind it — the specific data inputs, the clinical rationale, and the differential possibilities that were considered. Explainability is not an academic luxury; it is the design discipline that determines whether a clinical AI tool is used in everyday care or quietly shelved after a pilot.
10 Billion Records. None of Them Labeled
Before any of these capabilities can be deployed, there is a larger challenge organizations consistently underestimate: the data they plan to build on is often not ready. In one AI development program for urinalysis, the team had access to 10 billion data records from the outset. However, none of these records were labeled, so building a usable training dataset required manually reviewing and annotating images one by one, adding years to the development timeline.
The lesson is clear: start with a single high-quality, well-labeled data source, build the algorithm, validate it, then add new sources incrementally. The instinct to aggregate multiple data sources immediately almost always runs into quality and consistency issues that are far harder to resolve at scale than they appear upfront. A global in-vitro diagnostics manufacturer operating across dozens of countries encountered exactly this challenge: diagnostic outputs across its decentralized lab network were inconsistent, with data structured differently at each site.
To address this, the organization partnered with FPT to implement automated validation of lab results, normalization of data across multiple sites using consistent data models, and embedded quality control workflows that significantly reduced manual errors and improved data consistency across its entire network. The principle holds whether the scope is one algorithm or a hundred sites: an agent operating on incomplete or poorly structured data does not just produce imperfect recommendations; it produces imperfect actions.
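To make the idea of cross-site normalization and automated validation concrete, here is a minimal sketch under stated assumptions. The site names, field names, unit conversions, and plausibility limits are all hypothetical and chosen only for illustration; they do not describe the manufacturer's actual data model, and the limits are not clinical reference ranges.

```python
# Hypothetical per-site mappings:
# source field name -> (canonical field name, multiplier to canonical unit)
SITE_MAPPINGS = {
    "site_a": {"hgb_g_dl": ("hemoglobin_g_per_l", 10.0)},  # g/dL -> g/L
    "site_b": {"haemoglobin_g_l": ("hemoglobin_g_per_l", 1.0)},
}

# Illustrative plausibility limits for catching likely data-entry errors.
# These are assumptions, not clinical reference ranges.
PLAUSIBLE_RANGES = {"hemoglobin_g_per_l": (20.0, 250.0)}


def normalize(site, record):
    """Map a site-specific record onto the shared canonical model."""
    mapping = SITE_MAPPINGS[site]
    normalized = {}
    for field, value in record.items():
        if field not in mapping:
            raise KeyError(f"No mapping for field {field!r} at {site!r}")
        canonical, factor = mapping[field]
        normalized[canonical] = float(value) * factor
    return normalized


def validate(normalized):
    """Return a list of human-readable issues; empty means no flags."""
    issues = []
    for field, value in normalized.items():
        low, high = PLAUSIBLE_RANGES[field]
        if not low <= value <= high:
            issues.append(f"{field}={value} outside [{low}, {high}]")
    return issues


result_a = normalize("site_a", {"hgb_g_dl": 13.5})
result_b = normalize("site_b", {"haemoglobin_g_l": 135})
```

Both sites now yield the same canonical field and unit, so a single validation rule can run across the network instead of one rule per site.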
The Moment AI Stops Asking Permission, Governance Becomes Non-Negotiable
Deploying AI that only recommends is fundamentally different from deploying AI that directly acts. The governance requirements are different in kind. The value of agentic systems lies precisely in their autonomy — but that same autonomy can create compliance and patient safety risks if behavior drifts outside its intended scope.
Organizations deploying agentic AI are treating governance as a precondition, not an afterthought. The most mature deployments build governance infrastructure before they scale, not after.
In practice, that governance foundation typically rests on three elements:
- Auditability — every agent action must be traceable to a specific input and decision point.
- Runtime monitoring — real-time visibility into whether agents are operating within defined parameters.
- Proportional oversight — governance architecture calibrated to the clinical stakes of each use case, rather than a uniform framework applied regardless of consequence.
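The three elements above can be sketched together in a few lines. This is a hedged illustration rather than a reference design: the `POLICY` table, the action names, the numeric limits, and the `evaluate_action` function are hypothetical, and a real deployment would write to durable, tamper-evident storage instead of an in-memory list.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Hypothetical policy: allowed range per action, plus a stakes flag that
# drives proportional oversight. All names and limits are illustrative.
POLICY = {
    "reorder_lab_test": {"min": 0, "max": 3, "high_stakes": False},
    "adjust_pump_speed": {"min": 0, "max": 100, "high_stakes": True},
}


@dataclass
class AuditRecord:
    timestamp: str
    agent_id: str
    input_ref: str          # identifier of the input that triggered the action
    action: str
    value: float
    allowed: bool           # runtime monitoring: within defined parameters?
    needs_human_review: bool  # proportional oversight


def evaluate_action(agent_id, input_ref, action, value, log):
    """Check a proposed action against policy and append an audit entry."""
    rule = POLICY.get(action)
    allowed = rule is not None and rule["min"] <= value <= rule["max"]
    # Unknown, out-of-range, or high-stakes actions are routed to a human.
    needs_review = rule is None or rule["high_stakes"] or not allowed
    record = AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        agent_id=agent_id,
        input_ref=input_ref,
        action=action,
        value=value,
        allowed=allowed,
        needs_human_review=needs_review,
    )
    log.append(json.dumps(asdict(record)))  # auditability: one entry per action
    return record


audit_log = []
low = evaluate_action("agent-1", "obs-123", "reorder_lab_test", 1, audit_log)
high = evaluate_action("agent-1", "obs-124", "adjust_pump_speed", 9999, audit_log)
unknown = evaluate_action("agent-1", "obs-125", "discharge_patient", 1, audit_log)
```

The design choice worth noting is that the audit entry is written for every proposed action, including those that are blocked, so the log records what the agent attempted as well as what it did.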
As Mass General Brigham's enterprise AI framework makes clear, AI safety is not added at the end of a deployment — it is embedded in the architectural decisions made at the beginning.
How Can Healthcare Leaders Move Agentic AI from Pilots to Scalable Impact?
Healthcare and Medtech organizations scale agentic AI successfully when they start with a focused problem, build on clean, reliable data, embed intelligence into real workflows, and rigorously measure impact before expanding. The challenge is not whether agentic AI works, but whether the foundations exist to make it work consistently when it matters most.
The question for healthcare and Medtech leaders is no longer whether agentic AI is producing results. It is. What now separates organizations that scale successfully from those stuck in pilots that never reach production is how they approach deployment.
Across deployments that deliver impact, the pattern is consistent. Teams begin with a specific, well-defined, measurable problem. They rely on clean, reliable data. They embed the intelligence directly into the workflow where the problem actually occurs, so it supports real decisions rather than sitting on the sidelines. Only then do they measure the impact and use that evidence to guide any expansion.
Analysis of leading healthcare AI programs confirms that organizations moving from pilot to production tend to share one trait: they prove value in a focused use case before seeking broader transformation.
The strategic question is therefore not whether to engage with agentic AI, but whether the underlying conditions are in place to make that engagement reliable at scale. Those conditions include data quality, sound governance, tight workflow integration, and the clinical trust that comes from explainability.