Healthcare Has 3 Years to Get AI Right
Executive Summary: Healthcare AI is crossing from tool to clinical actor faster than institutions can adapt. The benefits are real, but so are the risks: diffused accountability, proxy optimization at scale, labor displacement, and erosion of the human judgment medicine depends on. We have perhaps 2-3 years to get the foundations right.
Every physician remembers the first time they were solely responsible for a patient’s life.
For most, it happens during residency. The attending goes home. The senior resident is occupied elsewhere. And suddenly you are the one making decisions, writing orders, responding to the 3am call from the nurse. The weight of it is physical. You feel it in your chest.
Medicine has always understood this weight. The entire apparatus of medical training, credentialing, supervision, and liability exists because we recognize that clinical decisions carry consequences that cannot be undone. A wrong diagnosis, a missed finding, a delayed intervention. These are not errors that can be rolled back like a software deployment.
Now consider what it means to hand that weight to a machine.
AI systems already interpret scans, summarize patient histories, predict deterioration, optimize staffing, and draft treatment plans. Within a few years, it is plausible that AI systems will outperform the best clinicians across many diagnostic, analytical, and operational tasks while operating continuously and at enormous scale. This has the potential to dramatically improve outcomes, reduce clinician burnout, and make healthcare more accessible.
It also introduces a category of risk that medicine has never had to confront directly: what happens when clinical intelligence becomes scalable, autonomous, and partially opaque?
This essay examines that question. My goal is not to argue against AI in healthcare. On balance, I believe its benefits could be profound. But the transition demands attention. Healthcare is gaining capabilities that outpace its governance structures, its liability frameworks, its training pipelines, and its ethical norms. Whether medicine emerges stronger depends on whether we treat this as a serious institutional challenge or simply let it happen to us.
A few principles guide what follows. First, precision over panic. The risks are real, but vague warnings about AI dystopia do not help clinicians, administrators, or policymakers make better decisions. We need specific analysis of specific failure modes. Second, humility about prediction. AI may not advance as fast as I expect. Some risks may not materialize. But planning requires making bets, and the current trajectory suggests these bets are reasonable. Third, proportionate response. Some interventions can be taken by individual organizations. Others require industry coordination or regulation. The interventions should match the evidence, starting light and escalating as risks become clearer.
What Level of AI Are We Talking About?
Many concerns about AI in healthcare are either premature or misdirected when applied to narrow, task-specific tools. The risks that concern me arise not from today’s clinical decision support systems, but from highly capable, general-purpose clinical AI.
By this I mean systems that exceed top human clinicians across many cognitive domains: diagnosis, pattern recognition, literature synthesis, probabilistic reasoning, treatment planning. Systems with access to the same interfaces available to human clinicians. Systems that can carry out extended clinical tasks autonomously over hours or days, requesting clarification when needed. Systems that operate at scale, with many instances running in parallel across institutions.
Think of it as a thousand attending physicians who never sleep, never forget, never lose focus, and can be deployed anywhere there is an internet connection. The analogy is imperfect because these systems will have a different profile of strengths and weaknesses than humans. But it captures the scale of what is coming.
When? The honest answer is uncertain, but probably sooner than most healthcare leaders expect. AI models went from struggling with basic clinical reasoning to matching physician performance on diagnostic benchmarks in less than three years. The trajectory has been remarkably consistent. If it continues, systems capable of expert-level clinical cognition across most domains are likely within 2-3 years.
There is also a compounding dynamic. AI systems are increasingly being used to accelerate AI development itself. The code that builds the next generation of models is increasingly written by the current generation. In healthcare, this means AI that accelerates biomedical research will also accelerate the development of more powerful clinical AI. The timeline may compress faster than linear extrapolation suggests.
From Tools to Participants
Historically, medical tools have been passive. A stethoscope amplifies sound but does not interpret it. An MRI scanner produces images but does not read them. The clinician remained the locus of judgment.
AI changes this relationship. As systems become more capable and more integrated, they function less like instruments and more like participants in clinical decision-making. They suggest diagnoses, flag risks, recommend actions, and sometimes initiate processes automatically. The human role shifts from decision-maker to supervisor.
This shift has implications that are easy to miss.
Responsibility in medicine has always been tied to agency. When outcomes are bad, we ask who decided and why. Malpractice law, quality improvement, and professional ethics all assume a human in the loop making choices. As AI takes on more cognitive work, that chain becomes harder to trace. Errors may arise not from a single mistake but from interactions between model behavior, training data, institutional incentives, and human oversight failures.
The danger is not dramatic failure. It is diffusion of accountability. A situation where no one feels fully responsible for decisions that nonetheless determine whether patients live or die.
Misalignment in Clinical Context
The AI safety community talks about alignment: ensuring AI systems pursue the goals humans actually want rather than goals that sound similar but diverge in practice. In healthcare, alignment concerns take a specific form.
Medical AI systems are trained on historical data. That data reflects existing practice patterns, reimbursement structures, access disparities, and biases. Even well-intentioned optimization can reinforce these patterns at scale. A model trained on historical outcomes may learn that certain populations receive fewer interventions. Not because that reflects good medicine, but because that is what the data shows.
Healthcare is full of proxy metrics. Length of stay, readmission rates, cost per episode, throughput, patient satisfaction scores. All are measurable, but none fully capture patient well-being. AI systems are exceptionally good at optimizing proxies. They are less equipped to reason about values unless those values are explicitly encoded.
The risk is not malicious AI. It is small distortions, multiplied across millions of decisions, becoming systemic harm.
This is not hypothetical. Clinical AI systems have already demonstrated behaviors that should concern us. A diagnostic model trained primarily on data from academic medical centers may perform well on complex cases but fail on common presentations seen in community settings. A risk prediction model may learn to use zip code as a proxy for social determinants of health in ways that entrench rather than address disparities. A documentation AI may optimize for billing compliance in ways that obscure rather than clarify clinical thinking.
None of these failures require the AI to be “trying” to cause harm. They emerge naturally from optimization processes that lack the contextual judgment humans bring. The concern is that as AI systems become more autonomous and operate at greater scale, these failure modes become harder to detect and correct.
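Detection, at least, does not require anything exotic. The sketch below is a minimal subgroup audit, not a production tool: it assumes a validation set scored by the model under review, held in a pandas dataframe with hypothetical y_true, y_score, and group columns (the group could be a care setting, a demographic category, or a zip-code-derived index).

```python
# Minimal subgroup audit sketch (illustrative; column names are assumptions).
# Requires: pandas, scikit-learn.
import pandas as pd
from sklearn.metrics import roc_auc_score

def subgroup_audit(df: pd.DataFrame, group_col: str = "group",
                   score_col: str = "y_score", label_col: str = "y_true") -> pd.DataFrame:
    """Report per-subgroup AUROC, mean predicted risk, and observed event rate."""
    rows = []
    for name, g in df.groupby(group_col):
        # AUROC is undefined if a subgroup contains only one outcome class.
        auroc = (roc_auc_score(g[label_col], g[score_col])
                 if g[label_col].nunique() > 1 else float("nan"))
        rows.append({
            group_col: name,
            "n": len(g),
            "auroc": auroc,
            "mean_predicted_risk": g[score_col].mean(),
            "observed_event_rate": g[label_col].mean(),
        })
    return pd.DataFrame(rows).sort_values("auroc")

# Example usage on a scored validation set:
# print(subgroup_audit(val_df, group_col="care_setting"))
```

A subgroup where mean predicted risk drifts away from the observed event rate is exactly the kind of quiet distortion described above. It does not announce itself; someone has to look.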
Autonomy Without Judgment
Even without misalignment in the technical sense, increasing autonomy creates problems.
As AI systems manage longer-horizon tasks, they must make tradeoffs. Care pathway optimization requires deciding which patients get prioritized. Resource allocation requires implicit judgments about whose needs matter more. Population health management requires choices about which interventions are worth the cost.
Human clinicians making these decisions experience friction. They feel the weight of uncertainty. They hesitate when something seems off. They recognize when a situation falls outside their training.
AI systems do not experience this friction unless it is deliberately engineered. They do not feel the weight of a decision. They do not hesitate unless hesitation is encoded in their objectives. This does not make them dangerous by default, but it means the burden of moral calibration shifts upstream into system design.
A healthcare system that relies on AI without investing in alignment and oversight may become efficient but brittle. High-performing when assumptions hold. Prone to failure when they do not.
The Dual-Use Problem
There is a tension at the heart of medical AI that deserves direct acknowledgment.
The capabilities that enable AI to accelerate drug discovery, protein structure prediction, and therapeutic development also lower barriers to biological harm. AI systems that can guide someone through complex molecular biology for beneficial purposes can, without safeguards, provide similar guidance for harmful purposes.
This is not primarily a healthcare problem. But healthcare AI organizations are closer to the relevant knowledge than almost anyone else. They train on biomedical literature. They understand pathogen biology. They have the technical capacity to either prevent or enable biological misuse.
The immediate concern is not nation-states, which already have these capabilities. The concern is individuals or small groups with motivation but not expertise. Currently, producing a dangerous biological agent requires years of specialized training and tacit knowledge. Powerful AI could compress that learning curve dramatically.
We are approaching a threshold where, without safeguards, AI models could meaningfully assist someone with general scientific literacy but no specialized biology training in producing a dangerous agent. This is not a distant scenario. Based on capability trajectories, it may be 1-2 years away.
Healthcare AI organizations have an obligation to implement safeguards. This means classifiers that detect and block dangerous outputs. It means accepting the computational cost of running those classifiers. It means not assuming that harmful use cases will not occur simply because they seem unlikely.
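What such a safeguard looks like architecturally is simple to sketch, even if building a reliable classifier is not. The fragment below is illustrative only: generate and classify_risk stand in for a hypothetical model API and a hypothetical biosecurity classifier, and real deployments layer multiple checks over prompts, outputs, and conversation history.

```python
# Illustrative output-screening gate (all names are hypothetical; real
# safeguards involve layered classifiers, human review, and logging).
from dataclasses import dataclass

@dataclass
class ScreenedResponse:
    text: str
    blocked: bool
    reason: str | None = None

def screened_generate(prompt: str, generate, classify_risk,
                      threshold: float = 0.5) -> ScreenedResponse:
    """Generate a draft response, then block it if the risk classifier fires.

    `generate` is any callable prompt -> text; `classify_risk` is any callable
    text -> estimated probability that the text provides meaningful uplift
    toward biological harm.
    """
    draft = generate(prompt)
    risk = classify_risk(draft)
    if risk >= threshold:
        return ScreenedResponse(
            text="This request cannot be completed.",
            blocked=True,
            reason=f"biosecurity risk score {risk:.2f} >= {threshold}",
        )
    return ScreenedResponse(text=draft, blocked=False)
```

The essential commitment is that every output pays for a classification pass before it reaches the user. That is the computational cost referred to above.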
Healthcare Labor
If the safety risks are managed and clinical AI deploys at scale, the economic question follows. What happens to the healthcare workforce?
Healthcare employs about 12% of the American workforce. Many of those jobs involve cognitive tasks that AI will soon perform as well as or better than humans. The aggregate effect will likely be positive. Productivity will increase. Costs may fall. Access may improve.
But aggregate statistics obscure individual impact. The relevant question is not whether healthcare employment survives in some form. It is what happens to the specific people currently doing specific jobs.
There are reasons to think AI will affect healthcare labor differently than past technologies affected other industries.
Previous automation targeted discrete tasks within jobs, leaving other tasks for humans to expand into. When documentation became electronic, scribes were displaced but clinicians could document more efficiently and see more patients. AI is different because it targets the cognitive core of clinical work. When AI can diagnose, synthesize, reason, plan, and communicate, the question of what humans contribute becomes harder to answer.
Previous automation advanced gradually, giving workers time to adapt. AI capabilities are improving on a timescale of months, not decades. A clinician who retrains for a new role may find that role automated before they complete the transition.
Previous automation was skill-specific, allowing workers to retrain for adjacent roles. AI appears to be advancing from lower to higher capability levels across a broad cognitive front. Entry-level roles go first, then mid-level, then expert. This means AI is not displacing people with particular skills who can learn new ones. It is displacing people based on cognitive ability, which is harder to change.
Healthcare has some characteristics that may slow displacement. Physical presence matters for procedures and bedside care. Regulatory barriers are high. Institutional adoption is slow. Trust takes time to build.
These factors buy time. I do not think they change the ultimate trajectory.
The particular concern for medicine is training pipelines. Medical education is apprenticeship-based. Residents learn by doing, under supervision. If AI handles the cognitive work that trainees currently practice on, the pipeline for developing experienced clinicians narrows. This could hollow out the profession in ways that are not visible until they become catastrophic.
Concentration of Power
Healthcare is already characterized by concentrated power. Large health systems, insurers, pharmaceutical companies, and technology vendors shape care delivery in ways that are often opaque to patients and even to clinicians.
AI risks amplifying this concentration.
Organizations with resources to deploy clinical AI at scale gain disproportionate influence over standards, workflows, and outcomes. Proprietary models may embed clinical logic that practitioners cannot interrogate or override. A small number of platforms could end up controlling the decision-making infrastructure for most healthcare delivery.
This creates dependencies that are difficult to reverse. Once clinical workflows are built around a particular AI platform, switching costs become prohibitive. The platform provider gains leverage over pricing, terms, and development priorities.
It also concentrates economic gains. If a small number of companies capture most of the value from AI-driven productivity while healthcare workers bear the displacement costs, the result is wealth transfer on a scale that may not be politically sustainable.
The coupling of economic power with political influence also concerns me. Healthcare AI is becoming important enough that the interests of major platforms increasingly align with government interests in ways that can distort policy. We already see reluctance to impose requirements that might slow deployment, even when those requirements would protect patients.
The International Dimension
Healthcare AI does not develop in isolation. It emerges from the same infrastructure, talent, and capital that produces general AI capabilities. Competition between nations to lead in AI has implications for medicine.
If authoritarian states achieve leadership in clinical AI, they could use health data and health infrastructure to extend surveillance and control. They could export health technology that creates dependencies and provides intelligence access. They could apply biomedical AI capabilities to weapons development.
China is the most significant concern. It is second only to the United States in AI capabilities, operates an extensive surveillance apparatus, and has state-backed healthcare AI initiatives. If Chinese organizations achieve parity or superiority in clinical AI, the implications extend beyond healthcare into national security.
This creates tension. Moving carefully to address safety concerns conflicts with moving quickly to maintain technological advantage. I do not think this tension can be fully resolved. But during this critical period, democratic nations should not be supplying authoritarian competitors with the chips, equipment, and infrastructure that enable frontier AI development.
For healthcare specifically, there are data sovereignty questions. Clinical AI trained on patient data from democratic nations should not be controlled by foreign adversaries. Medical AI infrastructure should not depend on supply chains that could be disrupted by geopolitical conflict.
What Should Be Done
I do not believe there is a single solution. We need a layered approach combining technical, institutional, and policy measures.
At the technical level, clinical AI should be developed with explicit attention to values, not just performance. The objectives optimized during training should reflect what patients actually need, not just what historical data shows. We need interpretability, the ability to understand why systems behave as they do and to verify that intended values have taken hold.
At the institutional level, clear mechanisms for human override must exist with well-defined responsibility. AI should support clinical accountability, not dissolve it. Organizations deploying clinical AI should be required to disclose training data sources, known failure modes, performance across demographic groups, and override mechanisms.
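To make "human override with well-defined responsibility" concrete: every consequential recommendation should leave a record of what the system proposed, who was accountable, and what they did with it. The structure below is one possible shape for such a record; the field names are my own illustration, not an existing standard.

```python
# One possible shape for an accountability record attached to each
# consequential AI recommendation (field names are illustrative).
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class ClinicianAction(Enum):
    ACCEPTED = "accepted"
    MODIFIED = "modified"
    OVERRIDDEN = "overridden"

@dataclass
class RecommendationRecord:
    patient_id: str
    model_id: str                       # which model and version made the recommendation
    recommendation: str                 # what the system proposed
    model_rationale: str                # the explanation surfaced to the clinician
    responsible_clinician: str          # the named human accountable for the final decision
    action: ClinicianAction
    final_decision: str                 # what was actually ordered
    override_reason: str | None = None  # expected whenever action != ACCEPTED
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```

Whether this lives in the EHR or a separate audit store matters less than the fact that the chain of agency stays traceable, rather than dissolving into "the model suggested it."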
At the policy level, legislation will be necessary. Transparency requirements are the right starting point. California’s SB 53 and New York’s RAISE Act provide templates, requiring frontier AI companies to disclose safety testing and risk assessments. Healthcare-specific versions would require clinical AI vendors to report on training data, failure modes, demographic performance, and accountability structures.
As evidence accumulates, more targeted requirements become appropriate. Autonomous treatment planning or AI systems with prescribing authority may warrant stricter controls. The rules should be proportionate to documented risk, not hypothetical concern.
Safeguards against biological misuse deserve particular attention. Companies should be required to implement classifiers blocking outputs related to dangerous agents. Voluntary compliance creates a prisoner’s dilemma where companies can cut costs by removing safeguards. Mandatory requirements level the playing field.
On economic disruption, policy responses will eventually be necessary. Progressive taxation is the natural response to high inequality in a growing economy. Targeted support for displaced healthcare workers will be needed. The details matter enormously. But the scale of coming disruption means government intervention of some kind is inevitable. The question is whether it will be thoughtful or reactive.
What Medicine Is For
This leads to a deeper question.
Medicine is not merely a system for producing outcomes. It is a practice concerned with care, dignity, suffering, and uncertainty. Physicians do not just diagnose and treat. They witness. They accompany. They help patients make sense of what is happening to them.
AI excels at optimization. It does not understand meaning.
There is a risk that healthcare gradually reorients around what AI does well, rather than what patients need. Throughput is measurable. Compassion is not. Diagnostic accuracy is measurable. The experience of being heard is not.
Preventing this drift requires intention. The values embedded in clinical AI should reflect not just what medicine does but what medicine is for. If we get that wrong, we may build a healthcare system that is technically excellent and humanly empty.
The question is not only whether AI can make medicine more efficient. It is whether AI can help medicine remain what it is supposed to be.
The Transition
Healthcare is among the first domains where powerful AI is being integrated into decisions that directly affect life, suffering, and death. How we handle this will set precedents.
The promise is real. Better outcomes, less burnout, broader access, faster research, lives saved. The risks are real too. Diffused accountability, amplified bias, displaced workers, concentrated power, eroded trust.
None of these outcomes are inevitable. None are automatically avoided.
The window for getting foundations right is measured in years, not decades. The technology is advancing faster than institutions can adapt. Economic pressures are intense. Political appetite for careful regulation is limited.
I think we can navigate this well. It will require healthcare leaders who understand what is coming, policymakers willing to act before failures force their hand, technologists who take safety seriously even when it is costly, and clinicians who insist on remaining accountable for the patients in front of them.
Medicine has faced transformative technologies before. Anesthesia, antibiotics, imaging, transplantation. Each required new institutions, new ethics, new ways of thinking about responsibility and care. Each was disruptive before it was normal.
AI is the next chapter. The question is whether we write it deliberately or let it be written for us.
We still have time to choose. But that time is not unlimited.