From Stochastic Parrots to Software as a Doctor
LLMs have been characterized as stochastic parrots: probabilistic systems that merely predict the next word, remixing text without understanding. But the frontier is shifting. Today, the question is no longer whether LLMs can imitate clinical expertise, but how we transform them into regulated medical devices that can interview patients, form preliminary diagnoses, triage safely, and even prescribe.
Regulatory context
Any discussion about deploying AI in healthcare must begin with the U.S. Food and Drug Administration (FDA), the federal agency responsible for ensuring that medical products are safe and effective before they reach patients. The FDA regulates not only physical instruments but also software that meets the statutory definition of a medical device: software “intended for use in the diagnosis of disease or other conditions, or in the cure, mitigation, treatment, or prevention of disease.” As the FDA notes in its AI/ML Action Plan, medical software that influences real clinical decisions requires rigorous oversight. Because of this statutory framework, any software that interprets symptoms, suggests diagnoses, assesses clinical risk, or prioritizes patients for care is automatically treated as a medical device. The FDA has reaffirmed this by defining AI-enabled device software functions (AI-DSFs) as any AI system used for a medical purpose, whether standalone Software as a Medical Device (SaMD) or embedded inside another product. This places many uses of LLMs in healthcare squarely within FDA jurisdiction.
A brief caveat: not all clinical software is regulated. The FDA exempts certain non-device Clinical Decision Support (CDS) tools when clinicians can independently review the basis for the recommendation rather than relying solely on the software’s logic. LLM-based diagnostic or triage systems, however, do not meet these criteria: their reasoning is not reviewable in the sense the FDA requires, and their outputs directly influence diagnostic assessment or urgency classification, placing them firmly in the category of regulated medical devices.
Over the past decade, the FDA has authorized over one thousand AI/ML–based medical devices, most notably in radiology, cardiology, pathology, dermatology, and physiologic monitoring. A recent review found that these systems are overwhelmingly imaging (84%) and signal-processing (14.5%) applications. From a clinical-function perspective, 84% of authorized devices are used for assessment tasks (detection, diagnosis, monitoring, or risk scoring), whereas only 16% are used for interventional tasks (e.g. treatment guidance or surgical planning).
Within the assessment category, the dominant AI behavior is not high-level reasoning but quantification and feature localization, which together account for 58% of all devices. This includes tasks such as measuring anatomical structures, segmenting lesions, and extracting waveform features. Triage algorithms (e.g. flagging abnormal scans for prioritized review) make up about 11%, while “diagnosis” algorithms constitute only 6–7% of devices.
The regulatory pathways for such imaging- and signal-based systems are now well established. The FDA, however, has not yet granted marketing authorization to any LLM-based system for the “diagnosis, treatment, prevention, cure, or mitigation” of disease: not a single authorized device to date uses an LLM as its core inference engine. While the agency has issued forward-looking guidance acknowledging the unique challenges of generative AI and natural-language interfaces, it has not yet reviewed a model whose primary mode of operation is conversational. As a result, developers of LLM-based AI clinicians must look to existing SaMD precedent, general AI/ML regulatory principles, and emerging FDA guidance to understand what evidence will be expected. The path for conversational diagnostic AI is not yet defined, but the foundational expectations are already visible.
Core use cases for software as a doctor
If we take seriously the idea of software acting as a doctor, the relevant use cases are not those that simply streamline documentation or assist clinicians on the margins, but the tasks that embody the core clinical functions of a physician. These include information gathering (e.g. taking clinical histories and selecting appropriate tests), interpreting symptoms and results, making diagnostic inferences, assessing urgency and triage, recommending treatments, and, at the far end of the spectrum, prescribing autonomously. Each step along this spectrum carries a higher level of clinical risk, autonomy, and regulatory scrutiny.
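As a minimal, purely illustrative sketch, this spectrum can be encoded as an ordered taxonomy in software. The names, tier ordering, and the `requires_human_signoff` cutoff below are assumptions made for illustration, not an FDA or clinical classification.

```python
from enum import IntEnum


class ClinicalFunction(IntEnum):
    """Core physician functions from the list above, ordered from lower to higher risk."""
    INFORMATION_GATHERING = 1      # history taking, test selection
    INTERPRETATION = 2             # symptoms, labs, imaging reports
    DIAGNOSTIC_INFERENCE = 3       # preliminary differential diagnosis
    TRIAGE = 4                     # urgency assessment and routing
    TREATMENT_RECOMMENDATION = 5   # guideline-based treatment suggestions
    AUTONOMOUS_PRESCRIBING = 6     # the far end of the spectrum


def requires_human_signoff(fn: ClinicalFunction) -> bool:
    # Illustrative assumption: anything at or above diagnostic inference
    # keeps a licensed clinician in the loop under today's rules.
    return fn >= ClinicalFunction.DIAGNOSTIC_INFERENCE


print(requires_human_signoff(ClinicalFunction.TRIAGE))  # True
```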
Autonomous prescribing: the endgame
Among all the core doctor functions, prescribing sits in a category of its own. It is where clinical judgment, pharmacology, risk management, and legal responsibility converge. It is also where the potential value of automation is enormous: a huge fraction of primary care encounters involve well-understood conditions with highly protocolized treatment pathways, from hyperlipidemia and hypertension to stable chronic disease management. If an AI system could safely and autonomously manage even a slice of these workflows, the value unlock would be profound.
Today, however, prescribing is tightly gatekept by law. The bill H.R. 238 (Healthy Technology Act of 2025) is interesting precisely because it touches this boundary: it would amend the FD&C Act so that AI/ML technologies can, in principle, qualify as a “practitioner licensed by law to administer such drug”, provided two conditions are met:
- The AI system is authorized under state law to prescribe the drug involved.
- It has been approved, cleared, or authorized by the FDA under one of the existing device pathways.
On paper, that sounds like a potential inflection point. In practice, this bill is unlikely to become law in the near term. It has been introduced multiple times by the same sponsor, has never attracted co-sponsors, and has never advanced out of subcommittee. Even if it did pass, it would still face substantial friction at both the state level (no state currently recognizes AI as a prescribing practitioner) and within the FDA.
Still, it is worth paying attention to bills like H.R. 238. They signal that a future in which AI systems are accepted as autonomous prescribing agents is no longer unthinkable. In the meantime, the realistic path is not fully autonomous prescribing, but AI-driven prescription recommendations with human sign-off, especially for low-risk, high-volume conditions where guidelines are clear and a clinician can rapidly confirm or override the AI’s proposal.
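A minimal sketch of that human-in-the-loop pattern follows, assuming hypothetical names (`PrescriptionProposal`, `propose_prescription`, `finalize`); it illustrates only the control flow in which the model proposes and a clinician approves, amends, or rejects, not a real prescribing system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrescriptionProposal:
    patient_id: str
    drug: str
    dose: str
    rationale: str            # model-generated justification, shown to the reviewer
    guideline_reference: str  # protocol the proposal claims to follow


def propose_prescription(llm_output: dict) -> PrescriptionProposal:
    """Wrap raw model output in a structured, reviewable proposal."""
    return PrescriptionProposal(**llm_output)


def finalize(proposal: PrescriptionProposal,
             clinician_approved: bool,
             override: Optional[PrescriptionProposal] = None) -> Optional[PrescriptionProposal]:
    """Nothing becomes an order unless a licensed clinician signs off."""
    if not clinician_approved:
        return None              # rejected outright
    return override or proposal  # clinician may amend or accept as-is


# The model proposes; the clinician confirms or overrides.
proposal = propose_prescription({
    "patient_id": "demo-001",
    "drug": "atorvastatin",
    "dose": "20 mg daily",
    "rationale": "LDL above guideline threshold; no contraindications noted",
    "guideline_reference": "illustrative hyperlipidemia protocol",
})
order = finalize(proposal, clinician_approved=True)
```

The key design choice is that the model’s output is never an order on its own; it becomes one only after explicit clinician approval.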
The Business Implications: Why FDA Approval Will Become a Moat
While difficult, obtaining FDA approval does not merely create friction; it creates moats. Once LLM-based clinical systems begin to clear the regulatory bar, the firms that succeed will gain structural advantages that are extremely difficult for competitors to replicate.
- FDA clearance requires evidence that few companies will be able to generate. Clinical studies, multi-site evaluations, bias assessments, human-factors testing, and real-world monitoring plans represent a level of rigor far beyond what consumer AI applications require. These evidence packages are expensive, time-consuming, and scientifically complex. But once a company invests in them and demonstrates safety and effectiveness, it becomes harder for newcomers to leapfrog on the cheap.
- Approved models will benefit from data compounding. LLM clinicians deployed in real settings will generate large volumes of high-quality clinical interaction data. Under the Predetermined Change Control Plan (PCCP) framework, these systems will be allowed to improve continuously within pre-specified bounds. Over time, this creates a feedback loop that drives model performance and adoption.
- Health systems and payers will prefer vendors with regulatory credibility. Hospitals and insurers do not want to assume liability for unregulated AI advice. Once an FDA-cleared conversational clinical model exists, it becomes the default choice, and unregulated competitors will face enormous headwinds regardless of raw model capability.
- Reimbursement follows regulation. Payers and providers generally cannot bill for workflows in which unregulated software influences diagnosis, triage, or treatment. Regulated LLM systems, by contrast, unlock billable care pathways and may eventually justify new CPT codes.
- Regulatory approval raises the cost of switching. Once a health system integrates a regulated LLM clinician into its workflows, switching vendors becomes costly and disruptive, making early FDA-cleared entrants sticky.
In short, FDA approval is a strategic asset. It transforms an LLM into a defensible product, with moats that will widen over time.
Relevant FDA Frameworks and Guidance Documents
- IMDRF SaMD Framework
- FDA’s AI/ML-Based SaMD Action Plan
- FDA’s Predetermined Change Control Plans (PCCP)
- FDA’s Clinical Decision Support (CDS) Guidance
- FDA’s Good Machine Learning Practice Principles
- FDA’s Marketing Submission Recommendations for a Predetermined Change Control Plan for Artificial Intelligence-Enabled Device Software Functions
- Artificial Intelligence-Enabled Device Software Functions: Lifecycle Management and Marketing Submission Recommendations
