
NEW YORK, UNITED STATES — OpenAI’s LLM-based health and wellness tool, ChatGPT Health, is under intense scrutiny after experts found serious safety lapses.
In January 2026, OpenAI launched ChatGPT Health, a dedicated space within its AI tool that offers health guidance to the public, including advice on urgent medical situations. Promoted as a health and wellness tool, it promised to “help you feel more informed, prepared, and confident navigating your health,” and within weeks of its release it had already amassed about 40 million users.
But with ChatGPT already known for its “hallucinations” and factual errors, is it the right avenue for medical guidance?
Dr. Ashwin Ramaswamy, MD, an instructor of Urology at the Icahn School of Medicine, conducted an independent safety evaluation of the tool to answer a central question: “If someone is having a real medical emergency and asks ChatGPT Health what to do, will it tell them to go to the emergency department?”
The Structured Test of Triage Recommendations
Published in Nature Medicine this February, the study found serious risks and inconsistencies in how ChatGPT Health handles a range of medical scenarios, raising concerns about the tool’s failure to detect more nuanced emergencies and its lack of clear medical guardrails.
For this evaluation, the research team built an extensive set of triage scenarios designed to give a comprehensive review of the tool’s capabilities.
- 60 structured cases were created across 21 medical specialties.
- The cases ranged from minor scenarios requiring only home care to more life-threatening ones.
- Each scenario was evaluated across 16 contextual conditions, including variations in gender, race, social dynamics, and barriers to care (lack of insurance or transportation).
That design yielded 960 interactions with ChatGPT Health (60 cases × 16 conditions). Its responses were reviewed by three independent physicians using guidelines from 54 medical societies to determine whether the tool assigned the correct level of urgency to each case.
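The paper does not publish its testing code, but the design described above implies a straightforward grid: every case paired with every contextual condition, and each response graded against the physicians’ consensus urgency. The sketch below illustrates that structure in Python; the function names, the four-level urgency scale, and the placeholder grading are hypothetical illustrations, not details from the study.

```python
import random
from itertools import product

# Simplified urgency scale for illustration only; the study graded
# responses against guidelines from 54 medical societies.
URGENCY = ["home_care", "routine_appointment", "urgent_care", "emergency"]

# Hypothetical stand-ins for the study's 60 structured cases (spanning
# 21 specialties) and 16 contextual conditions.
cases = [{"id": f"case_{i:02d}", "physician_urgency": random.choice(URGENCY)}
         for i in range(60)]
conditions = [f"ctx_{j:02d}" for j in range(16)]

def query_chatgpt_health(case_id: str, condition: str) -> str:
    """Hypothetical stand-in for sending one prompt to the tool under test;
    a real harness would call the chatbot and parse its triage advice."""
    return random.choice(URGENCY)

# 60 cases x 16 conditions = 960 interactions, matching the study's total.
undertriaged, total = 0, 0
for case, ctx in product(cases, conditions):
    assigned = query_chatgpt_health(case["id"], ctx)
    total += 1
    # Under-triage: the tool assigns a lower urgency than the physicians did.
    if URGENCY.index(assigned) < URGENCY.index(case["physician_urgency"]):
        undertriaged += 1

print(f"{total} interactions, {undertriaged} under-triaged "
      f"({100 * undertriaged / total:.1f}%)")
```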
The Tool’s Inconsistencies
According to Dr. Ramaswamy, “ChatGPT Health performed well in textbook emergencies such as stroke or severe allergic reactions, but it struggled in more nuanced situations where the danger is not immediately obvious.”
In 51.6% of structured cases in which the patient should have been advised to seek hospital care, ChatGPT Health instead advised staying home. In another simulation, the tool directed a woman who was struggling to breathe to a future appointment 84% of the time, while 64.8% of the time it sent patients who could safely stay home to seek immediate medical care.
“In one asthma scenario, for example, the system identified early warning signs of respiratory failure in its explanation but still advised waiting rather than seeking emergency treatment,” said Dr. Ramaswamy. Errors like these are not only unpredictable; they create a false sense of security that, for the wrong user, could prove fatal.
ChatGPT Health’s suicide-crisis safeguards were also examined, and the research team found them unreliable as well.
In one scenario, the team presented the tool with a 27-year-old “patient” who said he had been thinking about taking a large quantity of pills. ChatGPT Health reacted appropriately, displaying its crisis-intervention banner with links to suicide-prevention services. But when the team added the “patient’s” normal lab results to the otherwise identical scenario, the banner vanished.
It is still unclear how OpenAI trained the product, what guardrails it is subject to, and what warnings it gives users. The researchers have urged people not to rely on AI tools for urgent symptoms and to seek direct medical care instead, to avoid making a dangerous situation worse.
In response, an OpenAI spokesperson said the study does not reflect how people use ChatGPT Health in real life and added that the system is still being refined.