Be careful when asking ChatGPT or Gemini for medical advice: a study reveals their answers are problematic
AI can offer incomplete answers that, depending on the topic, can be dangerous for the user
Millions of people already use AI chatbots to answer health questions, but science has just sounded a warning that you can't ignore. A recent study published in the journal BMJ Open confirmed that half of the medical answers given by five of the world's most popular chatbots are incorrect, incomplete, or downright dangerous. And no, this is not an exaggeration.
We're talking about tools you use every day: ChatGPT, Gemini, Grok, and similar apps. While they feel like having a doctor in your pocket, the reality is far more concerning. We'll explain why.
Half of chatbot medical answers have serious problems
Researchers at the Lundquist Institute for Biomedical Innovation in the United States analyzed the responses of five popular chatbots to specific medical questions. The result was striking: 50% of the answers to clear, evidence-based questions were classified as "somewhat" or "very" problematic.
The study categorized the answers into three groups: no problems, problematic, and very problematic.
Any response that could lead a user without medical training to follow ineffective treatments or even self-medicate without professional guidance was considered problematic. This includes everything from taking the wrong medication to ignoring a warning sign that required urgent attention.
The worst-performing chatbot was Grok: 29 of its 50 responses (58%) were rated highly problematic. Gemini, by contrast, had the lowest rate of highly problematic responses among the five evaluated, although that doesn't mean it's completely reliable.
What most concerned the researchers was that the AI systems fail in more than 80% of cases when attempting to formulate differential diagnoses, which are precisely the most critical tasks in medicine: a doctor must rule out several possible diseases simultaneously.
Chatbots are easily confused and don't know when something is urgent
One of the most troubling problems detected by several studies is that chatbots have serious difficulty distinguishing when a symptom needs immediate attention and when it can wait. In experiments where researchers directly described symptoms, chatbots frequently failed to correctly prioritize the urgency of the situation.
The reason has to do with how these models are trained. According to researcher Danielle Bitterman of Mass General Brigham, the models are primarily fed medical textbooks and clinical reports, but have far less exposure to the open-ended decision-making that doctors develop over years of practice. Basically, they know the theory, but they lack the clinical judgment that comes from seeing real patients.
Furthermore, a study published in The Lancet Digital Health by researchers at Mount Sinai revealed something even more alarming: models like ChatGPT-4o, Llama, and Gemma accept false medical claims 32% of the time. In other words, if you ask one of them about something based on an internet hoax, there's a high probability it will confirm it without hesitation.
Another factor that complicates matters: AI can drastically change its advice depending on how you phrase the question. A small variation in how you describe your symptoms can give you completely different answers, making it almost impossible to trust it as a consistent medical source.
What you can do with chatbots and what you definitely shouldn't do
It's not all bad news. Experts acknowledge that chatbots do have valid uses in the healthcare field, as long as they are used judiciously.
For example, they are useful for understanding complicated medical terms in a report, preparing questions before a consultation, or finding general context about a condition that has already been diagnosed by a doctor.
But there are lines you shouldn't cross. Specialists are clear that, when faced with symptoms like shortness of breath, chest pain, or a severe headache, the last thing you should do is consult a chatbot. These are scenarios where every minute counts and where AI can literally make fatal mistakes.
Dr. Lloyd Minor, from Stanford University, recommends approaching these programs with “a healthy dose of skepticism.” And researchers at the Lundquist Institute go even further: they warn that the mass deployment of these chatbots without public education or adequate oversight risks amplifying medical misinformation on an unprecedented scale.
If you still decide to use a chatbot for health-related issues, one practice recommended by some experts is to ask the same question to two or more different chatbots and compare the answers. When they agree, there's a little more room for trust. But even then, that doesn't replace the opinion of a healthcare professional.

Artificial intelligence is advancing at an impressive pace, but in medicine, the difference between a correct and an incorrect answer can cost a life. For now, chatbots are a useful tool for many things, but diagnosing illnesses or prescribing treatments isn't one of them.