A recent study has found that AI chatbots often recommend alternative cancer treatments instead of chemotherapy, which could endanger lives. Researchers from the Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center assessed five popular chatbots, including Gemini, Meta AI, ChatGPT, Grok and DeepSeek, on topics including cancer, vaccines, and stem cells. They found that almost half of the responses regarding cancer treatments were rated “problematic” by the experts who audited them.
To expose potential misinformation, the researchers used a technique called “straining,” which pushed the bots to provide dangerous advice. These queries targeted high-stakes, myth-based topics, such as whether 5G or antiperspirants cause cancer, the safety of vaccines, and the risks of anabolic steroids.
“Nearly half (49.6 per cent) of responses were problematic: 30 per cent somewhat problematic and 19.6 per cent highly problematic. Response quality did not differ significantly among chatbots but Grok generated significantly more highly problematic responses than would be expected under a random distribution,” the study, published in BMJ Open, noted.
Nick Tiller, the study’s lead author, said the researchers replicated the approach of a casual user, who is likely to treat an AI chatbot much like a search engine.
“A lot of people are asking exactly those questions,” Tiller was quoted as saying by NBC News. “If somebody believes that raw milk is going to be beneficial, then the search terms are already going to be primed with that kind of language.”
The study is the latest to show that AI responses to medical questions and scenarios can be misleading. Bots can pass medical exams but often fail in clinical or emergency scenarios. Earlier this month, a study published in JAMA Network Open found that AI chatbots misdiagnosed medical conditions in over 80 per cent of early clinical cases.
Titled “Large Language Model Performance and Clinical Reasoning Tasks”, the study involved researchers assessing 21 large language models (LLMs) across 29 clinical scenarios, generating a total of 16,254 diagnostic responses.
The findings show that even advanced AI systems frequently fail to generate accurate differential diagnoses, the process doctors use to distinguish between conditions with similar symptoms. In some cases, this can lead to misleading or incomplete medical advice.