A recent study showed that ChatGPT gets 83% of pediatric diagnoses wrong. The research, published in JAMA Pediatrics, a scientific journal in the field of pediatrics, points out that GPT-4 has serious trouble identifying illnesses in child patients. Its hit rate is better with adults, though still only 39%.
As the authors explain, the goal of the study is not to criticize ChatGPT, nor to call parents who turn to AI instead of taking their children to the doctor irresponsible, but to expose the technology's weaknesses and show how it could be improved to assist doctors.
Pediatric cases are harder to diagnose
ChatGPT’s difficulty with pediatric symptoms reflects a broader problem: children’s illnesses are harder to diagnose in general, because children struggle to describe what they are feeling. They lack the experience an adult draws on to tell, for example, whether a pain in the belly is muscular or coming from an organ.
The test with ChatGPT used 100 pediatric case challenges published in JAMA Pediatrics itself and in The New England Journal of Medicine. In these challenges, the journals publish cases of patients with difficult diagnoses, along with the information doctors had at the time, a bit like an episode of House.
After each prompt was submitted, two physicians assessed whether ChatGPT’s answer was correct, incorrect, or partially correct. The AI got 17 cases right; the other 83 counted as errors, 72 of them completely wrong and the remaining 11 only partially correct.
Running this research on only 100 cases may seem like a small sample, but JAMA Pediatrics has an impact factor of 26.1, which is considered a strong number in academia, suggesting the methodology held up to rigorous review. Expanding the study to more cases would demand more time and money. Even so, it pinpoints where ChatGPT needs to improve.
In the study, the doctors report that the AI was unable to connect symptoms with patients’ other conditions. Some neurodivergent conditions are linked to vitamin deficiencies.
For example, a person on the autism spectrum may follow a restricted diet and end up lacking certain vitamins. One such case was presented to ChatGPT, but it failed to link the patient’s condition to scurvy, a disease caused by vitamin C deficiency.
To address these errors, the study’s authors argue that ChatGPT should draw on medical articles from respected journals rather than just any website on the internet. Another suggestion is to keep the AI promptly updated with new medical information.
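The study does not describe an implementation, but as a rough sketch of what "grounding the model in vetted sources" could look like in practice, the Python snippet below retrieves the most relevant passages from a small placeholder corpus of journal text and includes them in the prompt before asking the model for a diagnosis. The corpus contents, the keyword-overlap retrieval heuristic, and the use of the OpenAI chat API with a "gpt-4" model name are all illustrative assumptions, not details from the study.

```python
# Minimal sketch: ground a diagnostic prompt in vetted medical passages
# before sending it to the model. The tiny in-memory "corpus" and the
# scoring heuristic are placeholders for illustration only.
from openai import OpenAI  # pip install openai

# Stand-in for a curated corpus of peer-reviewed articles (assumed content).
CORPUS = [
    {"source": "Journal article A (placeholder)",
     "text": "Restrictive diets in autistic children can lead to vitamin C "
             "deficiency, which presents as scurvy: gum disease, fatigue, "
             "joint pain and poor wound healing."},
    {"source": "Journal article B (placeholder)",
     "text": "Abdominal pain in children is often nonspecific; caregiver "
             "history and dietary review are essential."},
]

def retrieve(query: str, k: int = 2) -> list[dict]:
    """Naive keyword-overlap retrieval; a real system would use embeddings."""
    q_words = set(query.lower().split())
    scored = sorted(
        CORPUS,
        key=lambda doc: len(q_words & set(doc["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def grounded_diagnosis(case_text: str) -> str:
    """Prepend retrieved passages to the case before querying the model."""
    passages = retrieve(case_text)
    context = "\n\n".join(f"[{p['source']}]\n{p['text']}" for p in passages)
    client = OpenAI()  # requires OPENAI_API_KEY in the environment
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are assisting a pediatrician. Base your answer "
                        "only on the case and the cited passages, and say so "
                        "if the evidence is insufficient."},
            {"role": "user",
             "content": f"Reference passages:\n{context}\n\nCase:\n{case_text}\n\n"
                        "Suggest the most likely diagnosis and explain your reasoning."},
        ],
    )
    return response.choices[0].message.content
```

A real system would swap the toy keyword matcher for embedding-based search over licensed journal content and add safeguards for uncertain answers, but the structure is the same: retrieve trusted passages first, then let the model reason over them.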
ChatGPT and similar tools will not replace doctors, but their vast processing power could become another aid in diagnosing more difficult cases. For now and for the foreseeable future, you will still have to see a doctor, though at least telemedicine can make that easier.