A panel of healthcare professionals preferred ChatGPT’s responses to medical questions over those of a doctor 79% of the time. Photograph: Ariel Skelley/Getty Images

AI has better ‘bedside manner’ than some doctors, study finds


ChatGPT rated higher in quality and empathy of written advice, raising possibility of medical assistance role

ChatGPT appears to have a better ‘bedside manner’ than some doctors – at least when its written advice is rated for quality and empathy, a study has shown.

The findings highlight the potential for AI assistants to play a role in medicine, according to the authors of the work, who suggest such agents could help draft doctors’ communications with patients. “The opportunities for improving healthcare with AI are massive,” said Dr John Ayers, of the University of California San Diego.

However, others noted that the findings do not mean ChatGPT is actually a better doctor and cautioned against delegating clinical responsibility given that the chatbot has a tendency to produce “facts” that are untrue.

The study, published in the journal JAMA Internal Medicine, used data from Reddit’s AskDocs forum, where members can post medical questions that are answered by verified healthcare professionals. The team randomly sampled 195 exchanges in which a verified doctor had responded to a public question, then put the same questions to ChatGPT. A panel of three licensed healthcare professionals, blinded to whether each response came from a human physician or ChatGPT, rated the answers for quality and empathy.

Overall, the panel preferred ChatGPT’s responses to those given by a human 79% of the time. ChatGPT’s responses were rated good or very good quality 79% of the time, compared with 22% of doctors’ responses. And 45% of the ChatGPT answers were rated empathic or very empathic, compared with just 5% of doctors’ replies.

Dr Christopher Longhurst, of UC San Diego Health, said: “These results suggest that tools like ChatGPT can efficiently draft high-quality, personalised medical advice for review by clinicians, and we are beginning that process at UCSD Health.”

Prof James Davenport, of the University of Bath, who was not involved in the research, said: “The paper does not say that ChatGPT can replace doctors, but does, quite legitimately, call for further research into whether and how ChatGPT can assist physicians in response generation.”

Some noted that, given ChatGPT was specifically optimised to be likeable, it was not surprising that it wrote text that came across as empathic. It also tended to give longer, chattier answers than the human doctors did, which may have contributed to its higher ratings.

Others cautioned against relying on language models for factual information due to their tendency to generate made-up “facts”.


Prof Anthony Cohn, of the University of Leeds, said that using language models as a tool to draft responses was a “reasonable use case for early adoption”, but that even in a supporting role they should be used carefully. “Humans have been shown to overly trust machine responses, particularly when they are often right, and a human may not always be sufficiently vigilant to properly check a chatbot’s response,” he said. “This would need guarding against, perhaps using random synthetic wrong responses to test vigilance.”
