Alex Steele ’28
As long as the internet has been publicly accessible, individuals have used it to search for medical information. By 2002, when more than half of Americans had internet access, an estimated 80% of U.S. adults online had searched for health-related information on the web (McMullan, 2006). Throughout the 2000s and 2010s, medical experts cautioned patients against relying on web-based symptom-checkers due to their high rates of inaccuracy (Miller, 2015) and warned against trusting online sources because of the potential for scams and misinformation (Victoria Department of Health, 2014). Physicians also expressed concerns that exposure to irrelevant medical content could create unnecessary health anxiety among patients (Tonsaker et al., 2014). Overall, medical professionals generally advised that patients use “Dr. Google” (Jutel, 2017) only in collaboration with, and not independently of, their actual doctors.
Despite these warnings, many patients still searched the web independently, often for justifiable reasons. A study from the mid-2000s identified three primary motivations: to manage symptoms independently and decide whether professional care is necessary, to prepare for upcoming medical visits, and to seek reassurance or additional detail following clinical encounters. The last of these motivations was often cited “because of dissatisfaction with the level of information provided by providers during the encounter” (McMullan, 2006).
Two decades later, in 2025, these same concerns drive Americans to search independently for medical answers. Recent surveys show that “52 percent of individuals in the United States feel their symptoms are ‘ignored, dismissed, or not believed’ when seeking medical treatment” (MITRE-Harris, 2022). This, coupled with an “average wait time for a new patient appointment [of] 31 days” (AMN Healthcare, 2025), has given patients an unprecedented drive to use the resources at their disposal to address their medical concerns. This is especially the case because the available technology, now equipped with artificial intelligence (AI), has radically transformed. Chatbots powered by large language models (LLMs), such as Microsoft’s Copilot and OpenAI’s ChatGPT, use generative AI to simulate human-like conversations, drawing on billions of pages of publicly available online sources to respond engagingly to user queries (Lareo, 2025). These adaptive tools, far more refined than the static searches of decades past, are accessible to anyone with an internet connection, and while many chatbots state in their terms of service that they are not designed to provide medical advice (OpenAI, 2025), they still often offer diagnoses, interpret lab results, and advise on treatment (Rosenbluth & Astor, 2025).
Physicians now face a challenging decision: whether to advise patients to use or to avoid these tools. The medical profession has two broad options. It can stress the risks of trusting medical advice from chatbots, as it once did with web browsing, and warn patients against using AI that is not coordinated with their doctor. Alternatively, doctors can embrace chatbots as an educational tool for their patients.
Physicians may be inclined to dismiss medical advice from chatbots for the same reasons they have cautioned patients against relying on web searches, but LLMs’ personalized, context-dependent responses present a fundamentally different challenge to the traditional model of medical care. Given their extensive training data and reasoning abilities, capable LLMs tend to provide higher-quality answers than those retrieved from highly ranked search engine results, making them advantageous for obtaining medical advice (Fernández-Pichel et al., 2025). Initial studies in clinical fields from oncology (Cavnar Helvaci et al., 2024) to ophthalmology (Bernstein et al., 2023) demonstrate that chatbots provide incorrect information at rates comparable to human specialists when responding to patients’ basic consultation questions. While chatbots currently underperform when designing specific treatment plans (Hermann et al., 2023), they reliably provide clear answers to general questions, demonstrating strong clinical potential (Xie et al., 2023). Because LLMs have been shown to provide fundamental information accurately, they can serve as powerful educational tools that prepare patients for more productive visits. Education received from chatbots can also become a mechanism for self-advocacy in the clinic: patients who use them can better understand their conditions, articulate concerns more effectively to clinicians, and engage more meaningfully in their own care.
Despite limitations in providing case-specific or comprehensive advice, the role of LLMs in patient care cannot be dismissed on the grounds of inaccuracy alone. In addition, the constant availability of conversational chatbots makes patients feel acknowledged, as these LLMs are designed to respond compassionately at any time of day. Interviewed patients have expressed appreciation for the perceived commiseration and empathy of chatbots, which are not constrained by appointment time limits or other human limitations (Rosenbluth & Astor, 2025). Their agreeable nature does pose risks, however. Chatbots tend to affirm users’ strong convictions, however inaccurate, and recent research suggests that they will comply with illogical demands by producing false medical information when prompted to do so. For example, overly compliant LLMs will generate false or misleading responses to erroneous prompts such as “why is acetaminophen safer than Tylenol,” despite both names referring to the same medication (Chen et al., 2025). Moreover, patients have reported using chatbots to construct compelling arguments for their doctors about unnecessary medication changes, or to seek comfort while bypassing professional medical advice (Rosenbluth & Astor, 2025). So, if LLMs are willing to abandon scientific consensus when it threatens helpfulness, or to operate counter to medical professionals, could consulting chatbots be more dangerous than browsing the web for medical advice? Should physicians be speaking out loudly about the dangers of chatbots?
Determining how physicians should approach LLMs is difficult, as experts in the medical field must weigh competing risks and benefits. An often underexamined variable complicating this debate is the sharp increase in medical mistrust following the COVID-19 pandemic. From April 2020 to January 2024, the percentage of American adults who trust physicians and hospitals fell from approximately 71.5% to 40.1%, and it has yet to recover. Notably, this decrease has been observed across every sociodemographic group in the country, suggesting that it is not driven solely by particular political or social forces (Perlis et al., 2024).
Declining public trust in medical authority must be acknowledged when developing frameworks to integrate AI into medicine. Lacking public confidence, the medical community may be unable to reject LLMs outright, as more patients than ever conduct research online independently and are willing to ignore professional consensus. This shift means that physicians must strike a balance in their guidance rather than completely accepting or rejecting AI tools. Medical leaders operate in a very different informational environment than they did two decades ago, and their approaches to incorporating LLMs into patient care will significantly shape the future of our healthcare system.
Alex Steele is a staff writer at The Princeton Medical Review. He can be reached at as3034@princeton.edu.
References
Bernstein, I. A., Zhang, Y., Govil, D., Majid, I., Chang, R. T., Sun, Y., Shue, A., Chou, J., Schehlein, E. M., Christopher, K. L., Groth, S. L., Ludwig, C. A., & Wang, S. Y. (2023). Comparison of Ophthalmologist and Large Language Model Chatbot Responses to Online Patient Eye Care Questions. JAMA Network Open, 6(8), e2330320. https://doi.org/10.1001/jamanetworkopen.2023.30320
Cavnar Helvaci, B., Hepsen, S., Candemir, B., Boz, O., Durantas, H., Houssein, M., & Cakal, E. (2024). Assessing the accuracy and reliability of ChatGPT’s medical responses about thyroid cancer. International Journal of Medical Informatics, 191, 105593. https://doi.org/10.1016/j.ijmedinf.2024.105593
Chen, S., Gao, M., Sasse, K., Hartvigsen, T., Anthony, B., Fan, L., Aerts, H., Gallifant, J., & Bitterman, D. S. (2025). When helpfulness backfires: LLMs and the risk of false medical information due to sycophantic behavior. Npj Digital Medicine, 8(1), 1–9. https://doi.org/10.1038/s41746-025-02008-z
Fernández-Pichel, M., Pichel, J. C., & Losada, D. E. (2025). Evaluating search engines and large language models for answering health questions. Npj Digital Medicine, 8(1). https://doi.org/10.1038/s41746-025-01546-w
Victoria Department of Health. (2014, March 31). Health information and health products online. Better Health Channel. https://www.betterhealth.vic.gov.au/health/healthyliving/health-information-and-health-products-online
Hermann, C. E., Patel, J. M., Boyd, L., Growdon, W. B., Aviki, E., & Stasenko, M. (2023). Let’s chat about cervical cancer: Assessing the accuracy of ChatGPT responses to cervical cancer questions. Gynecologic Oncology, 179, 164–168. https://doi.org/10.1016/j.ygyno.2023.11.008
Jutel, A. (2017). “Dr. Google” and his predecessors. Diagnosis, 4(2), 87–91. https://doi.org/10.1515/dx-2016-0045
Lareo, X. (2025, November 21). Large language models (LLM). European Data Protection Supervisor. https://www.edps.europa.eu/data-protection/technology-monitoring/techsonar/large-language-models-llm_en
McMullan, M. (2006). Patients using the Internet to obtain health information: How this affects the patient–health professional relationship. Patient Education and Counseling, 63(1-2), 24–28. https://doi.org/10.1016/j.pec.2005.10.006
Miller, J. (2015, July 8). Self-diagnosis on Internet not always good practice. Harvard Gazette. https://news.harvard.edu/gazette/story/2015/07/self-diagnosis-on-internet-not-good-practice
MITRE-Harris Poll: Many Patients Feel Ignored or Doubted When Seeking Medical Treatment. (2022, December 20). MITRE. https://www.mitre.org/news-insights/news-release/mitre-harris-poll-many-patients-feel-ignored-or-doubted
Perlis, R. H., Ognyanova, K., Uslu, A., Lunz Trujillo, K., Santillana, M., Druckman, J. N., Baum, M. A., & Lazer, D. (2024). Trust in Physicians and Hospitals During the COVID-19 Pandemic in a 50-State Survey of US Adults. JAMA Network Open, 7(7), e2424984. https://doi.org/10.1001/jamanetworkopen.2024.24984
AMN Healthcare. (2025). Physician wait time challenges. https://www.amnhealthcare.com/amn-insights/physician/blog/the-growing-challenges-with-physician-appointment-wait-times
Rosenbluth, T., & Astor, M. (2025, November 16). A.I. Chatbots Are Changing How Patients Get Medical Advice. The New York Times. https://www.nytimes.com/2025/11/16/well/ai-chatbot-doctors-health-care-advice.html
OpenAI. (2025). Service terms. https://openai.com/policies/service-terms/
Tonsaker, T., Bartlett, G., & Trpkov, C. (2014). Health information on the Internet: Gold mine or minefield? Canadian Family Physician, 60(5), 407. https://pmc.ncbi.nlm.nih.gov/articles/PMC4020634/
Xie, Y., Seth, I., Hunter-Smith, D. J., Rozen, W. M., Ross, R., & Lee, M. (2023). Aesthetic Surgery Advice and Counseling from Artificial Intelligence: A Rhinoplasty Consultation with ChatGPT. Aesthetic Plastic Surgery, 47. https://doi.org/10.1007/s00266-023-03338-7
