Large language models (LLMs) like ChatGPT appear to generate accurate information about temporomandibular disorders (TMDs), according to a study recently published in the European Journal of Dental Education.
However, ChatGPT may be best applied for patient education under the guidance of qualified specialists, the authors wrote.
“ChatGPT 3.5 produced accurate and coherent responses about TMDs but should be used as a supplementary educational tool under professional supervision,” wrote lead author Dr. Hilal Yilanci of the Istanbul Medipol University in Turkey (Eur J Dent Educ, March 29, 2026) and colleagues.
LLMs, such as ChatGPT, are increasingly being used in dental education, though their reliability in clinical settings remains uncertain. The study’s goal was to evaluate the credibility and effectiveness of ChatGPT’s responses relating to TMDs among dentists and dental students, they wrote.
The researchers posed nine TMD-related questions to ChatGPT 3.5 and used its answers to develop an online survey. A total of 115 participants, including 60 dental students and 55 dentists, rated the responses using a five-point scale. Additionally, a panel of 14 experienced TMD and orofacial pain specialists independently evaluated the responses for accuracy, completeness, and adherence to diagnostic guidelines.
The mean scores for individual responses ranged from 4.28 to 4.48, with an overall mean total score of 39.59 (range: 34.55 to 44.63) and a median of 40 (range: 25 to 45), indicating generally positive evaluations. Most ratings clustered at the higher end of the scale, suggesting broad satisfaction among participants, though some variation reflected differing perceptions between dentists and dental students, they wrote.
Although the differences between the two groups were not statistically significant, dental students gave slightly higher total scores (range: 34.84 to 45.66) than dentists (range: 34.34 to 43.40). The expert panel showed strong agreement across all dimensions: median accuracy scores ranged from 4 to 5, completeness scores from 3.79 to 4.50, and guideline adherence scores from 3.71 to 4.43, indicating that ChatGPT’s responses were generally accurate, comprehensive, and aligned with clinical guidelines.
Although the study included senior dental students and general dentists, the sample may not represent the wider dental community, underscoring the need for future research with larger, more diverse, and randomized populations, the authors added.
“It can be concluded that ChatGPT concerning TMDs could be applied under the supervision of specialists in the field, which can help patients have a better understanding of TMDs,” Yilanci and colleagues wrote.