A detection model built on a transformer architecture may accurately identify early-stage dental caries in panoramic x-rays, improving diagnostic precision and efficiency, according to a study recently published in Scientific Reports.
Additionally, the model appeared to outperform dentists and traditional convolutional neural network (CNN)-based methods in early-caries detection, the authors wrote.
“This research demonstrates the efficacy of domain-adapted Transformer architectures for early dental caries detection and establishes its potential utility as a decision support tool for enhancing diagnostic accuracy and screening efficiency in dental practice,” wrote the authors, led by Liwei Wang of the General Hospital of PLA Northern Theater Command in China (Sci Rep, January 23, 2026, Vol. 16:1, 3507).
Researchers developed a hierarchical transformer model designed to improve early dental caries detection by combining multiscale feature extraction with spatial attention. Image preprocessing involved histogram equalization, bilateral filtering, and normalization to enhance radiograph quality. The transformer encoder was customized for panoramic dental images, incorporating 2D-aware positional encoding and spatially focused attention, they wrote.
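To make the preprocessing step more concrete, here is a minimal sketch of two of the three operations named above, histogram equalization and normalization, applied to a tiny grayscale image. It is not the authors' code: the function names, the 4x4 sample image, and the 8-bit intensity range are assumptions for illustration. Bilateral filtering is left out to keep the example dependency-free; in practice, a library routine such as OpenCV's `cv2.bilateralFilter` would typically handle that step.

```python
def equalize_histogram(image, levels=256):
    """Spread the intensities of a grayscale image over the full range."""
    pixels = [p for row in image for p in row]

    # Count how often each intensity occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Build the cumulative distribution of intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Constant image: there is nothing to equalize.
        return [row[:] for row in image]

    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in image]


def normalize(image, max_value=255):
    """Scale pixel values to the [0, 1] range."""
    return [[p / max_value for p in row] for row in image]


# Hypothetical low-contrast patch standing in for a radiograph.
radiograph = [
    [52, 55, 61, 59],
    [62, 59, 55, 104],
    [63, 65, 66, 113],
    [68, 70, 70, 126],
]

equalized = equalize_histogram(radiograph)
normalized = normalize(equalized)
```

After equalization, the darkest pixel maps to 0 and the brightest to 255, so the narrow original range (52 to 126) is stretched across the full scale before normalization.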
Features from multiple network depths were fused using dynamically weighted aggregation, along with channel-wise and spatial attention modules to further refine lesion representation across severity levels. The model was trained and validated on a dataset of 3,856 panoramic radiographs from adult patients, including 12,847 annotated carious lesions collected from three dental hospitals between 2021 and 2023.
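One common way to implement dynamically weighted aggregation is to give each network depth a learnable score, convert the scores to weights with a softmax, and take the weighted sum of the feature maps. The sketch below assumes that approach; the article does not specify the authors' exact formulation, and the names `fuse_features` and `layer_scores`, along with the three-element feature vectors, are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(features, scores):
    """Return the softmax-weighted sum of same-length feature vectors."""
    weights = softmax(scores)
    fused = [0.0] * len(features[0])
    for weight, vector in zip(weights, features):
        for i, value in enumerate(vector):
            fused[i] += weight * value
    return fused, weights


# Hypothetical features taken from three depths of a network.
shallow = [0.2, 0.9, 0.1]
middle = [0.5, 0.4, 0.3]
deep = [0.8, 0.1, 0.7]

# Hypothetical learnable scores; equal scores reduce to a plain average.
layer_scores = [0.0, 0.0, 0.0]

fused, weights = fuse_features([shallow, middle, deep], layer_scores)
```

During training, the scores would be updated along with the rest of the model, letting it lean on shallow features for fine detail or deeper features for context as the data requires.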
The model achieved a mean average precision of 87.3% across all caries stages, with sensitivities of 81.3% for D1 lesions and 84.7% for D2 lesions, outperforming both traditional CNN-based models and average dentist performance. Its advantage was most evident in early-stage detection, where it reached 79.8% average precision for D1 lesions and 82.6% for D2 lesions, exceeding all baseline methods, they wrote.
Comparison with expert annotations showed strong agreement in diagnostic outcomes. Across all severity grades, the model demonstrated an overall sensitivity of 88.4% and a specificity of 91.7% on the test set. Additionally, the system enabled real-time analysis, processing each panoramic radiograph in approximately 70 ms.
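For readers less familiar with these metrics, the sketch below shows how sensitivity and specificity are computed from confusion-matrix counts, and what a 70 ms processing time implies for throughput. The counts are hypothetical, chosen only so the results match the reported percentages; they are not taken from the study.

```python
def sensitivity(true_pos, false_neg):
    """Share of actual lesions that the model flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of healthy sites that the model correctly leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts per 1,000 diseased and 1,000 healthy sites.
true_pos, false_neg = 884, 116
true_neg, false_pos = 917, 83

sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)

# Throughput implied by the reported per-image processing time.
latency_ms = 70
images_per_second = 1000 / latency_ms

print(f"sensitivity: {sens:.1%}")
print(f"specificity: {spec:.1%}")
print(f"throughput: {images_per_second:.1f} images per second")
```

At roughly 70 ms per image, the system would process about 14 panoramic radiographs per second, assuming images are handled one at a time.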
Nevertheless, the study had limitations. The researchers relied only on panoramic radiographs, which may limit how well the findings apply to other dental imaging methods, the authors added.
“The system enhances diagnostic consistency and enables standardized reporting format for improved documentation quality,” they concluded.