
Diabetes mellitus is one of the largest global public health challenges, affecting millions worldwide. Among its many complications, diabetic retinopathy (DR) is a microvascular end-organ complication that poses a significant threat to vision. With approximately 34.6% of individuals with diabetes experiencing some degree of DR, early detection and effective staging are critical for timely intervention and treatment. However, despite advancements in artificial intelligence (AI), developing models that can generalize across diverse datasets for DR staging remains a formidable challenge.
The Problem: Generalization in DR Staging Models
One of the primary obstacles in building effective DR staging models is their inability to generalize well when deployed on datasets that differ from their training data. This issue arises due to distribution shifts between the source domain (training dataset) and the target domain (deployment dataset). Such shifts often lead to a degradation in model performance, limiting the utility of these AI-driven solutions in real-world clinical settings.
Our Contribution: Enhancing Generalization with Vision Transformers and Multi-Source Domain Fine-Tuning
Our study, “Deep Learning Generalization for Diabetic Retinopathy Staging from Fundus Images,” addresses these challenges through two significant contributions:
Benchmarking Pretrained Models for DR Staging: We compare alternative self-supervised pretrained vision transformer (ViT) models for the task of DR staging. To the best of our knowledge, this is the first systematic evaluation of different ViT-SSL (self-supervised learning) models for DR staging.
Introducing a Multi-Source Domain Fine-Tuning Strategy: We develop a novel fine-tuning approach that leverages data from multiple source domains to enhance the generalization performance of DR staging models. This strategy ensures better adaptation to unseen and out-of-distribution (OOD) datasets.
Figure: Performance of different pretrained ViT-SSL models for DR staging (referring to contribution 1).
Experimental Insights from Over 90,000 Retinal Images
Our experiments span seven datasets, comprising over 90,000 retinal images, and provide several key insights:
Domain-Specific vs. Natural Image Foundation Models: Recent studies, such as Zhou et al.'s work on RETFound (a foundation model for retinal images published in Nature), suggest that domain-specific pretrained models outperform general models trained on natural images. However, our research challenges this assumption. By benchmarking RETFound against DINOv2, a vision transformer pretrained on millions of natural images, we demonstrate that the general-purpose models outperform the domain-specific ones for DR staging.
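To illustrate the kind of comparison involved (the exact evaluation protocol is described in the paper, not here), a common way to benchmark frozen SSL backbones is to train an identical linear classification head on each encoder's features and compare held-out accuracy. The sketch below is purely illustrative: no DINOv2 or RETFound weights are loaded, and two random nonlinear projections stand in for the frozen encoders on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins for frozen SSL encoders: fixed random feature maps.
# (Illustrative only -- no real DINOv2/RETFound weights are used.)
D_IN, D_FEAT, N_CLASSES = 64, 16, 5  # 5 DR grades: none/mild/moderate/severe/proliferative

def make_encoder(seed):
    W = np.random.default_rng(seed).normal(size=(D_IN, D_FEAT)) / np.sqrt(D_IN)
    return lambda X: np.tanh(X @ W)  # frozen, nonlinear feature extractor

def linear_probe(feats, y, lr=0.5, epochs=400):
    """Multinomial logistic regression head trained on frozen features."""
    W = np.zeros((feats.shape[1], N_CLASSES))
    onehot = np.eye(N_CLASSES)[y]
    for _ in range(epochs):
        logits = feats @ W
        p = np.exp(logits - logits.max(1, keepdims=True))
        p /= p.sum(1, keepdims=True)
        W -= lr * feats.T @ (p - onehot) / len(y)  # cross-entropy gradient step
    return W

# Synthetic "fundus" data: each class adds its own signal to the raw input.
y = rng.integers(0, N_CLASSES, 600)
X = rng.normal(size=(600, D_IN)) + np.eye(N_CLASSES)[y] @ rng.normal(size=(N_CLASSES, D_IN))
train, test = slice(0, 400), slice(400, 600)

# Identical probe, different frozen backbone: the comparison isolates the encoder.
accs = {}
for name, enc in [("encoder_a", make_encoder(0)), ("encoder_b", make_encoder(1))]:
    F = enc(X)
    W = linear_probe(F[train], y[train])
    accs[name] = float(((F[test] @ W).argmax(1) == y[test]).mean())
    print(f"{name}: test accuracy = {accs[name]:.2f}")
```

The key design point is that the head and training budget are held fixed, so any accuracy gap is attributable to the pretrained representation itself.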
Value of Multi-Source Domain Fine-Tuning: To address the generalization gap, we propose a multi-source domain fine-tuning strategy. This method significantly improves performance on OOD datasets, showcasing its potential for creating robust and clinically reliable DR staging models.
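One simple instantiation of this idea is to pool labeled data from several source domains into a single fine-tuning set, so the model sees a wider range of acquisition conditions before it meets the target domain. The sketch below is a minimal illustration on synthetic 2-D data with a plain-numpy logistic classifier standing in for a fine-tuned classification head; it is not the published pipeline, and the domain shifts are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_domain(shift, n=400):
    """Synthetic binary task: class means at +/-1 in each dim, plus a domain-specific shift."""
    y = rng.integers(0, 2, n)
    X = rng.normal(0.0, 1.0, (n, 2)) + np.where(y[:, None] == 1, 1.0, -1.0) + shift
    return X, y

def fit_logreg(X, y, lr=0.1, epochs=300):
    """Plain-numpy logistic regression (stand-in for fine-tuning a classifier head)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

def accuracy(w, b, X, y):
    return float((((X @ w + b) > 0).astype(int) == y).mean())

# Three source domains with increasing covariate shift, one unseen target domain.
sources = [make_domain(np.array([s, 0.0])) for s in (0.0, 0.5, 1.0)]
X_tgt, y_tgt = make_domain(np.array([0.75, 0.0]))

# Single-source training vs pooling all sources into one fine-tuning set.
w1, b1 = fit_logreg(*sources[0])
X_pool = np.vstack([X for X, _ in sources])
y_pool = np.concatenate([y for _, y in sources])
wm, bm = fit_logreg(X_pool, y_pool)

print("single-source target accuracy:", accuracy(w1, b1, X_tgt, y_tgt))
print("multi-source  target accuracy:", accuracy(wm, bm, X_tgt, y_tgt))
```

The intuition the sketch captures: a model fit to one source overfits that domain's input distribution, while the pooled model must find parameters that work across all source shifts, which tends to transfer better to a target lying within (or near) the span of those shifts.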
Why This Research Matters
Our findings have broad implications for the field of digital health and AI in clinical applications. By providing quantitative evidence for the effectiveness of vision transformers in DR staging and introducing strategies to enhance generalization, our work offers practical solutions to a critical problem in AI-driven healthcare. Moreover, our research is fully reproducible: all experiments are conducted on open-access datasets, and our algorithms are freely available via our platform Lirot.ai.

Conclusion
The staging of diabetic retinopathy using AI holds immense potential to revolutionize diabetic eye care. Our work not only benchmarks cutting-edge vision transformers for this task but also introduces a robust fine-tuning strategy to overcome generalization challenges. By making our methods and findings openly accessible, we aim to drive progress in the field and ensure that AI solutions can effectively benefit patients worldwide.
Stay tuned for more research outputs in the exciting field of AI and Ophthalmology!
This work has been published in:
Men, Yevgeniy A., Jonathan Fhima, Leo Anthony Celi, Lucas Zago Ribeiro, Luis Filipe Nakayama, and Joachim A. Behar. "Deep learning generalization for diabetic retinopathy staging from fundus images." Physiological Measurement (2025).