A Harmonized Multi-Source Dataset with Baseline Deep Learning Validation for Staging Diabetic Retinopathy

Research output: Contribution to journal › Article › peer-review

Abstract

Accurate automated grading of diabetic retinopathy (DR) depends heavily on the quality of retinal fundus images. Poor-quality images, caused by inadequate lighting, motion blur, distortion, or incomplete retinal coverage, can obscure subtle lesions and reduce the accuracy of model predictions. This study constructs a harmonized multi-source dataset for multi-class DR staging using a multi-dimensional image quality assessment framework. Retinal images are collected from the IDRiD, Messidor-2, SUSTech-SYSU, APTOS 2019, DeepDRiD-v1.1, and Zenodo DR V03 datasets. The proposed pipeline comprises preprocessing, image quality assessment using technical-quality and medical-relevance indicators, computation of dataset-specific statistics, and adaptive thresholding based on DR severity-aware percentiles, which are derived from stratified samples and weighted to match diagnostic needs. To validate the dataset, baseline deep learning models are trained for three hierarchical DR classification schemes. Experimental results show that quality-filtered merging of the datasets improves model generalization accuracy by 3-7% compared with merging them without quality filtering. This work provides a benchmark dataset and baseline performance results to facilitate future research in DR staging and medical image classification.
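The adaptive thresholding step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the quality indicators, the weighting, or the percentile values, so the per-image quality score, the names `adaptive_thresholds` and `quality_mask`, and the cut-offs in `PERCENTILE_BY_GRADE` are all assumptions made for illustration. The sketch computes one threshold per source dataset and DR grade as a percentile of that group's quality scores, then keeps the images at or above their group's threshold.

```python
import numpy as np

# Hypothetical percentile cut-offs per DR grade (0 = no DR ... 4 = proliferative DR).
# Assumption: severe grades get a lower cut-off so that fewer rare cases are discarded.
PERCENTILE_BY_GRADE = {0: 30.0, 1: 25.0, 2: 20.0, 3: 10.0, 4: 10.0}


def adaptive_thresholds(scores, grades, sources, percentile_by_grade=None):
    """Return {(source, grade): threshold}, computed within each source and grade."""
    if percentile_by_grade is None:
        percentile_by_grade = PERCENTILE_BY_GRADE
    scores = np.asarray(scores, dtype=float)
    grades = np.asarray(grades)
    sources = np.asarray(sources)

    thresholds = {}
    for src in np.unique(sources):
        for grade in np.unique(grades[sources == src]):
            group = (sources == src) & (grades == grade)
            cutoff = percentile_by_grade[int(grade)]
            thresholds[(str(src), int(grade))] = float(
                np.percentile(scores[group], cutoff)
            )
    return thresholds


def quality_mask(scores, grades, sources, thresholds):
    """Boolean mask: True where an image meets its (source, grade) threshold."""
    return np.array(
        [
            float(score) >= thresholds[(str(src), int(grade))]
            for score, grade, src in zip(scores, grades, sources)
        ]
    )
```

Computing thresholds within each source dataset, rather than globally, is one way to keep a dataset with systematically darker or blurrier images from being filtered out almost entirely. Whether the published pipeline groups images in exactly this way is not stated in the abstract.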

Original language: English
Pages (from-to): 130-147
Number of pages: 18
Journal: International Journal of Mathematical, Engineering and Management Sciences
Volume: 11
Issue number: 1
Publication status: Published - Feb 2026

Keywords

  • Baseline validation
  • Deep learning
  • Diabetic retinopathy
  • Hierarchical classification
  • Image quality assessment
  • Label harmonization
  • Retinal fundus images

ASJC Scopus subject areas

  • General Computer Science
  • General Mathematics
  • General Business, Management and Accounting
  • General Engineering
