Evaluating video-based synthetic data for training lightweight models in strawberry leaf disease classification
Abstract
Collecting large, well‑labeled image datasets is a persistent bottleneck in agricultural research. This study explores the use of video‑based synthetic data, generated with Sora, to support plant disease detection. It focuses on strawberry leaves and builds a synthetic training set of 1,467 images, which is used to fine‑tune six lightweight deep learning models. The models are then evaluated on 618 real leaf images drawn from public datasets. Among the tested architectures, ResNet‑18 performs best, with accuracy, precision, recall and F1‑score all close to 98.7%. Five‑fold cross‑validation yields an average accuracy of 98.9%, indicating that the approach is stable. The findings demonstrate that synthetic video‑derived data can be used effectively to train compact models for strawberry leaf disease classification.
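
The abstract reports accuracy, precision, recall and F1‑score but does not say how the per‑class values are averaged. As a minimal sketch, assuming macro averaging (an unweighted mean over classes), the four metrics could be computed from true and predicted labels as shown here. The function name `classification_metrics` and the class labels "healthy" and "diseased" are hypothetical and are not taken from the study.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall and F1.

    Macro averaging is an assumption; the abstract does not state
    which averaging scheme was used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives and false negatives per class.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    n_classes = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1_scores) / n_classes,
    }


# Hypothetical toy labels, for illustration only (not data from the study).
y_true = ["healthy", "healthy", "diseased", "diseased"]
y_pred = ["healthy", "diseased", "diseased", "diseased"]
example = classification_metrics(y_true, y_pred)
print(example)
```

With macro averaging every class contributes equally regardless of how many test images it has, so the averaged scores can differ from accuracy when classes are imbalanced. A weighted average would be an equally plausible reading of the reported figures.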