EffiViT3+: Deep learning-based segmentation of thyroid nodules using EfficientNet-B7 and Vision Transformer-based hybrid encoder
Abstract
Thyroid nodules are round or oval lesions that form within the thyroid gland, distinct from the surrounding normal tissue, and are quite common in the general population. Although most of these nodules are benign, some are malignant. Ultrasonography, biopsy, and thyroid function tests are used to determine whether a nodule is benign or malignant. However, biopsy, which is performed when cancer is suspected, is invasive and costly, and most biopsied nodules turn out to be benign. Ultrasonography is therefore used as a fast, painless, and economical method for detecting thyroid nodules. Clinical evaluation requires segmenting the nodules from thyroid ultrasound images, a task that deep learning techniques can perform quickly and objectively. This study presents a hybrid deep learning approach for the automatic segmentation of thyroid nodules from ultrasound images. The approach combines EfficientNet-B7 and Vision Transformer in a hybrid encoder so that local details and global contextual information are extracted simultaneously. In the decoder, the UNet3Plus architecture is used to integrate multi-scale features, making the model sensitive to thyroid nodules of different sizes and shapes. To further improve feature extraction, an Atrous Spatial Pyramid Pooling module is applied to the hybrid encoder outputs, and the resulting features are combined using dedicated fusion mechanisms. To optimize segmentation performance, a new combination of the Lovász, Focal Tversky, and HD95 loss functions was tested. Experiments were conducted on the TN3K dataset, which consists of 3,493 thyroid ultrasound images and their corresponding masks. On the test set, the model achieved a Dice score of 0.8737 and an IoU of 0.7774, an improvement over existing methods.
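The abstract reports Dice and IoU scores and names Focal Tversky as one term of the training loss. As a rough illustration of what these quantities measure, the NumPy sketch below computes Dice and IoU for a pair of binary masks and a Focal Tversky loss for a probability map. The function names, the values of alpha, beta, and gamma, and the exponent convention are assumptions chosen for illustration and are not taken from the study; the Lovász and HD95 terms are omitted.

```python
import numpy as np


def dice_iou(pred, target, eps=1e-7):
    """Return (Dice, IoU) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # eps keeps both ratios defined when the masks are empty.
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return float(dice), float(iou)


def focal_tversky_loss(prob, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Focal Tversky loss for a foreground probability map.

    alpha weights false negatives and beta weights false positives.
    The hyperparameters and the (1 - TI) ** gamma convention are
    illustrative assumptions, not values reported in the study.
    """
    prob = np.asarray(prob, dtype=float)
    target = np.asarray(target, dtype=float)
    tp = (prob * target).sum()            # soft true positives
    fn = ((1.0 - prob) * target).sum()    # soft false negatives
    fp = (prob * (1.0 - target)).sum()    # soft false positives
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return float((1.0 - tversky) ** gamma)


if __name__ == "__main__":
    pred_mask = [[1, 1], [0, 0]]
    true_mask = [[1, 0], [0, 0]]
    print(dice_iou(pred_mask, true_mask))
    print(focal_tversky_loss([[0.9, 0.4], [0.1, 0.0]], true_mask))
```

For a single image the two metrics are tied by Dice = 2·IoU / (1 + IoU); scores averaged over a test set need not satisfy this relation exactly.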