In industrial acoustic environments, abnormal sound detection is a vital technique for monitoring machine health and ensuring operational safety. However, real-world industrial settings are often filled with unpredictable background interference such as human speech, footsteps, and environmental noise, which significantly reduces detection accuracy. Existing methods frequently rely on large-scale models or assume noise-free laboratory conditions, which limits their practical deployment.
In this paper, we propose IRASD-GST, a lightweight and interference-resilient sound anomaly detection model based on semantic embedding trees. By constructing a hierarchical semantic tree using text embeddings and large language models, we obtain a semantically meaningful vector space for audio tags. Combined with CNN–BiLSTM-based feature extraction and event masking, the model filters out irrelevant sounds and identifies true equipment anomalies.
In experiments under noisy conditions, the model achieves 96% precision and 97% recall, with fewer than 7 million parameters and an inference time of 0.034 seconds. These properties make the proposed model well suited to edge deployment in real-time industrial monitoring systems.