Interpretable learning for detection of cognitive distortions from natural language texts
Abstract
We developed a technology that, given a dataset annotated for cognitive distortions, builds an interpretable model capable of detecting cognitive distortions in natural language texts. The novelty of the approach is that both learning and detection rely on structural patterns such as N-grams, incorporating heterarchical relationships between them through the “priority on order” principle. We investigated and released two types of detection models: a plain binary classifier and a model based on a multi-class representation. We optimized the models' hyper-parameters and achieved an accuracy of 0.92 and an F1 score of 0.95 in a cross-validation experiment. We also achieved more than 1000 times faster processing and lower computational cost than LLM-based alternatives.
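To make the pattern-based detection idea concrete, here is a minimal sketch of matching N-gram patterns against a text while resolving overlaps by a priority rule, where higher-order (longer) patterns claim their token spans first. The pattern list, labels, and the exact tie-breaking logic are illustrative assumptions, not the paper's actual model.

```python
# Illustrative sketch (not the paper's implementation): N-gram pattern
# matching with a simple "priority on order" rule -- patterns with a
# higher priority (e.g. longer N-grams) claim token spans before
# lower-priority ones can match them.

# (priority, n-gram tuple, label) -- patterns and labels are made up
# for demonstration purposes only.
PATTERNS = [
    (2, ("never", "works"), "overgeneralization"),
    (2, ("should", "have"), "should-statement"),
    (1, ("never",), "overgeneralization"),
    (1, ("always",), "overgeneralization"),
]

def detect(text):
    """Return (position, matched phrase, label) for each pattern hit."""
    tokens = text.lower().split()
    hits = []
    covered = set()  # token indices already claimed by a pattern
    # Scan patterns in descending priority so higher-order N-grams win.
    for priority, ngram, label in sorted(PATTERNS, reverse=True):
        n = len(ngram)
        for i in range(len(tokens) - n + 1):
            span = set(range(i, i + n))
            if tuple(tokens[i:i + n]) == ngram and not covered & span:
                covered |= span
                hits.append((i, " ".join(ngram), label))
    return sorted(hits)

print(detect("This never works and I should have known"))
```

With this priority rule, the unigram pattern `("never",)` does not fire on the example sentence, because the higher-priority bigram `("never", "works")` has already claimed that span.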