Abstract
As a complex ecosystem of flora and fauna, forests have always been vulnerable to threats. Previous researchers relied on general environmental audio collections, such as the ESC-50 and UrbanSound8K datasets, as approximate stand-ins for the sounds potentially present in forests. This study focuses on the application of deep learning models to forest sound classification as a step toward an early threat detection system. It evaluates the performance of several pre-trained deep learning models, including MobileNet, GoogleNet, and ResNet, on the limited FSC22 dataset, which consists of 2,025 forest sound recordings divided into 27 categories. To improve classification, the study introduces a hybrid model that combines a convolutional neural network (CNN) with a Bidirectional Long Short-Term Memory (BiLSTM) layer, designed to capture both the spatial and the temporal features of the sound data. The study also employs Pareto-Mordukhovich-optimized Mel Frequency Cepstral Coefficients (MFCC) for feature extraction, improving the representation of the audio signals, and explores data augmentation and dimensionality reduction to assess their impact on model performance. The results indicate that the proposed hybrid CNN-BiLSTM model significantly improved loss and accuracy compared to the standalone pre-trained models: GoogleNet with an added BiLSTM layer, trained on augmented data, achieved an average loss of 0.7209 and an average accuracy of 0.7852, demonstrating its potential for classifying forest sounds. These improvements highlight the promise of hybrid models for environmental sound analysis, particularly in scenarios with limited data availability.
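As a concrete illustration of the feature-extraction step, the sketch below computes MFCCs from a recording with librosa. It is a minimal example under assumed defaults: the number of coefficients, frame size, and hop length shown are illustrative choices, not the Pareto-Mordukhovich-optimized values referred to in the abstract.

```python
import librosa
import numpy as np

def extract_mfcc(path, sr=22050, n_mfcc=40, n_fft=2048, hop_length=512):
    """Load one forest sound recording and compute its MFCC matrix.

    The parameter values (n_mfcc, n_fft, hop_length) are illustrative
    defaults, not the Pareto-Mordukhovich-optimized settings of the study.
    """
    signal, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc,
                                n_fft=n_fft, hop_length=hop_length)
    # Normalize each coefficient over time so features are comparable
    # across clips of different loudness.
    mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) / \
           (mfcc.std(axis=1, keepdims=True) + 1e-8)
    return mfcc  # shape: (n_mfcc, n_frames)
```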
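The abstract mentions data augmentation without specifying the transforms. The following sketch shows three waveform-level augmentations commonly used for environmental audio (additive noise, time stretching, pitch shifting) as one plausible pipeline; the study's actual augmentation scheme may differ.

```python
import numpy as np
import librosa

def augment(signal, sr):
    """Return augmented copies of a waveform.

    These three transforms are common illustrative choices for
    environmental audio, not the study's confirmed pipeline.
    """
    variants = []
    variants.append(signal + 0.005 * np.random.randn(len(signal)))       # additive noise
    variants.append(librosa.effects.time_stretch(signal, rate=1.1))      # tempo change
    variants.append(librosa.effects.pitch_shift(signal, sr=sr, n_steps=2))  # pitch shift
    return variants
```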
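To make the hybrid architecture concrete, here is a minimal Keras sketch of a pre-trained CNN backbone followed by a BiLSTM head that reads the backbone's feature map along its time (width) axis. InceptionV3 stands in for GoogleNet, and the input size, LSTM width, dropout rate, and dense head are assumptions for illustration, not the study's exact configuration; the input is assumed to be the MFCC matrix rendered as a fixed-size 3-channel image.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_bilstm(input_shape=(224, 224, 3), num_classes=27, lstm_units=128):
    # Pre-trained backbone extracts spatial features from the MFCC "image".
    # InceptionV3 is used here as a stand-in for GoogleNet.
    backbone = tf.keras.applications.InceptionV3(
        include_top=False, weights="imagenet", input_shape=input_shape)

    inputs = layers.Input(shape=input_shape)
    x = backbone(inputs)                              # (batch, H', W', C)
    x = layers.Permute((2, 1, 3))(x)                  # put the width (time) axis first
    x = layers.Reshape((x.shape[1], x.shape[2] * x.shape[3]))(x)  # (batch, T, features)
    x = layers.Bidirectional(layers.LSTM(lstm_units))(x)  # temporal modelling
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)  # 27 FSC22 classes

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn_bilstm()
model.summary()
```

The design choice this sketch reflects is the one stated in the abstract: the CNN captures spatial structure in the time-frequency representation, while the BiLSTM layer models the temporal ordering of those features in both directions.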