Abstract [eng]
Sound source localization methods are used in most acoustic input devices to estimate the direction and distance of a sound, to amplify some sounds, and to attenuate others. Increasingly accurate localization of the audio source is needed in automation technologies: remote conferencing, smart home management, security and surveillance systems, and more. Determining the direction of arrival of sound with a closely spaced microphone array is the most studied area. However, advances in distributed systems make it possible to build wireless acoustic sensor networks, which have the advantage of greater distances between microphones. In this work, it is proposed to use a wireless acoustic sensor network to determine the coordinates of an audio source in 3D space. When the microphones of such a system can move, not only the audio recordings but also the coordinates of the acoustic sensors must be used. This is not difficult for slow-running localization algorithms, but the machine learning model architectures developed so far are not suited to it. The aim of this work is therefore to estimate the location of a sound source using a dynamic system of acoustic sensors. A review of the scientific literature distinguishes localization methods and applications of various neural network models to sound processing tasks. After selecting an appropriate structure for the wireless acoustic sensor system, an audio input system is implemented for data collection under real conditions. The location of the audio source is estimated using geometric localization and grid search algorithms. It is then investigated which machine learning techniques are most appropriate: the suitability of convolutional, recurrent, residual, and attention-based layers for determining sound source coordinates is examined. The successful application of modified, commonly used neural networks to localization would indicate the suitability of the respective neural layers for sound analysis and object coordinate estimation. The developed wireless acoustic sensor network can record and send data in real time to a central controller, where localization is performed. However, static noise in the audio recordings and imprecise time synchronization between the system nodes may not be accurate enough for precise localization of the audio source. The study found that sound spectrograms and convolutional recurrent neural networks are best suited for localization because they suppress noise in the signal and reduce the amount of redundant information. As a result, the model is faster and more accurate than the arithmetic algorithms.