Abstract [eng] |
This master's thesis addresses the problem of classifying images of concrete as intact or cracked using different artificial neural network architectures, and then segmenting the cracks in several ways. The methodological part describes how a convolutional neural network operates and how it is trained, explaining the importance of stochastic gradient descent and the Adam optimizer. The convolutional neural network architectures used in the study are then reviewed, with special attention paid to the U-Net architecture, which is used for semantic crack segmentation. For the classification task, the EfficientNetB0, EfficientNetB1, EfficientNetB2, and EfficientNetB3 models were examined, each trained with and without image augmentation. AlexNet and VGG-16 models trained without image augmentation were then tested to find the best classifier. All models were trained on the "Surface Crack Detection" image set, which consists of 40000 images of concrete with different chemical properties and shades, half showing a crack and half showing intact concrete. After training, each model was tested on 1000 high-resolution images of cracked concrete. Accuracy on the test sample was very similar across models, so the best model was selected by calculating the proportion of high-resolution images that each model classified correctly. The EfficientNetB3 model trained without image augmentation performed best on this task, correctly classifying 952 of the 1000 images. In the next step, a rough image segmentation was performed: a 1024 × 1024 image is divided into 64 tiles of 128 × 128 pixels, an already trained classification model estimates the probability of a crack in each tile, and the result is displayed as a heat map. 
The results of this task were compared between the best classification model, EfficientNetB3 trained without image augmentation, and an ensemble consisting of EfficientNetB3 and EfficientNetB2, both trained without image augmentation, and EfficientNetB1 trained with image augmentation. The ensemble did not provide significantly better results, so the EfficientNetB3 model trained without image augmentation is proposed for rough classification. Subsequently, U-Net variants with different backbone models were tested for semantic crack segmentation: EfficientNetB0, EfficientNetB1, EfficientNetB2, EfficientNetB3, EfficientNetB4, and EfficientNetB5. The U-Net architecture with the EfficientNetB0 backbone performed best. |
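
The rough segmentation step described in the abstract, in which a 1024 × 1024 image is split into 64 tiles of 128 × 128 pixels and each tile is scored for the probability of a crack, can be illustrated with a minimal Python sketch. It is not the thesis implementation. The names `split_into_tiles`, `crack_heatmap`, and `dark_pixel_fraction` are hypothetical, the image is assumed to be a grayscale list of pixel rows with values in [0, 1], and a trained classifier is represented by any function that maps a tile to a probability. The abstract does not state how the ensemble combines its members, so averaging the predicted probabilities is an assumption made here.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]              # grayscale image as rows of pixel values (assumed format)
Classifier = Callable[[Image], float]  # maps a tile to a crack probability in [0, 1]


def split_into_tiles(image: Image, tile: int = 128) -> List[List[Image]]:
    """Cut the image into a grid of non-overlapping tile x tile patches."""
    height, width = len(image), len(image[0])
    if height % tile or width % tile:
        raise ValueError("image dimensions must be multiples of the tile size")
    return [
        [[row[x:x + tile] for row in image[y:y + tile]]
         for x in range(0, width, tile)]
        for y in range(0, height, tile)
    ]


def crack_heatmap(image: Image,
                  classifiers: Sequence[Classifier],
                  tile: int = 128) -> List[List[float]]:
    """Return one crack probability per tile.

    With several classifiers the probabilities are averaged, which is an
    assumed ensemble rule (the abstract does not specify one).
    """
    if not classifiers:
        raise ValueError("at least one classifier is required")
    grid = split_into_tiles(image, tile)
    return [
        [sum(clf(patch) for clf in classifiers) / len(classifiers)
         for patch in grid_row]
        for grid_row in grid
    ]


def dark_pixel_fraction(patch: Image) -> float:
    """Hypothetical stand-in for a trained model: share of dark pixels in the tile."""
    pixels = [value for row in patch for value in row]
    return sum(1 for value in pixels if value < 0.5) / len(pixels)


if __name__ == "__main__":
    # Bright 1024 x 1024 image with a dark vertical line inside the top-left tile.
    image = [[1.0] * 1024 for _ in range(1024)]
    for y in range(128):
        image[y][10] = 0.0
    heat = crack_heatmap(image, [dark_pixel_fraction])
    print(len(heat), len(heat[0]))  # grid shape of the heat map
```

In practice each classifier would wrap a trained network's prediction call, and the resulting 8 × 8 grid of probabilities would be rendered as a heat map over the original image.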