| Title |
Efficient transformer-based road scene segmentation approach with attention-guided decoding for memory-constrained systems |
| Authors |
Lisauskas, Bartas ; Maskeliunas, Rytis |
| DOI |
10.3390/machines13060466 |
| Is Part of |
Machines. Basel : MDPI, 2025, vol. 13, iss. 6, art. no. 466, p. 1-21. ISSN 2075-1702
| Keywords [eng] |
computer vision ; deep learning ; image processing ; neural networks ; semantic segmentation |
| Abstract [eng] |
Accurate object detection and an understanding of the surroundings are key requirements when applying computer vision systems in the automotive and robotics industries, for example in autonomous vehicles or self-driving robots. A precise understanding of road users and obstacles is essential to avoid potential accidents. Because of the large number of objects and the diversity of the environment, road scene segmentation remains a challenging task. In our approach, a Transformer-based backbone is employed for robust feature extraction in the encoder module. In addition, we have developed a custom decoder module that implements attention-based fusion mechanisms to combine features effectively. The decoder modification is specifically designed to preserve fine spatial details and enhance global context understanding, setting our method apart from conventional approaches that typically use simple projection layers or standard query-based decoders. The implemented model consists of 17.2 million parameters and achieves competitive performance, with a mean intersection over union (mIoU) of 76.41% on the Cityscapes validation set. The results indicate that the model captures both the global context and the fine spatial details that are critical for accurate segmentation of urban scenes. Furthermore, the lightweight design makes the approach suitable for deployment on memory-limited devices.
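| Illustrative sketch [not part of the record] |
The abstract does not include implementation details. As an illustration only, the sketch below shows one way an attention-guided fusion step in a decoder might combine a low-resolution, context-rich encoder feature map with a higher-resolution one, as described in the abstract. The framework (PyTorch), module names, channel sizes, and gating scheme are all assumptions for demonstration and are not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): attention-guided fusion of two
# encoder feature maps inside a lightweight decoder, assuming PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGuidedFusion(nn.Module):
    """Fuses a deep (low-resolution) and a shallow (high-resolution) feature map.

    The deep features are upsampled and used to compute a spatial attention
    gate that re-weights the shallow features before merging, so global
    context guides where fine spatial detail is kept.
    """
    def __init__(self, deep_ch: int, shallow_ch: int, out_ch: int):
        super().__init__()
        self.deep_proj = nn.Conv2d(deep_ch, out_ch, kernel_size=1)
        self.shallow_proj = nn.Conv2d(shallow_ch, out_ch, kernel_size=1)
        # 1x1 conv + sigmoid produces a per-pixel attention gate from context.
        self.attn = nn.Sequential(nn.Conv2d(out_ch, out_ch, kernel_size=1),
                                  nn.Sigmoid())
        self.fuse = nn.Sequential(nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
                                  nn.BatchNorm2d(out_ch),
                                  nn.ReLU(inplace=True))

    def forward(self, deep: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
        # Upsample the global-context features to the shallow resolution.
        deep = F.interpolate(self.deep_proj(deep), size=shallow.shape[-2:],
                             mode="bilinear", align_corners=False)
        shallow = self.shallow_proj(shallow)
        gate = self.attn(deep)          # attention map derived from context
        return self.fuse(deep + gate * shallow)

# Example: fuse an assumed 1/16-scale transformer stage with a 1/4-scale stage.
if __name__ == "__main__":
    deep = torch.randn(1, 256, 32, 64)      # deep stage (assumed shape)
    shallow = torch.randn(1, 64, 128, 256)  # shallow stage (assumed shape)
    fused = AttentionGuidedFusion(256, 64, 128)(deep, shallow)
    print(fused.shape)  # torch.Size([1, 128, 128, 256])
```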
| Published |
Basel : MDPI |
| Type |
Journal article |
| Language |
English |
| Publication date |
2025 |