Title Generatyvinės duomenų augmentacijos tyrimas dviejų plokštumų monokuliniam nelambertinių paviršių gylio žemėlapių sudarymui
Translation of Title Generative data augmentation research for dual-plane monocular depth estimation of non-Lambertian surfaces.
Authors Bankauskas, Vilius
Pages 72
Keywords [eng] monocular depth estimation ; non-Lambertian surfaces ; data augmentation
Abstract [eng] This project examines one of the most relevant and rapidly evolving fields of modern computer vision, Monocular Depth Estimation (MDE), which is critically important for the development of autonomous systems, robotics, and augmented reality technologies. The primary scientific and engineering problem addressed in this work stems from fundamental physical limitations: currently popular depth estimation algorithms and physical sensors, such as LiDAR or Time-of-Flight cameras, often fail to correctly identify the geometry of transparent, semi-transparent, or reflective surfaces. Because these non-Lambertian objects do not reflect light uniformly, they distort sensor measurements, causing standard neural networks to misjudge their distance from the camera or to treat them as empty space. The research object of this project encompasses monocular depth estimation model architectures, their training processes, and the accuracy of the resulting depth maps, specifically on non-Lambertian surfaces. The aim of the work is to improve the ability of depth estimation models to reliably estimate the depth of transparent objects by employing high-quality synthetic training data created by generative artificial intelligence. To achieve this aim, several key objectives were carried out: a comprehensive analysis of the scientific literature on modern MDE architectures was performed, a data synthesis methodology was developed using the multimodal model Google Gemini (which allowed for the simulation of transparent object images along with their geometric properties in the background), and the selected base model, DepthAnythingV2, was adapted and fine-tuned. The research applied systematic literature review, image processing, neural network training, and comparative analysis.
To evaluate the model improvements, the DIODE dataset and quantitative metrics such as RMSE, AbsRel, and the threshold accuracy delta were used. The results obtained during the implementation of the project showed that integrating synthetic data allows MDE algorithms to distinguish transparent objects from the background geometry significantly better, which directly correlates with reduced prediction errors and increased visual consistency of depth maps in complex scenes. Finally, the conclusions state that generative artificial intelligence is an effective tool for filling data gaps, which opens new possibilities for more accurate environmental perception. The work consists of an introduction, a list of abbreviations and terms, a literature review, a methodological part, an implementation and experimentation section, conclusions, and a list of references.
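The abstract names the evaluation metrics but does not define them. As a minimal sketch, assuming the definitions commonly used in the depth estimation literature (not confirmed to be the thesis's exact implementation), RMSE, AbsRel, and the delta accuracy could be computed over flattened depth values as follows. The function name `depth_metrics`, the 1.25 threshold, and the rule of skipping non-positive depths are assumptions made for illustration.

```python
import math


def depth_metrics(pred, gt, threshold=1.25):
    """Return RMSE, AbsRel and delta accuracy for paired depth values.

    pred, gt: equal-length sequences of predicted and ground-truth depths.
    Pixels with non-positive depth are skipped (assumption: they mark
    missing measurements, and the ratio-based metrics need positive values).
    threshold: delta cut-off; 1.25 is the conventional choice (assumption).
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid depth pairs to evaluate")
    n = len(pairs)

    # Root mean squared error, in the same unit as the depth values.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)

    # Mean absolute error relative to the ground-truth depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n

    # Fraction of pixels whose prediction is within a factor of
    # `threshold` of the ground truth, in either direction.
    delta = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / n

    return {"rmse": rmse, "abs_rel": abs_rel, "delta": delta}


if __name__ == "__main__":
    # Toy values only; the last pair is skipped because gt is 0.
    print(depth_metrics([1.0, 2.0, 4.0, 3.0], [1.0, 2.0, 2.0, 0.0]))
```

Lower RMSE and AbsRel indicate smaller errors, while a higher delta indicates that more pixels fall within the tolerance band, which is the direction of improvement the abstract reports.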
Dissertation Institution Kauno technologijos universitetas.
Type Master thesis
Language Lithuanian
Publication date 2026