Title Gilusis skatinamasis mokymasis kompiuterinių žaidimų agentų valdymui
Translation of Title Deep reinforcement learning for computer games agents’ control.
Authors Šlikas, Žygimantas
Pages 62
Keywords [eng] reinforcement learning ; neural networks ; VAE ; computer games
Abstract [eng] To apply artificial intelligence to a wider range of problems, it must be able to explore its environment independently, learn the optimal actions in specific situations, and plan several steps ahead. Reinforcement learning algorithms are used to solve such tasks and, more recently, neural network-based variants of them. One of their main issues is a high demand for training data. In this study, we conducted experiments with modifications of the popular PPO algorithm, aiming either to reduce the amount of data required or to achieve better results with the same amount of data. The experiments were carried out in three computer game environments with different characteristics: “Ms. Pacman”, “VizDoom Health Gathering”, and “VizDoom Defend the Center”. We proposed two separate modifications: a distributed policy function and VAE-based image compression. The distributed policy function is based on ideas from ensemble machine learning methods such as random forests. During training, the branches of the distributed policy function acquire different random weights, allowing the agent to learn a more diverse overall strategy. In one of the games, the best variation achieved an average of 1,518 points per episode, while the unmodified algorithm scored 1,377, an improvement of ~10%. All algorithm variations suffered from training instabilities and converged to either better or worse results, but the modified algorithm showed a much higher variance in results. In the best trial, our modified algorithm scored 2,046 points, while the unmodified one scored 1,666, which corresponds to ~23% better performance. In the other modification, we explored how a VAE, applied to image compression, could serve as a potentially faster way to train the neural network. This was intended to help us use the data more efficiently by enabling repeated training on it. However, after various integration experiments, it turned out that training the VAE and PPO simultaneously was quite unstable and did not yield better results than the original algorithm.
Dissertation Institution Kauno technologijos universitetas.
Type Master's thesis
Language Lithuanian
Publication date 2025