Reinforcement Learning for a Trustworthy Recommendation System and Dynamic Pricing

Žydrūnas Bautronis

Title	Skatinamojo mokymosi pritaikymas pasitikėjimo vertai rekomendacinei sistemai ir dinaminei kainodarai
Translation of Title	Reinforcement learning for a trustworthy recommendation system and dynamic pricing
Authors	Bautronis, Žydrūnas
Pages	70
Keywords [eng]	reinforcement learning ; trustworthy AI ; recommender systems ; dynamic pricing
Abstract [eng]	Over the past decade, Artificial Intelligence (AI) in e-commerce has increasingly relied on Reinforcement Learning (RL), particularly to optimize dynamic pricing and personalized recommendations. Yet real-world deployments face critical challenges in model transparency, fairness, and decision stability, above all when algorithmic choices directly influence users’ experiences and business outcomes. This study presents a structured framework that explicitly embeds trustworthy AI principles into RL-based pricing systems, evaluating not only performance but also interpretability and socio-economic fairness. Using real transactional data, we built a bespoke RL environment that simulates dynamic pricing, consumer demand, inventory levels, and product categories. Several RL algorithms (DQN, PPO, A2C) were compared in terms of decision-making behavior, learning stability, and compliance with trustworthiness criteria. To enhance explainability, we conducted trajectory audits and SHapley Additive exPlanations (SHAP) analyses, tracing why RL agents chose specific prices and how those choices aligned with market logic and ethical constraints. Results show that while some models focused their recommendations on short-term reward maximization, others, most notably the DQN agent, adopted more adaptive, risk-aware strategies that balanced profitability with inventory management and demand elasticity. The proposed DQN approach achieved a 12.58% profit increase over the baseline pricing strategy (raising total profit from €4.18 million to €4.71 million) while recording zero unethical price hikes under low demand. SHAP analysis revealed that stock levels, demand shifts, and product elasticity had the largest positive impact on actions, whereas inventory hoarding and price increases in unfavorable conditions contributed negatively.
By combining conventional RL performance metrics with explainability and fairness assessments, this research demonstrates that integrating trustworthy-AI principles into agent design yields responsible, business-aligned outcomes. The proposed methodology offers a practical blueprint for safely deploying reliable RL agents not only in dynamic pricing but also across other e-commerce domains where ethical and regulatory compliance is paramount.
Dissertation Institution	Kauno technologijos universitetas.
Type	Master's thesis
Language	Lithuanian
Publication date	2025
