Assessment of greenhouse gas emissions using hotspot analysis and machine learning methods

Mantas Ginkus

Title	Assessment of greenhouse gas emissions using hotspot analysis and machine learning methods
Translation of Title	Šiltnamio efektą sukeliančių dujų emisijų vertinimas taikant karštųjų taškų analizę ir mašininio mokymosi metodus.
Authors	Ginkus, Mantas
Pages	59
Keywords [eng]	greenhouse gas emissions ; spatial analysis ; panel data ; Gaussian process boosting ; machine learning
Abstract [eng]	Greenhouse gas (GHG) emissions vary across countries due to differences in economic activity, energy use, technology, and other persistent country-specific factors. Empirical research on these differences typically relies on country-level panel data. While traditional panel-data regressions are widely used to control for unobserved heterogeneity, they impose restrictive functional forms. More flexible machine learning methods have also been applied to emissions data, but often in pooled settings that do not explicitly account for persistent country-level heterogeneity. At the same time, the emissions literature documents spatial clustering across countries, motivating the analysis of spatial dependence both in the raw data and in model residuals. This thesis analyses cross-country differences in production-based per-capita GHG emissions in Europe using a combination of exploratory spatial analysis and machine-learning methods for panel data. The analysis is based on a balanced panel of 25 European countries observed annually from 2000 to 2023. It addresses two main questions: whether GHG emissions exhibit spatial clustering, and how explicitly modelling persistent country-specific heterogeneity affects predictive performance, residual behaviour, and model interpretation in flexible machine-learning models. Spatial patterns in emissions are analysed using local hot spot analysis and spatio-temporal clustering methods, which identify statistically significant clusters of high and low per-capita emissions and characterise their evolution over time. Predictive modelling is conducted using Gaussian Process Boosting (GPBoost), estimated both in pooled form and with explicit country-specific intercepts and time trends. Models are trained and evaluated using time-respecting expanding-window cross-validation. 
Beyond average predictive accuracy, residuals are analysed across countries, time, and space to assess whether models capture persistent heterogeneity and spatial structure. Model interpretation is examined using SHAP values, with feature attributions compared across model specifications. The results show that European GHG emissions exhibit spatial clustering that is persistent in some regions and evolving in others. Explicitly accounting for country-level heterogeneity in GPBoost leads to only modest improvements in predictive accuracy, but clear improvements in residual behaviour, including reduced systematic country-level bias, lower within-country residual dispersion, and the removal of spatial structure in prediction errors. Differences in SHAP-based feature attributions demonstrate that the estimated contribution of observed variables depends on whether persistent country-specific heterogeneity is modelled, even when overall predictive performance changes little. These results indicate that unobserved heterogeneity should be explicitly considered when applying machine-learning methods to panel data, particularly for interpretation.
Dissertation Institution	Kauno technologijos universitetas.
Type	Master thesis
Language	English
Publication date	2026
