Title Kernel density estimator by minimizing bias
Authors Ruzgas, Tomas ; Pupalaigė, Kristina
DOI 10.15388/DAMSS.14.2023
ISBN 9786090709856
Is Part of DAMSS 2023: 14th conference on data analysis methods for software systems, November 30 – December 2, 2023, Druskininkai, Lithuania. Vilnius : Vilnius University press, 2023. p. 82-83. ISBN 9786090709856
Abstract [eng] The histogram is one of the oldest and most popular density estimators. The histogram and its representation were first introduced in 1891 by Karl Pearson. To approximate the density, the number of observations falling within a bin is counted and divided by the sample size and the volume of the bin. The histogram is a step function, so its derivatives are either zero or undefined, which strongly affects any further analysis. For example, this can cause problems when maximizing a likelihood function that is defined in terms of the densities of the distributions. The histogram remained the only nonparametric density estimator until the 1950s, when substantial and simultaneous progress was made in density and spectral density estimation. In 1951, Fix and Hodges, in a little-known technical report, presented the basic algorithm of nonparametric density estimation. This previously unpublished report was formally presented to the public only in 1989, in a review by Silverman and Jones. Their work focused on the problem of statistical discrimination when the parametric form of the sampling density is not known. Several common algorithms and alternative theoretical approaches were then introduced by Rosenblatt in 1956, Parzen in 1962, and Cencov in 1962. A second wave of important, primarily theoretical papers followed, by Watson and Leadbetter in 1963, Loftsgaarden and Quesenberry in 1965, Schwartz in 1967, Epanechnikov in 1969, Tarter and Kronmal in 1970, and Kimeldorf and Wahba in 1971. The natural multivariate generalization was introduced by Cacoullos in 1966. Finally, in the 1970s, the first papers focusing on practical applications of these methods were published by Scott et al. in 1978 and Silverman in 1978. These and later multivariate applications awaited the computing revolution.
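The histogram estimator described at the start of the abstract (bin count divided by sample size and bin volume) can be sketched in one dimension as follows. This is only a minimal illustration: the function name `histogram_density`, the bin count, and the simulated normal sample are assumptions made for the example and do not come from the paper.

```python
import numpy as np

def histogram_density(sample, bins=10):
    """Histogram density estimate: bin count / (n * bin width)."""
    sample = np.asarray(sample, dtype=float)
    # Count the observations that fall into each bin.
    counts, edges = np.histogram(sample, bins=bins)
    # In one dimension the "volume" of a bin is its width.
    widths = np.diff(edges)
    density = counts / (sample.size * widths)
    return density, edges

# Hypothetical data: 200 draws from a standard normal distribution.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
density, edges = histogram_density(x, bins=12)

# The resulting step function integrates to one over the binned range.
total_mass = float(np.sum(density * np.diff(edges)))
```

Because the estimate is constant within each bin, its derivative is zero inside a bin and undefined at the bin edges, which is the limitation the abstract points out.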
Because the kernel estimate is calculated at the sample points, a bias arises, and its effect is most pronounced for small samples. Various techniques are used to reduce its influence. Our approach is slightly different: we construct a kernel function whose form reduces the influence of the observations themselves on the estimate and places the main emphasis on their neighborhood. The form of the proposed kernel function is complex, which in turn raises other challenges.
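One way to see how an observation influences the estimate at its own location is to compare a standard kernel estimate with its leave-one-out version. The sketch below uses a Gaussian kernel and a fixed bandwidth purely as assumptions. It is not the kernel proposed by the authors, whose form is not given in the abstract, and the names `gaussian_kernel` and `kde_at_sample_points` are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumption, not the authors' kernel)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde_at_sample_points(sample, h):
    """Return the full and leave-one-out kernel estimates at each sample point."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    # Pairwise scaled differences (x_i - x_j) / h.
    u = (x[:, None] - x[None, :]) / h
    k = gaussian_kernel(u)
    # Full estimate: every observation, including x_i itself, contributes.
    full = k.sum(axis=1) / (n * h)
    # Leave-one-out estimate: drop the self-contribution K(0) on the diagonal.
    loo = (k.sum(axis=1) - np.diag(k)) / ((n - 1) * h)
    return full, loo

# Hypothetical small sample, where the self-contribution matters most.
rng = np.random.default_rng(1)
x = rng.normal(size=20)
h = 0.5  # bandwidth chosen by hand for illustration
full, loo = kde_at_sample_points(x, h)

# Each observation adds K(0) / (n * h) to the estimate at its own location.
self_term = gaussian_kernel(0.0) / (x.size * h)
```

The term `self_term` shrinks like 1/n for a fixed bandwidth, which is consistent with the abstract's remark that the effect is felt most in small samples.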
Published Vilnius : Vilnius University press, 2023
Type Conference paper
Language English
Publication date 2023