Abstract [eng]
Most algorithms work properly only if the probability densities of the multivariate vectors involved are known. In practice these densities are usually not available, so parametric or nonparametric estimation of the densities becomes critically important. In parametric estimation one assumes that the density f underlying the data yi, i = 1, …, n, belongs to some rather restricted family of functions f(•;θ) indexed by a small number of parameters θ = (θ1, θ2, …, θk). An example is the family of multivariate normal densities, which is parameterized by the mean vector and the covariance matrix. A parametric density estimate is obtained by computing from the data an estimate θ0 of θ and setting f0 = f(•;θ0). Such an approach is statistically and computationally very efficient, but it can lead to poor results if no member of the family f(•;θ) is close to f. In nonparametric density estimation no parametric assumptions about f are made; instead one assumes, for example, that f has some smoothness properties (e.g. two continuous derivatives) or that it is square integrable. The shape of the density estimate is determined by the data and, in principle, given enough data, arbitrary densities f can be estimated accurately. The most popular method is the kernel estimator, which is based on local smoothing of the data; histospline, semiparametric and projection pursuit algorithms are also widely used. When constructing probability density estimation methods, the most difficult task is to find the optimal parameters: for the kernel algorithm it is the smoothing parameter, for the histospline method it is the choice of the points at which the density is estimated, and so on. Although many papers on this subject offer various methods for parameter determination, those procedures are not well suited to small samples.
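The kernel estimator and its smoothing-parameter problem can be sketched as follows. This is a minimal illustrative example, not the method studied in the thesis: the function name, the restriction to one-dimensional data, and the use of Silverman's rule of thumb as a default bandwidth are all assumptions made here for illustration.

```python
import numpy as np

def gauss_kde(x, sample, h=None):
    """Gaussian kernel density estimate at points x from a 1-D sample.

    If no smoothing parameter h is supplied, Silverman's rule of thumb
    is used -- one common (but not universally suitable) answer to the
    bandwidth-selection problem discussed in the text.
    """
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    if h is None:
        h = 1.06 * sample.std(ddof=1) * n ** (-1 / 5)  # Silverman's rule
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Average of Gaussian kernels centred at each observation
    u = (x[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))
```

For small samples the rule-of-thumb bandwidth can oversmooth or undersmooth badly, which is exactly the difficulty the abstract points to.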
In this case it is effective to use data projections, because the task of choosing the best-fitting parameters becomes more complicated as the dimension of the data grows. One way to improve the accuracy of probability density estimation is to treat a multimodal density as a mixture of unimodal ones. In this paper we propose to cluster the data first and to estimate the density in each cluster separately. To compare the performance objectively, Monte Carlo approximation is used for ten types of Gaussian mixtures. Each method for evaluating the accuracy of the probability density estimates was applied both to clustered and to unclustered data, so the paper also assesses the usefulness of clustering as a preprocessing step.
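The cluster-then-estimate idea can be sketched as follows; this is a simplified 1-D illustration under stated assumptions, not the thesis's actual procedure: a tiny k-means stands in for the clustering step, a Gaussian kernel estimator with Silverman's rule-of-thumb bandwidth stands in for the per-cluster density estimate, and the cluster densities are recombined as a mixture weighted by cluster size.

```python
import numpy as np

def gauss_kde(x, sample):
    """1-D Gaussian KDE with Silverman's rule-of-thumb bandwidth."""
    n = sample.size
    h = 1.06 * sample.std(ddof=1) * n ** (-1 / 5)
    u = (x[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))

def clustered_kde(x, data, k=2, iters=50, seed=0):
    """Cluster the data, estimate a density in each cluster separately,
    and combine the pieces as a mixture weighted by cluster size."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    centers = rng.choice(data, size=k, replace=False)
    for _ in range(iters):  # plain Lloyd iterations on 1-D data
        labels = np.argmin(np.abs(data[:, None] - centers[None, :]), axis=1)
        centers = np.array([data[labels == j].mean() if np.any(labels == j)
                            else centers[j] for j in range(k)])
    dens = np.zeros_like(np.atleast_1d(x), dtype=float)
    for j in range(k):
        part = data[labels == j]
        dens += (part.size / data.size) * gauss_kde(x, part)
    return dens
```

Estimating per cluster lets each unimodal piece get its own bandwidth, which is the intended advantage over smoothing a multimodal sample with a single global parameter.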