Abstract [eng] |
Generating de novo drug-like molecules can lead to major societal and technological advances. Molecule generation is relevant to problems in many fields, from individual treatment to energy production and storage. Its importance for drug discovery is direct: finding new drugs requires searching for new molecules, and deep neural networks can accelerate that search. This study uses a neural network architecture called the variational autoencoder: an encoder compresses the representations of molecules into latent space vectors, and a decoder subsequently decodes those vectors into valid molecules. The aim of the study is to investigate the potential of the variational autoencoder for generating new drug-like molecules and to determine how the variational autoencoder depends on the length of the latent vector and on the length of the molecules. Four variational autoencoder models were trained with latent vector lengths of 56, 156, 196 and 254, and four more with molecular lengths of 60, 80, 100 and 120. Model accuracy was assessed in three ways: accuracy on the training and validation samples, the ability of the models to regenerate molecules over thousands of trials, and the generation of valid molecules. In all three respects the least accurate model was the one with a latent space dimension of 56, which suggests that a 56-dimensional latent space cannot adequately represent the molecular data set. The available molecular set was best characterized by a latent space dimension of 156. The analysis of the models with different molecular lengths showed that the structure of the data set used with a trained model strongly influences the model's accuracy in recovering molecules.
New drug-like molecules with the desired minimum inhibition constant were generated in two ways: using a variational autoencoder and using linear interpolation between latent space vectors. Generating new molecules with variational autoencoders of different latent space dimensions produced two identical molecules, indicating that the latent space, irrespective of its dimension, places similar molecules close together. In the seven-step linear interpolation, the molecules from the first steps were visually more similar to the parent molecule, and those from the last steps were more similar to the target molecule. |
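The seven-step linear interpolation mentioned above can be illustrated with a minimal sketch. It assumes that the seven steps are evenly spaced intermediate points lying strictly between the parent and target latent vectors; the abstract does not say whether the endpoints are counted. The function name `interpolate_latent` and the toy three-dimensional vectors are hypothetical and stand in for real encoder outputs, each of which would be passed to the decoder to obtain a molecule.

```python
from typing import List, Sequence


def interpolate_latent(
    z_start: Sequence[float],
    z_target: Sequence[float],
    steps: int = 7,
) -> List[List[float]]:
    """Return `steps` evenly spaced points strictly between two latent vectors.

    Assumption: the endpoints themselves are excluded from the result.
    """
    if len(z_start) != len(z_target):
        raise ValueError("latent vectors must have the same dimension")
    if steps < 1:
        raise ValueError("steps must be at least 1")

    points = []
    for k in range(1, steps + 1):
        t = k / (steps + 1)  # fraction of the way from start to target
        points.append([(1 - t) * a + t * b for a, b in zip(z_start, z_target)])
    return points


# Toy vectors standing in for encoded parent and target molecules (hypothetical values).
z_parent = [0.0, 0.0, 0.0]
z_target = [8.0, -8.0, 16.0]
path = interpolate_latent(z_parent, z_target, steps=7)
```

Early entries of `path` lie closer to `z_parent` and late entries closer to `z_target`, which is consistent with the observation that the first interpolated molecules resemble the parent and the last ones resemble the target.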