Abstract [eng] |
Interest in blockchain technology has been growing since 2008, when the concept was created. It is a relatively new technology that has been around for only 12 years but has received much attention in the media and from academics. The media focus mainly on the bitcoin cryptocurrency, for which blockchain technology was first developed. The popularity of cryptocurrencies also attracts scammers who engage in illicit activities for financial gain. The greatest losses to date occurred in 2019 and are estimated at $4.3 billion. As bitcoin is the most popular cryptocurrency, it bears the largest share of the damage caused by these thefts and frauds. Ethereum, the second-largest cryptocurrency by capitalization, is also receiving attention from scammers. Fraud detection is the first step in reducing risk and preventing potential theft and fraud. This study aims to develop a machine learning model, using big data analytics methods, that can process large amounts of data and successfully identify fraud within the bitcoin and ethereum blockchains. All bitcoin and ethereum transactions are publicly available. From this big data, the features used to develop the models were extracted (for example, the number of transactions received and the average value of a received transaction). The k-means and isolation forest methods were applied to create fraud detection models. Because of the large amount of data available, ensembles of these methods were also developed. The developed machine learning models identified addresses that are associated with cases of fraud and scams. Looking at the overall results, one k-means model, an ensemble of k-means models, and an ensemble of isolation forest models found almost the same number of frauds in the bitcoin blockchain (29–30). In the ethereum blockchain, frauds were best detected by an ensemble of k-means models, which caught a total of 65 scams.
Three different datasets of known fraud cases were used to verify the results. In the BitcoinTalk dataset, the developed models identified 15 of the 16 bitcoin addresses associated with fraud. This is a very good result, as at most 5 cases of fraud were detected in similar earlier studies. In the Ponzi schemes dataset, the models identified 64 of 102 scams in the ethereum blockchain. In the CryptoScamDB dataset, the models identified 14 of 140 scams in the bitcoin blockchain, because this dataset includes smaller scams. Studies by other authors that use the Ponzi schemes and CryptoScamDB datasets apply different methods (e.g. classification methods are used and results are calculated differently), so the results are not comparable. This study has also shown that machine learning models developed using bitcoin transaction data can be successfully used to detect fraud in the ethereum blockchain. However, models developed using bitcoin transaction data detect fewer scams in the ethereum blockchain than models developed using ethereum transaction data. |
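
As an illustration of the clustering approach described above, the following is a minimal sketch of an ensemble of k-means models used for anomaly scoring. It is not the study's implementation. The two synthetic features, the function names (`kmeans`, `ensemble_scores`), all parameter values, and the rule that clusters holding under 5% of a sample are themselves treated as anomalous are assumptions made for this example. Each model is fitted on a random subsample, an address is scored by its distance to the nearest retained centroid, and the scores are averaged over the models.

```python
import math
import random


def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's k-means. Returns (centroids, sizes), where sizes[i] is
    the number of points that produced centroids[i] in the last update."""
    rng = rng or random.Random(0)
    centroids = rng.sample(points, k)
    sizes = [0] * k
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        sizes = [len(cl) for cl in clusters]
        # An empty cluster keeps its previous centroid (and has size 0).
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, sizes


def ensemble_scores(points, k=2, n_models=5, sample_size=50, min_frac=0.05, seed=0):
    """Average, over n_models k-means fits on random subsamples, of each
    point's distance to the nearest centroid of a sufficiently large cluster."""
    rng = random.Random(seed)
    totals = [0.0] * len(points)
    for _ in range(n_models):
        sample = rng.sample(points, min(sample_size, len(points)))
        centroids, sizes = kmeans(sample, k, rng=rng)
        # Assumption: very small clusters are treated as anomalous and dropped.
        kept = [c for c, s in zip(centroids, sizes) if s >= min_frac * len(sample)]
        for i, p in enumerate(points):
            totals[i] += min(math.dist(p, c) for c in kept)
    return [t / n_models for t in totals]


# Synthetic data: 200 "ordinary" addresses with two scaled features (standing in
# for, e.g., number of transactions received and average received value),
# plus one injected address with extreme values.
data_rng = random.Random(42)
points = [(data_rng.random(), data_rng.random()) for _ in range(200)]
points.append((6.0, 8.0))

scores = ensemble_scores(points)
most_suspicious = max(range(len(points)), key=scores.__getitem__)
print("highest-scoring address index:", most_suspicious)
```

In practice the highest-scoring addresses would be compared against labeled fraud datasets such as the three used here for verification. An isolation forest ensemble could be scored and averaged in the same way.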