In February 2018, amid a rise in complaints about virtual currency exchange scams, the US Commodity Futures Trading Commission issued a public advisory to raise awareness. The warning mattered because the incentives are so alluring that it is almost impossible to keep people away from this gold rush of the 21st century.
Risks Involved With Virtual Currency
- Most cash markets are not regulated or supervised by a government agency;
- Platforms in the cash market may lack critical system safeguards, including customer protections;
- Cash markets are prone to volatile price swings or flash crashes.
While governments across the globe have been busy framing policies, Jiahua Xu and Benjamin Livshits, researchers at Imperial College London, have released a white paper discussing the anatomy of cryptocurrency pump-and-dump schemes and how machine learning can be used to forestall such events in the future.
The researchers traced the message history of over 300 Telegram channels between July and November this year to identify pump events, and analysed how coin features move in the market throughout the pump-and-dump process. They then developed a machine learning model, based on the random forest algorithm, to predict the likelihood of a pump event. According to the researchers, the model confirms that market movements contain hidden information that can be exploited for monetary gain.
How Pump-And-Dump Happens In 4 Steps
- The organiser opens a channel accessible to a potential pumping group. They invite members by advertising and posting invitations on popular forums like Reddit. Once the group exceeds 1,000 members, they are ready to pump.
- The organiser broadcasts the time and date of an upcoming pump event. As the time nears, the admin gives members tips on how to buy quickly and how long to hold the coin in order to lure more users.
- The admin announces the coin at the predetermined time and date, using an OCR-proof pattern so that machines cannot read the coin name automatically. During the first minute of the pump, the coin price typically surges, increasing manifold.
- As the coin price nears its peak, it starts to drop and participants dump the coin, selling to walk away with their profits. The slide continues until the price falls even below its original level.
In a case study, the researchers examined BVB, an obscure coin listed on CoinMarketCap. The coin was launched in 2016 by supporters of the German football club Borussia Dortmund.
The total buy volume was 1,619.81 thousand BVB, against a sell volume of 1,223.36 thousand BVB. This gap between the buy and sell sides indicates more aggressive trading on the buy side.
Predictive Modeling: An ML Approach
The table above illustrates the key features that went into training the model. For ease of data standardisation, and because of its high pump-and-dump frequency, the researchers focused on predicting coins pumped on Cryptopia.
On average, there are 358 coin candidates at each pump, of which one is the coin actually pumped. The number of coins considered varies from event to event because exchanges constantly list and delist coins. The full sample contains 47,487 pump-coin observations, of which 133 are pumped cases, accounting for 0.3% of the entire sample. The sample is therefore heavily skewed towards the unpumped class and needs to be handled with care when modelling.
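The imbalance figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers from this article; the variable names are illustrative and not taken from the paper.

```python
# Figures quoted in the article.
total_observations = 47_487   # pump-coin observations in the full sample
pumped_cases = 133            # observations where the coin was actually pumped

# Each pump event has exactly one pumped coin, so the number of events
# equals the number of pumped cases.
avg_candidates_per_pump = total_observations / pumped_cases

# Share of positive (pumped) observations in the sample.
positive_rate = pumped_cases / total_observations

# Accuracy of a useless model that always predicts "not pumped".
always_negative_accuracy = 1 - positive_rate

print(f"average candidates per pump: {avg_candidates_per_pump:.1f}")
print(f"share of pumped cases: {positive_rate:.2%}")
print(f"accuracy of always predicting 'not pumped': {always_negative_accuracy:.2%}")
```

This is why plain accuracy is a poor yardstick for such a sample: a model that never flags a coin would still be right more than 99% of the time.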
To avoid overfitting, the sample is split chronologically into three datasets covering July to October. The training set contains 27,759 data points, of which 78 are pumped cases. The validation set consists of 10,106 data points, of which 28 are pumped cases, and the test set has 9,755 data points with 27 pumped cases.
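A chronological split of this kind could look roughly like the sketch below. It is not the researchers' code: the function name `chronological_split`, the tuple layout, the cutoff dates and the toy coins are all assumptions made for illustration.

```python
from datetime import date

def chronological_split(observations, train_end, validation_end):
    """Split observations into train/validation/test sets by event date.

    observations: iterable of (event_date, features, is_pumped) tuples.
    train_end, validation_end: inclusive cutoff dates (hypothetical values).
    """
    train, validation, test = [], [], []
    # Sort by date so each set keeps its events in time order.
    for obs in sorted(observations, key=lambda o: o[0]):
        event_date = obs[0]
        if event_date <= train_end:
            train.append(obs)
        elif event_date <= validation_end:
            validation.append(obs)
        else:
            test.append(obs)
    return train, validation, test

# Toy data with made-up dates and coin symbols.
toy = [
    (date(2018, 10, 20), {"coin": "CCC"}, False),
    (date(2018, 7, 5), {"coin": "AAA"}, True),
    (date(2018, 9, 12), {"coin": "BBB"}, False),
]
train, validation, test = chronological_split(
    toy, train_end=date(2018, 8, 31), validation_end=date(2018, 9, 30)
)
print(len(train), len(validation), len(test))
```

Splitting by time rather than at random means the model is always evaluated on events that happened after the ones it was trained on, which is closer to how it would be used in practice.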
Random Forest with stratified sampling has been chosen for classification and a generalised linear model (GLM) for logit regression.
Due to the heavily imbalanced nature of the sample, when using RF, the model always includes TRUE cases when bootstrapping the sample to build a decision tree.
Model RF1 stays loyal to the sample's original TRUE/FALSE ratio, with TRUE cases making up 0.3% of each tree-sample. RF2 and RF3 raise the share of TRUE cases to 1.2% and 6%, respectively.
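The idea of fixing the share of TRUE cases in each tree's sample can be sketched as follows. This is a minimal illustration, not the researchers' implementation: the function `stratified_bootstrap`, the sample size and the toy label vector are assumptions, and sampling with replacement within each class is one possible design choice.

```python
import random

def stratified_bootstrap(labels, sample_size, true_share, rng):
    """Draw indices for one tree's sample with a fixed share of TRUE cases.

    labels: list of booleans (True = pumped).
    sample_size: number of observations drawn for the tree.
    true_share: target fraction of TRUE cases, e.g. 0.003, 0.012 or 0.06.
    rng: random.Random instance, passed in for reproducibility.
    """
    true_idx = [i for i, y in enumerate(labels) if y]
    false_idx = [i for i, y in enumerate(labels) if not y]
    # Keep at least one TRUE case so no tree is built on negatives only.
    n_true = max(1, round(sample_size * true_share))
    n_false = sample_size - n_true
    # Sample with replacement within each class, as in ordinary bootstrapping.
    chosen = rng.choices(true_idx, k=n_true) + rng.choices(false_idx, k=n_false)
    rng.shuffle(chosen)
    return chosen

# Toy label vector: 10 pumped cases among 1,000 observations (made-up numbers).
labels = [True] * 10 + [False] * 990
rng = random.Random(0)
for name, share in [("RF1", 0.003), ("RF2", 0.012), ("RF3", 0.06)]:
    sample = stratified_bootstrap(labels, sample_size=1000, true_share=share, rng=rng)
    n_true = sum(labels[i] for i in sample)
    print(name, n_true, "TRUE cases out of", len(sample))
```

Raising the share of TRUE cases makes each tree see more pumped examples, at the cost of drifting away from the class balance found in the real data.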
For the GLM models, by contrast, LASSO (least absolute shrinkage and selection operator) regularisation is applied to avoid the problems that arise from the skewed distribution.
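For reference, LASSO adds an L1 penalty on the coefficients to the model's loss. For a logistic GLM this is commonly written as below. The symbols are standard textbook notation rather than anything taken from the paper: $x_i$ and $y_i \in \{0, 1\}$ are the features and label of observation $i$, $\beta_0$ and $\beta$ are the intercept and coefficients, and $\lambda \ge 0$ is the penalty weight.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  + \lambda \sum_{j=1}^{p} |\beta_j|
```

The larger $\lambda$ is, the more coefficients are shrunk to exactly zero, so the penalty also acts as a form of feature selection.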
Both the random forest models and the GLM models predict whether a given coin will be pumped as a likelihood between 0 and 1. In terms of the F1 measure, the RF models generally appear superior to the GLM models on both the training sample and the validation sample.
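For readers unfamiliar with the F1 measure, it is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts; the helper name `f1_from_counts` and the example counts are made up for illustration and are not results from the paper.

```python
def f1_from_counts(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    # Guard against division by zero when nothing is predicted or present.
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    recall = true_positives / actual_positive if actual_positive else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 5 correct flags, 1 false alarm, 20 missed pumps.
precision, recall, f1 = f1_from_counts(5, 1, 20)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Unlike accuracy, F1 ignores the large pool of correctly rejected coins, which makes it a more informative score for a sample this imbalanced.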
For each pump event, the researchers checked whether a coin's normalised vote breached a predetermined threshold. Coins that did so were purchased one hour before the coin announcement, based on the model's prediction. Across all purchased coins, the investment in each coin, measured in BTC, is proportional to the vote it receives from the random forest model.
The results show that the model suggests purchasing 6 coins of which 5 are actually pumped.
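The selection and sizing rule described above can be sketched as follows. This is an illustration under stated assumptions, not the researchers' strategy code: the function `allocate_budget`, the threshold value, the budget and the coin symbols are all hypothetical, and "breaching" the threshold is read here as greater than or equal to it.

```python
def allocate_budget(votes, threshold, budget_btc):
    """Pick coins whose vote meets the threshold and split the budget
    in proportion to each selected coin's vote.

    votes: dict mapping coin symbol -> normalised vote in [0, 1].
    Returns a dict mapping coin symbol -> BTC amount (empty if none qualify).
    """
    selected = {coin: v for coin, v in votes.items() if v >= threshold}
    total_vote = sum(selected.values())
    if total_vote == 0:
        return {}
    return {coin: budget_btc * v / total_vote for coin, v in selected.items()}

# Made-up votes for three candidate coins ahead of one pump event.
votes = {"AAA": 0.6, "BBB": 0.2, "CCC": 0.05}
positions = allocate_budget(votes, threshold=0.1, budget_btc=1.0)
for coin, amount in positions.items():
    print(coin, round(amount, 4))
```

Sizing positions by vote puts more money behind the coins the model is most confident about, while the threshold keeps weak signals out of the portfolio entirely.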
According to the researchers, the study:
- Is the first study of its kind in the world on pump-and-dump schemes.
- Shows that pump-and-dump activities are a lot more prevalent than previously believed. Specifically, around 100 organised Telegram pump-and-dump channels coordinate on average 2 pumps a day, which generates an aggregate artificial trading volume of $7 million a month.
- Develops a predictor that, given a pre-pump announcement, can predict the likelihood of each coin being pumped with an AUC (Area Under Curve) of over 0.9 both in-sample and out-of-sample.
- Formulates a simple trading strategy that, based on historical data, gives a return of 80% over a period of three weeks, even under strict assumptions.
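The AUC figure mentioned in the list above can be read as the probability that a randomly chosen pumped coin receives a higher score than a randomly chosen unpumped one. A minimal sketch of that pairwise definition follows; the function name `auc` and the toy scores are illustrative and not taken from the paper.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive gets a higher
    score than a randomly chosen negative; ties count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example: two pumped and two unpumped coins with made-up scores.
scores = [0.9, 0.4, 0.35, 0.1]
labels = [True, False, True, False]
print(auc(scores, labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so a value above 0.9 means pumped coins are ranked ahead of unpumped ones in the vast majority of pairs.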
When Satoshi Nakamoto proposed his peer-to-peer electronic cash system, he envisioned a foolproof, trustworthy transaction service that lets two parties trade without the involvement of any financial institution. Like any other profitable venture, this too has attracted fools and frauds alike, with one flourishing at the cost of the other. Though the virtual currency itself is secured by encryption, new methods are being devised to make the system more alert to schemes like those discussed above. Machine learning might be one such solution that can be used to cement the cracks.