The Potential of a Large New Data Set to Enhance AI's Ability to Detect Crypto Money Laundering

One task where AI tools have proven to be particularly superhuman is analyzing vast troves of data to find patterns that humans can’t see, or automating and accelerating the discovery of those we can. That makes Bitcoin’s blockchain, a public record of nearly a billion transactions between pseudonymous addresses, the perfect sort of puzzle for AI to solve. Now, a new study—along with a vast, newly released trove of crypto crime training data—may be about to trigger a leap forward in automated tools’ ability to suss out illicit money flows across the Bitcoin economy.

On Wednesday, researchers from cryptocurrency tracing firm Elliptic, MIT, and IBM published a paper that lays out a new approach to finding money laundering on Bitcoin’s blockchain. Rather than try to identify cryptocurrency wallets or clusters of addresses associated with criminal entities such as dark-web black markets, thieves, or scammers, the researchers collected patterns of bitcoin transactions that led from one of those known bad actors to a cryptocurrency exchange where dirty crypto might be cashed out. They then used those example patterns to train an AI model capable of spotting similar money movements—what they describe as a kind of detector capable of spotting the “shape” of suspected money laundering behavior on the blockchain.

Now, they’re not only releasing an experimental version of that AI model for detecting bitcoin money laundering but also publishing the training data set behind it: a 200-million-transaction trove of Elliptic’s tagged and classified blockchain data, which the researchers describe as a thousand times larger than any data set of its kind previously made public. “We’re providing about a thousand times more data, and instead of labeling illicit wallets, we’re labeling examples of money laundering which might be made up of chains of transactions,” says Tom Robinson, Elliptic’s chief scientist and cofounder. “It’s a paradigm shift in the way that blockchain analytics is used.”

Blockchain analysts have used machine learning for years to automate and sharpen their methods for tracing crypto funds and identifying criminal actors. In 2019, in fact, Elliptic had already partnered with MIT and IBM to create an AI model for detecting suspicious money movements, and released the much smaller data set of around 200,000 transactions used to train it.

For this new research, by contrast, the same team of researchers took a much more ambitious approach. Rather than try to classify single transactions as legitimate or illicit, Elliptic analyzed collections of up to six transactions between Bitcoin address clusters it had already identified as illicit actors and the exchanges where those previously identified shady entities sold their crypto, positing that the patterns of transactions between criminals and their cashout points could serve as examples of money laundering behavior.

Working from that hypothesis, Elliptic assembled 122,000 of these so-called subgraphs, or patterns of known money laundering within a total data set of 200 million transactions. The research team then used that training data to create an AI model designed to recognize money laundering patterns across Bitcoin’s entire blockchain.
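To make the idea of these transaction chains concrete, here is a minimal sketch in Python. It is not Elliptic's method or the published model: it simply assumes each candidate pattern can be treated as a simple path of at most six transfers that starts at a known illicit cluster and ends at an exchange. The function name, the cluster labels, and the toy edge list are all hypothetical.

```python
from collections import defaultdict


def laundering_paths(edges, illicit, exchanges, max_hops=6):
    """Enumerate simple paths of at most max_hops transfers that start at
    an illicit cluster and end at an exchange (illustrative assumption)."""
    # Adjacency list: sender cluster -> list of receiver clusters.
    graph = defaultdict(list)
    for sender, receiver in edges:
        graph[sender].append(receiver)

    paths = []

    def walk(node, path):
        # Reaching an exchange ends the chain: funds are treated as cashed out.
        if node in exchanges and len(path) > 1:
            paths.append(list(path))
            return
        # Stop once the path already uses the maximum number of transfers.
        if len(path) - 1 == max_hops:
            return
        for nxt in graph[node]:
            if nxt not in path:  # keep paths simple (no revisited clusters)
                path.append(nxt)
                walk(nxt, path)
                path.pop()

    for source in sorted(illicit):
        walk(source, [source])
    return paths


# Hypothetical toy data: cluster names are made up for illustration.
edges = [
    ("dark_market", "a"),
    ("a", "b"),
    ("b", "exchange_1"),
    ("a", "exchange_1"),
    ("c", "exchange_1"),
]
found = laundering_paths(edges, {"dark_market"}, {"exchange_1"})
short = laundering_paths(edges, {"dark_market"}, {"exchange_1"}, max_hops=2)
```

In this toy graph, `found` holds the two chains leading from the illicit cluster to the exchange, while the transfer from the unrelated cluster `c` is ignored. Labeled examples of this general kind are what the article describes as training data; the real data set's structure and labeling rules may differ from this simplification.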

As a test of their resulting AI tool, the researchers checked its outputs with one cryptocurrency exchange—which the paper doesn’t name—identifying 52 suspicious chains of transactions that had all ultimately flowed into that exchange. The exchange, it turned out, had already flagged 14 of the accounts that had received those funds for suspected illicit activity, including eight it had marked as associated with money laundering or fraud, based in part on know-your-customer information it had requested from the account owners. Despite having no access to that know-your-customer data or any information about the origin of the funds, the researchers’ AI model had matched the conclusions of the exchange’s own investigators.

Correctly identifying 14 out of 52 of those customer accounts as suspicious may not sound like a high success rate, but the researchers point out that only 0.1 percent of the exchange’s accounts are flagged as potential money laundering overall. Their automated tool, they argue, had essentially narrowed the hunt to a pool in which more than one in four accounts had already been flagged as suspicious. “Going from ‘one in a thousand things we look at are going to be illicit’ to 14 out of 52 is a crazy change,” says Mark Weber, one of the paper’s coauthors and a fellow at MIT’s Media Lab. “And now the investigators are actually going to look into the remainder of those to see, wait, did we miss something?”
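The arithmetic behind that comparison can be checked in a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted above; the variable names are illustrative, and it assumes the 0.1 percent base rate and the 14-of-52 result are directly comparable, as the researchers' framing suggests.

```python
# Figures quoted in the article.
flagged_by_exchange = 14   # accounts the exchange had already flagged
surfaced_by_model = 52     # suspicious chains the model identified
base_rate = 0.001          # 0.1 percent of all accounts flagged overall

# Share of model-surfaced accounts that the exchange had also flagged.
model_rate = flagged_by_exchange / surfaced_by_model

# How many times more concentrated the model's pool is than the base rate.
lift = model_rate / base_rate

print(f"model hit rate: {model_rate:.1%}")
print(f"lift over base rate: about {lift:.0f}x")
```

Under these assumptions the hit rate is roughly 27 percent, or about 269 times the exchange-wide base rate, which is the "one in a thousand" to "14 out of 52" shift Weber describes.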

Elliptic says it’s already been privately using the AI model in its own work. As more evidence that the AI model is producing useful results, the researchers write that analyzing the source of funds for some suspicious transaction chains identified by the model helped them discover Bitcoin addresses controlled by a Russian dark-web market, a cryptocurrency “mixer” designed to obfuscate the trail of bitcoins on the blockchain, and a Panama-based Ponzi scheme. (Elliptic declined to identify any of those alleged criminals or services by name, telling WIRED it doesn’t identify the targets of ongoing investigations.)

Perhaps more important than the practical use of the researchers’ own AI model, however, is the potential of Elliptic’s training data, which the researchers have published on the Google-owned machine learning and data science community site Kaggle. “Elliptic could have kept this for themselves,” says MIT’s Weber. “Instead there was very much an open source ethos here of contributing something to the community that will allow everyone, even their competitors, to be better at anti-money-laundering.” Elliptic notes that the data it released is anonymized and doesn’t contain any identifiers for the owners of Bitcoin addresses or even the addresses themselves, only the structural data of the “subgraphs” of transactions it tagged with its ratings of suspicion of money laundering.

That enormous data trove will no doubt inspire and enable much more AI-focused research into bitcoin money laundering, says Stefan Savage, a computer science professor at the University of California San Diego who served as adviser to the lead author of a seminal bitcoin-tracing paper published in 2013. He argues, though, that the tool in its current form seems less likely to revolutionize anti-money-laundering efforts in crypto than to serve as a proof of concept. “An analyst, I think, is going to have a hard time with a tool that’s kind of right sometimes,” Savage says. “I view this as an advance that says, ‘Hey, there’s a thing here. More people should work on this.’”

Savage warns, though, that AI-based money-laundering investigation tools will likely raise new ethical and legal questions if they end up being used as actual criminal evidence—in part because AI tools often serve as a “black box” that provides a result without any explanation of how it was produced. “This is on the edge where people get uncomfortable in the same way they get uncomfortable about face recognition,” he says. “You can’t quite explain how it works, and now you’re depending on it for decisions that may have an impact on people’s liberty.”

MIT’s Weber counters that money laundering investigators have always used algorithms to flag potentially suspicious behavior. AI-based tools, he argues, just mean those algorithms will be more efficient and have fewer false positives that waste investigators’ time and incriminate the wrong suspects. “This isn’t about automation,” Weber says. “This is a needle-in-a-haystack problem, and we’re saying let’s use metal detectors instead of chopsticks.”

As for the research impact that Savage expects, Weber argues that even beyond blockchain analysis, Elliptic’s training data is so voluminous and detailed that it may help with AI research into analogous problems in areas like health care and recommendation systems. But he says the researchers also intend their work to have a practical effect, enabling a new and very real way to hunt for patterns that reveal financial crime.

“We’re hopeful that this is much more than an academic exercise,” Weber says, “that people in this domain can actually take this and run with it.”

Title: Unleashing the Power of Big Data: Enhancing AI’s Ability to Detect Crypto Money Laundering

Introduction:
As the digital landscape continues to evolve, so do the methods employed by criminals to launder money. Cryptocurrencies have emerged as a popular choice for illicit activities due to their perceived anonymity and decentralized nature. However, with the advent of big data and advancements in artificial intelligence (AI), there is newfound potential to combat crypto money laundering effectively. In this article, we explore the potential of a large new data set to enhance AI’s ability to detect and prevent crypto money laundering.

Understanding Crypto Money Laundering:
Crypto money laundering refers to the process of disguising the origins of illegally obtained funds through cryptocurrency transactions. Criminals exploit the decentralized nature of cryptocurrencies, making it challenging for traditional financial institutions and law enforcement agencies to trace and identify illicit activities. This necessitates innovative solutions that leverage technology and data analysis to detect and prevent such activities.

The Power of Big Data:
Big data refers to vast amounts of structured and unstructured data that can be analyzed to reveal patterns, trends, and insights. By harnessing big data, AI algorithms can identify anomalies, detect suspicious patterns, and uncover hidden connections that may indicate money laundering activities within the crypto space. A larger, well-labeled data set generally allows AI models to become more accurate and comprehensive.

The Role of AI in Detecting Crypto Money Laundering:
Artificial intelligence plays a crucial role in analyzing massive amounts of data quickly and efficiently. Machine learning algorithms can be trained using historical data to identify patterns associated with money laundering activities. By continuously learning from new data, AI models can adapt and improve their detection capabilities over time.

The Potential of a Large New Data Set:
A large new data set holds immense potential for enhancing AI’s ability to detect crypto money laundering. This data set can include information from various sources such as cryptocurrency exchanges, blockchain transactions, social media platforms, and public records. By integrating and analyzing this diverse range of data, AI algorithms can identify suspicious activities, track the movement of funds, and uncover hidden connections between individuals or entities involved in money laundering.

Benefits of Using a Large New Data Set:
1. Enhanced Accuracy: With a larger data set, AI models can identify subtle patterns and anomalies that may go unnoticed with smaller data sets. This leads to more accurate detection of potential money laundering activities.

2. Improved Speed: AI algorithms can process vast amounts of data in real time, enabling faster detection of and response to suspicious transactions. This reduces the window of opportunity for criminals to carry out their illicit activities.

3. Comprehensive Risk Assessment: By analyzing a wide range of data sources, AI models can provide a holistic view of potential risks associated with specific individuals, addresses, or transactions. This enables financial institutions and regulatory bodies to make informed decisions and take appropriate actions to mitigate money laundering risks.

4. Adaptive Learning: AI models trained on large data sets can continue to learn and adapt as money laundering techniques evolve. This helps detection algorithms remain up to date and effective against new and emerging threats.

Conclusion:
The potential of a large new data set to enhance AI’s ability to detect crypto money laundering is immense. By leveraging big data and advanced AI algorithms, financial institutions, law enforcement agencies, and regulatory bodies can stay one step ahead in the fight against illicit activities within the cryptocurrency ecosystem. As technology continues to evolve, the collaboration between big data analytics and AI will play a pivotal role in safeguarding the integrity of the financial system and protecting against money laundering threats.