Transforming Blockchain Security: Introducing Our Advanced Clustering Algorithms and Heuristics for Bitcoin and Smart Contract Chains such as Ethereum and Tron

Merkle Science
December 7, 2023

As the cryptocurrency world expands beyond Bitcoin, diversification has introduced both opportunities and risks. The shift from Bitcoin-centric operations to the widespread use of EVM and TRON blockchains has been driven by the newer chains’ faster transaction speeds, lower costs, and increased liquidity. This transition to a multi-chain ecosystem also opens new avenues for illicit activity, making enhanced security measures more critical than ever.

The core of EVM and TRON blockchains lies in smart contracts. Unfortunately, smart contracts are increasingly targeted by hackers, posing a significant challenge for investigators and law enforcement agencies. Legacy blockchain analytics tools often offer limited insight into DeFi hacks, do not expose smart contract vulnerabilities, and provide poor attribution in EVM and TRON cases.

Merkle Science is addressing this gap by deploying state-of-the-art algorithms specifically optimized for EVM and TRON chains to systematically group addresses that are under the control of the same entity or service. 

Methodology and Examples

Smart Contract Clustering

Our first enhancement focuses on smart contract clustering. We have researched and developed a groundbreaking method that examines the structure, operations, and behavioral patterns of VASPs and the smart contracts they deploy. Specifically, our algorithm analyzes transaction types to cluster addresses belonging to entities such as VASPs or individuals. The approach applies to all EVM-based chains and TRON.

Unlike conventional methods that rely on both deposit and withdrawal transactions for identification, our system swiftly identifies addresses linked to specific entities without this dependency. This innovation accelerates cluster detection, setting us apart from traditional heuristics.

Our approach centers on 'create' type transactions, where new addresses or smart contracts (like tokens, deployers, or routers) are generated. The guiding principle is straightforward: if an entity creates something, it is owned by that entity. This clear-cut approach streamlines ownership attribution.
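A minimal sketch of the ownership rule follows. It assumes transaction receipts shaped like the result of Ethereum's JSON-RPC method eth_getTransactionReceipt, whose contractAddress field is set only when the transaction deployed a contract. The function name creation_owners is hypothetical and not part of our production system. Note that the sketch sees only top-level deployments: contracts created by other contracts (through the CREATE or CREATE2 opcodes, as factories do) appear in execution traces rather than receipts and would need a trace data source.

```python
from typing import Iterable, Mapping, Optional


def creation_owners(receipts: Iterable[Mapping[str, Optional[str]]]) -> dict[str, str]:
    """Map each newly created contract address to the address that created it.

    Assumes each receipt carries 'from' and 'contractAddress' keys, as in an
    eth_getTransactionReceipt result; 'contractAddress' is None for ordinary
    (non-create) transactions.
    """
    owners: dict[str, str] = {}
    for receipt in receipts:
        created = receipt.get("contractAddress")
        if created:  # only 'create' transactions produce a new contract
            # Lowercase so checksummed and plain hex addresses compare equal.
            owners[created.lower()] = receipt["from"].lower()
    return owners
```

Given a receipt for the Figure 1 transaction, the function would map Address B to Address A; a receipt for an ordinary transfer, whose contractAddress is null, contributes nothing.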

Consider, for instance, transaction 0xee38418bdf6aa9ab95e6d54f18ce88d50ca6fb9630e8a1c3d399c804e21c1598 (refer to Figure 1). In it, the address 0xF99e5F80486426E7d3e3921269FFee9c2Da258e2 (Address A) creates a new smart contract, 0xf02b075f514c34df0c3d5cb7ebadf50d74a6fb17 (Address B), which is then used in subsequent transactions.

In this streamlined process, when an originating address (Address A) that creates a smart contract (Address B) is tagged for illicit activities, the tag is immediately applied to Address B as well. This direct link plays a crucial role in swiftly identifying illicit activities and attributing them to related entities: if Address A is involved in suspicious transactions, our heuristics automatically analyze Address B too. By linking these tags efficiently, we substantially increase the accuracy with which we identify and attribute unlawful activities across our network.

According to our research, this example is not isolated; many entities create smart contracts tailored to their user base. Our approach adeptly assigns tags to all related smart contracts, tracing them back to the originating entity, thereby improving ownership attribution accuracy. 

Figure 1: Smart contract create transaction
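The tagging and tracing steps described above can be sketched over a creation map that records, for each created contract, the address that created it (addresses normalized to lowercase). The function names are hypothetical, and treating creation as a strict tree, where every contract has exactly one creator, is the simplifying assumption behind the "nearest tagged ancestor" rule in the code.

```python
from collections import defaultdict, deque


def propagate_tags(owners: dict[str, str], tags: dict[str, str]) -> dict[str, str]:
    """Copy tags from creators onto everything they created, transitively.

    `owners` maps each created contract address to its creator's address.
    An untagged contract inherits the tag of its nearest tagged ancestor;
    existing tags are never overwritten.
    """
    children: dict[str, list[str]] = defaultdict(list)
    for created, creator in owners.items():
        children[creator].append(created)

    result = dict(tags)
    queue = deque(tags)  # start from every originally tagged address
    while queue:
        parent = queue.popleft()
        for child in children.get(parent, []):
            if child not in result:  # assign once; also stops on cyclic input
                result[child] = result[parent]
                queue.append(child)
    return result


def originating_entity(address: str, owners: dict[str, str]) -> str:
    """Follow creator links upward to the address that started the chain."""
    seen: set[str] = set()
    current = address
    while current in owners and current not in seen:
        seen.add(current)
        current = owners[current]
    return current
```

For the Figure 1 example, owners would map Address B to Address A; tagging Address A as illicit then tags Address B as well, and originating_entity on Address B returns Address A.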

Message-Based Identification 

Our second approach focuses on message-based identification. In many cases, the initiator of a transaction embeds a message within the transaction itself. These messages serve a dual purpose: they mark the transaction as the sender's and open a conversation with the receiver. A prime example is the Ethereum transaction 0x7a8912583520304ce2364fa165dafe94461a91ab2dcf45dab942e296594dc40a.

This particular transaction includes a message suggesting that it may have been sent by a hacker. By recognizing such early warnings, entities can avoid transacting with these addresses and safeguard their assets more effectively.

Figure 2: Ethereum Transaction Analysis for Hacker Identification
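On Ethereum, such a message typically travels in a transaction's input data (the input field returned by eth_getTransactionByHash). The sketch below, a simplified illustration rather than our production logic, decodes that field as UTF-8 text and checks it against a watch-list. SUSPICIOUS_TERMS and both function names are hypothetical.

```python
from typing import Optional

# Hypothetical watch-list; a real deployment would maintain its own.
SUSPICIOUS_TERMS = ("hack", "exploit", "bounty", "return the funds")


def decode_input_message(input_hex: str) -> Optional[str]:
    """Decode a transaction's input data as UTF-8 text, if it looks like a message."""
    data = input_hex[2:] if input_hex.startswith("0x") else input_hex
    if not data:
        return None
    try:
        text = bytes.fromhex(data).decode("utf-8")
    except ValueError:  # non-hex input, or bytes that are not valid UTF-8
        return None
    # Contract calls begin with a 4-byte selector and rarely decode to readable text.
    if not all(ch.isprintable() or ch.isspace() for ch in text):
        return None
    return text


def message_flags(text: str) -> list[str]:
    """Return the watch-list terms that appear in the message (case-insensitive)."""
    lowered = text.lower()
    return [term for term in SUSPICIOUS_TERMS if term in lowered]
```

A match only marks the transaction for attention; since anyone can write any message, the text is not treated as proof of who sent it.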

Our analysis extends beyond Ethereum Virtual Machine (EVM) based chains to Unspent Transaction Output (UTXO) based chains like Bitcoin, which also allow messages to be embedded through OP_RETURN outputs.
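On Bitcoin, an OP_RETURN output script consists of the OP_RETURN opcode (0x6a) followed by a data push. The sketch below extracts that payload from a script given in hex, such as the scriptPubKey hex field in Bitcoin Core's verbose getrawtransaction output. It handles only a single push (a direct push of 1 to 75 bytes, OP_PUSHDATA1, or OP_PUSHDATA2) and is a simplified illustration, not a full script parser; the function name is hypothetical.

```python
from typing import Optional

OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C
OP_PUSHDATA2 = 0x4D


def op_return_payload(script_hex: str) -> Optional[bytes]:
    """Extract the data pushed by a simple OP_RETURN output script.

    Returns None for non-OP_RETURN scripts or truncated pushes.
    Raises ValueError if script_hex is not valid hex.
    """
    script = bytes.fromhex(script_hex)
    if len(script) < 2 or script[0] != OP_RETURN:
        return None
    op = script[1]
    if 0x01 <= op <= 0x4B:  # the opcode itself is the push length
        length, start = op, 2
    elif op == OP_PUSHDATA1 and len(script) >= 3:
        length, start = script[2], 3  # one-byte length follows
    elif op == OP_PUSHDATA2 and len(script) >= 4:
        length, start = int.from_bytes(script[2:4], "little"), 4  # two-byte little-endian length
    else:
        return None
    payload = script[start:start + length]
    return payload if len(payload) == length else None
```

The returned bytes can then be decoded as text and screened the same way as Ethereum input data.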

In both EVM-based chains and Bitcoin, any entity can embed a message. While this broadens communication avenues, it also lowers our confidence in automatically assigning attributions based solely on message content. However, this feature becomes strategically valuable for raising red flags. When messages originate from entities suspected of illicit activities, they serve as an early warning, triggering heightened scrutiny, investigations, and other proactive measures against potential risks.
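The distinction between attribution and alerting can be made explicit in code. In this hypothetical triage rule, an embedded message never sets an entity label, but it is escalated for review when the sender already carries a suspect tag. EmbeddedMessage, SUSPECT_TAGS, and triage are illustrative names only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of risk tags that make a sender "suspected".
SUSPECT_TAGS = {"illicit", "hack", "scam"}


@dataclass
class EmbeddedMessage:
    sender: str
    text: str
    sender_tag: Optional[str] = None  # existing risk tag on the sender, if any


def triage(msg: EmbeddedMessage) -> dict:
    """Treat an embedded message as a signal for review, never as proof of identity."""
    return {
        "address": msg.sender,
        # Anyone can write anything, so the text never assigns an entity label.
        "attribution": None,
        # Escalate only when the message comes from an already-suspected sender.
        "escalate": msg.sender_tag in SUSPECT_TAGS,
        "excerpt": msg.text[:200],
    }
```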

Deposit Address Attribution

Our third approach optimizes deposit address attribution. We observed that some entities consistently reuse the same deposit address. By tracking the flow of funds out of these addresses, we can identify a VASP's main hot wallet and follow changes to its hot wallets over time.
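A simplified sketch of this step, assuming transfers are available as (from, to) address pairs and that a few deposit addresses are already known, for example from our own dusting. The function likely_hot_wallets and its min_sources threshold are hypothetical; the idea is that a destination collecting sweeps from many distinct deposit addresses is a strong hot-wallet candidate.

```python
from collections import Counter, defaultdict
from typing import Iterable, Tuple

Transfer = Tuple[str, str]  # (from_address, to_address)


def likely_hot_wallets(transfers: Iterable[Transfer], deposit_addresses: set[str],
                       min_sources: int = 2) -> list[str]:
    """Rank destinations by how many distinct known deposit addresses sweep into them."""
    sources: dict[str, set[str]] = defaultdict(set)
    for src, dst in transfers:
        if src in deposit_addresses:
            sources[dst].add(src)
    counts = Counter({dst: len(srcs) for dst, srcs in sources.items()})
    # Keep only destinations fed by at least `min_sources` distinct deposit addresses.
    return [dst for dst, n in counts.most_common() if n >= min_sources]
```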

Once we have identified a VASP's hot wallet, our enhanced algorithm detects other deposit addresses associated with that hot wallet with high accuracy. Because hot wallets can change over time, the algorithm is dynamic and continually identifies new ones. This makes our operations faster and more efficient by eliminating redundant procedures such as re-dusting. We continue to refine these processes, aiming for even greater operational excellence.
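The reverse step can be sketched the same way. Assuming a known set of hot wallets, the hypothetical function below flags addresses whose every outgoing transfer lands in one of them. This strict rule is an illustrative simplification; a production heuristic would weigh more signals, such as timing and amounts, before labeling an address.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def deposit_addresses_for(transfers: Iterable[Tuple[str, str]], hot_wallets: set[str]) -> set[str]:
    """Return addresses whose every outgoing transfer goes to a known hot wallet."""
    outgoing: dict[str, set[str]] = defaultdict(set)
    for src, dst in transfers:
        outgoing[src].add(dst)
    return {
        src for src, dsts in outgoing.items()
        # Exclude the hot wallets themselves; require all destinations to be hot wallets.
        if src not in hot_wallets and dsts <= hot_wallets
    }
```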

For instance, consider a scenario where Entity A aims to dust the crypto exchange Binance by sending it a small deposit. The process starts with Entity A logging into Binance and initiating a transaction. Binance allocates a specific deposit address for this transaction, in this case 0x573bB28732090a7517e15456D1dC535213f34e0C. Once funds are deposited into this address, Binance promptly initiates a transaction, followed by an additional transaction that transfers the funds from the deposit address to its hot wallet. This sequence of actions provides critical insight into how funds move within Binance.

On EVM networks, deposit address-based clustering has proven effective for identifying hot wallets and other associated addresses, and Binance itself acknowledges the validity of this method. Take, for example, the deposit address 0x573bB28732090a7517e15456D1dC535213f34e0C, which is associated with a user or entity on Binance: funds deposited to this address are swiftly transferred to Binance's hot wallet.

Figure 3: Identifying Hot Wallets through Deposit Address Clustering in EVM
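The "swiftly transferred" pattern can be checked with timestamps alone. In this sketch, an illustration under stated assumptions rather than our production rule, an address counts as promptly forwarding if every deposit it receives is followed by an outgoing transfer within max_delay seconds. The function name and the one-hour default are hypothetical placeholders.

```python
def forwards_promptly(deposit_times: list[int], sweep_times: list[int],
                      max_delay: int = 3600) -> bool:
    """True if each incoming deposit (Unix seconds) is followed by a sweep within max_delay."""
    if not deposit_times or not sweep_times:
        return False  # need at least one deposit and one sweep to judge the pattern
    return all(
        any(d <= s <= d + max_delay for s in sweep_times)
        for d in deposit_times
    )
```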

In some cases, however, several deposits arrive at a deposit address before the exchange moves the funds to its hot wallet; batching them this way saves transaction fees. For example, address 0xdF18109294eC1D42318E22beca370a47270f5aC1 receives a few transactions before its funds are moved to Binance's hot wallet. We identify such transactions, and our improved algorithm detects these cases as well.

Figure 4: Funds accumulating in the deposit address before moving to the hot wallet
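Accumulated deposits can be matched to a later sweep by comparing amounts. The sketch assumes each sweep empties the deposit address, as with a token sweep whose gas is paid separately; the tolerance parameter loosely allows for a fee deducted when the native coin itself is swept. Event and batched_sweeps are hypothetical names.

```python
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: int  # Unix seconds
    kind: str       # "deposit" into the address, or "sweep" out to the hot wallet
    amount: int     # smallest unit, e.g. token base units


def batched_sweeps(events: list[Event], tolerance: int = 0) -> list[tuple[int, int]]:
    """Return (sweep_timestamp, deposits_batched) for each sweep that moves out
    what accumulated since the previous sweep, within `tolerance` units."""
    result = []
    pending_amount, pending_count = 0, 0
    for e in sorted(events):  # chronological; a deposit sorts before a sweep at the same time
        if e.kind == "deposit":
            pending_amount += e.amount
            pending_count += 1
        elif e.kind == "sweep":
            if pending_count and abs(pending_amount - e.amount) <= tolerance:
                result.append((e.timestamp, pending_count))
            pending_amount, pending_count = 0, 0  # assumption: every sweep drains the address
    return result
```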

Once Binance's hot wallet has been identified, we attribute all associated addresses that exhibit similar transaction patterns to Binance. A deposit address commonly interacts with several hot wallets of the same entity, because the entity may update its hot wallets over time, as Binance does when it updates its hot wallet system. Our methodology capitalizes on this behavior, attributing these hot wallets and their related deposit addresses accordingly.

As an example, consider the deposit address 0xd782083513c782d9e42107ee73909c0849a96219. Funds sent to this address were initially directed to a wallet already identified as Binance's, as shown in Figure 5. Even when a Binance hot wallet such as 0x28C6c06298d514Db089934071355E5743bf21d60 is not yet known to us, our updated algorithm recognizes it as a current hot wallet, consistent with its label on Etherscan.

Figure 5: Funds movement from a Deposit address to different hot wallets
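The alternating process described above, in which known deposit addresses reveal hot wallets (including newly rotated ones) and known hot wallets reveal further deposit addresses, can be sketched as a loop that runs until no new addresses appear. Both rules in the docstring are simplifying assumptions and the function name is hypothetical; on real data the rules would need thresholds so that one misclassified address does not cascade through the cluster.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def expand_entity(transfers: Iterable[Tuple[str, str]],
                  seed_deposits: set[str]) -> tuple[set[str], set[str]]:
    """Grow (deposit_addresses, hot_wallets) from seed deposits until nothing new appears.

    Rules (simplifying assumptions):
      * every address a known deposit address sends to is treated as a hot wallet;
      * every address whose outgoing transfers all go to known hot wallets is
        treated as a deposit address.
    """
    outgoing: dict[str, set[str]] = defaultdict(set)
    for src, dst in transfers:
        outgoing[src].add(dst)

    deposits, hot = set(seed_deposits), set()
    while True:
        # Deposit addresses reveal hot wallets, including newly rotated ones.
        new_hot = set().union(*(outgoing.get(d, set()) for d in deposits)) - hot
        hot |= new_hot
        # Hot wallets reveal further deposit addresses.
        new_deposits = {
            src for src, dsts in outgoing.items()
            if src not in hot and src not in deposits and dsts <= hot
        }
        if not new_hot and not new_deposits:
            return deposits, hot
        deposits |= new_deposits
```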

Our advanced approach to hot wallet identification and clustering marks a significant leap in blockchain analytics. By tracking deposit addresses and their interactions with hot wallets, we offer a dynamic solution that adapts to wallet updates. This streamlined system not only improves efficiency and reduces costs but also bolsters security against illicit activities. 

These upgrades signify a major step forward, delivering a swifter and more efficient clustering process that substantially improves the accuracy of our attribution capabilities.

This article showcases cutting-edge research from our Data Science and Intelligence Team, featuring work by Rachit Agarwal, Shuang Liu, Sanjana Sharma, Aniket Mazumdar, and our Chief Technology Officer, Nirmal A.K., with contributions from Vidushi Tiwari, Blockchain Policy and Research Insights Associate.