aleno-malicious-smart-contract-detection-ml

v0.1.6

This bot detects when suspicious smart contracts are deployed.

Malicious Smart Contract ML

Description

This repository contains an advanced smart contract detection bot, designed by Aleno, to identify malicious smart contracts on the Ethereum blockchain. The bot achieves this by analyzing the opcodes of smart contracts. Our approach builds upon the foundation laid by Forta's ML Bot.

This bot improves on the original bot's model by enhancing data quality and addressing the dataset's inherent imbalance.

A full description of the model can be found in malicious-contract-detection.

Model Configuration

Data Used For Training

  • Malicious Dataset

Our malicious dataset is primarily derived from Forta's publicly available dataset on GitHub. We have augmented this dataset with recent hacking incidents while excluding phishing contracts. Data sources include Forta, DeFiMon, and DeFiHackLabs.

We've further refined the dataset by eliminating contracts with duplicated code, resulting in 154 malicious contract records.

  • Benign Dataset

Benign contract data was sourced from a dataset of Ethereum smart contracts verified on Etherscan. We extended this dataset with 5,000 recently verified contracts from Etherscan to align with recent hacking incidents. To maintain consistency with Forta's work, we chose to use only 15,000 records from this dataset.

Algorithm

For the analysis of smart contracts' opcodes and the extraction of common and important opcodes found in malicious and benign contracts, we employed a technique borrowed from natural language processing called TF-IDF (term frequency–inverse document frequency). This technique extracts numerical features from text (opcodes, in this case). These features are then fed into a LogisticRegression model to predict whether a contract is malicious or not.

  • TF-IDF extracts opcodes in chunks: unigrams, bigrams, trigrams, and 4-grams.
    • Example of a unigram: PUSH1
    • Example of a 4-gram: PUSH1 MSTORE PUSH1 CALLDATASIZE
  • Analyzing in chunks helps retain the relative position information of the smart contract opcodes.
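
The chunking and weighting described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the bot's actual code: the function names `opcode_ngrams` and `tfidf` are hypothetical, the two opcode sequences are toy data, and the weighting uses the plain textbook TF-IDF formula (libraries such as scikit-learn apply a smoothed variant, so their numbers will differ). The resulting weights are the kind of numerical features that would then be passed to a classifier.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n_min=1, n_max=4):
    """Return every contiguous n-gram (space-joined) for n_min <= n <= n_max."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(opcodes) - n + 1):
            grams.append(" ".join(opcodes[i:i + n]))
    return grams


def tfidf(documents):
    """Compute a plain TF-IDF weight per n-gram for each opcode sequence.

    tf  = occurrences of the n-gram in the document / n-grams in the document
    idf = log(N / df), df being the number of documents containing the n-gram
    """
    tokenized = [opcode_ngrams(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: count each n-gram once per document.
    doc_freq = Counter()
    for grams in tokenized:
        doc_freq.update(set(grams))

    weights = []
    for grams in tokenized:
        counts = Counter(grams)
        total = len(grams)
        weights.append({
            gram: (count / total) * math.log(n_docs / doc_freq[gram])
            for gram, count in counts.items()
        })
    return weights


# Two toy opcode sequences (illustrative only, not real contracts).
contract_a = ["PUSH1", "MSTORE", "PUSH1", "CALLDATASIZE"]
contract_b = ["PUSH1", "MSTORE", "SELFDESTRUCT"]

grams_a = opcode_ngrams(contract_a)
weights = tfidf([contract_a, contract_b])
```

In this toy corpus, n-grams that occur in both sequences (such as `PUSH1`) get a weight of zero, while n-grams unique to one sequence get a positive weight, which is what lets the classifier focus on distinguishing patterns.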

NOTE: Compared to the original Forta work, we improved the model training phase using the SMOTE oversampling technique to synthesize malicious contract records and address the fact that the malicious contract class represents only 1% of the dataset.
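
To give an intuition for what SMOTE does, here is a simplified standard-library sketch of its core idea: each synthetic sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy feature vectors are assumptions for illustration. The source does not say which implementation was used for training; in practice a library such as imbalanced-learn would typically be used rather than hand-written code.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between neighbours.

    `minority` is a list of equal-length feature vectors (needs at least two).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy 2-D feature vectors standing in for malicious-contract features.
minority = [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
new_samples = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies between two real minority samples, oversampling this way enlarges the malicious class without simply duplicating records.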

Model Versions

The model has undergone multiple versions with corresponding performance metrics and a comparison with Forta's results:

| Model Version | Created Date | Avg Precision | Avg Recall | Avg F1-Score | Alert Rate | Notes |
|---------------|--------------|---------------|------------|--------------|------------|--------------------------|
| Forta V1 | 09/30/2022 | 88.6% | 59.4% | 69.6% | 222.125 | |
| Forta V2 | 11/05/2022 | 73.36% | 48.37% | 53.97% | 112.75 | FP Mitigation for V1 |
| Forta V3 | 02/06/2023 | 87.78% | 55.195% | 62.077% | TODO | FP Mitigation for V2 |
| Aleno V1 | 18/12/2023 | 81.17% | 87% | 84% | 68 | No FP mitigation yet |

  • For Forta models, average precision and recall were calculated via stratified 5-fold cross-validation with a decision threshold of 0.5. Aleno results were obtained on a test dataset instead: because SMOTE alters the training data, the datasets are not the same and the results cannot be compared like for like.
  • Alert rate: the number of daily alerts on Ethereum, averaged over 7 days.
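
For readers less familiar with these metrics, the sketch below shows how they are conventionally computed. The helper names and the toy numbers are illustrative assumptions, not the evaluation code behind the table; treating a score equal to the threshold as malicious is likewise an assumption.

```python
def is_malicious(score, threshold=0.5):
    """Apply a decision threshold to a model score (>= is an assumption)."""
    return score >= threshold


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


def alert_rate(daily_alert_counts):
    """Average number of alerts per day over the given days."""
    return sum(daily_alert_counts) / len(daily_alert_counts)


# Toy numbers, not taken from the table above.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
weekly_rate = alert_rate([10, 12, 8, 11, 9, 10, 10])
```

Note that an F1 score averaged over several folds is not necessarily the harmonic mean of the averaged precision and recall, so the averaged columns in the table need not satisfy the formula exactly.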

Improvements

  1. Chain-Specific Models: The model is currently trained exclusively on Ethereum smart contracts. To enhance its effectiveness, it may be beneficial to create machine learning models tailored to each blockchain, trained on chain-specific smart contracts. For instance, a dedicated model could be trained for Binance Smart Chain (BSC) contracts, taking into account the unique characteristics of that blockchain.

  2. Dynamic Management of Known Contracts: The model currently operates only on unknown contracts, while known contract types are identified based on their signature. However, the list of known contracts is static. To improve the model's adaptability, consider implementing a mechanism to dynamically update the list of known contract hashes. This can be achieved by regularly adding new frequently encountered contract hashes to common_contract_hash_set.json, ensuring that the model remains up-to-date with emerging contract types.

  3. False Positive Mitigation: Improve false positive mitigation by analyzing cross-chain funding and by querying an API to check whether contract deployers have known labels. An open question is how to improve the mitigation that relies on the Etherscan API for verified contracts: verification on Etherscan can lag behind the bot, so a contract may initially trigger a CRITICAL alert that a second, delayed bot could later downgrade.
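
As a rough illustration of the dynamic update proposed in item 2, the sketch below adds frequently observed contract hashes to the known-hash file. Several things here are assumptions rather than facts about the repository: that common_contract_hash_set.json holds a JSON array of hash strings, the function name `update_known_hashes`, and the `min_count` cut-off used to decide what counts as "frequently encountered".

```python
import json
from collections import Counter
from pathlib import Path


def update_known_hashes(path, observed_hashes, min_count=3):
    """Add hashes seen at least `min_count` times to the JSON file at `path`.

    Assumes the file contains a JSON array of hash strings (hypothetical format).
    Returns the set of newly added hashes.
    """
    path = Path(path)
    known = set(json.loads(path.read_text())) if path.exists() else set()

    counts = Counter(observed_hashes)
    frequent = {h for h, c in counts.items() if c >= min_count}

    added = frequent - known
    if added:
        # Sorted output keeps the file stable across runs.
        path.write_text(json.dumps(sorted(known | added), indent=2))
    return added
```

A job like this could run on a schedule against the hashes of recently deployed contracts, so that newly common contract types stop reaching the model.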

Supported Chains

  • Ethereum
  • BSC
  • Polygon
  • Optimism
  • Arbitrum
  • Avalanche
  • Fantom

Alerts

  • SUSPICIOUS-CONTRACT-CREATION

    • Fired when a new non-token and non-proxy contract is predicted as malicious.
    • Metadata will include the following:
      • Link to OKO Contract Explorer to review decompiled contract code and ABI. This only works for Ethereum.
      • Function sighashes
      • ML model score and threshold
      • Addresses observed in the created contract (either through storage or static analysis)
      • Any wallet tags associated with the addresses. The bot queries the wallet tags from Luabase. This only works for Ethereum.
    • Finding type: Suspicious
    • Finding severity: High
    • Attack Stage: Preparation
  • SUSPICIOUS-CONTRACT-CREATION-SUSPICIOUS-FUNDING

    • Fired when a new non-token and non-proxy contract is predicted as malicious and the funding analysis of its deployer is suspicious (for example, funding through privacy tools such as Tornado Cash).
    • Metadata will include the following:
      • Link to OKO Contract Explorer to review decompiled contract code and ABI. This only works for Ethereum.
      • Function sighashes
      • ML model score and threshold
      • Addresses observed in the created contract (either through storage or static analysis)
      • Any wallet tags associated with the addresses. The bot queries the wallet tags from Luabase. This only works for Ethereum.
    • Finding type: Suspicious
    • Finding severity: Critical
    • Attack Stage: Preparation
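
The two alerts differ only in whether the deployer's funding looks suspicious. A minimal sketch of that decision logic follows. It is deliberately independent of any SDK: the `Finding` dataclass, the `build_finding` function, and its parameters are hypothetical names, and comparing the score with `>=` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    alert_id: str
    severity: str
    finding_type: str = "Suspicious"
    attack_stage: str = "Preparation"


def build_finding(score, threshold, is_token, is_proxy, suspicious_funding) -> Optional[Finding]:
    """Return the alert for a newly created contract, or None if no alert applies."""
    # Token and proxy contracts are excluded from both alerts.
    if is_token or is_proxy:
        return None
    # Only contracts the model predicts as malicious raise an alert.
    if score < threshold:
        return None
    if suspicious_funding:
        return Finding("SUSPICIOUS-CONTRACT-CREATION-SUSPICIOUS-FUNDING", "Critical")
    return Finding("SUSPICIOUS-CONTRACT-CREATION", "High")


example = build_finding(score=0.9, threshold=0.5, is_token=False,
                        is_proxy=False, suspicious_funding=True)
```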