Inside Our VuleReSol Agent Architecture

Our first autonomous AI agents, VuleReSol, carry out complex security attacks on Ethereum L1 protocols that traditionally required expert human engineers. They run 24/7, either on your GitHub repo or against a protocol deployed on-chain, continuously looking for vulnerabilities, generating attack ideas, and developing exploits and proofs of concept.

System Architecture Diagram

Core System Components

  1. Navigator:
    • Interfaces with Ethereum nodes and source code repositories
    • Uses Foundry, Hardhat, and other tools to compile code and repositories
    • Maintains comprehensive contract mappings and inheritance hierarchies
    • Tracks function-level code coverage status (Unknown/Seen/Researched)
    • Provides low-level access to contract functions, state variables, and entry points
  2. Explorer Agent:
    • The "manager" agent, responsible for traversing the codebase, delegating ideas to subagents, and managing tasks for the whole system
    • Traverses the codebase via the Navigator (above): entry points, contracts, functions, and state variables
  3. PoC Exploit Agents:
    • Receive potential vulnerability ideas from the Explorer Agent
    • Actively deploy contracts to a local Ethereum node to validate exploits
    • Only report confirmed vulnerabilities that were successfully exploited
    • Filter out false positives through hands-on validation
  4. LLM Endpoint:
    • The engine behind every agent; generates the function calls each agent executes
    • OpenAI-compatible API endpoint serving our own fine-tuned models
    • Llama 3.2 fine-tuned on Solidity code, a vulnerabilities dataset, and function calling
  5. Relational DB:
    • Stores and indexes discovered vulnerabilities
    • Maintains historical analysis data and patterns
  6. Vector DB:
    • Enables semantic search across thousands of past vulnerabilities (and, in the future, all contracts deployed on Ethereum) for use by the agents
    • Tracks relationships between similar exploit patterns
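
As an illustration of the function-level coverage tracking the Navigator performs, here is a minimal sketch in Python. It is not our production code: the class names, the "Contract.function" key format, and the rule that a status only moves forward (Unknown, then Seen, then Researched) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Coverage(Enum):
    UNKNOWN = 0      # never visited
    SEEN = 1         # read during traversal
    RESEARCHED = 2   # analyzed in depth


@dataclass
class CoverageTracker:
    # Maps "Contract.function" to its current status (hypothetical key format).
    status: dict[str, Coverage] = field(default_factory=dict)

    def register(self, function_id: str) -> None:
        # Newly discovered functions start as UNKNOWN.
        self.status.setdefault(function_id, Coverage.UNKNOWN)

    def mark(self, function_id: str, new: Coverage) -> None:
        # Assumption: status only moves forward, never back.
        current = self.status.get(function_id, Coverage.UNKNOWN)
        if new.value > current.value:
            self.status[function_id] = new

    def pending(self) -> list[str]:
        # Functions not yet researched, least-covered first.
        todo = [f for f, s in self.status.items() if s is not Coverage.RESEARCHED]
        return sorted(todo, key=lambda f: (self.status[f].value, f))


tracker = CoverageTracker()
for fn in ("Vault.deposit", "Vault.withdraw", "Token.transfer"):
    tracker.register(fn)
tracker.mark("Vault.withdraw", Coverage.SEEN)
tracker.mark("Token.transfer", Coverage.RESEARCHED)
tracker.mark("Token.transfer", Coverage.SEEN)  # ignored: would move backward
print(tracker.pending())  # ['Vault.deposit', 'Vault.withdraw']
```

A queue like pending() is one way a manager agent could decide which function to look at next, so that no entry point stays unvisited.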

How It Works: An Example Flow

  1. The process begins with a trigger - either automatically via GitHub Actions/scheduler, or manually through our dashboard.
  2. We retrieve the target code, which can come from either:
    • An on-chain contract address
    • A GitHub repository
  3. The system analyzes and compiles the codebase:
    • Detects existing development frameworks (Hardhat/Foundry)
    • Injects our own framework if needed
    • Ensures successful compilation
  4. The Explorer Agent activates and, using the Navigator component:
    • Scans for referenced protocol addresses
    • Fetches and compiles all dependent protocols
    • Maps the entire ecosystem of contracts
  5. Code traversal begins:
    • Starts with external/public functions
    • Systematically walks through all code paths
    • Identifies potential vulnerabilities through pattern matching (Vector DB) and o1-style reasoning
  6. When the Explorer finds something interesting:
    • Creates a detailed delegation report (what/where/how)
    • Spawns a PoC Exploit Agent
  7. The PoC Exploit Agent:
    • Loads relevant code from Navigator
    • Deploys it to a local Ethereum node
    • Iteratively attempts exploitation through self-playing interactions
    • Either concludes that no vulnerability exists or successfully exploits it
    • On success: creates a finding, alerts the team, and proposes fixes
  8. Throughout this process:
    • Agents run in continuous loops against the LLM endpoint
    • The Explorer manages the task list, and users can inject additional tasks into it
    • Users can inspect the sandbox environments where the agents are attempting exploits
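
The delegation step in the flow above (steps 6 and 7) can be sketched roughly as follows. This is a simplified illustration, not the actual implementation: DelegationReport, Finding, run_poc_agent, explore, and the max_attempts limit are hypothetical names, and fake_attempt stands in for deploying to a local Ethereum node and driving the exploit with the LLM.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DelegationReport:
    what: str   # suspected vulnerability class
    where: str  # contract and function under suspicion
    how: str    # suggested line of attack


@dataclass
class Finding:
    report: DelegationReport
    attempts: int  # number of tries before the exploit succeeded


# An attempt takes the report and the attempt number and returns True
# only if the exploit actually worked in the sandbox (stubbed below).
Attempt = Callable[[DelegationReport, int], bool]


def run_poc_agent(report: DelegationReport, attempt: Attempt,
                  max_attempts: int = 5) -> Optional[Finding]:
    # Iterate until the exploit works or the attempt budget runs out.
    for n in range(1, max_attempts + 1):
        if attempt(report, n):
            return Finding(report, attempts=n)
    return None  # unconfirmed ideas are dropped, not reported


def explore(ideas: list[DelegationReport], attempt: Attempt) -> list[Finding]:
    # The Explorer spawns one PoC run per idea and keeps confirmed findings only.
    findings = []
    for report in ideas:
        finding = run_poc_agent(report, attempt)
        if finding is not None:
            findings.append(finding)
    return findings


def fake_attempt(report: DelegationReport, n: int) -> bool:
    # Stand-in for a real sandbox run: reentrancy "succeeds" on the third try.
    return report.what == "reentrancy" and n == 3


ideas = [
    DelegationReport("reentrancy", "Vault.withdraw", "re-enter before balance update"),
    DelegationReport("integer overflow", "Token.mint", "mint near uint256 max"),
]
confirmed = explore(ideas, fake_attempt)
print([(f.report.where, f.attempts) for f in confirmed])  # [('Vault.withdraw', 3)]
```

Returning None for ideas that never succeed mirrors the rule that only successfully exploited vulnerabilities are reported, which is what filters out false positives.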

Technical Roadmap 2025

  • L2 Ecosystem Expansion:
    • Support for Optimism, Base, and other EVM-compatible L2s
  • Solana Smart Contract Analysis:
    • Starting with Rust programs, then C, Python and Anchor
  • WebGent Development:
    • Full-stack web application vulnerability engineer covering APIs and frontends
  • Dataset and Model Improvements:
    • Expansion of vulnerabilities dataset for Solidity
    • Enhanced agent decision-making models
    • Automated exploit generation improvements
  • Twitter Personality:
    • Connecting our agent to a few Twitter handles to engage with the community
    • It will run against active (undeployed) bounties, and comments can trigger further investigation
    • Think @truth_terminal, but in the style of @zachxbt

Platform walkthrough: