Inside Our VuleReSol Agent Architecture
Our first autonomous AI agents, VuleReSol, carry out complex security attacks on Ethereum L1 protocols, work that has traditionally required expert human engineers. They run 24/7 against either your GitHub repository or a protocol already deployed on-chain, continuously looking for vulnerabilities, generating attack ideas, and developing exploits and proofs of concept.
Core System Components
- Navigator:
- Interfaces with Ethereum nodes and source code repositories
- Uses Foundry, Hardhat, and other tools to compile contracts and entire repositories
- Maintains comprehensive contract mappings and inheritance hierarchies
- Tracks function-level code coverage status (Unknown/Seen/Researched)
- Provides low-level access to contract functions, state variables, and entry points
- Explorer Agent:
- The "manager" agent, responsible for traversing the codebase, delegating ideas to subagents, and managing tasks for the whole system
- Traverses the codebase through the Navigator: entry points, contracts, functions, and state variables
- PoC Exploit Agents:
- Receive potential vulnerability ideas from the Explorer Agent
- Actively deploy contracts to a local Ethereum node to validate exploits
- Report only confirmed vulnerabilities that were successfully exploited
- Filter out false positives through hands-on validation
- LLM Endpoint:
- The engine of every agent, generating the function calls each agent executes
- An OpenAI-compatible API endpoint serving our own fine-tuned models
- Llama 3.2, fine-tuned on Solidity code, a vulnerability dataset, and function-calling data
- Relational DB:
- Stores and indexes discovered vulnerabilities
- Maintains historical analysis data and patterns
- Vector DB:
- Enables the agents to run semantic search across thousands of past vulnerabilities (and, in the future, across all contracts deployed on Ethereum)
- Tracks relationships between similar exploit patterns
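The Navigator's three coverage states (Unknown/Seen/Researched) map naturally onto a small forward-only state machine. A minimal Python sketch, assuming coverage is keyed by a (contract, function) pair and never moves backwards; the class and method names are hypothetical, not VuleReSol's actual code:

```python
from enum import Enum


class Coverage(Enum):
    """Function-level coverage states named in the Navigator description."""
    UNKNOWN = "Unknown"
    SEEN = "Seen"
    RESEARCHED = "Researched"


# Hypothetical ordering: a function's status only ever moves forward.
_ORDER = {Coverage.UNKNOWN: 0, Coverage.SEEN: 1, Coverage.RESEARCHED: 2}


class CoverageTracker:
    """Tracks coverage per (contract, function) pair."""

    def __init__(self) -> None:
        self._status: dict[tuple[str, str], Coverage] = {}

    def status(self, contract: str, function: str) -> Coverage:
        # Functions the tracker has never recorded default to UNKNOWN.
        return self._status.get((contract, function), Coverage.UNKNOWN)

    def mark(self, contract: str, function: str, new: Coverage) -> None:
        # Ignore downgrades so a researched function is not re-queued as merely seen.
        if _ORDER[new] > _ORDER[self.status(contract, function)]:
            self._status[(contract, function)] = new

    def pending(self, functions: list[tuple[str, str]]) -> list[tuple[str, str]]:
        # Everything not yet RESEARCHED is still work for the Explorer.
        return [f for f in functions if self.status(*f) is not Coverage.RESEARCHED]
```

Keeping the ordering explicit means a late "Seen" event from one agent cannot erase another agent's completed research.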
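Because the LLM Endpoint is OpenAI-compatible, an agent can request function calls using the Chat Completions "tools" format and read the chosen calls back from the response's "tool_calls" field. The sketch below only builds and parses JSON (no network call is made); the get_function_source tool and whatever model name you pass are hypothetical examples, not VuleReSol's actual tool set:

```python
import json

# Hypothetical tool exposed to the model; the name and schema are illustrative only.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_function_source",
            "description": "Return the Solidity source of one contract function.",
            "parameters": {
                "type": "object",
                "properties": {
                    "contract": {"type": "string"},
                    "function": {"type": "string"},
                },
                "required": ["contract", "function"],
            },
        },
    }
]


def build_request(model: str, messages: list[dict]) -> dict:
    # Request body for POST /v1/chat/completions on an OpenAI-compatible server.
    return {"model": model, "messages": messages, "tools": TOOLS}


def extract_tool_calls(response: dict) -> list[tuple[str, dict]]:
    # Tool calls live on the first choice's message; "arguments" is a JSON-encoded string.
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    return [(c["function"]["name"], json.loads(c["function"]["arguments"])) for c in calls]
```

An agent loop would then dispatch each (name, arguments) pair to the matching Navigator call and append the result to the conversation before asking the model again.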
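The database that stores and indexes discovered vulnerabilities can start as a single indexed table. This SQLite sketch assumes a hypothetical findings schema (contract, function name, bug category, severity, confirmed flag) and shows the kind of historical query such a store could answer; it is not VuleReSol's actual schema:

```python
import sqlite3

# Hypothetical schema; column names are illustrative only.
SCHEMA = """
CREATE TABLE findings (
    id INTEGER PRIMARY KEY,
    contract TEXT NOT NULL,
    function_name TEXT NOT NULL,
    category TEXT NOT NULL,
    severity TEXT NOT NULL,
    confirmed INTEGER NOT NULL DEFAULT 0,
    found_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_findings_category ON findings (category);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_finding(conn: sqlite3.Connection, contract: str, function_name: str,
                category: str, severity: str, confirmed: bool = True) -> int:
    cur = conn.execute(
        "INSERT INTO findings (contract, function_name, category, severity, confirmed) "
        "VALUES (?, ?, ?, ?, ?)",
        (contract, function_name, category, severity, int(confirmed)),
    )
    conn.commit()
    return cur.lastrowid


def counts_by_category(conn: sqlite3.Connection) -> dict[str, int]:
    # Historical pattern query: how often each confirmed bug class has appeared.
    rows = conn.execute(
        "SELECT category, COUNT(*) FROM findings WHERE confirmed = 1 "
        "GROUP BY category ORDER BY category"
    )
    return dict(rows.fetchall())
```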
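The Vector DB's semantic search boils down to ranking stored embeddings by their similarity to a query embedding. The sketch below uses brute-force cosine similarity in plain Python; the Record schema and any toy embeddings you feed it are hypothetical, and a production vector database would replace the linear scan with an approximate nearest-neighbour index while keeping the same scoring idea:

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    """A past vulnerability stored with its embedding (hypothetical schema)."""
    title: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 for a zero vector instead of dividing by zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], records: list[Record], k: int = 3) -> list[tuple[str, float]]:
    # Brute-force scan: score every record, then keep the k most similar.
    scored = [(r.title, cosine(query, r.embedding)) for r in records]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```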
How It Works: An Example Flow
- The process begins with a trigger, fired either automatically (via GitHub Actions or a scheduler) or manually through our dashboard.
- We retrieve the target code, which can come from either:
- An on-chain contract address
- A GitHub repository
- The system analyzes and compiles the codebase:
- Detects existing development frameworks (Hardhat/Foundry)
- Injects our own framework if needed
- Ensures successful compilation
- The Explorer Agent activates and, using the Navigator component:
- Scans for referenced protocol addresses
- Fetches and compiles all dependent protocols
- Maps the entire ecosystem of contracts
- Code traversal begins:
- Starts with external/public functions
- Systematically walks through all code paths
- Identifies potential vulnerabilities through pattern matching against the Vector DB and o1-style reasoning
- When the Explorer finds something interesting:
- Creates a detailed delegation report (what, where, how)
- Spawns a PoC Exploit Agent
- The PoC Exploit Agent:
- Loads relevant code from Navigator
- Deploys it to a local Ethereum node
- Iteratively attempts exploitation through self-play interactions
- Either concludes that no vulnerability exists or successfully exploits it
- On success, creates a finding, alerts the team, and proposes fixes
- Throughout this process:
- Agents run continuous loops against the LLM endpoint
- The Explorer manages a task queue, and users can inject additional tasks into it
- Users can inspect the sandbox environments in which the agents are attempting exploits
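Framework detection can key off the config files each toolchain conventionally places at the project root: foundry.toml for Foundry, and hardhat.config.js or hardhat.config.ts for Hardhat. A minimal sketch, assuming only the repository root is inspected and that Foundry wins when both are present (a hypothetical policy); a None result stands for the "inject our own framework" step, whose details are not described here:

```python
from pathlib import Path
from typing import Optional

# Config files each toolchain conventionally places at the project root.
MARKERS = {
    "foundry": ("foundry.toml",),
    "hardhat": ("hardhat.config.js", "hardhat.config.ts"),
}


def root_files(repo: Path) -> set[str]:
    # Only the top level is inspected; nested projects in a monorepo need a deeper walk.
    return {p.name for p in repo.iterdir() if p.is_file()}


def detect_frameworks(files: set[str]) -> list[str]:
    # Every framework whose marker file is present, in MARKERS order.
    return [name for name, markers in MARKERS.items() if any(m in files for m in markers)]


def build_command(files: set[str]) -> Optional[list[str]]:
    # Hypothetical policy: prefer Foundry, then Hardhat.
    # None means no framework was found, so the caller must inject its own.
    found = detect_frameworks(files)
    if "foundry" in found:
        return ["forge", "build"]
    if "hardhat" in found:
        return ["npx", "hardhat", "compile"]
    return None
```

Typical use is build_command(root_files(Path(repo_dir))), where repo_dir is wherever the target was checked out; keeping detection on a plain set of names makes the policy easy to test without touching the filesystem.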
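Scanning for referenced protocol addresses can begin with a plain pattern match on 20-byte hex literals in the source. This is an assumed first pass, not the Navigator's actual logic: it de-duplicates case-insensitively and skips longer hex constants such as bytes32 values, but it does not validate EIP-55 checksums or discover addresses that only appear in on-chain storage at runtime:

```python
import re

# 0x followed by exactly 40 hex digits; the trailing \b rejects longer hex constants.
ADDRESS_RE = re.compile(r"\b0x[0-9a-fA-F]{40}\b")


def referenced_addresses(source: str) -> list[str]:
    # Lower-case for de-duplication and keep first-seen order (dicts preserve insertion order).
    seen: dict[str, None] = {}
    for match in ADDRESS_RE.finditer(source):
        seen.setdefault(match.group(0).lower(), None)
    return list(seen)
```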
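Starting from external/public functions and walking every code path is, in graph terms, a reachability search over the call graph. A breadth-first sketch, assuming a hypothetical Fn record per function with its visibility and callees already resolved (presumably from the compiler output the Navigator maintains):

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Fn:
    name: str                                       # e.g. "Vault.withdraw"
    visibility: str                                 # "external", "public", "internal" or "private"
    calls: list[str] = field(default_factory=list)  # names of callees


def traversal_order(functions: dict[str, Fn]) -> list[str]:
    # Seed the queue with every externally reachable entry point.
    queue = deque(n for n, f in functions.items() if f.visibility in ("external", "public"))
    seen = set(queue)
    order = []
    while queue:
        name = queue.popleft()
        order.append(name)
        for callee in functions[name].calls:
            # Skip callees outside the map (e.g. unresolved external calls) and repeats.
            if callee in functions and callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return order
```

Breadth-first order visits every entry point before descending into helpers, and internal functions that no entry point reaches never enter the queue.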
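The hand-off from the Explorer to a PoC Exploit Agent can be modelled as a what/where/how report plus a bounded retry loop that returns a finding only on a successful exploit. In this sketch the attempt callable is a hypothetical stand-in for one deploy-and-interact round on the local node, and the five-attempt budget is an arbitrary assumption:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DelegationReport:
    what: str   # suspected issue, e.g. "reentrancy"
    where: str  # location, e.g. "Vault.withdraw"
    how: str    # attack hypothesis handed to the PoC agent


@dataclass
class Finding:
    report: DelegationReport
    attempts: int
    evidence: str


# One attempt takes the report plus the previous round's feedback and
# returns (exploited?, evidence or feedback for the next round).
Attempt = Callable[[DelegationReport, str], tuple[bool, str]]


def run_poc(report: DelegationReport, attempt: Attempt, max_attempts: int = 5) -> Optional[Finding]:
    feedback = ""
    for i in range(1, max_attempts + 1):
        exploited, feedback = attempt(report, feedback)
        if exploited:
            # Only a successful exploit becomes a finding.
            return Finding(report, i, feedback)
    # Budget exhausted: treat the idea as a false positive and report nothing.
    return None
```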
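The Explorer's task list, with room for user-injected work, fits a priority queue. A minimal sketch using heapq, assuming (hypothetically) that user-injected tasks outrank agent-generated ones and that ties are served in insertion order:

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    priority: int                           # lower number = handled sooner
    seq: int                                # insertion counter, breaks ties FIFO
    description: str = field(compare=False)


class TaskQueue:
    def __init__(self) -> None:
        self._heap: list[Task] = []
        self._counter = itertools.count()

    def push(self, description: str, priority: int = 10) -> None:
        heapq.heappush(self._heap, Task(priority, next(self._counter), description))

    def inject(self, description: str) -> None:
        # Hypothetical policy: user-injected tasks jump ahead of agent-generated ones.
        self.push(description, priority=0)

    def pop(self) -> str:
        return heapq.heappop(self._heap).description

    def __len__(self) -> int:
        return len(self._heap)
```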
Technical Roadmap 2025
- L2 Ecosystem Expansion:
- Support for Optimism, Base, and other EVM-compatible L2s
- Solana Smart Contract Analysis:
- Starting with Rust programs, then C, Python and Anchor
- WebGent Development:
- A full-stack web application vulnerability engineer, covering both APIs and frontends
- Dataset and Model Improvements:
- Expansion of the Solidity vulnerability dataset
- Enhanced agent decision-making models
- Automated exploit generation improvements
- Twitter Personality:
- Connecting our agent to a few Twitter handles to engage with the community
- It will run against active (undeployed) bounties, and comments can trigger it to investigate further
- Something like @truth_terminal, but in the style of @zachxbt
Platform walkthrough: