CoinJelly Report:
Author: accelxr, 1KX; Translator: 0xjs@
The primary uses of generative models today are content creation and information filtering. However, recent research and discussion around AI agents (autonomous actors that use external tools to accomplish user-defined goals) suggest that giving AI an economic channel, much as the internet gained one in the 1990s, could unlock a substantial share of AI's capabilities.
For this, agents need to be able to hold and control assets, as traditional financial systems are not set up for them.
This is where crypto comes into play: crypto provides a digitized payment and ownership layer with fast settlement, making it particularly suitable for building AI agents.
In this article, I will introduce the concept of agents and agent architectures, show through research examples that agents exhibit emergent properties beyond those of ordinary LLM usage, and discuss projects building solutions or products around crypto-based agents.
What are Agents?
AI agents are LLM-driven entities capable of planning and taking actions to achieve goals through multiple iterations.
Agent architectures consist of individual or multiple agents working together to solve problems.
Typically, each agent is given a persona and can use various tools that help it work independently or as part of a team.
Agent architecture differs from how we usually interact with LLMs today:
Zero-shot prompting is how most people interact with these models today: you enter a prompt, and the LLM generates a response from its pre-existing knowledge.
In an agent architecture, you initialize a goal, and the LLM breaks it into subtasks, autonomously prompting itself (or other models) to complete each subtask until the goal is achieved.
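This loop can be sketched in a few lines of Python. Everything here is hypothetical scaffolding: `llm()` stands in for a real model API call and returns canned responses, so only the control flow of the agent loop is illustrated.

```python
# Minimal sketch of an agent loop: decompose a goal, then self-prompt per subtask.

def llm(prompt: str) -> str:
    """Stub model call: 'decomposes' a goal, then 'completes' each subtask."""
    if prompt.startswith("DECOMPOSE"):
        return "research venue; send invitations; buy supplies"
    return f"done: {prompt}"

def run_agent(goal: str) -> list:
    # 1. The agent asks the model to break the goal into subtasks.
    subtasks = [s.strip() for s in llm(f"DECOMPOSE: {goal}").split(";")]
    # 2. It then prompts itself once per subtask until the goal is achieved.
    return [llm(task) for task in subtasks]

print(run_agent("host a party"))
```

In a real system the loop would also re-plan when a subtask fails; this sketch keeps only the decompose-then-execute skeleton.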
Single-Agent Architecture vs. Multi-Agent Architecture
Single-Agent Architecture: A language model performs all reasoning, planning, and tool execution. There is no feedback mechanism from other agents, but humans can choose to provide feedback to the agent.
Multi-Agent Architecture: These architectures involve two or more agents, each of which can use the same language model or a set of different language models. Agents can use the same or different tools. Each agent typically has its own role.
Vertical structure: one agent acts as a leader and the other agents report to it. This helps organize the group's output.
Horizontal structure: one large group discussion about the task, in which each agent can see the others' messages and voluntarily contribute to completing the task or invoke tools.
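The vertical structure can be sketched as a leader agent that delegates to workers and merges their reports. The `worker_llm` and `leader_llm` functions below are hypothetical stand-ins for separate model calls, not any real API.

```python
# Sketch of a vertical multi-agent structure: workers report, a leader organizes.

def worker_llm(role: str, task: str) -> str:
    # Stub: each worker answers from its role's perspective.
    return f"[{role}] analysis of {task}"

def leader_llm(task: str, reports: list) -> str:
    # Stub leader: merges the workers' reports into one organized answer.
    return f"summary of {task}: " + " | ".join(reports)

def vertical_team(task: str, roles=("researcher", "critic")) -> str:
    reports = [worker_llm(r, task) for r in roles]
    return leader_llm(task, reports)

print(vertical_team("audit the contract"))
```

A horizontal structure would instead share one message list that every agent reads and appends to, with no final leader call.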
Agent Architecture: Profiles
Agents have profiles, or personas: role definitions supplied as prompts that shape the LLM's behavior and skills. Which profile works best depends heavily on the application.
Many people already use this as a prompting technique today: “You are a nutrition expert. Provide me with a meal plan…”. Interestingly, assigning a role to an LLM can improve its output relative to a baseline.
Profiles can be created through:
Manual Creation: Profiles manually specified by human creators; most flexible but time-consuming.
LLM-generated: Profiles generated by an LLM from a set of rules about composition and attributes, plus (optionally) a few seed examples.
Dataset Alignment: Profiles generated based on real-world human dataset.
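The first two creation methods can be illustrated with a small sketch. The function names and fields are illustrative only, not from any particular framework; the second function merely assembles the generation prompt that would be sent to an LLM.

```python
# Sketch of profile construction: manual vs. LLM-generated (rules + seed examples).

def manual_profile(role: str, traits: list) -> str:
    # Hand-written profile: most flexible, most labor-intensive.
    return f"You are a {role}. Traits: {', '.join(traits)}."

def llm_generation_prompt(rules: str, seed_examples: list) -> str:
    # Prompt an LLM would receive to generate new profiles from rules + examples.
    shots = "\n".join(f"Example profile: {e}" for e in seed_examples)
    return f"Generate an agent profile.\nRules: {rules}\n{shots}"

system_prompt = manual_profile("nutrition expert", ["concise", "evidence-based"])
print(system_prompt)
```

Dataset alignment would replace the hand-picked seed examples with profiles sampled from a real-world human dataset.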
Agent Architecture: Memory
Agent memory stores information perceived from the environment and utilizes this information to formulate new plans or actions. Memory allows agents to self-evolve and act based on their experiences.
Unified Memory: Analogous to short-term memory, implemented through in-context learning/persistent prompting. All relevant memories are passed to the agent in every prompt. Limited primarily by the context window size.
Hybrid: Short-term + long-term memory. Short-term memory is a temporary buffer of the current state. Reflective or useful long-term information is stored permanently in a database. There are several ways to achieve this, but a common approach is using a vector database (encoding memories as embeddings and storing; recall comes from similarity search).
Formats: Natural language, databases (e.g., SQL queries understood through fine-tuning), structured lists, embeddings.
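A hybrid memory can be sketched with a rolling short-term buffer that evicts old observations into a long-term store queried by similarity. To stay self-contained, this sketch uses toy bag-of-words vectors and cosine similarity in place of a real embedding model and vector database.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words counts (a real system would use a model).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class HybridMemory:
    def __init__(self, short_term_size: int = 3):
        self.short_term = []   # rolling buffer of recent observations
        self.long_term = []    # (embedding, text) pairs: the "vector DB"
        self.size = short_term_size

    def observe(self, text: str) -> None:
        self.short_term.append(text)
        if len(self.short_term) > self.size:
            evicted = self.short_term.pop(0)
            self.long_term.append((embed(evicted), evicted))

    def recall(self, query: str, k: int = 2) -> list:
        # Similarity search over long-term memory.
        q = embed(query)
        ranked = sorted(self.long_term, key=lambda m: cosine(q, m[0]), reverse=True)
        return [text for _, text in ranked[:k]]
```

Each new prompt would then combine the short-term buffer with the top-k recalled long-term memories.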
Agent Architecture: Planning
Complex tasks are broken down into simpler subtasks to be solved individually.
Feedback-less planning:
In this approach, agents receive no post-action feedback that influences their future behavior. An example is Chain of Thought (CoT), where the LLM is encouraged to articulate its reasoning process while producing an answer.
Single-path reasoning (e.g., zero-shot CoT)
Multi-path reasoning (e.g., self-consistency CoT, where multiple CoT threads are sampled and the most frequent answer is chosen)
External planners (e.g., Planning Domain Definition Language)
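The self-consistency idea amounts to sampling several reasoning chains and taking a majority vote over their final answers. In this sketch, `sample_cot` is a stub standing in for a model sampled at nonzero temperature, returning a fixed sequence of answers so the vote is deterministic.

```python
from collections import Counter

def sample_cot(question: str, i: int) -> str:
    # Stub: the final answer of the i-th sampled reasoning chain.
    return ["42", "41", "42", "43", "42"][i % 5]

def self_consistent_answer(question: str, n: int = 5) -> str:
    # Majority vote over the final answers of n sampled CoT threads.
    votes = Counter(sample_cot(question, i) for i in range(n))
    return votes.most_common(1)[0][0]

print(self_consistent_answer("what is 6 * 7?"))
```

With a real model, each sample would be an independent completion and the vote smooths out occasional faulty reasoning chains.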
Feedback-driven planning:
Refining subtasks based on external feedback
Environmental feedback (e.g., game task completion signal)
Human feedback (e.g., soliciting feedback from users)
Model feedback (e.g., soliciting feedback from another LLM – crowdsourcing)
Agent Architecture: Action
The action module translates the agent’s decisions into concrete outcomes.
Action goals can take various forms, such as:
Task completion (e.g., crafting an iron pickaxe in Minecraft)
Communication (e.g., sharing information with another agent or human)
Environment exploration (e.g., exploring its own action space and learning its capabilities).
Actions are usually generated from memory recall or plan-following, and the action space consists of internal knowledge, APIs, databases/knowledge bases, and external models.
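One common way to represent such an action space is a registry mapping action names to callable tools. The tools below are toy stand-ins for real APIs, knowledge bases, or external models.

```python
# Sketch of an action space as a tool registry the agent can dispatch into.

def calculator(expr: str) -> str:
    # Evaluate simple arithmetic with builtins stripped for safety.
    return str(eval(expr, {"__builtins__": {}}))

def kb_lookup(key: str) -> str:
    knowledge = {"capital_of_france": "Paris"}  # stand-in knowledge base
    return knowledge.get(key, "unknown")

TOOLS = {"calculator": calculator, "kb_lookup": kb_lookup}

def act(action: str, argument: str) -> str:
    """Translate an agent decision (action name + argument) into an outcome."""
    tool = TOOLS.get(action)
    return tool(argument) if tool else f"unknown action: {action}"

print(act("calculator", "2+3"))
```

The LLM's role is to emit the `(action, argument)` pair; the registry keeps execution constrained to known tools.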
Agent Architecture: Skill Acquisition
For agents to correctly execute actions within the action space, they must possess task-specific skills. There are primarily two ways to achieve this:
Through fine-tuning: Training agents on manually annotated, LLM-generated, or real-world example behavior datasets.
Without fine-tuning: Using LLM’s innate abilities through more sophisticated prompt engineering and/or mechanism engineering (i.e., combining external feedback or accumulated experience during iterative experiments).
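The "without fine-tuning" route can be sketched as mechanism engineering: successful attempts are accumulated and replayed as few-shot examples in later prompts. The class and field names below are illustrative, not from any particular framework.

```python
# Sketch: skill acquisition without fine-tuning, via an experience library
# whose successful trajectories become few-shot examples for future prompts.

class ExperienceLibrary:
    def __init__(self):
        self.examples = []  # (task, successful_solution) pairs

    def record(self, task: str, solution: str, succeeded: bool) -> None:
        # Only successes are kept; failures are discarded.
        if succeeded:
            self.examples.append((task, solution))

    def build_prompt(self, new_task: str, k: int = 2) -> str:
        # Replay the k most recent successes as few-shot examples.
        shots = "\n".join(f"Task: {t}\nSolution: {s}"
                          for t, s in self.examples[-k:])
        return f"{shots}\nTask: {new_task}\nSolution:"
```

Over many iterations the library grows, so the agent's effective skill improves without any weight updates.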
Examples of Agents in Literature
Generative Agents: Interactive Simulacra of Human Behavior: instantiates generative agents in a virtual sandbox environment, demonstrating emergent social behavior in a multi-agent system. Starting from a single user-specified prompt about an upcoming Valentine’s Day party, the agents autonomously send invitations, make new friends, go on dates, and coordinate to show up at the party at the right time. You can try it yourself with a16z’s AI Town implementation.
Describe, Explain, Plan and Select (DEPS): the first zero-shot multi-task agent capable of completing 70+ Minecraft tasks.
Voyager: The first LLM-driven lifelong learning agent in Minecraft, continuously exploring the world, acquiring various skills, and making new discoveries without human intervention. It improves its skill execution code based on feedback from iterative trials.
CALYPSO: An agent designed for the game “Dungeons & Dragons” to assist dungeon masters in storytelling and world creation. Its short-term memory is built on scene descriptions, monster information, and previous summaries.
Ghost in the Minecraft (GITM): a capable Minecraft agent, with a 67.5% success rate at obtaining diamonds and a 100% unlock rate for all items in the game.
SayPlan: Large-scale task planning for robots based on LLM, using 3D scene graph representation, demonstrating the ability to perform long-term task planning based on abstractions and natural language instructions.
HuggingGPT: Task planning using ChatGPT based on user prompts, selecting models based on the descriptions on Hugging Face, and executing all subtasks, achieving impressive results in language, vision, speech, and other challenging tasks.
MetaGPT: Takes a one-line requirement as input and outputs user stories / competitive analysis / requirements / data structures / APIs / documents, etc. Internally, multiple agents play the various roles of a software company.
ChemCrow: An LLM-based chemistry agent designed for tasks such as organic synthesis, drug discovery, and materials design, using 18 expert-designed tools. It autonomously planned and executed the syntheses of an insect repellent and three organocatalysts, and guided the discovery of a novel chromophore.
BabyAGI: General infrastructure for creating, prioritizing, and executing tasks using OpenAI and vector databases (e.g., Chroma or Weaviate).
AutoGPT: Another example of a general infrastructure for bootstrapping LLM agents.
Examples of Agents in Crypto
(Note: Not all examples are LLM-based + some may be loosely based on the concept of agents)
FrenRug from Ritualnet: A GPT-4-based “Turkish carpet salesman” game. FrenRug is a dealer, and anyone can try to convince him to buy their Friend.tech Key. Each user message is passed to multiple LLMs running on different Infernet nodes. The nodes respond on-chain, and whether the agent buys the proposed Key is determined by a vote among the LLMs. Once enough responses arrive, the votes are aggregated and a supervised classifier model determines the action; the classifier’s decision is submitted on-chain together with a validity proof, enabling verification of the off-chain classifier execution.
Autonolas’ Use in Prediction Markets on the Gnosis Network
These AI bots are essentially smart-contract wrappers around AI services that anyone can invoke by paying a fee and posing a question. The service monitors requests, performs the task, and returns the answer on-chain. This bot infrastructure has been extended to prediction markets through Omen, the basic idea being that agents actively monitor the news, analyze it, and bet on predictions, so that aggregated forecasts converge toward the true odds. Agents scan the markets on Omen, autonomously pay the “bots” for predictions on a topic, and trade in those markets.
ianDAO’s GPT<>Safe Demo
The GPT autonomously manages USDC from its own Safe multisig wallet on Base using Syndicate’s Transaction Cloud API. Users can interact with it and suggest how best to deploy its capital, which it may allocate according to those suggestions.
Game Agents
There are many ideas here, but in short, AI agents in virtual environments can act both as companions (such as AI NPCs in “Skyrim”) and as competitors (such as a group of chubby penguins). Agents can autonomously execute profit strategies, provide goods and services (shopkeepers, traveling merchants, generative quest-givers), or serve as semi-playable characters, as in Parallel Colony and AI Arena.
Safe Guardians
A set of AI agents that monitor wallets, defend against potential threats, and protect user funds. Features include automatically revoking contract approvals and withdrawing funds when anomalies or hacks are detected.
Botto
While Botto is an on-chain agent only in a loose sense, it showcases the concept of an autonomous on-chain artist whose creations are voted on by token holders and auctioned on SuperRare. Many extensions are imaginable with a multimodal agent architecture.
Some Notable Agent Projects
(Note: Not all projects are LLM-based, and some may loosely be based on the concept of agents)
AIWay Finder – A decentralized knowledge graph of protocols, contracts, contract standards, assets, functions, API capabilities, routines, and pathways (i.e., virtual routes within blockchain ecosystems) that pathfinder agents can navigate. Users are rewarded for identifying viable paths used by agents. Additionally, one can forge shells (i.e., agents) with character settings and skill activations that can be plugged into the pathfinder knowledge graph.
Ritualnet – As shown in the previous example of frenrug, Ritual infernet nodes can be used to set up multi-agent architectures. The nodes listen to on-chain or off-chain requests and provide outputs with optional proofs.
Morpheus – A peer-to-peer network for personal general AI that can execute smart contracts on behalf of users. This can be used for web3 wallet and transaction intent management, data parsing through a chatbot interface, recommendation models for dapps and contracts, and extending agent operations by connecting applications and user data with long-term memory.
Dain Protocol – Exploring various use cases for deploying agents on Solana. Recently, a deployment of an automated trading bot that extracts on-chain and off-chain information to execute on behalf of users (e.g., selling BODEN if Biden loses) was demonstrated.
Naptha – An agent orchestration protocol with an on-chain task market for contracting agents, operator nodes for orchestrating tasks, an LLM workflow orchestration engine supporting asynchronous message passing across different nodes, and a workflow proof system for verification of execution.
Myshell – An AI character platform similar to Character.ai, where creators can monetize agent profiles and tools. It includes a multimodal infrastructure with interesting example agents for translation, education, companionship, coding, and more. It offers simple no-code agent creation as well as more advanced developer modes for assembling AI widgets.
AI Arena – A competitive PvP fighting game where players buy, train, and battle AI-powered NFTs. Players train their NFT agents through imitation learning: the AI learns to play across different maps and scenarios by probabilistically modeling observed player behavior. After training, players deploy their agents in ranked battles to earn token rewards. While not LLM-based, it is still an interesting example of what agent-driven games can be.
Virtuals Protocol – A protocol for building and deploying multimodal agents into games and other online spaces. The three main prototypes of today’s virtuals include IP character mirroring, specialized function agents, and personal avatars. Contributors provide data and models to the virtuals, and validators act as gatekeepers. There is an economic incentive mechanism to promote development and monetization.
Brianknows – Provides a user interface for interacting with agents that can execute transactions, research crypto-specific information, and deploy smart contracts. It currently supports 10+ operations across 100+ integrations. A recent example let an agent stake ETH into Lido on a user’s behalf via natural language.
Autonolas – Provides lightweight local and cloud-hosted agents, consensus-operated decentralized agents, and specialized agent economies. Prominent examples include prediction agents for DeFi and prediction markets, AI-driven governance delegates, and an agent-to-agent tool marketplace. Offers a protocol plus the OLAS stack for coordinating and incentivizing agent operations, and an open-source framework for developers to build collectively owned agents.
Creator.Bid – Provides social media character agents connected to X and Farcaster real-time APIs. Brands can launch knowledge-based agents that execute brand-consistent content on social platforms.
Polywrap – Offers various agent-based products, such as Indexer (a social media agent for Farcaster), AutoTx (a planning and transaction execution agent built with Morpheus and flock.io), predictionprophet.ai (a prediction agent with Gnosis and Autonolas), and fundpublicgoods.ai (an agent for resource allocation funding).
Validation – Since economic flows will be directed by agents, validating their outputs will be crucial (to be detailed in future articles). Approaches include Ora Protocol, zkML from teams such as Modulus Labs, Giza, and EZKL, game-theoretic solutions, and hardware-based solutions such as TEEs (trusted execution environments).
Ideas for On-Chain Agents
– Ownable, tradable, token-gated agents capable of performing various types of functions, from companionship to financial applications.
– Agents that can represent, learn, and participate in the game economy by identifying and learning from human behavior.
– Agents that can simulate real human behavior for profit opportunities.
– Multi-agent managed wallets that act as autonomous asset managers.
– AI-managed DAO governance, such as token delegation, proposal creation or management, process improvement, etc.
– Knowledge graph of existing and new protocols’ interactions and APIs.
– Autonomous guardian networks, multi-signature security, smart contract security, and feature enhancements.
– Truly autonomous investment DAOs, such as collector DAOs with roles like art historians, investment analysts, data analysts, and degen agent roles.
– Token economics and contract security simulation and testing.
– General intent management, especially for crypto user experiences such as bridging or DeFi.
– Art or experimental projects.
– Onboarding the next billion users.
As Jesse Walden, co-founder of Variant Fund, recently put it, autonomous agents are an evolution of blockchain use cases rather than a revolution: we already have protocol bots, snipers, MEV searchers, and bot toolkits. Agents are simply an extension of all of this.
Many parts of crypto are built in ways that favor agent execution, such as fully on-chain games and DeFi. Assuming LLM costs keep falling relative to task performance, and creating and deploying agents keeps getting easier, it is hard to imagine AI agents not coming to dominate on-chain interactions and serving as the gateway for the next billion crypto users.
Reading Material:
– AI Agents That Can Bank Themselves Using Blockchains
– The new AI agent economy will run on Smart Accounts
– A Survey on Large Language Model based Autonomous Agents (I used this for identifying the taxonomy of agentic architectures above, highly recommend)
– ReAct: Synergizing Reasoning and Acting in Language Models
– Generative agents: Interactive simulacra of human behavior
– Reflexion: Language Agents with Verbal Reinforcement Learning
– Toolformer: Language Models Can Teach Themselves to Use Tools
– Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
– Voyager: An Open-Ended Embodied Agent with Large Language Models
– LLM Agents Papers GitHub Repo