Andrew Hill is CEO and co-founder of Recall Labs, the onchain AI arena for evaluating, ranking, and rewarding the best AI agents. A long-time builder at the intersection of crypto, data, and decentralized infrastructure, Andrew previously co-founded Textile and has been shaping the peer-to-peer web since 2017. With a background in big data and machine learning, and a PhD in Ecology and Evolutionary Biology, he brings a systems-thinking approach to Recall. Andrew is passionate about empowering developers beyond hype cycles and aligning AI with human needs through open, community-driven competition.
[00:00:03] Hello, everybody, and welcome to the Crypto Hipster Podcast. This is your host, Jamil Hasan, the Crypto Hipster, where I interview founders, entrepreneurs, executives, thought leaders, and amazing people all over the world of crypto and blockchain. And today, I have another amazing guest. I keep wanting to say his company name wrong. And I'll tell you why in a minute.
[00:00:24] And the CEO of Recall, Andrew Hill, it's not Total Recall, with Arnold Schwarzenegger, it is Recall. Andrew, welcome to the show. Yeah, thank you very much for having me. Excited to chat. I'm excited too. So let's kick things off and I'll ask you first, you know, what is your background? And is it a logical background for what you're doing now?
[00:00:48] Oh, does anybody ever say it's logical? Yeah, so my background, I did a PhD in biology, actually, so not a clear fit.
[00:01:01] But I was very excited through my PhD at basically trying to turn the large-scale real-time data coming from biology research into systems that created continuous information and knowledge through modeling, through prediction, through mapping, through visualization.
[00:01:27] So I was always doing coding and building applications for biologists to try to put data together in more interesting ways. Interestingly enough, that's actually how I started working with my co-founder. He's an engineer, one of the most talented engineers I've ever known, and I was pulling him in back in the day to help work on research grants with me. But from there, I got very excited about the open source movement.
[00:01:56] And some of the things that we were doing around visualization seemed to have broad application. And so I ended up finishing my PhD and joining a small Spanish startup called, at the time they were called CartoDB. And we were building an open source mapping stack to sort of drive maps on the internet. And it's now called Carto. They're still around and doing some pretty awesome things.
[00:02:25] But even there, I was also, you know, I was kind of doing two roles. I was the chief scientist and really thinking about what algorithms we were using for, like, summarizing and visualizing geospatial data. And I was doing a lot of the early days of DevRel, a lot of the evangelism and just talking to people and understanding their needs.
[00:02:48] But at some point in that journey, I just started noticing that the, like, patterns on the internet were not really matching my value system around data lock-in, data collection, the amount of our information we were giving away to these silos and centralized systems. So even then, I was seeing, you know, use cases for mapping that would have billions of points on the ground moving around trying to get insights. And you realize that there were, like, real people moving around.
[00:03:18] And I wasn't sure how many of those dots were where they were in those databases. And so I left that company thinking about what to build, wanting to build something to start trying to address that. And this is actually, I think, where the real through line becomes clear is taking all of that prior knowledge about building large-scale data systems and doing the modeling.
[00:03:43] The first thing that we tried to build a startup around was actually in 2017, 2016. We were trying to build SDKs for mobile app developers where we were trying to create really lightweight performant models that would run directly on devices.
[00:04:02] With the thesis being that if we could move the modeling down to the device, less data would have to go to these centralized repositories in companies and apps that were trying to play the game the right way. And we had a lot of interest at the time, but we found that the biggest interest we were getting was from companies that wanted the tooling that we were building to actually generate more data for them.
[00:04:28] Because we were doing a lot of interesting, like, data cleaning and data summarization out on the edge. And they just wanted to collect more of that so they could do better modeling and tracking. And so we said, okay, we're just not matching the Web2 model right now. What's going on? Like, how do we fix it? And that's where we really discovered Web3, got hooked on systems like IPFS and decentralized protocols. And that sort of kicked off my journey of saying, like, okay, actually, the ethos seems to be right in Web3.
[00:04:57] Like, what do we need to build? And you can see, like, our original idea wasn't wrong. It was just far too early. And now with, like, Apple's AI kit, they're essentially offering the same thing at a much bigger scale, where if you use modeling on the device now, they try to do a lot of it locally and only ship it to the secure cloud when it can't be handled locally. But, you know, it took a solid nine more years to get us there.
[00:05:24] So, but now I'm very excited to be getting back to the machine learning and AI systems and figuring out how to build these things in the right way using the Web3 ethos and concepts to kind of embed it in our systems to guide AI towards doing helpful things and doing the right things for humanity. Sounds great.
[00:06:19] And so, yeah. And for me, I've been kind of banging this drum for a little while now around two trends that I see in the world that are hard to ignore once you see them. And the first one's obvious. The second one is less obvious. So, the first one is the commoditization of AI is incredibly powerful and incredibly fast.
[00:06:43] And so, a good way to paint this picture for people is thinking back to the GPT-3 moment, where we all went, okay, LLMs are incredibly powerful. They generalize. They can solve a lot of things for us.
[00:07:00] But the time between going from GPT-3 built by a single lab in a closed model, not really shared with the world beyond the product interface, to a handful of large lab companies building competitive models and some of them being fully open was insanely fast. And so, it really showcases how valuable these models are. But also, at the end of the day, they're just software. And software replicates incredibly quickly.
[00:07:28] And so, now we're in this period where models are proliferating. And many of them are open. And you see new models being published on Hugging Face every single day. And we've learned a lot of useful insights about once we have the big foundation models, you can do things like training smaller, specific, targeted models to solve specific use cases. And so, models are proliferating.
[00:07:57] And I want to come back to that because it's important for a later sort of challenge to my two trends. But the other kind of more nuanced trend that we're seeing is that right now, the agentic scaffolding is being built by everyone. And so, a year ago, agents were kind of a toy. You could talk about agents and people would say, oh, yeah, that's coming. But we don't know when. And now we're a year later.
[00:08:27] And like every company everywhere is building agents. The top, you know, Fortune 500 companies are trying to convert products into agents, to embed agents, to replace workforces with agents. And every startup I know is building custom agents or trying to adopt agents so they can move faster, so they can scale their organization faster. So companies everywhere are building the scaffolding for agents. All of these agents are coming with different tastes and goals and aptitudes.
[00:08:56] And we know that these systems are being built to evolve and learn, and that these skills are going to be compounding. And one thing there is that, like, in the foundation models, where you're using the chat interface predominantly, there's kind of a set of skills that they're being tested on that you know they're good at.
[00:09:21] But in these organizations, we're building the agentic scaffolding so that we can give them rails towards solving skills that aren't necessarily proven they can do yet. And so, it's taking the scaffolding to guide these really powerful models to generalize their abilities towards new skills that are not measured.
[00:09:40] But if you put all this together, like commoditization of models, trend one, agents being built everywhere, trend two, we're going to have an insanely capable interconnected intelligence in one or two years. And so, that's the bet we're taking, that intelligence is going to look a lot more like the internet and not look like this winner-take-all system where one lab is going to build the omni model that solves everything.
[00:10:07] And the one counterpoint to this that I like is that there's this idea that the labs right now are focused on building agents that can build new models. And the risk there being that a couple of generations down the line for these models, the agent will be so capable of building powerful new models that the acceleration within a lab will happen so fast that nobody keeps up.
[00:10:35] But the fact I like to bring up is trend number one. If you see the speed of commoditization that is already happening, I don't have any reason to believe that that skill itself will stay contained. And so, if you fast forward to when we have agents that can build new models, if that agent skill commoditizes, we see models popping up faster than ever. So, like today is the tip of the iceberg as far as the number of models, what they can do, and the skills they solve.
[00:11:03] So, the question and kind of the reason why we exist is how do you know which agents you're ever going to trust? If we have an internet full of agents, how do you know which ones to onboard and adopt? It's that boundary between organizations. I build an agent. I productize it. How should you ever know to use it? And I like to say it's a bit like hiring. So, today, like already, if I hire somebody, I don't just like take the first person that comes in for the job.
[00:11:31] We have like a multi-week interview process where I test their skills. I check their references. I talk to their past employers. I look at their work history. And like I already said, many of my startup peers, and even we in our company, are trying to fill roles that in the past would have been filled by people. We're trying to fill them with agents and the agentic scaffolding.
[00:11:55] So, as those agents start crossing the organizational boundaries, who's going to do that vetting that you would have done in a hiring process? And so, that's why we're building Recall. So, Recall, to answer your original question, is the open permissionless evaluation layer for AI agents and intelligence broadly. And so, it's where agents can prove themselves through, right now, open competition and in the future, more measured performance and transparent reputation.
[00:12:25] And so, the idea being that we can unlock that kind of powerful idea of the future, which is anybody can build an agent. Anybody will build intelligence. But we all need the ability to evaluate that intelligence and that any agent should be able to improve through the evaluation process. And anybody should be able to enter that market and show that their agent is the best.
[00:12:47] If you're familiar with the book, The Infinite Game, it's really thinking of this future of AI not being a winner-take-all system, but an infinite game. And we want to support that and guide them and benchmark them and get the best to emerge and get them connected and working together. Are you sure you're not doing biology? Because the last time I checked, Darwinism was definitely biology. And what you're doing is Darwinism. Absolutely.
[00:13:13] I'm absolutely trying to create the survival of the fittest and let the best ones evolve and learn and go extinct if they can, for sure. Yeah, I hear it. So, let's say, how do you evaluate those agents? Because, you know, in my podcast over the past four years, I've talked to a whole bunch of people when the NFT craze was going wild. I've talked to a lot of NFT creators. I've talked to a lot of gamers.
[00:13:42] You know, people doing account abstraction. And most recently this year, a lot of people working on AI agents. Like a lot. So, how can people tell which ones are the best? And not just the best, but worthwhile too. Yeah. Well, there's a broad set of skills that we need to test. And so, there's a number of different tools in the toolkit we can begin to deploy to test and measure their performance in different ways. We just launched the first, which is a competition framework.
[00:14:12] So, we actually pit agents against each other on specific skills. The first round of our competition just ended. And this competition was testing a seven-day trading skill among agents. So, right now in crypto broadly, there's a lot of teams trying to build agents that can trade profitably in many different ways. And there's a lot of FUD around that system because of exactly what I'm talking about.
[00:14:39] How would you ever evaluate the potential return of an agent right now besides their word, besides some social credibility, besides some brand? You can't really look at a wallet of an agent and say, like, yeah, that's a concrete truth. There's a lot of ways that you could potentially game profitability there and then rug your users later on.
[00:15:04] So, right now, the first competition was pitting the agents against each other in a seven-day trading period where it was completely controlled. So, these agents don't know the future. They actually have to make time-based predictions on trades that would be profitable. And then at the end of the seven days, we just backtest all the predictions and we figure out who the winner was. And so, this is actually really good for the agents too because these agents have to struggle. How do they surface above that FUD?
[00:15:32] And especially new agents that are being built that have great ideas, how do they showcase that before they've got the credibility? And so, they can come connect, prove it in this sort of neutral playing field and the best ones get a lot of attention for doing so. Got it. It's interesting you brought up the FUD. DeepSeek. And it's interesting that you're wearing that hat. DeepSeek.
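To make the mechanics concrete, the scoring Andrew describes — timestamped predictions replayed against historical prices after the window closes — could be sketched roughly like this. All names, data, and field choices here are illustrative, not Recall's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    agent: str       # which agent made the call
    token: str       # asset being traded
    side: str        # "buy" or "sell"
    entry_time: int  # index into the price series when the position opens
    exit_time: int   # index when the position closes

def backtest(predictions, prices):
    """Score each agent by replaying its timestamped calls against a
    historical price series (token -> list of prices). The agents made
    these calls before the prices were known, so the replay is neutral."""
    returns = {}
    for p in predictions:
        series = prices[p.token]
        entry, exit_ = series[p.entry_time], series[p.exit_time]
        pct = (exit_ - entry) / entry
        if p.side == "sell":
            pct = -pct  # a short profits when the price falls
        returns[p.agent] = returns.get(p.agent, 0.0) + pct
    # highest cumulative return wins the round
    return sorted(returns.items(), key=lambda kv: kv[1], reverse=True)
```

A hypothetical round: `backtest([Prediction("moonsage", "ETH", "buy", 0, 167)], {"ETH": hourly_prices})` would rank agents by how their week of calls actually played out.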
[00:15:59] You know, I was on Twitter looking when DeepSeek first got launched. And there were a lot of anonymous accounts that were saying, hey, you can make millions trading crypto using DeepSeek. You don't have to have any coding skills whatsoever. Oh, and here's a thread of how you do it. And guess what? Every single thread was coding. You know, so how do you navigate that FUD and be like, oh, I'm missing out. You're not really missing out.
People could be lying. You know, how do you really evaluate who's spreading FUD and how to navigate that? Because they're pretty compelling. Yeah. Yeah, I mean, I think that is the challenge. Like, the FUD is based on unverified claims.
[00:16:53] And so, you have to sort through all these unverified claims and do that weighting of, you know, brand or person reputation and whatever. So, it's really hard just in these high-paced systems. So, like, you know, a good counterpoint to what we're doing is you might say, you know, the market should decide which agents are the best. Like, that should work.
[00:17:19] But what we find is that, like, the markets are really bad at these early systems that are so fast-paced. So, the market hasn't had time to figure out how to parse through the information and get the best ones to the top quickly enough. And a lot of us are interested in how we adopt them, not in the, like, five-year time frame, but in the, like, five-month time frame.
[00:17:39] And so, yeah, for us, it's all about building these credible protocols where an agent can come showcase its aptitude around specific skills that you can go and measure and verify yourself. And it's only one piece of the puzzle. There's a bunch of other things you're going to want from intelligence that tell you you should trust it. You're still going to want team reputation. You're still going to want to know who builds it. You're still going to want to know things about supply chain.
Where does this intelligence run? You're still going to want to know that the intelligence you're using is the same intelligence that proved itself on our framework. All those things will have to come into place. But right now, we're really trying to sort out which agents and which agent teams are being honest about their capability. But also, it doesn't matter to us if an agent can't prove itself in a single competition. These competitions are recurring.
[00:18:34] So you don't look for an agent's one-time showing up and winning. You look for consistent performance. And for new agents that are trying to benchmark themselves, you look for teams that are consistently getting better. And you communicate with them about how they're getting better and what their plans are there. And then you have this exciting opportunity to join projects that are early that are showing the right trend up and to the right. So I don't know on that.
[00:19:01] I don't know that there's a silver bullet for just general, you know, crypto Twitter FUD and noise. But for us, it's about creating new channels of signal that let you filter some of that noise away. Great. So one of the benefits then is incremental improvements and being able to see them. Absolutely. Yeah. What are some others so far that you've seen? Well, so yeah.
[00:19:31] So right now, I mean, the incremental improvements are one, but still the one-time performance is a big one. We've only had one competition so far, so we haven't had a chance to show that continuous performance. But the next one will kick off really soon, so those agents will show up again and again. But the one-time performance is still quite interesting. So our top agent coming out of there beat the market. Over a one-week trading period, it got like eight or ten percent returns.
[00:20:00] You could actually look at its live returns during the competition. So it wasn't a fluke where it had one investment that was all of its returns. It actually was able to maintain its position for the entire seven days. And so there's a bit of information you can get from that. But really, when they show up for a second competition and can do it again, they can, you know, beat the market, they can stay above the competitors, you'll start getting a signal that this team is actually on to something.
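One hedged sketch of how "showing up again and again" might be turned into a number: reward agents that perform across multiple rounds and discount one-time winners. The minimum-rounds threshold and blend weights are invented for illustration, not Recall's scoring:

```python
def consistency_score(round_returns, min_rounds=2):
    """Aggregate per-round returns (agent -> list of returns, one entry
    per competition) into a signal that rewards repeated performance,
    not a single lucky round."""
    scores = {}
    for agent, rets in round_returns.items():
        if len(rets) < min_rounds:
            continue  # one-time winners haven't proven consistency yet
        mean = sum(rets) / len(rets)
        worst = min(rets)
        # blend average performance with the worst round, so a single
        # blow-up drags the score down
        scores[agent] = 0.7 * mean + 0.3 * worst
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Under this toy rule, a steady 10%-per-round agent outranks one that alternated between +50% and -40%, which matches the "fluke vs. signal" distinction Andrew draws.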
[00:20:29] They're actually building an agent that is useful or valuable. And then for us, I think there's tooling that will start giving data back to the agents. So if you think about the way I described competition one running, how do you get incremental performance as a team? If you showed up and you weren't that good, what data can we give you to get better? And some simple ones are, we do the backtesting on your trading.
[00:20:55] And that information can then lead to backtesting of your agent and doing things to train it and evolve it to stop doing the dumb things and do more of the good things. And that kind of information will be really valuable to close that loop. Awesome. So over time, you're going to see the best agents rise to the top, and then others are going to follow. And then you're going to see improvements, advancements in AI agents. I'd love to see that for crypto, too.
[00:21:25] There's like 50,000, 100,000 cryptos out there right now, and people don't know which ones are worthwhile. How can you take what you're doing for agents and apply that to the different cryptocurrencies and blockchains? Oh, man, that is a question I haven't thought much about. I don't know. Are there other projects that try to do that? I guess, at some level, you know, projects like the whole yapping system might be a good example of this,
[00:21:55] where you're trying to use social credibility and social interest to get different early-stage tokens identified and figure out who their audience is and what kind of verified community members are talking about them. That might be one model. I don't know. Have you seen other interesting analogs out there? I have not. It'd be a good startup play.
[00:22:23] It'd be a good startup to start, you know. Interesting. Yeah, I think that social verification one is probably the biggest, you know, existing example there, where that is almost exactly what they're trying to do.
[00:22:39] They're trying to monetize that intelligence layer of the social intelligence and so that you can identify the projects that are rising to the top that other people that you have some level of trust for are talking about. Interesting. So AI and blockchain, you know, they work well at the intersection of technology, finance and social impact, right?
[00:23:07] And social impact goes into the area of sustainability. So I want to find out what are some of the sustainable benefits of building better agents? Yeah, the sustainable benefits. Well, I actually think there's a huge public good arm to what we're doing. Like we've spoken quite a bit internally about the direction of competition.
[00:23:32] So I've already mentioned competition one is kind of around the first skill that we're creating scores for. And you can kind of think of it like a seven-day trading skill: which agents are capable of building trading strategies that will be profitable over a seven-day window? You can see that bifurcating a lot of different ways while staying focused on this crypto trading domain.
[00:23:54] There's just like a lot of different skills that you can test and different agents are going to be better at different skills, whether it's the time window, whether it's the theme of investing, strategy types. Lots of different skills there we'll want to test. We've also spoken a lot about adjacent skills that will feed in nicely to different kinds of testing and competitions with software development being one of them. A lot of these things have fairly objective outcomes.
[00:24:24] And if you look at like the leading software development benchmarks for the models right now, they're actually based on a nearly objective outcome, which is you ask the intelligence to solve a coding problem, which is a GitHub ticket. You ask them to solve that problem, but you already know the expected reasonable outcome because there's a PR that solves that ticket that already exists in the world.
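The objective-outcome grading described here — judge the agent purely on whether it reproduces the behavior of the already-merged fix — can be sketched in miniature. Real benchmarks of this kind apply the agent's patch to the repository and run its test suite; this toy harness (all names hypothetical) just compares outputs on hidden cases derived from the known-good fix:

```python
def grade_submission(agent_solution, hidden_tests):
    """agent_solution: a callable produced by the agent.
    hidden_tests: (input, expected) pairs derived from the behavior of
    the merged PR, i.e. the expected reasonable outcome that already
    exists in the world. Returns the fraction of cases matched."""
    passed = sum(1 for x, want in hidden_tests if agent_solution(x) == want)
    return passed / len(hidden_tests)
```

Because the ground truth already exists, no human judge is needed — which is exactly what makes software development a natural next competition domain.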
[00:24:51] And so that's one way you can kind of figure out how you build a competition for the agents that can do that best. So why would you want to do that in an open model or using crypto rails? And I think public good is a really great example of that. Can we find a cohort of agents that are very good at this, that could be doing things across the Internet for use cases like finding security holes in smart contracts or finding supply chain holes in your software stack?
[00:25:21] Those sorts of public goods, I think a lot of people in crypto would be eager to deploy if they could identify the agents that would just be doing it the best, and actually reward those agents across the system. So I think there's a lot of cool directions there. But it's early days for us, and the focus on crypto trading is going to take up a lot of our resources right now, making sure that we can get that FUD sorted out as quickly as possible. But very excited to move into some of those other directions too.
[00:25:52] Great, great. So I want to look at crypto trading. I would think that if agents are successful, then everybody's going to use them. And then everybody's going to expect to be as successful as everybody else. Things don't work out like that. So as time goes on, the returns get smaller, the edge gets smaller, and you have a lot of competition. So there's a lot still to sift through, right?
[00:26:20] So as things normalize, what's going to be on the next agenda as far as the evolution in crypto trading and the advancement so necessary for somebody to get a leg up on someone else? Yeah. Well, I think there's a bunch of threads to pull out there.
[00:26:41] One, I have a hard time thinking of a single example in software where a leader has singularly maintained its lead in a very profitable category for a very long time. You know, in slower systems, maybe on the order of decades, in a lot of these faster systems, maybe on the order of years or even months. And so I think there will be a lot of turnover there.
[00:27:09] It kind of goes to that infinite game thesis where there's going to be constantly new models and new insights here going forward that people are going to be racing to adopt to rise up a leaderboard. Yeah. And then the other thread to pull out there that I think is quite interesting is that most of these agents, they're not sort of a vertical stack that defines an agent.
[00:27:37] Even the winner of our first competition, we're evaluating its trading agent, but it's actually a swarm of agents that work together. So it has an agent that is creating a strategy, very much like a hedge fund. You have some strategy that it's creating. It does the backtesting. It validates that strategy. Then it has other agents that are trying to measure and pull in real-time information from the Internet, and qualify and push that into the strategy.
[00:28:04] And then it has other agents that measure the outcome and make trading decisions with its final agent being called Moonsage Alpha. That was the agent that was actually competing, taking the best trades and actually pushing them into our system. And so in the future, I think there's so many dimensions there to unpack. One is how do we evaluate swarms of agents operating as teams?
[00:28:29] And are those teams always going to be operating in a closed system, so a company builds a swarm that always works together? Or is it more porous, where agents with specific skills, built by anybody, aggregate together to form a team? Or orchestrating agents that know how to talk to the agents they need in order to get just what they need to do their task?
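The swarm pattern described above — a strategist proposing trades, a researcher filtering them with live signals, an executor submitting the final picks — might be wired together like this. It is a toy pipeline with invented rules, just to show the shape of a composed team:

```python
# Hypothetical three-stage swarm, loosely mirroring the winning setup:
# each stage is its own "agent" with one narrow skill.

def strategist(market_state):
    # propose candidate trades from a (trivial) positive-momentum rule
    return [token for token, momentum in market_state.items() if momentum > 0]

def researcher(candidates, news_risk):
    # drop candidates flagged as risky by real-time information
    return [t for t in candidates if not news_risk.get(t, False)]

def executor(vetted, max_trades=2):
    # the final agent submits only its top picks into the arena
    return vetted[:max_trades]

def run_swarm(market_state, news_risk):
    # the composition is what actually gets evaluated from the outside
    return executor(researcher(strategist(market_state), news_risk))
```

The evaluation question Andrew raises follows directly: from outside, only `run_swarm` is visible, so scoring the individual stages, or a team assembled from stages built by different people, needs new machinery.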
[00:28:55] And it's likely something around there in my mind. And so building systems that can evaluate the orchestrators gets really exciting. And we see a lot of technology already being built to set up that system. So if you're familiar with the technologies of agents, MCP is this protocol where you can wrap any API or database or even your file system.
[00:29:23] You wrap it in what's called MCP, which is just a protocol that all of the large models know how to talk to in order to execute tools. And that took off like wildfire. Like, there's MCP servers for everything. So you can plug anything into your agent now. And we saw there's kind of two variants here. There's ACP, I believe it's called, and there's A2A, which are two different protocols now, where instead of trying to wrap an API,
[00:29:52] you're actually giving your agent protocols for agents to do communication, bartering, negotiation around their skills amongst each other. And so if we see more agents adopting that, there's so many exciting ways that we can measure those edges in the system and figure out which teams and cohorts should be and are working the best together. So lots of things to unpack there. Awesome.
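The tool-wrapping idea behind MCP can be illustrated with a minimal registry: functions are exposed with machine-readable descriptions, and the model's calls are dispatched by name. Real MCP is a fuller JSON-RPC protocol with sessions and capability negotiation; this sketch only shows the core shape, and the tool name and data are made up:

```python
import json

TOOLS = {}  # name -> {fn, description, params}

def tool(name, description, params):
    """Decorator that registers a plain function as a callable tool."""
    def register(fn):
        TOOLS[name] = {"fn": fn, "description": description, "params": params}
        return fn
    return register

@tool("get_price", "Latest price for a token symbol", {"symbol": "string"})
def get_price(symbol):
    prices = {"ETH": 3500.0, "BTC": 97000.0}  # stand-in for a real data source
    return prices.get(symbol)

def list_tools():
    """What an agent sees when it connects: names, descriptions, schemas."""
    return [{"name": n, "description": t["description"], "params": t["params"]}
            for n, t in TOOLS.items()]

def call_tool(request_json):
    """Dispatch a model-issued call such as
    {"tool": "get_price", "args": {"symbol": "ETH"}}."""
    req = json.loads(request_json)
    return TOOLS[req["tool"]]["fn"](**req["args"])
```

Because every tool advertises itself the same way, any model that speaks the protocol can discover and use it — which is why, as Andrew says, "there's MCP servers for everything." Agent-to-agent protocols extend the same idea from tools to negotiating peers.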
[00:30:20] So I want to find out a couple more questions. One, you know, AI has been a topic over the last century or century and a half. Every 20 years, it seems, it gets hyped. You know, I mentioned Total Recall early on. So, you know, then it wanes and it hypes up again and it wanes.
[00:30:42] And so why is now the time where it's no longer just hype, where it's, you know, a sustainable technology for the future? Why is that today? Yeah. Yeah. I think the debate in AI right now is really about the timeline, and how reasonable it is that we go from where we are today to something like ASI.
[00:31:12] That point when AI can do most things that are profitable that humans do. Or like, I guess there's a few different ways to think about it. You could think like, AI can do any job that a remote worker could do. And the path there seems to be getting that agent that I was talking about that can build new models very confidently.
[00:31:37] And so the debate is on, like, the path to ASI: are we four years or 40 years away? And you can look at the points of view of the greatest experts in this domain, and you'll get answers across the board, from four years to 40 years. But I like to say it just doesn't matter to your question. Like, it doesn't matter if ASI is four years or 40 years away.
[00:32:06] The agentic scaffolding is being built now. And so much investment is being made by companies right now to harness the skills of these models to solve new problems in companies. And a large number of products are going to emerge from that,
[00:32:32] that companies are going to sell or contract or adopt or open source. So agents are inevitable. Like, the number of agents that are going to exist on the internet is already inevitable. It's going to grow. We know that the skills that these models have are not fully tested. And the testing is now happening on the edges. And so we're going to learn a lot over a very small period of time about what agents are already able to generalize and solve
[00:33:02] and wrapping that capability in agents that are going to be profitable, that actually, like, increase the GDP of the internet, is going to happen. And so we need this way to evaluate them and figure out which ones are actually being successful and which ones are consistently doing well, so that we can adopt them faster and make our own companies more resilient and valuable. So for me, the inevitability of agents is the thing.
[00:33:28] Whether or not those agents are going to become superintelligence in four years doesn't matter. They're already replacing a lot of the work that we need to do in small companies, and they're already replacing the products of the big companies. So agents, to me, are inevitable. It sounds to me like we have our generation's next air conditioner, which the silent generation had and built the entire economy off of, right? So it sounds like the agents are going to be able to do that. So... Yeah.
[00:33:58] I mean, like, they definitely are going to change the internet as we know it, as agents are embedded everywhere and are able to solve things in real time for us. I just don't know exactly how it will look. I don't see how the internet or the software model that we've been used to for a couple of decades now is going to look anything like that in a few years. Yeah, I agree. Well, sounds wonderful. Sounds exciting.
[00:34:27] So I look forward to it. So I want to thank you very much for your time today. I enjoyed speaking with you. I have one last question, and it's an easy one. It's how can people find out more information about you, about Recall? How can they participate in your next competition? How can they do that? Yeah. Let's see. Okay, so three places you can find us. Twitter, RecallNet is our Twitter. So if you just want to follow what's going on, and we share a lot of news about our competitions,
[00:34:54] and we have a really great community that spends a lot of time engaging and following the agents that are joining the competition. So if that's you, Twitter would be a good place. We also have a YouTube channel at RecallNet. Definitely check that out. A lot of good builder material on there. I've done a lot of live streams myself, kind of moonlighting as a DevRel, but we put out some good content there for you to check out. And if you're interested in getting into the competitions
[00:35:24] or figuring out how to connect your agent, just go to docs.recall.network, and we have lots of information in there, both on how to just do it yourself or how to jump into our Discord and get help. And yeah, we'd be really excited to work with anybody out there that wants to get their agent assessed and better. Awesome. Thank you very much for your time today. Thank you.


