---
Transcript
Intro
**** · Can we start with the backstory of why you decided to build Pi? I personally like simple tools that are stable, that I can rely on, even if they have non-deterministic parts. So you can ask Pi to modify itself. Pi doesn't have MCP. People just ask Pi to build MCP support into Pi.
**** · Non-engineers participating in engineering process is a thing now.
**** · You might have a PM who wants to try out a feature without wasting the time of an engineer. Now you can do that. The problem is that people are now so focused on "everybody can do everything now" that they forget that you still need a process to guardrail all of that.
**** · But you just recently wrote we all need to slow the f down.
**** · All the companies claiming that all of their code is now written by agents.
**** · Yes, we know the quality is garbage. We feel it in our bones when we use your product. It's garbage. I think people need to...
**** · What if I told you that one of the most influential AI coding agents of 2026 was built by a single developer in Austria who got frustrated with existing AI coding agents? This is Pi, a minimalist, self-modifiable coding agent which has quietly become the engine behind the wildly popular personal AI assistant OpenClaw. Mario Zechner is the creator of Pi, and joining him today is Armin Ronacher, the creator of Flask and now an early adopter of and contributor to Pi. In today's episode, we cover the backstory of Pi and why self-modifying software is much easier to do with AI agents; what Armin learned interviewing 30-plus engineering teams about how AI agents are changing how they work and why software quality feels like it's trending down; the case against MCP and why CLIs are becoming so popular; and much more. If you want to hear two very grounded voices in the industry talk honestly about what's working, what isn't, and why we need to slow down as an industry, this episode is for you. This episode is presented by Statsig, the unified platform for flags, analytics, experiments, and more. This episode is brought to you by WorkOS. Engineers love to build. Today's episode will be a great example of this. We'll get into why and how Pi was built from the ground up. But when you're shipping a product, some problems are better solved with trusted infrastructure built for scale.
**** · Enterprise features like SAML, directory sync, and audit logs are some of those. WorkOS gives you APIs to add them in days, not months. Ship faster without reinventing the wheel. And now, let's get into the episode.
**** · Mario and Armen, it's so good to have you here on the podcast.
**** · Thanks for having us.
**** · Thank you.
**** · So, as a kickoff, Mario, how did you get into tech and eventually into building AI stuff?
**** · Oh, well, that's a long story. How much time do we have?
**** · So, I'm a kid of the '90s, and got my first PC in '95 or '96. And the trigger for that was that I loved computer games. We were working poor, so we couldn't afford any of the Game Boy, NES, or Super NES stuff. But I had an uncle who had an Amiga 500, and I would go to his place every second day and just play games there. And eventually my parents told me, "If you work, you can save up and buy yourself a computer." And in reality, my dad would do, what's it called? Schwarzarbeit.
**** · Well, you're not necessarily paying the taxes on your... Yeah. So, he would do his normal job, and after his normal job, they would go fix cars and work at construction sites and... Yeah. It's very common in Europe. I know everyone did that.
**** · And after two or three years or so, they just said, "It's time." And took me to a computer shop in the nearby big city and bought me a 486. And that's how it started.
**** · Pentium 486.
**** · Yeah.
**** · An Intel 486DX, 40 MHz, with turbo button, and that's where I started. I've always been into games a lot, which also led to graphics programming. Through sheer luck, while I was studying at university, I got a job at an applied science organization that was doing NLP and applied machine learning: taking research results and trying to stuff them into industry applications. And that's where I learned the ropes of machine learning. That was all before deep learning became a thing. I quit that domain in 2010, 2011-ish because I joined a startup in San Francisco.
**** · And then later I came back and joined another startup with two friends in Sweden, where we did an ahead-of-time compiler from Java bytecode to iOS. That got sold. Since then I have a little bit more time, and I've always kept up with machine learning stuff because, obviously, it's super interesting. And then GPT happened, and that's the story.
**** · Yeah. And here we are. And then, Armin, where are your roots?
**** · So my roots are definitely not working poor, because my parents ran an architectural office where they adopted computers for CAD drawing. My first computers were old computers that they recycled. So my first computer, even though I'm younger, was a 386.
**** · I'm so sorry for you.
**** · And so none of the computers that I ever had were capable of playing computer games properly, because, one, they used Windows NT, which at the time didn't do anything. So you had to build your way through it. And the only way you could get games to run was that, before it booted into either Windows 95 or Windows 3.11, you could boot it into DOS. So it was really old DOS games, at a time when you could already get better stuff. But because it was this thing, I started toying around with QuickBasic a lot, and with Turbo Pascal. I bought a bunch of books on that, and those were my roots of learning how these things work. I wasn't ever really good at this, but I found it really interesting, this idea of...
**** · No, for sure.
**** · We call it a tabler in German.
**** · No, I swear to you, when I started dabbling with this, I just really sucked. But over time, if you keep doing this, you get better. And then in 2002 or 2003, I used to use Delphi a lot, which was a visual version of Turbo Pascal.
**** · Yeah.
**** · And in 2002 or 2003 I got this idea that I wanted to use Linux, and Delphi didn't work on Linux. Then I found Python, and through that I started doing some Python programming. Ubuntu had just come out in 2004. That was a venture-backed vehicle, but they created all this local community. So there was this Ubuntu association: together with a bunch of friends, I started the German Ubuntu association (not a foundation), and we ran this online community called ubuntuusers for four or five years. Because Ubuntu was popular, the community grew, and then scaling problems came. So that's how I got into web development. For building this, I wanted to build a templating engine, a web library, all of this, and eventually I bundled that together and made this Flask framework, which got very popular and even nowadays still is a thing that clankers like to spit out.
**** · That's hilarious.
**** · But I left it, and then in 2013, '14 or so I worked on computer games for a couple of years in London. Afterwards I went back to open source, and I worked on Sentry for 10 years and then left in April last year to try something new.
**** · So both of you are originally from Austria. In fact, you now live in Austria as well. You were doing games; you were working at Sentry, and you also did games before. Then the third person, who's not in the room but was on this podcast just before, is Peter Steinberger, also from Austria. How did the two of you meet, or the three of you meet? Because I've recently seen a bunch of photos, especially from before OpenClaw and Pi started, of the three of you hanging out, experimenting, playing with AI.
How Mario, Armin, and Peter Steinberger met
**** · I think the two of us met on the internet, on Reddit.
**** · It depends, because I definitely met you once when I was at university.
**** · But you didn't recognize me that time, and I was useless.
**** · I was already famous.
**** · But yeah, we abstractly met on the internet, and eventually we met up in Vienna. We were screaming a lot at each other on the internet, but in a very cute way, in a very non-confrontational way. And even though we might not think alike in all areas of our lives, it was a cultured exchange, I would say. So that was nice. And Peter... six degrees of Peter Steinberger: I was working at an office in my town, and the company that gave me free office space in exchange for being a mentor to the CEO had some business dealings with Peter's company, PSPDFKit.
**** · PSPDFKit. Yeah. And eventually he came to the office in Graz, and I think that's where we met the first time. Then, the same year, we met at a conference in Istanbul and just hung out for an entire night, and that's where it all started.
**** · Nice. And then how did the both of you go from being skeptical about AI when these tools came out? Again, by 2022 both of you had been doing a decade-plus of building complex software in different domains. What was your first reaction to it, and how did you eventually come around to the side of "well, this thing is really interesting"?
**** · So for me it was, I think, in 2022. I think GitHub Copilot came out before ChatGPT, yes, in 2021. And through my previous startup stuff I was working with Nat Friedman and Miguel de Icaza from Xamarin, because Xamarin acquired the company I talked about earlier, the Java compiler thing. I knew Nat Friedman from our early startup stuff, and he eventually moved to GitHub and was in my DMs in 2022, I think, and asked if I wanted to have access to GitHub Copilot, the tab-tab autocomplete thingy. And I was like, "I don't really care. I don't think this is going anywhere." And he's like, "No, man, it's the future. You've got to try it." So I tried it, and it was absolutely horrible. But after GPT came out, and especially when they started providing API access, I did a lot of projects just figuring out what works and what doesn't work, not necessarily in the coding space. Eventually, once they had tool calling (or function calling, as OpenAI called it back then), that's when they became very interesting. But it took until 2024, I would say end of '24, October or so, for that to be useful, and that's where the coding agents also became interesting. And then in 2025 the Claude Code team came out with Claude Code, and that introduced agentic search.
So just give the agent a way to plow through your file system and read all your files, and it made the whole difference. All the things that came before, Cursor with indexing and embedding-based stuff and all of that, just went away. And I know that the CEO of Chroma is probably mad at me for saying this, but that was the difference: it wasn't a dense and sparse search thing that the agent could go through; it was just "give it access to your files."
**** · That was it for me. That's where it clicked for me.
**** · I think my path was similar. I think Copilot came out quite a bit earlier, and I know that there was a program at GitHub that gave you early access to Copilot at the time. I think it was this maintainers group or something, which I was still in. I got the feeling with Copilot that this would be really interesting,
**** · but not in any way in which it is now, because I felt: I've been in open source for such a long time, and now they're training on open source data. At the very least, this will be controversial. I didn't think about it being productive. I felt this was going to be a controversial thing, with training on open source data, and I remember for a time I was trying to probe whether there's Flask in there. Then I was trying to probe it really adversarially. One of the things I probed was: will it retell GPL code? And I remember at one point I got it to spit out the Carmack inverse square root function, which was very easy because it also had a very specific name, so it was very easy to get the recall. But I also found you could tab in a certain way and it would then continue putting license text on top of it.
**** · It was completely wrong. It came from an open source GPL drop of Doom originally, I think. So it would have been GPL code if it had done that right, but it attributed an MIT license from a random dude, and I was like, "Oh, Mr. Copilot, that's the wrong thing." That tweet got really popular at the time, and then people started sharing things with me, because I was at the time not really exposed to how much actual AI progress was being made in those labs. I didn't come from this AI space or ML space. I learned about it at university: oh, there's an AI winter and then nothing happens. But through this tweet and some other things, I recognized that there was something there; CEOs at certain companies are convinced this will take off. That's how I started paying attention to it. I was trying all kinds of stuff with the API, like, can you do bug-fixing things? I got really interested in it, but it didn't at all feel like the world was going to change until Claude Code.
**** · And you also changed your stance on the whole "oh my god, this is spitting out open source code; it memorized it"?
**** · So, my shtick for many years now has been that I want people to share stuff. I think human progress comes from building on top of each other, and I'm a huge supporter of the fact that in the US you can take knowledge from one company to another company, with no non-competes. I like this pirate approach to sharing.
**** · Yeah. spread of knowledge.
**** · Yeah.
**** · And so my optimal version is that copyrights don't exist, in a way, or a very limited version of this. I really didn't care that it spits out GPL code and doesn't attribute. I was like, "Oh, maybe this will just completely destroy copyrights," and if that's the outcome, I'm fine with it. But it was an interesting thing in the beginning that it creates this license violation; I wanted to see what chaos would emerge from it. So far, I think mostly what has emerged is a strong belief that the system in place for copyrights has some assumptions, in the US, about how it's supposed to work, and we're all ignoring that now because we want to create the mess first and then re-regulate. At least in theory, a lot of the things that we're producing now are probably, by historic readings of copyright, not copyrightable.
**** · Yeah, that's an interesting one.
How 30 dev teams use AI agents: learnings
**** · But jumping to today: an interesting thing that you did recently, which we talked about just before, as part of your new startup that is building things on top of agents, is that you talked to about 30 different engineering teams, asking, "Hey, how are you using agents inside of your company, inside of your team?" What did you learn, from large companies to startups?
**** · I think there are a bunch of learnings. Entirely unsurprising is that whenever people had vacation, there was more time spent on trying these tools.
**** · And just to be clear, you talked with folks at the likes of Meta, startups...
**** · Yeah.
**** · A bunch of different people from different European dinosaurs...
**** · Are you pointing at me?
**** · Well, the European dinosaur would be someone like Siemens.
**** · Yeah.
**** · Or I also talked to companies which are in a critical space. And why adoption happens when people have vacation is that when your CEO or your tech lead comes and says, "You've got to use Cursor now, you've got to use Claude Code now," you don't get it, in a way, because you need to spend some time on it. It's a two-to-three-week thing until it really clicks for you. I had a lot of free time: I left the company in April, and until October I could dive into this, and I was like, "How does nobody get this?" It was crazy catnip. I didn't sleep much, all of this. But what seemingly happened within companies is that there was Thanksgiving, for the Europeans a lot of it was over summer, and then at Christmas a lot of people... and they also get free credits during those times.
**** · Oh, you mean the companies often give you generous...
**** · So more and more people went into this, and especially after Christmas. I would guess in more than half the companies I talked to, it really exploded after Christmas. And it exploded in all the ways we would expect, where all of a sudden the quality drops. It doesn't necessarily drop because people want to make worse code, but because it takes some effort to stay on top of this. We saw this in the startup ecosystem already in the summer last year. If you pay attention to the YC startups, some of them have their stuff on GitHub, or had it there for some period of time, where you could look at it: plan MD files checked in and everything attributed to Claude. So that vibe coding thing, for prototypes and whatever got built out, was already out there to see. But then gradually a smaller version of this has appeared: code bases with a little bit of vibe slop on top.
And an interesting part of this was how engineering teams and companies are now responding to that, with all kinds of different findings. But a lot of it has been the challenge of reviewing PRs. They're getting larger and larger and more frequent, it's becoming more of a psychological thing, and engineers specifically are having a hard time keeping up with them.
**** · Yeah.
**** · And a lot of the code in those PRs is not how an engineer would do it, because as an engineer you get a really bad feeling committing certain code, because you think of your future self, and the agent really does not care. I will retell this story over and over, but I worked on an Xbox One game around the time of the Xbox One launch. So that was a fixed date.
**** · It had to release on that day. So I worked on the Halo Master Chief Collection, a game where you had a matchmaking component and you had to start this thing and whatever. And it was an all-hands-on-deck situation where people had to go in and unslop the human-made slop that was the matchmaker.
**** · And it was a system with way too many states. We called it an emergent state machine, because it was 16 bools on one massive thing, and in theory there were only six valid states, but in reality it was a geometric explosion of possible states.
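To put rough numbers on that anecdote, here is a minimal sketch. It assumes 16 fully independent boolean flags and six made-up state names; the real matchmaker's flags, states, and transitions are not described in the conversation. Modeling the valid states as an explicit enum with a transition table makes every other combination unrepresentable instead of merely unexpected.

```python
from enum import Enum, auto

# Assumption from the anecdote: 16 independent boolean flags.
FLAG_COUNT = 16
reachable_combinations = 2 ** FLAG_COUNT  # each bool doubles the state space


class MatchState(Enum):
    """Six illustrative states; the names are hypothetical, not the game's."""
    IDLE = auto()
    SEARCHING = auto()
    LOBBY = auto()
    LOADING = auto()
    IN_MATCH = auto()
    POST_MATCH = auto()


# Explicit transition table: anything not listed here is rejected.
TRANSITIONS = {
    MatchState.IDLE: {MatchState.SEARCHING},
    MatchState.SEARCHING: {MatchState.LOBBY, MatchState.IDLE},
    MatchState.LOBBY: {MatchState.LOADING, MatchState.IDLE},
    MatchState.LOADING: {MatchState.IN_MATCH, MatchState.IDLE},
    MatchState.IN_MATCH: {MatchState.POST_MATCH},
    MatchState.POST_MATCH: {MatchState.IDLE},
}


def advance(current: MatchState, target: MatchState) -> MatchState:
    """Move to target, or fail loudly if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"invalid transition: {current.name} -> {target.name}")
    return target


if __name__ == "__main__":
    print(reachable_combinations)  # 65536
    print(len(MatchState))         # 6
```

With the flags, 65,536 combinations are representable and only six are meaningful; with the enum, the six are all there is.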
**** · And that's how agentic code feels, where it really should only be a very clearly defined system, but in reality they're like, "Oh, the config doesn't load? Let's catch it and load the default config." So instead of failing, it now recovers. But now your code is way more complex than it should be, because instead of failing properly, it is now recovering and entering these many more failure states.
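The config example can be sketched as two loaders side by side. This is an illustration only: the function names, the JSON format, and the default values are assumptions, not code from any project mentioned in the conversation.

```python
import json
from pathlib import Path

# Hypothetical defaults, purely for illustration.
DEFAULT_CONFIG = {"region": "eu", "max_players": 8}


def load_config_recovering(path: Path) -> dict:
    """The pattern described above: swallow the error and fall back."""
    try:
        return json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        # From here on, callers cannot tell a real config from a silent
        # fallback, so every consumer has to cope with both possibilities.
        return dict(DEFAULT_CONFIG)


def load_config_strict(path: Path) -> dict:
    """Fail-fast alternative: a missing or broken config stops the program."""
    return json.loads(path.read_text())
```

The recovering version never raises, which looks robust but adds a hidden "running on defaults" state to the system. The strict version has one outcome per input: a parsed config or an immediate, visible error.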
**** · And that makes it much harder to work with this code, because you also can't really ask the agent to refactor it; it's like, "Oh yeah, this could be possible, so we need to maintain this invariant."
**** · I think it's even worse than what you described about your human-made complex system, because there are moments of brilliance in agents where they spit out perfectly fine, simple code, exactly the amount and type of code you needed for that specific thing. And you, as the steering engineer looking at that, think, "Wow, this is amazing. I can just sit back and not care, because it's obviously doing the thing." Two minutes later you have another agent running in another window and it spits out the worst, most horrible garbage, I suppose, but you might not notice, because now you have fallen into automation bias and think your agent is doing the job well.
**** · Do you think this might be a bit of a human bias? Because typically, when onboarding a new engineer, say a new grad, you review their code, and if it's terrible code you will review the next one thoroughly, until they get to the point of "oh, they write the code that I would." It typically takes six months or a year or something like that, but then I can trust this person.
**** · Yes, but you don't have anything like that with agents. Agents don't learn. You can put as much stuff in the AGENTS.md as you want, or you build a memory system, but that's not the same type of learning that a human does.
**** · Obviously, humans are fallible as well.
**** · No, no, no matter. But they have some capability of learning and retaining that learning.
**** · Yes.
**** · And they also feel pain. And I think that's one of the defining things about humans. It ties back to what you said. Eventually, if the pain gets too big, you as a human are incentivized to fix the cause of your pain. And in a codebase, the cause is usually terrible interfaces, terrible complexity that you want to get rid of because you can no longer maintain that system.
**** · Isn't this why, just holding on to that, senior engineers are always in demand? Because the CEO sees a senior engineer as someone who just gets it done, but in reality, most senior engineers who are effective have battle scars. They've been burned. They felt the pain. They saw what happened when they let tech debt spiral. So they now make all these decisions that they know will help avoid it. And of course, through this, progress goes faster.
**** · I personally think, and your mileage may vary, that a good engineer is an engineer that says "no" a lot and "I don't need this" a lot.
The importance of judgment
**** · Mhm.
**** · Because that keeps complexity down. If you're using agents, the exact opposite happens. You say, "Yes, I want this and that. I want this and I want this and I want this," because I don't have to type it myself. I don't have to think about it. I just give the little machine a prompt and it will spit out something that looks like the thing I wanted. Good enough. And that's where all the problems start.
**** · And one thing that I also think is good engineering is all about knowing the trade-offs that you have to make.
**** · And sometimes the solution is one where, if you were to sit at university and learn about it, you'd learn that you shouldn't be doing it that way. I think Cal Henderson had this once, where he said you do the dumbest solution first until it doesn't work anymore, because the actual problem is that there's so much stuff you need to do that if you do the correct solution for all of it, you're creating the complexity that kills you at scale. The engineer learns that. But if you don't have that battle scar, it's very hard for you to argue it correctly, because it is this learning process that gives you the authority to convince other engineers in the engineering org that you should be doing it this way. That is part of it. You learn that. But the other thing is that the agents now give you access to world knowledge. One of the other things I learned through interviewing engineering teams is that the senior person says no, knowing something, and then 48 hours later the junior comes by and says, "I talked to the agent. I already had this inkling, but now I have all the evidence for why we shouldn't be doing it this way." Because previously you really didn't have that ready-made access to someone who could help you tell a senior off.
**** · Yeah.
**** · And this creates other stresses now that previously weren't there. Not every team has that, but it's like people going to the doctor with a ChatGPT printout and saying, "This is what the machine said, you'd better do that."
**** · Is it fair to say, based on what you're seeing and hearing, that we might face a thing where it's harder for experienced engineers to just say no, in spite of the product manager or a junior engineer saying...
**** · It's much worse, because product management comes in, sends pull requests, and auto-merges them.
**** · Yeah, that's another thing. Non-engineers participating in engineering processes is a thing now. Ask him how that works.
Challenges when non-engineers write code
**** · How does it work, Armin?
**** · Well, it's hard, because on the one hand it's well intended. If someone who's...
**** · What is your experience? Is this your company, or from talking with other people?
**** · So, first of all, we have a little bit of this ourselves. We're small, and so my co-founder, for instance, sometimes has a poke at the website. I talked to people that have that at scale, where the marketing team all of a sudden does stuff on the website and the sales team creates ever more elaborate sales demos that end up on GitHub. One of the funniest was where the sales demo built a feature that didn't exist, but nobody noticed. So this is all new, because previously none of that happened.
**** · But I think it's empowering. There's a good thing to it, too, if everybody in your org can participate in the creation of software in some form. Previously people couldn't do that. You had a designer who could figure something out in Figma, but they might not be able to put it into a clickable dummy, demo, whatever. Or you might have a PM who wants to try out a feature without wasting the time of an engineer. Now you can do that. The problem is that people are now so focused on "everybody can do everything now" that they forget that you still need a process to guardrail all of that.
**** · And the integration part is the hard thing. Peter gave this idea of the prompt request, and I'm really warming up to this idea that once you've demonstrated it, I no longer need your code.
**** · And just to recap, the prompt request was him saying that he doesn't want to get pull requests. He said he would rather see the prompt, because he will run the prompt, or he will tweak it, and it will generate it in the style that...
**** · For me it's less about "I want to see the prompt" than "what is it supposed to be doing?", now that we understand. Because in many ways I think the interesting part is that often you don't really fully know what you wanted to do in the first place, and so the act of creating clarifies what you really want to do. That part is highly valuable. Often the approach and the code that comes out of it is not what an engineer with sufficient seniority would have done. So it's not "I want your prompt so that I can reclank my clanker so that it does it slightly better," but more "now that we know what we wanted to build, it's probably faster for me to start."
**** · Yeah.
**** · And I also disagree with Peter on "I just need your prompt." I value seeing a terrible implementation of something. If I get a pull request, and most of the pull requests we get on the Pi repository are made by agents without a lot of human touch, let's say, then I immediately know, okay, this is going to be garbage. But it's valuable garbage, because someone has put in at least a minimum amount of thought instructing their agent to create this pull request, and I get to see how a shitty implementation of what they wanted to build looks, and I don't need to waste my own time trying that out. Somebody else already tried the naive, dumb "agent, do the thing, make no mistakes" version, and that saves me time. I'm not saying I like pull requests by agents, because they're terrible and we auto-close them now, but they have value. It's not just a prompt. It's on an exponential rate... sigmoid eventually, always, because thermodynamics. But I think we're going to find out way earlier than in previous cycles that this is a bad idea.
**** · That's good news. What I think is going to be interesting and I don't know the answer to this but I read this fascinating retelling of the British industrial revolution and how it changed the textile industry.
Downsides of over-automation
**** · The Industrial Revolution. Yeah.
**** · Yeah.
**** · And so the general thesis of that article was that every time something at the head of the pipeline got optimized, it created an incentive downstream to optimize something too. In the beginning, if you can weave faster, then eventually you need yarn that can be woven at faster speeds, and so on; everything turned into a bottleneck all the way down. And ultimately the biggest bottleneck in the entire thing turned out to be what I think is the next bottleneck we're hitting in engineering. At one point you made a shirt, and if you didn't like the shirt, you went back to the person that made it and they fixed it up for you. And so the actual thing now is that if the shirt is bad, nobody cares anymore about who ruined it in the process. You're just going to get a new one. The responsibility went from anyone in this chain to the entire factory as a whole, which doesn't have to carry responsibility anymore, because we've commoditized the whole thing so much that you don't have to do this. If you take the engineering view of it, a pretty significant part of running a company and running a service is running it reliably. And so you have these postmortems on incidents to figure out what went wrong in the process, and you go back and fix the shirt.
**** · Yeah.
**** · And the thing is, we are all running on this idea that every engineer in this creation process that ultimately led up to an incident carries some responsibility, and that we go to that person, not to blame them but to figure out what went wrong here. If the machine now produces stuff at 10 times the speed, the responsibility thing does not scale in the same way, because a machine cannot yet be responsible. And I don't know if there is a future where you can abstract away human failure so much in how we run engineering that the entire company no longer cares about who signed off on a pull request or something like that, where we automate it in the same way as we are automating t-shirt creation. I just don't see that yet.
**** · But here's the thing. I think one thing we software engineers or IT people underestimate is just how freaking complex the world is and how much human squishiness is in each little nook and cranny and corner. So we're thinking, "Oh, we were able to automate that thing; now we can automate everything, every bit of knowledge work." But we as software engineers are so bad at becoming domain experts that we don't see all the non-machine parts that go into a workflow, and we're running into the same fallacy here again. We're seeing models doing incredible things.
**** · I'm not disputing that. For me, all my research in the 2000s is now null and void because transformers can do all the things. But we are overextending that to everything, as we always do in software. We did it in edtech. Yeah, we have tablets in classrooms now.
**** · Sure. Now it's solved. Education is solved because we now have computers. Well, in fact, I've heard, I don't know which country it was, but they're now rolling back... in Sweden, they're taking the tablets out of the classroom.
**** · It turns out, if you do some scientific investigation into the didactics and effects on pupils, that if you just throw a bunch of tablets into a classroom, close the door, and hope for the best, the best is terrible. So yeah, for me, I think the biggest takeaway from the past two to three years is that the hype is terrible, because it dehumanizes everything, and I don't want to be part of that circus.
**** · Well, speaking of not wanting to be part of the circus, let's talk about Pi, which is a very popular...
**** · Let me get my clown nose.
**** · ...and also minimalist coding agent. Can we start with the backstory of why you decided to build Pi at a time when there were already agent harnesses around?
**** · Because they were suboptimal.
Pi
**** · Tell me more.
**** · Yeah, sure. So I was a believer in Claude Code, just because they created that whole genre through the invention of agentic search. There were precursors to that, shoulders of giants and so on, but they were the first to package it up in a really compelling package. And at the time that fit my workflow really well. It was simple, it was predictable.
**** · So the LLM has its heuristic, or stochastic, nature of being unpredictable, but everything around the LLM was nice and tidy and easy to understand.
**** · So you were a happy user of Claude Code?
**** · I was super happy. I was proselytizing it. But eventually the team started dogfooding and getting more and more tokens, I guess, and increased velocity and team size. And with that came more features and much, much more bugs.
**** · And I personally like simple tools that are stable, that I can rely on even if they have non-deterministic parts. But all the deterministic parts should be as stable as possible. And that was just not the experience with Claude Code around summer 2025.
**** · Mhm. So I soured on that real hard.
**** · Was it bugs? Was it unexpected behavior?
**** · So they take away your control of the context. They would inject stuff behind your back, which is bad. And then your workflows that used to work stop working, because there's now a system reminder that you don't even see in the UI that will modify the behavior of the model. They would also do this to the system prompt. Coming from a more low-level background, I wouldn't call opening an obfuscated JavaScript file and deobfuscating it reverse engineering, but I reverse engineered Claude Code during the summer of 2025 and built a little service where I can track the progression, or evolution, of the system prompt and tool definitions in Claude Code, and every release it was messing with stuff. It's at cchistory.mariozechner.at if you want to see that. And yeah, that just messed with my workflows and I don't appreciate that. If I commit to a development tool, I want it to be a stable, reliable thing, like a hammer. I don't want my hammer to break at a different spot every day.
**** · Yeah, that's terrible. So, that's what happened with Claude. But again, I'm not roasting the team. I think some of them are really nice people I got to know on the internet. They're just dogfooding and that's perfectly fine. We need somebody who goes the full-velocity way. But I don't want to work with a tool like that.
**** · Yep.
**** · Because I can't get work done.
**** · It sounds like "move fast and break things" was not for you.
**** · No.
**** · And then I looked into alternatives, and Amp and Droid came out around that time. I think pretty early in 2025.
**** · I don't remember.
**** · It was very early. I think they spun off from the same experience, because I think Amp was around when Claude Code came out. I'm pretty sure, around that time. Yeah.
**** · Yeah.
**** · In any case, I looked into those harnesses and they were super good. They were just super expensive as well, because none of them could use what made Claude Code enticing on top of it being a cool tool: the subscription. That works in an enterprise setting where you're paying by token anyway, but it doesn't work for the small tinkerer in the garage. While I'm not a small tinkerer in the garage in the financial sense anymore, I still relate to that community and I would like to use my subscription with something. So I looked into open source alternatives and found OpenCode. But while that vibes with my OSS roots, it too did stuff to the context behind my back that I didn't appreciate: pruning tool results after a certain amount of tool result token output, or asking an LSP server after every single edit the model makes whether there is an error. Yes, there will be an error, because the model isn't done with its work yet. So the code doesn't compile. So the LSP server will...
**** · So reaching out to LSP, the Language Server Protocol server?
**** · Yes. So when you go into VS Code and type some TypeScript, you have some error diagnostics at the bottom, and that comes from an LSP server for TypeScript. OpenCode runs an LSP server on your behalf in the background and feeds the model diagnostics from that server on every edit. How do we work as programmers? We go into one or more files, we edit line after line after line, and only then look at the errors that resulted from that. In OpenCode's case, or in the case of other harnesses that also support LSP, the model calls an edit tool to change lines and they inject the diagnostics after every edit call. That's just not smart, because now you're confusing the model with "you have an error, you have an error, you have an error", and the model goes, "yeah, I know, I know, I'm not done yet."
**** · Oh, it's not... Yeah, it's not great.
**** · Anyways, TL;DR: OpenCode wasn't for me either. Also, I had to fork it to modify it, which I don't think should be necessary. So then I just thought: how hard can it be? I built my own little thing.
**** · And then your own little thing is pretty minimalistic. What does it use? What are the basics of Pi?
**** · The basics of Pi are my own abstraction over all the LLM provider APIs, because I didn't like the Vercel SDK, the Vercel AI SDK, for various reasons. Armin wrote a blog post about that eventually as well. It's obviously good to use. Lots of people use it. It just didn't fit my old-man sense of abstraction.
**** · But this is the beauty of software, and especially open source. You can always build your own.
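To make the complaint about per-edit diagnostics concrete, here is a minimal sketch, not OpenCode's or Pi's actual behavior. It compares checking after every edit with checking once the batch of edits is done. `toy_checker` is a hypothetical stand-in for an LSP server that only flags unbalanced parentheses, and `run_edits` is an illustrative helper, not a real API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical checker type: maps file name -> contents to a list of diagnostics.
Checker = Callable[[Dict[str, str]], List[str]]


def toy_checker(files: Dict[str, str]) -> List[str]:
    """Stand-in for an LSP server: flags files with unbalanced parentheses."""
    return [
        f"{name}: unbalanced parentheses"
        for name, text in sorted(files.items())
        if text.count("(") != text.count(")")
    ]


def run_edits(
    edits: List[Tuple[str, str]],
    checker: Checker,
    check_after_every_edit: bool,
) -> List[str]:
    """Apply whole-file edits and return the diagnostics the model would see."""
    files: Dict[str, str] = {}
    injected: List[str] = []
    for name, text in edits:
        files[name] = text
        if check_after_every_edit:
            # Eager mode: report errors even though the work is unfinished.
            injected.extend(checker(files))
    if not check_after_every_edit:
        # Deferred mode: check once, after the last edit of the batch.
        injected.extend(checker(files))
    return injected


edits = [
    ("main.py", "result = compute(1, 2"),                   # half-finished call
    ("util.py", "def compute(a, b):\n    return a + b"),    # helper added next
    ("main.py", "result = compute(1, 2)"),                  # call completed
]

eager = run_edits(edits, toy_checker, check_after_every_edit=True)
deferred = run_edits(edits, toy_checker, check_after_every_edit=False)
```

In this toy run the eager mode injects two "unbalanced parentheses" messages about code the model was still writing, while the deferred mode injects none because the finished state is clean.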
**** · Yeah.
**** · And now with agents you can even do it faster and produce terrible, complex software. No. So I built an abstraction over that. Then I built a little abstraction for a generalized agent loop with tool calling, streaming, all of that. I built a bespoke little TUI that doesn't flicker, or not a lot. And then I tied that all together into a coding agent that looks like Claude Code or Codex or whatever you have. That's it. And the extensibility comes from the fact that this minimal core has so many hook points that you can hook into with a simple TypeScript module that gets loaded into the same Node process. That allows you to do things like provide the LLM with custom tools, do your own compaction implementation, or fully revamp the TUI itself. You can modify everything in the TUI.
**** · The terminal UI?
**** · Exactly. If you want the TUI to behave differently for a specific workflow, say you're non-techy, you can change the TUI to become whatever you need as a non-techy. And I have a couple of non-techy friends that did that, because they don't need to know how to build this. They can just ask Pi to build it and Pi will modify itself.
**** · Oh, so this is a thing. So you can ask Pi to modify itself?
**** · Because of the extension points, it can write code that extends itself. It's trivial, but it's a big unlock.
**** · Is this what you meant when you said that for OpenCode you needed to fork it to modify it? It doesn't have this?
**** · It does have a plug-in system, but there weren't a lot of extension points and it was very rigid. I think they changed it recently. I think it's much more open now. I haven't kept up with it, but it might be better now.
**** · So, I guess Pi starts as this very minimalistic thing. As I understand it, the tools it has are read, write, edit, bash.
**** · It's all you need. That's it. And then you can start to make it your own.
**** · Okay, what are examples of what people would add?
**** · Pi doesn't have MCP. People just ask Pi to build MCP support into Pi. Pi doesn't have a plan mode. Armin goes, "my plan mode must be fantastic, bespoke and super."
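The "minimal core with hook points" idea can be sketched generically. Pi's real extensions are TypeScript modules; the Python below is only a conceptual illustration, and every name in it (`AgentCore`, `on`, `emit`, `register_tool`, the `before_render` event) is hypothetical rather than Pi's API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class AgentCore:
    """Toy core that exposes hook points and a tool registry to extensions."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Callable[[Any], Any]]] = defaultdict(list)
        self.tools: Dict[str, Callable[..., Any]] = {}

    def on(self, event: str, handler: Callable[[Any], Any]) -> None:
        """Register a handler for a named hook point."""
        self._hooks[event].append(handler)

    def register_tool(self, name: str, fn: Callable[..., Any]) -> None:
        """Expose a custom tool that the model could call."""
        self.tools[name] = fn

    def emit(self, event: str, payload: Any) -> Any:
        """Pass the payload through every handler registered for the event."""
        for handler in self._hooks[event]:
            payload = handler(payload)
        return payload


def load_extension(core: AgentCore) -> None:
    """A hypothetical extension: one custom tool and one UI tweak."""
    core.register_tool("word_count", lambda text: len(text.split()))
    core.on("before_render", lambda line: f"[custom] {line}")


core = AgentCore()
load_extension(core)
rendered = core.emit("before_render", "hello")
count = core.tools["word_count"]("read write edit bash")
```

The point of the design is that the extension runs in the same process as the core, so an agent that can write a small module like `load_extension` can change its own tools and rendering without a fork.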
**** · I don't have a plan mode.
**** · Yeah.
**** · But he had five implementations of a plan mode until he realized plan mode is entirely useless.
**** · Other people are just messing with the UI and making it their own: a different visual style for the editor box where you enter your prompts, trivial stuff, more cosmetic stuff. Other people have repurposed it for a full-blown RL environment for open-weights models, where they use Pi as the agent that does that part of the RL execution environment. So you can do anything, really.
**** · What drew me to it, beyond using the library abstraction, was in fact the custom tools part. One moment for me was over Christmas, when many people again had some time and I tried to build other things. Peter had been telling me in November that he's vibing without looking at code, more or less. I don't know exactly how he put it, but he can do this now. So I thought, okay, I want to build a thing where I don't look at the code. I wanted it to not look like slop. I wanted a version of it where afterwards, even though I don't really look at the code, it should look like what I would have written. And I wanted to make a game. So I started the whole experience with just basic Pi: we want to build a game, but before we build a game, I want you to set up the codebase in a way that you can validate the changes you're making, but where I can also see them. A two-pronged approach: I wanted to be in the loop, but also have the agent be able to validate itself. What emerged out of that was, first of all, that it built itself some debugging tools into the game, so it can make screenshots, run a simulation, dump out state and read it again. But Pi can also show images in the TUI, and I talked with the clanker to figure out what would be interesting things to do, and we ended up with all the screenshots in a form where I can tab through them quickly in the UI. Pi also has this great feature where it can rewind to an earlier state in the conversation and then branch within the conversation, and we built a bunch of stuff around that, because sessions, especially with screenshots, become very token-inefficient very quickly. One of the other things that Pi was rather quickly rather good at was having a lot of screenshots in it, because OpenClaw people had a lot of screenshots in their chats and OpenClaw is using Pi. So yeah, it felt really magical for me to treat the problem like this. I don't know what the right way of engineering is here, but very clearly part of it is that I should be in the loop, so we can figure out how to do that specifically for the problem at hand. And it turned out that for web projects, computer games and some of the other things I tried, they're different, but very many of them come down to a similar thing: the agent now interacts with my program, and it should do so in the most optimal way, and I want to interact with it in conjunction with it interacting with the program, and the entire experience should be as little confusing as possible to both me as a human and to the agent. And I found it very fascinating just to see how that emerges, where your tool all of a sudden looks and feels different when you launch it in this program than when you launch it in the other program.
**** · I really like this point Armin made just a few seconds ago, that AI works best when the engineer stays in the loop and the system can validate what changed. And this is a great time to mention our season sponsor, Sonar. AI can now generate code faster than you can verify it. Sonar, the makers of SonarQube, see this leading to a serious gap in verification. With the rise of coding agents autonomously writing code, verification is no longer a nice-to-have. While the latest coding models are extremely intelligent, they are also error-prone, and they don't fully understand your codebase, your context or your objectives. This is why verification must be mandatory in agentic workflows. SonarQube provides a zero-trust, multi-layered approach to code verification that is consistent and repeatable. It analyzes semantic syntax, data flows, and architectural boundaries at agent speed, acting as a critical trust and verification layer before any code reaches production. Covering 40 plus languages and 7500 issue types, SonarQube is the most comprehensive code verification platform available.
**** · And with easy integration via MCP, CLI, and hooks, it fits into your existing AI toolchain. Let agents move fast and have SonarQube as the independent multi-layered verification for safe, reliable, and auditable agentic development. Head to sonarsource.com/pragmatic to start verifying your agentic workflow today. I'd also like to talk about our presenting sponsor, Statsig. Statsig built a unified platform that enables both experimentation and continuous shipping. Built-in experimentation means that every rollout automatically becomes a learning opportunity, with proper statistical analysis showing you exactly how features impact your metrics. Feature flags let you ship continuously with confidence. And because it's all in one platform with the same product data, teams across your organization can collaborate and make data-driven decisions. To learn more, head to statsig.com/pragmatic.
**** · With this, let's get back to the episode and to the topic of general versus purpose-made tools.
**** · Yeah, I spent a lot of my youth on construction sites to earn money. And you don't use a hammer for all your problems at a construction site. You have a screwdriver, you have your hammer, you have your drill, you have whatever. And I think in engineering, it's the same. I'm not using the same tool for every task I do as an engineer. So now, if I use an agent, I don't want a general agent for every task, per se. I want a specialized thing where I know the performance will be top-notch for that specific task, because we built the harness in a way that the agent can be most effective at this task, just because of the way the harness is constructed. And that's what I wanted to enable with Pi. That said, I'm probably the person that has the least amount of modifications in Pi. I have two extensions that I use and they're trivial. They're just: if you see a URL that looks like a GitHub issue or pull request, pull down the details via the GitHub API and display a small widget on top of the editor that gives me the issue title, the author account, and a link to the issue. That's all I do.
**** · Well, but it might work for you as a minimalist.
**** · Yeah, that's how I work on the Pi monorepo, because I might have two or three sessions open in which I process an issue or pull request. That way I remember what the session was about.
**** · But it sounds like you also made your Pi a specific one for working on the Pi monorepo. And if you went back to building games, you'd probably have a... I never thought of the fact that you might want a different harness for a different task. I guess we just assume that for most developers, you work on your main thing at work, you might have a side project, and you just experiment with whatever. But I wonder if this is a new thing that we could never have before. We could never have custom tools for a project. That just sounds crazy.
**** · Here's my intuition. I think where we are going is software that modifies itself on behalf of the user's wishes and needs, and the agents can do that now if you give them enough rope to modify themselves. And I think Pi is my first foray into this self-modifiable, malleable thing.
**** · Just for the coding agent sector, but I think this can be extended to all knowledge work to a degree, for specific tasks within the broader set of knowledge work, obviously the humanization and so on. But yeah, the next plan here is to have an alternative user interface to the TUI, because the TUI is obviously limited, and the best alternative stack is obviously the web, because it works everywhere and can do anything. Once I have that built out, it really becomes interesting, because then you're not limited anymore to the line-based rendering of a terminal. Now you can do really interesting stuff. And so yeah, we'll see how that works out.
**** · And one reason that I learned about Pi before I knew that it was this minimalist interface is how OpenClaw is using Pi. How did that come about?
**** · We were hanging out and reviewing each other's blog posts and just throwing ideas at each other. In October, I started building out Pi and Peter started building out warelay, his little WhatsApp assistant, so to speak.
OpenClaw + Pi
**** · Oh, that's how it started.
**** · Yeah.
**** · And he was in search of an agent core. I think it started out with him taking Pi, cloning it, calling it Tau and then modifying it. But eventually he got tired of having to maintain that. So he just said, "I'm going to use your stuff." And that's how it ended up. Pi wouldn't have compaction if it weren't for OpenClaw.
**** · No, I specifically built that because Peter was crying in the chat: "I need compaction." Okay, you get compaction, but I'm going to tell all my users: don't use compaction. It's bad for you.
**** · Yeah, but that's I guess the beauty of building on top of one another's open software.
**** · It has pros and cons. Yes, I now get to enjoy all the OpenClaw instances that think bugs in OpenClaw are Pi bugs. So they autonomously send me a gazillion issues and pull requests, probably without the users even knowing, and I get to deal with that in my open source project. So that's a negative side effect.
**** · Well, so you're really on the receiving end of this, I guess.
**** · Well, it's OpenClaw itself which is much more exposed to this problem. They have tens of thousands of issues now and there's no way they can get a good grip on that.
**** · But how are you dealing with the fact that you now have OpenClaw, just AI, autonomously opening things on your repo as a maintainer? Do you build tools to battle this and try to close them out?
**** · I built a tool for the OpenClaw ones which embeds issues and pull requests into a 3D space, so I can see the clusters of similar things that agents have sent to the repository, and then I can bulk-select things and close them out.
**** · Oh really? So you have a 3D visualization.
**** · Yeah.
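The grouping step behind "see the clusters of similar things and bulk-close them" can be approximated without any 3D view. The sketch below is an assumption-heavy simplification: it swaps the embedding space described above for plain word-overlap (Jaccard) similarity and a greedy single pass, and the issue titles and the 0.5 threshold are made up for illustration.

```python
from typing import List, Set


def tokens(title: str) -> Set[str]:
    """Lowercased set of words in an issue title."""
    return set(title.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(titles: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy clustering: join the first group whose seed title is similar enough."""
    groups: List[List[str]] = []
    for title in titles:
        for group in groups:
            if jaccard(tokens(title), tokens(group[0])) >= threshold:
                group.append(title)
                break
        else:
            # No existing group matched, so this title seeds a new one.
            groups.append([title])
    return groups


# Made-up titles standing in for agent-filed issues.
titles = [
    "fix crash when session file is missing",
    "fix crash when session file missing",
    "add dark theme to editor box",
    "crash when session file is missing",
]

groups = cluster(titles)
bulk_close_candidates = [g for g in groups if len(g) > 1]
```

Groups with more than one member are the ones a maintainer would review once and close together.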
**** · OpenClaw, for context: I think it's less crazy now, but from the end of December to, I think, mid-February it was exploding, obviously. And this explosion translated almost directly: I was on this repo refreshing the pull requests and the number just went up. We tried to contribute and help Peter out a little bit, but I immediately gave up. I didn't know how to do anything useful there. I was looking at this and thought: this is a type of software engineering I'm just not used to.
**** · Yeah, I would fix two things and spend an hour on them, and then five minutes after I committed and pushed, some clanker comes along and just reverts my fixes, and this is not how I...
**** · Okay. Can we talk about the name Clanker?
**** · Oh, sure. So, Clone Wars, Star Wars. I never watched it, but kids of friends of mine watched it a lot while we were visiting them. So through osmosis I got the lore: there is an army of robots, and the Jedi, or people, would call them clankers, because when they move they go clank, clank. Yeah, that's the origin of that.
**** · Yeah. So an AI, a droid.
**** · Yeah, exactly. But coming back to how you deal with the influx of agentic pull requests and issues: I just auto-close every pull request. Human, agent, doesn't matter.
“Clankers”
**** · What I do is, if I haven't had contact with you previously, my GitHub workflow knows about this, because if you had, your account name is in a file in my Git repository. So if you're not in there and you send me a pull request, your pull request gets auto-closed.
**** · Mhm.
**** · And then my little workflow posts a comment under your pull request that says, "Hey, thanks so much for contributing. Really appreciate it. Could you please open an issue in a human voice, no longer than a screen's worth of text?" And if I like it, I type "looks good to me", and then that account name gets put into the file and the next time they send a pull request, they pass. And it turns out agents don't see the comment my GitHub workflow posts underneath the pull request. So this is a great filter for filtering out agents and keeping the humans safe, more or less.
**** · This is interesting. I wonder if this might be an unavoidable future where we just need a way to separate: is this coming from a human with an intent, or from an AI?
**** · I don't necessarily care: if it were a good PR, then if it came from a machine, it's fine-ish. I think what's interesting in Pi, and in OpenClaw even more so, is that it accumulates pull requests where there was no intentionality behind them at all. The person that dispatched the machine didn't care that much about it, or didn't even know about it. I've done open source for many years, and there was always a big difference between someone who sent a pull request or an issue, and someone who said "hey, please fix this" but didn't care enough to even reply to questions anymore. That's not uncommon. Then you don't have to fix it, but you have to close it out, because maybe it's still useful input, but clearly that person didn't care enough. And with pull requests it's even worse now, because they come in so quickly that many of them cannot be merged anyway without manual resolution of conflicts. And there's a lack of a back-pressure mechanism, because even I as a human, if I see there are 500 pull requests open, will probably not contribute to this thing now, because at worst I will make it worse. Yeah.
**** · And I think previously in open source you had the people who would just send issues and be very entitled and say you're the worst person on the planet if you don't fix my little issue. But that's fine, that can be handled. Pull requests were special because they needed a human to invest quite a bit of time to produce them, and you don't have that anymore. You just have people going, "Oh, this should be easy. Agent, please do this thing. Make no mistakes. Send it to this repository." And that's just not going to happen. So what we need are bottlenecks. I don't necessarily need human verification, or a verification that you're human. I just need a bottleneck that allows me, as a human, to process the amount of incoming things, because in order for Pi to not deteriorate into a pile of garbage, I still believe that it needs me and other capable people reviewing at least the important code, and for that I need bottlenecks, because otherwise I can't deal with it.
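The gate described above is small enough to sketch as plain logic. This is a minimal illustration under stated assumptions, not the actual GitHub workflow: the function names, the in-memory `approved` set (standing in for the file of account names in the repository) and the exact comment text are all hypothetical.

```python
from typing import Set, Tuple

# Hypothetical wording, loosely following the comment described in the episode.
REQUEST_COMMENT = (
    "Thanks so much for contributing. Could you please open an issue in a "
    "human voice, no longer than a screen's worth of text?"
)


def triage_pull_request(author: str, approved: Set[str]) -> Tuple[str, str]:
    """Return (action, comment) for a new pull request from `author`."""
    if author in approved:
        return ("keep_open", "")
    # Unknown account: close the PR and ask for a short, human-written issue.
    return ("close", REQUEST_COMMENT)


def handle_maintainer_reply(issue_author: str, reply: str, approved: Set[str]) -> None:
    """Add the issue author to the allowlist when the maintainer approves."""
    if reply.strip().lower() == "looks good to me":
        approved.add(issue_author)


approved = {"alice"}  # stands in for the allowlist file in the repository

first_attempt = triage_pull_request("bob", approved)       # unknown account
handle_maintainer_reply("bob", "Looks good to me", approved)
second_attempt = triage_pull_request("bob", approved)      # now allowlisted
```

The filter works as a bottleneck rather than as proof of humanity: anything that never reads the comment and never opens the short issue simply stays closed.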
**** · It's the second law of thermodynamics: everything degrades towards chaos and you have to put extra energy in to keep it away from this outcome. We don't see and feel the pain of the codebase anymore if we stop looking at it, and people don't feel the pain, or they feel no restraint anymore. The issues are also interesting, because on the one hand there is something great about someone doing an investigation and sending you a description of it. That can be good and can be bad, but they look very similar.
**** · It takes quite a bit of energy to tell apart a good and a bad AI-generated issue. And unfortunately, most of them are not great, but some of them are good, and that's also weird. All of it is weird. I really don't know what the future of open source is in many ways, because a lot of open source really worked because people piled onto hard problems. They congregated around them and said, "Now we need a good database, so we're going to put all this energy into building a good database." The value of open source came from there being some hard problems, and from us pooling our energy together and trying to figure out how to solve them. And now it feels like open source is all about throwing stuff up. What really made me so mad was that a lot of agentic engineering now is building more stuff for agentic engineering. So it's an ouroboros, or whatever you call it. I see this tweet and it's, "Oh, I solved problem XYZ and here is my solution for it," and you click on this thing and it's 48 hours old. That person probably never used the thing that they built.
**** · I would like to suggest to the viewership to look at Armin's GitHub account over the last year and what happened there.
**** · Yeah, I built a lot of stuff, but I don't then go on Twitter and say, "Hey, I solved the problem." I have a [ __ ] ton of vibe slop on my GitHub account and I wish I could mark it differently, because maybe there's some utility in it. But unless that codebase is still going to be there a year, a year and a half from now and someone is still using it, the utility of that is not validated in a way. And there are so many markers and metrics you can look at now for GitHub that really demonstrate this explosive growth.
**** · But if you were to then maybe find some other number to see how many of the things that are being created are turning into really fundamental pieces that can sustain open source communities, that can deliver this value that scales amazingly: we haven't created many vibe-engineered projects that have become that.
**** · But I like how you mentioned energy and how open source always worked. If we just think pre-AI again, let's say Linux, the most successful or most widely used open source project: it has both an energy and a structure. People come in with intent, wanting to add something.
Open source and AI
**** · They have a process that it goes through. There's human trust at every level. There's a little pyramid. Each change request goes up one level, and in the end Linus does the cut. But there's a lot of energy. There's a lot of intent.
**** · There's a lot of humans, and it was always about human energy. And now we suddenly have this AI, which is just tokens. Who knows how much they're subsidized or not, or it's just machines doing it. And then suddenly they create plausible things that look like human energy, and it's hard to differentiate, and suddenly there was just this wrench.
**** · I disagree. I don't think a lot has changed in open source.
**** · Okay, the volume has changed, no?
**** · Yes. But that's just a number. As you said, the amount of useful and maintained projects has probably not changed a lot.
**** · So you're saying that the ones that were there, they're still useful and maintained.
**** · Not even the ones that were there. There's a specific rate of new open source projects that survive longer than two weeks.
**** · Mhm.
**** · That's always been the case.
**** · Mhm.
**** · So now we just have more projects that die after two days than before. But we still have the same amount of projects that will have long-term viability, just because there are humans that care to maintain the thing over a long time, build a community of humans that supports the entire thing, and build an ecosystem around the entire open source project.
**** · That makes me say you're not a believer in Moltbook.
**** · No. Good job, Meta, picking that up. Super useful. No, I think at the end of the day we're freaking out when we don't need to, because apart from the fact that I personally can now generate code at the speed of light, for me, building an open source project, which entails not just the code but the community around it, the spirit around it, the ecosystem around it, nothing changed. What changed is the mechanical parts. I need bottlenecks to deal with the influx of an exponentially growing number of agent pull requests or whatever. GitHub itself is under immense pressure, because now it's not just humans hammering their infra, it's millions or billions of OpenClaw instances hammering their infra.
**** · Yeah, everybody complains about GitHub going down. I think they're doing a pretty good job. That's a lot of traffic that's been coming their way since Christmas. It's OpenClaw. So yeah, I would be a little bit more optimistic. We're just in the messing-around-and-finding-out stage at the moment, and everybody wants tokens to be a KPI, just like lines of code used to be a KPI. We've seen this.
**** · Speaking of things that don't change, and of messing around and finding out: you wrote a tweet, or you wrote somewhere, that your biggest enemy is complexity, and it's also your agent's biggest enemy. Can we talk about that?
Complexity as the enemy
**** · Very simple. If I have a 600k-lines-of-code codebase and my agent can at best be effective up to a context window size of around 200,000 tokens, how much of the code can the agent see?
**** · A third.
**** · Great. If you manage to get all the relevant code for a task into that context window, you're probably okay. Although that is a separate problem, an information retrieval problem, which is not solved and which agentic search also doesn't solve: are you sure that the agent finds all the relevant code it needs to find to fulfill a task? That's also where all the garbage code comes from, because it doesn't see everything it needs to see. Let's assume the best case: information retrieval is solved, everything fits into the context, the agent does a good job. Okay, that's not the reality we're living in, because the agents now spit out so much code that they themselves cannot possibly read it into their context on a new task anymore.
**** · Yep. They outgrow their own context window.
**** · Yeah.
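The "a third" answer is back-of-the-envelope arithmetic that treats the codebase as roughly 600,000 units against a 200,000-token window. Here is a small sketch of that calculation. The 10-tokens-per-line figure is purely an assumption added for illustration; with any multi-token line the visible share is smaller still.

```python
def visible_fraction(codebase_tokens: int, context_window_tokens: int) -> float:
    """Upper bound on the share of a codebase that fits in one context window."""
    if codebase_tokens <= 0:
        raise ValueError("codebase_tokens must be positive")
    return min(1.0, context_window_tokens / codebase_tokens)


CONTEXT_WINDOW = 200_000    # effective window size quoted in the conversation
CODEBASE_LINES = 600_000    # codebase size used in the example

# Simplification used in the conversation: one line counts as one token.
simplified = visible_fraction(CODEBASE_LINES, CONTEXT_WINDOW)

# Hypothetical, more realistic assumption: about 10 tokens per line of code.
TOKENS_PER_LINE = 10
realistic = visible_fraction(CODEBASE_LINES * TOKENS_PER_LINE, CONTEXT_WINDOW)

print(f"simplified: {simplified:.1%}, at {TOKENS_PER_LINE} tokens/line: {realistic:.1%}")
```

Either way the ratio only shrinks as agents add code, which is the sense in which complexity is the agent's own enemy.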
**** · Exactly. The complexity they add is their own worst enemy because eventually the code base will be so big and so complicated and so interconnected that the agent has absolutely no way on a technical level to ingest all the context it needs to do the new task. And I would to point out that the agent has learned all of this garbage from the internet and from us because on the internet there's all our old code. While there are some pearls, there's also a lot of swine.
**** · Because we have a gazillion GitHub projects from the olden days where we just tried things out, and because instances like Linux or any other really well-maintained and well-written open source project are minuscule in number compared to all the rest of the garbage. And a machine learning model will converge, simplified, towards the mean. And what is the mean? It's not the comparatively small handful of excellently engineered projects. It's all the garbage on the internet, all the cargo culting, all the trend-of-the-day stuff, and that's what we get when we let the agents do all the things for us.
**** · Yeah.
**** · So we have this problem of things getting more complex, which slows agents down, which will in fact impact quality, which we were just talking about. But Armin, now that you're building your own startup, the two of you are building your startup now, and you're working with agents and they will have these problems: how are you dealing with generating code, building products, and balancing quality, tech, and complexity?
**** · I'm dealing with that badly. Look, I think we're coping. We're not dealing.
Building an AI-native startup
**** · I don't know if I wrote this in the blog. I definitely have it on my slides for a conference here. I enjoyed the time from April to about October immensely, because it felt like I could do so much, but there was also no heightened expectation. The world had not yet gotten used to this idea that everything now also has to move at 10 times the speed. And there was a moment in time where I felt, we worked on this VibeTunnel thing in the beginning, and it felt like so much fun, because I had time to play with the kids and I just prompted a little bit on my phone. VibeTunnel was where you could set up your phone to talk to your machine's terminal, when that wasn't that easy.
**** · Yeah.
**** · And it's not that we did much with it, but it had this happy vibe. I know that I spent too much time on the computer, but I didn't feel any pressure.
**** · But now we're collectively feeling that everything has to ship faster. It has to iterate faster, and the baseline that we want to achieve in terms of fidelity and everything has to be higher. And so now it feels very stressful, even in your own startup.
**** · Yeah.
**** · Because to some degree you can be the most stoic person in the world and it's still going to get at you. I'm slowly learning to work with my own emotions in dealing with this. I find it very hard, because I was used to things working a certain way and I knew how I do some stuff, and then I fell a little bit too much into the trap of giving in to the machine and doing things in a way that I normally wouldn't have done. Things that you regret.
**** · It's definitely agentic regret.
**** · Agentic regret. Yeah. And so, quite frankly, the answer is that I feel that now, with a little bit of the power of hindsight, I have learned some things that I wish I had learned in November.
**** · Tell us.
**** · Well, a lot of it is really the recognition that there's no back channel to me or to any other engineer, when under normal circumstances there was a back channel.
**** · There was this feeling of things not being quite right in the codebase: now the change is harder, and you see the complexity of the pull request getting higher. But if you rubber-stamp it, then what's the back channel there? This mechanism, this back pressure, this friction in the codebase, you don't feel when you work with the agent. I think there's a way to measure it: if I scan through my sessions on a project from start to the current date, I think the frequency of curse words increases, because the agent starts messing up more, because it itself cannot deal with the complexity it added to the project. And I would be really interested in whether this is measurable, because I feel it in most of my projects. Now it occurs a lot more.
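The measurement idea floated here could be sketched roughly as follows. This is only an illustration under stated assumptions: the word list, the `(date, messages)` session format, and both function names are hypothetical and not taken from Pi or any real session log format.

```python
import re

# Hypothetical word list (an assumption for illustration); extend to taste.
FRUSTRATION_WORDS = {"damn", "wtf", "garbage", "stupid"}


def frustration_rate(messages):
    """Return frustration words per 1,000 words across a session's user messages."""
    words = [w for m in messages for w in re.findall(r"[a-z']+", m.lower())]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in FRUSTRATION_WORDS)
    return 1000.0 * hits / len(words)


def rate_by_session(sessions):
    """Map each session date to its rate, ordered from oldest to newest.

    `sessions` is a list of (iso_date_string, [user_message, ...]) pairs,
    which is an assumed format, not a real export schema.
    """
    ordered = sorted(sessions, key=lambda s: s[0])
    return {date: frustration_rate(msgs) for date, msgs in ordered}


if __name__ == "__main__":
    demo = [
        ("2025-11-01", ["wtf is this garbage", "fix it"]),
        ("2025-06-01", ["please add a test"]),
    ]
    for date, rate in rate_by_session(demo).items():
        print(date, round(rate, 1))
```

A rising rate over a project's lifetime would be consistent with the hypothesis, though it would not by itself show that codebase complexity is the cause.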
**** · But you mentioned friction in the software. You didn't say tech debt. You didn't say complexity. What is this friction? Because I don't remember us talking about this pre-AI at all.
**** · So, I found this ironically funny, and it's sad. I will not name any names, but there was what I assumed was an incident related, at least in part, to agentic engineering at a company where they shipped out a configuration change that ultimately resulted in a security issue. And look, things happen. But the link that I saw on this had the social preview of that company's tagline, and the tagline was "ship without friction." And that really gave me pause, because I know as engineers we used to talk about how you've got to get rid of all the things in the way so that you feel happy shipping stuff. But there always were changes where you really wanted to think. Do you want to drop the database? Do you want to merge this migration, which might take a table lock that could potentially take you down? There are these moments every once in a while where you were really supposed to think, and people created checklists or mechanical gates where you would have to confirm something. Particularly if you run a SaaS company, you put stuff in to slow things down. And in some of the best engineering teams, in order to mature a service you have to define an SLO, you have to define expectations, and if your service is supposed to be critical, there's other stuff that unlocks on this tree of requirements. A lot of engineers will say this is bureaucracy, but the reality is that if you do this correctly, it saves you time and it makes you happier. You're not being woken up in the early morning. All of this is useful.
**** · It's friction injected deliberately to slow things down. I guess the easiest example: in any decent-sized company you have services tiered by criticality. The highest-tier software needs, let's say, two or three code reviews or an approval from a director to do a configuration change, which again slows everything down. But we know this is on purpose. By adding this friction, we want you to think: do I want to push through this friction in terms of time invested, or effort, or having to justify things? It makes you think about whether you really want to add this to the codebase if you know that it has to go through this entire chain of authorities. So it can come back to saying no to yourself to avoid the pain of going through that process, and then taking on the pain when you have the backing and the confidence as well. Typically, when it's a higher-friction thing, let's say a tier-one service where a director has to sign off, and you're a new joiner on your first day and you don't know the context, you probably know that's a pretty large ask. You'll probably socialize it and get buy-in from someone experienced, and you'll go with them. Back to human dynamics a little bit.
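The tiered gates described above can be expressed as a small merge policy. The sketch below is purely illustrative: the tier numbers, review counts, and the `Change` fields are assumptions chosen to mirror the "two or three reviews or a director approval" example, not any specific company's process.

```python
from dataclasses import dataclass

# Hypothetical policy: required code reviews per service tier (tier 1 = most critical).
REQUIRED_REVIEWS = {1: 3, 2: 2, 3: 1}


@dataclass
class Change:
    service_tier: int
    approvals: int
    is_config_change: bool = False
    director_signoff: bool = False


def may_merge(change: Change) -> bool:
    """Return True only if the deliberate friction for this tier has been satisfied."""
    needed = REQUIRED_REVIEWS.get(change.service_tier, 1)
    if change.approvals < needed:
        return False
    # Assumed rule: tier-1 configuration changes additionally need director sign-off.
    if change.service_tier == 1 and change.is_config_change:
        return change.director_signoff
    return True


if __name__ == "__main__":
    print(may_merge(Change(service_tier=3, approvals=1)))
    print(may_merge(Change(service_tier=1, approvals=3, is_config_change=True)))
```

The point of encoding the gate is that it applies equally to a human and to an agent: an autonomous agent cannot simply skip the pause that the policy is meant to force.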
**** · I think the thing is there's a very delicate balance in the whole thing, because you don't want the friction to be just an accident of having created a bad developer experience. But some things look the same even though they were deliberate, and maybe they were not sufficiently documented. And there's this feeling now of: get rid of all the friction so that the agent can be very autonomous, so that you can run many of them simultaneously. A lot of it comes from that. These things are rather slow, and the only real time saving that you get from them is parallelism. So somewhere there is this trap. I feel a little bit more experienced now in managing the trap, but I don't have the solution for that either. And I will not say that here's an example codebase where I felt really great about the stuff that I built, except for pre-existing libraries from before the agentic days, where I still feel a strong emotional attachment to them and am much more careful about changing them than any of the code that we... other than Pi, to which I don't have access.
**** · Oh no, there's still no access. There's a lot of slop in Pi, but I try to avoid it in the bits and pieces where I know that's important code. We have an HTML export functionality where it takes the current session and just spits out an HTML file that you can then host on GitHub or whatever. I have not looked at a single line of code for that function. I don't care if it's broken or how it looks when it comes out. But then there's the agent loop itself, or the extension loading mechanism, and all of that stuff, and that's important. The way I deal with ensuring, or at least trying to ensure, that it has high quality is that I refactor mercilessly, because that pulls me into the codebase.
**** · I need to understand what I want to change structurally, not just line by line and syntactically or whatever. I need to understand what's going on to do a good refactor. I do that every now and then, and I'm doing it at the moment, prompted by wanting to add a new feature that's not possible with the current architecture. Being in the code is the one thing that keeps the codebase quality high and the complexity low. But that's against the industry wisdom of burning as many tokens as possible, token maxing.
**** · Yeah, that's an interesting one. But you just recently wrote, on the same theme, a blog post called "We all need to slow the f down." Can we rehash some of the thinking and what triggered you to put it out there?
**** · Okay. So the basic gist is: okay, your agent can now spit out 10 times more code a day than you can, but it also means it spits out 10 times more bugs. Even if it has half your error rate, then okay, it's not 10 times more, it's five times more. It is still more than you would spit out. So the rate of deterioration in your codebase has now increased. And now go dark factory. Now take 100 agents that do this to your codebase.
“Slow the F down”
**** · What's the end result of that? So that's the first problem. And you need some way to review all of that code that now gets generated, to fix all the bugs. But you can't as a human, because as a human you're used to spitting out 1.5k LOC a day, and that's about the limit that you can review well. The agent spits out 10 times that. No chance you can review that. And not all of the code by the agent might be important, like the HTML export thing. But even if the agent spits out 3k to 5k a day, you have no way of reviewing that in any meaningful sense.
**** · And then if you do the armies. Yeah, the armies, this is interesting. So you call it the dark factory, the idea being that tens or hundreds or thousands of agents, you give them a spec, they go and break it up, they organize themselves, they have the mayor and all that jazz. They have the QA agent.
**** · You give them roles. You give them context, and then you give them enormous amounts of tokens and spend. And the idea, or the hope, is that your software will be done in... Oh, something will be done. Definitely something's going to be done. First your purse, and then... No. Yeah, sure. More power to the people that make that work. I can't make it work. And the reason I think I can't make it work is because I still care about the quality of my product. And I don't care if it's built by hand or by agent. I just want the quality to be good, both in terms of how easy it is to maintain it and add new stuff to it on the developer side, and on the user side.
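The back-of-the-envelope math in this argument can be written down directly. The sketch below uses only the figures mentioned in the conversation (10x output, half the error rate, 100 agents, roughly 1.5k LOC per day of human review capacity); the function names and the simple linear model are assumptions made for illustration.

```python
def relative_defects(speedup, error_rate_ratio, agents=1):
    """Defects introduced per day, relative to one human working alone.

    speedup: how many times more code one agent produces than the human.
    error_rate_ratio: agent defects per line divided by human defects per line.
    agents: number of agents working on the codebase in parallel.
    Assumes defects scale linearly with lines produced (a simplification).
    """
    return speedup * error_rate_ratio * agents


def unreviewed_loc(generated_per_day, reviewable_per_day=1500):
    """Lines per day that exceed what one human can meaningfully review."""
    return max(0, generated_per_day - reviewable_per_day)


if __name__ == "__main__":
    print(relative_defects(10, 1.0))       # same error rate as the human
    print(relative_defects(10, 0.5))       # half the human's error rate
    print(relative_defects(10, 0.5, 100))  # a "dark factory" of 100 agents
    print(unreviewed_loc(15_000))          # 10x a 1.5k LOC/day human
```

Under these assumptions, even a generous error rate leaves both the defect inflow and the review backlog growing with every agent added.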
**** · All the companies claiming that all of their code is now written by agents. Yes, we know the quality is garbage. We feel it in our bones when we use your products. It's garbage.
**** · So I don't want that. And yeah, I think people need to turn around and say, "Hey, what are we even doing here?" We have these wonderful machines now that can take away so much pain from us by doing the stuff we hate doing, and doing that really well. Why don't we start by giving ourselves some more free time to work on the interesting bits and delegating the stuff we know they can do to them, by and large, across the entire organization? Find all the things that annoy the hell out of you and have the agents automate them for you. And then you suddenly have time to think about what we want to build. What do our users need? And if we decide to build the thing, then we can pull in the agents again and say, we're going to polish the hell out of that, because now we have the time and the means and the tools to do an excellent job. But that's not how we're working. We build an army of agents and install Beads and make a big spec that hopefully will result in something crazy. But here's the thing. We talked about where the agents learned their knowledge from: the internet. So garbage to mediocre. Now if you write a spec, what's the best possible spec you can have? The best possible spec is, well, you define exactly how it should work.
**** · You give it test cases.
**** · Best possible spec is the software itself. Oh, I see what you mean. So, yes.
**** · Okay. You write a spec that's not the software itself. So, that means there's a lot of blanks that need filling in.
**** · Yes. What do you think the agent is going to fill those blanks in with?
**** · Well, most likely with stuff from its training data.
**** · Yeah.
**** · And we already identified what the quality of that training data is.
**** · Garbage to mediocre. Well, and even before AI, don't forget, Stack Overflow drew really big criticism because there was this thing of, well, you Ctrl+C, Ctrl+V from Stack Overflow, and oftentimes the first answer was either not correct or not correct in many cases. Regex for email was a good one. You googled regex for email. First page was Stack Overflow. Everyone just copied the first solution, and I think underneath, answer number three said it missed a bunch of cases.
**** · Yeah.
**** · But here's the thing though. I'm not saying agents or humans are better. They're clearly not. But agents also don't solve that problem. And if you then let not just one agent, which is already 10 times more productive than you, do the thing that it's bad at and that you as a human are bad at, but a hundred of those, what do you think is the outcome?
**** · Yeah, it's just very simple math. Let's talk about another controversial topic.
MCPs vs. CLI
**** · MCP versus CLI.
**** · Oh, it's coming up. And now I'm hearing a lot of people really going for "CLI is the future." And I think I'm sitting with two of them. But MCPs are also really popular inside of large companies, especially when you talk with people working there. It seems MCPs have found real product-market fit inside of larger enterprises.
**** · Despite what people might think, I don't hate MCP quite as much.
**** · Oh. Oh, wait. We have it on recording.
**** · Yeah.
**** · No, we don't deal in absolutes.
**** · We're not Sith. So my fundamental challenge with MCP is that, first of all, I think the spec is very complex, but this is just generally how specs happen to be.
**** · So it's a bit a product of its time. There's an inherent complexity in it. But if you were to ask what it is really doing at the end of the day, it's authentication and it's invoking some stuff. Theoretically there are structured responses, but MCP for the most part is: run some stuff, put stuff back in the context, and then work with it. So it fills your context very quickly. Cloudflare has this code mode MCP, which in principle I really like. I have an MCP for testing which is a JavaScript interpreter that gives me access to the Google API. And between an MCP like this and a skill, there's not a huge difference, because the skill also needs to be in the system prompt. So that defines it. But the agents are just very, very good at running code, and MCP is not quite running code. It's rather: input in, do some stuff, and maybe some state transition that the model also doesn't see. It's a hard problem to solve, but it does solve auth, it solves a whole bunch of things. I want it to work. I just still don't get it to work. I wish it could work, and my suspicion is still that the glue has to be code execution. But because MCP servers are largely not defined in a way that the model understands them, I haven't found ways to compose MCP tools reliably. I found ways to make the MCP itself be composable by having the MCP be one tool, run code, but I haven't found ways to then orchestrate larger ones. I want it to work, and I think it has found its niche, and I don't think it's going to go away.
**** · I think it's just a victim of its own success, really. When the whole thing started, I think it was in October 2024, it was more or less a solution to get external services into consumer-facing chat apps.
**** · Yeah.
**** · Connect your emails, connect your OneDrive, connect your whatever, pretty much. And then IDEs also took it over because it was convenient, the Cursors, the Windsurfs.
**** · Yeah.
**** · But I think the origin was the consumer side, not the developer side. And I think that's a totally great use case. I don't want my mom to have to mess around with code generation or whatever to call some API and so on.
**** · Perfectly fine use case. And then the developer side also picked it up and thought, oh, this is a great way to provide tools to my LLM. Tools as in: somewhere in the system prompt it says, if you want to call this tool, provide this JSON payload and you get this thing back. And that felt fine at the time, because if you read Anthropic's documentation, they would say their models can deal with about 30 to 40 tools in the context. Even that wasn't the case; at 12 to 20 they would just break down, but it doesn't matter. There was still a sense of, yeah, this can work if you keep it small and contained and very specific to your use case. And then people started building MCP servers that would just map an entire OpenAPI spec into a gazillion tools.
**** · Yeah.
**** · And that's where it all fell apart. So that's the first problem. Very bad MCP servers from big corporations that thought, we need this now. What's the fastest thing we can build? I just push the OpenAPI spec of our APIs through this thing and make it an MCP server.
**** · That's garbage. The second problem is that it's inherently non-composable. If you want to combine the MCP tool outputs of two different servers, they need to go through the context. The model itself needs to do the data transformation, the composition of multiple pieces of data fetched through it. Compare this with a CLI. It's a pipe.
**** · Exactly.
**** · The model only sees the end result, and it is super free in how it massages that data. And that's also the idea behind code mode. It's a hack. It's: okay, we now have MCP, we know it doesn't work for this specific use case where you have multiple sources of data and you want to combine them without pulling that through the context, so let's build code mode. Code mode is: we take all the MCP servers, expose them as functions in TypeScript, and then the model can just write some code that calls the MCP servers and does the composition in the code. How many indirections do we want here?
**** · We can just let the model write the code.
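The composability point can be made concrete with a toy comparison. In the sketch below, `list_issues` and `list_users` are hypothetical stand-ins for tools from two different servers (they are not real MCP or CLI APIs), and the character count of what would land in the model's context is used as a rough proxy for token cost.

```python
import json


def list_issues():
    """Hypothetical tool from server A: returns 200 issue records."""
    return [
        {"id": i, "assignee": f"user{i % 3}", "title": f"Issue {i}"}
        for i in range(200)
    ]


def list_users():
    """Hypothetical tool from server B: returns users with their team."""
    return [
        {"login": f"user{i}", "team": "core" if i == 0 else "infra"}
        for i in range(3)
    ]


def compose_via_context():
    """Both raw tool outputs go into the context and the model must join them.

    Returns the number of characters the model would have to read.
    """
    context = json.dumps(list_issues()) + json.dumps(list_users())
    return len(context)


def compose_in_code():
    """The join happens in code (or a pipe); only the final result enters the context."""
    core = {u["login"] for u in list_users() if u["team"] == "core"}
    count = sum(1 for issue in list_issues() if issue["assignee"] in core)
    result = json.dumps({"core_team_issues": count})
    return result, len(result)


if __name__ == "__main__":
    result, size = compose_in_code()
    print("via context:", compose_via_context(), "chars")
    print("in code:    ", size, "chars ->", result)
```

The same shape applies to a shell pipeline: intermediate data flows between processes, and the model only ever sees the last stage's output.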
**** · We don't need the MCP server. And then the third part: David from Sentry is a big proponent of MCP because of the auth thing. And honestly, that's again super valid for me. But the model itself doesn't make sense anymore. I think there's a world for an MCP 2, which is ironically maybe based more on... there's a company called Stainless which generates SDKs out of OpenAPI specs, and I'm really warming up to the idea that maybe an MCP is entirely based on auth plus libraries, or directly HTTP requests against OpenAPI specs, because then you can compose it together. And I think one of the things that's also underappreciated, as you see if you watch Pi do its stuff, because it's transparent about the tool calls that it does: it's magical at times how creative agents get with large outputs. For instance, when Pi runs a program in bash and it produces too many lines, it only reads, I don't know what the cutoff is, but it reads the first couple and says, oh, if you want the rest of the file, it's 20 megabytes large and it's in this file. And then the agent goes, "Oh, 20 MB, that's too much. I'm going to grep on the file." They get really ingenious in how they're interacting with it. And MCP takes that away. The question is how you would define MCP in a way where it wouldn't take that away, where it still has all of that magic and capability. I don't really know the answer, because I think it's hard, but auth needs solving and composability needs solving, and I think there's a bright future for that stuff. And also, what Mario said: if coding agents hadn't become so popular, then the idea of code generation and code running for non-code-related problems probably wouldn't have taken off quite as much either. But the most capable personal agents, OpenClaw being a good example, are just coding agents hidden from you.
And then, naturally, some random person who is not a programmer is going to ask, how am I going to do this? And the model doesn't say, install this MCP. The model says, okay, I can write a Python script that does it. And so, in the crazy space, you naturally have the adoption of more code execution. In the compliant enterprise space you don't have that. There's a different path. And I personally don't think that models are going anywhere other than code generation going forward for any agentic task. I think that's mostly a function of there being a lot of training data for code generation, and code generation being a very easy means to control computers. So I don't see a different paradigm coming out of the model labs anytime soon. Taking that as the assumption of where the future is going, we just need to figure out how to make code generation work within an enterprise setting, with auth and all of the other enterprisey things that entails.
**** · So let's do a fun one, trying to predict a year out, which is hard. In 2027, knowing some of these basics, just again from first principles, where do you think these coding agents and the software engineering workflow might be? This is just speculation, we know we cannot predict the future, but where do you think there'll be a lot of focus in the coming year, where we might in an optimistic case see some results in tools and how we work, what's working and what's not working?
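The large-output behavior described earlier in this section (show the agent the first lines, say where the rest is, and let it decide how to search) might look roughly like this. This is a sketch under assumptions, not Pi's actual implementation: the function name, the 50-line default cutoff, and the wording of the notice are all made up for illustration.

```python
import os
import tempfile


def truncate_output(output, max_lines=50):
    """Return (text_for_model, path_to_full_output_or_None).

    If the output fits within max_lines it is passed through unchanged.
    Otherwise the full output is written to a temp file and the model gets
    only the head plus a note saying where the rest can be found.
    The 50-line cutoff is an arbitrary assumption.
    """
    lines = output.splitlines()
    if len(lines) <= max_lines:
        return output, None

    fd, path = tempfile.mkstemp(suffix=".log", text=True)
    with os.fdopen(fd, "w") as f:
        f.write(output)

    head = "\n".join(lines[:max_lines])
    note = (
        f"[truncated: showing {max_lines} of {len(lines)} lines; "
        f"full output ({len(output)} chars) saved to {path}]"
    )
    return head + "\n" + note, path


if __name__ == "__main__":
    big = "\n".join(f"line {i}" for i in range(10_000))
    text, path = truncate_output(big, max_lines=3)
    print(text)
```

Because the note names a real file, the agent stays free to run `grep`, `tail`, or a script against it, which is the kind of improvisation the speakers say a fixed tool response takes away.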
Predictions and staying up to date
**** · I have no idea. I honestly have no idea. I could make up something that's probably not going to happen. I think the self-modifiability thing is obviously something I believe in. I think we will see more of that, so self-mutable software.
**** · Yeah.
**** · Yeah.
**** · Including the tools themselves with which we build the software. And I think that will expand not only to the tech sector but also to non-tech applications of agentic tools.
**** · Is it dog years, where you multiply the time by seven? Is that how it works? That's the model I have now of how this stuff works. When you ask me what's going to be in a year, it's seven years, and to me that makes it incredibly hard to make any predictions about the future. Maybe now it's one year since people started using Claude Code, but it feels much longer, like much more time has passed. The closest I can imagine is this: we know that code execution and code generation and the harnessing around it is going to be it, because reinforcement learning gets more of that data. And my strong hypothesis is that as more and more people wake up to the fact that you can do interesting things with agents, there will also be a societal recognition of how much more dependent you are on two companies. I think we should have a conversation about that, particularly as Europeans, because we don't really have these labs over here. So I hope we have that conversation. But my best guess is that we'll wake up to the fact that... I have engineering teams already telling me that they have codebases that they think they couldn't maintain anymore without the machine. My guess is that one of those companies will be public, and all of a sudden it will be expensive, and I think that might dominate, or at least become a conversation that's much bigger than the question of whether you are using Pi or Claude Code or something like this.
I also see, we've seen this with, was it myths, the new Claude model, oh no, spot, the new GPT model: they will only give this to select partners. So now we are seeing a split in who can get the best intelligence.
**** · Yep. Or the perceived best intelligence. It'll be interesting dynamics. So both of you are working on popular AI tools, you're building a startup where of course you're using AI, and it's also around agents. How do you both keep up to date?
**** · I've just seen things, and it's not as easy to get me on a hype train as it used to be, but that comes with age.
**** · It's definitely easier not being in San Francisco because I think that just drives me crazy. I hear so many things from my peers over there and that's just yeah, I'm not going to go to San Francisco. Thank you.
**** · So having a peaceful environment around you where it's not all about tech might be helpful.
**** · It helps having a kid. It helps just going outside, climbing trees, going ice skating, and then looking back at what you did just half an hour ago and going, why would I do that? That's just stupid. To the detriment, maybe, of people who are trying to stay in contact with me, I got very good at muting notifications and not reading emails, and that has in part become necessary, I think, over the last year or so. But it turns out that the passage of time sometimes clarifies stuff a lot, because if it's really necessary, it's going to reach you again. I have an unhealthy Twitter addiction, which I'm not particularly proud of, but as a source of interesting things it is still a thing. I now try to consume it in the form of: if it's really important, it will stay in the discourse for quite a while, and I just wait it out. If it's still there three weeks after it originally happened, then there's probably something to it, and I don't necessarily need this three-week head start. But honestly, it's really hard to deal with this, because there's a genuine excitement in it. My more than 20 years of experience in software engineering tell me a lot of stuff, but at the same time it hits you in certain ways. You felt there would be grounding, something to build on, a strong foundation, and now it feels like seemingly everybody else doesn't care about that foundation anymore, so maybe you don't need the foundation. And for quite a while it works, and that is weird. And I feel that since we were funemployed in 2025 when all this started, we had a head start. All the excitement the two of us and Peter had in April last year, nobody else at the time shared that much. And then the Christmas break came, and now everybody else has the excitement that we had in April. So now they are learning groups.
**** · Now they are catnipping themselves to immeasurable amounts of lost sleep and terrible codebases. And I think it will self-correct because it's not sustainable.
**** · Yeah, we did see this as well.
**** · I did a deep dive in The Pragmatic Engineer in early March, when a lot of people who were very excited in January about the new models and what they can do had started to use them. They went all in at work or on side projects. In about two months' time, a lot of them were saying, "Hang on, it introduced all this complexity. It has these issues. I'm not going as fast as I thought I would be," etc. So I guess there's just a natural thing that happens with anything new. A job, anything.
**** · You have a honeymoon period where you've got the blinders on, which you should, by the way. And then you start to realize and maybe overcorrect, but in general it just takes time to see the outcome of your decisions.
**** · Yeah.
**** · So I'm not worried about all the dark factory and "software is dead" and "SaaS is dead" and all that. I genuinely believe this is just part of the hype machine and that it will self-correct.
**** · Yeah. As closing, what's a book that you would recommend and why?
**** · Code by Petzold.
**** · Classic. I just love it. It's just such a great read. It's also for non-techies, and it's the first thing I recommend if anybody asks me what my job is, pointing out that it has much less to do with computers than you think.
**** · And I recently read Breakneck, whose author I unfortunately forgot. It goes a little bit into an exploration of how China works and how maybe Europe and the US are different, and I found it at least thought-provoking.
**** · Well, Mario and Armen, thanks a lot for this conversation. It was great to have it in person.
**** · Thanks for having us.
**** · Thank you.
**** · This was a really fun conversation.
**** · Thanks to Mario and Armen. The idea of self-modifiable software really grew on me. Mario said how Pi doesn't have MCP support, plan mode, and many other features that devs would want from it, but you can have it build them into its own code.
**** · So far, it's working. Pi is popular because it modifies itself. I wonder if and when this concept of self-modifying software thanks to AI will spread outside of just dev tools. I also liked how we talked about the observation that agents don't feel pain but humans do. When a codebase gets too complex, the human engineer feels the issues this creates, and this tech debt is what pushes refactors and rewrites.
**** · But agents simply do not do this. They just keep adding to the complexity. And in a codebase where devs regularly feel the pain of the codebase and do something about it, the quality will probably also be better. And finally, the MCP versus CLI discussion, this was a good one. MCP is more about offering tools for AI through context, and CLIs allow piping one tool after the other. Both Mario and Armen are more fans of the CLI, but in all fairness, MCP has its use cases, for example inside larger companies. The right tool for the job. Do check out the show notes below for related The Pragmatic Engineer deep dives that go even deeper into related topics. If you've enjoyed the podcast, please do subscribe on your favorite podcast platform and on YouTube. A special thank you if you also leave a rating for the show. Thanks and see you in the next