The Great AI Pivot: Labs Refocus on Agents and Enterprise

All the labs realize what Claude Code unlocked. It wasn't the first coding agent; it was just the best. They did something different with the harness and how they enabled it to do what it does. All these labs see, not the finish line, but the next mile marker of agentic capability: their ability to automate AI research and then to start disrupting everything.

Welcome to episode 205 of the Artificial Intelligence Show. I'm your host, Paul Roetzer, along with my co-host, Mike Kaput. We are recording Monday, March 23rd, about 10:00 a.m. Eastern time. Some big stuff happened last week. I don't know if the whole last week was just crazy—we were on a company retreat for two of the days, so I always feel like I lost track of time for the week. My entire week was spent getting ready for that retreat. Mike and I both taught workshops to the team, and I did five presentations and workshops on the first day. It was a crazy week, but in between all that, we had over 50 different sources in the podcast sandbox. Mike did an amazing job of curating the topics, and we were updating what we were going to say even three minutes before we came on.

This week’s episode is brought to us by AI Academy by Marketing AI Institute. This is the core focus of what I do at the company, and a huge part of what Mike does is building the curriculum. It’s designed to help individuals and businesses accelerate their AI literacy and transformation through personalized learning journeys. New educational content is added weekly. Our AI for Industries collection features six course series: AI for professional services, healthcare, software and technology, insurance, financial services, and the newest one, AI for retail and CPG. These series are an ideal launchpad for organizations that want to level up their teams. Mike teaches a number of them, and later in the episode, we’re going to get some insights from him on the big takeaways from the professional services series. We want to bring those core insights to everybody as part of this podcast.

AI Pulse Survey Results

We have our AI Pulse results this week from smarterx.ai/pulse. Last week, we asked about Atlassian laying off 1,600 workers and explicitly citing the AI era as the reason. 39% said this is the new normal, and that AI-driven restructuring is real and accelerating. 26% said it’s "AI washing"—a fast-growing company using AI as cover for cost-cutting. 25% said it’s too early to tell, and we need to see if the roles are truly replaced. 11% were more concerned about total tech layoffs in 2026 than any single company. It’s a pretty balanced response, but the "new normal" sentiment is the highest.

In a New York Times quiz, 54% of readers preferred AI-written prose over human originals. 39% of our respondents were not surprised, noting that AI has gotten genuinely good at clean, polished writing. 28% said writing quality was never the real moat—taste, judgment, and point of view are. 20% said this is a wake-up call for professional writers to differentiate beyond service quality, and 14% said the quiz was flawed.

OpenAI’s Strategic Shift

OpenAI is in the midst of executing what might be one of the more dramatic strategic pivots it’s done so far. It’s simultaneously restructuring how it sells, what it builds, and who builds it, all while preparing for a potential IPO later this year.

Mike Kaput: On the enterprise side, Reuters reports that OpenAI is pursuing partnerships with multiple private equity firms in deals potentially worth a combined $10 billion. These firms include TPG, Advent International, Bain Capital, Brookfield Asset Management, and others. The PE investors would contribute approximately $4 billion and receive equity stakes, board seats, and influence over how OpenAI's technology gets deployed across their portfolio companies. The logic here is that private equity firms control massive portfolios of enterprise companies and influence their tech spending. This partnership gives OpenAI a distribution channel directly into those businesses. Notably, Anthropic is also reportedly courting private equity, including Blackstone, signaling that this may become a standard go-to-market playbook for frontier AI companies.

On the product side, OpenAI is consolidating its Atlas web browser, ChatGPT, and its Codex coding tool into a single unified desktop "super app." Fidji Simo, who leads OpenAI's applications division, confirmed this move, saying the company is cutting back on "side quests" to focus on coding and business users. At an all-hands meeting on March 16th, Simo laid out the commercial goal: they want to convert OpenAI's 900 million users into "high compute users" by turning ChatGPT from a consumer chatbot into a productivity instrument built around agentic AI. They are facing competitive pressure; according to Ramp, the proportion of businesses using Anthropic increased from 1 in 25 to nearly 1 in 4 within a single year. Anthropic currently wins approximately 70% of direct comparisons against OpenAI in new enterprise contracts.

Meanwhile, OpenAI is going all-in on fully automated AI research. Andrej Karpathy went viral this past week describing an experiment where he deployed an autonomous AI coding agent to run continuous research for two days. He calls it AutoResearcher. This agent executed hundreds of experiments, discovered new optimizations, and improved how well the model itself performed. Shopify’s CEO tested the same approach on internal company data, running an agent overnight that conducted dozens of experiments and improved performance by almost 20%. Karpathy says all frontier AI labs will adopt this approach, calling it "the final boss battle" these labs face. Reports indicate OpenAI is following suit. Lastly, they are reportedly nearly doubling their headcount over the next year as they scale across these initiatives.

Paul Roetzer: The trend I was referring to goes back to episode 189 in January, when Claude Code blew up. Something definitely changed. All the major AI labs are in an accelerating race for autonomous agents and enterprise customers. This refocus on agents and enterprises was not really OpenAI's core originally. They weren't building agents in this way before, but Claude Code changed things. Within Claude, the ability to build things is incredible. They are very clearly ahead when you use the product.

OpenAI's enterprise business is currently about $10 billion out of a total run rate of $25 billion. Fidji Simo tweeted on March 16th that they are excited to be building a deployment arm. This idea of getting out with "frontier alliances" where they work with consulting firms makes a lot of sense. Regarding the refocus, last fall Sam Altman was everywhere, talking about space, robots, video apps, social networks, and devices with Jony Ive. It appeared they were spreading themselves too thin while getting crushed on the model side. It seems they’ve realized that. Simo tweeted on March 19th that companies go through phases of exploration and refocus, and when new bets start to work, like Codex, it’s important to double down and avoid distractions.

The Wall Street Journal reported that OpenAI plans to launch a desktop super app to simplify the user experience. Simo mentioned that fragmentation was slowing them down. Top executives, including Altman and Chief Research Officer Mark Chen, have spent the last few weeks reviewing the product portfolio to deprioritize certain areas. In an all-hands meeting, she told employees they couldn't afford to be distracted by "side quests" and that they are in a major battle with Anthropic—it’s basically a code red internally.

This is all related to the idea of a fully automated researcher. OpenAI plans to build an autonomous AI research intern by September, a system that can take on a small number of specific research problems by itself. This is the precursor to a fully automated multi-agent research system they plan to debut in 2028. This AI researcher will tackle problems too complex for humans. Side projects that might be sidelined could include the Sora standalone app, planned hardware devices, or e-commerce features in ChatGPT. At the same time, they are aiming to grow to about 8,000 employees from the 4,500 they have today.

The Competitive Landscape of AI Labs

When you zoom out, you see what’s happening with the other labs. Microsoft made a major shift last week, moving Copilot directly under Satya Nadella. They are taking Mustafa Suleyman, who was in charge of Microsoft AI, and putting him in charge of a super intelligence lab. Simultaneously, Microsoft is weighing legal action over $50 billion in Amazon-OpenAI cloud deals. We have this weird muddying of relationships between Amazon and OpenAI.

Then there is xAI. Elon Musk tweeted on March 12th that xAI was not built right the first time and is being rebuilt from the foundations up. This follows a lot of turnover, with many co-founders leaving in the last 60 days. This happened just a month after xAI was acquired by SpaceX to consolidate Musk's empire. Musk is watching what’s happening with agents and enterprise; he wants a piece of that game.

Meta is also in upheaval. They delayed the rollout of a new AI model after performance concerns. They are spending heavily, rumored to be $135 billion this year on capex, but they haven't released a major model in a while. They did buy Moltbook, an AI agent social network, so they are trying to get into the agent game, though enterprise isn't their natural space.

Jensen Huang, CEO of Nvidia, spoke last week about OpenClaw being the "next ChatGPT." He called it the most successful open-source project in the history of humanity. OpenClaw is an open-source autonomous agent platform that can complete tasks and make decisions with minimal input. Nvidia moved quickly to build NemoClaw, an enterprise-grade version of the platform.

Google DeepMind is in a crazy phase as well. They had runaway success with NotebookLM, though the average business leader still has no idea what it is. They announced a major investment in AI Studio to play in the "vibe coding" game, but AI Studio is still primarily for developers. Google currently has no answer to Claude Code; it’s running circles around them. A principal engineer at Google, Jaana Dogan, tweeted that she gave Claude Code a description of a problem and it generated what Google had been trying to build for a year in just one hour. Logan Kilpatrick from Google DeepMind even tweeted, and then deleted, that all the industries people thought wouldn't be disrupted by AI are about to be disrupted. Google customers reading that would understandably be concerned.

Insights from Andrej Karpathy

If you want to understand where this is going, listen to Andrej Karpathy’s interview on the No Priors podcast. He discussed how fast these models have evolved and noted that it’s largely a "skill issue" if you can't get value out of them. He introduced the idea of "token maxing." Every time you use a model, you burn through tokens—predictions the model makes. He suggested that engineers and knowledge workers should know their "token budget." If you have a subscription to Claude or ChatGPT and you aren't maxing it out every month, you are leaving intelligence and outcomes on the table.
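To make the "token budget" idea concrete, here is a minimal sketch of the arithmetic, assuming a plan with a fixed monthly token allowance. The `TokenUsage` record and the numbers in it are hypothetical and for illustration only; actual Claude and ChatGPT subscriptions meter usage in their own ways and may not expose a single token figure.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Hypothetical record of one month's usage against a plan's allowance."""
    monthly_budget: int  # tokens the plan allows per month (assumed figure)
    tokens_used: int     # tokens actually consumed so far


def utilization(usage: TokenUsage) -> float:
    """Return the fraction of the monthly budget consumed, capped at 1.0."""
    if usage.monthly_budget <= 0:
        raise ValueError("monthly_budget must be positive")
    return min(usage.tokens_used / usage.monthly_budget, 1.0)


def unused_tokens(usage: TokenUsage) -> int:
    """Return how many budgeted tokens are still unspent this month."""
    return max(usage.monthly_budget - usage.tokens_used, 0)


# Illustrative numbers only; real plans meter usage differently.
march = TokenUsage(monthly_budget=10_000_000, tokens_used=2_500_000)
print(f"Utilization: {utilization(march):.0%}")
print(f"Unused tokens: {unused_tokens(march):,}")
```

In this made-up example, a quarter of the allowance is used and the rest is the "intelligence left on the table" described above.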

He also talked about running projects in parallel. I do this myself—I’ll have Claude working on one project while ChatGPT works on another. There are times I’m running three projects simultaneously with AI agents while I do my other work. This leads to a massive compression of timelines. Things that used to take five or ten hours might now take five minutes. That is a weird environment to work in. Finally, he mentioned the compression of software stacks. Instead of having separate CRM and social tools, you might just have a swarm of agents talking to all that software through a single user interface. All the labs are realizing what Karpathy is realizing regarding these agentic capabilities.

Key Takeaways

The labs are now in a race to do what Andrej Karpathy explains in his recent interview. That is why that No Priors podcast episode is so important; he is telling you point-blank what all the labs are trying to do with agents. You will walk away with a better understanding of the moment. He said at one point, "Working with these agents is like simultaneously talking to a PhD student and a 10-year-old." Sometimes you do something with it and it is like giving it to a top PhD student, and then the next moment, it is like some stupid simple thing and it just can't do it. It is that idea of the jagged frontier and the jaggedness of these models.

To zoom out and recap: all the labs realize what Claude Code unlocked. It wasn't like it was the first coding agent; it was just the best. They did something different with the harness and how they enabled it to do what it does. All these labs see not the finish line, but the next mile marker of agentic capability—their ability to automate AI research and their ability to start disrupting everything.

It is an all-out race for agents. They are seeing a pot of gold with enterprise adoption, which is why Anthropic and OpenAI are doing deals with private equity firms and alliances with major consulting firms. They are trying to get in where this is going because they see the labor replacement value: being the model provider that companies turn to when they reduce workforces and shift that spending into AI models, "token maxing" to get work done. They see that future coming very fast. It is going to become very apparent in the next three to six months that this is full go where they are headed. It is probably a pretty good time to be an enterprise buying AI technology, as these labs would like to court you.

Public Perception and the Political Landscape

Next, we have three separate developments this week that are painting an increasingly complicated picture of how Americans actually feel about AI and how Washington is responding. First, we had some new polling. David Shor, who is head of data science at Blue Rose Research, appeared on the Odd Lots podcast with some interesting polling data. His organization has found that over the past year, AI rose in issue importance faster than any issue his firm tracked. It is now more important to voters than climate change, child care, and abortion. According to their polling, 79% of voters are concerned the government doesn't have a plan to protect workers from AI job losses. 77% are concerned about entire industries being eliminated, and 56% are worried about personally losing their job to AI.

This is hitting at a time when 61% of Americans say life has gotten less affordable in the last year. Only 25% feel confident in their financial future, and only 34% say they have a secure job. Shor’s data shows that the idea of "everything is going to work out just fine" is a message that is dead on arrival. They found that when leaders in government and tech say AI will not cause widespread job losses, net trust is negative 41. When they say AI will create economic productivity that benefits everyone, net trust is negative 20.

You are starting to see this play out across the political spectrum because we have also seen dueling AI political declarations. First, there was a coalition involving unlikely bedfellows, including Steve Bannon, Susan Rice, Richard Branson, Ralph Nader, and Yoshua Bengio, who released the "Pro-Human AI Declaration." This called for a prohibition on superintelligence development until there is broad scientific consensus it can be done safely, along with other points about keeping AI pro-human. Over 40 organizations signed this, and their polling found that Americans prefer human control over the speed of AI development by an 8:1 ratio.

However, another organization called Build American AI published a direct counter titled, "We Cannot Afford to Pause AI." They argued safety and innovation are not opposites and that the US already has regulatory tools through existing authorities to manage AI development.

The Trump National AI Framework

Third, the Trump administration unveiled a national AI legislative framework with seven pillars. This gives legislative guidance on how they think legislation should evolve. This framework takes a clear "try-first" rather than "regulate-first" posture. It opposes creating any new federal AI regulatory bodies, defers copyright questions to the courts rather than legislating, and recommends Congress preempt state AI regulations that impose undue burdens on developers, establishing what it calls Americans' "right to compute."

There is an interesting part in here shifting responsibility for protecting children online from tech companies to parents. Rather than imposing strict industry standards, they are shifting more to empowering parents with tools. The framework also calls for Congress to empower Americans to challenge federal agency efforts to dictate the information provided by an AI platform, trying to ensure there is no undue influence on what information is provided by AI.

With any research, you have to know who is doing it and what their goal is. This is going to become more political. These are all trial balloons to figure out what Americans think and if there is an opportunity to move votes by taking a strong position. Many people being polled don't fully understand what AI is, which is an advantage for politicians who want to manipulate perception. If people don't know what it is, you can create the perception you want. You can hammer the message that it is going to take jobs or that data centers are going to ruin communities.

David Shor’s profile explicitly states he tries to elect Democrats, so there is a clear perspective there. While AI is the fastest-growing issue, it is still 29th out of 39 issues overall. The top issues remain cost of living, the economy, and inflation. However, the concern about the government not having a plan to protect workers is 79%, and that is a valid concern because they don't have a plan.

Company Transformation and the SmarterX Retreat

Our third big topic is about our SmarterX annual meeting and retreat. We spent a couple of days together collaborating. Day one focused on vision, goals, KPIs, and growth initiatives. Day two, we ran AI productivity and AI innovation workshops designed to accelerate responsible AI adoption. This retreat was a great example of what you can do with the time you gain from AI. Because we use AI so intelligently within our own business, it gives us the freedom to take two full days to think, talk, and build camaraderie.

I want to build an AI-forward company where we maximize what we can do but also make sure we are getting the benefits of it. It shouldn't just be a race to "token max" every minute of every day. During the retreat, we set expectations for what an AI-forward professional looks like and modeled that by showing in real time how we are using AI.

During the productivity workshop, we discussed the idea of tasks as workflows. We went through an AI capabilities overview with a spreadsheet of about 90 different capabilities across major AI tools. While this was being presented, I took that spreadsheet and put it into Claude. I asked it to help me visualize it so professionals could understand the full capabilities of leading AI models. A few minutes later, I had a functioning app.

The most interesting part was that the original spreadsheet didn't include Claude. Claude actually asked me, "Should Claude be included as a fourth tool?" It was aware it wasn't part of the data and asked to add itself. It built an interactive capability tool with search and filter functions in one shot. It was professionally designed and extremely intuitive.

We also worked on "Rocks," a concept from the EOS system where departments and individuals establish three to five priorities per quarter. This allows us to align our time and resources on what matters most and provides transparency across the team. This kind of alignment is essential as we navigate the rapid changes AI is bringing to the workforce.

Key Takeaways

The thing that became abundantly clear to me is that the time to complete rocks is compressing, and that this requires a complete rethinking of business operating systems. For example, during the live session on the company day, I was demoing a new AI assessment tool we're developing that I'll share more about in probably a month or two. I had used Anthropic's Claude Code, again Sonnet 3.6, in real time to build an interactive reporting dashboard that visualized and analyzed responses from 17 people. I had built this assessment in Google Forms as an MVP, and then Mike and I tested it the day before the retreat just to make sure it worked. I had my data and Mike's data, and then I had everybody else take it, and then I exported that CSV from Google Sheets.

That was it. That was the entire process. Zero coding, zero design ability to do this thing. I give this to Claude, and I said—this is while we were taking a lunch break—here's my prompt: "I had 17 team members take the assessment. Can you come up with an elegant way to visualize the results based on the format model you already created?" I had it create one for me and Mike previously, and the CSV was attached.
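The dashboard itself was generated by Claude from a single prompt, with no hand-written code. For readers curious about what the first step under a dashboard like that might involve, here is a minimal sketch that averages each scored column of an exported CSV. The column names, scores, and the `average_scores` helper are invented for illustration; the actual assessment export is not shown here.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Stand-in for the exported file; column names and scores are invented.
SAMPLE_CSV = """name,strategy,tools,governance
Respondent A,4,5,3
Respondent B,3,4,2
Respondent C,5,5,4
"""


def average_scores(csv_text: str, id_column: str = "name") -> dict[str, float]:
    """Average every numeric column across respondents, skipping id_column."""
    scores: dict[str, list[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            if column == id_column:
                continue
            scores[column].append(float(value))
    return {column: round(mean(values), 2) for column, values in scores.items()}


summary = average_scores(SAMPLE_CSV)
for category, avg in summary.items():
    print(f"{category}: {avg}")
```

A real dashboard would layer charts and filtering on top of a summary like this; the sketch only shows the aggregation step.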

In a previous life, aka three months ago before Claude Code really started working, this would have been my entire Q2 rock: create an interactive dashboard to visualize assessment results for teams. I would have spent 10 to 20 hours researching dashboards and developing a brief. Then I would have invested time and money hiring a designer and developer to conceptualize, build, and iterate on the design and capabilities. Then we would have gone through weeks of internal testing and revisions. And then maybe by the end of Q2, I would have actually had a minimum viable product that I could demonstrate to the team and pilot with users.

Instead, in about five minutes while I got a plate of pasta, Claude did the entire thing with one prompt, and the final product was beyond anything we could have possibly created. I told Mike, "I'm going to try this. I'm going to do it." And then he and I are both just waiting. We go check the laptop. Did it do it? It was insane. It was totally interactive, better than anything I could have possibly designed myself or worked with a developer to build. I'm now going to use that to actually turn it over to a developer and say, "Here, let's build this and take this live in 30 days, hopefully."

We shared this as a little bit of a behind-the-scenes of how we think about SmarterX as an AI-native company—an event, media, and education company—and to bring to life the fact that you don't need any coding ability to all of a sudden now just build stuff. It's totally compressing the timelines to do everything in business. It's changing the way every day that I think about how to run our own company and how to advise other people to build their companies. I would argue we have quite well-done and clear and ambitious rocks, at least in our department that we were working on during this workshop. But it is actually laughable that all five of them should take three months.

Paul: I really think my guidance to the team was like five. I want you to have five for your department. I actually think you need 20. You need some categorical thing like, "Hey, this would take 10 to 20 hours of human labor. We think we can do it in 10 minutes." There are honestly things that are just going to be like that. There's going to be all these quick win rocks where it used to be three months' worth of work, but it's probably three days now with mostly AI. It's level three AI; it's going to do most of the work. The companies that figure that out and realize that and restructure how they're building everything stand to do really well.

Mike: Just two final really quick notes here to piggyback on what you did with AI during these workshops. The AI capabilities map I built, which was again 90-ish rows of all these different capabilities and features, that's a lot to figure out on your own. What's really cool is I determined the framework I wanted to use and worked back and forth with Claude to say, "Okay, what's the most sensible way to organize these once we have them?" I didn't have them yet. Then it's like, "Okay, we've got a really solid system. How do I get them?"

Typically, you might go do a bunch of research. You might have to sort through all sorts of documentation. I just went into each tool and screenshotted all my menu options. Then I dropped them into Claude and said, "Guess what? We're going to go create the spreadsheet based on the framework that you and I came up with, and go have at it." And then it basically one-shotted a 90-row spreadsheet. It's incredible.

Same type of thing during your innovation workshop. I fed Claude a lot of different context about my department, Content Studio, some of my context around our organization, and then using your framework that you developed, layered that on top of that context and what Claude is now able to do, and got better innovation ideas than I could have come up with first on my own at all, and second, in an entire day. I did it in like 20 minutes. It's so powerful, not only just using the right tools, of course, but having these proven frameworks and models and ways of thinking layered over them. All that stuff we've spent lots of time developing as IP or as unique models to approach these things with in our workshops, it's like rocket fuel at this stage.

Paul: I think there's a lesson to be taken from how we structured it, because obviously our team's probably more informed than most teams about AI capabilities, but honestly, I don't know that they even were aware of a lot of the things these models could do. It was very intentional how we did this, and I would advise other companies to think about a similar model where you have this state of AI: what is it capable of? That's often what I'll go in and do with enterprises. I'll do a state of AI for business: here are the capabilities, here's what you need to understand. Then you do the productivity workshop: "How do we get efficiency and productivity in our tasks and workflows?" Then we'll often do a problem-solving one, too.

The innovation one is how we close, and I intentionally wanted to close with that because once you understand what it's capable of, and once you've solved the lower-level efficiency and productivity things, now you open your mind to the possibilities. Then we go around the room, and each person gives us one or two innovations they're super excited about. You leave after two days actually feeling ready to go, not drained. It's like, "Okay, that was amazing. I want to go do those things now." That was what I got; people were coming up to me, like, "Okay, can we do these things that we just talked about?"

It's a really cool format for people. If you're trying to get your team on board, borrow that format of making sure they're understanding of it. If you need help with it, give us a call. This is what Mike and I do all the time. We run boot camps and workshops. If nothing else, we can advise you on ways to do it. But if you're a big enterprise and you need help with it, we can come in and do stuff like that, too.

Nadella Takes Over Microsoft Copilot

Paul: All right, before we dive into rapid fire, a quick message here. This episode is also brought to you this week by our upcoming webinar, which is unveiling our AI for CMOs blueprint presented by Google Cloud. This is actually happening the week you are listening to this episode, Thursday, March 26th at 12:00 p.m. Eastern, 9:00 a.m. Pacific. In this session, our CMO, Cathy McPhillips, and I are going to break down the insights from this AI for CMOs blueprint we put together in partnership with Google, covering the real-world state of AI for CMOs, use cases, tools, strategies, and more. We'll also be doing some in-depth discussion and live Q&A. Registration is free. All registrants will receive ungated access to the full AI for CMOs blueprint. Go to smarterx.ai/webinars to register.

Mike: All right, let's dive into rapid fire. First up, Microsoft CEO Satya Nadella is taking more direct control of the company's Copilot product, personally overseeing a restructuring that consolidates consumer and commercial Copilot into a single organization. Jacob Andreou, a former Snap SVP who joined Microsoft last year, now reports directly to Nadella as the new EVP leading Copilot experience across both segments. The restructuring frees up Mustafa Suleyman, the DeepMind co-founder who became CEO of Microsoft AI in 2024, to focus entirely on what he calls the company's super intelligence efforts.

This move apparently comes as Copilot trails quite badly in the AI assistant race. Copilot has 6 million daily active users compared to ChatGPT's 440 million, according to a CNBC article. Gemini has 82 million, Claude has 9 million. Nadella wrote to employees that Microsoft is doubling down on our super intelligence mission with the talent and compute to build models that have real product impact. Paul, what does this tell you about where Microsoft is headed with AI? Reading this, I was like, I know it sounds like Mustafa is excited, but this feels more like he's getting sidelined and we need to get real serious about Copilot real quick, which is kind of what we've heard anecdotally from users of Copilot.

Paul: There are lots of variables going on here. One is the shift in their relationship with OpenAI. They were obviously a major investor in OpenAI. They're a major equity holder; I think they own somewhere around 27% of OpenAI. But all of their efforts were being built on top of OpenAI's models. Now, again, if you go look at what we were just talking about with Claude, you're almost at a disadvantage as an organization if you can't use a breakthrough when it happens. When somebody builds just a better thing, you're at a disadvantage if you can't use that thing. If Microsoft was stuck using OpenAI technology and all of a sudden Claude races ahead in some really important component, that's not great.

If you're Microsoft and you're one of the three biggest companies in the world, the fact that you aren't building your own models is probably a disadvantage moving forward. I think there was a shift where they realized probably a year and a half or two years ago that they were going to have to remove their reliance on OpenAI. It probably happened the day Sam Altman got fired when that became like, "Oh boy, we have all our eggs in one basket and it could go bad real fast." There's been this ongoing shift where they knew they needed to invest in their own technology and build their own models. They need to have an off-ramp over time from their reliance on OpenAI.

In November of last year, they announced this humanist super intelligence movement. We talked about it in episode 179. Mustafa had tweeted, "It shouldn't be controversial to say AI should always remain in human control, that we humans should remain at the top of the food chain. That means we need to start getting serious about guardrails now before super intelligence is too advanced for us to impose them." Then there was an article from November 6th called Towards Humanist Super Intelligence, where he said, "At Microsoft AI, we're working towards humanist super intelligence, incredibly advanced AI capabilities that always work for and in service of people and humanity more generally."

We've known this was happening. At that time, I said maybe Mustafa stays at Microsoft to realize this vision, but I can't help but feel like this vision will eventually clash with the need to justify their investments in AI. I think what they're basically saying is: you go focus on this stuff, focus on the future and the building of this thing, but Copilot is critical to our business right now and it is not where we want it to be. That now needs to get much closer to Satya. That's basically what has happened here. I have no idea if Mustafa stays and keeps doing what he's doing, or if they really do believe in this humanist super intelligence thing, but I don't see Wall Street loving the humanist super intelligence vision. I don't think the stock price is going up because of that blog post. They want to know how you're going to compete with Claude and work with Anthropic. At the end of the day, Satya and Microsoft have a fiduciary responsibility to return shareholder value. I don't think that messaging plays.

It fits into that whole thing I started off with where these AI labs are shifting focus. You're going to see a lot of reorgs, a lot of "they tried something and it didn't work." Meta burned 10 billion on the metaverse and changed their name to be Meta, and it's done. There are going to be lots of big efforts and big misses, and you have to move quick when it doesn't work. This is an example of that. Not to mention anyone who's a Wall Street analyst is almost certainly using Microsoft Excel and thus Copilot and sees it firsthand. Or they used Claude in Excel and realized it was better than Microsoft Copilot. They have very close experience with perhaps some of the inadequacies.

Meta's Rogue AI Agent

Paul: All right, next up, an AI agent inside Meta took unauthorized action last week that triggered an actual security breach at the company. An employee used an in-house agentic AI to analyze a colleague's question on an internal forum. The agent then posted a response to the question on its own without being directed to do so. The second employee followed the agent's advice, sparking a domino effect that gave some engineers access to Meta systems they should not have been able to see. The security breach was active for two hours before it was contained.

A Meta representative confirmed the incident and said no user data was mishandled, though the company's internal report noted unspecified additional issues that contributed to the breach. A source told The Information there was no evidence anyone exploited the unauthorized access or that data was made public, though the reporting notes that may have been the result of dumb luck more than anything else. The agent had also passed every identity check in Meta's system. That exposes some pretty serious fundamental gaps in enterprise identity and access management. Mike, I'm curious, how close are most companies to having this kind of thing happen to them?

Mike: I don't know, but it's certainly a very viable thing. This is why I said in a recent episode, you have to listen to IT. There's a reason why some enterprises are moving really slow, especially when it comes to adoption of agents. We talked about the Jensen Huang thing where he was like, "Open Claude is like the ChatGPT model." I was like, "Okay, maybe, but you know how hard it's going to be in enterprises to do anything close to what that does." This is the exact issue. We just talked on episode 203 about something similar happening with Amazon, where it just went rogue and started doing everything. I joked at the time like we could just do a rogue AI agent segment every week. This is going to be a recurring theme. It's going to become a major issue.

The concerns around oversight and governance of these agents and these agent swarms that are just given access to stuff and the breakdown you might then see in permissions controls are real. We had this conversation at our own company meeting. It's like, "Can we connect this to that? Can we connect that to this?" And it's like, "No, because I don't know yet the risks associated with that." The tech can do things, but it doesn't mean you should let the tech do things because there are so many potential risks. This is a crazy one. You should go read the articles about it; it's pretty nuts.

Paul: I almost found this was more notable because it happened—it wasn't some super incredible agent giving access to your whole code base. It was just a totally unintended consequence of something that's actually probably a pretty normal use case on the surface: saying, "Hey, let me use AI to analyze a question one of my colleagues posted on a forum." And then you're like, "Oh no, I realize that now this thing can choose what to do and how to do it." That's a totally weird way to start thinking here.

Mike: Yeah, and again, go listen to Andrej Karpathy on the No Priors podcast episode and you'll understand this stuff at a deeper level. He talks a lot about these risks and even himself not knowing. He talked about setting it up to run his house. He was like, "Oh yeah, I gave it access to—I was like, go find Sonos." And it goes into his network and finds the Sonos speakers and then he goes to the security cam and he just gave it access to everything. He calls it "Dobby the home elf." It's hilarious. This is a recurring theme. It's really important to understand where agents are going, where these agent swarms are going, how they'll eventually be used to run organizations, and how some people are willing to be out on the edges right now setting these things up and connecting them to their own company data. We're all going to learn plenty of lessons from their early efforts.

Anthropic vs. Pentagon Continues

Paul: All right, in our next rapid fire topic this week, the Anthropic vs. Pentagon saga continues. The Department of War has fired back at Anthropic's lawsuits in a 40-page filing in California federal court. The Pentagon calls Anthropic an "unacceptable risk to national security," arguing the company might attempt to disable its technology or preemptively alter the behavior of its model during war-fighting operations if its corporate red lines are being crossed. Recently, nearly 150 retired federal and state judges appointed by both Republicans and Democrats have also filed their own amicus brief supporting Anthropic. We talked last week about how tech companies like Microsoft and Apple have all filed their own briefs basically arguing that this designation of Anthropic as a supply chain risk could mean the entire government procurement system becomes contingent on political favor rather than the rule of law.

The big piece here is really this idea that a bunch of ex-judges are coming out and saying that they also support Anthropic in this. We have talked about if this is going to get resolved anytime soon. There's a hearing on whether or not to grant Anthropic some temporary relief that's actually set for March 24th, the date this comes out. Where do we stand with this?

Mike: I don't know all the context, but I think it's just still this "he said, she said" thing where the government's saying one thing and they're doing the other thing behind the scenes, but they're trying to give this perception that they're all in the right and Anthropic is this horrible company and this huge risk. There was a tweet thread from Roger Parloff, who's a senior editor at Lawfare. He said some Anthropic updates on March 4th: just hours before Hegseth declared Anthropic a supply chain risk allegedly due to threats of sabotage and data exfiltration, his undersecretary wrote Anthropic (and he had the screenshot of the email) that they were very close to a deal, asking to change a prepositional phrase.

While Hegseth's getting ready to go on and blast them on X and say they're done, they're actually still negotiating behind the scenes and they have screenshots of it. Then since then, the government has claimed that Anthropic sought a veto over Department of Defense actions, but two top Anthropic officials assert it never did, and they did so legally: they submitted a briefing saying this is not what happened. Similarly, the government's purported fear that Anthropic might disrupt the military was never raised with the company and is a technical impossibility, so they actually explained, "We can't even do the thing they're claiming we would do."

As for Anthropic's refusal to allow its product to be used for autonomous lethal warfare and mass surveillance, Hegseth himself said those concerns were understandable and the commander of the US CENTCOM echoed those sentiments, according to Anthropic's head of policy. They submitted these briefs saying they agreed with us; we weren't even raising something that they didn't themselves think was an issue. In the government's response Tuesday, it backed away from the secondary boycott Hegseth called for in his February 27th "final decision" post on X, admitting it was lawless but also taking no responsibility for its devastating impact. The hearing is coming up on March 24th. These are legal declarations from Anthropic's head of policy, Sarah Heck, submitted as part of their response to the case, and also their head of public sector. They're basically saying, "Well, here, I'll testify to this never happened or this is what they said." The whole thing has become this political thing; it's become a battle of egos on the government side. I think everyone sort of sees through that, why they're actually doing this, and we'll see what the courts have to say.

Paul: It's impossible to tell, but based on that new context, it almost sounds like there's one possibility where Hegseth jumped the gun on tweeting about this when they were nearing a deal. He claimed some power that they actually don't have, where he's posting so aggressively when the deal's almost done before this all blows up, and now it's just doubling down on a mistake. Or you're just going to do harm either way, so you don't really care if it's legal or not. You don't consider repercussions; nothing's going to happen to me if I do this and say this other than hurt that company and try and use it as leverage to get them to do what I want them to do, which is not an unusual political tactic.

DeepMind’s New AGI Scorecard

Mike: All right, next up, Google DeepMind has published a cognitive framework this week that attempts to answer the question: if AI actually achieved AGI, how would anyone know? The team here proposes a cognitive taxonomy with what they claim are 10 measurable traits of general intelligence on which to measure AI and its progress towards AGI, and it's divided into two categories. The first category covers eight building blocks of human cognition: perception, generation, attention, learning, memory, reasoning, metacognition, and executive functions. These combine to form two composite faculties that DeepMind considers equally important, which are problem-solving and social cognition. They basically define social cognition as the ability to process and interpret social information and respond appropriately in social situations.

Their proposed test here is pretty straightforward. They want to run AI models and humans through the same cognitive benchmarks, and then they theorize you'd get a measurable estimate of when a single AI can meet or exceed human capabilities across all 10 of these areas. DeepMind actually launched a Kaggle hackathon with a $200,000 prize pool to crowdsource evaluations for the five areas where the gap between testing capabilities right now is the largest: learning, metacognition, attention, executive functions, and social cognition. They say their goal is to move the conversation around AGI from one of subjective claims and speculation towards a grounded, measurable scientific endeavor. Paul, does this change anything about how we talk about AGI? Are we getting any closer to really defining what it is and actually measuring it?

Paul: Google DeepMind has done the best job of trying to get to that point. They had a paper last year trying to define the different general capabilities and performance and trying to put some way to measure it. I like the effort to try and quantify it, making it more meaningful and trying to get some eventually universal agreement on what it is. The first thing I thought when I saw this is: how do you not saturate these tests? When the models eventually learn what the tests are, I don't know how they would keep them sandboxed so the tests don't end up in the training data. The model eventually learns how to look like it has AGI because it just learned what the test was ahead of time.

The most important thing for our audience is that we just keep coming back to this: AGI is a really interesting topic; it's fascinating to follow along progress towards it. But it's a meaningless term when it comes to the impact on your job, your company, and the economy more broadly. We don't need to reach AGI, whatever that definition is, and we don't need to agree on a definition for AI to transform businesses, the economy, and society. On this idea of capabilities overhang (we talked about that Andrej Karpathy episode that touches on this quite a bit), just go back to that example I shared of rocks. If you have a company like ours that knows this stuff, we understand what AI capabilities are and we look at the operating system of our company and we're like, "Oh, we're just going to reimagine the whole thing." Rather than five rocks a quarter, we think we can do 15 or 20 easily. Here's how we're going to do it.

We understand the capabilities and we're applying them to the best of our ability. Then take some other company that doesn't even have generative AI tools for their team yet. They don't even have Copilot licenses or ChatGPT licenses. They've done no personalized training; they've never run a workshop internally. They're not even taking advantage of any of the capabilities other than maybe using it as an answer engine or a chatbot. There's this overhang of all these capabilities and so few companies are actually doing anything with them—not just companies, but educational institutions, governments, and practitioners at an individual level. That to me is the most important thing. I'm all for this; I think quantifying it so we can just get to the point we agree on what it is makes total sense, but don't be misled by that or wait around for that definition. Don't say, "Okay, I'll worry about it when we get closer to AGI." It's already there.

What 81,000 People Want from AI

Mike: All right, next up, Anthropic has published results from the largest multilingual qualitative study ever performed on AI attitudes. They did nearly 81,000 interviews with Claude users across 159 countries and 70 languages. These conversations were actually conducted by Anthropic Interviewer, a variant of Claude trained specifically to conduct and then analyze interviews, which we have talked about in past episodes. Interestingly, they found out that the top fear people expressed in these conversations is actually hallucination and unreliability of AI, which ranks as the number one concern with 26.7% of people mentioning it. It is ahead of jobs and economic impact, which is at 22.3%, and loss of human autonomy and agency at 21.9%.

Interestingly, Anthropic finds that people value AI often for the same capabilities that they fear most. 50% of respondents experience time savings from AI, yet 19% felt pressured to simply work faster as a result. 33% cited learning benefits, while 17% worried that it would actually facilitate more cognitive decline when you're relying on machines to think for you.

It is interesting that people experiencing one side of a tension are typically three times more likely to also worry about the other side, meaning these are inherent contradictions in the same people using AI. Now, what's really cool is they actually asked what people want from AI. 18.8% of those who answered said that they seek first professional excellence from AI. 13.7% said they were seeking personal transformation. 13.5% said better life management, and 81% report experiencing some progress towards their vision in those areas. Paul, I'm interested what you took away from this data. Pretty interesting way they went about getting it.

Paul: The approach to research is what I found most intriguing. The data is great. I do think again, as I referenced earlier, you have to keep in mind who the people responding to these questions are when you look at the data, so you're not just making some broad assumptions. In December of 2025, before Claude Code really took off and before the government issues and before this movement where the Claude app became the number one app on the App Store, they had a heavily technical user base.

Lots of coders, lots of AI researchers using Claude. So when you're looking at this, even though it's 80,000 people across all these countries, it's still likely skewed toward a more technical user. For reference sake, that's important to keep in the back of your mind. I love the approach, this dynamic approach based on responses that adapts it. It is not great news for people who run focus groups and who are consumer research people for a living. This is definitely one of those ones where you're either adapting or the whole new way of doing research is going to run you over.

They said their next Anthropic Interviewer study, launching shortly to a small subset of Claude users, focuses on Claude's effects on people's well-being over time, whether Claude is actually making people's lives better in the ways they want and how it could do so more effectively, which I thought was interesting. And then they said this is a new form of social science that is qualitative research at a massive scale, and we're in the early stages of learning how to do it.

Surveys and usage analysis tell us what people are doing with AI, but the open-ended interview format helps us get at a why. Conducting this research has moved us and challenged us. We did not expect so many deep, open, and thoughtful responses. By far, the most common reflection from our team was that it was viscerally moving to see Claude impacting people's lives for the better, and equally motivating to hear their concerns. We were equally gripped by the fears and downsides, people saying that the same availability making Claude useful is what makes it hard to put down, or knowledge workers worrying about outrunning AI's economic impact.

When you come into contact with this much raw human experience, it knocks you sideways, they said. The usefulness is real, and the question for all of us is how to claim the benefits without incurring undue costs.

I thought that was really interesting to note, Mike, because this actually came up during our company retreat, this idea that we're all at the frontiers of figuring all this out and using it, and it's awesome for productivity and innovation and efficiency and growth and all these things. But it also has this very messy, complicated other side where it has this human impact, and maybe your friends or your family hate it, and they don't even like the fact that you're working on it, and they have these perceptions about what you're doing because you're in AI or because you're one of the people who talks about AI. I honestly think about that sometimes from what we do on the podcast, Mike, where I think, "God, I hope at some point people don't... we're trying to do the human-centered work. We're trying to educate people so we can have a positive outcome." But sometimes the truth doesn't matter, and I do worry about that. It's part of the reason I don't read comments on social ever. I don't look at our comments on YouTube and X and maybe sometimes LinkedIn, but I just prefer to try and just do our thing and know we're trying to do a positive thing. But that doesn't change the fact that there's darkness to this, and there's uncertainty and fear and anxiety and hatred, and all those things are very real. So I'm really excited actually that Anthropic's going this research direction.

Mike: Yeah, that's why I really actually like the findings here. Obviously, to your point, they are skewed towards a certain type of people. But when someone asked at our offsite how do you stay grounded when you're dealing with such heavy and sometimes horrible dark AI topics in the news, that was my answer: focusing not to the detriment of the negative, but focusing on the positive things I've been able to do with these tools. I've been able to do things, achieve goals, get results that I never dreamed possible. Genuinely, this technology has made me a better professional, leader, thinker, strategist, even husband and father. So that's the flip side. I love to see in this data people saying, "Hey, I'm using this, I'm trying to get out of AI professional excellence or personal transformation or better life management." I've done all those things with AI, and it is glorious what you're able to do.

Paul: It doesn't get rid of the negative stuff or the concerns, but it's trying to focus on the positive. It is something we're going to come back to, too. We brought on our director of research a few months back, and one of the focus areas she has is actually on the humanity side of this. Mike and I and Taylor are actively talking about more research in these directions and the kinds of things around the human impact. It is something we're going to probably be doing a lot more about on the show, and even our academy is starting to talk about that stuff. Very important. And maybe even on our event side, we might be looking at doing some stuff where we can bring people together to have these conversations because they're critically important.

Well, as we wind down, Mike, we had mentioned at the start the AI for professional services, which you taught as part of our academy. One of the ideas we have is to do little spotlights on these where, without having even to take it, we give you a little bit of insights into some of the key things we learned in building these courses. Mike, your AI for professional services, any key insights or takeaways that you think would be helpful for people to hear?

AI Academy Spotlight

Mike: Yeah, sure, Paul. So as part of this four-course series, which comes with its own certification, we're breaking down both from a high level what is happening at the industry level that you need to know about, and then getting into the actual tactical A to Z of here's how you identify your own use cases and match AI tools to them in your own professional services career. A couple of things jumped out, both from building this course and as someone who was in professional services before we did the whole AI thing. Number one, and we've talked about this on the podcast: one of the trends that really needs to be appreciated is the idea that the billable hour model is maybe not only on borrowed time, but is dead.

If you were on a billable hour model as a professional services organization, AI is a major threat to that because many organizations still have not adequately figured out what happens when you can now do things in a fraction of the time that you used to do them in using AI. You cannot simply charge the same amount of hours and hope to get away with it. So you see a lot of industry professionals and leaders trying to figure out how do we adapt our business model without tanking our entire organization. One of the big takeaways there is the firms that are going to win are the ones that figure out sustainable, defensible, value-based pricing first. Pricing on outcomes, not hours, because again, you can do so much more in the same amount of time. There's no chance your clients, your customers are not going to demand that you pass along those savings to them.

I would also say another big area here is figuring out how your human intelligence within your professional services firm becomes your superpower and your competitive advantage. Because, unfortunately, for a lot of professional services firms, there are very intelligent AI models out there that now have been, for better or worse, trained on a lot of your expertise. So figuring out how your humans, with all their experience and background and domain expertise, can actually be leveraged and scaled with AI is going to be the entire battle moving forward. You really want to look almost at any frameworks, any experience you have internally as almost like your own IP if you're not already, because AI can scale that, and you can have that be a competitive advantage. But if you do not do that, if you are playing at the commodity level of, "Hey, we're experts in marketing," so is AI now. You have to figure out what kind of expert you are and how you are differentiated.

And then, last but not least, there's always these questions in professional services about like, "We'd love to get started with AI, but we work in really sensitive industries with clients that have privacy and data concerns about using this stuff. We have not figured that out yet." Totally valid. We talk about that more at length in this course series, but the advice here is actually start with your back office stuff. If you have these kinds of challenges, if you are still trying to navigate data and privacy concerns, your back office stuff, I guarantee you, can become dramatically more productive by applying AI, often at a very low-hanging fruit type of level. We go into very specific use cases and tools in the course series to help you do that. But there are these areas where they don't touch client-facing stuff that you can actually start your AI journey almost in the back office and achieve massive immediate profitability gains just from doing that alone.

Paul: Yeah, and the other thing I think about, Mike, is just from the buyer perspective, understanding professional services and how it's evolving and how I should be looking for AI-forward professional services firms. So even for me as the CEO, we outsource legal, IT, accounting, advertising. We work with an advertising partner. I just think about just those four. Understanding how their business models are evolving and the importance of working with AI-forward versions of those companies and the points of contact, things like that. It is great and I appreciate you building the series and this ongoing effort we're doing to try and create content across all the departments, all the relevant industries, and then even into businesses. We try and make that stuff super relevant for people. Hopefully these little spotlights will be helpful for people to get a little taste of what's going on in these different industries. We'll touch on departments, we'll touch on some of the GenAI things we're doing and just try and bring some of that value from Academy to the podcast each week.

Mike: All right, Paul, we've got a number of AI product and funding updates here to wrap up this week. I'm going to run through these. If there's anything that jumps out to talk about further, let's do it.

AI Product and Funding Updates

Mike: First up, Jeff Bezos is trying to raise a $100 billion fund focused specifically on AI manufacturing. This fund would represent one of the largest single pools of capital ever assembled around AI infrastructure.

Google has launched something called Stitch, an AI design tool that turns natural language prompts into high-fidelity UI design. The tool lets you describe what you want in plain English and generate production quality design output. Google is in this emerging "vibe design" category. Google also rebuilt AI Studio from scratch as a full-stack vibe coding platform. They said they spent four months on this rebuild and the new version lets developers go from prompt to working application entirely within AI Studio.

OpenAI has released smaller, cheaper tiers of GPT-5.4. GPT-5.4 mini and nano give developers access to the model family at lower cost and latency. In some other legal news, a court temporarily allowed Perplexity's AI shopping agents to continue operating on Amazon. Perplexity's agents browse Amazon on behalf of users to find and purchase products and this ruling lets the service remain live while their ongoing legal dispute with Amazon plays out.

On X, the company is rolling out AI-generated article summaries that appear when users share links on the platform. Researcher Ethan Mollick noted the irony that many of the articles being summarized are themselves obviously AI-generated. So we're creating an interesting loop where AI summarizes AI.

Paul: And then it trains the Grok language model. Part of the reason why they made articles such a prominent feature is to get a lot more training data that was probably trained with them potentially.

Mike: And finally, and we'll be keeping a close eye on this one, Demis Hassabis, CEO of Google DeepMind, Nobel Prize winner, is teasing his upcoming book called The Infinity Machine, set for release on March 31st. It covers the story of DeepMind and Hassabis's vision for the future of AI. I'll be looking very closely at that one, Paul.

Paul: This one I did pre-order. This is a good way to end today's podcast. I'm actually going to read the excerpt because I think this is really fascinating. This comes from The Infinity Machine:

"The true reason to build artificial intelligence," Hassabis was now saying, "went beyond Kant and Feynman. The goal was to draw closer to what might be called God, to the intelligence that may presumably have designed everything around us."

Hassabis quote: "I am first and foremost a scientist. My goal is to understand nature. But doing science is sort of like reading the mind of God. Understanding the deep mystery of the universe is my religion, kind of. We humans, we have these faculties. The world is understandable, but why should it be that way? I think there is a reason. Computers are just bits of sand and copper," Hassabis continued, now sounding more urgent. "Why should these combine to do anything? It's absurd. The electrons move around and then that creates an AI system that can defeat a Go master. Why should that be possible? This table," Hassabis rapped his palm on it for emphasis, "why should it be solid? This is beyond evolutionary coincidence. We can build electron microscopes and interrogate reality down to the most minute detail. We can build systems that detect black holes colliding more than a billion years ago. What is this? What the hell is going on here?"

There was a pause, but Hassabis was not yet finished. "I sit at my desk at 2:00 a.m. and I feel like reality is staring at me, screaming at me. Literally screaming at me, trying to tell me something if I could just listen hard enough. That's how I feel every day. So you can see why I'm trying to build AI. I felt that since I was very young, that there's a deep, deep mystery about what's going on here. You can frame it how you want. You can call this God's design or you can say it's just nature. I'm open-minded about the description and I don't know what the answers will turn out to be. But at the moment, we don't really know what time is or gravity is or any of these things. So there's a mystery waiting to be solved and it encompasses just about everything. I would like to understand before I croak. I would like to understand and then I'm perfectly fine to shuffle off my mortal coil."

That was awesome. Incredible. As we've said on the show many times, Demis thinks very deeply about this. Elon actually commented on that one. He's like, "I share Demis's urgency here and thoughts here." So I think it's important to understand why one of the people, one of the five, is building AI. And it is for a much bigger solve: solve intelligence and then solve everything else. That's been his mission for the last 30-plus years of his life.

Mike: Incredible. All right, Paul, just one quick note here as we wrap up. Go to smarterx.ai/pulse to take this week's survey. We're going to ask a couple questions about the topics this week. One is about OpenAI's enterprise deployment with that private equity backing we discussed. The second one is about Anthropic's study and some of the findings there and how you feel about them. We'd love to hear from you. And Paul, really appreciate you breaking down everything for us this week.

Paul: Yeah, good stuff. Busy week as always. I think we just have one episode this week. I got to check my calendar yet. Maybe we have a second one, but we'll be back next week and then I think I'll be on spring break then for 10 days. So we might be on a break after next week. Thanks for being with us. Have a great week, everyone, and we'll be back with you next week.

Key Takeaways