The Age of AI: A Call to Experiment

The world is changing at an incredible pace, and the best way to keep up is to engage directly with artificial intelligence. Make it a daily practice to try something new with AI. Don't limit your imagination; whatever you can envision, use AI to see how far you can get. This hands-on approach is not just a learning exercise—it's how we will collectively build a better world.

Even just eight months ago, building an iPhone app with GPT's help over a weekend was a revelation in speed and ease. Today, the process is even faster and more accessible. The key is to explore the boundaries of what these tools are capable of and build the things you want to see exist.

What is Poetiq?

Ian Fischer, co-founder and co-CEO of Poetiq, is at the forefront of this evolution. Poetiq is building recursively self-improving AI reasoning harnesses for LLMs. Previously a researcher at Google DeepMind for a decade and a YC founder, Fischer is now focused on what he sees as the holy grail of AI: systems that make themselves smarter.

Recursive Self-Improvement Explained

At its core, Poetiq is built on the principle of recursive self-improvement, where an AI system actively enhances its own intelligence. The company's key insight was discovering a way to achieve this far more quickly and cheaply than existing methods.

Most approaches to self-improvement require training a new Large Language Model (LLM) from scratch, a process that costs hundreds of millions of dollars and takes months. The resulting model is then often rendered obsolete when a company like Anthropic or OpenAI releases a more powerful one. Poetiq's method avoids this trap.

The Fine-Tuning Trap

For a startup, relying on fine-tuning a specific model is a dangerous game. You might spend millions of dollars to create a specialized model, only to have the next frontier model from a major lab outperform it instantly. Your investment is essentially lost.

Poetiq offers a different path. Instead of being locked into a fine-tuned model, their system provides a layer that works on top of any underlying language model. This ensures that you always have a system that is better than the out-of-the-box version, regardless of new releases.

"Stilts" for LLMs

Fischer describes Poetiq's technology as building "stilts" to stand on top of foundational models. They don't see frontier models as competitors but as the essential layer upon which their system operates. Poetiq provides the critical inches that make a model the smartest, and in the competitive world of AI, those inches matter immensely.

Poetiq's system automatically generates problem-specific solutions that consistently outperform the underlying LLMs, without the massive expense of training a new model. For any company building on top of language models, this is incredibly valuable. When a new model like GPT-5 comes out, the same harness remains compatible, delivering an even greater performance boost without requiring any changes.
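To make the "works on top of any model" idea concrete, here is a minimal sketch in Python. It is not Poetiq's implementation. The `Model` type, the `with_harness` function, and the two stand-in models are all hypothetical names invented for illustration. The only point it shows is that a wrapper that never inspects the model it wraps can be reused unchanged when the model is swapped.

```python
from typing import Callable

# Hypothetical interface: a "model" is any function from prompt text to reply text.
Model = Callable[[str], str]


def with_harness(model: Model, instructions: str) -> Model:
    """Return a wrapped model that prepends fixed instructions to every prompt.

    The wrapper never inspects which model it was given, so the same harness
    can be reused when the underlying model is swapped out.
    """
    def harnessed(prompt: str) -> str:
        return model(f"{instructions}\n\n{prompt}")

    return harnessed


# Stand-in models for illustration only; real ones would call an LLM API.
def older_model(prompt: str) -> str:
    return f"[older] saw {len(prompt)} chars"


def newer_model(prompt: str) -> str:
    return f"[newer] saw {len(prompt)} chars"


INSTRUCTIONS = "Think step by step and check your answer."

on_older = with_harness(older_model, INSTRUCTIONS)
on_newer = with_harness(newer_model, INSTRUCTIONS)  # same harness, new model

print(on_older("What is 2 + 2?"))
print(on_newer("What is 2 + 2?"))
```

A real harness would do far more than prepend text, but the call signature staying the same is what lets it survive a model upgrade.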

A Record of Outperformance

Poetiq has repeatedly demonstrated its capability by achieving top results on difficult industry benchmarks.

Taking the Top Spot on ARC-AGI

When Poetiq emerged from stealth, it aimed to prove its system could tackle exceptionally hard problems. It targeted the ARC-AGI leaderboard, where Google's Gemini 3 DeepThink had just set the record at 45%. Two days later, Poetiq released its results, achieving 54%, a 9-percentage-point improvement.

Remarkably, they did this at less than half the cost by building on top of the much cheaper Gemini 3 Pro model. While DeepThink's run cost over $70 per problem, Poetiq's cost only $32. This ability to place "stilts" on any model allows them to consistently achieve state-of-the-art (SOTA) performance.
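The arithmetic behind these ARC-AGI figures can be checked in a few lines of Python. The inputs are the numbers reported in the text. Because the DeepThink cost is given only as "over $70", the value 70.0 is a lower bound, so the computed cost ratio is an upper bound.

```python
# Figures as reported in the text; the DeepThink cost is stated as "over $70",
# so 70.0 is a lower bound and the real cost ratio is at most the value below.
deepthink_score = 45.0   # percent
poetiq_score = 54.0      # percent
deepthink_cost = 70.0    # USD per problem (lower bound)
poetiq_cost = 32.0       # USD per problem

point_gain = poetiq_score - deepthink_score    # absolute gain, in percentage points
relative_gain = point_gain / deepthink_score   # gain relative to the old record
cost_ratio = poetiq_cost / deepthink_cost      # fraction of DeepThink's cost

print(f"gain: {point_gain:.0f} points ({relative_gain:.0%} relative)")
print(f"cost: {cost_ratio:.0%} of DeepThink's per-problem cost, or less")
```

The gain is 9 percentage points, a 20% relative improvement over the previous record, at no more than about 46% of the per-problem cost.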

Beating Claude on Humanity's Last Exam

More recently, Poetiq tackled Humanity's Last Exam, a set of 2,500 questions so difficult they are meant to challenge PhDs in their respective fields. AI has not yet passed it, but Poetiq achieved a score of 55%.

This result surpassed the previous record of 53.1% set just a week prior by Anthropic's Claude Opus 4.6. Furthermore, the entire optimization process for this achievement cost Poetiq less than $100,000—a fraction of the hundreds of millions typically spent on training large foundation models. This was all accomplished by a team of just seven researchers and engineers.

How the Meta-System Works

Poetiq's success isn't magic; it's the result of a core technology they call the Poetiq meta-system. This system is itself a recursively self-improving AI whose output is other systems designed to solve hard problems—the kind of problems where a base model like GPT-4 would struggle to provide a reliable result.

These generated systems, or harnesses, are a combination of code, prompts, and data built on top of one or more language models. While one could theoretically build such a harness by hand, it requires immense effort and insight. The meta-system automates this process, generating highly effective agents far more quickly and cheaply than a human team could.
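One way to picture a harness as "code, prompts, and data built on top of one or more language models" is the small Python structure below. This is a hand-written sketch under stated assumptions, not the format Poetiq's meta-system emits. The `Harness` class, its field names, and `stub_model` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Model = Callable[[str], str]  # hypothetical: prompt text in, reply text out


@dataclass
class Harness:
    """Illustrative bundle of prompts, data, and code around one or more models."""
    models: list[Model]                 # one or more underlying language models
    prompt_template: str                # prompt: uses {examples} and {question}
    examples: list[tuple[str, str]]     # data: worked question/answer pairs
    postprocess: Callable[[str], str]   # code: cleans up each raw reply

    def build_prompt(self, question: str) -> str:
        shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in self.examples)
        return self.prompt_template.format(examples=shots, question=question)

    def run(self, question: str) -> list[str]:
        """Query every model with the same prompt and clean up each reply."""
        prompt = self.build_prompt(question)
        return [self.postprocess(model(prompt)) for model in self.models]


def stub_model(prompt: str) -> str:
    # Stand-in for a real LLM call; always gives the same untidy reply.
    return "  4\n"


harness = Harness(
    models=[stub_model],
    prompt_template="{examples}\nQ: {question}\nA:",
    examples=[("1 + 1", "2")],
    postprocess=str.strip,
)
print(harness.build_prompt("2 + 2"))
print(harness.run("2 + 2"))
```

Writing even this toy by hand means choosing a template, examples, and cleanup logic, which hints at why automating those choices for hard problems is attractive.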

This automation also allows Poetiq to optimize existing agents. A startup that has already built an agent can bring it to Poetiq, which can then improve its performance by optimizing its prompts, reasoning strategies, or other components.

Beyond RL: A New S-Curve

This paradigm is distinct from reinforcement learning (RL). It represents a new S-curve of performance. As the Poetiq meta-system improves and the underlying LLMs get better, the performance ceiling keeps shifting higher, pushing closer to AGI or superintelligence.

The goal is to hit that ceiling first. For many startups, prompt and context engineering is a manual, iterative process of tuning evals and stuffing context. Poetiq automates this. The meta-system analyzes the data, identifies failure modes, and determines whether to add more context, generate new examples, or develop different reasoning strategies to achieve better performance.
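The decision step described above can be sketched as a simple triage over labeled failures. The three remedies come from the text. The failure labels, the mapping between labels and remedies, and the `choose_remedy` function are assumptions made for illustration, and a real system would presumably use much richer analysis than a frequency count.

```python
from collections import Counter

# Hypothetical failure labels and the remedy each one maps to. The three
# remedies come from the text; the labels and the mapping are assumptions.
REMEDIES = {
    "missing_information": "add more context",
    "misread_task_format": "generate new examples",
    "faulty_reasoning": "develop a different reasoning strategy",
}


def choose_remedy(failure_labels: list[str]) -> str:
    """Pick the remedy for the most frequent failure label in an eval run."""
    if not failure_labels:
        return "no change needed"
    most_common_label, _count = Counter(failure_labels).most_common(1)[0]
    return REMEDIES.get(most_common_label, "inspect failures manually")


eval_failures = [
    "faulty_reasoning",
    "missing_information",
    "faulty_reasoning",
]
print(choose_remedy(eval_failures))  # develop a different reasoning strategy
```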

Automating Prompt Engineering

The prompts generated by the meta-system are often not what a human would write. During the ARC-AGI work, the system produced some unexpected and even slightly incorrect examples, but the team left them untouched. This hands-off approach outsources data analysis to the AI itself, allowing it to discover the most robust strategies.

While automated prompt optimization offers some performance gains, the most significant improvements come from optimizing reasoning strategies. In one internal experiment at DeepMind, manually optimizing prompts for a hard problem took performance from a baseline to 5%. However, adding optimized reasoning strategies on top of that boosted performance from 5% to 95%. These strategies are written in code, not just better prompts, and represent the true power of Poetiq's approach.
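To illustrate what a reasoning strategy "written in code" can look like, here is a generic propose-verify-retry loop in Python. This is a well-known pattern chosen as an example, not a strategy the source attributes to Poetiq or DeepMind. The `Propose` and `Verify` interfaces and both stub functions are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical interfaces for illustration.
Propose = Callable[[str, list[str]], str]   # (task, feedback so far) -> candidate
Verify = Callable[[str], Optional[str]]     # candidate -> error message, or None if OK


def solve_with_retries(task: str, propose: Propose, verify: Verify,
                       max_attempts: int = 3) -> Optional[str]:
    """Ask for a candidate, check it, and feed any error back into the next try."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        candidate = propose(task, feedback)
        error = verify(candidate)
        if error is None:
            return candidate
        feedback.append(error)
    return None  # every attempt failed verification


def stub_propose(task: str, feedback: list[str]) -> str:
    # Stand-in for an LLM call: wrong on the first try, right after feedback.
    return "41" if not feedback else "42"


def stub_verify(candidate: str) -> Optional[str]:
    # Stand-in for a real checker such as unit tests or known input/output pairs.
    return None if candidate == "42" else f"{candidate} failed the check"


print(solve_with_retries("double 21", stub_propose, stub_verify))  # 42
```

The control flow lives outside the prompt: the loop, the verifier, and the feedback channel are ordinary code, which is the kind of structure that better prompt wording alone cannot supply.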

From YC Founder to DeepMind Researcher

Fischer's journey into AI was not a direct one. His first YC startup, which focused on porting mobile apps across platforms, was acquired by Google over a decade ago. The acquisition gave him the freedom to reflect on his next steps. He realized his true passion lay in AI and robotics.

Despite a background in computer security and systems building, he joined a new AI robotics team at Google Research. He quickly learned that "hardware is hard" and pivoted his focus entirely to machine learning, a field he pursued for the next decade at Google and later DeepMind.

Early Access & Putting Your Agent on Stilts

Poetiq has not yet released its product publicly, but startups with a hard problem that they can't solve with existing tools are encouraged to reach out. By visiting poetiq.ai, companies can sign up for early access.

The idea is to give any agentic company the ability to achieve state-of-the-art performance. The results on ARC-AGI and Humanity's Last Exam demonstrate Poetiq's dual capabilities: enhancing both complex reasoning and deep knowledge extraction. By using these "stilts," companies can effectively vaccinate themselves against the "bitter lesson," ensuring their work remains relevant and powerful as new foundational models emerge.