Andrej Karpathy on why we're overpredicting timelines, what it actually takes to build AI employees, and why the path to AGI looked nothing like we expected.
We're in the hype cycle again. "2024 is the year of agents," people said. Karpathy pushed back: it's the decade of agents. Not because he's pessimistic. Because he's seen this movie before.
Here's what 15 years in AI teaches you: the problems are tractable, but they're still hard. We have impressive early agents—Claude, Codex, ChatGPT doing computer use. But thinking they'll replace your employees this year? That's not how this works.
Karpathy's been through multiple seismic shifts in AI. He was there when neural networks were a niche curiosity. When everyone thought game-playing RL would lead to AGI. When OpenAI tried to build agents way too early. The pattern? We keep trying to skip steps.
Let's talk about what actually needs to happen.
Strip away the buzzwords. An agent is something you'd hire as an employee. Or at minimum, an intern.
You work with people. When would you rather hand that work to Claude or Codex instead? Right now, you wouldn't. Why not?
"The reason you don't do it today is because they just don't work. They don't have enough intelligence, they're not multimodal enough, they can't do computer use and all this stuff. They don't do a lot of the things you've alluded to earlier. They don't have continual learning. You can't just tell them something and they'll remember it. They're cognitively lacking and it's just not working."
— Andrej Karpathy
This is the gap. Not "we need 10x more compute." Not "we need better prompting techniques." The fundamental capabilities aren't there yet.
The gap between today's AI tools and actual autonomous agents
It's going to take about a decade to work through all of these issues. Not because the problems are impossible. Because they're just hard.
Karpathy's watched AI go through multiple major shifts. Each time, people tried to jump directly to the end goal. Each time, they learned you can't skip the foundation.
When Karpathy started in AI, neural networks were niche. He was at the University of Toronto next to Geoff Hinton—the godfather of AI—training neural nets while most of the field did something else entirely.
Then AlexNet hit in 2012. Suddenly everyone was training neural networks. But it was still very per-task. You had an image classifier. Or a machine translator. Individual components, not full systems.
Around 2013, the field got excited about agents through a specific lens: game-playing reinforcement learning. Atari games with deep RL. The idea was compelling—agents that perceive the world AND take actions AND get rewards.
OpenAI went all-in on this. For 3-4 years, everyone was doing RL on games. Beat games, get good at different types of games, surely this leads to AGI.
"That was all a bit of a misstep. What I was trying to do at OpenAI is I was always a bit suspicious of games as being this thing that would lead to AGI. Because in my mind, you want something like an accountant or something that's interacting with the real world. I just didn't see how games add up to it."
— Andrej Karpathy
Karpathy worked on Universe at OpenAI—a project to build agents that used a keyboard and mouse to operate web pages. The right idea: interact with the actual digital world, do knowledge work. But way too early.
Why? You can't just stumble around clicking and get sparse rewards. You'll burn a forest of compute and learn nothing. You need representations first.
This is the crucial insight that took the field a decade to internalize. You need powerful representations before you can build agents.
AI's evolution: Each shift taught us you can't skip the foundations
Today's computer-using agents work because they're built on LLMs. You get the language model first. You get the representations. Then you add agent capabilities on top.
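To make that layering concrete, here is a minimal sketch of the idea in Python. It is not anyone's real agent framework: `call_llm` is a hypothetical stand-in for a model call, and the tool names are invented for illustration. The point is the shape of the stack: the language model supplies the representations and the decision, and a thin loop on top turns its output into actions.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in: a real agent would query an actual language model here.
    return 'click("Submit")'  # pretend the model chose this action

# Toy "computer use" tools; real systems would drive a browser or OS instead.
TOOLS = {
    "click": lambda target: f"clicked {target}",
    "type": lambda text: f"typed {text}",
}

def agent_step(observation: str) -> str:
    """One loop iteration: observation in, LLM decision, executed action out."""
    decision = call_llm(f"Observation: {observation}\nChoose one tool call.")
    name, arg = decision.split("(", 1)   # naive parse, just for illustration
    return TOOLS[name](arg.rstrip(")").strip('"'))

print(agent_step("A form with a Submit button is on screen."))
# -> clicked Submit
```

Everything interesting happens inside `call_llm`; without a capable model underneath, the loop around it is worthless. That is the sense in which the representations have to come first.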
The field tried to get the full thing too early, multiple times. Karpathy's own experience at OpenAI was part of that. Now we know better: you have to build the stack in order.
This is where Karpathy's intuition comes from: 15 years of watching predictions versus reality.
He's seen people overpredict constantly. He's seen breakthrough moments that felt transformative but still took years to mature. He's worked in both research and industry and has developed a sense for how long hard problems actually take.
The problems we need to solve for real agents are tractable. They're surmountable. But they're difficult.
Think about what's still missing:
- Continual learning, so you can tell an agent something once and it actually sticks
- Multimodality that's robust enough for real work
- Reliable computer use
- Long-term memory
- Grounding in the real world the agent is supposed to operate in
Each of these is hard. Not impossible. Not requiring new physics. Just hard engineering and research problems that take time.
If you average it out across all these challenges, accounting for the pace of progress Karpathy has observed over 15 years, it just feels like a decade.
Here's what's interesting: we're not waiting for a decade to start using agents. We have early versions that are extremely impressive. Karpathy uses Claude and Codex daily.
But expectations need calibration. These are tools that augment specific workflows. They're not replacing employees this year. They're getting better incrementally, and that's wonderful.
The next decade will be about systematically addressing each limitation. Making agents more reliable, more capable, more aligned with how we actually work.
People building now should focus on narrow, well-defined use cases where current capabilities are sufficient. Don't try to build the general assistant yet. Build the thing that does one part of the workflow really well.
Then expand as the underlying models improve.
AI has gone through seismic shifts with surprising regularity. Karpathy thinks there will continue to be more.
Each shift reorients the entire field. Everyone suddenly looks at things differently. But the pattern is clear: you can't skip steps.
We tried to build agents with Atari games before we had good representations. Didn't work.
We tried to build agents with keyboard and mouse on web pages before we had language models. Way too early.
Now we have LLMs. Now we have the foundation. Now agents can actually start to work.
But "start to work" doesn't mean "replace your employees." It means we have the right base to build on. The rest takes time.
You need the foundation before you can build the layers on top
There's a parallel here to how Karpathy thinks about teaching. He's become famous for his educational content—making complex AI concepts accessible through courses and tutorials.
His philosophy: find the small-order terms. Serve them on a platter.
Take micrograd—100 lines of interpretable Python that captures everything you need to understand neural network training. Everything else is just efficiency.
"So micrograd is 100 lines of pretty interpretable Python code, and it can do forward and backward arbitrary neural networks, but not efficiently. But the core intellectual piece of neural network training is micrograd. It's 100 lines. You can easily understand it."
— Andrej Karpathy
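To see what "the core intellectual piece" looks like, here is a heavily abridged sketch in the spirit of micrograd, not the actual repository code, and supporting only addition and multiplication: a scalar Value that remembers how it was produced, so gradients can flow backward through the graph.

```python
class Value:
    """A scalar that records its parents so we can backpropagate through it."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._grad_fn = None  # distributes this node's grad to its parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def grad_fn():
            self.grad += out.grad
            other.grad += out.grad
        out._grad_fn = grad_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def grad_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._grad_fn = grad_fn
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for p in v._parents:
                    build(p)
                order.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(order):
            if v._grad_fn:
                v._grad_fn()

# d(loss)/dx for loss = x*x + x is 2x + 1, which is 7 at x = 3.
x = Value(3.0)
loss = x * x + x
loss.backward()
print(x.grad)  # 7.0
```

That is the whole trick: forward pass builds a graph, backward pass walks it in reverse. Everything a production framework adds on top of this, batching, GPUs, fused kernels, is efficiency, not new ideas.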
This is the same principle as building AI systems: you can't skip to the end. You need to understand the foundation first.
Education is about untangling knowledge and laying it out so everything only depends on the thing before it. You can't understand transformers without understanding attention. You can't understand attention without understanding the problem it solves.
Present the pain before the solution. Let people try to solve it themselves. Then they appreciate why the solution works.
Same with AI development. You can't build agents before you have representations. You can't build representations without massive compute and data. You can't know what you need without trying and failing.
The field learned by doing it wrong first. That's not wasted time—that's how you figure out the right order.
It's not the year of agents. It's the decade of agents.
We have impressive early versions. They'll get better every year. But expecting them to replace employees in 2024? That's the same mistake as thinking Atari games would lead to AGI.
The problems are tractable. The direction is clear. But there's real work to do. Continual learning, multimodality, robust computer use, long-term memory, real grounding—each of these is years of research and engineering.
The people building now should celebrate the progress while staying realistic about timelines. Build for today's capabilities. Expand as models improve. Don't bet the farm on capabilities that don't exist yet.
And remember: we keep trying to skip steps. Neural networks had to mature before RL agents made sense. LLMs had to mature before real agents made sense. What comes next will require agents to mature before the next thing makes sense.
That's not pessimism. That's pattern recognition from someone who's watched it happen three times already.
Bottom line: Real agents need continual learning, robust multimodality, reliable computer use, long-term memory, and real-world grounding. We have the foundation now with LLMs—something we didn't have when we tried to build agents with Atari games or keyboard-and-mouse. But building on that foundation takes time. Not because the problems are impossible. Because they're just hard. History shows we keep trying to skip steps. We can't. It's a decade of work, and that's okay. The early versions are already useful. They're getting better. We'll be working with these things for the next ten years, and it's going to be wonderful.