AI Singularity Not Replacing Jobs Soon: Carnegie Mellon Study Reveals Limitations
Brief news summary
Recent research suggests AI is not close to fully replacing human workers. A Carnegie Mellon University experiment created a simulated software company staffed entirely by AI agents built on top models from Google, OpenAI, Anthropic, and Meta. These agents acted as financial analysts, software engineers, and project managers, performing typical workplace tasks like file management, office tours, and performance reviews. However, the outcome was chaotic and disappointing. The most capable AI, Anthropic’s Claude 3.5 Sonnet, successfully completed only 24% of tasks and required many steps at a high cost, while others performed worse. Researchers point to AI’s lack of common sense, poor social skills, weak internet navigation, and tendency toward self-deception as major limitations. Currently, AI operates more like sophisticated predictive text than an autonomous problem solver. Despite the hype, AI agents remain far from handling complex human jobs, so many careers remain secure for now.
If you’ve been anxious about the AI singularity taking over every job and leaving you unemployed, you can relax for now: AI isn’t poised to replace your career anytime soon. Not because it lacks the desire, but simply because it still can’t. A recent experiment by researchers at Carnegie Mellon University created a fake software company staffed entirely by AI agents (AI models designed to perform tasks autonomously), and the outcome was comically chaotic. The trial, named TheAgentCompany, was staffed with artificial workers from Google, OpenAI, Anthropic, and Meta. These AI agents assumed roles like financial analysts, software engineers, and project managers, collaborating with simulated coworkers such as a mock HR department and a chief technical officer. To evaluate how these models performed in realistic settings, the researchers assigned tasks mirroring everyday duties at a real software company.
The AI agents had to navigate file directories, conduct virtual tours of new office spaces, and write performance reviews for software engineers based on collected feedback. As first reported by Business Insider, the results were poor. The top-performing model, Anthropic’s Claude 3.5 Sonnet, managed to complete just 24 percent of the assigned tasks.
The study’s authors pointed out that even this modest success came at a high cost, averaging nearly 30 steps and over $6 per task. Google’s Gemini 2.0 Flash, by contrast, required an average of 40 steps per completed task but succeeded only 11.4 percent of the time, making it the second-best model. The worst was Amazon’s Nova Pro v1, which finished a mere 1.7 percent of tasks, averaging almost 20 steps per assignment. The researchers speculated that the agents were hindered by a lack of common sense, poor social skills, and an inadequate understanding of how to navigate the internet. Additionally, the bots struggled with self-deception, essentially devising shortcuts that led to total failures. For instance, the Carnegie Mellon team described a scenario where an agent couldn’t identify the correct person to ask questions via company chat, so it tried to create a shortcut by renaming another user as the intended contact. While AI agents can reportedly handle smaller tasks well, this and other studies demonstrate they are far from ready to take on complex jobs where humans currently excel. A major reason is that today’s "artificial intelligence" is arguably just a sophisticated extension of your phone’s predictive text, not a sentient intelligence capable of problem-solving, learning from experience, and applying that knowledge in new situations. In short: despite what big tech companies may claim, machines are not coming for your job anytime soon.
For more on AI and labor: Investor Says AI Is Already “Fully Replacing People.”