Why AI sucks at freelance work and real-life tasks: AI Eye

Mass unemployment from AI temporarily suspended

AI agents can’t complete 97% of tasks on Upwork to even a basic standard.

Researchers at Scale AI and the Center for AI Safety had six different AI models attempt 240 Upwork projects across categories including writing, design and data analysis, then compared the results with the work of the real freelancers.

The overwhelming majority of the time, the AI models were unable to complete the tasks successfully, with the best-performing model, Manus, completing just 2.5% of tasks and earning $1,810 of the $143,991 on offer. Claude Sonnet and Grok 4 each managed to finish 2.1% of the tasks.

While AI agents are good at simple, well-defined tasks like “generate a logo,” the research found they are bad at multi-step workflows, taking initiative and exercising judgment.

So they won’t be causing mass unemployment for a while yet.

This backs up research from August at MIT, which found that 95% of organizations had zero return on the collective $30 billion they’d invested in AI.

Ironically, the model in this photo portraying a freelancer will probably be replaced by an AI-generated one in the future (Upwork)

Why humans still have the edge over AI

AIs are good at pattern matching and predicting the next word, but they’re currently pretty bad at building internal models of the world, according to WorldTest, a new benchmark from MIT and Basis Research.

For example, humans carry an internal model of their own kitchen in their heads, which lets them know where the knives are, estimate how long the pot will take to boil, and plan a sequence of actions that results in a meal. The testing showed that three frontier reasoning AI models suck at this sort of thing.

Man vs machine (Benchmarking World Model Learning)

The researchers created 129 tasks across 43 interactive worlds (spot the difference, physics puzzles and the like). The tasks required the AIs to predict hidden aspects of the world, plan sequences of actions to achieve a goal, and notice when the rules of the environment changed. They then tested 517 humans on the same problems.

The researchers concluded:

“Our analysis reveals that humans achieve near-optimal scores while existing models frequently fail.”

Humans perform better on these sorts of tasks because we intuitively understand environments, revise our beliefs in the face of new evidence, run experiments, start from scratch when needed, and explore strategically.

And throwing more compute at the problem doesn’t always work: it helped in only 25 of the 43 environments.



AI gets the news wrong 45% of the time

Research from the BBC and European Broadcasting Union found that ChatGPT, Copilot, Gemini and Perplexity also suck at reporting the news, failing against key criteria, including accuracy, sourcing, distinguishing opinion from fact, and providing context.


— 45% of AI answers had at least one significant issue

— 31% were sourced incorrectly

— 20% were just plain wrong, with hallucinated details and outdated info

— Gemini was by far the worst, with significant issues in 76% of its responses.

AI cover letters get the wrong people hired

The primary purpose of a cover letter is to filter out low-effort applications. Anyone who spends a day writing a good cover letter in response to a job ad, one that shows knowledge of the company, is likely to be diligent and motivated.

Unfortunately, new research on Freelancer.com suggests that AI-generated cover letters have completely compromised this signal, resulting in employers hiring fewer people, and often the wrong ones.

Compared to the days before AI, skilled workers in the top quintile for ability are being hired 19% less often, while dumb bums in the lowest quintile are being hired 14% more often.

AI cover letters are bad (Paul Novosad)

New robot looks like a lady

Chinese EV manufacturer XPeng has unveiled the XPeng Iron, a female humanoid robot that bears a striking resemblance to a real person. Its spine moves much like a human’s, and its skin, stretched over soft 3D lattice structures, mimics the human body.

It’s due to go into production early next year, but the company says it requires too much compute for use in the home, so it’ll likely be used for commercial applications first, like introducing cars to customers at XPeng stores.

80% of ransomware attacks are made up

A new paper from MIT Sloan researchers and Safe Security makes the terrifying claim that 80% of ransomware attacks are AI-driven.

The paper, Rethinking the Cybersecurity Arms Race, examined 2,800 ransomware attacks and concluded that “adversarial AI is now automating entire attack sequences,” creating malware, phishing campaigns and deepfake phone calls for social engineering.


But other ransomware experts say the statistic sounds like 100% bullshit.

Researcher Kevin Beaumont, who tracks ransomware incidents online, says generative AI isn’t a major component of any of them.

“The paper is almost complete nonsense,” he said. “It’s jaw-droppingly bad. It’s so bad it’s difficult to know where to start.”

The researchers list long-defunct malware such as Emotet and Conti as “AI-powered” and incorrectly classify IBM’s DeepLocker, a 2018 proof-of-concept demo, as real-world malware.

“The paper was so absurd I burst out laughing,” wrote researcher Marcus Hutchins.

David Sacks is worried about Orwellian AI

Crypto and AI Czar David Sacks told the a16z Podcast that he worries the censorship we’ve seen on social media and search engines in recent years will become thoroughly dystopian with AI models.

David Sacks (a16z)

“I almost feel like the term ‘woke AI’ is insufficient to explain what’s going on because it somehow trivializes it,” he says. 

“What we’re really talking about is ‘Orwellian AI.’ We’re talking about AI that lies to you, that distorts an answer, that rewrites history in real time to serve a current political agenda of the people who are in power.”

“To me, this is the biggest risk of AI… It’s not The Terminator, it’s 1984.”

Third time lucky for Coke Christmas ad

Coke got roasted for its AI-generated Christmas commercial last year (itself a remake of a 1995 ad), so it has remade the remake to highlight how much AI video generation has improved in the past year.

“Last year, people criticized the craftsmanship. But this year the craftsmanship is ten times better,” Pratik Thakar, global vice president and head of generative AI at Coca-Cola, told The Hollywood Reporter.

Well, maybe 10% better.


The 60-second commercial was spliced together from two- to three-second AI-generated clips. Five employees generated 70,000 clips in total, which works out to roughly 2,300 to 3,500 clips generated for each one used in the ad. But it took only a month to put together, compared with the year-long production of Coke’s live-action commercials.
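That ratio is easy to sanity-check with a quick back-of-envelope calculation in Python, using only the figures quoted above (the 2- and 3-second clip lengths bracket the range; the exact cut of the final ad isn’t public):

# Back-of-envelope check of the Coke ad clip math, from the article's figures:
# a 60-second ad assembled from 2- to 3-second clips, 70,000 clips generated.
TOTAL_GENERATED = 70_000
AD_LENGTH_SECONDS = 60

for clip_len in (2, 3):
    clips_used = AD_LENGTH_SECONDS // clip_len          # clips in the final cut
    generated_per_used = TOTAL_GENERATED / clips_used   # waste ratio
    print(f"{clip_len}s clips: ~{clips_used} used, "
          f"~{generated_per_used:,.0f} generated per clip used")

# Output:
# 2s clips: ~30 used, ~2,333 generated per clip used
# 3s clips: ~20 used, ~3,500 generated per clip used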

A survey by Attest found that around 46% of consumers in the US, UK and Australia don’t like AI-generated imagery in ads.

Google’s Project Suncatcher and other Google AI news

— There’s not enough electricity available for the planned expansion of AI here on Earth, so Google has come up with the innovative idea of launching the data centers into space.

The search engine giant has unveiled Project Suncatcher, which offers a vision for “fleets of satellites equipped with solar arrays” benefiting from “near-constant sunlight.” 

Described as a “moonshot to one day scale machine learning in space,” the project will launch two prototype satellites in early 2027, fitted with the same custom AI chips Google uses in its ground-based data centers.

— Google CEO Sundar Pichai says the Gemini app now has more than 650 million monthly active users. That’s up from around 350 million in March and 90 million last October.

— The company was forced to pull its Gemma AI model from AI Studio after the bot claimed Senator Marsha Blackburn had pressured a state trooper for prescription drugs and engaged in non-consensual behavior. In a letter to Pichai, Blackburn said it was “not a harmless hallucination” but defamation. 

Andrew Fenton

Andrew Fenton is a journalist and editor with more than 25 years’ experience who has been covering cryptocurrency since 2018. He spent a decade working for News Corp Australia, first as a film journalist with The Advertiser in Adelaide, then as deputy editor and entertainment writer in Melbourne for the nationally syndicated entertainment lift-outs Hit and Switched on, published in the Herald Sun, Daily Telegraph and Courier Mail.

His work saw him cover the Oscars and Golden Globes and interview some of the world’s biggest stars including Leonardo DiCaprio, Cameron Diaz, Jackie Chan, Robin Williams, Gerard Butler, Metallica and Pearl Jam.

Prior to that he worked as a journalist with Melbourne Weekly Magazine and The Melbourne Times where he won FCN Best Feature Story twice. His freelance work has been published by CNN International, Independent Reserve, Escape and Adventure.com.

He holds a degree in Journalism from RMIT and a Bachelor of Letters from the University of Melbourne. His portfolio includes ETH, BTC, VET, SNX, LINK, AAVE, UNI, AUCTION, SKY, TRAC, RUNE, ATOM, OP, NEAR, FET and he has an Infinex Patron and COIN shares.

Source: https://cointelegraph.com/magazine/ai-research-freelance-work-news-real-life-tasks-ai-eye/