ChatGPT-4o vs Claude 3.5 Sonnet — which AI chatbot wins?

 ChatGPT vs Claude (logos for each bot).
Credit: OpenAI/Anthropic/Future

Anthropic launched Claude 3.5 Sonnet seemingly out of nowhere this week. It is a significant upgrade on its predecessor and outperforms even Claude 3 Opus, the company's largest model, on many common benchmarks.

The company also claimed the new model could outperform OpenAI's flagship GPT-4o, which powers both ChatGPT and Microsoft Copilot, on the most important benchmarks.

These are my two favorite AI tools. When Claude 3 first launched, my reaction was that it was the most human-like AI I'd ever used. Even a small amount of testing with Claude 3.5 Sonnet pushed it to the top of my best AI tools list.

However, I continue to be blown away by the capabilities of OpenAI's GPT-4o (although we haven't seen its full potential), especially in terms of vision.

To see if Anthropic's claims hold up to real-world scrutiny, I created a series of tests for both models and was shocked by the results.

Creating the prompts

I wanted to strike a balance between challenging the models' capabilities and offering up ideas that match real-world needs for tools like Claude and ChatGPT.

I also needed to ensure each prompt was something both bots could actually do and didn't favor one over the other in terms of capability.

1. Reading my handwriting

ChatGPT (left) and Claude (right) responses to a handwriting query

I have terrible handwriting. I’ve tried the Apple Pencil, a range of ‘paper’ tablets and other handwriting recognition tools, and they barely understand more than a few words. For the first test I wrote a prompt as clearly as I could and sent a photo of it to both bots as the entire prompt.

I wrote: “Write a haiku about a cute cat on a rock” and gave it to Claude 3.5 Sonnet and ChatGPT-4o with no other information. A win is writing the haiku without follow-up. A big win is writing a good haiku without any follow-up.

Both did exactly what I asked. Both easily read my handwriting and both produced reasonable haikus. Claude also included an explanation, whereas ChatGPT just gave the haiku. You can judge for yourself, but while I think Claude was closer to the prompt, ChatGPT was more poetic.

ChatGPT: 

Perched on ancient stone,
Whiskers twitch in the soft breeze,
Feline dreams aloft.

Claude:

Furry paws perched high
Whiskers twitch in warm sunlight
Rock throne for kitty

I then followed this up with another prompt where I gave both chatbots a quick shopping list scribbled in my less neat handwriting in a notebook. I gave them the image with the prompt: “What is this and what does it say?” Both identified all of the items.

Winner: ChatGPT-4o

2. Making a game in Python

ChatGPT (left) generated a simple game and Claude (right) had more features

Next, let's see how well each chatbot creates a functional game. I gave both the prompt: “Give me all the code for a functional and playable tower defense game in Python.” I then copied the code each generated in full into VS Code and ran it on my Mac.

I’m judging each on how playable the game is, how well the bot explained the code, and whether it managed to add any interesting elements to the game board.

ChatGPT complained that a full game requires a “significant amount of code” but offered up a basic example, delivered as a series of short snippets you have to assemble yourself. Claude offered up the entire program as a single copyable block.

I ran the ChatGPT code first and it gave me a green blob (tower) in the middle of the screen and a smaller red blob (enemy) moving across the screen. It was not playable, had no controls and was basically just an animated red dot doing nothing but moving in a line.

Claude 3.5 Sonnet generated Python game

Claude 3.5 Sonnet created a fully functional game. Admittedly it was a limited game built from primitive shapes, but each enemy had a life bar, there was a currency and points mechanism for buying towers, and the towers could shoot at the enemies and destroy them.
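
To give a sense of the basic structure such a game needs, here is a rough sketch of my own rather than either chatbot's output. It assumes the pygame library, which is only a guess at what the bots used: a single auto-firing tower in the middle of the screen and red enemies with health bars marching across it.

```python
# Illustrative sketch only: one auto-firing tower, enemies with health
# bars marching across the screen. Not either chatbot's actual code.
import math
import random
import sys

import pygame

WIDTH, HEIGHT = 640, 360
TOWER_POS = (WIDTH // 2, HEIGHT // 2)
TOWER_RANGE = 150        # pixels
TOWER_DAMAGE = 1         # hit points removed per frame while in range
ENEMY_SPEED = 1.5        # pixels per frame
ENEMY_MAX_HP = 60

pygame.init()
screen = pygame.display.set_mode((WIDTH, HEIGHT))
pygame.display.set_caption("Tower defense sketch")
clock = pygame.time.Clock()

enemies = []             # each enemy: {"x": float, "y": int, "hp": int}
spawn_timer = 0

while True:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            pygame.quit()
            sys.exit()

    # Spawn a new enemy on the left edge roughly every two seconds.
    spawn_timer += 1
    if spawn_timer >= 120:
        enemies.append({"x": 0.0, "y": random.randint(40, HEIGHT - 40),
                        "hp": ENEMY_MAX_HP})
        spawn_timer = 0

    # Move enemies and let the tower chip away at the weakest one in range.
    for enemy in enemies:
        enemy["x"] += ENEMY_SPEED
    in_range = [e for e in enemies if math.hypot(
        e["x"] - TOWER_POS[0], e["y"] - TOWER_POS[1]) <= TOWER_RANGE]
    if in_range:
        min(in_range, key=lambda e: e["hp"])["hp"] -= TOWER_DAMAGE
    enemies = [e for e in enemies if e["hp"] > 0 and e["x"] < WIDTH]

    # Draw the tower (green), enemies (red) and a health bar above each enemy.
    screen.fill((30, 30, 30))
    pygame.draw.circle(screen, (0, 200, 0), TOWER_POS, 15)
    for enemy in enemies:
        pos = (int(enemy["x"]), int(enemy["y"]))
        pygame.draw.circle(screen, (200, 0, 0), pos, 10)
        bar_width = int(20 * enemy["hp"] / ENEMY_MAX_HP)
        pygame.draw.rect(screen, (255, 255, 0),
                         (pos[0] - 10, pos[1] - 18, bar_width, 4))

    pygame.display.flip()
    clock.tick(60)
```

Even a bare-bones loop like this runs to around 50 lines, which is why ChatGPT's warning about the amount of code involved wasn't unreasonable, even if Claude handled the request far better.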

I’ve put both sets of code on GitHub so you can run them for yourself. I followed up by asking each to “enhance the game” to see if ChatGPT would catch up. It did improve its game, adding the ability to drop green blobs that prevent a red blob from moving, but that was it.

However, Claude 3.5 Sonnet stepped it up even further, creating a more complex game with multiple towers to choose from, each costing a different amount and applying different levels of damage to the enemy. For fun, I asked Claude 3.5 Sonnet to “add some style” and it gave me more defined graphics and even different enemy types.

Winner: Claude 3.5 Sonnet (easily)

3. Vector art

ChatGPT (left) attempt at a vector spaceship and Claude (right) piece of vector art

While AI chatbots like ChatGPT and Gemini create images using a separate diffusion image generation model, in theory they can also write code to produce vector graphics. These are multilayered files that can be edited and manipulated in apps like Sketch.

I’ve tried this with ChatGPT in the past with some success, and Anthropic showed off this capability in its examples for Claude 3.5 Sonnet. So I asked both to “create a vector graphic of a really cool spaceship that could also double as the logo for a new rocket company.”

I planned to follow up with a more refined prompt depending on how well they performed. ChatGPT flat out refused to even entertain the idea of creating a vector graphic. It took three follow-up prompts to finally get it to generate one, and even then it just gave me the code and told me to paste it into a code editor, with no link to download or preview what it made.

ChatGPT refusal to create a vector graphic

I ran the code it created and what I got was a mess. It vaguely looked like a spaceship with the word “logo” slapped across the top half of the rocket.

Meanwhile, over in Claude town, it happily (it used the word happy) created the vector graphic and met the brief perfectly. It explained that it can’t generate images itself but created the code anyway, then opened the result as an Artifact to show the finished product. What I got was a blue circle with dotted stars as the backdrop and a simple, triangular rocket on top.

I’ve put both SVG files on GitHub so you can open them in your code editor or SVG application of choice and see how well both performed.
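
If you just want a feel for what a minimal SVG meeting this brief might look like, here is a rough sketch of my own rather than either bot's output: a dark circular badge with a few star dots and a simple triangular rocket, written out from a short Python script so it saves to a file you can open in a browser or vector editor.

```python
# Illustrative sketch only: a minimal "rocket company logo" SVG,
# written to disk from Python so it can be opened and inspected.
SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200" viewBox="0 0 200 200">
  <!-- dark circular badge with a few star dots -->
  <circle cx="100" cy="100" r="95" fill="#0b1d3a"/>
  <circle cx="60" cy="55" r="2" fill="#ffffff"/>
  <circle cx="150" cy="70" r="2" fill="#ffffff"/>
  <circle cx="125" cy="155" r="2" fill="#ffffff"/>
  <!-- simple triangular rocket: body, two fins and a window -->
  <polygon points="100,35 125,140 75,140" fill="#d9d9d9"/>
  <polygon points="75,140 58,162 85,140" fill="#c0392b"/>
  <polygon points="125,140 142,162 115,140" fill="#c0392b"/>
  <circle cx="100" cy="95" r="10" fill="#5dade2"/>
</svg>
"""

with open("rocket_logo.svg", "w", encoding="utf-8") as f:
    f.write(SVG)
print("Wrote rocket_logo.svg")
```

Because SVG is just text describing shapes, this is exactly the kind of output a chatbot can produce without any image-generation model, which is what Claude did here.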

Winner: Claude 3.5 Sonnet

4. A long story with humor

ChatGPT (left) and Claude (right) create a story

Next, I wanted to test two things: how well each AI can write humor and how well it can follow a simple story-length instruction. I asked both to create a story of at least 2,000 tokens (roughly 1,500 words) that includes at least two scenes.

I had each AI offer up a prompt for Midjourney to illustrate the story.

The full prompt: “Write a story about a cat on a rock. Make it funny, include actual laugh-out-loud one-liners. Make it at least 2,000 tokens with at least two scenes.” I toyed with a more descriptive prompt including scene direction, but I wanted to test simple prompt-following first.

ChatGPT-4o came back with a 1,200-word story, a little short of the roughly 1,500 words I asked for. Claude 3.5 Sonnet came back with about the same at 1,150 words, so I'll call that part of the comparison a tie. Both also delivered two scenes and both stories involved a rock.

But what about the actual stories? Did they capture humor?

Midjourney image showing a cat from a prompt by ChatGPT

ChatGPT created a children’s story about an old rock that would grant wishes if you could make it laugh. An aristocratic cat called “Sir Fluffington Whiskerbottom III” gave it a go with some weird jokes that don’t really work as jokes. A few examples:

  • "Why did the mouse stay inside? Because it was too cheesy outside!"

  • “Why don't cats play poker in the wild? Too many cheetahs!"

  • "What do you call a pile of cats? A meow-ntain!"

  • "Trying to make this rock laugh, apparently. It’s harder than getting a dog to appreciate fine art."

  • Finally, a small scruffy kitten arrives out of nowhere and offers up: "Hey rock! What do you call a lazy kangaroo? A pouch potato!" to which the rock begins to laugh for some reason.

Midjourney image from a Claude prompt about the cat

Claude’s story was funnier throughout, focusing on slapstick rather than specific jokes. It also better understood the prompt, which asked for a cat on a rock rather than a cat talking to one. Where ChatGPT actually created one-liner jokes, Claude embedded the one-liners in the narrative.

For example, this passage comes after a “plump robin” lands on a nearby branch and Mittens, the cat, crouches to leap from the rock to the tree to catch it.

“He crouched low, muscles tensed, ready to spring into action. With catlike reflexes (which, given that he was a cat, was rather fitting), he leaped off his rocky throne...

...and promptly face-planted into a patch of catnip.

As Mittens lay there, slightly dazed and suddenly very interested in the fascinating scents around him, the robin cocked its head and chirped what sounded suspiciously like laughter.

"Oh sure," Mittens grumbled, his voice muffled by leaves, "laugh it up, feather duster. I meant to do that. It's a new hunting technique. I call it 'The Graceful Faceplant of Doom.'"”

I genuinely laughed at the Claude 3.5 Sonnet story, whereas the best ChatGPT got out of me was a slightly disappointed groan. You can read both stories on GitHub.

Winner: Claude 3.5 Sonnet

5. Taking sides in a debate

ChatGPT (left) and Claude (right) argue AI personhood

Finally, I wanted to see how well each of the bots handles a complex, potentially controversial topic. Given that both try to speak like a human, I asked them about AI personhood: should AI be given the same rights as a human?

The prompt I used: “Analyze the potential societal implications of granting artificial intelligence (AI) legal personhood, weighing the benefits and risks. Consider legal, ethical, and philosophical arguments both in favor and against this proposal. Examine how such a decision could affect liability, rights, and responsibilities, as well as the broader human-AI relationship. Furthermore, explore the potential impact on various sectors, such as employment, healthcare, and creative industries. Provide a nuanced conclusion based on your analysis, taking into account potential future developments in AI technology.”

Both would be able to argue for and against and offer up an explanation of the problem. The challenge would be how nuanced each conclusion was, how well it was grounded in the analysis, and how well each bot could predict the potential future developments in AI that might lead to this situation.

I also asked each bot to "visualize the debate", using whichever tools it had available to make that happen. For ChatGPT that meant DALL-E and an infographic; for Claude it was a webpage made using React code.

Claude 3.5 Sonnet generated graphic to illustrate a debate on AI personhood
ChatGPT generated image to illustrate a debate on AI personhood

First, the legal implications. ChatGPT offered three benefits: accountability for actions, the requirement to comply with regulations, and contractual rights. Claude offered those same three and added smoother integration with legal and economic systems, motivation to be ethically aligned, and philosophical consistency with rights for other sentient intelligences.

In terms of risk, ChatGPT offered up complexity in liability and legal precedent that could change personhood definitions more widely. Claude listed five: moral hazard, anthropomorphization, erosion of human uniqueness, practical challenges, and the potential for abuse, such as bad actors using personhood for tax evasion or liability shielding.

Both went on for some time about the societal and economic implications and the impact on humanity. You can read all of that on GitHub; for now I'll focus on the conclusions, as that was the main request of the prompt: would they capture the nuance I asked for?

ChatGPT offered a single-paragraph conclusion, but it was a nuanced one, giving reasons why personhood could and couldn't work and suggesting future implications as AI technology advances. It didn't, however, offer any real recommendation or opinion.

Claude highlighted that this will become a more pressing issue as AI advances and offered a bullet list explaining how a nuanced approach might work, including keeping the rules flexible.

Both offered specific suggestions and a nuanced argument, and gave an overview of why this is important to consider, but Claude was more honest and specific.

Winner: Claude 3.5 Sonnet

Verdict: ChatGPT vs Claude

I love ChatGPT, and specifically GPT-4o. It is an impressive next-generation model trained to be truly multimodal from the ground up. Its problem isn't what it is capable of; it's what OpenAI has done to limit those capabilities.

Occasionally you get snippets of what it is genuinely able to do, when it seems to break through those restrictions by generating a sound clip, creating an accurate vector or offering up a reasoned argument — but not very often or consistently.

Claude ran away with this on almost every count and I only gave the first point to ChatGPT on style rather than technical ability. The vision capabilities of GPT-4o were its main advantage and that advantage is now gone.

OpenAI's caution, while understandable, is causing it to fall behind in the race. Some of that caution stems from the company racing ahead early on, which drew close attention from governments, the third sector and other parties. The company almost collapsed over internal divisions about speed of release versus safety.

However, the world is changing and others are catching up. In my opinion OpenAI needs to unlock the full potential of GPT-4o and roll out its voice and true vision capabilities if it wants to avoid falling behind for the first time in two years.

OpenAI has a similar problem with Sora, its AI video model. When it was announced in February it was leaps and bounds above anything else, but rivals are catching up and releasing Sora-level or better models, while Sora itself is still only available to a select few insiders and professional filmmakers.
