Updated on November 19, 2024
ChatGPT has already entered the product manager's toolkit for designing and implementing unique user experiences. Its impact is profound: it has even changed job descriptions and titles!
“I believe that all product managers will be AI product managers. This is because we see all products needing to have a personalized experience, or ‘A recommender system’ that is actually good.”
— Marily Nika, computer scientist and an AI Product Leader at Meta
From GPT-powered customer support, language translation, and shopping assistants to feedback analysis, we have covered many such use cases in our 15 ways to use ChatGPT for product engagement.
But as of 2024, Google and Anthropic have developed equally competitive language models that may perform better in certain use cases.
To make upgrading your product toolkit easier, we compare the latest premium language models as of November 2024.
We call them the AI Titans: GPT-4 Turbo vs. Claude 3 Opus vs. Google Gemini 1.5 Pro. We have compared them based on:
- AI benchmarks: technical performance parameters like underlying technology, context windows, speed, etc.
- Use cases: how you can incorporate each model into your product toolkit
- Response test: prompt tests for vision, text generation, following instructions, maths problems, and reasoning
- Pricing: comparing paid versions for each
- Upcoming updates: funding raised, technology upgrades, and latest news
First, let's understand what's new with the top three LLMs: GPT, Claude, and Gemini.
Here’s how the latest versions of models compare to their predecessors:
What’s new with GPT–4 Turbo?
GPT-4 Turbo is the latest-generation model developed by OpenAI and its most capable yet, solving complex problems with greater accuracy. Since April 2024, you can access GPT-4 Turbo on the ChatGPT Plus plan.
Compared to its previous versions, GPT–4 Turbo promises:
- Updated knowledge cut-off of April 2023
- Cheaper: 3X cost savings for input tokens and 2X cost savings for output tokens
- Larger context windows: GPT–4 Turbo provides 128k tokens compared to 16k tokens for GPT–3.5 Turbo
- Multimodality: you can now input images and text-to-speech to receive a response
Here’s how GPT–4 Turbo compares with GPT–4 and GPT–3.5 to help you gauge the improvement made:
| Criteria/Model | GPT-3.5 Turbo | GPT-4 | GPT-4 Turbo |
| --- | --- | --- | --- |
| Knowledge cut-off | January 2022 | April 2023 | April 2023 |
| Accessibility | Free | Plus | Plus (since April 2024) |
| Input prompt | Text | Text and images | Text, images, text-to-speech |
| Context window | 16,385 tokens (gpt-3.5-turbo-0125); 4,096 tokens (gpt-3.5-turbo-instruct) | Two versions: 8,192 and 32,768 tokens | 128,000 tokens |
| Price | Input: $0.50 / 1M tokens; Output: $1.50 / 1M tokens | Input: $30.00 / 1M tokens; Output: $60.00 / 1M tokens | Input: $10.00 / 1M tokens; Output: $30.00 / 1M tokens |
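To put the context-window numbers to work, here is a minimal sketch that checks whether a prompt fits each model's window, using the rough heuristic of about four characters per English token. This is an approximation for illustration only; a production check should use a real tokenizer such as tiktoken.

```python
# Rough context-window fit check for the GPT model family.
# Heuristic: ~4 characters per English token (an approximation, not a tokenizer).

CONTEXT_WINDOWS = {
    "gpt-3.5-turbo": 16_385,
    "gpt-4": 32_768,       # the larger 32k GPT-4 variant
    "gpt-4-turbo": 128_000,
}

def estimate_tokens(text: str) -> int:
    """Very rough token estimate; use a real tokenizer in production."""
    return max(1, len(text) // 4)

def models_that_fit(text: str, reply_budget: int = 1_000) -> list[str]:
    """Return models whose window can hold the prompt plus a reply budget."""
    needed = estimate_tokens(text) + reply_budget
    return [m for m, window in CONTEXT_WINDOWS.items() if window >= needed]

short_doc = "hello " * 2_000      # ~12k characters, roughly 3k tokens
print(models_that_fit(short_doc)) # a short document fits all three models
```

A check like this helps you avoid paying for the 128k window when a cheaper model's context is already sufficient.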
What’s new with Claude 3 Opus?
Claude 3 is the latest AI chatbot assistant developed by Anthropic. You can choose between three model options – Opus, Sonnet, and Haiku, each having varied use cases and performance levels.
Claude 3 models can handle complex tasks and demonstrate human-like fluency in their responses. Here are their key features:
- Multilingual: all Claude 3 models can respond in non-English languages (like Japanese or Spanish), making them suitable for translation use cases.
- Multimodal: all Claude 3 models can process and analyze images and extract document data.
- More intelligent: Claude 3 models are superior in intelligence to the previous Claude 2.0, Claude 2.1, and Claude 1.2 models.
Each model has a specific use case – here’s a comparison of the Claude 3 family models:
| Criteria/Model | Claude 3 Opus | Claude 3 Sonnet | Claude 3 Haiku |
| --- | --- | --- | --- |
| Speed | Similar speed to Claude 2 and 2.1 | 2x faster than Claude 2 and Claude 2.1 | Fastest model |
| Ideal for | Complex tasks | Enterprise workloads that balance intelligence and speed | Instant, accurate, and targeted responses |
| Context window | 200K | 200K | 200K |
| Knowledge cut-off | Aug 2023 | Aug 2023 | Aug 2023 |
| Price | Input: $15 / 1M tokens; Output: $75 / 1M tokens | Input: $3 / 1M tokens; Output: $15 / 1M tokens | Input: $0.25 / 1M tokens; Output: $1.25 / 1M tokens |
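Since each Claude 3 tier targets a different workload, a small routing helper can encode that choice in your product code. This is a hypothetical sketch; the model IDs follow Anthropic's published naming at the time of writing, so verify them against the current model list in Anthropic's documentation.

```python
# Hypothetical helper mapping a workload type to a Claude 3 model ID.
# Model IDs follow Anthropic's naming at the time of writing; verify
# against the current model list in the Anthropic docs before use.

CLAUDE_3_MODELS = {
    "complex": "claude-3-opus-20240229",     # deepest reasoning, highest price
    "balanced": "claude-3-sonnet-20240229",  # speed/intelligence trade-off
    "instant": "claude-3-haiku-20240307",    # fastest and cheapest
}

def pick_claude_model(workload: str) -> str:
    """Return a model ID for a workload type; default to the balanced tier."""
    return CLAUDE_3_MODELS.get(workload, CLAUDE_3_MODELS["balanced"])

print(pick_claude_model("instant"))  # claude-3-haiku-20240307
```

Defaulting to the middle tier keeps costs predictable while letting you escalate to Opus only for genuinely complex requests.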
What’s new with Google Gemini 1.5 Pro?
Gemini is a family of multimodal LLMs developed by Google DeepMind. They succeed LaMDA and PaLM 2 and are positioned as direct competitors to OpenAI's ChatGPT. Gemini 1.5 Pro, released in February 2024, is the latest model and promises long-context understanding while using less compute.
Key Gemini 1.5 Pro features compared to its previous version include:
- Larger context window: with 1 million tokens, Gemini 1.5 Pro can handle 11 hours of audio, 1 hour of video, or more than 30,000 lines of code!
- In-context learning: better responses from longer prompts without fine-tuning
Here’s how Gemini 1.5 Pro stacks against its predecessors:
| Criteria/Model | Gemini 1.0 | Gemini 1.5 Pro |
| --- | --- | --- |
| Speed | Comparatively slower response time | Faster |
| Context window | 32,000 tokens | 128,000 tokens (soon 1 million tokens) |
| Use case | Cannot handle longer code blocks | Can handle longer code blocks |
| Pricing | Input: $0.50 / 1M tokens; Output: $1.50 / 1M tokens | Input: $7 / 1M tokens; Output: $21 / 1M tokens |
Benchmarking the AI Titans: a data-driven showdown
Now that we understand each model's progressive updates, let's stack their latest versions together and compare their underlying technologies and performance. We will examine key metrics such as coding, reasoning, and text generation so that product managers can understand each AI model's strengths and weaknesses.
Note that these comparisons look good on paper, but vendors can fine-tune model performance to meet benchmarks, so treat the ranks as directional.
Here are benchmark ranks sourced from Papers with Code:
| Benchmark/Model | GPT-4 | Claude 3 Opus | Gemini 1.5 Pro | Source |
| --- | --- | --- | --- | --- |
| Code generation | #1 | #9 | #20 | HumanEval |
| Sentence completion | #4 | NA | NA | HellaSwag |
| Common sense reasoning | #1 | Claude 2 at #3 | NA | ARC (Challenge) |
| Arithmetic reasoning | #1 | #9 | Gemini Ultra at #10 | GSM8K |
| GenAI IQ Tests | 85 (for GPT-4 Turbo) | 101 | 76 | Maximum Truth |
Conclusion:
Gemini 1.5 Pro and GPT-4 Turbo were released only a few weeks before these tests, so comprehensive benchmarking isn't available yet. As of May 2024, GPT-4 leads the AI benchmarking tests across published results. Before adopting a model, compare its benchmark ranks for your specific use case.
Here’s a video comparing GPT-4 Turbo vs Claude 3 Opus vs Google Gemini 1.5 Pro:
When should you use each AI Titan for your product? A use case comparison of GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro
More than benchmark performance, you must first pin down your use case for adopting an AI model. Here's what each of the top three LLMs excels at:
What is GPT–4 Turbo good for?
Image Source: OpenAI – Introducing GPTs
GPT–4’s superior understanding of language makes it suitable for content creation. You can build custom GPTs tailored to your requirements to write blog posts, copy, social media captions, and more. It is also good in natural conversations – making it ideal for use cases like customer support, negotiations, coaching, etc.
You can explore real custom GPTs for potential use cases in the Custom GPT marketplace by OpenAI.
What is Claude 3 Opus good for?
Anthropic positions Claude 3 Opus for applications that require powerful computing. It can integrate into large enterprise workflows and applications to deliver top-level performance. If you want to save costs while keeping balanced performance, Claude 3 Sonnet is a better choice.
Typical use cases for the Claude 3 family include data analysis, extracting information from long documents, summarizing responses, contract drafting, and more.
For example, with its 200k context window, Claude 3 Opus saves time in reading, interpreting, summarizing, and creating legal documents.
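Even with a 200k-token window, very large document sets may not fit in one pass. A minimal chunker sketch, using the rough heuristic of about four characters per token (swap in a real tokenizer in practice):

```python
# Minimal chunker: split text into pieces that each fit a token budget.
# Uses the rough ~4 chars/token heuristic; use a real tokenizer in practice.

def chunk_text(text: str, max_tokens: int = 200_000,
               chars_per_token: int = 4) -> list[str]:
    """Split text into chunks that each stay under max_tokens (estimated)."""
    max_chars = max_tokens * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

contract = "clause " * 50_000     # ~350k characters, roughly 87k tokens
print(len(chunk_text(contract)))  # 1 -> fits Claude 3's 200k window in one pass
```

When a document fits in a single chunk, you avoid the quality loss of stitching together partial summaries, which is exactly the advantage the large window buys you.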
What is Gemini 1.5 Pro good for?
Gemini 1.5 Pro presents a good use case for multi-modal applications. It combines natural language processing with computer vision and other sensory inputs. If you combine this with other Google products — you can power immersive user experiences, enhance product visualization, and unlock new dimensions of customer engagement.
Gemini 1.5 Pro is also good for handling longer code blocks thanks to its 1 million tokens context window.
Here’s a video by Google where the model reasons across a 402-page manuscript:
Decoding model architectures of GPT–4, Claude 3 Opus, and Gemini 1.5 Pro
Understanding the underlying model architectures for top LLM models helps product managers anticipate which AI model aligns best with their product’s specific needs.
GPT–4 Turbo: The Transformer-based powerhouse
GPT-4 Turbo builds on the foundations of GPT-3, retaining its transformer-based architecture. The improvements are:
- Better response time
- Refined attention mechanisms for a more nuanced, contextual understanding of input text
- Expanded training data: GPT-4 Turbo is multimodal, has new safeguards for ethical and desirable behavior, and a later knowledge cut-off
Claude 3 Opus: Anthropic’s versatile approach
The Claude 3 family focuses on functional specialization. Claude 3 Opus, its flagship model, has an architecture that blends elements of transformer-based models with other neural network architectures. It also incorporates Anthropic's proprietary "Constitutional AI" principles, which aim to instill ethical and safety-conscious behaviors in the model.
Google Gemini 1.5 Pro: The multimodal powerhouse
Sundar Pichai, CEO of Google, announced the Gemini 1.5 Pro release on The Keyword.
Gemini 1.5 Pro is a mid-size multimodal model that adopts the Mixture-of-Experts (MoE) architecture. MoE enables the model’s parameter count to grow without increasing the number of activated parameters per input. This ensures its efficiency in serving while accommodating expansion, thus facilitating longer context understanding.
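The MoE efficiency claim can be made concrete with a toy sketch: a gating function scores several "experts" per input, and only the top-k highest-scoring experts actually run, so per-input compute stays flat as the total expert count grows. This is purely illustrative, not Google's actual Gemini implementation:

```python
# Toy Mixture-of-Experts routing: only the top-k experts run per input,
# so adding experts grows capacity without growing per-input compute.
# Purely illustrative; not Google's actual Gemini architecture.
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x: float, expert_weights: list[float],
                gate_weights: list[float], k: int = 2) -> float:
    """Route input x to the top-k experts by gate score; mix their outputs."""
    scores = [g * x for g in gate_weights]  # gating scores, one per expert
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    probs = softmax([scores[i] for i in top_k])  # renormalize over top-k only
    # Each "expert" here is a trivial linear function; only k of them execute.
    return sum(p * (expert_weights[i] * x) for p, i in zip(probs, top_k))

# 8 experts in the model, but each input activates only 2 of them.
out = moe_forward(1.0, expert_weights=[0.1 * i for i in range(8)],
                  gate_weights=[0.05 * i for i in range(8)], k=2)
```

The key property: doubling the expert count doubles the parameter count, but each input still pays for only k expert evaluations.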
It is also extensively trained on large-scale, multimodal datasets. It enables the model to develop a deep understanding of the relationships between different types of information.
Comparing brains of the AI Titans – Training and Learning Capabilities of GPT–4, Claude 3 Opus, and Gemini 1.5 Pro
Here’s a summary to guide your understanding of each model’s training and learning capabilities to design your internal training methods for your product’s needs:
| Capability/Model | GPT-4 Turbo | Claude 3 Opus | Gemini 1.5 Pro |
| --- | --- | --- | --- |
| Training process | Leverages unsupervised learning techniques over vast datasets. | Combines Transformer-based models with specialized training. | Rigorous training on multimodal datasets focused on understanding varied data types and modalities. |
| Adaptation and improvement over time | Adapts through fine-tuning and transfer learning. | Adapts through continual learning and exposure to varied inputs. | Leverages multimodal architecture to process and reason over different types of information, contexts, and tasks. |
Comparing how user-centric the interfaces of GPT-4 Turbo, Claude 3 Opus, and Google Gemini 1.5 Pro are
Let’s compare UI for all AI Titans based on ease of integration, customization options, and user feedback.
GPT–4 Turbo user interface
You can easily access GPT–4 Turbo via ChatGPT Plus’s default interface. It offers Team pricing options with a dedicated workspace where you can build and share GPTs. It also allows you to switch between personal and team accounts for easy storage and segregation of work.
Image Source: Maginative
Gemini 1.5 Pro user interface
Image Source: Beebom
Google's Gemini 1.5 Pro interface looks similar to the existing Gemini platform that runs on its predecessor, the Gemini 1.0 model. It has a sleek, modern, and easy-to-use UI that follows Google's Material Design principles.
It shows the tokens used as you input your prompt and allows advanced options like setting the temperature or multimodal prompting. You can also switch between Gemini models, add stop sequences, integrate with Google Workspace, and much more.
Claude 3 Opus user interface
Image Source: TechCrunch
Once you purchase Claude Pro, you get access to the model selector shown in the screenshot above, where you can choose Opus or any other model. The interface for entering prompts and reviewing responses is similar to the free version of Claude AI: clutter-free and simple, with light and dark modes. You can also attach files to enrich your prompt.
Comparing cost and accessibility for GPT-4 Turbo, Claude 3 Opus, and Google Gemini 1.5 Pro
GPT-4 Turbo offers a more affordable option at $10 per million input tokens and $30 per million output tokens. In contrast, Claude 3 Opus comes with a heftier price tag, at $15 per million input tokens and $75 per million output tokens, more than double GPT-4 Turbo's output cost.
Google, however, has taken a different approach with Gemini 1.5 Pro, offering a preview pricing of $7 per million input tokens and $21 per million output tokens. This makes it the most cost-effective option among the three AI titans.
| Per million tokens / Model | GPT-4 Turbo | Claude 3 Opus | Gemini 1.5 Pro |
| --- | --- | --- | --- |
| Input | $10.00 | $15.00 | $7.00 |
| Output | $30.00 | $75.00 | $21.00 |
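These list prices make per-request costs easy to estimate. A small sketch using the rates above (prices change often, so always confirm against each vendor's current pricing page):

```python
# Estimated cost per request, using the per-1M-token list prices above.
# Prices as quoted in this article; confirm against current vendor pricing.

PRICES_PER_1M = {  # (input, output) in USD per 1M tokens
    "gpt-4-turbo": (10.00, 30.00),
    "claude-3-opus": (15.00, 75.00),
    "gemini-1.5-pro": (7.00, 21.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    inp, out = PRICES_PER_1M[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Example: a 10k-token prompt with a 1k-token reply.
for model in PRICES_PER_1M:
    print(f"{model}: ${request_cost(model, 10_000, 1_000):.4f}")
```

Multiplying this per-request figure by your expected daily request volume gives a quick budget estimate for each model before you commit to one.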
When it comes to accessibility – GPT–4 Turbo is more widely accessible via ChatGPT Plus.
Image Source: Twitter
Claude 3 Opus and Gemini 1.5 Pro, on the other hand, are primarily available through their respective API platforms.
We can help you make the most of the top LLM models
Using the AI benchmark scores above, you can weigh the tradeoffs and decide how to integrate these AI models into your product roadmap.
Here are some quick takeaways for product managers to help opt for a suitable AI model among the top three LLM models:
- GPT-4 Turbo: lets you build custom GPTs for tasks such as language generation, question answering, and code generation.
- Claude 3 Opus: excels across applications, from customer support and task assistance to content generation and creative problem-solving.
- Gemini 1.5 Pro: its multimodal approach makes it valuable for delivering immersive, contextually aware experiences to users.
As a seasoned technologist, Adarsh brings over 14 years of experience in software development, artificial intelligence, and machine learning to his role. His expertise in building scalable and robust tech solutions has been instrumental in the company's growth and success.