Updated on September 17, 2024

Cover image: OpenAI o1, the reasoning LLM, with the OpenAI logo and a strawberry in the background.

The recent launch of OpenAI o1 (also called Strawberry) has sent the entire Artificial Intelligence (AI) universe in a new direction. Not only does the new o1 model introduce a novel methodology for improving how LLMs work, but it is also noticeably better at answering complicated questions.

OpenAI has launched two models, o1-preview and o1-mini, for its Plus and Team users.

These reasoning faculties provide a competitive advantage to any business that proactively adopts the new OpenAI o1 model.

So, let’s see how this model compares to other models currently on offer. We’ll cover:

  1. What is OpenAI o1?
  2. OpenAI o1 vs. GPT-4o vs. Human Experts
  3. Benefits for the Customer Service Sector
  4. Use-Cases of OpenAI o1
  5. Final Thoughts on OpenAI o1

What is OpenAI o1?

OpenAI o1 is a new series of AI models designed to tackle complex reasoning tasks, such as scientific, mathematical, and coding challenges, by spending more time thinking before responding.

o1-mini is a smaller model that is capable of reasoning while remaining computationally efficient.
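Both preview models are exposed through OpenAI’s standard Chat Completions API. As a hedged sketch (the support question is made up, and note that at launch the o1 models accept only user/assistant messages with fixed sampling settings), a minimal request could be assembled like this:

```python
# Sketch: assembling a minimal o1 request for the official OpenAI Python SDK.
# The model name and message format follow OpenAI's documented Chat
# Completions API; the question itself is a made-up example.

def build_o1_request(question: str) -> dict:
    # o1-preview (at launch) supports only "user"/"assistant" roles --
    # no system prompt -- and fixed sampling parameters.
    return {
        "model": "o1-preview",
        "messages": [{"role": "user", "content": question}],
    }

request = build_o1_request("My laptop won't boot. Is it still under warranty?")

# With the `openai` package installed and an API key configured,
# the request would be sent like this:
#   from openai import OpenAI
#   client = OpenAI()
#   response = client.chat.completions.create(**request)
#   print(response.choices[0].message.content)
```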

Currently, these models are priced at $15 per 1 million input tokens and $60 per 1 million output tokens.
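At those rates, per-request costs are simple arithmetic. A minimal sketch (the token counts are made-up example values; note that o1 also bills its hidden reasoning tokens as output tokens, so real bills can run higher than the visible answer suggests):

```python
# o1 API list prices in USD per 1 million tokens (as quoted above)
INPUT_RATE = 15.00
OUTPUT_RATE = 60.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single o1 API call."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# Example: a 2,000-token prompt that produces a 1,500-token answer
print(round(request_cost(2000, 1500), 2))  # → 0.12
```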

After the launch of GPT-4 in 2023, there was a slew of LLM launches. However, none of the new LLMs showed huge improvements over GPT-4 at answering questions.

Check out the OpenAI team that built the o1 model:

Building OpenAI o1

OpenAI o1 breaks this plateau by changing the paradigm.

Let’s understand it in simpler terms by breaking down the problem. 

OpenAI LLMs Work in Three Stages

Illustration: the three stages of LLM development – Pre-Training, Instruct Tuning, and Inference.

1. Pre-Training

Pre-Training is a computationally expensive phase in which transformers are trained on vast amounts of unlabeled data. However, there are two limits on how much pre-training you can do on a model:


a. Data Scarcity – A study from July 2024 estimated that there may be no more high-quality, human-generated data left to train new models on by 2025.


b. Compute Costs – A state-of-the-art (SOTA) model called GPT-MoE, with 1.8 trillion parameters, required round-the-clock access to 8,000 H100 GPUs for 90 days to train.

2. Training

This is the time spent on reinforcement learning and fine-tuning a model. Usually, this involves humans who work with these models continuously to improve their responses, alongside a machine-learned reward model that optimizes the model over time. OpenAI introduced Proximal Policy Optimization (PPO), the algorithm commonly used for this optimization.
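PPO’s central trick is a “clipped” objective that stops any single update from moving the policy too far from its previous version. A framework-free sketch of that objective for one sample (the numbers are illustrative; real training applies this over batches of model outputs):

```python
def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO clipped surrogate objective for a single sample.

    ratio     -- pi_new(a|s) / pi_old(a|s), the policy probability ratio
    advantage -- estimated advantage of the sampled action
    eps       -- clipping range (0.2 in the original PPO paper)
    """
    clipped = max(min(ratio, 1 + eps), 1 - eps)  # clamp ratio to [1-eps, 1+eps]
    # Take the pessimistic (smaller) of the raw and clipped estimates
    return min(ratio * advantage, clipped * advantage)

# An update that tries to move too far (ratio 1.5) is capped at 1 + eps = 1.2
print(ppo_clip_objective(1.5, advantage=1.0))  # → 1.2
```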

3. Inference

This is the time taken to answer a question. Usually, a model will recognize the context of a question and then use that context to formulate an answer.

What Is the Innovation in OpenAI o1?

To date, most of the focus of LLM building has been on the pre-training process. The working hypothesis was that if you could feed high-quality data to an LLM during pre-training, it could reason better. 

OpenAI o1 optimizes the training and inference times instead.

Training

OpenAI Strawberry (o1) has been trained with advanced (and proprietary) reinforcement learning (RL) algorithms to maximize accuracy and reasoning capabilities.

This allows it to outperform human experts on competitive tasks and to rank highly in Codeforces programming competitions.

Inference

Unlike previous models, OpenAI o1 “thinks” before it answers a question. Practically speaking, it breaks a complex question down into small parts, understands the context of each, and provides well-argued answers.

This allows it to solve competitive mathematical problems (at the AIME level).

These reasoning capabilities push OpenAI o1 further and enable it to answer complex questions at the same level as human experts on several evaluations. 

Let’s see how this model fares against other models of the same class.

OpenAI o1 vs. GPT-4o vs. Human Experts

Since the capabilities of OpenAI o1 are fairly pronounced compared to previous foundation models, OpenAI designed new tests to evaluate the model.

1. AIME – The American Invitational Mathematics Examination is a national exam that feeds into selecting the US team for the International Math Olympiad. o1 was tested on questions from the 2024 AIME.

2. Codeforces – Codeforces runs competitive programming contests for participants around the world throughout the year; the results are used to rate programmers globally.

3. GPQA-Diamond – The hardest subset of the Graduate-Level Google-Proof Q&A (GPQA) benchmark consists of PhD-level science questions that human experts can reliably answer only 74% of the time. Previously, GPT-4o could answer only 56% of these questions.

When OpenAI evaluated o1 (Strawberry) against GPT-4o and human experts, o1 scored significantly higher on each criterion.

Chart: OpenAI o1 vs. GPT-4o vs. human experts on a math competition (AIME), a code competition (Codeforces), and PhD-level science questions (GPQA-Diamond).

OpenAI also shows that this model has higher accuracy and lower hallucination rates than GPT-4o. The results are as follows:

| Dataset | Metric | GPT-4o | GPT-4o-mini | o1-preview | o1-mini |
| --- | --- | --- | --- | --- | --- |
| SimpleQA | accuracy | 0.38 | 0.09 | 0.42 | 0.07 |
| SimpleQA | hallucination rate (lower is better) | 0.61 | 0.90 | 0.44 | 0.60 |
| BirthdayFacts | hallucination rate (lower is better) | 0.45 | 0.69 | 0.32 | 0.24 |
| Open-Ended Questions | hallucination rate (lower is better) | 0.82 | 1.23 | 0.78 | 0.93 |

Performance of OpenAI o1 on hallucination benchmarks

As you can see, this new way of training models and answering questions makes LLMs more accurate and allows them to solve more challenging problems. Let’s see how this progress translates into business advantages, with a focus on customer service.

Benefits of OpenAI o1 for Customer Service

We have already talked about how generative AI can improve your customer service function. So, for this evaluation, we will focus on the individual capabilities of the o1 model. The core benefits of using OpenAI o1 to power your generative AI chatbots are as follows:

1. Answering Complex Questions – Current chatbots are limited in the types of questions they can respond to. With this model’s advanced reasoning capabilities, however, they can answer far more complicated questions.
For example, the OpenAI demo showcases how the model can suggest a diagnosis when someone describes their symptoms.

2. Making Sense of Multiple Sets of Data – The new model handles data and different context scenarios better, so it is better equipped to work with multiple streams of data at once. For example, it could predict the difference new tax laws would make to a customer’s investment portfolio.

3. Navigating Multiple Questions Together – Sometimes, customers ask several questions in a single query.
For example, a customer might ask, “My computer isn’t working. Is it still under warranty, and what’s the repair timeline?” Answering this requires backend warranty information as well as the availability of maintenance slots. Previous models would struggle to process such a compound question, but the new OpenAI o1 can handle it.

4. Fewer Hallucinations – The new model still makes mistakes, but with chain-of-thought answering, it can discover its own mistakes and correct them. This means that o1-based chatbots will be more reliable at providing accurate answers.
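The “decompose, then answer” pattern behind the warranty example above can be sketched in a few lines. Everything here (device IDs, lookup tables, wording) is a made-up illustration of the routing, not OpenAI’s actual pipeline:

```python
# Toy sketch: split a compound support query into sub-questions,
# answer each from a backend lookup, then combine the answers.

WARRANTY_DB = {"C-1001": True}           # device ID -> still under warranty?
REPAIR_SLOTS = {"C-1001": "3-5 days"}    # device ID -> repair estimate

def answer_compound_query(device_id: str) -> str:
    parts = []
    # Sub-question 1: warranty status (backend lookup)
    if WARRANTY_DB.get(device_id):
        parts.append("your device is still under warranty")
    else:
        parts.append("your warranty has expired")
    # Sub-question 2: repair timeline (scheduling lookup)
    eta = REPAIR_SLOTS.get(device_id, "unknown")
    parts.append(f"the estimated repair time is {eta}")
    return "Checked both: " + " and ".join(parts) + "."

print(answer_compound_query("C-1001"))
```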

Now that we understand the capabilities and benefits of this model, let’s look at its use cases.


Use-Cases for OpenAI o1 Model

Given the reasoning capabilities in o1, several new use cases can be fulfilled by this new model:

1. Symptom Assessment – Given a list of symptoms, the model can suggest conditions a customer might have. It can then direct the person to the healthcare provider’s relevant department.

Screenshot from OpenAI answering the disease when given a list of symptoms
Source: OpenAI

2. Answering Complex Financial Questions – For complicated financial calculations around adjustable mortgage rates and changes to insurance premiums, o1 can provide better mathematical answers.
It can also answer more complex financial questions about loan interest rates, investment portfolios, and insurance claims. Check out this basic example!
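To make the mortgage case concrete, such questions ultimately reduce to the standard amortized-payment formula. A sketch with hypothetical loan figures:

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Standard amortized loan payment: P * r / (1 - (1 + r)^-n)."""
    r = annual_rate / 12   # monthly interest rate
    n = years * 12         # total number of payments
    return principal * r / (1 - (1 + r) ** -n)

# Hypothetical case: a $300,000 30-year mortgage whose rate adjusts from 6% to 7%
before = monthly_payment(300_000, 0.06, 30)
after = monthly_payment(300_000, 0.07, 30)
print(f"${before:,.2f} -> ${after:,.2f}")  # payment rises by roughly $197/month
```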

3. Doing Basic Coding: Check out the video below to see the coding capabilities of OpenAI o1.

Coding capabilities of OpenAI o1

4. Navigating Complex Regulatory Questions – Several states have different laws around finance and healthcare. Given enough data, OpenAI o1 can provide detailed answers to these questions and help customers navigate these difficult documents. 

Remember that this is only a small preview of the final model that OpenAI plans to release. Even so, with its advanced reasoning capabilities, it is clear that this model will be able to handle more advanced and complicated customer questions, further automating the customer support process.

Final Thoughts on OpenAI o1

OpenAI o1 presents a brand-new paradigm for how LLMs answer questions. Sam Altman and the team have bypassed the limitations of pre-training data and focused on reinforcement learning and inference to create a model that can reason about and answer more complex questions.

Evaluating this model on international expert-level tests shows that it has remarkable capabilities when it tackles complex questions. 

This capability translates to business use-cases as well, since customer support questions can often be complicated and require the collation of multiple datasets.

Remarkably, this is just a small preview, and as o1’s full scale is revealed, we will discover further use-cases. We think this will result in huge advantages for any customer support team that adopts the model in its processes.

Build a customer support chatbot with the latest AI Models! Try out Kommunicate!
