
New AI model to solve PhD-level problems in math, reasoning, and science
13 Sep, 2024 / 09:18 AM / OMNES Media LLC

Source: http://www.webdesk.com


(Web Desk) - ChatGPT developer OpenAI has launched a new series of AI reasoning models to solve hard problems.

Codenamed Strawberry, the new AI models are officially called OpenAI o1.

Interestingly, OpenAI has trained these models to spend more time thinking through problems before responding, much like a person would.

According to OpenAI, the models can reason through complex tasks and solve harder problems than previous models in science, coding, and math.

OpenAI said that the models learn to refine their thinking process, try different strategies, and recognize their mistakes through training.

In OpenAI's tests, the new models perform similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. The models also excel in math and coding.

In a qualifying exam for the International Mathematical Olympiad (IMO), the currently available GPT-4o correctly solved only 13% of problems, while the reasoning model scored 83%.

OpenAI also evaluated the new models' coding abilities in contests, where they reached the 89th percentile in Codeforces competitions.

As an early model, it still lacks many of the features that make ChatGPT useful, like browsing the web for information and uploading files and images.

For many common cases, GPT-4o will remain more capable in the near term. However, for complex reasoning tasks this is a significant advancement and represents a new level of AI capability.

Given this, OpenAI is resetting the counter back to 1 and naming this series OpenAI o1.

As part of developing these new models, OpenAI has also developed a new safety training approach that harnesses their reasoning capabilities to make them adhere to safety and alignment guidelines.

One way OpenAI measured safety was by testing how well a model keeps following its safety rules when a user tries to bypass them, a practice known as jailbreaking. On one of the hardest jailbreaking tests, GPT-4o scored 22 (on a scale of 0 to 100), while the o1-preview model scored 84.

OpenAI said that these enhanced reasoning capabilities are useful for tackling complex problems in science, coding, math, and similar fields.

For example, o1 can be used by healthcare researchers to annotate cell sequencing data, physicists to generate complicated mathematical formulas needed for quantum optics, and developers in all fields to build and execute multi-step workflows.

The o1 series excels at accurately generating and debugging complex code.

To offer developers a more efficient solution, OpenAI has also released OpenAI o1-mini, a faster, cheaper reasoning model that is particularly effective at coding.

As a smaller model, o1-mini is 80% cheaper than o1-preview, making it a powerful, cost-effective model for applications that require reasoning but not broad world knowledge. 
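To put the 80% figure in concrete terms, here is a minimal sketch in Python of what "80% cheaper" means arithmetically. The function name and the base price of 10.0 are hypothetical, chosen only for illustration; the article does not state actual prices.

```python
def discounted_price(base_price: float, percent_cheaper: float) -> float:
    """Return the price left after a 'percent cheaper' reduction."""
    return base_price * (1 - percent_cheaper / 100)


# Hypothetical base price for o1-preview (an assumption, not a real figure).
hypothetical_o1_preview_price = 10.0

# "80% cheaper" means paying the remaining 20% of the base price.
o1_mini_price = discounted_price(hypothetical_o1_preview_price, 80)
print(f"{o1_mini_price:.2f}")
```

Under that assumed base price, the same workload on o1-mini would cost one fifth as much as on o1-preview.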

