Openi on Friday a new AI “reasoning” model, O3-mini, latest in the company, o Family of the models of reasoning.

Open I The model preview in the first December With a more capable system called O3, but the launch comes at a significant moment for the company, whose ambitions – and the challenges – are apparently growing in the day.

Open AI is fighting the idea that this is the seeding ground in the AI ​​race Chinese companies like depticWhich Openi may have alleged that he had stolen his IP. It’s trying Her relationship with Washington on the shore As it makes a pursuit simultaneously The Mahatkanakshi Data Center ProjectAnd As allegedly this base is For one of the biggest funding rounds in history.

Which brings us O3-mini. Openi is developing his new model as “powerful” and “cheap”.

“Today’s launch marks […] An important move toward expanding access to advanced AI in our mission service, “the open -minded spokesman told Tech Crunch.

More effective reasoning

Unlike most of the big language models, check the reasoning models like O3-Mini before giving results. This helps them Avoid some defects It usually travels to models. It takes a little longer to reach the solution to these argument models, but far from trade is that they are more reliable-though not perfect-in domains such as domains.

O3-mini is fine for STEM problems, especially for programming, mathematics and science. Openi claims that the model is largely equal to the O1 family, O1 and O1-mini in terms of capabilities, but it runs fast and its cost is low.

The company claims that external testers have preferred O3-Mini’s responses at more than half-time O1-mini. O3-mini also made “hard-real-world questions” on 39 % less “big mistakes” A/B Test Vs. O1-Mini, and providing about 24 % faster responses by creating “clear” response.

O3-mini will be available to all users Chat GPT The start of Friday, but those users who pay the Openi Chat GPT Plus and the team’s plans will have a higher rate of 150 questions daily. Chat GPT Pro users will have unlimited access, and O3-mini will come to Chat GPT Enterprise and Chat GPTEDU users in a week. (No word Chat GPT Government Still).

Premium project users can select O3-Mini using the Chat GPT Dropdown menu. Free users can click or tap the new “Reason” button in the Chat Bar, or “regenerate” the response.

On Friday, the Openi will also be available through the API to select the O3-MINI developers, but initially it will not support the analysis of photos. DEVS can choose the level of “reasoning” (less, medium, or more) to the O3-Mini based on the issue of their use and delayed requirements.

The O3-Mini is priced at 5 0.55 per million cathek input token and 4 4.40 per million output token, where one million tokens are equal to about 750,000 words. It is 63 % cheaper than O1-mini, and is competitive with the prices of DPCAC’s R1 argument model. Dupic receives $ 4 0.14 per million kcked input token and R 2.19 per million output token through its API.

In Chat GPT, the O3-mini is based on middle reasoning efforts, which Open says “a balanced trade between speed and accuracy is closed.” Pay users will have the option to choose the “O3-Mini-HIGH” in the model selector, which is called Open in return for a slow response.

Regardless of which version of the O3-Mini Chatgpt users selects, the model will work with the link to find the latest answers with related web sources links. Open AI has warned that functionality is a “prototype” because it works to connect the search into its reasoning models.

Openai wrote in a blog post on Friday, “Although O1 is the reasoning model of our wider general knowledge, O3-mini provides a special alternative to technical domains that requires precision and speed.” “The release of O3-Mini is another step in the mission of the Openi that advances the limits of efficient intelligence at the cost.”

Alerts are too much

O3-mini is not the most powerful model of Openi to date, nor does it jump to Deep Sak’s R1 reasoning model in every benchmark.

The O3-mini Aime defeats R1 on 2024, a test in which the models understand and respond to what kind of complex instructions-but only with high reasoning efforts. It also defeated the R1 (by the .1 Point) by the programming test SWE Bench, but once again, with just a high reasoning effort. On low reasoning efforts, the R1 on the O3-Mini GPQA Diamond is left behind, which examines models with PhD surface physics, biology and chemistry questions.

To be fair, the O3-mini competitively answers many questions on low cost and delay. In the post, Openi compares his performance to the O1 Family:

Openi writes, “With a low reasoning effort, O3-mini achieves comparison with O1-mini, while with the middle effort, the comparison with the O3-Mini O1 achieves the performance.” “O3-mini resembles O1 in mathematics, coding and science while providing medium-sized reasoning efforts. Meanwhile, with high reasoning effort, both O3-mini O1-mini and O1 Improves

It is worth noting that the benefit of O3-mini’s performance is thin in some areas on O1. At Aime 2024, the O3-mini defeats O1 by only 0.3 % points when high reasoning is sought. And on the GPQ Diamond, the O3-Mini does not overtake the O1 score even on high reasoning efforts.

Openi claims that the O3-Mini O1 is more “safe” or safe than the Family, however, thanks to the Red Taming Efforts and its “deliberately alignment” method, which models have open security policy Forcing “thinking” when they are responding. According to the company company, one of the flagship models of the O-3-Meni Openi is “significantly forward”, GPT-4O“Diagnosis of Challenging Safety and Gel Break”.

Tech Crunch is AI -based newsletter! Sign up here Get every Wednesday in your inbox.



Source link