On Thursday, OpenAI announced the release of GPT-4o mini, a new, smaller version of its latest. GPT-4o The AI language model that will change. GPT-3.5 Turbo In ChatGPT, reports CNBC And Bloomberg. It will be available today for free users and with ChatGPT Plus or Team subscriptions, and will come to ChatGPT Enterprise next week.
The GPT-4o mini will reportedly be multimodal like its big brother (which Started in May), also able to use DALL-E 3 to interpret images and text and create images.
OpenAI told Bloomberg that the GPT-4o mini will be the company’s first AI model to use a technique called “instruction hierarchy,” which allows an AI model to prioritize certain instructions over others (such as from the company), which This can be difficult. People to perform Instant injection attacks or Jail break which overrides the built-in fine-tuning or instructions provided by the system prompt.
The value of miniature language models
OpenAI isn’t the first company to release a smaller version of an existing language model. This is a common practice in the AI industry by vendors such as Meta, Google and Anthropic. These small language models are designed to perform simple tasks at low cost, such as creating lists, summarizing, or suggesting vocabulary, rather than deep analysis.
Smaller models are usually aimed at API userswho pay a fixed price per token input and output to use the models in their applications, but in this case, offering GPT-4o mini for free as part of ChatGPT also provides exposure to OpenAI. Money will be saved.
Olivier Godement, Head of API Product at OpenAI, told Bloomberg, “In our mission to enable the bleeding edge, to build extremely powerful, useful applications, we certainly want to continue to push the envelope here with Frontier Models. Want to get the best mini models out there.”
Small large language models (LLMs) typically have fewer parameters than large models. Parameters are numerical stores of value in a neural network that store learned information. Having fewer parameters means LLM has a smaller neural network, which typically limits the depth of an AI model’s ability to understand context. Large-parameter models are typically “deep-thinking” due to the large number of connections between concepts stored in these numerical parameters.
However, to complicate things, there is not always a direct relationship between parameter size and capacity. The quality of the training data, the efficiency of the model architecture, and the training process itself all affect the performance of the model, as we have seen with more efficient miniature models. Microsoft Phi-3 recently.
Fewer parameters mean fewer computations are required to run the model, which means fewer computations are necessary on either less powerful (and less expensive) GPUs or existing hardware, leading to lower energy bills and user It costs less.
CNBC and Bloomberg seem to have possibly broken the embargo and published their stories ahead of OpenAI’s official blog release about the GPT-4o Mini. This is a breaking news story and will be updated as details emerge.