OpenAI announces GPT-4 model that can take images and text as input

OpenAI has announced GPT-4, the latest version of its GPT language model. The main innovation of the new version is that both text and images can serve as input. The GPT language model forms the basis for AI chatbots such as ChatGPT and the new Bing.

OpenAI highlighted that GPT-4 accepts images and text as input and generates text as output. According to the company, the new model is less capable than humans in many real-world situations, but demonstrates human-level performance on several professional and academic benchmarks.

The predecessor, GPT-3.5, accepts only text as input. In normal, casual conversation, the differences between GPT-3.5 and GPT-4 can be subtle. OpenAI argues that the differences only really emerge when the task reaches or exceeds a certain level of complexity. Compared to GPT-3.5, GPT-4 is said to be more reliable, more creative and capable of handling more nuanced instructions.

OpenAI shows some examples of GPT-4’s capabilities in which a text question is asked about an attached photo. In several of them, the model is asked to explain what is funny about the picture.

According to OpenAI, it took six months to fine-tune the performance of the latest version. A year ago, GPT-3.5 was trained as a first test run of the new system. Along the way, bugs were fixed and the theoretical underpinnings were improved. As a result, the GPT-4 training run showed “unparalleled stability,” OpenAI says. The new version became the first OpenAI language model whose training performance could be predicted accurately ahead of time, according to the company.

The text input capability of GPT-4 is being released through ChatGPT and the new model’s API, for which there is a waiting list. To make the image input capability more widely available, OpenAI is currently working with a single partner, Be My Eyes, a mobile app that aims to make the world more accessible to people who are blind or visually impaired.
