Microsoft-backed OpenAI’s highly anticipated GPT-4 is finally here with the promise to give more accurate and safer responses. Microsoft confirmed that the new Bing search is running on GPT-4. For now, GPT-4 is available on ChatGPT Plus and as an API for developers.
Multimodal in Nature
As AIM had predicted, GPT-4 stands out from GPT 3.5 due to its multimodal nature. This AI model can receive textual prompts and images, which allows users to specify any type of vision or language-related task, similar to the text-only setting.
AIM Daily XO
Join our editors every weekday evening as they steer you through the most significant news of the day, introduce you to fresh perspectives, and provide unexpected moments of joy
Read more: ChatGPT Takes NEET; Will it Pass with Flying Colors or Flunk?
Through this feature, it can generate natural language, code, and other text outputs by processing inputs that consist of both images and text. GPT-4 showcases comparable capabilities in various domains, including documents containing photographs, diagrams, or screenshots, just as it does with text-only inputs. Additionally, it can also be enhanced using test-time techniques designed for text-only language models, such as few-shot or chain-of-thought prompting.
Download our Mobile App
However, it is important to note that this image input capability is still a research prototype and not yet publicly available.
Microsoft’s intention to make GPT-4 multimodal has been very clear. In the recent paper on Visual ChatGPT, the “prompt manager” is described as a tool to share information between foundation models, such as Stable Diffusion, ControlNET, BLIP, and ChatGPT.
Additionally, Microsoft has released a research paper that focuses on Kosmos-1, a multimodal large language model (MLLM) that emphasizes the integration of language, action, and multimodal perception.
According to OpenAI’s blog post, besides being multimodal, GPT-4 can solve difficult problems with greater accuracy, thanks to its broader general knowledge and problem-solving abilities, surpassing ChatGPT in its advanced reasoning capabilities.
GPT-4 has been trained on Microsoft Azure AI supercomputers, which offer an AI-optimized infrastructure enabling the delivery of the product globally. The blog post also stated that they have put considerable effort into making GPT-4 safer and more aligned, resulting in an 82% reduction in its likelihood to produce disallowed content and a 40% increase in its ability to provide factual responses when compared to GPT-3.5.
The MMLU benchmark, which comprises 14,000 multiple-choice questions covering 57 subjects, was translated into various languages using Azure Translate disclosing that GPT-4 outperformed GPT-3.5 and other LLMs (such as Chinchilla and PaLM), in 24 out of 26 languages tested, including for low-resource languages such as Latvian, Welsh, and Swahili.
Despite these improvements, OpenAI acknowledges GPT-4’s limitations, including social biases, hallucinations, and adversarial prompts. The company encourages transparency, user education, and AI literacy as society adopts these models while striving to expand the input avenues of those who shape their models.
Greg Brockman, president, and cofounder of OpenAI will be live streaming at 1:30 am IST for a developer demo showcasing GPT-4 and some of its capabilities and limitations.