
Microsoft has unveiled an AI model that recognizes image content and solves visual problems
3 Mar, 2023 / 08:10 am / Microsoft

Source: http://www.mashable.com


Mashable: Multimodal AI could be the key to developing artificial general intelligence, a hypothetical technology that, sadly, could one day replace humans in any intellectual task or job.

Kosmos-1, a multimodal model developed by Microsoft researchers, was unveiled on Monday with the promise of reading the content of images, solving visual puzzles, recognizing text in images, scoring well on visual IQ tests, and comprehending instructions given in natural language.

The development of multimodal AI, which can process data in a variety of formats including text, audio, pictures, and video, is seen as a crucial step towards creating artificial general intelligence (AGI) capable of performing general tasks at human levels.

 
According to a report by Ars Technica, the researchers argue in their academic paper, "Language Is Not All You Need: Aligning Perception with Language Models," that "being a basic part of intelligence, multimodal perception is a necessity to achieve artificial general intelligence, in terms of knowledge acquisition and grounding to the real world." As the visual examples in the Kosmos-1 paper demonstrate, the model can analyze images and answer questions about them, read text from an image, write captions for images, and achieve between 22 and 26 percent accuracy on a visual IQ test.

Although large language models (LLMs) have been in the spotlight recently, some researchers in the field of artificial intelligence believe that multimodal AI could be the key to developing artificial general intelligence, a hypothetical technology that could one day replace humans in any intellectual task or job.

OpenAI, a key Microsoft corporate partner in the AI field, has set AGI as its primary objective. In this case, however, Kosmos-1 appears to be a Microsoft-only initiative, with no involvement from OpenAI. Because the model has its roots in natural language processing, much like a text-only LLM such as ChatGPT, the researchers have dubbed it a "multimodal large language model" (MLLM).