Toshiba Corporation claims to have developed the world’s most accurate and highly versatile Visual Question Answering (VQA) AI that can recognise not only people and objects but also colours, shapes, appearances and background details in images.
The AI overcomes the difficulty of answering questions about the position and appearance of people and objects, and can learn the information required to handle a wide range of questions and answers.
Toshiba presented the technology on 14 September at ICANN 2021, the International Conference on Artificial Neural Networks.

Image Credits: Toshiba
When tested on a public dataset comprising a large volume of images and text data, the VQA AI correctly answered 66.25% of questions without any pre-learning and 74.57% with pre-learning. For example, the AI can find a worker standing in a designated place when asked a question such as “Is the person on a black mat?”, which requires recognising the individual, their position, and the mat’s shape and colour.
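Toshiba has not released its model, but the question-and-answer interaction described above can be sketched with an openly available VQA model. The example below uses the ViLT VQA checkpoint from Hugging Face as a stand-in; the image file name is hypothetical and the question is the one quoted in the article.

```python
# Minimal sketch of asking a VQA model a question about an image.
# This uses the public ViLT VQA model, not Toshiba's system.
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

image = Image.open("factory_floor.jpg")        # hypothetical camera frame from a production site
question = "Is the person on a black mat?"     # example question from the article

# Encode the image-question pair and pick the highest-scoring answer
# from the model's fixed answer vocabulary.
inputs = processor(image, question, return_tensors="pt")
logits = model(**inputs).logits
answer = model.config.id2label[logits.argmax(-1).item()]
print(answer)  # e.g. "yes" or "no"
```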
Applying this to safety monitoring systems at production sites can help improve safety and reduce the workload of on-site supervisors. It can also be used to identify specific scenes in broadcast content and surveillance video footage.

Image Credits: Toshiba
The global AI market, including software, hardware, and services, is forecast to grow 16.4% year over year in 2021 to $327.5 billion and is expected to reach $554.3 billion by 2024. Toshiba says its new AI meets this market’s need for flexibility, offering what it claims is the world’s highest accuracy in answering questions while allowing questions to be changed or added quickly. Its ability to recognise not only people and objects but also image backgrounds, together with the extensive database at its disposal, ensures that it can process the features of images and pre-learned questions quickly to derive the correct answer.
After learning a large set of images, questions and answers that cover the presence of people and objects, along with information such as their location and status, the AI can select an appropriate answer to a given question from approximately 3,000 answer patterns.
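One common way to realise this kind of setup is to frame VQA as classification over a fixed vocabulary of answer patterns. Toshiba has not published its architecture, so the encoders, feature dimensions, and fusion step in the sketch below are illustrative assumptions only; the answer-pattern count is the approximate figure cited above.

```python
# Illustrative sketch of VQA as classification over ~3,000 answer patterns.
# The architecture and dimensions are assumptions, not Toshiba's published design.
import torch
import torch.nn as nn

NUM_ANSWER_PATTERNS = 3000   # roughly the number of answer patterns cited above

class SimpleVQAClassifier(nn.Module):
    def __init__(self, image_dim=2048, question_dim=768, hidden_dim=1024):
        super().__init__()
        # Project image and question features into a shared space, fuse them,
        # then score every answer pattern.
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        self.question_proj = nn.Linear(question_dim, hidden_dim)
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(hidden_dim, NUM_ANSWER_PATTERNS),
        )

    def forward(self, image_features, question_features):
        # Element-wise fusion of the two projected feature vectors.
        fused = self.image_proj(image_features) * self.question_proj(question_features)
        return self.classifier(fused)   # logits over the answer patterns

# Usage with dummy features standing in for real image and question encoders.
model = SimpleVQAClassifier()
image_feat = torch.randn(1, 2048)      # e.g. from a CNN image backbone
question_feat = torch.randn(1, 768)    # e.g. from a text encoder
logits = model(image_feat, question_feat)
predicted_answer_id = logits.argmax(dim=-1).item()
print(predicted_answer_id)             # index into the answer-pattern vocabulary
```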