Mindtech Global, developer of the world’s leading platform for the creation of synthetic data for training AI, has today launched part six of its synthetic data guide, which focuses on how AI development teams can bridge the data gaps that cause visual AI models to malfunction.
Written by Mindtech’s vice-president of engineering Peter McGuinness, the guide first tackles the level of photorealism obtainable in computer-generated synthetic images and its impact on the accuracy of trained systems.
It goes on to explain why AI (computers) doesn’t see the world as we (humans) do and therefore needs the structure of “real-world” images rather than the aesthetic appeal humans seek. McGuinness explains: “It is vital that the training data models all elements of the deployment AI system, including camera artefacts such as lens distortion and MPEG compression artefacts.”
McGuinness explains that there is another key issue: data gaps in visual AI models that were trained on inadequate or, worse, the wrong images, which cause failures in the AI system.
McGuinness said: “Usually we find that the gaps in data due to inadequate system modelling, or missing corner case data, are far more significant than any perceived ‘photorealism gap’.”
These failures can be caused by small and subtle inaccuracies in training, such as a safety system failing to recognise workers on an industrial site when they wear hazmat suits, because the system had only been trained to recognise ‘ordinary’ work clothes.
In this real-world example, however, the customer was able to close the gap quickly by using Mindtech’s Chameleon platform to synthetically create images of people in the correct PPE, allowing the model to identify them correctly.
Photo Caption: Computers “see” images as a series of RGB values