Written by: Zeeshan Hussain
Introduction
Tired of the limitations of traditional data augmentation in your computer vision pursuits? Buckle up as we embark on an exhilarating ride into the universe of Generative AI, which is revolutionizing data annotation and augmentation, streamlining these processes like never before. We'll demystify its mechanics, compare it with traditional approaches, and show how it can amplify the performance of AI systems in real-world, practical scenarios.
The very essence of computer vision applications depends on the availability of data — large, diverse, and representative datasets that teach machines to understand the patterns in the visual world around us. Data scientists, the builders of these technical innovations, spend over 80% of their time preparing and managing data. Surprisingly, 60% of their time is spent cleaning and organizing data, whereas only 19% is spent acquiring datasets.
The uphill battle of data scarcity has become an urgent issue to address, since performance on vision tasks increases logarithmically with the volume of training data. To overcome data scarcity, data augmentation techniques have been widely used to enhance the size and quality of training datasets, but they have limitations.
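To build intuition for that logarithmic relationship, here is a tiny sketch; the constants a and b below are hypothetical, not fitted to any real benchmark. The point is that each tenfold increase in data buys roughly the same accuracy bump, so squeezing more diversity out of a fixed dataset matters a great deal.

```python
import math

# Hypothetical scaling law: accuracy ~ a + b * log10(n)
# a and b are made-up constants for illustration only
a, b = 0.50, 0.08

for n in (1_000, 10_000, 100_000, 1_000_000):
    acc = a + b * math.log10(n)
    print(f"{n:>9} images -> accuracy ~ {acc:.2f}")
```

Note how every tenfold jump in dataset size adds the same fixed increment — collecting ten times more data gets expensive fast, which is exactly the gap generative augmentation targets.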
Traditional data augmentation techniques (cropping, flipping, rotating, changing the intensity of the RGB channels using PCA color augmentation, etc.) have their merits, such as increasing the dataset size by a factor of 2048, but they all share a common limitation: every augmented sample is derived from an existing one, so these transformations cannot introduce genuinely new visual information into the dataset.
Traditional Augmented Dataset
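For reference, the classic transforms above can be sketched in a few lines of NumPy. The "image" here is random stand-in data, and the last step follows the AlexNet-style PCA color augmentation: perturb every pixel along the principal components of the RGB covariance, scaled by random magnitudes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in 32x32 RGB "image" with random pixel values
img = rng.integers(0, 256, size=(32, 32, 3)).astype(np.float64)

# Simple label-preserving transforms
flipped = img[:, ::-1, :]          # horizontal flip
cropped = img[4:28, 4:28, :]       # center crop

# PCA color augmentation (AlexNet-style)
pixels = img.reshape(-1, 3) / 255.0
cov = np.cov(pixels, rowvar=False)          # 3x3 RGB covariance
eigvals, eigvecs = np.linalg.eigh(cov)      # principal components of color space
alphas = rng.normal(0.0, 0.1, size=3)       # random per-component magnitudes
shift = eigvecs @ (alphas * eigvals)        # RGB offset added to every pixel
augmented = np.clip(pixels + shift, 0.0, 1.0).reshape(img.shape) * 255.0

print(flipped.shape, cropped.shape, augmented.shape)
```

Every output is a deterministic function of the input pixels — which is precisely the limitation: no transform here can produce a traffic sign the camera never saw.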
What if there was a way to break free from these constraints, to generate data points that are entirely new and yet seamlessly blend with the existing data?
Generative AI comes to the rescue by synthesizing entirely new, realistic data points that blend seamlessly with the existing distribution. This is particularly valuable for stress testing and refining models to achieve high accuracy in challenging scenarios.
A tool that can turn text into stunning images - yes, it's real! Stable Diffusion is a state-of-the-art series of image generation models released in 2022 by StabilityAI, CompVis, and RunwayML, in which latent diffusion models conjure lifelike, diverse, aesthetic images from mere text prompts. Whether you want to turn text into images, give existing images a fresh twist, or fill in missing regions, Stable Diffusion has got your back. Particularly noteworthy is its image-to-image mode, which translates input images into diverse domains or styles, guided by textual descriptions.
Image source: http://jalammar.github.io/illustrated-stable-diffusion/
Text2Image
Pikachu committing tax fraud, paperwork, exhausted, cute, really cute, cozy, by Steve Hanks, by Lisa Yuskavage, by Serov Valentin, by Tarkovsky, 8 k render, detailed, cute cartoon style
Image2Image
Pikachu eating icecream, yummy, melting, cute, really cute, cozy, by Steve Hanks, by Lisa Yuskavage, by Serov Valentin, by Tarkovsky, 8 k render, detailed, cute cartoon style
Upscaling Image to 8k
This innovative method takes your input images and preserves their core meaning while broadening the spectrum of training data. The technique leverages the power of generative models such as GANs to meticulously map input domains to output domains. The beauty? This approach spans image translation, style transfer, super-resolution, and more, elevating the caliber of training data for an array of computer vision tasks.
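In practice, prompts like the examples above are rarely written one by one. A small helper can combine varying subjects with a shared style suffix to batch-produce prompt variants for augmentation runs; the subject and style strings below simply mirror the prompt structure shown earlier.

```python
# Hypothetical helper for batching prompt variants: pair each subject with a
# shared style suffix, mirroring the prompt structure shown above
subjects = [
    "Pikachu committing tax fraud, paperwork, exhausted",
    "Pikachu eating icecream, yummy, melting",
]
style = "cute, really cute, cozy, 8 k render, detailed, cute cartoon style"

prompts = [f"{subject}, {style}" for subject in subjects]
for p in prompts:
    print(p)
```

Each generated prompt can then be fed to the pipeline in a loop, turning a handful of subjects into a whole batch of synthetic samples.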
Our efforts go beyond mere speculation, as we actively engage with real-world scenarios in domains such as Advanced Driver Assistance Systems (ADAS) and Automotive AI-powered quality inspection systems. These generation techniques play a vital role in stress-testing and refining our models, adding a practical dimension to our endeavors.
The training of autonomous driving assistant systems heavily relies on data. Generative AI can bridge the gap between limited real-world data and the need for diverse and abundant training data. Specific examples where generative AI can enhance data generation for autonomous driving include different weather conditions, rare events, and edge cases that demand high accuracy.
In our case, by utilizing Stable Diffusion in img2img mode, we generated road speed-sign data to stress-test the model and refine its predictions for improved accuracy.
Example Data for Stress Testing:
Original Images
Generated Noisy Data for Stress Testing
Quality inspection systems also face challenges in obtaining sufficient data for training and testing. By utilizing Stable Diffusion in img2img mode, inspection systems can generate data that closely resembles real-world scenarios, effectively simulating a much larger dataset being fed to the model. This enables the generation of edge cases that demand high accuracy, allowing inspection systems to improve their trained model accuracy and refine predictions.
Original Data | Generated Data
The following parameters play a key role in data generation with an image-to-image Stable Diffusion pipeline: prompt and negative_prompt steer the content toward (or away from) the given descriptions, strength (between 0 and 1) controls how far the output may deviate from the input image, and guidance_scale controls how strictly the output follows the prompt. You can try playing with these parameters in the following code snippet and see for yourself.
import torch
from torch import autocast
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

device = "cuda"
model_path = "CompVis/stable-diffusion-v1-4"

# Load the pipeline in half precision to fit on modest GPUs
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_path, revision="fp16", torch_dtype=torch.float16)
pipe = pipe.to(device)

# Load the input image using PIL (Python Imaging Library)
image = Image.open("input_image_path")

# Example prompt
prompt = "similar traffic sign of 90, 4k, hd, high quality"

# Example negative prompt
negative_prompt = "different structure, disoriented shape, change shape, change number"

with autocast("cuda"):
    output = pipe(prompt=prompt, negative_prompt=negative_prompt, image=image, strength=0.4, guidance_scale=25)

generated_image = output.images[0]
generated_image.save("generated_picture.jpg")
Picture it: less data-collection and augmentation hassle, solutions for data scarcity woes, and an accuracy boost in those tough scenarios. It even crafts the edge cases that stress-test and refine models. And guess what? If you've got a GPU with 6 GB of VRAM, you can explore the wonders of generative AI. It's your key to outsmarting data scarcity and unlocking improved precision and performance in computer vision applications. Ready to dive into this efficiency-packed future? The spotlight is on you!