Written by: Zeeshan Hussain

Title: Achieving Efficiency with a Staggering Decrease in Data Collection and Augmentation Efforts through Stable Diffusion

Introduction

Tired of traditional data augmentation's limitations in your computer vision pursuits? Buckle up as we embark on an exhilarating ride into the universe of Generative AI, which is revolutionizing data annotation and augmentation, streamlining these processes like never before. We'll demystify its mechanics, compare it with traditional approaches, and show how it can amplify the performance of AI systems in practical, real-world scenarios.

The Importance of Data in Computer Vision

The very essence of computer vision applications depends on the availability of data — large, diverse, and representative datasets that teach machines to understand the patterns in the visual world around us. Data scientists, the builders of these technical innovations, spend over 80% of their time preparing and managing data. Surprisingly, 60% of their time is spent cleaning and organizing data, whereas only 19% is spent acquiring datasets. 

The uphill battle of data scarcity has become an urgent issue to address, since performance on vision tasks tends to increase logarithmically with the volume of training data. To overcome data scarcity, data augmentation techniques have been widely used to enhance the size and quality of training datasets, but they have limitations.

Limitations of Traditional Data Augmentation

Traditional data augmentation techniques, i.e. cropping, flipping, rotating, changing the intensity of the RGB channels using PCA color augmentation, etc., have their merits, such as increasing the dataset size by a factor of 2048, but they all share common limitations (a minimal code sketch of these transforms follows the list):

  1. They are tethered to the existing dataset and can only modify, reposition, or replicate the data points they already possess.
  2. In addition to losing some of the original information in the images, they are ineffective at managing data shortages, they restrict data diversity, and they produce overly augmented patterns.
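
For a concrete sense of these classical transforms, here is a minimal sketch using torchvision (the file name is a placeholder, and ColorJitter stands in for PCA-based color augmentation):

import torchvision.transforms as T
from PIL import Image

# Load any existing training image (placeholder path)
image = Image.open("example.jpg")

# Classical augmentations: crop, flip, rotate, and color shifts
augment = T.Compose([
    T.RandomResizedCrop(224),                                     # random crop, resized to 224x224
    T.RandomHorizontalFlip(p=0.5),                                # mirror the image half of the time
    T.RandomRotation(degrees=15),                                 # rotate within +/-15 degrees
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # shift channel intensities
])

# Every output rearranges pixels the dataset already contains;
# no genuinely new scene or lighting condition ever appears.
augmented_images = [augment(image) for _ in range(8)]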


Traditional Augmented Dataset

What if there was a way to break free from these constraints, to generate data points that are entirely new and yet seamlessly blend with the existing data?

Introducing Generative AI and Stable Diffusion

Generative AI comes to the rescue by:

  1. Introducing diversity into the dataset in terms of lighting conditions, scenarios, and automated creativity.
  2. Generating patterns that the model can learn from, thus improving the model's performance in real-world scenarios where variations are unavoidable.

This is particularly valuable for stress-testing and refining models to achieve high accuracy in challenging scenarios.

What is Stable Diffusion?

A tool that can turn text into stunning images - yes, it's real! Stable Diffusion is a state-of-the-art series of image generation models released in 2022 by StabilityAI, CompVis, and RunwayML, in which latent diffusion models conjure lifelike, diverse, aesthetic images from mere text prompts. Whether you want to turn text into images, give existing images a fresh twist, or fill in missing regions (inpainting), Stable Diffusion has you covered. Particularly noteworthy is its image-to-image mode, which translates input images into different domains or styles guided by textual descriptions.
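
As a quick taste, here is a minimal text-to-image sketch using the diffusers library (the checkpoint ID and prompt are illustrative, not the ones used later in this post):

import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint in half precision (illustrative model ID)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# A text prompt is the only required input
prompt = "a rainy highway at dusk, photorealistic, detailed"
image = pipe(prompt).images[0]
image.save("generated.png")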

(Illustration source: http://jalammar.github.io/illustrated-stable-diffusion/)

Text2Image

Prompt: "Pikachu committing tax fraud, paperwork, exhausted, cute, really cute, cozy, by Steve Hanks, by Lisa Yuskavage, by Serov Valentin, by Tarkovsky, 8k render, detailed, cute cartoon style"

Image2Image

Prompt: "Pikachu eating icecream, yummy, melting, cute, really cute, cozy, by Steve Hanks, by Lisa Yuskavage, by Serov Valentin, by Tarkovsky, 8k render, detailed, cute cartoon style"

Upscaling Image to 8k


Image2Image Mode

This innovative method takes your input images and maintains their core meaning while enhancing the spectrum of training data. The technique leverages the power of latent diffusion models to meticulously map input images to new output domains, guided by text. The beauty? This approach spans image translation, style tweaking, super-resolution, and more, elevating the caliber of training data for an array of computer vision tasks. A full, runnable walkthrough appears in the Do It Yourself section below.

Real-world Use Cases:

Our efforts go beyond mere speculation: we actively engage with real-world scenarios in domains such as Advanced Driver Assistance Systems (ADAS) and AI-powered automotive quality inspection systems. These generation techniques play a vital role in stress-testing and refining our models, adding a practical dimension to our endeavors.

Advanced Driver Assistance Systems

The training of advanced driver assistance systems relies heavily on data. Generative AI can bridge the gap between limited real-world data and the need for diverse, abundant training data. Specific examples where generative AI can enhance data generation for driving systems include different weather conditions, rare events, and edge cases that demand high accuracy.

In our case, we used Stable Diffusion's img2img mode to generate road speed-sign data to stress-test the model and refine its predictions for improved accuracy.

Original Image → Generated Images

Example Data for Stress Testing:

Original Images

Generated Noisy Data for Stress Testing

AI-Powered Quality Monitoring Systems:

Quality inspection systems also face challenges in obtaining sufficient data for training and testing. By utilizing Stable Diffusion and the img2img mode, inspection systems can generate data that closely resembles real-world scenarios, giving the model the benefit of a much larger effective training set. This enables the generation of edge cases that demand high accuracy, allowing the inspection systems to improve their trained model's accuracy and refine its predictions.

 

Original Data → Generated Data

Do It Yourself:

The following parameters play a role in data generation with an image-to-image Stable Diffusion pipeline:

  1. prompt, negative_prompt: The prompts that guide the image generation. The prompt holds all the information you want in the image, while the negative prompt describes what to keep out. For more control during generation, you can also add weights to parts of your prompt; the Compel library can be used for this (see the prompt-weighting sketch after this list).
  2. generator: A torch.Generator with a fixed seed value to make generation deterministic.
  3. strength: Conceptually indicates how much to transform the reference input image. The input image is used as a starting point, and more noise is added to it the larger the strength. The number of denoising steps depends on the amount of noise initially added; when strength is 1, the added noise is maximal and the denoising process runs for the full number of iterations specified.
  4. num_inference_steps: The number of denoising steps. More denoising steps usually lead to a higher-quality image at the expense of slower inference. This parameter is modulated by strength.
  5. guidance_scale: A higher guidance scale encourages the model to generate images that are closely linked to the text prompt, usually at the expense of lower image quality.
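
As a minimal sketch of prompt weighting with Compel (the weight placement is illustrative; it assumes the pipeline pipe, the input image, and the imports from the snippet below are already loaded):

from compel import Compel

# Build Compel from the pipeline's own tokenizer and text encoder
compel = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)

# "++" boosts the weight of the parenthesized phrase (Compel's weighting syntax)
prompt_embeds = compel("similar (traffic sign)++ of 90, 4k, hd, high quality")

# A seeded generator makes the output reproducible (parameter 2 above)
generator = torch.Generator(device="cuda").manual_seed(42)

# Pass the weighted embeddings instead of the raw prompt string
output = pipe(prompt_embeds=prompt_embeds, image=image,
              strength=0.4, guidance_scale=25, generator=generator)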

You can try playing with these parameters in the following code snippet and see for yourself.

import torch
from torch import autocast
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

device = "cuda"
model_path = "CompVis/stable-diffusion-v1-4"

# Load the img2img pipeline in half precision (fits a ~6 GB VRAM GPU)
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_path, revision="fp16", torch_dtype=torch.float16)
pipe = pipe.to(device)

# Load the input image using PIL and make sure it is in RGB mode
image = Image.open("input_image_path").convert("RGB")

# Example prompt
prompt = "similar traffic sign of 90, 4k, hd, high quality"

# Example negative prompt
negative_prompt = "different structure, disoriented shape, change shape, change number"

with autocast("cuda"):
    output = pipe(prompt=prompt, negative_prompt=negative_prompt, image=image, strength=0.4, guidance_scale=25)

generated_image = output.images[0]
generated_image.save("generated_picture.jpg")
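
Once the basic call works, a small grid sweep over strength and guidance_scale (the values below are only illustrative) is a quick way to find settings that vary the sign's appearance without destroying its structure:

# Sweep strength and guidance_scale to compare their effects side by side
for strength in (0.3, 0.5, 0.7):
    for guidance in (7.5, 15.0, 25.0):
        result = pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            image=image,
            strength=strength,
            guidance_scale=guidance,
        ).images[0]
        result.save(f"sign_s{strength}_g{guidance}.jpg")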

 

Conclusion:

Picture it: less data collection and augmentation hustle, solutions for data-scarcity woes, and an accuracy boost for those tough scenarios. It even crafts the edge cases that stress-test and refine models. And guess what? If you've got a GPU with 6 GB of VRAM, you can explore the wonders of generative AI yourself. It's your key to outsmarting data scarcity and unlocking improved precision and performance in computer vision applications. Ready to dive into this efficiency-packed future? The spotlight is on you!