Generative Adversarial Networks (GANs) were first introduced by Goodfellow et al. in their now-famous 2014 paper. The researchers proposed an unusual training setup (Fig. 1) in which two networks, a generator and a discriminator, are pitted against each other in a competition. The generator has to produce fake images given random noise as input. The discriminator, in contrast, has to distinguish the fakes from real images of the target domain we would like the model to learn (e.g., facial images). Over time, both networks progressively improve at their tasks, and one can obtain a trained generator that replicates images from the target domain quite well.
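The competition between the two networks is usually written as a two-player minimax game. As a reference sketch, the value function from the original paper is shown below. The notation is introduced here and is not part of the text above: $G$ is the generator, $D$ is the discriminator, $p_{\text{data}}$ is the distribution of real images, and $p_z$ is the distribution of the input noise.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to maximize $V$ by assigning high probability to real images and low probability to generated ones, while the generator tries to minimize it by producing images that the discriminator mistakes for real.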
What Can Generative Models Do?
With the correct problem definition, GANs can solve a variety of image-related problems, namely:
- Generating novel data samples, such as images of non-existent people, animals, and objects. Other types of media, such as audio and text, can be generated in this way as well.
- Image inpainting: restoring missing parts of images.
- Image super-resolution: upscaling low-res images to high-res without noticeable upscaling artifacts.
- Domain adaptation: making data from one domain resemble data from another domain (e.g., making a normal photo look like an oil painting while retaining the originally depicted content).
- Denoising: removing all kinds of noise from the data. For example, removing statistical noise from X-ray images fits the medical needs described in our use cases.
Besides the procedures mentioned above, GANs are capable of a lot more. Creating data, from images to texts or even melodies, is just the tip of the iceberg. In the future, we may witness great progress and the emergence of new GAN applications dedicated to the medical field, Augmented Reality, the creation of training data, and more.
GAN applications can address a range of tasks, including:
- Generate examples for Image Datasets
- Image-to-Image Translation
- Text-to-Image Translation
- Semantic-Image-to-Photo Translation
- Face Frontal View Generation
- Generate New Human Poses
- Photos to Emojis
- Photograph Editing
- Face Aging
- Photo Blending
- Super Resolution
- Photo Inpainting
- Clothing Translation
- Video Prediction
- 3D Object Generation
Now is a good moment to adopt GANs, because they can model real data distributions and learn useful representations for improving AI pipelines, securing data, finding anomalies, and adapting to specific real-world cases.
GAN Use Cases and Project Ideas
Moving on from theory and academic/non-academic research, let’s now examine where and how GANs are actually being used in business. It seems that while the research on the topic has been very active since 2015-2016, the practical adoption of these models is starting right now, and there are good reasons for this.
GANs already produce photorealistic images of, for example, industrial design elements, interior designs, clothing, bags, briefcases, and computer game scenes.
Also, GANs have been used to train film or animation personnel. They can recreate a three-dimensional model of an object using fragmentary images and improve photos obtained from astronomical observations.
Healthcare Applications
The possibility of image improvement allows us to implement GANs in medicine for Photo-Realistic Single Image Super-Resolution. Why is this significant?
GANs are in high demand in healthcare because medical images must meet particular requirements and be of high quality. High image quality can be difficult to obtain under certain measurement protocols. For example, low-dose scanning in Computed Tomography (CT) is used to reduce patients’ exposure to radiation, which matters especially for people with certain pre-existing conditions such as lung cancer; MRI scans can face quality constraints of their own. The resulting scans are of poorer quality, which complicates efforts to obtain good pictures.
Super-resolution improves the captured images and can remove the noise quite well; however, the adoption of GANs in the medical area is quite slow as many experiments and trials have to be made due to safety concerns. When dealing with healthcare, it is mandatory to involve several domain experts to evaluate the models and ensure the denoising does not distort the actual content of the image in some way that could lead to an incorrect diagnosis.
Despite the enormous opportunities, GANs have issues. The biggest one is their instability. GANs are notoriously difficult to train. Sometimes, these networks may generate images with artifacts because the models do not have enough information in the training data to understand how certain things work in real life. For example, given a dataset of portrait images, the network may know how to model human faces but may fail at grasping the idea of what particular elements of clothing must look like. So it is mandatory to carefully choose the data to be relevant to the expected result.
A general overview suggests that the advertising and marketing industries have the highest GAN adoption rate. This is reasonable because promoting a certain product or service often requires creating unique but repetitive content, such as capturing images of photo models.
To address this opportunity, Rosebud AI has developed a Generative Photos app that makes heavy use of the recent advances in GANs. The application creates custom images of fashion models that do not exist. This is achieved by using stock images of real models and replacing the faces with generated ones. The exciting thing here is that you can swap the face for a generated one and customize the generated face in more than one way.
Solutions like this often approach the task with the following steps:
- A face and its bounding box have to be detected in the image, which is a fairly common operation that can be done using existing face detection models.
- The detected face is cropped out of the image (there are different approaches to solving this task).
- A cropped face is projected into a latent space of the GAN model, and a similar face is synthesized by the model (inversion).
- The newly generated face has to be “transplanted” back into the original image. It can be accomplished with the FaceShifter model, or similar models.
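The steps above can be sketched as a small pipeline. This is a minimal illustration and not Rosebud AI's implementation: images are plain nested lists of grayscale values, the detector and the generator are hypothetical callables passed in by the caller, and the "transplant" step is a naive pixel paste rather than a blending model such as FaceShifter.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def crop(image: Image, box: Box) -> Image:
    """Step 2: cut the detected face region out of the image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def paste(image: Image, patch: Image, box: Box) -> Image:
    """Step 4 (naive): write the patch into a copy of the image at the box."""
    top, left, _, _ = box
    result = [row[:] for row in image]  # leave the original untouched
    for dy, patch_row in enumerate(patch):
        result[top + dy][left:left + len(patch_row)] = patch_row
    return result


def replace_face(
    image: Image,
    detect_face: Callable[[Image], Box],                # hypothetical detector
    synthesize_similar_face: Callable[[Image], Image],  # hypothetical inversion + generation
) -> Image:
    """Run detection, cropping, synthesis, and paste-back in order."""
    box = detect_face(image)                  # step 1
    face = crop(image, box)                   # step 2
    new_face = synthesize_similar_face(face)  # step 3
    return paste(image, new_face, box)        # step 4


# Toy stand-ins so the sketch runs end to end (assumptions, not real models).
def fake_detector(image: Image) -> Box:
    return (1, 1, 3, 3)


def fake_generator(face: Image) -> Image:
    return [[255 - pixel for pixel in row] for row in face]


photo = [
    [0, 0, 0, 0],
    [0, 10, 20, 0],
    [0, 30, 40, 0],
    [0, 0, 0, 0],
]
swapped = replace_face(photo, fake_detector, fake_generator)
```

In a real system, each callable would wrap a trained model, and the paste step would need blending so that lighting and skin tone match the surrounding image.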
Another curious piece of software by Rosebud AI, the Tokkingheads app, can animate any facial photo (synthetic or real) with audio or text serving as input (Fig. 9). The next technological step that GANs make possible is to generate artificial photos and then animate them, bringing them to life.
Generated Media Inc. applies the StyleGAN model to create synthetic facial photos of varying ethnicity, age, and gender. While the generation process itself is nothing remarkable (the company mentions that it uses StyleGAN by Karras et al. at Nvidia), the interesting point is how partner companies use the AI-generated photos.
The use cases span a wide range of domains. It turns out synthetic faces are useful in the 3D graphics industry, where the 2D facial image can be converted into 3D models and used as assets for video games or animation.
In the mobile app market, GANs are gaining ground primarily as an entertainment tool. Two particularly well-known apps of this kind, FaceApp (oriented at the western market) and ZAO (targeting the eastern market), provide users with original features: the ability to edit a person’s facial appearance or even swap a celebrity’s face in a video with their own.
FaceApp uses a face editing approach based on StyleGAN or a similar neural network. It can work with both photos and videos, which suggests that some modifications are made to ensure temporal consistency of the generated frames. If each frame in the video is processed by a GAN individually, this will most likely result in “face flinching” artifacts when the processed frames are joined back into a video. Thus, extra effort is required to make sure the frames transition smoothly from one to another.
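One simple way to reduce this kind of flicker is to smooth the per-frame latent codes before generating the frames. The sketch below applies an exponential moving average. It is only an illustration under stated assumptions: that each frame has already been mapped to a latent code, and that nearby codes produce similar faces. It is not a description of how FaceApp works, and the function name and the `alpha` parameter are hypothetical.

```python
from typing import List, Optional


def smooth_latents(latents: List[List[float]], alpha: float = 0.5) -> List[List[float]]:
    """Exponential moving average over per-frame latent codes.

    alpha = 1.0 keeps every frame as-is; smaller values follow the
    previous frames more closely and suppress frame-to-frame jumps.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[List[float]] = []
    previous: Optional[List[float]] = None
    for code in latents:
        if previous is None:
            current = list(code)  # the first frame has nothing to blend with
        else:
            current = [alpha * c + (1.0 - alpha) * p for c, p in zip(code, previous)]
        smoothed.append(current)
        previous = current
    return smoothed


# Three frames whose codes jump back and forth (toy values).
frames = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
smooth = smooth_latents(frames, alpha=0.5)
```

The trade-off is responsiveness: heavy smoothing hides flicker but also lags behind genuine motion, so the blend factor has to be tuned per use case.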
It is worth noting that the computations required for this process are still too intensive to run the software directly on mobile devices. Therefore, the processing is done in a centralized manner on the company’s servers.
GANs as a Service
Instead of finding a specific niche application for the models, some companies offer access to GANs and all the infrastructure and interfaces to handle the data, train the models, and obtain the final results.
Runway AI is one such company, positioning itself as a platform for Machine Learning that enables novel content creation techniques. Generative media features, as the company calls them, are part of a web interface that supports training a GAN model on your own dataset and collecting the results in the form of images or even videos. This can be beneficial for content creators and other interested parties, as it helps bring the capabilities of GANs to the masses: working with GANs without a graphical UI may prove too inconvenient for most non-programmer users.
GAN Technology for Dataset Generation in Computer Vision
It is no secret that any Computer Vision model (and, for that matter, any neural network-based model) is hungry for data: the more data you have, the better the model you can potentially create. However, manually annotating data for training is a slow and costly process, and many companies cannot afford it.
A possible solution (at least a partial one) can be found in the generative model domain: it turns out that generative models can serve as a tool for synthesizing new labeled data samples from a relatively small number of hand-crafted assets. This approach is taken by the Israeli startup DataGen, which reported securing USD 18.5 million in funding in March 2021.
The company focuses its efforts on closing the gap between the real and the simulated data so that the knowledge obtained from the simulated data could be used in real-world scenarios. To achieve that, the company first creates a database of 3D models tailored to a specific application area (e.g., 3D models of faces for face recognition) using 3D scanning or conventional 3D modeling technologies. After that, the initial 3D models (their 3D meshes, textures, and semantic information) are converted into latent space (a compressed representation that reflects all these features). GANs are applied to search and sample from this distribution, effectively creating new assets from the same domain as the original ones.
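The idea of sampling new assets from a latent distribution can be illustrated with a much simpler stand-in. In the sketch below, an independent Gaussian per latent dimension takes the place of the GAN: it is fitted to the latent codes of existing assets, and new codes are drawn from it. This is an assumption made for illustration, not DataGen's method. The function names are hypothetical, and a real pipeline would still need a decoder to turn each sampled code back into a mesh and texture.

```python
import random
import statistics
from typing import List, Tuple


def fit_latent_gaussian(codes: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Estimate a mean and standard deviation for each latent dimension."""
    dims = list(zip(*codes))  # transpose: one tuple of values per dimension
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]
    return means, stds


def sample_latents(
    means: List[float], stds: List[float], n: int, seed: int = 0
) -> List[List[float]]:
    """Draw n new latent codes from the fitted per-dimension Gaussians."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n)]


# Latent codes of two hand-crafted assets (toy values).
asset_codes = [[0.0, 2.0], [2.0, 2.0]]
means, stds = fit_latent_gaussian(asset_codes)
new_codes = sample_latents(means, stds, n=5)
```

A GAN replaces the Gaussian with a learned distribution that can capture dependencies between dimensions, which is what makes the sampled assets plausible rather than merely average.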
This approach seems very promising and is likely to be adopted by more companies over time. Synthetic data opens a path to an entirely new range of possibilities in simulating very complex objects and environments while providing much more accurate annotations than manual ones could ever be.
The simulated data gives us full control over the variance of the data (e.g., for human body models, we can select what races, body shapes, sizes, etc., we would like to have and in which proportions). We may be standing at the dawn of a new era for computer vision-intensive applications, such as robotics, self-driving cars, and virtual reality.
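Control over the proportions of a synthetic dataset can be as simple as deciding, up front, how many samples of each category to generate. The sketch below splits a target dataset size across categories according to relative weights, using largest-remainder rounding so that the counts always add up to the total. The category labels and the function name are hypothetical.

```python
from typing import Dict


def allocate_counts(weights: Dict[str, float], total: int) -> Dict[str, int]:
    """Split `total` samples across categories in proportion to `weights`."""
    if total < 0 or any(w < 0 for w in weights.values()):
        raise ValueError("total and weights must be non-negative")
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("at least one weight must be positive")
    exact = {name: total * w / weight_sum for name, w in weights.items()}
    counts = {name: int(value) for name, value in exact.items()}  # round down
    leftover = total - sum(counts.values())
    # Hand the remaining samples to the categories with the largest remainders.
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Request a 5:3:2 mix of three hypothetical body-shape categories.
plan = allocate_counts({"slim": 5, "average": 3, "athletic": 2}, total=10)
```

Each count would then be passed to the generator as the number of samples to synthesize for that category.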
Wrapping Up
The more accurate and advanced GANs become, the more benefit businesses can get from them. The progress of generative adversarial networks is easy to trace through the new GAN apps being released.