Generative adversarial networks (GANs) are a kind of machine learning model designed to learn a distribution of data from a representative sample. After training, you can sample from the learned distribution, generating novel examples that were not present in the original training set. So, for example, you can “show” a GAN a bunch of images of faces, and the GAN will then be able to generate new faces. This is exactly what’s behind the curtain in the famous “this person does not exist” website.
GANs have seen a lot of improvement since their invention in 2014. In particular, one of the leading research groups in the field is Tero Karras’s group at NVIDIA, who have managed to stay at the top of the state of the art with a series of groundbreaking publications:
- Progressive Growing of GANs for Improved Quality, Stability, and Variation
- A Style-Based Generator Architecture for Generative Adversarial Networks
- Analyzing and Improving the Image Quality of StyleGAN
- Training Generative Adversarial Networks with Limited Data
In their latest work, they describe StyleGAN2-ADA, a new version of StyleGAN that is particularly notable because it keeps the high quality results of previous works while significantly reducing the number of required training images. It is also more compute efficient than its predecessors. These improvements make training high quality GANs much more feasible: instead of having to build or obtain a dataset of tens of thousands of images, it is now possible to get decent results with just a few thousand images, or even fewer. Take a look at these results that the NVIDIA team got using transfer learning from FFHQ on a very small dataset of only 1,336 images scraped from the MET’s website:
But what does heavy metal have to do with all of this? Well, the truth is that, besides my passion for machine learning, I have a parallel life in which I am a symphonic power metal guitarist. Metal album artwork is some of the finest art out there (or at least that’s what we metal fans think!), so I thought metal album covers would be a great way to test StyleGAN2-ADA, and see if we could get an unlimited source of these great and inspiring images.
A heavy metal artwork dataset
I managed to gather a total of 16,176 heavy metal album covers from 5,882 different artists. Here’s a sample:
To build the dataset, I followed these steps:
- I obtained a list of artist and album names from DarkLyrics.com using metalparser, a pretty cool and easy-to-use Python package for getting data from Dark Lyrics.
- For each album, I searched for the artwork image using coverpy, another really helpful Python package.
- I also wanted to see if I could find a way to get the subgenre for each artist, in order to be able to segment the dataset. This way I could get a specific subset of only “power metal” or “black metal” albums, since each subgenre tends to have its own particular artwork style. Yeah, we metalheads are geeks. I found this notebook by Jon Charest, with data scraped from the Encyclopaedia Metallum, which solved this part for me.
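As a rough sketch of the catalog-building step above: the actual fetching goes through metalparser and coverpy (check each package’s docs for the real API, so the fetch call mentioned below is a hypothetical placeholder), but deduplicating the (artist, album) pairs is worth getting right, since many albums appear more than once on lyrics sites:

```python
from typing import Iterable, List, Tuple

def dedupe_albums(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop duplicate (artist, album) pairs, case-insensitively,
    keeping the first spelling seen."""
    seen = set()
    catalog = []
    for artist, album in pairs:
        key = (artist.lower(), album.lower())
        if key not in seen:
            seen.add(key)
            catalog.append((artist, album))
    return catalog

# In the real pipeline, each deduplicated pair would then be passed to a
# cover lookup, e.g. fetch_cover_url(artist, album) -- a hypothetical
# wrapper around coverpy's search.
```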
Thanks to these resources, building the dataset ended up being much easier than I expected. It is available as a Kaggle dataset here. It’s a first version and there are lots of ways to improve it and make it usable for other purposes (an artwork-to-subgenre classifier, maybe?), so expect it to be updated soon.
Training the heavy metal artwork GAN
To train the model, I used Google Colab, a Jupyter notebook environment by Google that provides free GPU access and integrates very well with Google Drive. In particular, I used this training script by Derrick Schultz, which is pretty easy to follow and takes care of things like the Google Drive integration and the ability to resume training if something fails or Colab disconnects your environment (which actually happens after a certain number of hours).
Like in many other cases, the script uses transfer learning from a previously trained model to save time and compute. Transfer learning is a technique that reduces training time by starting from a version of your model already trained on a somewhat similar dataset: instead of starting from scratch, you resume training from the pretrained weights. This is particularly helpful with bigger models that take weeks to train on expensive infrastructure. In this case, I selected the FFHQ faces model at 512-pixel resolution. It may seem weird to use a face generator as a starting point, but it actually works.
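In NVIDIA’s stylegan2-ada repo, this boils down to pointing a resume flag at the pretrained model. The sketch below follows the flag names in their train.py, but double-check them against the exact version you use; paths are placeholders:

```shell
# Sketch: resume from the pretrained FFHQ 512px model instead of
# training from scratch. Paths are placeholders; flag names follow
# NVIDIA's stylegan2-ada train.py and may differ between versions.
python train.py \
  --outdir=./training-runs \
  --data=./datasets/metal_covers \
  --gpus=1 \
  --resume=ffhq512
```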
The first results after just a few minutes of training were pretty scary. A disturbing set of faces somewhat transfigured into album artwork. That may not be that bad if you are looking for some psychedelic horror image for your progressive death metal project:
After eight hours of training, things started looking a bit more reasonable, but still quite abstract:
After two days, things look better and characters start to appear more clearly, but it will probably take much longer and some experimentation with hyperparameters before getting images that actually make sense:
I created a small Colab notebook to easily run the model and generate new artwork. You can check it out here. There, you will be able to generate album covers using random numbers as “seeds”.
This is an example using 666 as seed 😈:
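Under the hood, a seed like 666 just determines the latent vector fed to the generator, so the same seed always reproduces the same cover. Here is a minimal sketch of that mapping (the 512-dimensional latent size is an assumption matching this model; the generator forward pass itself is omitted):

```python
import numpy as np

def latent_from_seed(seed: int, z_dim: int = 512) -> np.ndarray:
    """Map an integer seed to a deterministic latent vector.

    StyleGAN-style generation scripts typically seed a NumPy RNG like
    this and feed the resulting vector to the generator network
    (omitted here), so a given seed always yields the same image.
    """
    rng = np.random.RandomState(seed)
    return rng.randn(1, z_dim)

z = latent_from_seed(666)  # shape (1, 512); identical on every run
```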
Obvious next steps for this work include training the model further, to see if it starts generating less abstract art, and training other models on subsets for specific subgenres. A power metal artwork generator would be awesome!
Also, a great improvement for these models would be the ability to specify the artist name and album title and have the model generate an image conditioned on them; or, even better, to somehow mask the logos and titles in the training dataset, so that the model generates only the artwork and the logo and title can then be placed manually.
In this article we went through a small case study of how to create a relatively small dataset and train an image generator on it using NVIDIA’s StyleGAN2-ADA. Along with many other examples, this supports NVIDIA’s claim that it is possible to generate high quality images with their model using a limited number of training examples, in particular by using transfer learning from pretrained models with freely available resources, even when the target domain is quite different from the starting model.