[D] Can I use my own images for VQGAN CLIP generation?

Something you may consider is using projected_gan; it's similar to StyleGAN, but it converges much faster.

Colab

https://colab.research.google.com/gist/xl-sr/757757ff8709ad1721c6d9462efdc347/projected_gan.ipynb

---

That's a Colab for the project. Essentially, you'll run everything except the part that asks you to download a dataset; instead, you'll put all of your images into one folder (preferably on your Google Drive, but you can upload a folder to the Colab if that works better for you).

To connect your Google Drive, you can add a code block with:

from google.colab import drive

drive.mount('/content/drive')

and then on the line that says

python dataset_tool.py --source=./data/art-painting --dest=./data/art_painting256.zip --resolution=256x256

you'll change the --source flag to point to your image folder, which for me is usually something like /content/drive/MyDrive/folderFullOfImages
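
For example, if your images are in that hypothetical Drive folder, the edited command would look something like this (the --dest filename is just a placeholder; keep whatever resolution you want to train at):

python dataset_tool.py --source=/content/drive/MyDrive/folderFullOfImages --dest=./data/my_images256.zip --resolution=256x256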

Continue running each block until you get to the training step. This can take a while (usually a handful of hours), and you'll eventually want to stop training manually; I usually do it once I hit 150 kimg. Save the .pkl file, and then you can use these scripts (inside the project/repository) to either generate images or render videos that smoothly interpolate through the model's latent space.
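
For reference, the training cell in that notebook calls the repo's train.py. A rough sketch of what that command looks like (the exact flags and paths depend on the notebook version; the dataset zip and output directory here are just placeholders matching the dataset step above):

!python train.py --outdir=/content/projected_gan/training-runs --cfg=fastgan --data=./data/my_images256.zip --gpus=1 --batch=64

The run folder you'll see in the generation commands below (00000-fastgan-...-gpus1-batch64-) is named after those settings, and the network-snapshot .pkl files get written into it as training progresses.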

Image Generation

!python gen_images.py --outdir=/content/out --trunc=1.0 --seeds=10-15 --network=/content/projected_gan/00000-fastgan-01-512-gpus1-batch64-/network-snapshot.pkl

Video Generation

!python gen_video.py --output=/content/drive/MyDrive/ProjectedGAN/OUTPUTS/b_001.mp4 --trunc=1.0 --seeds=0-62 --grid=3x3 --network=/content/projected_gan/training-runs/00000-fastgan-01-512-gpus1-batch64-/network-snapshot.pkl

If that all sounds like a bunch of jargon and doesn't make any sense, this video is an excellent explainer: https://www.youtube.com/watch?v=04Xs3BXzFaw&t=7s. Your results won't be as abstract as what you'll see from vqgan, since this model is trained on a smaller set of full images (rather than the millions of, I think, class-specific polygon cuts you'd see with vqgan+clip), but the results are still pretty cool!

Here's one I made earlier today with this method:

https://twitter.com/Shellworld1/status/1475154796181176324

Here's another I made combining projected_gan with vqgan+clip (within pytti5), but it's shirts :D

https://twitter.com/Shellworld1/status/1473684008441683969

I save all of my .pkl files to my Google Drive, so if I ever want to generate something again, all I need to do is open that notebook, install the dependencies, and then run those gen_images/gen_video scripts.
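
If you want to do the same, a couple of plain shell commands in a Colab cell will do it; the run folder and the Drive destination here are just placeholders for your own paths:

!mkdir -p /content/drive/MyDrive/ProjectedGAN
!cp /content/projected_gan/training-runs/00000-fastgan-01-512-gpus1-batch64-/network-snapshot.pkl /content/drive/MyDrive/ProjectedGAN/

When you come back later, you can point the --network flag in gen_images.py or gen_video.py straight at the copy on your Drive.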

I will also offer that ML reddit is not necessarily the best place to ask questions like this, as most of the posts here are pointed at academic research. There's /r/deepdream/, which covers a lot of this type of material, but even better, there's an active community on twitter that is generally very helpful if you've got questions. Everyone seems to want to help everyone else out; it's very nice! Hope this helps :)

/r/MachineLearning Thread