Stylecodes: Encoding Stylistic Information For Image Generation

Ciara Rowles

By encoding the style of an image into a short base64 code and then using that code as a conditioning signal, we enable easy, shareable ways of controlling diffusion models

Abstract

Diffusion models excel at image generation, but controlling them remains a challenge. We focus on the problem of style-conditioned image generation. Although example images work, they are cumbersome to share: srefs (style-reference codes) from MidJourney solve this issue by expressing a specific image style as a short numeric code. These have seen widespread adoption throughout social media due to both their ease of sharing and the fact that they allow using an image for style control without requiring the source image itself to be posted.

However, users cannot generate srefs from their own images, nor is the underlying training procedure public. We propose StyleCodes: an open-source and open-research style-encoder architecture and training procedure that expresses image style as a 20-symbol base64 code. Our experiments show that our encoding incurs minimal loss in quality compared to traditional image-to-style techniques.

Overview

Method

We use a combination of a latent autoencoder, which compresses an image's style into the code, and a control module that conditions the UNet on the decoded style embedding.
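To make the code format concrete, here is a minimal sketch of how a style embedding could be packed into a 20-symbol base64 code. A 20-character base64 string carries 120 bits, i.e. 15 bytes; the 15-dimensional embedding and the uniform one-byte-per-dimension quantizer below are illustrative assumptions, not the paper's actual encoder.

```python
import base64
import numpy as np

def embedding_to_stylecode(embedding: np.ndarray) -> str:
    """Quantize a 15-dim style embedding in [-1, 1] to one byte per
    dimension, then base64-encode the 15 bytes into 20 symbols.
    (Illustrative only: the real latent size and quantizer may differ.)"""
    assert embedding.shape == (15,)
    quantized = np.clip((embedding + 1.0) * 127.5, 0, 255).astype(np.uint8)
    return base64.b64encode(quantized.tobytes()).decode("ascii")

def stylecode_to_embedding(code: str) -> np.ndarray:
    """Invert the packing: 20 base64 symbols -> 15 bytes -> floats in [-1, 1]."""
    raw = np.frombuffer(base64.b64decode(code), dtype=np.uint8)
    return raw.astype(np.float32) / 127.5 - 1.0

rng = np.random.default_rng(0)
emb = rng.uniform(-1.0, 1.0, size=15).astype(np.float32)
code = embedding_to_stylecode(emb)
print(len(code), code)  # 15 bytes is divisible by 3, so exactly 20 symbols, no padding
```

Fifteen bytes divide evenly into base64's 3-byte groups, which is why the code comes out at exactly 20 symbols with no `=` padding.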

Examples

Interactive Demonstration

We provide an interactive demonstration of our method as a Gradio demo hosted on Hugging Face Spaces. You can try the method with your own input images and see the results in real time.

https://huggingface.co/spaces/CiaraRowles/stylecodes-sd15-demo

How to use:

  1. Upload an input image.
  2. Select an instruction preset.
  3. Enter a prompt in the text box.
  4. Click Generate.

BibTeX

@misc{Stylecodes2024,
      title={Stylecodes: Encoding Stylistic Information For Image Generation}, 
      author={Ciara Rowles},
      year={2024},
      eprint={2408.03209},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.03209}, 
}