// the find

lucidrains/deep-daze

★ 4,319 · Python · MIT · updated Mar 2022

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun

Deep Daze is a 2021-era text-to-image tool that optimizes a SIREN implicit neural network until its rendered output matches the prompt in CLIP's embedding space. You give it a text prompt; it trains a small network for several minutes and renders a single image. It was a meaningful research artifact when it shipped, but the field moved on fast.
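To make the "network as image" idea concrete, here is a minimal pure-Python sketch of a sine-activated coordinate network. It is not Deep Daze's code: the layer sizes, the uniform weight init, the `w0=30.0` frequency scale, and the sigmoid on the output are all assumptions chosen for illustration, and the weights are random rather than trained against CLIP.

```python
import math
import random


def make_siren(layer_sizes, w0=30.0, seed=0):
    """Build random weights for a small sine-activated MLP (illustrative init)."""
    rng = random.Random(seed)
    layers = []
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        bound = 1.0 / fan_in
        weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [rng.uniform(-bound, bound) for _ in range(fan_out)]
        layers.append((weights, biases))
    return layers, w0


def siren_pixel(net, x, y):
    """Map one (x, y) coordinate in [-1, 1] to an RGB triple in (0, 1)."""
    layers, w0 = net
    h = [x, y]
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        is_last = i == len(layers) - 1
        # Hidden layers use sin(w0 * z); the output layer is squashed to (0, 1).
        if is_last:
            h = [1.0 / (1.0 + math.exp(-v)) for v in z]
        else:
            h = [math.sin(w0 * v) for v in z]
    return tuple(h)


def render(net, width, height):
    """Sample the same network on a width x height grid of coordinates."""
    def coord(i, n):
        return -1.0 + 2.0 * i / (n - 1) if n > 1 else 0.0
    return [[siren_pixel(net, coord(c, width), coord(r, height))
             for c in range(width)]
            for r in range(height)]


net = make_siren([2, 16, 16, 3])
small = render(net, 4, 4)     # 4 x 4 grid of RGB triples
large = render(net, 16, 16)   # same weights, denser sampling grid
```

Both renders come from the same weights; only the sampling grid changes. In the real tool, an optimizer would adjust those weights so that CLIP scores the rendered image as close to the prompt.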

The SIREN-as-renderer approach is genuinely interesting: instead of sampling from a GAN or diffusion model, the image is stored in the weights of a neural net that maps pixel coordinates to colors, so in principle it can be rendered at any resolution. The `create_story` mode, which slides a window across long text, is a clever workaround for CLIP's 77-token limit. The CLI is dead simple: `imagine "a house in the forest"` and you're running. The Colab notebooks lower the GPU barrier for people without local hardware.
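The sliding-window idea can be sketched in a few lines. This is a hedged illustration, not the tool's implementation: `story_windows` and its parameters are hypothetical names, and it counts whitespace-separated words as a rough stand-in for CLIP tokens, which are produced by a subword tokenizer and so do not map one-to-one to words.

```python
def story_windows(text, start_words=5, words_per_step=5, max_words=20):
    """Yield successive prompt windows over a long text.

    Hypothetical helper: the window grows from `start_words`, advances by
    `words_per_step`, and never holds more than `max_words` words.
    """
    words = text.split()
    windows = []
    end = min(start_words, len(words))
    while True:
        start = max(0, end - max_words)  # drop the oldest words once full
        windows.append(" ".join(words[start:end]))
        if end >= len(words):
            break
        end = min(end + words_per_step, len(words))
    return windows


story = ("a house in the forest at dusk with smoke rising from the chimney "
         "while a fox watches from the edge of the clearing")
for prompt in story_windows(story, start_words=4, words_per_step=3, max_words=8):
    print(prompt)
```

Each window would serve as the guiding prompt for a stretch of training, so the image drifts through the text instead of being conditioned on a truncated version of it.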

Dead project: the last commit was in March 2022, and it shows. Output quality is dream-like in the bad sense: blurry, abstract blobs that gesture at the prompt rather than depict it. It requires 4–16 GB of VRAM and takes many minutes per image, whereas Stable Diffusion XL now produces far better results in seconds on comparable hardware. The bundled CLIP weights are the original ViT-B/32, not the stronger successors. No one should adopt this for anything other than historical curiosity or studying how CLIP guidance worked before diffusion models dominated.
