Tell, Draw, Undo, Repeat: Iterative Text-to-Image Conversion using Recurrent Generative Adversarial Networks


  • Vinay Varma Nadimpalli, Venu Vardhan Reddy Tekula


Conditional text-to-image generation is an active area of research. We propose a novel Recurrent GAN architecture that generates 2-D images from input text iteratively. Unlike one-step text-to-image generation, the model receives a continuous stream of instructions, each describing how to modify the most recently generated image. To generate the image at the current time step, the model takes all instructions up to that time step together with the image generated at the previous time step. We also introduce an undo capability: the model can revert the modifications it made to the image at the previous time step. This is essential when a text instruction given to the model is incorrect, for example because of an unintentional human error. The aim of the proposed model is to move towards fully interactive text-to-image generation and to improve the user experience from a Human-Computer Interaction (HCI) perspective.
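The interaction loop described above can be illustrated with a minimal Python sketch. It is not the proposed model: `toy_generator` is a hypothetical placeholder for the recurrent generator, an "image" is represented as a plain list of labels, and undo is implemented here as simple history bookkeeping, which may differ from how the model itself performs the undo. The names `DrawingSession`, `tell`, and `undo` are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Toy stand-in for a 2-D image: a list of labels describing what was drawn.
Image = List[str]


def toy_generator(instructions: Sequence[str], prev_image: Image) -> Image:
    """Hypothetical placeholder for the recurrent generator.

    It receives every instruction issued so far plus the previous image
    and returns a new image. Here it only appends the latest instruction.
    """
    return prev_image + [instructions[-1]]


@dataclass
class DrawingSession:
    """Keeps the instruction history and the image produced at each step."""

    generator: Callable[[Sequence[str], Image], Image] = toy_generator
    instructions: List[str] = field(default_factory=list)
    # images[0] is the blank canvas; images[t] is the output of step t.
    images: List[Image] = field(default_factory=lambda: [[]])

    def tell(self, instruction: str) -> Image:
        """Apply one instruction, conditioning on the full history and the last image."""
        self.instructions.append(instruction)
        new_image = self.generator(self.instructions, self.images[-1])
        self.images.append(new_image)
        return new_image

    def undo(self) -> Image:
        """Discard the most recent instruction and return the previous image."""
        if self.instructions:  # nothing to undo on a blank canvas
            self.instructions.pop()
            self.images.pop()
        return self.images[-1]
```

A session would call `tell` once per instruction and `undo` whenever the last instruction turns out to be a mistake; calling `undo` on a blank canvas leaves the session unchanged.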