Audio-Guided Image Manipulation for Artistic Paintings

Seung hyun Lee (1), Nahyuk Lee (1), Chanyoung Kim (1), Wonjeong Ryoo (1), Jinkyu Kim (1), Sang Ho Yoon (2), Sangpil Kim (1)

(1) Korea University   (2) KAIST


We propose a novel audio-guided image manipulation approach for artistic paintings that generates semantically meaningful latent manipulations given an audio input. To the best of our knowledge, our work is the first to explore generating semantically meaningful image manipulations from various audio sources. Our approach consists of two main steps. First, we train a set of encoders, one for each modality (audio, text, and image), to produce matched latent representations. Second, we use direct code optimization to modify a source latent code in response to a user-provided audio input. This methodology enables various manipulations of artistic paintings conditioned on driving audio inputs such as wind, fire, explosion, thunderstorm, rain, folk music, and Latin music.
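The second step, direct code optimization, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the audio embedding is already available from a trained encoder, replaces the "generate an image, then embed it" pipeline with a hypothetical fixed linear map `A`, and assumes a loss of cosine distance to the audio embedding plus an L2 penalty that keeps the edited code close to the source code. All names (`manipulate_latent`, `A`, `lam`, `lr`) and all numeric values are illustrative.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return dot(u, v) / (norm(u) * norm(v))

def matvec(A, w):
    """Compute A w for a matrix stored as a list of rows."""
    return [dot(row, w) for row in A]

def mat_t_vec(A, g):
    """Compute A^T g."""
    return [sum(A[i][j] * g[i] for i in range(len(A))) for j in range(len(A[0]))]

def loss(w, w_src, audio_emb, A, lam):
    """Assumed objective: 1 - cos(A w, audio_emb) + lam * ||w - w_src||^2."""
    reg = sum((x - y) ** 2 for x, y in zip(w, w_src))
    return 1.0 - cosine(matvec(A, w), audio_emb) + lam * reg

def manipulate_latent(w_src, audio_emb, A, lam=0.1, lr=0.05, steps=300):
    """Plain gradient descent on the latent code w, starting from w_src.

    A is a hypothetical linear stand-in for the generator followed by the
    image encoder; a real system would backpropagate through both networks.
    """
    w = list(w_src)
    a_norm = norm(audio_emb)
    for _ in range(steps):
        z = matvec(A, w)
        z_norm = norm(z)
        za = dot(z, audio_emb)
        # Gradient of cos(z, audio_emb) with respect to z.
        dcos_dz = [a / (z_norm * a_norm) - za * zi / (z_norm ** 3 * a_norm)
                   for a, zi in zip(audio_emb, z)]
        g_sim = mat_t_vec(A, dcos_dz)
        # Descend on the loss: raise similarity, stay near the source code.
        w = [wi - lr * (-gs + 2.0 * lam * (wi - ws))
             for wi, gs, ws in zip(w, g_sim, w_src)]
    return w

# Toy values chosen by hand for illustration only.
A = [[1.0, 0.0, 0.5],
     [0.0, 1.0, -0.5]]
w_src = [1.0, 0.0, 0.5]
audio_emb = [0.0, 1.0]

w_new = manipulate_latent(w_src, audio_emb, A)
print("similarity before:", cosine(matvec(A, w_src), audio_emb))
print("similarity after: ", cosine(matvec(A, w_new), audio_emb))
```

The penalty weight `lam` trades off how strongly the audio steers the result against how much of the source painting is preserved. Larger values keep the edited code closer to the original.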


[Figure: original art and the corresponding manipulated art for three driving audio inputs: fire sound, Latin music sound, and wind sound.]


If you use our code or data, please cite:

@inproceedings{lee2021audio,
      title        = {Audio-Guided Image Manipulation for Artistic Paintings},
      author       = {Seung hyun Lee and Nahyuk Lee and Chanyoung Kim and Wonjeong Ryoo and Jinkyu Kim and Sang Ho Yoon and Sangpil Kim},
      booktitle    = {NeurIPS Workshop on Machine Learning for Creativity and Design},
      year         = {2021}
}