LVMark: Robust Watermark for latent video diffusion models

MinHyuk Jang1*,  Youngdong Jang1*,  JaeHyeok Lee1,  Kodai Kawamura1,
Feng Yang2,  Sangpil Kim1✉


1Korea University 2Google DeepMind
*Equal contribution ✉Corresponding author

Overview

Training Pipeline. The training pipeline of our method is illustrated here. Top: We fine-tune the latent decoder to embed binary messages in generated videos and train the watermark decoder to retrieve messages from distorted videos. Bottom-left: We modulate layers of the latent decoder that minimally impact visual quality to embed random messages. Bottom-right: The watermark decoder combines the RGB video with low-frequency subbands from a 3D wavelet transform using cross-attention to decode the binary message.
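As a rough illustration of the low-frequency input mentioned in the caption, the sketch below computes the approximation (LLL) subband of a one-level 3D Haar transform over a grayscale clip. The Haar wavelet, the single decomposition level, and the function name `haar_lll` are assumptions made for this example; the page does not state which wavelet or which subbands LVMark actually uses.

```python
import numpy as np


def haar_lll(video: np.ndarray) -> np.ndarray:
    """Return the one-level 3D Haar approximation (LLL) subband.

    `video` is a (T, H, W) array with even T, H, and W. This is an
    illustrative sketch, not the authors' implementation.
    """
    t, h, w = video.shape
    if t % 2 or h % 2 or w % 2:
        raise ValueError("T, H, and W must all be even")
    # Split every axis into non-overlapping pairs, giving 2x2x2 blocks.
    blocks = video.astype(np.float64).reshape(t // 2, 2, h // 2, 2, w // 2, 2)
    # The orthonormal Haar low-pass is (a + b) / sqrt(2) per axis; applying it
    # along time, height, and width divides the block sum by 2 * sqrt(2).
    return blocks.sum(axis=(1, 3, 5)) / (2.0 * np.sqrt(2.0))


clip = np.ones((4, 4, 4))   # toy 4-frame, 4x4 clip
lll = haar_lll(clip)        # half resolution along time, height, and width
```

The result has half the resolution along each axis and keeps the slowly varying spatio-temporal content, which is the kind of signal a decoder could attend to alongside the RGB frames.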

Abstract

Rapid advancements in generative models have made it possible to create hyper-realistic videos. As their applicability increases, their unauthorized use has raised significant concerns, leading to the growing demand for techniques to protect the ownership of the generative model itself. While existing watermarking methods effectively embed watermarks into image-generative models, they fail to account for temporal information, resulting in poor performance when applied to video-generative models. To address this issue, we introduce a novel watermarking method called LVMark, which embeds watermarks into video diffusion models. A key component of LVMark is a selective weight modulation strategy that efficiently embeds watermark messages into the video diffusion model while preserving the quality of the generated videos. To accurately decode messages in the presence of malicious attacks, we design a watermark decoder that leverages spatio-temporal information in the 3D wavelet domain through a cross-attention module. To the best of our knowledge, our approach is the first to highlight the potential of video-generative model watermarking as a valuable tool for enhancing the effectiveness of ownership protection in video-generative models.




Qualitative Comparison

Comparison between Original and Watermark videos

Twelve side-by-side video pairs, each labeled Original and Watermark.



Comparison between ours and baselines




Quantitative Comparison

Quantitative results. We report results for Open-Sora and DynamiCrafter watermarked with HiDDeN, Blind, Stable Signature, WOUAF, and LVMark. The evaluation covers image metrics (PSNR, SSIM, LPIPS), video metrics (tLP, FVD), and bit accuracy on 32-bit and 48-bit messages.
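For readers unfamiliar with two of these metrics, here is a minimal sketch of bit accuracy and PSNR under their standard definitions. The function names and the toy inputs are assumptions for illustration only and are not taken from the LVMark code or results.

```python
import math


def bit_accuracy(sent, decoded):
    """Fraction of message bits that the decoder recovered correctly."""
    if len(sent) != len(decoded):
        raise ValueError("messages must have the same length")
    matches = sum(1 for a, b in zip(sent, decoded) if a == b)
    return matches / len(sent)


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB over two flat pixel sequences."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, distorted)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


message = [1, 0] * 16                              # toy 32-bit message
decoded = [1 - b for b in message[:4]] + message[4:]  # first 4 bits flipped
accuracy = bit_accuracy(message, decoded)          # 28 of 32 bits match
```

Higher PSNR means the watermarked frame is closer to the original, and a bit accuracy near 0.5 is what random guessing would give on a balanced message.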
Robustness comparisons. We present bit accuracy results of watermarking methods under various attacks, including image distortions, video distortions, combined distortions, and model distortions.

BibTeX

@article{jang2024lvmark,
      title={LVMark: Robust Watermark for latent video diffusion models},
      author={Jang, MinHyuk and Jang, Youngdong and Lee, JaeHyeok and Kawamura, Kodai and Yang, Feng and Kim, Sangpil},
      journal={arXiv preprint arXiv:2412.09122},
      year={2024}}