LVMark: Robust Watermark for latent video diffusion models

MinHyuk Jang¹*, Youngdong Jang¹*, JaeHyeok Lee¹, Kodai Kawamura¹,
Feng Yang², Sangpil Kim¹✉️

¹Korea University ²Google DeepMind

*Equal contribution ✉️Corresponding author

Code will be released soon 😊

Overview

Training Pipeline. The training pipeline of our method is illustrated here. Top: We fine-tune the latent decoder to embed binary messages in generated videos and train the watermark decoder to retrieve messages from distorted videos. Bottom-left: We modulate layers of the latent decoder that minimally impact visual quality to embed random messages. Bottom-right: The watermark decoder combines the RGB video with low-frequency subbands from a 3D wavelet transform using cross-attention to decode the binary message.
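To make the "low-frequency subbands from a 3D wavelet transform" concrete, the following is a minimal sketch of the lowest-frequency (LLL) subband of a single-level 3D transform on one video channel. It assumes an orthonormal Haar wavelet and a single decomposition level; the overview does not specify either, and the function name `haar_lll` is hypothetical. It is an illustration of the general idea, not the authors' implementation.

```python
import math


def haar_lll(video):
    """Return the LLL subband of a one-level 3D Haar transform.

    `video` is a nested list indexed as video[t][y][x] (one channel).
    Assumption: an orthonormal Haar wavelet, chosen here only for
    illustration; the source does not name the wavelet family.
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % 2 or height % 2 or width % 2:
        raise ValueError("all three dimensions must be even")

    # Each axis contributes a low-pass factor of 1/sqrt(2), so the three
    # axes together give (1/sqrt(2))**3 = 1 / (2 * sqrt(2)).
    scale = 1.0 / (2.0 * math.sqrt(2.0))

    lll = []
    for t in range(0, frames, 2):
        plane = []
        for y in range(0, height, 2):
            row = []
            for x in range(0, width, 2):
                # Sum the 2x2x2 block spanning time, height, and width.
                block_sum = sum(
                    video[t + dt][y + dy][x + dx]
                    for dt in (0, 1)
                    for dy in (0, 1)
                    for dx in (0, 1)
                )
                row.append(block_sum * scale)
            plane.append(row)
        lll.append(plane)
    return lll
```

Because each output value averages a block across both space and time, the subband halves every dimension and summarizes coarse spatio-temporal structure. This is the kind of signal that tends to survive compression and mild distortions better than fine detail.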

Abstract

Rapid advancements in generative models have made it possible to create hyper-realistic videos. As their applicability increases, their unauthorized use has raised significant concerns, leading to the growing demand for techniques to protect the ownership of the generative model itself. While existing watermarking methods effectively embed watermarks into image-generative models, they fail to account for temporal information, resulting in poor performance when applied to video-generative models. To address this issue, we introduce a novel watermarking method called LVMark, which embeds watermarks into video diffusion models. A key component of LVMark is a selective weight modulation strategy that efficiently embeds watermark messages into the video diffusion model while preserving the quality of the generated videos. To accurately decode messages in the presence of malicious attacks, we design a watermark decoder that leverages spatio-temporal information in the 3D wavelet domain through a cross-attention module. To the best of our knowledge, our approach is the first to highlight the potential of video-generative model watermarking as a valuable tool for enhancing the effectiveness of ownership protection in video-generative models.

Qualitative Comparison

Comparison between Original and Watermark videos

[Video grid: twelve side-by-side pairs, each labeled Original and Watermark.]

Comparison between ours and baselines

Quantitative Comparison

Quantitative results. We report results for Open-Sora and DynamiCrafter with each of the following watermarking methods applied: HiDDeN, Blind, Stable Signature, WOUAF, and LVMark. The evaluation covers image metrics (PSNR, SSIM, LPIPS), video metrics (tLP, FVD), and bit accuracy on 32-bit and 48-bit messages.
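For readers unfamiliar with two of the metrics named above, here is a minimal sketch of how bit accuracy and PSNR are commonly computed. The function names and the flat-list input format are assumptions made for illustration; the evaluation code behind the reported numbers is not shown on this page and may differ in detail (for example, in how frames are averaged).

```python
import math


def bit_accuracy(embedded_bits, decoded_bits):
    """Fraction of message bits that were decoded correctly."""
    if len(embedded_bits) != len(decoded_bits):
        raise ValueError("messages must have the same length")
    matches = sum(1 for a, b in zip(embedded_bits, decoded_bits) if a == b)
    return matches / len(embedded_bits)


def psnr(original, watermarked, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flat pixel sequences.

    Assumption: pixel values lie in [0, max_value].
    """
    if len(original) != len(watermarked):
        raise ValueError("inputs must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, watermarked)) / len(original)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A bit accuracy of 1.0 means the full message was recovered, while a value near 0.5 is what random guessing would give on a random binary message. Higher PSNR indicates that the watermarked frames are closer to the originals.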

BibTeX

@article{jang2024lvmark,
  title={LVMark: Robust Watermark for latent video diffusion models},
  author={Jang, MinHyuk and Jang, Youngdong and Lee, JaeHyeok and Kawamura, Kodai and Yang, Feng and Kim, Sangpil},
  journal={arXiv preprint arXiv:2412.09122},
  year={2024}
}
