MEVG: Multi-event Video Generation with
Text-to-Video Models

ECCV 2024

Gyeongrok Oh¹, Jaehwan Jeong¹, Sieun Kim¹, Wonmin Byeon²,
Jinkyu Kim¹, Sungwoong Kim¹, and Sangpil Kim¹
¹Korea University       ²NVIDIA Research

[Code][Paper]



Abstract

We introduce a novel diffusion-based video generation method that generates a video showing multiple events from multiple individual sentences supplied by the user. The method does not require a large-scale video dataset, because it uses a pre-trained diffusion-based text-to-video generative model without any fine-tuning. Specifically, we propose a last frame-aware diffusion process that preserves visual coherence between consecutive videos, each of which depicts a different event, by initializing the latent and simultaneously adjusting the noise in the latent to enhance the motion dynamics of the generated video. Furthermore, we find that iteratively updating the latent vectors with reference to all preceding frames maintains the global appearance across the frames of a video clip. To handle dynamic text input for video generation, we utilize a novel prompt generator that converts coarse text messages from the user into multiple optimal prompts for the text-to-video diffusion model. Extensive experiments and user studies show that our proposed method is superior to other video-generative models in terms of the temporal coherency of content and semantics.
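
To make the idea of last frame-aware latent initialization more concrete, the following is a minimal sketch and not the authors' implementation. It assumes latents are stored as NumPy arrays of shape (frames, channels, height, width); the function name `init_next_clip_latent` and the `noise_weight` parameter are hypothetical and do not come from the paper. The sketch copies the latent of the previous clip's final frame to every frame of the next clip and blends in fresh noise, so the next clip starts close to where the previous one ended while keeping some room for motion.

```python
import numpy as np

def init_next_clip_latent(prev_latents, num_frames, noise_weight=0.5, seed=0):
    """Hypothetical sketch: start the next clip from the previous clip's last frame.

    prev_latents: array of shape (frames, channels, height, width).
    noise_weight: assumed blend factor in [0, 1]; 0 keeps only the last frame,
                  1 keeps only fresh noise.
    """
    rng = np.random.default_rng(seed)
    last = prev_latents[-1]                            # latent of the final frame
    base = np.repeat(last[None], num_frames, axis=0)   # copy it to every new frame
    noise = rng.standard_normal(base.shape)            # independent noise per frame
    # Blend the repeated last-frame latent with fresh noise.
    return np.sqrt(1.0 - noise_weight) * base + np.sqrt(noise_weight) * noise
```

In a full pipeline, the returned array would be passed to the pre-trained text-to-video model as the starting latent for the next event's prompt. How the actual method adjusts the noise is described in the paper, and the blend used here is only an illustrative assumption.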


Results

Video Results Based on LVDM


"Santa Claus goes snowboarding on a snowy mountain."
"Santa Claus rides his sleigh through the snow in the mountain."
"Santa Claus walks through the forest to a frozen lake."
"Santa Claus has fun skating on the ice."


"A golden retriever is having a picnic on a beautiful tropical beach at sunset."
"A golden retriever is running towards a beautiful tropical beach at sunset."
"A golden retriever is sitting next to a bonfire on a beautiful tropical beach at sunset."
"A golden retriever is looking at the starry sky on a beautiful tropical beach."


"A waterfall flows in the mountains under a clear sky."
"A waterfall flows in the fall mountains under a clear sky."
"A waterfall flows in the winter mountains under a clear sky."
"A waterfall frozen on a mountain during a snowstorm."


"The volcano erupts in the clear weather."
"Smoke comes from the crater of the volcano, which has ended its eruption in the clear weather."
"The weather around the volcano turns cloudy."


"There is a beach where there is no one."
"The waves hit the deserted beach."
"There is a beach that has been swept away by waves."


Video Results Based on VideoCrafter1


"An astronaut in a white uniform is snowboarding in the snowy hill."
"An astronaut in a white uniform is surfing in the sea."
"An astronaut in a white uniform is surfing in the desert."


"A white dog is running in the beautiful meadow."
"A white dog is standing in the beautiful meadow."
"A white dog is yawning loudly in the beautiful meadow."
"A white dog lies on the ground in the beautiful meadow."


Applications

Image and Multi-text-based Video Generation

Prompts

A single white flower gradually blooms from a single green flower bud.
→ The single white flower is blooming.
→ A lovely fully blossomed single white flower.
Input Image

Output

Prompts

People walks on the beach at night.
→ There are sand castles on the beach under the fireworks at night.
→ Very few people remain on the beach at night and they gradually fade away.
Input Image

Output


Video Generation with Large Language Model (LLM)

Original Scenario

"In the morning, Albert Einstein was walking in the forest, later he read a book under a tree, and as night fell, he walked towards the lake, eventually sitting near it in the forest at night."

Prompts as a result of LLM

Albert Einstein is walking in the forest in the morning.
→ Albert Einstein reads a book under a tree.
→ Albert Einstein walks from the forest towards the lake as night falls.
→ Albert Einstein sits near the lake in the forest at night.

Output

Original Scenario

"A man embarks on a motorcycle journey, runs through a traffic jam on a busy road, rides a motorcycle in the desert, walks in the desert at night, and looks at the sky with aurora in the desert."

Prompts as a result of LLM

A man embarks on a motorcycle journey.
→ A man runs through a traffic jam on a busy road.
→ A man rides a motorcycle in the desert.
→ A man walks in the desert at night.
→ A man looks at the sky with aurora in the desert.

Output
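
The scenario-to-prompts step shown above can be sketched as follows. This is an illustration under stated assumptions, not the prompt generator used in the paper: the page does not specify which LLM is used or how it is instructed, so the instruction wording and the helper names `build_instruction` and `parse_prompts` are hypothetical. The sketch only builds a request string and parses a line-per-prompt reply; the call to the LLM itself is left out.

```python
import re

def build_instruction(scenario):
    """Hypothetical instruction asking an LLM to split a scenario into event prompts."""
    return (
        "Split the scenario into one short prompt per event, in order. "
        "Keep the subject and setting explicit in every prompt. "
        "Return one prompt per line.\n\nScenario: " + scenario
    )

def parse_prompts(llm_reply):
    """Turn a line-per-prompt reply into a clean list of prompts."""
    prompts = []
    for line in llm_reply.splitlines():
        # Drop a leading arrow, bullet, or "1." / "1)" style numbering.
        text = re.sub(r"^\s*(?:→|[-*]|\d+[.)])\s*", "", line).strip()
        if text:
            prompts.append(text)
    return prompts
```

Each parsed prompt would then be passed, in order, to the text-to-video model to generate one clip per event.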


Code Used and Licenses

URL                                                               License
https://github.com/YingqingHe/LVDM/                               MIT
https://github.com/AILab-CVC/VideoCrafter (Hugging Face Space)    MIT

Bibtex