Deep Video Generation, Prediction and Completion of Human Action Sequences

Video Result Demonstration

This video result demonstration is divided into two parts: Qualitative Results and Illustration of Our Pipeline.

Qualitative Results

Note: Each of the following sections corresponds to one generation task: video generation, video prediction, or video completion. Columns named "Real" show real data for reference. Columns named "Input-n" show input frames, where n is the index of the frame used (e.g., "Input-1" means the 1st frame of a video is used as an input constraint). The other columns show the qualitative results of each method. For our method, we also show the pose sequence results, denoted "Ours-Pose". Each row corresponds to an action class, from top to bottom: Walking, Direction, Greeting, Sitting, Sitting Down.

Video Generation

[Video grid: five rows, one per action class. Columns: Real, VGAN, Ours, Ours-Pose]


Video Prediction

[Video grid: five rows, one per action class. Input frames: Input-1, Input-2, Input-3, Input-4. Columns: PredNet, PoseVAE, MS-GAN, Ours, Ours-Pose]


Video Completion

[Video grid: five rows, one per action class. Input frames: Input-1, Input-50. Columns: cond-VGAN, Ours, Ours-Pose]


Illustration of Our Pipeline

Video Generation Pipeline

[Diagram: $z_0 \sim \mathcal{U}(-1, 1)$ with $z_0 \in \mathbb{R}^{8}$, and $z \sim \mathcal{N}(0, 1)$ with $z \in \mathbb{R}^{24}$ → concatenation of $z_0$ and $z$ → Pose Sequence Generation Process → Skeleton to Image]

Video Prediction Pipeline

[Diagram: Input-1, Input-2, Input-3, Input-4 → stacked hourglass pose estimation → Pose-1, Pose-2, Pose-3, Pose-4 → Constrained Pose Sequence Generation Process → Skeleton to Image]

Video Completion Pipeline

[Diagram: Input-1, Input-50 → stacked hourglass pose estimation → Pose-1, Pose-50 → Constrained Pose Sequence Generation Process → Skeleton to Image]
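
As a rough illustration of the latent input shown in the video generation pipeline, the following minimal sketch samples $z_0$ from $\mathcal{U}(-1, 1)$ in 8 dimensions and $z$ from $\mathcal{N}(0, 1)$ in 24 dimensions, then concatenates them into a single 32-dimensional code. The function name `sample_latent`, the use of Python's standard `random` module, the independent per-component sampling, and the ordering ($z_0$ first) are assumptions made for illustration only; the diagram does not specify how the authors implement this step.

```python
import random


def sample_latent(rng, dim_z0=8, dim_z=24):
    """Hypothetical helper: draw z_0 and z, then concatenate them."""
    # z_0 ~ U(-1, 1), one independent draw per component (assumed).
    z0 = [rng.uniform(-1.0, 1.0) for _ in range(dim_z0)]
    # z ~ N(0, 1), one independent draw per component (assumed).
    z = [rng.gauss(0.0, 1.0) for _ in range(dim_z)]
    # List concatenation: the result has dim_z0 + dim_z entries.
    return z0 + z


rng = random.Random(0)  # fixed seed so the sketch is reproducible
latent = sample_latent(rng)
print(len(latent))  # 32
```

In this sketch the concatenated code would be what the pose sequence generation step consumes; how that step uses it is not shown in the diagram and is left out here.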