ComfyUI Text-to-Video & Image-to-Video: Deploying and Using Stable-Video-Diffusion

2024-2-20

Introduction to Stable-Video-Diffusion (original text)

Stable Video Diffusion (SVD) Image-to-Video is a diffusion model that takes in a still image as a conditioning frame and generates a video from it.

Model Description

SVD Image-to-Video is a latent diffusion model trained to generate short video clips from an image conditioning. This model was trained to generate 14 frames at resolution 576x1024 given a context frame of the same size. We also finetune the widely used f8-decoder for temporal consistency. For convenience, we additionally provide the model with the standard frame-wise decoder.
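To put numbers on "576x1024" and "f8-decoder", here is a minimal sketch of the latent tensor shape the model works in. It assumes an SD-family autoencoder with a spatial downsampling factor of 8 (the "f8") and 4 latent channels. The helper name `svd_latent_shape` is hypothetical and does not come from any SVD codebase.

```python
# Hypothetical helper: shape of the latent video tensor for one SVD clip.
# Assumptions: f8 autoencoder (spatial downsampling by 8) with 4 latent
# channels, as in the Stable Diffusion 2.1 VAE family.

DOWNSAMPLE_FACTOR = 8  # the "f8" in f8-decoder
LATENT_CHANNELS = 4    # assumed SD-style latent channel count


def svd_latent_shape(height: int, width: int, num_frames: int) -> tuple:
    """Return (frames, channels, latent_height, latent_width)."""
    if height % DOWNSAMPLE_FACTOR or width % DOWNSAMPLE_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    return (
        num_frames,
        LATENT_CHANNELS,
        height // DOWNSAMPLE_FACTOR,
        width // DOWNSAMPLE_FACTOR,
    )


if __name__ == "__main__":
    # 14 frames at 576x1024, the training setting described above
    print(svd_latent_shape(576, 1024, 14))  # (14, 4, 72, 128)
```

Under these assumptions, each 576x1024 frame is denoised as a 72x128 latent. The f8 decoder then maps it back to pixels, and this is the step the temporal finetuning targets.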

Model Sources

For research purposes, we recommend our generative-models GitHub repository (https://github.com/Stability-AI/generative-models), which implements the most popular diffusion frameworks (both training and inference).

Evaluation

The chart above evaluates user preference for SVD-Image-to-Video over GEN-2 and PikaLabs. Human voters prefer SVD-Image-to-Video in terms of video quality. For details on the user study, we refer to the research paper.

Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

Research on generative models.
Safe deployment of models which have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.
Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.
Excluded uses are described below.

Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is outside the model's abilities. The model should not be used in any way that violates Stability AI's Acceptable Use Policy.

Limitations

The generated videos are rather short (<= 4 sec), and the model does not achieve perfect photorealism.
The model may generate videos without motion, or with very slow camera pans.
The model cannot be controlled through text.
The model cannot render legible text.
Faces and people in general may not be generated properly.
The autoencoding part of the model is lossy.

Recommendations

The model is intended for research purposes only.

Text-to-Video Results

Demo video from the model's authors

My own results

Using Stable-Video-Diffusion in ComfyUI

Installing ComfyUI

Official download link: https://github.com/comfyanonymous/ComfyUI/releases/download/latest/ComfyUI_windows_portable_nvidia_cu121_or_cpu.7z

Baidu Netdisk link: https://pan.baidu.com/s/1rQ3J2rCh9zsjxUxJ4LDmlA?pwd=n2i7 extraction code: n2i7 (updated 2023-12-01)

Once the download finishes, extract the archive and ComfyUI is ready to use.

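For a detailed introduction to ComfyUI and how to use it, see the following article (its netdisk download has been updated to the latest version):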

ComfyUI Introduction (direct translation of the official docs): Detailed Deployment Tutorial and Usage

You need the latest version of ComfyUI. If you already have it installed, update it to the corresponding version.

To upgrade ComfyUI to the latest version, use the script bundled with the package, located at F:\ComfyUI_windows_portable\update.

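Alternatively, upgrade through the Manager tool. My installation is already on the latest version, and its interface differs somewhat from earlier versions.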

Downloading the Dedicated Models

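Two models are provided. They are essentially the same; details are as follows.
SVD: This model was trained to generate 14 frames at resolution 576x1024 given a context frame of the same size. It uses the standard image encoder from SD 2.1 but replaces the decoder with a temporally-aware deflickering decoder.
SVD-XT: Same architecture as SVD, but finetuned to generate 25 frames.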
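The practical difference between the two variants is clip length. The following minimal sketch converts each variant's frame count into playback seconds, which relates to the "<= 4 sec" limitation in the model card. The playback frame rate is not stated here, so `fps` is a free parameter, and the value 7 used below is only an assumed example. `FRAMES_PER_VARIANT` and `clip_seconds` are hypothetical names.

```python
# Frame counts per model variant, as stated in the model descriptions above.
FRAMES_PER_VARIANT = {
    "svd": 14,     # stable-video-diffusion-img2vid
    "svd_xt": 25,  # stable-video-diffusion-img2vid-xt
}


def clip_seconds(variant: str, fps: float) -> float:
    """Playback length in seconds of one generated clip at the given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return FRAMES_PER_VARIANT[variant] / fps


if __name__ == "__main__":
    # fps=7 is an assumed playback rate; use whatever your export node is set to.
    for name in FRAMES_PER_VARIANT:
        print(f"{name}: {clip_seconds(name, 7):.2f} s")
    # svd: 2.00 s
    # svd_xt: 3.57 s
```

Lowering the playback fps stretches the same frames over more time, but it does not add motion. SVD-XT's extra frames are what make longer clips look smoother.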

SVD download: https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/tree/main

SVD-XT download: https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/tree/main

Quark Netdisk link: https://pan.quark.cn/s/45e0aae1d7ad extraction code: AzTt
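After downloading, the checkpoints must be placed where ComfyUI can find them. Below is a minimal sketch that fetches a checkpoint directly into that folder with `huggingface_hub`. Several parts are assumptions:

- the portable install root `F:\ComfyUI_windows_portable` from the update step;
- that SVD checkpoints belong in `ComfyUI\models\checkpoints`;
- the file names `svd.safetensors` and `svd_xt.safetensors` (verify them against each repo's file listing).

The Hugging Face repos may be gated, so you may need to log in and accept the license first. `checkpoint_dir`, `download_svd` and `SVD_FILES` are hypothetical names.

```python
from pathlib import Path

# Assumed portable install root, matching the update path mentioned earlier.
PORTABLE_ROOT = Path(r"F:\ComfyUI_windows_portable")

# Assumed repo -> checkpoint file mapping; check the Hugging Face file listing.
SVD_FILES = {
    "stabilityai/stable-video-diffusion-img2vid": "svd.safetensors",
    "stabilityai/stable-video-diffusion-img2vid-xt": "svd_xt.safetensors",
}


def checkpoint_dir(root: Path = PORTABLE_ROOT) -> Path:
    """Folder assumed to hold full checkpoints: <root>/ComfyUI/models/checkpoints."""
    return root / "ComfyUI" / "models" / "checkpoints"


def download_svd(repo_id: str, root: Path = PORTABLE_ROOT) -> str:
    """Download one SVD checkpoint straight into the checkpoints folder.

    Requires `pip install huggingface_hub`. The import is local so the path
    helper above works without the package installed.
    """
    from huggingface_hub import hf_hub_download

    target = checkpoint_dir(root)
    target.mkdir(parents=True, exist_ok=True)
    # Returns the local path of the downloaded file.
    return hf_hub_download(
        repo_id=repo_id,
        filename=SVD_FILES[repo_id],
        local_dir=str(target),
    )
```

For example, `download_svd("stabilityai/stable-video-diffusion-img2vid-xt")` would place the SVD-XT file next to your other checkpoints. Restart or refresh ComfyUI so the checkpoint loader lists it.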