A Complete Tutorial on Generating Videos with Stable Diffusion

Published: 2023-01-24 11:44:29  |  Source: 騰訊網(wǎng)

This article is a complete guide to generating videos with CUDA and Stable Diffusion. We will use CUDA to accelerate video generation, and we can run the model for free on Kaggle's Tesla GPUs.

#install the diffusers package

#pip install --upgrade pip



!pip install --upgrade diffusers transformers scipy

#load the model from the stable-diffusion model card

import torch

from diffusers import StableDiffusionPipeline

from huggingface_hub import notebook_login

from pathlib import Path

Loading the model

The model weights are released under the CreativeML OpenRAIL-M license. This is an open license that claims no rights over the generated output, and it prohibits us from deliberately producing illegal or harmful content. If you have questions about the license, see here:

https://huggingface.co/CompVis/stable-diffusion-v1-4

First, we need to be a registered user of the Hugging Face Hub and use an access token for the code to work. Since we are working in a notebook, we log in with notebook_login().

Running the cell below will display a login interface where you paste your access token.

if not (Path.home()/".huggingface"/"token").exists():
    notebook_login()

Then load the model:

model_id = "CompVis/stable-diffusion-v1-4"

device = "cuda"

# this loads the weights in the default float32 precision
pipe = StableDiffusionPipeline.from_pretrained(model_id)

pipe = pipe.to(device)

Generating images from text

%%time

#Provide the prompts

prompts = [

"a couple holding hands with plants growing out of their heads, growth of a couple, rainy day, atmospheric, bokeh matte masterpiece by artgerm by wlop by alphonse muhca ",

"detailed portrait beautiful Neon Operator Girl, cyberpunk futuristic neon, reflective puffy coat, decorated with traditional Japanese ornaments by Ismail inceoglu dragan bibin hans thoma greg rutkowski Alexandros Pyromallis Nekro Rene Maritte Illustrated, Perfect face, fine details, realistic shaded, fine-face, pretty face",

"symmetry!! portrait of minotaur, sci - fi, glowing lights!! intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha, 8 k ",

"Human, Simon Stalenhag in forest clearing style, trends on artstation, artstation HD, artstation, unreal engine, 4k, 8k",

"portrait of a young ruggedly handsome but joyful pirate, male, masculine, upper body, red hair, long hair, d & d, fantasy, roguish smirk, intricate, elegant, highly detailed, digital painting, artstation, concept art, matte, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha ",

"Symmetry!! portrait of a sith lord, warrior in sci-fi armour, tech wear, muscular!! sci-fi, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha",

"highly detailed portrait of a cat knight wearing heavy armor, stephen bliss, unreal engine, greg rutkowski, loish, rhads, beeple, makoto shinkai and lois van baarle, ilya kuvshinov, rossdraws, tom bagshaw, tom whalen, alphonse mucha, global illumination, god rays, detailed and intricate environment ",

"black and white portrait photo, the most beautiful girl in the world, earth, year 2447, cdx"

]

Display the results:

%%time

#show the results

images = pipe(prompts).images

images

#show a single result

images[0]

The image for the first prompt, "a couple holding hands with plants growing out of their heads, growth of a couple, rainy day, atmospheric, bokeh matte masterpiece", is shown below.

Displaying the generated images together

#show the results in a grid

from PIL import Image

def image_grid(imgs, rows, cols):
    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols*w, rows*h))
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid

grid = image_grid(images, rows=2, cols=4)

grid

#Save the results

grid.save("result_images.png")

If you have limited GPU memory (less than 4GB of GPU RAM available), make sure to load the StableDiffusionPipeline in float16 precision rather than the default float32 precision used above. This is done by telling diffusers to expect the weights in float16 precision. We also enable attention slicing, which trades a little speed for lower memory use:

%%time

import torch

pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)

pipe = pipe.to(device)

pipe.enable_attention_slicing()

images2 = pipe(prompts).images

images2[0]

grid2 = image_grid(images2, rows=2, cols=4)

grid2

To swap out the noise scheduler, pass it to from_pretrained as well:

%%time

from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "CompVis/stable-diffusion-v1-4"

# Use the Euler scheduler here instead

scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")

pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)

pipe = pipe.to("cuda")

images3 = pipe(prompts)

#results are returned in a tuple; the first element is the list of images

images3[0][0]

The grid below shows the results with the different scheduler:

grid3 = image_grid(images3[0], rows=2, cols=4)

grid3

#save the final output

grid3.save("results_stable_diffusionv1.4.png")

Creating the video

The basic operations are done; now let's use Kaggle to generate a video.

First, open the notebook settings and select GPU as the accelerator,

then install the required package:

!pip install -U stable_diffusion_videos

from huggingface_hub import notebook_login

notebook_login()

#Making Videos

from stable_diffusion_videos import StableDiffusionWalkPipeline

import torch

#"CompVis/stable-diffusion-v1-4" for 1.4

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    revision="fp16",
).to("cuda")

#Generate the video from the first set of prompts

video_path = pipeline.walk(
    prompts=[
        "environment living room interior, mid century modern, indoor garden with fountain, retro, vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sustainable architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000",
    ] * 4,  # the same prompt repeated once per seed; prompts and seeds must have equal length
    seeds=[42, 333, 444, 555],
    num_interpolation_steps=50,
    #height=1280,               # use multiples of 64 if >512. Multiples of 8 if <512
    #width=720,                 # use multiples of 64 if >512. Multiples of 8 if <512
    output_dir="dreams",        # Where images/videos will be saved
    name="imagine",             # Subdirectory of output_dir where images/videos will be saved
    guidance_scale=8.5,         # Higher adheres to prompt more, lower lets model take the wheel
    num_inference_steps=50,     # Number of diffusion steps per image generated. 50 is a good default
)
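walk() returns the path to the rendered mp4 file. As a minimal sketch (assuming you are running in a Jupyter/Kaggle notebook), you can preview it inline with IPython's built-in Video widget:

from IPython.display import Video

# embed=True inlines the video data so it renders in the notebook output
Video(video_path, embed=True)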

Upscale the images to 4K, which can then be used to produce the video:

from stable_diffusion_videos import RealESRGANModel

model = RealESRGANModel.from_pretrained("nateraw/real-esrgan")

model.upsample_imagefolder("/kaggle/working/dreams/imagine/imagine_000000/", "/kaggle/working/dreams/imagine4K_00")
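The upscaled frames still need to be stitched back into a video. Here is a hedged sketch using the standard ffmpeg CLI; the output folder comes from the call above, but the PNG filename pattern is an assumption, so check what upsample_imagefolder actually writes before running it:

# assumes the upscaled frames are PNGs in the output folder; adjust the glob if needed
!ffmpeg -framerate 8 -pattern_type glob -i "/kaggle/working/dreams/imagine4K_00/*.png" -c:v libx264 -pix_fmt yuv420p imagine_4k.mp4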

Adding music to the video

Music can be added to the video by supplying an audio file.

%%capture

! pip install youtube-dl

! youtube-dl -f bestaudio --extract-audio --audio-format mp3 --audio-quality 0 -o "music/thoughts.%(ext)s" https://soundcloud.com/nateraw/thoughts

from IPython.display import Audio

Audio(filename="music/thoughts.mp3")

Here we use youtube-dl to download the audio (pay attention to the audio's copyright), then add it to the video.

# Seconds in the song.

audio_offsets = [7, 9]

fps = 8

# Convert seconds to frames

num_interpolation_steps = [(b-a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]
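For example, with audio_offsets = [7, 9] and fps = 8, this evaluates to [(9 - 7) * 8] = [16], i.e. 16 interpolated frames covering the two seconds of audio between the offsets.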

video_path = pipeline.walk(
    prompts=["blueberry spaghetti", "strawberry spaghetti"],
    seeds=[42, 1337],
    num_interpolation_steps=num_interpolation_steps,
    height=512,                          # use multiples of 64
    width=512,                           # use multiples of 64
    audio_filepath="music/thoughts.mp3", # Use your own file
    audio_start_sec=audio_offsets[0],    # Start second of the provided audio
    fps=fps,                             # important to set yourself based on the num_interpolation_steps you defined
    batch_size=4,                        # increase until you go out of memory
    output_dir="dreams",                 # Where images will be saved
    name=None,                           # Subdir of output_dir; a timestamp by default
)

Finally, the generated video can be viewed in the notebook linked below.

You can find the code for this article here:

https://www.kaggle.com/code/rupakroy/stable-diffusion-videos/notebook

Author: Bob Rupak Roy
