This article is a complete guide to generating videos with CUDA and Stable Diffusion. We will use CUDA to accelerate generation, and we can run the model for free on Kaggle's Tesla GPUs.
#install the diffusers package
#pip install --upgrade pip
!pip install --upgrade diffusers transformers scipy
#load the model from stable-diffusion model card
import torch
from pathlib import Path
from diffusers import StableDiffusionPipeline
from huggingface_hub import notebook_login
Loading the model
The model weights are released under the CreativeML OpenRAIL-M license. It is an open license that claims no rights over the outputs you generate and forbids us from knowingly producing illegal or harmful content. If you have questions about the license, you can read it here:
https://huggingface.co/CompVis/stable-diffusion-v1-4
We first need to be a registered user of the Hugging Face Hub and use an access token for the code to work. Since we are working in a notebook, we use notebook_login() to log in.
Running the cell below will display a login widget where you paste your access token.
if not (Path.home()/".huggingface"/"token").exists(): notebook_login()
Then we load the model:
model_id = "CompVis/stable-diffusion-v1-4"
device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained(model_id)  # loads the weights in the default float32 precision
pipe = pipe.to(device)
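Before batching several prompts, it can help to verify the setup with a single image first. A minimal sketch; the prompt text and output filename below are just examples:
#quick sanity check: generate one image from a single prompt
#(the prompt and filename are only examples)
image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("sanity_check.png")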
Generating images from text
%%time
#Provide the Keywords
prompts = [
"a couple holding hands with plants growing out of their heads, growth of a couple, rainy day, atmospheric, bokeh matte masterpiece by artgerm by wlop by alphonse muhca ",
"detailed portrait beautiful Neon Operator Girl, cyberpunk futuristic neon, reflective puffy coat, decorated with traditional Japanese ornaments by Ismail inceoglu dragan bibin hans thoma greg rutkowski Alexandros Pyromallis Nekro Rene Maritte Illustrated, Perfect face, fine details, realistic shaded, fine-face, pretty face",
"symmetry!! portrait of minotaur, sci - fi, glowing lights!! intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha, 8 k ",
"Human, Simon Stalenhag in forest clearing style, trends on artstation, artstation HD, artstation, unreal engine, 4k, 8k",
"portrait of a young ruggedly handsome but joyful pirate, male, masculine, upper body, red hair, long hair, d & d, fantasy, roguish smirk, intricate, elegant, highly detailed, digital painting, artstation, concept art, matte, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha ",
"Symmetry!! portrait of a sith lord, warrior in sci-fi armour, tech wear, muscular!! sci-fi, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha",
"highly detailed portrait of a cat knight wearing heavy armor, stephen bliss, unreal engine, greg rutkowski, loish, rhads, beeple, makoto shinkai and lois van baarle, ilya kuvshinov, rossdraws, tom bagshaw, tom whalen, alphonse mucha, global illumination, god rays, detailed and intricate environment ",
"black and white portrait photo, the most beautiful girl in the world, earth, year 2447, cdx"
]
Display the results:
%%time
#show the results
images = pipe(prompts).images
images
#show a single result
images[0]
The image for the first prompt, "a couple holding hands with plants growing out of their heads, growth of a couple, rainy day, atmospheric, bokeh matte masterpiece", looks like this:
Displaying the generated images together
#show the results in grid
from PIL import Image
def image_grid(imgs, rows, cols):
    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols*w, rows*h))
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid
grid = image_grid(images, rows=2, cols=4)
grid
#Save the results
grid.save("result_images.png")
If you are limited by GPU memory (less than 4GB of GPU RAM available), make sure to load the StableDiffusionPipeline in float16 precision instead of the default float32 precision used above. This is done by telling diffusers to expect the weights in float16 precision; enabling attention slicing reduces memory use further:
%%time
import torch
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to(device)
pipe.enable_attention_slicing()
images2 = pipe(prompts).images
images2[0]
grid2 = image_grid(images2, rows=2, cols=4)
grid2
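To see how much these options actually save, you can check the peak GPU memory after a run. A small optional check; the exact number depends on your GPU:
#optional: report peak GPU memory used so far
import torch
print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")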
If you want to swap in a different noise scheduler, you also need to pass it to from_pretrained:
%%time
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler
model_id = "CompVis/stable-diffusion-v1-4"
# Use the Euler scheduler here instead
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
images3 = pipe(prompts).images
images3[0]
The image below shows the result of switching schedulers.
#show the results in a grid
grid3 = image_grid(images3, rows=2, cols=4)
grid3
#save the final output
grid3.save("results_stable_diffusionv1.4.png")
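If you are unsure which schedulers can be used with a given pipeline, diffusers can list the compatible classes for you:
#list the scheduler classes compatible with the loaded pipeline
print(pipe.scheduler.compatibles)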
Creating the video
With the basics done, let's use Kaggle to generate a video.
First, open the notebook settings and select GPU as the accelerator.
Then install the required package:
!pip install -U stable_diffusion_videos
from huggingface_hub import notebook_login
notebook_login()
#Making Videos
from stable_diffusion_videos import StableDiffusionWalkPipeline
import torch
#"CompVis/stable-diffusion-v1-4" for 1.4
pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    revision="fp16",
).to("cuda")
#Generate the video Prompts 1
video_path = pipeline.walk(
    prompts=[
        "environment living room interior, mid century modern, indoor garden with fountain, retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000",
        "environment living room interior, mid century modern, indoor garden with fountain, retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000",
        "environment living room interior, mid century modern, indoor garden with fountain, retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000",
        "environment living room interior, mid century modern, indoor garden with fountain, retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000",
        "environment living room interior, mid century modern, indoor garden with fountain, retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000",
    ],
    seeds=[42, 333, 444, 555, 666],  # walk() expects one seed per prompt; the fifth value here is arbitrary
    num_interpolation_steps=50,
    #height=1280,               # use multiples of 64 if > 512, multiples of 8 if < 512
    #width=720,                 # use multiples of 64 if > 512, multiples of 8 if < 512
    output_dir="dreams",        # Where images/videos will be saved
    name="imagine",             # Subdirectory of output_dir where images/videos will be saved
    guidance_scale=8.5,         # Higher adheres to prompt more, lower lets model take the wheel
    num_inference_steps=50,     # Number of diffusion steps per image generated. 50 is a good default
)
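Under the hood, walk() renders frames by interpolating between the initial latent noise (and text embeddings) of consecutive prompts, typically with spherical linear interpolation. A conceptual sketch of slerp, not the library's exact implementation:
import torch

def slerp(t, v0, v1, eps=1e-7):
    #spherical linear interpolation between two tensors, t in [0, 1]
    v0_dir = v0 / torch.norm(v0)
    v1_dir = v1 / torch.norm(v1)
    dot = torch.clamp(torch.sum(v0_dir * v1_dir), -1.0, 1.0)
    theta = torch.acos(dot)
    if theta.abs() < eps:
        #nearly parallel vectors: fall back to plain linear interpolation
        return (1 - t) * v0 + t * v1
    return (torch.sin((1 - t) * theta) * v0 + torch.sin(t * theta) * v1) / torch.sin(theta)

#each interpolated latent decodes to one intermediate video frame, e.g.:
#frame_latent = slerp(0.5, torch.randn(4, 64, 64), torch.randn(4, 64, 64))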
Upscale the generated images to 4K so we can build the video from them:
from stable_diffusion_videos import RealESRGANModel
model = RealESRGANModel.from_pretrained("nateraw/real-esrgan")
model.upsample_imagefolder("/kaggle/working/dreams/imagine/imagine_000000/", "/kaggle/working/dreams/imagine4K_00")
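The library writes the video for you, but if you want to stitch the upscaled frames yourself, ffmpeg works too. A sketch that assumes the frames are PNGs in the folder above and that ffmpeg is installed:
#stitch the 4K frames into an mp4 (frame rate and output name are just examples)
! ffmpeg -y -framerate 8 -pattern_type glob -i "/kaggle/working/dreams/imagine4K_00/*.png" -c:v libx264 -pix_fmt yuv420p imagine_4k.mp4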
Adding music to the video
Music can be added to the generated video simply by providing an audio file.
%%capture
! pip install youtube-dl
! youtube-dl -f bestaudio --extract-audio --audio-format mp3 --audio-quality 0 -o "music/thoughts.%(ext)s" https://soundcloud.com/nateraw/thoughts
from IPython.display import Audio
Audio(filename="music/thoughts.mp3")
Here we use youtube-dl to download the audio (mind the audio's copyright), then feed it into the video:
# Seconds in the song.
audio_offsets = [7, 9]
fps = 8

# Convert seconds to frames
num_interpolation_steps = [(b - a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]

video_path = pipeline.walk(
    prompts=["blueberry spaghetti", "strawberry spaghetti"],
    seeds=[42, 1337],
    num_interpolation_steps=num_interpolation_steps,
    height=512,                          # use multiples of 64
    width=512,                           # use multiples of 64
    audio_filepath="music/thoughts.mp3", # Use your own file
    audio_start_sec=audio_offsets[0],    # Start second of the provided audio
    fps=fps,                             # important to set yourself based on the num_interpolation_steps you defined
    batch_size=4,                        # increase until you go out of memory
    output_dir="dreams",                 # Where images will be saved
    name=None,                           # Subdir of output dir. Will be timestamp by default
)
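To make the timing concrete: with audio_offsets = [7, 9] and fps = 8, the single segment between the two timestamps gets (9 - 7) * 8 = 16 interpolation steps, so the clip spans exactly 2 seconds of audio. Adding more timestamps produces one segment per consecutive pair, for example:
#three timestamps give two segments of frames
audio_offsets = [7, 9, 12]
fps = 8
num_interpolation_steps = [(b - a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]
print(num_interpolation_steps)  # [16, 24]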
Finally, here is the generated video.
The full code for this article can be found here:
https://www.kaggle.com/code/rupakroy/stable-diffusion-videos/notebook
Author: Bob Rupak Roy