Trying Alibaba's Text-to-Video Generation Model

This post uses Alibaba's large text-to-video generation model; model page:

Setting up the runtime environment

# using Python 3.9
conda create -n t2vtest python=3.9
conda activate t2vtest
pip install modelscope==1.4.2
pip install open_clip_torch
pip install pytorch-lightning

# modelscope.pipelines.multi_modal.text_to_video_synthesis_pipeline requires the opencv library but it was not found in your environment.
# fix:
pip install opencv-python
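
As a quick sanity check (my own habit, not part of the model's instructions), make sure the key packages import cleanly before downloading the model:

import cv2
import modelscope
import open_clip
import pytorch_lightning

# If any import above fails, the corresponding pip install did not take effect.
print('imports OK, opencv-python', cv2.__version__)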

{% details Following the installation steps above is enough; the resulting environment is listed below. %}

addict==2.4.0
aiohttp==3.9.3
aiosignal==1.3.1
aliyun-python-sdk-core==2.15.0
aliyun-python-sdk-kms==2.16.2
async-timeout==4.0.3
attrs==23.2.0
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
crcmod==1.7
cryptography==42.0.5
datasets==2.8.0
dill==0.3.6
einops==0.7.0
filelock==3.13.1
frozenlist==1.4.1
fsspec==2024.2.0
ftfy==6.1.3
gast==0.5.4
huggingface-hub==0.21.4
idna==3.6
importlib_metadata==7.0.2
Jinja2==3.1.3
jmespath==0.10.0
jsonplus==0.8.0
lightning-utilities==0.10.1
MarkupSafe==2.1.5
modelscope==1.4.2
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.14
networkx==3.2.1
numpy==1.23.5
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.4.99
nvidia-nvtx-cu12==12.1.105
open-clip-torch==2.24.0
opencv-python==4.9.0.80
oss2==2.18.4
packaging==23.2
pandas==2.2.1
pillow==10.2.0
platformdirs==4.2.0
protobuf==4.25.3
pyarrow==15.0.1
pycparser==2.21
pycryptodome==3.20.0
python-dateutil==2.9.0.post0
pytorch-lightning==2.2.1
pytz==2024.1
PyYAML==6.0.1
regex==2023.12.25
requests==2.31.0
responses==0.18.0
safetensors==0.4.2
scipy==1.12.0
sentencepiece==0.2.0
simplejson==3.19.2
six==1.16.0
sortedcontainers==2.4.0
sympy==1.12
timm==0.9.16
tomli==2.0.1
torch==2.2.1
torchmetrics==1.3.1
torchvision==0.17.1
tqdm==4.66.2
triton==2.2.0
typing_extensions==4.10.0
tzdata==2024.1
urllib3==2.2.1
wcwidth==0.2.13
xxhash==3.4.1
yapf==0.40.2
yarl==1.9.4
zipp==3.17.0

{% enddetails %}

Download the model into the text-to-video-synthesis directory, and then you can test it with the code below.
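
One way to fetch the weights is ModelScope's snapshot_download; a minimal sketch, assuming the model ID is 'damo/text-to-video-synthesis' (check the model page if yours differs):

from modelscope.hub.snapshot_download import snapshot_download

# Fetch the model files from the ModelScope hub into the local cache.
# The model ID is an assumption; adjust it to whatever the model page lists.
model_dir = snapshot_download('damo/text-to-video-synthesis')
print('model files are at:', model_dir)

The pipeline call below also accepts this model_dir path directly, instead of a manually prepared ./text-to-video-synthesis directory.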

from modelscope.pipelines import pipeline
from modelscope.outputs import OutputKeys

import time

start_time = time.time()

# Task name first, then the local directory the model was downloaded to.
p = pipeline('text-to-video-synthesis', 'text-to-video-synthesis')

test_text = {
    'text': 'An engineer is on a video call with the interviewer.',
}
output_video_path = p(test_text, output_video='./interviewer.mp4')[OutputKeys.OUTPUT_VIDEO]
print('output_video_path:', output_video_path)

print('Time:', time.time() - start_time)
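
To quickly verify the clip without a desktop video player, the opencv-python package installed earlier can read it back; a minimal sketch that just counts decodable frames (the path matches the output_video argument above):

import cv2

cap = cv2.VideoCapture('./interviewer.mp4')
frames = 0
while True:
    ok, _ = cap.read()  # read frames until the decoder reports the end
    if not ok:
        break
    frames += 1
cap.release()
print('decoded frames:', frames)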

I ran into some baffling problems. For some reason the setup on server A worked, but the same setup on server B did not. I downloaded the model with the git method ModelScope provides, and the downloaded files were apparently corrupted, yet corrupted consistently: I deleted everything and downloaded again, and the md5 checksums of the two downloads were identical, which I still cannot explain. The eventual fix was to copy the model off server A and upload it to server B, after which everything worked…
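
If you hit something similar, one way to compare two copies of the model directory file by file is to hash everything and diff the output between servers; a small sketch using only the standard library (the directory path is the one used above):

import hashlib
import os

def md5_of_dir(root):
    """Print one md5 per file so two servers' copies can be diffed by eye."""
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            h = hashlib.md5()
            with open(path, 'rb') as f:
                for chunk in iter(lambda: f.read(1 << 20), b''):
                    h.update(chunk)
            print(h.hexdigest(), os.path.relpath(path, root))

md5_of_dir('./text-to-video-synthesis')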

That said, the model's results are fairly mediocre; it's fine for playing around with and getting a general feel for the technology.