Trying out openai whisper

conda create -n ow python=3.9
conda activate ow
pip install git+https://github.com/openai/whisper.git@ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab
The resulting environment (pip freeze):

certifi==2024.2.2
charset-normalizer==3.3.2
filelock==3.13.1
fsspec==2024.2.0
idna==3.6
Jinja2==3.1.3
Levenshtein==0.25.0
llvmlite==0.42.0
MarkupSafe==2.1.5
more-itertools==10.2.0
mpmath==1.3.0
networkx==3.2.1
numba==0.59.0
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
openai-whisper @ git+https://github.com/openai/whisper.git@ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab
python-Levenshtein==0.25.0
rapidfuzz==3.6.1
regex==2023.12.25
requests==2.31.0
sympy==1.12
tiktoken==0.6.0
torch==2.2.1
tqdm==4.66.2
triton==2.2.0
typing_extensions==4.10.0
urllib3==2.2.1
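
Before running anything longer, a quick sanity check that the install works (a minimal sketch; it assumes ffmpeg is already on the PATH, since whisper shells out to ffmpeg for audio decoding):

import whisper

# List the model sizes this build of whisper knows about
# (tiny, base, small, medium, large, ...).
print(whisper.available_models())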

Below is the test script.

import time

import whisper

audio_name = "1.wav"
model_type = "base"

model = whisper.load_model(model_type)
print(model.device)

# load_audio decodes the file via ffmpeg to a 16 kHz mono float array;
# pad_or_trim pads or cuts it to the 30-second window whisper decodes.
audio = whisper.load_audio(audio_name)
audio = whisper.pad_or_trim(audio)

start_time = time.time()

# Make a log-Mel spectrogram and move it to the same device as the model.
# For the large-v3 model, remember to set n_mels to 128 with this line instead:
# mel = whisper.log_mel_spectrogram(audio, n_mels=128).to(model.device)
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Detect the spoken language.
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# Decode the audio.
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)

print(f"Elapsed: {time.time() - start_time:.2f}s")

# Print the recognized text.
print("Recognized text:", result.text)

It worked on the first try.
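
One caveat: whisper.pad_or_trim cuts the input to a single 30-second window, so the low-level decode above only transcribes the first 30 seconds. For longer recordings, the high-level transcribe API slides over the whole file (a short sketch, reusing the same 1.wav and base model as above):

import whisper

model = whisper.load_model("base")

# transcribe() handles chunking, language detection, and decoding internally,
# and returns a dict with the full text plus per-segment timestamps.
result = model.transcribe("1.wav")
print(result["text"])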