Installing faster-whisper on an M1 MacBook Air

The following article describes how to run it on an M1 Mac with 16 GB of memory:
https://microai.jp/blog/6cf7f278-e06a-42e1-953b-67eedd9c1ff8

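"Apple Silicon用に最適化されたML用のライブラリMLXで最速Whisperを試す" (Trying the fastest Whisper with MLX, an ML library optimized for Apple Silicon)
is older, but should also be a useful reference.

"M1 Macで音声認識AI Whisper" (Speech recognition AI Whisper on an M1 Mac)
covers this as well.

A GPU does seem to be needed after all.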

Execution time
Result of running on an M1 Mac (16 GB of memory), without CUDA (compute type int8):
Model loading: 4.79 s
Transcription: 93.72 s
Without a GPU it does appear to be slow.
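The figures above separate model loading from transcription. Below is a minimal sketch of how such numbers could be measured, using only the standard library. The helpers `timed` and `transcribe_all` are hypothetical names introduced here, not part of faster-whisper. One point to keep in mind: faster-whisper returns its segments as a lazy generator, so the transcription time only makes sense if the generator is fully consumed inside the timed call.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def transcribe_all(model, path: str) -> list:
    """Consume the lazy segment generator so the real work happens here."""
    segments, _info = model.transcribe(path)
    return list(segments)


# Intended usage (requires faster-whisper and a real audio file):
#   from faster_whisper import WhisperModel
#   model, load_s = timed(lambda: WhisperModel("large-v3", device="cpu", compute_type="int8"))
#   segs, run_s = timed(lambda: transcribe_all(model, "output.mp3"))
#   print(f"model load: {load_s:.2f}s, transcription: {run_s:.2f}s")
```

Passing a zero-argument lambda keeps the helper independent of what is being measured, so the same function covers both steps.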

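I considered the MLX version of Whisper, but according to
https://qiita.com/syukan3/items/5cdf2735d81d438929a9
16 GB of memory is apparently not enough.

So I'll follow an article written for a similar environment,

"文字起こしのライブラリ faster-whisper が超簡単で超早くて超正確！" (The transcription library faster-whisper is super easy, super fast, and super accurate!)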

mkdir whisper
cd whisper

creates the working directory and moves into it.

pip install faster-whisper

Next, pydub (a third-party library for handling audio files) and pandas:

pip install pydub
pip install pandas

openpyxl, for reading and writing Excel files (.xlsx format):

pip install openpyxl

is installed as well.

brew install ffmpeg

installs ffmpeg as well.

However, this has to be run as

arch -arm64 brew install ffmpeg

Otherwise the following error appears:

ffmpeg 7.0.1 is already installed but outdated (so it will be upgraded).
Error: Cannot install under Rosetta 2 in ARM default prefix (/opt/homebrew)!
To rerun under ARM use:
    arch -arm64 brew install ...
To install under x86_64, install Homebrew into /usr/local.

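This happens because the shell is running under Rosetta 2 (x86_64) while Homebrew is installed in the ARM prefix /opt/homebrew,
so brew has to be run as an arm64 process.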
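To see whether a given shell or Python process is itself running under Rosetta 2, something like the following could be used. It is a minimal sketch that relies on the macOS `sysctl.proc_translated` key; `is_rosetta` is a hypothetical helper name, and on non-macOS systems it simply reports False.

```python
import platform
import subprocess


def is_rosetta() -> bool:
    """Return True if this process is being translated by Rosetta 2."""
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # sysctl is not available at all (not macOS)
        return False
    # The key is "1" when translated, "0" when native, and missing elsewhere
    return out.returncode == 0 and out.stdout.strip() == "1"


# On an Apple silicon Mac: arm64 when native, x86_64 under Rosetta
print("machine:", platform.machine())
print("rosetta:", is_rosetta())
```

If this reports Rosetta, prefixing the command with arch -arm64 as above, or opening a terminal that is not set to run under Rosetta, should avoid the Homebrew error.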

After running it, the output ends with

Removing: /Users/snowpool/Library/Caches/Homebrew/python-packaging_bottle_manifest--24.0... (1.8KB)
Pruned 0 symbolic links and 2 directories from /opt/homebrew
==> Caveats
==> rust
zsh completions have been installed to:
  /opt/homebrew/share/zsh/site-functions

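The explanation of this output:

This message shows that Homebrew cleared its cache and
cleaned up symbolic links and directories.
It also notes that zsh completions for Rust were installed to
/opt/homebrew/share/zsh/site-functions.

No errors are shown, so
the ffmpeg installation or upgrade can be assumed to have gone through.
As a next step, to confirm that ffmpeg is installed correctly,
check the version information with the following command.

So I ran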

ffmpeg -version

which printed

ffmpeg version 7.0.2 Copyright (c) 2000-2024 the FFmpeg developers
built with Apple clang version 15.0.0 (clang-1500.3.9.4)
configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/7.0.2 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libharfbuzz --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox --enable-neon
libavutil      59.  8.100 / 59.  8.100
libavcodec     61.  3.100 / 61.  3.100
libavformat    61.  1.100 / 61.  1.100
libavdevice    61.  1.100 / 61.  1.100
libavfilter    10.  1.100 / 10.  1.100
libswscale      8.  1.100 /  8.  1.100
libswresample   5.  1.100 /  5.  1.100
libpostproc    58.  1.100 / 58.  1.100

so there is no problem.
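The same check can be done from Python before starting a long job. This is a minimal sketch using only the standard library. `parse_ffmpeg_version` and `installed_ffmpeg_version` are hypothetical helper names, and the parsing assumes the banner starts with "ffmpeg version ..." as in the output above.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_ffmpeg_version(banner: str) -> Optional[str]:
    """Extract the version token from the first line of `ffmpeg -version`."""
    match = re.match(r"ffmpeg version (\S+)", banner)
    return match.group(1) if match else None


def installed_ffmpeg_version() -> Optional[str]:
    """Return the installed ffmpeg version, or None if ffmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    out = subprocess.run([exe, "-version"], capture_output=True, text=True)
    return parse_ffmpeg_version(out.stdout)


print("ffmpeg:", installed_ffmpeg_version())
```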

pip install torch

is installed too.

from faster_whisper import WhisperModel

# Load the large-v3 model
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# Transcribe the audio file to text
segments, info = model.transcribe("path_to_your_audio_file.wav")

# Print the result
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")

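is the usual way to write it, but
on an M1 Mac CUDA is not available.

Also, apparently,
when using the faster-whisper library,
the specified model (for example large-v3) is downloaded automatically the first time it is used.
The model is stored locally, so it does not have to be downloaded again afterwards.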
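To see what has already been downloaded, the local cache can be listed. The sketch below rests on an assumption not stated in this post: that the models are fetched through the Hugging Face Hub and cached under ~/.cache/huggingface/hub (or wherever the HF_HUB_CACHE or HF_HOME environment variables point), in directories named models--<owner>--<name>. The function names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Optional


def default_hub_cache() -> Path:
    """Guess the Hugging Face Hub cache directory (assumed layout)."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def cached_whisper_models(cache_dir: Optional[Path] = None) -> List[str]:
    """Return the names of cached model directories that mention 'whisper'."""
    root = cache_dir or default_hub_cache()
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.glob("models--*") if p.is_dir() and "whisper" in p.name
    )


print(cached_whisper_models())
```

An empty list before the first run and a whisper entry afterwards would be consistent with the download-once behavior described above.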

touch test.py

creates the file, with the following contents:

from faster_whisper import WhisperModel

# Load the large-v3 model (on the CPU)
model = WhisperModel("large-v3", device="cpu", compute_type="int8")

# Transcribe the audio file to text
segments, info = model.transcribe("path_to_your_audio_file.wav")

# Print the result
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")

Run it:

python test.py

Downloading the model took about 10 minutes.

Downloading config.json: 100%|█████████████| 2.39k/2.39k [00:00<00:00, 8.90MB/s]
Downloading (…)rocessor_config.json: 100%|█████| 340/340 [00:00<00:00, 1.15MB/s]
Downloading tokenizer.json: 100%|██████████| 2.48M/2.48M [00:01<00:00, 2.08MB/s]
Downloading vocabulary.json: 100%|██████████| 1.07M/1.07M [00:01<00:00, 726kB/s]
Downloading model.bin: 100%|███████████████| 3.09G/3.09G [05:35<00:00, 9.21MB/s]
Traceback (most recent call last):
  File "/Users/snowpool/aw10s/whisper/test.py", line 7, in <module>
    segments, info = model.transcribe("path_to_your_audio_file.wav")
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 319, in transcribe
    audio = decode_audio(audio, sampling_rate=sampling_rate)
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/faster_whisper/audio.py", line 46, in decode_audio
    with av.open(input_file, mode="r", metadata_errors="ignore") as container:
  File "av/container/core.pyx", line 420, in av.container.core.open
  File "av/container/core.pyx", line 266, in av.container.core.Container.__cinit__
  File "av/container/core.pyx", line 286, in av.container.core.Container.err_check
  File "av/error.pyx", line 326, in av.error.err_check
av.error.FileNotFoundError: [Errno 2] No such file or directory: 'path_to_your_audio_file.wav'

was the result.

The cause is simple: no actual audio file was specified.
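Because the error only surfaced after the model had been downloaded and loaded, it may be worth checking the input path first. A minimal sketch, where `require_audio_file` is a hypothetical helper name:

```python
from pathlib import Path


def require_audio_file(path: str) -> Path:
    """Return the path if it points to an existing file, otherwise fail early."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"audio file not found: {p}")
    return p


# Intended usage: call this before constructing WhisperModel, e.g.
#   audio = require_audio_file("output.mp3")
#   model = WhisperModel("large-v3", device="cpu", compute_type="int8")
#   segments, info = model.transcribe(str(audio))
```

With this in place, a placeholder such as path_to_your_audio_file.wav fails immediately instead of after the model load.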

Following
https://microai.jp/blog/6cf7f278-e06a-42e1-953b-67eedd9c1ff8
I tried downloading the YouTube video at
https://www.youtube.com/shorts/tLxGgAVvLwU

yt-dlp -x --audio-format mp3 --output "output.mp4" "https://www.youtube.com/shorts/tLxGgAVvLwU"

Running this prints

[youtube] Extracting URL: https://www.youtube.com/shorts/tLxGgAVvLwU
[youtube] tLxGgAVvLwU: Downloading webpage
[youtube] tLxGgAVvLwU: Downloading ios player API JSON
[youtube] tLxGgAVvLwU: Downloading android player API JSON
WARNING: [youtube] YouTube said: ERROR - Precondition check failed.
WARNING: [youtube] HTTP Error 400: Bad Request. Retrying (1/3)...
[youtube] tLxGgAVvLwU: Downloading android player API JSON
WARNING: [youtube] YouTube said: ERROR - Precondition check failed.
WARNING: [youtube] HTTP Error 400: Bad Request. Retrying (2/3)...
[youtube] tLxGgAVvLwU: Downloading android player API JSON
WARNING: [youtube] YouTube said: ERROR - Precondition check failed.
WARNING: [youtube] HTTP Error 400: Bad Request. Retrying (3/3)...
[youtube] tLxGgAVvLwU: Downloading android player API JSON
WARNING: [youtube] YouTube said: ERROR - Precondition check failed.
WARNING: [youtube] Unable to download API page: HTTP Error 400: Bad Request (caused by <HTTPError 400: Bad Request>); please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
[youtube] tLxGgAVvLwU: Downloading player 53afa3ce
WARNING: [youtube] tLxGgAVvLwU: nsig extraction failed: You may experience throttling for some formats
         n = nvo-1xpBtSvwgEdI4 ; player = https://www.youtube.com/s/player/53afa3ce/player_ias.vflset/en_US/base.js
WARNING: [youtube] tLxGgAVvLwU: nsig extraction failed: You may experience throttling for some formats
         n = TE4MbILAq1Ti1ox53 ; player = https://www.youtube.com/s/player/53afa3ce/player_ias.vflset/en_US/base.js
[youtube] tLxGgAVvLwU: Downloading m3u8 information
[info] tLxGgAVvLwU: Downloading 1 format(s): 140
[download] Destination: output.mp4
[download] 100% of 786.76KiB in 00:00:00 at 3.80MiB/s
[FixupM4a] Correcting container of "output.mp4"
[ExtractAudio] Destination: output.mp4.mp3

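and output.mp4.mp3 is created.

This is the result of yt-dlp extracting the audio from the downloaded file and converting it to MP3. Because the output name was given as output.mp4, the .mp3 extension was appended to that name.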
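The double extension can probably be avoided by letting yt-dlp fill in the extension itself through the `%(ext)s` field of its output template. The sketch below only builds the command line. `build_ytdlp_audio_command` is a hypothetical helper name, and the download itself is left commented out.

```python
from typing import List


def build_ytdlp_audio_command(url: str, stem: str = "output") -> List[str]:
    """Build a yt-dlp command that extracts audio as MP3 into <stem>.mp3."""
    return [
        "yt-dlp",
        "-x",                           # extract audio only
        "--audio-format", "mp3",        # convert the extracted audio to MP3
        "--output", f"{stem}.%(ext)s",  # yt-dlp substitutes the actual extension
        url,
    ]


cmd = build_ytdlp_audio_command("https://www.youtube.com/shorts/tLxGgAVvLwU")
print(" ".join(cmd))
# To actually download, pass cmd to subprocess.run(cmd, check=True)
```

With this template the final file should come out as output.mp3, which would make a later rename unnecessary.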

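To download an MP4 file instead (note that -f mp4 selects an MP4 format, which in the log below is format 18, a video file rather than an audio-only one), the suggestion was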

yt-dlp -f mp4 --output "output.mp4" "https://www.youtube.com/shorts/tLxGgAVvLwU"

but this gives

[youtube] Extracting URL: https://www.youtube.com/shorts/tLxGgAVvLwU
[youtube] tLxGgAVvLwU: Downloading webpage
[youtube] tLxGgAVvLwU: Downloading ios player API JSON
[youtube] tLxGgAVvLwU: Downloading android player API JSON
WARNING: [youtube] YouTube said: ERROR - Precondition check failed.
WARNING: [youtube] HTTP Error 400: Bad Request. Retrying (1/3)...
[youtube] tLxGgAVvLwU: Downloading android player API JSON
WARNING: [youtube] YouTube said: ERROR - Precondition check failed.
WARNING: [youtube] HTTP Error 400: Bad Request. Retrying (2/3)...
[youtube] tLxGgAVvLwU: Downloading android player API JSON
WARNING: [youtube] YouTube said: ERROR - Precondition check failed.
WARNING: [youtube] HTTP Error 400: Bad Request. Retrying (3/3)...
[youtube] tLxGgAVvLwU: Downloading android player API JSON
WARNING: [youtube] YouTube said: ERROR - Precondition check failed.
WARNING: [youtube] Unable to download API page: HTTP Error 400: Bad Request (caused by <HTTPError 400: Bad Request>); please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
[youtube] tLxGgAVvLwU: Downloading player 53afa3ce
WARNING: [youtube] tLxGgAVvLwU: nsig extraction failed: You may experience throttling for some formats
         n = xIaDKrIXPH5wjB2nw ; player = https://www.youtube.com/s/player/53afa3ce/player_ias.vflset/en_US/base.js
WARNING: [youtube] tLxGgAVvLwU: nsig extraction failed: You may experience throttling for some formats
         n = yF3boOCKXfHqyR5qo ; player = https://www.youtube.com/s/player/53afa3ce/player_ias.vflset/en_US/base.js
[youtube] tLxGgAVvLwU: Downloading m3u8 information
[info] tLxGgAVvLwU: Downloading 1 format(s): 18
ERROR: unable to download video data: HTTP Error 403: Forbidden

This error occurs because YouTube is blocking the access.
It tends to happen when yt-dlp can no longer reach certain API or player information.

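Using the --force-generic-extractor option or
the --referer option
did not help either.

For now it is enough to be able to test transcribing some audio,
so I'll put this on hold.

Next, I was going to run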

import torch
from faster_whisper import WhisperModel

target_file = "sample.mp4"
model_size = "large-v3"
compute_type_for_gpu = "float16" # or int8_float16
compute_type_for_cpu = "int8"
beam_size = 5

# Check whether CUDA is available
if torch.cuda.is_available():
    model = WhisperModel(model_size, device="cuda", compute_type=compute_type_for_gpu)
else:
    # Use INT8 on the CPU
    model = WhisperModel(model_size, device="cpu", compute_type=compute_type_for_cpu)


segments, info = model.transcribe(target_file, beam_size=beam_size)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

but since this is an M1 Mac, I asked GPT whether the source could be adapted:

import torch
from faster_whisper import WhisperModel

target_file = "sample.mp4"
model_size = "large-v3"
compute_type_for_gpu = "float16" # or int8_float16
compute_type_for_cpu = "int8"
beam_size = 5

# Check whether MPS (the Apple silicon GPU) is available
if torch.backends.mps.is_available():
    model = WhisperModel(model_size, device="mps", compute_type=compute_type_for_gpu)
else:
    # Use INT8 on the CPU
    model = WhisperModel(model_size, device="cpu", compute_type=compute_type_for_cpu)

segments, info = model.transcribe(target_file, beam_size=beam_size)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

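GPT's answer was that

the code can be changed in this way to use the mps (Metal Performance Shaders) device
and take advantage of the Apple silicon GPU.

So the next question was:
what needs to be configured or installed in order to use the mps device?

Installing PyTorch
The mps backend is supported in PyTorch 1.12.0 and later.
First, the latest PyTorch needs to be installed: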

pip install torch torchvision torchaudio

Next, to check whether it can be used, open

vim check.py

and enter

import torch

print(torch.backends.mps.is_available())

Save it, then run

python check.py

which prints

True

so MPS is available to PyTorch.

xcode-select --install

was also run, just in case.

xcode-select: note: Command line tools are already installed. Use "Software Update" in System Settings or the softwareupdate command line interface to install updates

The command line tools were already installed.

mv output.mp4.mp3 output.mp3

renames the file.

Next,

touch sample.py

creates sample.py, with the following contents:

import torch
from faster_whisper import WhisperModel

target_file = "output.mp3"
model_size = "large-v3"
compute_type_for_gpu = "float16" # or int8_float16
compute_type_for_cpu = "int8"
beam_size = 5

# Check whether MPS (the Apple silicon GPU) is available
if torch.backends.mps.is_available():
    model = WhisperModel(model_size, device="mps", compute_type=compute_type_for_gpu)
else:
    # Use INT8 on the CPU
    model = WhisperModel(model_size, device="cpu", compute_type=compute_type_for_cpu)

segments, info = model.transcribe(target_file, beam_size=beam_size)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

Save it.

Running it gave

Traceback (most recent call last):
  File "/Users/snowpool/aw10s/whisper/sample.py", line 12, in <module>
    model = WhisperModel(model_size, device="mps", compute_type=compute_type_for_gpu)
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 145, in __init__
    self.model = ctranslate2.models.Whisper(
ValueError: unsupported device mps

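So in the end MPS is not supported: the error is raised by CTranslate2, which faster-whisper runs on, and it does not accept mps as a device.
I decided to run on the CPU instead: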
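Since the traceback shows that the device is decided by CTranslate2 rather than PyTorch, the check could also be made against CTranslate2 directly, which removes the need for torch in this script. This is a sketch under that assumption. `pick_device` and `detect_cuda_device_count` are hypothetical helper names, and `ctranslate2.get_cuda_device_count()` is used on the understanding that it reports the number of usable CUDA devices (0 on an M1 Mac).

```python
from typing import Tuple


def pick_device(cuda_device_count: int) -> Tuple[str, str]:
    """Map the number of CUDA devices to (device, compute_type)."""
    if cuda_device_count > 0:
        return "cuda", "float16"
    # No CUDA (for example on Apple silicon): CPU with INT8
    return "cpu", "int8"


def detect_cuda_device_count() -> int:
    """Ask CTranslate2 how many CUDA devices it can see (0 if not installed)."""
    try:
        import ctranslate2  # installed as a dependency of faster-whisper
    except ImportError:
        return 0
    return ctranslate2.get_cuda_device_count()


device, compute_type = pick_device(detect_cuda_device_count())
print(device, compute_type)
# Intended usage:
#   model = WhisperModel("large-v3", device=device, compute_type=compute_type)
```

Keeping the decision in a small pure function makes it easy to see that MPS never enters the picture, whatever torch.backends.mps.is_available() returns.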

import torch
from faster_whisper import WhisperModel

target_file = "output.mp3"
model_size = "large-v3"
compute_type_for_gpu = "float16" # or int8_float16
compute_type_for_cpu = "int8"
beam_size = 5

# Check whether CUDA is available
if torch.cuda.is_available():
    model = WhisperModel(model_size, device="cuda", compute_type=compute_type_for_gpu)
else:
    # Use INT8 on the CPU
    model = WhisperModel(model_size, device="cpu", compute_type=compute_type_for_cpu)


segments, info = model.transcribe(target_file, beam_size=beam_size)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

and run it again:

 python sample.py > output.txt

Running this gives

[0.00s -> 1.78s] 食べないダイエットでリバウンドを繰り返し
[1.78s -> 3.78s] 正しく食べることでやっと痩せられた私の
[3.78s -> 5.48s] 一日の食事を解説していきます
[5.48s -> 7.44s] 朝はバナナとオールブランで糖質
[7.44s -> 8.40s] くるみで脂質
[8.40s -> 9.52s] 無脂肪ヨーグルトと
[9.52s -> 12.00s] 低糖質低脂質のミルクで脂肪質とってます
[12.00s -> 13.94s] 詳しい作り方はYouTubeで紹介してます
[13.94s -> 16.02s] 昼はサラダ、胸からお米です
[16.02s -> 17.92s] 胸からは揚げ物なんだけど
[17.92s -> 19.72s] 脂質が低くておすすめです
[19.72s -> 21.94s] 運動量も身長も平均な方であれば
[21.94s -> 23.72s] 4歩から5歩ぐらいがおすすめです
[23.72s -> 24.68s] 夜はサラダと
[24.68s -> 27.28s] あやのさんがヘルシーダイエットクラブで教えてくれた
[27.28s -> 28.84s] バターチキンカレーです
[28.84s -> 31.40s] カレーもノーオイルで自分で手作りしたりとか
[31.40s -> 33.86s] こういう脂質が低いカレールーを使ったりとか
[33.86s -> 37.04s] 市販のレトルトカレーでも脂質が低いものがあるので
[37.04s -> 39.22s] ダイエット中でもカレーを楽しめます
[39.22s -> 41.08s] 市販のものの裏面を見るコツは
[41.08s -> 42.64s] YouTubeで詳しく解説してます
[42.64s -> 45.68s] 自分の食事の適正量や過不足がわからないよっていう方は
[45.68s -> 46.56s] 数日でいいので
[46.56s -> 48.16s] アスケンなどで可視化してみると
[48.16s -> 49.52s] 気づきがあるかもしれません

so the transcription worked.
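The lines written to output.txt have a regular shape, so they can be turned back into structured data if needed. A minimal sketch, assuming the "[start -> end] text" format printed by the script above. `parse_segment_line` is a hypothetical helper name, and lines that do not match (such as the "Detected language" line) yield None.

```python
import re
from typing import Optional, Tuple

# Matches lines such as "[0.00s -> 1.78s] some text"
SEGMENT_RE = re.compile(r"^\[(\d+\.\d+)s -> (\d+\.\d+)s\] ?(.*)$")


def parse_segment_line(line: str) -> Optional[Tuple[float, float, str]]:
    """Parse one output line into (start_seconds, end_seconds, text)."""
    match = SEGMENT_RE.match(line.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2)), match.group(3)


print(parse_segment_line("[5.48s -> 7.44s] 朝はバナナとオールブランで糖質"))
```

The resulting tuples could then be loaded into a pandas DataFrame and written out with to_excel, which is presumably why pandas and openpyxl were installed earlier.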

Next, I'll make it possible to transcribe from the microphone.
