Browser-use: DeepSeek R1 + Ollama (failed due to errors)

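I was curious about the article "Getting browser-use running with Gemini (installing from the repository as a stopgap until langchain-google-genai 2.0.8 is released on PyPI)".
However, that repository install is only needed to run Gemini,
so it has nothing to do with DeepSeek.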

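The article "Running DeepSeek R1 on a MacBook Air: how to use R1's six 'distilled' models"
says:
"When I actually ran DeepSeek-R1-Distill-Llama-8B on my MacBook Air, it ran at a perfectly usable speed, which really opens up possibilities.
Incidentally, DeepSeek R1 can also be run on iPhone / Android."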

There is also "Easy local DeepSeek-R1 setup with a WebUI & a guide to choosing a model",
but it was run on a very different setup:
* Windows 11 WSL2 (2.3.26.0)
* Ubuntu (22.04.3 LTS)
* Docker Desktop (27.3.1)
* PC specs: RTX 4070 VRAM 12GB / RAM 32GB

The tasks I want to automate involve logging in, so I want to run DeepSeek R1 locally.
I looked into which models can run on an M1 MacBook Air with 16GB of memory.

I asked GPT, and for the distilled models it answered:

Here is a summary of the DeepSeek R1 distilled models that can run on an M1 MacBook Air (16GB).
Recommended models
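1. DeepSeek R1 1.5B distilled model
* Model name: deepseek-r1:1.5b-qwen-distill-q4_K_M
* Parameters: 1.5B (1.5 billion)
* Size: about 1.1 GB
* Recommended memory: 8GB or more
* Notes: The smallest distilled model. It is lightweight and fast, and it suits basic tasks and light workloads.
2. DeepSeek R1 7B distilled model
* Model name: deepseek-r1:7b-qwen-distill-q4_K_M
* Parameters: 7B (7 billion)
* Size: about 4.7 GB
* Recommended memory: 16GB
* Notes: Offers balanced performance and suits general NLP tasks. It runs comfortably on an M1 MacBook Air.
3. DeepSeek R1 8B distilled model
* Model name: deepseek-r1:8b-llama-distill-q4_K_M
* Parameters: 8B (8 billion)
* Size: about 4.9 GB
* Recommended memory: 16GB
* Notes: A somewhat larger model that still runs on an M1 MacBook Air. It suits more advanced tasks and more complex processing.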
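The listed sizes roughly match a back-of-the-envelope calculation: weight size ≈ parameters × average bits per weight ÷ 8. Below is a minimal sketch of that calculation. It assumes q4_K_M averages about 4.85 bits per weight. That figure is my own approximation and does not come from the list above. The sketch also ignores the KV cache and runtime overhead.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Rough size of the quantized weights in GB (10**9 bytes).

    bits_per_weight=4.85 is an ASSUMED average for q4_K_M.
    KV cache and runtime overhead are not included.
    """
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# Nominal parameter counts taken from the list above
for tag, params in [("1.5b-qwen", 1.5), ("7b-qwen", 7.0), ("8b-llama", 8.0)]:
    print(f"{tag}: ~{estimate_weights_gb(params):.1f} GB of weights")
```

Treat the result as a lower bound. The nominal counts slightly understate the real ones; for example, the Qwen-based 7B model has about 7.6B parameters. The context also needs memory on top of the weights. On a 16GB M1, that memory is shared with macOS and with the browser that browser-use launches.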

My goal is simply to get browser-use working, so I'll experiment with
* deepseek-r1:8b-llama-distill-q4_K_M

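I searched for
deepseek-r1:8b-llama-distill-q4_K_M

and also asked GPT:

Summarize the video at https://www.youtube.com/watch?v=oUBeJkKwBcc in Japanese and show the setup steps.

However, the run instructions it returned were for the non-distilled model.
Still, summarizing a video this way and pulling the setup steps out of it is handy.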

I followed the article "[Distilled models] Recommended models for running DeepSeek-R1 locally".

All available model tags are listed on the official tag page:
https://ollama.com/library/deepseek-r1/tags

Searching that page for
8b-llama-distill-q4_K_M
finds the model, so I'll use this one.

The install command is

ollama pull deepseek-r1:8b-llama-distill-q4_K_M

This installs the model. Next, I'll try browser-use with DeepSeek.
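To confirm the pull worked, run "ollama list", which shows the installed models. You can do the same check from Python through Ollama's local REST API (the /api/tags endpoint). Below is a minimal sketch. It assumes the server listens on the default http://localhost:11434. The helper names are my own.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address
MODEL = "deepseek-r1:8b-llama-distill-q4_K_M"


def installed_models(tags_response: dict) -> list:
    """Return the model names contained in a /api/tags JSON response."""
    return [m.get("name", "") for m in tags_response.get("models", [])]


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """GET /api/tags from a running Ollama server and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Usage (requires the Ollama server to be running):
# print(MODEL in installed_models(fetch_tags()))
```

The JSON parsing is kept separate from the HTTP call, so the check can be tested without a server.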

import os

from langchain_community.llms import Ollama  # use Ollama
from browser_use import Agent
from browser_use.browser.browser import Browser, BrowserConfig
from browser_use.controller.service import Controller

import asyncio

controller = Controller()
agent = None  # assigned in main(); the custom action below reads it

@controller.registry.action('Save a screenshot')
async def save_screenshot(filename: str = "screenshot.png") -> str:
    # Screenshot the page the agent is currently on
    page = await agent.browser_context.get_current_page()
    await page.screenshot(path=filename)
    return f"Saved screenshot as {filename}"

async def main():
    global agent
    llm = Ollama(model="deepseek-r1:8b-llama-distill-q4_K_M")  # specify the local Ollama model
    agent = Agent(
        task="""
        1. Open https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc
        2. Click the 「フルスクリーン」 (Full screen) button, then wait a few seconds
        3. Save a screenshot as step-{n}.png
        """,
        llm=llm,
        controller=controller,
        browser=Browser(config=BrowserConfig(
            disable_security=True,
            headless=False,
        )),
    )
    result = await agent.run()
    print(result)

if __name__ == "__main__":
    asyncio.run(main())

and ran it.

However, it failed with:

Traceback (most recent call last):
  File "/Users/snowpool/aw10s/deepseekR1/advertisement.py", line 3, in <module>
    from langchain_community.llms import Ollama  # use Ollama
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'langchain_community'

so I installed the packages:

pip install langchain-community
pip install -U langchain-ollama
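The ModuleNotFoundError means the package is not installed for the Python interpreter that runs the script. The pip package names use hyphens (langchain-community, langchain-ollama), while the import names use underscores. Running "python -m pip install ..." makes pip install into that same interpreter. Below is a minimal sketch that checks the import names before starting a long browser-use run. The helper and list names are my own.

```python
import importlib.util

# Import names used by the scripts in this post (not the pip package names)
REQUIRED = ["langchain_community", "langchain_ollama", "browser_use"]


def missing_modules(names: list) -> list:
    """Return the module names that cannot be found in the current environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("missing:", ", ".join(missing))
else:
    print("all modules found")
```

For top-level names, find_spec only looks the module up and returns None if it is missing, without importing it.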

After installing them, I switched the LLM to the langchain-ollama class:
from langchain_ollama import OllamaLLM
llm = OllamaLLM(model="deepseek-r1:8b-llama-distill-q4_K_M")

but running it again gave:

INFO     [browser_use] BrowserUse logging setup complete with level info
INFO     [root] Anonymized telemetry enabled. See https://github.com/browser-use/browser-use for more information.
INFO     [agent] 🚀 Starting task: 
        1. Open https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc
        2. Click the 「フルスクリーン」 (Full screen) button, then wait a few seconds
        3. Save a screenshot as step-{n}.png
        
INFO     [agent] 
📍 Step 1
ERROR    [agent] ❌ Result failed 1/3 times:
 
INFO     [agent] 
📍 Step 1
ERROR    [agent] ❌ Result failed 2/3 times:
 
INFO     [agent] 
📍 Step 1
ERROR    [agent] ❌ Result failed 3/3 times:
 
ERROR    [agent] ❌ Stopping due to 3 consecutive failures
INFO     [agent] Created GIF at agent_history.gif
AgentHistoryList(all_results=[ActionResult(is_done=False, extracted_content=None, error='', include_in_memory=True), ActionResult(is_done=False, extracted_content=None, error='', include_in_memory=True), ActionResult(is_done=False, extracted_content=None, error='', include_in_memory=True)], all_model_outputs=[])

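So I checked whether Ollama itself works:

ollama run deepseek-r1:8b-llama-distill-q4_K_M --prompt "こんにちは。あなたは何ができますか？"

This fails with:

Error: unknown flag: --prompt

"ollama run" has no --prompt flag. The prompt is passed as a positional argument instead, so I ran the following (the prompt means "Hello. What can you do?"):

ollama run deepseek-r1:8b-llama-distill-q4_K_M "こんにちは。あなたは何ができますか？"

The output was: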

<think>
Alright, the user greeted me with "こんにちは。" which is Japanese for 
"Hello." I should respond in a friendly manner.

I need to let them know what I can do. Since I'm an AI assistant, I should 
mention various tasks like answering questions, providing information, and 
helping with problems.

It's important to keep it simple and conversational, avoiding any 
technical jargon.

I'll make sure my response is welcoming and open-ended to encourage them 
to ask more.
</think>

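こんにちは！私は人工知能助手です。何ができますか？回答、情報検索、問題解決など、さまざまなタスクを担当します。どのようなお手伝いが必要ですか？
(In English: "Hello! I am an AI assistant. What can I do? I handle a variety of tasks such as answering questions, searching for information, and solving problems. How can I help you?")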

The model itself responds without any problem.
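The raw output above starts with a <think>...</think> reasoning block before the actual answer. Below is a minimal sketch that sends the same prompt through Ollama's REST API (POST /api/generate with stream set to false) and strips that block with a regular expression. It assumes the default server address, and that the reasoning appears inline in the response text, as it does in the output above. The function names are my own.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove DeepSeek R1's <think>...</think> reasoning block(s)."""
    return THINK_RE.sub("", text).strip()


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL) -> str:
    """Non-streaming call to Ollama's /api/generate; needs a running server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,  # passing data makes urllib send a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]


# Usage (with Ollama running):
# print(strip_think(generate("deepseek-r1:8b-llama-distill-q4_K_M",
#                            "こんにちは。あなたは何ができますか？")))
```

The non-greedy .*? with DOTALL removes each reasoning block separately, even when it spans several lines.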

browser-use still failed, though, so I tried a Cookpad search instead:

import os

from langchain_ollama import OllamaLLM
from browser_use import Agent
from browser_use.browser.browser import Browser, BrowserConfig
from browser_use.controller.service import Controller

import asyncio

controller = Controller()
agent = None  # assigned in main(); the custom action below reads it

@controller.registry.action('Save a screenshot')
async def save_screenshot(filename: str = "screenshot.png") -> str:
    # Screenshot the page the agent is currently on
    page = await agent.browser_context.get_current_page()
    await page.screenshot(path=filename)
    return f"Saved screenshot as {filename}"

async def main():
    global agent
    llm = OllamaLLM(model="deepseek-r1:8b-llama-distill-q4_K_M")  # local Ollama model
    agent = Agent(
        task="""
        1. Open https://cookpad.com/jp and wait until the page has fully loaded.
        2. Type "ぶり大根" into the search box at the top of the page and click the search button.
        3. When the search results appear, click the first recipe.
        4. Each time a step is completed, save a screenshot as step-{n}.png.
        """,
        llm=llm,
        controller=controller,
        browser=Browser(config=BrowserConfig(
            disable_security=True,
            headless=False,
        )),
    )
    result = await agent.run()
    print(result)

if __name__ == "__main__":
    asyncio.run(main())

But this also fails:

INFO     [browser_use] BrowserUse logging setup complete with level info
INFO     [root] Anonymized telemetry enabled. See https://github.com/browser-use/browser-use for more information.
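INFO     [agent] 🚀 Starting task: 
        1. Open https://cookpad.com/jp and wait until the page has fully loaded.
        2. Type "ぶり大根" into the search box at the top of the page and click the search button.
        3. When the search results appear, click the first recipe.
        4. Each time a step is completed, save a screenshot as step-{n}.png.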
        
INFO     [agent] 
📍 Step 1
ERROR    [agent] ❌ Result failed 1/3 times:
 
INFO     [agent] 
📍 Step 1
ERROR    [agent] ❌ Result failed 2/3 times:
 
INFO     [agent] 
📍 Step 1
ERROR    [agent] ❌ Result failed 3/3 times:
 
ERROR    [agent] ❌ Stopping due to 3 consecutive failures
INFO     [agent] Created GIF at agent_history.gif
AgentHistoryList(all_results=[ActionResult(is_done=False, extracted_content=None, error='', include_in_memory=True), ActionResult(is_done=False, extracted_content=None, error='', include_in_memory=True), ActionResult(is_done=False, extracted_content=None, error='', include_in_memory=True)], all_model_outputs=[])

It fails in exactly the same way.
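Both runs stop at Step 1 with an empty error='' and all_model_outputs=[], so the model never produced a usable action. Here is my guess at the cause; I have not verified it. browser-use asks the LLM for structured output through LangChain's with_structured_output. Chat-model classes such as ChatOllama implement that method. OllamaLLM, however, is a plain text-completion class, and its base implementation raises a bare NotImplementedError. An exception with no message converts to an empty string, which would look exactly like the blank error above. Below is a small helper that keeps the exception type visible. It is hypothetical and not part of browser-use.

```python
def describe_error(exc: BaseException) -> str:
    """Format an exception so that message-less errors stay identifiable.

    str(NotImplementedError()) is "", which is how a log line can end up
    with an empty error message.
    """
    message = str(exc)
    name = type(exc).__name__
    return f"{name}: {message}" if message else name


# Hypothetical check of the guess above (needs langchain-ollama installed):
# from langchain_ollama import OllamaLLM
# llm = OllamaLLM(model="deepseek-r1:8b-llama-distill-q4_K_M")
# try:
#     llm.with_structured_output({"title": "Action", "type": "object"})
# except Exception as exc:
#     print(describe_error(exc))  # prints NotImplementedError if the guess is right
```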
