Taking screenshots with Gemini

I first gave it this prompt:


Go to the 杏林堂薬局/袋井西田店 store page on the flyer site Shufoo! (www.shufoo.net), click the image labeled 「日替」 (daily specials), then click 「プリント」 (Print),
choose 「PDFに保存」 (Save as PDF) as the destination, and click 「保存」 (Save).

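With this prompt, the agent froze at the print dialog.
I suspect the print dialog is not part of the browser page itself, so it falls outside what the agent can control.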

So switching to fullscreen and taking a screenshot
seems to be the right approach.

For the screenshot part, I followed these two articles:
https://zenn.dev/gunjo/articles/2f6898b846d371

https://zenn.dev/kbyk/articles/3e997a2f762018

Here is the code:

```python
import asyncio

from langchain_google_genai import ChatGoogleGenerativeAI
from browser_use import Agent
from browser_use.browser.browser import Browser, BrowserConfig
from browser_use.controller.service import Controller

# ChatGoogleGenerativeAI reads the API key from the GOOGLE_API_KEY environment variable
controller = Controller()
agent = None  # assigned in main(); the custom action uses it to reach the browser


@controller.registry.action('Save a screenshot')
async def save_screenshot(filename: str = "screenshot.png") -> str:
    # Capture whatever page the agent currently has open
    page = await agent.browser_context.get_current_page()
    await page.screenshot(path=filename)
    return f"Saved the screenshot as {filename}"


async def main():
    global agent
    llm = ChatGoogleGenerativeAI(model="gemini-pro")  # use the Gemini Pro model
    agent = Agent(
        task="""
        Perform the following steps:
        1. Go to https://s.shufoo.net/chirashi/860323/?cid=nmail_pc
        2. Click the image labeled 「日替」
        3. Click the 「フルスクリーン」 button
        4. Once the enlarged image is shown, save a screenshot as step-{n}.png
        """,
        llm=llm,
        controller=controller,
        browser=Browser(config=BrowserConfig(
            disable_security=True,
            headless=False,  # show the browser window so each step can be watched
        )),
    )
    result = await agent.run()
    print(result)


if __name__ == "__main__":
    asyncio.run(main())
```
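The task string leaves the {n} in step-{n}.png for the LLM to fill in, so the file number depends on what the model decides. A minimal sketch, assuming you would rather number the files in code: the hypothetical helper next_step_filename below looks at existing file names, finds the highest step-N.png, and returns the next one.

```python
import re

# Matches names like "step-3.png" and captures the number
STEP_PATTERN = re.compile(r"^step-(\d+)\.png$")


def next_step_filename(existing_names: list[str]) -> str:
    """Return the next unused step-N.png name (hypothetical helper)."""
    highest = 0
    for name in existing_names:
        match = STEP_PATTERN.match(name)
        if match:
            highest = max(highest, int(match.group(1)))
    return f"step-{highest + 1}.png"
```

For example, next_step_filename(os.listdir(".")) gives step-1.png in an empty directory; save_screenshot could use it instead of the fixed default screenshot.png.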

Running this produced the following output:

```
INFO [browser_use] BrowserUse logging setup complete with level info
INFO [root] Anonymized telemetry enabled. See https://github.com/browser-use/browser-use for more information.
INFO [agent] 🚀 Starting task: Perform the following steps: 1. Go to https://s.shufoo.net/chirashi/860323/?cid=nmail_pc 2. Click the image labeled 「日替」 3. Click the 「フルスクリーン」 button 4. Once the enlarged image is shown, save a screenshot as step-{n}.png
INFO [agent] 📍 Step 1
WARNING [langchain_google_genai.chat_models] Retrying langchain_google_genai.chat_models._achat_with_retry.<locals>._achat_with_retry in 2.0 seconds as it raised NotFound: 404 models/gemini-gemini-2.0-flash-exp is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods..
ERROR [agent] ❌ Result failed 1/3 times: 404 models/gemini-gemini-2.0-flash-exp is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.
INFO [agent] 📍 Step 1
WARNING [langchain_google_genai.chat_models] Retrying langchain_google_genai.chat_models._achat_with_retry.<locals>._achat_with_retry in 2.0 seconds as it raised NotFound: 404 models/gemini-gemini-2.0-flash-exp is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods..
ERROR [agent] ❌ Result failed 2/3 times: 404 models/gemini-gemini-2.0-flash-exp is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.
INFO [agent] 📍 Step 1
WARNING [langchain_google_genai.chat_models] Retrying langchain_google_genai.chat_models._achat_with_retry.<locals>._achat_with_retry in 2.0 seconds as it raised NotFound: 404 models/gemini-gemini-2.0-flash-exp is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods..
ERROR [agent] ❌ Result failed 3/3 times: 404 models/gemini-gemini-2.0-flash-exp is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.
ERROR [agent] ❌ Stopping due to 3 consecutive failures
INFO [agent] Created GIF at agent_history.gif
AgentHistoryList(all_results=[
  ActionResult(is_done=False, extracted_content=None, error='404 models/gemini-gemini-2.0-flash-exp is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.', include_in_memory=True),
  ActionResult(is_done=False, extracted_content=None, error='404 models/gemini-gemini-2.0-flash-exp is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.', include_in_memory=True),
  ActionResult(is_done=False, extracted_content=None, error='404 models/gemini-gemini-2.0-flash-exp is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.', include_in_memory=True)
], all_model_outputs=[])
```

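According to the error message, the model gemini-gemini-2.0-flash-exp could not be found in the Google Generative AI API v1beta.
Note that the name in the log has the gemini- prefix twice; the experimental model's ID was gemini-2.0-flash-exp, so the model string itself was probably malformed.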

Apparently, the models currently supported when using GoogleGenerativeAI through LangChain are:
gemini-pro
gemini-pro-vision
gemini-1.5-pro
gemini-1.5-flash
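To catch a malformed model name like the one in the log before the agent starts, a small sketch: normalize the requested name and compare it against the names the API reports. normalize_model_name and check_model_name are hypothetical helpers, and treating a doubled "gemini-" prefix as a typo to collapse is an assumption.

```python
def normalize_model_name(name: str) -> str:
    """Strip a leading 'models/' and collapse a repeated 'gemini-' prefix."""
    if name.startswith("models/"):
        name = name[len("models/"):]
    while name.startswith("gemini-gemini-"):
        name = name[len("gemini-"):]
    return name


def check_model_name(requested: str, available: list[str]) -> str:
    """Return the normalized name if it is in `available`, else raise ValueError."""
    wanted = normalize_model_name(requested)
    known = {normalize_model_name(n) for n in available}
    if wanted not in known:
        raise ValueError(f"{requested!r} is not an available model: {sorted(known)}")
    return wanted
```

The available list could come from the google-generativeai SDK, for example [m.name for m in genai.list_models() if "generateContent" in m.supported_generation_methods] after genai.configure(api_key=...); those names carry the "models/" prefix, which normalize_model_name strips.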

I tried again with gemini-1.5-flash,
but the screenshot was taken of the wrong spot.

It turned out the agent was looking at a different page in the first place.

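I had been checking the behavior on
https://www.shufoo.net/pntweb/shopDetail/860323/45225639804667/

but the URL I specified, taken from the email, is
https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc

So rather than relying on the email link,
I decided to check whether the flyer URL is dynamic.

The URL in the email, on the other hand, stays fixed,
so it can be used as-is.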
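The two shopDetail URLs above differ only after the shop ID 860323: one has an extra path segment (45225639804667, presumably identifying a specific flyer), while the email link stops at the shop ID and adds ?cid=nmail_pc. A minimal sketch, assuming the layout /pntweb/shopDetail/<shop ID>/<optional extra ID>/ holds in general (inferred only from these two URLs), that splits the parts with urllib.parse so you can tell which kind of URL you are handing to the agent. parse_shop_detail_url and its keys are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def parse_shop_detail_url(url: str) -> dict:
    """Split a Shufoo shopDetail URL into shop ID, extra ID, and query (assumed layout)."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    # Expected segments: ["pntweb", "shopDetail", "<shop id>", optional "<extra id>"]
    if segments[:2] != ["pntweb", "shopDetail"] or len(segments) < 3:
        raise ValueError(f"not a shopDetail URL: {url}")
    return {
        "shop_id": segments[2],
        "extra_id": segments[3] if len(segments) > 3 else None,
        "query": parse_qs(parsed.query),
    }
```

A URL with extra_id set points at one particular flyer and may change; the email-style URL with extra_id None is the one that stays fixed.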

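After switching the URL, taking the screenshot right after clicking Fullscreen
captured the page before the fullscreen view had appeared.
Adding "pause for 3 seconds" to the prompt did not fix it.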

So I changed the prompt as follows:

```
1. Go to https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc
2. Click the 「フルスクリーン」 (Fullscreen) button, then wait a few seconds
3. Save a screenshot as step-{n}.png
```

and this worked.

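The image was small, so I changed the prompt to:

```
Perform the following steps:
1. Go to https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc
2. Click the 「フルスクリーン」 (Fullscreen) button, then wait a few seconds
3. Once the fullscreen view is shown, click 「拡大」 (Zoom in), then wait a few seconds
4. Save a screenshot as step-{n}.png
```

With this version, it only zooms in this time.
Perhaps the zoomed view is being handled as a separate browser.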