Snowpool – Page 18 – Linux & Android Dialy

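Integrating the features

Extract the URL from the email.

gmail_url_extractor.py
gets the URL, and
image_downloader.py
downloads the images.

ocr_list.py
runs OCR on them with the Vision API.

Inside that script,
line_notify.py
is used to send the image over LINE, together with the words from the OCR result that matched the word list.

Once this much is done, I will publish it on GitHub.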

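The remaining features are:
Building a YOLOv8 model and using it for image recognition.
To build the keyword list efficiently,
OCR the receipts, extract the
store name
price
item name
date
and save them to a CSV file.

Create a DB from the CSV file and
link it with inventory management to cut down on unnecessary purchases.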
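A minimal sketch of the CSV step, assuming the four fields have already been pulled out of the OCR text. The ReceiptItem class, its field names, and the file name used in the usage line are hypothetical; parsing the receipt itself is not shown here.

```python
import csv
from dataclasses import dataclass, asdict, fields
from pathlib import Path


@dataclass
class ReceiptItem:
    """One purchased item; the field names are hypothetical."""
    store: str
    date: str   # e.g. "2024-08-11"
    item: str
    price: int  # price in yen


def append_items(csv_path, items):
    """Append items to csv_path, writing a header row if the file is new or empty."""
    path = Path(csv_path)
    write_header = not path.exists() or path.stat().st_size == 0
    names = [f.name for f in fields(ReceiptItem)]
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=names)
        if write_header:
            writer.writeheader()
        for item in items:
            writer.writerow(asdict(item))
```

Something like append_items("receipts.csv", [ReceiptItem("StoreA", "2024-08-11", "milk", 198)]) would then add one row per item, and the same file could later be loaded into a DB.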

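For now, the first step is to have
ocr_list.py
call
gmail_url_extractor.py
to get the URL and
image_downloader.py
to download the images.

Running OCR on that image is all that is needed.

However,
editing the existing source directly would make it impossible to test,
so I will create a separate file.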

Incidentally,
おいしい牛乳 (Oishii Gyunyu milk)
is apparently discounted today, so I add it to the list.

{
  "keywords": [
    "麻婆豆腐",
    "キッチンタオル",
    "ほんだし",
    "ぶなしめじ",
    "レタス",
    "キャベツ",
    "おいしい牛乳"
  ]
}
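Flyer OCR output often mixes half-width and full-width characters and breaks product names across lines, so a plain substring check can miss entries in this list. Below is a small sketch of a more forgiving match. The normalize and find_keywords helpers are my own names, not part of the scripts in this post, and dropping all whitespace is an assumption that could in rare cases join two unrelated words into a false match.

```python
import json
import unicodedata


def normalize(text):
    """Fold full-width/half-width variants (NFKC) and drop all whitespace."""
    folded = unicodedata.normalize("NFKC", text)
    return "".join(folded.split())


def find_keywords(ocr_text, keywords):
    """Return the keywords that occur in the OCR text after normalization."""
    haystack = normalize(ocr_text)
    return [kw for kw in keywords if normalize(kw) in haystack]


# A tiny stand-in for settings.json and for the OCR result
settings = json.loads('{"keywords": ["レタス", "おいしい牛乳", "キャベツ"]}')
sample_text = "本日限り おいしい\n牛乳 198円 ﾚﾀｽ 128円"
print(find_keywords(sample_text, settings["keywords"]))
```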

Then create the file that runs the OCR.

touch image_ocr_notifier.py

Its contents are:

# image_ocr_notifier.py

from gmail_url_extractor import get_first_unread_email_url
from image_downloader import download_and_merge_images
from google.cloud import vision
import io
import json

def load_settings(file_path='settings.json'):
    with open(file_path, 'r', encoding='utf_8') as settings_json:
        return json.load(settings_json)

def detect_text(image_path):
    """Extract text from an image with OCR."""
    client = vision.ImageAnnotatorClient()
    with io.open(image_path, 'rb') as image_file:
        content = image_file.read()

    image = vision.Image(content=content)
    response = client.document_text_detection(image=image)
    full_text_annotation = response.full_text_annotation

    if response.error.message:
        raise Exception(
            '{}\nFor more info on error messages, check: '
            'https://cloud.google.com/apis/design/errors'.format(
                response.error.message))

    return full_text_annotation.text

def search_words(text, keywords):
    """Search the extracted text for keywords."""
    hitwords = []
    for keyword in keywords:
        if keyword in text:
            hitwords.append(keyword)
    return hitwords

def main():
    # Load the settings
    settings = load_settings()
    
    # Get the URL from Gmail
    url = get_first_unread_email_url('【Shufoo!】お気に入り店舗新着チラシお知らせメール')  # the argument is the subject line to search for
    
    if url:
        print(f"Processing URL: {url}")
        # Download the images, then run OCR
        output_path = download_and_merge_images('config.json', url)
        
        if output_path:
            extracted_text = detect_text(output_path)
            hitwords = search_words(extracted_text, settings["keywords"])
            
            if hitwords:
                message = "特売リスト: " + ", ".join(hitwords)
                send_line_notify(message, output_path)
            else:
                print("No keywords matched.")
    else:
        print("No unread email was found.")

if __name__ == '__main__':
    main()

I ran it, but

Processing URL: https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc

is all that gets printed.

The XPath has probably changed, so I checked it.

/html/body/div[1]/div[3]/div[1]/div/div[2]/div[3]
/html/body/div[1]/div[3]/div[1]/div/div[2]/div[3]

They are identical, yet it does not work.

Checking how image_downloader.py behaves

Add logging: add a log line inside image_downloader.py to pinpoint where the processing fails.
print(f'Checking image source: {src}')  # added log line

so that it becomes

def get_images_from_container(driver, base_xpath):
    """Get image URLs from the element at the given XPath."""
    image_urls = []
    try:
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        images = container.find_elements(By.TAG_NAME, 'img')
        
        for img in images:
            src = img.get_attribute('src')
            print(f'Checking image source: {src}')  # added log line
            if src and 'index/img' in src:  # src can be None
                image_urls.append(src)
                print(f'Found image: {src}')
    except Exception as e:
        print(f'Error finding images: {e}')
    return image_urls

That is the change.
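One thing to watch in the loop above: get_attribute('src') can return None, and then the 'index/img' in src test raises a TypeError that the broad except turns into a single error line, ending the loop early. As a sketch, the filtering could be pulled out into a small pure function that is easy to check without a browser. The name filter_flyer_image_urls is hypothetical.

```python
def filter_flyer_image_urls(srcs, marker="index/img"):
    """Keep non-empty src values containing the marker, in order, without duplicates."""
    seen = set()
    urls = []
    for src in srcs:
        if not src:  # get_attribute('src') can return None or ''
            continue
        if marker in src and src not in seen:
            seen.add(src)
            urls.append(src)
    return urls
```

Inside get_images_from_container it could be called as filter_flyer_image_urls(img.get_attribute('src') for img in images).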

Also,
Debug the OCR step: check in the detect_text function that the Cloud Vision API response is valid.

In
image_ocr_notifier.py
add

# Print the OCR result for debugging
full_text_annotation = response.full_text_annotation
print("Extracted Text:", full_text_annotation.text)

so that it becomes

def detect_text(image_path):
    """Extract text from an image with OCR."""
    client = vision.ImageAnnotatorClient()
    with io.open(image_path, 'rb') as image_file:
        content = image_file.read()

    image = vision.Image(content=content)
    response = client.document_text_detection(image=image)
    if response.error.message:
        raise Exception(
            '{}\nFor more info on error messages, check: '
            'https://cloud.google.com/apis/design/errors'.format(
                response.error.message))
    
    # Print the OCR result for debugging
    full_text_annotation = response.full_text_annotation
    print("Extracted Text:", full_text_annotation.text)

    return full_text_annotation.text

But it still does not work.

Instead of

/html/body/div[1]/div[3]/div[1]/div/div[2]/div[2]/div[2]/div[1]

the element at

/html/body/div[1]/div[3]/div[1]/div/div[2]/div[2]/div[2]

with the source

<div id="cv_1" class="ChirashiView" style="position: absolute; left: 0px; top: -30px; z-index: 1; opacity: 1; cursor: url(&quot;https://www.shufoo.net/site/chirashi_viewer_js/js/../images/openhand_8_8.cur&quot;), default; transition: opacity 200ms ease-in-out;"><div class="ChirashiView_tempDiv" style="position: absolute; overflow: hidden; width: 750px; height: 603px; left: 0px; top: 0px; z-index: 100;"></div><div class="ChirashiContainer" style="position: absolute; left: 0px; top: 0px; width: 750px; height: 603px; z-index: 0; opacity: 1;"><div class="inDiv" style="position: absolute; left: 0px; top: 0px; z-index: 1;"><div id="-2_-2" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: -1004px; top: -977.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; width: 512px; height: 512px;"></div><div id="-1_-2" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: -492px; top: -977.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; width: 512px; height: 512px;"></div><div id="0_-2" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: 20px; top: -977.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; width: 512px; height: 512px;"></div><div id="1_-2" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: 532px; top: -977.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; width: 198px; height: 512px;"></div><div id="2_-2" style="position: absolute; opacity: 
1; transition: opacity 200ms ease-out; left: 1044px; top: -977.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; height: 512px;"></div><div id="-2_-1" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: -1004px; top: -465.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; width: 512px; height: 512px;"></div><div id="-1_-1" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: -492px; top: -465.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; width: 512px; height: 512px;"></div><div id="0_-1" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: 20px; top: -465.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; width: 512px; height: 512px;"></div><div id="1_-1" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: 532px; top: -465.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; width: 198px; height: 512px;"></div><div id="2_-1" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: 1044px; top: -465.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; height: 512px;"></div><div id="-2_0" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; 
left: -1004px; top: 46.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; width: 512px; height: 510px;"></div><div id="-1_0" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: -492px; top: 46.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; width: 512px; height: 510px;"></div><div id="0_0" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: 20px; top: 46.5px; width: 512px; height: 512px;"><img draggable="false" src="https://ipqcache2.shufoo.net/c/2024/08/08/25295137072090/index/img/0_100_0.jpg" style="border: 0px; padding: 0px; margin: 0px; width: 512px; height: 510px;"></div><div id="1_0" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: 532px; top: 46.5px; width: 512px; height: 512px;"><img draggable="false" src="https://ipqcache2.shufoo.net/c/2024/08/08/25295137072090/index/img/0_100_1.jpg" style="border: 0px; padding: 0px; margin: 0px; width: 198px; height: 510px;"></div><div id="2_0" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: 1044px; top: 46.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; height: 510px;"></div><div id="-2_1" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: -1004px; top: 558.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; width: 512px;"></div><div id="-1_1" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: -492px; top: 558.5px; width: 
512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; width: 512px;"></div><div id="0_1" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: 20px; top: 558.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; width: 512px;"></div><div id="1_1" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: 532px; top: 558.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; width: 198px;"></div><div id="2_1" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: 1044px; top: 558.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px;"></div><div id="-2_2" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: -1004px; top: 1070.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; width: 512px;"></div><div id="-1_2" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: -492px; top: 1070.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; width: 512px;"></div><div id="0_2" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: 20px; top: 1070.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" 
style="border: 0px; padding: 0px; margin: 0px; width: 512px;"></div><div id="1_2" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: 532px; top: 1070.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px; width: 198px;"></div><div id="2_2" style="position: absolute; opacity: 1; transition: opacity 200ms ease-out; left: 1044px; top: 1070.5px; width: 512px; height: 512px;"><img draggable="false" src="https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png" style="border: 0px; padding: 0px; margin: 0px;"></div></div><div class="linkDiv" style="position: absolute; left: 0px; top: 0px; z-index: 2;"></div></div></div>

appears to be the right part of the page.
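Since the absolute XPath keeps shifting, one alternative I have not adopted here is to key on the ChirashiContainer class that shows up in the source above. The sketch below only uses the standard library and works on an HTML string (for example Selenium's driver.page_source). The class FlyerImageCollector and the function flyer_image_urls are hypothetical names, and it assumes the class name and the 'index/img' path fragment stay stable.

```python
from html.parser import HTMLParser


class FlyerImageCollector(HTMLParser):
    """Collect src of <img> tags nested inside a div with the given class."""

    def __init__(self, container_class="ChirashiContainer", marker="index/img"):
        super().__init__()
        self.container_class = container_class
        self.marker = marker
        self.depth = 0  # > 0 while inside the container div
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag == "div":
                self.depth += 1  # track nested divs so we know when we leave
            elif tag == "img" and self.marker in (attrs.get("src") or ""):
                self.urls.append(attrs["src"])
        elif tag == "div" and self.container_class in (attrs.get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1


def flyer_image_urls(html):
    """Return the flyer tile URLs found inside the container, in document order."""
    collector = FlyerImageCollector()
    collector.feed(html)
    return collector.urls
```

The transparent.png placeholders are skipped because their src does not contain the marker, so only the real flyer tiles remain.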

Changing the XPath in config.json to

  "base_xpath_images": "/html/body/div[1]/div[3]/div[1]/div/div[2]/div[2]/div[2]"

made the image download work for now, but

  File "/Users/snowpool/aw10s/store_adversting_list/image_ocr_notifier.py", line 67, in <module>
    main()
  File "/Users/snowpool/aw10s/store_adversting_list/image_ocr_notifier.py", line 60, in main
    send_line_notify(message, output_path)
NameError: name 'send_line_notify' is not defined

is raised.

    # # Print the OCR result for debugging
    # full_text_annotation = response.full_text_annotation
    # print("Extracted Text:", full_text_annotation.text)

I commented out the debug output like this.

from line_notify import send_line_notify

was missing, so
I added it.

For some reason, when GPT generates source code,
module imports often get dropped.

The processing completed without problems, but the following appears in the log:

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1723322571.066745 14856910 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache
Sent images/combined_image_20240811_054251.jpg: 200 {"status":200,"message":"ok"}

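According to GPT:

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR

This message indicates that log messages emitted before the Abseil (absl) library is initialized are written to standard error (STDERR). It is commonly seen when Abseil is used in the backend, for example by Google Cloud's Python client libraries or gRPC. The warning is usually nothing to worry about, for the following reasons:

Pre-initialization log output: it is technical information saying that log messages generated before Abseil is initialized are temporarily written to STDERR.
No impact: the message generally does not affect how the application behaves, so it can be ignored.

I0000 00:00:1723322571.066745 14856910 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache

This message is an informational log showing that experimental features are enabled inside the gRPC library. These are usually the default settings unless the developer specifies otherwise. In practice this means:
* Experimental features enabled: gRPC's internals are running with certain experimental settings.
* Normal operation: this is an informational log and normally does not affect standard use of gRPC.

These log messages exist to provide technical information and warnings. In particular, the messages about absl::InitializeLog() and gRPC relate to logging configuration and the libraries' internal behavior, and normally do not affect the application.

That is what it said.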
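If the extra lines get in the way, one thing that might be worth trying is lowering gRPC's log verbosity through its GRPC_VERBOSITY environment variable before the Vision client is imported. I have not verified that this hides these particular lines; whether it does depends on the installed grpcio version, so treat the following as an assumption rather than a confirmed fix.

```python
import os

# Must run before google.cloud.vision (and therefore grpc) is imported.
# Assumption: the installed grpcio honours this variable for the lines shown above.
os.environ["GRPC_VERBOSITY"] = "ERROR"
```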

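Fetching specific unread mail from Gmail

Fetch the unread emails that
have the label shop and
the subject
お気に入り店舗新着チラシお知らせメール

Open the URL in the body and,
for Kyorindo (杏林堂),
use
/html/body/div[1]/div[3]/div[1]/div/div[2]/div[2]
or
/html/body/div[1]/div[3]/div[1]/div/div[2]
to download the images,
merging them into one when there are several.

The goal for now is to get this far.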
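Because more than one XPath may apply, a small helper that tries candidates in order could be handy. This is only a sketch: first_match is a hypothetical name, and the locate argument stands in for whatever actually queries the page (for example a Selenium lookup).

```python
def first_match(candidates, locate):
    """Return (xpath, result) for the first candidate where locate() finds something.

    locate is any callable that takes an XPath string and returns a
    (possibly empty) list; it stands in for the real browser lookup.
    """
    for xpath in candidates:
        try:
            result = locate(xpath)
        except Exception as exc:  # e.g. a timeout raised by the browser driver
            print(f"{xpath}: {exc}")
            continue
        if result:
            return xpath, result
    return None, []
```

With the downloader described later in this post, the call could look like first_match(xpaths, lambda xp: get_images_from_container(driver, xp)).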

First, I made it fetch unread mail by subject.

is:unread was added to the query, so only unread emails are searched.

import os.path
import base64
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# Path to the credentials file
CREDENTIALS_FILE = 'path/to/credentials.json'
TOKEN_FILE = 'token.json'

# Gmail API scope
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

def main():
    # Load the token file if it exists
    creds = None
    if os.path.exists(TOKEN_FILE):
        creds = Credentials.from_authorized_user_file(TOKEN_FILE, SCOPES)
    
    # Authenticate again if the credentials are not valid
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(CREDENTIALS_FILE, SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the token
        with open(TOKEN_FILE, 'w') as token:
            token.write(creds.to_json())

    # Build the Gmail API client
    service = build('gmail', 'v1', credentials=creds)

    # Search for unread mail
    query = 'is:unread subject:"【Shufoo!】お気に入り店舗新着チラシお知らせメール"'
    results = service.users().messages().list(userId='me', q=query).execute()
    messages = results.get('messages', [])

    if not messages:
        print('No unread messages found.')
    else:
        print(f'Found {len(messages)} unread messages:')
        for msg in messages:
            msg_id = msg['id']
            msg = service.users().messages().get(userId='me', id=msg_id).execute()
            msg_snippet = msg['snippet']
            print(f'Message snippet: {msg_snippet}')

if __name__ == '__main__':
    main()

The only change is the line

query = 'is:unread subject:"【Shufoo!】お気に入り店舗新着チラシお知らせメール"'

that builds the query.
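The goal stated earlier also includes the shop label, which the query above does not use yet. A small sketch of assembling the search string from its conditions follows. build_query is a hypothetical helper; label:, is:unread and subject: are standard Gmail search operators.

```python
def build_query(subject, label=None, unread=True):
    """Build a Gmail search string from the given conditions."""
    terms = []
    if label:
        terms.append(f"label:{label}")
    if unread:
        terms.append("is:unread")
    terms.append(f'subject:"{subject}"')
    return " ".join(terms)


print(build_query("【Shufoo!】お気に入り店舗新着チラシお知らせメール", label="shop"))
```

The resulting string can be passed as q to service.users().messages().list in the same way as the hard-coded query.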

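Next, I changed the code to fetch only the latest one.

The change:
Added the maxResults parameter:
maxResults=1 is passed to the service.users().messages().list method,
so only one result, the most recent match, is returned.

Running this gives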

Found 1 unread message:
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、1件の新着チラシが掲載開始されました。 ・ピアゴ袋井店https://www.shufoo.net/pntweb/shopDetail/15782/?cid=nmail_pc ※Shufoo!PCサイトまたは、シュフーチラシアプリ（スマートフォン・タブレット端末用） からログインしてお店の

as the output.

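Next, extract only the URL from the body.
There are several, so take only the first one.

To extract the URL from the body, fetch the message body and pull out URLs with a regular expression.
If the body contains several URLs, keep only the first.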

import os.path
import base64
import re
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# Path to the credentials file
CREDENTIALS_FILE = 'path/to/credentials.json'
TOKEN_FILE = 'token.json'

# Gmail API scope
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

def extract_first_url(text):
    """Extract the first URL from the text."""
    url_pattern = r'https?://[^\s]+'
    urls = re.findall(url_pattern, text)
    return urls[0] if urls else None

def main():
    # Load the token file if it exists
    creds = None
    if os.path.exists(TOKEN_FILE):
        creds = Credentials.from_authorized_user_file(TOKEN_FILE, SCOPES)
    
    # Authenticate again if the credentials are not valid
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(CREDENTIALS_FILE, SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the token
        with open(TOKEN_FILE, 'w') as token:
            token.write(creds.to_json())

    # Build the Gmail API client
    service = build('gmail', 'v1', credentials=creds)

    # Search for the latest unread email (one result)
    query = 'is:unread subject:"【Shufoo!】お気に入り店舗新着チラシお知らせメール"'
    results = service.users().messages().list(userId='me', q=query, maxResults=1).execute()
    messages = results.get('messages', [])

    if not messages:
        print('No unread messages found.')
    else:
        print(f'Found {len(messages)} unread message:')
        msg_id = messages[0]['id']
        msg = service.users().messages().get(userId='me', id=msg_id, format='full').execute()
        msg_payload = msg['payload']
        msg_parts = msg_payload.get('parts', [])
        
        body_data = ""
        for part in msg_parts:
            if part['mimeType'] == 'text/plain':  # or 'text/html' depending on the format you need
                body_data = part['body']['data']
                break
        
        # Base64 decode
        body_text = base64.urlsafe_b64decode(body_data).decode('utf-8')

        # Extract the URL
        first_url = extract_first_url(body_text)
        if first_url:
            print(f'First URL found: {first_url}')
        else:
            print('No URL found in the message.')

if __name__ == '__main__':
    main()

This is where a problem came up.

Found 1 unread message:
No URL found in the message.

is the output.

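I asked ChatGPT about it.

Changes
* extract_first_url function: added a function that uses a regular expression to extract the first URL from text.
* Getting the message body: the data portion of the body is taken from the message payload and Base64-decoded.
* Parsing the body: the first URL is extracted from the decoded body.
Notes
* Check the MIME type: if the message body is text/html rather than text/plain, if part['mimeType'] == 'text/plain': needs to be changed to if part['mimeType'] == 'text/html':.
* Multipart messages: some emails are split into multiple parts. This code extracts the URL from the first text/plain part, but more complex message structures need adjustments.

In this case:

To extract the URL from the message body, the correct data portion has to be decoded.
A message body fetched through the Gmail API may be split into several parts,
so the decoding has to be done on the right part.
Also check whether the body is text/plain or text/html and decode accordingly.

get_message_body function:
* Takes the message body from all parts and decodes the text/plain or text/html content.
* Uses part['body'].get('data') to get the Base64-encoded body data, then decodes and concatenates it.
Extracting from the body:
* The URL is extracted from the whole body returned by get_message_body.
* A regular expression is used, and the first URL found is returned.

So the code is changed to pick up the body parts correctly and extract the URL: it checks the body parts in order and decodes them.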

import os.path
import base64
import re
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# Path to the credentials file
CREDENTIALS_FILE = 'path/to/credentials.json'
TOKEN_FILE = 'token.json'

# Gmail API scope
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

def extract_first_url(text):
    """Extract the first URL from the text."""
    url_pattern = r'https?://[^\s]+'
    urls = re.findall(url_pattern, text)
    return urls[0] if urls else None

def get_message_body(parts):
    """Get the body of the email."""
    body = ""
    for part in parts:
        if part['mimeType'] == 'text/plain' or part['mimeType'] == 'text/html':
            body_data = part['body'].get('data')
            if body_data:
                body += base64.urlsafe_b64decode(body_data).decode('utf-8')
    return body

def main():
    # Load the token file if it exists
    creds = None
    if os.path.exists(TOKEN_FILE):
        creds = Credentials.from_authorized_user_file(TOKEN_FILE, SCOPES)
    
    # Authenticate again if the credentials are not valid
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(CREDENTIALS_FILE, SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the token
        with open(TOKEN_FILE, 'w') as token:
            token.write(creds.to_json())

    # Build the Gmail API client
    service = build('gmail', 'v1', credentials=creds)

    # Search for the latest unread email (one result)
    query = 'is:unread subject:"【Shufoo!】お気に入り店舗新着チラシお知らせメール"'
    results = service.users().messages().list(userId='me', q=query, maxResults=1).execute()
    messages = results.get('messages', [])

    if not messages:
        print('No unread messages found.')
    else:
        print(f'Found {len(messages)} unread message:')
        msg_id = messages[0]['id']
        msg = service.users().messages().get(userId='me', id=msg_id, format='full').execute()
        msg_payload = msg['payload']
        
        # Get the message body
        msg_body = get_message_body(msg_payload.get('parts', []))

        # Extract the URL
        first_url = extract_first_url(msg_body)
        if first_url:
            print(f'First URL found: {first_url}')
        else:
            print('No URL found in the message.')

if __name__ == '__main__':
    main()

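But the result is still

Found 1 unread message:
No URL found in the message.

I asked ChatGPT again.

If the URL is not extracted correctly from the message body,
the structure of the email may differ from what was expected.

The message payload fetched through the Gmail API needs to be examined closely
to understand how the body is structured.

The content may be split into parts,
or it may be contained directly in body.

Either way, the body has to be decoded before the URL can be extracted.
Below is the revised code, which inspects the payload in more detail and extracts the URL.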

import os.path
import base64
import re
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# Path to the credentials file
CREDENTIALS_FILE = 'path/to/credentials.json'
TOKEN_FILE = 'token.json'

# Gmail API scope
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

def extract_first_url(text):
    """Extract the first URL from the text."""
    url_pattern = r'https?://[^\s]+'
    urls = re.findall(url_pattern, text)
    return urls[0] if urls else None

def get_message_body(payload):
    """Get the body of the email."""
    body = ""
    
    if 'data' in payload['body']:
        # The data is contained directly in body
        body = base64.urlsafe_b64decode(payload['body']['data']).decode('utf-8')
    elif 'parts' in payload:
        # The message is split into parts
        for part in payload['parts']:
            if part['mimeType'] == 'text/plain' or part['mimeType'] == 'text/html':
                body_data = part['body'].get('data')
                if body_data:
                    body += base64.urlsafe_b64decode(body_data).decode('utf-8')
    
    return body

def main():
    # Load the token file if it exists
    creds = None
    if os.path.exists(TOKEN_FILE):
        creds = Credentials.from_authorized_user_file(TOKEN_FILE, SCOPES)
    
    # Authenticate again if the credentials are not valid
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(CREDENTIALS_FILE, SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the token
        with open(TOKEN_FILE, 'w') as token:
            token.write(creds.to_json())

    # Build the Gmail API client
    service = build('gmail', 'v1', credentials=creds)

    # Search for the latest unread email (one result)
    query = 'is:unread subject:"【Shufoo!】お気に入り店舗新着チラシお知らせメール"'
    results = service.users().messages().list(userId='me', q=query, maxResults=1).execute()
    messages = results.get('messages', [])

    if not messages:
        print('No unread messages found.')
    else:
        print(f'Found {len(messages)} unread message:')
        msg_id = messages[0]['id']
        msg = service.users().messages().get(userId='me', id=msg_id, format='full').execute()
        msg_payload = msg['payload']
        
        # Get the message body
        msg_body = get_message_body(msg_payload)

        # Extract the URL
        first_url = extract_first_url(msg_body)
        if first_url:
            print(f'First URL found: {first_url}')
        else:
            print('No URL found in the message.')

if __name__ == '__main__':
    main()

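Fixes
1. Improved get_message_body:
* It now also handles the case where payload['body'] contains the data directly. In that case there are no parts.
* When there are parts, it looks for the ones whose type is text/plain or text/html.
2. URL extraction re-checked:
* extract_first_url extracts the first URL from the whole body that was retrieved.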
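The version above looks one level deep. Gmail payloads can nest parts inside parts (for example a multipart/alternative inside a multipart/mixed), so a recursive walk may be more robust. The following is a sketch under that assumption. iter_text_parts and get_plain_or_html are hypothetical names, and preferring text/plain over text/html is my own choice rather than something the Shufoo! mail requires.

```python
import base64


def iter_text_parts(payload):
    """Yield (mime_type, decoded_text) for every text part, depth first."""
    mime_type = payload.get("mimeType", "")
    data = payload.get("body", {}).get("data")
    if data and mime_type in ("text/plain", "text/html"):
        padded = data + "=" * (-len(data) % 4)  # tolerate missing padding
        text = base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")
        yield mime_type, text
    for part in payload.get("parts", []):
        yield from iter_text_parts(part)


def get_plain_or_html(payload):
    """Prefer the first text/plain part; fall back to text/html; '' if neither exists."""
    found = {}
    for mime_type, text in iter_text_parts(payload):
        found.setdefault(mime_type, text)
    return found.get("text/plain") or found.get("text/html", "")
```

Unlike the concatenating version, this returns a single body, so a URL that appears in both the plain and the HTML part is only seen once.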

With this,

Found 1 unread message:
First URL found: https://www.shufoo.net/pntweb/shopDetail/15782/?cid=nmail_pc

the URL was extracted successfully.
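One caveat with the pattern https?://[^\s]+ is that, when the body is text/html, it keeps going past the closing quote of an href attribute. A slightly tighter sketch is shown below. The exact set of excluded characters is my assumption, and a URL followed directly by Japanese text with no space in between would still be captured too far.

```python
import re

# Stop at whitespace, quotes and angle brackets so an HTML attribute is not swallowed.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_first_url(text):
    """Return the first URL in text, or None if there is none."""
    match = URL_PATTERN.search(text)
    return match.group(0) if match else None
```

Using search instead of findall also avoids building the full list when only the first match is needed.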

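Next, this URL will be passed
as the argument to the module that opens the page with Selenium and downloads the images.

I thought the XPath was different only for coop,
but it turned out to be the same.
Adding a timestamp to the downloaded file name prevents collisions,
so the file can simply be processed with the Cloud Vision API and sent over LINE.

So no click handling is needed; the images are downloaded directly.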

import os
import time
import requests
from PIL import Image
from io import BytesIO
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from datetime import datetime

def open_link_in_safari(url):
    # Launch the browser with the Safari driver
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # wait 3 seconds after opening the link
    return driver

def get_images_from_container(driver, base_xpath):
    image_urls = []
    try:
        # Find the image elements inside the container
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        images = container.find_elements(By.TAG_NAME, 'img')
        
        for img in images:
            src = img.get_attribute('src')
            # Keep only the flyer tiles; src can be None
            if src and 'index/img' in src:
                image_urls.append(src)
                print(f'Found image: {src}')
    except Exception as e:
        print(f'Error finding images: {e}')
    return image_urls

def download_images(image_urls):
    images = []
    for i, url in enumerate(image_urls):
        response = requests.get(url)
        if response.status_code == 200:
            image = Image.open(BytesIO(response.content))
            images.append(image)
            print(f'Downloaded image_{i}.jpg')
        else:
            print(f'Failed to download {url}')
    return images

def merge_images(images, output_path):
    widths, heights = zip(*(img.size for img in images))

    total_height = sum(heights)
    max_width = max(widths)

    combined_image = Image.new('RGB', (max_width, total_height))

    y_offset = 0
    for img in images:
        combined_image.paste(img, (0, y_offset))
        y_offset += img.height

    combined_image.save(output_path)
    print(f'Saved combined image as {output_path}')

def main():
    url = 'https://www.shufoo.net/pntweb/shopDetail/197728/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    
    # Get the image URLs to download
    base_xpath_images = '/html/body/div[1]/div[3]/div[1]/div/div[2]/div[3]'

    image_urls = get_images_from_container(driver, base_xpath_images)
    driver.quit()
    
    if image_urls:
        images = download_images(image_urls)
        if images:
            # Get the current date and format it
            current_date = datetime.now().strftime('%Y%m%d')
            # Create an images folder in the current directory
            output_dir = 'images'
            os.makedirs(output_dir, exist_ok=True)  # create the directory if it does not exist
            output_path = os.path.join(output_dir, f'combined_image_{current_date}.jpg')
            merge_images(images, output_path)

if __name__ == '__main__':
    main()

With this, the images were downloaded successfully.
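A note on the merge: merge_images stacks the tiles top to bottom. In the viewer HTML shown earlier on this page, the two real flyer tiles share the same top value and differ only in left (20px and 532px), which suggests they sit side by side. OCR works either way, but if the combined image should look like the flyer, a left-to-right layout may be closer. The sketch below only computes the canvas size and paste positions; horizontal_layout is a hypothetical helper, and it assumes the download order matches the left-to-right order.

```python
def horizontal_layout(sizes):
    """Given [(width, height), ...], return (canvas_size, offsets) for a left-to-right merge."""
    offsets = []
    x = 0
    for width, _height in sizes:
        offsets.append((x, 0))  # each tile starts where the previous one ended
        x += width
    canvas_height = max((h for _w, h in sizes), default=0)
    return (x, canvas_height), offsets
```

With Pillow, the canvas would then be created with Image.new('RGB', canvas_size) and each tile pasted at its offset, in the same way merge_images does for the vertical case.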

However, the XPath seems to change frequently, so
I create a settings file and
change the code so that the XPath hard-coded in

    base_xpath_images = '/html/body/div[1]/div[3]/div[1]/div/div[2]/div[3]'

is read from that file instead.

The settings file is
config.json.
It is already used for the LINE API settings,
so I add the item

"base_xpath_images": "/html/body/div[1]/div[3]/div[1]/div/div[2]/div[3]"

to it.
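Because config.json is shared with the LINE settings, a missing or mistyped key would otherwise only surface later as a KeyError or a Selenium timeout. A small sketch of reading the value with an early check follows. load_xpath is a hypothetical helper, and requiring the value to start with "/" is my own assumption about what a usable XPath looks like here.

```python
import json


def load_xpath(config_file, key="base_xpath_images"):
    """Read one XPath from the JSON config and fail early if it is missing."""
    with open(config_file, "r", encoding="utf-8") as fh:
        config = json.load(fh)
    xpath = config.get(key)
    if not isinstance(xpath, str) or not xpath.startswith("/"):
        raise ValueError(f"{config_file}: '{key}' must be an XPath string")
    return xpath
```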

import os
import time
import json
import requests
from PIL import Image
from io import BytesIO
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from datetime import datetime

def load_config(config_file):
    """Load the config from the settings file."""
    with open(config_file, 'r') as file:
        config = json.load(file)
    return config

def open_link_in_safari(url):
    # Launch the browser with the Safari driver
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # wait 3 seconds after opening the link
    return driver

def get_images_from_container(driver, base_xpath):
    image_urls = []
    try:
        # Find the image elements inside the container
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        images = container.find_elements(By.TAG_NAME, 'img')
        
        for img in images:
            src = img.get_attribute('src')
            # Keep only the flyer tiles; src can be None
            if src and 'index/img' in src:
                image_urls.append(src)
                print(f'Found image: {src}')
    except Exception as e:
        print(f'Error finding images: {e}')
    return image_urls

def download_images(image_urls):
    images = []
    for i, url in enumerate(image_urls):
        response = requests.get(url)
        if response.status_code == 200:
            image = Image.open(BytesIO(response.content))
            images.append(image)
            print(f'Downloaded image_{i}.jpg')
        else:
            print(f'Failed to download {url}')
    return images

def merge_images(images, output_path):
    widths, heights = zip(*(img.size for img in images))

    total_height = sum(heights)
    max_width = max(widths)

    combined_image = Image.new('RGB', (max_width, total_height))

    y_offset = 0
    for img in images:
        combined_image.paste(img, (0, y_offset))
        y_offset += img.height

    combined_image.save(output_path)
    print(f'Saved combined image as {output_path}')

def main():
    # 設定ファイルを読み込む
    config = load_config('config.json')
    
    url = 'https://www.shufoo.net/pntweb/shopDetail/197728/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    
    # 設定ファイルからXPathを取得して画像を取得
    base_xpath_images = config['base_xpath_images']
    
    image_urls = get_images_from_container(driver, base_xpath_images)
    driver.quit()
    
    if image_urls:
        images = download_images(image_urls)
        if images:
            # 現在の日付を取得してフォーマット
            current_date = datetime.now().strftime('%Y%m%d')
            # カレントディレクトリにimagesフォルダを作成
            output_dir = 'images'
            os.makedirs(output_dir, exist_ok=True)  # ディレクトリが存在しない場合は作成
            output_path = os.path.join(output_dir, f'combined_image_{current_date}.jpg')
            merge_images(images, output_path)

if __name__ == '__main__':
    main()

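Adding a timestamp to the file name also prevents overwriting.

Timestamp format: datetime.now().strftime('%Y%m%d_%H%M%S') formats the current date and time as year, month, day, an underscore, then hour, minute, second (for example 20240809_233141).

This makes the file name unique as long as two runs do not start within the same second.
Adding it to the file name: the timestamp is included when output_path is built, so the file is not overwritten no matter how many times the script runs on the same day.

Creating the folder: if the output directory does not exist, it is created automatically.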
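The path-building step on its own can be sketched as follows. build_output_path and its now parameter are hypothetical names added for illustration; the parameter only exists so the result can be checked with a fixed time.

```python
import os
from datetime import datetime

def build_output_path(output_dir='images', now=None):
    """Return <output_dir>/combined_image_<YYYYMMDD_HHMMSS>.jpg, creating the directory."""
    now = now or datetime.now()
    timestamp = now.strftime('%Y%m%d_%H%M%S')
    # exist_ok=True makes this a no-op when the directory already exists
    os.makedirs(output_dir, exist_ok=True)
    return os.path.join(output_dir, f'combined_image_{timestamp}.jpg')
```

Called without arguments it uses the current time, which matches what the script does inline.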

With these changes incorporated, the script becomes:

import os
import time
import json
import requests
from PIL import Image
from io import BytesIO
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from datetime import datetime

def load_config(config_file):
    """設定ファイルからコンフィグを読み込む"""
    with open(config_file, 'r') as file:
        config = json.load(file)
    return config

def open_link_in_safari(url):
    # Safariドライバーを使用してブラウザを起動
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # リンクを開いた後に3秒間待機
    return driver

def get_images_from_container(driver, base_xpath):
    image_urls = []
    try:
        # コンテナ内の画像要素を探す
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        images = container.find_elements(By.TAG_NAME, 'img')
        
        for img in images:
            src = img.get_attribute('src')
            # 特定の条件に基づいて画像をフィルタリング
            if src and 'index/img' in src:
                image_urls.append(src)
                print(f'Found image: {src}')
    except Exception as e:
        print(f'Error finding images: {e}')
    return image_urls

def download_images(image_urls):
    images = []
    for i, url in enumerate(image_urls):
        response = requests.get(url)
        if response.status_code == 200:
            image = Image.open(BytesIO(response.content))
            images.append(image)
            print(f'Downloaded image_{i}.jpg')
        else:
            print(f'Failed to download {url}')
    return images

def merge_images(images, output_path):
    widths, heights = zip(*(img.size for img in images))

    total_height = sum(heights)
    max_width = max(widths)

    combined_image = Image.new('RGB', (max_width, total_height))

    y_offset = 0
    for img in images:
        combined_image.paste(img, (0, y_offset))
        y_offset += img.height

    combined_image.save(output_path)
    print(f'Saved combined image as {output_path}')

def main():
    # 設定ファイルを読み込む
    config = load_config('config.json')
    
    url = 'https://www.shufoo.net/pntweb/shopDetail/197728/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    
    # 設定ファイルからXPathを取得して画像を取得
    base_xpath_images = config['base_xpath_images']
    
    image_urls = get_images_from_container(driver, base_xpath_images)
    driver.quit()
    
    if image_urls:
        images = download_images(image_urls)
        if images:
            # 現在の日付と時刻を取得してフォーマット
            current_timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            # カレントディレクトリにimagesフォルダを作成
            output_dir = 'images'
            os.makedirs(output_dir, exist_ok=True)  # ディレクトリが存在しない場合は作成
            output_path = os.path.join(output_dir, f'combined_image_{current_timestamp}.jpg')
            merge_images(images, output_path)

if __name__ == '__main__':
    main()

Running this prints:

Found image: https://ipqcache2.shufoo.net/c/2024/08/09/96510937073938/index/img/0_100_0.jpg
Found image: https://ipqcache2.shufoo.net/c/2024/08/09/96510937073938/index/img/0_100_1.jpg
Downloaded image_0.jpg
Downloaded image_1.jpg
Saved combined image as images/combined_image_20240809_233141.jpg

so the download runs successfully.

Next, turn this into a module.

# image_downloader.py

import os
import json
import time
import requests
from PIL import Image
from io import BytesIO
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from datetime import datetime

def load_config(config_file):
    """設定ファイルからコンフィグを読み込む"""
    with open(config_file, 'r') as file:
        config = json.load(file)
    return config

def open_link_in_safari(url):
    """指定されたURLをSafariで開く"""
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # リンクを開いた後に3秒間待機
    return driver

def get_images_from_container(driver, base_xpath):
    """指定されたXPathから画像URLを取得する"""
    image_urls = []
    try:
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        images = container.find_elements(By.TAG_NAME, 'img')
        
        for img in images:
            src = img.get_attribute('src')
            if src and 'index/img' in src:
                image_urls.append(src)
                print(f'Found image: {src}')
    except Exception as e:
        print(f'Error finding images: {e}')
    return image_urls

def download_images(image_urls):
    """画像URLから画像をダウンロードする"""
    images = []
    for i, url in enumerate(image_urls):
        response = requests.get(url)
        if response.status_code == 200:
            image = Image.open(BytesIO(response.content))
            images.append(image)
            print(f'Downloaded image_{i}.jpg')
        else:
            print(f'Failed to download {url}')
    return images

def merge_images(images, output_path):
    """複数の画像を結合して保存する"""
    if not images:
        print("No images to merge.")
        return

    widths, heights = zip(*(img.size for img in images))
    total_height = sum(heights)
    max_width = max(widths)

    combined_image = Image.new('RGB', (max_width, total_height))

    y_offset = 0
    for img in images:
        combined_image.paste(img, (0, y_offset))
        y_offset += img.height

    combined_image.save(output_path)
    print(f'Saved combined image as {output_path}')

def download_and_merge_images(config_file, url):
    """画像をダウンロードして結合するメイン関数"""
    config = load_config(config_file)
    driver = open_link_in_safari(url)

    base_xpath_images = config['base_xpath_images']
    image_urls = get_images_from_container(driver, base_xpath_images)
    driver.quit()
    
    if image_urls:
        images = download_images(image_urls)
        if images:
            current_timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            output_dir = 'images'
            os.makedirs(output_dir, exist_ok=True)
            output_path = os.path.join(output_dir, f'combined_image_{current_timestamp}.jpg')
            merge_images(images, output_path)

This is saved as image_downloader.py.

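However, with this version the caller would need extra processing such as searching for the newest file.
If the function instead returns the name of the file it created,
the Cloud Vision API can be run on that file directly.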

# image_downloader.py

import os
import json
import time
import requests
from PIL import Image
from io import BytesIO
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from datetime import datetime

def load_config(config_file):
    """Load the configuration from the settings file."""
    with open(config_file, 'r') as file:
        config = json.load(file)
    return config

def open_link_in_safari(url):
    """Open the given URL in Safari."""
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # Wait 3 seconds after opening the link
    return driver

def get_images_from_container(driver, base_xpath):
    """Collect image URLs from the container at the given XPath."""
    image_urls = []
    try:
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        images = container.find_elements(By.TAG_NAME, 'img')
        
        for img in images:
            src = img.get_attribute('src')
            if src and 'index/img' in src:  # src is None for an <img> without a src attribute
                image_urls.append(src)
                print(f'Found image: {src}')
    except Exception as e:
        print(f'Error finding images: {e}')
    return image_urls

def download_images(image_urls):
    """Download the images from the given URLs."""
    images = []
    for i, url in enumerate(image_urls):
        response = requests.get(url, timeout=30)  # Do not hang forever on a stalled connection
        if response.status_code == 200:
            image = Image.open(BytesIO(response.content))
            images.append(image)
            print(f'Downloaded image_{i}.jpg')
        else:
            print(f'Failed to download {url}')
    return images

def merge_images(images, output_path):
    """Stack the images vertically and save the result."""
    if not images:
        print("No images to merge.")
        return

    widths, heights = zip(*(img.size for img in images))
    total_height = sum(heights)
    max_width = max(widths)

    combined_image = Image.new('RGB', (max_width, total_height))

    y_offset = 0
    for img in images:
        combined_image.paste(img, (0, y_offset))
        y_offset += img.height

    combined_image.save(output_path)
    print(f'Saved combined image as {output_path}')

def download_and_merge_images(config_file, url):
    """Main function: download the images and merge them."""
    config = load_config(config_file)
    driver = open_link_in_safari(url)

    base_xpath_images = config['base_xpath_images']
    image_urls = get_images_from_container(driver, base_xpath_images)
    driver.quit()
    
    if image_urls:
        images = download_images(image_urls)
        if images:
            current_timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            output_dir = 'images'
            os.makedirs(output_dir, exist_ok=True)
            output_path = os.path.join(output_dir, f'combined_image_{current_timestamp}.jpg')
            merge_images(images, output_path)
            return output_path  # Return the path of the generated file
    return None  # Return None when there were no images

Next, the part that gets the URL from an unread Gmail message is also turned into a module.

touch gmail_url_extractor.py

creates the file, with the following contents:

import os.path
import base64
import re
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# Path to the credentials file
CREDENTIALS_FILE = 'path/to/credentials.json'
TOKEN_FILE = 'token.json'

# Gmail API scope
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

def authenticate_gmail():
    """Authenticate against the Gmail API and build the service."""
    creds = None
    if os.path.exists(TOKEN_FILE):
        creds = Credentials.from_authorized_user_file(TOKEN_FILE, SCOPES)
    
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(CREDENTIALS_FILE, SCOPES)
            creds = flow.run_local_server(port=0)
        with open(TOKEN_FILE, 'w') as token:
            token.write(creds.to_json())

    service = build('gmail', 'v1', credentials=creds)
    return service

def extract_first_url(text):
    """Extract the first URL from the text."""
    url_pattern = r'https?://[^\s]+'
    urls = re.findall(url_pattern, text)
    return urls[0] if urls else None

def get_message_body(payload):
    """Get the body of the message."""
    body = ""
    
    if 'data' in payload['body']:
        body = base64.urlsafe_b64decode(payload['body']['data']).decode('utf-8')
    elif 'parts' in payload:
        for part in payload['parts']:
            if part['mimeType'] == 'text/plain' or part['mimeType'] == 'text/html':
                body_data = part['body'].get('data')
                if body_data:
                    body += base64.urlsafe_b64decode(body_data).decode('utf-8')
    
    return body

def get_first_unread_email_url(subject_query):
    """Get the first URL from an unread message with the given subject."""
    service = authenticate_gmail()

    query = f'is:unread subject:"{subject_query}"'
    results = service.users().messages().list(userId='me', q=query, maxResults=1).execute()
    messages = results.get('messages', [])

    if not messages:
        print('No unread messages found.')
        return None
    
    msg_id = messages[0]['id']
    msg = service.users().messages().get(userId='me', id=msg_id, format='full').execute()
    msg_payload = msg['payload']
    
    msg_body = get_message_body(msg_payload)
    first_url = extract_first_url(msg_body)
    
    return first_url

Save the file with these contents.

To use it, write something like:

from gmail_url_extractor import get_first_unread_email_url

def main():
    subject_query = "【Shufoo!】お気に入り店舗新着チラシお知らせメール"
    url = get_first_unread_email_url(subject_query)
    
    if url:
        print(f'First URL found: {url}')
    else:
        print('No URL found in the message.')

if __name__ == '__main__':
    main()

That is how the module is called.
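One thing to watch for: get_message_body also collects text/html parts, and the pattern https?://[^\s]+ keeps matching until the next whitespace, so a URL inside an href attribute would carry the closing quote and tag with it. The following is a minimal sketch of a stricter variant. The extra excluded characters and the trailing-punctuation trimming are my own assumptions, not something the Shufoo mail format is known to need.

```python
import re

# Stop at whitespace and at characters that usually delimit a URL in HTML or plain text
URL_PATTERN = re.compile(r'https?://[^\s"\'<>]+')

def extract_first_url(text):
    """Return the first URL in text, or None if there is none."""
    match = URL_PATTERN.search(text)
    if not match:
        return None
    # Trim punctuation that commonly follows a URL at the end of a sentence
    return match.group(0).rstrip('.,)')
```

For plain-text mails where the URL stands on its own line, both versions return the same result.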

The image download and merge module is called like this:

# example_usage.py

from image_downloader import download_and_merge_images

def main():
    config_file = 'config.json'
    url = 'https://www.shufoo.net/pntweb/shopDetail/197728/?cid=nmail_pc'
    output_path = download_and_merge_images(config_file, url)
    
    if output_path:
        print(f"Generated file: {output_path}")
        # Add the code that runs the Cloud Vision API here
        # example: run_cloud_vision_api(output_path)
    else:
        print("No image file was created.")

if __name__ == '__main__':
    main()

The returned path is what gets passed on to the OCR step.

Almost everything is now in place.
What remains is for ocr_list.py to get the URL with gmail_url_extractor.py, fetch the image with image_downloader.py, and run OCR on that image.
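The flow described above can be sketched with each step passed in as a callable, which makes it possible to try the wiring without Gmail, Safari, or the Vision API. run_pipeline is a hypothetical helper for illustration only, not the code I ended up with.

```python
def run_pipeline(get_url, download_image, ocr, keywords):
    """Run URL lookup, image download, OCR, and keyword matching in order.

    Returns the list of matched keywords, or None if an earlier step produced nothing.
    """
    url = get_url()
    if not url:
        return None  # no unread mail or no URL in it
    image_path = download_image(url)
    if not image_path:
        return None  # no flyer image could be created
    text = ocr(image_path)
    return [keyword for keyword in keywords if keyword in text]
```

In the real script, get_url would wrap get_first_unread_email_url and download_image would wrap download_and_merge_images with 'config.json' as the first argument.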

Getting the flyer details

If I link directly to the store page on Shufoo,
the Gmail processing is not needed.

https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc
Store: 杏林堂
Keyword: 日替

https://www.shufoo.net/pntweb/shopDetail/197728/?cid=nmail_pc
Store: ユーコープ
Keyword: ユーコープのお買い得！

https://www.shufoo.net/pntweb/shopDetail/15782/?cid=nmail_pc
Store: ぴあご

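In other words, each flyer I want to see is a pair of a URL and a keyword.
So it should be possible to put these pairs into something like a JSON file and apply them.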
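A minimal sketch of what such a JSON file could look like and how it might be read. The key names (name, url, keyword), load_stores, and title_matches are all hypothetical. No keyword is listed for ぴあご, so it is left as null here, and treating a missing keyword as "accept any link" is only an assumption.

```python
import json

# Hypothetical contents of a stores JSON file; the key names are assumptions
STORES_JSON = '''
[
  {"name": "杏林堂",
   "url": "https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc",
   "keyword": "日替"},
  {"name": "ユーコープ",
   "url": "https://www.shufoo.net/pntweb/shopDetail/197728/?cid=nmail_pc",
   "keyword": "ユーコープのお買い得！"},
  {"name": "ぴあご",
   "url": "https://www.shufoo.net/pntweb/shopDetail/15782/?cid=nmail_pc",
   "keyword": null}
]
'''

def load_stores(text):
    """Parse the store list and check that every entry has a name and a URL."""
    stores = json.loads(text)
    for store in stores:
        if 'name' not in store or 'url' not in store:
            raise ValueError(f'store entry is missing name or url: {store}')
        store.setdefault('keyword', None)
    return stores

def title_matches(title, keyword):
    """True if no keyword is set, or the keyword occurs in the link title."""
    return keyword is None or keyword in (title or '')

for store in load_stores(STORES_JSON):
    print(store['name'], store['url'], store['keyword'])
```

The same loop could then call the download code once per store, using title_matches to pick the flyer link to click.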

The XPath might also be the same for the other stores, so I copy the script with

cp clik_allget_image.py piago.py

and try it with ぴあご and ユーコープ.

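The XPath does appear to be the same.
However, the OCR accuracy on the coop flyer is poor.

For ぴあご, following the flyer link works,
but the images are not being downloaded.

杏林堂 can be tested every day,
so I try LINE sending with the 杏林堂 flyer first.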

def wait_for_page_load(driver, timeout=30):
    WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.XPATH, '//img'))  # Wait until at least one <img> element is present in the DOM
    )

I try adding this wait function.

import os
import time
import requests
from PIL import Image
from io import BytesIO
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from datetime import datetime

def open_link_in_safari(url):
    # Safariドライバーを使用してブラウザを起動
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # リンクを開いた後に3秒間待機
    return driver

def click_date_element(driver, base_xpath):
    try:
        # コンテナ内の日付要素を探してクリック
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        links = container.find_elements(By.XPATH, ".//a[contains(@title, '日替')]")

        for link in links:
            if '日替' in link.get_attribute('title'):
                link.click()
                print(f'Clicked on link with title: {link.get_attribute("title")}')
                time.sleep(3)  # クリックした後に3秒間待機
                return

        print('No link found with title containing: 日替')
    except Exception as e:
        print(f'Error clicking on element: {e}')

def get_images_from_container(driver, base_xpath):
    image_urls = []
    try:
        # コンテナ内の画像要素を探す
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        images = container.find_elements(By.TAG_NAME, 'img')
        
        for img in images:
            src = img.get_attribute('src')
            # 特定の条件に基づいて画像をフィルタリング
            if src and 'index/img' in src:
                image_urls.append(src)
                print(f'Found image: {src}')
    except Exception as e:
        print(f'Error finding images: {e}')
    return image_urls

def download_images(image_urls):
    images = []
    for i, url in enumerate(image_urls):
        response = requests.get(url)
        if response.status_code == 200:
            image = Image.open(BytesIO(response.content))
            images.append(image)
            print(f'Downloaded image_{i}.jpg')
        else:
            print(f'Failed to download {url}')
    return images

def merge_images(images, output_path):
    widths, heights = zip(*(img.size for img in images))

    total_height = sum(heights)
    max_width = max(widths)

    combined_image = Image.new('RGB', (max_width, total_height))

    y_offset = 0
    for img in images:
        combined_image.paste(img, (0, y_offset))
        y_offset += img.height

    combined_image.save(output_path)
    print(f'Saved combined image as {output_path}')

def main():
    url = 'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    # 特定のリンクをクリックする
    base_xpath_click = '/html/body/div[1]/div[3]/div[1]/div/div[4]/div/div/div/div/div/div/ul'
    click_date_element(driver, base_xpath_click)
    
    # 画像を取得してダウンロードする
    base_xpath_images = '/html/body/div[1]/div[3]/div[1]/div/div[2]/div[2]'
    image_urls = get_images_from_container(driver, base_xpath_images)
    driver.quit()
    
    if image_urls:
        images = download_images(image_urls)
        if images:
            # 現在の日付を取得してフォーマット
            current_date = datetime.now().strftime('%Y%m%d')
            # カレントディレクトリにimagesフォルダを作成
            output_dir = 'images'
            os.makedirs(output_dir, exist_ok=True)  # ディレクトリが存在しない場合は作成
            output_path = os.path.join(output_dir, f'combined_image_{current_date}.jpg')
            merge_images(images, output_path)

if __name__ == '__main__':
    main()

The script above is changed to:

import os
import time
import requests
from PIL import Image
from io import BytesIO
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from datetime import datetime

def open_link_in_safari(url):
    # Safariドライバーを使用してブラウザを起動
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    wait_for_page_load(driver)  # ページの読み込みを待機
    return driver

def wait_for_page_load(driver, timeout=30):
    """
    Wait until at least one <img> element is present in the DOM.
    """
    try:
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.XPATH, '//img'))
        )
        print("Page loaded successfully.")
    except Exception as e:
        print(f"Error waiting for page to load: {e}")

def click_date_element(driver, base_xpath):
    try:
        # コンテナ内の日付要素を探してクリック
        container = WebDriverWait(driver, 30).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        links = container.find_elements(By.XPATH, ".//a[contains(@title, '日替')]")

        for link in links:
            if '日替' in link.get_attribute('title'):
                link.click()
                print(f'Clicked on link with title: {link.get_attribute("title")}')
                wait_for_page_load(driver)  # クリック後のページ読み込みを待機
                return

        print('No link found with title containing: 日替')
    except Exception as e:
        print(f'Error clicking on element: {e}')

def get_images_from_container(driver, base_xpath):
    image_urls = []
    try:
        # コンテナ内の画像要素を探す
        container = WebDriverWait(driver, 30).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        images = container.find_elements(By.TAG_NAME, 'img')
        
        for img in images:
            src = img.get_attribute('src')
            # 特定の条件に基づいて画像をフィルタリング
            if src and 'index/img' in src:
                image_urls.append(src)
                print(f'Found image: {src}')
    except Exception as e:
        print(f'Error finding images: {e}')
    return image_urls

def download_images(image_urls):
    images = []
    for i, url in enumerate(image_urls):
        response = requests.get(url)
        if response.status_code == 200:
            image = Image.open(BytesIO(response.content))
            images.append(image)
            print(f'Downloaded image_{i}.jpg')
        else:
            print(f'Failed to download {url}')
    return images

def merge_images(images, output_path):
    widths, heights = zip(*(img.size for img in images))

    total_height = sum(heights)
    max_width = max(widths)

    combined_image = Image.new('RGB', (max_width, total_height))

    y_offset = 0
    for img in images:
        combined_image.paste(img, (0, y_offset))
        y_offset += img.height

    combined_image.save(output_path)
    print(f'Saved combined image as {output_path}')

def main():
    url = 'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    # 特定のリンクをクリックする
    base_xpath_click = '/html/body/div[1]/div[3]/div[1]/div/div[4]/div/div/div/div/div/div/ul'
    click_date_element(driver, base_xpath_click)
    
    # 画像を取得してダウンロードする
    base_xpath_images = '/html/body/div[1]/div[3]/div[1]/div/div[2]/div[2]'
    image_urls = get_images_from_container(driver, base_xpath_images)
    driver.quit()
    
    if image_urls:
        images = download_images(image_urls)
        if images:
            # 現在の日付を取得してフォーマット
            current_date = datetime.now().strftime('%Y%m%d')
            # カレントディレクトリにimagesフォルダを作成
            output_dir = 'images'
            os.makedirs(output_dir, exist_ok=True)  # ディレクトリが存在しない場合は作成
            output_path = os.path.join(output_dir, f'combined_image_{current_date}.jpg')
            merge_images(images, output_path)

if __name__ == '__main__':
    main()

Then I try running this version.
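The change replaces fixed time.sleep(3) calls with an explicit wait. The idea behind such a wait can be sketched in plain Python as a polling loop. wait_until is a hypothetical helper written only to illustrate the idea; it is not how Selenium implements WebDriverWait.

```python
import time

def wait_until(predicate, timeout=30.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        time.sleep(interval)
```

Compared with a fixed sleep, the loop returns as soon as the condition holds and fails loudly when it never does. Note that waiting for any img element can succeed before the flyer images themselves have loaded, so a condition tied to the flyer container may be a better fit.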

Sending flyer product names via LINE

ocr_list.py can already pick out the product names that match the keywords,
so the next step is to pass them to line_notify.py and send them.

import json
from google.cloud import vision
import io

# 設定ファイルの読み込み
settings_json = open('settings.json', 'r', encoding='utf_8')
settings = json.load(settings_json)

# OCRで画像からテキストを抽出
def detect_text(image_paths):
    client = vision.ImageAnnotatorClient()

    all_text = ''

    for image_path in image_paths:
        with io.open(image_path, 'rb') as image_file:
            content = image_file.read()

        image = vision.Image(content=content)

        # Use document_text_detection to get the text of the whole document
        response = client.document_text_detection(image=image)

        # Check for an API error before using the result
        if response.error.message:
            raise Exception(
                '{}\nFor more info on error messages, check: '
                'https://cloud.google.com/apis/design/errors'.format(
                    response.error.message))

        # Append the extracted text
        all_text += response.full_text_annotation.text

    return all_text

# キーワード検索
def search_words(all_text):
    hitwords = []
    for keyword in settings["keywords"]:
        if keyword in all_text:
            hitwords.append(keyword)

    return hitwords

# 例として実行
if __name__ == "__main__":
    image_paths = ["images/combined_image_20240805.jpg"]
    extracted_text = detect_text(image_paths)
    hitwords = search_words(extracted_text)
    
    # ヒットしたキーワードのみを表示
    if hitwords:
        print("マッチしたキーワード:", ", ".join(hitwords))
    else:
        print("マッチしたキーワードはありませんでした。")

This script should call line_notify.py, which currently looks like this:

import requests
import os
from PIL import Image
from io import BytesIO
from utils import load_config, get_latest_directory, get_image_files

def resize_image_if_needed(image_data, max_size=3 * 1024 * 1024):
    if len(image_data) > max_size:
        image = Image.open(BytesIO(image_data))
        new_size = (image.width // 2, image.height // 2)
        image = image.resize(new_size, Image.LANCZOS)

        output = BytesIO()
        image_format = image.format if image.format else 'JPEG'
        image.save(output, format=image_format)
        return output.getvalue()
    return image_data

def send_line_notify(message, config_path='config.json'):
    # 設定ファイルを読み込む
    config = load_config(config_path)

    # 設定ファイルからトークンとディレクトリパスを取得
    token = config['token']
    base_path = config['image_file_path']

    # 最新のpredictディレクトリを取得
    latest_dir = get_latest_directory(base_path)
    image_files = get_image_files(latest_dir)

    url = 'https://notify-api.line.me/api/notify'

    headers = {'Authorization': f"Bearer {token}"}
    params = {'message': message}

    # 最新のpredictディレクトリ内の全ての画像ファイルに対してLINE Notify APIにリクエストを送信
    for image_file_path in image_files:
        with open(image_file_path, 'rb') as img_file:
            img_data = img_file.read()
            img_data = resize_image_if_needed(img_data)

            # ファイルデータをバイトデータとして用意
            files = {'imageFile': BytesIO(img_data)}
            files['imageFile'].name = os.path.basename(image_file_path)

            # LINE Notify APIにリクエストを送信
            res = requests.post(url, headers=headers, params=params, files=files)

            # レスポンスを出力
            print(f"File: {image_file_path}")
            print(res.status_code)
            print(res.text)

The goal is to send the hit words with LINE Notify.

Before that, one change is needed.

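YOLOv8 is not used this time, so the part of line_notify.py that sends the images from the latest predict directory is unnecessary.
Start with a text-only version: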

import requests
import os
from utils import load_config

def send_line_notify(message, config_path='config.json'):
    # 設定ファイルを読み込む
    config = load_config(config_path)

    # 設定ファイルからトークンを取得
    token = config['token']

    url = 'https://notify-api.line.me/api/notify'

    headers = {'Authorization': f"Bearer {token}"}
    params = {'message': message}

    # LINE Notify APIにリクエストを送信
    res = requests.post(url, headers=headers, params=params)

    # レスポンスを出力
    print(res.status_code)
    print(res.text)

# 例として実行
if __name__ == "__main__":
    message = "マッチしたキーワード: サンプルキーワード"
    send_line_notify(message)

For now, remove the example main block from line_notify.py
and call send_line_notify from ocr_list.py:

import json
from google.cloud import vision
import io
from line_notify import send_line_notify

# 設定ファイルの読み込み
def load_settings(file_path='settings.json'):
    with open(file_path, 'r', encoding='utf_8') as settings_json:
        return json.load(settings_json)

# OCRで画像からテキストを抽出
def detect_text(image_paths):
    client = vision.ImageAnnotatorClient()

    all_text = ''

    for image_path in image_paths:
        with io.open(image_path, 'rb') as image_file:
            content = image_file.read()

        image = vision.Image(content=content)

        # Use document_text_detection to get the text of the whole document
        response = client.document_text_detection(image=image)

        # Check for an API error before using the result
        if response.error.message:
            raise Exception(
                '{}\nFor more info on error messages, check: '
                'https://cloud.google.com/apis/design/errors'.format(
                    response.error.message))

        # Append the extracted text
        all_text += response.full_text_annotation.text

    return all_text

# キーワード検索
def search_words(all_text, keywords):
    hitwords = []
    for keyword in keywords:
        if keyword in all_text:
            hitwords.append(keyword)

    return hitwords

# 例として実行
if __name__ == "__main__":
    settings = load_settings()
    image_paths = ["images/combined_image_20240805.jpg"]
    extracted_text = detect_text(image_paths)
    hitwords = search_words(extracted_text, settings["keywords"])
    
    # ヒットしたキーワードをLINE Notifyで送信
    if hitwords:
        message = "マッチしたキーワード: " + ", ".join(hitwords)
        send_line_notify(message)
    else:
        print("マッチしたキーワードはありませんでした。")

Running this sent the message to LINE, so it works.
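The matching is a plain substring test, so a product name that the OCR splits across lines, or returns with full-width or half-width variants, would be missed. A minimal sketch of a more forgiving match follows, assuming that removing all whitespace and applying NFKC normalization is acceptable for flyer text. The normalize helper is hypothetical, and this search_words is a variant of the function above, not the version I am running.

```python
import unicodedata

def normalize(text):
    """NFKC-normalize the text and remove all whitespace, including line breaks."""
    text = unicodedata.normalize('NFKC', text)
    return ''.join(text.split())

def search_words(all_text, keywords):
    """Return the keywords found in the OCR text after normalizing both sides."""
    haystack = normalize(all_text)
    return [keyword for keyword in keywords if normalize(keyword) in haystack]
```

Removing whitespace can also create false matches across unrelated neighbouring words, so this is a trade-off rather than a strict improvement.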

As it stands the message is hard to understand,
so I change its heading from マッチしたキーワード (matched keywords)
to 特売リスト (sale list).
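Building the message text could be pulled into a small function like the sketch below. build_sale_message is a hypothetical name, and removing duplicate keywords is my own addition rather than something the current script does.

```python
def build_sale_message(hitwords, title='特売リスト'):
    """Return the LINE message text, or None when nothing matched."""
    if not hitwords:
        return None
    # Drop duplicates while keeping the original keyword order
    unique = list(dict.fromkeys(hitwords))
    return f'{title}: ' + ', '.join(unique)
```

The caller then only sends a notification when the return value is not None.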

I also change the line_notify.py source
so that the image file that was OCR'd is sent to LINE together with the message.

import requests
import os
from utils import load_config

def send_line_notify(message, config_path='config.json'):
    # 設定ファイルを読み込む
    config = load_config(config_path)

    # 設定ファイルからトークンを取得
    token = config['token']

    url = 'https://notify-api.line.me/api/notify'

    headers = {'Authorization': f"Bearer {token}"}
    params = {'message': message}

    # LINE Notify APIにリクエストを送信
    res = requests.post(url, headers=headers, params=params)

    # レスポンスを出力
    print(res.status_code)
    print(res.text)

is changed to:

import requests
from utils import load_config
from io import BytesIO
from PIL import Image

def resize_image_if_needed(image_data, max_size=3 * 1024 * 1024):
    """
    Shrink the image if its data exceeds the given size.

    Args:
        image_data (bytes): Image data.
        max_size (int): Maximum file size in bytes.

    Returns:
        bytes: Image data after resizing (width and height halved once).
    """
    if len(image_data) > max_size:
        image = Image.open(BytesIO(image_data))
        # Read the format before resizing: resize() returns a new image whose format is None
        image_format = image.format if image.format else 'JPEG'
        new_size = (image.width // 2, image.height // 2)
        image = image.resize(new_size, Image.LANCZOS)

        output = BytesIO()
        image.save(output, format=image_format)
        return output.getvalue()
    return image_data

def send_line_notify(message, image_path=None, config_path='config.json'):
    """
    Send a message, and optionally an image, via LINE Notify.

    Args:
        message (str): Message to send.
        image_path (str): Path of the image to send.
        config_path (str): Path of the configuration file.

    Returns:
        None
    """
    # Load the configuration file
    config = load_config(config_path)

    # Get the token from the configuration
    token = config['token']

    url = 'https://notify-api.line.me/api/notify'

    headers = {'Authorization': f"Bearer {token}"}
    params = {'message': message}

    # Read the image if one was given
    files = None
    if image_path:
        with open(image_path, 'rb') as img_file:
            img_data = img_file.read()
            img_data = resize_image_if_needed(img_data)
            files = {'imageFile': BytesIO(img_data)}
            files['imageFile'].name = os.path.basename(image_path)

    # Send the request to the LINE Notify API
    res = requests.post(url, headers=headers, params=params, files=files)

    # Print the response
    print(res.status_code)
    print(res.text)

However, running it produces the following error:

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1723040085.190147 14567125 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache
Traceback (most recent call last):
  File "/Users/snowpool/aw10s/store_adversting_list/ocr_list.py", line 57, in <module>
    send_line_notify(message,image_path)
  File "/Users/snowpool/aw10s/store_adversting_list/line_notify.py", line 54, in send_line_notify
    with open(image_path, 'rb') as img_file:
TypeError: expected str, bytes or os.PathLike object, not list

send_line_notify expects a single image path, but ocr_list.py passes a list. The image paths have to be taken out of the list and sent one at a time.

I changed the source to:

import requests
from utils import load_config
from io import BytesIO
from PIL import Image
import os

def resize_image_if_needed(image_data, max_size=3 * 1024 * 1024):
    """
    Shrink the image if it exceeds the given size.

    Args:
        image_data (bytes): Image data.
        max_size (int): Maximum file size in bytes.

    Returns:
        bytes: Image data after resizing.
    """
    if len(image_data) > max_size:
        image = Image.open(BytesIO(image_data))
        # Remember the format first: the image returned by resize() has format None
        image_format = image.format if image.format else 'JPEG'
        new_size = (image.width // 2, image.height // 2)
        image = image.resize(new_size, Image.LANCZOS)

        output = BytesIO()
        image.save(output, format=image_format)
        return output.getvalue()
    return image_data

def send_line_notify(message, image_paths=None, config_path='config.json'):
    """
    Send a message, and optionally images, via LINE Notify.
    Multiple image paths can be passed as a list.

    Args:
        message (str): Message to send.
        image_paths (list): List of paths of the images to send.
        config_path (str): Path of the configuration file.

    Returns:
        None
    """
    # Load the configuration file
    config = load_config(config_path)

    # Get the token from the configuration
    token = config['token']

    url = 'https://notify-api.line.me/api/notify'

    headers = {'Authorization': f"Bearer {token}"}
    params = {'message': message}

    # Accept either a single path or a list of paths
    if image_paths is not None:
        if not isinstance(image_paths, list):
            image_paths = [image_paths]

        for image_path in image_paths:
            if image_path is not None:
                with open(image_path, 'rb') as img_file:
                    img_data = img_file.read()
                    img_data = resize_image_if_needed(img_data)
                    files = {'imageFile': BytesIO(img_data)}
                    files['imageFile'].name = os.path.basename(image_path)

                    # Send the request to the LINE Notify API
                    res = requests.post(url, headers=headers, params=params, files=files)

                    # Print the response
                    print(f"Sent {image_path}: {res.status_code} {res.text}")
    else:
        # No image: send the message only
        res = requests.post(url, headers=headers, params=params)
        print(f"Sent message: {res.status_code} {res.text}")

With this change, multiple images are supported.

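List support: the image_paths argument can now be passed as a list. If it is not a list, it is treated as a single image path and wrapped in a list.
Image loop: when images are passed, each one is sent to LINE Notify in turn (the message is sent together with every image).
No images: when no image is passed, only the message is sent.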
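The "single path or list" handling described above can be isolated into a small helper. This is only a sketch: normalize_image_paths is a hypothetical name that does not exist in line_notify.py, and unlike the code above it also accepts tuples.

```python
from typing import List, Optional, Sequence, Union


def normalize_image_paths(image_paths: Optional[Union[str, Sequence[str]]]) -> List[str]:
    """Return a list of paths whether None, one path, or a list of paths was given."""
    if image_paths is None:
        return []
    if isinstance(image_paths, (list, tuple)):
        # Drop None entries, mirroring the `if image_path is not None` check
        return [p for p in image_paths if p is not None]
    return [image_paths]


print(normalize_image_paths("images/a.jpg"))
print(normalize_image_paths(["images/a.jpg", None, "images/b.jpg"]))
print(normalize_image_paths(None))
```

With a helper like this, the sending loop only ever has to deal with a list, and an empty list can fall through to the message-only branch.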

Checking whether shopping-list items appear, based on a JSON file

Create the file with

vim ocr_list.py

starting from this code:

import json
settings_json = open('settings.json', 'r', encoding='utf_8')
settings = json.load(settings_json)

# 公式サイトからpdfリンク一覧取得
def get_urls():
  import requests
  from bs4 import BeautifulSoup

  params = { settings['url_params_name']: settings['url_params_value'] }
  load_url = settings['url']
  html = requests.get(load_url, params=params)
  soup = BeautifulSoup(html.text, 'html.parser')

  flyer_list = soup.find_all('table')
  url_list = []
  for flyer in flyer_list:
    # 日付
    date = flyer.find('div', {'class': 'sale'}).find('a').get_text(strip=True).replace(' ', '').replace('（', '(').replace('）', ')')
    
    # PDF(表)
    omote_url = flyer.find('a', {'title': 'PDFオモテ'})['href']
    omote = {}
    omote['date'] = date
    omote['url'] = settings['url_stem'] + omote_url.replace('../', '')
    url_list.append(omote)

    # PDF(裏)
    if flyer.find('a', {'title': 'PDFウラ'}):
      ura_url = flyer.find('a', {'title': 'PDFウラ'})['href'] 
      ura = {}
      ura['date'] = date
      ura['url'] = settings['url_stem'] + ura_url.replace('../', '')
      url_list.append(ura)

  return url_list

# 未解析のチラシURLを取得
def get_new_urls(url_list):
  # urls.txt読込
  old_urls = []
  with open('urls.txt', 'r') as f:
    old_urls = f.read().splitlines()

  new_urls = []
  urls_text = []
  count = 0
  for url_info in url_list:
    urls_text.append(url_info['url'] + '\n')

    if url_info['url'] not in old_urls:
      # 新規
      url_info['number'] = count
      new_urls.append(url_info)
      count += 1
  
  # urls.txt書込
  f = open('urls.txt', 'w')
  f.writelines(urls_text)
  f.close()

  return new_urls

# 未解析のpdfをDL
def dl_pdfs(new_url_list):
  import urllib.request
  import time

  pdf_list = []
  for url_info in new_url_list:
    # 表
    file_name = f'pdf/{url_info["number"]}.pdf'
    urllib.request.urlretrieve(url_info['url'], file_name)
    url_info['pdf_path'] = file_name

    time.sleep(2)

    pdf_list.append(url_info)

  return pdf_list

# PDFをJPGに変換
def pdf_to_jpeg(path):
  import os
  from pathlib import Path
  from pdf2image import convert_from_path

  # poppler/binを環境変数PATHに追加する
  poppler_dir = Path(__file__).parent.absolute() / 'lib/poppler/bin'
  os.environ['PATH'] += os.pathsep + str(poppler_dir)

  image_paths = []

  pdf_path = Path(path)
  # PDF -> Image に変換（150dpi）
  pages = convert_from_path(str(pdf_path), 150)

  # 画像ファイルを１ページずつ保存
  image_dir = Path('./jpg')
  for i, page in enumerate(pages):
    file_name = pdf_path.stem + '_{:02d}'.format(i + 1) + '.jpeg'
    image_path = image_dir / file_name
    # JPEGで保存
    page.save(str(image_path), 'JPEG')
    image_paths.append(image_path)

  return image_paths

# 複数チラシをJPGに変換
def pdfs_to_jpeg(pdf_list):
  jpg_list = []
  for pdf_info in pdf_list:
    jpg_info = pdf_info
    # 表
    omote_image_paths = pdf_to_jpeg(pdf_info['pdf_path'])
    jpg_info['image_paths'] = omote_image_paths

    jpg_list.append(jpg_info)

  return jpg_list

# OCR
def detect_text(image_paths):
  from google.cloud import vision
  import io
  client = vision.ImageAnnotatorClient()

  all_text = ''

  for image_path in image_paths:
    with io.open(image_path, 'rb') as image_file:
      content = image_file.read()

    image = vision.Image(content=content)
    
    # pylint: disable=no-member
    response = client.text_detection(image=image)
    texts = response.text_annotations

    for text in texts:
      all_text += str(text.description)

    if response.error.message:
      raise Exception(
        '{}\nFor more info on error messages, check: '
        'https://cloud.google.com/apis/design/errors'.format(
          response.error.message))

  return all_text

# キーワード検索
def search_words(all_text):
  hitwords = []
  for keyword in settings["keywords"]:
    if keyword in all_text:
      hitwords.append(keyword)

  return hitwords

# キーワードに引っかかったチラシ取得
def get_target_flyers(jpg_list):
  result = []
  for jpg_info in jpg_list:
    all_text = detect_text(jpg_info['image_paths'])
    hitwords = search_words(all_text)

    if len(hitwords) != 0:
      hit = jpg_info
      hit['hitwords'] = hitwords
      result.append(hit)

  return result

# Slack通知
def slack_notice(results):
  import slackweb
  slack = slackweb.Slack(url=settings['slack_webhook_url'])
  for result in results:
    text = f'{result["date"]} チラシ掲載商品：{",".join(result["hitwords"])}\n<{result["url"]}|チラシを見る>'
    slack.notify(text=text)

### FlyerOCR ###
import shutil
import os
os.makedirs('pdf/', exist_ok=True)
os.makedirs('jpg/', exist_ok=True)

url_list = get_urls()
new_url_list = get_new_urls(url_list)
pdf_list = dl_pdfs(new_url_list)
jpg_list = pdfs_to_jpeg(pdf_list)
results = get_target_flyers(jpg_list)
slack_notice(results)

shutil.rmtree('pdf/')
shutil.rmtree('jpg/')

I then rewrite this code.

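text_detection is suited to detecting text elements in an image and is used for general OCR tasks.

I replace it with document_text_detection, which is better suited to scanned documents and images with complex layouts and returns more detailed text information: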

# OCR
def detect_text(image_paths):
    from google.cloud import vision
    import io
    client = vision.ImageAnnotatorClient()

    all_text = ''

    for image_path in image_paths:
        with io.open(image_path, 'rb') as image_file:
            content = image_file.read()

        image = vision.Image(content=content)

        # document_text_detectionを使用
        response = client.document_text_detection(image=image)
        # FullTextAnnotationを使用して文書全体のテキストを取得
        full_text_annotation = response.full_text_annotation

        # テキストの抽出
        all_text += full_text_annotation.text

        if response.error.message:
            raise Exception(
                '{}\nFor more info on error messages, check: '
                'https://cloud.google.com/apis/design/errors'.format(
                    response.error.message))

    return all_text

To check that it runs at all, I added:

# 例として実行
if __name__ == "__main__":
    image_paths = ["images/combined_image_20240802.jpg"]
    extracted_text = detect_text(image_paths)
    print(extracted_text)

As is, the result is only printed to the terminal, so I save it to a text file.
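One way to save the OCR result from inside the script, rather than redirecting the terminal output, is sketched below. save_text and the file name ocr_result.txt are made up for illustration; the string passed in stands in for the return value of detect_text.

```python
import tempfile
from pathlib import Path


def save_text(text: str, output_path) -> Path:
    """Write text to output_path as UTF-8, creating parent folders if needed."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


# Demo in a temporary folder so nothing is left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    saved = save_text("麻婆豆腐\nキッチンタオル\n", Path(tmp_dir) / "ocr" / "ocr_result.txt")
    print(saved.name, saved.stat().st_size, "bytes")
```

Writing with an explicit UTF-8 encoding avoids depending on the terminal or platform default when the text is Japanese.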

Of the items I care about, the ones detected successfully in the full text were
麻婆豆腐
キッチンタオル

The full text is:

8/
2
金曜日
本日限定!
とろける
バラエティパック
1000
T
スライスカード
サイズ
創品
創業祭特に
$10
ポイント
焼そば
うどん
$10
ポイント
129 168 159
プレーンヨーグル
ビヒダス
400
イチビキ
EE97
128
上級
ヒラス
-10
139円
創業祭特価
CRUNKY
Ghana
ON 161
198
213
30
128 98 9869
136
105円
645
8/2はおやつの日
BLACKENT
ADREN
おやつフェア カメ
強力小麦を
POTAH
198
64
191
78
B
カメリヤス
強力小麦粉」
298
321
ロリエ
きれいガード
ソフィ
$20
198
PARA 2139
税込74円]
かに
麻婆豆腐
168
8x4 パウダースプレー
88
181
ジョイコン
2
100-498
100 100
260 root
1,180
OPIE
ボディフィット
ガード
8
50
270
378
171
CARE
UF.O.
3
UFO
パワフル吸収
キッチンタオル
くらしりえね
ミネピア
「キッチンタオル
$149
10% 163
18 U
3759
8
300
$20
248
10272
$100
8398
ロール
ダブルメロン
38 萬
14952371
698
798
580
530
2.836
58
458
10 2.066
オフェルミン
2080
=.098
767
7=1080]
10 305P
TONA 415

Text that overlaps product photos seems to be read poorly, although it may simply be a resolution problem.

In any case, 麻婆豆腐 and キッチンタオル were recognized, so I write them into the JSON file and test whether the OCR text matches the list.

I set the contents of settings.json to:

{
  "keywords": [  
    "麻婆豆腐",
    "キッチンタオル",
    "keyword3"
  ]
}

Next, only the keywords that match are stored in a variable, which will later be sent over LINE.

For now I changed the code from:

import json
with open('settings.json', 'r', encoding='utf_8') as settings_json:
    settings = json.load(settings_json)

# OCR
def detect_text(image_paths):
    from google.cloud import vision
    import io
    client = vision.ImageAnnotatorClient()

    all_text = ''

    for image_path in image_paths:
        with io.open(image_path, 'rb') as image_file:
            content = image_file.read()

        image = vision.Image(content=content)

        # document_text_detectionを使用
        response = client.document_text_detection(image=image)
        # FullTextAnnotationを使用して文書全体のテキストを取得
        full_text_annotation = response.full_text_annotation

        # テキストの抽出
        all_text += full_text_annotation.text

        if response.error.message:
            raise Exception(
                '{}\nFor more info on error messages, check: '
                'https://cloud.google.com/apis/design/errors'.format(
                    response.error.message))

    return all_text

# キーワード検索
def search_words(all_text):
  hitwords = []
  for keyword in settings["keywords"]:
    if keyword in all_text:
      hitwords.append(keyword)

  return hitwords

# キーワードに引っかかったチラシ取得
def get_target_flyers(jpg_list):
  result = []
  for jpg_info in jpg_list:
    all_text = detect_text(jpg_info['image_paths'])
    hitwords = search_words(all_text)

    if len(hitwords) != 0:
      hit = jpg_info
      hit['hitwords'] = hitwords
      result.append(hit)

  return result


# 例として実行
if __name__ == "__main__":
    image_paths = ["images/combined_image_20240802.jpg"]
    extracted_text = detect_text(image_paths)
    print(extracted_text)

to:

import json
from google.cloud import vision
import io

# Load the settings file (the with block closes it afterwards)
with open('settings.json', 'r', encoding='utf_8') as settings_json:
    settings = json.load(settings_json)

# Extract text from images with OCR
def detect_text(image_paths):
    client = vision.ImageAnnotatorClient()

    all_text = ''

    for image_path in image_paths:
        with io.open(image_path, 'rb') as image_file:
            content = image_file.read()

        image = vision.Image(content=content)

        # Use document_text_detection to get the text of the whole document
        response = client.document_text_detection(image=image)

        # Check for an API error before using the result
        if response.error.message:
            raise Exception(
                '{}\nFor more info on error messages, check: '
                'https://cloud.google.com/apis/design/errors'.format(
                    response.error.message))

        # Extract the text
        full_text_annotation = response.full_text_annotation
        all_text += full_text_annotation.text

    return all_text

# Keyword search
def search_words(all_text):
    hitwords = []
    for keyword in settings["keywords"]:
        if keyword in all_text:
            hitwords.append(keyword)

    return hitwords

# Run as an example
if __name__ == "__main__":
    image_paths = ["images/combined_image_20240802.jpg"]
    extracted_text = detect_text(image_paths)
    hitwords = search_words(extracted_text)
    
    # Show only the keywords that matched
    if hitwords:
        print("マッチしたキーワード:", ", ".join(hitwords))
    else:
        print("マッチしたキーワードはありませんでした。")

Running the new version prints
マッチしたキーワード: 麻婆豆腐, キッチンタオル
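The OCR output above is broken into many short lines, so a keyword that the flyer prints across two lines, or in half-width katakana, would not be found by a plain `in` test. A possible workaround, not part of the original script, is to normalize both sides before matching. normalize_for_match and search_words_normalized are hypothetical names, and the sample string is made up.

```python
import unicodedata


def normalize_for_match(text: str) -> str:
    """NFKC-normalize and remove all whitespace, including line breaks."""
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(normalized.split())


def search_words_normalized(all_text, keywords):
    """Return the keywords found in the OCR text after normalization."""
    haystack = normalize_for_match(all_text)
    return [k for k in keywords if normalize_for_match(k) in haystack]


# Made-up OCR-like text: one keyword split over two lines, one in half-width kana
sample = "おいしい\n牛乳\n麻婆豆腐\n168\nｷｯﾁﾝﾀｵﾙ"
print(search_words_normalized(sample, ["麻婆豆腐", "キッチンタオル", "おいしい牛乳", "レタス"]))
```

The trade-off is that removing line breaks can also join unrelated neighbouring lines, so a short keyword may match by accident.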

Next I want to send the image that matched the keywords over LINE as well, so the script needs to obtain the image file path.

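To recap the overall flow:
Open the link to the latest flyer from Gmail

If there is a daily (日替) flyer, download its images and merge them into one file
clik_allget_image.py

Run OCR and pick out the items that match the list
Send them over LINE
line_notify.py

However, if Shufoo! uses a unique address for each store, the step of opening the link from Gmail becomes unnecessary.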

Testing whether the Cloud Vision API can extract product information as text

vim purchase_list.py

with these contents:

from google.cloud import vision
from PIL import Image
import io
import os

def resize_image_if_needed(image_path, max_size_mb=40):
    """Resize the image to half its original dimensions if it exceeds max_size_mb."""
    with open(image_path, "rb") as fb:
        image = Image.open(fb)
        image_io = io.BytesIO()
        image.save(image_io, format=image.format)
        image_size_mb = image_io.tell() / (1024 * 1024)
        
        if image_size_mb > max_size_mb:
            new_size = (image.width // 2, image.height // 2)
            resized_image = image.resize(new_size, Image.LANCZOS)
            
            resized_io = io.BytesIO()
            resized_image.save(resized_io, format=image.format)
            return resized_io.getvalue()
        
        return image_io.getvalue()

client = vision.ImageAnnotatorClient()

# image_path = "combined_image.jpg"
image_path = "image_0.jpg"

resized_image = resize_image_if_needed(image_path)

image = vision.Image(content=resized_image)

response = client.document_text_detection(image=image)
texts = response.text_annotations

if texts:
    print(texts[0].description)
else:
    print("No text detected.")

Running it failed with:

    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.InvalidArgument: 400 Request payload size exceeds the limit: 41943040 bytes.

so the image has to be brought down to 40 MB or less:

from google.cloud import vision
from PIL import Image
import io

def compress_image(image_path, max_size_mb=40):
    """Compress an image to be under the specified size in megabytes."""
    with open(image_path, "rb") as fb:
        image = Image.open(fb)
        image_format = image.format
        image_io = io.BytesIO()
        
        # Try different quality settings to get under the size limit
        for quality in range(95, 10, -5):
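            # Rewind and clear the buffer so bytes from a larger previous attempt do not remain
            image_io.seek(0)
            image_io.truncate()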
            image.save(image_io, format=image_format, quality=quality)
            size_mb = image_io.tell() / (1024 * 1024)
            if size_mb <= max_size_mb:
                break
        
        return image_io.getvalue()

client = vision.ImageAnnotatorClient()

image_path = "preprocessed_image.jpg"
compressed_image = compress_image(image_path)

image = vision.Image(content=compressed_image)

response = client.document_text_detection(image=image)
texts = response.text_annotations

if texts:
    print(texts[0].description)
else:
    print("No text detected.")
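The 41943040 bytes in the error message is 40 × 1024 × 1024, that is, 40 MiB. As a rough pre-check, the file size on disk can be compared with that number before calling the API. This is only a sketch: exceeds_payload_limit is a hypothetical helper, and since the request is somewhat larger than the file itself, a file just under the limit may still be rejected.

```python
import os
import tempfile

MAX_PAYLOAD_BYTES = 40 * 1024 * 1024  # 41943040, the number quoted in the error message


def exceeds_payload_limit(image_path, limit=MAX_PAYLOAD_BYTES):
    """Return True if the file on disk is larger than the limit."""
    return os.path.getsize(image_path) > limit


# Demo with a small temporary file instead of a real flyer image
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    tmp.write(b"x" * 1000)

print(exceeds_payload_limit(tmp.name))             # 1000 bytes against the 40 MiB limit
print(exceeds_payload_limit(tmp.name, limit=500))  # same file against a 500-byte limit
os.remove(tmp.name)
```

A check like this makes it easy to tell whether the size error comes from the image itself or, as turned out to be the case here, from pointing the script at the wrong file.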

On closer inspection, the file name was wrong.

The 杏林堂 flyer was saved as combined_image.jpg, so I changed the script to:
from google.cloud import vision
from PIL import Image
import io
import os

def resize_image_if_needed(image_path, max_size_mb=40):
    """Resize the image to half its original dimensions if it exceeds max_size_mb."""
    with open(image_path, "rb") as fb:
        image = Image.open(fb)
        image_io = io.BytesIO()
        image.save(image_io, format=image.format)
        image_size_mb = image_io.tell() / (1024 * 1024)
        
        if image_size_mb > max_size_mb:
            new_size = (image.width // 2, image.height // 2)
            resized_image = image.resize(new_size, Image.LANCZOS)
            
            resized_io = io.BytesIO()
            resized_image.save(resized_io, format=image.format)
            return resized_io.getvalue()
        
        return image_io.getvalue()

client = vision.ImageAnnotatorClient()

image_path = "combined_image.jpg"
resized_image = resize_image_if_needed(image_path)

image = vision.Image(content=resized_image)

response = client.document_text_detection(image=image)
texts = response.text_annotations

if texts:
    print(texts[0].description)
else:
    print("No text detected.")

Then I saved the result to a text file with

python purchase_list.py > purchase_list.txt

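Next, using the source at
https://github.com/yakipudding/flyer-ocr
and the article
スーパーのチラシをOCRしてSlackに通知したら便利だった
as a reference, I run OCR and get a notification when a product I am watching for appears.

The part I use is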

# OCR
def detect_text(image_paths):
  from google.cloud import vision
  import io
  client = vision.ImageAnnotatorClient()

  all_text = ''

  for image_path in image_paths:
    with io.open(image_path, 'rb') as image_file:
      content = image_file.read()

    image = vision.Image(content=content)

    response = client.text_detection(image=image)
    texts = response.text_annotations

    for text in texts:
      all_text += str(text.description)

    if response.error.message:
      raise Exception(
        '{}\nFor more info on error messages, check: '
        'https://cloud.google.com/apis/design/errors'.format(
          response.error.message))

  return all_text

which stores the OCR result in a variable.

The result is then checked for the specified keywords:

# キーワード検索
def search_words(all_text):
  hitwords = []
  # ★任意のキーワード（商品名）を設定
  keywords = ['ヨーグルト', '若鶏もも肉']
  for keyword in keywords:
    if keyword in all_text:
      hitwords.append(keyword)

  return hitwords

Here the product names are put in a list.

The part that sends the notification to Slack only needs to be replaced with sending via LINE Notify:

def slack_notice(results):
  import slackweb
  slack = slackweb.Slack(url='★WebhookのURL')
  for result in results:
    text = f'{result["date"]} チラシ掲載商品：{",".join(result["hitwords"])}\n<{result["url"]}|チラシを見る>'
    slack.notify(text=text)
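Replacing the Slack step could look roughly like the sketch below. It only builds the message text. The actual sending would be done by send_line_notify from line_notify.py. build_line_message is a hypothetical helper, the sample result is made up, and the `<url|label>` link markup is dropped on the assumption that it is specific to Slack.

```python
def build_line_message(result):
    """Format one flyer hit as plain text for a LINE message."""
    hitwords = ", ".join(result["hitwords"])
    return f'{result["date"]} チラシ掲載商品: {hitwords}\n{result["url"]}'


# Made-up result in the same shape that get_target_flyers() produces
sample_result = {
    "date": "8/2(金)",
    "hitwords": ["麻婆豆腐", "キッチンタオル"],
    "url": "https://example.com/flyer.pdf",
}
print(build_line_message(sample_result))
```

Each message built this way could then be passed to send_line_notify in place of the slack.notify call.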

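For now I start writing the code.
The product list goes in settings.json.

Because the keywords are matched against the text of the flyer, for now I choose keywords that actually appear when the 杏林堂 flyer is OCRed.

First, the merged flyer image would be overwritten on every run because the file name is always the same, so I changed the script to put the date in the file name: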

import time
import requests
from PIL import Image
from io import BytesIO
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from datetime import datetime

def open_link_in_safari(url):
    # Safariドライバーを使用してブラウザを起動
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # リンクを開いた後に3秒間待機
    return driver

def click_date_element(driver, base_xpath):
    try:
        # コンテナ内の日付要素を探してクリック
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        links = container.find_elements(By.XPATH, ".//a[contains(@title, '日替')]")

        for link in links:
            if '日替' in link.get_attribute('title'):
                link.click()
                print(f'Clicked on link with title: {link.get_attribute("title")}')
                time.sleep(3)  # クリックした後に3秒間待機
                return

        print('No link found with title containing: 日替')
    except Exception as e:
        print(f'Error clicking on element: {e}')

def get_images_from_container(driver, base_xpath):
    image_urls = []
    try:
        # コンテナ内の画像要素を探す
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        images = container.find_elements(By.TAG_NAME, 'img')
        
        for img in images:
            src = img.get_attribute('src')
            # 特定の条件に基づいて画像をフィルタリング
            if src and 'index/img' in src:
                image_urls.append(src)
                print(f'Found image: {src}')
    except Exception as e:
        print(f'Error finding images: {e}')
    return image_urls

def download_images(image_urls):
    images = []
    for i, url in enumerate(image_urls):
        response = requests.get(url)
        if response.status_code == 200:
            image = Image.open(BytesIO(response.content))
            images.append(image)
            print(f'Downloaded image_{i}.jpg')
        else:
            print(f'Failed to download {url}')
    return images

def merge_images(images, output_path):
    widths, heights = zip(*(img.size for img in images))

    total_height = sum(heights)
    max_width = max(widths)

    combined_image = Image.new('RGB', (max_width, total_height))

    y_offset = 0
    for img in images:
        combined_image.paste(img, (0, y_offset))
        y_offset += img.height

    combined_image.save(output_path)
    print(f'Saved combined image as {output_path}')

def main():
    url = 'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    # 特定のリンクをクリックする
    base_xpath_click = '/html/body/div[1]/div[3]/div[1]/div/div[4]/div/div/div/div/div/div/ul'
    click_date_element(driver, base_xpath_click)
    
    # 画像を取得してダウンロードする
    base_xpath_images = '/html/body/div[1]/div[3]/div[1]/div/div[2]/div[2]'
    image_urls = get_images_from_container(driver, base_xpath_images)
    driver.quit()
    
    if image_urls:
        images = download_images(image_urls)
        if images:
            # 現在の日付を取得してフォーマット
            current_date = datetime.now().strftime('%Y%m%d')
            output_path = f'/mnt/data/combined_image_{current_date}.jpg'
            merge_images(images, output_path)

if __name__ == '__main__':
    main()

However, this fails:

Clicked on link with title: 8/2　日替
Found image: https://ipqcache2.shufoo.net/c/2024/07/30/15607036918828/index/img/0_100_0.jpg
Found image: https://ipqcache2.shufoo.net/c/2024/07/30/15607036918828/index/img/0_100_1.jpg
Found image: https://ipqcache2.shufoo.net/c/2024/07/30/15607036918828/index/img/0_100_2.jpg
Found image: https://ipqcache2.shufoo.net/c/2024/07/30/15607036918828/index/img/0_100_3.jpg
Downloaded image_0.jpg
Downloaded image_1.jpg
Downloaded image_2.jpg
Downloaded image_3.jpg
Traceback (most recent call last):
  File "/Users/snowpool/aw10s/store_adversting_list/clik_allget_image.py", line 107, in <module>
    main()
  File "/Users/snowpool/aw10s/store_adversting_list/clik_allget_image.py", line 104, in main
    merge_images(images, output_path)
  File "/Users/snowpool/aw10s/store_adversting_list/clik_allget_image.py", line 83, in merge_images
    combined_image.save(output_path)
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/PIL/Image.py", line 2428, in save
    fp = builtins.open(filename, "w+b")
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/data/combined_image_20240802.jpg'

so I changed main() to:

import os

def main():
    url = 'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    # 特定のリンクをクリックする
    base_xpath_click = '/html/body/div[1]/div[3]/div[1]/div/div[4]/div/div/div/div/div/div/ul'
    click_date_element(driver, base_xpath_click)
    
    # 画像を取得してダウンロードする
    base_xpath_images = '/html/body/div[1]/div[3]/div[1]/div/div[2]/div[2]'
    image_urls = get_images_from_container(driver, base_xpath_images)
    driver.quit()
    
    if image_urls:
        images = download_images(image_urls)
        if images:
            # 現在の日付を取得してフォーマット
            current_date = datetime.now().strftime('%Y%m%d')
            # Update the path to a valid directory on your machine
            output_dir = os.path.expanduser('~/images')
            os.makedirs(output_dir, exist_ok=True)  # Ensure the directory exists
            output_path = os.path.join(output_dir, f'combined_image_{current_date}.jpg')
            merge_images(images, output_path)

if __name__ == '__main__':
    main()

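The cause of the earlier error was that the directory /mnt/data/ does not exist on the local system.
Solution:
1. Create a directory: create one on the local machine (for example, an images directory inside the user's home directory).
2. Update output_path: point the path where the script saves the image at this new directory.

This removes the error, but the image is saved under the home directory.
That is confusing, so instead I create an images folder in the current directory and save the merged image file there: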

import os
import time
import requests
from PIL import Image
from io import BytesIO
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from datetime import datetime

def open_link_in_safari(url):
    # Safariドライバーを使用してブラウザを起動
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # リンクを開いた後に3秒間待機
    return driver

def click_date_element(driver, base_xpath):
    try:
        # コンテナ内の日付要素を探してクリック
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        links = container.find_elements(By.XPATH, ".//a[contains(@title, '日替')]")

        for link in links:
            if '日替' in link.get_attribute('title'):
                link.click()
                print(f'Clicked on link with title: {link.get_attribute("title")}')
                time.sleep(3)  # クリックした後に3秒間待機
                return

        print('No link found with title containing: 日替')
    except Exception as e:
        print(f'Error clicking on element: {e}')

def get_images_from_container(driver, base_xpath):
    image_urls = []
    try:
        # コンテナ内の画像要素を探す
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        images = container.find_elements(By.TAG_NAME, 'img')
        
        for img in images:
            src = img.get_attribute('src')
            # 特定の条件に基づいて画像をフィルタリング
            if src and 'index/img' in src:
                image_urls.append(src)
                print(f'Found image: {src}')
    except Exception as e:
        print(f'Error finding images: {e}')
    return image_urls

def download_images(image_urls):
    images = []
    for i, url in enumerate(image_urls):
        response = requests.get(url)
        if response.status_code == 200:
            image = Image.open(BytesIO(response.content))
            images.append(image)
            print(f'Downloaded image_{i}.jpg')
        else:
            print(f'Failed to download {url}')
    return images

def merge_images(images, output_path):
    widths, heights = zip(*(img.size for img in images))

    total_height = sum(heights)
    max_width = max(widths)

    combined_image = Image.new('RGB', (max_width, total_height))

    y_offset = 0
    for img in images:
        combined_image.paste(img, (0, y_offset))
        y_offset += img.height

    combined_image.save(output_path)
    print(f'Saved combined image as {output_path}')

def main():
    url = 'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    # 特定のリンクをクリックする
    base_xpath_click = '/html/body/div[1]/div[3]/div[1]/div/div[4]/div/div/div/div/div/div/ul'
    click_date_element(driver, base_xpath_click)
    
    # 画像を取得してダウンロードする
    base_xpath_images = '/html/body/div[1]/div[3]/div[1]/div/div[2]/div[2]'
    image_urls = get_images_from_container(driver, base_xpath_images)
    driver.quit()
    
    if image_urls:
        images = download_images(image_urls)
        if images:
            # 現在の日付を取得してフォーマット
            current_date = datetime.now().strftime('%Y%m%d')
            # カレントディレクトリにimagesフォルダを作成
            output_dir = 'images'
            os.makedirs(output_dir, exist_ok=True)  # ディレクトリが存在しない場合は作成
            output_path = os.path.join(output_dir, f'combined_image_{current_date}.jpg')
            merge_images(images, output_path)

if __name__ == '__main__':
    main()

Now the merged images are saved in images without overwriting those from other days.
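The date alone keeps files from different days apart, but flyers from several stores fetched on the same day would still share one name. One possible naming scheme, sketched here as an assumption rather than what the script does, adds the shop ID from the Shufoo! URL. flyer_output_path is a hypothetical helper.

```python
from datetime import date
from pathlib import Path


def flyer_output_path(output_dir, shop_id, day):
    """Build a path such as images/combined_image_860323_20240802.jpg."""
    return Path(output_dir) / f"combined_image_{shop_id}_{day:%Y%m%d}.jpg"


# 860323 is the shop ID that appears in the URL used by main()
print(flyer_output_path("images", "860323", date(2024, 8, 2)))
```

In main(), date.today() could be passed as the third argument, and the result handed to merge_images after the images folder has been created.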

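Next is matching against the keyword strings, but the 杏林堂 flyer had nothing I wanted.

So for now I use some strings that were detected as keywords and, when one matches, send the flyer image file and the matched string over LINE.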

Fetching the target mails from Gmail by subject

Of the target mails, only those with the subject
【Shufoo!】お気に入り店舗新着チラシお知らせメール
are fetched.

The flyers are then retrieved by accessing the URLs in the body of these mails.

The URLs are:
https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc
杏林堂薬局/袋井西田店

https://www.shufoo.net/pntweb/shopDetail/197728/?cid=nmail_pc
ユーコープ/袋井田町店

https://www.shufoo.net/pntweb/shopDetail/15782/?cid=nmail_pc
ピアゴ袋井店
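Since the store URLs all follow the same pattern, they could be pulled out of a mail body with a regular expression. This is a minimal sketch that assumes every link looks like the three above. extract_shop_urls is a hypothetical helper, and the sample body imitates how the mail text runs the store name and the URL together.

```python
import re

# Matches URLs of the form .../pntweb/shopDetail/<digits>/?cid=nmail_pc
SHOP_URL_PATTERN = re.compile(
    r"https://www\.shufoo\.net/pntweb/shopDetail/\d+/\?cid=nmail_pc"
)


def extract_shop_urls(body):
    """Return the store URLs found in body, in order, without duplicates."""
    urls = []
    for match in SHOP_URL_PATTERN.finditer(body):
        if match.group(0) not in urls:
            urls.append(match.group(0))
    return urls


sample_body = (
    "・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc "
    "・ピアゴ袋井店https://www.shufoo.net/pntweb/shopDetail/15782/?cid=nmail_pc"
)
for url in extract_shop_urls(sample_body):
    print(url)
```

Matching on the fixed URL shape rather than on whitespace matters here because the store name is glued directly onto the front of each URL.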

For now, I fetch only the mails whose subject is
【Shufoo!】お気に入り店舗新着チラシお知らせメール

vim get_mail_subject.py

with these contents:

import os.path
import base64
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# Paths of the credential files
CREDENTIALS_FILE = 'path/to/credentials.json'
TOKEN_FILE = 'token.json'

# Gmail API scope
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

def main():
    # Load the token file if it exists
    creds = None
    if os.path.exists(TOKEN_FILE):
        creds = Credentials.from_authorized_user_file(TOKEN_FILE, SCOPES)
    
    # If the credentials are not valid, authenticate again
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(CREDENTIALS_FILE, SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the token
        with open(TOKEN_FILE, 'w') as token:
            token.write(creds.to_json())

    # Build the Gmail API client
    service = build('gmail', 'v1', credentials=creds)

    # Search for the mails
    query = 'subject:"【Shufoo!】お気に入り店舗新着チラシお知らせメール"'
    results = service.users().messages().list(userId='me', q=query).execute()
    messages = results.get('messages', [])

    if not messages:
        print('No messages found.')
    else:
        print(f'Found {len(messages)} messages:')
        for msg in messages:
            msg_id = msg['id']
            msg = service.users().messages().get(userId='me', id=msg_id).execute()
            msg_snippet = msg['snippet']
            print(f'Message snippet: {msg_snippet}')

if __name__ == '__main__':
    main()

Then copy the credential files

cp ../mail_auto/credentials.json .
cp ../mail_auto/token.json .

Running it gives

Found 24 messages:
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、3件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/shopDetail/860335/?cid=nmail_pc ・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、1件の新着チラシが掲載開始されました。 ・ピアゴ袋井店https://www.shufoo.net/pntweb/shopDetail/15782/?cid=nmail_pc ※Shufoo!PCサイトまたは、シュフーチラシアプリ（スマートフォン・タブレット端末用） からログインしてお店の
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、3件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、3件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、4件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/shopDetail/860335/?cid=nmail_pc ・ユーコープ/袋井田町店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、3件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、4件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、3件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、3件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、1件の新着チラシが掲載開始されました。 ・ピアゴ袋井店https://www.shufoo.net/pntweb/shopDetail/15782/?cid=nmail_pc ※Shufoo!PCサイトまたは、シュフーチラシアプリ（スマートフォン・タブレット端末用） からログインしてお店の
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、3件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/shopDetail/860335/?cid=nmail_pc ・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、3件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、4件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/shopDetail/860335/?cid=nmail_pc ・ユーコープ/袋井田町店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、3件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/shopDetail/860335/?cid=nmail_pc ・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、1件の新着チラシが掲載開始されました。 ・ピアゴ袋井店https://www.shufoo.net/pntweb/shopDetail/15782/?cid=nmail_pc ※Shufoo!PCサイトまたは、シュフーチラシアプリ（スマートフォン・タブレット端末用） からログインしてお店の
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、3件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/shopDetail/860335/?cid=nmail_pc ・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、1件の新着チラシが掲載開始されました。 ・ピアゴ袋井店https://www.shufoo.net/pntweb/shopDetail/15782/?cid=nmail_pc ※Shufoo!PCサイトまたは、シュフーチラシアプリ（スマートフォン・タブレット端末用） からログインしてお店の
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、3件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/shopDetail/860335/?cid=nmail_pc ・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、3件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/shopDetail/860335/?cid=nmail_pc ・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、3件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、3件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/shopDetail/860335/?cid=nmail_pc ・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、4件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/shopDetail/860335/?cid=nmail_pc ・ユーコープ/袋井田町店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、3件の新着チラシが掲載開始されました。 ・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/shopDetail/860335/?cid=nmail_pc ・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/
Message snippet: こちらのメールは「Shufoo!」でお気に入り登録した店舗の新着チラシ掲載開始をお知らせするメールです。 以下、1件の新着チラシが掲載開始されました。 ・ピアゴ袋井店https://www.shufoo.net/pntweb/shopDetail/15782/?cid=nmail_pc ※Shufoo!PCサイトまたは、シュフーチラシアプリ（スマートフォン・タブレット端末用） からログインしてお店の

and so on;
the output continues until stopped with Ctrl + C
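
The snippets above already contain shop URLs, but each snippet is cut off partway through, so the later URLs in an email are missing. As a rough sketch of pulling shop URLs out of whatever text is available (a snippet or a full body), a regular expression can be used. The function name `extract_shop_urls` and the pattern are my own assumptions based on the URL shape seen in the output, not something from the original scripts.

```python
import re

# Assumed URL shape, taken from the URLs that appear in the snippets above.
SHOP_URL_RE = re.compile(
    r"https://www\.shufoo\.net/pntweb/shopDetail/(\d+)/\?cid=nmail_pc"
)

def extract_shop_urls(text):
    """Return the shop URLs found in text, without duplicates, in first-seen order."""
    urls = []
    for match in SHOP_URL_RE.finditer(text):
        if match.group(0) not in urls:
            urls.append(match.group(0))
    return urls

# A shortened copy of one snippet; the second URL is truncated, as in the real output.
snippet = (
    "以下、3件の新着チラシが掲載開始されました。 "
    "・杏林堂薬局/袋井旭町店https://www.shufoo.net/pntweb/shopDetail/860335/?cid=nmail_pc "
    "・杏林堂薬局/袋井西田店https://www.shufoo.net/pntweb/"
)
print(extract_shop_urls(snippet))
```

Only the complete URL is returned here, which is one reason to read the full message body rather than the snippet.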

Next,
if the body of a fetched email contains
https://www.shufoo.net/pntweb/shopDetail/15782/?cid=nmail_pc
or
https://www.shufoo.net/pntweb/shopDetail/197728/?cid=nmail_pc
or
https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc
the code is changed to open the linked page with Selenium on Safari

vim mail_url.py

with

import os.path
import base64
import re
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService

# Path to the credentials file
CREDENTIALS_FILE = 'path/to/credentials.json'
TOKEN_FILE = 'token.json'

# Gmail API scope
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

# URLs to look for
URL_LIST = [
    'https://www.shufoo.net/pntweb/shopDetail/15782/?cid=nmail_pc',
    'https://www.shufoo.net/pntweb/shopDetail/197728/?cid=nmail_pc',
    'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
]

def open_link_in_safari(url):
    # Start the browser using the Safari driver
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)

def main():
    # Load the token file if it exists
    creds = None
    if os.path.exists(TOKEN_FILE):
        creds = Credentials.from_authorized_user_file(TOKEN_FILE, SCOPES)
    
    # If the credentials are not valid, authenticate again
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(CREDENTIALS_FILE, SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the token
        with open(TOKEN_FILE, 'w') as token:
            token.write(creds.to_json())

    # Build the Gmail API client
    service = build('gmail', 'v1', credentials=creds)

    # Search for the emails
    query = 'subject:"【Shufoo!】お気に入り店舗新着チラシお知らせメール"'
    results = service.users().messages().list(userId='me', q=query).execute()
    messages = results.get('messages', [])

    if not messages:
        print('No messages found.')
    else:
        print(f'Found {len(messages)} messages:')
        for msg in messages:
            msg_id = msg['id']
            msg = service.users().messages().get(userId='me', id=msg_id).execute()
            msg_payload = msg.get('payload', {})
            msg_parts = msg_payload.get('parts', [])
            msg_body = ''

            for part in msg_parts:
                if part['mimeType'] == 'text/plain':
                    msg_body = base64.urlsafe_b64decode(part['body']['data']).decode('utf-8')
                    break

            # Check whether the body contains any URL from the list
            for url in URL_LIST:
                if url in msg_body:
                    print(f'Opening URL: {url}')
                    open_link_in_safari(url)
                    break

if __name__ == '__main__':
    main()

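and run it

But nothing could be retrieved, so
I fetch only the latest email and, if it contains one of the specified URLs, open it with Selenium
Nothing was displayed, though, so I changed the code to output logs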

import os.path
import base64
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from selenium import webdriver
from selenium.webdriver.safari.service import Service as SafariService

# Path to the credentials file
CREDENTIALS_FILE = 'path/to/credentials.json'
TOKEN_FILE = 'token.json'

# Gmail API scope
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

# URLs to look for
URL_LIST = [
    'https://www.shufoo.net/pntweb/shopDetail/15782/?cid=nmail_pc',
    'https://www.shufoo.net/pntweb/shopDetail/197728/?cid=nmail_pc',
    'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
]

def open_link_in_safari(url):
    # Start the browser using the Safari driver
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)

def get_email_body(parts):
    """Walk the email parts recursively and return the body"""
    for part in parts:
        if part['mimeType'] == 'text/plain' or part['mimeType'] == 'text/html':
            try:
                body = base64.urlsafe_b64decode(part['body']['data']).decode('utf-8')
                return body
            except KeyError:
                continue
            except Exception as e:
                print(f'Error decoding part: {e}')
                continue
        elif 'parts' in part:
            body = get_email_body(part['parts'])
            if body:
                return body
    return None

def main():
    # Load the token file if it exists
    creds = None
    if os.path.exists(TOKEN_FILE):
        creds = Credentials.from_authorized_user_file(TOKEN_FILE, SCOPES)
        print("Loaded credentials from token file.")
    
    # If the credentials are not valid, authenticate again
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
            print("Credentials refreshed.")
        else:
            flow = InstalledAppFlow.from_client_secrets_file(CREDENTIALS_FILE, SCOPES)
            creds = flow.run_local_server(port=0)
            print("New credentials obtained.")
        # Save the token
        with open(TOKEN_FILE, 'w') as token:
            token.write(creds.to_json())
            print("Credentials saved to token file.")

    # Build the Gmail API client
    service = build('gmail', 'v1', credentials=creds)
    print("Gmail API client built.")

    # Search for the emails
    query = 'subject:"【Shufoo!】お気に入り店舗新着チラシお知らせメール"'
    results = service.users().messages().list(userId='me', q=query, maxResults=1).execute()
    messages = results.get('messages', [])

    if not messages:
        print('No messages found.')
    else:
        print(f'Found {len(messages)} message(s).')
        msg_id = messages[0]['id']
        msg = service.users().messages().get(userId='me', id=msg_id).execute()
        print(f'Fetched message with ID: {msg_id}')
        msg_payload = msg.get('payload', {})
        msg_body = get_email_body(msg_payload.get('parts', []))

        if not msg_body:
            print(f'No body found for message ID: {msg_id}')
            return

        print(f'Message ID: {msg_id}')
        print(f'Message Body: {msg_body[:200]}...')  # Show the beginning of the email body

        # Check whether the body contains any URL from the list
        for url in URL_LIST:
            if url in msg_body:
                print(f'Opening URL: {url}')
                open_link_in_safari(url)
                break

if __name__ == '__main__':
    main()

Result

Gmail API client built.
Found 1 message(s).
Fetched message with ID: 
No body found for message ID:

so the body still could not be retrieved

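For now I created a label named shop
and set the Shufoo emails to be sorted into it automatically

The plan is to fetch the latest email under this label,
but before that I got it to open in the browser

To solve the problem of the email body not being retrieved, the payload structure is checked in more detail and every possible part is checked recursively to find the body. HTML bodies are also taken into account. The code was changed as follows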

import os.path
import base64
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from selenium import webdriver
from selenium.webdriver.safari.service import Service as SafariService

# Path to the credentials file
CREDENTIALS_FILE = 'path/to/credentials.json'
TOKEN_FILE = 'token.json'

# Gmail API scope
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

# URLs to look for
URL_LIST = [
    'https://www.shufoo.net/pntweb/shopDetail/15782/?cid=nmail_pc',
    'https://www.shufoo.net/pntweb/shopDetail/197728/?cid=nmail_pc',
    'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
]

def open_link_in_safari(url):
    # Start the browser using the Safari driver
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)

def get_email_body(parts):
    """Walk the email parts recursively and return the body"""
    for part in parts:
        if part['mimeType'] == 'text/plain' or part['mimeType'] == 'text/html':
            try:
                body_data = part['body'].get('data')
                if body_data:
                    body = base64.urlsafe_b64decode(body_data).decode('utf-8')
                    return body
            except Exception as e:
                print(f'Error decoding part: {e}')
        elif 'parts' in part:
            body = get_email_body(part['parts'])
            if body:
                return body
    return None

def main():
    # Load the token file if it exists
    creds = None
    if os.path.exists(TOKEN_FILE):
        creds = Credentials.from_authorized_user_file(TOKEN_FILE, SCOPES)
        print("Loaded credentials from token file.")
    
    # If the credentials are not valid, authenticate again
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
            print("Credentials refreshed.")
        else:
            flow = InstalledAppFlow.from_client_secrets_file(CREDENTIALS_FILE, SCOPES)
            creds = flow.run_local_server(port=0)
            print("New credentials obtained.")
        # Save the token
        with open(TOKEN_FILE, 'w') as token:
            token.write(creds.to_json())
            print("Credentials saved to token file.")

    # Build the Gmail API client
    service = build('gmail', 'v1', credentials=creds)
    print("Gmail API client built.")

    # Search for the emails
    query = 'subject:"【Shufoo!】お気に入り店舗新着チラシお知らせメール"'
    results = service.users().messages().list(userId='me', q=query, maxResults=1).execute()
    messages = results.get('messages', [])

    if not messages:
        print('No messages found.')
    else:
        print(f'Found {len(messages)} message(s).')
        msg_id = messages[0]['id']
        msg = service.users().messages().get(userId='me', id=msg_id, format='full').execute()
        print(f'Fetched message with ID: {msg_id}')
        msg_payload = msg.get('payload', {})
        msg_body = get_email_body([msg_payload])

        if not msg_body:
            print(f'No body found for message ID: {msg_id}')
            return

        print(f'Message ID: {msg_id}')
        print(f'Message Body: {msg_body[:200]}...')  # Show the beginning of the email body

        # Check whether the body contains any URL from the list
        for url in URL_LIST:
            if url in msg_body:
                print(f'Opening URL: {url}')
                open_link_in_safari(url)
                break

if __name__ == '__main__':
    main()

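With this, the URL could be opened

Improvements
1. Full recursive traversal of the email payload: the whole payload, starting from the top-level part, is searched recursively for body data.
2. Decode error handling: if decoding fails, an error message is printed and processing continues.
3. Added debug output: the extra debug prints make each step of the fetch process visible.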
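
One plausible reason the earlier version printed "No body found" is that the message was not multipart: in that case the Gmail API puts the data directly in the payload's own body and there is no parts list, so `msg_payload.get('parts', [])` is empty. I have not confirmed that this was the cause here. The sketch below uses hand-made payload dictionaries (not real API responses) and a hypothetical helper `find_body` to show that starting the walk from the payload itself covers both shapes.

```python
import base64

def find_body(part):
    """Return the first decodable text/plain or text/html body under part, else None."""
    if part.get("mimeType") in ("text/plain", "text/html"):
        data = part.get("body", {}).get("data")
        if data:
            # The data is URL-safe base64; restore padding in case it was stripped.
            padded = data + "=" * (-len(data) % 4)
            return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")
    for child in part.get("parts", []):
        body = find_body(child)
        if body:
            return body
    return None

def encode(text):
    """Encode text the way body data is represented (URL-safe base64)."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")

# Hand-made examples of the two payload shapes (assumed, for illustration only).
single_part = {"mimeType": "text/html", "body": {"data": encode("<p>flyer</p>")}}
multipart = {
    "mimeType": "multipart/alternative",
    "body": {"size": 0},
    "parts": [
        {"mimeType": "text/plain", "body": {"data": encode("plain flyer text")}},
        {"mimeType": "text/html", "body": {"data": encode("<p>flyer</p>")}},
    ],
}

print(single_part.get("parts", []))  # empty: nothing to iterate in the old version
print(find_body(single_part))
print(find_body(multipart))
```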

Next,
make it click the link with a dated label such as
7/30　日替

import datetime
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService

def open_link_in_safari(url):
    # Start the browser using the Safari driver
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # Wait 3 seconds after opening the link
    return driver

def click_date_element(driver):
    # Get today's date
    today_str = datetime.datetime.now().strftime("%m/%d")
    # Adjust the date format
    today_str = today_str.lstrip("0").replace("/", "月") + "日替"

    try:
        # Find the date element and click it
        element = driver.find_element(By.XPATH, f"//*[contains(text(), '{today_str}')]")
        element.click()
        print(f'Clicked on element with text: {today_str}')
        time.sleep(3)  # Wait 3 seconds after clicking
    except Exception as e:
        print(f'Error clicking on element: {e}')

def main():
    url = 'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    click_date_element(driver)
    driver.quit()

if __name__ == '__main__':
    main()

but this gives

Error clicking on element: Message: ; For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception

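as the output

This error means that the specified element was not found on the page. To fix it, the script has to wait until the page has finished loading, and the XPath also has to be checked for correctness,
or so I was told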
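
Besides load timing, the search string itself looks worth checking. The code above turns the date into a string with 月 in it, while the link title shown further down in this post is `7/30　日替` (slash, then a full-width space). The sketch below compares the two; `title_label` is a hypothetical helper, and the exact title format (no zero padding, one full-width space) is an assumption based on that single example.

```python
import datetime

def old_label(day):
    """The string built by click_date_element above."""
    return day.strftime("%m/%d").lstrip("0").replace("/", "月") + "日替"

def title_label(day, suffix="日替"):
    """Assumed title format: month/day, a full-width space (U+3000), then the suffix."""
    return f"{day.month}/{day.day}\u3000{suffix}"

day = datetime.date(2024, 7, 30)
print(old_label(day))    # the string the script searched for
print(title_label(day))  # the string seen in the link title
```

If the page really uses the second form, the first can never match no matter how long the script waits.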

Additional changes
1. Use WebDriverWait to wait until the element is found:
   WebDriverWait(driver, 10).until(
       EC.presence_of_element_located((By.XPATH, f"//*[contains(text(), '{today_str}')]"))
   )
2. Print a list of elements for debugging: if the element is not found, list the elements on the page.
   elements = driver.find_elements(By.XPATH, "//*")
   for elem in elements:
       print(elem.text)
So the code becomes

import datetime
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def open_link_in_safari(url):
    # Start the browser using the Safari driver
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # Wait 3 seconds after opening the link
    return driver

def click_date_element(driver):
    # Get today's date
    today_str = datetime.datetime.now().strftime("%m/%d")
    # Adjust the date format
    today_str = today_str.lstrip("0").replace("/", "月") + "日替"

    try:
        # Find the date element and click it
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, f"//*[contains(text(), '{today_str}')]"))
        )
        element = driver.find_element(By.XPATH, f"//*[contains(text(), '{today_str}')]")
        element.click()
        print(f'Clicked on element with text: {today_str}')
        time.sleep(3)  # Wait 3 seconds after clicking
    except Exception as e:
        print(f'Error clicking on element: {e}')
        # List the elements on the page for debugging
        elements = driver.find_elements(By.XPATH, "//*")
        for elem in elements:
            print(elem.text)

def main():
    url = 'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    click_date_element(driver)
    driver.quit()

if __name__ == '__main__':
    main()

I dumped the output of this with

python click_url.py > shop.txt

and looked at it, but there is too much of it,
and ChatGPT returns an error when I paste it in

So I check the page structure with the developer tools

In XPath terms,

/html/body/div[1]/div[3]/div[1]/div/div[4]/div/div/div/div/div/div/ul

contains the links, and

/html/body/div[1]/div[3]/div[1]/div/div[4]/div/div/div/div/div/div/ul/li[2]/a

is the link itself
However, nothing is displayed

import datetime
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def open_link_in_safari(url):
    # Start the browser using the Safari driver
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # Wait 3 seconds after opening the link
    return driver

def click_date_element(driver, xpath):
    try:
        # Wait for the container that holds the date links
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, xpath))
        )
        container = driver.find_element(By.XPATH, xpath)
        # Get all links inside the container
        links = container.find_elements(By.TAG_NAME, 'a')
        today_str = datetime.datetime.now().strftime("%m/%d").lstrip("0").replace("/", "月") + "日替"

        for link in links:
            if today_str in link.text:
                link.click()
                print(f'Clicked on link with text: {link.text}')
                time.sleep(3)  # Wait 3 seconds after clicking
                return

        print(f'No link found with text: {today_str}')
    except Exception as e:
        print(f'Error clicking on element: {e}')

def main():
    url = 'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    # Use the XPath found with the developer tools
    xpath = '/html/body/div[1]/div[3]/div[1]/div/div[4]/div/div/div/div/div/div/ul'
    click_date_element(driver, xpath)
    driver.quit()

if __name__ == '__main__':
    main()

Look at the link's HTML directly

<a href="//www.shufoo.net/pntweb/shopDetail/860323/86383836863914/" class="sc_custom_link" rel="sd_shop_chirashi_list" title="7/30　日替">
                          <span class="shop_chirashi_list_thumb"><img src="//ipqcache2.shufoo.net/c/2024/07/26/c/3927665654283/index/img/thumb/thumb_m.jpg" alt=""></span>
                          <span class="shop_chirashi_list_title">7/30　日替</span>
                        </a>

Instead of specifying the date,
I changed it to click the link whose title contains
日替
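
The "title contains 日替" rule can be checked offline against the HTML above before involving the browser. This is only a sketch using the standard library's `html.parser`; `find_links_by_title` is a hypothetical helper, and the second list item in the sample is made up to show a non-matching link.

```python
from html.parser import HTMLParser

class TitleLinkFinder(HTMLParser):
    """Collect (title, href) for <a> tags whose title contains a keyword."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        title = attr.get("title") or ""
        if self.keyword in title:
            self.matches.append((title, attr.get("href")))

def find_links_by_title(html, keyword):
    finder = TitleLinkFinder(keyword)
    finder.feed(html)
    return finder.matches

# First <a> is copied from the page HTML above; the second is a made-up non-match.
SAMPLE_HTML = """<ul>
<li><a href="//www.shufoo.net/pntweb/shopDetail/860323/86383836863914/" class="sc_custom_link" rel="sd_shop_chirashi_list" title="7/30\u3000日替"><span class="shop_chirashi_list_title">7/30\u3000日替</span></a></li>
<li><a href="//example.invalid/other/" class="sc_custom_link" title="other"><span>other</span></a></li>
</ul>"""

for title, href in find_links_by_title(SAMPLE_HTML, "日替"):
    print(title, href)
```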

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def open_link_in_safari(url):
    # Start the browser using the Safari driver
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # Wait 3 seconds after opening the link
    return driver

def click_date_element(driver, base_xpath):
    try:
        # Find the date links inside the container and click one
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        links = container.find_elements(By.XPATH, ".//a[contains(@title, '日替')]")

        for link in links:
            if '日替' in link.get_attribute('title'):
                link.click()
                print(f'Clicked on link with title: {link.get_attribute("title")}')
                time.sleep(3)  # Wait 3 seconds after clicking
                return

        print('No link found with title containing: 日替')
    except Exception as e:
        print(f'Error clicking on element: {e}')

def main():
    url = 'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    # Use the XPath found with the developer tools
    base_xpath = '/html/body/div[1]/div[3]/div[1]/div/div[4]/div/div/div/div/div/div/ul'
    click_date_element(driver, base_xpath)
    driver.quit()

if __name__ == '__main__':
    main()

Clicking now works, so
next, get the images

Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/button_cover_turn_over.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/button_cover_turn_over.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/button_cover_basic.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/button_cover_basic.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/button_cover_basic.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/button_cover_basic.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/button_cover_basic.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/button_cover_basic.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://ipqcache2.shufoo.net/c/2024/07/24/29355636822862/index/img/0_100_0.jpg
Found image: https://ipqcache2.shufoo.net/c/2024/07/24/29355636822862/index/img/0_100_1.jpg
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png
Found image: https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png

That is the output, so

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def open_link_in_safari(url):
    # Start the browser using the Safari driver
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # Wait 3 seconds after opening the link
    return driver

def get_images_from_container(driver, base_xpath):
    try:
        # Find the image elements inside the container
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        images = container.find_elements(By.TAG_NAME, 'img')
        
        for img in images:
            src = img.get_attribute('src')
            # Keep only flyer images (src can be None when an <img> has no src)
            if src and 'index/img' in src:
                print(f'Found image: {src}')
    except Exception as e:
        print(f'Error finding images: {e}')

def main():
    url = 'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    # Use the XPath found with the developer tools
    base_xpath = '/html/body/div[1]/div[3]/div[1]/div/div[2]/div[2]'
    get_images_from_container(driver, base_xpath)
    driver.quit()

if __name__ == '__main__':
    main()

is the code used

The images found include many transparent placeholder images and button images, so they need to be filtered. The code above keeps only the images whose src contains 'index/img'.

As a result,

Found image: https://ipqcache2.shufoo.net/c/2024/07/24/29355636822862/index/img/0_100_0.jpg
Found image: https://ipqcache2.shufoo.net/c/2024/07/24/29355636822862/index/img/0_100_1.jpg

was the output
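
The filtering rule can be pulled out into a small function and tried without a browser. This is a sketch only: `filter_flyer_images` is a hypothetical name, and skipping empty src values and duplicates is my own addition, since the earlier unfiltered output listed the same URL many times.

```python
def filter_flyer_images(srcs, marker="index/img"):
    """Keep src values that contain marker, dropping empty values and duplicates."""
    kept = []
    for src in srcs:
        if src and marker in src and src not in kept:
            kept.append(src)
    return kept

# Sample values taken from the output above, plus None for an <img> without src.
srcs = [
    "https://www.shufoo.net/site/chirashi_viewer_js/images/button_cover_basic.png",
    "https://www.shufoo.net/site/chirashi_viewer_js/images/transparent.png",
    None,
    "https://ipqcache2.shufoo.net/c/2024/07/24/29355636822862/index/img/0_100_0.jpg",
    "https://ipqcache2.shufoo.net/c/2024/07/24/29355636822862/index/img/0_100_1.jpg",
    "https://ipqcache2.shufoo.net/c/2024/07/24/29355636822862/index/img/0_100_0.jpg",
]
for url in filter_flyer_images(srcs):
    print(url)
```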

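Now that the image paths can be retrieved,
the next version opens the specified URL, gets the images under the specified XPath, and returns them as a list. It then downloads the images if any were found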

import time
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def open_link_in_safari(url):
    # Start the browser using the Safari driver
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # Wait 3 seconds after opening the link
    return driver

def get_images_from_container(driver, base_xpath):
    image_urls = []
    try:
        # Find the image elements inside the container
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        images = container.find_elements(By.TAG_NAME, 'img')
        
        for img in images:
            src = img.get_attribute('src')
            # Keep only flyer images (src can be None when an <img> has no src)
            if src and 'index/img' in src:
                image_urls.append(src)
                print(f'Found image: {src}')
    except Exception as e:
        print(f'Error finding images: {e}')
    return image_urls

def download_images(image_urls):
    for i, url in enumerate(image_urls):
        response = requests.get(url)
        if response.status_code == 200:
            with open(f'image_{i}.jpg', 'wb') as file:
                file.write(response.content)
            print(f'Downloaded image_{i}.jpg')
        else:
            print(f'Failed to download {url}')

def main():
    url = 'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    # Use the XPath found with the developer tools
    base_xpath = '/html/body/div[1]/div[3]/div[1]/div/div[2]/div[2]'
    image_urls = get_images_from_container(driver, base_xpath)
    driver.quit()
    
    # Download the images
    if image_urls:
        download_images(image_urls)

if __name__ == '__main__':
    main()

Code walkthrough
1. Open the link:
   url = 'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
   driver = open_link_in_safari(url)
2. Find the container by XPath and get its image elements:
   base_xpath = '/html/body/div[1]/div[3]/div[1]/div/div[2]/div[2]'
   image_urls = get_images_from_container(driver, base_xpath)
3. Collect the image URLs in a list:
   for img in images:
       src = img.get_attribute('src')
       # Filter images by a specific condition
       if 'index/img' in src:
           image_urls.append(src)
           print(f'Found image: {src}')
4. Download the images:
   def download_images(image_urls):
       for i, url in enumerate(image_urls):
           response = requests.get(url)
           if response.status_code == 200:
               with open(f'image_{i}.jpg', 'wb') as file:
                   file.write(response.content)
               print(f'Downloaded image_{i}.jpg')
           else:
               print(f'Failed to download {url}')
This script gets the image elements inside the specified XPath and returns their image URLs as a list. It then downloads the images using the URLs in the list. The downloaded images are saved as image_0.jpg, image_1.jpg, and so on.
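
Because the names image_0.jpg, image_1.jpg are reused on every run, a later flyer would overwrite an earlier one. One way around that might be to derive the file name from the image URL. The sketch below assumes the URL layout seen in the output above, where the directory three levels above the file looks like a per-flyer ID; `filename_for` is a hypothetical helper and this naming scheme is not necessarily what the final script uses.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

def filename_for(url):
    """Build '<flyer id>_<original file name>' from an image URL.

    Assumes a path shaped like .../<flyer id>/index/img/<file name>.
    """
    parts = PurePosixPath(urlparse(url).path).parts
    flyer_id = parts[-4] if len(parts) >= 4 else "unknown"
    return f"{flyer_id}_{parts[-1]}"

url = "https://ipqcache2.shufoo.net/c/2024/07/24/29355636822862/index/img/0_100_0.jpg"
print(filename_for(url))
```

The result could then be passed to open() in download_images in place of the f'image_{i}.jpg' name.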

Now that the images can be saved, take the earlier click script

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def open_link_in_safari(url):
    # Launch the browser with the Safari driver
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # wait 3 seconds after opening the link
    return driver

def click_date_element(driver, base_xpath):
    try:
        # Find the date link inside the container and click it
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        links = container.find_elements(By.XPATH, ".//a[contains(@title, '日替')]")

        for link in links:
            if '日替' in link.get_attribute('title'):
                link.click()
                print(f'Clicked on link with title: {link.get_attribute("title")}')
                time.sleep(3)  # wait 3 seconds after the click
                return

        print('No link found with title containing: 日替')
    except Exception as e:
        print(f'Error clicking on element: {e}')

def main():
    url = 'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    # Use the XPath of the list that holds the date links
    base_xpath = '/html/body/div[1]/div[3]/div[1]/div/div[4]/div/div/div/div/div/div/ul'
    click_date_element(driver, base_xpath)
    driver.quit()

if __name__ == '__main__':
    main()

and add the image download to it, so that the images are fetched after the click:

import time
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def open_link_in_safari(url):
    # Launch the browser with the Safari driver
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # wait 3 seconds after opening the link
    return driver

def click_date_element(driver, base_xpath):
    try:
        # Find the date link inside the container and click it
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        links = container.find_elements(By.XPATH, ".//a[contains(@title, '日替')]")

        for link in links:
            if '日替' in link.get_attribute('title'):
                link.click()
                print(f'Clicked on link with title: {link.get_attribute("title")}')
                time.sleep(3)  # wait 3 seconds after the click
                return

        print('No link found with title containing: 日替')
    except Exception as e:
        print(f'Error clicking on element: {e}')

def get_images_from_container(driver, base_xpath):
    image_urls = []
    try:
        # Find the image elements inside the container
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        images = container.find_elements(By.TAG_NAME, 'img')
        
        for img in images:
            src = img.get_attribute('src')
            # Keep only flyer images; skip elements that have no src
            if src and 'index/img' in src:
                image_urls.append(src)
                print(f'Found image: {src}')
    except Exception as e:
        print(f'Error finding images: {e}')
    return image_urls

def download_images(image_urls):
    for i, url in enumerate(image_urls):
        response = requests.get(url)
        if response.status_code == 200:
            with open(f'image_{i}.jpg', 'wb') as file:
                file.write(response.content)
            print(f'Downloaded image_{i}.jpg')
        else:
            print(f'Failed to download {url}')

def main():
    url = 'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    # Click the daily-special link
    base_xpath_click = '/html/body/div[1]/div[3]/div[1]/div/div[4]/div/div/div/div/div/div/ul'
    click_date_element(driver, base_xpath_click)
    
    # Collect the image URLs and download them
    base_xpath_images = '/html/body/div[1]/div[3]/div[1]/div/div[2]/div[2]'
    image_urls = get_images_from_container(driver, base_xpath_images)
    driver.quit()
    
    if image_urls:
        download_images(image_urls)

if __name__ == '__main__':
    main()

Code walkthrough
1. Open the link: driver = open_link_in_safari(url)
2. Click the target link: click_date_element(driver, base_xpath_click) looks inside the list at base_xpath_click for a link whose title contains 日替 (daily special) and clicks it.
3. Collect and download the images: image_urls = get_images_from_container(driver, base_xpath_images) gathers the URLs, driver.quit() closes the browser, and download_images(image_urls) runs only if at least one URL was found.
The script opens the given URL, clicks the link whose title contains 日替, and then collects and downloads the images, so the click and the download run in a single pass.

However, the flyer arrives split into three images, which is hard to read, so I merge them into one.

Install the required libraries with

pip install selenium pillow requests

import time
import requests
from PIL import Image
from io import BytesIO
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def open_link_in_safari(url):
    # Launch the browser with the Safari driver
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # wait 3 seconds after opening the link
    return driver

def click_date_element(driver, base_xpath):
    try:
        # Find the date link inside the container and click it
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        links = container.find_elements(By.XPATH, ".//a[contains(@title, '日替')]")

        for link in links:
            if '日替' in link.get_attribute('title'):
                link.click()
                print(f'Clicked on link with title: {link.get_attribute("title")}')
                time.sleep(3)  # wait 3 seconds after the click
                return

        print('No link found with title containing: 日替')
    except Exception as e:
        print(f'Error clicking on element: {e}')

def get_images_from_container(driver, base_xpath):
    image_urls = []
    try:
        # Find the image elements inside the container
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        images = container.find_elements(By.TAG_NAME, 'img')
        
        for img in images:
            src = img.get_attribute('src')
            # Keep only flyer images; skip elements that have no src
            if src and 'index/img' in src:
                image_urls.append(src)
                print(f'Found image: {src}')
    except Exception as e:
        print(f'Error finding images: {e}')
    return image_urls

def download_images(image_urls):
    images = []
    for i, url in enumerate(image_urls):
        response = requests.get(url)
        if response.status_code == 200:
            image = Image.open(BytesIO(response.content))
            images.append(image)
            print(f'Downloaded image_{i}.jpg')
        else:
            print(f'Failed to download {url}')
    return images

def merge_images(images, output_path):
    widths, heights = zip(*(img.size for img in images))

    total_height = sum(heights)
    max_width = max(widths)

    combined_image = Image.new('RGB', (max_width, total_height))

    y_offset = 0
    for img in images:
        combined_image.paste(img, (0, y_offset))
        y_offset += img.height

    combined_image.save(output_path)
    print(f'Saved combined image as {output_path}')

def main():
    url = 'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    # Click the daily-special link
    base_xpath_click = '/html/body/div[1]/div[3]/div[1]/div/div[4]/div/div/div/div/div/div/ul'
    click_date_element(driver, base_xpath_click)
    
    # Collect the image URLs and download them
    base_xpath_images = '/html/body/div[1]/div[3]/div[1]/div/div[2]/div[2]'
    image_urls = get_images_from_container(driver, base_xpath_images)
    driver.quit()
    
    if image_urls:
        images = download_images(image_urls)
        if images:
            merge_images(images, 'combined_image.jpg')

if __name__ == '__main__':
    main()

Code walkthrough
1. Open the link: driver = open_link_in_safari(url)
2. Click the target link: click_date_element(driver, base_xpath_click)
3. Collect the image URLs: image_urls = get_images_from_container(driver, base_xpath_images)
4. Download the images: images = download_images(image_urls) now returns a list of PIL Image objects instead of writing one file per image.
5. Merge and save: if images is not empty, merge_images(images, 'combined_image.jpg') pastes them from top to bottom onto a single canvas.
The script opens the given URL, clicks the link whose title contains 日替, collects and downloads the images, and merges them into one image file, which is saved as combined_image.jpg.

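However, the pieces in the merged image are out of place, so I run the same script again with a different output file name: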

import time
import requests
from PIL import Image
from io import BytesIO
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.safari.service import Service as SafariService
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def open_link_in_safari(url):
    # Launch the browser with the Safari driver
    service = SafariService()
    driver = webdriver.Safari(service=service)
    driver.get(url)
    time.sleep(3)  # wait 3 seconds after opening the link
    return driver

def click_date_element(driver, base_xpath):
    try:
        # Find the date link inside the container and click it
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        links = container.find_elements(By.XPATH, ".//a[contains(@title, '日替')]")

        for link in links:
            if '日替' in link.get_attribute('title'):
                link.click()
                print(f'Clicked on link with title: {link.get_attribute("title")}')
                time.sleep(3)  # wait 3 seconds after the click
                return

        print('No link found with title containing: 日替')
    except Exception as e:
        print(f'Error clicking on element: {e}')

def get_images_from_container(driver, base_xpath):
    image_urls = []
    try:
        # Find the image elements inside the container
        container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, base_xpath))
        )
        images = container.find_elements(By.TAG_NAME, 'img')
        
        for img in images:
            src = img.get_attribute('src')
            # Keep only flyer images; skip elements that have no src
            if src and 'index/img' in src:
                image_urls.append(src)
                print(f'Found image: {src}')
    except Exception as e:
        print(f'Error finding images: {e}')
    return image_urls

def download_images(image_urls):
    images = []
    for i, url in enumerate(image_urls):
        response = requests.get(url)
        if response.status_code == 200:
            image = Image.open(BytesIO(response.content))
            images.append(image)
            print(f'Downloaded image_{i}.jpg')
        else:
            print(f'Failed to download {url}')
    return images

def merge_images(images, output_path):
    widths, heights = zip(*(img.size for img in images))

    total_height = sum(heights)
    max_width = max(widths)

    combined_image = Image.new('RGB', (max_width, total_height))

    y_offset = 0
    for img in images:
        combined_image.paste(img, (0, y_offset))
        y_offset += img.height

    combined_image.save(output_path)
    print(f'Saved combined image as {output_path}')

def main():
    url = 'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc'
    driver = open_link_in_safari(url)
    # Click the daily-special link
    base_xpath_click = '/html/body/div[1]/div[3]/div[1]/div/div[4]/div/div/div/div/div/div/ul'
    click_date_element(driver, base_xpath_click)
    
    # Collect the image URLs and download them
    base_xpath_images = '/html/body/div[1]/div[3]/div[1]/div/div[2]/div[2]'
    image_urls = get_images_from_container(driver, base_xpath_images)
    driver.quit()
    
    if image_urls:
        images = download_images(image_urls)
        if images:
            # /mnt/data is a ChatGPT sandbox path; save next to the script instead
            merge_images(images, 'combined_image_corrected.jpg')

if __name__ == '__main__':
    main()

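This was an experiment to see whether the positions would line up.

In the end the positions did not line up, but the pieces were saved as a single image.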
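One possible reason the positions do not line up is that the pieces are tiles of a grid rather than horizontal strips, in which case stacking them from top to bottom cannot reproduce the flyer. This is only an assumption; the layout of the flyer tiles has not been confirmed here. A minimal sketch of computing paste positions for a row-major grid follows, where grid_layout and its columns parameter are hypothetical:

```python
def grid_layout(sizes, columns):
    """Return (canvas_size, offsets) for tiles placed row by row.

    sizes   -- list of (width, height) tuples in download order
    columns -- assumed number of tiles per row (a guess, not a known value)
    """
    rows = [sizes[i:i + columns] for i in range(0, len(sizes), columns)]
    # Each column is as wide as its widest tile, each row as tall as its tallest
    col_widths = [
        max((row[c][0] for row in rows if c < len(row)), default=0)
        for c in range(columns)
    ]
    row_heights = [max(h for _, h in row) for row in rows]

    offsets = []
    y = 0
    for row, row_height in zip(rows, row_heights):
        x = 0
        for c in range(len(row)):
            offsets.append((x, y))
            x += col_widths[c]
        y += row_height
    return (sum(col_widths), y), offsets


if __name__ == '__main__':
    canvas, offsets = grid_layout([(100, 50)] * 4, columns=2)
    print(canvas, offsets)
```

Each offset could be passed to combined_image.paste(img, offset) in place of the (0, y_offset) used in merge_images, trying a few values of columns until the result looks right.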

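I also tried fetching the images directly from the Kyorindo (杏林堂) website,
but that only gives the weekly flyer, so I dropped that approach.

Fetching the flyer works for now, so
the next step is
to check whether the Cloud Vision API can return the product information as text.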

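Google Cloud Vision API sample

According to
https://nikkie-ftnext.hatenablog.com/entry/ocr-with-google-vision-api-python-first-step
the Vision API offers two OCR features:
* TEXT_DETECTION
* DOCUMENT_TEXT_DETECTION

The reference is
https://cloud.google.com/vision/docs/ocr?hl=ja#optical_character_recognition_ocr
TEXT_DETECTION
Detects and extracts text from any image.
For example, if a photo contains a street name or a traffic sign,
the JSON response includes the whole extracted string, the individual words, and their bounding boxes.

DOCUMENT_TEXT_DETECTION
Also extracts text from an image,
but the response is optimized for dense text and documents,
and the JSON includes page, block, paragraph, word, and break information.

For handwriting extraction and text extraction from files (PDF / TIFF), see DOCUMENT_TEXT_DETECTION.

That is how the reference describes them.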
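The page, block, paragraph, and word hierarchy that DOCUMENT_TEXT_DETECTION returns can be walked to get the text one paragraph at a time. The following is a minimal sketch, assuming the REST-style JSON form of the response (a fullTextAnnotation object containing pages, blocks, paragraphs, words, and symbols). paragraph_texts and the sample dictionary are hypothetical and hand-written, not real API output.

```python
def paragraph_texts(full_text_annotation):
    """Join the symbols of each paragraph into one string."""
    paragraphs = []
    for page in full_text_annotation.get('pages', []):
        for block in page.get('blocks', []):
            for paragraph in block.get('paragraphs', []):
                words = []
                for word in paragraph.get('words', []):
                    symbols = word.get('symbols', [])
                    words.append(''.join(s.get('text', '') for s in symbols))
                # Break information is ignored in this sketch, so words are
                # joined without a separator (acceptable for Japanese text)
                paragraphs.append(''.join(words))
    return paragraphs


if __name__ == '__main__':
    # Hand-written sample in the assumed shape, not real API output
    sample = {'pages': [{'blocks': [{'paragraphs': [{'words': [
        {'symbols': [{'text': 'レ'}, {'text': 'タ'}, {'text': 'ス'}]},
        {'symbols': [{'text': '円'}]},
    ]}]}]}]}
    print(paragraph_texts(sample))
```

Grouping by paragraph keeps a product name and the text printed next to it together more often than reading the whole flyer as one string.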

Code samples are available at
https://cloud.google.com/vision/product-search/docs/samples?hl=ja

I first want to confirm that the API works at all, so I follow the 2023-03-16 article
【GCP/Python】Vision APIのOCR（光学文字認識）でテキスト抽出！

The file is named

vision_api_test.py

cd aw10s
cd store_adversting_list 
vim vision_api_test.py

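That article starts by placing a JSON key file somewhere and loading it from the code,
but the information is outdated,

and it does not use any of the authentication commands.
So I follow the 2024-01-03 article
GoogleのVision APIをPythonから呼び出して、画像内のテキストを検出する
instead, and start from its sample: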

from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("kanji.png", "rb") as fb:
    content = fb.read()

image = vision.Image(content=content)

response = client.document_text_detection(image=image)
texts = response.text_annotations
print(texts[0].description)

I change only the file path and run it:

from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("test.jpg", "rb") as fb:
    content = fb.read()

image = vision.Image(content=content)

response = client.document_text_detection(image=image)
texts = response.text_annotations
print(texts[0].description)

I test it with a photo of a flyer taken with my phone.

The output is likely to be long, so I store it in a text file with

python vision_api_test.py >> result.txt

The result is

Aグループ版 オモテ
コミュニティー
15
魚担当
おすすめ
16
日
握りのセット!
うなぎの太巻、押寿司
では、うなぎ
―んびょうきゅうりそうなぎの
た
いて
今ぎをのせました。
→サイズ
1208195
うなぎ 28
うなぎ
うなぎり
醬
27.
水公園
coop
ユーコープ
いつもの商品を
膀胱! わくわく 日替りセール!!
お求めやすく!!」
7/100
7/110
7/100
14日
お一家族様
合わせて2点限り
14日
乾
麺10
店頭では10%引の価格を表示しています。
加工食品コーナーの商品に限ります。
※一部対象外の商品がございます。
魚
大
冷凍食品 20
●アイスクリームなど一部対象外商品があります。
詳しくは店頭でご確認ください。
例えばこちらの商品も20%引でこの
7/12 (金
7/13 土
毎週水曜日は
牛乳の日
毎週木曜日は
たまごの日
写真は全てイメージです。
毎週金曜日は
パンの日
コーブ牛乳
お一家族様5点限り
お一家族様 たまご
CO-OP コープ牛乳
1点限り
10個入
パスコ 超熟食パン
静岡県産など
金目鯛切身
切 1パック
680
味つき
734分
えだまめ
超
・6枚
8枚
熟
138円)
刺身用かつお
11バッグ ¥429円
宮格率から 398円 なども
真あじ
128
茶豆
税込
xg 138円
89
参考税込 149 円
天候などにより、水揚げのない場合はご容赦ください。
※写真はイメージです
写真は全てイメージです
新潟県などの国内産
ぶなしめじ 128円
(Wバック)
1袋
税込138円
13
土
7/
8%
群馬県などの国内産
わけあり
牛乳 KOO
成分散調整
「コープ牛乳」
¥198
参考税込
税込 213円
商品は店頭でご確認ください。
7/13 0
1000ml
178
参考税込
8% 192
子音 "14
子育て14日
ポイント
5 倍デー
co-op
岩手ひと
岩手ひとめぼれ
国館
洗米
ブルガリア ¥1,680円 1,780円
18% 各
Co-op
・塩味つきえ
・塩味つき茶
【各250g
15
参考税达 17
8%
19100-1
写真は
です
トマト (小箱)
桃(2個)
1パック/
山梨県などの国内産 ¥498円
$398
茶美豚
鹿児島県-
岩手県:
群馬県產
北海道產
100g当り とかち
100g当り
参考同认
8537
8%
番号税込 429 円
co-op
5ke
あらびきポーク
スペ
商品に限りがあるため、品切れの場合はご容赦ください。
●写真は全てイメージです
茶美酥
豚バラ
うすぎり
ポイントカードはお会計時にご提示ください
明治
特別栽培米
20 特別栽培無洗米
岩手ひとめぼれ 岩手ひとめぼれ
5kg
5kg
ポイントカードはお会計時にご提示ください プリマハム
総菜コーナー
11時販売開始
(本
彩り手巻き寿司
入り)
4本入
658円
710円
本 198円 本 本 378 円
参考税込
※税込) 213円 (交雑種) 税込
408円
販売開始
7種の天ぷら
盛り合わせ
1バック
398円
Dead 429
·伊右衛門
烏龍茶
各2L
配合 138 円
税込
8%
サントリー
税込 149 円
"15"16
ほほえみ
ポイント
15 モチー
限り
スモークの
香薰
こうくん
金賞受賞
香あらびき
キッコーマン ポーク体 238円
濃いだし ウインナー 参考税込 257円
めんつゆ
本つゆ
1L
大体228円
参考税込
税込 246円
※写真はイメージです
本つゆ
通常の5倍
90g×2
合わせ味噌
麺づくり
7/10 水 16 火 お野菜
いつも
※写真は全てイメージです。
おトク宣言!
群馬県などの
国内産
レタス
店内のこの
群馬県などの
国内産
きゃべつ
meiji
くちど
ヨーグルト
明治
R
ブルガリア
LB81
税込 1,814 円
税込 1,922 円
¥400g
LBSD プレーン
ヨーグルトの正本
全体 138円
見た目のキレイ
見た目のキレイ
LROT
参考税込
WWB 400M
税込 149 円
通常の
Aroma
ネスレ
Rich
25
Aroma
Rich
25
www
エクセラ
倍
・無糖
サイズ
コースの
甘さひかえめ お一家族様よりどり2点限り
【茶美豚
マルちゃん
NESCAFE
Excell
Freally
NESCAFE 各900ml
100g当り
ライオン
鹿児島県・
体各
¥78円
アロマリッチ
円 ジュリエット 合 378円 茶美豚
岩手県・群馬県産
198
参考税込
各 84 円
・キャサリン
豚ロース生姜焼・
参考税込
8% 各4円 各詰替950ml
税込 各 415円 豚丼用
参考税込。
8%
1 213 円
麺づくり
・鶏ガラ醤油
0円 ・合わせ味噌合 118
・鶏だし塩 [参考税込
0円 各1食
日香 127円
1個
108
参考税込 116円
1個
138
円 国内産
1個
お酒酒などの肴 158円
税込 149円 ブロッコリー 170円
8%
商品に限りがあるため、品切れの場合はご容赦ください。
●商品に併記している「参考税込」は、お支払い金額の目安として表示しています。 消費税は、レジで精算する際にお買い物小計に対してかかります。 ●酒類・日用品・雑貨などは消費税率10%対象です。 ●お酒は20歳未満の方への販売は致しておりません。
ざいません。
減にもつながります。
①Aグループ版 ウラ
COOP ユーコース
毎週
土曜日は子育て5倍デー 毎月5日 15日 15日 読み
シニア・
ほほえみ
ポイント
本 5 倍デー 7/100-160
組合員限定プレゼント!
対象商品を1点お買上げにつき
レジにて表示のポイントを
プレゼント致します。
広告の売り出し期間
ポイント
「プレゼント
※写真はイメージです
写真はイメージです。
広告実施店 よくお確かめの上、ご来店ください。
冷L中華
しょうゆだま
0000
冷中華
・ごまだれ
5
ポイント
プレゼント
CCO-OP 冷し中華
本番 218 円
・しょうゆだれ・ごまだれ 参考税込 235円
各110g×3食
「ラーメン
19
10%
「プレゼント」
COOD ざるラーメン 各¥ 298円
和風しょうゆつゆ
110g×3食
●税込 321 円
群馬県などの
国内産
たま
各
7/100 140
表示価格は
7/100 140
co-op
を味わう!
「とっておき
北海道産小麦使用
釜揚げうどん
400g
2番 160円)
● 172 円
10%
表示しています。
※写真はイメージです。
乾麺 10%
10%を表示しています。
コーナーのります。
品がございます。
本 158 円
きゅうり 1袋 170m
高知県などの
国内産
おくら 1ネット
149
●写真はイメージです
・コス
MENU
7/10 160
暴口
co-op & コスモ
直火燒
カレールーカレールー
**
カレールー
[カレール
中華
・中辛・辛口
•直火燒
よりどり
108278 (20
税込
300円
りんごカレー
各8皿分
かつお
メージです
うなぎ
神奈川食
ALCI
8594-2580
中原
230-463-36-30
並木あおば店
23342-758-2141
あさ9時よる9
ACC
末吉店 ハーモス深谷 片
045-581-0850 045-853-1141
洋光台店 神大寺店 竹
045-833-1537 045-431-3630
AC
セレク
04
ジョン 指定
鹿児島:大隅産
うなぎ蒲焼(長焼)
本 2,380円
白根店長後駅前店
045-954-1254
0466-43-4121
AC
南台店 茅ヶ崎高
1尾1バック
●者税込 2,570円
20-466-44-7750
0467-51-8777 46
ガッツリお肉で
スタミナチャージ!
[●写真は
イメージ
[100g当り
378円
税込 408円
税込298円
mmmm321
うまさのためにしない
強いこだわり
●平から収屋
当日出荷
さん兄
5
100g当り
全 498 円
ニュージーランド
ニュージーランドビーフ
牛リブロースステーキ 537円
●写真は
ニュージー
ニュージーランドビーフ
牛モモかたまり
ポイント
プレゼント
[焼そば
co-a
ACC
南林間店 ハーモス 厚木戸店
046-274-7202 046-257-3335 045-295-6600
AC
秦野曽屋店
0463-75-0357 0463-83-2830
ALG
城北店 千代田店 富士中央店
54-247-1326 054-264-1991 25-45-55-2555
新沢田店 袋井田町店 小豆店
-25-5000 239-43-7020 053-473-553
さんじの店 桜づつみ店
1053-441-8787 BUSS-989-99228
あさ9時30分よる9時
安台店
1045-983-1321
あさ10mよる8時
100g当り
268円
289 円
太麺焼そば
450g×2食
西港倉店
0467-32-5422
as 10 239-30
神記
大谷店
046-235-6432
A218 夏の
235円
8310 239
井田三舞店
044-798-8806
ハーモス
045-912-9955
東严塚駅前店
45-426-921
上今泉店
0046-231-7263
トマカ
●チャージ祭
7月・8月は
10. 20. 30.
税込 537 円
体 498円
アメリカ産
ブラントさんの牛肉
牛バラカルビ焼肉用
KA
御殿場高原
あらびきポーク
190g
138円)
CO-OP
いろいろ使える
味菜卵の
たま 178円 長ねぎ 1袋
国内産品
茨城県などの
198円
価 213 円
● 192 円
慮ください。
コース 高知県産
産直 黄金しょうが
+ 100円
総菜コーナー 11時販売開始
●写真はイメージです
バック
みょうが(2本): 138円 良) 138円
*税込108円
愛知県の国内産
大業
100g当り
●前
149円
なまぶし
ご飯(いくら)
1袋
があるため、品切れの場合はご容赦ください。
149円
1本釣り刺身用
日光丸で握った 258円
●
使わず
かつおたたき(解凍) 278円
イメージです
冷し中華
COOP
コーコース
7月24号
土用丑の日
3個入
7/130-140
●写真はイコージです
※写真はイメージです。
本
イパック 478 円
24 16
【ご予約承~ 6月15日(土)~7月15日
7月2日月・2日・2日
焼(長焼)
<130g
2380円
2,570 円
本日より1
2.580
しじみ汁に!
EAGEDO
7318
やきとりセット
(もも&ねぎま・たれ)
●516円
バック
4 498 円
537円
入り
ごちそう握り寿司
バック
1,080 円
1,166円
コープ
純正
ごま油
3008
548円
税込591円
イオシッコ
カ
カスピ海
ヨーグルト
グレ 400g
生乳
税込258円
・まさ 278円
82
沖縄県伊平屋島産
味付けもずく
(EHB-REBAV)
70g×3 1バック
コース
178円 税
192
オレ達のえだ豆
4 298円
組合員さんの声
「ほどよいごたえと、
かみしめるほどあふれてくる
ピリカちゃんさん
321がないです!
めください。
ユーコープはいつでもどなたでも加入して、ご利用いただける組合員のお店です。 ホームページはこちらから ユーコープwww.ucoop.or.jp/service/omise ●店舗により一部が異なる場合がございます。万一売り切れの場合はご容敬ください。一部パッケージの
собр
西部エリア

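As shown, the text is extracted quite accurately.
This is something I could not achieve earlier, either with OCR libraries or
by asking ChatGPT to read out the contents of the flyer.

However, with a phone photo,
anything other than the flyer that gets into the frame
is also recognized as text.

Next time I will use an image that contains only the flyer.

Earlier I managed to fetch only specific messages from Gmail, so
fetching only the Shufoo! and sale-notice mails
and then pulling the images from the linked page should work.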
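Pulling the flyer link out of a mail body can be done with a regular expression. The following is a minimal sketch: first_shufoo_url is a hypothetical helper (not the gmail_url_extractor.py used in this project), and the sample body is made up around the shop URL used above. The pattern stops at whitespace, so a URL followed directly by other characters without a space would need a stricter pattern.

```python
import re

# http(s) URLs on the shufoo.net domain, up to the next whitespace or angle bracket
SHUFOO_URL = re.compile(r'https?://(?:www\.)?shufoo\.net/[^\s<>]+')


def first_shufoo_url(body):
    """Return the first shufoo.net URL in the mail body, or None."""
    match = SHUFOO_URL.search(body)
    return match.group(0) if match else None


if __name__ == '__main__':
    # Made-up mail body for illustration
    body = (
        '新着チラシのお知らせ\n'
        'https://www.shufoo.net/pntweb/shopDetail/860323/?cid=nmail_pc\n'
        'このメールは配信専用です\n'
    )
    print(first_shufoo_url(body))
```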

What remains is associating the images with the numbers, or, better,
extracting each product paired directly with its price.
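When a product name and its price land on the same OCR line, as with 「ぶなしめじ 128円」 in the result above, a simple pattern is enough to pair them. The following is a minimal sketch under that assumption. find_prices is a hypothetical helper, and it does not handle prices that the OCR puts on a separate line from the product name.

```python
import re

# A number with optional thousands separators, followed by 円
PRICE = re.compile(r'([0-9][0-9,]*)\s*円')


def find_prices(ocr_text, keywords):
    """Map each keyword to the first price found on a line that contains it."""
    hits = {}
    for line in ocr_text.splitlines():
        for keyword in keywords:
            if keyword in line and keyword not in hits:
                match = PRICE.search(line)
                if match:
                    hits[keyword] = int(match.group(1).replace(',', ''))
    return hits


if __name__ == '__main__':
    text = '新潟県などの国内産\nぶなしめじ 128円\n(Wバック)\nレタス\n'
    print(find_prices(text, ['ぶなしめじ', 'レタス']))
```

Keywords that appear without a price on the same line are simply left out of the result, which makes it easy to see which items still need a smarter pairing rule.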

Creating Application Default Credentials (ADC) with the gcloud command

The reference is at
https://cloud.google.com/docs/authentication/provide-credentials-adc?hl=ja#local-dev

Application Default Credentials (ADC) is a strategy used by the authentication libraries to automatically find credentials based on the application environment. The authentication libraries make those credentials available to Cloud Client Libraries and Google API Client Libraries. With ADC, the code can run in either a development or a production environment without changing how the application authenticates to Google Cloud services and APIs.

That is the official description.

The article I am following,
https://nikkie-ftnext.hatenablog.com/entry/ocr-with-google-vision-api-python-first-step
says it proceeded with user credentials,
so I run

gcloud auth application-default login

A browser page then says that
Google Auth Library is requesting access to the Google account,
so I select the account
and allow it.

This creates
the JSON file that the client libraries read.

ls ~/.config/gcloud/application_default_credentials.json

confirms that the file exists.

However, according to
https://nikkie-ftnext.hatenablog.com/entry/ocr-with-google-vision-api-python-first-step
the first call to the Vision API returned "The vision.googleapis.com API requires a quota project, which is not set by default."
* The author resolved it by following the message and running gcloud auth application-default set-quota-project <chosen project ID>
so I ran

gcloud auth application-default set-quota-project <chosen project ID>

Credentials saved to file: [/Users/snowpool/.config/gcloud/application_default_credentials.json]

These credentials will be used by any library that requests Application Default Credentials (ADC).

Quota project "<chosen project ID>" was added to ADC which can be used by Google client libraries for billing and quota. Note that some services may still bill the project owning the resource.

This is the log that appears.

I checked the directory with

ls ~/.config/gcloud/

and the listing was

access_tokens.db			default_configs.db
active_config				gce
application_default_credentials.json	legacy_credentials
config_sentinel				logs
configurations				virtenv
credentials.db

so application_default_credentials.json is in place.
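The same check can be done from Python before calling the API. The following is a minimal sketch: adc_status is a hypothetical helper, the default path is the one shown above, and the 'quota_project_id' key is an assumption about how the quota project is stored in the file.

```python
import json
from pathlib import Path

DEFAULT_ADC_PATH = Path.home() / '.config' / 'gcloud' / 'application_default_credentials.json'


def adc_status(path=DEFAULT_ADC_PATH):
    """Return (exists, quota_project) for the ADC file at path."""
    path = Path(path)
    if not path.is_file():
        return False, None
    with path.open(encoding='utf-8') as f:
        data = json.load(f)
    # 'quota_project_id' is the key assumed to hold the quota project
    return True, data.get('quota_project_id')


if __name__ == '__main__':
    exists, quota_project = adc_status()
    print(f'ADC file found: {exists}, quota project set: {quota_project is not None}')
```

The demo prints only whether a quota project is set, so the project ID itself does not end up in logs.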

Next, install the dependency library with

pip install google-cloud-vision

However, the output was

Collecting google-cloud-vision
  Downloading google_cloud_vision-3.7.3-py2.py3-none-any.whl.metadata (5.2 kB)
Requirement already satisfied: google-api-core!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1 in ./.pyenv/versions/3.10.6/lib/python3.10/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1->google-cloud-vision) (2.11.0)
Requirement already satisfied: google-auth!=2.24.0,!=2.25.0,<3.0.0dev,>=2.14.1 in ./.pyenv/versions/3.10.6/lib/python3.10/site-packages (from google-cloud-vision) (2.18.0)
Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-cloud-vision)
  Downloading proto_plus-1.24.0-py3-none-any.whl.metadata (2.2 kB)
Requirement already satisfied: protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0dev,>=3.20.2 in ./.pyenv/versions/3.10.6/lib/python3.10/site-packages (from google-cloud-vision) (3.20.3)
Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.56.2 in ./.pyenv/versions/3.10.6/lib/python3.10/site-packages (from google-api-core!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1->google-cloud-vision) (1.59.0)
Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in ./.pyenv/versions/3.10.6/lib/python3.10/site-packages (from google-api-core!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1->google-cloud-vision) (2.28.2)
Collecting grpcio<2.0dev,>=1.33.2 (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1->google-cloud-vision)
  Downloading grpcio-1.65.1-cp310-cp310-macosx_12_0_universal2.whl.metadata (3.3 kB)
Collecting grpcio-status<2.0dev,>=1.33.2 (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1->google-cloud-vision)
  Downloading grpcio_status-1.65.1-py3-none-any.whl.metadata (1.1 kB)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in ./.pyenv/versions/3.10.6/lib/python3.10/site-packages (from google-auth!=2.24.0,!=2.25.0,<3.0.0dev,>=2.14.1->google-cloud-vision) (5.3.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in ./.pyenv/versions/3.10.6/lib/python3.10/site-packages (from google-auth!=2.24.0,!=2.25.0,<3.0.0dev,>=2.14.1->google-cloud-vision) (0.3.0)
Requirement already satisfied: six>=1.9.0 in ./.pyenv/versions/3.10.6/lib/python3.10/site-packages (from google-auth!=2.24.0,!=2.25.0,<3.0.0dev,>=2.14.1->google-cloud-vision) (1.16.0)
Requirement already satisfied: urllib3<2.0 in ./.pyenv/versions/3.10.6/lib/python3.10/site-packages (from google-auth!=2.24.0,!=2.25.0,<3.0.0dev,>=2.14.1->google-cloud-vision) (1.26.14)
Requirement already satisfied: rsa<5,>=3.1.4 in ./.pyenv/versions/3.10.6/lib/python3.10/site-packages (from google-auth!=2.24.0,!=2.25.0,<3.0.0dev,>=2.14.1->google-cloud-vision) (4.9)
INFO: pip is looking at multiple versions of grpcio-status to determine which version is compatible with other requirements. This could take a while.
  Downloading grpcio_status-1.64.1-py3-none-any.whl.metadata (1.1 kB)
  Downloading grpcio_status-1.64.0-py3-none-any.whl.metadata (1.1 kB)
  Downloading grpcio_status-1.63.0-py3-none-any.whl.metadata (1.1 kB)
  Downloading grpcio_status-1.62.2-py3-none-any.whl.metadata (1.3 kB)
Collecting protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0dev,>=3.20.2 (from google-cloud-vision)
  Downloading protobuf-4.25.3-cp37-abi3-macosx_10_9_universal2.whl.metadata (541 bytes)
Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in ./.pyenv/versions/3.10.6/lib/python3.10/site-packages (from pyasn1-modules>=0.2.1->google-auth!=2.24.0,!=2.25.0,<3.0.0dev,>=2.14.1->google-cloud-vision) (0.5.0)
Requirement already satisfied: charset-normalizer<4,>=2 in ./.pyenv/versions/3.10.6/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1->google-cloud-vision) (3.0.1)
Requirement already satisfied: idna<4,>=2.5 in ./.pyenv/versions/3.10.6/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1->google-cloud-vision) (3.4)
Requirement already satisfied: certifi>=2017.4.17 in ./.pyenv/versions/3.10.6/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1->google-cloud-vision) (2022.12.7)
Downloading google_cloud_vision-3.7.3-py2.py3-none-any.whl (466 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 466.4/466.4 kB 4.3 MB/s eta 0:00:00
Downloading proto_plus-1.24.0-py3-none-any.whl (50 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50.1/50.1 kB 2.5 MB/s eta 0:00:00
Downloading grpcio-1.65.1-cp310-cp310-macosx_12_0_universal2.whl (10.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.4/10.4 MB 8.5 MB/s eta 0:00:00
Downloading grpcio_status-1.62.2-py3-none-any.whl (14 kB)
Downloading protobuf-4.25.3-cp37-abi3-macosx_10_9_universal2.whl (394 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 394.2/394.2 kB 8.9 MB/s eta 0:00:00
Installing collected packages: protobuf, grpcio, proto-plus, grpcio-status, google-cloud-vision
  Attempting uninstall: protobuf
    Found existing installation: protobuf 3.20.3
    Uninstalling protobuf-3.20.3:
      Successfully uninstalled protobuf-3.20.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
mediapipe 0.10.9 requires protobuf<4,>=3.11, but you have protobuf 4.25.3 which is incompatible.
Successfully installed google-cloud-vision-3.7.3 grpcio-1.65.1 grpcio-status-1.62.2 proto-plus-1.24.0 protobuf-4.25.3

[notice] A new release of pip is available: 24.0 -> 24.1.2
[notice] To update, run: pip install --upgrade pip

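The install itself succeeds, but pip reports a dependency conflict: mediapipe 0.10.9 requires protobuf<4,>=3.11, while protobuf 4.25.3 is now installed.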

According to ChatGPT:
The installation appears to have succeeded, but there is a compatibility problem between the mediapipe package and the protobuf version. mediapipe requires protobuf from 3.11 up to, but not including, 4, while the installed protobuf is 4.25.3.

The suggested fixes were
1. Downgrade protobuf to a compatible version with pip install protobuf==3.20.3. This returns protobuf to a version that works with mediapipe, but it may conflict with google-cloud-vision.
2. Look for a newer mediapipe: check the mediapipe documentation and issue tracker for a release or workaround that is compatible with newer protobuf versions.
3. Use a virtual environment: if projects need different versions of a package, creating a virtual environment per project avoids the conflict.

I then asked what the latest version of mediapipe is.
The answer was that the latest mediapipe version is 0.10.14, released on May 8, 2024, and that it can be installed with the following command:

pip install mediapipe==0.10.14

so
I ran this.

I then ran

pip install google-cloud-vision

again, and the error was gone.
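The conflict message can be read as a version check against a requirement such as protobuf<4,>=3.11. The following is a deliberately simplified sketch of that check, only to illustrate why 4.25.3 fails and 3.20.3 passes. parse_version and satisfies are hypothetical helpers that handle plain numeric versions only and none of the other operators (such as !=) that pip understands.

```python
def parse_version(text):
    """Turn '4.25.3' into (4, 25, 3); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split('.'))


def satisfies(version, spec):
    """Check a version against a comma-separated spec such as '<4,>=3.11'."""
    current = parse_version(version)
    for clause in spec.split(','):
        clause = clause.strip()
        # Two-character operators are tested first so '>=' is not read as '>'
        for op in ('>=', '<=', '==', '<', '>'):
            if clause.startswith(op):
                bound = parse_version(clause[len(op):])
                break
        else:
            raise ValueError(f'unsupported clause: {clause}')
        ok = {
            '>=': current >= bound,
            '<=': current <= bound,
            '==': current == bound,
            '<': current < bound,
            '>': current > bound,
        }[op]
        if not ok:
            return False
    return True


if __name__ == '__main__':
    for candidate in ('3.20.3', '4.25.3'):
        print(candidate, satisfies(candidate, '<4,>=3.11'))
```

For real projects the resolver in pip, or a separate virtual environment per project as suggested above, is the safer route.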

Next I run the sample code.

Initializing gcloud

I continue by following the official page
gcloud CLI をインストールする (Install the gcloud CLI).

Running

gcloud -v

prints

Google Cloud SDK 477.0.0
bq 2.1.4
core 2024.05.17
gcloud-crc32c 1.0.0
gsutil 5.29
Updates are available for some Google Cloud CLI components.  To install them,
please run:
  $ gcloud components update

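so the installed SDK is 477.0.0 and updates are available.

According to
gcloud CLI のコンポーネントの管理 (Managing gcloud CLI components),

the gcloud components update command updates all installed components to the latest available version of the gcloud CLI.

If you need to revert to a version that worked previously, and you installed the gcloud CLI directly (with the interactive installer, a static version, the Windows installer, Homebrew, and so on, rather than through a package manager), use gcloud components update to revert to the specified version:

gcloud components update --version VERSION

So
even if the update goes wrong I should be able to roll back.
The log above shows the current version,
so I go ahead and update.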
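Recording the current version before updating makes the rollback command easy to build later. The following is a minimal sketch: sdk_version and rollback_command are hypothetical helpers, and the parsing assumes the first line of the gcloud -v output looks like the one shown above.

```python
import re


def sdk_version(gcloud_output):
    """Extract the SDK version from the text printed by `gcloud -v`."""
    match = re.search(r'^Google Cloud SDK (\d+\.\d+\.\d+)', gcloud_output, re.MULTILINE)
    return match.group(1) if match else None


def rollback_command(version):
    """Build the rollback command for a recorded version."""
    return f'gcloud components update --version {version}'


if __name__ == '__main__':
    output = 'Google Cloud SDK 477.0.0\nbq 2.1.4\ncore 2024.05.17\n'
    version = sdk_version(output)
    if version:
        print(rollback_command(version))
```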

Running gcloud components update first shows

To help improve the quality of this product, we collect anonymized usage data 
and anonymized stacktraces when crashes are encountered; additional information 
is available at <https://cloud.google.com/sdk/usage-statistics>. This data is 
handled in accordance with our privacy policy 
<https://cloud.google.com/terms/cloud-privacy-notice>. You may choose to opt in 
this collection now (by choosing 'Y' at the below prompt), or at any time in the
 future by running the following command:

    gcloud config set disable_usage_reporting false

I answer N to this prompt.

The update then starts:

Beginning update. This process may take several minutes.

Your current Google Cloud CLI version is: 477.0.0
You will be upgraded to version: 484.0.0

┌───────────────────────────────────────────────────────────────────────────┐
│ These components will be updated. │
├───────────────────────────────────────────────────┬────────────┬──────────┤
│ Name │ Version │ Size │
├───────────────────────────────────────────────────┼────────────┼──────────┤
│ BigQuery Command Line Tool │ 2.1.7 │ 1.7 MiB │
│ Cloud Storage Command Line Tool │ 5.30 │ 11.3 MiB │
│ Google Cloud CLI Core Libraries │ 2024.07.12 │ 19.1 MiB │
│ Google Cloud CRC32C Hash Tool (Platform Specific) │ 1.0.0 │ 1.2 MiB │
│ gcloud cli dependencies │ 2024.07.12 │ 16.6 MiB │
└───────────────────────────────────────────────────┴────────────┴──────────┘
┌────────────────────────────────────────────────────────────────────┐
│ These components will be installed. │
├─────────────────────────────────────────────┬────────────┬─────────┤
│ Name │ Version │ Size │
├─────────────────────────────────────────────┼────────────┼─────────┤
│ gcloud cli dependencies (Platform Specific) │ 2021.04.16 │ < 1 MiB │
└─────────────────────────────────────────────┴────────────┴─────────┘

The following release notes are new in this upgrade.
Please read carefully for information about new features, breaking changes,
and bugs fixed. The latest full release notes can be viewed at:
https://cloud.google.com/sdk/release_notes

484.0.0 (2024-07-16)
Breaking Changes
▪ **(GKE Hub)** gcloud container fleet memberships get-credentials now
requires the permission gkehub.gateway.generateCredentials
(automatically included in roles gkehub.gatewayReader,
gkehub.gatewayEditor, and gkehub.gatewayAdmin), as well as network
access to *connectgateway.googleapis.com.

AlloyDB
▪ Added --node-ids flag to gcloud alloydb instances restart command in
the alpha and beta tracks. This flag allows users to allow users to
specify a comma-separated list of read pool node IDs to perform the
restart on. Without specifying this flag, every node in the read pool
will be restarted.

App Engine
▪ Removed Google App Engine PHP 5/5 support.

▪ Updated the Java SDK to version 2.0.29 build from the open source
project
<https://github.com/GoogleCloudPlatform/appengine-java-standard/releases/tag/v2.0.29>.

Artifact Registry
▪ Fixed error codes for gcloud artifacts docker upgrade migrate.

Batch
▪ Fixed the --filter flag of gcloud batch list command to match gcloud
topic filters syntax.

BigQuery
▪ Added support for non-ASCII characters in the field mask when
updating Connections.
▪ Added support for configuration.authentication.profile_id in the
field mask when updating Connections.
▪ Fixed a bug where bq init would be called even when --use_google_auth
is specified.

Cloud Build
▪ Add support in gcloud builds worker-pools commands for default region
set in config.

Cloud Data Fusion
▪ Added three new optional arguments to gcloud beta data-fusion
instances create command:
◆ --maintenance-window-start
◆ --maintenance-window-end
◆ --maintenance-window-recurrence
◆ These arguments allow users to specify the start time, end time,
and recurrence of the maintenance window for their Data Fusion
instance.
▪ Add four new optional arguments to gcloud beta data-fusion instances
update command:
◆ --maintenance-window-start
◆ --maintenance-window-end
◆ --maintenance-window-recurrence
◆ --clear-maintenance-window
◆ These arguments allow users to update maintenance window for their
Data Fusion instance by specifying the start time, end time, and
recurrence, or clear the maintenance window using
--clear-maintenance-window.

Cloud Run
▪ Allows --revision-suffix to be specified with empty string to clear
client-set revision naming.

Cloud SQL
▪ Added --[no-]enable-dataplex-integration flag to gcloud sql instances
create and gcloud sql instances patch to support Dataplex Integration
for Cloud SQL.
▪ Added support for MySQL 8.4.

Cloud Spanner
▪ Promoted --type=DATABASE_CHANGE_QUORUM option in gcloud spanner
operations list to GA.
▪ Fixed the DATABASE_CHANGE_QUORUM type filter string in gcloud spanner
operations list.

Cloud Workstations
▪ Adding disable_ssl_validation support for workstations
start-tcp-tunnel and workstations ssh.

Compute Engine
▪ Added gcloud compute routers add-route-policy-term which adds policy
term to a Route Policy in Cloud Router.
▪ Promoted gcloud compute routers add-route-policy-term to beta.
▪ Added gcloud compute routers update-route-policy-term which updates
policy term in a Route Policy in Cloud Router.
▪ Promoted gcloud compute routers update-route-policy-term to beta.
▪ Added gcloud compute routers remove-route-policy-term which removes
policy term from Route Policy in Cloud Router.
▪ Promoted gcloud compute routers remove-route-policy-term to beta.
▪ Fixed a bug in gcloud beta compute ssh where a third-party identity
subject bearing an '@' sign wouldn't be URL-escaped in the way the OS
Login API expects, causing spurious rejection.
▪ Promoted support of flags to --detection-load-threshold,
detection-absolute-qps, detection-relative-to-baseline-qps, and
traffic-granularity-configs in gcloud compute security-policies
add-layer7-ddos-defense-threshold-config to GA.

Dataproc Metastore
▪ Promoted --min-scaling-factor, --max-scaling-factor, and
--autoscaling-enabled flag of gcloud metastore services create and
gcloud metastore services update to GA.

Distributed Cloud Edge
▪ Added --offline-reboot-ttl flag to gcloud edge-cloud container
clusters create and gcloud edge-cloud container clusters update
commands. This flag specifies the maximum duration a node can reboot
offline (without connection to Google) and then rejoin its cluster to
resume its designated workloads.

Kubernetes Engine
▪ Added flag option --addons=RayOperator to enable/disable the Ray
Operator addon for GKE Standard clusters.
▪ Added flag --[no]-enable-ray-operator to enable/disable the Ray
Operator addon for GKE Autopilot clusters.
▪ Added flag --[no]-enable-ray-cluster-logging to enable/disable
automatic log collection for Ray clusters when the Ray Operator addon
is enabled.
▪ Added flag --[no]-enable-ray-cluster-monitoring to enable/disable
automatic metrics collection for Ray clusters when the Ray Operator
addon is enabled.

Subscribe to these release notes at
https://groups.google.com/forum/#!forum/google-cloud-sdk-announce
(https://groups.google.com/forum/#!forum/google-cloud-sdk-announce).

483.0.0 (2024-07-02)
Google Cloud CLI
▪ Enabled faster component update mode by default on Linux. This avoids
making a backup copy of the installation directory when running certain
gcloud components commands, which should significantly improve the time
taken by these operations (including installation and updates).
▪ Fixed issue where gcloud components update would leave installation
in an unusable state when downloading certain components failed.

AlloyDB
▪ Added the following flags to gcloud alloydb instances create and
gcloud alloydb instances update in alpha and beta:
◆ --observability-config-enabled
◆ --observability-config-preserve-comments
◆ --observability-config-track-wait-events
◆ --observability-config-max-query-string-length
◆ --observability-config-record-application-tags
◆ --observability-config-query-plans-per-minute
◆ --observability-config-track-active-queries
▪ Promoted AlloyDB Cross Region Replication commands to beta and GA
track. Modified commands include: alloydb clusters switchover.
▪ Added creating cross region and project backups support to gcloud
alloydb backups create command.
▪ Added ability to create clusters with database_version POSTGRES_16 in
beta track.

Batch
▪ Release resource-allowances commands to the alpha track.

Cloud Access Context Manager
▪ Promoted gcloud access-context-manager supported-services to GA.

Cloud Filestore
▪ Enable Filestore promote-replica command in GA track.

Cloud Functions
▪ Added --binary-authorization and --clear-binary-authorization flags
for 2nd gen function in alpha and beta track.

Cloud NetApp
▪ Updated psa-range comments on gcloud netapp storage-pools and gcloud
netapp volumes to indicate that the psa-range key is not used and will
be ignored.

Cloud SQL
▪ Adding support for clean and if-exists flags to parallel import and
single-threaded-export.

Cloud Workstations
▪ Added --vm-tags flag to gcloud workstations config create to add tags
to the workstation's underlying VM.

Compute Engine
▪ Promoted --preference flag of gcloud compute backend-services
add-backend and gcloud compute backend-services update-backend to GA.
▪ Promoted --service-lb-policy flag of gcloud compute backend-services
create and gcloud compute backend-services update to GA.
▪ Promote gcloud compute instances ops-agents to GA.
▪ Added IDPF to the list of guestOsFeatures.
▪ Promoted --max-run-duration flag of gcloud compute instances create
to v1.
◆ Allows specifying the duration of time after which the instance
will terminate.
▪ Promoted --termination-time flag of gcloud compute instances create
to v1.
◆ Allows specifying the timestamp that the instance will terminate.
▪ Promoted --discard-local-ssds-at-termination-timestamp flag of gcloud
compute instances create to v1.
◆ Allows specifying the option to whether discard attached local SSDs
when automatically stopping this VM

Network Security
▪ Renamed FIREWALL_POLICY column to SOURCE_RESOURCE column in gcloud
networksecurity address-groups list-references command.
▪ Added --purpose flag to gcloud alpha/beta network-security
address-groups create/update commands.

Secret Manager
▪ Added flag --location and --regional-kms-key-name for secrets to use
the regional secrets.

Subscribe to these release notes at
https://groups.google.com/forum/#!forum/google-cloud-sdk-announce
(https://groups.google.com/forum/#!forum/google-cloud-sdk-announce).

482.0.0 (2024-06-25)
App Engine
▪ Removed unused python2 dependencies that have CVEs.

Cloud Bigtable
▪ Adds force option to cbt setgcpolicy.

Cloud Dataflow
▪ Elevate jinja variables to top-level flag in dataflow yaml run.

Cloud Functions
▪ Added --clear-build-service-account flag for gcloud functions deploy.

Cloud Key Management Service
▪ Added --allowed-access-reasons flag to gcloud kms keys create to
create a key with a Key Access Justifications policy configured.
▪ Added --allowed-access-reasons flag to gcloud kms keys update to
update the Key Access Justifications policy on a key.
▪ Added --remove-key-access-justifications-policy flag to gcloud kms
keys update to remove the Key Access Justifications policy on a key.

Cloud Services
▪ Promoted gcloud services policies add-enable-rules to beta.
▪ Promoted gcloud services policies remove-enable-rules to beta.
▪ Promoted gcloud services policies get to beta.
▪ Promoted gcloud services policies get-effective to beta.
▪ Promoted gcloud services policies test-enabled to beta.
▪ Promoted gcloud services groups list-ancestor-groups to beta.
▪ Promoted gcloud services groups list-descendant-services to beta.
▪ Promoted gcloud services groups list-members to beta.

Compute Engine
▪ Added --size flag to gcloud compute instance-groups managed update
for v1.
▪ Promoted --confidential-compute-type flag for the command gcloud
compute instance create to GA.
▪ Promoted --performance-monitoring-unit flag for the command gcloud
compute instance-templates create to GA.
▪ Promoted --performance-monitoring-unit flag for the command gcloud
compute instances bulk create to GA.
▪ Promoted --performance-monitoring-unit flag for the command gcloud
compute instances create to GA.

Kubernetes Engine
▪ Updated default kubectl from 1.27.14 to 1.27.15.
▪ Additional kubectl versions:
◆ kubectl.1.27 (1.27.15)
◆ kubectl.1.28 (1.28.11)
◆ kubectl.1.25 (1.29.6)
◆ kubectl.1.30 (1.30.2)

Network Connectivity
▪ Added include-import-ranges flag to hybrid spoke creation to support
importing hub subnets.

Subscribe to these release notes at
https://groups.google.com/forum/#!forum/google-cloud-sdk-announce
(https://groups.google.com/forum/#!forum/google-cloud-sdk-announce).

481.0.0 (2024-06-18)
Breaking Changes
▪ **(Cloud Dataflow)** Deprecated gcloud dataflow sql command group.
The command group will be removed by 2025-01-31. See Beam YAML
(https://beam.apache.org/documentation/sdks/yaml/) and Beam notebooks
(https://cloud.google.com/dataflow/docs/guides/notebook-advanced#beam-sql)
for alternatives.

AlloyDB
▪ Added another option ASSIGN_IPV4 to flag --assign-inbound-public-ip
to enable public IP for an instance to gcloud alloydb instances create
and gcloud alloydb instances create-secondary.
▪ Added flag --authorized-external-networks to set a list of authorized
external networks on an instance to gcloud alloydb instances create and
gcloud alloydb instances create-secondary.
▪ Added switchover command.

Artifact Registry
▪ Fixed a bug where gcloud artifacts files download and gcloud
artifacts generic download would crash.

BigQuery
▪ Added undelete command for datasets.
▪ Updated google-auth to version 2.29.0.
▪ Improved authentication error messaging.

Cloud Domains
▪ Implemented the following commands for gcloud domains registrations
google-domains-dns
◆ get-forwarding-config
◆ export-dns-record-sets

Cloud Filestore
▪ Added --source-instance flag to gcloud filestore instances create and
gcloud beta filestore instances create command to specify the instance
will be created as a Standby replica of the source-instance.
▪ Added promote-replica verb for filestore instances. promote-replica
promotes a standby replication instance to a regular instance.

Cloud Functions
▪ Promoted --build-service-account flag for gcloud functions deploy to
GA.

Cloud Identity-Aware Proxy
▪ Promoted gcloud iap regional command to beta and GA.

Cloud Pub/Sub
▪ Promoted --cloud-storage-use-topic-schema flag of gcloud pubsub
subscriptions create to GA. Added the ability to set whether to use
topic schemas in Cloud Pub/Sub to Cloud Storage subscriptions. For more
information, see
<https://cloud.google.com/pubsub/docs/create-cloudstorage-subscription#use-topic-schema>.
▪ Promoted --cloud-storage-use-topic-schema flag of gcloud pubsub
subscriptions update. to GA. Added the ability to update whether to use
topic schemas in Cloud Pub/Sub to Cloud Storage subscriptions. For more
information, see
<https://cloud.google.com/pubsub/docs/create-cloudstorage-subscription#use-topic-schema>.

Cloud Spanner
▪ Added gcloud beta spanner instance-partitions command group.
▪ Added --instance-partition flag to gcloud beta spanner operations
list, gcloud beta spanner operations describe, and gcloud beta spanner
operations cancel.

Cloud Storage
▪ Added commands for creating, listing, describing, and deleting
folders in buckets with hierarchical namespace enabled:
◆ Added gcloud alpha storage folders create which creates folders.
◆ Added gcloud alpha storage folders list which lists folders in
buckets.
◆ Added gcloud alpha storage folders describe which gets the folder's
metadata.
◆ Added gcloud alpha storage folders delete which deletes folders.
▪ Updated gsutil component to 5.30.

Compute Engine
▪ Promoted --confidential-compute-type flag for the command gcloud
compute instance create to GA.
▪ Added --size flag to gcloud compute instance-groups managed update
for v1.

Network Connectivity
▪ Updated gcloud network-connectivity internal-ranges update to support
setting and clearing labels.
▪ Added support for include-export-ranges to support include filters
for VPC spokes.

Subscribe to these release notes at
https://groups.google.com/forum/#!forum/google-cloud-sdk-announce
(https://groups.google.com/forum/#!forum/google-cloud-sdk-announce).

480.0.0 (2024-06-11)
App Engine
▪ Updated the Java SDK to version 2.0.28 build from the open source
project
<https://github.com/GoogleCloudPlatform/appengine-java-standard/releases/tag/v2.0.28>.

Artifact Registry
▪ gcloud artifacts docker upgrade migrate now automatically creates
repos for pkg.dev-based migration.
◆ Added gcloud artifacts files delete command.

Cloud Datastream
▪ Added --type and --sqlserver-* flags to gcloud datastream
connection-profiles create|update and gcloud datastream objects lookup
commands to support SQL server source.
▪ Added --type and --sqlserver-* flags to gcloud datastream streams
create|update commands to support SQL server source.
▪ Added --sqlserver-rdbms-file flag and support for SQL Server profile
to --connection-profile-object-file to gcloud datastream
connection-profiles discover commands to support SQL server source.

Cloud Domains
▪ Implemented the following commands for gcloud domains registrations
◆ renew-domain
◆ initiate-push-transfer

Cloud Functions
▪ Added support for --execution-id flag when used together with --gen2
flag for gcloud functions logs read.

Cloud Healthcare
▪ Added beta flag --enable-history-modifications to the fhir-stores
create and fhir-stores update commands.

Cloud IAM
▪ Updated iam service-accounts keys list to return additional
properties, namely:
◆ disable_reason: The reason the Service Account Key as been disabled
(if applicable)
◆ extended_status: Additional metadata about the Service Account Key

Cloud Memorystore
▪ Added --zone-distribution-mode and --zone flags to gcloud redis
clusters create for creating single zone clusters.

Cloud Pub/Sub
▪ Added --bigquery-service-account-email and
--cloud-storage-service-account-email flags to gcloud pubsub
subscriptions create to set the service account for writing messages to
BigQuery and Cloud Storage, respectively.
▪ Added --bigquery-service-account-email and
--cloud-storage-service-account-email flags to gcloud pubsub
subscriptions create to update the service account for writing messages
to BigQuery and Cloud Storage, respectively.

Cloud Spanner
▪ Added --proto-descriptors-file to gcloud spanner databases create
command to allow creating database with proto and enum type columns.
▪ Added --proto-descriptors-file to gcloud spanner databases ddl update
command to allow updating database with proto and enum type columns.
▪ Added --include-proto-descriptors to gcloud spanner databases ddl
describe command to allow proto descriptors for a database with proto
and enum type columns.
▪ Promoted gcloud spanner databases change-quorum command to GA.

Cloud Storage
▪ Adds support of Cross Bucket Replication Feature in alpha track of
gcloud transfer command group.

Cloud Workstations
▪ Added --allowed-ports flag to gcloud beta workstations configs create
and gcloud beta workstations configs update commands.
▪ Added enable-nested-virtualization pool-size and boot-disk-size to
--boost-configs flag in beta workstations configs create and beta
workstations configs update.

Compute Engine
▪ Added support for version=24.04 and short-name=ubuntu in --os-types
for gcloud beta compute instances ops-agents policies [create|update].
▪ Promoted Tls Early Data in TargetHttpsProxy compute API to v1.
▪ Added gce_vm_ip_portmap Network Endpoint Group for gcloud compute
network-endpoint-groups beta support.
▪ Added --access-mode flag to gcloud compute disks create and gcloud
compute disks update.
▪ Added --tls-early-data flag to gcloud compute v1 target-https-proxies
create/update to Tls Early Data field in Target Https Proxy.

Secret Manager
▪ Added --version-destroy-ttl flag to gcloud secrets create to let
users enable secret version delayed destruction on a secret.
▪ Added --version-destroy-ttl flag to gcloud secrets update to let
users enable/update secret version delayed destruction on a secret.
▪ Added --remove-version-destroy-ttl flag to gcloud secrets update to
let users disable secret version delayed destruction on a secret.

Subscribe to these release notes at
https://groups.google.com/forum/#!forum/google-cloud-sdk-announce
(https://groups.google.com/forum/#!forum/google-cloud-sdk-announce).

479.0.0 (2024-06-04)
Breaking Changes
▪ **(Cloud Dataflow)** gcloud dataflow flex-template build for Dataflow
Flex Templates in Python with --env
FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE=/path/to/requirements.txt
defined will run pip check after pip install while building the
container image. This will break the build process if newly installed
packages override pre-installed packages with a version that is known
to be incompatible with other pre-installed packages. See
<https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates>
for alternative methods to build a container image without pip check.

Artifact Registry
▪ Updated gcloud artifacts docker images command to include tags as
list instead of string to enable use of --filter=tags=<SearchTerm>.

Cloud Composer
▪ Added gcloud composer environments check-upgrade command. It checks
that an environment upgrade does not fail because of PyPI module
conflicts.
▪ Added gcloud composer environments list-upgrades command. It lists
all possible target versions that an existing environment can be
upgraded to.

Cloud Dataproc
▪ Updated gcloud dataproc session-templates export to filter out
additional output only fields.

Cloud NetApp
▪ Added --administrators flag to gcloud netapp active-directories
create and gcloud netapp active-directories update.

Cloud SQL
▪ Added --preferred-secondary-zone flag to gcloud sql instances clone
command.

Compute Engine
▪ Updated import and export schemas for gcloud compute
forwarding-rules.
▪ Promoted --ip-collection flag of gcloud compute forwarding-rules
create to GA.
▪ Promoted --ip-collection-region flag of gcloud compute
forwarding-rules create to GA.

Compute OS Config
▪ Added --allow-missing flag to gcloud compute os-config
os-policy-assignments update to allow for creation of a new OS policy
assignment resource if it does not exist.

Database Migration
▪ Updated gcloud database-migration connection-profiles create cloudsql
to support POSTGRES_16 version option for Cloud SQL connection
profiles.

Distributed Cloud Edge
▪ Added add/remove exclusion window flags for gcloud edge-cloud
container clusters update to allow users to add/remove exclusion
windows where automatic upgrades will be blocked within.

Service Extensions
▪ Added the metadata field to the DEP extensions.

Subscribe to these release notes at
https://groups.google.com/forum/#!forum/google-cloud-sdk-announce
(https://groups.google.com/forum/#!forum/google-cloud-sdk-announce).

478.0.0 (2024-05-29)
Google Cloud CLI
▪ Improved error message for authentication errors.
▪ Improved error message for service account impersonation refresh
errors.
▪ Fixed issue where some commands with a --uri flag would produce no
output.

App Engine
▪ Enable fetch_instance_after_connect_error for compute
start-iap-tunnel in GA.
▪ Allow IAP tunneling for instances with external IP when explicitly
enabled via flag.

App Engine Flexible Environment
▪ Enable fetch_instance_after_connect_error for compute
start-iap-tunnel in GA.
▪ Allow IAP tunneling for instances with external IP when explicitly
enabled via flag.

Artifact Registry
▪ Added gcloud artifacts files describe command.

Backup For GKE
▪ Added --volume-data-restore-policy-bindings flag to gcloud container
backup-restore restore-plans create and gcloud container backup-restore
restore-plans update to enhance volume restore flexibility.
▪ Added --volume-data-restore-policy-overrides-file flag to gcloud
container backup-restore restores create to enhance volume restore
flexibility.
▪ Added --permissive-mode flag to gcloud container backup-restore
backup-plans create and gcloud container backup-restore backup-plans
update to enable bypassing the new backup-time restorability
enforcement.
▪ Added --filter-file flag to gcloud container backup-restore restores
create to support Backup for GKE fine-grained restore.
▪ Added --restore-order-file flag to gcloud <alpha|beta> container
backup-restore restore-plans create and gcloud <alpha|beta> container
backup-restore restore-plans update to support custom ordering while
performing restore as part of Backup for GKE.
▪ Added the following enum values for the flag
--namespaced-resource-restore-mode to gcloud <alpha|beta> container
backup-restore restore-plans create and gcloud <alpha|beta> container
backup-restore restore-plans update to expand namespaced resource
restore mode options:
◆ merge-skip-on-conflict
◆ merge-replace-volume-on-conflict
◆ merge-replace-on-conflict
▪ Deprecated --substitution-rules-file flag. Use
--transformation-rules-file instead.

BigQuery
▪ Added support for map_target_type with external parquet tables.
▪ Added support for column_name_character_map to map special characters
in column names during load jobs.
▪ Added a printout of unreachable locations when datasets and jobs are
listed and a region is down.

Cloud Dataplex
▪ Promoted gcloud dataplex aspect-types command group to GA.
▪ Promoted gcloud dataplex entry-groups command group to GA.
▪ Promoted gcloud dataplex entry-types command group to GA.

Cloud Deploy
▪ Fixed issue where skaffold files generated from deploy releases
create flags did not include all profiles in the release's Delivery
Pipeline.

Cloud Firestore
▪ Promoted Cloud Firestore Backups and Restore gcloud CLI changes to
beta.
◆ Promoted gcloud firestore backups list command to beta.
◆ Promoted gcloud firestore backups describe command to beta.
◆ Promoted gcloud firestore backups delete command to beta.
◆ Promoted gcloud firestore backups schedules create command to beta.
◆ Promoted gcloud firestore backups schedules update command to beta.
◆ Promoted gcloud firestore backups schedules describe command to
beta.
◆ Promoted gcloud firestore backups schedules delete command to beta.
◆ Promoted gcloud firestore backups schedules list command to beta.
◆ Promoted gcloud firestore databases restore command to beta.
▪ Promoted Cloud Firestore Backups and Restore gcloud CLI changes to
GA.
◆ Promoted gcloud firestore backups list command to GA.
◆ Promoted gcloud firestore backups describe command to GA.
◆ Promoted gcloud firestore backups delete command to GA.
◆ Promoted gcloud firestore backups schedules create command to GA.
◆ Promoted gcloud firestore backups schedules update command to GA.
◆ Promoted gcloud firestore backups schedules describe command to GA.
◆ Promoted gcloud firestore backups schedules delete command to GA.
◆ Promoted gcloud firestore backups schedules list command to GA.
◆ Promoted gcloud firestore databases restore command to GA.

Cloud Firestore Emulator
▪ Release Cloud Firestore emulator v1.19.7
◆ Fixes unexpected responses from nested queries in Datastore Mode.
◆ Add Auth Context support for Firestore triggers (2nd gen) in
Firestore Emulator

Cloud Functions
▪ Added validation to --runtime-update-policy argument.

Cloud SQL
▪ Added PostgreSQL 16 to database versions.
▪ Fixed issue where gcloud sql instances export and gcloud sql
instances import would display only the operation selfLink field
instead of the whole operation when run asynchronously with --async,
regardless of the --format flag specified.
◆ This behavior is now consistent with other gcloud sql instances
commands.
◆ To display only the selfLink field, use --format="value(selfLink)".

Cloud Workstations
▪ Adding --env flag to gcloud workstations create.

Compute Engine
▪ Added IPV6_ONLY option to --stack-type flag of gcloud compute
vpn-gateways create command to create an IPv6-only VPN gateway.

Config Connector
▪ Updated Google Cloud Config Connector to version 1.118.1. See Config
Connector Overview for more details
https://cloud.google.com/config-connector/docs/overview
(https://cloud.google.com/config-connector/docs/overview).

Database Migration
▪ Updated gcloud database-migrate connection-profiles update to update
Database Migration Service connection profile for SQL Server to Cloud
SQL-SQL Server migrations.
▪ Updated gcloud database-migrate migration-jobs update to update
Database Migration Service migration job for SQL Server to Cloud
SQL-SQL Server migrations.

Dataproc Metastore
▪ Added --deletion-protection flags to gcloud metastore services create
GA release track to allow creating a Dataproc Metastore instance with
delete protection.
▪ Added --deletion-protection and --no-deletion-protection flags to
gcloud metastore services update GA release track to allow updating a
Dataproc Metastore instance with delete protection.

Security Command Center
▪ Added --filter-modules flag in gcloud scc manage services describe
... api to filter response by modules.

Subscribe to these release notes at
https://groups.google.com/forum/#!forum/google-cloud-sdk-announce
(https://groups.google.com/forum/#!forum/google-cloud-sdk-announce).

Once started, canceling this operation may leave your SDK installation in an
inconsistent state.

appears.
Answering
Y
here starts the update

Performing in place update...

╔════════════════════════════════════════════════════════════╗
╠═ Uninstalling: BigQuery Command Line Tool                 ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Uninstalling: Cloud Storage Command Line Tool            ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Uninstalling: Google Cloud CLI Core Libraries            ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Uninstalling: Google Cloud CRC32C Hash Tool              ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Uninstalling: gcloud cli dependencies                    ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Installing: BigQuery Command Line Tool                   ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Installing: Cloud Storage Command Line Tool              ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Installing: Google Cloud CLI Core Libraries              ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Installing: Google Cloud CRC32C Hash Tool (Platform S... ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Installing: gcloud cli dependencies                      ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Installing: gcloud cli dependencies (Platform Specific)  ═╣
╚════════════════════════════════════════════════════════════╝

Performing post processing steps...done.                                       

Update done!

To revert your CLI to the previously installed version, you may run:
  $ gcloud components update --version 477.0.0

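appears.
In short, the log says that post-processing has finished and the update is complete,
and that the CLI can be reverted to the previously installed version by running

$ gcloud components update --version 477.0.0

So if the new version causes problems, running this command should roll it back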
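As a rough sketch of how this rollback could be scripted, the Python below pulls the previous version out of the update log and builds the revert command. The function names and the regular expression are my own assumptions for illustration; only the command gcloud components update --version comes from the log above.

```python
import re
from typing import List, Optional


def previous_version(update_log: str) -> Optional[str]:
    """Return the version printed on the 'current version' line, or None."""
    match = re.search(
        r"Your current Google Cloud CLI version is: (\d+\.\d+\.\d+)",
        update_log,
    )
    return match.group(1) if match else None


def rollback_command(version: str) -> List[str]:
    """Build the argument list that reverts the CLI to the given version."""
    return ["gcloud", "components", "update", "--version", version]


# Two lines copied from the update log shown earlier.
log = (
    "Your current Google Cloud CLI version is: 477.0.0\n"
    "You will be upgraded to version: 484.0.0\n"
)

version = previous_version(log)
if version is not None:
    # Only prints the command; pass the list to subprocess.run() to execute it.
    print(" ".join(rollback_command(version)))
```

Saving the version before answering Y means the rollback command is still at hand even after the terminal output has scrolled away.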

The update itself is done, so next I initialize the CLI by running

./google-cloud-sdk/bin/gcloud init

Welcome! This command will take you through the configuration of gcloud.

Your current configuration has been set to: [default]

You can skip diagnostics next time by using the following flag:
  gcloud init --skip-diagnostics

Network diagnostic detects and fixes local network connection issues.
Checking network connection...done.                                            
Reachability Check passed.
Network diagnostic passed (1/1 checks passed).

You must log in to continue. Would you like to log in (Y/n)?

appears, so answer
Y

This opens an authentication page in the browser.
Log in, work through the prompts,
and finally click Allow

Your browser has been opened to visit:


Please enter numeric choice or text value (must exactly match list item):

appears, so
enter the number of the project to use this time

After that,

* Commands that require authentication will use snowpoollovely@gmail.com by default
* Commands will reference project `raspberrypi-ea1b6` by default
Run `gcloud help config` to learn how to change individual settings

This gcloud configuration is called [default]. You can create additional configurations if you work with multiple accounts and/or projects.
Run `gcloud topic configurations` to learn more.

Some things to try next:

* Run `gcloud --help` to see the Cloud Platform services you can interact with. And run `gcloud help COMMAND` to get help on any gcloud command.
* Run `gcloud topic --help` to learn about advanced features of the SDK like arg files and output formatting
* Run `gcloud cheat-sheet` to see a roster of go-to `gcloud` commands.

is displayed.

Running

gcloud config list

shows the current settings
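To check the settings from a script instead of by eye, a minimal sketch follows. It assumes that gcloud config list prints INI-style sections such as [core] on standard output, which Python's configparser can read. The sample text, the account and the project name are made-up placeholders, not my actual output.

```python
import configparser
from typing import Dict


def parse_gcloud_config(text: str) -> Dict[str, Dict[str, str]]:
    """Parse INI-style text into {section: {key: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


# Hypothetical sample in the shape that `gcloud config list` is assumed to print.
sample = """[core]
account = user@example.com
disable_usage_reporting = True
project = my-project
"""

config = parse_gcloud_config(sample)
print(config["core"]["project"])
```

For real use, the text could come from subprocess.run(["gcloud", "config", "list"], capture_output=True, text=True).stdout, and the script could then confirm that the project is the intended one before calling the Vision API.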

The basic setup is now done, so
the next step is
creating Application Default Credentials (ADC) with the gcloud command