Analyzing photos and receipts with Yomitoku

The article "Yomitokuで写真やレシートを解析してみる" (Trying out photo and receipt analysis with Yomitoku)
runs its code on Google Colab
and successfully analyzes a receipt.

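Also, the article
【Python】PyTorchをAppleシリコン搭載Mac（M1、M2）にインストールする方法 – AppleシリコンGPUで動かす方法も、併せて紹介 –
(How to install PyTorch on Apple Silicon Macs (M1, M2), including how to run it on the Apple Silicon GPU)

says: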

To run PyTorch on an NVIDIA GPU, the data you work with has to be moved from "main memory"
to "GPU memory".

The same is true for the Apple Silicon GPU: the data must be moved from main memory to GPU memory.

import torch
print(torch.backends.mps.is_available())

If this prints
True
you're good to go.

To use the Apple Silicon GPU, use
* device = torch.device('mps')
* {data}.to(device)
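A minimal sketch of the two steps above: pick the "mps" device when torch.backends.mps.is_available() returns True, fall back otherwise, and move a tensor there with .to(device). The helper name pick_device and the CUDA/CPU fallback order are my own additions for illustration, not part of the quoted article.

```python
import torch


def pick_device() -> torch.device:
    """Prefer the Apple Silicon GPU (MPS), then an NVIDIA GPU (CUDA), then the CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


device = pick_device()

# Tensors are created in main memory (on the CPU) by default...
x = torch.arange(6, dtype=torch.float32).reshape(2, 3)
# ...and .to(device) copies them into GPU memory when a GPU device was picked.
x = x.to(device)
y = (x * 2).sum()  # the computation runs on whichever device x lives on

print(device, y.item())
```

The same script runs unchanged on a Mac, a CUDA machine, or a CPU-only box, which is handy when moving between a local Mac and Colab.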

MPS stands for Metal Performance Shaders.

I was curious how much memory the GPU gets, so I asked ChatGPT:

On a MacBook Air (16GB model), GPU memory is not a fixed size;
it is allocated dynamically out of the 16GB of unified memory.
Up to roughly 8GB to 12GB can be allocated, but this depends on system load.

If needed, it's a good idea to check real-time usage in Activity Monitor or from PyTorch.
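For the "check from PyTorch" option, a small sketch assuming a PyTorch 2.x build that provides torch.mps.current_allocated_memory() (memory held by tensors) and torch.mps.driver_allocated_memory() (total memory the Metal driver has allocated for this process). The helper mps_memory_mb is hypothetical, and the 256 MB test tensor only exists to make the numbers move.

```python
import torch


def mps_memory_mb():
    """Return (tensor_mb, driver_mb) for the MPS device, or None when MPS is unavailable."""
    if not torch.backends.mps.is_available():
        return None
    to_mb = 1024 ** 2
    return (
        torch.mps.current_allocated_memory() / to_mb,  # bytes used by live tensors
        torch.mps.driver_allocated_memory() / to_mb,   # bytes reserved by the Metal driver
    )


if __name__ == "__main__":
    before = mps_memory_mb()
    if before is None:
        print("MPS is not available on this machine")
    else:
        x = torch.zeros(1024, 1024, 64, device="mps")  # 1024*1024*64 float32 = 256 MB
        print("before:", before)
        print("after: ", mps_memory_mb())
```

These figures only cover what PyTorch itself has allocated; Activity Monitor remains the place to see total unified-memory pressure.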

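For now, on with the experiment.
https://www.muji.com/public/media/jp/doc/9952536/muji2021aw_all.pdf
I downloaded the PDF of 無印良品 2021 秋冬収納・家具・家電・ファブリック (MUJI 2021 Fall/Winter catalog: storage, furniture, appliances, fabrics)
from the link above.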

Also トミカ&プラレールカタログwithあにあ 2022-2023 (Tomica & Plarail Catalog with Ania, 2022-2023):
https://www.takaratomy.co.jp/products/plarail/catalog/2022_2023TPcatalog.pdf

Move these files into the
data/image folder.
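The download-and-move step can also be scripted. A stdlib-only sketch, assuming both URLs are still reachable; the helper names local_path_for and download_all are hypothetical, and files already on disk are skipped.

```python
from pathlib import Path
from typing import Iterable, List
from urllib.parse import urlparse
from urllib.request import urlretrieve

# The two catalog PDFs mentioned above
URLS = [
    "https://www.muji.com/public/media/jp/doc/9952536/muji2021aw_all.pdf",
    "https://www.takaratomy.co.jp/products/plarail/catalog/2022_2023TPcatalog.pdf",
]


def local_path_for(url: str, dest_dir: Path) -> Path:
    """Map a URL to dest_dir/<last path component of the URL>."""
    return dest_dir / Path(urlparse(url).path).name


def download_all(urls: Iterable[str], dest_dir: Path) -> List[Path]:
    """Download each URL into dest_dir (created if missing) and return the local paths."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        path = local_path_for(url, dest_dir)
        if not path.exists():  # don't re-download large catalogs
            urlretrieve(url, path)
        saved.append(path)
    return saved
```

Calling download_all(URLS, Path("data/image")) would fetch both PDFs; the call is deliberately not at module level so that merely running or importing the file does not hit the network.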

pip install yomitoku

to install yomitoku.

Next, write

import cv2
import torch

from yomitoku import DocumentAnalyzer
from yomitoku.data.functions import load_image, load_pdf

if __name__ == "__main__":
    filename = "drugstore_flyer"
    pdf_filepath = f"./data/image/{filename}.pdf"

    image = load_pdf(pdf_filepath)
    analyzer = DocumentAnalyzer(
        configs={},
        visualize=True,
        device='mps'
    )

    results, ocr_vis, layout_vis = analyzer(image[0])

    # to HTML
    # results.to_html(f"./outputs/{filename}.html")

    # to image
    cv2.imwrite(f"./outputs/{filename}_ocr.jpg", ocr_vis)
    cv2.imwrite(f"./outputs/{filename}_layout.jpg", layout_vis)

and save it as
pdf_ocr.py

Then run it.
That version raised an error, so I rewrote it as

import cv2

from yomitoku import DocumentAnalyzer
from yomitoku.data.functions import load_image, load_pdf

pdf_filepath = f"document.pdf"

image = load_pdf(pdf_filepath)
analyzer = DocumentAnalyzer(
    configs={},
    visualize=True,
    device='mps'
)

results, ocr_vis, layout_vis = analyzer(image[0])


# to image
cv2.imwrite(f"document_ocr.jpg", ocr_vis)
cv2.imwrite(f"document_layout.jpg", layout_vis)

putting everything into one flat script, and ran that.

However, running it produced:

2024-12-14 06:37:40,596 - yomitoku.base - INFO - Initialize TextDetector
model.safetensors: 100%|█████████████████████| 102M/102M [00:03<00:00, 34.0MB/s]
2024-12-14 06:37:45,343 - yomitoku.base - INFO - Initialize TextRecognizer
config.json: 100%|█████████████████████████████| 256/256 [00:00<00:00, 1.43MB/s]
model.safetensors: 100%|█████████████████████| 200M/200M [00:06<00:00, 30.7MB/s]
2024-12-14 06:37:53,752 - yomitoku.base - INFO - Initialize LayoutParser
model.safetensors: 100%|█████████████████████| 172M/172M [00:04<00:00, 34.5MB/s]
2024-12-14 06:37:59,630 - yomitoku.base - INFO - Initialize TableStructureRecognizer
model.safetensors: 100%|█████████████████████| 172M/172M [00:06<00:00, 28.3MB/s]
2024-12-14 06:38:07,932 - yomitoku.base - INFO - LayoutParser __call__ elapsed_time: 1.2879679203033447
2024-12-14 06:38:07,966 - yomitoku.base - INFO - TableStructureRecognizer __call__ elapsed_time: 0.03367877006530762
2024-12-14 06:38:09,561 - yomitoku.base - INFO - TextDetector __call__ elapsed_time: 2.916445255279541
2024-12-14 06:38:19,991 - yomitoku.base - INFO - Initialize TextDetector
2024-12-14 06:38:20,963 - yomitoku.base - INFO - Initialize TextRecognizer
2024-12-14 06:38:22,444 - yomitoku.base - INFO - Initialize LayoutParser
2024-12-14 06:38:23,230 - yomitoku.base - INFO - Initialize TableStructureRecognizer
2024-12-14 06:38:24,966 - yomitoku.base - INFO - LayoutParser __call__ elapsed_time: 1.029360055923462
2024-12-14 06:38:24,982 - yomitoku.base - INFO - TableStructureRecognizer __call__ elapsed_time: 0.01499795913696289
2024-12-14 06:38:26,832 - yomitoku.base - INFO - TextDetector __call__ elapsed_time: 2.895256757736206
2024-12-14 06:38:26,837 - yomitoku.base - ERROR - Error occurred in TextRecognizer __call__: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/snowpool/aw10s/ollama/pdf_ocr.py", line 15, in <module>
    results, ocr_vis, layout_vis = analyzer(image[0])
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/yomitoku/document_analyzer.py", line 304, in __call__
    resutls, ocr, layout = asyncio.run(self.run(img))
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/yomitoku/document_analyzer.py", line 293, in run
    results = await asyncio.gather(*tasks)
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/yomitoku/ocr.py", line 83, in __call__
    rec_outputs, vis = self.recognizer(img, det_outputs.points, vis=vis)
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/yomitoku/base.py", line 45, in wrapper
    raise e
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/yomitoku/base.py", line 40, in wrapper
    result = func(*args, **kwargs)
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/yomitoku/text_recognizer.py", line 103, in __call__
    for data in dataloader:
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 484, in __iter__
    return self._get_iterator()
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 415, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1138, in __init__
    w.start()
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

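The error is raised inside TextRecognizer, at the point where PyTorch's DataLoader tries to start its worker processes.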

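According to ChatGPT:

This error occurs because the script is not prepared for the way the multiprocessing module starts new processes.
On macOS (Python 3.8 and later), new processes are started with spawn by default, which re-imports the main script in every child process,
so this error appears unless the code is protected with if __name__ == "__main__":.
You can fix the problem with the corrected code below.

That was its answer.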
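The mechanism can be reproduced with the standard library alone. A minimal sketch that forces the spawn start method (the macOS default) and starts a small process pool, the same kind of thing the DataLoader workers do inside yomitoku; the names square and run are hypothetical. Remove the guard and each child, while re-importing the script, would try to start a pool of its own during bootstrapping and fail with the RuntimeError above.

```python
import multiprocessing as mp
from typing import List


def square(n: int) -> int:
    # Worker functions must live at module top level so spawned children can import them.
    return n * n


def run() -> List[int]:
    # Force "spawn" so the behaviour matches macOS even on Linux, where "fork" was the default.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, range(5))


# Everything that starts processes must sit behind this guard: spawned children
# import this file as "__mp_main__", so the block below does not run again in them.
if __name__ == "__main__":
    print(run())  # [0, 1, 4, 9, 16]
```

For pdf_ocr.py this means the DocumentAnalyzer construction and the analyzer(...) call belong under the guard, not at module level.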

import cv2
import torch

from yomitoku import DocumentAnalyzer
from yomitoku.data.functions import load_image, load_pdf

if __name__ == "__main__":
    filename = "drugstore_flyer"
    pdf_filepath = f"./images/{filename}.pdf"

    image = load_pdf(pdf_filepath)
    analyzer = DocumentAnalyzer(
        configs={},
        visualize=True,
        device='mps'
    )

    results, ocr_vis, layout_vis = analyzer(image[0])

    # to HTML
    # results.to_html(f"./outputs/{filename}.html")

    # to image
    cv2.imwrite(f"./outputs/{filename}_ocr.jpg", ocr_vis)
    cv2.imwrite(f"./outputs/{filename}_layout.jpg", layout_vis)

This code is from
https://github.com/Shakshi3104/ymtk-supplementary/blob/main/app.py
so I adapted my script to match:

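import cv2
import torch  # not used directly in this script

from yomitoku import DocumentAnalyzer
from yomitoku.data.functions import load_image, load_pdf

# Required on macOS: DataLoader workers are started with "spawn",
# which re-imports this file in every child process.
if __name__ == "__main__":
    filename = "document"
    pdf_filepath = f"./images/{filename}.pdf"

    # Convert the PDF into page images
    image = load_pdf(pdf_filepath)
    analyzer = DocumentAnalyzer(
        configs={},
        visualize=True,  # also return OCR and layout visualization images
        device='mps'     # run the models on the Apple Silicon GPU
    )

    # Only the first page is analyzed here
    results, ocr_vis, layout_vis = analyzer(image[0])

    # to HTML
    # results.to_html(f"./outputs/{filename}.html")

    # to image (the outputs folder must already exist)
    cv2.imwrite(f"./outputs/{filename}_ocr.jpg", ocr_vis)
    cv2.imwrite(f"./outputs/{filename}_layout.jpg", layout_vis)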

Save that, then run

mkdir outputs  
mv document.pdf images/

to move the PDF into images/
and create the output folder in advance.
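cv2.imwrite generally signals a failure by returning False rather than raising, so a missing outputs folder is easy to miss; that may be why the first run below wrote nothing until mkdir outputs was run. A stdlib sketch that does the same preparation from Python; the helper prepare_paths and its default folder names are hypothetical, chosen to match the commands above.

```python
from pathlib import Path
from typing import Tuple


def prepare_paths(filename: str, image_dir: str = "images",
                  output_dir: str = "outputs") -> Tuple[Path, Path, Path]:
    """Check that the input PDF exists, create the output folder, return the file paths."""
    pdf_path = Path(image_dir) / f"{filename}.pdf"
    if not pdf_path.is_file():
        raise FileNotFoundError(f"PDF not found: {pdf_path}")
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p outputs`
    return pdf_path, out / f"{filename}_ocr.jpg", out / f"{filename}_layout.jpg"
```

In pdf_ocr.py one could then write ok = cv2.imwrite(str(ocr_jpg), ocr_vis) and stop with an error message if ok is False.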

Running this gives:

2024-12-15 05:50:41,470 - yomitoku.base - INFO - Initialize TextDetector
2024-12-15 05:50:42,794 - yomitoku.base - INFO - Initialize TextRecognizer
2024-12-15 05:50:44,762 - yomitoku.base - INFO - Initialize LayoutParser
2024-12-15 05:50:45,957 - yomitoku.base - INFO - Initialize TableStructureRecognizer
2024-12-15 05:50:48,155 - yomitoku.base - INFO - LayoutParser __call__ elapsed_time: 1.2731208801269531
2024-12-15 05:50:48,189 - yomitoku.base - INFO - TableStructureRecognizer __call__ elapsed_time: 0.033370256423950195
2024-12-15 05:50:49,887 - yomitoku.base - INFO - TextDetector __call__ elapsed_time: 3.005059003829956
2024-12-15 05:51:04,448 - yomitoku.base - INFO - TextRecognizer __call__ elapsed_time: 14.56023383140564
snowpool@kubotasorunoAir ollama % mkdir outputs
snowpool@kubotasorunoAir ollama % python pdf_ocr.py
2024-12-15 05:52:39,988 - yomitoku.base - INFO - Initialize TextDetector
2024-12-15 05:52:41,732 - yomitoku.base - INFO - Initialize TextRecognizer
2024-12-15 05:52:43,589 - yomitoku.base - INFO - Initialize LayoutParser
2024-12-15 05:52:44,413 - yomitoku.base - INFO - Initialize TableStructureRecognizer
2024-12-15 05:52:46,277 - yomitoku.base - INFO - LayoutParser __call__ elapsed_time: 1.1299068927764893
2024-12-15 05:52:46,312 - yomitoku.base - INFO - TableStructureRecognizer __call__ elapsed_time: 0.03462696075439453
2024-12-15 05:52:48,106 - yomitoku.base - INFO - TextDetector __call__ elapsed_time: 2.958970069885254
2024-12-15 05:53:01,007 - yomitoku.base - INFO - TextRecognizer __call__ elapsed_time: 12.900847911834717

It works,
but only the first page is processed.

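The explanation I got:

When load_pdf converts the PDF into images, the code selects only the first page with image[0],
so only the first page gets processed.
To process every page, change the code to loop over all the pages of the PDF.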
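A minimal sketch of that loop, assuming (as the image[0] indexing suggests) that load_pdf returns a sequence with one image per page and that the analyzer returns (results, ocr_vis, layout_vis) as in pdf_ocr.py. The helper analyze_pages and its first/last parameters are hypothetical; the analyzer and the image writer are passed in so the loop can be tried without loading any models, and first/last make it possible to process only a slice of a long PDF.

```python
from pathlib import Path
from typing import Any, Callable, List, Optional, Sequence


def analyze_pages(
    pages: Sequence[Any],
    analyzer: Callable[[Any], tuple],
    save: Callable[[Path, Any], Any],
    out_dir: Path,
    stem: str,
    first: int = 0,
    last: Optional[int] = None,
) -> List[Path]:
    """Analyze pages[first:last] and save the OCR and layout images for each page."""
    written = []
    for i, page in enumerate(pages[first:last], start=first):
        _results, ocr_vis, layout_vis = analyzer(page)
        for kind, vis in (("ocr", ocr_vis), ("layout", layout_vis)):
            # 1-based, zero-padded page numbers: document_p001_ocr.jpg, ...
            path = out_dir / f"{stem}_p{i + 1:03d}_{kind}.jpg"
            save(path, vis)
            written.append(path)
    return written


# Hypothetical usage inside the __main__ block of pdf_ocr.py:
# pages = load_pdf(pdf_filepath)
# analyze_pages(pages, analyzer,
#               lambda p, img: cv2.imwrite(str(p), img),
#               Path("outputs"), filename, first=0, last=10)
```

Giving every page its own file names keeps later pages from overwriting the first page's images.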

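I changed the code to process every page and ran it, but

the PDF has 147 pages;
processing started at 6:05,
and even getting through half of it took over an hour, so I stopped it.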
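From the second run's log, the first page took about 16 s of analysis (TextDetector about 3 s followed by TextRecognizer about 13 s), which would extrapolate to roughly 40 minutes for 147 pages; the real run was much slower than that, so one page is a poor predictor. A stdlib sketch that prints a running ETA after each page makes it possible to decide early whether to stop; the names eta_seconds and with_eta are hypothetical, and the ETA naively assumes the remaining pages take the average time so far.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def eta_seconds(elapsed: float, done: int, total: int) -> float:
    """Naive ETA: assume each remaining page takes the average time so far."""
    return elapsed / done * (total - done)


def with_eta(items: Iterable[T], total: int,
             clock: Callable[[], float] = time.monotonic) -> Iterator[T]:
    """Yield items unchanged; after each one is processed, print progress and an ETA."""
    start = clock()
    for done, item in enumerate(items, start=1):
        yield item  # the caller processes the page while the generator is paused here
        elapsed = clock() - start
        remaining = eta_seconds(elapsed, done, total)
        print(f"{done}/{total} pages, {elapsed / done:.1f} s/page, "
              f"about {remaining / 60:.1f} min left")


# Hypothetical usage inside the __main__ block of pdf_ocr.py:
# pages = load_pdf(pdf_filepath)
# for page in with_eta(pages, len(pages)):
#     results, ocr_vis, layout_vis = analyzer(page)
```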

Next, I'll try a receipt on Colab.
