Analyzing photos and receipts with Yomitoku
The article "Yomitokuで写真やレシートを解析してみる" runs Yomitoku on Google Colab
and succeeds at analyzing receipts.
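Also, according to the article
【Python】PyTorchをAppleシリコン搭載Mac(M1、M2)にインストールする方法 – AppleシリコンGPUで動かす方法も、併せて紹介 –
running PyTorch on an Nvidia GPU requires moving the data being processed from main memory to GPU memory.
The same applies to the Apple silicon GPU: the data has to be moved from main memory to GPU memory.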
    import torch
    print(torch.backends.mps.is_available())
If this prints
    True
the MPS backend is available.
To use the Apple silicon GPU, use:
* device = torch.device('mps')
* {data}.to(device)
mps stands for Metal Performance Shaders.
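The two calls above can be combined into a short sketch. The helper name pick_device is hypothetical, and the fallback order (mps, then cuda, then cpu) is an assumption for illustration, not something the cited article prescribes. The sketch also falls back to "cpu" when PyTorch is not installed at all.

```python
import importlib.util


def pick_device() -> str:
    """Return "mps", "cuda" or "cpu" depending on what this machine offers."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed, nothing to select
    import torch

    # torch.backends.mps is missing in very old PyTorch builds
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


device_name = pick_device()
print(f"using device: {device_name}")

if importlib.util.find_spec("torch") is not None:
    import torch

    device = torch.device(device_name)
    x = torch.ones(2, 3)  # created on the CPU (main memory)
    x = x.to(device)      # moved to the selected device
    print(x.device)
```

The same string can be passed on as the device argument wherever a library accepts one.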
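I was curious about the memory size, so I asked ChatGPT. Its answer:
The GPU memory size of a MacBook Air (16GB model) is not fixed;
it is allocated dynamically out of the 16GB of unified memory.
Up to roughly 8GB to 12GB can be allocated, but this depends on system load. Checking the real-time usage with Activity Monitor or PyTorch as needed is recommended.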
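As a rough sketch of the "check it from PyTorch" part: recent PyTorch versions (2.0 and later, as far as I know) expose torch.mps.current_allocated_memory() and torch.mps.driver_allocated_memory(). The helper name mps_memory_usage and the dictionary keys are made up for this example. It returns None when PyTorch or MPS is not available.

```python
import importlib.util
from typing import Dict, Optional


def mps_memory_usage() -> Optional[Dict[str, int]]:
    """Return MPS memory usage in bytes, or None if it cannot be queried."""
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch is not installed
    import torch

    backend = getattr(torch.backends, "mps", None)
    if backend is None or not backend.is_available():
        return None  # no usable MPS device on this machine
    if not hasattr(torch, "mps"):
        return None  # PyTorch too old to have the torch.mps module
    return {
        # memory currently occupied by tensors
        "allocated_by_tensors": torch.mps.current_allocated_memory(),
        # total memory the Metal driver has allocated for this process
        "allocated_by_driver": torch.mps.driver_allocated_memory(),
    }


usage = mps_memory_usage()
if usage is None:
    print("MPS memory usage is not available on this machine")
else:
    for name, size in usage.items():
        print(f"{name}: {size / 1024**2:.1f} MiB")
```

Calling this before and after the analyzer runs gives a rough idea of how much unified memory the models take.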
For now, on with the experiment.
Download the PDF of 無印良品 2021 秋冬 収納・家具・家電・ファブリック from
https://www.muji.com/public/media/jp/doc/9952536/muji2021aw_all.pdf
and トミカ&プラレールカタログwithあにあ 2022-2023 from
https://www.takaratomy.co.jp/products/plarail/catalog/2022_2023TPcatalog.pdf
Move them into the Data/image folder.
Install yomitoku with
    pip install yomitoku
Next, write the following script:
    import cv2
    import torch
    from yomitoku import DocumentAnalyzer
    from yomitoku.data.functions import load_image, load_pdf

    if __name__ == "__main__":
        filename = "drugstore_flyer"
        pdf_filepath = f"./data/image/{filename}.pdf"
        image = load_pdf(pdf_filepath)

        analyzer = DocumentAnalyzer(
            configs={}, visualize=True, device='mps'
        )
        results, ocr_vis, layout_vis = analyzer(image[0])

        # to HTML
        # results.to_html(f"./outputs/{filename}.html")

        # to image
        cv2.imwrite(f"./outputs/{filename}_ocr.jpg", ocr_vis)
        cv2.imwrite(f"./outputs/{filename}_layout.jpg", layout_vis)
Save this as
pdf_ocr.py
and run it.
That fails with an error, so I changed the script to:
    import cv2
    from yomitoku import DocumentAnalyzer
    from yomitoku.data.functions import load_image, load_pdf

    pdf_filepath = "document.pdf"
    image = load_pdf(pdf_filepath)

    analyzer = DocumentAnalyzer(
        configs={}, visualize=True, device='mps'
    )
    results, ocr_vis, layout_vis = analyzer(image[0])

    # to image
    cv2.imwrite("document_ocr.jpg", ocr_vis)
    cv2.imwrite("document_layout.jpg", layout_vis)
This version handles a single file in the current directory.
Running it, however, gives:
    2024-12-14 06:37:40,596 - yomitoku.base - INFO - Initialize TextDetector
    model.safetensors: 100%|█████████████████████| 102M/102M [00:03<00:00, 34.0MB/s]
    2024-12-14 06:37:45,343 - yomitoku.base - INFO - Initialize TextRecognizer
    config.json: 100%|█████████████████████████████| 256/256 [00:00<00:00, 1.43MB/s]
    model.safetensors: 100%|█████████████████████| 200M/200M [00:06<00:00, 30.7MB/s]
    2024-12-14 06:37:53,752 - yomitoku.base - INFO - Initialize LayoutParser
    model.safetensors: 100%|█████████████████████| 172M/172M [00:04<00:00, 34.5MB/s]
    2024-12-14 06:37:59,630 - yomitoku.base - INFO - Initialize TableStructureRecognizer
    model.safetensors: 100%|█████████████████████| 172M/172M [00:06<00:00, 28.3MB/s]
    2024-12-14 06:38:07,932 - yomitoku.base - INFO - LayoutParser __call__ elapsed_time: 1.2879679203033447
    2024-12-14 06:38:07,966 - yomitoku.base - INFO - TableStructureRecognizer __call__ elapsed_time: 0.03367877006530762
    2024-12-14 06:38:09,561 - yomitoku.base - INFO - TextDetector __call__ elapsed_time: 2.916445255279541
    2024-12-14 06:38:19,991 - yomitoku.base - INFO - Initialize TextDetector
    2024-12-14 06:38:20,963 - yomitoku.base - INFO - Initialize TextRecognizer
    2024-12-14 06:38:22,444 - yomitoku.base - INFO - Initialize LayoutParser
    2024-12-14 06:38:23,230 - yomitoku.base - INFO - Initialize TableStructureRecognizer
    2024-12-14 06:38:24,966 - yomitoku.base - INFO - LayoutParser __call__ elapsed_time: 1.029360055923462
    2024-12-14 06:38:24,982 - yomitoku.base - INFO - TableStructureRecognizer __call__ elapsed_time: 0.01499795913696289
    2024-12-14 06:38:26,832 - yomitoku.base - INFO - TextDetector __call__ elapsed_time: 2.895256757736206
    2024-12-14 06:38:26,837 - yomitoku.base - ERROR - Error occurred in TextRecognizer __call__:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
        exitcode = _main(fd, parent_sentinel)
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
        prepare(preparation_data)
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
        _fixup_main_from_path(data['init_main_from_path'])
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
        main_content = runpy.run_path(main_path,
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/runpy.py", line 289, in run_path
        return _run_module_code(code, init_globals, run_name,
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/runpy.py", line 96, in _run_module_code
        _run_code(code, mod_globals, init_globals,
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/Users/snowpool/aw10s/ollama/pdf_ocr.py", line 15, in <module>
        results, ocr_vis, layout_vis = analyzer(image[0])
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/yomitoku/document_analyzer.py", line 304, in __call__
        resutls, ocr, layout = asyncio.run(self.run(img))
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/asyncio/runners.py", line 44, in run
        return loop.run_until_complete(main)
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
        return future.result()
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/yomitoku/document_analyzer.py", line 293, in run
        results = await asyncio.gather(*tasks)
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/concurrent/futures/thread.py", line 58, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/yomitoku/ocr.py", line 83, in __call__
        rec_outputs, vis = self.recognizer(img, det_outputs.points, vis=vis)
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/yomitoku/base.py", line 45, in wrapper
        raise e
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/yomitoku/base.py", line 40, in wrapper
        result = func(*args, **kwargs)
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/yomitoku/text_recognizer.py", line 103, in __call__
        for data in dataloader:
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 484, in __iter__
        return self._get_iterator()
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 415, in _get_iterator
        return _MultiProcessingDataLoaderIter(self)
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1138, in __init__
        w.start()
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/process.py", line 121, in start
        self._popen = self._Popen(self)
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
        return Popen(process_obj)
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
        super().__init__(process_obj)
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
        self._launch(process_obj)
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
        _check_not_importing_main()
      File "/Users/snowpool/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
        raise RuntimeError('''
    RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
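This error message means that the multiprocessing module started a child process before the main module was set up for it.
The traceback shows PyTorch's DataLoader, called from Yomitoku's TextRecognizer, starting worker processes.
macOS starts child processes with spawn by default, and every spawned child re-imports the main script.
Without if __name__ == "__main__": the top-level code runs again inside the child, which is why the Initialize lines show up twice in the log and why this error is raised.
The fix is to put the body of the script under that guard.
That was the explanation I got.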
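The mechanism can be reproduced without Yomitoku. The following is a minimal sketch using only the standard library. The names START_METHOD and run_child are made up for this example, and the start method is set to spawn explicitly so the behavior matches macOS on other platforms too.

```python
import multiprocessing as mp

START_METHOD = "spawn"  # the default start method on macOS since Python 3.8


def run_child(message: str) -> int:
    """Start one child process with spawn and return its exit code."""
    ctx = mp.get_context(START_METHOD)
    child = ctx.Process(target=print, args=(message,))
    child.start()
    child.join()
    return child.exitcode


if __name__ == "__main__":
    # Only the parent process enters this block. A spawned child re-imports
    # this file under a different module name, so the guard keeps the child
    # from trying to start another process while it is still bootstrapping.
    print("child exit code:", run_child("hello from a spawned child"))
```

Moving the last line out of the guard and running the file should trigger the same RuntimeError as in the log above, because the child would then call run_child during its own import.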
A script with this guard is available at
https://github.com/Shakshi3104/ymtk-supplementary/blob/main/app.py

    import cv2
    import torch
    from yomitoku import DocumentAnalyzer
    from yomitoku.data.functions import load_image, load_pdf

    if __name__ == "__main__":
        filename = "drugstore_flyer"
        pdf_filepath = f"./images/{filename}.pdf"
        image = load_pdf(pdf_filepath)

        analyzer = DocumentAnalyzer(
            configs={}, visualize=True, device='mps'
        )
        results, ocr_vis, layout_vis = analyzer(image[0])

        # to HTML
        # results.to_html(f"./outputs/{filename}.html")

        # to image
        cv2.imwrite(f"./outputs/{filename}_ocr.jpg", ocr_vis)
        cv2.imwrite(f"./outputs/{filename}_layout.jpg", layout_vis)

so I rewrote my script to match:
    import cv2
    import torch
    from yomitoku import DocumentAnalyzer
    from yomitoku.data.functions import load_image, load_pdf

    if __name__ == "__main__":
        filename = "document"
        pdf_filepath = f"./images/{filename}.pdf"
        image = load_pdf(pdf_filepath)

        analyzer = DocumentAnalyzer(
            configs={}, visualize=True, device='mps'
        )
        results, ocr_vis, layout_vis = analyzer(image[0])

        # to HTML
        # results.to_html(f"./outputs/{filename}.html")

        # to image
        cv2.imwrite(f"./outputs/{filename}_ocr.jpg", ocr_vis)
        cv2.imwrite(f"./outputs/{filename}_layout.jpg", layout_vis)
Then create the output folder and move the PDF into images/ with
    mkdir outputs
    mv document.pdf images/
Running the script now gives:
    2024-12-15 05:50:41,470 - yomitoku.base - INFO - Initialize TextDetector
    2024-12-15 05:50:42,794 - yomitoku.base - INFO - Initialize TextRecognizer
    2024-12-15 05:50:44,762 - yomitoku.base - INFO - Initialize LayoutParser
    2024-12-15 05:50:45,957 - yomitoku.base - INFO - Initialize TableStructureRecognizer
    2024-12-15 05:50:48,155 - yomitoku.base - INFO - LayoutParser __call__ elapsed_time: 1.2731208801269531
    2024-12-15 05:50:48,189 - yomitoku.base - INFO - TableStructureRecognizer __call__ elapsed_time: 0.033370256423950195
    2024-12-15 05:50:49,887 - yomitoku.base - INFO - TextDetector __call__ elapsed_time: 3.005059003829956
    2024-12-15 05:51:04,448 - yomitoku.base - INFO - TextRecognizer __call__ elapsed_time: 14.56023383140564
    snowpool@kubotasorunoAir ollama % mkdir outputs
    snowpool@kubotasorunoAir ollama % python pdf_ocr.py
    2024-12-15 05:52:39,988 - yomitoku.base - INFO - Initialize TextDetector
    2024-12-15 05:52:41,732 - yomitoku.base - INFO - Initialize TextRecognizer
    2024-12-15 05:52:43,589 - yomitoku.base - INFO - Initialize LayoutParser
    2024-12-15 05:52:44,413 - yomitoku.base - INFO - Initialize TableStructureRecognizer
    2024-12-15 05:52:46,277 - yomitoku.base - INFO - LayoutParser __call__ elapsed_time: 1.1299068927764893
    2024-12-15 05:52:46,312 - yomitoku.base - INFO - TableStructureRecognizer __call__ elapsed_time: 0.03462696075439453
    2024-12-15 05:52:48,106 - yomitoku.base - INFO - TextDetector __call__ elapsed_time: 2.958970069885254
    2024-12-15 05:53:01,007 - yomitoku.base - INFO - TextRecognizer __call__ elapsed_time: 12.900847911834717
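This works, but only the first page gets processed.
The reason:
load_pdf converts the PDF into a list of page images, and the script passes only image[0], the first page, to the analyzer,
so nothing beyond the first page is processed.
To process every page, the code has to loop over all the pages in the PDF.
That was the explanation I got.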
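A loop over all pages could look like the sketch below. The function analyze_all_pages, its parameters, and the output file naming are hypothetical and not part of Yomitoku. The analyzer and the image writer are passed in as arguments, and the outputs folder is assumed to exist already. max_pages is there so a long PDF can be tried on a few pages first.

```python
from pathlib import Path
from typing import Any, Callable, List, Optional, Sequence


def analyze_all_pages(
    pages: Sequence[Any],
    analyzer: Callable[[Any], Any],
    write_image: Callable[[str, Any], Any],
    out_dir: str = "outputs",
    stem: str = "document",
    max_pages: Optional[int] = None,
) -> List[str]:
    """Run the analyzer on each page and save both visualizations per page."""
    written: List[str] = []
    for page_no, page in enumerate(pages, start=1):
        if max_pages is not None and page_no > max_pages:
            break  # stop early when only a sample of pages is wanted
        # Same three return values as in the single-page script
        results, ocr_vis, layout_vis = analyzer(page)
        ocr_path = str(Path(out_dir) / f"{stem}_p{page_no:03d}_ocr.jpg")
        layout_path = str(Path(out_dir) / f"{stem}_p{page_no:03d}_layout.jpg")
        write_image(ocr_path, ocr_vis)
        write_image(layout_path, layout_vis)
        written.extend([ocr_path, layout_path])
    return written


# Wiring it to the script above (inside the if __name__ == "__main__": block):
#   pages = load_pdf("./images/document.pdf")
#   analyzer = DocumentAnalyzer(configs={}, visualize=True, device="mps")
#   analyze_all_pages(pages, analyzer, cv2.imwrite, max_pages=5)
```

Creating the analyzer once and reusing it for every page avoids repeating the model initialization seen at the start of each log.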
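I changed the code accordingly and tried to process all pages, but
the PDF has 147 pages.
Processing started at 6:05.
Getting through even half of them takes more than an hour, so I stopped it.
Next, I will try receipts on Colab.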