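Trying Yomitoku on Colab to analyze photos and receipts
An article titled "Yomitokuで写真やレシートを解析してみる" (Trying Yomitoku on photos and receipts)
runs it on Colab, so I will use that as a reference.
Apparently the information you get differs depending on the output format.
I will experiment on an A100.
The library is published at
https://github.com/kotaro-kinoshita/yomitoku
so that is where I will read the documentation.
A GPU is basically required.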
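Before installing anything, it can help to confirm that the Colab runtime actually has a GPU attached. The following is a minimal sketch that asks nvidia-smi for the device names. The helper name gpu_names is my own, and it assumes nvidia-smi is on the PATH whenever a GPU runtime is selected.

```python
import shutil
import subprocess


def gpu_names():
    """Return the GPU names reported by nvidia-smi, or [] if none is visible."""
    if shutil.which("nvidia-smi") is None:
        # No NVIDIA driver tools on PATH, so treat this as a CPU-only runtime.
        return []
    proc = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return []
    return [line.strip() for line in proc.stdout.splitlines() if line.strip()]


print(gpu_names() or "no GPU visible; change the Colab runtime type")
```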
! pip install yomitoku
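installs the library.
At the end it told me to restart the session, so I restarted as the dialog box prompted.
Next, prepare the photos.
Since this is Google Colab, create an image folder under /content and put the JPG files in it.
If you point Yomitoku at the whole folder, it analyzes every image file inside.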
!mkdir image
creates the folder.
I uploaded the following photo into it:
aw10s/ollama/images/test.jpg
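The upload can also be done from a notebook cell instead of the file browser. This is a sketch under assumptions: save_uploads and upload_in_colab are hypothetical helper names, and it relies on google.colab.files.upload() returning a dict of file name to bytes, which only works inside Colab.

```python
from pathlib import Path


def save_uploads(uploaded, dest="/content/image"):
    """Write a {filename: bytes} mapping into dest and return the saved paths."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for name, data in uploaded.items():
        # Keep only the base name so nested upload names cannot escape dest.
        target = dest_dir / Path(name).name
        target.write_bytes(data)
        saved.append(target)
    return saved


def upload_in_colab(dest="/content/image"):
    """Open the Colab upload widget and store the chosen files under dest."""
    from google.colab import files  # importable only inside a Colab runtime

    return save_uploads(files.upload(), dest)
```

Calling upload_in_colab() in a cell opens the file picker and leaves the images where the yomitoku command expects them.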
! yomitoku /content/image/ -f md -o results -v --figure
runs the analysis.
Result:
2024-12-15 21:31:34,841 - yomitoku.base - INFO - Initialize TextDetector
model.safetensors: 100% 102M/102M [00:04<00:00, 23.8MB/s]
2024-12-15 21:31:40,639 - yomitoku.base - INFO - Initialize TextRecognizer
config.json: 100% 256/256 [00:00<00:00, 1.87MB/s]
model.safetensors: 100% 200M/200M [00:08<00:00, 23.9MB/s]
2024-12-15 21:31:50,844 - yomitoku.base - INFO - Initialize LayoutParser
model.safetensors: 100% 172M/172M [00:07<00:00, 23.5MB/s]
2024-12-15 21:31:59,554 - yomitoku.base - INFO - Initialize TableStructureRecognizer
model.safetensors: 100% 172M/172M [00:03<00:00, 43.0MB/s]
2024-12-15 21:32:04,793 - yomitoku.cli.main - INFO - Output directory: results
2024-12-15 21:32:04,793 - yomitoku.cli.main - INFO - Processing file: /content/image/test.jpg
2024-12-15 21:32:06,818 - yomitoku.base - INFO - TextDetector __call__ elapsed_time: 1.9037435054779053
2024-12-15 21:32:07,114 - yomitoku.base - INFO - LayoutParser __call__ elapsed_time: 2.19960355758667
2024-12-15 21:32:07,247 - yomitoku.base - INFO - TableStructureRecognizer __call__ elapsed_time: 0.13303065299987793
2024-12-15 21:32:08,789 - yomitoku.base - INFO - TextRecognizer __call__ elapsed_time: 1.9707973003387451
2024-12-15 21:32:08,863 - yomitoku.cli.main - INFO - Output file: results/image_test_p1_ocr.jpg
2024-12-15 21:32:08,914 - yomitoku.cli.main - INFO - Output file: results/image_test_p1_layout.jpg
2024-12-15 21:32:08,927 - yomitoku.cli.main - INFO - Output file: results/image_test_p1.md
2024-12-15 21:32:08,928 - yomitoku.cli.main - INFO - Total Processing time: 4.13 sec
The results are in the results folder, which is created automatically.
Files are written with names of the form <filename>_p1.md.
The contents of image_test_p1.md are:
<img src="figures/image_test_p1_figure_0.png" width="200px"><br> 登録番号 T5080401017738<br>とれたて食楽部<br>静岡県袋井市山名町3\-3<br>TEL 0538\-41\-1100 2024年 8月10日\(土\)08:59 \#000011<br>000801精算機1<br>000801精算機1<br>3901 09:08<br>R9309<br>\#000003<br>お会計券<br>西澤<br>000008 ¥150<br>内8 ★きゅうり/鈴木 仁<br>P2023300101503 内8 ★きゅうり/小林宗作 P2055600101303 ¥130 内8 リーフレタス/\(有\)成神工 ¥216<br>P2086402402169 |小計<br>\(内税 8%対象額<br>買上点数|¥496| |-|-| ||¥496\)| ||3点| |合計|¥496| |\(税率 8%対象額|¥496\)| |\(内消費税等 8%|¥36\)| |課税事業者|| |\(税率 8%対象額|¥216\)| |\(内消費税等 8%|¥16\)| |免税事業者|| |\(税率 8%対象額|¥280\)| ¥496<br>クレジット<br>¥36\)<br>\(内消費税等 、内Sは軽減税率対象商品です。
This is the OCR result.
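The Markdown output is hard to read as-is because line breaks are encoded as <br> tags and punctuation is backslash-escaped. As a rough sketch, the text can be turned back into plain lines like this. md_to_lines is a hypothetical helper of mine, and it only handles the patterns visible in the output above (img tags, br tags, backslash escapes). It leaves the pipe-delimited table text untouched.

```python
import re


def md_to_lines(md_text):
    """Convert Yomitoku-style Markdown text into a list of plain text lines."""
    # Drop the embedded figure tags; they carry no OCR text.
    text = re.sub(r"<img[^>]*>", "", md_text)
    # Each <br> marks a line break in the recognized text.
    text = re.sub(r"<br\s*/?>", "\n", text)
    # Undo Markdown escaping such as \- or \# or \( so the text reads naturally.
    text = re.sub(r"\\([\\`*_{}\[\]()#+\-.!|])", r"\1", text)
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For example, passing in the contents of results/image_test_p1.md gives one list entry per receipt line, which is easier to scan than the raw file.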
Next, since the information you get apparently differs depending on the output format,
I will output in JSON format.
! yomitoku /content/image/ -f json -o results -v --figure
Result:
2024-12-16 21:00:01,501 - yomitoku.base - INFO - Initialize TextDetector
2024-12-16 21:00:02,677 - yomitoku.base - INFO - Initialize TextRecognizer
2024-12-16 21:00:04,059 - yomitoku.base - INFO - Initialize LayoutParser
2024-12-16 21:00:05,086 - yomitoku.base - INFO - Initialize TableStructureRecognizer
2024-12-16 21:00:06,063 - yomitoku.cli.main - INFO - Output directory: results
2024-12-16 21:00:06,063 - yomitoku.cli.main - INFO - Processing file: /content/image/test.jpg
2024-12-16 21:00:07,057 - yomitoku.base - INFO - TextDetector __call__ elapsed_time: 0.9070024490356445
2024-12-16 21:00:07,207 - yomitoku.base - INFO - LayoutParser __call__ elapsed_time: 1.0565669536590576
2024-12-16 21:00:07,340 - yomitoku.base - INFO - TableStructureRecognizer __call__ elapsed_time: 0.1329059600830078
2024-12-16 21:00:08,864 - yomitoku.base - INFO - TextRecognizer __call__ elapsed_time: 1.8073179721832275
2024-12-16 21:00:08,940 - yomitoku.cli.main - INFO - Output file: results/image_test_p1_ocr.jpg
2024-12-16 21:00:08,992 - yomitoku.cli.main - INFO - Output file: results/image_test_p1_layout.jpg
2024-12-16 21:00:08,994 - yomitoku.cli.main - INFO - Output file: results/image_test_p1.json
2024-12-16 21:00:08,995 - yomitoku.cli.main - INFO - Total Processing time: 2.93 sec
The output file is
image_test_p1.json
and its contents are:
{ "figures": [ { "box": [ 586, 249, 1633, 503 ], "direction": "horizontal", "order": 0, "paragraphs": [] } ], "paragraphs": [ { "box": [ 638, 569, 1548, 872 ], "contents": "登録番号 T5080401017738\nとれたて食楽部\n静岡県袋井市山名町3-3\nTEL 0538-41-1100", "direction": "horizontal", "order": 1, "role": null }, { "box": [ 569, 924, 1614, 1080 ], "contents": "2024年 8月10日(土)08:59 #000011\n000801精算機1\n000801精算機1\n3901", "direction": "horizontal", "order": 2, "role": null }, { "box": [ 545, 1173, 1636, 1347 ], "contents": "09:08\nR9309\n#000003\nお会計券\n西澤\n000008", "direction": "horizontal", "order": 3, "role": null }, { "box": [ 516, 1387, 1688, 1577 ], "contents": "¥150\n内8 ★きゅうり/鈴木 仁\nP2023300101503", "direction": "horizontal", "order": 4, "role": null }, { "box": [ 485, 1558, 1448, 1660 ], "contents": "内8 ★きゅうり/小林宗作", "direction": "horizontal", "order": 5, "role": null }, { "box": [ 641, 1652, 1186, 1734 ], "contents": "P2055600101303", "direction": "horizontal", "order": 6, "role": null }, { "box": [ 1532, 1534, 1703, 1620 ], "contents": "¥130", "direction": "horizontal", "order": 7, "role": null }, { "box": [ 484, 1695, 1712, 1904 ], "contents": "内8 リーフレタス/(有)成神工 ¥216\nP2086402402169", "direction": "horizontal", "order": 8, "role": null }, { "box": [ 341, 3432, 1884, 3719 ], "contents": "¥496\nクレジット\n¥36)\n(内消費税等", "direction": "horizontal", "order": 10, "role": null }, { "box": [ 288, 3822, 1779, 4077 ], "contents": "、内Sは軽減税率対象商品です。", "direction": "horizontal", "order": 11, "role": null } ], "tables": [ { "box": [ 346, 1915, 1852, 3358 ], "cells": [ { "box": [ 347, 1921, 1279, 2310 ], "col": 1, "col_span": 1, "contents": "小計\n(内税 8%対象額\n買上点数", "row": 1, "row_span": 3 }, { "box": [ 1276, 1925, 1851, 2039 ], "col": 2, "col_span": 1, "contents": "¥496", "row": 1, "row_span": 1 }, { "box": [ 1276, 2043, 1851, 2133 ], "col": 2, "col_span": 1, "contents": "¥496)", "row": 2, "row_span": 1 }, { "box": [ 1276, 2152, 1851, 2289 ], "col": 2, "col_span": 1, "contents": "3点", "row": 3, "row_span": 
1 }, { "box": [ 347, 2296, 1280, 2426 ], "col": 1, "col_span": 1, "contents": "合計", "row": 4, "row_span": 1 }, { "box": [ 1276, 2296, 1851, 2426 ], "col": 2, "col_span": 1, "contents": "¥496", "row": 4, "row_span": 1 }, { "box": [ 348, 2424, 1280, 2526 ], "col": 1, "col_span": 1, "contents": "(税率 8%対象額", "row": 5, "row_span": 1 }, { "box": [ 1276, 2424, 1851, 2526 ], "col": 2, "col_span": 1, "contents": "¥496)", "row": 5, "row_span": 1 }, { "box": [ 347, 2540, 1280, 2695 ], "col": 1, "col_span": 1, "contents": "(内消費税等 8%", "row": 6, "row_span": 1 }, { "box": [ 1276, 2540, 1851, 2695 ], "col": 2, "col_span": 1, "contents": "¥36)", "row": 6, "row_span": 1 }, { "box": [ 348, 2705, 1280, 2852 ], "col": 1, "col_span": 1, "contents": "課税事業者", "row": 7, "row_span": 1 }, { "box": [ 1276, 2705, 1851, 2852 ], "col": 2, "col_span": 1, "contents": "", "row": 7, "row_span": 1 }, { "box": [ 348, 2865, 1280, 2951 ], "col": 1, "col_span": 1, "contents": "(税率 8%対象額", "row": 8, "row_span": 1 }, { "box": [ 1276, 2865, 1851, 2951 ], "col": 2, "col_span": 1, "contents": "¥216)", "row": 8, "row_span": 1 }, { "box": [ 347, 2980, 1280, 3102 ], "col": 1, "col_span": 1, "contents": "(内消費税等 8%", "row": 9, "row_span": 1 }, { "box": [ 1276, 2980, 1851, 3102 ], "col": 2, "col_span": 1, "contents": "¥16)", "row": 9, "row_span": 1 }, { "box": [ 348, 3103, 1280, 3202 ], "col": 1, "col_span": 1, "contents": "免税事業者", "row": 10, "row_span": 1 }, { "box": [ 1276, 3103, 1851, 3202 ], "col": 2, "col_span": 1, "contents": "", "row": 10, "row_span": 1 }, { "box": [ 347, 3211, 1280, 3355 ], "col": 1, "col_span": 1, "contents": "(税率 8%対象額", "row": 11, "row_span": 1 }, { "box": [ 1276, 3211, 1851, 3355 ], "col": 2, "col_span": 1, "contents": "¥280)", "row": 11, "row_span": 1 } ], "n_col": 2, "n_row": 11, "order": 9 } ], "words": [ { "content": "、内Sは軽減税率対象商品です。", "det_score": 0.6796720981104631, "direction": "horizontal", "points": [ [ 431, 3885 ], [ 1752, 3813 ], [ 1759, 3925 ], [ 438, 3997 ] ], "rec_score": 
0.03351603075861931 }, { "content": "(内消費税等", "det_score": 0.8627398538033404, "direction": "horizontal", "points": [ [ 871, 3594 ], [ 1378, 3577 ], [ 1381, 3686 ], [ 875, 3704 ] ], "rec_score": 0.979337215423584 }, { "content": "¥36)", "det_score": 0.7510144629009029, "direction": "horizontal", "points": [ [ 1694, 3557 ], [ 1881, 3557 ], [ 1881, 3664 ], [ 1694, 3664 ] ], "rec_score": 0.9134319424629211 }, { "content": "クレジット", "det_score": 0.8124990954358461, "direction": "horizontal", "points": [ [ 350, 3484 ], [ 1270, 3465 ], [ 1272, 3569 ], [ 352, 3589 ] ], "rec_score": 0.9985630512237549 }, { "content": "¥496", "det_score": 0.8485499834607593, "direction": "horizontal", "points": [ [ 1463, 3438 ], [ 1843, 3429 ], [ 1846, 3534 ], [ 1465, 3543 ] ], "rec_score": 0.9998164176940918 }, { "content": "(税率 8%対象額", "det_score": 0.853462993147711, "direction": "horizontal", "points": [ [ 459, 3234 ], [ 1102, 3225 ], [ 1103, 3327 ], [ 460, 3336 ] ], "rec_score": 0.9029322266578674 }, { "content": "¥280)", "det_score": 0.8697890050123379, "direction": "horizontal", "points": [ [ 1632, 3182 ], [ 1858, 3182 ], [ 1858, 3295 ], [ 1632, 3295 ] ], "rec_score": 0.9970043301582336 }, { "content": "免税事業者", "det_score": 0.8758680560327103, "direction": "horizontal", "points": [ [ 357, 3103 ], [ 831, 3112 ], [ 829, 3219 ], [ 355, 3210 ] ], "rec_score": 0.9983263611793518 }, { "content": "(内消費税等 8%", "det_score": 0.8655238629057851, "direction": "horizontal", "points": [ [ 475, 2996 ], [ 1100, 2996 ], [ 1100, 3093 ], [ 475, 3093 ] ], "rec_score": 0.908987820148468 }, { "content": "¥16)", "det_score": 0.8736068132308011, "direction": "horizontal", "points": [ [ 1663, 2953 ], [ 1837, 2953 ], [ 1837, 3060 ], [ 1663, 3060 ] ], "rec_score": 0.9977002143859863 }, { "content": "(税率 8%対象額", "det_score": 0.8736011394336691, "direction": "horizontal", "points": [ [ 488, 2884 ], [ 1100, 2884 ], [ 1100, 2981 ], [ 488, 2981 ] ], "rec_score": 0.9527927041053772 }, { "content": "¥216)", 
"det_score": 0.8798753646381205, "direction": "horizontal", "points": [ [ 1613, 2841 ], [ 1827, 2835 ], [ 1830, 2940 ], [ 1616, 2946 ] ], "rec_score": 0.9991167187690735 }, { "content": "課税事業者", "det_score": 0.8782089667459907, "direction": "horizontal", "points": [ [ 383, 2761 ], [ 839, 2773 ], [ 836, 2875 ], [ 381, 2863 ] ], "rec_score": 0.9996485114097595 }, { "content": "(内消費税等 8%", "det_score": 0.8545332324999831, "direction": "horizontal", "points": [ [ 422, 2552 ], [ 1015, 2569 ], [ 1012, 2666 ], [ 419, 2649 ] ], "rec_score": 0.9030265808105469 }, { "content": "¥36)", "det_score": 0.874239359391371, "direction": "horizontal", "points": [ [ 1637, 2517 ], [ 1801, 2517 ], [ 1801, 2614 ], [ 1637, 2614 ] ], "rec_score": 0.9990556836128235 }, { "content": "(税率 8%対象額", "det_score": 0.8743577414363437, "direction": "horizontal", "points": [ [ 430, 2455 ], [ 1018, 2472 ], [ 1015, 2556 ], [ 427, 2539 ] ], "rec_score": 0.9258973002433777 }, { "content": "¥496)", "det_score": 0.873335879837346, "direction": "horizontal", "points": [ [ 1584, 2421 ], [ 1787, 2411 ], [ 1792, 2503 ], [ 1589, 2513 ] ], "rec_score": 0.9990760087966919 }, { "content": "合計", "det_score": 0.7989976852002054, "direction": "horizontal", "points": [ [ 422, 2351 ], [ 774, 2357 ], [ 772, 2459 ], [ 420, 2453 ] ], "rec_score": 0.9596889615058899 }, { "content": "¥496", "det_score": 0.84043764577154, "direction": "horizontal", "points": [ [ 1423, 2337 ], [ 1756, 2319 ], [ 1761, 2406 ], [ 1428, 2424 ] ], "rec_score": 0.9845718741416931 }, { "content": "買上点数", "det_score": 0.8724811626703719, "direction": "horizontal", "points": [ [ 474, 2164 ], [ 821, 2176 ], [ 818, 2263 ], [ 471, 2251 ] ], "rec_score": 0.8529166579246521 }, { "content": "3点", "det_score": 0.877569906485891, "direction": "horizontal", "points": [ [ 1611, 2127 ], [ 1751, 2127 ], [ 1751, 2221 ], [ 1611, 2221 ] ], "rec_score": 0.999936580657959 }, { "content": "(内税 8%対象額", "det_score": 0.8549906700132722, "direction": "horizontal", 
"points": [ [ 466, 2070 ], [ 1026, 2084 ], [ 1023, 2171 ], [ 464, 2157 ] ], "rec_score": 0.6945281624794006 }, { "content": "¥496)", "det_score": 0.8776254550615946, "direction": "horizontal", "points": [ [ 1566, 2039 ], [ 1756, 2028 ], [ 1761, 2121 ], [ 1571, 2131 ] ], "rec_score": 0.9979957938194275 }, { "content": "小計", "det_score": 0.8659234180608216, "direction": "horizontal", "points": [ [ 483, 1981 ], [ 677, 1981 ], [ 677, 2076 ], [ 483, 2076 ] ], "rec_score": 0.66346675157547 }, { "content": "¥496", "det_score": 0.8556146869165837, "direction": "horizontal", "points": [ [ 1558, 1955 ], [ 1732, 1943 ], [ 1738, 2028 ], [ 1563, 2039 ] ], "rec_score": 0.9997748136520386 }, { "content": "P2086402402169", "det_score": 0.8685337040086014, "direction": "horizontal", "points": [ [ 628, 1823 ], [ 1186, 1823 ], [ 1186, 1897 ], [ 628, 1897 ] ], "rec_score": 0.994750440120697 }, { "content": "内8 リーフレタス/(有)成神工 ¥216", "det_score": 0.7733530013156038, "direction": "horizontal", "points": [ [ 471, 1719 ], [ 1716, 1690 ], [ 1718, 1800 ], [ 474, 1829 ] ], "rec_score": 0.487512469291687 }, { "content": "P2055600101303", "det_score": 0.8620755339288183, "direction": "horizontal", "points": [ [ 641, 1658 ], [ 1185, 1652 ], [ 1186, 1729 ], [ 641, 1734 ] ], "rec_score": 0.9976783394813538 }, { "content": "内8 ★きゅうり/小林宗作", "det_score": 0.8030553586848483, "direction": "horizontal", "points": [ [ 485, 1558 ], [ 1448, 1558 ], [ 1448, 1660 ], [ 485, 1660 ] ], "rec_score": 0.8243062496185303 }, { "content": "¥130", "det_score": 0.8647632946312013, "direction": "horizontal", "points": [ [ 1532, 1541 ], [ 1700, 1534 ], [ 1703, 1613 ], [ 1536, 1620 ] ], "rec_score": 0.9999488592147827 }, { "content": "P2023300101503", "det_score": 0.8535739641137016, "direction": "horizontal", "points": [ [ 654, 1502 ], [ 1183, 1497 ], [ 1183, 1566 ], [ 654, 1571 ] ], "rec_score": 0.9996684789657593 }, { "content": "内8 ★きゅうり/鈴木 仁", "det_score": 0.7407760786246895, "direction": "horizontal", "points": [ [ 
503, 1413 ], [ 1447, 1402 ], [ 1448, 1494 ], [ 504, 1505 ] ], "rec_score": 0.9217264652252197 }, { "content": "¥150", "det_score": 0.8529803361907365, "direction": "horizontal", "points": [ [ 1521, 1386 ], [ 1685, 1376 ], [ 1691, 1455 ], [ 1527, 1465 ] ], "rec_score": 0.9999033212661743 }, { "content": "000008", "det_score": 0.8420085177533121, "direction": "horizontal", "points": [ [ 918, 1276 ], [ 1151, 1269 ], [ 1153, 1341 ], [ 920, 1347 ] ], "rec_score": 0.9971499443054199 }, { "content": "西澤", "det_score": 0.8301356971000148, "direction": "horizontal", "points": [ [ 1198, 1261 ], [ 1360, 1254 ], [ 1363, 1333 ], [ 1201, 1340 ] ], "rec_score": 0.9971343278884888 }, { "content": "お会計券", "det_score": 0.8479825696054877, "direction": "horizontal", "points": [ [ 540, 1209 ], [ 835, 1209 ], [ 835, 1285 ], [ 540, 1285 ] ], "rec_score": 0.9989036917686462 }, { "content": "#000003", "det_score": 0.832013468600706, "direction": "horizontal", "points": [ [ 884, 1207 ], [ 1148, 1198 ], [ 1150, 1269 ], [ 886, 1278 ] ], "rec_score": 0.9912707805633545 }, { "content": "R9309", "det_score": 0.8094638501787745, "direction": "horizontal", "points": [ [ 1195, 1189 ], [ 1392, 1182 ], [ 1394, 1254 ], [ 1198, 1261 ] ], "rec_score": 0.9985978603363037 }, { "content": "09:08", "det_score": 0.8513526355901968, "direction": "horizontal", "points": [ [ 1442, 1174 ], [ 1635, 1167 ], [ 1638, 1241 ], [ 1444, 1248 ] ], "rec_score": 0.9983935952186584 }, { "content": "3901", "det_score": 0.8389774914979935, "direction": "horizontal", "points": [ [ 1465, 1031 ], [ 1622, 1024 ], [ 1625, 1100 ], [ 1468, 1108 ] ], "rec_score": 0.9998389482498169 }, { "content": "000801精算機1", "det_score": 0.8261694877181531, "direction": "horizontal", "points": [ [ 557, 1015 ], [ 1029, 1004 ], [ 1031, 1068 ], [ 559, 1079 ] ], "rec_score": 0.8924864530563354 }, { "content": "000801精算機1", "det_score": 0.7923262641892176, "direction": "horizontal", "points": [ [ 1130, 993 ], [ 1596, 965 ], [ 1599, 1024 ], [ 1134, 
1052 ] ], "rec_score": 0.8452281355857849 }, { "content": "2024年 8月10日(土)08:59 #000011", "det_score": 0.76677570112371, "direction": "horizontal", "points": [ [ 567, 950 ], [ 1601, 899 ], [ 1605, 971 ], [ 571, 1021 ] ], "rec_score": 0.7705328464508057 }, { "content": "TEL 0538-41-1100", "det_score": 0.7891764333521688, "direction": "horizontal", "points": [ [ 899, 822 ], [ 1425, 800 ], [ 1428, 856 ], [ 902, 878 ] ], "rec_score": 0.8409962058067322 }, { "content": "静岡県袋井市山名町3-3", "det_score": 0.8232278183336981, "direction": "horizontal", "points": [ [ 773, 750 ], [ 1545, 729 ], [ 1547, 795 ], [ 774, 817 ] ], "rec_score": 0.9089908003807068 }, { "content": "とれたて食楽部", "det_score": 0.8368622312565417, "direction": "horizontal", "points": [ [ 905, 666 ], [ 1359, 657 ], [ 1360, 732 ], [ 906, 740 ] ], "rec_score": 0.7296138405799866 }, { "content": "登録番号 T5080401017738", "det_score": 0.8375435412981505, "direction": "horizontal", "points": [ [ 640, 582 ], [ 1384, 566 ], [ 1386, 637 ], [ 642, 653 ] ], "rec_score": 0.6307746171951294 } ] }
That is what the JSON looks like.
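Unlike the Markdown file, the JSON keeps reading order, table cell positions, and per-word confidence scores, so it can be picked apart programmatically. The sketch below uses only the field names visible in the output above (paragraphs, order, tables, cells, row, col, n_row, n_col, words, rec_score). The function names and the 0.7 threshold are my own assumptions, and spanning cells are simply placed at their top-left position.

```python
import json
from pathlib import Path


def load_result(path):
    """Load one Yomitoku JSON result file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def ordered_paragraphs(doc):
    """Return paragraph texts sorted by their reading order."""
    paragraphs = sorted(doc["paragraphs"], key=lambda p: p["order"])
    return [p["contents"] for p in paragraphs]


def table_rows(table):
    """Rebuild a table as a list of rows from its 1-indexed cells."""
    grid = [[""] * table["n_col"] for _ in range(table["n_row"])]
    for cell in table["cells"]:
        # A cell with row_span or col_span > 1 is stored at its top-left slot only.
        grid[cell["row"] - 1][cell["col"] - 1] = cell["contents"]
    return grid


def low_confidence_words(doc, threshold=0.7):
    """List (text, rec_score) pairs whose recognition score is below threshold."""
    return [
        (w["content"], w["rec_score"])
        for w in doc["words"]
        if w["rec_score"] < threshold
    ]
```

With the result above, doc = load_result("results/image_test_p1.json") followed by low_confidence_words(doc) would flag, among others, the "、内Sは軽減税率対象商品です。" line (rec_score about 0.03) and the リーフレタス line (about 0.49), which are the places worth checking by eye.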
Using these two files, I will try extracting
date, store name, product name, quantity, amount
and see whether they can be turned into a CSV file.
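For reference, the two files can be combined into a single prompt text along these lines. This is only a sketch of assembling the text: build_prompt and the section markers are hypothetical, the wording is an English paraphrase rather than the exact prompt used, and sending it to a model is left out.

```python
def build_prompt(md_text, json_text, fields):
    """Assemble one extraction prompt from the Markdown and JSON outputs."""
    field_list = ", ".join(fields)
    return (
        f"From the data below, extract {field_list} and output them as CSV.\n\n"
        "--- Markdown output ---\n"
        f"{md_text}\n\n"
        "--- JSON output ---\n"
        f"{json_text}\n"
    )
```

The fields argument would be ["date", "store name", "product name", "quantity", "amount"] for the extraction described above.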
With ChatGPT, the CSV file came out as intended.
Next I will experiment with other receipts.
~/Downloads/Photos-001/
contains various photos from Google Photos, so I will try those.
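Piago (ぴあご) receipts are extracted without problems.
Cocokara Fine (ココカラファイン) works too.
However, the amount comes out as the total rather than the per-item price,
so I will change the prompt.
"From the data, extract date, store name, product name, quantity, amount; make the amount the per-item price; and output a CSV file"
is what I tried,
but with this, some receipts still do not get per-item amounts.
The same happens with receipts from Kyorindo (杏林堂) and others.
"From the data, extract date, store name, product name, quantity, unit price and output a CSV file"
Changing the prompt to this made no difference.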
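Since changing the prompt did not help, one possible workaround is to compute the per-item price after the fact. This is a sketch I have not run on these receipts. It assumes the amount column always holds the total for that line and that quantity is a whole number. The helper name unit_price is hypothetical, and the digit-only parsing drops minus signs, so discount lines would need separate handling.

```python
import re


def unit_price(amount_text, quantity_text):
    """Derive a per-item yen price from a line total and a quantity.

    Returns None when the total does not divide evenly, which suggests the
    amount was already a unit price or the OCR misread a digit.
    """
    # Keep digits only, so "¥1,296" becomes 1296.
    digits = re.sub(r"[^\d]", "", amount_text)
    if not digits:
        return None
    amount = int(digits)
    # A missing quantity is treated as a single item.
    quantity = int(quantity_text) if quantity_text.strip() else 1
    if quantity <= 0:
        return None
    price, remainder = divmod(amount, quantity)
    return price if remainder == 0 else None
```

Rows where this returns None are the ones to check against the receipt image by hand.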