What is this entry about?
A while back I tried to register a large amount of data with PyMySQL, found it slow, and switched to bulk inserts with Connector/J. It looked like PyMySQL could do the same thing with executemany, though, so I decided to try it out.
executemany speeds up INSERT and REPLACE statements
It's stated quite plainly in the API reference for executemany:
This method improves performance on multiple-row INSERT and REPLACE. Otherwise it is equivalent to looping over args with execute().
This time I'll try it with INSERT statements.
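As I understand it, the speedup comes from sending a single multi-row INSERT per executemany() call instead of one statement per row. Here is a simplified, pure-Python sketch of that rewrite idea; build_multi_row_insert is a hypothetical helper of my own, not PyMySQL's actual implementation (which also handles escaping and quoting properly):

```python
# Simplified illustration of how a client library can turn an
# executemany()-style call into one multi-row INSERT statement.
# NOT PyMySQL's real implementation -- quoting here is naive and
# for illustration only.

def build_multi_row_insert(prefix: str, rows: list[tuple[object, ...]]) -> str:
    def render(value: object) -> str:
        # Naive rendering: quote strings, stringify everything else.
        return f"'{value}'" if isinstance(value, str) else str(value)

    # Each parameter tuple becomes one "(v1, v2, ...)" group,
    # and all groups are joined into a single VALUES clause.
    groups = ", ".join(
        "(" + ", ".join(render(v) for v in row) + ")" for row in rows
    )
    return f"{prefix} values {groups}"


sql = build_multi_row_insert(
    "insert into test(id, c1, c2)",
    [(1, "a", "b"), (2, "c", "d")],
)
print(sql)
# insert into test(id, c1, c2) values (1, 'a', 'b'), (2, 'c', 'd')
```

One round trip and one statement parse for many rows is where the win comes from, which matches what the general log shows later in this entry.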
環境
Here is the environment for this entry.
$ uv --version
uv 0.5.14

$ python3 --version
Python 3.12.3
MySQL is assumed to be reachable at 172.17.0.2.
MySQL  localhost:3306 ssl  practice  SQL > select version();
+-----------+
| version() |
+-----------+
| 8.4.3     |
+-----------+
1 row in set (0.0007 sec)
The task
I'm going to register 1,000,000 records into the following table and see how performance differs between the pattern that uses execute and the pattern that uses executemany.
create table test(
    id int,
    c1 varchar(36),
    c2 varchar(36),
    primary key(id)
);
The primary key is a simple increasing integer; the other two columns just need some value, so I'll fill them with version-4 UUIDs.
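Incidentally, the reason the columns are varchar(36) is that the canonical text form of a UUID is exactly 36 characters (32 hex digits plus 4 hyphens):

```python
import uuid

# A version-4 (random) UUID rendered as text:
# 32 hex digits + 4 hyphens = 36 characters, hence varchar(36).
value = str(uuid.uuid4())
print(value, len(value))
```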
Taking measurements
I'll do the measuring in test code.
Create a project.
$ uv init pymysql-bulk-insert
$ cd pymysql-bulk-insert
Add the libraries to use.
$ uv add PyMySQL[rsa]
$ uv add --dev pytest mypy types-PyMySQL
pyproject.toml
[project]
name = "pymysql-bulk-insert"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "pymysql[rsa]>=1.1.1",
]

[dependency-groups]
dev = [
    "mypy>=1.14.1",
    "pytest>=8.3.4",
    "types-pymysql>=1.1.0.20241103",
]

[tool.mypy]
strict = true
disallow_any_unimported = true
disallow_any_expr = true
disallow_any_explicit = true
warn_unreachable = true
pretty = true
The list of installed libraries.
$ uv pip list
Package           Version
----------------- --------------
cffi              1.17.1
cryptography      44.0.0
iniconfig        2.0.0
mypy              1.14.1
mypy-extensions   1.0.0
packaging         24.2
pluggy            1.5.0
pycparser         2.22
pymysql           1.1.1
pytest            8.3.4
types-pymysql     1.1.0.20241103
typing-extensions 4.12.2
Writing test code and measuring
Now, let's create the test code and take measurements. I wrote the following test code.
test_mysql_insert.py
import pymysql
from pymysql.connections import Connection
from pymysql.cursors import DictCursor
import pytest
import time
import uuid


def create_connection() -> "Connection[DictCursor]":
    DictCursorType: type[DictCursor] = DictCursor

    return pymysql.connect(
        host="172.17.0.2",
        port=3306,
        user="kazuhira",
        password="password",
        database="practice",
        cursorclass=DictCursorType
    )


@pytest.fixture()
def truncate() -> None:
    with create_connection() as connection:
        with connection.cursor() as cursor:
            cursor.execute("truncate table test")


def test_insert_use_execute(truncate: None) -> None:
    with create_connection() as connection:
        with connection.cursor() as cursor:
            record = 1000000
            commit_threshold = 10000

            start = time.time()
            lap_start = start

            print("start single row insert scenario")

            for i in range(0, record):
                cursor.execute(
                    "insert into test(id, c1, c2) values(%s, %s, %s)",
                    (i + 1, str(uuid.uuid4()), str(uuid.uuid4()))
                )

                if i > 0 and i % commit_threshold == 0:
                    connection.commit()

                    lap_elapsed = time.time() - lap_start
                    print(f"[single] {i} records committed, lap time = {lap_elapsed} sec")
                    lap_start = time.time()

            connection.commit()

            elapsed = time.time() - start
            print(f"[single] {record} records committed, elapsed time = {elapsed} sec")
            print("end single row insert scenario")


def test_insert_use_execute_many(truncate: None) -> None:
    with create_connection() as connection:
        with connection.cursor() as cursor:
            record = 1000000
            commit_threshold = 10000
            batch_size = 50

            start = time.time()
            lap_start = start

            print("start multiple row insert scenario")

            batch_parameters: list[tuple[int, str, str]] = []

            for i in range(0, record):
                batch_parameters.append((i + 1, str(uuid.uuid4()), str(uuid.uuid4())))

                if i > 0 and i % batch_size == 0:
                    cursor.executemany(
                        "insert into test(id, c1, c2) values(%s, %s, %s)",
                        batch_parameters
                    )
                    batch_parameters = []

                if i > 0 and i % commit_threshold == 0:
                    connection.commit()

                    lap_elapsed = time.time() - lap_start
                    print(f"[multi] {i} records committed, lap time = {lap_elapsed} sec")
                    lap_start = time.time()

            if batch_parameters:
                cursor.executemany(
                    "insert into test(id, c1, c2) values(%s, %s, %s)",
                    batch_parameters
                )

            connection.commit()

            elapsed = time.time() - start
            print(f"[multi] {record} records committed, elapsed time = {elapsed} sec")
            print("end multiple row insert scenario")
I prepared two tests: one that runs an INSERT statement one record at a time, and one that runs INSERT statements in batches of 50 records via executemany. Both commit every 10,000 records. Each test runs truncate table beforehand.
@pytest.fixture()
def truncate() -> None:
    with create_connection() as connection:
        with connection.cursor() as cursor:
            cursor.execute("truncate table test")
This is the test that runs INSERT statements one record at a time.
def test_insert_use_execute(truncate: None) -> None:
    with create_connection() as connection:
        with connection.cursor() as cursor:
            record = 1000000
            commit_threshold = 10000

            start = time.time()
            lap_start = start

            print("start single row insert scenario")

            for i in range(0, record):
                cursor.execute(
                    "insert into test(id, c1, c2) values(%s, %s, %s)",
                    (i + 1, str(uuid.uuid4()), str(uuid.uuid4()))
                )

                if i > 0 and i % commit_threshold == 0:
                    connection.commit()

                    lap_elapsed = time.time() - lap_start
                    print(f"[single] {i} records committed, lap time = {lap_elapsed} sec")
                    lap_start = time.time()

            connection.commit()

            elapsed = time.time() - start
            print(f"[single] {record} records committed, elapsed time = {elapsed} sec")
            print("end single row insert scenario")
And this is the test that batches INSERT statements with executemany, grouping 50 records at a time.
def test_insert_use_execute_many(truncate: None) -> None:
    with create_connection() as connection:
        with connection.cursor() as cursor:
            record = 1000000
            commit_threshold = 10000
            batch_size = 50

            start = time.time()
            lap_start = start

            print("start multiple row insert scenario")

            batch_parameters: list[tuple[int, str, str]] = []

            for i in range(0, record):
                batch_parameters.append((i + 1, str(uuid.uuid4()), str(uuid.uuid4())))

                if i > 0 and i % batch_size == 0:
                    cursor.executemany(
                        "insert into test(id, c1, c2) values(%s, %s, %s)",
                        batch_parameters
                    )
                    batch_parameters = []

                if i > 0 and i % commit_threshold == 0:
                    connection.commit()

                    lap_elapsed = time.time() - lap_start
                    print(f"[multi] {i} records committed, lap time = {lap_elapsed} sec")
                    lap_start = time.time()

            if batch_parameters:
                cursor.executemany(
                    "insert into test(id, c1, c2) values(%s, %s, %s)",
                    batch_parameters
                )

            connection.commit()

            elapsed = time.time() - start
            print(f"[multi] {record} records committed, elapsed time = {elapsed} sec")
            print("end multiple row insert scenario")
Each test prints lap times and the total elapsed time, so I'll use those for a relative comparison.
Now, run the tests.
$ uv run pytest --capture=no
Running INSERT one record at a time took about 3 minutes.
[single] 1000000 records committed, elapsed time = 183.3452467918396 sec
With batched INSERTs via executemany, it finished in just under 50 seconds, more than 3x faster.
[multi] 1000000 records committed, elapsed time = 49.93781518936157 sec
It's definitely effective.
The lap times looked like this. First, inserting one record at a time.
test_mysql_insert.py start single row insert scenario
[single] 10000 records committed, lap time = 2.8672566413879395 sec
[single] 20000 records committed, lap time = 1.7911808490753174 sec
[single] 30000 records committed, lap time = 1.7744178771972656 sec
[single] 40000 records committed, lap time = 1.9135522842407227 sec
[single] 50000 records committed, lap time = 1.8735332489013672 sec
[single] 60000 records committed, lap time = 1.836503505706787 sec
[single] 70000 records committed, lap time = 1.8148603439331055 sec
[single] 80000 records committed, lap time = 1.7599217891693115 sec
[single] 90000 records committed, lap time = 1.7711296081542969 sec
[single] 100000 records committed, lap time = 1.9285857677459717 sec
... snip ...
[single] 900000 records committed, lap time = 1.7326405048370361 sec
[single] 910000 records committed, lap time = 1.7279140949249268 sec
[single] 920000 records committed, lap time = 1.8290126323699951 sec
[single] 930000 records committed, lap time = 1.745462417602539 sec
[single] 940000 records committed, lap time = 1.920597791671753 sec
[single] 950000 records committed, lap time = 1.7457966804504395 sec
[single] 960000 records committed, lap time = 1.9177658557891846 sec
[single] 970000 records committed, lap time = 1.7535102367401123 sec
[single] 980000 records committed, lap time = 1.7440297603607178 sec
[single] 990000 records committed, lap time = 1.767449140548706 sec
[single] 1000000 records committed, elapsed time = 183.3452467918396 sec
end single row insert scenario
With batched INSERTs via executemany, the laps occasionally exceeded 1 second and fluctuated a bit, but overall they looked like this.
start multiple row insert scenario
[multi] 10000 records committed, lap time = 1.415905237197876 sec
[multi] 20000 records committed, lap time = 0.41378355026245117 sec
[multi] 30000 records committed, lap time = 0.45179271697998047 sec
[multi] 40000 records committed, lap time = 0.45946836471557617 sec
[multi] 50000 records committed, lap time = 0.6484074592590332 sec
[multi] 60000 records committed, lap time = 0.38768482208251953 sec
[multi] 70000 records committed, lap time = 0.9320194721221924 sec
[multi] 80000 records committed, lap time = 0.39597082138061523 sec
[multi] 90000 records committed, lap time = 0.3991262912750244 sec
[multi] 100000 records committed, lap time = 0.3714487552642822 sec
... snip ...
[multi] 900000 records committed, lap time = 0.45747923851013184 sec
[multi] 910000 records committed, lap time = 0.36601924896240234 sec
[multi] 920000 records committed, lap time = 0.8718698024749756 sec
[multi] 930000 records committed, lap time = 0.384235143661499 sec
[multi] 940000 records committed, lap time = 0.39829087257385254 sec
[multi] 950000 records committed, lap time = 0.3484499454498291 sec
[multi] 960000 records committed, lap time = 0.6414399147033691 sec
[multi] 970000 records committed, lap time = 0.33321499824523926 sec
[multi] 980000 records committed, lap time = 0.38330674171447754 sec
[multi] 990000 records committed, lap time = 0.3649733066558838 sec
[multi] 1000000 records committed, elapsed time = 49.93781518936157 sec
end multiple row insert scenario
Checking MySQL's general query log
Finally, let's confirm on the MySQL side with the general query log.
Configured like this.
general_log = ON
general_log_file = /var/log/mysql/server.log
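As a side note, if editing the configuration file and restarting the server is inconvenient, my understanding is that the same settings are dynamic system variables and can also be changed at runtime by an account with sufficient privileges:

```sql
SET GLOBAL general_log_file = '/var/log/mysql/server.log';
SET GLOBAL general_log = 'ON';
```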
Looking at the log recorded while the executemany version was running, the statements are indeed multi-row bulk INSERTs.
2025-01-03T13:39:46.864748Z 9 Query insert into test(id, c1, c2) values(990002, 'ea1fbd28-5a0f-495e-bfb4-4c68739cce17', 'dc633b77-9763-4b8f-b91c-1d1cec35c664'),(990003, '46194d03-888b-4168-90f3-0f82822af169', '976de47e-7672-4d3e-b677-8f4e39e6c719'),(990004, 'f2bdb865-72f8-4dbc-b506-f6fa2c88ce3d', '47a03465-9c8f-4698-a8a4-fedcf91594b2'),(990005, 'bda7fd72-ee20-4a18-b7d9-031c50df4662', 'a70385bf-c3ce-464e-934d-ee307d5ab001'),(990006, '0c69a7a7-6556-44c6-9958-900990aab326', 'cad2369f-103e-46fc-89d6-b7345ec983ab'),(990007, '51215433-5015-4b06-a72a-fabbe8a2007b', 'c294b9e2-434c-4f2e-8a77-ade2f5a93d81'),(990008, '6e070328-c156-47e7-8268-1828731d4677', '097d0542-247e-4ab3-9030-5eec12de9a91'),(990009, '9224b133-7d52-4097-9f85-8a79fded6732', '6a1d7b4c-0282-41d9-804a-9ee07518dc59'),(990010, '82896d8e-5dbd-49e8-b0d3-ddc959ebcca9', '3adb588e-3e58-45b2-b17d-4224b9dbd1f9'),(990011, 'f991372b-ac0b-4c6f-b894-60b3a7ec241b', '799f61f8-96a1-42ab-af0d-cb1551591fe7'),(990012, '84e884c0-0c78-48d3-8c55-72c0e81ce38e', '94270265-8e7b-47c9-8616-cda7e4e08c7a'),(990013, 'e38bb4f2-6dd7-4bb6-815d-ef6ca4aae323', '3c159cf1-9aea-45d1-96bc-7e3fe5189dca'),(990014, '31dee250-7f94-4327-89ef-dd83d26ffe09', '41ce6dd7-e3b3-4047-b63a-44ef1e392366'),(990015, '852c20ea-a112-4f44-8ad1-d4b97b3347cd', '1f2399f6-f434-43a7-ad64-a2c1e5f2db6d'),(990016, '1c8f56d0-075c-4864-9760-7e76bd5e9823', 'cf1d7f8d-5260-4b0c-b93d-cad0736f74c4'),(990017, '34f51ab4-9e59-41d6-9e5c-184f73557647', '17e92e2a-4b26-4312-b096-2ac79843b1e6'),(990018, 'ccba2ed6-711d-4b84-a7f4-a22375b575d7', '0ab496dd-cd8a-4607-a8a9-8d0ab28bd712'),(990019, 'e0d1d999-0057-44fd-8126-93d22969aa67', 'b0e00d58-e1ff-4c4e-907f-adbeece57a75'),(990020, '51c4c271-0c45-4b99-8637-eaf78ed45e6f', '89676000-294b-4d74-9546-504abe35dd2e'),(990021, '5ba0b0c7-0aa2-4cce-bb7b-052f2a26bcfd', '4b18a7c7-6e15-4ce7-98b2-5b8da3dfafb0'),(990022, 'e36fffb0-f452-439c-bfda-52405a45ff6c', '574c787c-9a08-4478-9078-e34d8f58affc'),(990023, '625d48bc-e588-438b-aa44-a3c6b06c9853', 
'a39a2319-519b-406f-bb37-d8e7a98da7fb'),(990024, 'd8936fe5-af34-49fd-bcfd-f13b3c01ad11', 'ccf4b12d-7c1b-4271-b86b-e1dd76599e5a'),(990025, '5f67c52b-2360-4fb5-9f31-1adbbc4f7ba2', '5678c561-6773-45f1-9276-37852d64c07d'),(990026, '9c3b3c78-bbae-4f25-a8cb-f53cb2eaaab4', '2a66a144-c47c-47e6-82e6-11ec1854f8f3'),(990027, '0fa4bbba-ce4f-44e3-a273-0fe4615a12b4', '6e17fcda-6eb8-4b2f-82ac-05faa296b6b7'),(990028, '496b599d-0817-4535-9ae4-e7121c818643', '7c32d94f-20da-4b0c-8440-3bba916ff65a'),(990029, '20cbc26a-fd06-46f9-8a01-20acc1e8632e', '6f11d2d2-29e7-4a51-94f1-e7e1b47a9083'),(990030, 'cd86d3b0-b32e-4fef-b63d-27c24a021b29', 'c2ceb1be-8e1d-40de-94d6-e0c1a8938467'),(990031, '87465b12-5876-4947-8b33-eb22ddf66189', '88feabae-deb1-4d31-965d-6eac34a54c48'),(990032, 'b67763d4-b83d-4e50-ad75-82eb7020b150', '179e20ce-2495-4b7e-b736-d2fea911ad29'),(990033, '89d2a0a9-1a00-4ded-b3d5-9d2613f6620a', 'bcc4f556-87a0-421f-b5e7-23fd5d36671a'),(990034, 'd96f44de-e40d-4f23-82f0-0a1e5d6b51ec', 'dba984e3-21c4-4266-b41d-c90ed779729a'),(990035, '2c6edccf-4d78-4682-a06d-f15d4de302bb', '8f650284-93fb-422d-8f70-a39fa186c1cc'),(990036, 'a0f4ada2-f7ca-43fa-95d7-fecc2650a2d6', 'a044737f-4663-4a3c-99d8-72229b1967fe'),(990037, '34a90772-eb41-4831-89c1-d333349ccda9', 'e6682ca1-c9ba-4cf2-a717-1b3ecb2e8c8a'),(990038, 'c9c4e109-4e21-460d-a619-828103ad1d84', '74bfc066-8ec8-45d7-8c88-be3c9a1daf50'),(990039, '36339d4f-09e2-48e2-9457-235848bae1b7', '77e1fab3-b4e9-4fa9-8ed8-1d712cbfff7d'),(990040, 'ae09bd16-67a5-4829-b7c7-1a555e98adb4', '7c87071b-4874-4f88-af30-9c8a9ce6f40e'),(990041, '54566038-d8de-4f89-8243-60422224492d', '0f86f249-01d6-4287-89cb-e8b379ff909e'),(990042, '91651609-6f1c-4c2f-91e9-1bc2bb5afe5a', '5ed76807-ebc2-4994-9b3b-21e818da8b93'),(990043, 'd2f2e9d1-f7f8-44f2-a5b6-cbab3b04bbb4', 'c7049caf-ec48-444d-a3cf-4cf98375e05d'),(990044, 'd6b06593-84f2-49b0-9bb2-520075cb7b68', 'ce10bcf4-e31c-42fd-b97c-e741e1f0e879'),(990045, '2f0efd11-18a2-41d2-aa35-7257fef8c811', 
'537e62c4-52c9-4ed0-80cb-238179597503'),(990046, '8be3da9d-cdf0-42f6-9d37-0c3e9c8712bb', '5cd23712-9994-475c-b322-fddc386f0fdb'),(990047, '7be0dcd3-daff-4a18-932c-111112426c24', '8a18a858-4ca1-42ad-919e-f4441da9db6a'),(990048, '95b19db9-1c86-4c4c-8aea-97560f07e390', '63c9843f-d155-4368-aa2a-e0e8a43a4ef7'),(990049, 'ef4ee4a8-98d4-423f-a690-6135a6284f07', 'c108c2f1-2bdf-4973-b08a-c77f1ed0b515'),(990050, '0d48e642-2278-4c08-9cf2-0ff581de81f0', '6009f22d-0813-448e-a0b7-ba7f197a38f3'),(990051, 'ba050edb-448d-4368-9f8f-659d5f3f16c1', '1ef94897-5b25-48ba-9609-799b4de656b1')
Looks good.
In closing
In this entry I confirmed that PyMySQL can execute INSERT (and REPLACE) statements much faster by using executemany. It's mentioned only in passing in the API reference, so it feels easy to overlook... but it's well worth remembering when using MySQL from Python.