ããã¯ããªã«ãããããŠæžãããã®ïŒ
MySQLã«CSVããŒããããããšããæã«ãLOAD DATA
ã䜿ã£ããããµã€ãã«INSERTæã䜿ã£ãããšããã€ãæ¹æ³ã
ãããšæãã®ã§ãããã©ããããéããã®ã ããïŒãšããããšã§è©ŠããŠã¿ãããšã«ããŸããã
ãé¡
CSVãã¡ã€ã«ãçšæããŠã以äžã®æ¹æ³ãè©ŠããŠã¿ãŸãã
LOAD DATA
ïŒæ£ç¢ºã«ã¯ãLOAD DATA LOCAL
ïŒ- Parallel Table Import UtilityïŒMySQL ShellïŒ
- CSVãèªã¿èŸŒã¿ãINSERTæãå®è¡ããJavaããã°ã©ã
- ã·ã³ãã«ã«INSERTãå®è¡
- JDBCã§ã®ãããæŽæ°
rewriteBatchedStatements
ãtrue
ã«ããŠããããæŽæ°
ããŒã¿ã«ã¯ãStack Overflowã®ãŠãŒã¶ãŒæ å ±ã4ã«ã©ã ã®ã¿ã50äžä»¶ã ãåãåºããŠCSVãã¡ã€ã«ã«ãããã®ã䜿ããŸãã
ããŸãããŒã¿éèªäœã¯å€§ããããšããªãã®ã§ããããããªã«æ°é·ã«åŸ ãŠãªãã£ãããã§ãâŠã
æåã«ãããããã®æ¹æ³ã«ã€ããŠæžããŠãããŸãã
LOAD DATA
LOAD DATA
ã¯ãããã¹ããã¡ã€ã«ããMySQLã®ããŒãã«ãžããŒã¿ãããŒãããæã§ãã
SQL ã¹ããŒãã¡ã³ã / LOAD DATA ã¹ããŒãã¡ã³ã
éåžžã®INSERTæãããã20åéãããã§ãã
ããã¹ããã¡ã€ã«ããããŒãã«ãããŒãããå Žåã¯ãLOAD DATA ã䜿çšããŸãã éåžžããã㯠INSERT ã¹ããŒãã¡ã³ãã䜿çšããå Žåããã20 åéããªããŸãã
æé©å / INSERT ã¹ããŒãã¡ã³ãã®æé©å
ãã ããã¯ã©ã€ã¢ã³ãåŽã«ãããã¡ã€ã«ãããŒãããå ŽåïŒLOAD DATA LOCAL
ïŒã¯ãã»ãã¥ãªãã£äžã®æ³šæç¹ãããã
ããã©ã«ãã§ã¯ç¡å¹ã«ãªã£ãŠããŸãããã®æ©èœã䜿ãããã«ã¯ãã¯ã©ã€ã¢ã³ãïŒãµãŒããŒäž¡æ¹ã§æå¹ã«ããå¿
èŠããããŸãã
ã»ãã¥ãªãã£ãŒ / LOAD DATA LOCAL ã®ã»ãã¥ãªãã£ãŒäžã®èæ ®äºé
Parallel Table Import Utility
Parallel Table Import Utilityã¯ãMySQL Shellã®æ©èœã§ãã
MySQL :: MySQL Shell 8.0 :: 8.4 Parallel Table Import Utility
ãã®æ©èœã¯ãå ¥åãšãªãããã¹ããã¡ã€ã«ã解æããã£ã³ã¯ã«åå²ããŠMySQLã«ããŒã¿ã䞊åã«ããŒãããããšãã§ããŸãã
å
éšçã«ã¯LOAD DATA
ã䞊åã§å®è¡ããããããéåžžã®LOAD DATA
ãã·ã³ã°ã«ã¹ã¬ããã§åäœããã®ã«å¯ŸããŠ
倧å¹
ã«é«éåãèŠèŸŒãããšãã§ããããã§ãã
䞊å床ã¯ãã¹ã¬ããæ°ã®æå®ããã³ãã£ã³ã¯ãµã€ãºããå°åºãããŸãã
min{max{1, threads}, chunks}}
LOAD DATA
ãããéããªããã©ããã¯ã䜿çšããããŒã¿ãã¡ã€ã«ã®ãµã€ãºã«äŸåãããšãããããã®ã§ãèšæž¬ããã®ã¿ã§ããã
ãã®ä»ãå
¥åã®ååŠçãšããŠå€æãã§ããããè€æ°ã®ãã¡ã€ã«ã䜿ã£ãã€ã³ããŒããã§ããããå§çž®ããããã¡ã€ã«ã
ããŒãããããã§ããããã§ãã
rewriteBatchedStatements
MySQLã«ã¯ãINSERTæã®VALUESå¥ã«å€ãè€æ°æžãã1床ã®INSERTæã§è€æ°ã®è¡ãç»é²ããæ©èœããããŸãã
â»ãã«ã¯INSERT
æé©å / InnoDB ããŒãã«ã®äžæ¬ããŒã¿ããŒã
Connector/Jã§ã¯ãrewriteBatchedStatements
ãšããããããã£ãtrue
ã«ãããšJDBCã®ãããæŽæ°ããã«ã¯INSERTã«
æžãæããŠå®è¡ããŠãããŸãã
MySQL :: MySQL Connector/J 8.0 Developer Guide :: 6.3.13 Performance Extensions
çµæ
å
ã«çµæã ãæžããŠãããšããµã€ãã«INSERTæãå®è¡ããããããLOAD DATA
ã䜿ã£ããParallel Table Import Utilityã
䜿ãæ¹ãå§åçã«éãã§ãã
LOAD DATA
ãšParallel Table Import Utilityã«ã€ããŠã¯ãä»åã®ããŒã¿éã§ã¯ã»ãŒå·®ããããŸããã§ããã
Parallel Table Import Utilityããè¥å¹²éããããã§ããæ±ã£ãããŒã¿éãå°ããã£ãã§ããâŠã
JDBCã§ã®INSERTæå®è¡ã¯ãrewriteBatchedStatements
ãtrue
ã«ãããã«ã¯INSERTã®æ¹ããä»ã®æ¹æ³ã®å以äžéãçµæã«
ãªããŸããã
ãµããªã¯ããããªæãã§ãã
50äžè¡ã®CSVã«å¯ŸããŠã
- Parallel Table Import Utility ⊠6ç§çšåºŠ
LOAD DATA
⊠7ã9ç§çšåºŠ- ãã«ã¯INSERT ⊠30ç§çšåºŠ
- åçŽãªINSERTãããã³ãããæŽæ° ⊠80ç§çšåºŠ
ãã«ã¯INSERTããã³ãããæŽæ°ã®ããããµã€ãºã¯ã10ã«ããŠããŸãã
ã§ã¯ãå®éã«äœæãããœãŒã¹ã³ãŒããªã©ãèšèŒããŠãããŸãã
ç°å¢
ä»åã®ç°å¢ã¯ããã¡ãã§ãã
mysql> select version(); +-----------+ | version() | +-----------+ | 8.0.23 | +-----------+ 1 row in set (0.01 sec)
MySQLãµãŒããŒã¯ã172.17.0.2ã§åäœããŠãããã®ãšããŸãããã¥ãŒãã³ã°ã¯ç¹ã«ããŠããããã»ãŒããã©ã«ãå€ã§ãã
MySQL Shellã
$ mysqlsh --version /path/to/mysqlsh Ver 8.0.23 for Linux on x86_64 - for MySQL 8.0.23 (MySQL Community Server (GPL))
ãŸããJavaã«é¢ããç°å¢æ å ±ã¯ãã¡ãã
$ java --version openjdk 11.0.10 2021-01-19 OpenJDK Runtime Environment (build 11.0.10+9-Ubuntu-0ubuntu1.20.04) OpenJDK 64-Bit Server VM (build 11.0.10+9-Ubuntu-0ubuntu1.20.04, mixed mode, sharing) $ mvn --version Apache Maven 3.8.1 (05c21c65bdfed0f71a2f2ada8b84da59348c4c5d) Maven home: $HOME/.sdkman/candidates/maven/current Java version: 11.0.10, vendor: Ubuntu, runtime: /usr/lib/jvm/java-11-openjdk-amd64 Default locale: ja_JP, platform encoding: UTF-8 OS name: "linux", version: "5.4.0-72-generic", arch: "amd64", family: "unix"
ããŒãã«å®çŸ©ãšããŒã¿äœæ
DDLã¯ããããªæãã§çšæã
create table account ( id int, name varchar(50), registered datetime, about varchar(255), primary key(id) );
ããŒã¿ã¯Stack Overflowã®ãŠãŒã¶ãŒã䜿ããŸããã
$ curl -OL https://archive.org/download/stackexchange/stackoverflow.com-Users.7z $ 7z x stackoverflow.com-Users.7z $ ll -h Users.xml -rw-rw-r-- 1 xxxxx xxxxx 4.4G 3æ 1 13:31 Users.xml $ wc -l Users.xml 14080582 Users.xml
äžèº«ãXMLãªããšãšãã¡ãã£ãšå€§ããã®ã§ããŒã¿æœåºãšä»¶æ°ãçµã蟌ãã®ã«ããã°ã©ã ãäœæããŠCSVã«å€æããŸããã
$ groovy users_to_csv.groovy Users.xml 500000 $ ll -h users_500000.csv -rw-rw-r-- 1 xxxxx xxxxx 32M 4æ 17 23:00 users_500000.csv
ããã§äœ¿ã£ãããã°ã©ã ã¯ãæåŸã«èŒããŸããusers_500000.csv
ãšããã®ããããŒã¿ããŒãã«äœ¿ãCSVãã¡ã€ã«ã§ãã
ããŒãããããŒã¿ã¯ã次ã®ã¯ãšãªã®å®è¡åã«ã¯TRUNCATE
ããŠãããã®ãšããŸãã
mysql> truncate table account; Query OK, 0 rows affected (0.21 sec)
ã§ã¯ãé²ããŠãããŸãããã
LOAD DATA
ãŸãã¯LOAD DATA
ãããã£ãŠã¿ãŸããæ£ç¢ºã«ã¯ãLOAD DATA LOCAL
ã§ããã
SQL ã¹ããŒãã¡ã³ã / LOAD DATA ã¹ããŒãã¡ã³ã
mysql
ã³ãã³ããèµ·åã
$ mysql -umysql_user -p -h172.17.0.2 practice Enter password: Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 22 Server version: 8.0.23 MySQL Community Server - GPL Copyright (c) 2000, 2021, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql>
å®è¡ããããšãããšããšã©ãŒã«ãªããŸãã
mysql> load data local infile 'users_500000.csv' into table account fields terminated by ',' optionally enclosed by '"' escaped by '\\'; ERROR 3948 (42000): Loading local data is disabled; this must be enabled on both the client and server sides
ãšã©ãŒã¡ãã»ãŒãžã®éããã¯ã©ã€ã¢ã³ãã®ããŒã«ã«ããŒã¿ãã¯ã©ã€ã¢ã³ããšãµãŒããŒããããã§æå¹ã«ããå¿ èŠããããŸãããšã
ãã¡ãã®ããã¥ã¡ã³ãã«èšèŒãããŠããå 容ã§ãã
ã»ãã¥ãªãã£ãŒ / LOAD DATA LOCAL ã®ã»ãã¥ãªãã£ãŒäžã®èæ ®äºé
ãµãŒããŒåŽã¯ãmy.cnf
ã®mysqld
ã»ã¯ã·ã§ã³ã§local-infile
ãèšå®ããããset global local_infile
ã§èšå®ããŸãã
ä»åã¯set global local_infile
ã§æå¹ã«ããããšã«ããŸããã
mysql> set global local_infile = on; Query OK, 0 rows affected (0.00 sec)
å床å®è¡âŠãããšããšã©ãŒã¡ãã»ãŒãžãå€ãããŸãã
mysql> load data local infile 'users_500000.csv' into table account fields terminated by ',' optionally enclosed by '"' escaped by '\\'; ERROR 2068 (HY000): LOAD DATA LOCAL INFILE file request rejected due to restrictions on access.
ã¯ã©ã€ã¢ã³ãåŽãããã®æ©èœãæå¹ã«ããå¿ èŠãããããã§ãã
my.cnf
ã®client
ã»ã¯ã·ã§ã³ã§local-infile
ãèšå®ããããmysql
ã³ãã³ãã®èµ·åæã«--local-infile
ã§èšå®ããŸãã
ä»åã¯ãmysql
ã³ãã³ãã®ãªãã·ã§ã³ã§æå®ããŸããã
$ mysql -umysql_user -p -h172.17.0.2 --local-infile=on practice
ããã§ããŒããçŽããšãä»åºŠã¯æåããŸããã
mysql> load data local infile 'users_500000.csv' into table account fields terminated by ',' optionally enclosed by '"' escaped by '\\'; Query OK, 500000 rows affected (7.73 sec) Records: 500000 Deleted: 0 Skipped: 0 Warnings: 0
Parallel Table Import Utility
次ã¯ãParallel Table Import Utilityã§ãã
MySQL :: MySQL Shell 8.0 :: 8.4 Parallel Table Import Utility
ãã¡ãã¯ãMySQL Shellã®æ©èœã«ãªããŸãã
mysqlsh
ã³ãã³ãããçŽæ¥å®è¡ããæ¹æ³ãšãJavaScriptãPythonã§å®è¡ããæ¹æ³ããããŸãã
ã©ã¡ãã®æ¹æ³ã䜿ãã«ãããå
éšçã«ã¯çµå±LOAD DATA
ã䜿ãã®ã§ããã¡ããæå¹ã«ããŠããå¿
èŠããããŸãã
mysql> set global local_infile = on; Query OK, 0 rows affected (0.00 sec)
ãŸãã¯mysqlsh
ã³ãã³ãã§å®è¡ããŠã¿ãŸãããã
$ mysqlsh mysql_user@172.17.0.2:3306 -- util import-table users_500000.csv --schema=practice --table=account --fieldsTerminatedBy=',' --fieldsEnclosedBy='"' --fieldsOptionallyEnclosed=true --fieldsEscapedBy='\' --bytesPerChunk=10M Please provide the password for 'mysql_user@172.17.0.2:3306': ******** Importing from file '/path/to/users_500000.csv' to table `practice`.`account` in MySQL Server at 172.17.0.2:3306 using 4 threads [Worker003] users_500000.csv: Records: 43164 Deleted: 0 Skipped: 0 Warnings: 0 [Worker002] users_500000.csv: Records: 131723 Deleted: 0 Skipped: 0 Warnings: 0 [Worker001] users_500000.csv: Records: 163859 Deleted: 0 Skipped: 0 Warnings: 0 [Worker000] users_500000.csv: Records: 161254 Deleted: 0 Skipped: 0 Warnings: 0 100% (32.62 MB / 32.62 MB), 5.31 MB/s File '/path/to/users_500000.csv' (32.62 MB) was imported in 6.1474 sec at 5.31 MB/s Total rows affected in practice.account: Records: 500000 Deleted: 0 Skipped: 0 Warnings: 0
fieldsã
ãªãã·ã§ã³ã®æå³ã¯ãLOAD DATA
ãšåãã§ããschema
ã§ããŒã¿ç»é²å
ã®ããŒã¿ããŒã¹åãtable
ã§ããŒãã«åã
æå®ããŸãã
ããã¥ã¡ã³ããèŠããšãè€æ°ã®ãã¡ã€ã«ãäžæ°ã«æå®ããããšãã§ããããã§ãã
äœåãè©ŠããŠã¿ãŸããããLOAD DATA
ãããæ°æã¡éãã§ããããããŸã§å·®ã¯ãããŸããã察象ã®ãã¡ã€ã«ãµã€ãºã
å°ããããã§ããã
ãªããä»åã¯4ã€ã®ã¹ã¬ããã䜿ãããŠããŸããã
using 4 threads
ã¹ã¬ããæ°ã決å®ããå€æ°ã¯2ã€ãããã²ãšã€ã¯ä»åæå®ããbytesPerChunk
ãããã²ãšã€ã¯threads
ã§ãã
bytesPerChunk
ã®ããã©ã«ãå€ã¯50Mã§ãthreads
ã®ããã©ã«ãå€ã¯8ã§ãã
ããã¥ã¡ã³ãã®èšç®åŒã«ãããšã1ãšthreads
ã®å€§ããæ¹ãšããã£ã³ã¯ãµã€ãºã®ãã¡ãå°ããæ¹ãéžæãããŸãã
min{max{1, threads}, chunks}}
ä»åã¯ãã£ã³ã¯ãµã€ãºã10Mã«ããŠããã¡ã€ã«ãµã€ãºã32Mã ã£ãã®ã§4ã«ãªããthreads
ã®ããã©ã«ãå€8ãããå°ããã®ã§
4ã¹ã¬ããã«ãªã£ããšããããšã§ããã
$ ll -h users_500000.csv -rw-rw-r-- 1 xxxxx xxxxx 32M 4æ 17 23:00 users_500000.csv
ãšããã§ãmysql_user:password@172.17.0.2:3306
ã¿ãããªèšè¿°ã«ãããšããã¹ã¯ãŒããæå®ããŠå®è¡ã§ããããã§ãã
JavaScriptã§ãå®è¡ããŠã¿ãŸãããããŸãã¯mysqlsh
ã³ãã³ãã§ãã°ã€ã³ã
$ mysqlsh mysql_user@172.17.0.2:3306 Please provide the password for 'mysql_user@172.17.0.2:3306': ******** MySQL Shell 8.0.23 Copyright (c) 2016, 2021, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type '\help' or '\?' for help; '\quit' to exit. Creating a session to 'mysql_user@172.17.0.2:3306' Fetching schema names for autocompletion... Press ^C to stop. Your MySQL connection id is 59 Server version: 8.0.23 MySQL Community Server - GPL No default schema selected; type \use <schema> to set one. MySQL 172.17.0.2:3306 ssl JS >
å®è¡ã
MySQL 172.17.0.2:3306 ssl JS > util.importTable('users_500000.csv', {schema: 'practice', table: 'account', fieldsTerminatedBy: ',', fieldsEnclosedBy: '"', fieldsOptionallyEnclosed: true, fieldsEscapedBy: '\\', bytesPerChunk: '10M'}) Importing from file '/path/to/users_500000.csv' to table `practice`.`account` in MySQL Server at 172.17.0.2:3306 using 4 threads [Worker002] users_500000.csv: Records: 43164 Deleted: 0 Skipped: 0 Warnings: 0 [Worker003] users_500000.csv: Records: 131723 Deleted: 0 Skipped: 0 Warnings: 0 [Worker001] users_500000.csv: Records: 161254 Deleted: 0 Skipped: 0 Warnings: 0 [Worker000] users_500000.csv: Records: 163859 Deleted: 0 Skipped: 0 Warnings: 0 100% (32.62 MB / 32.62 MB), 5.89 MB/s File ''/path/to/users_500000.csv' (32.62 MB) was imported in 5.5399 sec at 5.89 MB/s Total rows affected in practice.account: Records: 500000 Deleted: 0 Skipped: 0 Warnings: 0
ãªãã·ã§ã³ãªã©ã®æå³ã¯ãå ã»ã©ãšå€ãããªãã®ã§å²æã
JDBC
ããããã¯ãJavaã§ããã°ã©ã ãæžããŠãããŸãã
MavenäŸåé¢ä¿ãšããã©ã°ã€ã³ã®èšå®ã
<dependencies> <dependency> <groupId>mysql</groupId> <artifactId>mysql-connector-java</artifactId> <version>8.0.23</version> </dependency> <dependency> <groupId>com.opencsv</groupId> <artifactId>opencsv</artifactId> <version>5.4</version> </dependency> </dependencies> <build> <plugins> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-shade-plugin</artifactId> <version>3.2.4</version> <executions> <execution> <phase>package</phase> <goals> <goal>shade</goal> </goals> <configuration> <transformers> <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"> <mainClass>org.littlewings.mysql.Importer</mainClass> </transformer> </transformers> </configuration> </execution> </executions> </plugin> </plugins> </build>
å®è¡å¯èœJARãã¡ã€ã«ãäœæããŸãã
ããã°ã©ã ã®é圢ã¯ããããªæãã§ãã
src/main/java/org/littlewings/mysql/Importer.java
package org.littlewings.mysql; import java.io.IOException; import java.nio.charset.StandardCharsets; import java.nio.file.Files; import java.nio.file.Paths; import java.sql.Connection; import java.sql.DriverManager; import java.sql.PreparedStatement; import java.sql.SQLException; import java.sql.Timestamp; import java.time.LocalDateTime; import java.time.format.DateTimeFormatter; import java.util.Properties; import com.opencsv.CSVReader; import com.opencsv.exceptions.CsvValidationException; public class Importer { String mode; String filePath; public static void main(String... args) throws SQLException, IOException, CsvValidationException { Importer importer = new Importer(args[0], args[1]); importer.execute(); } public Importer(String mode, String filePath) { this.mode = mode; this.filePath = filePath; } public void execute() throws SQLException, IOException, CsvValidationException { switch (mode) { case "simple": simpleInsert(); break; case "batch": batchInsert(); break; case "batch-rewrite": batchRewriteInsert(); break; default: System.out.printf("unknown mode [%s]%n", mode); break; } } ãçç¥ã private void log(String format, Object... args) { System.out.printf("[%s] %s%n", LocalDateTime.now(), String.format(format, args)); } }
å®è¡ããçš®é¡ã«ãã£ãŠãsimple
ãbatch
ãbatch-rewrite
ã®3ã€ãçšæããŸããã
ããã±ãŒãžã³ã°ããã
$ mvn package
ãããªæãã§å®è¡ããŸãã
$ java -jar target/mysql-import-0.0.1-SNAPSHOT.jar [çš®é¡] users_500000.csv
ã§ã¯ãããããçš®é¡ããšã«æžããŠãããŸãã
åçŽã«INSERTæãå®è¡ãã
åçŽã«INSERTæãå®è¡ãããã¿ãŒã³ã®ã³ãŒãã¯ããã¡ãã
private void simpleInsert() throws SQLException, IOException, CsvValidationException { Properties properties = new Properties(); properties.setProperty("user", "mysql_user"); properties.setProperty("password", "password"); int commitThreshold = 10000; int count = 0; long startTime = System.currentTimeMillis(); long commitStartTime = System.currentTimeMillis(); DateTimeFormatter formatter = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss"); try (CSVReader csvReader = new CSVReader(Files.newBufferedReader(Paths.get(filePath), StandardCharsets.UTF_8)); Connection conn = DriverManager.getConnection("jdbc:mysql://172.17.0.2:3306/practice", properties); PreparedStatement statement = conn.prepareStatement("insert into account(id, name, registered, about) values(?, ?, ?, ?)")) { conn.setAutoCommit(false); String[] columns; while ((columns = csvReader.readNext()) != null) { count++; statement.setInt(1, Integer.parseInt(columns[0])); statement.setString(2, columns[1]); statement.setTimestamp(3, Timestamp.valueOf(LocalDateTime.parse(columns[2], formatter))); statement.setString(4, columns[3]); statement.executeUpdate(); if (count % commitThreshold == 0) { long committedTime = System.currentTimeMillis() - commitStartTime; log("commit, %d records, lap time = %f sec...", count, committedTime * 1.0 / 1000); conn.commit(); commitStartTime = System.currentTimeMillis(); } } conn.commit(); long elapsedTime = System.currentTimeMillis() - startTime; log("%d records inserted, elapsed time = %f sec", count, elapsedTime * 1.0 / 1000); } }
CSVãã¡ã€ã«ã1è¡ãã€èªãã§INSERTãã10,000件ããšã«ã³ãããããããã«ããŸããã
å®è¡ã®æ§åã¯ããã¡ãã
$ java -jar target/mysql-import-0.0.1-SNAPSHOT.jar simple users_500000.csv [2021-04-18T02:08:52.365205] commit, 10000 records, lap time = 3.998000 sec... [2021-04-18T02:08:54.063781] commit, 20000 records, lap time = 1.643000 sec... [2021-04-18T02:08:55.629334] commit, 30000 records, lap time = 1.521000 sec... [2021-04-18T02:08:57.112967] commit, 40000 records, lap time = 1.431000 sec... [2021-04-18T02:08:58.513753] commit, 50000 records, lap time = 1.357000 sec... [2021-04-18T02:09:00.039572] commit, 60000 records, lap time = 1.455000 sec... [2021-04-18T02:09:01.461116] commit, 70000 records, lap time = 1.380000 sec... [2021-04-18T02:09:02.883278] commit, 80000 records, lap time = 1.379000 sec... [2021-04-18T02:09:04.375070] commit, 90000 records, lap time = 1.431000 sec... [2021-04-18T02:09:05.878608] commit, 100000 records, lap time = 1.464000 sec... ãçç¥ã [2021-04-18T02:09:50.253484] commit, 400000 records, lap time = 1.357000 sec... [2021-04-18T02:09:51.730666] commit, 410000 records, lap time = 1.402000 sec... [2021-04-18T02:09:53.159438] commit, 420000 records, lap time = 1.367000 sec... [2021-04-18T02:09:54.605684] commit, 430000 records, lap time = 1.394000 sec... [2021-04-18T02:09:56.128087] commit, 440000 records, lap time = 1.459000 sec... [2021-04-18T02:09:57.562830] commit, 450000 records, lap time = 1.381000 sec... [2021-04-18T02:09:59.001128] commit, 460000 records, lap time = 1.368000 sec... [2021-04-18T02:10:00.460900] commit, 470000 records, lap time = 1.386000 sec... [2021-04-18T02:10:01.823628] commit, 480000 records, lap time = 1.308000 sec... [2021-04-18T02:10:03.394428] commit, 490000 records, lap time = 1.480000 sec... [2021-04-18T02:10:04.834426] commit, 500000 records, lap time = 1.366000 sec... [2021-04-18T02:10:04.919687] 500000 records inserted, elapsed time = 76.556000 sec
ãããæŽæ°
次ã¯ããããæŽæ°ã
private void batchInsert() throws SQLException, IOException, CsvValidationException { Properties properties = new Properties(); properties.setProperty("user", "mysql_user"); properties.setProperty("password", "password"); int commitThreshold = 10000; int batchThreshold = 10; int count = 0; long startTime = System.currentTimeMillis(); long commitStartTime = System.currentTimeMillis(); DateTimeFormatter formatter = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss"); try (CSVReader csvReader = new CSVReader(Files.newBufferedReader(Paths.get(filePath), StandardCharsets.UTF_8)); Connection conn = DriverManager.getConnection("jdbc:mysql://172.17.0.2:3306/practice", properties); PreparedStatement statement = conn.prepareStatement("insert into account(id, name, registered, about) values(?, ?, ?, ?)")) { conn.setAutoCommit(false); String[] columns; while ((columns = csvReader.readNext()) != null) { count++; statement.setInt(1, Integer.parseInt(columns[0])); statement.setString(2, columns[1]); statement.setTimestamp(3, Timestamp.valueOf(LocalDateTime.parse(columns[2], formatter))); statement.setString(4, columns[3]); statement.addBatch(); if (count % batchThreshold == 0) { statement.executeBatch(); } if (count % commitThreshold == 0) { long committedTime = System.currentTimeMillis() - commitStartTime; log("commit, %d records, lap time = %f sec...", count, committedTime * 1.0 / 1000); conn.commit(); commitStartTime = System.currentTimeMillis(); } } conn.commit(); long elapsedTime = System.currentTimeMillis() - startTime; log("%d records inserted, elapsed time = %f sec", count, elapsedTime * 1.0 / 1000); } }
å ã»ã©ãšã¯éã£ãŠã10件ããšã«ãããæŽæ°ããããã«ããŸããã
statement.addBatch(); if (count % batchThreshold == 0) { statement.executeBatch(); }
å®è¡ã®æ§åã
$ java -jar target/mysql-import-0.0.1-SNAPSHOT.jar batch users_500000.csv [2021-04-18T02:11:18.837457] commit, 10000 records, lap time = 4.386000 sec... [2021-04-18T02:11:20.909568] commit, 20000 records, lap time = 2.016000 sec... [2021-04-18T02:11:22.369608] commit, 30000 records, lap time = 1.411000 sec... [2021-04-18T02:11:23.981233] commit, 40000 records, lap time = 1.564000 sec... [2021-04-18T02:11:25.655595] commit, 50000 records, lap time = 1.605000 sec... [2021-04-18T02:11:27.154502] commit, 60000 records, lap time = 1.452000 sec... [2021-04-18T02:11:28.629899] commit, 70000 records, lap time = 1.435000 sec... [2021-04-18T02:11:30.353573] commit, 80000 records, lap time = 1.684000 sec... [2021-04-18T02:11:31.992758] commit, 90000 records, lap time = 1.599000 sec... [2021-04-18T02:11:33.567422] commit, 100000 records, lap time = 1.529000 sec... ãçç¥ã [2021-04-18T02:12:17.328760] commit, 400000 records, lap time = 1.351000 sec... [2021-04-18T02:12:18.712159] commit, 410000 records, lap time = 1.329000 sec... [2021-04-18T02:12:20.172459] commit, 420000 records, lap time = 1.415000 sec... [2021-04-18T02:12:21.583530] commit, 430000 records, lap time = 1.346000 sec... [2021-04-18T02:12:23.183495] commit, 440000 records, lap time = 1.525000 sec... [2021-04-18T02:12:24.645953] commit, 450000 records, lap time = 1.409000 sec... [2021-04-18T02:12:26.059431] commit, 460000 records, lap time = 1.355000 sec... [2021-04-18T02:12:27.586332] commit, 470000 records, lap time = 1.427000 sec... [2021-04-18T02:12:29.061952] commit, 480000 records, lap time = 1.403000 sec... [2021-04-18T02:12:30.537175] commit, 490000 records, lap time = 1.425000 sec... [2021-04-18T02:12:31.900668] commit, 500000 records, lap time = 1.290000 sec... [2021-04-18T02:12:31.957996] 500000 records inserted, elapsed time = 77.511000 sec
åçŽãªINSERTæãå®è¡ããæ¹ãšãã»ãŒå·®ããããŸããâŠã
rewriteBatchedStatementsã䜿ã£ããã«ã¯INSERT
æåŸã¯ãrewriteBatchedStatements
ã䜿ã£ããã«ã¯INSERTã§ãã
private void batchRewriteInsert() throws SQLException, IOException, CsvValidationException { Properties properties = new Properties(); properties.setProperty("user", "mysql_user"); properties.setProperty("password", "password"); properties.setProperty("rewriteBatchedStatements", "true"); // properties.setProperty("profileSQL", "true"); int commitThreshold = 10000; int batchThreshold = 10; int count = 0; long startTime = System.currentTimeMillis(); long commitStartTime = System.currentTimeMillis(); DateTimeFormatter formatter = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss"); try (CSVReader csvReader = new CSVReader(Files.newBufferedReader(Paths.get(filePath), StandardCharsets.UTF_8)); Connection conn = DriverManager.getConnection("jdbc:mysql://172.17.0.2:3306/practice", properties); PreparedStatement statement = conn.prepareStatement("insert into account(id, name, registered, about) values(?, ?, ?, ?)")) { conn.setAutoCommit(false); String[] columns; while ((columns = csvReader.readNext()) != null) { count++; statement.setInt(1, Integer.parseInt(columns[0])); statement.setString(2, columns[1]); statement.setTimestamp(3, Timestamp.valueOf(LocalDateTime.parse(columns[2], formatter))); statement.setString(4, columns[3]); statement.addBatch(); if (count % batchThreshold == 0) { statement.executeBatch(); } if (count % commitThreshold == 0) { long committedTime = System.currentTimeMillis() - commitStartTime; log("commit, %d records, lap time = %f sec...", count, committedTime * 1.0 / 1000); conn.commit(); commitStartTime = System.currentTimeMillis(); } } conn.commit(); long elapsedTime = System.currentTimeMillis() - startTime; log("%d records inserted, elapsed time = %f sec", count, elapsedTime * 1.0 / 1000); } }
ãããæŽæ°ã®ããã°ã©ã ãšã®éãã¯ãæ¥ç¶ããæã®ããããã£ã§rewriteBatchedStatements
ãtrue
ã«ããã ãã§ãã
MySQL :: MySQL Connector/J 8.0 Developer Guide :: 6.3.13 Performance Extensions
properties.setProperty("rewriteBatchedStatements", "true");
å®è¡ã®æ§åã
$ java -jar target/mysql-import-0.0.1-SNAPSHOT.jar batch-rewrite users_500000.csv [2021-04-18T02:13:25.024357] commit, 10000 records, lap time = 3.168000 sec... [2021-04-18T02:13:25.943424] commit, 20000 records, lap time = 0.883000 sec... [2021-04-18T02:13:26.882186] commit, 30000 records, lap time = 0.910000 sec... [2021-04-18T02:13:27.540862] commit, 40000 records, lap time = 0.627000 sec... [2021-04-18T02:13:28.173199] commit, 50000 records, lap time = 0.602000 sec... [2021-04-18T02:13:28.740784] commit, 60000 records, lap time = 0.531000 sec... [2021-04-18T02:13:29.340097] commit, 70000 records, lap time = 0.568000 sec... [2021-04-18T02:13:29.860993] commit, 80000 records, lap time = 0.489000 sec... [2021-04-18T02:13:30.424720] commit, 90000 records, lap time = 0.515000 sec... [2021-04-18T02:13:30.932128] commit, 100000 records, lap time = 0.480000 sec... ãçç¥ã [2021-04-18T02:13:47.384536] commit, 400000 records, lap time = 0.449000 sec... [2021-04-18T02:13:47.899068] commit, 410000 records, lap time = 0.479000 sec... [2021-04-18T02:13:48.434318] commit, 420000 records, lap time = 0.488000 sec... [2021-04-18T02:13:48.939430] commit, 430000 records, lap time = 0.476000 sec... [2021-04-18T02:13:49.577309] commit, 440000 records, lap time = 0.591000 sec... [2021-04-18T02:13:50.101016] commit, 450000 records, lap time = 0.483000 sec... [2021-04-18T02:13:50.606555] commit, 460000 records, lap time = 0.454000 sec... [2021-04-18T02:13:51.219517] commit, 470000 records, lap time = 0.579000 sec... [2021-04-18T02:13:51.783128] commit, 480000 records, lap time = 0.473000 sec... [2021-04-18T02:13:52.529406] commit, 490000 records, lap time = 0.671000 sec... [2021-04-18T02:13:53.129715] commit, 500000 records, lap time = 0.537000 sec... [2021-04-18T02:13:53.178695] 500000 records inserted, elapsed time = 31.325000 sec
ããã ãã§ãå以äžã«éããªããŸããã
èŒããŠããã³ãŒãã§ã¯ã³ã¡ã³ãã¢ãŠãããŠããŸãããprofileSQL
ãtrue
ã«ãããšSQLæã衚瀺ãããããã«ãªãã®ã§
MySQL :: MySQL Connector/J 8.0 Developer Guide :: 6.3.14 Debugging/Profiling
properties.setProperty("rewriteBatchedStatements", "true"); properties.setProperty("profileSQL", "true");
ãããæŽæ°ããã«ã¯INSERTã«å€æãããŠããæ§åã確èªã§ããŸãã
â»å€ã衚瀺ãããŠããã®ã§ãããèŒããŠããŸãã
Sun Apr 18 02:17:00 JST 2021 INFO: [QUERY] insert into account(id, name, registered, about) values(...),(...),(...),(...),(...),(...),(...),(...),(...),(...) [Created on: Sun Apr 18 02:17:00 JST 2021, duration: 1, connection-id: 73, statement-id: 0, resultset-id: 0, at org.littlewings.mysql.Importer.batchRewriteInsert(Importer.java:181)] Sun Apr 18 02:17:00 JST 2021 INFO: [FETCH] [Created on: Sun Apr 18 02:17:00 JST 2021, duration: 0, connection-id: 73, statement-id: 0, resultset-id: 0, at org.littlewings.mysql.Importer.batchRewriteInsert(Importer.java:181)]
ãŸãšã
MySQLã«å¯ŸããŠãLOAD DATA
ãParallel Table Import UtilityãJDBCã§ã®éåžžã®INSERTããã«ã¯INSERTãªã©ã
è©ŠããŠã¿ãŸããã
LOAD DATA
ãParallel Table Import Utilityãå§åçã«éãã§ããã
倧éã®ããã¹ãããŒã¿ãããŒããããªãããã£ã±ããã¡ããªãã§ãããããâŠã
ããŒã¿éãå€ãã»ã©å·®ãåºãããªã®ã§ãç¹ã«Parallel Table Import UtilityãšåãããŠç¢ºèªããŠéžãŒãããªãšæããŸãã
ãªãã±
Stack Overflowã®ãŠãŒã¶ãŒã®ããŒã¿ããXMLããCSVã«å€æãã€ã€ãæå®ã®ã¬ã³ãŒãæ°ã«çµãããã°ã©ã ãæåŸã«
èŒããŠãããŸãã
Id
ãæ£ã®å€ããæŸã£ãŠããªãã£ãããAboutMe
ã255æåã§åãæšãŠãããæ¥æã®ãã©ãŒããããå€ããããšããã€ã
å€æãªã©ãããŠããŸãã
users_to_csv.groovy
import com.opencsv.CSVParser import com.opencsv.CSVParserBuilder import com.opencsv.CSVWriterBuilder import java.nio.charset.StandardCharsets import java.nio.file.Files import java.nio.file.Paths import java.time.format.DateTimeFormatter import javax.xml.stream.XMLInputFactory import javax.xml.stream.XMLStreamConstants @Grab('com.opencsv:opencsv:5.4') import com.opencsv.CSVWriterBuilder import com.opencsv.CSVParserBuilder @Grab('org.jsoup:jsoup:1.13.1') import org.jsoup.Jsoup def xmlPath = args[0] def csvSize = args[1] as int def reader = Files.newBufferedReader(Paths.get(xmlPath), StandardCharsets.UTF_8) reader.readLine() // skip BOM. def outputFile = "users_${csvSize}.csv" def csvWriter = new CSVWriterBuilder(Files.newBufferedWriter(Paths.get(outputFile), StandardCharsets.UTF_8)).withParser(new CSVParserBuilder().withEscapeChar('\\' as char).build()).build() def datetimeFormatter = DateTimeFormatter.ofPattern('uuuu-MM-dd HH:mm:ss') def count = 0 def xmlInputFactory = XMLInputFactory.newInstance() def streamReader = xmlInputFactory.createXMLStreamReader(reader) while (streamReader.hasNext()) { def eventType = streamReader.next() if (eventType == XMLStreamConstants.START_ELEMENT) { if (streamReader.getLocalName() == 'row') { def accountId = streamReader.getAttributeValue(null, "AccountId") def name = streamReader.getAttributeValue(null, 'DisplayName') def registeredDate = streamReader.getAttributeValue(null, 'CreationDate') def about = streamReader.getAttributeValue(null, "AboutMe") if (accountId == null || accountId as int < 0) { continue } registeredDate = datetimeFormatter.format(DateTimeFormatter.ISO_LOCAL_DATE_TIME.parse(registeredDate)) if (about == null) { about = '' } else { about = Jsoup.parse(about).text() } if (about.length() > 255) { about = about.substring(0, 252) + "..." } csvWriter.writeNext(List.of(accountId, name, registeredDate, about).toArray(new String[0])) count++ if (count % 10000 == 0) { println("${count} records, generated...") } if (count == csvSize) { break } } } } streamReader.close() csvWriter.close()