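I've been looking into compression/decompression libraries lately, so I thought I'd put together some groundwork for comparing and checking the libraries below.
I had only ever glimpsed the name Snappy, so this seemed like a good chance to take a proper look.
My selection is fairly arbitrary and partly based on gut feeling, so please bear with me.
Since I use test code for the checks, I assume JUnit and AssertJ have been added to the Maven dependencies.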
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.assertj</groupId>
    <artifactId>assertj-core</artifactId>
    <version>3.6.2</version>
    <scope>test</scope>
</dependency>
Data to compress
Let's use part of the Japanese Wikipedia dump.
https://dumps.wikimedia.org/jawiki/20170320/
Download one part and unpack it:
$ wget https://dumps.wikimedia.org/jawiki/20170320/jawiki-20170320-pages-articles1.xml-p000000001p000168815.bz2
$ bunzip2 jawiki-20170320-pages-articles1.xml-p000000001p000168815.bz2
Then split it into files of roughly 1K, 1M and 100M (all sizes approximate).
$ head -n 16 jawiki-20170320-pages-articles1.xml-p000000001p000168815 > 1K-file
$ head -n 7880 jawiki-20170320-pages-articles1.xml-p000000001p000168815 > 1M-file
$ head -n 1020000 jawiki-20170320-pages-articles1.xml-p000000001p000168815 > 100M-file
Here they are.
-rw-rw-r-- 1 xxxxx xxxxx  104222966  3月 30 00:31 100M-file
-rw-rw-r-- 1 xxxxx xxxxx       1069  3月 30 23:25 1K-file
-rw-rw-r-- 1 xxxxx xxxxx    1048529  3月 30 00:30 1M-file
-rw-rw-r-- 1 xxxxx xxxxx 1425464395  3月 23 00:31 jawiki-20170320-pages-articles1.xml-p000000001p000168815
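If head is not at hand, the same split can be approximated in Java. The following is only a minimal sketch: HeadLines and copyFirstLines are hypothetical names of my own, and the method counts newline bytes the way head -n counts lines, so the copied bytes stay identical to the source.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper (not from the original post): copies the first N lines of a stream.
final class HeadLines {
    private HeadLines() {
    }

    // Copies bytes from in to out until `lines` newline bytes have been written
    // or the input ends. Returns the number of bytes copied.
    static long copyFirstLines(InputStream in, OutputStream out, long lines) {
        long copied = 0;
        long newlines = 0;
        try {
            int b;
            while (newlines < lines && (b = in.read()) != -1) {
                out.write(b);
                copied++;
                if (b == '\n') {
                    newlines++;
                }
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return copied;
    }
}
```

For real files, the streams would typically come from Files.newInputStream and Files.newOutputStream wrapped in BufferedInputStream and BufferedOutputStream, since the helper reads one byte per call.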
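These are the files we will use. I made three of them, but to keep this entry to a reasonable length, the API examples that follow use only the 100M file.
GZIP (JDK standard)
First up is GZIP, which ships with the JDK. It lives in the java.util.zip package, so it can be used as-is without any extra library.
Here is the sample I wrote. It reads the file into a byte array, compresses it in memory, decompresses it again, and finally computes the ratio of the compressed size to the original size.
Note: the samples that follow all have the same structure.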
src/test/java/org/littlewings/compress/JdkGzipTest.java
package org.littlewings.compress;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

import org.junit.Test;

import static org.assertj.core.api.Assertions.assertThat;

public class JdkGzipTest {
    @Test
    public void compressDecompress100M() throws IOException {
        byte[] binary = Files.readAllBytes(Paths.get("../100M-file"));
        assertThat(binary).hasSize(104222966);

        // Compress
        ByteArrayOutputStream compressBaos = new ByteArrayOutputStream();
        try (OutputStream gzip = new GZIPOutputStream(compressBaos)) {
            gzip.write(binary);
        }
        byte[] compressed = compressBaos.toByteArray();
        assertThat(compressed).hasSize(35534431);

        // Decompress (one byte per read() call)
        ByteArrayOutputStream decompressBaos = new ByteArrayOutputStream();
        try (InputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            int b;
            while ((b = gzip.read()) != -1) {
                decompressBaos.write(b);
            }
        }
        byte[] decompressed = decompressBaos.toByteArray();
        assertThat(decompressed).hasSize(binary.length);

        // Ratio of compressed size to original size
        assertThat((double) compressed.length / binary.length)
                .isBetween(0.34, 0.35);
    }
}
The compressed data is about one third of the original size (35,534,431 of 104,222,966 bytes, a ratio of about 0.34).
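The sample above reads the decompressed stream one byte at a time, which is simple but costs one call per byte. As a minimal sketch of the same round trip with a chunked read, something like the following should work with the JDK alone. GzipRoundTrip and the 8 KiB buffer size are my own assumptions for illustration, not part of the original tests.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: in-memory GZIP round trip using buffered reads.
final class GzipRoundTrip {
    private GzipRoundTrip() {
    }

    static byte[] compress(byte[] input) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so read the sink only afterwards.
        try (OutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192]; // assumed chunk size
        try (InputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Compressed size divided by original size, as in the tests in this entry.
    static double ratio(byte[] original, byte[] compressed) {
        return (double) compressed.length / original.length;
    }
}
```

The same read loop applies to any InputStream-based decompressor, so it could be dropped into the other stream-based samples as well.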
Commons Compress
Next is Commons Compress.
It appears to support many compression formats.
ar, cpio, Unix dump, tar, zip, gzip, XZ, Pack200, bzip2, 7z, arj, lzma, snappy, DEFLATE, lz4 and Z files
http://commons.apache.org/proper/commons-compress/
Examples are provided as well, so it should not be hard to use.
Commons Compress – Commons Compress User Guide
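However, writing Snappy and using LZ4, both of which appear in the examples, are not yet available in the version current at the time of writing (1.13).
For now, I tried GZIP, Bzip2 and XZ.
The Maven dependency to use is this one.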
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-compress</artifactId>
<version>1.13</version>
</dependency>
GZIP first.
src/test/java/org/littlewings/compress/CommonsGzipTest.java
package org.littlewings.compress;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
import org.junit.Test;

import static org.assertj.core.api.Assertions.assertThat;

public class CommonsGzipTest {
    @Test
    public void compressDecompress100M() throws IOException {
        byte[] binary = Files.readAllBytes(Paths.get("../100M-file"));
        assertThat(binary).hasSize(104222966);

        // Compress
        ByteArrayOutputStream compressBaos = new ByteArrayOutputStream();
        try (OutputStream gzip = new GzipCompressorOutputStream(compressBaos)) {
            gzip.write(binary);
        }
        byte[] compressed = compressBaos.toByteArray();
        assertThat(compressed).hasSize(35534431);

        // Decompress
        ByteArrayOutputStream decompressBaos = new ByteArrayOutputStream();
        try (InputStream gzip = new GzipCompressorInputStream(new ByteArrayInputStream(compressed))) {
            int b;
            while ((b = gzip.read()) != -1) {
                decompressBaos.write(b);
            }
        }
        byte[] decompressed = decompressBaos.toByteArray();
        assertThat(decompressed).hasSize(binary.length);

        assertThat((double) compressed.length / binary.length)
                .isBetween(0.34, 0.35);
    }
}
The compressed size is the same as with the JDK's GZIP (it would be a problem if it weren't).
Next, Bzip2.
src/test/java/org/littlewings/compress/CommonsBzip2Test.java
package org.littlewings.compress;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;
import org.junit.Test;

import static org.assertj.core.api.Assertions.assertThat;

public class CommonsBzip2Test {
    @Test
    public void compressDecompress100M() throws IOException {
        byte[] binary = Files.readAllBytes(Paths.get("../100M-file"));
        assertThat(binary).hasSize(104222966);

        // Compress
        ByteArrayOutputStream compressBaos = new ByteArrayOutputStream();
        try (OutputStream bzip2 = new BZip2CompressorOutputStream(compressBaos)) {
            bzip2.write(binary);
        }
        byte[] compressed = compressBaos.toByteArray();
        assertThat(compressed).hasSize(25761240);

        // Decompress
        ByteArrayOutputStream decompressBaos = new ByteArrayOutputStream();
        try (InputStream bzip2 = new BZip2CompressorInputStream(new ByteArrayInputStream(compressed))) {
            int b;
            while ((b = bzip2.read()) != -1) {
                decompressBaos.write(b);
            }
        }
        byte[] decompressed = decompressBaos.toByteArray();
        assertThat(decompressed).hasSize(binary.length);

        assertThat((double) compressed.length / binary.length)
                .isBetween(0.24, 0.25);
    }
}
This gives a better compression ratio than GZIP: the result is about one quarter of the original size.
Finally, XZ. Using XZ with Commons Compress requires an additional library.
XZ for Java
Add this as a Maven dependency.
<dependency>
<groupId>org.tukaani</groupId>
<artifactId>xz</artifactId>
<version>1.6</version>
</dependency>
The sample.
src/test/java/org/littlewings/compress/CommonsXzTest.java
package org.littlewings.compress;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.commons.compress.compressors.xz.XZCompressorInputStream;
import org.apache.commons.compress.compressors.xz.XZCompressorOutputStream;
import org.junit.Test;

import static org.assertj.core.api.Assertions.assertThat;

public class CommonsXzTest {
    @Test
    public void compressDecompress100M() throws IOException {
        byte[] binary = Files.readAllBytes(Paths.get("../100M-file"));
        assertThat(binary).hasSize(104222966);

        // Compress
        ByteArrayOutputStream compressBaos = new ByteArrayOutputStream();
        try (OutputStream xz = new XZCompressorOutputStream(compressBaos)) {
            xz.write(binary);
        }
        byte[] compressed = compressBaos.toByteArray();
        assertThat(compressed).hasSize(23685232);

        // Decompress
        ByteArrayOutputStream decompressBaos = new ByteArrayOutputStream();
        try (InputStream xz = new XZCompressorInputStream(new ByteArrayInputStream(compressed))) {
            int b;
            while ((b = xz.read()) != -1) {
                decompressBaos.write(b);
            }
        }
        byte[] decompressed = decompressBaos.toByteArray();
        assertThat(decompressed).hasSize(binary.length);

        assertThat((double) compressed.length / binary.length)
                .isBetween(0.22, 0.23);
    }
}
The result is smaller still than with GZIP or Bzip2 (a ratio of about 0.23).
snappy-java
Next is snappy-java, a Java library for the Snappy compression format.
GitHub - xerial/snappy-java: Snappy compressor/decompressor for Java
It is implemented by calling native code through JNI. Its compression ratio is lower than that of GZIP and the like, but it is said to be fast.
Add this Maven dependency.
<dependency>
    <groupId>org.xerial.snappy</groupId>
    <artifactId>snappy-java</artifactId>
    <version>1.1.4-M3</version>
</dependency>
As for usage, there are three kinds of API: a simple one, a stream-based one, and a framed one.
This time I will try the simple API and the stream-based API.
src/test/java/org/littlewings/compress/SnappySimplyTest.java
package org.littlewings.compress;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.junit.Test;
import org.xerial.snappy.Snappy;

import static org.assertj.core.api.Assertions.assertThat;

public class SnappySimplyTest {
    @Test
    public void compressDecompress100M() throws IOException {
        byte[] binary = Files.readAllBytes(Paths.get("../100M-file"));
        assertThat(binary).hasSize(104222966);

        // Compress
        byte[] compressed = Snappy.compress(binary);
        assertThat(compressed).hasSize(54457016);

        // Decompress
        byte[] decompressed = Snappy.uncompress(compressed);
        assertThat(decompressed).hasSize(binary.length);

        assertThat((double) compressed.length / binary.length)
                .isBetween(0.52, 0.53);
    }
}
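The compression ratio is indeed low... the data does not even shrink to half its size.
Next, the stream-based API. This one seems to be intended for things like compressing large files.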
src/test/java/org/littlewings/compress/SnappyStreamTest.java
package org.littlewings.compress;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.junit.Test;
import org.xerial.snappy.SnappyInputStream;
import org.xerial.snappy.SnappyOutputStream;

import static org.assertj.core.api.Assertions.assertThat;

public class SnappyStreamTest {
    @Test
    public void compressDecompress100M() throws IOException {
        byte[] binary = Files.readAllBytes(Paths.get("../100M-file"));
        assertThat(binary).hasSize(104222966);

        // Compress
        ByteArrayOutputStream compressBaos = new ByteArrayOutputStream();
        try (OutputStream snappy = new SnappyOutputStream(compressBaos)) {
            snappy.write(binary);
        }
        byte[] compressed = compressBaos.toByteArray();
        assertThat(compressed).hasSize(56361671);

        // Decompress
        ByteArrayOutputStream decompressBaos = new ByteArrayOutputStream();
        try (InputStream snappy = new SnappyInputStream(new ByteArrayInputStream(compressed))) {
            int b;
            while ((b = snappy.read()) != -1) {
                decompressBaos.write(b);
            }
        }
        byte[] decompressed = decompressBaos.toByteArray();
        assertThat(decompressed).hasSize(binary.length);

        assertThat((double) compressed.length / binary.length)
                .isBetween(0.54, 0.55);
    }
}
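One caveat about these APIs: depending on which ones you combine, they are not compatible with each other.
For example, data compressed with the simple API can be decompressed with the stream-based API, but not the other way around.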
src/test/java/org/littlewings/compress/SnappyApiCompatibilityTest.java
package org.littlewings.compress;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.junit.Test;
import org.xerial.snappy.Snappy;
import org.xerial.snappy.SnappyInputStream;
import org.xerial.snappy.SnappyOutputStream;

import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;

public class SnappyApiCompatibilityTest {
    // Simple API for compression, stream API for decompression: works
    @Test
    public void fromSimplyToStream() throws IOException {
        byte[] binary = Files.readAllBytes(Paths.get("../100M-file"));
        assertThat(binary).hasSize(104222966);

        // Compress
        byte[] compressed = Snappy.compress(binary);
        assertThat(compressed).hasSize(54457016);

        // Decompress
        ByteArrayOutputStream decompressBaos = new ByteArrayOutputStream();
        try (InputStream snappy = new SnappyInputStream(new ByteArrayInputStream(compressed))) {
            int b;
            while ((b = snappy.read()) != -1) {
                decompressBaos.write(b);
            }
        }
        byte[] decompressed = decompressBaos.toByteArray();
        assertThat(decompressed).hasSize(binary.length);

        assertThat((double) compressed.length / binary.length)
                .isBetween(0.52, 0.53);
    }

    // Stream API for compression, simple API for decompression: fails
    @Test
    public void fromStreamToSimply() throws IOException {
        byte[] binary = Files.readAllBytes(Paths.get("../100M-file"));
        assertThat(binary).hasSize(104222966);

        // Compress
        ByteArrayOutputStream compressBaos = new ByteArrayOutputStream();
        try (OutputStream snappy = new SnappyOutputStream(compressBaos)) {
            snappy.write(binary);
        }
        byte[] compressed = compressBaos.toByteArray();
        assertThat(compressed).hasSize(56361671);

        // Decompress
        assertThatThrownBy(() -> Snappy.uncompress(compressed))
                .isInstanceOf(IOException.class)
                .hasMessage("FAILED_TO_UNCOMPRESS(5)");
    }
}
API compatibility, including the framed API, is summarized in a table in the snappy-java project's documentation, so refer to that.
LZ4 Java
Last is the Java library for LZ4.
GitHub - lz4/lz4-java: LZ4 compression for Java
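LZ4 is said to be an even faster compression format than Snappy. Like snappy-java, this Java implementation calls native code through JNI.
(A pure Java implementation is also included, but the library tries the native code first.)
The Maven dependency needed for LZ4 Java is this one.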
<dependency>
    <groupId>net.jpountz.lz4</groupId>
    <artifactId>lz4</artifactId>
    <version>1.3.0</version>
</dependency>
Sample code.
src/test/java/org/littlewings/compress/Lz4Test.java
package org.littlewings.compress;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4FastDecompressor;
import org.junit.Test;

import static org.assertj.core.api.Assertions.assertThat;

public class Lz4Test {
    @Test
    public void compressDecompress100M() throws IOException {
        byte[] binary = Files.readAllBytes(Paths.get("../100M-file"));
        assertThat(binary).hasSize(104222966);

        // Compress
        LZ4Factory factory = LZ4Factory.fastestInstance();
        LZ4Compressor compressor = factory.fastCompressor();
        byte[] compressed = compressor.compress(binary);
        assertThat(compressed).hasSize(54393088);

        // Decompress (the original size must be passed in)
        LZ4FastDecompressor decompressor = factory.fastDecompressor();
        byte[] decompressed = decompressor.decompress(compressed, binary.length);
        assertThat(decompressed).hasSize(binary.length);

        assertThat((double) compressed.length / binary.length)
                .isBetween(0.52, 0.53);
    }
}
As with Snappy, the compression ratio itself is not that high.
One thing that bothered me a little: it seems you have to pass the decompressed size when decompressing...
byte[] decompressed = decompressor.decompress(compressed, binary.length);
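The sample on GitHub does indeed hold on to the pre-compression size and pass it in at decompression time.
Looking at other benchmarks, it seems the size might be derivable from the compressed data itself... but is it really?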
https://github.com/ning/jvm-compressor-benchmark/blob/master/src/main/java/com/ning/jcbm/lz4/AbstractLz4Driver.java#L45-L55
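One way to avoid carrying the original size around separately is to store it in front of the compressed bytes. The following is a minimal, codec-agnostic sketch of that idea using only the JDK. LengthPrefixedBlock and its method names are hypothetical, and the 4-byte big-endian prefix is my own layout, not a format defined by LZ4 Java or used by the benchmark linked above.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical framing helper: [4-byte original length][compressed payload].
final class LengthPrefixedBlock {
    private LengthPrefixedBlock() {
    }

    // Prepends the original (uncompressed) length to the compressed bytes.
    static byte[] wrap(int originalLength, byte[] compressed) {
        return ByteBuffer.allocate(Integer.BYTES + compressed.length)
                .putInt(originalLength) // big-endian by default
                .put(compressed)
                .array();
    }

    // Reads the original length back from the first four bytes.
    static int originalLength(byte[] framed) {
        return ByteBuffer.wrap(framed).getInt();
    }

    // Returns the compressed payload without the length prefix.
    static byte[] payload(byte[] framed) {
        return Arrays.copyOfRange(framed, Integer.BYTES, framed.length);
    }
}
```

With this in place, the decompression call from the test could be fed from a single array, along the lines of decompressor.decompress(LengthPrefixedBlock.payload(framed), LengthPrefixedBlock.originalLength(framed)). The cost is four extra bytes and one extra copy per block.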
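So that is how I tried them out.
A simple benchmark
So far we have looked at basic API usage and compressed sizes, but while I'm at it, let's take a quick look at speed as well.
I used JMH for the measurements, for no particular reason.
Add these to the Maven dependencies.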
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.18</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.18</version>
    <scope>provided</scope>
</dependency>
Also add the Maven Shade Plugin so that everything is packaged into a single JAR file.
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.0.0</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <finalName>benchmark</finalName>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>org.openjdk.jmh.Main</mainClass>
                            </transformer>
                        </transformers>
                        <filters>
                            <filter>
                                <!--
                                    Shading signed JARs will fail without this.
                                    http://stackoverflow.com/questions/999489/invalid-signature-file-when-attempting-to-run-a-jar
                                -->
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
I measure each of the APIs used so far against the 1K, 1M and 100M files.
Reference:
If you are after other benchmarks for reference, this one may also be worth a look.
GitHub - ning/jvm-compressor-benchmark: Benchmark suite for data compression library on the JVM
The environment is as follows.
$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
Ubuntu Linux 14.04 LTS
$ uname -a
Linux xxxxx 3.13.0-79-generic #123-Ubuntu SMP Fri Feb 19 14:27:58 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Core i7 CPU
$ cat /proc/cpuinfo | grep 'model name'
model name : Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
model name : Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
model name : Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
model name : Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
model name : Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
model name : Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
model name : Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
model name : Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
Memory is 11G.
The command looks like this.
Note: the settings are trimmed down, but even so the run takes about two hours...
$ java -jar benchmark.jar -wi 3 -i 3 -f 1 -bm avgt -tu ms
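And the results.
The source code is included at the end of this entry. The results fluctuated quite a bit (with only three measurement iterations, several error margins exceed the score itself), so if you are curious, try it on your own machine.
JDK standard GZIP
Note: the results are lightly reformatted.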
# Compression
Benchmark                                  Mode  Cnt      Score      Error  Units
CompressBenchmark.jdkGzipCompress_1K       avgt    3      0.022 ±    0.003  ms/op
CompressBenchmark.jdkGzipCompress_1M       avgt    3     58.445 ±   19.323  ms/op
CompressBenchmark.jdkGzipCompress_100M     avgt    3   6406.854 ± 1535.855  ms/op

# Decompression
Benchmark                                  Mode  Cnt      Score      Error  Units
CompressBenchmark.jdkGzipDecompress_1K     avgt    3      0.225 ±    0.097  ms/op
CompressBenchmark.jdkGzipDecompress_1M     avgt    3    210.369 ±   63.218  ms/op
CompressBenchmark.jdkGzipDecompress_100M   avgt    3  20953.244 ± 1796.023  ms/op
Commons Compress - GZIP
# Compression
Benchmark                                     Mode  Cnt      Score       Error  Units
CompressBenchmark.commonsGzipCompress_1K      avgt    3      0.053 ±     0.717  ms/op
CompressBenchmark.commonsGzipCompress_1M      avgt    3     72.059 ±    29.865  ms/op
CompressBenchmark.commonsGzipCompress_100M    avgt    3   6600.498 ± 10184.787  ms/op

# Decompression
Benchmark                                     Mode  Cnt      Score       Error  Units
CompressBenchmark.commonsGzipDecompress_1K    avgt    3      0.290 ±     0.115  ms/op
CompressBenchmark.commonsGzipDecompress_1M    avgt    3    201.237 ±    44.347  ms/op
CompressBenchmark.commonsGzipDecompress_100M  avgt    3  20460.803 ± 11088.249  ms/op
Commons Compress - Bzip2
# Compression
Benchmark                                      Mode  Cnt      Score     Error  Units
CompressBenchmark.commonsBzip2Compress_1K      avgt    3      1.739 ±   0.287  ms/op
CompressBenchmark.commonsBzip2Compress_1M      avgt    3    120.367 ±  11.274  ms/op
CompressBenchmark.commonsBzip2Compress_100M    avgt    3  11330.378 ± 227.484  ms/op

# Decompression
Benchmark                                      Mode  Cnt      Score     Error  Units
CompressBenchmark.commonsBzip2Decompress_1K    avgt    3      0.109 ±   0.063  ms/op
CompressBenchmark.commonsBzip2Decompress_1M    avgt    3     47.158 ±  15.537  ms/op
CompressBenchmark.commonsBzip2Decompress_100M  avgt    3   4815.504 ± 764.417  ms/op
Commons Compress - XZ
# Compression
Benchmark                                   Mode  Cnt      Score       Error  Units
CompressBenchmark.commonsXzCompress_1K      avgt    3     13.180 ±     5.589  ms/op
CompressBenchmark.commonsXzCompress_1M      avgt    3    624.189 ±   266.903  ms/op
CompressBenchmark.commonsXzCompress_100M    avgt    3  95819.375 ± 15847.521  ms/op

# Decompression
Benchmark                                   Mode  Cnt      Score       Error  Units
CompressBenchmark.commonsXzDecompress_1K    avgt    3      1.245 ±     1.690  ms/op
CompressBenchmark.commonsXzDecompress_1M    avgt    3     65.512 ±    21.580  ms/op
CompressBenchmark.commonsXzDecompress_100M  avgt    3   5835.776 ±  1474.174  ms/op
Snappy - Simple
# Compression
Benchmark                                      Mode  Cnt     Score       Error  Units
CompressBenchmark.snappySimplyCompress_1K      avgt    3     0.003 ±     0.001  ms/op
CompressBenchmark.snappySimplyCompress_1M      avgt    3     6.111 ±     0.382  ms/op
CompressBenchmark.snappySimplyCompress_100M    avgt    3  1583.644 ± 20934.928  ms/op

# Decompression
Benchmark                                      Mode  Cnt     Score       Error  Units
CompressBenchmark.snappySimplyDecompress_1K    avgt    3     0.001 ±     0.001  ms/op
CompressBenchmark.snappySimplyDecompress_1M    avgt    3     1.335 ±     0.237  ms/op
CompressBenchmark.snappySimplyDecompress_100M  avgt    3  1016.332 ± 27063.161  ms/op
Snappy - Stream
# Compression
Benchmark                                      Mode  Cnt    Score      Error  Units
CompressBenchmark.snappyStreamCompress_1K      avgt    3    0.003 ±    0.002  ms/op
CompressBenchmark.snappyStreamCompress_1M      avgt    3    8.012 ±   53.825  ms/op
CompressBenchmark.snappyStreamCompress_100M    avgt    3  672.487 ±  414.928  ms/op

# Decompression
Benchmark                                      Mode  Cnt    Score      Error  Units
CompressBenchmark.snappyStreamDecompress_1K    avgt    3    0.005 ±    0.003  ms/op
CompressBenchmark.snappyStreamDecompress_1M    avgt    3    6.258 ±    3.308  ms/op
CompressBenchmark.snappyStreamDecompress_100M  avgt    3  811.926 ± 3204.878  ms/op
LZ4
# Compression
Benchmark                             Mode  Cnt      Score        Error  Units
CompressBenchmark.lz4Compress_1K      avgt    3      0.033 ±      0.488  ms/op
CompressBenchmark.lz4Compress_1M      avgt    3      4.618 ±     18.158  ms/op
CompressBenchmark.lz4Compress_100M    avgt    3  15281.482 ± 149076.550  ms/op

# Decompression
Benchmark                             Mode  Cnt      Score        Error  Units
CompressBenchmark.lz4Decompress_1K    avgt    3      0.024 ±      0.591  ms/op
CompressBenchmark.lz4Decompress_1M    avgt    3      0.990 ±      6.549  ms/op
CompressBenchmark.lz4Decompress_100M  avgt    3   3407.482 ±  59961.681  ms/op
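Summary
Snappy and LZ4 are indeed fast. Compared with GZIP and the others, they differ by an order of magnitude or more... Their compression ratio is worse, though, so the choice depends on the use case and what you need.
Note: LZ4 showed quite a lot of variance on the 100M file...
I'm starting to feel that GZIP is the one to pick if you want a balance.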
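To make the ms/op scores easier to compare with the file sizes, they can be converted into throughput. This is just a small arithmetic sketch: Throughput and megabytesPerSecond are hypothetical names, and "MB" here means 1,000,000 bytes.

```java
// Hypothetical helper: converts a JMH average-time score (ms/op) into MB/s.
final class Throughput {
    private Throughput() {
    }

    // bytes: size of the input processed per operation
    // millisPerOp: the Score column of the result tables, in ms/op
    static double megabytesPerSecond(long bytes, double millisPerOp) {
        double megabytes = bytes / 1_000_000.0;
        double seconds = millisPerOp / 1_000.0;
        return megabytes / seconds;
    }
}
```

For the 100M file (104222966 bytes), the JDK GZIP compression score of 6406.854 ms/op works out to roughly 16 MB/s, while the Snappy stream compression score of 672.487 ms/op works out to roughly 155 MB/s. Given the error margins in the tables, these are rough indications only.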
Bonus
Here is the source code used for the benchmark.
src/main/java/org/littlewings/compress/benchmark/CompressBenchmark.java
package org.littlewings.compress.benchmark;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4FastDecompressor;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
import org.apache.commons.compress.compressors.xz.XZCompressorInputStream;
import org.apache.commons.compress.compressors.xz.XZCompressorOutputStream;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.xerial.snappy.Snappy;
import org.xerial.snappy.SnappyInputStream;
import org.xerial.snappy.SnappyOutputStream;

@State(Scope.Benchmark)
public class CompressBenchmark {
    // In every array: index 0 = 1K, index 1 = 1M, index 2 = 100M
    byte[][] binaries;
    byte[][] jdkGzipCompressed;
    byte[][] commonsGzipCompressed;
    byte[][] commonsBzip2Compressed;
    byte[][] commonsXzCompressed;
    byte[][] snappySimplyCompressed;
    byte[][] snappyStreamCompressed;
    byte[][] lz4Compressed;

    // Loads the input files and prepares the compressed inputs for the decompression benchmarks
    @Setup
    public void setup() throws IOException {
        binaries = new byte[][]{
                Files.readAllBytes(Paths.get("1K-file")),
                Files.readAllBytes(Paths.get("1M-file")),
                Files.readAllBytes(Paths.get("100M-file"))
        };

        jdkGzipCompressed = new byte[][]{jdkGzipCompress_1K(), jdkGzipCompress_1M(), jdkGzipCompress_100M()};
        commonsGzipCompressed = new byte[][]{commonsGzipCompress_1K(), commonsGzipCompress_1M(), commonsGzipCompress_100M()};
        commonsBzip2Compressed = new byte[][]{commonsBzip2Compress_1K(), commonsBzip2Compress_1M(), commonsBzip2Compress_100M()};
        commonsXzCompressed = new byte[][]{commonsXzCompress_1K(), commonsXzCompress_1M(), commonsXzCompress_100M()};
        snappySimplyCompressed = new byte[][]{snappySimplyCompress_1K(), snappySimplyCompress_1M(), snappySimplyCompress_100M()};
        snappyStreamCompressed = new byte[][]{snappyStreamCompress_1K(), snappyStreamCompress_1M(), snappyStreamCompress_100M()};
        lz4Compressed = new byte[][]{lz4Compress_1K(), lz4Compress_1M(), lz4Compress_100M()};
    }

    ///// JDK Gzip
    @Benchmark
    public byte[] jdkGzipCompress_1K() {
        return streamBasedCompress(binaries[0], os -> new GZIPOutputStream(os));
    }

    @Benchmark
    public byte[] jdkGzipCompress_1M() {
        return streamBasedCompress(binaries[1], os -> new GZIPOutputStream(os));
    }

    @Benchmark
    public byte[] jdkGzipCompress_100M() {
        return streamBasedCompress(binaries[2], os -> new GZIPOutputStream(os));
    }

    @Benchmark
    public byte[] jdkGzipDecompress_1K() {
        return streamBasedDecompress(jdkGzipCompressed[0], is -> new GZIPInputStream(is));
    }

    @Benchmark
    public byte[] jdkGzipDecompress_1M() {
        return streamBasedDecompress(jdkGzipCompressed[1], is -> new GZIPInputStream(is));
    }

    @Benchmark
    public byte[] jdkGzipDecompress_100M() {
        return streamBasedDecompress(jdkGzipCompressed[2], is -> new GZIPInputStream(is));
    }

    ///// Commons Compress Gzip
    @Benchmark
    public byte[] commonsGzipCompress_1K() {
        return streamBasedCompress(binaries[0], os -> new GzipCompressorOutputStream(os));
    }

    @Benchmark
    public byte[] commonsGzipCompress_1M() {
        return streamBasedCompress(binaries[1], os -> new GzipCompressorOutputStream(os));
    }

    @Benchmark
    public byte[] commonsGzipCompress_100M() {
        return streamBasedCompress(binaries[2], os -> new GzipCompressorOutputStream(os));
    }

    @Benchmark
    public byte[] commonsGzipDecompress_1K() {
        return streamBasedDecompress(commonsGzipCompressed[0], is -> new GzipCompressorInputStream(is));
    }

    @Benchmark
    public byte[] commonsGzipDecompress_1M() {
        return streamBasedDecompress(commonsGzipCompressed[1], is -> new GzipCompressorInputStream(is));
    }

    @Benchmark
    public byte[] commonsGzipDecompress_100M() {
        return streamBasedDecompress(commonsGzipCompressed[2], is -> new GzipCompressorInputStream(is));
    }

    ///// Commons Compress Bzip2
    @Benchmark
    public byte[] commonsBzip2Compress_1K() {
        return streamBasedCompress(binaries[0], os -> new BZip2CompressorOutputStream(os));
    }

    @Benchmark
    public byte[] commonsBzip2Compress_1M() {
        return streamBasedCompress(binaries[1], os -> new BZip2CompressorOutputStream(os));
    }

    @Benchmark
    public byte[] commonsBzip2Compress_100M() {
        return streamBasedCompress(binaries[2], os -> new BZip2CompressorOutputStream(os));
    }

    @Benchmark
    public byte[] commonsBzip2Decompress_1K() {
        return streamBasedDecompress(commonsBzip2Compressed[0], is -> new BZip2CompressorInputStream(is));
    }

    @Benchmark
    public byte[] commonsBzip2Decompress_1M() {
        return streamBasedDecompress(commonsBzip2Compressed[1], is -> new BZip2CompressorInputStream(is));
    }

    @Benchmark
    public byte[] commonsBzip2Decompress_100M() {
        return streamBasedDecompress(commonsBzip2Compressed[2], is -> new BZip2CompressorInputStream(is));
    }

    ///// Commons Compress Xz
    @Benchmark
    public byte[] commonsXzCompress_1K() {
        return streamBasedCompress(binaries[0], os -> new XZCompressorOutputStream(os));
    }

    @Benchmark
    public byte[] commonsXzCompress_1M() {
        return streamBasedCompress(binaries[1], os -> new XZCompressorOutputStream(os));
    }

    @Benchmark
    public byte[] commonsXzCompress_100M() {
        return streamBasedCompress(binaries[2], os -> new XZCompressorOutputStream(os));
    }

    @Benchmark
    public byte[] commonsXzDecompress_1K() {
        return streamBasedDecompress(commonsXzCompressed[0], is -> new XZCompressorInputStream(is));
    }

    @Benchmark
    public byte[] commonsXzDecompress_1M() {
        return streamBasedDecompress(commonsXzCompressed[1], is -> new XZCompressorInputStream(is));
    }

    @Benchmark
    public byte[] commonsXzDecompress_100M() {
        return streamBasedDecompress(commonsXzCompressed[2], is -> new XZCompressorInputStream(is));
    }

    ///// Snappy-Java Simply
    @Benchmark
    public byte[] snappySimplyCompress_1K() {
        return snappySimplyCompress(binaries[0]);
    }

    @Benchmark
    public byte[] snappySimplyCompress_1M() {
        return snappySimplyCompress(binaries[1]);
    }

    @Benchmark
    public byte[] snappySimplyCompress_100M() {
        return snappySimplyCompress(binaries[2]);
    }

    @Benchmark
    public byte[] snappySimplyDecompress_1K() {
        return snappySimplyDecompress(snappySimplyCompressed[0]);
    }

    @Benchmark
    public byte[] snappySimplyDecompress_1M() {
        return snappySimplyDecompress(snappySimplyCompressed[1]);
    }

    @Benchmark
    public byte[] snappySimplyDecompress_100M() {
        return snappySimplyDecompress(snappySimplyCompressed[2]);
    }

    ///// Snappy-Java Stream
    @Benchmark
    public byte[] snappyStreamCompress_1K() {
        return streamBasedCompress(binaries[0], os -> new SnappyOutputStream(os));
    }

    @Benchmark
    public byte[] snappyStreamCompress_1M() {
        return streamBasedCompress(binaries[1], os -> new SnappyOutputStream(os));
    }

    @Benchmark
    public byte[] snappyStreamCompress_100M() {
        return streamBasedCompress(binaries[2], os -> new SnappyOutputStream(os));
    }

    @Benchmark
    public byte[] snappyStreamDecompress_1K() {
        return streamBasedDecompress(snappyStreamCompressed[0], is -> new SnappyInputStream(is));
    }

    @Benchmark
    public byte[] snappyStreamDecompress_1M() {
        return streamBasedDecompress(snappyStreamCompressed[1], is -> new SnappyInputStream(is));
    }

    @Benchmark
    public byte[] snappyStreamDecompress_100M() {
        return streamBasedDecompress(snappyStreamCompressed[2], is -> new SnappyInputStream(is));
    }

    ///// LZ4-Java
    @Benchmark
    public byte[] lz4Compress_1K() {
        return lz4Compress(binaries[0]);
    }

    @Benchmark
    public byte[] lz4Compress_1M() {
        return lz4Compress(binaries[1]);
    }

    @Benchmark
    public byte[] lz4Compress_100M() {
        return lz4Compress(binaries[2]);
    }

    @Benchmark
    public byte[] lz4Decompress_1K() {
        return lz4Decompress(lz4Compressed[0], binaries[0].length);
    }

    @Benchmark
    public byte[] lz4Decompress_1M() {
        return lz4Decompress(lz4Compressed[1], binaries[1].length);
    }

    @Benchmark
    public byte[] lz4Decompress_100M() {
        return lz4Decompress(lz4Compressed[2], binaries[2].length);
    }

    // Like java.util.function.Function, but allowed to throw IOException
    @FunctionalInterface
    interface IoFunction<I, O> {
        O apply(I i) throws IOException;
    }

    byte[] streamBasedCompress(byte[] binary, IoFunction<OutputStream, OutputStream> fun) {
        ByteArrayOutputStream compressBaos = new ByteArrayOutputStream();
        try (OutputStream compressStream = fun.apply(compressBaos)) {
            compressStream.write(binary);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return compressBaos.toByteArray();
    }

    byte[] streamBasedDecompress(byte[] compressed, IoFunction<InputStream, InputStream> fun) {
        ByteArrayOutputStream decompressBaos = new ByteArrayOutputStream();
        try (InputStream decompressStream = fun.apply(new ByteArrayInputStream(compressed))) {
            // Reads one byte per call, so the measured time includes the per-byte call overhead
            int b;
            while ((b = decompressStream.read()) != -1) {
                decompressBaos.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return decompressBaos.toByteArray();
    }

    byte[] snappySimplyCompress(byte[] binary) {
        try {
            return Snappy.compress(binary);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    byte[] snappySimplyDecompress(byte[] compressed) {
        try {
            return Snappy.uncompress(compressed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    byte[] lz4Compress(byte[] binary) {
        LZ4Factory factory = LZ4Factory.fastestInstance();
        LZ4Compressor compressor = factory.fastCompressor();
        return compressor.compress(binary);
    }

    byte[] lz4Decompress(byte[] compressed, int originalSize) {
        LZ4Factory factory = LZ4Factory.fastestInstance();
        LZ4FastDecompressor decompressor = factory.fastDecompressor();
        return decompressor.decompress(compressed, originalSize);
    }
}