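Trying out JEP 378 Text Blocks, introduced in Java 15

Why I wrote this

Java 15 introduced JEP 378 Text Blocks as a standard feature.

I already use them from time to time, but I have been handling line breaks and white space by feel, so I figured I should take a proper look for once.

JEP 378: Text Blocks

The feature is described on the JEP 378: Text Blocks page.

A text block looks like this.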

String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
              """;

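A blog post from around the release.

Text blocks come to Java

Let's take a quick look at the Description.

JEP 378: Text Blocks / Description

The outline of text blocks is as follows.

- A text block consists of zero or more content characters enclosed by opening and closing delimiters
- The opening delimiter is three double quotes (""") followed by zero or more white spaces and then a line terminator
  - The content begins at the first character after that line terminator
- The closing delimiter is three double quotes ("""), and the content ends at the last character before it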

A text block consists of zero or more content characters, enclosed by opening and closing delimiters.

The opening delimiter is a sequence of three double quote characters (""") followed by zero or more white spaces followed by a line terminator. The content begins at the first character after the line terminator of the opening delimiter.

The closing delimiter is a sequence of three double quote characters. The content ends at the last character before the first double quote of the closing delimiter.

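Other characteristics.

- The content may contain double quotes (") directly
  - No escaping is needed (\" is permitted, but neither necessary nor recommended)
- The content may contain line terminators directly
  - The \n escape is permitted as well, but it is neither necessary nor recommended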

The content may include double quote characters directly, unlike the characters in a string literal. The use of \" in a text block is permitted, but not necessary or recommended. Fat delimiters (""") were chosen so that " characters could appear unescaped, and also to visually distinguish a text block from a string literal.

The content may include line terminators directly, unlike the characters in a string literal. The use of \n in a text block is permitted, but not necessary or recommended.
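
As a small sketch of the two points quoted above (the class and method names are made up for illustration), a text block with unescaped double quotes and a directly typed line break should be equal to the ordinary string literal that needs \" and \n for the same content.

```java
public class QuoteDemo {
    // Text block with unescaped double quotes and a directly typed line break.
    static String block() {
        return """
                <p class="note">Hello</p>
                <p class="note">world</p>""";
    }

    // The same content written as an ordinary string literal.
    static String literal() {
        return "<p class=\"note\">Hello</p>\n<p class=\"note\">world</p>";
    }

    public static void main(String[] args) {
        System.out.println(block().equals(literal())); // prints true
    }
}
```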

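Compile-time processing.

JEP 378: Text Blocks / Description / Compile-time processing

- Line terminators are normalized to LF
  - So that behavior is the same across platforms
- White space that exists only to indent the source code is removed
- Escape sequences are interpreted (as the final step)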

Line terminators in the content are translated to LF (\u000A). The purpose of this translation is to follow the principle of least surprise when moving Java source code across platforms.

Incidental white space surrounding the content, introduced to match the indentation of Java source code, is removed.

Escape sequences in the content are interpreted. Performing interpretation as the final step means developers can write escape sequences such as \n without them being modified or deleted by earlier steps.

So line terminators always become LF.
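
To make the order of these steps a little more concrete, here is a minimal sketch (the class name is hypothetical). Because escape sequences are interpreted last, a \t written at the end of a line is still a backslash and a letter when trailing white space is stripped, so the tab it produces should survive.

```java
public class ProcessingOrderDemo {
    static String block() {
        // The indentation is removed first; the \t escape is interpreted afterwards.
        return """
                first\t
                  second
                """;
    }

    public static void main(String[] args) {
        // Indentation stripped, line breaks are LF, and the tab from \t is kept.
        System.out.println(block().equals("first\t\n  second\n")); // prints true
    }
}
```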

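Line terminators, white space, and escape sequences are each described in more detail in their own subsections.

It also looks like new escape sequences and String methods were added.

JEP 378: Text Blocks / Description / New escape sequences
- \<line-terminator> suppresses a line break inside a text block
  - Used when a long string is written as a text block but should not contain the line breaks
- \s becomes a single space (\u0020)
  - Used to keep leading or trailing white space from being stripped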
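
The String methods are not tried later in this post, so here is a rough sketch of the three that JEP 378 lists: stripIndent(), translateEscapes(), and formatted(). The class name and the sample strings are my own assumptions for illustration.

```java
public class NewMethodsDemo {
    // stripIndent(): applies the incidental white space removal to a runtime string.
    // The blank last line plays the role of the closing-delimiter line.
    static String stripped() {
        return "    a\n      b\n    ".stripIndent();
    }

    // translateEscapes(): interprets escape sequences contained in a runtime string.
    // The Java literal below holds the characters x \ t y \ s z.
    static String translated() {
        return "x\\ty\\sz".translateEscapes();
    }

    // formatted(): instance-method form of String.format, handy right after a text block.
    static String greeting(String name) {
        return """
                Hello, %s!
                """.formatted(name);
    }

    public static void main(String[] args) {
        System.out.println(stripped().equals("a\n  b\n"));               // prints true
        System.out.println(translated().equals("x\ty z"));               // prints true
        System.out.println(greeting("Duke").equals("Hello, Duke!\n"));   // prints true
    }
}
```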

That's about it.

I'm curious about how white space is handled in particular, so let's try it out.

Environment

Here is the environment used this time.

$ java --version
openjdk 17.0.5 2022-10-18
OpenJDK Runtime Environment (build 17.0.5+8-Ubuntu-2ubuntu122.04)
OpenJDK 64-Bit Server VM (build 17.0.5+8-Ubuntu-2ubuntu122.04, mixed mode, sharing)


$ mvn --version
Apache Maven 3.9.0 (9b58d2bad23a66be161c4664ef21ce219c2c8584)
Maven home: $HOME/.sdkman/candidates/maven/current
Java version: 17.0.5, vendor: Private Build, runtime: /usr/lib/jvm/java-17-openjdk-amd64
Default locale: ja_JP, platform encoding: UTF-8
OS name: "linux", version: "5.15.0-60-generic", arch: "amd64", family: "unix"

Preparation

The pom.xml has no dependencies in particular; this much is enough.

    <properties>
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    </properties>

I'll simply run everything from a main method.

So the programs are run like this.

$ mvn compile exec:java -Dexec.mainClass=[class name]

A first text block

Let's start by trying a text block.

I prepared a skeleton class like this.

src/main/java/org/littlewings/textblocks/App1.java

package org.littlewings.textblocks;

public class App1 {
    public static void main(String... args) {
        // write the code here!
    }
}

Let's try various things inside the main method.

Following the example in JEP 378, I'll use HTML.

        String html1 = """
                <html>
                    <body>
                        <p>Hello, world</p>
                    </body>
                </html>
                """;

        System.out.printf("html1 ↓%n%s%n", html1);

Result.

html1 ↓
<html>
    <body>
        <p>Hello, world</p>
    </body>
</html>

Checking the line terminators.

        System.out.printf(
                "has crlf = %b, has lf = %b, has cr = %b%n",
                html1.contains("\r\n"),
                html1.contains("\n"),
                html1.contains("\r")
        );

It does indeed contain only LF.

has crlf = false, has lf = true, has cr = false

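By the way, the output appears to have one line break too many. html1 itself ends with a line break because the closing """ sits on its own line, and the %n in the format string adds another. To avoid this, place the closing """ directly after the last character of the content, like this.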

        String html2 = """                
                <html>
                    <body>
                        <p>Hello, world</p>
                    </body>
                </html>""";

        System.out.printf("html2 ↓%n%s%n", html2);

This is the result.

html2 ↓
<html>
    <body>
        <p>Hello, world</p>
    </body>
</html>
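
The difference between the two placements can also be checked directly. This is only a sketch with made-up names: the two text blocks should differ by exactly one trailing line break.

```java
public class ClosingDelimiterDemo {
    // Closing delimiter on its own line: the string ends with a line break.
    static String withNewline() {
        return """
                <p>Hello, world</p>
                """;
    }

    // Closing delimiter right after the content: no trailing line break.
    static String withoutNewline() {
        return """
                <p>Hello, world</p>""";
    }

    public static void main(String[] args) {
        System.out.println(withNewline().equals(withoutNewline() + "\n")); // prints true
        System.out.println(withoutNewline().endsWith("\n"));               // prints false
    }
}
```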

Also, the description said that only white space and a line terminator may follow the opening """. Let's check.

        String html3 = """<html>
                    <body>
                        <p>Hello, world</p>
                    </body>
                </html>""";

        System.out.printf("html3 ↓%n%s%n", html3);

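This is a compile error. The javac message, shown here in the Japanese locale, says that the opening delimiter sequence of the text block is invalid because the line terminator is missing.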

[ERROR] /path/to/src/main/java/org/littlewings/textblocks/App1.java:[37,27] テキスト・ブロックの開始区切り文字のシーケンスが無効です。行の終了文字がありません

It's hard to see, but white space after the opening """ is fine.

        String html3 = """        
                <html>
                    <body>
                        <p>Hello, world</p>
                    </body>
                </html>""";

        System.out.printf("html3 ↓%n%s%n", html3);

That said, that white space does not end up in the string.

html3 ↓
<html>
    <body>
        <p>Hello, world</p>
    </body>
</html>

A text block can contain " as-is.

        String html4 = """        
                <html lang="ja">
                    <body>
                        <p style="text-align: center">Hello, world</p>
                    </body>
                </html>""";

        System.out.printf("html4 ↓%n%s%n", html4);

Result.

html4 ↓
<html lang="ja">
    <body>
        <p style="text-align: center">Hello, world</p>
    </body>
</html>

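Checking how white space is handled

When using text blocks, the handling of white space is the part I care about most. The compile-time processing section said that white space used for indentation is removed, and indeed the leading white space is gone from the text blocks used above.

Let's look at this a bit more closely.

JEP 378: Text Blocks / Description / Compile-time processing / Incidental white space

The white space that gets removed is the white space at the start and end of each line in the text block. The key point is how it is told apart from white space that is part of the content.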

Accordingly, an appropriate interpretation for the content of a text block is to differentiate incidental white space at the start and end of each line, from essential white space.

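In terms of the outcome, the following is the easiest to follow.

JEP 378: Text Blocks / Description / Compile-time processing / Incidental white space / Significant trailing line policy

- If the closing delimiter is on its own line and to the left of the content, everything to the left of the delimiter is treated as indentation white space
- If the content is to the left of the closing delimiter, everything to the left of the leftmost content position is treated as indentation white space

In short, the indentation is adjusted with either the content or the closing delimiter.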
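
To check my understanding of this rule, here is a simplified re-implementation. It is only a sketch under assumptions of my own: it handles '\n' line terminators only, counts every white space character as one column, and the class and method names are made up. It compares itself with String.stripIndent(), which applies the real algorithm to a runtime string.

```java
public class IndentSketch {
    // Simplified sketch of the indentation rule, for illustration only.
    static String strip(String raw) {
        String[] lines = raw.split("\n", -1); // -1 keeps a trailing empty line
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < lines.length; i++) {
            boolean closingLine = i == lines.length - 1;
            // Non-blank lines count, and so does the closing-delimiter line even if blank.
            if (!lines[i].isBlank() || closingLine) {
                min = Math.min(min, leadingWhitespace(lines[i]));
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                out.append('\n');
            }
            // Blank lines become empty; other lines lose the common indentation
            // and any trailing white space.
            if (!lines[i].isBlank()) {
                out.append(lines[i].substring(min).stripTrailing());
            }
        }
        return out.toString();
    }

    static int leadingWhitespace(String line) {
        int n = 0;
        while (n < line.length() && Character.isWhitespace(line.charAt(n))) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // The last line stands for the line holding the closing delimiter.
        String delimiterLeft = "    a\n      b\n  ";
        String contentLeft = "  a\n    b\n      ";
        System.out.println(strip(delimiterLeft).equals(delimiterLeft.stripIndent()));
        System.out.println(strip(contentLeft).equals(contentLeft.stripIndent()));
    }
}
```

With the delimiter line at column 2, two columns are removed and the content keeps two spaces of indentation; with the delimiter to the right of the content, the leftmost content column decides.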

Let's try a few variations.

I prepared a skeleton like this.

src/main/java/org/littlewings/textblocks/App2.java

package org.littlewings.textblocks;

import java.util.LinkedHashMap;
import java.util.Map;

public class App2 {
    public static void main(String... args) {
        Map<String, String> patterns = new LinkedHashMap<>();

        // register the patterns here

        patterns.entrySet().forEach(e -> printString(e.getKey(), e.getValue()));
    }

    static void printString(String name, String target) {
        System.out.printf("""
                ===================================
                %s ↓
                -----
                %s
                ===================================
                """, name, target);
    }
}

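I'll register a pattern name and a text block in the Map for each case and check them one by one.

First pattern: the closing delimiter on its own line, at the same column as the start of the content.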

        patterns.put(
                "独立した終端・コンテンツの始点と同じ位置",
                """
                        <html>
                            <body>
                                <p>Hello, world</p>
                            </body>
                        </html>
                        """
        );

Result.

===================================
独立した終端・コンテンツの始点と同じ位置 ↓
-----
<html>
    <body>
        <p>Hello, world</p>
    </body>
</html>

===================================

A pattern with the closing delimiter on its own line, moved to the left of the start of the content.

        patterns.put(
                "独立した終端・コンテンツの始点よりも左",
                """
                            <html>
                                <body>
                                    <p>Hello, world</p>
                                </body>
                            </html>
                        """
        );

Result. The amount by which the closing delimiter was moved to the left shows up as leading white space on every line.

===================================
独立した終端・コンテンツの始点よりも左 ↓
-----
    <html>
        <body>
            <p>Hello, world</p>
        </body>
    </html>

===================================

A pattern with the closing delimiter on its own line, to the right of the start of the content.

        patterns.put(
                "独立した終端・コンテンツの始点よりも右",
                """
                        <html>
                            <body>
                                <p>Hello, world</p>
                            </body>
                        </html>
                            """
        );

Result. Here the leftmost content position is what counts, so the output is the same as in the first pattern.

===================================
独立した終端・コンテンツの始点よりも右 ↓
-----
<html>
    <body>
        <p>Hello, world</p>
    </body>
</html>

===================================

A pattern with the closing delimiter on the same line as the end of the content.

        patterns.put(
                "コンテンツと同じ行の終端",
                """
                        <html>
                            <body>
                                <p>Hello, world</p>
                            </body>
                        </html>"""
        );

Result. The string no longer ends with a line break, so the separator line follows immediately.

===================================
コンテンツと同じ行の終端 ↓
-----
<html>
    <body>
        <p>Hello, world</p>
    </body>
</html>
===================================

A pattern with the closing delimiter on the same line as the content, where the content lines start at uneven columns.

        patterns.put(
                "コンテンツと同じ行の終端・コンテンツの始点を不揃いに",
                """
                        <html>
                    <body>
                                <p>Hello, world</p>
                            </body>
                        </html>"""
        );

Result. The leftmost position of non-white-space content determines the indentation that is removed.

===================================
コンテンツと同じ行の終端・コンテンツの始点を不揃いに ↓
-----
    <html>
<body>
            <p>Hello, world</p>
        </body>
    </html>
===================================

A pattern where a content line should start with white space. This uses \s.

        patterns.put(
                "コンテンツと同じ行の終端・\\sエスケープシーケンスで左の空白を確保",
                """
                        <html>
                            <body>
                    \s            <p>Hello, world</p>
                            </body>
                        </html>"""
        );

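Result. \s counts as a non-white-space character when the indentation is determined, so everything from the position of \s onward, white space included, is treated as content. That is also why the other lines gained four spaces of indentation.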

===================================
コンテンツと同じ行の終端・\sエスケープシーケンスで左の空白を確保 ↓
-----
    <html>
        <body>
             <p>Hello, world</p>
        </body>
    </html>
===================================

A pattern with white space at the end of each line.

        patterns.put(
                "コンテンツと同じ行の終端・末尾に空白を配置しているが削除される",
                """
                        <html>        
                            <body>        
                                <p>Hello, world</p>        
                            </body>        
                        </html>        """
        );

Result. The trailing white space is removed.

===================================
コンテンツと同じ行の終端・末尾に空白を配置しているが削除される ↓
-----
<html>
    <body>
        <p>Hello, world</p>
    </body>
</html>
===================================

A pattern with \s at the end of each line.

        patterns.put(
                "コンテンツと同じ行の終端・末尾の空白を\\sエスケープシーケンスで確保",
                """
                        <html>        \s
                            <body>        \s
                                <p>Hello, world</p>        \s
                            </body>        \s
                        </html>        \s"""
        );

Result. In this case the trailing white space before \s is kept as content, along with the space from \s itself.

===================================
コンテンツと同じ行の終端・末尾の空白を\sエスケープシーケンスで確保 ↓
-----
<html>         
    <body>         
        <p>Hello, world</p>         
    </body>         
</html>         
===================================
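
One place where this seems handy is padding every line to the same width. A minimal sketch with a made-up class name: each line below should come out exactly six characters long, because the spaces before \s are no longer at the end of the line when trailing white space is stripped.

```java
public class TrailingSpaceDemo {
    // \s stands for the last space of each line and protects the spaces before it.
    static String colors() {
        return """
                red  \s
                green\s
                blue \s""";
    }

    public static void main(String[] args) {
        // Brackets make the trailing spaces visible.
        colors().lines().forEach(line -> System.out.println("[" + line + "]"));
    }
}
```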

Building a long single line with a text block

As mentioned earlier, \<line-terminator> suppresses a line break inside a text block. Finally, let's try that.

src/main/java/org/littlewings/textblocks/App3.java

package org.littlewings.textblocks;

public class App3 {
    public static void main(String... args) {
        // write the code here
    }
}

An example using \<line-terminator>.

        String summary1 = """
                Add text blocks to the Java language. \
                A text block is a multi-line string literal that \
                avoids the need for most escape sequences, \
                automatically formats the string in a predictable way, \
                and gives the developer control over the format when desired.""";

        System.out.printf("summary1 ↓%n%s%n", summary1);

The text block looks like it spans several lines in the source, but ending a line with \ means the resulting string contains no line break at that point.

summary1 ↓
Add text blocks to the Java language. A text block is a multi-line string literal that avoids the need for most escape sequences, automatically formats the string in a predictable way, and gives the developer control over the format when desired.
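
The same thing in a size that is easy to assert on (a sketch with a hypothetical class name). The space before each \ is kept, since it is no longer trailing white space, and the indentation of the following line has already been stripped.

```java
public class ContinuationDemo {
    // Each \ at the end of a line removes the line break that follows it.
    static String joined() {
        return """
                one \
                two \
                three""";
    }

    public static void main(String[] args) {
        System.out.println(joined().equals("one two three")); // prints true
        System.out.println(joined().contains("\n"));          // prints false
    }
}
```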

Also, although the JEP says it is neither necessary nor recommended, \n can be used too.

        String summary2 = """
                Add text blocks to the Java language. \n
                A text block is a multi-line string literal that \n
                avoids the need for most escape sequences, \n
                automatically formats the string in a predictable way, \n
                and gives the developer control over the format when desired.""";

        System.out.printf("summary2 ↓%n%s%n", summary2);

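It ends up like this, though: each line gets two line breaks, one from \n and one from the actual line terminator in the source.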

summary2 ↓
Add text blocks to the Java language. 

A text block is a multi-line string literal that 

avoids the need for most escape sequences, 

automatically formats the string in a predictable way, 

and gives the developer control over the format when desired.
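
A compact way to see the doubling, again only as a sketch with made-up names: the \n escape and the real line terminator each contribute one LF, so there is an empty line between the two words.

```java
public class ExplicitNewlineDemo {
    // \n plus the source line break gives two line breaks in a row.
    static String doubled() {
        return """
                a\n
                b""";
    }

    // Only the source line break.
    static String single() {
        return """
                a
                b""";
    }

    public static void main(String[] args) {
        System.out.println(doubled().equals("a\n\nb")); // prints true
        System.out.println(single().equals("a\nb"));    // prints true
    }
}
```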

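Summary

I tried out JEP 378 Text Blocks, introduced in Java 15.

I'm glad I took this chance to check the rules for white space handling and the \<line-terminator> and \s escape sequences.

Let's use text blocks wherever they fit.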
