Looking at Varnish's default VCL configuration and playing around with changing it

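What is this post for?

A while ago, I tried installing Varnish 6.0.

Installing Varnish 6.0 on Ubuntu Linux 18.04 LTS - CLOVER🍀

That time I only got as far as installing it, so this time let's work with the configuration a bit more.

Environment

Here is the environment for this post.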

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.3 LTS
Release:    18.04
Codename:   bionic

The Varnish version is as follows.

$ varnishd -V
varnishd (varnish-6.0.4 revision 14d4eee6f0afc3020a044341235f54f3bd1449f1)
Copyright (c) 2006 Verdens Gang AS
Copyright (c) 2006-2019 Varnish Software AS

Varnish and the backend server run on the same host. The IP address of that host and the IP address of the client that accesses it are as follows.

Varnish and origin server … 192.168.33.10
Client … 192.168.33.1

The default Varnish configuration

First, let's check the Varnish configuration we start from. With the comment lines skipped, it looks like this.

$ grep -vE '^ *#' /etc/varnish/default.vcl

vcl 4.0;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
}

sub vcl_backend_response {
}

sub vcl_deliver {
}

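Backend server (a server for testing)

This time, I'll set up the backend server the easy way, with Python's HTTP server.

Create one file in the current directory and one in a subdirectory, then start the server on port 8080.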

$ echo 'Hello Varnish!!' > hello.txt
$ mkdir sub-dir
$ echo 'Hello Cache Server!!' > sub-dir/cache.txt

$ python3 -m http.server 8080

Varnish listens on port 6081. A request to that port goes through Varnish and is forwarded to the backend server only when there is no cached object.

$ curl 192.168.33.10:6081/hello.txt
Hello Varnish!!

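A builtin configuration?

By the way, the default configuration file that ships with the package (/etc/varnish/default.vcl) didn't seem to contain any real settings, yet caching already appeared to be enabled.

What's going on?

It turns out there is a builtin configuration file.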

https://github.com/varnishcache/varnish-cache/blob/varnish-6.0.4/bin/varnishd/builtin.vcl

$ curl -s https://raw.githubusercontent.com/varnishcache/varnish-cache/varnish-6.0.4/bin/varnishd/builtin.vcl | grep -v '^#'
/*-
 * Copyright (c) 2006 Verdens Gang AS
 * Copyright (c) 2006-2015 Varnish Software AS
 * All rights reserved.
 *
 * Author: Poul-Henning Kamp <phk@phk.freebsd.dk>
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 * This is the builtin VCL code
 */

vcl 4.0;


sub vcl_recv {
    if (req.method == "PRI") {
        /* This will never happen in properly formed traffic (see: RFC7540) */
        return (synth(405));
    }
    if (!req.http.host &&
      req.esi_level == 0 &&
      req.proto ~ "^(?i)HTTP/1.1") {
        /* In HTTP/1.1, Host is required. */
        return (synth(400));
    }
    if (req.method != "GET" &&
      req.method != "HEAD" &&
      req.method != "PUT" &&
      req.method != "POST" &&
      req.method != "TRACE" &&
      req.method != "OPTIONS" &&
      req.method != "DELETE" &&
      req.method != "PATCH") {
        /* Non-RFC2616 or CONNECT which is weird. */
        return (pipe);
    }

    if (req.method != "GET" && req.method != "HEAD") {
        /* We only deal with GET and HEAD by default */
        return (pass);
    }
    if (req.http.Authorization || req.http.Cookie) {
        /* Not cacheable by default */
        return (pass);
    }
    return (hash);
}

sub vcl_pipe {
    return (pipe);
}

sub vcl_pass {
    return (fetch);
}

sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}

sub vcl_purge {
    return (synth(200, "Purged"));
}

sub vcl_hit {
    if (obj.ttl >= 0s) {
        // A pure unadulterated hit, deliver it
        return (deliver);
    }
    if (obj.ttl + obj.grace > 0s) {
        // Object is in grace, deliver it
        // Automatically triggers a background fetch
        return (deliver);
    }
    // fetch & deliver once we get the result
    return (miss);
}

sub vcl_miss {
    return (fetch);
}

sub vcl_deliver {
    return (deliver);
}

/*
 * We can come here "invisibly" with the following errors: 500 & 503
 */
sub vcl_synth {
    set resp.http.Content-Type = "text/html; charset=utf-8";
    set resp.http.Retry-After = "5";
    set resp.body = {"<!DOCTYPE html>
<html>
  <head>
    <title>"} + resp.status + " " + resp.reason + {"</title>
  </head>
  <body>
    <h1>Error "} + resp.status + " " + resp.reason + {"</h1>
    <p>"} + resp.reason + {"</p>
    <h3>Guru Meditation:</h3>
    <p>XID: "} + req.xid + {"</p>
    <hr>
    <p>Varnish cache server</p>
  </body>
</html>
"};
    return (deliver);
}


sub vcl_backend_fetch {
    if (bereq.method == "GET") {
        unset bereq.body;
    }
    return (fetch);
}

sub vcl_backend_response {
    if (bereq.uncacheable) {
        return (deliver);
    } else if (beresp.ttl <= 0s ||
      beresp.http.Set-Cookie ||
      beresp.http.Surrogate-control ~ "(?i)no-store" ||
      (!beresp.http.Surrogate-Control &&
        beresp.http.Cache-Control ~ "(?i:no-cache|no-store|private)") ||
      beresp.http.Vary == "*") {
        set beresp.ttl = 120s;
        set beresp.uncacheable = true;
    }
    return (deliver);
}

sub vcl_backend_error {
    set beresp.http.Content-Type = "text/html; charset=utf-8";
    set beresp.http.Retry-After = "5";
    set beresp.body = {"<!DOCTYPE html>
<html>
  <head>
    <title>"} + beresp.status + " " + beresp.reason + {"</title>
  </head>
  <body>
    <h1>Error "} + beresp.status + " " + beresp.reason + {"</h1>
    <p>"} + beresp.reason + {"</p>
    <h3>Guru Meditation:</h3>
    <p>XID: "} + bereq.xid + {"</p>
    <hr>
    <p>Varnish cache server</p>
  </body>
</html>
"};
    return (deliver);
}


sub vcl_init {
    return (ok);
}

sub vcl_fini {
    return (ok);
}

Reading the comment at the top of the default configuration file, it looks like the builtin VCL runs after the code in this file.

$ head -n 6 /etc/varnish/default.vcl 
#
# This is an example VCL file for Varnish.
#
# It does not do anything by default, delegating control to the
# builtin VCL. The builtin VCL is called when there is no explicit
# return statement.
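To make the "no explicit return statement" rule concrete, here is a minimal sketch of a vcl_recv that only adjusts the request and then hands over to the builtin VCL. The rule itself comes from the comment above; the idea of dropping cookies for .txt files is just an assumption for illustration and is not part of the original configuration.

```vcl
vcl 4.0;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    if (req.url ~ "\.txt$") {
        # Hypothetical tweak: drop cookies for plain text files.
        # Without this, the builtin vcl_recv would return (pass)
        # for any request that carries a Cookie header.
        unset req.http.Cookie;
    }
    # No return statement here, so processing continues in the
    # builtin vcl_recv, which ends with return (hash) for an
    # ordinary GET or HEAD request without Authorization.
}
```

If a `return (...)` were added inside the `if` block, the builtin vcl_recv would be skipped for matching requests.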

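Let's read the configuration

The configuration is written in a language called VCL,

VCL - Varnish Configuration Language — Varnish version 6.0.4 documentation

but it isn't obvious how to read it.

The syntax guide and the reference are here.

VCL Syntax — Varnish version 6.0.4 documentation

VCL — Varnish version 6.0.4 documentation

Even with these, though, a configuration like the following is still hard to read.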

sub vcl_recv {
    if (req.method == "PRI") {
        /* This will never happen in properly formed traffic (see: RFC7540) */
        return (synth(405));
    }
    if (!req.http.host &&
      req.esi_level == 0 &&
      req.proto ~ "^(?i)HTTP/1.1") {
        /* In HTTP/1.1, Host is required. */
        return (synth(400));
    }
    if (req.method != "GET" &&
      req.method != "HEAD" &&
      req.method != "PUT" &&
      req.method != "POST" &&
      req.method != "TRACE" &&
      req.method != "OPTIONS" &&
      req.method != "DELETE" &&
      req.method != "PATCH") {
        /* Non-RFC2616 or CONNECT which is weird. */
        return (pipe);
    }

    if (req.method != "GET" && req.method != "HEAD") {
        /* We only deal with GET and HEAD by default */
        return (pass);
    }
    if (req.http.Authorization || req.http.Cookie) {
        /* Not cacheable by default */
        return (pass);
    }
    return (hash);
}

To read this, it helps to go through the documentation on the builtin subroutines and on the state transitions.

Both documents are split into a client side and a backend side.

Built in subroutines — Varnish version 6.0.4 documentation

Varnish Processing States — Varnish version 6.0.4 documentation

The client-side diagram.

(Figure: client-side state diagram)

The backend-side diagram. It corresponds to the "see backend graph" part of the client-side diagram.

(Figure: backend-side state diagram)

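Looking at the diagrams, you can more or less see that the state transitions are determined by the builtin subroutines and the values they return.

For example, if vcl_recv returns "pass", processing moves on to vcl_pass.

After vcl_pass or vcl_miss, processing moves to the backend side, and the origin server is accessed.

Let's skim through what each builtin subroutine means.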

Built in subroutines — Varnish version 6.0.4 documentation

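Client side
- vcl_recv … Called at the start of a request. It decides how the request will be handled from here on
- vcl_pipe … Called on entering pipe mode. In this mode, data is passed unchanged between the client and the backend until the connection is closed. In other words, Varnish acts as a simple proxy
- vcl_pass … Called on entering pass mode. The request is passed to the backend server, and the response is not stored in the cache. Here too, Varnish acts as a proxy
- vcl_hash … Called after vcl_recv. It builds the hash value for the request, which is used to look the object up in the cache
- vcl_purge … Called after a cache purge has been executed
- vcl_miss … Called when there is no cached object for the request; it leads to a fetch from the backend server. The documentation says it is also called when vcl_hit returns fetch, but in the builtin vcl_hit shown above, the value that leads here is miss
- vcl_hit … Called when the cache lookup finds an object for the request
- vcl_deliver … Called before the response is delivered to the client
- vcl_synth … Called to deliver a synthetic object. A synthetic object is generated in VCL and is not fetched from the backend server
Backend side
- vcl_backend_fetch … Called before the request is sent to the backend server. It is used to modify the request that goes to the backend server
- vcl_backend_response … Called when the response headers have been successfully received from the backend server
- vcl_backend_error … Called when the fetch from the backend server fails, or when max_retries has been exceeded

With this in mind, it may be worth looking at the state transition diagrams once more.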
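One way to watch the client-side flow from the outside is to mark each response in vcl_deliver. The following is a minimal sketch, not part of the original configuration: the header name X-Cache is made up for illustration, and the sketch relies on obj.hits being 0 in vcl_deliver when the response did not come from the cache.

```vcl
sub vcl_deliver {
    # obj.hits counts how many times this object has been served
    # from the cache; 0 means this response was not a cache hit.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
    # No return statement: the builtin vcl_deliver runs next
    # and returns (deliver).
}
```

With this in place, `curl -i` against port 6081 should show the header on each response, which is handy when the access log is not at hand.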

Changing the configuration

Now let's try a few changes to the default configuration file.

First, let's make requests to "/hello.txt" simply get proxied.

sub vcl_recv {
   if (req.url == "/hello.txt") {
       return (pass);
   }
}

After changing the configuration, restart or reload Varnish.

Let's check.

$ curl 192.168.33.10:6081/hello.txt
Hello Varnish!!

Let's look at the access log.

$ sudo varnishncsa -F '%h %l %u %t "%r" %s %b "%{Referer}i" "%{User-Agent}i" %{Varnish:hitmiss}x'

No matter how many times you access it, the result is "miss", and the request is sent to the backend server.

192.168.33.1 - - [05/Oct/2019:10:09:44 +0000] "GET http://192.168.33.10:6081/hello.txt HTTP/1.1" 200 16 "-" "curl/7.58.0" miss
192.168.33.1 - - [05/Oct/2019:10:09:45 +0000] "GET http://192.168.33.10:6081/hello.txt HTTP/1.1" 200 16 "-" "curl/7.58.0" miss

Next, let's access the content in the subdirectory.

$ curl 192.168.33.10:6081/sub-dir/cache.txt
Hello Cache Server!!
$ curl 192.168.33.10:6081/sub-dir/cache.txt
Hello Cache Server!!

Here, the second request is served from the cache.

192.168.33.1 - - [05/Oct/2019:10:10:29 +0000] "GET http://192.168.33.10:6081/sub-dir/cache.txt HTTP/1.1" 200 21 "-" "curl/7.58.0" miss
192.168.33.1 - - [05/Oct/2019:10:10:29 +0000] "GET http://192.168.33.10:6081/sub-dir/cache.txt HTTP/1.1" 200 21 "-" "curl/7.58.0" hit

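Regular expressions can be used too. For example, the following turns Varnish into a simple proxy server, because the pattern matches every URL.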

sub vcl_recv {
   if (req.url ~ "^.+") {
       return (pass);
   }
}

An example with a logical operator: don't cache when the method is GET and the request is for "/sub-dir/cache.txt". Admittedly, a somewhat contrived example.

sub vcl_recv {
    if (req.method == "GET" && req.url == "/sub-dir/cache.txt") {
        return (pass);
    }
}
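Regular expressions and logical operators can also be combined in one condition. This is only a sketch with made-up conditions: pass everything under /sub-dir/, plus any URL ending in .txt regardless of case. The `(?i)` prefix is the same case-insensitive flag that the builtin VCL uses in its own patterns.

```vcl
sub vcl_recv {
    # "~" is a regular expression match, "||" is logical OR.
    if (req.url ~ "^/sub-dir/" || req.url ~ "(?i)\.txt$") {
        return (pass);
    }
    # Anything else falls through to the builtin vcl_recv.
}
```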

Now, let's put vcl_recv back to how it was.

sub vcl_recv {
}

Next, let's keep separate caches depending on the User-Agent.

I have been checking with curl so far, so let's make curl and wget use different cache entries. For this, we use vcl_hash.

sub vcl_hash {
    if (req.http.user-agent ~ "curl.+") {
        hash_data("curl");
    } else if (req.http.user-agent ~ "Wget.+") {
        hash_data("wget");
    }
}
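Because this vcl_hash has no return statement, the builtin vcl_hash shown earlier still runs afterwards and adds the URL and the Host header (or the server IP) to the hash. Written out in full, the combined behavior would look roughly like the following sketch; the second half is copied from the builtin VCL above.

```vcl
sub vcl_hash {
    # Custom part: one extra hash component per client type.
    if (req.http.user-agent ~ "curl.+") {
        hash_data("curl");
    } else if (req.http.user-agent ~ "Wget.+") {
        hash_data("wget");
    }

    # Same as the builtin vcl_hash.
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}
```

So a request from curl and a request from wget for the same URL end up with different hash values, while any other client shares a third entry that has no extra component.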

With this in place, restart Varnish and access the file twice with curl.

$ curl 192.168.33.10:6081/hello.txt
Hello Varnish!!

$ curl 192.168.33.10:6081/hello.txt
Hello Varnish!!

Look at the access log. The second request was served from the cache.

192.168.33.1 - - [05/Oct/2019:10:33:25 +0000] "GET http://192.168.33.10:6081/hello.txt HTTP/1.1" 200 16 "-" "curl/7.58.0" miss
192.168.33.1 - - [05/Oct/2019:10:33:26 +0000] "GET http://192.168.33.10:6081/hello.txt HTTP/1.1" 200 16 "-" "curl/7.58.0" hit

Next, access it with wget.

$ wget -q 192.168.33.10:6081/hello.txt -O /dev/stdout 
Hello Varnish!!

$ wget -q 192.168.33.10:6081/hello.txt -O /dev/stdout 
Hello Varnish!!

Even though the same URL had already been requested with curl, the first wget request was a cache miss.

192.168.33.1 - - [05/Oct/2019:10:33:27 +0000] "GET http://192.168.33.10:6081/hello.txt HTTP/1.1" 200 16 "-" "Wget/1.19.4 (linux-gnu)" miss
192.168.33.1 - - [05/Oct/2019:10:33:27 +0000] "GET http://192.168.33.10:6081/hello.txt HTTP/1.1" 200 16 "-" "Wget/1.19.4 (linux-gnu)" hit

Looks OK.

Next, let's change the TTL depending on the requested path. This is defined in vcl_backend_response.

sub vcl_backend_response {
    if (bereq.url == "/hello.txt") {
        set beresp.ttl = 10s;
    } else if (bereq.url == "/sub-dir/cache.txt") {
        set beresp.ttl = 5s;
    }
}

"/hello.txt" is cached for 10 seconds and "/sub-dir/cache.txt" for 5 seconds.

Finally, let's try an example that uses multiple backend servers. Following this part of the documentation,

Multiple backends

define multiple backends,
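The TTL can be paired with a grace period, which is what the `obj.ttl + obj.grace > 0s` branch in the builtin vcl_hit is about: an expired object that is still in grace is delivered while a background fetch refreshes it. A minimal sketch, where the path pattern and the 30 second value are assumptions chosen for illustration:

```vcl
sub vcl_backend_response {
    if (bereq.url ~ "^/sub-dir/") {
        # Fresh for 5 seconds.
        set beresp.ttl = 5s;
        # For 30 more seconds after that, the stale object may
        # still be delivered while Varnish refetches it.
        set beresp.grace = 30s;
    }
}
```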

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

backend other {
    .host = "127.0.0.1";
    .port = "3000";
}

and in vcl_recv, route requests with "req.backend_hint" depending on a condition.

sub vcl_recv {
    if (req.url ~ "^/other/") {
        set req.backend_hint = other;
    } else {
        set req.backend_hint = default;
    }
}
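Put together as one file, the routing might look like the sketch below. The URL rewrite with regsub is my own addition, assuming the server on port 3000 (hypothetical here, since it is not set up in this post) serves its content without the /other/ prefix.

```vcl
vcl 4.0;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

backend other {
    .host = "127.0.0.1";
    .port = "3000";
}

sub vcl_recv {
    if (req.url ~ "^/other/") {
        set req.backend_hint = other;
        # Assumption: strip the prefix so that /other/foo.txt
        # is requested from the second backend as /foo.txt.
        set req.url = regsub(req.url, "^/other/", "/");
    } else {
        set req.backend_hint = default;
    }
}
```

Note that vcl_hash runs after vcl_recv, so with this rewrite the cache key is built from the rewritten URL rather than the one the client sent.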

After trying various things, I feel like I've gotten a little more used to VCL.

It looks like I can manage the basics now.