Scaling component instances in and out in a TiUP playground cluster

What did I want to do here?

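TiUP, the command-line tool for TiDB, has a command called playground that makes it easy to set up a local TiDB environment.

This command can apparently also run multiple instances of each component in a TiDB cluster, so I decided to try it.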

Where the TiUP playground command fits

The documentation for TiUP playground is here. TiUP playground is a command that makes it easy to build a TiDB cluster locally.

Quickly Deploy a Local TiDB Cluster | PingCAP Docs

That page does not say so explicitly, but TiUP playground does not appear to be intended for production use.

The Quick Start states this clearly.

The deployment method provided in this guide is ONLY FOR quick start, NOT FOR production or comprehensive functionality and stability testing.

Quick Start Guide for the TiDB Database Platform | PingCAP Docs

To build a production environment, you use the TiUP cluster command. The TiUP cluster documentation notes that TiUP playground is meant for local testing.

Similar to the TiUP playground component used for a local test deployment, the TiUP cluster component quickly deploys TiDB for production environment.

Deploy and Maintain an Online TiDB Cluster Using TiUP | PingCAP Docs

Changing the number of component instances in a cluster with the TiUP playground command

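The TiUP documentation describes how to change the number of component instances in a cluster with the TiUP playground command.

According to it, the instance count can also be changed after the cluster has started.

That is what I'll try this time.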

Environment

Here is the environment used this time.

$ tiup --version
1.15.2 v1.15.0-nightly-17
Go Version: go1.21.10
Git Ref: master
GitHash: 8ee99e4c89915cdd7986cc5d9921a73db5850dc1

TiDB is version 8.1.0.

my:root@172.17.0.2:4000=> select version();
     version()
--------------------
 8.0.11-TiDB-v8.1.0
(1 row)

Starting a cluster with more component instances using TiUP playground

First, the base command is as follows.

$ tiup playground v8.1.0 --host 0.0.0.0

This builds a cluster with 8.1.0, the current TiDB LTS release.

Incidentally, --help also gives some hints about what I'm trying to do here.

$ tiup playground --help
Bootstrap a TiDB cluster in your local host, the latest release version will be chosen
if you don't specified a version.

Examples:
  $ tiup playground nightly                         # Start a TiDB nightly version local cluster
  $ tiup playground v5.0.1 --db 3 --pd 3 --kv 3     # Start a local cluster with 10 nodes
  $ tiup playground nightly --without-monitor       # Start a local cluster and disable monitor system
  $ tiup playground --pd.config ~/config/pd.toml    # Start a local cluster with specified configuration file
  $ tiup playground --db.binpath /xx/tidb-server    # Start a local cluster with component binary path
  $ tiup playground --mode tikv-slim                # Start a local tikv only cluster (No TiDB or TiFlash Available)
  $ tiup playground --mode tikv-slim --kv 3 --pd 3  # Start a local tikv only cluster with 6 nodes

Usage:

Now let's start the cluster with three TiKV instances and three PD instances.

$ tiup playground v8.1.0 --kv 3 --pd 3 --host 0.0.0.0

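TiKV is the storage component and PD is the component that manages metadata. TiDB is the compute node that handles the SQL layer; I'll leave it at one instance this time. TiFlash also stays at one.

On startup, output like this appears.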

Start pd instance:v8.1.0
Start pd instance:v8.1.0
Start pd instance:v8.1.0
Start tikv instance:v8.1.0
Start tikv instance:v8.1.0
Start tikv instance:v8.1.0
Start tidb instance:v8.1.0

Once the cluster has fully started, let's check it.

🎉 TiDB Playground Cluster is started, enjoy!

Connect TiDB:    mysql --comments --host 172.17.0.2 --port 4000 -u root
TiDB Dashboard:  http://172.17.0.2:2379/dashboard
Grafana:         http://0.0.0.0:3000

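The dashboard is the easiest place to check first. The URL is http://[host]:2379/dashboard.

The number of instances of each component is shown under "Instances" on the "Cluster Info" page.

For the storage servers, "Store Topology" is also worth a look.

For more on this, see the page on checking cluster status: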

Check Cluster Status | PingCAP Docs

The TiDB Dashboard page is also a useful reference:

TiDB Dashboard Cluster Information Page | PingCAP Docs

Let's also check with information_schema.

Connect to TiDB.

$ tiup client

Switch to information_schema.

my:root@172.17.0.2:4000=> use information_schema;
USE

Cluster information is in cluster_info.

my:root@172.17.0.2:4000=> select * from cluster_info;
  TYPE   |     INSTANCE     |  STATUS_ADDRESS  | VERSION |                 GIT_HASH                 |      START_TIME      |      UPTIME      | SERVER_ID
---------+------------------+------------------+---------+------------------------------------------+----------------------+------------------+-----------
 tidb    | 172.17.0.2:4000  | 172.17.0.2:10080 | 8.1.0   | 945d07c5d5c7a1ae212f6013adfb187f2de24b23 | 2024-06-03T08:20:18Z | 13m59.963463577s |       360
 pd      | 172.17.0.2:2384  | 172.17.0.2:2384  | 8.1.0   | fca469ca33eb5d8b5e0891b507c87709a00b0e81 | 2024-06-03T08:19:45Z | 14m32.96347241s  |         0
 pd      | 172.17.0.2:2379  | 172.17.0.2:2379  | 8.1.0   | fca469ca33eb5d8b5e0891b507c87709a00b0e81 | 2024-06-03T08:19:45Z | 14m32.963476216s |         0
 pd      | 172.17.0.2:2382  | 172.17.0.2:2382  | 8.1.0   | fca469ca33eb5d8b5e0891b507c87709a00b0e81 | 2024-06-03T08:19:45Z | 14m32.963479484s |         0
 tikv    | 172.17.0.2:20162 | 172.17.0.2:20182 | 8.1.0   | ba73b0d92d94463d74543550d0efe61fa6a6f416 | 2024-06-03T08:19:49Z | 14m28.963485929s |         0
 tiflash | 172.17.0.2:3930  | 172.17.0.2:20292 | 8.1.0   | c1838001167c8ba83af759085a71ad61e6c2a5af | 2024-06-03T08:20:46Z | 13m31.963488001s |         0
 tikv    | 172.17.0.2:20160 | 172.17.0.2:20180 | 8.1.0   | ba73b0d92d94463d74543550d0efe61fa6a6f416 | 2024-06-03T08:19:49Z | 14m28.96348984s  |         0
 tikv    | 172.17.0.2:20161 | 172.17.0.2:20181 | 8.1.0   | ba73b0d92d94463d74543550d0efe61fa6a6f416 | 2024-06-03T08:19:49Z | 14m28.963491587s |         0
(8 rows)

CLUSTER_INFO | PingCAP Docs
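
Since the point here is the instance count, it can be handy to aggregate cluster_info by type instead of counting rows by eye. The following is a minimal sketch: the helper name instance_count_sql is my own, and it only uses the TYPE column that appears in the output above.

```shell
# Print a query that counts instances per component type.
# Only relies on the TYPE column shown in the cluster_info output.
instance_count_sql() {
  cat <<'SQL'
SELECT type, COUNT(*) AS instances
FROM information_schema.cluster_info
GROUP BY type
ORDER BY type;
SQL
}
```

The query can be piped into the client that the playground suggests, for example `instance_count_sql | mysql --comments --host 172.17.0.2 --port 4000 -u root`. The host and port are the ones from this environment and will differ elsewhere.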

Cluster system information is in cluster_systeminfo.

my:root@172.17.0.2:4000=> select * from cluster_systeminfo where name like '%kernel.osrelease%';
  TYPE   |     INSTANCE     | SYSTEM_TYPE | SYSTEM_NAME |       NAME       |       VALUE
---------+------------------+-------------+-------------+------------------+--------------------
 tidb    | 172.17.0.2:4000  | system      | sysctl      | kernel.osrelease | 5.15.0-107-generic
 pd      | 172.17.0.2:2384  | system      | sysctl      | kernel.osrelease | 5.15.0-107-generic
 pd      | 172.17.0.2:2379  | system      | sysctl      | kernel.osrelease | 5.15.0-107-generic
 pd      | 172.17.0.2:2382  | system      | sysctl      | kernel.osrelease | 5.15.0-107-generic
 tikv    | 172.17.0.2:20160 | system      | sysctl      | kernel.osrelease | 5.15.0-107-generic
 tikv    | 172.17.0.2:20161 | system      | sysctl      | kernel.osrelease | 5.15.0-107-generic
 tikv    | 172.17.0.2:20162 | system      | sysctl      | kernel.osrelease | 5.15.0-107-generic
 tiflash | 172.17.0.2:3930  | system      | sysctl      | kernel.osrelease | 5.15.0-107-generic
(8 rows)

CLUSTER_SYSTEMINFO | PingCAP Docs

There are various cluster-related tables, and it seems logs can be viewed as well.

CLUSTER_LOG | PingCAP Docs

Querying it requires specifying conditions, though.

my:root@172.17.0.2:4000=> select * from cluster_log;
error: mysql: 1105: denied to scan logs, please specified the start time, such as `time > '2020-01-01 00:00:00'`


my:root@172.17.0.2:4000=> select * from cluster_log where time > '2024-06-03 00:00:00';
error: mysql: 1105: denied to scan logs, please specified the end time, such as `time < '2020-01-01 00:00:00'`


my:root@172.17.0.2:4000=> select * from cluster_log where time > '2024-06-03 00:00:00' and time < '2024-06-03 17:35:00';
error: mysql: 1105: denied to scan full logs (use `SELECT * FROM cluster_log WHERE message LIKE '%'` explicitly if intentionally)

A time range (both a start and an end) plus at least one more condition is required. For example, the following query runs.

my:root@172.17.0.2:4000=> select * from cluster_log where type = 'tidb' and time > '2024-06-03 00:00:00' and time < '2024-06-03 17:35:00';
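
The three error messages above amount to a small checklist: a lower time bound, an upper time bound, and one more filter. As a rough sketch, a helper like the following builds a query that satisfies all three. The name cluster_log_sql is hypothetical, and the values are interpolated without escaping, so it is only meant for trusted input in a local playground.

```shell
# Build a cluster_log query with the conditions TiDB asked for:
# a start time, an end time, and one extra filter (here: type).
cluster_log_sql() {
  # $1: component type (e.g. tidb), $2: start time, $3: end time
  printf "SELECT * FROM information_schema.cluster_log WHERE type = '%s' AND time > '%s' AND time < '%s';\n" \
    "$1" "$2" "$3"
}
```

For example, `cluster_log_sql tidb '2024-06-03 00:00:00' '2024-06-03 17:35:00'` prints the same query as the one shown above, with the table name qualified by information_schema.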

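I'll shut this cluster down here.

Changing the number of component instances in an already running TiDB cluster with TiUP playground

Next, let's change the number of component instances in a TiDB cluster that TiUP playground has already started.

First, start a cluster with one instance of each component.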

$ tiup playground v8.1.0 --host 0.0.0.0

Then run tiup playground scale-out in another terminal. Here is its help.

$ tiup playground scale-out --help
Usage:
  tiup scale-out instances [flags]

Examples:
tiup playground scale-out --db 1

Flags:
      --db int                   TiDB instance number
      --db.binpath string        TiDB instance binary path
      --db.config string         TiDB instance configuration file
      --db.host host             Playground TiDB host. If not provided, TiDB will still use host flag as its host
      --drainer int              Drainer instance number
      --drainer.binpath string   Drainer instance binary path
      --drainer.config string    Drainer instance configuration file
  -h, --help                     help for scale-out
      --kv int                   TiKV instance number
      --kv.binpath string        TiKV instance binary path
      --kv.config string         TiKV instance configuration file
      --kvcdc int                TiKV-CDC instance number
      --kvcdc.binpath string     TiKVCDC instance binary path
      --pd int                   PD instance number
      --pd.binpath string        PD instance binary path
      --pd.config string         PD instance configuration file
      --pd.host host             Playground PD host. If not provided, PD will still use host flag as its host
      --pump int                 Pump instance number
      --pump.binpath string      Pump instance binary path
      --pump.config string       Pump instance configuration file
      --ticdc int                TiCDC instance number
      --ticdc.binpath string     TiCDC instance binary path
      --tiflash int              TiFlash instance number
      --tiflash.binpath string   TiFlash instance binary path
      --tiflash.config string    TiFlash instance configuration file
      --tiproxy int              TiProxy instance number
      --tiproxy.binpath string   TiProxy instance binary path
      --tiproxy.config string    TiProxy instance configuration file
      --tiproxy.host host        Playground TiProxy host. If not provided, TiProxy will still use host flag as its host

Global Flags:
  -T, --tag string   Specify a tag for playground

This time I'll specify 3 for TiKV.

$ tiup playground scale-out --kv 3

The terminal running TiUP playground shows messages like the following.

receive command: scale-out
Start tikv instance:v8.1.0
receive command: scale-out
Start tikv instance:v8.1.0
receive command: scale-out
Start tikv instance:v8.1.0

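Let's look at the number of instances in the cluster on the dashboard.

There are four TiKV instances. So TiUP playground scale-out does not "set the component to the specified number of instances"; it "adds the specified number of instances to the component".

Let's remove one TiKV.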
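
Because scale-out takes an increment rather than a target, the value to pass is the difference between the desired and the current instance count. A tiny sketch of that arithmetic follows. The function name instances_to_add is my own, and the current count has to be looked up separately, for example from cluster_info.

```shell
# scale-out adds instances, so pass the difference, not the target.
instances_to_add() {
  # $1: current instance count, $2: desired instance count
  current=$1
  target=$2
  if [ "$target" -lt "$current" ]; then
    echo "target ($target) is below current ($current); use scale-in instead" >&2
    return 1
  fi
  echo $((target - current))
}
```

With one TiKV running and three wanted, `tiup playground scale-out --kv "$(instances_to_add 1 3)"` would add two instances rather than three.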

This is done with TiUP playground scale-in. The command requires the pid of the instance to remove, so how should we find it?

The answer was in the help for TiUP playground scale-in.

$ tiup playground scale-in --help
Usage:
  tiup scale-in a instance with specified pid [flags]

Examples:
tiup playground scale-in --pid 234 # You can get pid by `tiup playground display`

Flags:
  -h, --help       help for scale-in
      --pid ints   pid of instance to be scale in

Global Flags:
  -T, --tag string   Specify a tag for playground

Running TiUP playground display looks like the way to go.

$ tiup playground display
Pid   Role     Uptime
---   ----     ------
45    pd       10m6.353956919s
60    tikv     9m29.853588243s
1474  tikv     6m15.276438452s
1495  tikv     6m15.249303918s
1531  tikv     6m15.220257415s
225   tidb     9m19.067370921s
304   tiflash  7m24.357840942s

I'll scale in the TiKV instance with the largest pid.

$ tiup playground scale-in --pid 1531
tikv will be stop when tombstone
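
Picking the pid by eye works, but the choice can also be scripted from the display output. The sketch below assumes the three-column Pid / Role / Uptime layout shown above. The function name largest_pid_for_role is hypothetical. Note that the largest pid is only a heuristic for "most recently started", since pids are not guaranteed to increase forever.

```shell
# Print the largest pid among instances of one role, reading
# `tiup playground display` output on stdin.
largest_pid_for_role() {
  # $1: role name as shown in the Role column (e.g. tikv)
  awk -v role="$1" '
    $2 == role && $1 + 0 > max { max = $1 + 0 }
    END { if (max) print max }
  '
}
```

Usage would look like `tiup playground display | largest_pid_for_role tikv`, and the result can then be passed to `tiup playground scale-in --pid`. Nothing is printed when no instance of that role exists.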

The terminal running TiUP playground first shows this message:

receive command: scale-in

After that, lines like these keep appearing:

no store matching address "0.0.0.0:42071" found
no store matching address "0.0.0.0:42071" found
no store matching address "0.0.0.0:42071" found

The instance also remains visible in the cluster.

As the scale-in message said, the instance apparently stops once its store reaches the Tombstone state.

tikv will be stop when tombstone

TiDB Scheduling / Information collection

This operation can be done with a tool called pd-ctl. Start it through TiUP ctl; -i runs it in interactive mode.

$ tiup ctl:v8.1.0 pd -i

PD Control User Guide | PingCAP Docs

The store command lists the stores that PD currently knows about.

» store

Find the entry for the store in question.

    {
      "store": {
        "id": 129,
        "address": "172.17.0.2:42071",
        "version": "8.1.0",
        "peer_address": "172.17.0.2:42071",
        "status_address": "0.0.0.0:38379",
        "git_hash": "ba73b0d92d94463d74543550d0efe61fa6a6f416",
        "start_timestamp": 1717406404,
        "deploy_path": "/home/user/.tiup/components/tikv/v8.1.0",
        "last_heartbeat": 1717407595131118461,
        "state_name": "Up"
      },
      "status": {
        "capacity": "268.6GiB",
        "available": "12.53GiB",
        "used_size": "290.7MiB",
        "leader_count": 17,
        "leader_weight": 1,
        "leader_score": 17,
        "leader_size": 17,
        "region_count": 60,
        "region_weight": 1,
        "region_score": 7684711861.653917,
        "region_size": 60,
        "slow_score": 1,
        "slow_trend": {
          "cause_value": 250114.54530201343,
          "cause_rate": 0,
          "result_value": 3,
          "result_rate": 0
        },
        "start_ts": "2024-06-03T09:20:04Z",
        "last_heartbeat_ts": "2024-06-03T09:39:55.131118461Z",
        "uptime": "19m51.131118461s"
      }
    },

Then delete it.

» store delete 129
Success!

It is now gone from the cluster.
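
The store ID was looked up by hand above. One thing worth noticing is that the playground log refers to the store as 0.0.0.0:42071 while PD has it registered as 172.17.0.2:42071, so only the port is common to both. The following sketch finds the ID by port. It is a fragile text match that assumes the pretty-printed layout shown above, where "id" appears before "address" within each store entry. The function name store_id_for_port is hypothetical.

```shell
# Print the id of the store whose "address" ends with the given port.
# Reads pd-ctl `store` JSON on stdin; relies on "id" preceding
# "address" in each entry, as in the output shown in this article.
store_id_for_port() {
  # $1: port number (e.g. 42071)
  awk -v port="$1" '
    /"id":/ { gsub(/[^0-9]/, "", $2); id = $2 }
    /"address":/ && index($0, ":" port "\"") { print id; exit }
  '
}
```

Assuming pd-ctl is run in single-command mode rather than with -i, usage could look like `tiup ctl:v8.1.0 pd store | store_id_for_port 42071`, with the printed ID then given to store delete.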

Incidentally, this extra step seems to be needed because TiKV holds data; for a compute node such as TiDB, scale-in apparently goes through without fuss.

Let's try it. Add one TiDB instance.

$ tiup playground scale-out --db 1
To connect new added TiDB: mysql --comments --host 172.17.0.2 --port 33223 -u root -p (no password)

Messages shown on the TiUP playground side.

receive command: scale-out
Start tidb instance:v8.1.0
To connect new added TiDB: mysql --comments --host 172.17.0.2 --port 33223 -u root -p (no password)

Check the instances in the cluster.

Look up the pid of the newest TiDB instance.

$ tiup playground display
Pid   Role     Uptime
---   ----     ------
5370  pd       20m33.894317455s
5383  tikv     20m33.867691647s
5403  tidb     20m33.841844695s
6736  tidb     19m46.903273974s
5605  tiflash  20m10.217349543s

Run it. This time there is no mention of Tombstone.

$ tiup playground scale-in --pid 6736
scale in tidb success

Messages shown on the TiUP playground side.

receive command: scale-in
tidb quit

The instance is gone from the TiUP playground display output.

$ tiup playground display
Pid   Role     Uptime
---   ----     ------
5370  pd       22m22.570134625s
5383  tikv     22m22.543507379s
5403  tidb     22m22.517660271s
5605  tiflash  21m58.893163139s

However, the cluster still seems to remember that it existed.