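Checking whether uWSGI actually uses multiple CPUs when an application is started with multiple processes

What is this post for?

WSGI servers such as uWSGI and Gunicorn can be configured to start multiple worker processes. I wanted to check whether doing so really makes use of multiple CPU cores.

This time I'll try it with uWSGI.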

Environment

The environment is Ubuntu Linux 18.04 LTS.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.2 LTS
Release:    18.04
Codename:   bionic

I'm using a PC with 8 CPUs.

Preparation and sample code

Install uWSGI.

$ pip3 install uwsgi

Version.

$ pip3 freeze
...
uWSGI==2.0.18

I prepared a script that keeps the CPU busy by computing a Fibonacci number.
wsgi-app-heavy.py

import os
import threading

def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])

    # CPU-bound work: takes a few seconds per request
    fib_result = fib(35)

    # Report which worker process and thread handled the request
    pid = os.getpid()
    thread_name = threading.current_thread().name

    return ['get response fib(35) = {} / pid = {}, thread-name = {}'.format(fib_result, pid, thread_name).encode()]
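Before putting the script behind uWSGI, it can be handy to call the WSGI callable directly. The following is a minimal sketch of that idea, not part of the original script: `make_application` and `call_wsgi` are hypothetical helper names, and a smaller `fib(20)` is used so the check finishes quickly.

```python
import os
import threading


def fib(n):
    """Naive recursive Fibonacci, same shape as in the script above."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def make_application(n):
    """Build a WSGI callable that computes fib(n) per request (hypothetical helper)."""
    def application(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        body = 'get response fib({}) = {} / pid = {}, thread-name = {}'.format(
            n, fib(n), os.getpid(), threading.current_thread().name)
        return [body.encode()]
    return application


def call_wsgi(app):
    """Invoke a WSGI app once with a stub environ and capture status and body."""
    captured = {}

    def start_response(status, headers):
        captured['status'] = status
        captured['headers'] = headers

    environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': '/'}  # minimal stub, not a full environ
    body = b''.join(app(environ, start_response))
    return captured['status'], body.decode()


status, body = call_wsgi(make_application(20))
print(status)
print(body)
```

Run outside uWSGI, the thread name will be Python's own (typically `MainThread`) rather than the `uWSGIWorker...` names that uWSGI assigns.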

Verification

First, let's check the default configuration. A single worker process starts.

$ uwsgi --http :8000 --wsgi-file wsgi-app-heavy.py

...
spawned uWSGI worker 1 (and the only) (pid: 21443, cores: 1)

In my environment, a request takes nearly 4 seconds.

$ time curl localhost:8000
get response fib(35) = 9227465 / pid = 21443, thread-name = uWSGIWorker1Core0
real    0m3.794s
user    0m0.012s
sys 0m0.004s

Let's look at the CPU state with mpstat.

$ mpstat -P ALL 1

One CPU core is at 100%.

22時27分35秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
22時27分36秒  all   15.61    0.00    0.75    0.00    0.00    0.12    0.00    0.00    0.00   83.52
22時27分36秒    0    4.08    0.00    0.00    0.00    0.00    1.02    0.00    0.00    0.00   94.90
22時27分36秒    1    2.02    0.00    1.01    0.00    0.00    0.00    0.00    0.00    0.00   96.97
22時27分36秒    2    6.73    0.00    0.96    0.00    0.00    0.00    0.00    0.00    0.00   92.31
22時27分36秒    3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
22時27分36秒    4    3.03    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   96.97
22時27分36秒    5    1.01    0.00    3.03    0.00    0.00    0.00    0.00    0.00    0.00   95.96
22時27分36秒    6    4.95    0.00    0.99    0.00    0.00    0.00    0.00    0.00    0.00   94.06
22時27分36秒    7    2.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00   97.00
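Picking the busy core out of this output by eye is easy with 8 CPUs, but it can also be scripted. Here is a rough sketch that parses a few rows copied from the output above. The function name `busy_cpus` and the 10% idle threshold are my own choices, and it assumes the timestamp is a single whitespace-free token, as in this locale.

```python
# A few per-CPU rows copied from the mpstat output above.
SAMPLE = """\
22時27分36秒  all   15.61    0.00    0.75    0.00    0.00    0.12    0.00    0.00    0.00   83.52
22時27分36秒    2    6.73    0.00    0.96    0.00    0.00    0.00    0.00    0.00    0.00   92.31
22時27分36秒    3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
22時27分36秒    4    3.03    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   96.97
"""


def busy_cpus(text, idle_threshold=10.0):
    """Return the CPU numbers whose %idle (last column) is below the threshold."""
    busy = []
    for line in text.splitlines():
        fields = line.split()
        # Expect: timestamp, CPU number, then 10 percentage columns.
        # Header rows and the "all" summary row are skipped.
        if len(fields) != 12 or not fields[1].isdigit():
            continue
        if float(fields[-1]) < idle_threshold:
            busy.append(int(fields[1]))
    return busy


print(busy_cpus(SAMPLE))
```

In locales where mpstat prints the time as two tokens (for example with an AM/PM suffix), the field count would need adjusting.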

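In this state there is only one worker process with a single thread, so requests are handled one at a time. Sending two requests back to back shows what that means (I ran curl in quick succession from separate terminals).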

$ time curl localhost:8000
get response fib(35) = 9227465 / pid = 21443, thread-name = uWSGIWorker1Core0
real    0m3.789s
user    0m0.008s
sys 0m0.004s


$ time curl localhost:8000
get response fib(35) = 9227465 / pid = 21443, thread-name = uWSGIWorker1Core0
real    0m7.208s
user    0m0.004s
sys 0m0.003s

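The request sent second is noticeably slower, about 7.2 seconds against 3.8, because it had to wait for the first one to finish.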
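The timing above is what a simple first-come, first-served queue would predict. As a rough illustration (a toy model, not a description of uWSGI internals), the sketch below simulates requests arriving at a fixed number of identical workers. The 3.8-second service time is taken from the measurement above. The 0.4-second gap between the two curl commands is an assumption on my part.

```python
import heapq


def simulate(arrivals, service_time, workers):
    """Return each request's latency when `workers` workers serve a FIFO queue."""
    free_at = [0.0] * workers  # min-heap: when each worker next becomes free
    heapq.heapify(free_at)
    latencies = []
    for arrival in sorted(arrivals):
        # The request starts when it has arrived AND the earliest worker is free.
        start = max(arrival, heapq.heappop(free_at))
        finish = start + service_time
        heapq.heappush(free_at, finish)
        latencies.append(finish - arrival)
    return latencies


# Two requests, the second sent 0.4 s after the first (assumed gap).
one_worker = simulate([0.0, 0.4], service_time=3.8, workers=1)
four_workers = simulate([0.0, 0.4], service_time=3.8, workers=4)
print('1 worker :', one_worker)
print('4 workers:', four_workers)
```

With one worker the second request's latency is its own service time plus the remaining time of the first. With enough workers both requests see only their own service time.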

Now let's increase the number of processes to 4.

$ uwsgi --http :8000 --master --processes 4 --wsgi-file wsgi-app-heavy.py

...
spawned uWSGI master process (pid: 29100)
spawned uWSGI worker 1 (pid: 29101, cores: 1)
spawned uWSGI worker 2 (pid: 29102, cores: 1)
spawned uWSGI worker 3 (pid: 29103, cores: 1)
spawned uWSGI worker 4 (pid: 29104, cores: 1)
spawned uWSGI http 1 (pid: 29105)

In this state, I send requests from separate terminals again.

$ time curl localhost:8000
get response fib(35) = 9227465 / pid = 29104, thread-name = uWSGIWorker4Core0
real    0m3.945s
user    0m0.006s
sys 0m0.006s


$ time curl localhost:8000
get response fib(35) = 9227465 / pid = 29103, thread-name = uWSGIWorker3Core0
real    0m3.965s
user    0m0.007s
sys 0m0.000s

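This time the request sent second finished in almost the same time as the first. The responses also show that different workers (different pids) handled them.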

Looking at CPU usage, two cores are busy.

23時25分24秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
23時25分25秒  all   27.81    0.00    0.62    0.37    0.00    0.37    0.00    0.00    0.00   70.82
23時25分25秒    0  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
23時25分25秒    1    5.10    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   94.90
23時25分25秒    2    2.00    0.00    1.00    0.00    0.00    1.00    0.00    0.00    0.00   96.00
23時25分25秒    3    1.94    0.00    1.94    0.00    0.00    1.94    0.00    0.00    0.00   94.17
23時25分25秒    4    4.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00   95.00
23時25分25秒    5    5.94    0.00    0.99    0.00    0.00    0.00    0.00    0.00    0.00   93.07
23時25分25秒    6    2.97    0.00    0.99    2.97    0.00    0.00    0.00    0.00    0.00   93.07
23時25分25秒    7  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

So it does use multiple CPUs.
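The same effect can be reproduced without a web server. The sketch below is only an analogy for pre-forked workers, not uWSGI's implementation: it runs the Fibonacci function in a pool of 4 processes and prints which pid computed each result. It assumes a Unix-like OS where the `fork` start method is available, and `work` is a hypothetical helper name.

```python
import multiprocessing
import os


def fib(n):
    """Naive recursive Fibonacci, used here purely as CPU-bound work."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def work(n):
    """Compute fib(n) and report which process did it."""
    return os.getpid(), fib(n)


# Assumption: the "fork" start method exists (Linux and other Unix-like systems).
ctx = multiprocessing.get_context('fork')
with ctx.Pool(processes=4) as pool:
    results = pool.map(work, [22] * 4)

for pid, value in results:
    print('pid = {}, fib(22) = {}'.format(pid, value))
```

With such small tasks, one pool process may pick up more than one of them, so the printed pids are not guaranteed to be all different. Watching `mpstat -P ALL 1` with a larger argument makes the per-core load visible.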

Checking with Apache Bench

Next, a rough measurement with Apache Bench.

First, the default configuration.

$ uwsgi --http :8000 --wsgi-file wsgi-app-heavy.py

I run it with a concurrency of 4 and 60 requests in total.

$ ab -n 60 -c 4 http://localhost:8000/

As expected, only one CPU is in use.

22時43分39秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
22時43分40秒  all   15.53    0.00    0.62    0.00    0.00    0.00    0.00    0.00    0.00   83.85
22時43分40秒    0    3.88    0.00    1.94    0.00    0.00    0.00    0.00    0.00    0.00   94.17
22時43分40秒    1    3.92    0.00    0.98    0.00    0.00    0.00    0.00    0.00    0.00   95.10
22時43分40秒    2  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
22時43分40秒    3    3.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   97.00
22時43分40秒    4    4.95    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   95.05
22時43分40秒    5    3.92    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   96.08
22時43分40秒    6    2.97    0.00    1.98    0.00    0.00    0.00    0.00    0.00    0.00   95.05
22時43分40秒    7    3.03    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   96.97

Results.

$ ab -n 60 -c 4 http://localhost:8000/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        
Server Hostname:        localhost
Server Port:            8000

Document Path:          /
Document Length:        77 bytes

Concurrency Level:      4
Time taken for tests:   239.875 seconds
Complete requests:      60
Failed requests:        0
Total transferred:      7320 bytes
HTML transferred:       4620 bytes
Requests per second:    0.25 [#/sec] (mean)
Time per request:       15991.695 [ms] (mean)
Time per request:       3997.924 [ms] (mean, across all concurrent requests)
Transfer rate:          0.03 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       0
Processing:  3893 15591 1966.2  15961   17082
Waiting:     3893 15591 1966.2  15961   17081
Total:       3893 15591 1966.1  15961   17082

Percentage of the requests served within a certain time (ms)
  50%  15961
  66%  16138
  75%  16294
  80%  16327
  90%  16526
  95%  16940
  98%  16995
  99%  17082
 100%  17082 (longest request)

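The run took about 240 seconds, and the mean time per request is about 16 seconds, since four concurrent requests queue up behind a single worker.

Next, let's set the number of processes to 2.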

$ uwsgi --http :8000 --master --processes 2 --wsgi-file wsgi-app-heavy.py

The Apache Bench settings stay the same.

$ ab -n 60 -c 4 http://localhost:8000/

Now two CPUs are in use.

23時26分25秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
23時26分26秒  all   27.32    0.00    0.50    0.13    0.00    0.00    0.00    0.00    0.00   72.06
23時26分26秒    0    4.04    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   95.96
23時26分26秒    1  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
23時26分26秒    2    2.02    0.00    1.01    0.00    0.00    0.00    0.00    0.00    0.00   96.97
23時26分26秒    3    2.97    0.00    1.98    0.00    0.00    0.00    0.00    0.00    0.00   95.05
23時26分26秒    4    3.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00   96.00
23時26分26秒    5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
23時26分26秒    6    3.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   97.00
23時26分26秒    7    2.06    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   97.94

Results. The run time dropped to about 130 seconds, roughly half.

$ ab -n 60 -c 4 http://localhost:8000/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        
Server Hostname:        localhost
Server Port:            8000

Document Path:          /
Document Length:        77 bytes

Concurrency Level:      4
Time taken for tests:   128.442 seconds
Complete requests:      60
Failed requests:        0
Total transferred:      7320 bytes
HTML transferred:       4620 bytes
Requests per second:    0.47 [#/sec] (mean)
Time per request:       8562.793 [ms] (mean)
Time per request:       2140.698 [ms] (mean, across all concurrent requests)
Transfer rate:          0.06 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       0
Processing:  4619 8412 863.8   8341   10004
Waiting:     4619 8412 863.8   8341   10004
Total:       4619 8412 863.8   8341   10004

Percentage of the requests served within a certain time (ms)
  50%   8341
  66%   8448
  75%   8794
  80%   8885
  90%   9484
  95%   9711
  98%   9847
  99%  10004
 100%  10004 (longest request)

Finally, let's try 4 processes.

$ uwsgi --http :8000 --master --processes 4 --wsgi-file wsgi-app-heavy.py

CPU usage goes up further, with four cores at 100%.

23時29分09秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
23時29分10秒  all   54.10    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00   44.90
23時29分10秒    0    7.07    0.00    1.01    0.00    0.00    0.00    0.00    0.00    0.00   91.92
23時29分10秒    1  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
23時29分10秒    2   12.62    0.00    2.91    0.00    0.00    0.00    0.00    0.00    0.00   84.47
23時29分10秒    3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
23時29分10秒    4    8.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00   91.00
23時29分10秒    5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
23時29分10秒    6    6.86    0.00    2.94    0.00    0.00    0.00    0.00    0.00    0.00   90.20
23時29分10秒    7  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

The run time dropped further, to about 78 seconds.

$ ab -n 60 -c 4 http://localhost:8000/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        
Server Hostname:        localhost
Server Port:            8000

Document Path:          /
Document Length:        77 bytes

Concurrency Level:      4
Time taken for tests:   77.810 seconds
Complete requests:      60
Failed requests:        0
Total transferred:      7320 bytes
HTML transferred:       4620 bytes
Requests per second:    0.77 [#/sec] (mean)
Time per request:       5187.320 [ms] (mean)
Time per request:       1296.830 [ms] (mean, across all concurrent requests)
Transfer rate:          0.09 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:  4685 5175 370.2   5105    5893
Waiting:     4685 5175 370.2   5105    5893
Total:       4685 5175 370.2   5105    5893

Percentage of the requests served within a certain time (ms)
  50%   5105
  66%   5296
  75%   5435
  80%   5559
  90%   5782
  95%   5834
  98%   5852
  99%   5893
 100%   5893 (longest request)
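To put the three runs side by side, here is a small calculation of speedup and parallel efficiency. The elapsed times are copied from the "Time taken for tests" lines of the three ab reports above. "Efficiency" here simply means speedup divided by the number of processes, which is my own framing rather than something ab reports.

```python
# Total run times in seconds, copied from the three ab reports above.
elapsed = {1: 239.875, 2: 128.442, 4: 77.810}

baseline = elapsed[1]
# Speedup relative to the single-process run.
speedup = {procs: baseline / t for procs, t in elapsed.items()}
# Fraction of the ideal (linear) speedup that was actually achieved.
efficiency = {procs: s / procs for procs, s in speedup.items()}

for procs in sorted(elapsed):
    print('{} process(es): {:.1f}s, speedup x{:.2f}, efficiency {:.0%}'.format(
        procs, elapsed[procs], speedup[procs], efficiency[procs]))
```

The 4-process run comes out at roughly 3x rather than 4x. That matches the ab reports, where the mean time per request with 4 processes is about 5.2 seconds against a minimum of about 3.9 seconds in the single-process run. I did not investigate the cause.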

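This was only a rough check with Apache Bench and curl, but it confirmed that increasing the number of uWSGI processes lets the server use multiple CPUs and handle more concurrent requests.

That said, the test scenario was matched to the final number of processes (a concurrency of 4 against 4 processes), and the application itself is extremely CPU-bound, so this is a somewhat extreme example.

Still, it confirmed that increasing the number of processes has an effect, so that's good enough for this time.

Incidentally, I also tried once increasing the number of threads in addition to the number of processes, and that actually made things slower. When there is little I/O, adding threads just becomes overhead.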
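I did not keep numbers from that threaded run, but the general effect is easy to see outside uWSGI. On standard CPython, the global interpreter lock lets only one thread execute Python bytecode at a time, so threads do not speed up purely CPU-bound work. The sketch below times the same four Fibonacci computations sequentially and then in four threads. The helper name `timed` and the problem size are arbitrary choices for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fib(n):
    """Naive recursive Fibonacci, used here purely as CPU-bound work."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def timed(label, func):
    """Run func(), print how long it took, and return its result."""
    start = time.perf_counter()
    result = func()
    print('{}: {:.3f}s'.format(label, time.perf_counter() - start))
    return result


N, TASKS = 22, 4


def run_threaded():
    # Four threads, each computing fib(N) once.
    with ThreadPoolExecutor(max_workers=TASKS) as executor:
        return list(executor.map(fib, [N] * TASKS))


sequential = timed('sequential', lambda: [fib(N) for _ in range(TASKS)])
threaded = timed('4 threads ', run_threaded)
```

On a regular CPython build I would expect the two timings to be similar, with the threaded one often slightly worse. Free-threaded builds of Python may behave differently.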
