
RHOSP13 - Director Tuning for Deployments of 500+ Nodes

zzerog 2021. 5. 26. 17:58

https://www.redhat.com/en/blog/scaling-red-hat-openstack-platform-more-500-overcloud-nodes

 


Once the number of compute nodes deployed from the director passed roughly 150, a number of issues started to show up when deploying new nodes.
The notes below pull out the parts of the article above that seemed most useful.

 

Keystone

  • /etc/keystone/keystone.conf
    • We raised the number of Keystone admin workers to 32 and main workers to 24
    • By default, these are set to half the number of CPUs allocated to the director node
[root@rhosp-director ~]# vi /etc/keystone/keystone.conf
admin_workers=32
public_workers=24

 

  • /etc/httpd/conf.d/10-keystone_wsgi_admin.conf
    • Change the number of processes to 32
  • /etc/httpd/conf.d/10-keystone_wsgi_main.conf
    • Change the number of processes to 24
[root@rhosp-director ~]# vi /etc/httpd/conf.d/10-keystone_wsgi_admin.conf
  WSGIDaemonProcess keystone_admin display-name=keystone-admin group=keystone processes=32 threads=1 user=keystone

[root@rhosp-director ~]# vi /etc/httpd/conf.d/10-keystone_wsgi_main.conf
  WSGIDaemonProcess keystone_main display-name=keystone-main group=keystone processes=24 threads=1 user=keystone
"Keystone processes do not take a substantial amount of memory,  so it is safe to increase the process count. Even with 32 processes of admin workers, keystone admin takes around 3-4 GB of memory and with 24 processes, Keystone main takes around 2-3 GB of RSS memory."

 

  • We also had to enable caching with memcached to improve Keystone performance (configure memcached as the Keystone cache backend)
[root@rhosp-director ~]# vi /etc/keystone/keystone.conf
[cache]
enabled = true
backend = dogpile.cache.memcached
  • Set the notification driver to noop
[root@rhosp-director ~]# vi /etc/keystone/keystone.conf
[oslo_messaging_notifications]
driver=noop
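
The dogpile.cache.memcached backend needs a memcached instance running on the undercloud, and Keystone has to be restarted (via httpd) to pick up both the [cache] and notification settings. A minimal sketch, assuming the memcached service shipped with the undercloud listening on its defaults:
[root@rhosp-director ~]# systemctl enable --now memcached
[root@rhosp-director ~]# systemctl restart httpd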

Heat

  • /etc/heat/heat.conf
    • num_engine_workers=48
    • executor_thread_pool_size = 48
    • rpc_response_timeout=1200
[root@rhosp-director ~]# vi /etc/heat/heat.conf
num_engine_workers=48
executor_thread_pool_size = 48
rpc_response_timeout=1200
  • Enable caching in /etc/heat/heat.conf
[root@rhosp-director ~]# vi /etc/heat/heat.conf
[cache]
backend = dogpile.cache.memcached
enabled = true
memcache_servers = 127.0.0.1
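
heat-engine and heat-api only read these options at startup, so they need a restart. A sketch assuming the default (non-containerized) OSP13 undercloud service names:
[root@rhosp-director ~]# systemctl restart openstack-heat-engine openstack-heat-api openstack-heat-api-cfn
[root@rhosp-director ~]# ps -ef | grep -c '[h]eat-engine'   # roughly num_engine_workers worker processes expected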

MySQL

  • /etc/my.cnf.d/galera.cnf
[root@rhosp-director ~]# vi /etc/my.cnf.d/galera.cnf
[mysqld]
innodb_buffer_pool_size = 5G
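
innodb_buffer_pool_size is only read at startup, so MariaDB has to be restarted; afterwards the value can be verified from the running server (5G = 5368709120 bytes). A sketch assuming the default mariadb service name and local socket access as root:
[root@rhosp-director ~]# systemctl restart mariadb
[root@rhosp-director ~]# mysql -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"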

Neutron

  • /etc/neutron/neutron.conf
[root@rhosp-director ~]# vi /etc/neutron/neutron.conf
notification_driver=noop
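
As with the other services, neutron-server has to be restarted for the change to take effect. A sketch with the assumed default OSP13 undercloud service name:
[root@rhosp-director ~]# systemctl restart neutron-server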

Ironic

  • /etc/ironic/ironic.conf
    • This reduces the CPU usage of ironic-conductor
[root@rhosp-director ~]# vi /etc/ironic/ironic.conf
sync_power_state_interval = 180
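
sync_power_state_interval is in seconds (the upstream default is 60), so 180 means each node's power state is synced every three minutes instead of every minute. ironic-conductor needs a restart to apply it; the service name below is the assumed OSP13 undercloud default:
[root@rhosp-director ~]# systemctl restart openstack-ironic-conductor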

Mistral

  • /etc/mistral/mistral.conf
    • The execution_field_size_limit_kb value needs to be increased (there does not appear to be a single fixed value; increase it to suit your environment)
[root@rhosp-director ~]# vi /etc/mistral/mistral.conf
[DEFAULT]
rpc_response_timeout=600

[engine]
execution_field_size_limit_kb=32768
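
The Mistral services read these values at startup, so they need a restart as well. A sketch with the assumed default OSP13 undercloud service names:
[root@rhosp-director ~]# systemctl restart openstack-mistral-engine openstack-mistral-executor openstack-mistral-api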

Nova

  • /etc/nova/nova.conf
[root@rhosp-director ~]# vi /etc/nova/nova.conf
[oslo_messaging_notifications]
driver=noop
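
The undercloud Nova services also need a restart to pick up the noop driver. A sketch with the assumed default OSP13 undercloud service names:
[root@rhosp-director ~]# systemctl restart openstack-nova-api openstack-nova-conductor openstack-nova-scheduler openstack-nova-compute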

 

 

*Tip

> In OpenStack Queens, director/TripleO defaults to use an agent running on each overcloud node called os-collect-config. This agent periodically polls the undercloud Heat API for software configuration changes that need to be applied to the node. The os-collect-config agent runs os-refresh-config and os-apply-config as needed whenever new software configuration changes are detected. 

On compute nodes, the os-collect-config service runs and periodically polls the director's Heat API.
[root@rhosp-comp-1 ~]# systemctl status os-*
● os-collect-config.service - Collect metadata and run hook commands.
   Loaded: loaded (/usr/lib/systemd/system/os-collect-config.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2021-05-18 21:48:15 KST; 1 weeks 0 days ago
 Main PID: 2292 (os-collect-conf)
    Tasks: 1
   Memory: 161.5M
   CGroup: /system.slice/os-collect-config.service
           └─2292 /usr/bin/python /usr/bin/os-collect-config

 

> To add 1 compute node to a 500+ node overcloud using the default Heat/os-collect-config method, the stack update took approximately 68 minutes. The time taken for stack update as well as the amount of CPU resources consumed by the heat-engine can be significantly cut down by passing the --skip-deploy-identifier flag to the overcloud deploy which prevents puppet from running on previously deployed nodes where no changes are required. In this example, the time taken to add 1 compute node was reduced from 68 to 61 minutes along with reduced CPU usage by heat-engine.
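
A hedged example of passing the flag; the environment files are placeholders and should match whatever was used for the original deployment:
[stack@rhosp-director ~]$ openstack overcloud deploy --templates \
    -e /home/stack/templates/node-info.yaml \
    -e /home/stack/templates/overcloud-custom.yaml \
    --skip-deploy-identifier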
