Source: Scaling Red Hat OpenStack Platform to more than 500 Overcloud Nodes (Red Hat blog)
https://www.redhat.com/en/blog/scaling-red-hat-openstack-platform-more-500-overcloud-nodes
Around the time the number of compute nodes deployed through the director passed 150, we started running into various issues when deploying additional nodes.
Below is a summary of the points from the article above that seemed most relevant.
keystone
- /etc/keystone/keystone.conf
- We raised the number of Keystone admin workers to 32 and main workers to 24
- The default is half the number of CPUs assigned to the director node
[root@rhosp-director ~]# vi /etc/keystone/keystone.conf
admin_workers=32
public_workers=24
- /etc/httpd/conf.d/10-keystone_wsgi_admin.conf
- Change the number of processes to 32
- /etc/httpd/conf.d/10-keystone_wsgi_main.conf
- Change the number of processes to 24
[root@rhosp-director ~]# vi /etc/httpd/conf.d/10-keystone_wsgi_admin.conf
WSGIDaemonProcess keystone_admin display-name=keystone-admin group=keystone processes=32 threads=1 user=keystone
[root@rhosp-director ~]# vi /etc/httpd/conf.d/10-keystone_wsgi_main.conf
WSGIDaemonProcess keystone_main display-name=keystone-main group=keystone processes=24 threads=1 user=keystone
"Keystone processes do not take a substantial amount of memory, so it is safe to increase the process count. Even with 32 processes of admin workers, keystone admin takes around 3-4 GB of memory and with 24 processes, Keystone main takes around 2-3 GB of RSS memory."
- We also had to enable caching with memcached to improve Keystone performance (configure memcached as the Keystone cache backend).
[root@rhosp-director ~]# vi /etc/keystone/keystone.conf
[cache]
enabled = true
backend = dogpile.cache.memcached
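To check that Keystone is actually writing to memcached after this change, the memcached counters should start moving as tokens are validated; a quick check, assuming memcached listens on the default 127.0.0.1:11211 on the undercloud:
[root@rhosp-director ~]# echo -e 'stats\nquit' | nc 127.0.0.1 11211 | egrep 'get_hits|curr_items'
Both get_hits and curr_items should climb once Keystone starts serving cached data.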
- Set the notification driver to noop
[root@rhosp-director ~]# vi /etc/keystone/keystone.conf
[oslo_messaging_notifications]
driver=noop
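All of the .conf edits in this post can also be applied non-interactively, which helps when repeating them across undercloud rebuilds. A sketch using crudini (adjust if it is not installed on your undercloud), equivalent to the Keystone edits above; the same pattern works for the heat, neutron, ironic, mistral, and nova settings below:
[root@rhosp-director ~]# crudini --set /etc/keystone/keystone.conf cache enabled true
[root@rhosp-director ~]# crudini --set /etc/keystone/keystone.conf cache backend dogpile.cache.memcached
[root@rhosp-director ~]# crudini --set /etc/keystone/keystone.conf oslo_messaging_notifications driver noop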
Heat
- /etc/heat/heat.conf
- num_engine_workers=48
- executor_thread_pool_size = 48
- rpc_response_timeout=1200
[root@rhosp-director ~]# vi /etc/heat/heat.conf
num_engine_workers=48
executor_thread_pool_size = 48
rpc_response_timeout=1200
- Enable caching in /etc/heat/heat.conf
[root@rhosp-director ~]# vi /etc/heat/heat.conf
[cache]
backend = dogpile.cache.memcached
enabled = true
memcache_servers = 127.0.0.1
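Heat reads these values at startup, so the undercloud heat services need a restart; the unit names below assume the non-containerized Queens undercloud and may differ in other releases:
[root@rhosp-director ~]# systemctl restart openstack-heat-engine openstack-heat-api openstack-heat-api-cfn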
MySQL
- /etc/my.cnf.d/galera.cnf
[root@rhosp-director ~]# vi /etc/my.cnf.d/galera.cnf
[mysqld]
innodb_buffer_pool_size = 5G
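On the MariaDB version shipped with this release the buffer pool size is typically not resizable at runtime, so a mariadb restart is needed; pick a quiet moment, since every undercloud service depends on it. Afterwards the value (reported in bytes) can be confirmed, assuming root can log in locally:
[root@rhosp-director ~]# systemctl restart mariadb
[root@rhosp-director ~]# mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';"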
Neutron
- /etc/neutron/neutron.conf
[root@rhosp-director ~]# vi /etc/neutron/neutron.conf
notification_driver=noop
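As with the other services, neutron-server only picks this up after a restart (default undercloud unit name assumed):
[root@rhosp-director ~]# systemctl restart neutron-server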
Ironic
- /etc/ironic/ironic.conf
- This reduces the CPU usage of ironic-conductor
[root@rhosp-director ~]# vi /etc/ironic/ironic.conf
sync_power_state_interval = 180
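The interval is in seconds, so the conductor reconciles node power state every 3 minutes instead of the default 60 seconds; restart it to apply the change (default unit name assumed):
[root@rhosp-director ~]# systemctl restart openstack-ironic-conductor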
Mistral
- /etc/mistral/mistral.conf
- The execution_field_size_limit_kb value needs to be increased. (There does not appear to be one fixed value; raise it to suit your environment.)
[root@rhosp-director ~]# vi /etc/mistral/mistral.conf
[DEFAULT]
rpc_response_timeout=600
[engine]
execution_field_size_limit_kb=32768
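The mistral services also read these values at startup; assuming the default Queens undercloud unit names:
[root@rhosp-director ~]# systemctl restart openstack-mistral-api openstack-mistral-engine openstack-mistral-executor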
Nova
- /etc/nova/nova.conf
[root@rhosp-director ~]# vi /etc/nova/nova.conf
[oslo_messaging_notifications]
driver=noop
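Same idea for the undercloud nova services (unit names again assume the non-containerized Queens undercloud):
[root@rhosp-director ~]# systemctl restart openstack-nova-api openstack-nova-scheduler openstack-nova-conductor openstack-nova-compute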
*Tip
> In OpenStack Queens, director/TripleO defaults to use an agent running on each overcloud node called os-collect-config. This agent periodically polls the undercloud Heat API for software configuration changes that need to be applied to the node. The os-collect-config agent runs os-refresh-config and os-apply-config as needed whenever new software configuration changes are detected.
Each compute node runs the os-collect-config service, which periodically polls the Heat API on the director.
[root@rhosp-comp-1 ~]# systemctl status os-*
● os-collect-config.service - Collect metadata and run hook commands.
Loaded: loaded (/usr/lib/systemd/system/os-collect-config.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2021-05-18 21:48:15 KST; 1 weeks 0 days ago
Main PID: 2292 (os-collect-conf)
Tasks: 1
Memory: 161.5M
CGroup: /system.slice/os-collect-config.service
└─2292 /usr/bin/python /usr/bin/os-collect-config
> To add 1 compute node to a 500+ node overcloud using the default Heat/os-collect-config method, the stack update took approximately 68 minutes. The time taken for stack update as well as the amount of CPU resources consumed by the heat-engine can be significantly cut down by passing the --skip-deploy-identifier flag to the overcloud deploy which prevents puppet from running on previously deployed nodes where no changes are required. In this example, the time taken to add 1 compute node was reduced from 68 to 61 minutes along with reduced CPU usage by heat-engine.
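In practice this means re-running the same overcloud deploy command used for the original deployment and appending the flag; the environment-file list below is only a placeholder for whatever the existing deployment already passes:
[stack@rhosp-director ~]$ openstack overcloud deploy --templates \
    -e <your existing environment files> \
    --skip-deploy-identifier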