5. Testing resource failover
For these tests, migration-threshold is set to 3.
- Case 1. Active node failure
- If the Active node suffers a physical or logical server/network failure, the heartbeat check from the Standby node fails and the resources fail over to the Standby node.
- To simulate the failure, stop the Heartbeat daemon on the Active node.
-
# /etc/init.d/heartbeat stop
Stopping High-Availability services: [ OK ]
- The resources (VIP, haproxy) fail over to the Standby node.
- The Active node is shown as "OFFLINE".
-
============
Last updated: Thu Feb 21 13:17:53 2013
Stack: Heartbeat
Current DC: STD (7b640fc0-0089-4db9-8c83-e34476c5e065) - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, unknown expected votes
1 Resources configured.
============
Node ACT (07e78f2b-c4cc-42e6-b058-d7e0bd3a91e5):OFFLINE
Node STD (7b640fc0-0089-4db9-8c83-e34476c5e065): online
    haproxy (lsb:haproxy) Started
    VIP_192.168.0.10 (ocf::heartbeat:IPaddr) Started
Inactive resources:
Migration summary:
* Node STD:
- After the Active node is recovered
- The Active node returns to the "online" state.
-
============
Last updated: Thu Feb 21 13:21:11 2013
Stack: Heartbeat
Current DC: STD (7b640fc0-0089-4db9-8c83-e34476c5e065) - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, unknown expected votes
1 Resources configured.
============
Node ACT (07e78f2b-c4cc-42e6-b058-d7e0bd3a91e5): online
Node STD (7b640fc0-0089-4db9-8c83-e34476c5e065): online
    haproxy (lsb:haproxy) Started
    VIP_192.168.0.10 (ocf::heartbeat:IPaddr) Started
Inactive resources:
Migration summary:
* Node STD:
* Node ACT:
- Resource failback
- The cluster is configured for manual failback, so the resources have to be moved back by hand.
-
# crm
crm(live)# resource
crm(live)resource# restart LB
INFO: ordering LB to stop
INFO: ordering LB to start
crm(live)resource#
- Recovery complete (differences from the pre-failure state)
- The Current DC (Designated Coordinator: the cluster master) has changed: ACT -> STD
- Because the Active node had failed, the Standby node became the cluster master.
-
============
Last updated: Thu Feb 21 13:23:30 2013
Stack: Heartbeat
Current DC: STD (7b640fc0-0089-4db9-8c83-e34476c5e065) - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, unknown expected votes
1 Resources configured.
============
Node ACT (07e78f2b-c4cc-42e6-b058-d7e0bd3a91e5): online
    VIP_192.168.0.10 (ocf::heartbeat:IPaddr) Started
    haproxy (lsb:haproxy) Started
Node STD (7b640fc0-0089-4db9-8c83-e34476c5e065): online
Inactive resources:
Migration summary:
* Node STD:
* Node ACT:
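When repeating the Case 1 test, it can help to check node state and resource placement with a script rather than by reading the status screen. The following is a minimal Python sketch, assuming the status text is laid out one item per line with each node's resources listed beneath it; the sample is the post-failover status from this test. `parse_crm_mon` and `where_is` are hypothetical helper names for illustration, not part of Pacemaker.

```python
import re

# "Node ACT (uuid): online" or "Node ACT (uuid):OFFLINE"
NODE_RE = re.compile(r"^Node\s+(\S+)\s+\([^)]*\):\s*(\S+)$")
# "haproxy (lsb:haproxy) Started"
RSC_RE = re.compile(r"^(\S+)\s+\(([^)]+)\)\s+(\S+)$")

SAMPLE = """\
Node ACT (07e78f2b-c4cc-42e6-b058-d7e0bd3a91e5):OFFLINE
Node STD (7b640fc0-0089-4db9-8c83-e34476c5e065): online
    haproxy (lsb:haproxy) Started
    VIP_192.168.0.10 (ocf::heartbeat:IPaddr) Started
Inactive resources:
Migration summary:
* Node STD:
"""


def parse_crm_mon(text):
    """Return {node: {"status": str, "resources": {name: state}}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Inactive resources"):
            break  # everything after this is not per-node placement
        m = NODE_RE.match(line)
        if m:
            current = m.group(1)
            nodes[current] = {"status": m.group(2).lower(), "resources": {}}
            continue
        m = RSC_RE.match(line)
        if m and current is not None:
            nodes[current]["resources"][m.group(1)] = m.group(3)
    return nodes


def where_is(nodes, resource):
    """Return the node that lists the resource, or None if no node does."""
    for name, info in nodes.items():
        if resource in info["resources"]:
            return name
    return None


if __name__ == "__main__":
    status = parse_crm_mon(SAMPLE)
    print(status["ACT"]["status"], where_is(status, "haproxy"))
```

In a real test run the text would come from the cluster's status output instead of the embedded sample.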
- Case 2. haproxy daemon shutdown
- If the haproxy daemon dies, the cluster restarts it up to (migration-threshold - 1) times and increments fail-count by one for each failure.
- When fail-count == migration-threshold, the resource is migrated to another node (here, the Standby node) regardless of whether the node itself has failed.
- fail-count = 2
-
============
Last updated: Thu Feb 21 13:32:26 2013
Stack: Heartbeat
Current DC: STD (7b640fc0-0089-4db9-8c83-e34476c5e065) - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, unknown expected votes
1 Resources configured.
============
Node ACT (07e78f2b-c4cc-42e6-b058-d7e0bd3a91e5): online
    VIP_192.168.0.10 (ocf::heartbeat:IPaddr) Started
    haproxy (lsb:haproxy) Started
Node STD (7b640fc0-0089-4db9-8c83-e34476c5e065): online
Inactive resources:
Migration summary:
* Node STD:
* Node ACT:
   haproxy: migration-threshold=3 fail-count=2
- fail-count = 3
-
============
Last updated: Thu Feb 21 13:33:13 2013
Stack: Heartbeat
Current DC: STD (7b640fc0-0089-4db9-8c83-e34476c5e065) - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, unknown expected votes
1 Resources configured.
============
Node ACT (07e78f2b-c4cc-42e6-b058-d7e0bd3a91e5): online
Node STD (7b640fc0-0089-4db9-8c83-e34476c5e065): online
    haproxy (lsb:haproxy) Started
    VIP_192.168.0.10 (ocf::heartbeat:IPaddr) Started
Inactive resources:
Migration summary:
* Node STD:
* Node ACT:
   haproxy: migration-threshold=3 fail-count=3
Failed actions:
    haproxy_monitor_5000 (node=ACT, call=13, rc=7, status=complete): not running
- Resource failback
- Use the cleanup command to reset fail-count.
-
# crm
crm(live)# resource
crm(live)resource# cleanup LB
Cleaning up VIP_192.168.0.10 on ACT
Cleaning up VIP_192.168.0.10 on STD
Cleaning up haproxy on ACT
Cleaning up haproxy on STD
Waiting for 5 replies from the CRMd.....
crm(live)resource# restart LB
INFO: ordering LB to stop
INFO: ordering LB to start
crm(live)resource#
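The Case 2 behavior (restart in place until fail-count reaches migration-threshold, then move the resource, then clear the counter with cleanup) can be modeled in a few lines. This is a deliberately simplified Python sketch for reasoning about the test, not how Pacemaker is implemented: the real cluster works with placement scores and stickiness, and the names `ResourceState`, `record_failure`, and `cleanup` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceState:
    node: str                      # node currently running the resource
    migration_threshold: int = 3   # value used in this test
    fail_counts: dict = field(default_factory=dict)  # per-node fail-count


def record_failure(state, other_node):
    """Apply one monitor failure and return the resulting action.

    Below the threshold the resource is restarted on the same node;
    once fail-count reaches the threshold it is moved to other_node.
    """
    count = state.fail_counts.get(state.node, 0) + 1
    state.fail_counts[state.node] = count
    if count >= state.migration_threshold:
        state.node = other_node
        return "migrate"
    return "restart"


def cleanup(state):
    """Model of the cleanup step: reset fail-count on every node."""
    state.fail_counts.clear()


if __name__ == "__main__":
    haproxy = ResourceState(node="ACT")
    for _ in range(3):
        print(record_failure(haproxy, "STD"), haproxy.node, haproxy.fail_counts)
    cleanup(haproxy)
```

With migration-threshold = 3, the first two failures yield a restart on ACT and the third moves the resource to STD, matching the fail-count = 2 and fail-count = 3 listings. The model stops at clearing the counter; moving the resource back is the manual restart step shown in the test.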
6. References
- High availability: http://en.wikipedia.org/wiki/High_availability
- Linux-HA: http://www.linux-ha.org/wiki/Main_Page
- HAProxy Configuration Manual: http://cbonte.github.com/haproxy-dconv/configuration-1.4.html
- Pacemaker 1.0 Configuration Explained: http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html-single/Pacemaker_Explained/index.html
- Pacemaker Example configurations: http://clusterlabs.org/wiki/Example_configurations
- Pacemaker wiki: http://clusterlabs.org/wiki/Main_Page
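* haproxy works inline: both inbound and outbound traffic pass through haproxy. With a large number of sessions or high throughput, the load on it can therefore become heavy.
In that case you can use DSR (Direct Server Return) so that responses bypass the load balancer; to implement it, use LVS instead of haproxy.
* DSR is usually implemented at L2. The constraint is that the load balancer and the real nodes must then be on the same VLAN.
That is not a problem in a small infrastructure, but once it grows even a little, you end up hauling servers back and forth every time you expand or reorganize a service.
If you would rather not do that(?), look into L3 DSR. It can be implemented with LVS IP tunneling.
* If you have the budget, use Citrix NetScaler. Use it twice. LOL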
* Further reading
- DSR
> http://blog.daum.net/_blog/BlogTypeView.do?blogid=05q6N&articleno=12760768&_bloghome_menu=recenttext#ajax_history_home
> http://blog.pages.kr/88
- LVS
> http://www.linuxvirtualserver.org/
- L3 DSR
> http://www.nanog.org/meetings/nanog51/presentations/Monday/NANOG51.Talk45.nanog51-Schaumann.pdf
> http://cs.uccs.edu/~scold/iptunnel.htm