
Oracle 12.2 New Feature | Server Weight-Based Node Eviction


Original · 2018-01-26 · Luo Xueyuan (罗雪原) · 数据和云
Preface
Server weight-based node eviction acts as a tie-breaker when Oracle Clusterware has to evict a node or a group of nodes from the cluster and every candidate would otherwise be an equally valid choice for eviction. It uses additional information about the load on the servers to identify which node or group of nodes to evict.

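Introduction to Weight-Based Eviction
Server Weight-Based Node Eviction is a new feature introduced in Oracle 12.2. Before 12.2, when a split-brain occurred, Oracle Clusterware decided which of the mutually isolated subclusters to terminate purely from their sizes, using the following rules:
- If the subclusters differ in size (number of nodes), the larger one wins. For example, with {1} and {2,3,4}, the latter wins and subcluster {1} is evicted;
- If the subclusters are the same size, the one containing the lowest node number wins. For example, with {1,4} and {2,3}, subcluster {1,4} wins and subcluster {2,3} is evicted.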
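The two pre-12.2 rules above can be sketched as a small shell function. This is only an illustration of the rules as the article states them; `pick_survivor` is a hypothetical helper name, not an Oracle tool, and each subcluster is assumed to be passed as a space-separated list of node numbers.

```shell
# Pick the surviving subcluster under the pre-12.2 rules described above.
# Each argument is a space-separated list of node numbers, e.g. "1 4".
# pick_survivor is a hypothetical helper, not part of Oracle Clusterware.
pick_survivor() (
    a=$1
    b=$2
    # Rely on word splitting to count the nodes in each subcluster.
    set -- $a; count_a=$#
    set -- $b; count_b=$#
    if [ "$count_a" -gt "$count_b" ]; then
        echo "$a"
    elif [ "$count_b" -gt "$count_a" ]; then
        echo "$b"
    else
        # Equal size: the subcluster holding the lowest node number wins.
        min_a=$(printf '%s\n' $a | sort -n | head -n 1)
        min_b=$(printf '%s\n' $b | sort -n | head -n 1)
        if [ "$min_a" -le "$min_b" ]; then echo "$a"; else echo "$b"; fi
    fi
)

pick_survivor "1" "2 3 4"   # prints: 2 3 4
pick_survivor "1 4" "2 3"   # prints: 1 4
```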
The new feature in Oracle 12.2 gives us a degree of control over this decision. With that control, the fixed rules above no longer have to take down the subcluster that serves the wider range of applications.

Note:

In the rest of this article, "weight" always refers to the weight used by this feature.
Using the Feature
>>>>
Prerequisites
- Weight assignment takes effect only on admin-managed nodes
- Weights can be assigned to servers or to applications registered with the cluster
>>>>
Usage
- Assigning weight to a server
Use the crsctl set server css_critical yes command
- Assigning weight to a database instance or service
Add the "-css_critical yes" parameter to the srvctl add/modify database or srvctl add/modify service command
- Assigning weight to non-ora.* resources
Add the -attr "CSS_CRITICAL=yes" parameter to the crsctl add/modify resource command
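As a quick reference, the three cases above can be collected in one dry-run helper that only prints the command for each target type. This is a sketch: `css_critical_cmd`, its argument layout, and the resource name `my.app.res` are made up for illustration, while the crsctl and srvctl options are the ones listed above.

```shell
# Print (do not run) the command that marks a target as CSS-critical.
# css_critical_cmd and its arguments are hypothetical; the printed
# crsctl/srvctl options are the ones listed in the usage notes above.
css_critical_cmd() {
    case "$1" in
        server)
            echo "crsctl set server css_critical yes" ;;
        service)
            echo "srvctl modify service -db $2 -service $3 -css_critical yes" ;;
        resource)
            echo "crsctl modify resource $2 -attr \"CSS_CRITICAL=yes\"" ;;
        *)
            echo "usage: css_critical_cmd server|service|resource [names...]" >&2
            return 1 ;;
    esac
}

css_critical_cmd server
css_critical_cmd service milodb milodb_wt_srv2
css_critical_cmd resource my.app.res   # my.app.res is a made-up resource name
```

Printing instead of executing keeps the helper safe to try on a machine without Grid Infrastructure installed.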

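Note:

Some weight assignments take effect only after the cluster stack is restarted, while weights assigned to some resources take effect without restarting the resource.

In our tests so far, not every non-ora.* resource could have the attribute added or modified directly, even though CSS_CRITICAL is visible among its attributes. The current version may simply not provide an interface for changing it yet.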
Testing the Feature
>>>>
Test environment
- Host software:
OS: MacOS 10.11.6

VirtualBox: v5.1.30 r118389 (Qt5.6.3)
- Virtual machines:
OEL (Oracle Enterprise Linux) 6.5, x86_64

Oracle 12.2.0.1 (2-node RAC)

>>>>
Preparation

The VirtualBox VMs use the following network adapters:

Adapter 1 is the public network and adapter 2 is the heartbeat (interconnect) network; they correspond to eth0 and eth1 in OEL 6.5.

To simulate an interconnect outage and its recovery, we wrote the following two scripts on the host:

Milo-Mac:lab milo$ ls -l

total 16

-rwxr--r-- 1 milo staff 111 1 24 21:38 interconnect_down.sh

-rwxr--r-- 1 milo staff 109 1 24 21:39 interconnect_up.sh

Milo-Mac:lab milo$

#### Simulating an interconnect outage:

Milo-Mac:lab milo$ sh interconnect_down.sh

Check the link state of the NICs in the VMs:

[root@rac122a ~]# mii-tool eth0

eth0: no autonegotiation, 100baseTx-FD, link ok

[root@rac122a ~]# mii-tool eth1

eth1: autonegotiation restarted, no link

[root@rac122b ~]# mii-tool eth0

eth0: no autonegotiation, 100baseTx-FD, link ok

[root@rac122b ~]# mii-tool eth1

eth1: autonegotiation restarted, no link

After the script runs, the private NIC eth1 reports "no link", meaning no cable is attached to the NIC, so we treat the interconnect as failed.


#### Restoring the interconnect:
Milo-Mac:lab milo$ sh interconnect_up.sh

Check the link state of the NICs in the VMs:

[root@rac122a ~]# mii-tool eth0

eth0: no autonegotiation, 100baseTx-FD, link ok

[root@rac122a ~]# mii-tool eth1

eth1: no autonegotiation, 100baseTx-FD, link ok

[root@rac122b ~]# mii-tool eth0

eth0: no autonegotiation, 100baseTx-FD, link ok

[root@rac122b ~]# mii-tool eth1
eth1: no autonegotiation, 100baseTx-FD, link ok

After the script runs, the private NIC eth1 reports "link ok", meaning the cable is attached again, so we treat the interconnect as restored. In practice eth1 also has to be brought down and up again before it recovers fully, so the following script must be run on both nodes:

[root@rac122a ~]# sh recover_interconnect.sh

Device state: 3 (disconnected)

Active connection state: activated
Active connection path: /org/freedesktop/NetworkManager/ActiveConnection/2

The scripts used above are:

Milo-Mac:lab milo$ cat interconnect_down.sh

VBoxManage controlvm "12c_rac_node1" setlinkstate2 off

VBoxManage controlvm "12c_rac_node2" setlinkstate2 off

Milo-Mac:lab milo$ cat interconnect_up.sh

VBoxManage controlvm "12c_rac_node1" setlinkstate2 on

VBoxManage controlvm "12c_rac_node2" setlinkstate2 on

[root@rac122a ~]# cat recover_interconnect.sh

ifdown eth1 && ifup eth1

With these tests we can now simulate an interconnect failure and recover from it.

The test scenarios below do not repeat the failure simulation and recovery steps.
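The two host-side scripts differ only in the link state they set, so they could be folded into one function. The following is a minimal sketch; `set_interconnect` is a hypothetical wrapper, while the VM names and the `setlinkstate2` subcommand are taken from the scripts shown above.

```shell
# Toggle the interconnect adapter (adapter 2) on both RAC VMs.
# set_interconnect is a hypothetical wrapper around the two scripts above;
# the VM names and setlinkstate2 come from those scripts.
set_interconnect() {
    state=$1
    case "$state" in
        on|off) ;;
        *) echo "usage: set_interconnect on|off" >&2; return 1 ;;
    esac
    for vm in 12c_rac_node1 12c_rac_node2; do
        # Stop at the first VM that cannot be switched.
        VBoxManage controlvm "$vm" setlinkstate2 "$state" || return 1
    done
}

# Example (run on the VirtualBox host):
#   set_interconnect off   # simulate the interconnect failure
#   set_interconnect on    # reattach the virtual cable
```

After `set_interconnect on`, eth1 still has to be cycled inside each guest with recover_interconnect.sh, as described above.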

>>>>
Test scenarios

Results with no weight set

[oracle@rac122a ~]$ crsctl get server css_critical

CRS-5092: Current value of the server attribute CSS_CRITICAL is no.

[oracle@rac122b ~]$ crsctl get server css_critical

CRS-5092: Current value of the server attribute CSS_CRITICAL is no.

The services that currently exist on the system have no weight set either:

[oracle@rac122b ~]$ crsctl stat res ora.milodb.milodb_srv1.svc -f | grep CSS

CSS_CRITICAL=no

[oracle@rac122b ~]$ crsctl stat res ora.milodb.milodb_srv2.svc -f | grep CSS

CSS_CRITICAL=no
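When checking several nodes, it can help to reduce the CRS-5092 message to a bare yes/no value. The filter below is a sketch that relies only on the message text shown above; `css_critical_value` is a hypothetical helper name.

```shell
# Extract "yes" or "no" from a CRS-5092 message read on stdin.
# css_critical_value is a hypothetical helper; it depends only on the
# message layout shown in the crsctl output above.
css_critical_value() {
    sed -n 's/^CRS-5092: .*CSS_CRITICAL is \([a-z]*\)\.$/\1/p'
}

# Example (on a cluster node):
#   crsctl get server css_critical | css_critical_value
```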

Simulate an interconnect failure

Milo-Mac:lab milo$ sh interconnect_down.sh

Alert log of the cluster on node 1:

2018-01-25 08:17:33.423 [OCSSD(3896)]CRS-1612: Network communication with node rac122b (2) missing for 50% of timeout interval. Removal of this node from cluster in 14.610 seconds

2018-01-25 08:17:41.435 [OCSSD(3896)]CRS-1611: Network communication with node rac122b (2) missing for 75% of timeout interval. Removal of this node from cluster in 6.600 seconds

2018-01-25 08:17:45.438 [OCSSD(3896)]CRS-1610: Network communication with node rac122b (2) missing for 90% of timeout interval. Removal of this node from cluster in 2.600 seconds

2018-01-25 08:17:49.554 [OCSSD(3896)]CRS-1607: Node rac122b is being evicted  in cluster incarnation 412297497; details at (:CSSNM00007:) in /u01/app/grid/diag/crs/rac122a/crs/trace/ocssd.trc.

Alert log of the cluster on node 2:

2018-01-25 08:17:34.128 [OCSSD(25756)]CRS-1612: Network communication with node rac122a (1) missing for 50% of timeout interval. Removal of this node from cluster in 14.450 seconds

2018-01-25 08:17:41.140 [OCSSD(25756)]CRS-1611: Network communication with node rac122a (1) missing for 75% of timeout interval. Removal of this node from cluster in 7.440 seconds

2018-01-25 08:17:46.193 [OCSSD(25756)]CRS-1610: Network communication with node rac122a (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.380 seconds

… (some output omitted)

2018-01-25 08:17:50.820 [OCSSD(25756)]CRS-1608: This node was evicted by node 1, rac122a; details at (:CSSNM00005:) in /u01/app/grid/diag/crs/rac122b/crs/trace/ocssd.trc.

These logs match the earlier split-brain rules: the two subclusters are the same size and node 1 has the lower node number, so node 1 survives and node 2 is evicted from the cluster.
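To pull just the eviction decision out of a long alert log, a small filter is enough. The sketch below depends only on the CRS-1607 message layout visible in the excerpts above; `evicted_nodes` is a hypothetical helper name, and the log is assumed to be supplied on stdin.

```shell
# Print "<date> <time> <node>" for every CRS-1607 eviction record on stdin.
# evicted_nodes is a hypothetical helper; it relies only on the CRS-1607
# message layout shown in the alert log excerpts above.
evicted_nodes() {
    sed -n 's/^\([0-9-]* [0-9:.]*\) .*CRS-1607: Node \([^ ]*\) is being evicted.*/\1 \2/p'
}

# Example: evicted_nodes < alert.log
```

For the node 1 log above this yields `2018-01-25 08:17:49.554 rac122b`.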

Setting a server-level weight on node 2

After restoring the interconnect and bringing the cluster back to a normal state, we set the server-level weight.

[root@rac122a ~]# crsctl get server css_critical

CRS-5092: Current value of the server attribute CSS_CRITICAL is no.

[root@rac122b ~]# crsctl get server css_critical

CRS-5092: Current value of the server attribute CSS_CRITICAL is no.

[root@rac122b ~]# crsctl set server css_critical yes

CRS-4416: Server attribute 'CSS_CRITICAL' successfully changed. Restart Oracle High Availability Services for new value to take effect.

After the change, the message says that OHAS must be restarted for the new value to take effect.

$ srvctl stop instance -d milodb -i milodb2

# crsctl stop crs

# crsctl start crs

Simulate the network failure:

Milo-Mac:lab milo$ sh interconnect_down.sh

Now let us look at the cluster logs:

Alert log of the cluster on node 1:

2018-01-25 09:44:03.671 [OCSSD(3717)]CRS-1612: Network communication with node rac122b (2) missing for 50% of timeout interval. Removal of this node from cluster in 14.890 seconds

2018-01-25 09:44:11.731 [OCSSD(3717)]CRS-1611: Network communication with node rac122b (2) missing for 75% of timeout interval. Removal of this node from cluster in 6.830 seconds

2018-01-25 09:44:15.739 [OCSSD(3717)]CRS-1610: Network communication with node rac122b (2) missing for 90% of timeout interval. Removal of this node from cluster in 2.830 seconds

2018-01-25 09:44:18.573 [OCSSD(3717)]CRS-1609: This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /u01/app/grid/diag/crs/rac122a/crs/trace/ocssd.trc.

2018-01-25 09:44:18.573 [OCSSD(3717)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/grid/diag/crs/rac122a/crs/trace/ocssd.trc

2018-01-25 09:44:18.616 [OCSSD(3717)]CRS-1652: Starting clean up of CRSD resources.

2018-01-25 09:44:20.608 [OCSSD(3717)]CRS-1608: This node was evicted by node 2, rac122b; details at (:CSSNM00005:) in /u01/app/grid/diag/crs/rac122a/crs/trace/ocssd.trc.

Alert log of the cluster on node 2:

2018-01-25 09:44:04.327 [OCSSD(8586)]CRS-1612: Network communication with node rac122a (1) missing for 50% of timeout interval. Removal of this node from cluster in 14.020 seconds

2018-01-25 09:44:11.609 [OCSSD(8586)]CRS-1611: Network communication with node rac122a (1) missing for 75% of timeout interval. Removal of this node from cluster in 6.730 seconds

2018-01-25 09:44:15.611 [OCSSD(8586)]CRS-1610: Network communication with node rac122a (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.730 seconds

2018-01-25 09:44:19.777 [OCSSD(8586)]CRS-1607: Node rac122a is being evicted in cluster incarnation 412336033; details at (:CSSNM00007:) in /u01/app/grid/diag/crs/rac122b/crs/trace/ocssd.trc.

The alert logs show that this time node 1 was evicted from the cluster.

The ocssd.trc file on node 2 contains the following:

[grid@rac122b trace]$ cat ocssd.trc |egrep -i weight| tail -300

…… (some output omitted)

2018-01-25 09:44:18.347 :    CSSD:1494243072: clssnmrCheckNodeWeight: node(1) has weight stamp(0) pebbles (0) goldstars (0) flags (0) SpoolVersion (0)

2018-01-25 09:44:18.347 :    CSSD:1494243072: clssnmrCheckNodeWeight: node(2) has weight stamp(412336032) pebbles (0) goldstars (1) flags (3) SpoolVersion (0)

2018-01-25 09:44:18.347 :    CSSD:1494243072: clssnmrCheckNodeWeight: Server pool version not consistent

2018-01-25 09:44:18.347 :    CSSD:1494243072: clssnmrCheckNodeWeight: stamp(412336032), completed(1/2)

2018-01-25 09:44:18.347 :    CSSD:1494243072: clssnmrCheckSplit: Waiting for node weights, stamp(412336032)

2018-01-25 09:44:19.777 :    CSSD:1494243072: clssnmrCheckNodeWeight: node(2) has weight stamp(412336032) pebbles (0) goldstars (1) flags (3) SpoolVersion (0)

2018-01-25 09:44:19.777 :    CSSD:1494243072: clssnmrCheckNodeWeight: Server pool version not consistent

2018-01-25 09:44:19.777 :    CSSD:1494243072: clssnmrCheckNodeWeight: stamp(412336032), completed(1/1)

2018-01-25 09:44:19.777 :    CSSD:1494243072: clssnmCompareNodeWeights: Best map is same as the cohort map of the current node

2018-01-25 09:44:19.777 :    CSSD:1494243072: clssnmFindBestMap: Using base map(2) of node(1) count(0), low(65535), bestcount(0), best_low(65535), cur_weightpebbles (0) goldstars (0) flags (0) SpoolVersion (0)best_weightpebbles (0) goldstars (0) flags (0) SpoolVersion (0)

2018-01-25 09:44:19.777 :    CSSD:1494243072: clssnmCompareNodeWeights: count(1), low(2), bestcount(0), best_low(65535), cur_weight: pebbles(0) goldstars(1) pubnw(1) flexasm(1)best_weight: pebbles(0) goldstars(0)pubnw(0) flexasm(0)

The trace shows that the CSSD process checks each node's weight (CheckNodeWeight) and then compares the weights (CompareNodeWeights). Node 2 has the greater weight, so node 2's subcluster beats node 1's subcluster, which is why node 1 is the one evicted.
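The weight records can be condensed into one line per node for easier comparison. The filter below is a sketch based only on the clssnmrCheckNodeWeight line layout shown above; `node_weights` is a hypothetical helper name. Reading `goldstars` as the counter that reflects CSS_CRITICAL is an inference from this trace (only node 2, where css_critical was set to yes, reports a non-zero value), not a documented fact.

```shell
# Print "<node> <goldstars> <pebbles>" for each clssnmrCheckNodeWeight
# node record on stdin. node_weights is a hypothetical helper; it relies
# only on the trace line layout shown in the ocssd.trc excerpt above.
node_weights() {
    sed -n 's/.*clssnmrCheckNodeWeight: node(\([0-9]*\)) has weight stamp([0-9]*) pebbles (\([0-9]*\)) goldstars (\([0-9]*\)).*/\1 \3 \2/p'
}

# Example: grep -i weight ocssd.trc | node_weights
```

For the two records logged at 09:44:18.347 this prints `1 0 0` and `2 1 0`.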

Setting a service-level weight on node 2

After restoring the interconnect and bringing the cluster back to a normal state, we set the service-level weight.

First, reset the server-level weight:

[root@rac122b ~]# crsctl get server css_critical

CRS-5092: Current value of the server attribute CSS_CRITICAL is yes.

[root@rac122b ~]# crsctl set server css_critical no

CRS-4416: Server attribute 'CSS_CRITICAL' successfully changed. Restart Oracle High Availability Services for new value to take effect.

[root@rac122b ~]# crsctl get server css_critical

CRS-5092: Current value of the server attribute CSS_CRITICAL is no.

[root@rac122b ~]#

After the change, the message says that OHAS must be restarted for the new value to take effect.

$ srvctl stop instance -d milodb -i milodb2

# crsctl stop crs

# crsctl start crs

Here we add two services that exist only for weight control (they are not used by applications):

srvctl add service -db milodb -service milodb_wt_srv1 -preferred milodb1

srvctl add service -db milodb -service milodb_wt_srv2 -preferred milodb2

srvctl start service -database milodb -service milodb_wt_srv1

srvctl start service -database milodb -service milodb_wt_srv2

The service on instance 1 is left without a weight:

[oracle@rac122b ~]$ crsctl stat res ora.milodb.milodb_wt_srv1.svc -f | grep CSS

CSS_CRITICAL=no

The service on instance 2 is given a weight:

[oracle@rac122b ~]$ crsctl stat res ora.milodb.milodb_wt_srv2.svc -f | grep CSS

CSS_CRITICAL=no

[oracle@rac122b ~]$ srvctl modify service -db milodb -service milodb_wt_srv2 -css_critical yes

[oracle@rac122b ~]$ crsctl stat res ora.milodb.milodb_wt_srv2.svc -f | grep CSS

CSS_CRITICAL=yes

[oracle@rac122b ~]$

After simulating the interconnect failure again, we look at the cluster logs:

Alert log of the cluster on node 1:

2018-01-25 11:33:30.343 [OCSSD(3652)]CRS-1612: Network communication with node rac122b (2) missing for 50% of timeout interval. Removal of this node from cluster in 14.910 seconds

2018-01-25 11:33:38.351 [OCSSD(3652)]CRS-1611: Network communication with node rac122b (2) missing for 75% of timeout interval. Removal of this node from cluster in 6.900 seconds

2018-01-25 11:33:42.576 [OCSSD(3652)]CRS-1610: Network communication with node rac122b (2) missing for 90% of timeout interval. Removal of this node from cluster in 2.680 seconds

2018-01-25 11:33:45.255 [OCSSD(3652)]CRS-1609: This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /u01/app/grid/diag/crs/rac122a/crs/trace/ocssd.trc.

2018-01-25 11:33:45.255 [OCSSD(3652)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/grid/diag/crs/rac122a/crs/trace/ocssd.trc

2018-01-25 11:33:45.269 [OCSSD(3652)]CRS-1652: Starting clean up of CRSD resources.

2018-01-25 11:33:47.289 [OCSSD(3652)]CRS-1608: This node was evicted by node 2, rac122b; details at (:CSSNM00005:) in /u01/app/grid/diag/crs/rac122a/crs/trace/ocssd.trc.

Alert log of the cluster on node 2:

2018-01-25 11:33:29.844 [OCSSD(3983)]CRS-1612: Network communication with node rac122a (1) missing for 50% of timeout interval. Removal of this node from cluster in 14.900 seconds

2018-01-25 11:33:37.946 [OCSSD(3983)]CRS-1611: Network communication with node rac122a (1) missing for 75% of timeout interval. Removal of this node from cluster in 6.790 seconds

2018-01-25 11:33:41.947 [OCSSD(3983)]CRS-1610: Network communication with node rac122a (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.790 seconds

2018-01-25 11:33:46.684 [OCSSD(3983)]CRS-1607: Node rac122a is being evicted in cluster incarnation 412345406; details at (:CSSNM00007:) in /u01/app/grid/diag/crs/rac122b/crs/trace/ocssd.trc.

2018-01-25 11:33:47.755 [ORAAGENT(5634)]CRS-5818: Aborted command 'check' for resource 'ora.SYSTEMDG.dg'. Details at (:CRSAGF00113:) {0:1:11} in /u01/app/grid/diag/crs/rac122b/crs/trace/crsd_oraagent_grid.trc.

2018-01-25 11:33:47.759 [ORAAGENT(5634)]CRS-5818: Aborted command 'check' for resource 'ora.DATADG.dg'. Details at (:CRSAGF00113:) {0:1:11} in /u01/app/grid/diag/crs/rac122b/crs/trace/crsd_oraagent_grid.trc.

2018-01-25 11:33:49.250 [OCSSD(3983)]CRS-1601: CSSD Reconfiguration complete. Active nodes are rac122b .

2018-01-25 11:33:49.617 [CRSD(5349)]CRS-5504: Node down event reported for node 'rac122a'.

The ocssd log on node 2 shows that, once again, node 2 has the higher weight and therefore ends up evicting node 1 from the cluster.

In a later test with the weight set on both services, the outcome matched the case with no weight set: node 1 evicted node 2.
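The three two-node outcomes observed in these tests are consistent with a simple model: the node with the higher weight survives, and a tie falls back to the lowest node number. The sketch below encodes only that model; `two_node_survivor` is a hypothetical name, the 0/1 weights are an assumed encoding of CSS_CRITICAL, and this is not Oracle's actual algorithm.

```shell
# Simplified model of the three two-node outcomes observed in this article.
# Arguments: weight of node 1, weight of node 2 (1 if CSS_CRITICAL=yes, else 0).
# two_node_survivor is a hypothetical name; this is not Oracle's algorithm.
two_node_survivor() {
    if [ "$1" -gt "$2" ]; then
        echo 1
    elif [ "$2" -gt "$1" ]; then
        echo 2
    else
        echo 1   # tie: fall back to the lowest node number
    fi
}

two_node_survivor 0 0   # no weights set:        prints 1
two_node_survivor 0 1   # weight on node 2 only: prints 2
two_node_survivor 1 1   # weight on both:        prints 1
```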
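Summary
The tests above show that:
- Weight-based node eviction lets us influence which node survives a split-brain, which reduces or eliminates some of the impact;
- A server-level weight takes effect only after the configuration change is followed by a CRS restart;
- A service-level weight needs no resource restart, so it can be changed dynamically and is more flexible.
That said, the feature was introduced in 12.2 and is still new, so it may have bugs. Test it thoroughly before relying on it.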

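About the author:

Luo Xueyuan (罗雪原), delivery technical consultant for the southern region at 云和恩墨 (Enmotech).

He has 8 years of Oracle technical support experience, has served customers in finance, insurance, power, government, and telecom, and has extensive troubleshooting and tuning experience.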