MHAを使ってアプリケーションから透過的にMySQLのマスターを切り替えてみた

         

DeNAで開発されたありがたいツールMHAを拝借して、HA構成を作ってみることにしました。

構成は
dns01.yorozuyah.local  DNSサーバ兼MHAマネージャ
db01.yorozuyah.local  MySQL マスター
db02.yorozuyah.local  MySQL スレーブ1
db03.yorozuyah.local  MySQL スレーブ2
とします。

DNSは障害が発生したときにアプリケーションから見たマスターとスレーブのホスト名を切り替えるときに使います。
あとはスレーブのDNSラウンドロビン用として。

OSはCentOS 5.5
MySQLはバンドルの5.0.77(古い!)
mhaは0.53を使います。

勝手にタイムトライアルでスタート。

手順はかなりは省略してます。

【0:00】
VirtualBoxにCentOS 5.5がころがっていたので、マスターOSとして構築開始。

まずは、MHAマネージャとノードをインストール。

こちらからダウンロード

インストールにはいくつかのperlモジュールが必要。

追加にはepelを使用。
enable=0は忘れないように。
こんな感じでインストール

ノードは全サーバにインストール。CPANでDBIを入れた場合は依存を無視でコマンド実行。
# rpm -Uvh mha4mysql-node-0.53-0.noarch.rpm --nodeps

マネージャインストール。こちらはperlモジュールが複数必要。rpmは依存関係無視で。
# yum install perl-Config-Tiny.noarch perl-Log-Dispatch.noarch perl-Parallel-ForkManager.noarch --enablerepo=epel

# rpm -Uvh mha4mysql-manager-0.53-0.noarch.rpm --nodeps

続いてbind97のインストールですが、conflictしたのでremoveしてからインストール。

# yum remove bind-libs
# yum install bind97.i386 bind97-utils.i386

ゾーンファイルはこんな感じで。あんまり書いたことないので正しいのか不明・・・。

$TTL 1D
@ IN SOA dns01.yorozuyah.local. root.localhost. (
20120323 ; serial
1D ; refresh
1H ; retry
1W ; expire
3H ) ; minimum
IN NS dns01.yorozuyah.local.
dns01 IN A 192.168.0.150
db01 IN A 192.168.0.151
db02 IN A 192.168.0.152
db03 IN A 192.168.0.153; master アプリケーションから見たマスター名
dbm IN A 192.168.0.151

; slave アプリケーションから見たスレーブ名
dbs IN A 192.168.0.152
dbs IN A 192.168.0.153

スレーブはDNSラウンドロビン。

named用のファイルをrootで作ってしまって若干ハマるが完成。

【1:30】
MySQLレプリケーションにとりかかる。

VirtualBoxのCloneを使うのでマスターだけ構築して、あとはクローンをスレーブにすればよし。

レプリユーザはMHAの場合は全サーバがマスターになる可能性があるので、セグメントで許可しておいた方が良さそう。

mysql> GRANT REPLICATION SLAVE ON *.* TO repl@'192.168.0.%' IDENTIFIED BY 'replrepl';

【1:45】
肩がこってきたのでエアサロンパスをかけてみる。
仕事では大体1日に3万回、キーボートとマウスを叩いているので肩がこってしょうがない。

さて、マスターをシャットダウンして、スレーブ2つをクローンから作る。

2つ以上のターミナル操作にはTera Term Pro アシスタントが便利!
http://www.vector.co.jp/soft/win95/net/se276622.html

いつものCHANGE MASTERを2スレーブで実行。

mysql> CHANGE MASTER TO
MASTER_HOST='db01.yorozuyah.local',
MASTER_USER='repl',
MASTER_PASSWORD='replrepl',
MASTER_LOG_FILE='mysql-bin.000002',
MASTER_LOG_POS=4
;mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: db01.yorozuyah.local
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 98
Relay_Log_File: mysqld-relay-bin.000002
Relay_Log_Pos: 235
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:


できてます。

【2:15】
MHAを使うためには各ノードのrootのSSH相互ノンパス設定が必要。

【2:30】
ここからMHAの設定

コンフィグファイルを作成

# mkdir /etc/conf/masterha -p
# vi /etc/conf/masterha/app1.cnf[server default]
user=root
password=
ssh_user=root

manager_workdir=/var/lib/mysql/log/masterha/app1
remote_workdir=/var/lib/mysql/log/masterha/app1
#フェールオーバ用のスクリプトを書いておく↓
master_ip_failover_script= /scripts/masterha/master_ip_failover
[server1]
hostname=db01.yorozuyah.local

[server2]
hostname=db02.yorozuyah.local

[server3]
hostname=db03.yorozuyah.local

ssh設定のチェック
# masterha_check_ssh --conf=/etc/conf/masterha/app1.cnf
Sat Mar 24 02:39:23 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar 24 02:39:23 2012 - [info] Reading application default configurations from /etc/conf/masterha/app1.cnf..
Sat Mar 24 02:39:23 2012 - [info] Reading server configurations from /etc/conf/masterha/app1.cnf..
Sat Mar 24 02:39:23 2012 - [info] Starting SSH connection tests..
Sat Mar 24 02:39:24 2012 - [debug]
Sat Mar 24 02:39:23 2012 - [debug] Connecting via SSH from root@db01.yorozuyah.local(192.168.0.151:22) to root@db02.yorozuyah.local(192.168.0.152:22)..
Sat Mar 24 02:39:23 2012 - [debug] ok.
Sat Mar 24 02:39:23 2012 - [debug] Connecting via SSH from root@db01.yorozuyah.local(192.168.0.151:22) to root@db03.yorozuyah.local(192.168.0.153:22)..
Sat Mar 24 02:39:24 2012 - [debug] ok.
Sat Mar 24 02:39:25 2012 - [debug]
Sat Mar 24 02:39:24 2012 - [debug] Connecting via SSH from root@db02.yorozuyah.local(192.168.0.152:22) to root@db01.yorozuyah.local(192.168.0.151:22)..
Sat Mar 24 02:39:24 2012 - [debug] ok.
Sat Mar 24 02:39:24 2012 - [debug] Connecting via SSH from root@db02.yorozuyah.local(192.168.0.152:22) to root@db03.yorozuyah.local(192.168.0.153:22)..
Sat Mar 24 02:39:25 2012 - [debug] ok.
Sat Mar 24 02:39:25 2012 - [debug]
Sat Mar 24 02:39:24 2012 - [debug] Connecting via SSH from root@db03.yorozuyah.local(192.168.0.153:22) to root@db01.yorozuyah.local(192.168.0.151:22)..
Sat Mar 24 02:39:25 2012 - [debug] ok.
Sat Mar 24 02:39:25 2012 - [debug] Connecting via SSH from root@db03.yorozuyah.local(192.168.0.153:22) to root@db02.yorozuyah.local(192.168.0.152:22)..
Sat Mar 24 02:39:25 2012 - [debug] ok.
Sat Mar 24 02:39:25 2012 - [info] All SSH connection tests passed successfully.

OKみたい

MHAのチェック# masterha_check_repl --conf=/etc/conf/masterha/app1.cnf
Sat Mar 24 02:40:55 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar 24 02:40:55 2012 - [info] Reading application default configurations from /etc/conf/masterha/app1.cnf..
Sat Mar 24 02:40:55 2012 - [info] Reading server configurations from /etc/conf/masterha/app1.cnf..
Sat Mar 24 02:40:55 2012 - [info] MHA::MasterMonitor version 0.53.
Sat Mar 24 02:40:55 2012 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln298] Got MySQL error when connecting db01.yorozuyah.local(192.168.0.151:3306) :1045:Access denied for user 'root'@'192.168.0.150' (using password: NO), but this is not mysql crash. Check MySQL server settings.
at /usr/lib/perl5/vendor_perl/MHA/ServerManager.pm line 251
Sat Mar 24 02:40:55 2012 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln298] Got MySQL error when connecting db02.yorozuyah.local(192.168.0.152:3306) :1130:Host '192.168.0.150' is not allowed to connect to this MySQL server, but this is not mysql crash. Check MySQL server settings.
at /usr/lib/perl5/vendor_perl/MHA/ServerManager.pm line 251
Sat Mar 24 02:40:55 2012 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln298] Got MySQL error when connecting db03.yorozuyah.local(192.168.0.153:3306) :1130:Host '192.168.0.150' is not allowed to connect to this MySQL server, but this is not mysql crash. Check MySQL server settings.
at /usr/lib/perl5/vendor_perl/MHA/ServerManager.pm line 251
Sat Mar 24 02:40:55 2012 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln298] Got fatal error, stopping operations
Sat Mar 24 02:40:55 2012 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln475] Error happend on checking configurations. at /usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm line 298
Sat Mar 24 02:40:55 2012 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln644] Error happened on monitoring servers.
Sat Mar 24 02:40:55 2012 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

だめみたい

mysqlのrootユーザのログイン許可がデータベースに無かった・・・。

マスターからユーザ作成してもう一度mysql> GRANT ALL ON *.* TO root@'192.168.0.%';

# masterha_check_repl --conf=/etc/conf/masterha/app1.cnf
Sat Mar 24 02:50:11 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar 24 02:50:11 2012 - [info] Reading application default configurations from /etc/conf/masterha/app1.cnf..
Sat Mar 24 02:50:11 2012 - [info] Reading server configurations from /etc/conf/masterha/app1.cnf..
Sat Mar 24 02:50:11 2012 - [info] MHA::MasterMonitor version 0.53.
Sat Mar 24 02:50:12 2012 - [info] Dead Servers:
Sat Mar 24 02:50:12 2012 - [info] Alive Servers:
Sat Mar 24 02:50:12 2012 - [info] db01.yorozuyah.local(192.168.0.151:3306)
Sat Mar 24 02:50:12 2012 - [info] db02.yorozuyah.local(192.168.0.152:3306)
Sat Mar 24 02:50:12 2012 - [info] db03.yorozuyah.local(192.168.0.153:3306)
Sat Mar 24 02:50:12 2012 - [info] Alive Slaves:
Sat Mar 24 02:50:12 2012 - [info] db02.yorozuyah.local(192.168.0.152:3306) Version=5.0.77-log (oldest major version between slaves) log-bin:enabled
Sat Mar 24 02:50:12 2012 - [info] Replicating from db01.yorozuyah.local(192.168.0.151:3306)
Sat Mar 24 02:50:12 2012 - [info] db03.yorozuyah.local(192.168.0.153:3306) Version=5.0.77-log (oldest major version between slaves) log-bin:enabled
Sat Mar 24 02:50:12 2012 - [info] Replicating from db01.yorozuyah.local(192.168.0.151:3306)
Sat Mar 24 02:50:12 2012 - [info] Current Alive Master: db01.yorozuyah.local(192.168.0.151:3306)
Sat Mar 24 02:50:12 2012 - [info] Checking slave configurations..
Sat Mar 24 02:50:12 2012 - [info] read_only=1 is not set on slave db02.yorozuyah.local(192.168.0.152:3306).
Sat Mar 24 02:50:12 2012 - [warning] relay_log_purge=0 is not set on slave db02.yorozuyah.local(192.168.0.152:3306).
Sat Mar 24 02:50:12 2012 - [info] read_only=1 is not set on slave db03.yorozuyah.local(192.168.0.153:3306).
Sat Mar 24 02:50:12 2012 - [warning] relay_log_purge=0 is not set on slave db03.yorozuyah.local(192.168.0.153:3306).
Sat Mar 24 02:50:12 2012 - [info] Checking replication filtering settings..
Sat Mar 24 02:50:12 2012 - [info] binlog_do_db= , binlog_ignore_db=
Sat Mar 24 02:50:12 2012 - [info] Replication filtering check ok.
Sat Mar 24 02:50:12 2012 - [info] Starting SSH connection tests..
Sat Mar 24 02:50:14 2012 - [info] All SSH connection tests passed successfully.
Sat Mar 24 02:50:14 2012 - [info] Checking MHA Node version..
Sat Mar 24 02:50:15 2012 - [info] Version check ok.
Sat Mar 24 02:50:15 2012 - [info] Checking SSH publickey authentication settings on the current master..
Sat Mar 24 02:50:15 2012 - [info] HealthCheck: SSH to db01.yorozuyah.local is reachable.
Sat Mar 24 02:50:15 2012 - [info] Master MHA Node version is 0.53.
Sat Mar 24 02:50:15 2012 - [info] Checking recovery script configurations on the current master..
Sat Mar 24 02:50:15 2012 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/lib/mysql/log/masterha/app1/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000002
Sat Mar 24 02:50:15 2012 - [info] Connecting to root@db01.yorozuyah.local(db01.yorozuyah.local)..
Creating /var/lib/mysql/log/masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mysql-bin.000002
Sat Mar 24 02:50:15 2012 - [info] Master setting check done.
Sat Mar 24 02:50:15 2012 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat Mar 24 02:50:15 2012 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=db02.yorozuyah.local --slave_ip=192.168.0.152 --slave_port=3306 --workdir=/var/lib/mysql/log/masterha/app1 --target_version=5.0.77-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sat Mar 24 02:50:15 2012 - [info] Connecting to root@192.168.0.152(db02.yorozuyah.local:22)..
mysqlbinlog version is 3.2 (included in MySQL Client 5.0 or lower). This is not recommended. Consider upgrading MySQL Client to 5.1 or higher.
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Mar 24 02:50:15 2012 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=db03.yorozuyah.local --slave_ip=192.168.0.153 --slave_port=3306 --workdir=/var/lib/mysql/log/masterha/app1 --target_version=5.0.77-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sat Mar 24 02:50:15 2012 - [info] Connecting to root@192.168.0.153(db03.yorozuyah.local:22)..
mysqlbinlog version is 3.2 (included in MySQL Client 5.0 or lower). This is not recommended. Consider upgrading MySQL Client to 5.1 or higher.
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Mar 24 02:50:15 2012 - [info] Slaves settings check done.
Sat Mar 24 02:50:15 2012 - [info]
db01.yorozuyah.local (current master)
+--db02.yorozuyah.local
+--db03.yorozuyah.local

Sat Mar 24 02:50:15 2012 - [info] Checking replication health on db02.yorozuyah.local..
Sat Mar 24 02:50:15 2012 - [info] ok.
Sat Mar 24 02:50:15 2012 - [info] Checking replication health on db03.yorozuyah.local..
Sat Mar 24 02:50:15 2012 - [info] ok.
Sat Mar 24 02:50:15 2012 - [warning] master_ip_failover_script is not defined.
Sat Mar 24 02:50:15 2012 - [warning] shutdown_script is not defined.
Sat Mar 24 02:50:15 2012 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

OKです

あとは、managerを起動してから、マスターのmysqldとmysqld_safeをkillすれば自動的に旧スレーブのどれかがマスターになります。

あわせて、その時にDNSを書き換えてサービス用のマスターAレコードを新マスターのアドレスにして、スレーブのAレコードから新マスターのアドレスを削除する。

もともとはこうなっていたのを

db01 IN A 192.168.0.151
db02 IN A 192.168.0.152
db03 IN A 192.168.0.153
; master
dbm IN A 192.168.0.151
; slave
dbs IN A 192.168.0.152
dbs IN A 192.168.0.153

↓ db01が落ちた時にこうする

db01 IN A 192.168.0.151
db02 IN A 192.168.0.152
db03 IN A 192.168.0.153
; master
dbm IN A 192.168.0.152(db02がマスターに昇格した場合)
; slave
dbs IN A 192.168.0.153(生きているスレーブだけを残す)

そのためには、master_ip_failoverスクリプトを使うらしい。
ソースに含まれてます。

【4:30】

maste_ip_failoverの前に、dynamicDNSでレコードが書き換えられるか確認・・・ができない。

# nsupdate
server dns01.yorozuyah.local
update delete dbm.yorozuyah.local A
update add dbm.yorozuyah.local 3600 IN A 192.168.0.152
> send
update failed: SERVFAILpermissionを変えとかないと駄目でした。

# cd /var/
# chmod g+w named

ここで、動的にDNSレコードを書き換えるように簡易スクリプト作成。

安直にddnsという名前で、/usr/local/binに作りました。#vi /usr/local/bin/ddns
ddns

master_ip_failoverを/scripts/masterha/に置いて、上のスクリプトを指定。

抜粋

88 ## Update master ip on the catalog database, etc
89 #FIXME_xxx;
90
91 system("ddns delete dbm.yorozuyah.local $orig_master_ip");
↑旧マスターをサービス用のマスターホスト名から外して
92 system("ddns add dbm.yorozuyah.local $new_master_ip");
↑新マスターをサービス用のマスターホスト名に加えて
93 system("ddns delete dbs.yorozuyah.local $new_master_ip");
↑新マスターをサービス用のスレーブホスト名から外す

そしてためしにフェールオーバさせると。

Sat Mar 24 05:19:57 2012 - [info] Executing master IP activate script:
Sat Mar 24 05:19:57 2012 - [info] /scripts/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=db02.yorozuyah.local --orig_master_ip=192.168.0.152 --orig_master_port=3306 --new_master_host=db01.yorozuyah.local --new_master_ip=192.168.0.151 --new_master_port=3306
Can't locate object method "new" via package "MHA::DBHelper" (perhaps you forgot to load "MHA::DBHelper"?) at /scripts/masterha/master_ip_failover line 70.
Sat Mar 24 05:19:57 2012 - [error][/usr/lib/perl5/vendor_perl/MH
MHA::DBHelperパッケージが無い??

追記!

抜粋

25 use Getopt::Long;
26 use MHA::DBHelper;

【5:30】

やっと最終確認!

まず、MHAのマネージャ起動
[root@dns01 var]# masterha_manager --conf=/etc/conf/masterha/app1.cnf
Sat Mar 24 05:41:20 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar 24 05:41:20 2012 - [info] Reading application default configurations from /etc/conf/masterha/app1.cnf..
Sat Mar 24 05:41:20 2012 - [info] Reading server configurations from /etc/conf/masterha/app1.cnf..
Sat Mar 24 05:41:20 2012 - [info] MHA::MasterMonitor version 0.53.
Sat Mar 24 05:41:20 2012 - [info] Dead Servers:
Sat Mar 24 05:41:20 2012 - [info] Alive Servers:
Sat Mar 24 05:41:20 2012 - [info] db01.yorozuyah.local(192.168.0.151:3306)
Sat Mar 24 05:41:20 2012 - [info] db02.yorozuyah.local(192.168.0.152:3306)
Sat Mar 24 05:41:20 2012 - [info] db03.yorozuyah.local(192.168.0.153:3306)
Sat Mar 24 05:41:20 2012 - [info] Alive Slaves:
Sat Mar 24 05:41:20 2012 - [info] db02.yorozuyah.local(192.168.0.152:3306) Version=5.0.77-log (oldest major version between slaves) log-bin:enabled
Sat Mar 24 05:41:20 2012 - [info] Replicating from db01.yorozuyah.local(192.168.0.151:3306)
Sat Mar 24 05:41:20 2012 - [info] db03.yorozuyah.local(192.168.0.153:3306) Version=5.0.77-log (oldest major version between slaves) log-bin:enabled
Sat Mar 24 05:41:20 2012 - [info] Replicating from db01.yorozuyah.local(192.168.0.151:3306)
Sat Mar 24 05:41:20 2012 - [info] Current Alive Master: db01.yorozuyah.local(192.168.0.151:3306)
Sat Mar 24 05:41:20 2012 - [info] Checking slave configurations..
Sat Mar 24 05:41:20 2012 - [warning] relay_log_purge=0 is not set on slave db02.yorozuyah.local(192.168.0.152:3306).
Sat Mar 24 05:41:20 2012 - [warning] relay_log_purge=0 is not set on slave db03.yorozuyah.local(192.168.0.153:3306).
Sat Mar 24 05:41:20 2012 - [info] Checking replication filtering settings..
Sat Mar 24 05:41:20 2012 - [info] binlog_do_db= , binlog_ignore_db=
Sat Mar 24 05:41:20 2012 - [info] Replication filtering check ok.
Sat Mar 24 05:41:20 2012 - [info] Starting SSH connection tests..
Sat Mar 24 05:41:22 2012 - [info] All SSH connection tests passed successfully.
Sat Mar 24 05:41:22 2012 - [info] Checking MHA Node version..
Sat Mar 24 05:41:22 2012 - [info] Version check ok.
Sat Mar 24 05:41:22 2012 - [info] Checking SSH publickey authentication settings on the current master..
Sat Mar 24 05:41:22 2012 - [info] HealthCheck: SSH to db01.yorozuyah.local is reachable.
Sat Mar 24 05:41:23 2012 - [info] Master MHA Node version is 0.53.
Sat Mar 24 05:41:23 2012 - [info] Checking recovery script configurations on the current master..
Sat Mar 24 05:41:23 2012 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/lib/mysql/log/masterha/app1/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000008
Sat Mar 24 05:41:23 2012 - [info] Connecting to root@db01.yorozuyah.local(db01.yorozuyah.local)..
Creating /var/lib/mysql/log/masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mysql-bin.000008
Sat Mar 24 05:41:23 2012 - [info] Master setting check done.
Sat Mar 24 05:41:23 2012 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat Mar 24 05:41:23 2012 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=db02.yorozuyah.local --slave_ip=192.168.0.152 --slave_port=3306 --workdir=/var/lib/mysql/log/masterha/app1 --target_version=5.0.77-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sat Mar 24 05:41:23 2012 - [info] Connecting to root@192.168.0.152(db02.yorozuyah.local:22)..
mysqlbinlog version is 3.2 (included in MySQL Client 5.0 or lower). This is not recommended. Consider upgrading MySQL Client to 5.1 or higher.
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to db02-relay-bin.000002
Temporary relay log file is /var/lib/mysql/db02-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Mar 24 05:41:23 2012 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=db03.yorozuyah.local --slave_ip=192.168.0.153 --slave_port=3306 --workdir=/var/lib/mysql/log/masterha/app1 --target_version=5.0.77-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sat Mar 24 05:41:23 2012 - [info] Connecting to root@192.168.0.153(db03.yorozuyah.local:22)..
mysqlbinlog version is 3.2 (included in MySQL Client 5.0 or lower). This is not recommended. Consider upgrading MySQL Client to 5.1 or higher.
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to db03-relay-bin.000002
Temporary relay log file is /var/lib/mysql/db03-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Mar 24 05:41:23 2012 - [info] Slaves settings check done.
Sat Mar 24 05:41:23 2012 - [info]
db01.yorozuyah.local (current master)
+--db02.yorozuyah.local
+--db03.yorozuyah.local

Sat Mar 24 05:41:23 2012 - [info] Checking master_ip_failover_script status:
Sat Mar 24 05:41:23 2012 - [info] /scripts/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=db01.yorozuyah.local --orig_master_ip=192.168.0.151 --orig_master_port=3306
Sat Mar 24 05:41:23 2012 - [info] OK.
Sat Mar 24 05:41:23 2012 - [warning] shutdown_script is not defined.
Sat Mar 24 05:41:23 2012 - [info] Set master ping interval 3 seconds.
Sat Mar 24 05:41:23 2012 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Sat Mar 24 05:41:23 2012 - [info] Starting ping health check on db01.yorozuyah.local(192.168.0.151:3306)..
Sat Mar 24 05:41:23 2012 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

DNSを確認

[root@db02 ~]# nslookup
> dbs     ←サービス用のスレーブホスト名
Server: 192.168.0.150
Address: 192.168.0.150#53
Name: dbs.yorozuyah.local
Address: 192.168.0.152     ←db02
Name: dbs.yorozuyah.local
Address: 192.168.0.153     ←db03

> dbm     ←サービス用のマスターホスト名
Server: 192.168.0.150
Address: 192.168.0.150#53
Name: dbm.yorozuyah.local
Address: 192.168.0.151     ←db01

db01のマスターmysqlを停止

[root@db01 ~]# ps -ef |grep mysqld
root 16761 2641 0 05:33 pts/0 00:00:00 /bin/sh /usr/bin/mysqld_safe
mysql 16787 16761 0 05:33 pts/0 00:00:00 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/lib/mysql/db01.pid --skip-external-locking --port=3306 --socket=/var/lib/mysql/mysql.sock
root 17039 2641 0 05:43 pts/0 00:00:00 grep mysqld
[root@db01 ~]# kill -9 16761 16787

しばらくするとマネージャログが動き始める。

省略


----- Failover Report -----

app1: MySQL Master failover db01.yorozuyah.local to db02.yorozuyah.local succeeded

Master db01.yorozuyah.local is down!

Check MHA Manager logs at dns01 for details.

Started automated(non-interactive) failover.
Invalidated master IP address on db01.yorozuyah.local.
The latest slave db02.yorozuyah.local(192.168.0.152:3306) has all relay logs for recovery.
Selected db02.yorozuyah.local as a new master.
db02.yorozuyah.local: OK: Applying all logs succeeded.
db02.yorozuyah.local: OK: Activated master IP address.
db03.yorozuyah.local: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
db03.yorozuyah.local: OK: Applying all logs succeeded. Slave started, replicating from db02.yorozuyah.local.
db02.yorozuyah.local: Resetting slave info succeeded.
Master failover to db02.yorozuyah.local(192.168.0.152:3306) completed successfully.

正常にフェールオーバが終わったみたい。

残ったスレーブのレプリケーション状況は?

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: db02.yorozuyah.local
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000005
Read_Master_Log_Pos: 98
Relay_Log_File: db03-relay-bin.000002
Relay_Log_Pos: 235
Relay_Master_Log_File: mysql-bin.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 98
Relay_Log_Space: 235
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
1 row in set (0.00 sec)

マスターがdb02になってます!

db02は?

mysql> show master status;
+------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000005 | 98 | | |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

mysql> show slave status;
Empty set (0.00 sec)

マスターになってます。

ちゃんとread_onlyがOFFになってるのもありがたい。

mysql> show global variables like 'read_only';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only | OFF |
+---------------+-------+
1 row in set (0.00 sec)

DNSは?

> dbs
Server: 192.168.0.150
Address: 192.168.0.150#53

Name: dbs.yorozuyah.local
Address: 192.168.0.153     ←生きているスレーブdb03
>
>
> dbm
Server: 192.168.0.150
Address: 192.168.0.150#53

Name: dbm.yorozuyah.local
Address: 192.168.0.152     ←新マスターdb02

【5:50】
確認おわり

スクリプトはもう少しちゃんと書くのと、何らかのDB処理をしながらフェールオーバした時にデータがロストしないかは別途確認です。
あと、他にもレポートスクリプトなどサンプルがソースに含まれてるのでそちらも。

はあ、6時間弱で構築から確認までなんとか出来ましたが思ったより時間がかかった。。
VirtualBoxに4インスタンス作成するとさすがに重くて、という言い訳。

間違ってる箇所があるかもしれませんので、後ほど見直します。

以下サイトを参考にさせていただきました。

http://d.hatena.ne.jp/ke-16/20110912/1315824419
MySQL-MHA上でのフェールオーバー処理

5 comments to this article

  1. フジ

    on 2012/4/19 木曜日 at 23:14:39 - 返信

    master_ip_failoverは以下の様なエラーが出て困っています。
    どのように修正すればよいでしょうか?
    =========================================
    Bareword “FIXME_xxx” not allowed while “strict subs” in use at /etc/mha/master_ip_failover line 88.
    Execution of /etc/mha/master_ip_failover aborted due to compilation errors.
    Sat Apr 14 09:34:24 2012 – [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln214] Failed to get master_ip_failover_script status with return code 255:0.
    Sat Apr 14 09:34:24 2012 – [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln383] Error happend on checking configurations. at /usr/bin/masterha_check_repl line 48
    Sat Apr 14 09:34:24 2012 – [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln478] Error happened on monitoring servers.[/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln383] Error happend on checking configurations. at /usr/bin/masterha_check_repl line 48
    Sat Apr 14 09:34:24 2012 – [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln478] Error happened on monitoring servers.
    Sat Apr 14 09:34:24 2012 – [info] Got exit code 1 (Not master dead).

    MySQL Replication Health is NOT OK!
    =========================================
    以下の値を修正すればよいのでしょうか。教えてください。
    GetOptions(
    ‘command=s’ => \$command,
    ‘ssh_user=s’ => \$ssh_user,
    ‘orig_master_host=s’ => \$orig_master_host,
    ‘orig_master_ip=s’ => \$orig_master_ip,
    ‘orig_master_port=i’ => \$orig_master_port,
    ‘new_master_host=s’ => \$new_master_host,
    ‘new_master_ip=s’ => \$new_master_ip,
    ‘new_master_port=i’ => \$new_master_port,
    );

  2. yorozuyah

    on 2012/4/20 金曜日 at 0:57:57 - 返信

    >フジさん

    FIXME_xxxの部分が二箇所ありますので、環境に合わせて変える必要があります。
    (83行目と88行目です)

    それぞれ、新しいマスターにアプリケーション用のユーザを作成する場合、アプリケーションからみた場合のマスターのホスト名を変更する場合、に処理を追加すれば良いとのコメントが書いてあります。

    必要がない場合はコメントアウトしてしまえば良いかと思います。

    GetOptionsの部分は特に変える必要はないはずです。

    作者ではないので間違っているかもしれませんが、ご了承ください。

  3. フジ

    on 2012/4/20 金曜日 at 20:58:04 - 返信

    ご連絡ありがとうございます。
    コメントアウトで良さそうです。
    回答ありがとうございました。

  4. yorozuyah

    on 2012/4/22 日曜日 at 21:33:27 - 返信

    >フジさん

    いえいえ、解決されて何よりです。

コメントを残す