Auto failover of mysql master in mysql multi-master multi-slave cluster

This post is an extension to my earlier posts about multi-master replication clusters – "multi master replication in mysql" and "mysql multi master replication act II". The problem I had encountered and discussed there was automatic failover. What happens when the master goes down? How can either a slave or another master be promoted to become the new master? And once the settings are done on all the mysql dbs in the cluster to identify the new master, we will also have to change our app to point to it.

Let us look at the auto failover part from the mysql cluster point of view. The tool that we are going to use is MHA (mysql master ha), written by Yoshinori Matsunobu (http://yoshinorimatsunobu.blogspot.com/), which has been around for quite some time now. The tool supports failover in a master-master scenario, but the master-master setup has to be active-passive: one of the master mysql servers has to be in read only mode. It does not support an active-active mysql master-master setup where both masters are handling writes. To put a mysql server in read only mode, simply add "read_only" to its my.cnf file.

The mysql version I am using here is mysql 5.6.17. I have created a mysql cluster with 4 nodes.

M1(RW)—>S1
 |
 |
M2(RO)—>S2

M1 and M2 are the 2 masters running on ports 3311 & 3312. S1 is a slave running on port 3313 that replicates from master M1. S2 is another slave running on port 3314 that replicates from master M2. Remember, mysql 5.6 introduced an additional variable known as the server-uuid, which has to be different for every mysql server in the cluster. So if you are creating replicas by copying the data directory, simply remove the auto.cnf file, which contains the server-uuid, and mysql will generate a new uuid when it starts. In order to make M2 read only, I have put the following parameter in its my.cnf file.

read_only
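The same flag can also be toggled at runtime without a restart. Here is a minimal sketch in sql (assuming a user with the SUPER privilege; note that read_only does not restrict SUPER users):

-- on M2, block writes from ordinary users
SET GLOBAL read_only = 1;
-- verify the flag, and the server_uuid which must be unique per server in mysql 5.6
SELECT @@read_only, @@server_uuid;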

Download the mysql-master-ha tool from https://code.google.com/p/mysql-master-ha/downloads/list. I am using the following versions.

mha4mysql-manager-0.55.tar.gz
mha4mysql-node-0.54.tar.gz

Untar the mha4mysql-node archive first and install it.

mha4mysql-node-0.54$ perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies…
[Core Features]
– DBI        …loaded. (1.63)
– DBD::mysql …loaded. (4.025)
*** Module::AutoInstall configuration finished.
Writing Makefile for mha4mysql::node
Writing MYMETA.yml and MYMETA.json

mha4mysql-node-0.54$ make
mha4mysql-node-0.54$ sudo make install

Next install the mha4mysql-manager.

mha4mysql-manager-0.55$ perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies…
[Core Features]
– DBI                   …loaded. (1.63)
– DBD::mysql            …loaded. (4.025)
– Time::HiRes           …loaded. (1.9725)
– Config::Tiny          …loaded. (2.20)
– Log::Dispatch         …loaded. (2.41)
– Parallel::ForkManager …loaded. (1.06)
– MHA::NodeConst        …loaded. (0.54)
*** Module::AutoInstall configuration finished.
Writing Makefile for mha4mysql::manager
Writing MYMETA.yml and MYMETA.json

mha4mysql-manager-0.55$ make
mha4mysql-manager-0.55$ sudo make install

Watch out for errors. Once the tool is installed, we need to create a configuration file for the cluster. Here is the configuration file – app1.cnf.

[server default]
multi_tier_slave=1
manager_workdir=/home/jayantk/log/masterha/app1
manager_log=/home/jayantk/log/masterha/app1/manager.log

[server1]
hostname=127.0.0.1
candidate_master=1
port=3311
user=root
ssh_user=jayantk
master_binlog_dir=/home/jayantk/mysql-5.6.17/data1

[server2]
hostname=127.0.0.1
candidate_master=1
port=3312
user=root
ssh_user=jayantk
master_binlog_dir=/home/jayantk/mysql-5.6.17/data2

[server3]
hostname=127.0.0.1
port=3313
user=root
ssh_user=jayantk
master_binlog_dir=/home/jayantk/mysql-5.6.17/data3

[server4]
hostname=127.0.0.1
port=3314
user=root
ssh_user=jayantk
master_binlog_dir=/home/jayantk/mysql-5.6.17/data4

Let us go through it and understand what is happening. "multi_tier_slave" is a parameter used for specifying that the mysql cluster is a multi-master, multi-slave cluster where each master can have its own slave; if the parameter is not specified, mha will give an error. Next we specify the working directory and log file for the mha manager. Then we specify each of our servers with its hostname and port. We have to provide the root username and password for each of our mysql servers through the "user" and "password" parameters. I have not set a password on my mysql servers and so do not have that parameter in my configuration file. The parameter "candidate_master" is used to prioritize a certain mysql server as a master candidate during failover; in our case, we are prioritizing server2 (M2) as the new master. Finally, mha needs ssh access to the machines running our mysql servers. I have specified the username to be used for ssh with "ssh_user", and have enabled ssh public key authentication without a passphrase for the machines (127.0.0.1 in my case). The parameter "master_binlog_dir" is used if the dead master is reachable via ssh, to copy any pending binary log events; it is required because the slaves have no information about where the master's binary log files are located.
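Before checking replication, the ssh setup itself can be verified with the checker that ships with the manager (shown here as I would run it; the output should report that all ssh connection tests passed):

masterha_check_ssh --conf=app1.cnf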

Another thing to remember is to grant "replication slave" to the replication user from every mysql server in the cluster – irrespective of whether it is a master or a slave – so that any server can act as a replication source for any other. In my case, I had to run the following grant statement on all my mysql servers – M1, M2, S1 and S2.

grant replication slave on *.* to slave_user@localhost identified by 'slave_pwd';

Once all the settings are in place, run the command

masterha_check_repl --conf=app1.cnf

It will output a lot of messages ending with

“MySQL Replication Health is OK.”

This means that the configuration is a success. In case of errors, check the wiki at https://code.google.com/p/mysql-master-ha/wiki/TableOfContents for solutions and missing parameters.

To start the masterha manager, run the following command

masterha_manager --conf=app1.cnf

In order to check the status of the mha manager script, go to the log directory – in our case /home/jayantk/log/masterha/app1. It contains a file specifying the health status, which you can cat to see.

~/log/masterha/app1$ cat app1.master_status.health
9100    0:PING_OK    master:127.0.0.1

Another file is the log file – manager.log – whose tail gives the following output.

Tue Apr  1 12:20:50 2014 – [warning] master_ip_failover_script is not defined.
Tue Apr  1 12:20:50 2014 – [warning] shutdown_script is not defined.
Tue Apr  1 12:20:50 2014 – [info] Set master ping interval 3 seconds.
Tue Apr  1 12:20:50 2014 – [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Tue Apr  1 12:20:50 2014 – [info] Starting ping health check on 127.0.0.1(127.0.0.1:3311)..
Tue Apr  1 12:20:50 2014 – [info] Ping(SELECT) succeeded, waiting until MySQL doesn’t respond..

The mha manager will keep pinging the master every 3 seconds to see if it is alive or not. The ping interval can be changed by specifying the parameter "ping_interval" in the application configuration. Failover is triggered after 3 missed pings to the master, so with a ping_interval of 3 seconds, the maximum time to discover that the master mysql is down is 12 seconds.
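To lower the detection time, the parameter goes in the [server default] section of app1.cnf. A sketch, using an illustrative value of 1 second:

[server default]
ping_interval=1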

Now, let's kill the master and see what happens. Find out the pid of the mysqld process and kill it. It is automatically restarted by the parent process "mysqld_safe". Here is what we get in our manager.log file.

140401 12:34:12 mysqld_safe Number of processes running now: 0
140401 12:34:12 mysqld_safe mysqld restarted
Tue Apr  1 12:34:14 2014 – [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Tue Apr  1 12:34:14 2014 – [info] Executing SSH check script: save_binary_logs –command=test –start_pos=4 –binlog_dir=/home/jayantk/mysql-5.6.17/data1 –output_file=/var/tmp/save_binary_logs_test –manager_version=0.55 –binlog_prefix=zdev-bin
Tue Apr  1 12:34:15 2014 – [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Apr  1 12:34:17 2014 – [info] Ping(SELECT) succeeded, waiting until MySQL doesn’t respond..

It tries to do an ssh and check if the machine is reachable. But in the meantime the mysqld process is back up and ping succeeds, so nothing happens. As a next step, let's kill both the parent "mysqld_safe" and the "mysqld" processes associated with the master mysql server, in that sequence, and see what happens in the manager.log file. There are a lot of log entries here.

Tue Apr  1 12:44:23 2014 – [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)

Tue Apr  1 12:44:26 2014 – [warning] Got error on MySQL connect: 2003 (Can’t connect to MySQL server on ‘127.0.0.1’ (111))
Tue Apr  1 12:44:26 2014 – [warning] Connection failed 1 time(s)..
Tue Apr  1 12:44:29 2014 – [warning] Got error on MySQL connect: 2003 (Can’t connect to MySQL server on ‘127.0.0.1’ (111))
Tue Apr  1 12:44:29 2014 – [warning] Connection failed 2 time(s)..
Tue Apr  1 12:44:32 2014 – [warning] Got error on MySQL connect: 2003 (Can’t connect to MySQL server on ‘127.0.0.1’ (111))
Tue Apr  1 12:44:32 2014 – [warning] Connection failed 3 time(s)..
Tue Apr  1 12:44:32 2014 – [warning] Master is not reachable from health checker!

Tue Apr  1 12:44:32 2014 – [info] Multi-master configuration is detected. Current primary(writable) master is 127.0.0.1(127.0.0.1:3311)
Tue Apr  1 12:44:32 2014 – [info] Master configurations are as below:
Master 127.0.0.1(127.0.0.1:3312), replicating from localhost(127.0.0.1:3311), read-only
Master 127.0.0.1(127.0.0.1:3311), dead


Tue Apr  1 12:44:32 2014 – [info] * Phase 1: Configuration Check Phase..
Tue Apr  1 12:44:32 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] Multi-master configuration is detected. Current primary(writable) master is 127.0.0.1(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 – [info] Master configurations are as below:
Master 127.0.0.1(127.0.0.1:3311), dead
Master 127.0.0.1(127.0.0.1:3312), replicating from localhost(127.0.0.1:3311), read-only

Tue Apr  1 12:44:33 2014 – [info] Dead Servers:
Tue Apr  1 12:44:33 2014 – [info]   127.0.0.1(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 – [info] Checking master reachability via mysql(double check)..
Tue Apr  1 12:44:33 2014 – [info]  ok.
Tue Apr  1 12:44:33 2014 – [info] Alive Servers:
Tue Apr  1 12:44:33 2014 – [info]   127.0.0.1(127.0.0.1:3312)
Tue Apr  1 12:44:33 2014 – [info]   127.0.0.1(127.0.0.1:3313)
Tue Apr  1 12:44:33 2014 – [info] Alive Slaves:
Tue Apr  1 12:44:33 2014 – [info]   127.0.0.1(127.0.0.1:3312)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 – [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 – [info]     Primary candidate for the new Master (candidate_master is set)
Tue Apr  1 12:44:33 2014 – [info]   127.0.0.1(127.0.0.1:3313)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 – [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 – [info] Unmanaged Servers:
Tue Apr  1 12:44:33 2014 – [info]   127.0.0.1(127.0.0.1:3314)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 – [info]     Replicating from localhost(127.0.0.1:3312)
Tue Apr  1 12:44:33 2014 – [info] ** Phase 1: Configuration Check Phase completed.
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] * Phase 2: Dead Master Shutdown Phase..
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] Forcing shutdown so that applications never connect to the current master..

Tue Apr  1 12:44:33 2014 – [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] * Phase 3: Master Recovery Phase..
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] The latest binary log file/position on all slaves is zdev-bin.000009:120
Tue Apr  1 12:44:33 2014 – [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Apr  1 12:44:33 2014 – [info]   127.0.0.1(127.0.0.1:3312)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 – [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 – [info]     Primary candidate for the new Master (candidate_master is set)
Tue Apr  1 12:44:33 2014 – [info]   127.0.0.1(127.0.0.1:3313)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 – [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 – [info] The oldest binary log file/position on all slaves is zdev-bin.000009:120
Tue Apr  1 12:44:33 2014 – [info] Oldest slaves:
Tue Apr  1 12:44:33 2014 – [info]   127.0.0.1(127.0.0.1:3312)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 – [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 – [info]     Primary candidate for the new Master (candidate_master is set)
Tue Apr  1 12:44:33 2014 – [info]   127.0.0.1(127.0.0.1:3313)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 – [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] * Phase 3.2: Saving Dead Master’s Binlog Phase..
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] Fetching dead master’s binary logs..

Tue Apr  1 12:44:33 2014 – [info] Additional events were not found from the orig master. No need to save.
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] * Phase 3.3: Determining New Master Phase..
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Apr  1 12:44:33 2014 – [info] All slaves received relay logs to the same position. No need to resync each other.
Tue Apr  1 12:44:33 2014 – [info] Searching new master from slaves..
Tue Apr  1 12:44:33 2014 – [info]  Candidate masters from the configuration file:
Tue Apr  1 12:44:33 2014 – [info]   127.0.0.1(127.0.0.1:3312)  Version=5.6.17-log (oldest major version between slaves) log-bin:enabled
Tue Apr  1 12:44:33 2014 – [info]     Replicating from localhost(127.0.0.1:3311)
Tue Apr  1 12:44:33 2014 – [info]     Primary candidate for the new Master (candidate_master is set)

Tue Apr  1 12:44:33 2014 – [info] New master is 127.0.0.1(127.0.0.1:3312)
Tue Apr  1 12:44:33 2014 – [info] Starting master failover..
Tue Apr  1 12:44:33 2014 – [info]
From:
127.0.0.1 (current master)
 +–127.0.0.1
 +–127.0.0.1

To:
127.0.0.1 (new master)
 +–127.0.0.1
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] * Phase 3.4: Master Log Apply Phase..
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Apr  1 12:44:33 2014 – [info] Starting recovery on 127.0.0.1(127.0.0.1:3312)..
Tue Apr  1 12:44:33 2014 – [info]  This server has all relay logs. Waiting all logs to be applied..
Tue Apr  1 12:44:33 2014 – [info]   done.
Tue Apr  1 12:44:33 2014 – [info]  All relay logs were successfully applied.
Tue Apr  1 12:44:33 2014 – [info] Getting new master’s binlog name and position..
Tue Apr  1 12:44:33 2014 – [info]  zdev-bin.000009:797
Tue Apr  1 12:44:33 2014 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=3312, MASTER_LOG_FILE='zdev-bin.000009', MASTER_LOG_POS=797, MASTER_USER='slave_user', MASTER_PASSWORD='xxx';
Tue Apr  1 12:44:33 2014 – [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Tue Apr  1 12:44:33 2014 – [info] Setting read_only=0 on 127.0.0.1(127.0.0.1:3312)..
Tue Apr  1 12:44:33 2014 – [info]  ok.
Tue Apr  1 12:44:33 2014 – [info] ** Finished master recovery successfully.
Tue Apr  1 12:44:33 2014 – [info] * Phase 3: Master Recovery Phase completed.
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] * Phase 4: Slaves Recovery Phase..
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] — Slave diff file generation on host 127.0.0.1(127.0.0.1:3313) started, pid: 10586. Check tmp log /home/jayantk/log/masterha/app1/127.0.0.1_3313_20140401124432.log if it takes time..
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] Log messages from 127.0.0.1 …
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Apr  1 12:44:33 2014 – [info] End of log messages from 127.0.0.1.
Tue Apr  1 12:44:33 2014 – [info] — 127.0.0.1(127.0.0.1:3313) has the latest relay log events.
Tue Apr  1 12:44:33 2014 – [info] Generating relay diff files from the latest slave succeeded.
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] — Slave recovery on host 127.0.0.1(127.0.0.1:3313) started, pid: 10588. Check tmp log /home/jayantk/log/masterha/app1/127.0.0.1_3313_20140401124432.log if it takes time..
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] Log messages from 127.0.0.1 …
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] Starting recovery on 127.0.0.1(127.0.0.1:3313)..
Tue Apr  1 12:44:33 2014 – [info]  This server has all relay logs. Waiting all logs to be applied..
Tue Apr  1 12:44:33 2014 – [info]   done.
Tue Apr  1 12:44:33 2014 – [info]  All relay logs were successfully applied.
Tue Apr  1 12:44:33 2014 – [info]  Resetting slave 127.0.0.1(127.0.0.1:3313) and starting replication from the new master 127.0.0.1(127.0.0.1:3312)..
Tue Apr  1 12:44:33 2014 – [info]  Executed CHANGE MASTER.
Tue Apr  1 12:44:33 2014 – [info]  Slave started.
Tue Apr  1 12:44:33 2014 – [info] End of log messages from 127.0.0.1.
Tue Apr  1 12:44:33 2014 – [info] — Slave recovery on host 127.0.0.1(127.0.0.1:3313) succeeded.
Tue Apr  1 12:44:33 2014 – [info] All new slave servers recovered successfully.
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] * Phase 5: New master cleanup phase..
Tue Apr  1 12:44:33 2014 – [info]
Tue Apr  1 12:44:33 2014 – [info] Resetting slave info on the new master..
Tue Apr  1 12:44:33 2014 – [info]  127.0.0.1: Resetting slave info succeeded.
Tue Apr  1 12:44:33 2014 – [info] Master failover to 127.0.0.1(127.0.0.1:3312) completed successfully.
Tue Apr  1 12:44:33 2014 – [info]

—– Failover Report —–

app1: MySQL Master failover 127.0.0.1 to 127.0.0.1 succeeded

Master 127.0.0.1 is down!

Check MHA Manager logs at zdev.net:/home/jayantk/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
The latest slave 127.0.0.1(127.0.0.1:3312) has all relay logs for recovery.
Selected 127.0.0.1 as a new master.
127.0.0.1: OK: Applying all logs succeeded.
127.0.0.1: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
127.0.0.1: OK: Applying all logs succeeded. Slave started, replicating from 127.0.0.1.
127.0.0.1: Resetting slave info succeeded.
Master failover to 127.0.0.1(127.0.0.1:3312) completed successfully.

So eventually, the following mysql cluster is now in place:

M2(RW) —> S2
   |
   |
  V
  S1

We can log into each of the mysql servers and verify the same. So, the failover happened. But there are a few problems with this solution.

  1. Now that M2 is the master, we will have to change our application to use M2 as the master instead of M1, so that all inserts start happening on M2. One way to do this is to use an HA solution such as Pacemaker, which will take over the virtual ip of the old master M1 and assign it to the new master M2. Another way is to change the configuration on all app servers and swap the ip of M1 with that of M2. Either approach can be automated through the "master_ip_failover_script" parameter. More details on configuring it are available here – https://code.google.com/p/mysql-master-ha/wiki/Parameters#master_ip_failover_script
  2. The purpose of master-master mysql replication is lost when failover happens. Master-master replication allows easy switching of the app between two masters; it is even possible to write to certain databases on M1 and other databases on M2. But with this solution, firstly, M2 has to be read-only, so it is effectively acting only as a slave. Secondly, after the failover, M2 loses all the information which made it a slave of M1. So, when M1 comes back up, it cannot be put back into circular replication without manual reconfiguration.
  3. The script does not handle slave failover. The expectation is that each master has multiple slaves and the failover of slaves is handled separately. If a mysql slave server goes down, the application should be able to identify the same and stop using that particular slave for queries. Also, if mha is monitoring the mysql servers and one of the slaves goes down (undetected), and then the master fails, mha may not be able to carry out the failover.
  4. And finally, after the failover happens the script simply exits. It does not update the mha configuration file, so multiple failovers cannot be handled. In order to start it again, we have to manually change the configuration file and start the mha manager script again.

But, in spite of these limitations, I believe the script can be used in the majority of cases, and it provides a good solution to mysql master failover.

Database library to handle multiple masters and multiple slaves

In a large scale mysql deployment there could be multiple masters and multiple slaves. The masters are generally in circular replication and are used for running all inserts, updates and deletes. The slaves are used to run selects.

When you are dealing with multiple mysql instances running in a large scale environment, it is important to take care of the lag between masters and slaves. To handle such scenarios, the code should be capable of firing a query on a dynamically chosen server. That means that for each query, I as a developer should have the flexibility to decide which server the query should go to.

A list of existing scenarios :

1. All registrations / username generation should happen on a single master. If you generate usernames on both masters, there may be scenarios where, due to lag between the mysql masters, a new user is not yet reflected on the other master. In such a case, the user may register again and land on the other master, creating the same username twice and breaking the circular replication. So all registrations and checks for "username exists" should happen on a single master.

2. For all other insert, update and delete operations, the user should stick to a single master. Why? Assume there is a lag of around 30 minutes between the masters and slaves. The user inserts a record and immediately wants to see the record he has inserted. If we fetch the record from another master or a slave, the record will not be available, because it has not yet been replicated. To take care of this scenario, whenever a record is inserted, the immediately following selects have to go to the same server.

3. All other selects can be fired on any of the slaves. For example, the user logs into the site and sees his own profile. We show him his profile using one of the slave servers. This can be cached as well. The point here is that for data which has not been updated recently, the query can be fired on any of the slaves.

The following piece of code/library handles most of these scenarios. Please feel free to suggest modifications or improvements.


<?php
/**
* Created by : Jayant Kumar
* Description : php database library to handle multiple masters & multiple slaves
**/
class DatabaseList // jk : Base class
{
  public $db = array();

  public function setDb($db)
  {
    $this->db = $db;
  }

  public function getDb()
  {
    return $this->db;
  }
}

class SDatabaseList extends DatabaseList // jk : Slave mysql servers
{
  function __construct()
  {
    $this->db[0] = array('ip'=>'10.20.1.11', 'u'=>'user11', 'p'=>'pass11', 'db'=>'database1');
    $this->db[1] = array('ip'=>'10.20.1.12', 'u'=>'user12', 'p'=>'pass12', 'db'=>'database1');
    $this->db[2] = array('ip'=>'10.20.1.13', 'u'=>'user13', 'p'=>'pass13', 'db'=>'database1');
  }
}

class MDatabaseList extends DatabaseList // jk : Master mysql servers
{
  function __construct()
  {
    $this->db[0] = array('ip'=>'10.20.1.1', 'u'=>'user1', 'p'=>'pass1', 'db'=>'database1');
    $this->db[1] = array('ip'=>'10.20.1.2', 'u'=>'user2', 'p'=>'pass2', 'db'=>'database2');
  }
}

class MemcacheList extends DatabaseList // jk : memcache servers
{
  function __construct()
  {
    $this->db[0] = array('ip'=>'localhost', 'port'=>11211);
  }
}

interface DatabaseSelectionStrategy // jk : Database selection interface
{
  public function getCurrentDb();
}

class StickyDbSelectionStrategy implements DatabaseSelectionStrategy // jk : sticky db. For update / delete / insert
{
  private $dblist;
  private $uid;
  private $sessionDb;
  private $sessionTimeout = 3600;

  function __construct(DatabaseList $dblist)
  {
    $this->dblist = $dblist;
  }

  public function setUserId($uid)
  {
    $this->uid = $uid;
  }

  public function setSessionDb($sessionDb)
  {
    $this->sessionDb = $sessionDb->db;
  }

  private function getDbForUser() // jk : get db for this user. If not found - assign him a random master db.
  {
    $memc = new Memcache;
    foreach ($this->sessionDb as $key => $value) {
      $memc->addServer($value['ip'], $value['port']);
    }
    $dbIp = $memc->get($this->uid);
    if($dbIp == null)
    {
      $masterlist = new MDatabaseList();
      $randomdb = new RandomDbSelectionStrategy($masterlist);
      $mdb = $randomdb->getCurrentDb();
      $dbIp = $mdb['ip'];
      $memc->set($this->uid, $dbIp, false, $this->sessionTimeout);
    }
    return $dbIp;
  }

  public function getCurrentDb()
  {
    $dbIp = $this->getDbForUser();
    foreach ($this->dblist->db as $key => $value)
    {
      if($value['ip'] == $dbIp)
        return $value;
    }
  }
}

class RandomDbSelectionStrategy implements DatabaseSelectionStrategy // jk : select random db from list
{
  private $dblist;

  function __construct(DatabaseList $dblist)
  {
    $this->dblist = $dblist;
  }

  public function getCurrentDb()
  {
    $cnt = sizeof($this->dblist->db);
    $rnd = rand(0, $cnt-1);
    return $this->dblist->db[$rnd];
  }
}

class SingleDbSelectionStrategy implements DatabaseSelectionStrategy // jk : select one master db - to generate unique keys
{
  private $dblist;

  function __construct(DatabaseList $dblist)
  {
    $this->dblist = $dblist;
  }

  public function getCurrentDb()
  {
    return $this->dblist->db[0];
  }
}

interface Database
{
  public function getIp();
  public function getDbConnection();
}

class DatabaseFactory implements Database // cmt : database factory
{
  private $db;

  public function getIp()
  {
    return $this->db['ip'];
  }

  public function getDbConnection($type = 'slave', $uid = 0)
  {
    switch($type)
    {
      case 'slave':
        $dblist = new SDatabaseList();
        $dbStrategy = new RandomDbSelectionStrategy($dblist);
        break;
      case 'master':
        $dblist = new MDatabaseList();
        $dbStrategy = new StickyDbSelectionStrategy($dblist);
        $dbStrategy->setSessionDb(new MemcacheList());
        $dbStrategy->setUserId($uid);
        break;
      case 'unique':
        $dblist = new MDatabaseList();
        $dbStrategy = new SingleDbSelectionStrategy($dblist);
        break;
    }
    $this->db = $dbStrategy->getCurrentDb();
    print_r($this->db);
    // return mysql_connect($this->db['ip'], $this->db['u'], $this->db['p']); // then mysql_select_db($this->db['db'])
  }
}

// tst : test this out...
$factory = new DatabaseFactory();
echo 'Slave : '; $factory->getDbConnection('slave');
echo 'Slave2 : '; $factory->getDbConnection('slave');
echo 'Unique : '; $factory->getDbConnection('unique');
echo 'New Master 100: '; $factory->getDbConnection('master', 100);
echo 'New Master 101: '; $factory->getDbConnection('master', 101);
echo 'New Master 102: '; $factory->getDbConnection('master', 102);
echo 'old Master 100: '; $factory->getDbConnection('master', 100);
echo 'old Master 102: '; $factory->getDbConnection('master', 102);
?>

Microsoft Licences

Recently I got the opportunity to be a part of the windows team. We are (yes still are) using a microsoft (yes the same microsoft) product to handle one of our websites due to legacy bindings – user base, existing technology, backend team.

My first encounter with microsoft on the enterprise end was when we were trying to use Microsoft Navision – a supply chain management solution – in one of my previous companies. I say "trying" because it took us more than 6-8 months to put it into production, and another 3 months in training. Microsoft sucks the user in. I saw that if I purchase 1 product from microsoft, the dependencies are so well built in that I eventually end up purchasing a lot of other microsoft products.

Microsoft NAV cost us around 1 million INR. Now I cannot use NAV as it is, it needs to be customized. And it cannot be customized by just any developer. NAV can be customized only by companies / developers who have the licence to do so. The licence for customization is extremely expensive – maybe even more expensive than the licence for selling liquor in india. Once I pay for customization, I have to go ahead and deploy the software somewhere. For which I need microsoft licences – OS, web server, database server. And then of course plan for HA (high availability) – which means at least 2 of each. So the strategy here is that once you purchase a product licence, you need the complete platform licence, and eventually you end up paying many times more than the actual product cost.

Another concept that I became aware of recently was "software assurance". What is that? Well, have you heard of life insurance? Software Assurance (SA) is somewhat similar to that. It ensures that you get all the patches and version upgrades (may or may not be free) as and when they are released. So if you purchase windows 2012 and plan to shift to the next version of windows when it is released, it is possible. There may be some cost involved.

Among all microsoft licences, I believe the DB licence is the killer. The standard licence costs 1/4th of the cost of the enterprise licence. The difference between the two is that the standard edition can utilize only up to 2 cores on a machine, while the enterprise edition can utilize any number of cores – but its licence cost is in multiples of "dual cores". So if you have a dual quad core machine (8 cores), you end up purchasing 4 enterprise licences, which is 16 times the standard licence cost.

And why should I pay microsoft, when there are so many technologies which are better and available free of cost? If I have to pay for support, why should I pay for the product and then for the support? Why not get the product for free and then pay for support?

My final assessment was that microsoft is like a spider's web – once you get entangled, you keep getting more and more entangled. And there is no getting out without losing your own investment. Beware!!!

Mysql HA solutions

Let's see what HA solutions can be designed in mysql and where they are suited.

1. Single master – single slave.

M(RW)
|
S(R)

A simple master slave solution can be used for a small site – where all the inserts go to the master and some (non-critical) requests are served from the slave. If the master crashes, the slave can simply be promoted as the master – once it has replicated the "available" logs from the master. You will need to create another slave of the now "new" master to make your mysql setup highly available again. If the slave crashes, you will have to switch your read queries to the master and create a new slave.
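A minimal promotion sequence on the slave might look like the following sketch (assuming mysql 5.5.16+ for RESET SLAVE ALL, and that the slave has already applied everything it received from the dead master):

-- confirm all relayed events have been applied before promoting
SHOW SLAVE STATUS\G
-- stop replication and discard the old master's coordinates
STOP SLAVE;
RESET SLAVE ALL;
-- open the promoted server for writes
SET GLOBAL read_only = 0;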

As mentioned earlier, this is for a very "small" site. There are multiple scenarios where the single master – single slave solution is not suitable. You will not be able to scale reads or run heavy report generation queries without affecting your site's performance. Also, to create a new slave after a failure, you will need to lock and take a backup from the remaining mysql server, which will again affect your site.

2. Single master – multiple slave.

          M1(RW)
            |
      ——————————-
      |                |                         |
    S1(R)       S2(R)              Sn(R)

A single master multiple slave scenario is the most suitable architecture for many web sites. It provides read scalability across multiple slaves. Creation of new slaves is much easier. You can easily allocate a slave for backups and another for running heavy reports without affecting the site performance. You can create new slaves to scale reads as and when needed. But all inserts go into the only master. This architecture is not suitable for write scalability.

When any of the slaves crashes, you can simply remove that slave, create another slave and put it back into the system. In case the master fails, you will need to wait for the slaves to be in sync with the old master – all replicated binary log events must have been executed – and then promote one of them as the new master. The other slaves then become slaves of the new master. You have to be very careful in defining the exact position from which each slave starts replicating off the new master, else you will end up with lots of duplicate records and may lose data sanity on some of the slaves. The sketch below shows how a slave is repointed.
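A sketch of repointing one such slave (the host, user and coordinates are illustrative; the log file and position must be the new master's coordinates corresponding to the last event this slave has applied):

STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST='127.0.0.1',
  MASTER_PORT=3312,
  MASTER_USER='slave_user',
  MASTER_PASSWORD='slave_pwd',
  MASTER_LOG_FILE='mysql-bin.000009',
  MASTER_LOG_POS=797;
START SLAVE;
-- then confirm with SHOW SLAVE STATUS\G that both replication threads are running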

3. Single master – standby master – multiple slaves.

         M1(RW) —— M(R)
           |
      ——————————–
      |                   |                       |
    S1(R)       S2(R)               Sn(R)

This architecture is very similar to the previous single master – multiple slave setup. A standby master is identified and kept ready for failover. The benefit of this architecture is that the standby master can be of the same configuration as the original master. It is suitable for medium to high traffic websites where the master is of a much higher configuration than the slaves – maybe having RAID 1+0 or SSD drives. The standby master is kept close to the original master so that there is hardly any lag between the two. The standby master can be used for reads also, but care should be taken that there is not much lag between the master and the standby – so that in case of failure, switching can be done with minimum downtime.

When the master fails, you wait for the slaves to catch up with the old master and then simply switch them and the app to the standby master.

4. Single master – candidate master – multiple slaves.

         M1(RW) ——– M(R)
                                        |
              ———————————–
              |                   |                           |
            S1(R)         S2(R)                  Sn(R)

    This architecture is very similar to the earlier one, the only difference being that all slaves replicate from the candidate master instead of the original master. The benefit is that in case the master goes down, no switching is required on the slaves. The old master can be removed from the system and the candidate master automatically takes over. Afterwards, in order to get the architecture back in place, a new candidate master needs to be identified and the slaves moved one by one to replicate through it. The downtime here is minimal. The catch is that there is a definite lag between the master and the slaves, since replication on the slaves happens through the candidate master. This lag can be quite annoying in some cases. Also, if the candidate master fails, all slaves stop replicating and need to be moved to the old master, or a new candidate master has to be identified and all slaves pointed to it.

5. Single master – multiple slaves – candidate master – multiple slaves

       M1(RW) ———————– M(R)
           |                                               |
   —————                      ———————  
   |                  |                       |                           |
S1(R)      S1n(R)            S2(R)                  S2n(R)

  This architecture is again similar to the earlier one, except that there is a complete failover setup for the current master. If either the master or the candidate master goes down, there are still slaves replicating which can be used. This is suitable for a high traffic website which requires read scalability. The only drawback of this architecture is that writes cannot be scaled.

6. Multiple master – multiple slaves

        M1(RW) ———————– M2(RW)
          |                                                 |
  —————-                       —————-
  |                    |                       |                      |
S1(R)         S2(R)               S3(R)             S4(R)

This is "the" solution for high traffic websites. It provides read and write scalability as well as high availability. M1 and M2 are two masters in circular replication – each replicates the other. All slaves point to either M1 or M2. If one of the masters goes down, it can be removed from the system, and a new master can be created and put back into the system without affecting the site. If you are worried about performance issues when a master goes down and all queries are redirected to the other master, you can have even 3 or more masters in circular replication.

It is necessary to decide beforehand how many masters you would like to have in circular replication, because adding more masters later – though possible – is not easy. Having 2 masters does not mean that you will be able to do 2X writes. Writes also happen on each master due to replication from the other, so how many writes the complete system can handle depends entirely on the system resources. Your application has to handle unique key generation in a fashion that does not result in duplicates between the masters (a sketch of the usual my.cnf settings follows below). Your application also needs to handle scenarios where the lag between M1 and M2 becomes excessive or annoying. But with proper thought to these points, this architecture can be scaled and managed very well.
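A common way to handle the unique key problem is to interleave auto-increment values through my.cnf, so that the two masters can never generate the same id. A sketch with illustrative server ids (log-slave-updates is also needed on each master so that its own slaves receive the writes replicated from the other master):

# my.cnf on M1 - generates ids 1, 3, 5, ...
server-id = 1
log-bin
log-slave-updates
auto_increment_increment = 2
auto_increment_offset = 1

# my.cnf on M2 - generates ids 2, 4, 6, ...
server-id = 2
log-bin
log-slave-updates
auto_increment_increment = 2
auto_increment_offset = 2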

Database speed tests (mysql and postgresql) – part 3 – code

Here is the code structure

dbfuncs.php : is the file which contains classes and functions for firing queries on mysql and pgsql
mysqlinsert.php : creates and fires inserts on mysql
mysqlselect.php : creates and fires selects on mysql
pgsqlinsert.php : creates and fires inserts on pgsql
pgsqlselect.php : creates and fires selects on pgsql
benchmark.php : script used to control concurrency and number of queries per script

Please refer to http://www.jayantkumar.in/index.php/2010/09/29/database-speed-tests-mysql-and_29/ and http://www.jayantkumar.in/index.php/2010/09/27/database-speed-tests-mysql-and-2/ for benchmarks of selects and inserts respectively.

And the code….

dbfuncs.php

abstract class dbfuncs
{
  abstract function insertqry($qry);
  abstract function selectqry($qry);
  abstract function connectdb();

  public function log($str)
  {
    $file = "error.log";
    $fp = fopen($file, "a");
    fwrite($fp, "$str\n");
    fclose($fp);
  }
}

class mysqldb extends dbfuncs
{
  private $user = "root";
  private $pass = "jayant";
  private $host = "localhost";
  //private $port = 3307;
  private $socket = "/tmp/mysql.sock";
  private $database = "benchmark";

  public $db;

  function __construct()
  {
    $this->connectdb();
  }

  public function connectdb()
  {
    $this->db = mysql_connect($this->host.':'.$this->socket, $this->user, $this->pass) or die(mysql_error());
    mysql_select_db($this->database, $this->db);
  }

  public function insertqry($qry)
  {
    if(!mysql_ping($this->db))
      $this->connectdb();

    mysql_query($qry, $this->db) or $this->log(mysql_error());
  }

  public function selectqry($qry)
  {
    if(!mysql_ping($this->db))
      $this->connectdb();

    $rs = mysql_query($qry, $this->db) or $this->log(mysql_error());
    return $rs;
  }
}

class pgsqldb extends dbfuncs
{
  private $dsn = "host=localhost port=5432 user=jayant password=12qwaszx dbname=benchmark";

  public $db;

  function __construct()
  {
    $this->connectdb();
  }

  public function connectdb()
  {
    $this->db = pg_connect($this->dsn) or die(pg_last_error());
  }

  public function insertqry($qry)
  {
    if(!pg_ping($this->db))
      $this->connectdb();

    pg_query($this->db, $qry) or $this->log(pg_last_error($this->db));
  }

  public function selectqry($qry)
  {
    if(!pg_ping($this->db))
      $this->connectdb();

    $rs = pg_query($this->db, $qry) or $this->log(pg_last_error($this->db));
    return $rs;
  }
}

function logtime($str)
{
    $file = "benchmark.log";
    $fp = fopen($file, "a");
    fputs($fp, $str);
    fclose($fp);
}

mysqlinsert.php

include "dbfuncs.php";

$scriptno = $argv[1]+1;
$count = $argv[2];

$mysql = new mysqldb();
$start = microtime(true);
for($x=0; $x<$count; $x++)
{
  $xx = $x*$scriptno;
  $qry = "insert into data (val, txt) values ('$xx','$x in $scriptno')";
  $mysql->insertqry($qry);
}
$end = microtime(true);
$log = "nMysql innodb Time to insert $count in run $scriptno = ".($end-$start);
logtime($log);

pgsqlinsert.php

include "dbfuncs.php";

$scriptno = $argv[1]+1;
$count = $argv[2];

$mysql = new pgsqldb();
$start = microtime(true);
for($x=0; $x<$count; $x++)
{
  $xx = $x*$scriptno;
  $qry = "insert into data (val, txt) values ('$xx','$x in $scriptno')";
  $mysql->insertqry($qry);
}
$end = microtime(true);
$log = "nAvg Pgsql Time to insert $count in run $scriptno = ".($end-$start);
logtime($log);

mysqlselect.php

include "dbfuncs.php";

$scriptno = $argv[1]+1;
$count = $argv[2];

$mysql = new mysqldb();
$start = microtime(true);
for($x=0; $x<$count; $x++)
{
  $xx = $x*$scriptno;
  $qry = "select * from `data` where val ='$xx'";
  $mysql->selectqry($qry);
}
$end = microtime(true);
$log = "nMysql innodb Time to select $count in run $scriptno = ".($end-$start);
logtime($log);

pgsqlselect.php

include "dbfuncs.php";

$scriptno = $argv[1]+1;
$count = $argv[2];

$mysql = new pgsqldb();
$start = microtime(true);
for($x=0; $x<$count; $x++)
{
  $xx = $x*$scriptno;
  $qry = "select * from data where val ='$xx'";
  $mysql->selectqry($qry);
}
$end = microtime(true);
$log = "nPgsql Time to select $count in run $scriptno = ".($end-$start);
logtime($log);

benchmark.php

$count = 100000;
$concurrency = 40;

for($i=0; $i<$concurrency; $i++)
{
   exec("php -q mysqlselect.php $i $count > /dev/null &");
//   exec("php -q pgsqlselect.php $i $count > /dev/null &");
//   exec("php -q mysqlinsert.php $i $count > /dev/null &");
//   exec("php -q pgsqlinsert.php $i $count > /dev/null &");
}

All test runs were individual – meaning that only one script was run at a time with different count and concurrency. For inserts $count was set to 10,000 and for selects $count was set to 100,000 while $concurrency kept on varying.

I am using ubuntu 10.04 with 32 bit kernel 2.6.32-24 and the ext4 file system. My system has around 3GB of RAM and an Intel Core 2 Duo T5870 @ 2.0 GHz.

Database speed tests (mysql and postgresql) – part 2

Here is the comparison between mysql and postgresql for selects (only). I used the same table that I created earlier (http://www.jayantkumar.in/index.php/2010/09/27/database-speed-tests-mysql-and-2/) while comparing insertion speed. I created approximately 10 million records in the table (9,755,366 to be exact – see the table information below) and ran selects on them. I also modified the configuration of both mysql and postgresql to enable faster selects.

Mysql

In mysql I specifically disabled the query_cache – the reason being that I would use innodb for tables with a large number of inserts, due to its support for row level locking. And with every insert the query cache is marked as dirty. So enabling the query cache would lead to caching of queries, which will not be the scenario in a live setup.

Mysql Configuration :
innodb_buffer_pool_size = 256MB
key_buffer_size = 256MB
read_buffer_size = 512KB
sort_buffer_size = 512KB
query_cache_size = 0
thread_cache_size = 32
table_open_cache = 64

table information :
No of records : 9755366
data (idb) size : 692 MB

Mysql : (avg time per concurrent script, each script firing 100000 selects)

Avg time for 500000 selects with concurrency = 5 : 58.67
Avg time for 1000000 selects with concurrency = 10 : 122.8
Avg time for 2000000 selects with concurrency = 20 : 225.67
Avg time for 3000000 selects with concurrency = 30 : 351.66
Avg time for 4000000 selects with concurrency = 40 : 452.3

PostgreSQL :

Mysql stores the same data more compactly than postgres – the same data takes around 700 MB in innodb and around 900 MB in postgres.

Postgres configuration :
shared_buffers = 128MB
work_mem = 1MB
random_page_cost = 4.0
effective_cache_size = 256MB

table information :
No of records : 9755366
data size : 912 MB

Pgsql : (avg time per concurrent script, each script firing 100000 selects)

Avg time for 500000 selects with concurrency = 5 : 86.8
Avg time for 1000000 selects with concurrency = 10 : 144.74
Avg time for 2000000 selects with concurrency = 20 : 274.37
Avg time for 3000000 selects with concurrency = 30 : 402.92
Avg time for 4000000 selects with concurrency = 40 : 528.17

Mysql seems to perform better with selects. The graph also shows that as concurrency increases, selects in innodb take less time than in postgresql.

So, why would you switch from mysql to postgresql? Only if you have a very high ratio of inserts as compared to selects. The benefit in inserts may outweigh the loss in selects to some extent.

Database speed tests (mysql and postgresql) – part 1

There have been major changes in mysql and postgres over the last couple of years. Mysql has been focusing on improving and optimizing innodb. Postgres, on the other hand, has been focusing on database replication and hot standby.

Recently postgres came out with version 9.0, which has built-in replication and hot standby – the two most requested features in postgresql. Earlier, people used to shy away from postgres because there was no proper "easily deployable" solution available for replication. With this release, postgres has taken a major step forward. Here http://www.postgresql.org/docs/9.0/static/release-9-0 is a list of features that have been introduced in postgres 9.0.

Mysql has released the rc version of mysql 5.5, which has a bunch of improvements over the previous version. Support for multi-core cpus, changes in innodb for effective use of available I/O capacity, and semisynchronous replication are some of the features that mysql 5.5 promises. Here is a list of all the new features in MySQL 5.5: http://dev.mysql.com/doc/refman/5.5/en/mysql-nutshell.html

It has been a long time since I posted my last benchmark (http://www.jayantkumar.in/index.php/2008/06/03/mysql-versus-postgresq/), and I believe it is time I did some rough benchmarks and posted the results. The scope is to check out innodb tables in mysql 5.5.6 versus the tables in postgresql 9.0. I am benchmarking pure inserts and selects only. This blog focuses on inserts; I will focus on selects in the next one. I am running these tests on my laptop, which has an Intel Core 2 Duo T5870 @ 2.00 GHz and 3 GB of RAM.

I have created a simple php script to perform the benchmark, which spawns multiple php scripts that work in the background. Let me know if you need the scripts and I will share them with you.

Mysql:

Innodb Settings in my.cnf :

innodb_buffer_pool_size = 256M
innodb_additional_mem_pool_size = 8M
innodb_log_file_size = 64M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_file_per_table

Table structure :
CREATE TABLE `data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`val` int(11) NOT NULL,
`txt` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_val` (`val`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Avg Time for 20000 inserts with concurrency of 2 : 1367 seconds
Load on machine : 3.4
Size on disk : 9 MB

Avg Time for 50000 inserts with concurrency of 5 : 1537.2 seconds
Load on machine : 4.6
Size on disk : 13 MB

Avg Time for 100000 inserts with concurrency of 10 : 1255 seconds
Load on machine : 3.5
Size on disk : 17 MB

Avg Time for 200000 inserts with concurrency of 20 : 1403 seconds
Load on machine : 4.2
Size on disk : 26 MB

Time for 400000 inserts with concurrency of 40 : 1730 seconds
Load on machine : 6.6
Size on disk : 52 MB

Postgresql :

Settings in postgresql.conf:
shared_buffers = 64MB
work_mem = 1MB
synchronous_commit = on

Table structure :
 Column |         Type          |                     Modifiers                      | Storage  | Description
--------+-----------------------+----------------------------------------------------+----------+-------------
 id     | integer               | not null default nextval('data_id_seq'::regclass)  | plain    |
 val    | integer               | not null                                           | plain    |
 txt    | character varying(20) |                                                    | extended |
Indexes:
    "data_pkey" PRIMARY KEY, btree (id)
    "idx_val" btree (val)
Has OIDs: no

Avg Time for 20000 inserts with concurrency of 2 : 221.3 seconds
Load on machine : 2.0
Size on disk : 2 MB

Avg Time for 50000 inserts with concurrency of 5 : 232 seconds
Load on machine : 2.0
Size on disk : 4.8 MB

Avg Time for 100000 inserts with concurrency of 10 : 248.75 seconds
Load on machine : 4.0
Size on disk : 9.8 MB

Avg Time for 200000 inserts with concurrency of 20 : 276.34 seconds
Load on machine : 3.6
Size on disk : 19 MB

Time for 400000 inserts with concurrency of 40 : 350.11 seconds
Load on machine : 5.8
Size on disk : 38 MB

The graph shows that mysql is heavy compared to pgsql. The base timings are almost 5 times higher in mysql than in pgsql. Also, as the concurrency goes up, the time required for inserts in mysql spikes up more steeply than that required for postgres.

I did a sample run on mysql with innodb_flush_log_at_trx_commit=2 and the improvement was huge.

Avg Time for 20000 inserts with concurrency of 2 (innodb_flush_log_at_trx_commit=2) : 5.2 seconds
Avg Time for 100000 inserts with concurrency of 10 (innodb_flush_log_at_trx_commit=2) : 18.69 seconds

Similarly I disabled synchronous_commit on postgres and did a sample run.

Avg Time for 20000 inserts with concurrency of 2 (synchronous_commit = off) : 2.95 seconds
Avg Time for 100000 inserts with concurrency of 10 (synchronous_commit = off) : 15.06 seconds

PS : The average time is the time for 10000 inserts (inserts per instance)

Let's see what the selects tell us – in the next blog.

Converting myisam tables to innodb

Why should you convert myisam tables to innodb ?

For the perfectly simple reason that innodb tables do not get locked by concurrent selects & inserts. So if you find that your myisam table is suffering from too many locks – due to concurrent selects and inserts – it is time for you to convert the table to innodb.

The simple query which does the trick is

Alter table myisam_table_name engine = innodb;

This query is good for small tables, which get converted in a flash (I am referring to tables smaller than 1 GB). But when you try to run this query to alter bigger tables, it takes a huge amount of time. You have to wait – maybe hours, if not days – to get your job done.

Recently I had a 30 GB table which I wanted to convert to innodb, and the alter query went on for 3 days, after which I gave up and killed the query. And then went on to find ways to make this alter happen faster.

There are multiple ways to make your alter fast –

1. Create a temporary table with engine = innodb, disable unique checks, and insert into the table in batches. The sequence of sql statements for this is

create table new_table_name like old_table_name;
alter table new_table_name engine = innodb;
set unique_checks = 0;
// loop till all records are ported, sliding the key range forward in each iteration
insert into new_table_name select * from old_table_name where key > something and key <= something_else;
// once done, re-enable unique checks and swap the tables
set unique_checks = 1;
rename table old_table_name to old_table_name_bak, new_table_name to old_table_name;

Troubleshooting : mysql server has gone away

When running a query against a database, the following error can be generated:

ERROR 2006 (HY000) at line NNN: MySQL server has gone away

Where “NNN” is the line number of the script currently being run where the error occurred.

Possible Causes and Resolution

This is a general error which can have a number of possible causes. The one certainty is that the MySQL database is no longer listening on the connection – but this does not necessarily mean that MySQL is no longer available.

Possible causes:

MySQL has crashed.
Check that MySQL is still running on the server. If it is not running and this was not a planned shutdown then it is reasonable to conclude that MySQL has crashed. I personally have never experienced this, but if it happens then investigate whether you can upgrade to a newer version.

MySQL has been shutdown.
Check that MySQL is still running, if it is not and it was shut down (accidentally or intentionally) then that is the cause of the problem. Restart MySQL.

Network connectivity has failed.
Even if the database is still up, is it possible to contact the server? If you can ping the server (or MySQL is running on the localhost) then basic network connectivity is not the problem.

MySQL networking has failed
The problem may be with the MySQL connection itself. If the query was trying to transfer a large block of data, then the MySQL packet size may be set too small.

Try increasing the value of max_allowed_packet and restart the MySQL database service. In MySQL Administrator select "Startup Variables", and on the "Advanced Networking" tab this is shown under "Data / Memory size", labeled as "Max. packet size." The default is 1MB, and one of my systems now runs at 16MB.

The database connection has timed out and thus the SQL has failed.
Try increasing the wait_timeout, or reconnecting just before firing the query.
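Both settings can also be raised at runtime as a quick test. A sketch with illustrative values (the change should also be made in my.cnf so that it survives a restart, and max_allowed_packet only takes effect for new connections):

SET GLOBAL max_allowed_packet = 16777216; -- 16MB
SET GLOBAL wait_timeout = 28800;          -- 8 hours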

Modified after copying from http://www.cryer.co.uk/brian/mysql/trouble_mysql_gone_away.htm