Friday, May 11, 2007

mysql multi-master replication - Act II

SETUP PROCEDURE
Created 4 instances of mysql on 2 machines running on different ports. Lets call these instances A, B, C and D. So A & C are on one machine and B & D are on another machine. B is slave of A, C is slave of B, D is slave of C and A is slave of D.

(origin) A --> B --> C --> D --> A (origin)

Each instance has its own serverid. A query originating from any machine will travel through the loop replicating data on all the machines are return back to the origniating server - which in effect will identify that the query had originated from here and would not execute/replicate the query further.

To handle auto_increment field, two variables are defined in the configuraion of the server.

1. auto_increment_increment : controls the increment between successive AUTO_INCREMENT values.

2. auto_increment_offset : determines the starting point of AUTO_INCREMENT columns.

Using these two variables, the auto generated values of auto_increment columns can be controlled.

Once the replication loop is up and running, any data inserted in any node would automatically propagate to all the other nodes.

ISSUES WITH MULTI-MASTER REPLICATION & AUTOMATIC FAILOVER

1. LATENCY

There is a definite latency between the first node and the last node. The replication steps lead to the slave reading the master's binary log through the network and writing it to its own relay-bin log. It then executes the query and then writes it to its own binary log. So each node takes a small amount of time to execute and then propagate the query further.

For a 4 node multi-master replication when i replicated a single insert query containing one int and one varchar field, the time taken for data to reach the last node is 5 ms.
This latency would ofcourse depend on the following factors
a) amount of data to be relicated. Latency increases with increase in data
b) Network speed between the nodes. If the data is to replicated over internet, it will take more time as compared to that on nodes in LAN
c) Amount of indexes on the tables being replicated. As the number of indexes increase, the time taken to insert data in tables also increases. Increasig the latency.
d) The hardware of the machine and the load on the machine will also determine how fast the replication between master and slave will take place.

2. AUTOMATIC FAILOVER

Automatic failover can be accomplished using events and federated tables in mysql 5.1. Federated tables are created between master and slave and are used to check connection between master and slave. An event is used to trigger a query on the federated table which checks the connection between master and slave. If the query fails, then a stored procedure can be created which should chunk the master out of the replication loop.

So suppose in the loop shown above, A goes out, then the event on B would detect that A is out and would make D as a master of B.

This is the theoretical implementation of automatic failover. Practically there are a few issues with its implementation.

a) for failover, you need to know the position of master's master from where the slave should take over. (So B should know the position on D from where it has to start replication). One of the ways to do this is that each slave logs its master's position on a table which is replicated throughout the loop. But this again is not possible using events & stored procedures - cause there is no query which can capture the information available from "SHOW SLAVE STATUS" query in a variable and write it in a table. Another way to do this is to have an external script running at an interval of say 10 seconds which logs this information and checks the federated tables for any disruption in the master-slave connection. With this methodology, the problem is that there could be a 10 second window period during which data can be lost.

b) You could also land into an infinite replication situation. Lets look how using the 4 node example above. Suppose "A" goes down and It has a query in its binary log which has to be replicated throught the loop. So the query propagates from "B" to "C" and finally to "D". Now since "A" is down and failover has happened, "B" would be the slave of "D". So the query which originated from "A" would go from "D" to "B". Thats because if "A" would have been there, it would have identified its own query and stopped it. But since "A" is out of the loop, the query will not be stopped and it will propagate to "B" and will always be in the loop. This can result in either the query running in the loop indefinitely or an error on all the slaves and all the slaves going down.

These are the major issues with multi-master replication and automatic failover. Though one can still live with the automatic failover scenario - as it might occur once in a while. But the latency between first and last node during replication has no solution. This latency is due to physical constraints and cannot be avoided.

A work around would be distributing the queries based on tables on all the nodes. So that queries for a table would always be served from a single node. But this then would result in table locks on that node and again we would not be able to reap the benefits of multi-master replication.

2 comments:

jana said...

Jayant,
Stumbled across your post and was relieved to see your thoughts(they match what I have been fed from the net for some time on multi master replication). Your thoughts on Infinite Loop, though has caught my attention, but hasn't popped anything in my head. If you by chance find time, please re-articulate(for every other soul's sake). I too will try to do my bit.

Anonymous said...

hi jayant
thanku very much and ur article helped me a lot. thanku