MySQL HA solutions

Let's see what HA solutions can be designed in MySQL and where they are suited.

1. Single master – single slave.

M(RW)
|
S(R)

A simple master-slave solution can be used for a small site, where all the inserts go into the master and some (non-critical) requests are served from the slave. If the master crashes, the slave can simply be promoted to master once it has replayed the "available" logs from the master. You will then need to create another slave of the new master to make your MySQL setup highly available again. If the slave crashes, you will have to switch your read queries to the master and create a new slave.
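
A minimal sketch of the promotion, assuming classic binary log replication (host name, credentials and log coordinates below are placeholders):

-- on the slave, after it has applied everything in its relay log
STOP SLAVE;
RESET MASTER;   -- start a fresh binary log so a new slave can attach cleanly
-- point the application at this host; it is now the master

-- on a freshly built slave of the new master
CHANGE MASTER TO MASTER_HOST='new_master_ip', MASTER_USER='repl',
    MASTER_PASSWORD='secret', MASTER_LOG_FILE='mysql-bin.000001',
    MASTER_LOG_POS=4;
START SLAVE;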

As mentioned earlier, this is for a very small site. There are multiple scenarios where a single master - single slave solution is not suitable. You will not be able to scale reads or run heavy report-generating queries without affecting your site performance. Also, to create a new slave after a failure, you will need to lock and take a backup from the surviving MySQL server, which will affect your site.

2. Single master – multiple slaves.

         M1(RW)
           |
   -----------------
   |       |       |
 S1(R)   S2(R)   Sn(R)

A single master, multiple slave setup is the most suitable architecture for many web sites. It provides read scalability across multiple slaves, and creating new slaves is much easier. You can allocate one slave for backups and another for running heavy reports without affecting site performance, and you can create new slaves to scale reads as and when needed. But all inserts go into the only master, so this architecture is not suitable for write scalability.

When any of the slaves crashes, you can simply remove that slave, create another, and put it back into the system. If the master fails, you will need to wait for the slaves to be in sync with the old master (all replicated binary log events executed) and then promote one of them to master. The other slaves then become slaves of the new master. You will need to be very careful in defining the exact position from which each slave starts replicating from the new master; otherwise you will end up with lots of duplicate records and may lose data sanity on some of the slaves.
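
A sketch of that switch (host names, log file names and positions below are illustrative):

-- on each surviving slave: wait until every relayed event has been applied
SHOW SLAVE STATUS\G      -- compare Relay_Master_Log_File / Exec_Master_Log_Pos

-- on the most up-to-date slave, which becomes the new master
STOP SLAVE;
RESET MASTER;
SHOW MASTER STATUS;      -- note File and Position, e.g. mysql-bin.000001 / 106

-- on every other slave: replicate from exactly that position
STOP SLAVE;
CHANGE MASTER TO MASTER_HOST='new_master_ip', MASTER_USER='repl',
    MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=106;
START SLAVE;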

3. Single master – standby master – multiple slaves.

         M1(RW) ----- M(R)
           |
   -----------------
   |       |       |
 S1(R)   S2(R)   Sn(R)

This architecture is very similar to the previous single master - multiple slaves setup. A standby master is identified and kept for failover. The benefit of this architecture is that the standby master can be of the same configuration as the original master. It is suitable for medium to high traffic websites where the master is of a much higher configuration than the slaves (maybe having RAID 1+0 or SSD drives). The standby master is kept close to the original master so that there is hardly any lag between the two. The standby master can also be used for reads, but care should be taken that there is not much lag between the master and the standby, so that in case of failure the switch can be done with minimum downtime.

When the master fails, you wait for the slaves to catch up with the old master and then simply switch them, and the application, over to the standby master.

4. Single master – candidate master – multiple slaves.

   M1(RW) ----- M(R)
                 |
         -----------------
         |       |       |
       S1(R)   S2(R)   Sn(R)

This architecture is very similar to the earlier one, the only difference being that all slaves replicate from the candidate master instead of the original master. The benefit is that if the master goes down, no switching is required on the slaves: the old master is removed from the system and the candidate automatically takes over as the new master. Afterwards, to get the architecture back in place, a new candidate master needs to be identified and the slaves moved to it one by one. The downtime here is minimal. The catch is that there will be a definite lag between the master and the slaves, since replication to the slaves happens through the candidate, and this lag can be quite annoying in some cases. Also, if the candidate fails, all slaves will stop replicating and will need to be moved to either the old master or a newly identified candidate.

5. Single master – multiple slaves – candidate master – multiple slaves

        M1(RW) ---------------- M(R)
          |                      |
     ----------             ----------
     |        |             |        |
   S1(R)   S1n(R)         S2(R)   S2n(R)

This architecture is again similar to the earlier one, except that there is a complete failover setup for the current master. If either the master or the candidate master goes down, there are still slaves that are replicating and can be used. This is suitable for a high traffic website that requires read scalability. The only drawback of this architecture is that writes cannot be scaled.

6. Multiple masters – multiple slaves

        M1(RW) ---------------- M2(RW)
          |                       |
     ----------              ----------
     |        |              |        |
   S1(R)    S2(R)          S3(R)    S4(R)

This is "the" solution for high traffic websites. It provides read and write scalability as well as high availability. M1 and M2 are two masters in circular replication: each replicates the other. All slaves point to either M1 or M2. If one of the masters goes down, it can be removed from the system, and a new master can be created and put back into the system without affecting the site. If you are worried about performance when a master goes down and all its queries are redirected to the other master, you can have 3 or more masters in circular replication.

It is necessary to decide beforehand how many masters you would like to have in circular replication, because adding more masters later, though possible, is not easy. Having 2 masters does not mean that you will be able to do 2x writes: writes also happen on each master due to replication, so how many writes the complete system can handle depends entirely on the system resources. Your application has to handle unique key generation in a fashion that does not result in duplicates across the masters. It also needs to handle scenarios where the lag between M1 and M2 becomes extensive or annoying. But with proper thought, this architecture can be scaled and managed very well.
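
One common way to keep auto-generated keys from colliding across the masters is the pair of auto_increment variables, covered in detail in the multi-master sections below; a two-master sketch:

# my.cnf on M1
auto_increment_increment = 2    # step between successive values, same on both masters
auto_increment_offset    = 1    # M1 generates 1, 3, 5, ...

# my.cnf on M2
auto_increment_increment = 2
auto_increment_offset    = 2    # M2 generates 2, 4, 6, ...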

PostgreSQL replication

There are many solutions for PostgreSQL replication available in the market. Almost all of them are third party solutions, since until now there was no inbuilt replication in PostgreSQL. PostgreSQL 9.0 introduced replication into the database itself; it is known as streaming replication. It can be used only for master-slave replication: there is no master-master or clustering feature available with PostgreSQL SR (streaming replication).

The way SR works is that there are log files (known as XLOG files) which are shipped to the standby or slave server over the network. Multiple slave servers can connect to the master over the network. The standby servers continuously replay the shipped XLOG records in continuous recovery mode. As soon as XLOG files are shipped, they are replayed on the slave, which makes the latest data available on the slave almost immediately. Log shipping does not interfere with query execution on the master. If the primary goes offline, the standby server will keep waiting for the primary to become active again.

Here is how I did a test setup of master-slave replication using PostgreSQL.

I had 2 machines, 241 and 242. I downloaded postgresql-9.1.1.tar.bz2 on both.

Steps to setup replication :

1. untar, compile and install

241/242 ]#  tar -xvjf postgresql-9.1.1.tar.bz2
241/242 ]#  cd postgresql-9.1.1
241/242 postgresql-9.1.1]#  ./configure
241/242 postgresql-9.1.1]#  make
241/242 postgresql-9.1.1]#  sudo make install

This will install PostgreSQL in the /usr/local/pgsql directory.

2. Set up 241 as the master. Initialize the database cluster on 241

241 ]# adduser postgres
241 ]# mkdir /usr/local/pgsql/data
241 ]# chown postgres /usr/local/pgsql/data
241 ]# su - postgres
241 ]# /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data

Do not start the postgres database server now.

3. Configure the master server to listen on all IP addresses.

241 ]# vim /usr/local/pgsql/data/postgresql.conf
  
    listen_addresses = '*'
  
4. Allow the standby server to connect to PostgreSQL on the master with replication privilege

241 ]# vim /usr/local/pgsql/data/pg_hba.conf
  
    host   replication   postgres    192.168.1.242/22    trust
  
5. Set up replication related parameters on the master server

241 ]# vim /usr/local/pgsql/data/postgresql.conf

    # To enable read-only queries on a standby server, wal_level must be set to
    # "hot_standby". But you can choose "archive" if you never connect to the
    # server in standby mode.
    wal_level = hot_standby

    # Set the maximum number of concurrent connections from the standby servers.
    max_wal_senders = 5

    # To prevent the primary server from removing the WAL segments required for
    # the standby server before shipping them, set the minimum number of segments
    # retained in the pg_xlog directory. At least wal_keep_segments should be
    # larger than the number of segments generated between the beginning of
    # online-backup and the startup of streaming replication. If you enable WAL
    # archiving to an archive directory accessible from the standby, this may
    # not be necessary.
    wal_keep_segments = 128
  
    # Enable WAL archiving on the primary to an archive directory accessible from
    # the standby. If wal_keep_segments is a high enough number to retain the WAL
    # segments required for the standby server, this is not necessary.
    archive_mode    = on
    archive_command = 'cp %p /usr/local/pgsql/data/pg_archive/%f'

6. start postgresql on master

241 ]# /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data >logfile 2>&1 &

7. copy the master server’s data to standby server

241 ]# /usr/local/pgsql/bin/psql -c "SELECT pg_start_backup('label', true)"
 pg_start_backup
-----------------
 0/4000020
(1 row)

241 ]# rsync -a /usr/local/pgsql/data/ root@192.168.1.242:/usr/local/pgsql/data --exclude postmaster.pid

241 ]# /usr/local/pgsql/bin/psql -c "SELECT pg_stop_backup()"
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
 pg_stop_backup
----------------
 0/40000D8
(1 row)

This will also copy all the configuration parameters and authentication related settings from the primary to the standby, ensuring that the slave can be converted to a master/primary in case of a failover.

8. Change postgresql.conf to enable read-only queries on the standby server

242 ]# vim /usr/local/pgsql/data/postgresql.conf

    hot_standby = on
  
9. Enable recovery on the standby server and change configuration.

242 ]# cp /usr/local/pgsql/share/recovery.conf.sample /usr/local/pgsql/data/recovery.conf
242 ]# vim /usr/local/pgsql/data/recovery.conf

    # Specifies whether to start the server as a standby. In streaming replication,
    # this parameter must be set to on.
    standby_mode          = ‘on’

    # Specifies a connection string which is used for the standby server to connect
    # with the primary.
    primary_conninfo      = 'host=192.168.1.241 port=5432 user=postgres'

    # Specifies a trigger file whose presence should cause streaming replication to
    # end (i.e., failover). Once the trigger file is found the server acts as a primary server.
    trigger_file = '/home/postgres/failover'

    # Specifies a command to load archive segments from the WAL archive. If
    # wal_keep_segments is a high enough number to retain the WAL segments
    # required for the standby server, this may not be necessary. But
    # a large workload can cause segments to be recycled before the standby
    # is fully synchronized, requiring you to start again from a new base backup.
    restore_command = 'cp /usr/local/pgsql/data/pg_archive/%f "%p"'

10. Start postgres on standby server. This will start streaming replication on the standby server.

242 ]# /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data >logfile 2>&1 &

11. You can check the status of streaming replication using either the ps command or psql, the PostgreSQL command prompt

241 (primary) ]# /usr/local/pgsql/bin/psql -c "SELECT pg_current_xlog_location()"
 pg_current_xlog_location
--------------------------
 0/5000EC0
(1 row)

242 (standby) ]# /usr/local/pgsql/bin/psql -c "select pg_last_xlog_receive_location()"
 pg_last_xlog_receive_location
-------------------------------
 0/5000EC0
(1 row)

242 (standby) ]$ /usr/local/pgsql/bin/psql -c "select pg_last_xlog_replay_location()"
 pg_last_xlog_replay_location
------------------------------
 0/5000EC0
(1 row)

To check using ps, use the following commands

241 (master)]# ps ax | grep sender
 2728 ?        Ss     0:00 postgres: wal sender process postgres 192.168.1.242(54792) streaming 0/5000EC0
 2768 pts/1    R+     0:00 grep sender

242 (standby)]# ps ax| grep receiver
 28125 ?        Ss     0:00 postgres: wal receiver process   streaming 0/5000EC0
 28154 pts/1    S+     0:00 grep receiver

To do a failover, all that needs to be done is to create the ‘trigger’ file at the specified location. This will automatically turn off standby mode and the postgres server will start acting as a primary or master.
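
With the trigger_file location configured above, failover is just:

242 ]# touch /home/postgres/failover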

Do remember to use the "pg_ctl stop" command to stop either the primary or standby server. This ensures a graceful shutdown, so that no records are missed by replication.
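
For example, on the primary ("smart" mode, the default, waits for clients to disconnect; "fast" aborts them but still shuts down cleanly):

241 ]# /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -m fast stop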

In order to create another standby server, repeat the steps from 7 onwards, after adding the IP of the new standby server to the master configuration as in step 4.

Setting up replication on postgresql 9.0 beta

PostgreSQL has finally come out with an inbuilt replication solution in 9.0 beta. Setting up replication is quite simple. We will look at setting up simple master-slave replication between two PostgreSQL servers.

The way replication works in postgres is that the master keeps writing "write ahead log" files (also known as WAL files). The slave can be configured to run either in plain recovery mode or in hot standby. The difference is that in recovery mode the WAL files need to be shipped to the slave, using something like a cron job, and the slave applies them to its database. The problem here is that the process is not smooth and the delay between master and slave can be big. In hot standby, the slave connects to the master, fetches the log files from it, and applies them to its database. Here the lag between master and slave is small, since the slave tries to keep up with the master. This is also known as streaming replication, because the WAL files are continuously shipped to the slave and applied there. There can be multiple slaves fetching WAL files from the master and applying them to their own databases. As opposed to MySQL, the slave here is read-only: it cannot be modified, and can be used only as a read-only slave for select queries.
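
In recovery.conf terms the difference is roughly this (a sketch; the full setup follows below):

# plain log-shipping recovery: only replay WAL files restored from an archive
standby_mode    = 'on'
restore_command = 'cp /path/to/archive/%f "%p"'

# streaming (hot standby): additionally pull WAL records directly from the master
primary_conninfo = 'host=master_ip port=5432 user=postgres'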

Let's look at deploying master-slave replication between two servers.

Master server : 172.16.3.173
Slave server : 172.16.3.211

Step 1:
Install postgresql 9.0 on both servers. Simple steps for installation are
$ ./configure
$ make
$ make install
$ adduser postgres

Step 2:
Initialize the data directory on the master. There is no need to initialize it on the slave, since we will be copying the master's data directory to the slave to start replication.
$ mkdir /usr/local/pgsql/data
$ chown -R postgres.postgres /usr/local/pgsql/data
$ su - postgres
$ /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data

In case you are trying to create a slave of an existing postgres database, you can simply skip these steps: install postgres on the slave machine and do not initialize the data directory there.

Step 3:
Change the config on the master so that the slave is authorized to connect to it. Add the following line to pg_hba.conf on the master.
$ vim data/pg_hba.conf

host    replication     postgres        172.16.3.211/22         trust

This says that 172.16.3.211 (the slave) is allowed to connect to the master for replication without any password. Configure this for all slaves (if you plan to have multiple slaves).

Step 4:
Change the streaming replication parameters in the master's postgresql.conf.
$ vim data/postgresql.conf

# allow all ip addresses to connect to the master.
listen_addresses = '*'

# set wal_level for hot standby replication.
wal_level = hot_standby

# enable archiving to an archive directory accessible from the standby. Not required for streaming replication. But set it for "just in case" scenarios 
archive_mode = on
archive_command = 'cp "%p" /path/to/archive_directory/"%f"'

# set the number of concurrent slave connections
max_wal_senders = 5

# set the minimum number of WAL segments to be retained in the pg_xlog directory, from where the slave can pick up the WAL segments.
# This prevents removal of WAL segments before they are shipped to the slave.
wal_keep_segments = 32


Step 5:
Start postgres on primary/master server.
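
For example, the same way as in the earlier section:

$ /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data >logfile 2>&1 &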

Step 6:
Create a backup of master and ship it to the slave server.
$ ./bin/psql -c "SELECT pg_start_backup('postgres',true)"
$ rsync -a /usr/local/pgsql/data/ postgres@172.16.3.211:/usr/local/pgsql/data
$ ./bin/psql -c "SELECT pg_stop_backup()"

Step 7:
Clean the data directory on the slave and set the replication parameters.
$ rm /usr/local/pgsql/data/postmaster.pid
Here the postgresql.conf is the same as on the master, so just set the parameters required by the slave.
$ vim /usr/local/pgsql/data/postgresql.conf

# disable archiving
archive_mode = off
# enable read-only queries on slave - during replication/recovery
hot_standby = on

Step 8:
Set up the parameters required for streaming replication in recovery.conf
$ vim /usr/local/pgsql/data/recovery.conf

# shell command for copying log files back from the archive. Should not be required.
restore_command = 'cp -i /home/postgres/archive/%f "%p" </dev/null'
# enable standby
standby_mode = 'on'
# connection information for connecting to master
primary_conninfo = 'host=172.16.3.173 port=5432 user=postgres'
# stop streaming and enable the server to work in read/write mode, if this file is found.
# server keeps on polling for the trigger file and stops replication once this file is found. The server can then be used in read/write mode.
trigger_file = '/usr/local/pgsql9.0/data/stop.replication'

Step 9:
Start postgres on the slave. This will start streaming replication between master and slave.
To check replication, see if the sender process is active on the master.

$ ps -ef | grep sender
postgres 12332 12158  0 10:13 ?        00:00:00 postgres: wal sender process postgres 172.16.3.211(41497) streaming 0/90149F0
postgres 16233  9474  0 11:36 pts/2    00:00:00 grep sender

On the slave, check if the receiver process is working.

$ ps -ef | grep receiver
postgres 27952 27945  0 10:12 ?        00:00:00 postgres: wal receiver process   streaming 0/9014974        
postgres 30354 23492  0 11:36 pts/0    00:00:00 grep receiver


That is it. Run a few DML/DDL queries on the master and check that they are replicated on the slave. The lag between master and slave is not noticeable.
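
A quick sanity check, using a throwaway table (the table name is hypothetical):

# on the master (172.16.3.173)
$ ./bin/psql -c "CREATE TABLE repl_test (id int)"
$ ./bin/psql -c "INSERT INTO repl_test VALUES (1)"

# on the slave (172.16.3.211), a moment later
$ ./bin/psql -c "SELECT * FROM repl_test"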

memcached replication

Wow... finally a solution that provides replication in memcached: repcached.

You can have a look at it at repcached.lab.klab.org.

They provide two types of packages:

1. a patched memcached source, which can be directly compiled.
2. a patch which can be applied to the memcache source and then compiled.

So, I downloaded the memcached-(version)-repcached-(version).tar.gz source and simply compiled it.

./configure --enable-replication
make
sudo make install

Note: When you enable replication, you cannot use --enable-threads.

I started two instances of memcached on ports 11211 & 11222

jayant@gamegeek:~/php$ memcached -p 11211 -m 64 -x 127.0.0.1 -v
replication: connect (peer=127.0.0.1:11212)
replication: marugoto copying
replication: close
replication: listen

jayant@gamegeek:~/php$ memcached -p 11222 -m 64 -x 127.0.0.1 -v
replication: connect (peer=127.0.0.1:11212)
replication: marugoto copying
replication: start

Now set and get a value on the instance on port 11211

jayant@gamegeek:~$ telnet localhost 11211
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
set hello 0 0 5
world
STORED
get hello
VALUE hello 0 5
world
END

Connect to port 11222 and try getting this value

jayant@gamegeek:~$ telnet localhost 11222
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
get hello
VALUE hello 0 5
world
END

Try the reverse as well

On 11222
<--snip-->
set key 0 0 5
myval
STORED
get key
VALUE key 0 5
myval
END
<--snip-->

On 11211
<--snip-->
get key
VALUE key 0 5
myval
END
<--snip-->

Suppose the master goes down (in this case let's assume that the memcached on port 11211 goes down). We redirect all traffic to port 11222. But later, when the memcached on port 11211 comes back up, the data should be automatically replicated to the new instance. Let's kill the memcached on port 11211 and restart it.

On port 11211

Killed
jayant@gamegeek:~/php$ memcached -p 11211 -m 64 -x 127.0.0.1 -v
replication: connect (peer=127.0.0.1:11212)
replication: marugoto copying
replication: start

On port 11222

<--snip-->
replication: close
replication: listen
replication: accept
replication: marugoto start
replication: marugoto 2
replication: marugoto owari
<--snip-->

Let's see if the data has been replicated on port 11211

jayant@gamegeek:~$ telnet localhost 11211
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
get hello
VALUE hello 0 5
world
END
get key
VALUE key 0 5
myval
END

Bingo...
Please share your experience if you have tried it in a live scenario with a large number of sets and gets.

PostgreSQL replication using Slony-I

As most PostgreSQL users know, PostgreSQL does not provide an inbuilt replication solution. There are lots of 3rd party replication products available for it. Slony is one of them. Slony is a trigger based replication solution, that is, it uses triggers to push data to the slave. Slony is supposed to be one of the most stable replication solutions for PostgreSQL.

You can download Slony from www.slony.info. There are two major versions of Slony: Slony-I and Slony-II. Slony-I is a simple master-slave replication solution, whereas Slony-II is an advanced multi-master replication solution. We will go ahead with the simple master-slave solution, so we will download Slony-I. The latest version available is Slony-I 1.2.15. Slony-I 2.0 is in RC and should be released soon, but we will go with the stable release, 1.2.15.

The PostgreSQL version being used is 8.3.3. To install Slony, simply untar the downloaded file and run
./configure --with-pgconfigdir=<path to pg_config>
make
sudo make install

I used two machines for setting up replication, and installed PostgreSQL and Slony-I on both of them.

master server ip : 172.16.3.211
slave server ip : 172.16.3.173

We will be working with the superuser postgres which is used to start and stop the postgresql server.

Quick steps

  • Define environment variables on the master & slave. The main purpose is to make our task easier. Let's create an env.sh file containing all the definitions.

    #!/bin/sh

    REPLICATIONUSER=postgres
    CLUSTERNAME=replcluster
    MASTERDBNAME=repltestdb
    SLAVEDBNAME=repltestdb
    MASTERHOST=172.16.3.211
    SLAVEHOST=172.16.3.173
    MASTERPORT=5432
    SLAVEPORT=5432
    MASTERDBA=postgres
    SLAVEDBA=postgres
    PSQL=/usr/local/pgsql/bin/psql
    CREATEDB=/usr/local/pgsql/bin/createdb
    CREATELANG=/usr/local/pgsql/bin/createlang
    CREATEUSER=/usr/local/pgsql/bin/createuser
    PGDUMP=/usr/local/pgsql/bin/pg_dump

    export REPLICATIONUSER CLUSTERNAME MASTERDBNAME SLAVEDBNAME MASTERHOST SLAVEHOST PSQL CREATEDB CREATELANG CREATEUSER PGDUMP MASTERPORT SLAVEPORT MASTERDBA SLAVEDBA

    As you can see here, my PostgreSQL is installed in /usr/local/pgsql. I have defined the IP addresses & ports of the master and slave servers. I have used the superuser postgres for replication. And I have defined the master and slave databases to be used for replication. You can replicate between databases with different names on master and slave; just change the names in all the scripts.

  • Create database on master & slave
    On master run
    /usr/local/pgsql/bin/createdb -O $REPLICATIONUSER -h $MASTERHOST -p $MASTERPORT $MASTERDBNAME
    On slave run
    /usr/local/pgsql/bin/createdb -O $REPLICATIONUSER -h $SLAVEHOST -p $SLAVEPORT $SLAVEDBNAME
  • Since Slony-I depends on triggers for replication, you will need to install the plpgsql procedural language on the master to generate and run the triggers & stored procedures that push data to the slave.
    /usr/local/pgsql/bin/createlang -h $MASTERHOST -p $MASTERPORT plpgsql $MASTERDBNAME
  • Put the tables which you want to replicate into $MASTERDBNAME on the master, and port them to the slave. This has to be done manually.

    Dump the tables on master
    /usr/local/pgsql/bin/pg_dump -s -U $MASTERDBA -h $MASTERHOST -p $MASTERPORT $MASTERDBNAME > replmaster.sql

    Import the tables on slave
    /usr/local/pgsql/bin/psql -U $SLAVEDBA -h $SLAVEHOST -p $SLAVEPORT $SLAVEDBNAME < replmaster.sql

  • And now configure the databases for replication. When you install Slony-I, it puts two binaries, slonik and slon, in the pgsql/bin directory. slonik is the tool used for creating configuration tables, stored procedures and triggers. All we need to do is create a configuration file and pass it to the slonik tool. Here I am assuming that there are two tables which need to be replicated: parent & child.

    vim replconfig.cnf
    # define the namespace the replication system uses; in our example it is
    # replcluster
    cluster name = replcluster;
    # admin conninfo entries are used by slonik to connect to the nodes, one for each
    # node on each side of the cluster; the syntax is that of PQconnectdb in
    # the C API
    node 1 admin conninfo = 'dbname=repltestdb host=172.16.3.211 port=5432 user=postgres';
    node 2 admin conninfo = 'dbname=repltestdb host=172.16.3.173 port=5432 user=postgres';
    # init the first node. Its id MUST be 1. This creates the schema
    # _$CLUSTERNAME containing all replication system specific database
    # objects.
    init cluster ( id=1, comment = 'Master Node');
    # Add a unique key to tables that do not have one.
    # This command adds a bigint column named "_Slony-I_$CLUSTERNAME_rowID" to the table, which will have a default value of nextval('_$CLUSTERNAME.s1_rowid_seq') and have UNIQUE & NOT NULL constraints applied to it.
    # table add key (node id = 1, fully qualified name = 'table_name');
    # Slony-I organizes tables into sets. The smallest unit a node can
    # subscribe to is a set.
    # you need to have a set add table() for each table you wish to replicate
    create set (id=1, origin=1, comment='parent child table');
    set add table (set id=1, origin=1, id=1, fully qualified name = 'public.parent', comment='parent table');
    set add table (set id=1, origin=1, id=2, fully qualified name = 'public.child', comment='child table');
    # Create the second node (the slave); tell the 2 nodes how to connect to
    # each other and how they should listen for events.
    store node (id=2, comment = 'Slave node');
    store path (server = 1, client = 2, conninfo='dbname=repltestdb host=172.16.3.211 port=5432 user=postgres');
    store path (server = 2, client = 1, conninfo='dbname=repltestdb host=172.16.3.173 port=5432 user=postgres');
    store listen (origin=1, provider = 1, receiver =2);
    store listen (origin=2, provider = 2, receiver =1);

    Pass the config file to slonik for creating required triggers & config tables.

    /usr/local/pgsql/bin/slonik replconfig.cnf

  • Let's start the replication daemons on the master & slave

    On master run
    /usr/local/pgsql/bin/slon $CLUSTERNAME "dbname=$MASTERDBNAME user=$MASTERDBA host=$MASTERHOST port=$MASTERPORT" > slon.log &

    On slave run
    /usr/local/pgsql/bin/slon $CLUSTERNAME "dbname=$SLAVEDBNAME user=$SLAVEDBA host=$SLAVEHOST port=$SLAVEPORT" > slon.log &

    Check out the output in slon.log files

  • Now everything is set up, and from the slon.log files on the master and slave you can see that both servers are trying to sync with each other. But replication is still not under way. To start replication we need to make the slave subscribe to the master. Here is the required config file for doing this

    startrepl.cnf
    # This defines which namespace the replication system uses
    cluster name = replcluster;
    # connection info for slonik to connect to master & slave
    node 1 admin conninfo = 'dbname=repltestdb host=172.16.3.211 port=5432 user=postgres';
    node 2 admin conninfo = 'dbname=repltestdb host=172.16.3.173 port=5432 user=postgres';
    # Node 2 subscribes set 1
    subscribe set ( id = 1, provider = 1, receiver = 2, forward = no);

    Passing this file to slonik will do the trick, and replication will start.

    /usr/local/pgsql/bin/slonik startrepl.cnf

Now simply make some inserts, updates and deletes on the master, and check whether they show up on the slave as well. Since replication is now in full swing, all changes to the master tables should be replicated on the slave.
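
For a quick check, reusing the variables from env.sh (the column list for parent is hypothetical):

# on the master
$PSQL -h $MASTERHOST -p $MASTERPORT -U $MASTERDBA $MASTERDBNAME -c "INSERT INTO parent VALUES (1);"

# on the slave, a few seconds later
$PSQL -h $SLAVEHOST -p $SLAVEPORT -U $SLAVEDBA $SLAVEDBNAME -c "SELECT * FROM parent;"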

Please note that new tables & changes to table structures won't be replicated automatically. So whenever a new table is created or an existing table is altered, the change has to be manually propagated to the slave, and the scripts need to be rerun to make the appropriate changes to the triggers and config tables.

Another important thing to note is that PostgreSQL on the master and slave should accept connections from both IP addresses. For this, add the IP addresses to pgsql/data/pg_hba.conf.

For the above replication I had added the lines
host all all 172.16.3.211/32 trust
host all all 172.16.3.173/32 trust
to the pg_hba.conf file in both master & slave.

MySQL multi-master replication – Act II

SETUP PROCEDURE
I created 4 instances of MySQL on 2 machines, running on different ports. Let's call these instances A, B, C and D. A & C are on one machine and B & D are on the other. B is a slave of A, C is a slave of B, D is a slave of C, and A is a slave of D.

(origin) A -> B -> C -> D -> A (origin)

Each instance has its own server id. A query originating on any machine will travel through the loop, replicating data on all the machines, and return to the originating server, which will identify that the query originated from itself and will not execute/replicate it further.

To handle auto_increment fields, two variables are defined in the configuration of each server.

1. auto_increment_increment : controls the increment between successive AUTO_INCREMENT values.

2. auto_increment_offset : determines the starting point of AUTO_INCREMENT columns.

Using these two variables, the auto generated values of auto_increment columns can be controlled.

Once the replication loop is up and running, any data inserted in any node would automatically propagate to all the other nodes.

ISSUES WITH MULTI-MASTER REPLICATION & AUTOMATIC FAILOVER

1. LATENCY

There is a definite latency between the first node and the last node. During replication, the slave reads the master's binary log over the network and writes it to its own relay log. It then executes the query and writes it to its own binary log. So each node takes a small amount of time to execute the query and then propagate it further.

For a 4 node multi-master replication setup, when I replicated a single insert query containing one int and one varchar field, the time taken for the data to reach the last node was 5 ms.
This latency would of course depend on the following factors:
a) The amount of data to be replicated. Latency increases as the amount of data increases.
b) The network speed between the nodes. If the data is to be replicated over the internet, it will take more time than between nodes on a LAN.
c) The number of indexes on the tables being replicated. As the number of indexes increases, the time taken to insert data into the tables also increases, increasing the latency.
d) The hardware of the machines and the load on them, which determine how fast the replication between master and slave takes place.

2. AUTOMATIC FAILOVER

Automatic failover can be accomplished using events and federated tables in MySQL 5.1. Federated tables are created between master and slave and are used to check the connection between them. An event is used to trigger a query on the federated table, which checks the connection between master and slave. If the query fails, a stored procedure can be created which kicks the failed master out of the replication loop.

So suppose, in the loop shown above, A goes out: the event on B would detect that A is out and would make D the master of B.
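
A rough sketch of the detection side, assuming the FEDERATED engine and the event scheduler are enabled (the database, table and credential names here are hypothetical, and as discussed below the repointing itself cannot be fully automated from a stored procedure):

-- on master A: a plain heartbeat table
CREATE TABLE repl.heartbeat (id INT PRIMARY KEY, ts TIMESTAMP);

-- on slave B: a FEDERATED table pointing at A's heartbeat table
CREATE TABLE repl.heartbeat_fed (id INT PRIMARY KEY, ts TIMESTAMP)
  ENGINE=FEDERATED
  CONNECTION='mysql://repl:secret@nodeA_ip:3306/repl/heartbeat';

-- on slave B: flag the master as down if the networked SELECT fails
DELIMITER //
CREATE PROCEDURE repl.check_master()
BEGIN
  DECLARE master_down INT DEFAULT 0;
  DECLARE CONTINUE HANDLER FOR SQLEXCEPTION SET master_down = 1;
  SELECT COUNT(*) INTO @hb FROM repl.heartbeat_fed;
  IF master_down = 1 THEN
    -- kick off failover handling here; note that CHANGE MASTER TO
    -- cannot be issued from inside a stored procedure or event
    SET @master_down = 1;
  END IF;
END //
DELIMITER ;

-- poll the master every 10 seconds
CREATE EVENT repl.master_check_event
  ON SCHEDULE EVERY 10 SECOND
  DO CALL repl.check_master();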

This is the theoretical implementation of automatic failover. Practically there are a few issues with its implementation.

a) For failover, you need to know the position on the master's master from where the slave should take over (so B should know the position on D from which it has to start replicating). One way to do this is for each slave to log its master's position in a table which is replicated throughout the loop. But this is not possible using events & stored procedures, because there is no query which can capture the information available from "SHOW SLAVE STATUS" in a variable and write it to a table. Another way is to have an external script running at an interval of, say, 10 seconds, which logs this information and checks the federated tables for any disruption in the master-slave connection. With this methodology, the problem is that there is a 10 second window during which data can be lost.

b) You could also land in an infinite replication situation. Let's look at how, using the 4 node example above. Suppose "A" goes down while it has a query in its binary log which has to be replicated throughout the loop. The query propagates from "B" to "C" and finally to "D". Now, since "A" is down and failover has happened, "B" becomes the slave of "D". So the query which originated on "A" goes from "D" to "B". If "A" had been in the loop, it would have identified its own query and stopped it; but since "A" is out of the loop, the query is not stopped and keeps circulating. This can result in either the query running in the loop indefinitely, or an error on all the slaves that brings them all down.

These are the major issues with multi-master replication and automatic failover. One can still live with the automatic failover scenario, as it might occur only once in a while. But the latency between the first and last node during replication has no solution: it is due to physical constraints and cannot be avoided.

A workaround would be to distribute queries across the nodes based on tables, so that queries for a given table are always served from a single node. But this would result in table locks on that node, and again we would not be able to reap the benefits of multi-master replication.

Multi-master replication in MySQL

What is this? How can a slave have multiple masters? Well, the architecture is not what the name implies. A slave can have only one master. But here we are going to create a loop between slaves and masters, also known as circular replication.

Let's take 3 machines. (Since I have only a single machine, my laptop, I created 3 instances of mysql running on 3 different ports with 3 different data directories to work out this architecture.) Suppose we have installed mysql on all 3 machines/nodes: NodeA, NodeB, NodeC. What needs to be done is that NodeA is a slave of NodeC, NodeC is a slave of NodeB, and NodeB is a slave of NodeA. So each mysql instance is a master and also a slave. I will go into the implementation details in a minute. First, let me list the benefits and issues with this setup.

Benefits:

  • You can build a high availability and high performance cluster catering to a very busy web site.
  • In case a node goes down, it is comparatively easy to bypass the node and get the cluster up and running again in a short time.

Issues:

  • There is a small time difference between an insert on the master and the corresponding data being replicated to the slave. Though this time difference is very small, the data has to propagate through all the nodes in the loop, so the delay may be of consequence as the number of nodes increases.
  • The process of detecting and cleaning up a failure in replication still needs manual intervention.

To go into the technicalities, let's say that NodeA, NodeB and NodeC have mysql installed on them. You need to have mysql version >= 4.1 on all the nodes. All nodes must also run the same mysql version, to maintain replication compatibility between versions. I used mysql version 5.1.9-beta.

If you are new to mysql replication, please go through the following link http://dev.mysql.com/doc/refman/5.1/en/replication-implementation.html. This will help you to understand how to setup mysql replication.

The complete architecture is the loop NodeA -> NodeB -> NodeC -> NodeA described above. You can also add extra slaves to any of the nodes; these can be added to the loop if a node goes down, or used for simple select queries, further increasing performance.

The configuration files on different nodes were as follows

Node A
=====
[mysqld]
basedir=/usr/local/mysql5
datadir=/usr/local/mysql5/data
socket=/tmp/mysql5.sock
port=3306

# for replication
server-id=10
log-bin
log-slave-updates
replicate-same-server-id=0
auto_increment_increment=10
auto_increment_offset=1
master-host=NodeC_IP
master-user=repl
master-password=jayant
master-port=3306
report-host=NodeA
skip-slave-start

log-slow-queries
log-output=FILE
# log-queries-not-using-indexes
log-slow-admin-statements

[mysql.server]
user=mysql
basedir=/usr/local/mysql5

Node B
=====
[mysqld]
basedir=/usr/local/mysql5
datadir=/usr/local/mysql5/data
socket=/tmp/mysql5.sock
port=3306

# for replication
server-id=20
log-bin
log-slave-updates
replicate-same-server-id=0
auto_increment_increment=10
auto_increment_offset=2
master-host=NodeA_IP
master-user=repl
master-password=jayant
master-port=3306
report-host=NodeB
skip-slave-start

log-slow-queries
log-output=FILE
# log-queries-not-using-indexes
log-slow-admin-statements

[mysql.server]
user=mysql
basedir=/usr/local/mysql5

Node C
=====
[mysqld]
basedir=/usr/local/mysql5
datadir=/usr/local/mysql5/data
socket=/tmp/mysql5.sock
port=3306

# for replication
server-id=30
log-bin
log-slave-updates
replicate-same-server-id=0
auto_increment_increment=10
auto_increment_offset=3
master-host=NodeB_IP
master-user=repl
master-password=jayant
master-port=3306
report-host=NodeC
skip-slave-start

log-slow-queries
log-output=FILE
# log-queries-not-using-indexes
log-slow-admin-statements

[mysql.server]
user=mysql
basedir=/usr/local/mysql5

Once mysql on all 3 nodes is properly set up, all that needs to be done is to run a "START SLAVE" command on all of them. This will bring the circular loop into action. Now you can have an array of mysql connections in your application and do a round-robin execution of any type of query (select, update, insert, delete) using the array, on any of the machines in the cluster.

Points to be noted:

  • server-id has to be different for each machine in the circular replication
  • log-slave-updates tells the mysql slave to log queries that are replicated from its master to its own binary log, so that they can be passed on to whoever is reading its binary log
  • replicate-same-server-id tells the mysql slave not to replicate its own queries in circular replication. It should be set to 0 so that the slave does not replicate events that carry its own server id, preventing an infinite loop in which a query fired on node A keeps replicating itself repeatedly.
  • auto_increment_offset determines the starting point of auto increment column values.
  • auto_increment_increment determines the increment between auto increment values.

auto_increment_increment and auto_increment_offset have to be used so that auto increment values generated on different servers do not conflict with each other. If they are not set, then when a query is fired on node A and node B simultaneously (or replication between node A and node B breaks after the first query is fired on node A and before the second is fired on node B), both nodes can generate the same auto increment value, as the following example shows.

On any Node
create table x(id int auto_increment primary key, val varchar(250) null);

On Node A
insert into x (val) values ('in node a');

On Node B
insert into x (val) values ('in node b');

In this case a "duplicate primary key" error will occur when the query from node A is replicated and run on node B.

But if these variables are set appropriately, there will be no such issue. The above setup has auto_increment_increment set to 10, allowing the addition of up to 10 nodes in the cluster.

For more details you can refer to http://www.onlamp.com/pub/a/onlamp/2006/04/20/advanced-mysql-replication.html. It is a very lengthy article but very detailed. It explains failover mechanisms and also provides some scripts for detecting breakages in the loop.