Thursday, April 12, 2012

MySQL HA solutions

Let's see what HA solutions can be designed with MySQL and where they are suited.

1. Single master - single slave.

M(RW)
|
S(R)

A simple master-slave setup can be used for a small site - all inserts go into the master and some (non-critical) reads are served from the slave. If the master crashes, the slave can simply be promoted to master - once it has replicated the "available" logs from the master. You will then need to create another slave of the "new" master to make your MySQL setup highly available again. If the slave crashes, you will have to switch your read queries to the master and create a new slave.
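A minimal sketch of the promotion (this assumes the slave was running with read_only enabled; the hostname, replication credentials and binlog coordinates below are placeholders - adjust them to your setup):

-- on the slave being promoted: make sure all received relay logs have been applied
SHOW SLAVE STATUS\G
STOP SLAVE;
-- make the promoted server writable and point the application at it
SET GLOBAL read_only = OFF;

-- on the newly built slave: attach it to the promoted master
CHANGE MASTER TO
    MASTER_HOST='new-master.example.com',
    MASTER_USER='repl',
    MASTER_PASSWORD='repl_password',
    MASTER_LOG_FILE='mysql-bin.000001',
    MASTER_LOG_POS=4;
START SLAVE;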

As mentioned earlier, this is for a very "small" site. There are multiple scenarios where a single master - single slave solution is not suitable. You will not be able to scale reads or run heavy report queries without affecting your site's performance. Also, to create a new slave after a failure, you will need to lock and take a backup from the surviving MySQL server, which will affect your site.
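For example, a new slave is usually seeded from a consistent dump of the surviving server - a rough sketch using mysqldump (credentials are placeholders; --single-transaction avoids locking only for InnoDB tables):

# take a dump that records the binlog coordinates the new slave will need
mysqldump -u root -p --single-transaction --master-data=2 --all-databases > backup.sql

# load it on the new slave, then use the CHANGE MASTER TO line written
# near the top of backup.sql to start replication
mysql -u root -p < backup.sql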


2. Single master - multiple slaves.

         M1(RW)
           |
   -----------------
   |       |       |
 S1(R)   S2(R)   Sn(R)

A single master - multiple slaves setup is the most suitable architecture for many websites. It provides read scalability across multiple slaves. Creation of new slaves is much easier. You can easily dedicate one slave to backups and another to running heavy reports without affecting site performance. You can create new slaves to scale reads as and when needed. But all inserts go into the only master, so this architecture does not provide write scalability.

When any of the slaves crashes, you can simply remove that slave, create another one and put it back into the system. If the master fails, you will need to wait for the slaves to be in sync with the master - all replicated binary logs have been executed - and then promote one of them to master. The other slaves then become slaves of the new master. You will need to be very careful in defining the exact position from which each slave starts replicating from the new master, else you will end up with lots of duplicate records and may lose data sanity on some of the slaves.
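A rough sketch of re-pointing one of the remaining slaves at the newly promoted master (the hostname, credentials and binlog coordinates are placeholders - the coordinates come from SHOW MASTER STATUS on the new master):

-- on the new master: note the current binary log coordinates
SHOW MASTER STATUS;

-- on each remaining slave: replicate from the new master starting at those coordinates
STOP SLAVE;
CHANGE MASTER TO
    MASTER_HOST='new-master.example.com',
    MASTER_USER='repl',
    MASTER_PASSWORD='repl_password',
    MASTER_LOG_FILE='mysql-bin.000123',
    MASTER_LOG_POS=107;
START SLAVE;
SHOW SLAVE STATUS\G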


3. Single master - standby master - multiple slaves.

         M1(RW) ------ M(R)
           |
   -----------------
   |       |       |
 S1(R)   S2(R)   Sn(R)

This architecture is very similar to the previous single master - multiple slaves setup. A standby master is identified and kept aside for failover. The benefit of this architecture is that the standby master can be of the same configuration as the original master. It suits medium to high traffic websites where the master is of a much higher configuration than the slaves - maybe having RAID 1+0 or SSD drives. The standby master is kept close to the original master so that there is hardly any lag between the two. The standby master can also be used for reads, but care should be taken that there is not much lag between the master and the standby - so that in case of failure, switching can be done with minimum downtime.

When the master fails, you need to wait for the slaves to catch up with the old master and then simply switch them and the app to the standby master.
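One way to confirm that a slave has executed everything it received from the old master is to compare the read and execute coordinates in its slave status:

SHOW SLAVE STATUS\G
-- the slave has caught up with what it received when
--   Master_Log_File     = Relay_Master_Log_File
--   Read_Master_Log_Pos = Exec_Master_Log_Pos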


4. Single master - candidate master - multiple slaves.

         M1(RW) -------- M(R)
                          |
                  -----------------
                  |       |       |
                S1(R)   S2(R)   Sn(R)

This architecture is very similar to the earlier one. The only difference is that all slaves replicate from the candidate master instead of the original master. The benefit is that if the master goes down, no switching is required on the slaves - the old master can be removed from the system and the candidate automatically takes over as the new master. Afterwards, to get the architecture back in place, a new candidate master needs to be identified and the slaves can be moved to it one by one. The downtime here is minimal. The catch is that there will be a definite lag between the master and the slaves, since replication on the slaves happens through the candidate. This lag can be quite annoying in some cases. Also, if the candidate master fails, all slaves will stop replicating and will need to be moved either to the old master or to a newly identified candidate.
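Note that for the slaves to be able to replicate through the candidate, the candidate must write the events it replicates from the master into its own binary log. A minimal my.cnf sketch for the candidate (the server-id value is just an example):

[mysqld]
server-id         = 2           # must be unique across the replication setup
log-bin           = mysql-bin   # the candidate needs its own binary log
log-slave-updates = 1           # log replicated events so the slaves can chain off it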


5. Single master - multiple slaves - candidate master - multiple slaves

      M1(RW) --------------- M(R)
         |                     |
   ------------          ------------
   |          |          |          |
 S1(R)     S1n(R)      S2(R)     S2n(R)


This architecture is again similar to the earlier one, except that there is a complete failover setup for the current master. If either the master or the candidate master goes down, there are still slaves which are replicating and can be used. This is suitable for a high traffic website which requires read scalability. The only drawback of this architecture is that writes cannot be scaled.


6. Multiple masters - multiple slaves

      M1(RW) -------------- M2(RW)
         |                     |
   ------------          ------------
   |          |          |          |
 S1(R)      S2(R)      S3(R)      S4(R)

This is "the" solution for high traffic websites. It provides read and write scalability as well as high availability. M1 and M2 are two masters in circular replication - both replicate each other. All slaves either point to M1 or M2. In case if one of the masters go down, it can be removed from the system, a new master can be created and put back in the system without affecting the site. If you are worried about performance issues when a master goes down and all queries are redirected to another master, you can have even 3 or more Masters in circular replication.

It is necessary to decide beforehand how many masters you would like to have in circular replication, because adding more masters later - though possible - is not easy. Having 2 masters does not mean that you will be able to do 2X writes. Writes also happen on each master due to replication, so how many writes the complete system can handle depends entirely on the system resources. Your application has to handle unique key generation in a fashion that does not result in duplicates across the masters. Your application also needs to handle scenarios where the lag between M1 and M2 becomes extensive or annoying. But with proper thought given to this architecture, it can be scaled and managed very well.
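One common way of keeping auto-increment keys from colliding across the masters is to interleave them in my.cnf (the values below assume exactly two masters):

# on M1
[mysqld]
auto_increment_increment = 2
auto_increment_offset    = 1

# on M2
[mysqld]
auto_increment_increment = 2
auto_increment_offset    = 2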

Tuesday, April 10, 2012

introducing varnish-cache

Varnish is a web accelerator. It is used as a reverse proxy in front of the actual web server. You must have used either nginx or lighttpd as a reverse proxy in front of your apache (or any other web server). Why Varnish? Varnish claims that it is very fast - really really fast. The plus point that I can see here is that varnish has been designed from the ground up as a reverse proxy, whereas nginx and lighttpd are also designed to work as web servers.

Let's compile varnish and try setting it up.

Get varnish from https://www.varnish-cache.org/releases. I got the source code of 3.0.2. To compile, simply run:

./configure
make
sudo make install

If you went ahead and installed varnish at the default prefix /usr/local, you will be able to find the varnish configuration file at

/usr/local/etc/varnish/default.vcl

The very basic configuration required to start varnish is setting up the backend servers. Open up the default.vcl file and put in the host and port of your backend web server:

backend default {
     .host = "";
     .port = "";
}
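
For example, assuming apache is running on the same machine on port 8000 (both values are just placeholders for illustration), it would look like this:

backend default {
     .host = "127.0.0.1";
     .port = "8000";
}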

To start varnish simply run

varnishd -f /usr/local/etc/varnish/default.vcl -s malloc,2G -T 127.0.0.1:2000 -a 0.0.0.0:8080
This tells varnish to:
-f use the default.vcl configuration file.
-s use malloc storage with 2GB of memory allocated for the cache.
-T run the administration interface on localhost, port 2000.
-a listen on port 8080 (on all interfaces). You will need to change it to 80 when you want to make it live.

By default, varnish does not cache any request that carries a Cookie header (or any response that sets one). If you have a dynamic website and are using cookies heavily, you will find that your varnish hit ratio is very low. You can check the hit ratio using the varnishstat command.
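For example, the hit and miss counters can be pulled out of the one-shot output (counter names as they appear in varnish 3.x):

varnishstat -1 | egrep "cache_hit|cache_miss"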

There are a few ways to get around this.

1. Include the cookie in the hash key, so content is cached along with the cookie. This results in a per-user cache - you will get some hits, but the hit ratio stays low.
sub vcl_hash {
    # varnish 3.x uses hash_data() here (the old 2.x form was: set req.hash += ...)
    hash_data(req.http.cookie);
}

2. Strip cookies for a particular path - the request Cookie on the way in and the backend's Set-Cookie on the way out. Can be used for static content:
sub vcl_recv {
    if (req.url ~ "^/images") {
        unset req.http.cookie;
    }
}

sub vcl_fetch {
    if (req.url ~ "^/images") {
        unset beresp.http.set-cookie;
    }
}

3. Throw away the cookie header for certain file extensions. Mostly js/css and images.

sub vcl_recv {
 if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
    # drop the client's cookie and force a cache lookup (varnish 3.x needs return (lookup);)
    unset req.http.cookie;
    return (lookup);
 }
}

# strip the cookie before the object is inserted into cache.
sub vcl_fetch {
 if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
   unset beresp.http.set-cookie;
 }
}

Varnish can also be used as a load balancer with multiple backends. Let's see the configuration.

First, create multiple backends in the config file:

backend web1 {
     .host = "192.168.1.1";
     .port = "81";
     .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
     }
}

backend web2 {
     .host = "192.168.1.2";
     .port = "81";
     .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
     }
}

For each backend there is a health check probe: varnish fetches "/" every 5 seconds, and if the fetch takes more than 1 second, that probe is considered a failure.
If at least 3 out of the last 5 probes are ok, the backend is considered healthy.

Now create a director. There are a number of directors - random, client, hash, round-robin, DNS and fallback. Let's configure a random director first and then see what the different directors can be used for. To configure a random director:

director web random {
        {
                .backend = web1;
                .weight = 1;
        }
        {
                .backend = web2;
                .weight = 1;
        }
}

Now tell your requests to use the "web" director for serving requests

sub vcl_recv {
   if (req.http.host ~ "^(www\.)?mysite\.com$") {
       set req.backend = web;
   }
}

Let's see what the different directors are for.

The client director
       The client director picks a backend based on the client's identity. You can set the VCL variable client.identity to identify the client - for example by picking up the value of a session cookie.
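
       A small sketch of that, assuming the session id is kept in a cookie named "sessionid" (extracting it with regsub is just one way of doing it):

       sub vcl_recv {
           # assumption: the session cookie is called "sessionid"
           set client.identity = regsub(req.http.cookie, ".*sessionid=([^;]+).*", "\1");
       }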

The hash director
       The hash director will pick a backend based on the URL hash value. This is useful if you are using Varnish to load balance in front of other Varnish caches or other web accelerators, as objects won't be duplicated across caches.

The round-robin director
       The round-robin director does not take any options. It will use the first backend for the first request, the second backend for the second request and so on, and start from the top again when it gets to the end. If a backend is unhealthy or Varnish fails to connect, it will be skipped.  The round-robin director will try all the backends once before giving up.
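
       For example, a round-robin director over the two backends defined earlier could be declared like this (the name web_rr is arbitrary):

       director web_rr round-robin {
               { .backend = web1; }
               { .backend = web2; }
       }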

The DNS director
       The DNS director can use backends in two different ways. Either like the random or round-robin director or using .list:

       director directorname dns {
               .list = {
                       .host_header = "www.example.com";
                       .port = "80";
                       .connect_timeout = 0.4s;
                       "192.168.15.0"/24;
                       "192.168.16.128"/25;
               }
               .ttl = 5m;
               .suffix = "internal.example.net";
       }


       This will specify 384 backends, all using port 80 and a connection timeout of 0.4s. Options must come before the list of IPs in the .list statement. The .ttl defines the cache duration of the DNS lookups. Health checks are not thoroughly supported. DNS round robin balancing is supported. If a hostname resolves to multiple backends, the director will divide the traffic between all of them in a round-robin manner.

The fallback director
     The fallback director will pick the first backend that is healthy. It considers them in the order in which they are listed in its definition. The fallback director does not take any options.

       An example of a fallback director:

       director b3 fallback {
         { .backend = www1; }
         { .backend = www2; } // will only be used if www1 is unhealthy.
         { .backend = www3; } // will only be used if both www1 and www2
                              // are unhealthy.
       }

There is a huge list of configuration examples for varnish. You can check them here and see which suit your needs.

https://www.varnish-cache.org/trac/wiki/VCLExamples