Saturday, December 29, 2012

Does this work ??

Recently I had a very unusual experience with my Tata Safari. I own a one-year-old Tata Safari Dicor 2.2.

It was parked for about 2-3 hours near Sector 30, Noida. When I came back to the car, the alarm was blaring and there was no one near it. Assuming there was some malfunction, I locked and unlocked the car, but the alarm kept blaring.

Finally, when I got to the car, I saw that the driver side door was open. I looked around and saw no one interested in either the car or the alarm. I climbed inside and saw my stereo still in place, so I assumed I had left the door open "by mistake". I started the engine and headed home. Then I noticed that the driver side door was unlocked. I tried pushing the lock, but it was stuck. It was then that I realized that some "not so smart" thief had tried to break into my car and was unsuccessful.

This is when the story starts.

I had a sleepless night because my car would not lock. I woke up 3-4 times just to check if my car was still standing. To add fuel to the fire, I googled about Tata Safari theft and found that both the Safari and the Scorpio are toppers in the list of vehicles that get stolen. My Tata Safari has an engine immobilizer, a gear lock and the now broken central lock. But I read of cases where even a Tata Safari with GPS was stolen.

Next morning I went to a nearby Tata service center and told them the complete tale. People were awed, but they told me a different tale: that of replacing the complete lock set and getting a new set of keys. I was like, "I would almost never use the key to open the door. Why spend almost 7000 to replace it? I am ok with the key not working as long as the central locking works."

The driver side door lock was examined. It was assumed that the thief had inserted some sort of screwdriver to try to open the lock. As a result, the key channel was damaged and the key would not go into the lock. The lock was left in a tilted position, because of which the central locking was open and "not movable". My simple idea was to bring the lock back to its original vertical position so that central locking would work again. But sadly the TASS guys were not willing to comply.

I took it to another TASS and then to a local mechanic. I made both of them understand that it was a simple matter of "bringing the lock back to its vertical position". Everyone was of the opinion that a complete new lock set was needed.

Even I was convinced that spending 7k was the only option. But as a last try I approached another local mechanic and explained the complete story to him. This guy said that he would try but could not promise anything. He opened up the door from the inside, took a look at the lock, and turned it anti-clockwise to make it vertical. And bingo, my central locking was back in place. He took Rs. 200, but I was ok with that. He had saved me a lot.

If I had gone by the book, it would have cost me much more to put in place a solution that I would not be using much anyway.

Sunday, November 04, 2012

Ecommerce - an experience worth mentioning

I have just realized that I have started purchasing a lot of stuff online - starting from electronics and clothes to even groceries and vegetables. The fact is that not only is everything heavily discounted, but all I have to do is "a few clicks" and someone else does the effort of getting it home for me.

I remember the first purchase I made online - where else but Flipkart, and what else but books. The experience was ok. I have used www.flipkart.com many times since then, but my recent experience was terrible. I ordered something for myself and after 10 days, when I followed up, I came to know that the order would "not" be delivered because they did not have the stuff. I was like, "hello, did you have to wait for 10 days to tell me - and that too after I gave a reminder?" Don't you have an escalation system for missed orders? Thank god the order was for myself - what if I had sent it as a gift to someone else? And I got no response back - except that my money would be credited back in another 7-8 days.

My experience with homeshop18 has been much better. Timely deliveries and heavily discounted products. It is important to remember that for an Indian customer, price matters. Remember the "kitna deti hai" ad?

And my experience with Jabong has been a lot better. Same day delivery - wow. And that too at prices which are way, way below. Though the site is terrible. They hardly have any content for any product. You have to look at the pic and figure out what it is. I had to purchase a watch and I had to do a lot of googling to figure out the specs for the watch I liked. But eventually I went to Jabong to get the "extra" discount. As long as they are burning money, I am happy.

I have used www.watchkart.com and www.bagskart.com to order stuff and the experience has been good. The Watchkart guys played a trick where they gave me a huge discount coupon immediately after I placed my first order. It made me feel guilty. I was in two minds about cancelling my first order and using the coupon to order the product again. Maybe I should have.

Recently, I have ordered a number of things at www.firstcry.com and www.hushbabies.com. Firstcry has huge delivery times but a better range of products. Hushbabies is much better - delivery times are shorter and discounts are easily available. I also purchased a lot of stuff through www.shersingh.com. The single-page purchase is a very unique experience. And the packaging from www.shersingh.com is wonderful. It makes you feel a level higher.

I have been doing some shopping at www.shopclues.com. They have the "jaw dropping deals" option where you get extremely cheap stuff. I have tried it twice and both times it was extremely satisfactory. www.letsbuy.com is a site that I miss - after it was killed by Flipkart. It had much better deals than Flipkart.

My first attempt to shop for vegetables online at www.sabjiwala.com was unsuccessful, as they only have a next-day delivery option and we needed vegetables on an urgent basis.

But the best experience so far has been with www.gopeppers.com - a site which sells groceries. The interface is very intuitive. I sat with a list given to me and opened up two sites offering groceries - www.mygrahak.com and www.gopeppers.com - placing orders side by side. On mygrahak, firstly, the site was terribly slow, and searching for stuff was so difficult. Gopeppers is much better. It took me less than 10 minutes to get the complete list of around 50+ items into my cart.

When I enquired on the chat option about payment and delivery, I came to know that the stuff could be delivered in 1 hour - flat. I was like "what!!! Are you sure??" And the response was "Yes". So I placed the order and selected the COD (card on delivery) option - ever heard of that... I sat back and started watching a movie, thinking 1 hour was impossible - it would take at least 2 hours. In 30 minutes, the guy was at my place with all the stuff. He unpacked it and made me tick off the list he had brought with him. And then he told me that I was his "first" customer. I was both shocked and surprised. Shocked because I could have been duped if I had paid online, and surprised at the efficiency with which he handled his first order. I will look forward to ordering next month's groceries through the same site.

Wednesday, June 27, 2012

SQL Joins

Reference pic for SQL joins

Courtesy: someone who posted it on Facebook.

Tuesday, May 29, 2012

Petrol car versus diesel car (part 2)

For the earlier part please refer to http://jayant7k.blogspot.in/2006/10/petrol-engine-versus-diesel-engine.html

In this blog I am trying to create a calculation sheet which can be used to figure out, if you go for a diesel car, when you would be able to recover the extra cost of purchasing the car - as compared to spending on petrol.

The government in India loves petrol. It is generally used by the middle class, with small cars and 2-wheelers, who rarely vote. People who fall below the middle class mostly travel using public transport. And people who are above the middle class generally have long cars powered by diesel engines. The government finds it difficult to increase the price of diesel due to the ripple effect it would cause on the price of food commodities and on overall inflation. So raising the price of petrol seems the only viable option to cover the loss due to the subsidy on diesel and cooking gas - without affecting the government's vote bank.

And of course the amount of tax the government is able to make out of fuel is really good. How the tax amount is used is a blind spot for the residents of India. We do not see any improvement in roads - jams are always there. It has been more than 60 years since independence and we are still struggling to provide basic necessities like electricity and water to the general public. It seems valid that there should be a fine on the "government" for being unable to accomplish what it has been promising for ages - food, clothes, electricity, water and homes.

But let's focus on diesel cars. The new diesel cars from Tata and Fiat have really long maintenance schedules; older diesel engines needed frequent maintenance. Most petrol engines need a service every 5K kms - or at most every 10K kms. On the other hand, the diesel engines in Tata and Fiat cars need maintenance every 15K kms or once a year. This brings the cost of ownership of a diesel engine down a lot.

Maintenance aside, a simple table can be used to figure out how long it will take to recover the "extra" cost of a diesel car.



     A                          B (Petrol)            C (Diesel)            D (Difference)
1    Price per litre (Rs)       71                    41
2    Average (km per litre)     14                    12
3    Rs per km                  5.07   (=B1/B2)       3.42   (=C1/C2)
4    Kms per day                100                   100
5    Car price (Rs)             500000                600000                100000  (=C5-B5)
6    Cost per day (Rs)          507.14 (=B3*B4)       341.67 (=C3*C4)       165.48  (=B6-C6)
7    Days to recover cost                                                   604.32  (=D5/D6)
8    Months to recover cost                                                 20.14   (=D7/30)




As you can see, with petrol and diesel at 71 and 41 respectively, and averages of 14 and 12 km per litre respectively, the extra cost of owning a diesel car will be recovered in around 20 months (a little over 1.5 years), provided the everyday run is 100 kms and the difference in price is 1 lakh. Feel free to copy the figures into an excel sheet and alter the numbers to suit your needs, to see when you would be able to recover the cost if you intend to purchase a diesel car.
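
If you would rather not fiddle with a spreadsheet, here is a small python sketch of the same calculation (python is just my choice here; the numbers are the same assumptions as in the table above - change them to match your own usage):

petrol_price, diesel_price = 71.0, 41.0   # price per litre
petrol_avg, diesel_avg = 14.0, 12.0       # average in km per litre
kms_per_day = 100
extra_cost = 600000 - 500000              # premium paid for the diesel variant

# running cost per day for each fuel
petrol_cost_per_day = (petrol_price / petrol_avg) * kms_per_day
diesel_cost_per_day = (diesel_price / diesel_avg) * kms_per_day
savings_per_day = petrol_cost_per_day - diesel_cost_per_day

days_to_recover = extra_cost / savings_per_day
print("Days to recover the extra cost   : %.0f" % days_to_recover)
print("Months to recover the extra cost : %.1f" % (days_to_recover / 30))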

Tuesday, May 01, 2012

Deployments

Deployments are a critical phase of any project. In the "stone age", developers used to simply scp the required files into production. And there used to be issues when you were dealing with multiple http servers - keeping all the servers in sync was always a problem.

Then capistrano came into the picture. It made deployment of ruby/rails apps very easy. A few other people and I went ahead and modified it to deploy code into production for php apps as well. But it was a tricky job, since capistrano was originally designed to work for ruby/rails apps. Here is a sample capistrano script that pulls code out of svn, pushes it to multiple web servers and then runs some specific post-deployment tasks on them.

deploy.rb

set :application, "app"
set :repository,  "http:///tags/TAG102"
set :imgpath, "/var/images"

# svn settings
set :deploy_via, :copy
set :scm, :subversion
set :scm_username, "svn_username"
set :scm_password, "svn_password"
set :scm_checkout, "export"
set :copy_cache, true
set :copy_exclude, [".svn", "**/.svn"]

# ssh settings
set :user, "server_username"
set :use_sudo, true
default_run_options[:pty] = true

#deployment settings
set :current_dir, "html"
set :deploy_to, ""
set :site_root, "/var/www/#{current_dir}"
set :keep_releases, 3

#web servers
role :web, "192.168.1.1","192.168.1.2","192.168.1.3"

#the actual script
namespace :deploy do
    desc <<-DESC
  deploy the app
    DESC
    task :update do
      transaction do
        update_code
        symlink
      end
    end

    task :finalize_update do
      transaction do
        sudo "chown -R apache.apache #{release_path}"
        sudo "ln -nfs #{imgpath}/images #{release_path}/images"     
      end
    end

    task :symlink do
      transaction do
            puts "Symlinking #{current_path} to #{site_root}."
            sudo "ln -nfs #{release_path} #{site_root}"
      end
    end


    task :migrate do
      #do nothing
    end

    task :restart do
      #do nothing
    end   
end

This pulls the code out of the svn repository, creates a tar locally, scps it to the production web servers, untars it to the specified location, runs all the tasks specified in finalize_update and finally points the symlink of the "html" directory to the newly deployed path. The good point about capistrano is that you are almost blind as to what happens in the backend. The bad point is that, since you are blind, you do not know how to do what you want to do. It needs a bit of digging and a bit of tweaking to get your requirements fulfilled by this script.

Now let's check out fabric.

Installation is quite easy using pip.

sudo pip install fabric

In case you like the old fashioned way, you can go ahead and download the source code and do a

sudo python setup.py install

To create a fabric script, you need to create a simple fab file with whatever you require. For example, if you need to run a simple command like 'uname -a' on all your servers, just create a simple script fabfile.py with the following code:

from fabric.api import run

def host_type():
        run('uname -a')

And run the script using the following command

$ fab -f fabfile.py -H localhost,127.0.0.1 host_type

[localhost] Executing task 'host_type'
[localhost] run: uname -a
[localhost] Login password:
[localhost] out: Linux gamegeek 3.2.0-24-generic #37-Ubuntu SMP Wed Apr 25 08:43:22 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

[127.0.0.1] Executing task 'host_type'
[127.0.0.1] run: uname -a
[127.0.0.1] out: Linux gamegeek 3.2.0-24-generic #37-Ubuntu SMP Wed Apr 25 08:43:22 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Done.
Disconnecting from localhost... done.
Disconnecting from 127.0.0.1... done.

Here is a simple fabric script which does everything the earlier capistrano script was doing.

fabfile.py

from __future__ import with_statement
from fabric.api import *
from fabric.operations import local,put

def production():
  env.user = 'server_username'
  env.hosts = ['192.168.1.1','192.168.1.2','192.168.1.3']
  env.deploy_to = ''
  env.site_root = '/var/www/html'
  env.tag_name = 'tag101'
  env.repository = {  'url':'http:///tags/TAG101', \
            'username': 'svn_username', \
            'password': 'svn_password', \
            'command': 'svn export --force', \
          }
  env.image_path = '/var/images'

def deploy():
  checkout()
  pack()
  unpack()
  symlinks()
  makelive()

def checkout():
  local('%s --username %s --password %s --no-auth-cache %s /tmp/%s' % \
    (env.repository['command'], env.repository['username'], env.repository['password'], env.repository['url'], env.tag_name));

def pack():
  local('tar -czf /tmp/%s.tar.gz /tmp/%s' % (env.tag_name, env.tag_name))

def unpack():
  put('/tmp/%s.tar.gz' % (env.tag_name), '/tmp/')
  with cd('%s' % (env.deploy_to)):
    run('tar -xzf /tmp/%s.tar.gz' % (env.tag_name))

def symlinks():
  run('ln -nfs %s/images %s/%s/images' % (env.image_path, env.deploy_to, env.tag_name))

def makelive():
  run('ln -nfs %s/%s %s' % (env.deploy_to, env.tag_name, env.site_root))


The good point is that I have more control over what I want to do with fabric as compared to capistrano. And it took me a lot less time to cook up the fabric recipe than the capistrano one.

To run this script simply do

fab production deploy

This will execute the tasks production and deploy, in that order. You can have separate settings for staging and local environments in the same script, as sketched below. You can even go ahead and create your own deployment infrastructure and process to do whatever you want, without running into any restrictions.
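
For example, a staging environment could be added as another task - a minimal sketch, assuming a hypothetical staging box at 192.168.1.10 and a hypothetical deploy path; it only overrides the environment before the same deploy() tasks run:

def staging():
  # hypothetical staging settings - adjust hosts, paths and tag to your setup
  env.user = 'server_username'
  env.hosts = ['192.168.1.10']
  env.deploy_to = '/var/deploys'
  env.site_root = '/var/www/html'
  env.tag_name = 'tag101-staging'
  env.repository = {  'url':'http:///tags/TAG101', \
            'username': 'svn_username', \
            'password': 'svn_password', \
            'command': 'svn export --force', \
          }
  env.image_path = '/var/images'

Running "fab staging deploy" would then push the same tag to the staging box instead of the production servers.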


Thursday, April 12, 2012

Mysql HA solutions

Let's see what HA solutions can be designed in mysql and where they are suited.

1. Single master - single slave.

M(RW)
|
S(R)

A simple master-slave solution can be used for a small site - where all the inserts go to the master and some (non-critical) read requests are served from the slave. In case the master crashes, the slave can simply be promoted as the master - once it has replicated the "available" logs from the master. You will need to create another slave of the now "new" master to make your mysql setup highly available again. In case the slave crashes, you will have to switch your read queries to the master and create a new slave.

As mentioned earlier, this is for a very "small" site. There are multiple scenarios where a single master - single slave solution is not suitable. You will not be able to scale reads or run heavy queries to generate reports without affecting your site performance. Also, for creating a new slave after a failure, you will need to lock and take a backup from the available mysql server. This will affect your site.


2. Single master - multiple slave.

          M1(RW)
            |
      -------------------------
      |           |           |
    S1(R)       S2(R)       Sn(R)

A single master - multiple slave setup is the most suitable architecture for many web sites. It provides read scalability across multiple slaves. Creation of new slaves is much easier. You can easily allocate one slave for backups and another for running heavy reports without affecting the site performance. You can create new slaves to scale reads as and when needed. But all inserts go into the only master, so this architecture is not suitable for write scalability.

When any of the slaves crashes, you can simply remove that slave, create another slave and put it back into the system. In case the master fails, you will need to wait for the slaves to be in sync with the master - that is, all replicated binary logs have been executed - and then make one of them the master. The other slaves then become slaves of the new master. You will need to be very careful in defining the exact position from which the new slaves start replication, else you will end up with lots of duplicate records and may lose data sanity on some of the slaves. The sketch below illustrates how to pick the slave to promote.
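
As a rough illustration of that failover step, here is a minimal python sketch (using the mysql-connector-python driver - an assumption on my part, any mysql client would do; the slave IPs and credentials are hypothetical). It looks at SHOW SLAVE STATUS on each slave and picks the one that has executed the most of the old master's binary log - that slave is the safest candidate to promote.

import mysql.connector

SLAVES = ['192.168.1.11', '192.168.1.12', '192.168.1.13']   # hypothetical slave IPs

def slave_position(host):
    # returns (master binlog file, executed position) as reported by the slave
    conn = mysql.connector.connect(host=host, user='repl_admin', password='secret')
    cur = conn.cursor(dictionary=True)
    cur.execute("SHOW SLAVE STATUS")
    status = cur.fetchone()
    conn.close()
    return (status['Relay_Master_Log_File'], status['Exec_Master_Log_Pos'])

# the slave with the highest (file, position) pair has applied the most of the
# old master's binlog and is the best candidate to promote as the new master
candidate = max(SLAVES, key=slave_position)
print("Promote %s and point the remaining slaves at it with CHANGE MASTER TO" % candidate)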


3. Single master - standby master - multiple slaves.

         M1(RW) ------ M(R)
           |
      -------------------------
      |           |           |
    S1(R)       S2(R)       Sn(R)

This architecture is very similar to the previous single master - multiple slave setup. A standby master is identified and kept for failover. The benefit of this architecture is that the standby master can be of the same configuration as the original master. This architecture is suitable for medium to high traffic websites where the master is of a much higher configuration than the slaves - maybe having RAID 1+0 or SSD drives. The standby master is kept close to the original master so that there is hardly any lag between the two. The standby master can be used for reads also, but care should be taken that there is not much lag between the master and the standby - so that in case of failure, switching can be done with minimum downtime.

When the master fails, you need to wait for the slaves to catch up with the old master and then simply switch them, and the app, over to the standby master.


4. Single master - candidate master - multiple slaves.

         M1(RW) -------- M(R)
                          |
              -------------------------
              |           |           |
            S1(R)       S2(R)       Sn(R)

This architecture is very similar to the earlier one. The only difference is that all slaves replicate from the candidate master instead of the original master. The benefit of this is that in case the master goes down, there is no switching required on the slaves. The old master can be removed from the system and the new master automatically takes over. Afterwards, in order to get the architecture back in place, a new candidate master needs to be identified and the slaves can be moved to it one by one. The downtime here is minimal. The catch is that there will be a definite lag between the master and the slaves, since replication on the slaves happens through the candidate. This lag can be quite annoying in some cases. Also, if the candidate master fails, all slaves will stop replicating and will need to be moved to either the old master, or a new candidate master needs to be identified and all slaves pointed to it.


5. Single master - multiple slaves - candidate master - multiple slaves

       M1(RW) ----------------------- M(R)
          |                             |
    -------------                 -------------
    |           |                 |           |
  S1(R)      S1n(R)             S2(R)      S2n(R)


This architecture is again similar to the earlier one, with the difference that there is a complete failover setup for the current master. If either the master or the candidate master goes down, there are still slaves which are replicating and can be used. This is suitable for a high traffic website which requires read scalability. The only drawback of this architecture is that writes cannot be scaled.


6. Multiple master - multiple slaves

        M1(RW) ----------------------- M2(RW)
          |                              |
    -------------                  -------------
    |           |                  |           |
  S1(R)       S2(R)              S3(R)       S4(R)

This is "the" solution for high traffic websites. It provides read and write scalability as well as high availability. M1 and M2 are two masters in circular replication - both replicate each other. All slaves either point to M1 or M2. In case if one of the masters go down, it can be removed from the system, a new master can be created and put back in the system without affecting the site. If you are worried about performance issues when a master goes down and all queries are redirected to another master, you can have even 3 or more Masters in circular replication.

It is necessary to decide beforehand how many masters you would like to have in circular replication, because adding more masters later - though possible - is not easy. Having 2 masters does not mean that you will be able to do 2X writes. Writes also happen on each master due to replication, so how many writes the complete system can handle depends entirely on the system resources. Your application has to handle unique key generation in a fashion that does not result in duplication between the masters - see the sketch below. Your application also needs to handle scenarios where the lag between M1 and M2 becomes excessive or annoying. But with proper thought given to this architecture, it can be scaled and managed very well.
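
One common way to handle unique key generation is to give each master its own offset and make every master step by the total number of masters (mysql itself exposes this as the auto_increment_offset and auto_increment_increment settings). A minimal python sketch of the idea, assuming the application generates its own ids:

NUM_MASTERS = 2          # total masters in circular replication

def next_id(last_id_on_this_master, master_offset):
    # master_offset is 1 for M1, 2 for M2, and so on. Every master produces
    # ids congruent to its offset modulo NUM_MASTERS, so the sequences from
    # different masters can never collide.
    if last_id_on_this_master == 0:
        return master_offset
    return last_id_on_this_master + NUM_MASTERS

# M1 generates 1, 3, 5, ...  while M2 generates 2, 4, 6, ...
print(next_id(0, 1), next_id(1, 1), next_id(3, 1))   # 1 3 5
print(next_id(0, 2), next_id(2, 2), next_id(4, 2))   # 2 4 6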

Tuesday, April 10, 2012

introducing varnish-cache

Varnish is a web accelerator. It is used as a reverse proxy in front of the actual web server. You must have used either nginx or lighttpd as a reverse proxy in front of your apache (or any other web server). Why varnish? Varnish claims that it is very fast - really, really fast. The plus point that I can see here is that varnish has been designed from the ground up as a reverse proxy, whereas nginx and lighttpd can also be used as web servers.

Let's compile varnish and try setting it up.

Get varnish from https://www.varnish-cache.org/releases. I got the source code of 3.0.2. To compile, simply run:

./configure
make
sudo make install

If you installed varnish at the default prefix /usr/local, you will find the varnish configuration file at

/usr/local/etc/varnish/default.vcl

The very basic configuration required for starting varnish is the setting of the backend servers. Open up the default.vcl file and put:

backend default {
     .host = "";
     .port = "";
}

To start varnish simply run

varnishd -f /usr/local/etc/varnish/default.vcl -s malloc,2G -T 127.0.0.1:2000 -a 0.0.0.0:8080
This tells varnish:
-f : use the default.vcl configuration file.
-s : allocate 2GB of memory (malloc) for the cache.
-T : run the administration interface on localhost at port 2000.
-a : listen on port 8080. You will need to change it to 80 when you want to make it live.

Ideally varnish does not cache any page which has a cookie header attached to it. If you have a dynamic web site and are using cookies heavily, you will find that your varnish hit ratio is too low. You can check the hit ratio using the varnishstat command.

There are a few ways to get around this.

1. Include the cookie in the hash key while caching the content. This results in a per-user cache; there will be some hits, but the hit ratio stays low.
sub vcl_hash {
    hash_data(req.http.cookie);
}

2. Remove set-cookie from the backend response for a particular path. This can be used for static content.
sub vcl_recv {
    if (req.url ~ "^/images") {
        unset req.http.cookie;
    }
}

sub vcl_fetch {
    if (req.url ~ "^/images") {
        unset beresp.http.set-cookie;
    }
}

3. Throw away the cookie for certain file extensions - mostly js/css and images.

sub vcl_recv {
 if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
    return (lookup);
 }
}

# strip the cookie before the object is inserted into the cache.
sub vcl_fetch {
 if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
   unset beresp.http.set-cookie;
 }
}





Varnish can also be used as a load balancer with multiple backends. Let's see the configuration.

First create multiple backends in the config file

backend web1 {
     .host = "192.168.1.1";
     .port = "81";
     .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1 s;
        .window = 5;
        .threshold = 3;
     }
}

backend web2 {
     .host = "192.168.1.2";
     .port = "81";
     .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1 s;
        .window = 5;
        .threshold = 3;
     }
}

For each backend there is a health check probe. Varnish fetches "/" from the backend every 5 seconds; if the fetch takes more than 1 second, that probe is considered a failure. If at least 3 out of the last 5 probes are ok, the backend is considered healthy.
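
Just to make the window/threshold semantics concrete, here is a tiny python model of a sliding-window health check (this is not varnish code, only an illustration of the behaviour described above):

from collections import deque

WINDOW = 5       # look at the last 5 probes
THRESHOLD = 3    # need at least 3 good probes to stay healthy

def is_healthy(probe_results):
    # probe_results: booleans in time order, True = probe answered within the timeout
    recent = deque(probe_results, maxlen=WINDOW)   # keeps only the last WINDOW results
    return sum(recent) >= THRESHOLD

print(is_healthy([True, True, False, True, True]))    # True  - 4 of the last 5 ok
print(is_healthy([True, False, False, True, False]))  # False - only 2 of the last 5 ok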

Now create a director. There are a number of directors - random, client, hash, round-robin, DNS and fallback. Let's configure a random director first and then see what the different directors can be used for. To configure a random director:

director web random {
        {
                .backend = web1;
                .weight = 1;
        }
        {
                .backend = web2;
                .weight = 1;
        }
}

Now tell varnish to use the "web" director for serving requests:

sub vcl_recv {
   if (req.http.host ~ "^(www.)?mysite.com$") {
       set req.backend = web;
   }
}

Let's see what the different directors are there for.

The client director
The client director picks a backend based on the client's identity. You can set the VCL variable client.identity to identify the client, for example by picking up the value of a session cookie.

The hash director
The hash director will pick a backend based on the URL hash value. This is useful if you are using Varnish to load balance in front of other Varnish caches or other web accelerators, as objects won't be duplicated across caches.

The round-robin director
       The round-robin director does not take any options. It will use the first backend for the first request, the second backend for the second request and so on, and start from the top again when it gets to the end. If a backend is unhealthy or Varnish fails to connect, it will be skipped.  The round-robin director will try all the backends once before giving up.

The DNS director
       The DNS director can use backends in two different ways. Either like the random or round-robin director or using .list:

       director directorname dns {
               .list = {
                       .host_header = "www.example.com";
                       .port = "80";
                       .connect_timeout = 0.4s;
                       "192.168.15.0"/24;
                       "192.168.16.128"/25;
               }
               .ttl = 5m;
               .suffix = "internal.example.net";
       }


       This will specify 384 backends, all using port 80 and a connection timeout of 0.4s. Options must come before the list of IPs in the .list statement. The .ttl defines the cache duration of the DNS lookups. Health checks are not thoroughly supported. DNS round robin balancing is supported. If a hostname resolves to multiple backends, the director will divide the traffic between all of them in a round-robin manner.

The fallback director
     The fallback director will pick the first backend that is healthy. It considers them in the order in which they are listed in its definition. The fallback director does not take any options.

       An example of a fallback director:

       director b3 fallback {
         { .backend = www1; }
         { .backend = www2; } // will only be used if www1 is unhealthy.
         { .backend = www3; } // will only be used if both www1 and www2
                              // are unhealthy.
       }

There is a huge list of configurations that can be done in varnish. You can check the list here and see which suits your needs.

https://www.varnish-cache.org/trac/wiki/VCLExamples

Monday, February 20, 2012

scaling using mongodb : Map-Reduce on sharded collection (part 3)

The idea here is to create a sharded collection and then run map-reduce on it to process the data. This is a very basic example that I am trying to emulate. I created a sharded collection "posts" with the following structure. The idea is to find the count of each tag throughout the complete sharded collection. I am using two machines named "241" and "243" as shards for mongodb. The mongos service is running on a third machine, "249".

Input collection structure :

mongos> db.posts.find()
{ "_id" : ObjectId("4f4221149a6777895a000000"), "id" : "0", "content" : "data for content 0", "tags" : [ "tag1", "tag2", "tag3", "tag4" ] }
{ "_id" : ObjectId("4f4221149a6777895a000001"), "id" : "1", "content" : "data for content 1", "tags" : [ "tag2", "tag4", "tag5", "tag7", "tag9" ] }

Output collection structure :

mongos> db.tags.find()
{ "_id" : "tag1", "value" : 14705 }
{ "_id" : "tag3", "value" : 14418 }

Let's see the step-by-step process for creating the collection, populating test data and running map-reduce.

Let's create the posts collection by putting in a few records. If you print the collection stats, you will see that it is not sharded.

mongos> db.printCollectionStats()
---
posts
{
        "sharded" : false,
        "primary" : "shard241",
        "ns" : "test.posts",
        "count" : 2,
        "size" : 256,
        "avgObjSize" : 128,
        "storageSize" : 8192,
        "numExtents" : 1,
        "nindexes" : 1,
        "lastExtentSize" : 8192,
        "paddingFactor" : 1,
        "flags" : 1,
        "totalIndexSize" : 8176,
        "indexSizes" : {
                "_id_" : 8176
        },
        "ok" : 1
}
---

To shard the collection, you will need to first create an index on "id" and then shard the collection using "id" as the key.

mongos> db.posts.ensureIndex({id:1})
mongos> db.posts.getIndexes()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "test.posts",
                "name" : "_id_"
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1
                },
                "ns" : "test.posts",
                "name" : "id_1"
        }
]

mongos> use admin
switched to db admin
mongos> db.runCommand( { shardcollection : "test.posts" , key : { id : 1 } } )
{ "collectionsharded" : "test.posts", "ok" : 1 }





The collection "posts" is now sharded. Lets populate some test data into the collection. Here is the php script that I used to populate data into the collection.

<?php
$m = new Mongo( "mongodb://192.168.1.249:10003", array("persist" => "x") );
$db = $m->test;
$table = $db->posts;
$start = 0;
$end = 200000;
for($i=$start; $i<$end; $i++)
{
        $tags = getTag();
        $obj = array("id"=>"$i", "content"=>"data for content $i", "tags"=>$tags);
        $table->insert($obj);
        echo "$i:".implode(',',$tags);
}
$found = $table->count();
echo "Found : $found\n";

function getTag()
{
        $tagArray = array('tag1','tag2','tag3','tag4','tag5','tag6','tag7','tag8','tag9','tag10','tag11','tag12','tag13','tag14','tag15','tag16','tag17','tag18','tag19','tag20','tag21','tag22','tag23','tag24','tag25','tag26','tag27','tag28','tag29','tag30','tag31','tag32','tag33','tag34','tag35','tag36','tag37','tag38','tag39','tag40','tag41','tag43');

        $tags = array();
        $tagcount = rand(2,5);

        $count = sizeof($tagArray);
        for($x=0; $x<$tagcount; $x++)
        {
                $tid = rand(0,$count-1); // rand() is inclusive of both bounds, so stay within the array

                $tags[] = $tagArray[$tid];
        }
        return $tags;
}
?>



I pushed 200,000 records into the collection. Here is how the data was sharded between "241" and "243":

mongos> db.printCollectionStats()
---
posts
{
        "sharded" : true,
        "flags" : 1,
        "ns" : "test.posts",
        "count" : 200000,
        "numExtents" : 10,
        "size" : 24430872,
        "storageSize" : 32743424,
        "totalIndexSize" : 15534400,
        "indexSizes" : {
                "_id_" : 6508096,
                "id_1" : 9026304
        },
        "avgObjSize" : 122.15436,
        "nindexes" : 2,
        "nchunks" : 4,
        "shards" : {
                "shard241" : {
                        "ns" : "test.posts",
                        "count" : 109889,
                        "size" : 13423484,
                        "avgObjSize" : 122.15493598947415,
                        "storageSize" : 17978183,
                        "numExtents" : 8,
                        "nindexes" : 2,
                        "lastExtentSize" : 12083200,
                        "paddingFactor" : 1,
                        "flags" : 1,
                        "totalIndexSize" : 8531049,
                        "indexSizes" : {
                                "_id_" : 3573332,
                                "id_1" : 4957718
                        },
                        "ok" : 1
                },
                "shard243" : {
                        "ns" : "test.posts",
                        "count" : 90111,
                        "size" : 10913985,
                        "avgObjSize" : 121.11711711711712,
                        "storageSize" : 33251771,
                        "numExtents" : 8,
                        "nindexes" : 2,
                        "lastExtentSize" : 12083200,
                        "paddingFactor" : 1,
                        "flags" : 1,
                        "totalIndexSize" : 13274730,
                        "indexSizes" : {
                                "_id_" : 6617370,
                                "id_1" : 6657360
                        },
                        "ok" : 1
                }
        },
        "ok" : 1
}
---




Now we will create the map and reduce functions. The map function checks the tags array for each record in the posts collection; for each element of the tags array, it emits the tag and a count of 1. Next we create a reduce function which counts the occurrences of each tag and returns the final count. The map function calls emit(key, value) any number of times to feed data to the reducer. The reduce function receives an array of emitted values from the map function and reduces them to a single value. The structure of the object returned by the reduce function must be identical to the structure of the map function's emitted value.

mongos> map = function() {
... if(!this.tags) {
... return;
... }
... for ( index in this.tags) {
... emit(this.tags[index],1);
... }
... }
function () {
    if (!this.tags) {
        return;
    }
    for (index in this.tags) {
        emit(this.tags[index], 1);
    }
}
mongos> reduce = function(key, values) {
... var count = 0;
... for(index in values) {
... count += values[index];
... }
... return count;
... }
function (key, values) {
    var count = 0;
    for (index in values) {
        count += values[index];
    }
    return count;
}
To understand how it works, let's say that after some iterations map emits the value { "tag1", 1 }. Suppose at that point "tag1" has a count of 50. That is, the document can be represented as:

{ "tag1", 50 }

If map again emits { "tag1", 1 }, reduce will be called as follows:

reduce( "tag1", [50,1] )

The result will be a simple combination of the counts for tag1:

{ "tag1", 51 }

To invoke map-reduce, run the following command. It states that mapreduce is to be run on the "posts" collection, the map function is "map", the reduce function is "reduce", and the output is redirected to a collection named "tags".

mongos> result =  db.runCommand( {
... "mapreduce" : "posts",
... "map" : map, //name of map function
... "reduce" : reduce,  //name of reduce function
... "out" : "tags" } )
{
        "result" : "tags",
        "shardCounts" : {
                "192.168.1.241:10000" : {
                        "input" : 109889,
                        "emit" : 499098,
                        "reduce" : 6400,
                        "output" : 43
                },
                "192.168.1.243:10000" : {
                        "input" : 90111,
                        "emit" : 200395,
                        "reduce" : 3094,
                        "output" : 43
                }
        },
        "counts" : {
                "emit" : NumberLong(699493),
                "input" : NumberLong(200000),
                "output" : NumberLong(43),
                "reduce" : NumberLong(9494)
        },
        "ok" : 1,
        "timeMillis" : 9199,
        "timing" : {
                "shards" : 9171,
                "final" : 28
        }
}

Look at the output documents. The output has only as many documents as there are distinct tags - in our case 43.

mongos> db.tags.find()
{ "_id" : "tag1", "value" : 14643 }
{ "_id" : "tag2", "value" : 14705 }
{ "_id" : "tag3", "value" : 14418 }
{ "_id" : "tag4", "value" : 14577 }
{ "_id" : "tag5", "value" : 14642 }
{ "_id" : "tag6", "value" : 14505 }
{ "_id" : "tag7", "value" : 14623 }
{ "_id" : "tag8", "value" : 14529 }
{ "_id" : "tag9", "value" : 14767 }
{ "_id" : "tag10", "value" : 14489 }
has more
 

mongos> db.tags.count()
43


References

http://cookbook.mongodb.org/patterns/count_tags/
http://www.mongodb.org/display/DOCS/MapReduce/

Thursday, February 16, 2012

Scaling using mongodb : HA and Automatic sharding (part 2)

We discussed creating HA using replica sets in the earlier post, part 1. In this post we will create a sharded cluster of replica sets consisting of multiple mongodb servers. If you noticed the parameters passed while starting mongodb on the earlier servers 241 & 242, you would have seen that in addition to the db path, log path and replica set parameters, I also added the parameter '--shardsvr'. This enables sharding on that instance of mongodb.

For creating a cluster of database servers using mongodb you need data nodes, config servers and router servers. Data nodes are used to store data. The config servers are used to store the sharding information - it is used by the client to figure out where the data resides on the data nodes. The router servers are used by the client to communicate with the config servers and data nodes; the clients interface with the database through the router servers.

On each of the config servers, start mongodb with the parameter '--configsvr'.

230:/home/mongodb } ./bin/mongod --configsvr --logappend --logpath ./data/log --pidfilepath ./data/mongodb.pid --fork --dbpath ./data/config --directoryperdb --bind_ip 192.168.1.230 --port 10002
231:/home/mongodb } ./bin/mongod --configsvr --logappend --logpath ./data/log --pidfilepath ./data/mongodb.pid --fork --dbpath ./data/config --directoryperdb --bind_ip 192.168.1.231 --port 10002
And start the routing service on the routing servers.

233:/home/mongodb } ./bin/mongos --port 10003 --configdb 192.168.1.230:10002,192.168.1.231:10002 --logappend --logpath ./data/route_log --fork --bind_ip 192.168.1.233 &
The same service can be started on routing server 234. To shard a database and add nodes to the shard, you need to connect to the routing server.

233:/home/mongodb } ./bin/mongo 192.168.1.233:10003
233:/home/mongodb } mongos> use admin
switched to db admin

To add the first replica set (set_a) to the cluster, run the following command.

233:/home/mongodb } mongos> db.runCommand( { addshard : "set_a/192.168.1.241:10000,192.168.1.242:10000,192.168.1.243:10000", name : "shard1" } )
{ "shardAdded" : "shard1", "ok" : 1 }

Similarly replica sets of the other 2 shards can also be added to the mongodb cluster. Eventually you can fire a listShards command to see the shards added to the cluster.

233:/home/mongodb } mongos > db.runCommand( {listshards:1} )
{
        "shards" : [
                {
                        "_id" : "shard1",
                        "host" : "192.168.1.241:10000,192.168.1.242:10000,192.168.1.243:10000"
                },
                {
                        "_id" : "shard2",
                        "host" : "192.168.1.244:10000,192.168.1.245:10000,192.168.1.246:10000"
                },
                {
                        "_id" : "shard3",
                        "host" : "192.168.1.247:10000,192.168.1.248:10000,192.168.1.249:10000"
                }           
        ],                                                                                                                                                        
        "ok" : 1                    
}              

To enable sharding on a database, say "test", run the following commands.

233:/home/mongodb } mongos > db.runCommand( { enablesharding : "test" } )
{ "ok" : 1 }

Now shard the collection on a particular key. The test collection consists of the fields id, name and email. Let's shard on id.

233:/home/mongodb } > db.runCommand( { shardcollection : "test.test", key : { id : 1 } } )
{ "collectionsharded" : "test.test", "ok" : 1 }

I used a php script to push more than 20,000 records to the sharded cluster. You can check the status of sharded cluster using the following commands.

233:/home/mongodb } mongos > db.printShardingStatus()
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
        {  "_id" : "shard1",  "host" : "192.168.1.241:10000,192.168.1.242:10000,192.168.1.243:10000" }
        {  "_id" : "shard2",  "host" : "192.168.1.244:10000,192.168.1.245:10000,192.168.1.246:10000" }
        {  "_id" : "shard3",  "host" : "192.168.1.247:10000,192.168.1.248:10000,192.168.1.249:10000" }
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "test",  "partitioned" : true,  "primary" : "shard1" }
                test.test chunks:
                                shard1        2
                                shard2        1
                        { "id" : { $minKey : 1 } } -->> { "id" : "1" } on : shard1 { "t" : 2000, "i" : 1 }
                        { "id" : "1" } -->> { "id" : "999" } on : shard2 { "t" : 1000, "i" : 3 }
                        { "id" : "999" } -->> { "id" : { $maxKey : 1 } } on : shard2 { "t" : 2000, "i" : 0 }

233:/home/mongodb } mongos > db.printCollectionStats()
---
test
{
        "sharded" : true,
        "flags" : 1,
        "ns" : "test.test",
        "count" : 19998,
        "numExtents" : 6,
        "size" : 1719236,
        "storageSize" : 2801664,
        "totalIndexSize" : 1537088,
        "indexSizes" : {
                "_id_" : 662256,
                "id_1" : 874832
        },
        "avgObjSize" : 85.97039703970397,
        "nindexes" : 2,
        "nchunks" : 3,
        "shards" : {
                "shard1" : {
                        "ns" : "test.test",
                        "count" : 19987,
                        "size" : 1718312,
                        "avgObjSize" : 85.97148146295092,
                        "storageSize" : 2793472,
                        "numExtents" : 5,
                        "nindexes" : 2,
                        "lastExtentSize" : 2097152,
                        "paddingFactor" : 1,
                        "flags" : 1,
                        "totalIndexSize" : 1520736,
                        "indexSizes" : {
                                "_id_" : 654080,
                                "id_1" : 866656
                        },
                        "ok" : 1
                },
                "shard2" : {
                        "ns" : "test.test",
                        "count" : 11,
                        "size" : 924,
                        "avgObjSize" : 84,
                        "storageSize" : 8192,
                        "numExtents" : 1,
                        "nindexes" : 2,
                        "lastExtentSize" : 8192,
                        "paddingFactor" : 1,
                        "flags" : 1,
                        "totalIndexSize" : 16352,
                        "indexSizes" : {
                                "_id_" : 8176,
                                "id_1" : 8176
                        },
                        "ok" : 1
                }
        },
        "ok" : 1
}
---

To use multiple routing servers in a database connection string - for connecting to mongodb through a programming language, for example php - the following syntax can be used:

mongodb://[username:password@]host1[:port1][,host2[:port2],...]/db_name

This will connect to at least one host from the list of hosts provided. If none of the hosts are available, an exception will be thrown.
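
As an illustration from python (using pymongo is my assumption here - the post itself uses php), the same multi-host connection string looks like this:

from pymongo import MongoClient

# both mongos routers are listed; the driver will talk to whichever one it can reach
client = MongoClient("mongodb://192.168.1.233:10003,192.168.1.234:10003/")
print(client.test.test.find_one())   # fetch one document from the sharded test.test collection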

Wednesday, February 15, 2012

Scaling using mongodb : HA and Automatic sharding (part 1)

Mongodb introduces the concepts of automatic sharding and replica sets. Let's see how both can be used to create a highly available cluster of databases which is horizontally scalable.

First, let's take a look at the HA capability of mongodb. HA is handled in mongodb by something known as replica sets. Replica sets are basically two or more mongodb nodes which are copies/replicas of each other. A replica set automatically selects a primary master and provides automated failover and disaster recovery.

Let's create a replica set of 2 nodes first and proceed from there.

I have taken 2 machines, 192.168.1.241 and 192.168.1.242, for creating the replica set. Let's download the tar of mongodb 2.0.2 and put it on 241 and 242. We will create separate directories for the database and the replica db. Here is how we go about it.

241:/home/mongodb } mkdir -p data/db
241:/home/mongodb } mkdir -p repl/db
242:/home/mongodb } mkdir -p data/db
242:/home/mongodb } mkdir -p repl/db

Create primary mongodb replica set server on 241

241:/home/mongodb } ./bin/mongod --shardsvr --replSet set_a --logappend --logpath ./data/log --pidfilepath ./data/mongodb.pid --fork --dbpath ./data/db --directoryperdb --bind_ip 192.168.1.241 --port 10000 &
Create secondary replica set on 242 running on port 10001

242:/home/mongodb } ./bin/mongod --shardsvr --replSet set_a --logappend --logpath ./data/log --pidfilepath ./data/mongodb.pid --fork --dbpath ./data/db --directoryperdb --bind_ip 192.168.1.242 --port 10001 &
Check whether the replica set is working fine.

241:/home/mongodb } ./bin/mongo 192.168.1.241:10000
241: } > rs.status()
{
        "startupStatus" : 3,
        "info" : "run rs.initiate(...) if not yet done for the set",
        "errmsg" : "can't get local.system.replset config from self or any seed (EMPTYCONFIG)",
        "ok" : 0
}
241: } > rs.initiate()
{
        "info2" : "no configuration explicitly specified -- making one",
        "me" : "192.168.1.243:10000",
        "info" : "Config now saved locally.  Should come online in about a minute.",
        "ok" : 1
}
241: } > rs.isMaster()
db.isMaster()
{
        "setName" : "set_a",
        "ismaster" : true,
        "secondary" : false,
        "hosts" : [
                "192.168.1.241:10000"
        ],
        "primary" : "192.168.1.241:10000",
        "me" : "192.168.1.241:10000",
        "maxBsonObjectSize" : 16777216,
        "ok" : 1
}

We can see that 241 is the master, but 242 has not been added to the replica set. Let's add 242 as a secondary server of the replica set.

241: } >  rs.add("192.168.1.242:10001")
{ "ok" : 1 }

241: } > rs.conf()
{
        "_id" : "set_a",
        "version" : 2,
        "members" : [
                {
                        "_id" : 0,
                        "host" : "192.168.1.241:10000"
                },
                {
                        "_id" : 1,
                        "host" : "192.168.1.242:10001"
                }
        ]
}
241: } > rs.status()
{
        "set" : "set_a",
        "date" : ISODate("2012-02-15T09:42:07Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.1.241:10000",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "optime" : {
                                "t" : 1329298881000,
                                "i" : 1
                        },
                        "optimeDate" : ISODate("2012-02-15T09:41:21Z"),
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "192.168.1.242:10001",
                        "health" : 1,
                        "state" : 3,
                        "stateStr" : "RECOVERING",
                        "uptime" : 46,
                        "optime" : {
                                "t" : 0,
                                "i" : 0
                        },
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2012-02-15T09:42:07Z"),
                        "pingMs" : 5339686
                }
        ],
        "ok" : 1
}
242 is in recovery mode - replicating the data that is already there on 241. Once the data has been completely replicated, the second node starts acting as a secondary node.

241: } > rs.status()
{
        "set" : "set_a",
        "date" : ISODate("2012-02-15T10:33:32Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.1.241:10000",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "optime" : {
                                "t" : 1329298881000,
                                "i" : 1
                        },
                        "optimeDate" : ISODate("2012-02-15T09:41:21Z"),
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "192.168.1.242:10001",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 3131,
                        "optime" : {
                                "t" : 1329298881000,
                                "i" : 1
                        },
                        "optimeDate" : ISODate("2012-02-15T09:41:21Z"),
                        "lastHeartbeat" : ISODate("2012-02-15T10:33:31Z"),
                        "pingMs" : 0
                }
        ],
        "ok" : 1
}

Now the replica set is working fine and the secondary node will act as primary if the primary goes out of service. Ideally there should be 3 or more nodes in any replica set.