Saturday, April 28, 2007


I would say that the root cause of all misunderstandings is expectations. It is human nature to expect more than what one is getting now. You may not be satisfied with your job profile, your salary, or your organization's work culture. You expect them to get better. And once they get better, you expect them to get even better. There is no end to expectations.

It is the same in relationships. You expect your close ones to grow in life. You expect your wife to take care of you. And your wife expects you to take care of her, of her emotions, of her wishes. You expect your wife or your hubby to have a great personality in front of others, and others to say that you two are a nice couple. You expect that you both listen to each other and respect each other. And so on.

The expectations never end, and so neither do the issues or misunderstandings that come out of them.

And the best and simplest way to get out of this is "not to expect". Wow, you would say - how is that possible? How can you not expect? How can you let go of your emotions? How can you say "let things be as they are"?

It is the biggest thing to ask. But that is what monks do. They just let go of their emotions. They lead a life where they don't care for anything. They meditate. They live with the bare minimum. And they are happy most of the time.

I don't know whether they believe in GOD or not. I don't. How can you believe in something that you have not seen, not felt, not heard? Do you believe your friend if he says something odd? I won't. I will have to look at it to believe it. Oh, what would you say if I told you that I met GOD yesterday? Would you believe me? No way... I think it is the same thing. People say that there is GOD, but where??

And then there is love. I think I have written about it before. A feeling that cannot be defined. How can you expect something from someone you love? That is not love. That is possessiveness. You don't have expectations in love. All you intend is to give to the one you love. You take care of him/her. You forgive and forget all his/her mistakes. You want that person to be healthy. All in all, if you love someone, the only thing you want is for that person to be happy. And you would do anything to make him/her happy. That, I think, is love.

So expectations have no place in GOD (whom I don't believe in) or in love. But in this materialistic world you have to eat, you have to drink, and you have to live. And you have some materialistic/physical and emotional needs. This is where expectations come into the picture, and they ruin everything.

Whenever you expect something from someone - you open a path to misunderstandings and issues.

Eventually everyone who is born has to die. Then why not make this miserable life happy for others and for yourself?

LiveJournal - system architecture

Let's discuss the system architecture of LiveJournal.

LiveJournal, or LJ for short, kicked off as a hobby project in April 1999 and was built completely on open source. It reached 2.8 million accounts in April 2004 and 6.8 million accounts in April 2005. Currently it has more than 10 million accounts. It caters to several thousand hits per second and lots of MySQL queries.

Here is a complex diagram which roughly outlines the architecture of LJ.

The technologies visible here are -

Caching - Memcached
MySQL clusters - HA & LB
HTTP load balancing - using Perlbal
MogileFS - Distributed File System

Let's start off with MySQL...

A single MySQL server cannot handle a large number of reads and writes; as the load grows, the server slows down. The next stage is to have 2 servers in a master-slave architecture, in which the master handles all writes and the slaves are read-only. But then the queries have to be spread in such a manner that the replication lag between master and slave (though very small) is handled. As the number of database and web servers increases, chaos increases. The site is fast for a while and then slow again, and there is a need for more servers with higher configurations. Also, every slave has to replay every write made on the master, so adding slaves spreads the reads but not the writes. Eventually each slave spends most of its time on replicated writes rather than on reads, resulting in high I/O and low CPU utilization.
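One common way to handle the replication lag mentioned above is to send a session's reads to the master for a short while after that session writes. The sketch below is only an illustration under that assumption, not LJ's actual code; the class name, the `lag_window` parameter and the host names are all hypothetical.

```python
import time


class LagAwareRouter:
    """Hypothetical router: picks master or slave for each query."""

    def __init__(self, master, slaves, lag_window=1.0, clock=time.monotonic):
        self.master = master
        self.slaves = slaves            # must contain at least one slave
        self.lag_window = lag_window    # assumed upper bound on replication lag (seconds)
        self.clock = clock
        self._last_write = {}           # session id -> time of its last write
        self._next_slave = 0

    def route_write(self, session_id):
        # All writes go to the master; remember when this session wrote.
        self._last_write[session_id] = self.clock()
        return self.master

    def route_read(self, session_id):
        last = self._last_write.get(session_id)
        if last is not None and self.clock() - last < self.lag_window:
            # The slaves may not have replayed this session's write yet.
            return self.master
        # Otherwise spread reads over the slaves in round-robin order.
        slave = self.slaves[self._next_slave % len(self.slaves)]
        self._next_slave += 1
        return slave
```

The price is that a busy writer keeps hitting the master for reads too, which is one more reason the master eventually becomes the bottleneck.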

The best way to handle such a situation is to divide the database. LJ did this by creating user clusters. Each user was assigned a cluster number, and each cluster had multiple machines in a master-slave fashion. The first query finds the cluster number for that user from the global database, and subsequent queries for that user are directed to the user's cluster. Of course, a few issues such as uniqueness of user ids and moving users between clusters had to be tackled. Caching MySQL connections and using the MySQL query cache to cache query results further improved the performance of the site.
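The user-cluster lookup described above can be sketched roughly as follows. This is a minimal illustration, not LJ's code: `GlobalDB`, `Router`, the host names and the way reads are spread over slaves are all assumptions made for the example.

```python
class GlobalDB:
    """Stands in for the global database that maps each user to a cluster."""

    def __init__(self):
        self._next_userid = 1       # one global counter keeps user ids unique
        self._cluster_of = {}       # userid -> cluster number

    def create_user(self, cluster_id):
        userid = self._next_userid
        self._next_userid += 1
        self._cluster_of[userid] = cluster_id
        return userid

    def cluster_for(self, userid):
        return self._cluster_of[userid]

    def move_user(self, userid, new_cluster_id):
        # A real move would first copy the user's rows to the new cluster.
        self._cluster_of[userid] = new_cluster_id


class Router:
    """Sends per-user queries to the right user cluster."""

    def __init__(self, global_db, clusters):
        self.global_db = global_db
        self.clusters = clusters    # cluster number -> {"master": host, "slaves": [hosts]}

    def host_for(self, userid, write=False):
        cluster = self.clusters[self.global_db.cluster_for(userid)]
        if write or not cluster["slaves"]:
            return cluster["master"]
        # Spread reads over the slaves of that user's cluster.
        return cluster["slaves"][userid % len(cluster["slaves"])]


global_db = GlobalDB()
router = Router(global_db, {
    1: {"master": "c1-master", "slaves": ["c1-slave-a", "c1-slave-b"]},
    2: {"master": "c2-master", "slaves": ["c2-slave-a"]},
})
alice = global_db.create_user(cluster_id=1)
bob = global_db.create_user(cluster_id=2)
```

Because only the small userid-to-cluster mapping lives in the global database, a heavy user can be moved to a quieter cluster by changing one row once the data has been copied.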

The remaining problem was the single point of failure at the master databases. If any master database died, the site would go down. To avoid this, master-master clusters were created. In case of any problem, the other master comes into play and handles all active connections.

Which database engine to use - InnoDB or MyISAM? InnoDB uses row-level locking, which allows concurrent reads and writes, so it is comparatively fast under mixed load. MyISAM has table-level locks, so under concurrent writes it is not as fast as InnoDB.

And then there is MySQL Cluster, which is an in-memory engine. It requires RAM of about 2-4x the size of the dataset, so it is good for handling small data sets only.

An even better way of storing the database is to use shared storage - SAN, SCSI, DRBD. You turn a pair of InnoDB machines into a cluster that looks like a single box from outside, with a floating IP address. Heartbeat is used to move the IP, mount/unmount the filesystem, and start/stop MySQL. DRBD can be used to sync one machine's block device with another's. This requires a dedicated gigabit cable between the two machines to handle the high amount of data transfer.


Memcached is used to cache records that have already been computed, for frequent access. Memcached is an open source distributed caching system, instances of which can be run on any machine wherever free memory is available. It also provides simple APIs for different languages like Java, PHP, Perl, Python and Ruby. And it is extremely fast.

LJ ran 28 instances of Memcached on 12 machines (not dedicated) and was able to cache 30 GB of data. This cache got a hit rate of 90-93%, which reduced the number of queries to the database to a great extent. They started by caching what was accessed very frequently and aimed at caching almost everything possible. With a cache, there is the extra overhead of keeping the cache updated.
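The read path, the hit rate and the update overhead mentioned above can be sketched like this. It is only an illustration: plain dicts stand in for both the Memcached pool and the MySQL tables, and `CachedStore` is a hypothetical name, not part of any Memcached client library.

```python
class CachedStore:
    """Cache-aside sketch: look in the cache first, fall back to the database."""

    def __init__(self, db):
        self.db = db        # dict standing in for the MySQL tables
        self.cache = {}     # dict standing in for the Memcached pool
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.db[key]        # the expensive database query
        self.cache[key] = value     # keep it for the next reader
        return value

    def update(self, key, value):
        self.db[key] = value
        # The extra overhead: the cached copy must be dropped (or rewritten),
        # otherwise readers keep seeing the stale value.
        self.cache.pop(key, None)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A hit rate of 90-93% in this picture means only 7-10 out of every 100 `get` calls reach the database.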

http load balancing

After trying a large number of reverse proxies, the LJ people were unable to find anything that satisfied their needs. So they built their own reverse proxy - Perlbal - a small, fast, manageable HTTP web server which can do internal redirects.
It is single threaded, asynchronous and event based. It handles dead nodes. And it works in multiple modes - static web server, reverse proxy and plug-ins.

It allows persistent connections and has no complex load balancing logic - it uses whichever backend is free. It connects fast and has multiple queues - for free and paid users.
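The "use whatever is free" idea with separate queues can be sketched as below. This is not Perlbal's code (Perlbal is written in Perl); the `Dispatcher` class is hypothetical, and serving the paid queue before the free queue is an assumption made for the example.

```python
from collections import deque


class Dispatcher:
    """Hands queued requests to whichever backend is idle."""

    def __init__(self, backends):
        self.idle_backends = deque(backends)   # backends with no request in flight
        self.paid_users = deque()              # waiting requests from paid users
        self.free_users = deque()              # waiting requests from free users

    def submit(self, request, paid=False):
        (self.paid_users if paid else self.free_users).append(request)
        return self._dispatch()

    def release(self, backend):
        # Called when a backend finishes its request and is idle again.
        self.idle_backends.append(backend)
        return self._dispatch()

    def _dispatch(self):
        assigned = []
        while self.idle_backends and (self.paid_users or self.free_users):
            # Assumption: paid users are served first when both queues wait.
            queue = self.paid_users or self.free_users
            assigned.append((queue.popleft(), self.idle_backends.popleft()))
        return assigned
```

There is no weighting or scoring of backends here: a backend gets work only when it reports itself idle, which is what keeps the logic simple.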

MogileFS - Distributed File System

Files belong to classes. It tracks which disks the files are on and keeps replicas on devices on different hosts. It has libraries available for most languages - PHP, Perl, Java, Python.

Clients, trackers, a MySQL database cluster and storage nodes all make up MogileFS. It handles automatic file replication, deletion, etc.
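The tracker's job of placing replicas on devices of different hosts can be sketched as follows. This is a toy model, not the MogileFS API: the `Tracker` class, its method names and the per-class replica count are assumptions for illustration, and a dict replaces the MySQL database that the real trackers use.

```python
class Tracker:
    """Toy tracker: remembers which devices hold each file."""

    def __init__(self, devices, classes):
        self.devices = devices      # device id -> host name
        self.classes = classes      # class name -> number of replicas wanted
        self.locations = {}         # file key -> list of device ids

    def store(self, key, file_class):
        wanted = self.classes[file_class]
        chosen, used_hosts = [], set()
        for dev, host in sorted(self.devices.items()):
            if host not in used_hosts:      # never two replicas on one host
                chosen.append(dev)
                used_hosts.add(host)
            if len(chosen) == wanted:
                break
        if len(chosen) < wanted:
            raise RuntimeError("not enough distinct hosts for this class")
        self.locations[key] = chosen
        return chosen

    def get_devices(self, key):
        return list(self.locations[key])

    def delete(self, key):
        self.locations.pop(key, None)
```

Putting the replica count on the class rather than on each file means, for example, that original photos can ask for more copies than thumbnails that can be regenerated.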

I have put in only the major points; finer details can be found in the link below.


Tuesday, April 24, 2007

realtime fulltext index

For the past few days, I have been struggling to find something that allows me to create, insert into, update and search fulltext indexes - big fulltext indexes. Not very large, but somewhere around 3-4 GB of data.

You would suggest MySQL, but with MySQL, inserts on a table with fulltext indexes are very slow. For every insert, the data is also put in the index, which leads to slow inserts.
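To see why indexing on every insert costs so much, here is a toy inverted index. It is only an illustration of the general idea, not how MySQL implements its fulltext index; `TinyFulltextIndex` and its `index_writes` counter are made up for this example.

```python
import re
from collections import defaultdict


class TinyFulltextIndex:
    """Toy inverted index that is updated on every insert."""

    def __init__(self):
        self.postings = defaultdict(set)   # word -> ids of rows containing it
        self.index_writes = 0              # how many posting updates were done

    def insert(self, doc_id, text):
        # One insert touches one posting list per distinct word in the text.
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            self.postings[word].add(doc_id)
            self.index_writes += 1

    def search(self, query):
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return set()
        # Rows must contain every word of the query.
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result
```

A row with a few hundred distinct words means a few hundred index updates for a single insert, which is where the time goes.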

What else? Well, I had come across some other fulltext engines like Sphinx and Senna. I compiled MySQL with Senna, but there was no benefit. The index size was almost double and the searches were also slow - as slow as the MySQL fulltext index.

What about Sphinx? Well, I had integrated Sphinx with MySQL. But the Sphinx engine has lots of limitations - the first 2 columns must be integers and the 3rd column should be text. All the remaining columns should be integers. So if I use Sphinx, I would need to write a stored procedure and trigger it to port data from my current table to a parallel Sphinx table whenever a new row is inserted. And what would I do if I have to run a query? I would be joining both the tables. Would that be fast? Don't know. Let me figure out how to get this thing running...

You would say - what about Lucene, my favorite search engine? Well dear, Lucene is in Java and there is no way I could integrate it with MySQL if I have to do it in a jiffy. I would need to design and build a library to get this thing running. Forget it. In fact, DBSight provides a similar type of library. Maybe I will get that working and check it out. At a small price, I might get what the organization requires, and that too without wasting any resources on building it.

I hope I get a solution to the situation I am in right now. I have solutions for it, but not one that requires the least amount of resources to get running. Till I get one, the search will continue...

Friday, April 20, 2007

water bridge

Water Bridge in Germany .... What a feat!
Six years, 500 million euros, 918 meters - this is engineering!
This is a channel-bridge over the River Elbe that joins the former East and West Germany, as part of the unification project. It is located in the city of Magdeburg, near Berlin. The photo was taken on the day of inauguration.