Tuesday, August 11, 2009

Document oriented data stores

A document oriented database or data store does not use tables for storing data. It stores each record as a document with certain characteristics. So a multiple documents in this type of store can have different characteristics - which means different number of fields per record and different fields per row. The benefit would be that if you are using a document oriented database for storing a large number of records in a huge database, any change in the number or type of row does not need an alter on the table. All you need to do is insert new documents with new structure and it is automatically inserted to the current datastore.

I went ahead and tried comparing some document-oriented data stores - TokyoTyrant, MongoDb and CouchDb and i compared them to Mysql-5.4 as well to get an idea about the performance advantage.

I created 3 scripts in all - 2 for insert and 1 for select. And created a 50,00,000 record table with a very normal record structure

field1 -> string, indexed
field2 -> string, indexed
date -> date, not indexed
n -> integer, indexed

The size of the original mysql table was 214 MB data & 205 MB index - total 419 MB for 50,00,000 records. The 3 scripts could be described as follows

  • insert script 1 : went from 1 to 25,00,000 pulling 1000 records in a single query from the source database and inserting the records one by one to the target data store.

  • insert script 2 : went from 50,00,000 to 24,99,999 pulling 1000 records in a single query from source database and inserting the records one by one to the target data store.

  • select script : picks records from the source database and makes 2 queries for both field1 & field2 and fires both the queries on the target data store.

Test Machine configuration:

CPU : Intel Xeon 1.6 GHz - Quad Core [4 MB L2 cache] 64 Bit
Memory : 8 GB RAM
OS : Centos 5.2 - Kernel 2.6.18 64 bit


Used mysql version 5.4
Load during execution : 3.5
Time taken by Insert script 1 : 19682.673708916 sec for 2501000 records
Time taken by Insert script 2 : 19608.902073145 sec for 2499000 records
Time taken by Select script : 20465.380266905 sec for 8099782 records
Table engine : MyISAM
Total database size in mysql 5.4 : 215+233 = 248 MB

Ofcourse i used MyISAM which resulted in table locks, and i had indexes on three fields which resulted in increased locking times on these tables during inserts & selects.

MongoDB - www.mongodb.org

Used version for testing : 0.9.1
The current version available is 0.9.7 which has a lot of bug fixes and performance improvement features.MongoDB is written in C++ and stores data in BSON format which is Binary Serialization of JSON documents. It has a console (similar to mysql) which can be used to fire queries - so unlike bdb, you dont have to go and write programs to fire queries. MongoDB has full indexing support for different columns. You can also do query profiling (like explain in mysql) to check the execution path of the query and then make optimizations if possible. It provides replication support. It can be used to store large binary data like videos very efficiently.

The one good thing about the future of mongodb is that it would be providing auto sharding of data for cloud level scalability. This feature is in alpha stage now, but it should be mature in some time and could be used then.

Mongodb provides drivers/apis for a lot of languages including python, php, java, ruby, c++ and perl. I downloaded the php driver and compiled it and installed the extension mongo.so in the php extentions directory. And then i ran the same tests to check out the speed of mongodb.

Load during execution : 2.5
Time taken by Insert script 1 : 1006.6038348675 sec for 2501000 records
Time taken by Insert script 2 : 1435.0536739826 sec for 2499000 records
Time taken by Select script : 2942.2539789677 sec - 9999914 records
Total database size in mongodb : 4 GB (both data & index)

Wow, so Mongodb turns to be approximately 16 times faster than simple mysql - MyISAM tables

But it takes up a huge amount of space. Why is that? Well, Mongodb creates data files with predefined sizes so that it does not have to increase or decrease the file size as per requirements. The first file it creates is of 64MB, the next file is 128MB etc. upto 2GB. After 2GB the remaining files are of 2GB only. So, even if we exceed a byte above the current file of 2GB, another file of 2GB will be created - even if it is 98% empty.

Another important fact about mongodb is that mongodb's storage engine uses memory-mapped files for performance. This limits the data size on a 32 bit machine to around 2GB. I had hit this limitation earlier so i used a 64 bit machine for testing. But this architecture of data files in
mongodb storage engine allows the code to be much plain and simple and open to embrace the 64 bit world. Mongodb does not support traditional locking which is another reason it is fast.

More info / References :
32 bit limitation => blog.mongodb.org/post/137788967/32-bit-limitations
performance testing => www.mongodb.org/display/DOCS/Performance+Testing
regarding mongodb => www.mongodb.org/display/DOCS/Home
Quickstart => www.mongodb.org/display/DOCS/Quickstart
Production deployments => www.mongodb.org/display/DOCS/Production+Deployments
Locking in mongodb => www.mongodb.org/display/DOCS/Atomic+Operations

Tokyo Cabinet/Tyrant

Tokyo Tyrant is a set of 3 applications
- tokyo cabinet : the embedded data store
Tokyo Cabinet => tokyocabinet.sourceforge.net/spex-en.html
- tokyo tyrant : network api
Tokyo Tyrant => tokyocabinet.sourceforge.net/tyrantdoc/
- tokyo dystopia : fulltext search system
Tokyo Dystopia => tokyocabinet.sourceforge.net/dystopiadoc/

I explored tokyo cabinet-1.4.29 & tokyo tyrant-1.1.30. So after installing tokyo cabinet, I had to go ahead and install tokyo tyrant on top of tokyo cabinet for the networking support. Tokyo tyrant provides a set of binaries for talking to the tokyo cabinet embedded database. So you can use tokyo tyrant to insert and select data from a tokyo cabinet server.

I got an api for php to interact with the tokyo tyrant server. It is known as Php Tyrant and it can be obtained from mamasam.indefero.net/p/tyrant/. It is written completely in php and is not a C extension.

If you go through the documentation, you would see that Tokyo cabinet supports majorly 4 types of databases.

Hash database - a key value store which uses hash algorithm to retrieve records. The time complexity in this case is constant [ O(1) ].
B+ tree database - a B+Tree is slower than the hash database. Records of a B+Tree are sorted and arranged in logical pages. So the time complex
ity of record retrival is O(log n). But the size of a B+Tree database is half of a hash database.
Fixed length database - It is faster than a hash database. But it has a restriction that each key has to be a natural number and the length of
each value is limited. The whole region of the database is mapped to memory by the mmap call - which reduces overhead related to file I/O.
Table database - It is close to a document oriented database than a key value pair. You can also form indexes on columns in the table to improve search and sorting performance. Indices are implemented as different files of B+Tree database.

So, i created a simple table database with the structure i wanted and created indexes on the required columns. And started running my scripts. But i found that tokyo cabinet is fast in the beginning, but it suffers from file locks. So any operation happening on the table locks the comp
lete file. This caused a lot of problem with simultaneous inserts & selects. In fact i had cases where the server stopped responding totally. And it took 1 hr 17 minutes to push in just 661 inserts & fire 270 selects. So i stopped running the select script and focused on running multiple insert scripts. And then ran the select script separately.

This says that if you go ahead implementing tokyo tyrant on a huge table, you should be aware of the file locks and hence should implement the selects on a slave of the main database. A master can handle multiple inserts while a slave can handle multiple selects. Or you form a queue for firing queries sequentially instead of simultaneously.

I ran 2 insert scripts simultaneously and the results were even worse then mysql.

Load during execution : 2.9
Time taken by Insert script 1 : 65671.21945715 sec for 2501000 records
Time taken by Insert script 2 : 63807.51564312 sec for 2499000 records
Total database size : 1095 MB [715 MB data + 380 MB index]

And i ran 2 select scripts simultaneously and the results were comparable to that of mysql.

Load during execution : 2.1
Time taken by Select script 1 : 20269.342120886 sec for 9999914 records
Time taken by Select script 2 : 20115.848437071 sec for 9999914 records

What could be said about tokyo tyrant is that it should be used as a key-value store - maybe by using a hash database or a fixed length database for persistent session storage.


The third database i looked into was couchdb. Which i had heard a lot about. The good thing about couchdb is that it provides a RESTful JSON API that can be accessed from any environment that allows HTTP requests. Couchdb is written in erlang and is quite fast. Again it does not have a c extention. There is a php api known as phpillow which wraps all functionalities of couchdb in its function calls.

You can download phpillow at arbitracker.org/phpillow.html

Also note that it works only with php 5.3. With php 5.2.x it still has a lot of bugs. Firstly there is no proper documentation, so it took me some time to figure out how to write insertion and select scripts and how to go about creating views (couchdb calls indexes as views). When i ra
n the scripts i found that it was crashing a lot. So, i did not go ahead with a proper testing of couchdb. Couchdb also supports master master replication (with developer supplied conflict resolution).

You could go ahead and read the comparison of couchdb & mongodb at mongodb's page



Anonymous said...

What were your tuning settings for TokyoCabinet/Tyrant?

dwight said...

Great info. I don't know a lot about Tokyo -- were you using the Table database for your tests? With table database, must one predefine the columns (like MySQL) or can they be dynamic (like MongoDB & Couch)?

gamegeek said...

@anonymous : The only setting I applied was using the asynchronous I/O for logging. Is there some other setting that i should look into. Please point out...

@dm : Yes i was using the table database for tests. And the table database is a document oriented database - it resembles that of MongoDB & CouchDB.

ferdhie said...

sometimes, the driver is the bottleneck, not the product

lasi said...

Can you share your test suite? With reproducible tests the benchmarks are better ;-)

Juan Valdez said...

@Jayant Kurmar: There is some tuning that needs to be done with Tokyo Tyrant, not as much as BerkeleyDB which is really the win here, but it needs tuning. The one major tune that we use here at work is the "xmsiz"(Extra memory) setting. In default, I believe it will only use 64MBs of RAM, that's either going to lead to a lot of swapping or an "out of memory" issue(what we ran into).

Log Buffer said...

"Jayant Kumar provides a nice backgrounder on document oriented data stores, and compares some examples—TokyoTyrant, MongoDb, and CouchDb—to MySQL 5.4. [...]"

Log Buffer #158

Vincent said...

I'm investigating other options to storing non-relational data too. There are quite a number of 'NoSQL'/document oriented type technologies which are still in early development.
Good info! Thanks for sharing your investigations. I hope I can post similar findings about my investigations soon.

Chunk said...

This Anonymous from the first post here. You should look at increasing bnum and xmsiz values for your TokyoTyrant database, they will have the most dramatic effect.

Unknown said...

A PHP C extension for CouchDB is under development at http://www.topdog.za.net/php_couchdb_extension

Anonymous said...


"A PHP C extension for CouchDB is under development at http://www.topdog.za.net/php_couchdb_extension"

Who cares? PHP is the worst joke in history.