Saturday, June 24, 2006

mysql storage engines

I would be discussing mysql storage engines/table types over here. MySQL supports pluggable storage engine architecture, which means that you can create your own storage engine and ask MySQL to store data for you using the sql query syntax defined in MySQL. Sounds cool right, but lets not dig that deep now. Lets first see what types of storage engines does MySQL provide and their benefits/drawbacks.

MYISAM Storage Engine

Based on the older IASM storage engine and currently the default storage engine in MySQL. So, if you create a table in MySQL and dont define a storage engine then, the table would use MYIASM as the storage engine. MySQL databases are generally directories and MYISAM tables are stored using 3 files -
.frm => the table format
.MYD => table data
.MYI => table indexes

Points to Remember:

  • The tables are machine/os independent and can be shifted by simply copying the 3 files to another installation of mysql on another machine.

  • Limit of 2^32 rows which can be increased to (2^32)^2.

  • Maximum no of indexes per table = 64, which can be increased upto 128 by recompiling.

  • Maximum no of columns per index = 16.

  • Blob and TEXT columns can be indexed.

  • Table size is almost unlimited (256TB). Though the table gives an error of "Table is full" when it reaches 4 GB. Then some parameters (MAX_ROWS & AVG_ROW_LENGTH) in the table definition needs to be changed to make the table expandable.

  • Transactions are not supported

  • Fulltext indexing and searching is supported though it is very slow.

  • Granuality of locking is TABLE. So if an insert and select happens one after another, the insert will lock the complete table and the select will have to wait till the lock is removed.

  • The tables do get corrupted sometimes and to recover it, you would need to run the "REPAIR TABLE" command.

  • Does not provide "Foreign key" usage.

InnoDB Storage Engine

InnoDB provides transactions and relational database support in MySQL - supports foreign keys and commit, rollback and crash recovery capabilities. InnoDB creates two log files namely "ib_logfile0" and "ib_logfile1" and a data file "ibdata1" in the MySQL data directory where it stores its tables. You can specify the path for creation of data file and the initial/max size of the file and also the no of data files to create. Placing different data files on different disks would of course lead to increased efficiency.

The table definitions are stored in database directory with a .frm extension whereas the data is stored in the "ibdata1" - tablespace. InnoDB has something known as "clustered index" where the data for the rows is stored. Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O.

innodb_data_home_dir = /ibdata

Points to remember:

  • Moving databases/tables is not easy. You need to take a dump and import it in the other installation of mysql

  • Table cannot contain more than 1000 columns

  • Although InnoDB supports row sizes larger than 65535 internally, you cannot define a row containing VARCHAR columns with a combined size larger than 65535.

  • combined size of the InnoDB log files must be less than 4GB

  • Minimum tablespace size is 10MB. And maximum tablespace size is 64TB. This is also the maximum size for a table

  • Does not support FULLTEXT indexes.

  • InnoDB does not keep an internal count of rows in a table. So to process a select count(*) from tablename, InnoDB must scan the complete index which may take some time.

  • Delete from tablename does not regenerate the table, but deletes rows one by one.

  • Supports foreign keys constrains and transactions (commit and rollback)

  • Has row level locking and also provides consistent non-locking read in select statements. Which provides better multi-user concurrency and performance.

Memory/Heap Storage Engine

As the name suggests, the tables are made in memory/RAM. Since the data for the tables are stored in memory they are lost if the mysql server is shut down or if it crashes, but the tables continue to exist since the table definition is a .frm file stored in the database directory. On a mysql restart, all data would have been lost and the tables would be empty. The indexes made on these tables are HASH indexes which makes access to these tables extremely fast. You can create temporary HEAP tables which can be used for high frequency reads & writes.

Points to remember:

  • Can have 32 indexes per table and 16 columns per index.

  • Type of indexing can be decided to be either HASH or BTREE.

  • Using Hash indexes can lead to slower updates and deletes on the table if the degree of key duplication is high.

  • Cannot contain BLOB or TEXT columns

  • Server needs sufficient memory to maintain all MEMORY tables at one point of time

  • To populate a MEMORY table when MySQL server is started, the --init-file option can be used

  • It has table level locking

  • Memory can be freed up by using either truncate/drop table command

Merge Storage Engine

Using MERGE storage engine, we can make a collection of identical MYISAM tables look and work like one. Each MERGE table is made up of two files
.frm - table definition/format
.MRG - names of tables that should be used as one
MERGEd tables support SELECT, DELETE, UPDATE and INSERT. On DROPPING a merge table, only the merge specification is dropped. The underlying tables are unaffected.

Points to remember:

  • MERGE table cannot enforce uniqueness over the underlying table.

  • MERGE table can be used to spilt big tables into parts and refer to them using the merged table name

  • MERGE table does not have its own indexes, it uses indexes of its underlying tables.

  • You cannot search the MERGE tables using FULLTEXT indexes. The FULLTEXT indexes can be created on the underlying MyIASM tables but they cannot be used through the MERGE table.

  • Merge tables use more file descriptors. If 10 clients are using a MERGE table that maps to 10 tables, the server uses (10 × 10) + 10 file descriptors.

  • Key reads are slower, since the on performing a SELECT, the MERGE storage engine has to issue a call on all the underlying tables one by one.

  • INSERTS can be directed to either the first or the last table using INSERT_METHOD parameter during MERGE table creation.

  • REPLACE does not work

  • When you create a MERGE table, there is no check to ensure that the underlying tables exist and have identical structures.

  • There is a limit of 2^32 rows to a MERGE table.

BDB(BerkeleyDB) Storage Engine

Sleepycat's Berkeley DB is a database engine that provides developers with fast, reliable, local persistence with zero administration (check ). Mysql has been provided the BDB transactional storage engine. You need to have BDB installed on your system and compile MySQL with BDB to use the BDB storage engine. Each BDB table is stored in two files
.frm -> table definition/format
.db -> contains table data and indexes

Points to remember:

  • BDB tables can have up to 31 indexes per table and 16 columns per index.

  • MySQL needs a primary key for each BDB table. If not provided, MySQL would create a hidden primary key and use it.

  • If all columns that you access in a BDB table are part of the same index or part of the primary key, MySQL can execute the query without having to access the actual row.

  • Sequential scanning is slower for BDB tables than for MyISAM tables because the data in BDB tables is stored in B-trees and not in a separate data file.

  • There are often holes in the BDB table to allow you to insert new rows in the middle of the index tree. This makes BDB tables somewhat larger than MyISAM tables.

  • SELECT COUNT(*) FROM tablename is slow for BDB tables, because no row count is maintained in the table.

  • Internal locking in BDB tables is done at the page level.

  • Opening many BDB tables at the same time may be quite slow.

  • Each BDB table stores in its .db file the path to the file as it was created. So it is not possible to move BDB table files from one database directory to another. To shift a BDB table mysqldump and import are required to be done.

FEDERATED Storage Engine

The federated storage engine can be used to access data in tables of remote databases locally. When a federated table is created a .frm file is created in the database directory. This file just contains the table definition. The actual table remains in the remote mysql database.
Flow of information in case of federated tables is as follows :

SQL calls issued locally
MySQL handler API (data in handler format)
MySQL client API (data converted to SQL calls)
Remote database -> MySQL client API
Convert result sets (if any) to handler format
Handler API -> Result rows or rows-affected count to local

The structure of the federated table must be exactly similar to the remote table structure except that the ENGINE should be FEDERATED and CONNECTION parameter should contain the connection string that indicates to the federated engine how to connect to the remote mysql server and access the table over there.

connection string format:

Points to remember:

  • The remote table that a FEDERATED table points to must exist before you try to access the table through the FEDERATED table.

  • It is possible for one FEDERATED table to point to another, but you must be careful not to create a loop.

  • Transactions are not supported

  • There is no way for the FEDERATED engine to know if the remote table has changed.

  • The FEDERATED storage engine supports SELECT, INSERT, UPDATE, DELETE, and indexes. It does not support ALTER TABLE, or any Data Definition Language statements other than DROP TABLE. The DROP TABLE statement would drop only the local table and not the remote table.

  • FEDERATED tables do not work with the query cache.

ARCHIVE Storage Engine

Used for storing large amounts of data without indexes in a very small footprint. When an archive table is created, following files are created in the database directory.
.frm -> table definition
.ARZ -> DATA file
The engine uses zlib lossless data compression (check ).

The Archive table is almost 75% smaller in size then the MyISAM table and almost 83% smaller in size then the transactional InnoDB engine. Even after compression of MyISAM table, Archive tables are still 7% smaller. Insert operations on an Archive table is faster than that on a MyISAM table. If table scans on both tables are compared, table scan on an Archive table is found to be faster.

Points to remember:

  • Supports only INSERTS and SELECTS

  • Does NOT support DELETE, REPLACE and UPDATE

  • Has row level locking

  • Rows are compressed as they are inserted.

  • A SELECT statement performs a complete table scan and uncompresses rows on demand. There is no ROW cache.

CSV Storage Engine

Stores data in text files using comma-separated values format. When a table is created 2 files are created in the database directory
.frm -> table definition
.CSV -> data file. A normal text file with comma separated values.

This engine does not support INDEXING. The data file (.csv) file can be read by spreadsheet applications like openoffice, MS office or star office.

In addition to these, there are two other storage engines:

EXAMPLE Storage engine: It is a stub engine and does nothing. You cannot perform anything on this engine except creation of tables. Tables are created as .frm files in the database directory. No data file is created and no data can be stored using this engine.
BLACKHOLE Storage engine: It acts as a black hole, accepts data and throws it away. It does not store anything. Selects always return an empty result set.

This summarizes the mysql storage engines. For details the following url can be referred.

The following link can be used to select a storage engine.

Next ... lets c what i can write

Thursday, June 22, 2006

How much tax do I pay?

Lets start from the ground up. I would create a hypothetical situation and try to figure out the amount of tax i pay per month. Well suppose i have a taxable salary of 10,000 per month and since i am a well paid indian, i would come in the 30% tax bracket.

So i would pay up 3,000/- as tax per month. Next, except home cooked food, everything else is taxed at 12.25%. Which means that if i spend around 100/- on say having my diner outside, i would be paying 12.25/- as tax. On an average, i end up spending 100/- every day on one thing or the other. That makes up for 3000/- per month and 3000*12.25 = 367.5/- as tax.

Moreover, in India, petrol is priced at 50/- per litre out of which 25/- is tax+duty. So if i spend around 1000/- on petrol, i would be purchasing only 500/- worth of petrol and paying 500/- worth of tax. Lucky for me, my office is close to my home, so i end up spending not more than 1000/- on petrol. Some people from my office end up spending something around 5000/- to 6000/- on petrol. its called luxury.

Atleast one movie a week makes sense to me. Currently all movies in noida - Delhi/NCR is priced at 125/- out of which 30% is entertainment tax which comes to about 29/-. So if i am watching 4 movies in a week, i would end up paying around 120/- as entertainment tax. Plus on purchase of any commodity, i am supposed to be paying 4% VAT. Well, i suppose, i must be getting around 1000/- worth of FMCG stuff and as a result pay around 40/- as tax.

To sum it all up, out of 10,000 i would be paying 3000+370+500+120+40=4030 as tax. I pay 40% of my salary as TAX. So, in a year, if i get a salary of 3 Lakhs, i would end up paying 1.2 Lakhs to the government as tax. Do i get the comparable returns is a big question, the answer of which is "not available" as of now.

Monday, June 19, 2006

life as usual...

Have not been able to write anything for a long time...Well, i had been a bit buzy with unwanted work...

Had to go home last weekend on an urgent work. Was unable to get any reservations in the Indian Railways, and so had to take a flight - "go-air" home. The interesting thing that happened is that the flight was priced at 2000/- for Delhi-Ahmedabad and 800/- for Ahmedabad-Delhi. And the next day when i was about to purchase the ticket, it costed me 3000/- for Delhi-Ahmedabad and 2000/- for the return. :-(

I landed at ahmedabad and then took an auto to the ST bus depo and from there went to baroda on the bus. Total cost came to be around 6000/- for flight(with taxes + fuel surcharge) and 400/- for the auto+bus travel from and to baroda.

Thennnn, i have been a bit buzy with the work at office. RESDEX, "my product" went live on saturday - 17th june. Though i almost dont do anything in it now, but still in case of emergency, i am supposed to do a bit of fire fighting.

And on sunday, i went to the railway reservation counter to cancel some of my train tickets. Well, after standing in queue, i realized that i had forgotten to bring the tickets which needed to be canclled. So... did nothing.

Currently reading a novel "Bag of Bones" By "Stephen King"...

Am planning to write an article about the amount of tax an indian pays, but i need to do some thinking on this and am not getting time :-( ... Well hope to write it by this weekend...

Saturday, June 10, 2006

Multi-Master replication in mysql

What is this? How can a slave have multiple masters? Well the architecture is not what the name implies. A slave can have only one master. But here we are going to create a loop between slaves and masters, also known as Circular replication.

Lets take 3 machines. (Since i have only a single machine - my laptop - i created 3 instances of mysql running on 3 different ports and with 3 different data directories to work out this architecture.) Suppose, we have installed mysql on all the 3 machines/nodes - say NodeA, NodeB, NodeC. What needs to be done is that NodeA is slave of NodeC, NodeC is slave of NodeB and NodeB is slave of NodeA. So each mysql instance is a master and also a slave. I will go into implementation details of this in a minute. First of all, let me list down the benefits and issues with this setup.


  • You can build a high availability and high performance cluster catering to a very busy web site.

  • In case if a node goes down, it is comparatively easy to bypass the node and get the cluster up and running in a small time.


  • There is a small time difference between an insert in the master and the corresponding data being replicated in the slave. Though this time difference is very small, but in this case, since the data has to propagate through all the nodes in the loop, the time difference may be of consecuence as the number of nodes increase.

  • The process of detecting and cleaning up a failure in replication still needs manual intervention.

To go into the technicalities, lets say that NodeA, NodeB and NodeC have mysql installed on them. You need to have a mysql version >= 4.1 on all the nodes. Also all nodes must have the same mysql version, to maintain compatibility in replication between different versions. I used mysql version 5.1.9-beta.

If you are new to mysql replication, please go through the following link This will help you to understand how to setup mysql replication.

Following figure explains the complete architecture. You can also add extra slaves for all the nodes which can be used/added to the loop in case if any node goes down or for simple select queries - further increasing the performance.

The configuration files on different nodes were as follows

Node A

# for replication

# log-queries-not-using-indexes


Node B

# for replication

# log-queries-not-using-indexes


Node C

# for replication

# log-queries-not-using-indexes


Once mysql on all 3 nodes are properly setup, all that needs to be done is run a "START SLAVE" command on all of them. This will bring the circular loop in function. Now you can have an array of mysql connections in your application and may be do a round robin execution of any type of queries (select, update, insert, delete) using the array on any of the machines in the cluster.

Points to be noted:

  • server-id has to be different for each of the machine in the circular replication

  • log-slave-updates tells the mysql slave to log queries that are replicated from master to its binary log, so that they can be passed on to who-so-ever is reading its binary log

  • replicate-same-server-id tells the mysql slave not to replicate its own queries in circular replication. If this variable is ignored, the setup may result in an infinite loop where a query fired on node A may keep on replicating itself repeatedly. This variable should be set to 0 so that the mysql slave does not replicate events that have its own server id. Preventing infinite loops.

  • auto_increment_offset determines the starting point of auto increment column values.

  • auto_increment_increment determines the increment between auto increment values.

auto_increment_increment and auto_increment_offset have to be used so that auto increment values on different servers do not conflict with each other. If this is not used, then, when a query is fired on node A and node B simultaneously or may be the replication between node A and node B breaks after the first query is fired on node A and before the second query is fired on node B.

On any Node
create table x(id int auto_increment primary key, val varchar(250) null);

On Node A
insert into table x(val) values ('in node a');

On Node B
insert into table x(val) values ('in node b');

In this case a "duplicate primary key" error will come when the query from node A is replicated and run on node B.

But if these variables are set appropriately, there would be no such issue. The above setup has auto_increment_increment set to 10 allowing addition of upto 10 nodes in the cluster.

For more details on this you can refer It is a very lengthy article but very detailed. Explains failover mechanisms and also provides some scripts for detecting breakages in the loop.

I love bikes

Take a look at these...

The GOD of all bikes...

1. 500 bhp (372 kW) @ 5600 rpm (60.4 bhp/liter); 525 lb.-ft. (712 Nm) @ 4200 rpm
2. 10-cylinder 90-degree V-type, liquid-cooled, 505 cubic inches (8277 cc)
3. 356-T6 aluminum alloy block with cast-iron liners, aluminum alloy cylinder heads
4. Bore x Stroke: 4.03 inches x 3.96 inches (102.4 x 100.6)
5. Two pushrod-actuated overhead valves per cylinder with roller-type hydraulic lifters
6. Sequential, multi-port electronic fuel injection with individual runners
7. Compression Ratio: 9.6:1
8. Max Engine Speed: 6000 rpm
9. Fuel Requirement: Unleaded premium, 93 octane (R+M/2)
10. Oil System: Dry Sump; takes 8 quarts Mobil1 10W30 Synthetic
11. Cooling System: Twin aluminum radiators mounted atop engine intake manifolds,
force-fed from front-mounted, belt-driven turbine fan.
12. Takes 11 quarts of antifreeze.
13. Exhaust System: Equal-length tubular stainless steel headers with dual collectors and
central rear outlets


Outboard, single-sided parallel upper and lower control arms made from polished billet aluminum. Mounted via ball
joint to aluminum steering uprights and hubs. Five degrees caster. Single, fully adjustable centrally located
coil-over damper ( 2.25-inch coil with adjustable spring perch); pullrod and rocker-actuated mono linkage.
Center-lock racing-style hubs.

Hand-fabricated box-section steel inboard swing arms, incorporating "hydral-link" lockable recirculating hydraulic
circuit parking stand. Single fully adjustable centrally located Koni coil-over damper ( 2.25-inch coil with
adjustable spring perch); pushrod and rocker-actuated mono linkage. Center-lock racing-style hubs


20-inch perimeter-mounted drilled machined stainless steel rotors, one per wheel. Two four-piston fixed aluminum
calipers per wheel (16 pistons total), custom designed. Blue anodized caliper finish. Hand-activated.

20-inch perimeter-mounted drilled cast-iron rotors, one per wheel. One four-piston fixed aluminum caliper per wheel
(8 pistons total), custom designed. Blue anodized caliper finish. Foot-activated.


0-60 mph: 2.5 seconds (est.)
Top Speed: 300+ mph (est.)


Length: 102 inches
Width: 27.7 inches
Height: 36.9 inches
Wheelbase: 76 inches
Seat Height: 29 inches
Weight: 1,500 lbs.
Track, Front: 8.75 in
Track, Rear: 10 in
Weight Dist: 49F/51R
Ground Clearance: 3 in
Fuel: 3.25 gallons


Alternator: 136-amp high-speed
Battery : Leak-resistant, maintenance-free 600 CCA
Lighting: Headlights consist of 12 five-watt LEDs, front, with beam-modifying optics and masked lenses. Eight LEDs,
rear. Headlamps articulate with wheels.

TRANSMISSION: Manual, foot-shifted two-speed

Aluminum-cased two-speed, sequential racing-style with dog ring, straight-cut gears
Gear Ratios: 1st 18:38; 2nd 23:25
Clutch: Double-disc, dry-plate with organic friction materials, hand lever actuated with assist
Final drive: Dual 110-link motorcycle-style chains

Front Sprockets: 14 teeth
Rear Sprockets: 35 teeth

Longitudinal, centrally mounted engine, rear-wheel drive layout; monocoque construction, engine is central, stressed
member. Body of bullet aluminum

Sunday, June 04, 2006

Which of the X-Men Are You?

Saw X-Men III - Last Stand some time back. Very nice movie. Cool animations and cool theme. Totally unrealistic and unexpected.


You Are Cyclops

Dedicated and responsible, you will always remain loyal to your cause.
You are a commanding leader - after all, you can kill someone just by looking at them.

Power: force beams from your eyes

Friday, June 02, 2006

Benchmarking results of mysql, lucene and sphinx...

Finally i was able to do a benchmark of the 3 search engines
- mysql fulltext search
- lucene search engine
- sphinx

I came across sphinx while doing a search on full text search engines. It is a very good engine. Few points regarding sphinx
-> Very simple to configure, create index and search
-> Very easy to integrate with php and mysql. APIs for the same are also available. I was able to build index and search using sphinx in a few hours.
-> The index which has been created is a combination of all fields of mysql. There is no distinction between different fields being searched. So you can perform search on an index and not on different fields of the index.
-> Of course since its source code is available, the searching process can be customized according to your needs. Moreover 0.9.6 version which is under development will be providing field wise search.
-> Since this is in C, it is supposed to be faster as compared to lucene.

I did the benchmarking on my own laptop. It is a dell Inspiron 700m running linux (fedora core 4) kernel 2.6.11. Configuration of the m/c ==>>
Processor : Intel(R) Pentium(R) M processor 1.60GHz
Cache size : 2048 KB
Memory : 512 MB

I got down a table containing 1 Lakh (100,000) records. The data size was 456 MB. And created index on some fields from the table.


Mysql Version - 5.1.9 (Beta)
Stop words : Built in
Indexing words of length >=2 & <=84 ( There is a feature in mysql only which allows you to specify the min & max length of words you want to index. By default min length is 4. I changed it to 2 so that i can index words like net, php, hr etc. If you want to index all words, change this to 1.
Indexing time : 1440 seconds
Index size : 263 MB (456 MB data - remains same).

Lucene Version - 1.9.1
Stop words : 660 general words (like is, if, this etc...)
Indexing time : 2761 seconds (default configuration was used during indexing. There are certain parameters like mergefactor and maxmergedocs using which indexing can be tuned to work much faster. Though it may result in Too Many Open Files error in linux.
Index Size : 109 MB (No data is stored. Had stored only the unique id of each document using which i can retrieve the document later.)

Sphinx Version - 0.9.5
Stop words : NONE
Indexing time : 246 seconds (using default configuration. Dont have much idea whether indexing can be tuned.)
Index Size : 211.1 MB (3.1 MB Index - .spi file + 208 MB Data - .spd file).
3.1 MB of index looks extremely good. Also in case of sphinx, there is no need to maintain separate id for retrieval of data, since the unique id of your data is maintained in the index. As compared to lucene where you have to maintain a separate id and enforce uniqueness on it with your program. The indexing time and data compression are both good.


The searches done were using scripts. I did a number of searches on randomly selected words and then came out with an average time. In case of lucene and mysql, the search was done on 2 fields with an OR between them. In case of sphinx the search was done on the complete index.

Searches/ThreadConcurrency - no of simultaneous threadsTotal searchesTotal time (milli seconds)Average time (milli seconds)
Found that search for an exact phrase which can be done using "in boolean mode" queries is more resource hungry. The query time in mysql is extremely high. Mysql runs purely on RAM, so with more RAM and accordingly configured mysql the queries would be much faster. Concurrency does not affect query execution speed to a major extent.
Initially searches are slow. But as we keep on searching the index is cached in RAM and the speed of searches increases. The searches are very fast as compared to MySQL. Here, from the results, it does not seem that concurrency is an issue. But i have been using lucene for some time now and have found that there are some concurrency issues in lucene. So if you are searching a huge index of say 100,00,000 records and the index size is say 8-10 GB, then with a concurrency of 100 searches at the same time, issues pop up, as searches seem to get locked. But still performance wise it is much better than mysql
Single searches are faster than that in lucene. But here we will have to consider the fact that there is no OR clause in the search. So the search engine does not have to get 2 different result sets and do a union on them. But as the concurrency of searches in increased the average time per search does not drop majorly as in lucene. Clearly pointing out that there may be concurrency issues here. Since i have not explored this to a great extent, i cannot comment on the problems related to concurrency here.

To sum up, lucene seems to be the right choice for the time being if you are looking forward to searching large amounts of data and performance is your primary goal. The list of features available is also impressive. Sphinx will come in next where indexing time is very small and indexing/searching hassle free. With evolution, sphinx may overtake lucene some time down the line providing both a list of good features and performance. MySQL fulltext search comes as a last option, which, it seems should be used only if the data set is small and quick development time is required.

Date : 12 Aug 2009
Some latest benchmarks on Lucene Versus Sphinx