Saturday, April 22, 2006

A journey with mysql

It has been a long time since i wrote anything. Actually there has been nothing to write about, since nothing has happened. So, i thought why not share some knowledge about mysql. HE HE...

I came across mysql in my college days during a summer project that we did at ORG-MARG India now known as Ac Nielson. It is now the third largest market research organization in the world. There is a person over there in the IT-Department known as Mr. Sanjay Bhatia who was very enthusiast about linux. And he asked us to develop a small site. It was more of a DB entry and retrieveing information type of work. The challenge was that we had never had any experience on using linux for web development. We started by choosing apache-tomcat, jsp and mysql for developing our application. Everything gor set up fine, except we were unable to use JDBC to connect JSP to MySQL and so we gave up on mysql at that time. And shifted to PostgreSQL.

Next i encountered MySQL in my first job. And though i never had a chance to look at the administration part, mysql did a good job. I had no idea about indexes, storage engines etc etc. But we used it.

To sum it, mysql is a good db for beginers as well for advanced people. Without any knowledge of how to use mysql, we can start off with SQL queries. Enough of blah blah... lets go on with mysql.

Firstly, lets see what type of database mysql is. MySQL is an engine pluggable database. Which means that you can write your own storage engine and plug it in with mysql. Quite geeky!!!. Most common engines are MyISAM, INNODB, HEAP, BDB. New engines that have come up with mysql 5.x are FEDERATED, ARCHIVE and CSV engines. Now you would say, this is information overload. Well actually it is... My main encounter has been with MYISAM and so, to avoid information overload, we will discuss that only. Though some of the concepts that i will write may be cover up the whole of mysql

What is MYISAM? Well, it is the default and legendary storage engine of MySQL. Here, each database is a directory and each table is divided into 3 files namely
.frm -> table structure
.MYD -> Data
.MYI -> Index

MyISAM tables are generally OS independent. So if you need to copy a table from one m/c to another all you need to do is copy the 3 files for that table to another machine, irrespective of the OS. The default size of a MYISAM table is 4 GB. After 4 GB, you need to change certain parameters and the table can be expanded to infinity. ;-)

Actually, the size of table is not not limited in MYSQL. Once you reach a 4 GB limit, you will need to convert the table from Dynamic to Static format. Use the query

Alter table <tablename> AVG_ROW_LENGTH=<size in bytes> MAX_ROWS=<no of rows>.
AVG_ROW_LENGTH = current table size in bytes / no of rows in the table.
current table size = size of .MYD file


Once this is done, the table can be expanded to whatever limit the OS has on the max size of file. For Linux - kernel 2.4+ (using ext3 file system) the limit is 4TB. That is too much.

There is too much to write about if I start off with this... Lets move to a bit about mysql performance optimizations.

Firstly indexes. Very important. All indexes are generlly stored as B-Trees. Points to remember about indexes

1. Always run a query with indexed columns being utitlized in the where clause. Makes a huge difference. Maybe of about 90% gain.

2. Secondly, do not have unwanted indexes on a table. The more indexes that you have, the slower the inserts/updates would be on this table. Because for every insert/update the index needs to be updated.

3. Try to avoid having 2 or more indexes on the same column. This generally used to happen with me. Firstly, i used to make the column unique and then index it. This is not required. Remember that unique key is also an index.

For more information on how to use indexes effectively check out
http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html

Secondly mysql configuration. Generally in the my.cnf file in /etc directory in linux. Things to remember about configuring mysql

1. Key_cache = memory on system / 2.
Key cache is used to load index in the memory. Frequently used blocks of indexes are loaded in the key cache and used whenever required. Key cache should be a bit more than the max size of .MYI file which is frequently used. It should not exceed the half of (memory on system). Else mysql may start using swap and slow down. As of mysql 4.1, multiple threads can access the key cache simultaneously. Earlier than that access to key cache was sequential. Which is why, mysql 4.1+ has better performance gain over older versions of mysql.

2. Read_buffer_size
Generally used for sequential scan of a table. Each thread that does a sequential scan allocates a buffer of this size for each table it scans. So, if you do many sequential scans, you should increase this value.

3. Sort_buffer_size
Used for Order by ans group by caluses. Each thread that needs to do a sort allocates a buffer of this size. So, you need to increase this value for faster ORDER BY or GROUP BY operations.

4. Max connections
Determines how many simultaneous connections can be there in mysql. Since each connection eats up some memory/system resources, following calculation should be used to decide this.

total memory = (read_buffer + sort_buffer)*max_connection + key_buffer.

If, your total memory exceeds the physical ram limit, then mysql may use the virtual memory/swap and slow down or may even crash. These parameters should be balanced out depending on the requirements. There is no fixed good/bad value for this. It all depends on how and where mysql is being used. Also remembet that both read and sort buffers should be reasonable. Too high or too low value for either of them is not good.


5. table_cache
Defines the no of tables which are opened and cached in memory. The best way to set this is to run a query

Show status like '%opened_tables%';

If this parameter is increasing at a good rate, then mysql is using up resouces in opening and closing tables. so increase the table cache.

6. thread_cache_size
Defines the no of threads cached in memory. To set this run

Show status like '%threads_created%';

Again, if this parameter is increasing rapidly, just increase the thread cache.

Adjusting the last 2 parameters is very important, since if not done, mysql may use up system resources for opening/closing tables and creating/destroying threads. In this case you will see that system usage of CPU is high in top command.

And pls note, i m a linux junkie, so all commands over here are for linux only. I have no clue what their counter part commands will be in windoze... Maybe i will rant about mysql later is my some other blog.

Thats all for today. Lots of technical stuff. Next blog would be much better - and it would be non-technical.

1 comment:

Anonymous said...

Thanx again! Impressed...