Tuesday, June 03, 2008

MySQL versus PostgreSQL

I created and ran some simple tests on mysql and postgresql to figure out which one is faster. It is already known that postgresql is more stable and reliable than mysql. pgsql has a rich set of features. It is a complete RDBMS and also supports fulltext search.

All benchmarks were done on my laptop - Intel core 2 duo (2.0 GHz) with 4MB L2 cache & 2 GB ram. I have 64 Bit ubuntu system loaded with MySQL 5.1.24-rc (64 bit binary) and PostgreSQL 8.3.1 (compiled from source).

I used python as a scripting language for writing down my benchmark scripts. I used psycopg2 as a connector from python to postgres and mysql-python as a connector from python to mysql.

The benchmarking was done in phases. Firstly simple Insert, update and select queries were run to check the raw speed of these queries. Then threads were created to run simultaneous insert, update, select and delete queries. I checked the benchmark times for different number of concurrent threads.

I created a simple table on both mysql and pgsql. I used the MyISAM database engine to create table in mysql. :

ABC(id int not null auto_increment primary key, value varchar(250));

Queries that were run are:

Insert(I) : Insert ignore into ABC (id, value) ...(For pgsql, a rule has to be created to ignore duplicate inserts)
Update(U) : Update ABC set value=<something> where id=<random_id>
Select(S) : Select * from ABC where id=<random_id>
Delete(D) : Delete from ABC where id=<random_id>

  • Insert - 100000 rows in 1 thread
    Time taken for Mysql : 20.8 seconds
    Time taken for Pgsql : 58.1 seconds
    So, raw insert speed of mysql is much better as compared to pgsql

  • 100000 selects in 1 thread
    Time taken for Mysql : 21.76 seconds
    Time taken for Pgsql : 20.15 seconds
    Raw selects are better in pgsql as compared to mysql

  • Selects - 2 threads of 100000 selects
    Time taken for Mysql : 40.46 seconds
    Time taken for Pgsql : 27.38 seconds
    So, if i increase the concurrency of selects, pgsql perfors much than mysql

  • Update - 2 threads of 50000
    Time taken for Mysql : 23.97 seconds
    Time taken for Pgsql : 34.03 seconds
    Mysql looks better in handling updates here.

  • 4 Threads
    Run 1 : [100000 Selects, 50000 Inserts, 50000 Updates, 20000 Deletes]
    Time taken for Mysql : 45.25 seconds
    Time taken for Pgsql : 54.58 seconds
    Run 2 : [100000 Selects, 100000 Inserts, 50000 Updates, 10000 Deletes]
    Time taken for Mysql : 59.05 seconds
    Time taken for Pgsql : 69.38 seconds
    Run 3 : [100000 Selects, 20000 Inserts, 20000 Updates, 1000 Deletes]
    Time taken for Mysql : 35.54 seconds
    Time taken for Pgsql : 31.23 seconds
    These runs show that Mysql is good when you have very large no of inserts/updates/deletes as compared to selects. But pgsql's performance surpasses that of mysql when the number of selects are much higher.

  • Finally, lets approach the real life scenario where generally the number of selects are much more than the number of inserts and there are multiple threads performing selects and inserts.
    I will use the following notification here - <no_of_threads> X <no_of_operations(select/insert/update/delete)_per_thread>
    So, for example 3 X 20 Selects = 3 threads of 20 Selects in each thread

    Run 1 : [2 X 30000 selects, 3 X 20000 selects, 1 X 20000 inserts, 2 X 10000 inserts, 2 X 100000 updates, 2 X 1000 deletes] Total - 12 threads
    Time taken for Mysql : 42.28 seconds
    Time taken for Pgsql : 44.28 seconds
    Both Mysql and Pgsql are almost at par.

    Run 2 : [2 X 50000 selects, 2 X 40000 selects, 1 X 30000 selects, 1 X 20000 inserts, 2 X 15000 inserts, 2 X 15000 updates, 2 X 2000 deletes] Total - 12 threads but number of selects are quite high
    Time taken for Mysql : 61.02 seconds
    Time taken for Pgsql : 48.60 seconds
    So, as we increase the number of operations (specially selects) mysql's performance degrades, whereas pgsql's performance remains almost the same

    Run 3 : [4 X 50000 selects, 4 X 40000 selects, 2 X 30000 selects, 2 X 20000 inserts, 3 X 15000 inserts, 3 X 15000 updates, 2 X 3000 deletes] Total - 20 threads (10 threads for select, 5 for insert, 3 for update and 2 for delete) Which is the normal trend in database servers.
    Time taken for Mysql : 169.31 seconds
    Time taken for Pgsql : 128.7 seconds
    Bingo, so as concurrency increases pgsql becomes faster than mysql.

My earlier benchmarks with pgsql 7.x was not as good as this one. With postgresql 8.3.1, the speed of serving concurrent requests has increased a lot. So, in a high concurrency environment, i would generally recommend to go ahead with using postgresql rather than mysql.

Please check the comments section. We have some really interesting comments there...


shlomoid said...

Very interesting!
Could you repeat these tests using the InnoDB engine? MyISAM is hardly comparable to Postgresql, which is a fully ACID compliant database. InnoDB is the important benchmark, IMHO...

Anonymous said...

this is not comparing apples to apples. myisam uses table locking where as i would assume pg would use row level locking.

Unknown said...

No one would use myisam in a high concurrency environment. If that's your workload, you'd use Innodb.

Anonymous said...

This isn't really a fair comparison unless you use the innodb engine for MySQL.

Anonymous said...

Interesting comparison.

Would really like to compare speed against SQL Server 2005 or 2008.
I suspect MySQL and Postgresql would be much faster in most tests.

Mark Callaghan said...

I love statements like It is already known that postgresql is more stable and reliable than mysql. That is guaranteed to stir things up.

Do you have numbers with replication enabled (add log_bin to /etc/my.cnf for MySQL)? I am curious about the overhead of trigger based solutions for PostgreSQL.

Thanks for doing the work to get numbers.

ryan said...

Unfortunately, your tests are badly flawed... you have just shown the well documented behaviour of the non-transactional MyISAM engine at different levels of concurrency.

To compare apples with apples, you should compare the transactional Postgres engine against the various MySQL transactional engines. InnoDB, Falcon, PBXT, etc.

There are many other areas that you should example for a true comparison. The default MySQL memory settings are likely not optimal for your box, are you showing bad defaults? Also you may be showing performance implications within python or different performance for the python drivers. You should also make sure to disable MySQL's binlog so you aren't measuring any performance penalty from writing each statement to disk twice only on the MySQL test.

Benchmarks can reflect as much about the engines as the people who run the benchmarks. This is why I'm not brave enough to publish my own benchmarks. :-) If I did, I'm sure a Postgres expert would likely come back with a similar list of complaints about my Postgres configuration.

gamegeek said...

Yes, it is not a true comparison between myisam engine and postgresql. Because myisam engine uses table level locking and postgresql uses row level locking.

I had earlier run some tests comparing myisam and innodb. And i found innodb to be much slower than myisam. But looking to the outburst of comments here, i will run the same benchmarks on my box comparing innodb and postgres and publish the results.

@Mark Callaghan => i had searched lots of feature and stability comparison between mysql and pgsql on the net. And have found people who have shifted from mysql to pgsql and found pgsql to be much stable and reliable. Yes, i had binary logging enabled for mysql. If you look at the results, you would find that mysql's inserts/updates/deletes are still better than postgresql's(which does not have any replication set up yet). So if i set up replication on both database engines, i am sure that mysql would perform better.

@ryan => I had configured mysql for better performance on my machine. The table size for ABC is around 10 MB and index size is 2.6 MB. I had set up key_buffer = 96 MB and read_buffer & sort_buffer = 1 MB. So the index can be easily loaded in the key_buffer. And selects should be much faster. Yes, i agree, i should have disabled mysql's bin_log to bring down the i/o for mysql. But postgres also does some logging(like innodb - because it is a transactional database). So, i think it is a fair comparison. Though, i am fairly new to postgres and am not sure about the pgsql configuration. So, it might be the case that pgsql is under-performing.

Mark Callaghan said...

My question is whether you had enabled replication, or at least the logging required for replication, for PostgreSQL. The popular options are trigger based, and I am curious about the overhead of such a solution on OLTP performance. But I am not curious enough to run the tests myself.

Again, thanks for providing numbers.

gamegeek said...

@ mark Callaghan => As far as i know, postgresql does not do any logging for replication similar to what mysql does. Replication solutions for postgres are provided by programs external to postgresql which either use triggers for replication or there are middle wares which sit between the client and the server and run inserts on all machines and distribute selects on all machines.

To answer your question - NO, i did not do anything for replication on postgres.

Mark Robson said...

Perhaps, as well as using InnoDB, you could repeat the tests:

1. On production-grade hardware - use a server with a proper IO layer such as a battery backed raid controller. A laptop HD is not representative of a real-world test

2. Run the client and server on separate machines

Running the client and server on the same machine means that the client can slow the server down. It's also not very representative of how real applications work.

For your INSERT test, on MySQL, you can insert several rows with the extended insert syntax. Did you use this? Does PG support something similar? What batch size did you use for your transactions (which won't affect myisam) ? PG will surely perform better if you don't send each INSERT in its own transaction.

Finally, 100k rows is a bit small, especially given the tiny size of the rows and the small number of columns / indexes. You should definitely try some larger tables with more columns, rows and indexes.

Other than that, good stuff. Nice to see that someone's doing something.


Anonymous said...

I would love to see separation of client server, and not localhost benchmarks

Luís Soares said...

I did some small comparison between MySQL and PostgreSQL also. I used a workload based on the TPC-b benchmark, meaning that it was mostly a stress test. Nevertheless, I have used InnoDb as a storage engine and ran the tests with clients and servers on different machines.

If you are interested in these results, you can find them here. Have a look, and if you have any comment whatsoever, for instance, regarding MySQL or PostgreSQL tunning, please let me know. Maybe I can further improve the test, with some of your comments/sugestions.