Tuesday, June 29, 2010

mysql hack - altering huge tables

You have a huge mysql table - maybe 100 GB - and you need to run an alter on it: to add an index, drop an index, add a column or drop a column. If you run the simple mysql "alter table" command, you may end up waiting ages before the table is back in production.

Here is a simple hack to get the thing done. The benefit of the hack is that the alter runs quite fast. But since this is a hack, you will need to take care of the backups - in case anything goes wrong. I have seen this hack work effectively with both MyISAM and InnoDB tables.

Here I have created a simple table to demonstrate the hack. You can assume that this table has billions of rows and is more than 100GB in size.

CREATE TABLE `testhack` (
`id` int(11) NOT NULL DEFAULT '0',
`unq` varchar(100) DEFAULT NULL,
`keyword` varchar(250) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unq` (`unq`)
) ENGINE=MyISAM

I need to drop the unique key. So, I create a new table testhack_new with the following schema:

CREATE TABLE `testhack_new` (
`id` int(11) NOT NULL DEFAULT '0',
`unq` varchar(100) DEFAULT NULL,
`keyword` varchar(250) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM

Flush both tables with read lock

mysql> Flush tables with read lock;

Open another terminal. And go to the mysql/data/<database> directory. Do the following:

mysql/data/test $ mv testhack.frm testhack_old.frm; mv testhack_new.frm testhack.frm; mv testhack_old.frm testhack_new.frm; mv testhack.MYI testhack_old.MYI; mv testhack_new.MYI testhack.MYI; mv testhack_old.MYI testhack_new.MYI;

What is happening here is that the table definition (.frm) and index (.MYI) files of the two tables are being swapped. After this, the table definition of testhack no longer contains the unique key. Now unlock the tables in the main window and run a repair to fix up any issues.

mysql> unlock tables;
mysql> repair tables testhack;
+---------------+--------+----------+-------------------------------------------+
| Table         | Op     | Msg_type | Msg_text                                  |
+---------------+--------+----------+-------------------------------------------+
| test.testhack | repair | warning  | Number of rows changed from 0 to 20000000 | 
| test.testhack | repair | status   | OK                                        | 
+---------------+--------+----------+-------------------------------------------+

The repair rebuilds the indexes. It is faster because it skips the key_cache while rebuilding the index (which a normal alter table would use).
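
Once the repair finishes, it is worth sanity checking the swapped-in definition before putting the table back into production. This check is not part of the hack itself, just a precaution:

mysql> show create table testhack;
mysql> check table testhack;

The first statement should show the definition without the unique key, and the check should report the table as OK.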

Tuesday, June 22, 2010

How Solar Power Works

What is solar energy ?

Solar energy is radiant energy which is produced by the sun. Each day the sun radiates, or sends out, an immense amount of energy. The sun radiates more energy in a second than people have used since the beginning of time!

The energy of the sun originates from within the sun itself. Like other stars, the sun is essentially a big ball of gases, mostly hydrogen and helium atoms.

The hydrogen atoms in the sun’s core combine to form helium and generate energy in a process called nuclear fusion.

During nuclear fusion, the sun’s extremely high pressure and temperature cause hydrogen atoms to come apart and their nuclei (the central cores of the atoms) to fuse, or combine. Four hydrogen nuclei fuse to become one helium atom. But the helium atom contains less mass than the four hydrogen atoms that fused. Some matter is lost during nuclear fusion, and that lost matter is emitted into space as radiant energy.

It takes many years for the energy in the sun’s core to make its way to the solar surface, and then just a little over eight minutes to travel the 93 million miles to earth. The solar energy travels to the earth at a speed of 186,000 miles per second, the velocity of light.

Only a small percentage of the power radiated from the sun into space strikes our planet, about one part in two billion. Yet this quantity of energy is enormous. Every day enough energy strikes the United States to supply the nation’s energy needs for one and a half years!

Where does all this energy go?

About 15 percent of the sun’s energy that hits our planet is reflected back to space. Another 30 percent is used to evaporate water, which, lifted into the atmosphere, produces rainfall. Solar energy is also absorbed by plants, the land, and the oceans. The rest could be employed to supply our energy needs.

Who invented solar energy ?

Humans have harnessed solar power for centuries. As early as the 7th century B.C., people used simple magnifying glasses to concentrate the light of the sun into beams so hot they would cause wood to catch fire. More than a century ago in France, a scientist used heat from a solar collector to make steam to drive a steam engine. At the beginning of the twentieth century, scientists and engineers began researching ways to use solar power in earnest. One important development was a remarkably efficient solar boiler introduced by Charles Greeley Abbott, a United States astrophysicist, in 1936.

The solar hot water heater gained popularity at this time in Florida, California, and the Southwest. The industry started in the early 1920s and was in full swing just before World War II. This growth lasted until the mid-1950s, when low-cost propane took over as the primary fuel for heating American homes.

The public and world governments remained largely indifferent to the possibilities of solar power until the oil shortages of the 1970s. Today, people use solar technology to heat buildings and water and also to generate electricity.

How we use solar power today ?

Solar energy can be used in several different ways, of course. There are two basic types of solar energy:

* Solar thermal energy collects the sun's warmth through one of two means: in water or in an anti-freeze (glycol) mixture.
* Solar photovoltaic energy converts the sun's radiation to usable electricity.

Listed below are the five most practical and popular ways that solar power can be used:

1. Small portable solar photovoltaic systems. We see these used everywhere, from calculators to solar garden tools. Portable units can power everything from RV appliances, while single-panel systems can be used for traffic signs and remote monitoring stations.

2. Solar pool heating. Running water through a solar collector in a direct-circulation system is a very practical way to heat water for your pool or hot spa.

3. Thermal glycol energy to heat water. In this method (indirect circulation), glycol is heated by sunshine and the heat is then transferred to water in a warm water tank. This process of collecting the sun's energy is more practical now than ever. In areas as far north as Edmonton, Alberta, solar thermal to heat water is economically sound. It can pay for itself in 3 years or less.

4. Integrating solar photovoltaic energy into your home or business power. In many parts of the world, solar photovoltaics is an economically feasible way to supplement the power of your home. In Japan, photovoltaics are competitive with other forms of power. In the United States, new incentive programs make this form of solar power ever more viable in many states. An increasingly popular and practical way of integrating solar energy into the power of your home or business is through the use of building-integrated solar photovoltaics.

5. Large independent photovoltaic systems. When you have enough sun power at your site, you may be able to go off grid. You may also integrate or hybridize your solar power system with wind power or other forms of renewable energy to stay 'off the grid.'

How do Photovoltaic panels work ?

Silicon is mounted beneath non-reflective glass to create photovoltaic panels. These panels collect photons from the sun, converting them into DC electrical energy. The power created then flows into an inverter, which converts it into standard-voltage AC electric power.

PV cells are made of special materials called semiconductors, such as silicon, which is presently the most commonly used. When light hits the photovoltaic cell, a certain share of it is absorbed inside the semiconductor material. This means that the energy of the absorbed light is transferred to the semiconductor.

This energy knocks electrons loose, permitting them to flow freely. Solar cells also have one or more electric fields that force the electrons freed by light absorption to flow in a specific direction. This flow of electrons is a current, and by placing metal contacts on the top and bottom of the photovoltaic cell, the current can be drawn off for external use.

Do you know the pros and cons of solar energy ?

Solar Pro Arguments

- Heating our homes with oil or gas, or using electricity from power plants running on coal and oil, contributes to climate change and climate disruption. Solar energy, on the other hand, is clean and environmentally friendly.

- Solar hot-water heaters require little maintenance, and their initial investment could be recovered within a relatively short time.

- Solar hot-water heaters can work in almost any climate, even in very cold ones. You just have to choose the right system for your climate: drainback, thermosyphon, batch-ICS, etc.

- Maintenance costs of solar powered systems are minimal and the warranties long.

- Financial incentives (USA, Canada, European states…) can help offset the cost of the initial investment in solar technologies. The U.S. government, for example, offers tax credits for solar systems certified by the SRCC (Solar Rating and Certification Corporation), which amount to 30 percent of the investment (2009-2016 period).

Solar Cons Arguments

- The initial investment in solar water heaters or in photovoltaic electric systems is greater than that required by conventional electric and gas heating systems.

- The payback period of solar PV-electric systems is long, as are those of solar space heating and solar cooling (only the payback of solar domestic hot water heating is short or relatively short).

- Solar water heating does not support a direct combination with radiators (including baseboard ones).

- Some applications (solar space heating and solar cooling systems) are expensive and rather untested technologies: solar air conditioning is not, so far, a really economical option.

- The efficiency of solar powered systems depends heavily on the available sunlight. In colder climates, where heating and electricity needs are higher, the efficiency is lower.

About the writer - Barbara Young writes on RV solar power kits on her personal hobby blog, 12voltsolarpanels.net. Her work is dedicated to helping people save energy by using solar energy to reduce CO2 emissions and energy dependency.

Wednesday, June 16, 2010

Writing your first map-reduce program on hadoop

Before we go ahead with the actual program, let's have a look at what map-reduce is and how it is used.

Map-Reduce is a programming model for implementing and processing large scale data sets. The MapReduce model consists of two functions, map() and reduce(). The map() function is applied to all items in the input data set, converting them to a set of key-value pairs. Multiple map jobs can be run in parallel over different processing nodes to process large data sets.

Map (K1, V1) = List (K2, V2).

The list of processed pairs (K2,V2) is collected by the MapReduce framework. All pairs with the same key are grouped together, creating one group for each one of the generated keys. The reduce function is then applied to each group in parallel to produce a collection of "similar" values.

Reduce (K2, List(V2)) = List (V3)

Since both map and reduce jobs can be run in parallel, they could be distributed over a large number of processing nodes to process large data sets.

Let's write a simple program that counts the total number of characters across a set of strings using the MapReduce concept, but without using any framework. Our program creates some strings and uses the MapReduce concept to distribute the task over a number of threads, eventually outputting the total number of characters.

// MyMapReduce.java
import java.util.*;

public class MyMapReduce
{
  List<List<String>> buckets = new ArrayList<List<String>>();
  List<String> intermediateresults = new ArrayList<String>();
  List<String> values = new ArrayList<String>();

  public void init()
  {
    for(int i = 1; i<=30; i++)
    {
      values.add("zyx" + new Integer(i).toString());
    }
    System.out.println("**Running Conversion into Buckets**\n");
    //convert the input data into smaller chunks. Here we divide the 30 strings into 5 buckets of 6 strings each.
    List b = step1ConvertIntoBuckets(values,5);
    System.out.println("*DONE*\n");
    System.out.println("**Running #Map Function# concurrently for all Buckets\n");
    List res = step2RunMapFunctionForAllBuckets(b);
    System.out.println("*MAP Done*\n");
    System.out.println("**Running #Reduce Function# for collating Intermediate Results and Printing Results\n");
    step3RunReduceFunctionForAllBuckets(res);
    System.out.println("*REDUCE Done*\n");
  }
  public List step1ConvertIntoBuckets(List list,int numberofbuckets)
  {
    int n = list.size();
    int m = n / numberofbuckets;
    int rem = n% numberofbuckets;

    int count = 0;
    System.out.println("BUCKETS");
    for(int j =1; j<= numberofbuckets; j++)
    {
      List<String> temp = new ArrayList<String>();
      for(int i=1; i<= m; i++)
      {
        temp.add((String)values.get(count));
        count++;
      }
      buckets.add(temp);
      temp = new ArrayList<String>();
    }
    if(rem != 0)
    {
      List<String> temp = new ArrayList<String>();
      for(int i =1; i<=rem;i++)
      {
        temp.add((String)values.get(count));
        count++;
      }
      buckets.add(temp);
    }
    System.out.println(buckets);
    return buckets;
  }

  public List step2RunMapFunctionForAllBuckets(List list)
  {
    for(int i=0; i< list.size(); i++)
    {
      List<String> elementList = (ArrayList)list.get(i);
      new StartThread(elementList).start();
    }
    try
    {
      // crude synchronization: give the map threads time to finish their work
      Thread.sleep(1000);
    }catch(Exception e)
    {    }
    return intermediateresults;
  }

  public void step3RunReduceFunctionForAllBuckets(List list)
  {
    int sum =0;
    for(int i=0; i< list.size(); i++)
    {
      //you can do some processing here, like finding max of all results etc
      int t = Integer.parseInt((String)list.get(i));
      sum += t;
    }
    System.out.println("\nTotal Count is "+ sum+"\n");
  }

  class StartThread extends Thread
  {
    private List<String> tempList = new ArrayList<String>();
    public StartThread(List<String> list)
    {
      tempList = list;
    }
    public void run()
    {
      System.out.println("In Map...");
      for (String str : tempList)
      {
        // synchronize on the shared list (not on the thread object) so that
        // concurrent map threads do not corrupt intermediateresults
        synchronized(intermediateresults)
        {
          intermediateresults.add(Integer.toString(str.length()));
        }
      }
    }
  }
  public static void main(String[] args)
  {
    MyMapReduce my = new MyMapReduce();
    my.init();
  }
}

In this program, we have an ArrayList of 30 strings, which is divided into 5 batches of 6 strings. Each batch is processed in parallel in its own thread, and the processed information is collected in an intermediate ArrayList. The reduce function is then called; it loops through the intermediate ArrayList and sums up the counts to output the total. Here only the map function is distributed over multiple threads; the reduce simply loops through the output of the map and sums up the counts.

Compile and run the program. The output is:

$ java MyMapReduce 
**Running Conversion into Buckets**

BUCKETS
[[zyx1, zyx2, zyx3, zyx4, zyx5, zyx6], [zyx7, zyx8, zyx9, zyx10, zyx11, zyx12], [zyx13, zyx14, zyx15, zyx16, zyx17, zyx18], [zyx19, zyx20, zyx21, zyx22, zyx23, zyx24], [zyx25, zyx26, zyx27, zyx28, zyx29, zyx30]]
*DONE*

**Running #Map Function# concurrently for all Buckets

In Map...
In Map...
In Map...
In Map...
In Map...
*MAP Done*

**Running #Reduce Function# for collating Intermediate Results and Printing Results


Total Count is 141

*REDUCE Done*

A mapreduce framework like hadoop provides the distributed setup to run such mapreduce programs over a large number of processing nodes. It automatically handles the spawning of processing tasks and stores the intermediate information in temporary files on its nodes.

Pre-requisite: set up hadoop on multiple nodes. Refer to http://jayant7k.blogspot.com/2008/10/setting-up-hadoop.html

Here is a similar example, a word count, written using the hadoop 0.20.1 map-reduce framework.

import java.io.*;
import java.util.*;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount extends Configured implements Tool {

  public static class MapClass extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      String line = value.toString();
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  /**
   * A reducer class that just emits the sum of the input values.
   */
  public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) {
        sum += value.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  static int printUsage() {
    System.out.println("wordcount [-r <reduces>] <input> <output>");
    ToolRunner.printGenericCommandUsage(System.out);
    return -1;
  }

  public int run(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "WordCount example for hadoop 0.20.1");

    job.setJarByClass(WordCount.class);
    job.setMapperClass(MapClass.class);
    job.setCombinerClass(Reduce.class);
    job.setReducerClass(Reduce.class);

    // the keys are words (strings)
    job.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    job.setOutputValueClass(IntWritable.class);


    List<String> other_args = new ArrayList<String>();
    for(int i=0; i < args.length; ++i) {
      try {
        // The number of map tasks was earlier configurable,
        // but with hadoop 0.20.1 it is decided by the framework,
        // since it depends heavily on the input data size and how it is split.
        if ("-r".equals(args[i])) {
          job.setNumReduceTasks(Integer.parseInt(args[++i]));
        } else {
          other_args.add(args[i]);
        }
      } catch (NumberFormatException except) {
        System.out.println("ERROR: Integer expected instead of " + args[i]);
        return printUsage();
      } catch (ArrayIndexOutOfBoundsException except) {
        System.out.println("ERROR: Required parameter missing from " +
            args[i-1]);
        return printUsage();
      }
    }
    // Make sure there are exactly 2 parameters left.
    if (other_args.size() != 2) {
      System.out.println("ERROR: Wrong number of parameters: " + other_args.size() + " instead of 2.");
      return printUsage();
    }
    FileInputFormat.addInputPath(job, new Path(other_args.get(0)));
    FileOutputFormat.setOutputPath(job, new Path(other_args.get(1)));

    //submit job and wait for completion. Also show output to user.
    job.waitForCompletion(true);
    return 0;
  }
  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new WordCount(), args);
    System.exit(res);
  }
}
Compile the file, create a jar, and run it:
$ javac -classpath ../hadoop-0.20.1/hadoop-0.20.1-core.jar -d wordcount_classes/ WordCount.java
$ jar -cvf wordcount.jar -C wordcount_classes/ .
$ cd ../hadoop-0.20.1
$ ./bin/hadoop jar ../progs/wordcount.jar WordCount -r 2 /user/hadoop/input /user/hadoop/output

10/06/16 17:00:47 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/06/16 17:00:48 INFO input.FileInputFormat: Total input paths to process : 2
10/06/16 17:00:48 INFO mapred.JobClient: Running job: job_201006141203_0008
10/06/16 17:00:49 INFO mapred.JobClient:  map 0% reduce 0%
10/06/16 17:00:58 INFO mapred.JobClient:  map 100% reduce 0%
10/06/16 17:01:10 INFO mapred.JobClient:  map 100% reduce 100%
10/06/16 17:01:12 INFO mapred.JobClient: Job complete: job_201006141203_0008
10/06/16 17:01:12 INFO mapred.JobClient: Counters: 17
10/06/16 17:01:12 INFO mapred.JobClient:   Job Counters 
10/06/16 17:01:12 INFO mapred.JobClient:     Launched reduce tasks=2
10/06/16 17:01:12 INFO mapred.JobClient:     Launched map tasks=2
10/06/16 17:01:12 INFO mapred.JobClient:     Data-local map tasks=2
10/06/16 17:01:12 INFO mapred.JobClient:   FileSystemCounters
10/06/16 17:01:12 INFO mapred.JobClient:     FILE_BYTES_READ=2009
10/06/16 17:01:12 INFO mapred.JobClient:     HDFS_BYTES_READ=1467
10/06/16 17:01:12 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=4142
10/06/16 17:01:12 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1356
10/06/16 17:01:12 INFO mapred.JobClient:   Map-Reduce Framework
10/06/16 17:01:12 INFO mapred.JobClient:     Reduce input groups=0
10/06/16 17:01:12 INFO mapred.JobClient:     Combine output records=142
10/06/16 17:01:12 INFO mapred.JobClient:     Map input records=33
10/06/16 17:01:12 INFO mapred.JobClient:     Reduce shuffle bytes=2021
10/06/16 17:01:12 INFO mapred.JobClient:     Reduce output records=0
10/06/16 17:01:12 INFO mapred.JobClient:     Spilled Records=284
10/06/16 17:01:12 INFO mapred.JobClient:     Map output bytes=2200
10/06/16 17:01:12 INFO mapred.JobClient:     Combine input records=190
10/06/16 17:01:12 INFO mapred.JobClient:     Map output records=190
10/06/16 17:01:12 INFO mapred.JobClient:     Reduce input records=142
Here is an explanation of the program

Configuration holds the job configuration, which is replicated to all nodes in HDFS. All Hadoop nodes run their own JVMs. The JobTracker runs on the master and accepts jobs from clients. It divides the job into parts and sends the parts over to the nodes via their TaskTrackers. A TaskTracker communicates with the JobTracker and keeps asking for more work as and when it finishes its tasks. A TaskTracker can run multiple task instances in parallel. You set the mapper class and the reducer class on the job.

job.setMapperClass();
job.setReducerClass();

You can also specify the input and output paths in the job

FileInputFormat.addInputPath(job, Path);
FileOutputFormat.setOutputPath(job, Path);

You can also set a number of other options in the Job.

job.setNumReduceTasks(); //set the number of reduce tasks
job.setOutputFormatClass(); //set the output format class

The job is submitted by calling either of:

job.submit(); //send the job to the cluster and return immediately - non blocking
job.waitForCompletion(true); // send the job to the cluster and wait for its completion

The framework determines the proper division of the input into parts (InputSplits) and sends the job data to the master JobTracker server. The TaskTracker asks the JobTracker for its InputSplit data and the task to run. The mapper is called for each record read from the InputSplit. A mapper should extend the mapreduce.Mapper class, which provides default implementations for the required mapper methods. All mappers run separately on each node, inside the TaskTracker, in separate JVM instances. There is no sharing of data: if you set a static variable in one mapper, you will not get its value in another mapper on another TaskTracker.

map(Object key, Text value, Context context)

To allow serialization and transfer of all types of data, Hadoop defines its own Writable classes. Box classes like Text (for String), IntWritable (for integers) and LongWritable (for long) implement the Writable interface (used for values); key types additionally implement WritableComparable. The Context is used to collect and write the output into intermediate as well as final files.

context.write() takes (Key,Value)
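
If the built-in box classes are not enough, you can also define your own Writable type. This is not from the post, just a minimal sketch of what the Writable contract looks like; the PairWritable name and its fields are made up for illustration:

import java.io.*;
import org.apache.hadoop.io.Writable;

// A hypothetical value type holding two ints, usable wherever a Writable value is expected.
public class PairWritable implements Writable {
  private int first;
  private int second;

  public PairWritable() { }  // no-arg constructor required by the framework
  public PairWritable(int first, int second) { this.first = first; this.second = second; }

  public void write(DataOutput out) throws IOException {    // serialize the fields
    out.writeInt(first);
    out.writeInt(second);
  }
  public void readFields(DataInput in) throws IOException { // deserialize them in the same order
    first = in.readInt();
    second = in.readInt();
  }
}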

Partitioner is used to get the partition number for a given key. By default, HashPartitioner is used, which uses key.hashCode() to return the partition number. You can use the job to specify a custom partitioner.
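
As an illustration (my own sketch, not from the post; the class name is made up), a custom partitioner for the word count job above could route words by their first letter:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    // route each word by its first character so that words starting with the same
    // letter end up in the same reduce task (and hence the same output file)
    String word = key.toString();
    int firstChar = word.isEmpty() ? 0 : Character.toLowerCase(word.charAt(0));
    return firstChar % numPartitions;
  }
}

It would be plugged into the job with job.setPartitionerClass(FirstLetterPartitioner.class).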

reduce(Text key, Iterable<IntWritable> values, Context context)

All keys and values sent to one partition go to the same reduce task. Reduce calls are processed in sorted key order. The reducer sends its data to the RecordWriter, which writes the data to an output file.

Hadoop provides lots of flexibility to override the components and customize the input and output data.

HadoopStreaming can be used to interface the hadoop map-reduce framework with arbitrary program code. It uses stdin and stdout for data flow. You can define a separate program for each of the mapper and reducer in any language of your choice.
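
For example (a sketch of my own, not from the post; the exact location of the streaming jar depends on your Hadoop distribution), a streaming job that uses standard Unix tools as mapper and reducer could be launched like this:

$ ./bin/hadoop jar contrib/streaming/hadoop-0.20.1-streaming.jar -input /user/hadoop/input -output /user/hadoop/output-streaming -mapper /bin/cat -reducer /usr/bin/wc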

Monday, June 07, 2010

Innodb now supports native AIO on Linux

With the exception of Windows, InnoDB has used 'simulated AIO' on all other platforms to perform certain IO operations. The IO requests that have been performed in a 'simulated AIO' way are the write requests and the readahead requests for the datafile pages. Let us first look at what 'simulated AIO' means in this context.

We call it 'simulated AIO' because it appears asynchronous from the context of a query thread, but from the OS perspective the IO calls are still synchronous. The query thread simply queues the request in an array and then returns to its normal work. One of the IO helper threads, which are background threads, then takes the request from the queue and issues a synchronous IO call (pread/pwrite), meaning it blocks on the IO call. Once it returns from the pread/pwrite call, the helper thread calls the IO completion routine on the block in question which, in the case of a read, includes merging buffered operations, if any. In the case of a write, the block is marked as 'clean' and is removed from the flush_list. Some other bookkeeping also happens in the IO completion routine.

What we have changed in the InnoDB Plugin 1.1 is to use the native AIO interface on Linux. Note that this feature requires that your system has libaio installed. libaio is a thin wrapper around the kernelized AIO on Linux; it is different from POSIX AIO, which requires user-level threads to service AIO requests. There is a new boolean switch, innodb_use_native_aio, to choose between simulated and native AIO, the default being to use native AIO.
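
For example (my own sketch, assuming MySQL 5.5 with the InnoDB Plugin 1.1), falling back to simulated AIO is just a matter of setting the switch in my.cnf and restarting:

[mysqld]
innodb_use_native_aio = 0

You can check the value currently in effect with:

mysql> show variables like 'innodb_use_native_aio';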

How does this change the design of the InnoDB IO subsystem? Now the query thread, instead of enqueueing the IO request, actually dispatches the request to the kernel and returns to its normal work. The IO helper thread, instead of picking up enqueued requests, waits on the IO wait events for completed IO requests. As soon as it is notified by the kernel that a certain request has been completed, it calls the IO completion routine on that request and then goes back to waiting on the IO wait events. In this new design the IO requesting thread becomes a kind of dispatcher while the background IO thread takes on the role of a collector.

What will this buy us? The answer is simple: scalability. For example, consider a system which is heavily IO bound. In InnoDB one IO helper thread works on a maximum of 256 IO requests at one time. Assume that the heavy workload results in the queue being filled up. In simulated AIO the IO helper thread will go through these requests one by one, making a synchronous call for each request. This means serialisation, forcing the request that is serviced last to wait for the other 255 requests before it gets a chance. What this implies is that with simulated AIO there can be at most 'n' IO requests in parallel inside the kernel, where 'n' is the total number of IO helper threads (this is not entirely true because query threads are also allowed to issue synchronous requests, but I'll gloss over that detail for now). In case of native AIO all 256 requests are dispatched to the kernel, and if the underlying OS can service more requests in parallel then we'll take advantage of that.

The work of coalescing contiguous requests is now offloaded to the kernel/IO scheduler. What this means is that the IO scheduler you are using, and the properties of your RAID/disk controller, may now have more effect on the overall IO performance. This is also because many more IO requests will now be inside the kernel than before. Though we have not run tests to specifically certify any particular IO scheduler, the conventional wisdom has been that for database engine workloads perhaps the no-op or deadline scheduler would give optimal performance. I have heard that lately a lot of improvements have gone into cfq as well. It is for you to try, and as always YMMV. We look forward to hearing your story.

NOTE: InnoDB has always used native AIO on Windows and it continues to do so in Plugin 1.1. innodb_use_native_aio will have no effect on Windows.

Friday, June 04, 2010

Postgresql tuning - quick tips

The configuration file for postgresql is located in <postgresql_install_dir>/data/postgresql.conf

You can alter the following settings in the config for better performance of postgresql.

# listen to all ip addresses
listen_addresses = '*'

# should not exceed 1000. Requires about 400 bytes per connection
max_connections = 500

# used for managing connection, active operations
# should be 1/4 of the available memory

shared_buffers = 1024MB

# dedicated memory for each operation, used mainly for sorting
# should be available_memory/max_connections for normal operations, max available_memory/(2*max_connections)
# another way of arriving at this number is the EXPLAIN ANALYZE output: if the plan shows "Sort Method: external merge Disk: 7532kB", then a work_mem of 8MB can do wonders.

work_mem = 1MB
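
# A hypothetical example (table and query made up) of spotting a too-small work_mem:
#   EXPLAIN ANALYZE SELECT * FROM orders ORDER BY customer_id;
# if the plan reports something like "Sort Method: external merge Disk: 7532kB",
# the sort is spilling to disk; raise work_mem above the reported size and re-check.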

# same as work_mem but for vacuum, alter and other DDL queries.
# should be around 256MB

maintenance_work_mem = 256MB

# size of the write-ahead log buffers
# default 8 KB. 1 MB is enough for large systems.
# SMP machines are better with 8 MB

wal_buffers = 1MB

# After every transaction, pgsql forces its write-ahead log out to disk.
# defaults to fsync.
# generally switched to open_sync, but it is buggy on many platforms.
# should be benchmarked under a very heavy query load before switching.

wal_sync_method = fsync

# estimate of how much memory is available for disk caching by the OS and within the DB itself
# recommended to be 1/2 of available memory. On Unix it can be set to the free+cached value reported by "free".

effective_cache_size = 512MB

# by default 3 segments * 16MB each = 48MB, which can be resource intensive on modern systems.
# setting it to 32 (a checkpoint every 512MB) can be effective in reducing disk IO

checkpoint_segments = 32

# checkpoint occurs every 5 minutes. can be increased
checkpoint_timeout = 5min

# should be increased if you want to collect a lot of information for helping pgsql to create query plans
default_statistics_target=100

# Synchronous commit was introduced in pgsql 8.3; turning it off allows a small amount of data loss (in case of a crash) in exchange for a large boost in the number of updates on the database per second.
synchronous_commit=on

# If, after tweaking every variable, your query execution plan is still not acceptable,
# you can bring down the random page cost to 3.0 or 2.0.
# setting this variable lower will encourage the query optimizer to use random access index scans.

random_page_cost=4.0