opcode cache comparison – xcache versus apc versus no-cache

Opcode caching tools like xcache, apc are also known as php accelerators. They work by storing the bytecode of interpreted php in shared memory.

To explain in common english, php is an interpreted language. Every time a http request comes, it picks up all php files required for its processing and interprets them – compiles them into machine readable code – known as opcode. The opcode is then run on the machine. Php accelerators speed up php execution by lazily compiling php (on demand compiling) and storing the generated opcode in shared memory. Thus preventing file io and overhead of interpreting the code again and again.

There are multiple solutions available for opcode caching in php. Here is the list available at wikipedia http://en.wikipedia.org/wiki/List_of_PHP_accelerators. APC and Xcache are two out of these which are somewhat famous.

I did a benchmark of both and tried to figure out which one would be better. Here is the benchmark result, and an analysis of the benchmark.

I ran the benchmark on my system

Configuration
Intel(R) Core(TM)2 Duo CPU T6570 @ 2.10GHz
L2 cache : 2048 Kb
RAM : 2 GB
php version 5.2.17
apache version 2.2.19
apc version 3.1.9
xcache version 1.3.2
Ubuntu 11.04 – 32 bit
siege was used to run the benchmarks

Installation
APC : sudo pecl install apc
xcache : download xcache, untar, ./configure; make; make install
Load the respective caching module in php.ini
extension = apc.so OR extension = xcache.so

Cache Configuration
For both xcache and apc, i accepted the default settings and changed only these variables
gc_interval = 3600;
ttl = 3600;
size = 32M;

I benchmarked a page deployed on my local system with images, tons of php code and lots of mysql queries.

Run 1 : 5 minutes with concurrency = 10

without cache

Transactions: 422 hits
Availability: 100.00 %
Elapsed time: 299.20 secs
Data transferred: 39.65 MB
Response time: 6.50 secs
Transaction rate: 1.41 trans/sec
Throughput: 0.13 MB/sec
Concurrency: 9.16
Successful transactions: 422
Failed transactions: 0
Longest transaction: 8.17
Shortest transaction: 3.44

APC

Transactions: 465 hits
Availability: 100.00 %
Elapsed time: 299.74 secs
Data transferred: 43.66 MB
Response time: 5.86 secs
Transaction rate: 1.55 trans/sec
Throughput: 0.15 MB/sec
Concurrency: 9.09
Successful transactions: 465
Failed transactions: 0
Longest transaction: 9.20
Shortest transaction: 3.91
Total Hits in cache: 85,773
Total Misses in cache: 223

Xcache

Transactions: 479 hits
Availability: 100.00 %
Elapsed time: 299.11 secs
Data transferred: 44.99 MB
Response time: 5.67 secs
Transaction rate: 1.60 trans/sec
Throughput: 0.15 MB/sec
Concurrency: 9.07
Successful transactions: 479
Failed transactions: 0
Longest transaction: 7.39
Shortest transaction: 3.80
Total Hits on cache: 87,884
Total Misses in cache: 158

As you can see with a concurrency of 10, xcache gives a transaction rate of 1.6, apc gives 1.55 and no-cache gives 1.41 transactions per second. There is a 10% improvement with apc and 14% improvement with xcache.

Shortest transaction with nocache was 3.44 where as that with xcache was 3.8 and that with apc was 3.91. This shows that xcache took less time as compared to apc for caching a page miss. The longest transaction of apc was higher than that of no-cache. But xcache’s longest transaction was better than the longest transaction of no-cache. Ideally the longest transaction of cached page should be less than that of no-cache page. Why APC had a higher longest transaction – i was unable to figure out.

Run 2 : 5 minutes with concurrency = 20

No cache

Transactions: 373 hits
Availability: 100.00 %
Elapsed time: 299.70 secs
Data transferred: 35.11 MB
Response time: 15.10 secs
Transaction rate: 1.24 trans/sec
Throughput: 0.12 MB/sec
Concurrency: 18.79
Successful transactions: 373
Failed transactions: 0
Longest transaction: 20.58
Shortest transaction: 5.41

APC

Transactions: 458 hits
Availability: 100.00 %
Elapsed time: 299.93 secs
Data transferred: 43.09 MB
Response time: 12.28 secs
Transaction rate: 1.53 trans/sec
Throughput: 0.14 MB/sec
Concurrency: 18.75
Successful transactions: 458
Failed transactions: 0
Longest transaction: 19.19
Shortest transaction: 9.73

Xcache

Transactions: 459 hits
Availability: 100.00 %
Elapsed time: 299.85 secs
Data transferred: 43.18 MB
Response time: 12.30 secs
Transaction rate: 1.53 trans/sec
Throughput: 0.14 MB/sec
Concurrency: 18.82
Successful transactions: 459
Failed transactions: 0
Longest transaction: 15.12
Shortest transaction: 6.60

In the second run though both xcache and apc have the same transaction rate of 1.53 which is 23% higher than that of no cache 1.24, but the longest transaction and shortest transaction of xcache was better than that of apc. It shows that xcache handles caching better than apc.

Eventually, there were some bugs in apc caused crashes when i tried to implement it. And xcache ran fine without any issues. This scenario shows that xcache is better.

Data structures – TRIE

Trie is a data structure used for data retrieval. It is a multiway tree structure used to store strings over an alphabet. It is mainly used for large dictionaries of english words in spell-checking and natural language understanding programs. The idea is that all strings which have a common stem or prefix hang off a common node. the elements in a string can be recovered by scanning from the root to the leaf that ends the string. All strings in a trie can be recovered by doing a depth first scan of the tree.

The time to insert, delete or find is almost identical because the code paths followed for each are almost identical. As a result, for inserting, deleting or searching, tries can easily beat binary search trees or even hash tables.

Advantages of Trie over Binary Search Tree
1. Looking up of keys is faster. Worst case scenario a key of length m takes O(m) time. BST performs search by doing O(log(n)) comparisons. In worst case BST takes O(m log n) time. Simple operations that tries use during lookup such as array indexing using a character are fast on real machines.
2. Tries are more space efficient when they contain a large number of short keys, because nodes are shared between keys with common initial subsequences.
3. Tries facilitate longest-prefix matching, helping to find the key sharing the longest possible prefix of characters all unique.
4. The number of internal nodes from root to leaf equals the length of the key. Balancing the tree is therefore no concern.

Advantages of Trie over Hash Tables
1. Tries support ordered iteration, whereas iteration over a hash table will result in a pseudorandom order given by the hash function (also, the order of hash collisions is implementation defined), which is usually meaningless
2. Tries facilitate longest-prefix matching, but hashing does not, as a consequence of the above. Performing such a “closest fit” find can, depending on implementation, be as quick as an exact find.
3. Tries tend to be faster on average at insertion than hash tables because hash tables must rebuild their index when it becomes full – a very expensive operation. Tries therefore have much better bounded worst case time costs, which is important for latency sensitive programs.
4. By avoiding the hash function, tries are generally faster than hash tables for small keys like integers and pointers.

Some disadvantages of Trie
1. Tries can be slower in some cases than hash tables for looking up data, especially if the data is directly accessed on a hard disk drive or some other secondary storage device where the random access time is high compared to main memory
2. Some keys, such as floating point numbers, can lead to long chains and prefixes that are not particularly meaningful. Nevertheless a bitwise trie can handle standard IEEE single and double format floating point numbers.

Code for implementing trie in java

import java.util.HashMap;
import java.util.Map;

public class Trie {
    private static class Node {
        private boolean isWord = false; // indicates a complete word
        private int prefixes = 0; // indicates how many words have the prefix
        private Map children = new HashMap(); // references to all possible children
    }

    private Node root = new Node();

    /**
     * Inserts a new word into the trie
     * @param word
     */
    public void insertWord(String word){
        if(searchWord(word) == true) return;

        Node current = root;
        for(char c : word.toCharArray()){
            if(current.children.containsKey(Character.valueOf(c))){
                Node child = current.children.get(Character.valueOf(c));
                child.prefixes++;

                current = child;
            }else{
                Node child = new Node();
                child.prefixes = 1;

                current.children.put(Character.valueOf(c), child);
                current = child;
            }
        }
        // we have reached the endof the word, hence mark it true
        // if during a search we reach the end of the search string and this
        // flag is still false, then the search string is not registered as a valid
        // word in the trie but is a prefix
        current.isWord = true;
    }

    /**
     * Searches for a word in the trie
     * @param word
     */
    public boolean searchWord(String word){
        Node current = root;
        for(char c : word.toCharArray()){
            if(current.children.containsKey(Character.valueOf(c))){
                current = current.children.get(Character.valueOf(c));
            }else{
                return false;
            }
        }
        // if at the end of the search of entire word the boolean variable is
        // still false means that the given word is not regitered as a valid
        // word in the trie, but is a prefix
        return current.isWord;
    }

    /**
     * Deletes a word from the trie
     * @param word
     */
    public void deleteWord(String word){
        if(searchWord(word) == false) return;

        Node current = root;
        for(char c : word.toCharArray()){
            Node child = current.children.get(Character.valueOf(c));
            if(child.prefixes == 1){
                current.children.remove(Character.valueOf(c));
                return;
            }else{
                child.prefixes--;
                current = child;
            }
        }
        // since the word is removed now, set the flag to false
        current.isWord = false;
    }

    public static void main(String[] args) {
        Trie trie = new Trie();
        trie.insertWord("ball");
        trie.insertWord("balls");
        trie.insertWord("bat");
        trie.insertWord("doll");
        trie.insertWord("dork");
        trie.insertWord("dorm");
        trie.insertWord("send");
        trie.insertWord("sense");

        // testing deletion
        System.out.println("Testing deletion : ");
        System.out.println(trie.searchWord("ball"));
        trie.deleteWord("ball");
        System.out.println(trie.searchWord("ball"));
        System.out.println(trie.searchWord("balls"));

        // testing insertion
        System.out.println("Testing insertion : ");
        System.out.println(trie.searchWord("dumb"));
        trie.insertWord("dumb");
        System.out.println(trie.searchWord("dumb"));
        trie.deleteWord("dumb");
        System.out.println(trie.searchWord("dumb"));
    }
} 

OUTPUT : java Trie

Testing deletion :
true
false
true
Testing insertion :
false
true

Source : http://en.wikipedia.org/wiki/Trie,
http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Trie/

Strategies for porting data

In every application there always comes a time when data needs to be ported – either from old application to another new application or from one data store to another. You might have changed the database structure to implement some new functionality. Maybe move your data from an SQL to a NoSQL.

The most important tool for moving data from one data-set to another is a porting script. The porting script maps the data from fields of old data-set to the fields of the new data-set. The porting script can contain logic or simple sql.

In a live system where data keeps on coming, it becomes difficult to port data. If data is not handled properly it might lose its sanity. There is a catch while dealing with live systems. It should be considered if the porting of data leads to a downtime or not. Ideally there should be as little downtime as possible. Here when I refer to downtime, it is downtime for both internal and external users.

There are different ways for handling the porting of data.

1. The sure-shot way of porting data without losing any sanity is the bring down the application. Freeze the data set and run your scripts. This technique is ok when you are dealing with small amounts of data. And the porting time is not more than a few minutes – if you can afford a downtime of a few minutes. The good thing about this type of porting is that you do not have to worry about the sanity of data. It is difficult to mess your data when your application is completely offline.

2. Another technique is to move to your new application and use the new data-set to insert new data and select old data from the old data-set for display. This is a very effective way of porting data. All that needs to be done is put in the adapter design pattern (wrapper design pattern) at the point where you are selecting data. Make a note of the D-Day when the new application is made live. And use the adapter pattern to fire selects on old data-set if you need to fetch data previous to the D-Day else fetch data from the new data-set. All inserts would happen on the new data set. This is very effective because ideally all data would slowly move on its own from old data-set to new data-set. If you want to expedite the process, you can have a separate script for porting data older than the D-Day.

3. The third technique is an alternate to the second technique. Instead of putting the adapter pattern at the point of select, you put an adapter pattern at the point of insert. So in addition to inserting into the old data-set, you also insert into the new data-set. All your selects are still fired on the old data-set. Eventually when you are satisfied that data has moved from old data-set to new, you can shift to the new application and start using the new data-set. Here also a script can be run to explicitly port data from old data-set to new data-set.

4. Another variant of the above techniques, is using user access requests to port data. No matter how many tables there are in a system, there is a master table and almost all the tables in the system have their foreign key referring to the master. When data is ported the primary key of the master is kept track of. What this technique does is that it ports data when a primary key is accessed. So for example when a user logs into the system, you check if his data is ported to the new data-set. If not, you port all data related to the user at that instant and move the user to the new system. The bad point about this type of data porting is that if there is a scenario where large number of users suddenly come online – it may become difficult to handle the spike and the users may experience slowness or errors.

Techniques 2, 3 and 4 require some planning and care while execution. But when you are dealing with GBs of data which cannot be worked upon in a few minutes or hours, and you cannot afford a downtime, they are the ideal way to move data. It is really important to remember that even a small issue in moving data can result in losing the sanity. Hence utmost caution and thorough testing is required before you can go ahead and implement these techniques.

In case you come across any other interesting techniques or have used any other technique, do share your knowledge.