Installing mogilefs for dummies

MogileFS is an open source distributed filesystem created by Danga Interactive to be used by LiveJournal project. Its features include

  • No single point of failure
  • Automatic file replication – satisfying the number of replica counts specified in the configuration
  • Flat namespace – Files are identified by named keys in a flat, global namespace. You can create as many namespaces as you’d like, so multiple applications with potentially conflicting keys can run on the same MogileFS installation.
  • Shared-Nothing – MogileFS doesn’t depend on a pricey SAN with shared disks. Every machine maintains its own local disks.
  • No RAID required – RAID doesn’t buy you any safety that MogileFS doesn’t already provide.
  • Local filesystem agnostic – local disks for mogilefs storage nodes can be formatted with the filesystem of choice (ext3, xfs, etc…)

Files inside mogilefs cannot be accessed directly. You need specific APIs to access the files and there are client implementations for several
languages:

Perl – http://search.cpan.org/~bradfitz/MogileFS-Client/
Java – http://github.com/eml/java-mogilefs
Ruby – http://seattlerb.rubyforge.org/mogilefs-client/
PHP – http://projects.usrportage.de/index.fcgi/php-mogilefs
Python – http://www.albany.edu/~ja6447/mogilefs.py

We will skip directly to the installation. You will have to install the following modules. You can use the cpan shell to install these modules. For novice perl users, simply type cpan from the command prompt (sudo cpan) and type out install >module_name< on the prompt. Install the following modules.

IO::AIO
Danga::Socket
Gearman::Server
Gearman::Client
Gearman::Client::Async
Net::Netmask
Perlbal
Sys::Syscall
IO::Stringy

Cpan would automatically check the dependencies and install the required modules. If you are not interested in the latest release of mogilefs, you can go ahead and install the mogilefs from cpan itself. Install these modules as well.

MogileFS::Server
MogileFS::Utils
MogileFS::Client

If you prefer latest versions, just get the files and install them individually. With latest versions, your earlier installations of dependencies using cpan may not work – in which case you will need to get the files from www.cpan.org and install them.

To install mogilefs do the following

  • Checkout the latest code from svn
    $ mkdir mogilefs-src
    $ cd mogilefs-src
    $ svn checkout http://code.sixapart.com/svn/mogilefs/trunk
  • Create database mogilefs and grant necessary permissions
    $ mysql
    mysql> CREATE DATABASE mogilefs;
    mysql> GRANT ALL ON mogilefs.* TO ‘mogile’@’%’;
    mysql> SET PASSWORD FOR ‘mogile’@’%’ = OLD_PASSWORD( ‘sekrit’ );
    mysql> FLUSH PRIVILEGES;
    mysql> quit
  • Set up trackers and storage servers. Install the mogile server.
    $ cd <path to mogilefs-src>trunk/server/
    $ perl Makefile.PL
    $ make
    $ make test
    $ make install
    If during make test, you get a mysql related connection error, it could be safely ignored – assuming that you have mysql installed and perl-mysql connection (DBD::MySQL) configured.
    Now lets install some utilities:
    $ cd <path to mogilefs-src>trunk/utils/
    $ perl Makefile.PL
    $ make
    $ make test
    $ make install
    And the perl api:
    $ cd <path to mogilefs-src>trunk/api/perl/
    $ perl Makefile.PL
    $ make
    $ make test
    $ make install
  • Configure the database
    $ ./mogdbsetup –dbhost=mogiledb.yourdomain.com –dbname=mogilefs –dbuser=mogile –dbpass=sekrit
  • Create the configuration files
    $ mkdir /etc/mogilefs/
    $ cp <path to mogilefs-src>trunk/server/conf/*.conf /etc/mogilefs/
    Edit the configuration files.
    $ vim /etc/mogilefs/mogilefsd.conf

    #Configuration for MogileFS daemon
    db_dsn = DBI:mysql:mogilefs:host=mogiledb.yourdomain.com
    db_user = mogile
    db_pass = sekrit
    listen = 127.0.0.1:7001 # IP:PORT to listen on for mogilefs client requests

    $ vim /etc/mogilefs/mogstored.conf

    #Configuration for storage nodes
    maxconns = 10000
    httplisten = 0.0.0.0:7500
    mgmtlisten = 0.0.0.0:7501
    docroot = /home/mogile/mogdata #where data will be stored

  • Create the user mogile.
    $ adduser mogile
  • Start the storage node
    $ mogstored –daemon
  • Start the tracker node as mogile user
    $ su – mogile
    $ mogilefsd -c /etc/mogilefs/mogilefsd.conf –daemon
  • Now that we have the trackers & storage nodes up lets tell the tracker that a storage node is available.
    $ mogadm –trackers=<tracker_ip>:7001 host add <storage_node_name> –ip=127.0.0.1 –port=7500 –status=alive
    And check that the host is being recognized.
    $ mogadm –trackers=<tracker_ip>:7001 host list
  • Add a device to the storage node where files would be kept.
    $ mogadm –trackers=<tracker_ip>:7001 device add <storage_node_name> 1
    And create the directory for the device (dev1 in our case).
    $ mkdir -p /home/mogile/mogdata/dev1
    Check that the device information is being displayed
    $ mogadm –trackers=<tracker_ip>:7001 device list
  • Now we are up and running. Next step would be creating namespaces and adding files.
    We could create domains and classes withing the domains using the mogadm utility and then use some api to add files to the class.
    create domain
    $ mogadm –trackers=<tracker_ip>:7001 domain add <domain_name>
    check if domain has been added
    $ mogadm –trackers=<tracker_ip>:7001 domain list
    create a class in the domain
    $ mogadm –trackers=<tracker_ip>:7001 class add <domain_name> <class_name>
    check if class has been added
    $ mogadm –trackers=<tracker_ip>:7001 class list

You can use the stats command to see a summary of the status of mogilefs.

$ mogadm –trackers=<tracker_ip>:7001 stats

That finishes the tutorial for installing mogilefs on a single node. You can easily replicate these steps on multiple servers creating a number of trackers and storage nodes. All that is needed is to add all the storage nodes to the trackers and using the same db for storing all the information.

If you have built up enough redundancy with more than 3 storage nodes & trackers, there should not be any point of failure. The only single point of failure that i could figure out was the database. You should create a slave to use it for failover scenarios.

intro to lucene 2.9

What crap!!!. Why do they have to come out with a new version every now and then. And make people rewrite their code to upgrade to a new version. How much do they still have to improve their code. Just because of their frequent upgrades, i have to change my code every now and then. Why
should i upgrade to lucene 2.9?

To answer this question – it could be said that you build something and then you figure out that – oh no if this could have been done in this way then it would have been better. So for example you make an omlette and you figure out that putting a bit of cheese and pepper would have improved its taste. So next time you try that, but then you figure out that making it in butter would bring out more taste. Or you buy a Pentium 3 pc with 1 GB ram and after 2 years you see that it is outdated – the softwares have grown and so have the processing powers. To run the currently available softwares, you would need to upgrade your pc to a Pentium 4 – core 2 duo and maybe upgrade your graphics card to ATI Radeon 4870 X2 from the previous nvidia 9800 GT to play the recent games more effectively. And maybe upgrade your 20 inch CRT television to a 42 inch HD LCD for better graphics display.

It is the same reason that lucene keeps on optimizing its code and improving the features – they realize that better code leads to faster indexing and searching on the same machine.

The reason why you shud upgrade your lucene version is defined by the list of features that lucene 2.9 provides:

  • Per segment searching and caching (can lead to much faster reopen among other things). FieldCache – takes advantage of the fact that most segments of the index are static, only processes the parts that change, save on time and memory. Also faster searching among multiple segments.
  • Near real-time search capabilities added to IndexWriter – new way to search the current in-memory segment before index is written to disk.
  • New Query types
  • Smarter, more scalable multi-term queries (wildcard, range, etc)
  • A freshly optimized Collector/Scorer API
  • Improved Unicode support and the addition of Collation contrib
  • A new Attribute based TokenStream API
  • A new QueryParser framework in contrib with a core QueryParser replacement impl included.
  • Scoring is now optional when sorting by Field, or using a custom Collector, gaining sizable performance when scores are not required.
  • New analyzers (PersianAnalyzer, ArabicAnalyzer, SmartChineseAnalyzer)
  • New fast-vector-highlighter for large documents
  • Lucene now includes high-performance handling of numeric fields (NumericField & NumericRangeQuery). Such fields are indexed with a trie structure, enabling simple to use and much faster numeric range searching without having to externally pre-process numeric values into textual values. This improves the Lucene number indexing, and is faster for searching numbers, geo-locations, and dates, faster for sorting, and hugely faster for range searching.

For the newbies, all this techno-rant is just there to make you feel good about upgrading. In brief – faster search and more features.

Lets take a look at how you would go ahead with indexing and searching using lucene 2.9

Here is a very rough example. What i have done is use the twitter api to search for keywords in twitter and fetch the micro-blogs and create an index using lucene 2.9. And then use the same program to open the index and run a search – displaying only the n results. You can fetch the twitter api from http://yusuke.homeip.net/twitter4j/en/index.html


import twitter4j.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.store.*;
import org.apache.lucene.index.*;
import org.apache.lucene.queryParser.*;
import org.apache.lucene.search.*;
import org.apache.lucene.document.*;
import java.io.*;
import java.util.Date;
import java.util.ArrayList;
import java.util.List;

public class lucene
{
public static void main(String[] args) throws Exception
{
if(args.length != 3)
{
System.out.println("Usage : java lucene <index/search> <dirname> <string>");
System.exit(1);
}

if(!args[0].equalsIgnoreCase("index") && !args[0].equalsIgnoreCase("search"))
{
System.out.println("Usage : java lucene <index/search> <dirname> <string>");
System.exit(1);
}
System.out.println(args[0]+","+args[1]+","+args[2]);

lucene lu = new lucene(args[0], args[1]);
if(args[0].equalsIgnoreCase("index"))
lu.indexFiles(args[2]);
else if(args[0].equalsIgnoreCase("search"))
lu.searchFiles(args[2]);


}

File index_dir;
String action;

public lucene(String action, String dirname) throws Exception
{
this.index_dir = new File(dirname);
this.action = action;

if(index_dir.exists() && action.equalsIgnoreCase("index"))
{
System.out.println("Index already exisits... enter another another directory for indexing...");
System.exit(1);
}
}

public void indexFiles(String searchstr) throws Exception
{
Twitter tw = new Twitter();
System.out.println("Getting tweets for "+searchstr);
twitter4j.Query qry = new twitter4j.Query("source:twitter4j "+searchstr);
qry.setRpp(50);

QueryResult res = tw.search(qry);
List<Tweet> tweets = res.getTweets();
System.out.println("Got "+tweets.size()+" tweets in "+res.getCompletedIn()+" : "+res.getMaxId());

// constructor changed from lucene 2.4.1
IndexWriter iw = new IndexWriter(NIOFSDirectory.open(this.index_dir), new WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);

int docs = 0;
for(int z=0; z<tweets.size(); z++)
{
Tweet twt = (Tweet)(tweets.get(z));
String user = twt.getFromUser();
String usrTwt = twt.getText();
System.out.println("Got : "+user+" => "+usrTwt);

Document d = new Document();
// constructor for Field changed - introduced new constants ANALYZED & NOT_ANALYZED. Not storing NORMS improve performance.
d.add(new Field("user", user, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS, Field.TermVector.YES));
d.add(new Field("tweet", usrTwt, Field.Store.YES, Field.Index.ANALYZED_NO_NORMS, Field.TermVector.WITH_POSITIONS_OFFSETS));

iw.addDocument(d);
docs++;
}

System.out.println("optimizing..."+docs+" docs");
iw.optimize();
iw.close();
}

public void searchFiles(String searchstr) throws Exception
{
BufferedReader br = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
QueryParser parser = new QueryParser("tweet",new WhitespaceAnalyzer());
// New constructor in 2.9 - pass true to open in readonly mode.
IndexReader ir = IndexReader.open(NIOFSDirectory.open(this.index_dir), true);
Searcher searcher = new IndexSearcher(ir);
int ResultsPerPage = 5;
do
{
org.apache.lucene.search.Query qry = parser.parse(searchstr);
System.out.println("Searching for : "+searchstr);

//use TopScoreDocCollector to get results and do paging. Get 2 page in a go. Do not sort on score.
TopScoreDocCollector collector = TopScoreDocCollector.create(2*ResultsPerPage, false);
searcher.search(qry, collector);
//get total no of hits found;
int totalResults = collector.getTotalHits();
int start = 0;
int end = Math.min(totalResults, ResultsPerPage);
ScoreDoc[] hits = collector.topDocs().scoreDocs;

System.out.println("Total hits : "+totalResults+", end : "+end);

for(int i=start; i<end; i++)
{
Document doc = searcher.doc(hits[i].doc);
System.out.println(i+"] "+doc.get("user")+" => "+doc.get("tweet"));
}


System.out.print("nQuery (enter "quit" to exit): ");
searchstr = br.readLine();
if(searchstr == null || searchstr.length() == -1)
{
break;
}
searchstr.trim();
if(searchstr.length()==0)
{
break;
}

}while(!searchstr.equalsIgnoreCase("quit"));

}
}

SHOW ENGINE INNODB STATUS;

Innodb monitors show information about innodb internal status – which could be used for performance tuning. Lets break down the output of show engine innodb status and get a look at what is happening and how it can be improved. Just fire the “Show engine innodb status” command and check the Output.

mysql> show engine innodb statusG
*************************** 1. row ***************************
Type: InnoDB
Name:
Status:
=====================================
091002 9:44:20 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 45 seconds

To get a better picture ensure that the output is sampled for some time (say more than 15 seconds – the more the better). If the sampling time is less, just run the same command again to get another sample.

———-
SEMAPHORES
———-
OS WAIT ARRAY INFO: reservation count 20636234, signal count 20151488
Mutex spin waits 0, rounds 486769929, OS waits 6568865
RW-shared spins 19231656, OS waits 8487916; RW-excl spins 13940968, OS waits 4652674

This section reports threads waiting for a semaphore and statistics on how many times threads have needed a spin or a wait on a mutex or a rw-lock semaphore. A large number of threads waiting for semaphores may be a result of disk I/O, or contention problems inside InnoDB. Contention can be due to heavy parallelism of queries or problems in operating system thread scheduling. Setting the innodb_thread_concurrency system variable smaller than the default value might help in such situations.

A more explicit output could be (obtained from mysql.com)

———-
SEMAPHORES
———-
OS WAIT ARRAY INFO: reservation count 413452, signal count 378357
–Thread 32782 has waited at btr0sea.c line 1477 for 0.00 seconds the
semaphore: X-lock on RW-latch at 41a28668 created in file btr0sea.c line 135
a writer (thread id 32782) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file btr0sea.c line 731
Last time write locked in file btr0sea.c line 1347
Mutex spin waits 0, rounds 0, OS waits 0
RW-shared spins 108462, OS waits 37964; RW-excl spins 681824, OS waits 375485

Thread wait information is available only if mysql is running in a very high concurrency environment so innodb has to fall back to OS waits. It would require a bit of knowledge of innodb source code to figure out where you are getting stuck – like in this piece of output we are getting stuck at btr0sea.c lines 731 & 1347.

The reservation count and signal count information show how actively innodb uses internal sync array – how frequently slots are allocated in it and how frequently threads are signaled using sync array. High OS Waits are bad – it means that there is a lot of context switching in the OS. Spin waits and spin rounds are cheaper as compared to OS waits but they also waste CPU cycles. So a large amount of Spin waits and spin rounds mean cpu resources are being wasted. innodb_sync_spin_loops can be used to balance and improve these status variables.

InnoDB uses spin-loop in hopes thread locked mutex is very fast and will release mutex while current thread runs in spins, so there is saving of expensive context switching. However we can’t run spin very long as it eats CPU resources and after some loops it is reasonable to give CPU resource to concurrent threads.

————————
LATEST FOREIGN KEY ERROR
————————
030709 13:00:59 Transaction:
TRANSACTION 0 290328284, ACTIVE 0 sec, process no 3195, OS thread id 34831 inserting
15 lock struct(s), heap size 2496, undo log entries 9
MySQL thread id 25, query id 4668733 localhost heikki update
insert into ibtest11a (D, B, C) values (5, ‘khDk’ ,’khDk’)
Foreign key constraint fails for table test/ibtest11a:,
CONSTRAINT `0_219242` FOREIGN KEY (`A`, `D`) REFERENCES `ibtest11b` (`A`,
`D`) ON DELETE CASCADE ON UPDATE CASCADE
Trying to add in child table, in index PRIMARY tuple:
0: len 4; hex 80000101; asc ….;; 1: len 4; hex 80000005; asc ….;; 2:
len 4; hex 6b68446b; asc khDk;; 3: len 6; hex 0000114e0edc; asc …N..;; 4:
len 7; hex 00000000c3e0a7; asc …….;; 5: len 4; hex 6b68446b; asc khDk;;
But in parent table test/ibtest11b, in index PRIMARY,
the closest match we can find is record:
RECORD: info bits 0 0: len 4; hex 8000015b; asc …[;; 1: len 4; hex
80000005; asc ….;; 2: len 3; hex 6b6864; asc khd;; 3: len 6; hex
0000111ef3eb; asc ……;; 4: len 7; hex 800001001e0084; asc …….;; 5:
len 3; hex 6b6864; asc khd;;

If you are using foreign keys and there has been an error, you would also have a Foreign key error section. This section provides information about the most recent foreign key constraint error.

————————
LATEST DETECTED DEADLOCK ————————
091001 14:43:14
*** (1) TRANSACTION:
TRANSACTION 0 611308122, ACTIVE 0 sec, process no 13095, OS thread id 459504560 setting auto-inc lock
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 320, 1 row lock(s)
MySQL thread id 675137455, query id 2100136547 update
REPLACE INTO TEST (ID, CONTENT) VALUES(‘2012258’, ‘xina0157’)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `TEST` trx id 0 611308122 lock mode AUTO-INC waiting
*** (2) TRANSACTION:
TRANSACTION 0 611308121, ACTIVE 0 sec, process no 13095, OS thread id 458902448 updating or deleting, thread declared inside InnoDB 492
mysql tables in use 1, locked 1
6 lock struct(s), heap size 1024, 11 row lock(s), undo log entries 6
MySQL thread id 791833830, query id 2100136545 10.208.67.50 rx update
REPLACE INTO TEST (ID, CONTENT) VALUES(‘2012257’, ‘bhuvneshahuja’), (‘2012257’, ‘lalitgarg007’), (‘2012257’, ‘yatin1904’), (‘2012257’, ‘vinay168’), (‘2012257’, ‘Bhupennsnghn’), (‘2012257’
, ‘vulnerabl003’), (‘2012257’, ‘manollom’), (‘2012257’, ‘Solankiccoutantom’), (‘2012257’, ‘gprashad@rediffmaiom’), (‘2012257’, ‘deep1_1’), (‘2012257’, ‘ankur20584’), (‘2012257’, ‘amarrock’), (‘2012257’,
‘jyotima_89yaon’)
*** (2) HOLDS THE LOCK(S):

TABLE LOCK table `TEST` trx id 0 611308121 lock mode AUTO-INC
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 0 page no 1822478 n bits 480 index `ID` of table `TEST` trx id 0 611308121 lock_mode X waiting
Record lock, heap no 314 PHYSICAL RECORD: n_fields 3; compact format; info bits 0

0: len 4; hex 801eb462; asc b;; 1: len 8; hex 78696e6130313537; asc xina0157;; 2: len 4; hex ee5c72a8; asc r ;;

*** WE ROLL BACK TRANSACTION (1)

This section provides information about the most recent deadlock. It is not present if no deadlock has occurred. The contents show which transactions are involved, the statement each was attempting to execute, the locks they have and need, and which transaction InnoDB decided to roll back to break the deadlock. Innodb only prints information about few of the locks which transaction is holding. Also only last statement from each transactions is displayed, while locks rows could be locked by one of previous statements.

Here we can see that query 1 was waiting for lock to obtain auto_increment_id from for the table TEST – which has been locked by query 2. So, auto_increment_id is the one creating problems here.

————
TRANSACTIONS
————
Trx id counter 0 612244468
Purge done for trx’s n:o < 0 612244461 undo n:o < 0 0
History list length 11
Total number of lock structs in row lock hash table 0
LIST OF TRANSACTIONS FOR EACH SESSION:
—TRANSACTION 0 0, not started, process no 13095, OS thread id 524331952
MySQL thread id 793300751, query id 2104652418 localhost root
show engine innodb status
—TRANSACTION 0 612244464, not started, process no 13095, OS thread id 422017968
MySQL thread id 793301055, query id 2104652413 10.208.67.111 rx
—TRANSACTION 0 612244463, not started, process no 13095, OS thread id 288115632
MySQL thread id 793301051, query id 2104652405 10.208.67.46 rx
—TRANSACTION 0 612244465, not started, process no 13095, OS thread id 290991024
MySQL thread id 793301050, query id 2104652414 10.208.67.111 rx
—TRANSACTION 0 612244457, not started, process no 13095, OS thread id 54217648
MySQL thread id 793301046, query id 2104652381 10.208.67.46 rx
—TRANSACTION 0 612244467, not started, process no 13095, OS thread id 521587632
MySQL thread id 793301039, query id 2104652417 10.208.67.46 rx
—TRANSACTION 0 612244453, not started, process no 13095, OS thread id 469560240
MySQL thread id 793301038, query id 2104652371 10.208.67.111 rx
—TRANSACTION 0 612244444, not started, process no 13095, OS thread id 502590384
MySQL thread id 793301028, query id 2104652341 10.208.67.111 rx
—TRANSACTION 0 612244438, not started, process no 13095, OS thread id 220674992
MySQL thread id 793301027, query id 2104652325 10.208.67.103 rx
—TRANSACTION 0 612244423, not started, process no 13095, OS thread id 498207664
MySQL thread id 793301014, query id 2104652276 10.208.67.103 rx
—TRANSACTION 0 612244417, not started, process no 13095, OS thread id 291793840
MySQL thread id 793301013, query id 2104652269 10.208.67.103 rx
—TRANSACTION 0 612244397, not started, process no 13095, OS thread id 450493360
MySQL thread id 793300979, query id 2104652179 10.208.67.111 rx
—TRANSACTION 0 612244426, not started, process no 13095, OS thread id 459504560
MySQL thread id 675137455, query id 2104652279 Has read all relay log; waiting for the slave I/O thread to update it

If this section reports lock waits, your applications might have lock contention. If you have small number of connections all connections will be printed in transaction list, if you have large number of connections Innodb will only print number of them,so the output will not grow too large. Transaction id is current transaction identifier – it is incremented for each transaction.

“Purge done for trx’s n:o” is number of transaction to which purge is done. Innodb can only purge old versions if they there are no running transactions potentially needing them. Old stale uncommitted transactions may block purge process eating up resources. By looking at transaction counter difference between current and last purged transaction you will be able to spot it. In some rare cases purge also could have hard time to keep up with update rate, in this case difference between these values will also grow.
“undo n:o” will show the undo log record number which purge is currently processing, if it is active otherwise it will be zero.
“History list length 11” is number of unpurged transactions in undo space. It is increased as transactions which have done updates are commited and decreased as purge runs.
For each of connections for MySQL there will be ether “not started” state if there is no active Innodb transaction for this connection, or “ACTIVE” if transaction is active.
Also transaction status is reported which is basically what transaction is doing it can be “fetching rows”, “updating” and couple of other values.
“Total number of lock structs in row lock hash table 0” is number of row lock structures allocated by all transactions. Note not same as number of locked rows – there are normally many rows for each lock structure.
“Thread declared inside InnoDB 400” sometimes appears. It means thread is running inside Innodb kernel and still has 400 tickets to use. Innodb tries to limit thread concurrency allowing only innodb_thread_concurrency threads to run inside Innodb kernel at the same time. If thread is not runniing inside innodb kernel status could be “waiting in InnoDB queue” or “sleeping before joining InnoDB queue”.
To avoid too many threads competing to enter innodb queue at the same time Innodb makes thread to sleep for some time before trying to wait (if no free slot was available). This may cause number of threads active inside kernel being less than number of threads allowed by “innodb_thread_concurrency”. For certain workloads it may help to decrease the time thread waits before it enters the queue. This is done by adjusting “innodb_thread_sleep_delay variable”. Value is specified in microseconds.
“mysql tables in use 1, locked 0” is number of tables used by transaction in question and number of tables locked by transactions. Innodb does not lock tables for normal operation so number of tables locked normally stays 0, unless it is ALTER TABLE or similar statement, or if LOCK TABLES was used.
In addition to Innodb specific information, there is generic statement information which is visible in SHOW PROCESSLIST showed in SHOW INNODB STATUS, such as statement which is being executed, query id, query status etc.

——–
FILE I/O ——–
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
ibuf aio reads: 0, log i/o’s: 0, sync i/o’s: 0
Pending flushes (fsync) log: 0; buffer pool: 0
35472867 OS file reads, 512713655 OS file writes, 33949612 OS fsyncs
0.29 reads/s, 16384 avg bytes/read, 20.76 writes/s, 1.33 fsyncs/s

This section shows state of file IO helper threads – insert buffer thread, log thread, read thread and write thread. These are responsible appropriately for insert buffer merges, asynchronous log flushes, read-ahead and flushing of dirty buffers. Normal reads originated from query executions are executed by threads running queries. On Unix/Linux you will always see 4 helper threads, on Windows it however can be adjusted by innodb_file_io_threads variable. For each helper thread you can see thread state – if thread is ready – waiting for i/o request or if it is executing certain operation.
Number of pending operation is shown for each of helper threads – these are amount of operations queued for execution or being executed at the same time. Also number of pending fsync operations is displayed. For writes Innodb has to ensure data makes it to the disk – just passing it to OS cache is not enough. This is typically done by calling fsync() for modified files. Constant high values for any of these variables is indication of IO bound workload.
Next we see the number of IO operations & the averages of these operations which could be used for graphing/monitoring purposes.

————————————-
INSERT BUFFER AND ADAPTIVE HASH INDEX ————————————-
Ibuf: size 1, free list len 5, seg size 7,
7567365 inserts, 7567365 merged recs, 4848077 merges
Hash table size 4425293, used cells 1294280, node heap has 1502 buffer(s)
14.07 hash searches/s, 78.40 non-hash searches/s

This section shows insert buffer and adaptive hash status. First line shows status of insert buffer – segment size and free list as well as if there are any records is insert buffer. Next it shows how many inserts were done in insert buffer, how many recs were merged and how many merges did it took. Ratio of number of merges to number of inserts is pretty much insert buffer efficiency.

Adaptive hash index is hash index Innodb builds for some pages to speed up row lookup replacing btree search with hash search. This section shows hash table size, number of used cells and number of buffers used by adaptive hash index. You can also see number of hash index lookups and number of non-hash index lookups which is indication of hash index efficiency.


LOG

Log sequence number 263 105379218
Log flushed up to 263 105373095
Last checkpoint at 263 102251451
0 pending log writes, 0 pending chkp writes
391975179 log i/o’s done, 20.00 log i/o’s/second

Log section provides information about log subsystem of Innodb. You can see current log sequence number – which is amount of bytes Innodb has written in log files since system tablespace creation. You can also see up to which point logs have been flushed – so how much data is unflushed in log buffer as well as when last checkpoint was performed. Innodb uses fuzzy checkpointing so this line hold log sequence, all changes up to which has been flushed from buffer pool. Changes having higher log sequences may still only be recored in logs and not flushed from buffer pool so such log sequences can’t be over written in log files. By monitoring log sequence number and value up to which logs have been flushed you can check if your innodb_log_buffer_size is optimal – if you see more than 30% of log buffer size being unflushed you may want to increase it.

You also can see number of pending normal log writes and number of checkpoint log writes. Number of log/io operations allows to separate tablespace related IO from log related IO so you can see how much IO your log file requires. Note depending on your innodb_flush_log_at_trx_commit value your log writes may be more or less expensive. If innodb_flush_logs_at_trx_commit=2 log writes are done to OS cache, and being sequential writes these logs writes are pretty fast.

———————-
BUFFER POOL AND MEMORY ———————-
Total memory allocated 1177422796; in additional pool allocated 5829888
Dictionary memory allocated 94872
Buffer pool size 65536
Free buffers 1
Database pages 64033
Modified db pages 476
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 58886259, created 19378118, written 210440562
0.29 reads/s, 0.42 creates/s, 0.96 writes/s
Buffer pool hit rate 1000 / 1000

This section shows Buffer pool activity and memory usage. You can see total memory allocated by Innodb (sometimes it is higher than you anticipated), amount of memory allocated in additional memory pool (so you can check if it is sized right), total number of pages in buffer pool, number of pages free, pages allocated by database pages and dirty pages. From these values you can learn if your buffer pool is sized well – if you have constantly a lot of pages free, it probably means your active database size is smaller than allocated buffer pool size so you can tune it down. Even if free pages is zero as in this case database pages will not be equal to total size of buffer pool, because buffer pool also stores lock information, adaptive hash indexes and some other system structures.

Pending reads and writes are pending requests on buffer pool level. Innodb may merge multiple requests to one on file level so these are different. We can also see different types of IO submited by Innodb – pages to be flushed via LRU pages – dirty pages which were not accessed long time, flush list – old pages which need to be flushed by checkpointing process and single page – independent page writes.

We can also see number of pages being read and written. Created pages is empty pages created in buffer pool for new data – when previous page content was not read to the buffer pool.

Finally you can see buffer pool hit ratio which measures buffer pool efficiency. 1000/1000 corresponds to 100% hit rate. It is hard to tell what buffer pool hit rate is good enough – it is very workload dependent. Sometimes 950/1000 will be enough, sometimes you can see IO bound workload with hit rate of 995/1000.
————–
ROW OPERATIONS ————–
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 13095, id 538459056, state: sleeping
Number of rows inserted 748462338, updated 421356848, deleted 323128079, read 3220854064
5.13 inserts/s, 8.00 updates/s, 0.00 deletes/s, 43.71 reads/s
—————————-
END OF INNODB MONITOR OUTPUT
============================

Row operations show activity on the row basics and some system information.

It shows innodb thread queue status – how many threads are waiting and being active. How many read views are open inside Innodb – this is when transaction was started but no statement is currently active, state of Innodb main thread which controls scheduling of number of system operations – flushing dirty pages, checkpointing, purging, flusing logs, doing insert buffer merge. Values for “state” field are rather self explanatory.

You can also see number of rows operation since system startup as well as average values. Row operations is very good measure of Innodb load. Not all row operations are created equal of course and accessing of 10 byte rows is much cheaper than accessing 10MB blog, but it is still much more helpful than number of queries, which is even more different.