Forking Vs. Threading

What is Fork/Forking?

A fork creates a new process that is an almost exact copy of the old (parent) process, but it is a distinct process with its own process ID and its own memory. The child gets a separate address space. Parent and child start with the same code segment but execute independently of each other.
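A minimal sketch of that behaviour in Python on a Unix-like system, using os.fork. The function name fork_demo and the value 42 are made up for illustration. The child changes a variable and reports it through its exit status, while the parent's copy stays untouched.

```python
import os

def fork_demo():
    """Fork once and return (parent_counter, child_exit_code, pids_differ)."""
    counter = 0
    parent_pid = os.getpid()
    pid = os.fork()
    if pid == 0:
        # Child: this assignment changes only the child's copy of the variable.
        counter = 42
        # Report the value through the exit status; os._exit skips the
        # interpreter cleanup that belongs to the parent.
        os._exit(counter)
    # Parent: wait for the child and decode its exit status.
    _, status = os.waitpid(pid, 0)
    return counter, os.WEXITSTATUS(status), pid != parent_pid
```

Calling fork_demo() should return (0, 42, True): the parent still sees 0, the child saw 42, and the two process IDs differ.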

The simplest example of forking is running a command in a shell on Unix/Linux. Each time the user issues a command, the shell forks a child process and the child carries out the task.

When a fork system call is issued, the child conceptually receives a copy of every page of the parent process in a separate memory location. Copying everything up front is often unnecessary: modern kernels such as Linux use copy-on-write, so a page is duplicated only when one of the two processes modifies it. This matters most when the child immediately calls one of the exec functions (for example execv), because exec replaces the address space of the calling process and any copied pages would simply be discarded.
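The fork-then-exec pattern that a shell uses can be sketched like this in Python (Unix only). run_command is a hypothetical helper name, and the exit code 127 for a failed exec is borrowed from the usual shell convention.

```python
import os
import sys

def run_command(argv):
    """Run argv roughly the way a shell does: fork, exec in the child, wait in the parent."""
    pid = os.fork()
    if pid == 0:
        try:
            # Replace the child's address space with the new program.
            os.execvp(argv[0], argv)
        except OSError:
            # exec failed (for example, command not found).
            os._exit(127)
    # Parent: block until the child finishes, then decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Child was killed by a signal: report it as a negative number.
    return -os.WTERMSIG(status)
```

For example, run_command([sys.executable, "-c", "raise SystemExit(3)"]) returns 3, the exit code of the program that replaced the child.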

A few things to note about forking:

  • The child process has its own unique process ID.
  • The child process has its own copy of the parent’s file descriptors.
  • File locks set by the parent process are not inherited by the child process.
  • Any semaphores that are open in the parent process are also open in the child process.
  • The child process has its own copy of the parent’s message queue descriptors.
  • The child has its own address space and memory.
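To illustrate the point about file descriptors, here is a small sketch (Unix only) in which the child writes into a pipe that was opened by the parent before the fork. The function name is hypothetical.

```python
import os

def child_writes_to_inherited_pipe(message):
    """Return the bytes a forked child wrote into a pipe created by the parent."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it received its own copies of both descriptors.
        os.close(read_fd)
        os.write(write_fd, message)
        os._exit(0)
    # Parent: close its write end so that read() sees EOF once the child exits.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)
```

Closing a descriptor in one process does not close the other process's copy, which is why both sides close the end they do not use.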

Forking is often preferred over threading for the following reasons:

  • Development is much easier with fork-based implementations.
  • Fork-based code is more maintainable.
  • Forking is safer and more secure because each forked process runs in its own virtual address space. If one process crashes or has a buffer overrun, the memory of the other processes is not affected.
  • Threaded code is much harder to debug than forked code.
  • Fork is more portable than threads.
  • On a single CPU, forked processes can be faster than threads because they share no data by default and so need no locking.
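The isolation argument can be tried out with a short sketch (Unix only): the child deliberately aborts, and the parent simply observes the signal and carries on. crash_in_child is a hypothetical name.

```python
import os
import signal

def crash_in_child():
    """Fork a child that aborts; return (killed_by_signal, signal_number)."""
    pid = os.fork()
    if pid == 0:
        # Child: terminate abnormally with SIGABRT.
        os.abort()
    # Parent: unaffected by the crash, it only collects the child's status.
    _, status = os.waitpid(pid, 0)
    return os.WIFSIGNALED(status), os.WTERMSIG(status)
```

The parent keeps running after the call; had the same abort happened in a thread, the whole process would have gone down with it.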

Some of the applications in which forking is used are: telnetd (FreeBSD), vsftpd, proftpd, Apache 1.3, Apache 2, thttpd, PostgreSQL.

Pitfalls of forking:

  • Every forked process needs its own memory/address space, so processes take longer to start and stop.
  • After a fork you have two independent processes that must talk to each other in some way, and this inter-process communication is comparatively costly.
  • When the parent exits before the forked child, the child is left behind as an orphan process. This is much easier to manage with threads: you can end, suspend and resume threads from the parent easily, and if the parent exits suddenly its threads end with it.
  • Insufficient storage space (for example, not enough memory) can cause the fork system call to fail.

What are Threads/Threading?

Threads are lightweight processes (LWPs). Traditionally, a thread is just the CPU state (plus some other minimal state), while the process holds the rest (data, stack, I/O, signals). Threads require less overhead than forking or spawning a new process because the system does not have to initialize a new virtual memory space and environment for them. Threads are most effective on a multiprocessor system, where the flow of execution can be scheduled onto another processor and gain speed through parallel processing, but gains are also found on uniprocessor systems, where threads can exploit latency in I/O and other system functions that would otherwise halt execution.

Threads in the same process share:

  • Process instructions
  • Most data
  • Open files (descriptors)
  • Signals and signal handlers
  • Current working directory
  • User and group ID

Each thread has a unique:

  • Thread ID
  • Set of registers, stack pointer
  • Stack for local variables, return addresses
  • Signal mask
  • Priority
  • Return value: errno
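A short Python sketch of the shared versus per-thread split: every thread appends to one shared list and reports the same process ID, while each has its own thread ID held in a local variable. The name run_workers is made up, and the barrier is only there to keep all threads alive at once so that thread IDs cannot be reused.

```python
import os
import threading

def run_workers(count):
    """Start `count` threads; return a list of (thread_id, process_id) pairs."""
    records = []                         # shared by every thread in the process
    lock = threading.Lock()
    barrier = threading.Barrier(count)   # all threads exist at the same time

    def worker():
        thread_id = threading.get_ident()  # local variable, on this thread's stack
        barrier.wait()
        with lock:
            records.append((thread_id, os.getpid()))

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return records
```

With run_workers(4) you get four records with four different thread IDs but a single process ID.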

A few things to note about threading:

  • Threads are most effective on multi-processor or multi-core systems.
  • For threads, only one process/thread table and one scheduler are needed.
  • All threads within a process share the same address space.
  • A thread does not maintain a list of the threads it has created, nor does it know which thread created it.
  • Threads reduce overhead by sharing fundamental parts.
  • Threads are more memory-efficient because they use the parent’s existing memory instead of allocating new memory.
Pitfalls in threads:

  • Race conditions: The big drawback of threads is that nothing naturally stops multiple threads from working on the same data at the same time without knowing that others are modifying it. This is called a race condition. Although the code appears on screen in the order you want it to execute, threads are scheduled by the operating system and run in an unpredictable order. You cannot assume that threads execute in the order they were created, and they may also run at different speeds. When threads race to complete, they may give unexpected results. Mutexes and joins must be used to achieve a predictable execution order and outcome.
  • Thread-safe code: Threaded routines must call only functions that are “thread safe”, meaning that there are no static or global variables which other threads may clobber or read while assuming single-threaded operation. If static or global variables are used, then mutexes must be applied or the functions must be rewritten to avoid these variables. In C, local variables are allocated on the stack, so any function that does not use static data or other shared resources is thread-safe. A thread-unsafe function may be used by only one thread at a time, and the program must ensure that exclusivity. Many non-reentrant functions return a pointer to static data; this can be avoided by returning dynamically allocated data or by using caller-provided storage. An example of a non-thread-safe function is strtok, which is also not re-entrant. The thread-safe version is the re-entrant strtok_r.
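As a rough illustration of the mutex-and-join advice, here is a Python sketch in which several threads increment one shared counter. The read-modify-write is wrapped in a lock so that no update is lost. count_with_lock is a hypothetical name.

```python
import threading

def count_with_lock(thread_count, increments):
    """Increment one shared counter from several threads, guarded by a mutex."""
    total = 0
    lock = threading.Lock()

    def add():
        nonlocal total
        for _ in range(increments):
            # Only one thread at a time may read, add and write back.
            with lock:
                total += 1

    threads = [threading.Thread(target=add) for _ in range(thread_count)]
    for t in threads:
        t.start()
    # join() makes the caller wait, so the result is read only after all work is done.
    for t in threads:
        t.join()
    return total
```

Without the lock, two threads could read the same old value of total and one of the increments would be overwritten.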

Advantages of threads:

  • Threads share the same memory space, so sharing data between them is very fast; inter-thread communication is much quicker than inter-process communication (IPC).
  • If properly designed and implemented, threads give you more speed because a multi-threaded application has no process-level context switching.
  • Threads are very fast to start and terminate.
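Because threads live in one address space, handing data from one to another needs no pipes or sockets. A minimal sketch using Python's queue module (square_in_worker and the None sentinel are choices made for this example):

```python
import queue
import threading

def square_in_worker(numbers):
    """Send numbers to a worker thread and collect their squares."""
    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            n = tasks.get()
            if n is None:        # sentinel: no more work
                break
            results.put(n * n)

    t = threading.Thread(target=worker)
    t.start()
    for n in numbers:
        tasks.put(n)             # objects are shared directly, nothing is serialized
    tasks.put(None)
    t.join()
    return [results.get() for _ in numbers]
```

With a single worker the results come back in submission order, so square_in_worker([1, 2, 3]) gives [1, 4, 9].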

Some of the applications in which threading is used are: MySQL (including MySQL 3.23), Firebird, Apache 2.

FAQs:

1. Which should I use in my application?

Ans: That depends on a lot of factors. Forking is more heavyweight than threading and has a higher startup and shutdown cost. Inter-process communication (IPC) is also harder and slower than inter-thread communication; threads clearly win when it comes to communication. Conversely, if a thread crashes, it takes down all of the other threads in the process, and if a thread has a buffer overrun, it opens up a security hole in all of the threads.

Threads share the same address space as the parent process and need only a reduced context switch, which makes switching between them more efficient.
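If you want to see the startup and shutdown cost on your own machine, a rough micro-benchmark along these lines can help (Unix only). The function names are hypothetical, and the numbers depend heavily on the system, so treat them as an indication only.

```python
import os
import threading
import time

def time_forks(count):
    """Seconds taken to fork and reap `count` children that exit immediately."""
    start = time.perf_counter()
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(0)
        os.waitpid(pid, 0)
    return time.perf_counter() - start

def time_threads(count):
    """Seconds taken to start and join `count` threads that do nothing."""
    start = time.perf_counter()
    for _ in range(count):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return time.perf_counter() - start
```

Comparing time_forks(1000) with time_threads(1000) gives a feel for the difference; which one wins, and by how much, is something to measure rather than assume.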

2. Which one is better, threading or forking?

Ans: That depends entirely on what you are looking for. Still, to give an answer: in a contemporary Linux (2.6.x) there is not much difference in performance between a context switch of a process and that of a thread (only the MMU work is additional for the process). There is, however, the issue of the shared address space, which means that a faulty pointer in a thread can corrupt memory of the parent process or of another thread within the same address space.

3. What kinds of things should be threaded or multitasked?

Ans: If you are a programmer and would like to take advantage of multithreading, the natural question is what parts of the program should/ should not be threaded. Here are a few rules of thumb (if you say “yes” to these, have fun!):

  • Are there groups of lengthy operations that don’t necessarily depend on other processing (like painting a window, printing a document, responding to a mouse-click, calculating a spreadsheet column, signal handling, etc.)?
  • Will there be few locks on data (the amount of shared data is identifiable and “small”)?
  • Are you prepared to worry about locking (mutually excluding data regions from other threads), deadlocks (a condition where two threads have each locked data that the other is trying to get) and race conditions (a nasty, intractable problem where data is not locked properly and gets corrupted through threaded reads and writes)?
  • Could the task be broken into various “responsibilities”? E.g. Could one thread handle the signals, another handle GUI stuff, etc.?
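As a small illustration of the first rule of thumb, here is a sketch where a lengthy, independent operation (summing a spreadsheet column) runs on its own thread while the main thread stays free for other work. All names here are invented for the example.

```python
import threading

def sum_column_in_background(column):
    """Sum a column on a worker thread while the caller handles other events."""
    result = {}

    def lengthy_operation():
        # Depends on nothing else, so it is a good candidate for a thread.
        result["total"] = sum(column)

    worker = threading.Thread(target=lengthy_operation, name="spreadsheet-worker")
    worker.start()

    # The main thread is not blocked and can respond to other things meanwhile.
    handled = ["mouse-click"]

    worker.join()   # wait for the worker before reading its result
    return result["total"], handled
```

The only shared data is the small result dictionary, which is read only after join(), so no further locking is needed here.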

Conclusions:

1. Whether to use threading or forking depends entirely on the requirements of your application.
2. Threads are more powerful than events, but power is not always needed.
3. Threads are much harder to program than forking, so they are best left to experienced programmers.
4. Use threads mostly for performance-critical applications.

source : http://www.geekride.com/index.php/2010/01/fork-forking-vs-thread-threading-linux-kernel/

Thread Executors in Java

With Java 1.4, when we wrote threaded applications, we did not have many options for limiting the number of threads or reusing a thread for the available tasks. I had written applications earlier in which each task that could run independently (that is, on a thread of its own) created its own thread. Once a thread had been created and used, it was simply discarded, and new threads were created whenever needed.

With Java 1.5, the java.util.concurrent package provides executors, which allow you to create controlled thread pools and run a list of tasks on a fixed number of threads. If the number of tasks exceeds the number of threads, the extra tasks wait for a thread to become available and then start executing.

A simple threaded server using Java 1.4:

1 import java.io.*;
2 import java.net.*;
3 import java.util.*;
4
5 public class testPool implements Runnable
6 {
7   private int port;
8   private int active, total;
9
10   public testPool(int port)
11   {
12     this.port = port;
13     System.out.println("Thread pool constructor – creating thread");
14     new Thread(this).start();
15   }
16
17   public void run()
18   {
19     try
20     {
21       System.out.println("Thread pool – new thread created");
22       ServerSocket ss = new ServerSocket(port);
23       while(true)
24       {
25         Socket s = ss.accept();
26         total++;
27         new Handler(s);
28       }
29     }catch(Exception ex)
30     {
31       ex.printStackTrace();
32     }
33   }
34
35   public class Handler implements Runnable
36   {
37     private Socket socket;
38     public Handler(Socket s)
39     {
40       this.socket = s;
41       System.out.println("Handler constructor – creating thread");
42       new Thread(this).start();
43     }
44     public void run()
45     {
46       System.out.println(Thread.currentThread().getName()+" – Handler – new thread created");
47       active++;
48       boolean loop = true;
49       try
50       {
51         BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
52         DataOutputStream out = new DataOutputStream(socket.getOutputStream());
53         while(loop)
54         {
55           String cmd = in.readLine();
56           if(cmd == null || cmd.equals("QUIT"))
57           {
58             loop = false;
59             socket.close();
60           }
61           else if(cmd.equals("INFO"))
62           {
63             System.out.println("Active Connections : "+active);
64             System.out.println("Total Connections : "+total);
65           }
66           else
67           {
68             System.out.println("Command = "+cmd);
69           }
69           }
70         }
71       }catch(Exception ex)
72       {
73         ex.printStackTrace();
74       }
75       active--;
76     }
77   }
78
79   public static void main(String[] args)
80   {
81     System.out.println("Starting Daemon");
82     new testPool(Integer.parseInt(args[0]));
83   }
84 }

Compile and run the program

jayant@jayantbox:~/myprogs/java$ java -cp . testPool 5000
Starting Daemon
Thread pool constructor – creating thread
Thread pool – new thread created

Now, as you keep connecting to port 5000 on localhost, you can see that new threads are created. Even when old connections are closed, their threads are not reused; new threads are created instead.

So, if I run

telnet localhost 5000

from 3 consoles and issue the “INFO” command from a 4th console, the output is

jayant@jayantbox:~/myprogs/java$ java -cp . testPool 5000
Starting Daemon
Thread pool constructor – creating thread
Thread pool – new thread created
Handler constructor – creating thread
Thread-1 – Handler – new thread created
Handler constructor – creating thread
Thread-2 – Handler – new thread created
Handler constructor – creating thread
Thread-3 – Handler – new thread created
Handler constructor – creating thread
Thread-4 – Handler – new thread created
Active Connections : 1
Total Connections : 4

4 threads are created. In this case, every time you connect to the server, a new thread is created.

Now, let’s modify the code to work on Java 1.5 using thread executors…

Add the following code snippets at or after the following line numbers:

4 import java.util.concurrent.*;
10  private int pool;
12  this.pool=2;
22  ExecutorService threadExecutor = Executors.newFixedThreadPool(pool);
27  threadExecutor.execute(new Handler(s));

And comment out the following 2 lines:

27  new Handler(s);
42  new Thread(this).start();

Now let’s run the program again on port 5000 and try connecting to it. We have created a pool of 2 threads here. Observe that:

  • pool-1-thread-1 & pool-1-thread-2 are the two threads created.
  • After two connections to the server, the connections are not rejected, but are queued.
  • If a thread becomes available, a connection from the queue is assigned to the thread for processing
  • The same two threads are used again and again. No new threads are created.
  • Connections which are in queue are not able to process any requests unless they are assigned to a worker thread

Process for testing

console 1: telnet localhost 5000
console 2: telnet localhost 5000
console 3: telnet localhost 5000
console 1: from 1
console 2: from 2
console 3: from 3
console 1: INFO
console 1: 1 quitting
console 1: QUIT

Output for the testing process

jayant@jayantbox:~/myprogs/java$ java -cp . testPool 5000
Starting Daemon
Thread pool constructor – creating thread
Thread pool – new thread created
Handler constructor – creating thread
pool-1-thread-1 – Handler – new thread created
Handler constructor – creating thread
pool-1-thread-2 – Handler – new thread created
Handler constructor – creating thread
Command = from 1
Command = from 2
Active Connections : 2
Total Connections : 3
Command = 1 quitting
pool-1-thread-1 – Handler – new thread created
Command = from 3

As seen, processing of connection 3 starts only after connection 1 is closed by the client.

The benefits of using a thread executor instead of plain threads are:

a] The overhead of creating and destroying threads is avoided.
b] A queue of tasks is created automatically when the number of tasks exceeds the number of threads in the pool. The queued tasks are executed automatically as threads become free.
c] A limit (maximum number of threads) can be imposed, which controls resource usage and helps deliver better performance.
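The same pooling idea can be tried quickly in Python, which the next entry uses. This is only an analogue built on Python's concurrent.futures module, not the Java API, and run_tasks_on_pool is a made-up helper name. It runs more tasks than there are threads and records which pool thread handled each one.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def run_tasks_on_pool(pool_size, task_count):
    """Run task_count tasks on a fixed pool; return (task_id, thread_name) pairs."""
    def task(task_id):
        time.sleep(0.01)  # pretend to do some work
        return task_id, threading.current_thread().name

    # Tasks beyond pool_size wait in the executor's queue until a thread is free.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(task, range(task_count)))
```

With run_tasks_on_pool(2, 6), all six tasks complete but at most two distinct thread names appear, because the pool threads are reused instead of being created per task.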

Python programming – threading

There are a few options for threading in Python. I won’t be exploring all of them here; I will simply try to keep this blog entry simple, sweet and short.

There are two modules in Python which provide threading capabilities: the “thread” module (renamed “_thread” in Python 3) and the “threading” module. The thread module is very basic, so let’s focus on the threading module.

To use the threading module, all you have to do is

import threading

class mythread(threading.Thread):
    def run(self):
        pass  # replace with <your code>

mythread().start()
mythread().start()

This will create 2 threads running <your code>

That was quick, right? Now let’s look at some basics that come up in most threaded programs: naming a thread, checking whether a thread is alive, and join…

Naming a thread:

import threading

class mythread(threading.Thread):
    def run(self):
        # self.name holds the thread's name (default: Thread-N)
        print('my name is', self.name)

foo = mythread()
foo.name = 'Foo'
foo.start()

bar = mythread()
bar.name = 'Bar'
bar.start()

# an unnamed thread keeps its default name
mythread().start()

Run the program:

$ python threadname.py

And see the output

my name is Foo
my name is Bar
my name is Thread-3

Checking if the thread is still alive:

import threading
import time

class mythread(threading.Thread):
    def run(self):
        print('my name is', self.name)

class aliveth(threading.Thread):
    def run(self):
        # sleep so that the thread is still running when it is checked
        time.sleep(10)
        print('my name is', self.name)

myt = mythread()
myt.name = 'mythread'
myt.start()
# myt returns almost immediately, so it has usually finished by now
if myt.is_alive():
    print('myt is alive')
else:
    print('myt is dead')

alt = aliveth()
alt.name = 'aliveth'
alt.start()
if alt.is_alive():
    print('alt is alive')
else:
    print('alt is dead')

And check the output

my name is mythread
myt is dead
alt is alive
my name is aliveth

Joining threads:

You can use the join() method of a thread to make the calling thread wait for that thread to finish.

import threading
import time

class ThreadOne(threading.Thread):
    def run(self):
        print('Thread', self.name, 'started.')
        print(self.name, ': sleeping')
        time.sleep(5)
        print('Thread', self.name, 'ended.')

class ThreadTwo(threading.Thread):
    def run(self):
        print('Thread', self.name, 'started.')
        print(self.name, ': waiting for', thingOne.name)
        # block until thingOne has finished
        thingOne.join()
        print('Thread', self.name, 'ended.')

class ThreadThree(threading.Thread):
    def run(self):
        print('Thread', self.name, 'started')
        print(self.name, ': Not waiting for any other thread')
        print('Thread', self.name, 'ended.')

thingOne = ThreadOne()
thingOne.start()
thingTwo = ThreadTwo()
thingTwo.start()
thingThree = ThreadThree()
thingThree.start()

And check the output

Thread Thread-1 started.
Thread-1 : sleeping
Thread Thread-2 started.
Thread-2 : waiting for Thread-1
Thread Thread-3 started
Thread-3 : Not waiting for any other thread
Thread Thread-3 ended.
Thread Thread-1 ended.
Thread Thread-2 ended.

This covers most of the basics of programming threads in Python. We will look into thread synchronization issues some other time.