Tuesday, August 05, 2008

Don’t start a dedicated thread to perform a simple task

When you want to run a piece of code asynchronously, it is tempting to start a dedicated thread for it. However, starting a new thread carries a large performance overhead. For simple tasks this overhead cannot be justified: the CPU time needed to start the thread may be much greater than the processing that actually happens on it. Microsoft recognizes this and provides the ThreadPool, which is much better suited to executing small tasks asynchronously.

I decided to write a small benchmark to find out just how bad it is to start a dedicated thread for a simple task. The code of the task is very simple: increment a counter and check the value of this counter. This task has to be executed 1,000,000 times. Of course this is not actual code that you would write; it is just the simplest task that I could think of to execute asynchronously.
Because the counter will be accessed from multiple threads, it needs to be protected by a lock. In this particular code it is possible to avoid the lock by using the System.Threading.Interlocked class. I decided not to do this because a normal lock makes the benchmark much more relevant to real-world scenarios.
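For reference, here is a minimal sketch of what the lock-free variant could look like. It is not the code that was benchmarked. The class name InterlockedSketch and the Run method are made up for illustration, and the counter is an int rather than a uint because Interlocked.Increment has an int overload in every version of .NET.

```csharp
using System;
using System.Threading;

static class InterlockedSketch   // hypothetical name, not part of the benchmark
{
    static int s_Loops;          // number of tasks to run
    static int s_Counter;        // int instead of uint so Interlocked.Increment(ref int) applies
    static readonly AutoResetEvent s_TestFinished = new AutoResetEvent(false);

    // Queues 'loops' small tasks and returns the final counter value.
    // Assumes loops > 0; with zero tasks nobody would signal the event.
    public static int Run(int loops)
    {
        s_Loops = loops;
        s_Counter = 0;
        for (int i = 0; i < loops; i++)
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(SmallTask));
        }
        s_TestFinished.WaitOne();    // wait until the last task has finished
        return s_Counter;
    }

    static void SmallTask(object state)
    {
        // Interlocked.Increment atomically adds one and returns the new value,
        // so the increment and the check need no lock.
        if (Interlocked.Increment(ref s_Counter) == s_Loops) s_TestFinished.Set();
    }

    static void Main()
    {
        Console.WriteLine(Run(1000));
    }
}
```

Only the task that receives the final value from Interlocked.Increment sets the event, so the main thread is still woken exactly once.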

Here is the version of the code that uses a dedicated thread for each task (error-handling code and the code used to measure the performance have been removed for clarity):

    using System;
    using System.Threading;

    class Program
    {
        const int c_Loops = 1000000;
        static uint s_Counter;                  // the counter that will be incremented by every task
        static object s_Lock = new object();    // needed because the counter is accessed from multiple threads
        static AutoResetEvent s_TestFinished = new AutoResetEvent(false);    // signals when the test is finished

        static void Main(string[] args)
        {
            for (int counter = 0; counter < c_Loops; counter++)
            {
                // start a new dedicated thread for every single task
                Thread t = new Thread(new ThreadStart(SmallTask));
                t.Start();
            }
            s_TestFinished.WaitOne();    // we wait until the last task has finished
        }

        static void SmallTask()
        {
            // This method executes on a dedicated thread
            lock (s_Lock)
            {
                s_Counter++;
                if (s_Counter == c_Loops) s_TestFinished.Set();    // notify the main thread that the test is finished
            }
        }
    }

These are the performance measurements (taken on a dual-core processor):

  • Total time to execute all the tasks: 18312 ms
  • % time in user-mode: 47%

As you can see, it takes quite some time to execute all the tasks. Only about half of this time is spent in user mode; the rest is spent in kernel mode.
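The measurement code was left out of the listing. As a rough sketch of one way figures like these could be collected (not necessarily how the numbers above were taken), the following uses Stopwatch for the total time and the processor-time properties of System.Diagnostics.Process for the user/kernel split. The class name MeasurementSketch, the Measure helper and the placeholder workload are assumptions made for illustration, and "% time in user mode" is interpreted here as user CPU time divided by user plus kernel CPU time.

```csharp
using System;
using System.Diagnostics;

static class MeasurementSketch   // hypothetical helper, not taken from the benchmark
{
    // Runs the workload, stores the wall-clock time in elapsedMs and returns the
    // percentage of this process's CPU time that was spent in user mode.
    public static double Measure(Action workload, out long elapsedMs)
    {
        Process process = Process.GetCurrentProcess();
        TimeSpan userBefore = process.UserProcessorTime;
        TimeSpan kernelBefore = process.PrivilegedProcessorTime;
        Stopwatch stopwatch = Stopwatch.StartNew();

        workload();

        stopwatch.Stop();
        process.Refresh();    // discard cached values so the properties are read again
        TimeSpan user = process.UserProcessorTime - userBefore;
        TimeSpan kernel = process.PrivilegedProcessorTime - kernelBefore;

        elapsedMs = stopwatch.ElapsedMilliseconds;
        double totalCpuMs = user.TotalMilliseconds + kernel.TotalMilliseconds;
        return totalCpuMs > 0 ? 100.0 * user.TotalMilliseconds / totalCpuMs : 0.0;
    }

    static void Main()
    {
        long elapsedMs;
        double userPercent = Measure(() =>
        {
            // placeholder workload; the benchmark loop would go here instead
            long sum = 0;
            for (int i = 0; i < 50000000; i++) sum += i;
            GC.KeepAlive(sum);
        }, out elapsedMs);

        Console.WriteLine("Total time: {0} ms, user mode: {1:F0}%", elapsedMs, userPercent);
    }
}
```

The processor times cover the whole process, so work done by other threads while the workload runs is included in the split.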

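Now let's rewrite this code using the ThreadPool. Only minimal changes are needed: each task is queued with ThreadPool.QueueUserWorkItem instead of being started on a new Thread, and SmallTask takes the object parameter that the WaitCallback delegate requires: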

    using System;
    using System.Threading;

    class Program
    {
        const int c_Loops = 100000;
        static uint s_Counter;                  // the counter that will be incremented by every task
        static object s_Lock = new object();    // needed because the counter is accessed from multiple threads
        static AutoResetEvent s_TestFinished = new AutoResetEvent(false);    // signals when the test is finished

        static void Main(string[] args)
        {
            for (int counter = 0; counter < c_Loops; counter++)
            {
                // queue the task; an existing threadpool thread picks it up
                ThreadPool.QueueUserWorkItem(new WaitCallback(SmallTask));
            }
            s_TestFinished.WaitOne();    // we wait until the last task has finished
        }

        static void SmallTask(object state)
        {
            // This method executes on a threadpool thread
            lock (s_Lock)
            {
                s_Counter++;
                if (s_Counter == c_Loops) s_TestFinished.Set();    // notify the main thread that the test is finished
            }
        }
    }

These are the performance measurements:

  • Total time to execute all the tasks: 218 ms
  • % time in user-mode: 98%

So we see that by using the ThreadPool the total time dropped by a factor of about 84 (18312 ms versus 218 ms)! Also, almost all of the time is spent in user mode.

Conclusion: when executing simple tasks asynchronously, starting a dedicated thread for each task carries a very large performance penalty.

3 comments:

Andy Patten said...

Good article, very interesting.

A question though: How do you measure the time spent in User v Kernel mode?

I assume from your article that more time spent in User mode is better, or in other words less CPU intensive.

Anonymous said...

Enjoying your posts. Could it be that your perf increase for the ThreadPool version was partly due to c_Loops being 100,000 instead of 1,000,000?

Anonymous said...

Could it be that your perf increase for the ThreadPool version was partly due to c_Loops being 100,000 instead of 1,000,000?

LOL good catch.