Wednesday, October 29, 2008

.NET and Parallelism

I attended a really interesting session today about the current work being done on parallelism for .NET Framework 4.0.

And the future is bright! :)

You should see the session yourself, but I'll try to summarize what I got out of it anyway.

Threading and parallelism are important now, but they will become ever more important in the future. Our processors have stopped getting noticeably faster, but they keep getting smaller and more efficient, which lets us have more and more of them. Currently most of us are running with two cores, and quite a few with four. Hardware producers are saying that 80-100 cores are within reach for regular machines in the not too distant future. We absolutely need to design for this.

So we need to take threads into account more and more. But it is such a complicated field, with locking, contention, and so on.

The current implementation of threads isn't helping either. Create a thread and it takes 1 MB of committed memory. Try to manage threads, or follow them with the current debugger support in Visual Studio: it's not easy.

First of all, the annoying Thread will be replaced by the much more enjoyable Task in 4.0. (Thread will live on, but Task should take over as the one you actually use.)

The much improved (compared to threads) Task API, its internal handling, and the new tools in VS2010 supporting tasks and parallelism look really cool.

I can't yet tell you exactly how tasks differ from threads, except that they are similar, but tasks come with a more advanced API and are handled differently internally.

Consider the scenario of running a program on a two-core or four-core (or any other) computer. What's the best number of threads you could possibly have for CPU-bound work? The answer is of course two on the two-core, four on the four-core, and so on. (Don't design your app to run with exactly two parallel execution paths just because you have a two-core computer; you want it to scale when you run it on much more powerful servers.)
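To make the "don't hard-code two" point concrete, here is a minimal sketch in Java (chosen only for illustration; it is not the .NET API from the session) that asks the runtime how many cores it has and splits CPU-bound work into that many chunks. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: size the worker pool from the core count reported at
// run time, instead of hard-coding 2 because the development machine has two cores.
class CoreSizedPool {
    // For CPU-bound work, about one busy thread per core is the sweet spot.
    static int workerCount() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Sums 0..n-1 by splitting the range into one chunk per worker.
    static long parallelSum(int n) {
        int workers = workerCount();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (n + workers - 1) / workers; // ceiling division
            for (int w = 0; w < workers; w++) {
                final int from = w * chunk;
                final int to = Math.min(n, from + chunk);
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (int i = from; i < to; i++) s += i;
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get(); // wait for each chunk
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("workers: " + workerCount());
        System.out.println("sum: " + parallelSum(1_000_000)); // sum: 499999500000
    }
}
```

The same code runs with two workers on a dual-core laptop and with many more on a big server, without any change.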

When you create a myriad of threads, you get a constant fight for resources and time. Tasks handle this differently. Consider this simple scheme:

  • One global queue that all your newly created tasks go into

  • One local queue for each of your processors


Once a task is created, it is put into the global queue and then moved into one of the processors' local queues.

Each individual processor then works through its queued tasks in LIFO fashion, since the most recently added task is the one most likely to still have its data in the cache. Once it has run out of tasks, it steals (stealing is good here) in FIFO fashion from the other processors' queues, which achieves a few things:

  • None of your processors stand idle

  • You run into far fewer contention issues, since the owner and the thieves take tasks from opposite ends of a queue

  • Everything is executed in an orderly, efficient manner
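The scheme above can be sketched as a small, single-threaded simulation in Java. This is an illustration of the policy as described, not the actual .NET scheduler: `WorkStealingSim` and `run` are hypothetical names, and to force some stealing, all tasks are moved from the global queue onto processor 0's local queue up front.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, deterministic simulation of the work-stealing policy:
// own queue is processed LIFO, other queues are stolen from FIFO.
class WorkStealingSim {
    // Returns, per processor, the order in which it executed task ids.
    static List<List<Integer>> run(int processors, int tasks) {
        Deque<Integer> global = new ArrayDeque<>();
        for (int t = 0; t < tasks; t++) global.addLast(t); // tasks created in order

        List<Deque<Integer>> local = new ArrayList<>();
        List<List<Integer>> executed = new ArrayList<>();
        for (int p = 0; p < processors; p++) {
            local.add(new ArrayDeque<>());
            executed.add(new ArrayList<>());
        }

        // Assumption for the demo: every task moves from the global queue to
        // processor 0, so the other processors can only get work by stealing.
        while (!global.isEmpty()) local.get(0).addLast(global.pollFirst());

        boolean progress = true;
        while (progress) {
            progress = false;
            for (int p = 0; p < processors; p++) { // each processor takes one step in turn
                Integer task = local.get(p).pollLast(); // own queue: newest first (LIFO)
                for (int v = 0; v < processors && task == null; v++) {
                    if (v != p) task = local.get(v).pollFirst(); // steal oldest (FIFO)
                }
                if (task != null) {
                    executed.get(p).add(task);
                    progress = true;
                }
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(run(2, 6)); // [[5, 4, 3], [0, 1, 2]]
    }
}
```

With two processors and six tasks, processor 0 works backwards from its newest task while processor 1 steals from the oldest end, so the two never reach for the same end of the queue.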


So how do we use these tasks?

You create tasks in much the same way as you create threads. The difference is that you have quite a few more options for how they are handled: whether they should work together with the parent thread, whether an unhandled exception in a child task should bring down the main thread, generic versions that return a particular type of data, and so on. Check it out yourself.
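Here is a rough Java analogue of two of those options (again, not the .NET Task API): `CompletableFuture<T>` plays the role of a generic task that returns a typed value, and an exception thrown inside a child task is stored and rethrown to whoever waits on it. `TypedTasks`, `lengthOf`, `failing` and `observe` are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical example: typed task results and exception propagation.
class TypedTasks {
    // Like a generic task of Integer: the future carries a value of a particular type.
    static CompletableFuture<Integer> lengthOf(String s) {
        return CompletableFuture.supplyAsync(() -> s.length());
    }

    // A child task that fails; the exception is stored in the future
    // instead of silently killing a background thread.
    static CompletableFuture<Integer> failing() {
        return CompletableFuture.supplyAsync(() -> {
            throw new IllegalStateException("child task failed");
        });
    }

    // Waiting on the task rethrows its exception (wrapped) in the waiting thread.
    static String observe(CompletableFuture<Integer> task) {
        try {
            return "result " + task.join();
        } catch (CompletionException e) {
            return "caught " + e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(observe(lengthOf("parallel"))); // result 8
        System.out.println(observe(failing()));            // caught child task failed
    }
}
```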

You also get more structured assistance, which looks really nice. Instead of running, for instance, a standard ForEach that goes through each item sequentially, you can use the new Parallel.ForEach, which runs the iterations in parallel. This works great if the iterations can run in isolation and don't depend on each other. And if you need to end early, you can use the API to signal stop and have each iteration check whether stop has been set, so in practice the loop finishes much sooner. For queries, you can just add AsParallel() to get a parallel version of, for instance, a LINQ query.
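As a Java sketch of the same two ideas (an analogue, not Parallel.ForEach or PLINQ themselves): a parallel loop whose iterations check a shared stop flag, and a query made parallel by adding one call. One difference worth noting: in this Java version, iterations that start after the flag is set are still called and simply return at once. `ParallelLoops`, `anyMultiple` and `evenSquares` are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical example: a parallel loop with a stop flag, and a parallel query.
class ParallelLoops {
    // Parallel "for each": look for any multiple of `divisor` in 1..n-1.
    // Once one is found, the flag is set and later iterations skip their work.
    static boolean anyMultiple(int n, int divisor) {
        AtomicBoolean stop = new AtomicBoolean(false);
        IntStream.range(1, n).parallel().forEach(i -> {
            if (stop.get()) return;               // someone already signalled stop
            if (i % divisor == 0) stop.set(true);
        });
        return stop.get();
    }

    // Parallel query: the same pipeline as a sequential one, plus .parallel().
    static List<Integer> evenSquares(int n) {
        return IntStream.range(0, n)
                .parallel()
                .filter(i -> i % 2 == 0)
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList()); // encounter order is preserved
    }

    public static void main(String[] args) {
        System.out.println(anyMultiple(1_000, 97)); // true
        System.out.println(evenSquares(10));        // [0, 4, 16, 36, 64]
    }
}
```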

I could go on and on.

And I have forgotten to mention the new Visual Studio tools, haven't I? There are a couple of new debugger windows that look extremely helpful for visualising which tasks are currently running, which tasks have been created, what values the tasks hold, what the stack trace of each task looks like, and which methods each task has hit. Hard to explain, but you'll like it!

This is only just emerging, but it will make parallel life much easier in the future. I'll surely be looking much more into this!
