Wednesday, October 29, 2008

.NET and Parallelism

I attended a really interesting session today about the current work being done on parallelism for .NET Framework 4.0.

And the future is bright! :)

You should see the session yourself, but I'll try to summarize what I got out of it anyway.

Threading and parallelism are important now, and will become ever more important in the future. Our processors have stopped getting noticeably faster, but they keep getting smaller and more efficient, which makes it possible to fit more and more of them into a machine. Currently most of us are running with two cores, and quite a few with four. Hardware producers are saying that 80-100 cores are within reach for regular machines in the not too distant future. We absolutely need to design for this.

So we need to start taking threads ever more into account. But it is such a complicated field, with locking, contention etc.

The current implementation of threads isn't helping either. Create a thread and it takes 1 MB of committed memory. Try to manage them, or follow them with the current debugger support in VS, and so on - it's not easy.

First of all, the annoying Thread will be replaced by the much more enjoyable Task in 4.0. (Thread will live on, but Task should take over in everyday use.)

The much improved (compared to threads) Task API, the internal handling of tasks, and the new tools to support them and parallelism (in VS2010) look really cool.

I can't tell you exactly how tasks differ from threads yet, except that they are similar, but that tasks have a more advanced API accompanying them and are handled differently internally.

Consider the scenario of running a program on a two or four (or something else) core computer. What's the best number of threads you could possibly have then? The answer is of course two on the two-core, four on the four-core, and so on. (But don't hard-code your app to two parallel execution paths just because you have a two-core computer; you want these things to scale when you run them on much more powerful servers.)
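To make the "match the number of cores, but don't hard-code it" point concrete, here is a minimal sketch. It is written in Python purely for illustration, not in .NET code, and nothing like it was shown in the session. The helper names `pool_size` and `sum_of_squares` are made up for this example.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pool_size():
    """Hypothetical helper: one worker per logical core, at least one."""
    # os.cpu_count() can return None when the count cannot be determined.
    return os.cpu_count() or 1


def sum_of_squares(numbers):
    """Spread the work over as many workers as the machine reports cores."""
    with ThreadPoolExecutor(max_workers=pool_size()) as pool:
        return sum(pool.map(lambda n: n * n, numbers))
```

The same program then uses two workers on a two-core laptop and many more on a big server, without a code change.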

When you create a myriad of threads, you get the problem with the constant fight for resources and time. With tasks this is handled in a different way. Consider this simple scheme:

  • One global queue where all your tasks are created

  • One local queue for each of your processors

Once a task is created it is put into the global queue, and then moved into one of the processors' local queues.

Each individual processor then processes its queued tasks in a LIFO fashion, as the last one added is the one most likely to still have its data cached. Once it is done with its own tasks, it steals (stealing is good here) in a FIFO fashion from the other queues, attaining a few things:

  • None of your processors are standing lazily around

  • You will get far fewer contention issues, since you grab tasks from different ends of the queue

  • Everything is executed in an orderly, efficient manner
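The queue scheme above can be sketched as a small single-threaded simulation. This is Python for illustration only, not the actual .NET scheduler. The `Worker` class, the `distribute` function and the round-robin hand-out from the global queue are assumptions made for the example.

```python
from collections import deque


class Worker:
    """One processor with its own local queue of tasks (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.local = deque()

    def next_task(self, others):
        # Own work first, newest item first (LIFO): most likely still cached.
        if self.local:
            return self.local.pop()
        # Otherwise steal the oldest item (FIFO) from another worker,
        # taking from the opposite end to the one its owner uses.
        for other in others:
            if other.local:
                return other.local.popleft()
        return None


def distribute(global_queue, workers):
    """Move tasks from the global queue to local queues (round-robin is an assumption)."""
    i = 0
    while global_queue:
        workers[i % len(workers)].local.append(global_queue.popleft())
        i += 1
```

With tasks t1..t4 distributed over workers A and B, A ends up holding t1 and t3 and runs t3 first. Once A is empty it steals t2, the oldest task, from B, while B itself would have picked t4 from the other end.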

So how do we use these tasks?

You create tasks much in the same fashion as you created threads. The difference is that you have quite a few more options for how they are handled: whether they should work together with the parent thread, whether an unhandled exception on a child thread should crash the main thread, generic versions that return a particular type of data, and much more. Check it out yourself.
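Two of those options, a task that returns a value and an exception that resurfaces in the code that started the task, can be sketched roughly as follows. This uses Python's `concurrent.futures` as a stand-in and is not the .NET Task API. The functions `word_count`, `failing_job` and `run_examples` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """A job that returns a value to whoever started it."""
    return len(text.split())


def failing_job():
    """A job that fails with an unhandled exception."""
    raise ValueError("boom")


def run_examples():
    with ThreadPoolExecutor(max_workers=2) as pool:
        counted = pool.submit(word_count, "tasks return values")
        broken = pool.submit(failing_job)
        result = counted.result()  # blocks until the job is done
        try:
            broken.result()        # re-raises the job's exception in the caller
            error = None
        except ValueError as exc:
            error = str(exc)
    return result, error
```

The caller decides what to do with the failure, instead of the exception silently dying on a background thread.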

You also get more structured assistance, which looks really nice. Instead of running, for instance, a standard foreach, which runs through each item sequentially, you can use the new Parallel.ForEach, which runs the iterations in parallel. This works great if each step can run in isolation and doesn't depend on the others, but even if they do, you can use the API to say stop, and have each iteration check whether stop has been set before it runs, which in practice ends the loop much earlier. With lists you can just add AsParallel() to create a parallel version of, for instance, a LINQ query.
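The "say stop, then check the flag before each iteration" idea can be sketched like this. It is Python for illustration, not the real Parallel.ForEach. The function `parallel_for_each` and its parameters are made up for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def parallel_for_each(items, body, workers=4):
    """Run body(item, stop) for each item; body may call stop.set().

    Returns how many iterations actually ran.
    """
    stop = threading.Event()

    def guarded(item):
        # Each iteration checks the flag before doing any work.
        if stop.is_set():
            return False
        body(item, stop)
        return True

    with ThreadPoolExecutor(max_workers=workers) as pool:
        ran = list(pool.map(guarded, items))
    return ran.count(True)
```

Iterations that are already running when stop is set still finish. Only the ones that have not started yet are skipped, so with several workers the exact number that runs can vary.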

I really could go on and on.

And I have forgotten to mention the new Visual Studio tools, haven't I? There are a couple of new debugger windows which seem extremely helpful in visualising which tasks are currently running, which tasks have been created, what values the tasks hold, what the stack trace of each task looks like, and which methods each and every task has hit. Hard to explain, but you'll like it!

This is just emerging, but this will make parallel life much easier in the future. I'll surely be looking much into this!

Windows Azure

As you're bound to have heard by now, Microsoft recently announced Windows Azure, the new cloud operating system, at PDC08.

I must say I'm really intrigued by this.

The possibilities are big. Think of the extreme costs you have for running and maintaining an application. For instance:

  • You need to be able to scale according to user interest

  • You need to be highly responsive - which means a server in Norway might work pretty badly when accessed from the US or Asia

  • You need a server park in at least one place just to host the applications (and one place is only enough if you can afford the downtime an electricity failure would cause, or possibly the loss of data and hardware a fire or earthquake could cause)

  • You need failover database solutions (If manually recreating a backup won't do)

  • Some data might not be allowed to be stored in some countries, so you need to be sure that whatever solution you choose handles this properly

Instead of these options you could get into some sort of external hosting solution. And this is in many regards what MS is offering with Azure, but I bet a fair bit of hosting-providers are literally shaking now after the new plans of MS have been introduced.

We are talking big, big scale investments in this. The tight integration with current MS software and ways of working (Visual Studio for instance) doesn't hurt in making this popular. And it's not like they don't have quite a bit of internal expertise in this already, for instance through running the Live services.

One of the most important things is the continuous expectation of ever more interactive and graphics-heavy software delivered in the blink of an eye over the internet. You can do this yourself, but I doubt the cost/value calculation will be on your side.

Another thing is scaling. With Azure, or the cloud service offering, you can scale up or down according to user interest. An application with user peaks that vary over time can be tough financially, since you can end up with much more hardware set up than you usually need.

One big issue is privacy. If you host this in the cloud, you put your data in Microsoft's hands. Are you cool with that?

As you have noticed, my detailed understanding of Azure is lacking. To be honest I haven't bothered focusing too much on it, with so much else that is interesting here at PDC! I do believe though that this can become a really big thing, and I'm expecting to have to consider it much more in the future - but for now I'll leave it to someone else.

Check out Azure here.

Wednesday, October 8, 2008

Microsoft completely miss the point of Agile Development?

Slightly annoyed about a post in CIO about Visual Studio 2010. The content of the article is not that interesting, but a user comment annoyed me.

"Microsoft, unfortunately, continues to show that they in fact completely miss the point of agile software development. Agility is about simplicity of design, of process, of feedback mechanisms. It is also about open, community-based tools, frameworks, and standards. MS keeps offering hilariously bloated, complex, monolithic, closed, and expensive IDE "solutions" that worsen every problem they attempt to solve. Visual Studio is now, at more than 43 million lines of code (and counting), so counter to agile development practices that I must question its architects' sanity or motives. Is all of this bureaucratic bloat forced upon the VS team by clueless marketing drones? That might explain the continuing madness."

This guy has just completely missed the point. A few things:

Agile = open, community-based tools, frameworks, and standards.
Why?? Agile is (a lot more than this, but also is) about using agile methodologies and practices to drive a project to success. Any software that can help in this endeavour is great, but all I care about is using the best software. Whether it is Microsoft or Thoughtworks that delivers my CI system is unimportant to me, as long as I get to make the choice. I can use an open source IDE, CI, source control, build tool, test tool, etc. if I want to (and I often do), and that's all that matters.

MS keeps offering hilariously bloated, complex, monolithic, closed, and expensive IDE "solutions" that worsen every problem they attempt to solve.
Dude, Visual Studio is a great tool! Together with ReSharper it really is a tool that rocks :)

That might explain the continuing madness
Keep up the madness! I, for one, can't wait to see what comes next.

Admittedly Microsoft wants as big a part of the pie as possible, and that can certainly lead to situations that are less than optimal, but just because everything isn't good about Microsoft certainly doesn't make everything wrong.