Wednesday, December 31, 2008

Essential Software Concepts: Fail fast

Software development is hard. Not only is development hard, the environment and hardware are not exactly fault-proof either. Let's not forget the users: you can be sure they will do something you didn't think about. Then there is the human factor in deployment, database updates and the like. So one of the few truths about software is that things will fail; the question is what you do about the failures.


You generally have two (or three) options when dealing with errors: you can try to make the system recover, or you can fail at once. (The third is to swallow the error and cross your fingers that it will work out in the end.)
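To make the three options concrete, here is a minimal sketch in Python. The quantity-parsing scenario, the function names and the fallback default are all made up for illustration; they only show what each strategy looks like when applied to the same bad input.

```python
def parse_quantity_fail_fast(raw: str) -> int:
    """Fail at once: reject bad input at the point where it is first seen."""
    quantity = int(raw)  # int() raises ValueError on non-numeric input
    if quantity <= 0:
        raise ValueError(f"quantity must be positive, got {quantity}")
    return quantity


def parse_quantity_recover(raw: str, default: int = 1) -> int:
    """Recover: fall back to a default value (a hypothetical policy)."""
    try:
        return parse_quantity_fail_fast(raw)
    except ValueError:
        return default


def parse_quantity_swallow(raw: str):
    """Swallow: hide the error and hope it works out in the end."""
    try:
        return int(raw)
    except ValueError:
        return None  # the caller now carries a None that may blow up much later
```

With the input "abc", the first version raises immediately, the second quietly returns 1, and the third returns None, leaving the problem for whatever code touches the value next.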


Unless it is essential that the system never fails, I like to fail fast. By that I mean: if something doesn't work properly, fail the operation as soon as possible. Failing doesn't mean crashing the application; you should still provide an informative message to the user. Failing like this has, as usual, both advantages and disadvantages. The main advantage is related to tracking and fixing bugs. One of the major efforts in fixing a bug is understanding where and why it happened. If you fail early rather than late, you'll most likely have an easier task tracking the bug down. You don't have to worry about the system having kept running in an inconsistent state, and there's a larger chance that the stack trace actually shows where the problem is located.
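A small sketch of failing the operation without crashing the application, again in Python. The account-transfer scenario and the names OperationFailed, transfer and run_transfer are assumptions picked for the example, not taken from any particular system. All checks run before any state is changed, so a failure leaves the data consistent, and the caller turns the failure into a message for the user.

```python
class OperationFailed(Exception):
    """Raised as soon as an operation is known to be impossible (hypothetical type)."""


def transfer(accounts: dict, source: str, target: str, amount: int) -> None:
    # Check everything up front, before any balance is modified.
    if amount <= 0:
        raise OperationFailed(f"amount must be positive, got {amount}")
    if source not in accounts:
        raise OperationFailed(f"unknown source account '{source}'")
    if target not in accounts:
        raise OperationFailed(f"unknown target account '{target}'")
    if accounts[source] < amount:
        raise OperationFailed(f"insufficient funds in '{source}'")
    # Only now is it safe to change state.
    accounts[source] -= amount
    accounts[target] += amount


def run_transfer(accounts: dict, source: str, target: str, amount: int) -> str:
    # The operation fails, the application keeps running and informs the user.
    try:
        transfer(accounts, source, target, amount)
    except OperationFailed as error:
        return f"Transfer failed: {error}"
    return "Transfer completed"
```

Had the target check come after the withdrawal, a misspelled account name would leave money deducted from the source and credited nowhere, which is exactly the kind of inconsistent state that makes bugs hard to track.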


The main disadvantage, or rather the question to ask yourself, is: "Is failing early fine in production?" If you can't say yes to this, then know that you'll have a much harder time tracking bugs ahead of you. It's not an easy question, though. A strict policy of failing early can sometimes mean that users are unable to use important functionality. Furthermore, a system that keeps failing is not good for building customer trust. In some cases it is perhaps better to fix the problem in the background. Whatever you choose, it is certainly not a black-and-white question.
