Tuesday, August 19, 2008

Refactoring of the persistence solution in agile projects

I invited my colleagues to a discussion today about a topic I felt could make for some interesting learning: refactoring of a persistence solution in an agile project.

By that I mean: when/how/if do you refactor away the solution you have for communicating with the database? The general possibilities we arrived at were:

  • You can drop the whole solution and create a new one after a few years. Evidently this happens a lot more often than you’d expect.

  • You hide the refactoring in a new big task, often accompanied by some cool buzzword

  • You try to isolate the old solution away, and start fresh with new functionality

  • A few more I’ll cover below


I was surprised that few had been in a situation where replacing the persistence solution was deemed necessary. I have no illusions that an organization without a strong developer culture would see the real value in doing something like that, but I thought I’d get more examples of projects where it was the only option. Say you run the MSDN-friendly, dataset-heavy application, which works great in the beginning: it gets the GUI up quickly, and the first few iterations deliver great functionality and a GUI that rapidly fills with functions. Great stuff! Until you reach a level of complexity that breaks the design down completely. What do you do then? A few possible solutions:

  1. Explain to the customer that you need an iteration or two to redesign the application to accommodate the newfound complexity.

  2. Quietly include the refactoring within the subsequent iterations’ tasks, considerably reducing the work delivered to the customer

  3. Postpone everything, plan instead to get around to it once everything quiets down

  4. Have separated things properly, making it easier to change from one solution to another

  5. From the initial phase of the project, settle where you are going and the time it will take, using your experience to choose a better solution, perhaps NHibernate or Castle Active Record, and hopefully your domain model will never get into this extreme a situation (with constant refactoring)

  6. Hah, the object-relational impedance mismatch is obsolete; object databases will save the world!


First of all, there is no single answer to any of this. As with everything in our business, it depends. It depends very much on what kind of solution you are working on and what kind of environment you are working in.

Obviously number 1 isn’t going to go down well with the customer: “you’ve had such a good flow so far, just keep it going” is not a surprising answer. That’s quite understandable, and in many cases correct. You should have applied constant refactoring to your solution, enabling you to steer clear of that situation. Of course, if you make an MSDN-friendly dataset app in the first place, you probably know that it isn’t going to handle the most complex of tasks, so you’ll never get into this situation. I bet it does happen though, and has happened more often than I can imagine (I hope not).

Number 2 is what you should have been doing all along, except now it will bring the current work close to a halt. Perhaps that is okay. You could get to an 80% finished solution quickly, and ask if that is enough. If it is, the customer will have gotten their money’s worth of application quickly. If not, you can be clear with the customer that getting this and this will cost disproportionately more than what came before. Because we all (should) know the Pareto principle, or 80-20 rule: 80 percent of the job takes 20 percent of the time (or cost), and vice versa. Again, the customer might not be too happy about that either.

Number 3: postpone doing anything until later. You’re already in a pretty dire situation, so postponing is not likely to do much to improve on that. If you don’t know what to do, then I guess just continuing as before is probably the best option, but if you get to that point, I don’t want to have anything to do with you anyway :)

Having separated things properly (4) will give you a good basis when you get into this situation. Hopefully it won’t be too costly to adapt to tougher concurrency issues, more advanced business logic, etc.
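To make "separated properly" a bit more concrete, here is a minimal sketch in Python (not the .NET stack discussed in this post). All names in it (`Customer`, `CustomerRepository`, `rename_customer`, the `customer` table) are made up for illustration. The idea is simply that the business logic only talks to a small interface, so the storage behind it can be swapped without touching that logic.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    """Hypothetical domain object; it knows nothing about storage."""
    customer_id: int
    name: str


class CustomerRepository(ABC):
    """The only persistence contract the business logic depends on."""

    @abstractmethod
    def get(self, customer_id: int) -> Optional[Customer]: ...

    @abstractmethod
    def save(self, customer: Customer) -> None: ...


class InMemoryCustomerRepository(CustomerRepository):
    """Stand-in for the 'old' solution: a plain dictionary."""

    def __init__(self) -> None:
        self._rows = {}

    def get(self, customer_id: int) -> Optional[Customer]:
        return self._rows.get(customer_id)

    def save(self, customer: Customer) -> None:
        self._rows[customer.customer_id] = customer


class SqliteCustomerRepository(CustomerRepository):
    """Stand-in for the 'new' solution: a relational table."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customer "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def get(self, customer_id: int) -> Optional[Customer]:
        row = self._conn.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None

    def save(self, customer: Customer) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)",
            (customer.customer_id, customer.name),
        )
        self._conn.commit()


def rename_customer(repo: CustomerRepository, customer_id: int, new_name: str) -> None:
    """Business logic written against the interface, not a concrete store."""
    customer = repo.get(customer_id)
    if customer is None:
        raise KeyError(customer_id)
    customer.name = new_name
    repo.save(customer)
```

With this shape, `rename_customer` runs unchanged against either repository. A real migration is of course messier (transactions, queries, lazy loading), so treat this as the principle rather than a recipe.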

As long as you didn’t go with a big design up front, but leveraged experience to choose a clever framework to start with, good for you. I currently fall very easily into the Domain-Driven Design with NHibernate/Active Record camp (5), using a proper domain model and everything. It is not nearly as fast as the dataset approach, but more applicable in the situations I’m usually in. (The dataset approach can be great though; don’t overdesign a simple application. There’s nothing with more framework support from top to bottom in .NET than datasets!)

I was slightly sarcastic with the object-database item (6). I just have a feeling that someone who falls for one without much experience with the other side of the table (relational) could proclaim it the new(ish) silver bullet. Now, I’d love to try using an object database, because it seems you do get around many of the issues you need to consider and constantly work on with the traditional relational database approach. However, in the real world you mostly have to live in an existing business environment, where relational databases are the big G. Converting the object database at the end of the development cycle is a possibility, but beware of pitfalls; some intelligent colleagues found quite a few (perhaps they could write about it soon?).

You should always be aware of the total cost of ownership (TCO) of course, knowing that the initial development of an application is only a small part of the entire cost of it, maintenance being the biggest thief. But you know, in the real world, the one funding or driving the application development isn’t necessarily the one paying for the maintenance, so… Hopefully you’ll have someone who has a bit more professional integrity than that, but it’s no wonder shortcuts on that side are taken.

One of my colleagues, Sverre Hundeide, came up with a nice suggestion. The contents should be familiar to all, but the conclusion has value. An O/R solution can intrude on your system to varying degrees:

  • Persistence ignorance – Nothing in your domain-model assembly references anything concerning the persistence solution used

  • POCO – For me, this is the same as above, but he considers it to be defined as almost the same, except that the assembly may contain some references, extra metadata, etc.

  • IPOCO – The domain classes can inherit from a base class used for persistence

  • Code generation – The domain and persistence code is generated from the database, often highly coupled.


Basically, the higher you are on the list, the easier it is to change to another solution. Simple, but an important thing to have in mind.
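A small sketch of the two ends of that scale, in Python rather than .NET and with entirely hypothetical names (`Order`, `OrderMapper`, `PersistentBase`, `CoupledOrder`), might look like this. The first domain class is persistence ignorant and the mapping lives outside it; the second inherits from a base class that belongs to the persistence solution.

```python
from dataclasses import dataclass


# --- Persistence ignorant: the domain class references nothing storage-related ---
@dataclass
class Order:
    order_id: int
    total: float


class OrderMapper:
    """Lives outside the domain model; replaced together with the persistence solution."""

    def to_row(self, order: Order) -> dict:
        return {"order_id": order.order_id, "total": order.total}

    def from_row(self, row: dict) -> Order:
        return Order(order_id=row["order_id"], total=row["total"])


# --- Base-class style: the domain class inherits persistence behaviour ---
class PersistentBase:
    """Hypothetical base class standing in for one supplied by a persistence framework."""

    table_name = ""

    def to_row(self) -> dict:
        # Copies the instance attributes; class attributes such as table_name are not included.
        return dict(vars(self))


class CoupledOrder(PersistentBase):
    table_name = "orders"

    def __init__(self, order_id: int, total: float) -> None:
        self.order_id = order_id
        self.total = total


def depends_on_persistence(domain_class: type) -> bool:
    """True if swapping the persistence solution forces a change to this class."""
    return issubclass(domain_class, PersistentBase)
```

Here `depends_on_persistence(Order)` is false and `depends_on_persistence(CoupledOrder)` is true: replacing the persistence solution means throwing away `OrderMapper` in the first case, but editing every domain class in the second.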

I guess it’s time for the conclusion now. Or it should have been anyway, except I’m not quite sure what I covered. Just another blog post lacking well-arranged contents. You’ll have to make do for now :)
