Sunday, August 31, 2008

Handling the object-database-problem (Presentation)

A few days ago we once again had a "fagdag" - a full day dedicated to learning (and a splendid evening building team spirit afterwards of course).

I held a presentation about the different ways to handle the object-database issue, focusing mainly on the positives and negatives of the various approaches, thus, I believe, providing more value than giving any sort of tutorial on tools or approaches which can easily be googled.

The goal of the talk was to make sure that all my colleagues were up to speed on what is available in this area today, and what should be used when.


Here's a brief overview of what I talked about.

The issue is the difference between objects and relational databases. Databases focuses mainly on storage and (more or less) normalized relationships through keys, while objects focus mostly on the interaction between objects in a business domain. They typically have the concepts of inheritance and polymorphism close at hand, while databases only handles these if they must, and definitely not naturally. the approaches from the object or database side certainly don't go hand in hand, neither technically nor conceptually.

So which options do we have for handling the object-relational impedance mismatch, and what's the pros and cons of each approach?

I covered the following approaches:

  • Manual SQL

  • Stored procedures/views

  • Dataset

  • Active Record

  • OR-mapper (self-made)

  • OR-mapper (open source/commercial)

  • Code generation


As well as a few concrete implementations of active record and OR-mapper:
  • NHibernate

  • Linq To SQL

  • Entity Framework

  • Castle Active Record


One concept important to have in mind when comparing many of these options is Persistence Ignorance. This means the degree of which your domain model is freed from having any reference or concerns about how the entity should be persisted. A large degree of Persistence Ignorance means you can model your domain model without any restrictions or interruptive information. Note that this does not correlate with how easy the solution is to use in general.

Manual SQL

Often used with datasets or "manual" OR-mapping.

Pros
- Easy to use

Cons
- Can be hard to handle / loose overview
- You need to write lot's of code
- No indication to the DBA what accesses the database
- SQL-injection
- Query plan caching - if you run a query against SQL Server without using parameters properly, the entire query will be cached, whereas if you use proper parameters the query excluding the parameters will be cached - meaning you can reuse the cached query plan with any number of queries with different parameters.

Stored procedures / Views

Often used with datasets or "manual" OR-mapping.

Pros
- Easier for the DBA to know what accesses the database
- Well known
- Great for batches and reports
- Could be used well as a layer of abstraction
- Security if you can't handle it on the application layer or when several applications must access the database and you need a central handling of it. You should have a good reason for doing this.
- Tuning of queries

Cons
- Can be time-consuming to specify everything as stored procedures
- If business logic starts leaking into your stored procedures you're in problems
- Switch database type. Not usually an issue, but if you are planning on changing databases anytime soon, don't model everything in stored procedures.
- I believe it's more hostile to change than many other options.
- Error messages and debugging is less clear
- (Testing. This is being used by many as a reason against stored procedures. I don't agree with this, becuase stored procedures are really quite easy to unit test, as long as you keep them clear and concise and don't go superdynamic on them)

Dataset

Pros
- Rapid Application Development(RAD) support
- Very quick to get up
- Support throughout the .NET framework
- Microsoft product. Easy to get a customer into
- Quick to get going, can be sure that everyone involved in a project will know of it.
Cons
- Scales very poorly with complexity
- Often need to live with utility-classes for functionality
- The database schema often leaks down into the code, frequently even into the GUI-binding. Try changing anything there..
- Not exactly an object oriented approach
- Everything must be casted (stored as object)
- String-based access to table and column-names (Unless you're using typed datasets)
- Since they're so easy to use and can be filled anywhere, some solutions even fills them in the codebehind. Oh beware

Active Record

Row object (entity) with domain logic and knowledge of how to persist itself. 1-1 mapping towards a database table.

Pros
- Easy to get started
- Not duplicated mapping of property names as in OR-mapping with mapping files
- Mapping against database in object. Nice to have everything in one spot
- Centralised connection to the database in one layer
- Always works with entities. Automagic queries to the database

Cons
- Works well with a uncomplicated schema (1:1). As soon as your domain model won't map well with the database schema, Active Record won't do for you.
- Data centric. Active Record says your objects need to be the same as your database tables. In simple approaches this works well. But as I mentioned briefly in the beginning, what you care about when you model the database is often very different from what you want and need when you model the domain model. Thus an active record approach can produce a domain model which is unclear and bothersome to work with.
- Hard for the DBA to tune queries
- Not big on Persistence Ignorance (PI)
- Testing can be an issue because of the point above

OR-mapper

Automatic mapping between database and objects through mapping files or attributes

General
-------
Pros
- Domain centric approach. Can model the domain separately from the database (to some extent)
- Dynamic SQL-generation
- Support multiple databases (At the same time or simplify switching at a point in time)
- Good solutions have support for stored procedures and/or manual SQL in the cases where you need to do things on your own
- Fast development-time once up and running

Cons
- Complexity (will get back to it)
- Performance (will get back to it)


With mapping-files
------------------
Pros
- Separated object and mapping

Cons
- Must duplicate class name, property names etc.
- XML and pure text (Help with GUI-support, advanced plugins)


OR-mapper developed in-house
----------------------------
Pros
- Own functionality
- Debugging can be easier / possible
- Know that everything that affects a project is internal
- Cool for the one doing it

Cons
- Time-consuming
- Need to redevelop something already developed a myriad of times
- Noone will continue development and testing of your solution outside of the project/organisation
- Anyone new to the project won't know the solution

OR-mapper - commercial/open source
----------------------------------
Pros
- Relatively short startup time (compared to creating it yourself, not compared with most other approaches)
- Advanced functionality available
- Well tested in projects and scenarios
- Free bugfixing and new releases
- New project members might know the solution
- User group with knowledge of its use

Cons
- Does take time
- Can't debug on errors (Might be possible)
- (Generally) limited by the products limits
- Learning curve
- If it's not Microsoft, it might be an issue in your organisation (hope not)
- What if the product go out of production/stops further development?

Code generation

Generate data access and domain objects from database

I don't know this area too well myself, so I'll be short here

Pros
- Advanced generation scrips
- With so many solutions available, there must be something good?
Cons
- Data-centric
- Many of the same cons as with OR-mapper, in terms of complexity etc.



And then it's the concrete implementations:

NHibernate

- Most known and mature
- Port from Hibernate, well known and have existed for a long time
- Large crowd of followers
- Domain centric
- Functionality
-- Mapping files
-- Inheritance (all three approaches)
-- Stored procedure support
-- All types of relationships (1-1, 1-*, *-*)
-- Caching
-- Support multiple databases
-- LINQ to NHibernate
-Persistence Ignorance (PI). Only need to set everything virtual and have a default (any visibility) constructor

I believe it's the de-facto OR-mapper in .NET at the moment.

Entity Framework

MSDN: "Designed to provide strongly typed LINQ access for applications requring a more flexible object-relational mapping, across Microsoft SQL Server and third-party databases"

- Enterprise solution
- Several layers of mapping (Conceptual, Storage, Mapping)
- Is supposed to be more than an OR-mapper (Don't deliver on this in V1)
- Poor in terms of Persistence Ignorance
- Not very mature

But I'm positive to Microsofts entry into this arena. The fact that they don't believe datasets can solve every problem in the world anymore is a big step forward. Also, they have taken all the criticism to heart, and have started fixing many of the shortcomings for v2. They also have a blog describing the work and asking for feedback, as well as an expert group following the progress. That is good stuff. Still hope I don't have to work with it yet becuase of some fancy marketing slide though. But hopefully it does become a good contender for the OR-mapping crown, what could be better than better tools?

LINQ to SQL

MSDN: "Designed to provide strongly typed LINQ access for raplidly developed applications across the Microsoft SQL Server family of databases."
- Good for RAD-development
- Support only direct 1:1 mapping towards the database
- Only table-per-hierarchy inheritance
- Only attribute-mapping when you use the 08 designer
- Only SQL Server (Though this apparently wasn't becuase of any technical challenge.)

Castle Active Record

Pros
- Simple to get started
- Created on top of NHibernate, taking advantage of all it has to offer, as well as making it much simpler to get started
- Don't need to specify column name in the attribute if it is the same as the property name

Cons
- Standard problems with PI for Active Record
- Common to inherit from base class (although you don't have to)
- Data-centric
- 1:1

I created a couple of diagrams to help compare the four approaches in terms of Peristence Ignorance and OR-mapping in general:



Friday, August 22, 2008

When is refactoring unappropriate?

In a recent meeting, our own self-proclaimed agile guru (not saying I disagree!), Odd Helge Gravalid, came up with a good suggestion on when refactoring can be really troublesome.

If you are working on a "legacy project", a system in production, lot's of ugly code, no structure - in other words refactoring heaven, refactoring can actually be dangerous. Say you do refactor lot's and lot's of the code, and at the same time, someone else need to do a fix and they bring in a bug in the code. If this is patched into the live system, and you try to find out what went wrong, you have a problem.

Why? Often you can simply start out by doing a code-comparison in these cases, to see what has been changed. Try doing that after a big refactoring (Refactoring here normally includes removing unused code, etc. Not really refactoring, but for lack of a more complete word....)

I'm not saying that you shouldn't refactor in these cases. Just be aware of the risks involved. And hopefully you do write a complete test suite for the code before you refactor it, making it a lot easier to track down any problems with both the refactoring and any new functionality.. (Remember: Typemock + legacy code testing = true)

Thursday, August 21, 2008

MSTest ExpectedException doesn't support System.Exception

This isn't an informative post, as I like to keep it, but MS just got me really annoyed.

The MSTests ExpectedException attribute, where you put in the exception you expect will be thrown, does not support System.Exception, only derived exceptions!

How crazy is that?

Obviously you should try to have as specific an exception as possible (And I follow that, must be why I haven't seen it before), but it's perfectly right to throw a System.Exception if a more detailed is not possible. Hey, it's even the one MS praises in every .NET 2.0-> certification... (ApplicationException earlier)

Tuesday, August 19, 2008

Refactoring of the persistance-solution in agile projects

I invited my fellow colleagues to a discussion today about a topic I felt could make for some interesting learning: Refactoring of a persistance-solution in an agile project.

By that I mean, when/how/if do you refactor away the solution you have for communicating with the database. The general possibilities we reached was

  • You can drop the whole solution and create a new one after a few years. Evidently this happens a lot more often than you’d expect.

  • You hide the refactoring in a new big task, often accompanied by some cool buzzword

  • You try to isolate the old solution away, and start fresh with new functionality

  • A few more I’ll cover below


I was surprised that few had been in the situation where replacing the persistance-solution was deemed necessary. I don’t have the illusions that any non-developer (-strong) organization could see the real value in doing something like that, but I thought I’d get more examples of projects where that was the only solution. Say you run the msdn-friendly dataset-strong application which works great in the beginning, gets the GUI up quickly, and have a few initial iterations delivering great functionality and a GUI which rapidly fills with functions. Great stuff! Until you reach a level of complexity which just breaks the design completely down. What do you do then? A few possible solutions:

  1. Explain to the customer that you need an iteration or two to redesign the application to accommodate the newfound complexity.

  2. Quietly include the refactoring within the subsequent iterations tasks, far lowering the work delivered to the customer

  3. Postpone everything, plan instead to get around to it once everything quiets down

  4. Have separated things properly, making it easier to change from one solution to another

  5. From the initial phase of the project, settling where we are going and the time it will take, using your experience to choose a better solution, perhaps NHibernate or Castle Active Record, and hopefully your domain-model will never got into this extreme a situation (with contant refactoring)

  6. Hah, the object-relational impedance mismatch is obsolete; object databases will save the world!


First of all, there so no single answer to any of this, as with everything in our business – it depends. It depends very much on what kind of solution you are working on and what kind of environment you are working in.

Obviously number 1 isn’t going to go down well with the customer – “you’ve had such a good flow so far, just keep it going” is not a surprising answer. And that’s quite understandable, and really correct in many cases. You should have applied constant refactoring to your solution, enabling you to steer clear of that situation. Of course, if you make a msdn-friendly dataset-app in the first place, you probably know that it isn’t going to handle the most complex of tasks, so you’ll never get into this situation. I bet it does happen though, and has happened more than I can imagine (I hope not).

Number 2 is the one you should have been doing all along, except now it will far halt the current work. Perhaps that is okay. You could get to an 80% finished solution quickly, and ask if that is ok. If it is, then the customer will have gotten his money worth of application quickly. If not, you can be clear to the customer that to get this and this, it will cost exponentially more than it did before. Cause we all (should) know the Pareto principle, or 80-20 rule, 80 percent of the job takes 20 percent of the time (or cost), and vice versa. Again, the customer might not be too happy about that either.

Number 3: postpone doing anything until later. You’re already in a pretty dire situation, so postponing is not likely to do much to improve on that. If you don’t know what to do, then I guess just keep doing the same is probably the best – but if you get to that point, I don’t want to have anything to do with you anyway :)

Having separated things properly (4) will give you a good basis when you get into this situation. Hopefully it won’t be too costly to make the change to tougher concurrency-issues, more advanced business logic, etc.

As long as you didn’t go with a big design up front, but leveraged experience to choose a clever framework to start on, good for you. I currently fall very easily into the Domain Driven Design with NHibernate/Active record-camp (5), using a proper domain model and everything. Not nearly as fast as the dataset approach, but more applicable in the situations I’m usually in (The dataset-approach can be great though, don’t overdesign a simple application – there’s nothing ¬with more framework support from top to bottom in .NET than datasets!)

Slightly sarcastic with the object database-title here, I just have a feeling someone who falls for one without having much experience with the other side of the table (relational), could possibly proclaim it as the new(ish) silver bullet. Now I’d love to try using an object database, because it seems you do get around many of the issues you need to consider and constantly work on with the traditional relational database approach. However, being in the real world, you mostly have to live in an existing business environment, where relational databases are the big G. Converting the object database at the end of the development cycle is a possibility, but beware of pitfalls, some intelligent colleagues found quite a few (Perhaps they could write about it soon?).

You should always be aware of the total cost of ownership (TCO) of course, knowing that the initial development of an application is only a small part of the entire cost of it, maintenance being the biggest thief. But you know, in the real world, the one funding or driving the application development isn’t necessarily the one paying for the maintenance, so… Hopefully you’ll have someone who has a bit more professional integrity than that, but it’s no wonder shortcuts on that side are taken.

One of my colleagues, Sverre Hundeide, came up with a nice suggestion. The contents should be clear to all, but the conclusion has value. An OR-solution has a few possibilities in how it interferes with your system

  • Persistance ignorance - Nothing in your domain-model assembly references anything concerning the persistance solution used

  • POCO – For me, this is the same as above, but he means this has been defined as almost the same, except there can be some references in the assembly, extra metadata, etc.

  • IPOCO – The domain classes can inherit from a base class used for persistance

  • Code-generation – The domain and persistance code is generated from the database, often highly coupled.


Basically, the higher you are on the list, the easier it is to change to another solution. Simple, but an important thing to have in mind.

I guess it’s time for the conclusion now. Or it should have been anyway, except I’m not quite sure what I covered. Just Another blog post with a lack of well arranged contents. You’ll have to make do for now :)

Friday, August 15, 2008

SQL: Convert list of values to comma-separated list

Ever had a list of values in SQL, but wanting them in a comma(or anything)-separated list?

Just do something like this:

SELECT c.co_companyName,
substring(List, 1, datalength(List)/2 - 1) AS 'Employees'
FROM co_Company c
CROSS APPLY (
SELECT e.em_employeeName + ',' AS [text()]
FROM em_Employee e
INNER JOIN co_Company c2
ON c2.co_Id = e.em_co_Id
ORDER BY 1
FOR XML PATH('')
) AS Dummy(List)


If you had a a company-table and an employee-table, you'd end up with a list of the companies in the table, with a column with a comma-separated list of all the employees.

By the way, apologies for how absurdly ugly this looks as plain text.