Information is king: January 2010

Saturday, January 23, 2010

Changes in software – Different developers over time for a long-running project

To change software you need to understand it. You need to understand the exact piece of code you are changing, you need an overview of the system to be aware of any dependent parts, and you need to understand the technologies in use. In the real world you can also end up with a few more difficulties. You need to understand why things work the way they do. Will the changes you are making break existing logic? Is there any clear documentation of how things work, and will you need to update it? Can you trust the documentation that does exist?

Once you have a "long-running project" you get the added difficulty of handling information stored only in a developer's or business specialist's head. Your best option is to have the same developers working on the project for its lifetime. That can be challenging in terms of getting the right people in on it. When that's not possible you need to make sure you don't end up in a situation where the knowledge is lost.

The state of the actual code is perhaps the most important. Documentation will always be hard to keep up to date, and the best description of what you have is your code. It's essential that it is well structured and readable to make changes possible. Refactor when it helps clarify the code, or when your understanding of the problem or domain changes. Write small, well-named automated tests that each verify a single piece of behavior and explain why and in what context. In this context, the tests will help clarify how a piece of code works and is intended to be used (Do you know what AAA is? If not, it's about time you do).
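As a minimal sketch, assuming AAA refers to the Arrange-Act-Assert layout for tests, here is what small, well-named tests that each verify one behavior could look like. The `Order` class and its methods are hypothetical and exist only for this illustration.

```python
import unittest


class Order:
    """Hypothetical domain class, invented for this illustration."""

    def __init__(self):
        self.lines = []

    def add_line(self, price, quantity):
        self.lines.append((price, quantity))

    def total(self):
        return sum(price * quantity for price, quantity in self.lines)


class OrderTotalTests(unittest.TestCase):
    def test_total_is_sum_of_price_times_quantity_for_each_line(self):
        # Arrange: build the context the behavior depends on.
        order = Order()
        order.add_line(price=10, quantity=2)
        order.add_line(price=5, quantity=1)

        # Act: perform the single action under test.
        total = order.total()

        # Assert: verify one piece of behavior.
        self.assertEqual(total, 25)

    def test_total_of_empty_order_is_zero(self):
        # Arrange
        order = Order()

        # Act
        total = order.total()

        # Assert
        self.assertEqual(total, 0)
```

The test names double as documentation: a new developer can read them to learn how `Order` is meant to behave without opening the class itself.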

Consider increasing the truck factor of your code - meaning how many people must be hit by a truck before a piece of information is lost. This has a lot to do with how you handle code ownership. Do individuals only make changes in their own piece of the code, are there module leaders that supervise changes to parts of the system, or do you have a form of collective code ownership (XP's view on collective ownership: http://www.extremeprogramming.org/rules/collective.html)? As with every part of software development, there is no answer that is correct in every situation, but strong individual ownership is the most likely to cause problems in my view. The truck factor is extremely low, and a sort of "me versus them" mentality can lead to things falling between two chairs. Few developers knowing a piece of code means fewer can change it if problems arise, and different parts of a system can end up being developed quite differently.

Documentation is another area. Do you document at all? Only high level requirements? Sequence or collaboration diagrams? Is the documentation kept up to date? Are any business decisions hidden in mail discussions, in meeting minutes, or in developers' or business users' heads? This is such a tough area. I haven't quite made up my mind about this yet - but I do know one thing - keeping documentation truly up to date is extremely challenging. You can certainly try to document everything, but is it worth it in terms of ROI?

When a developer does choose to leave a project, make sure you work to capture as much knowledge as possible. Hopefully most of the developer's knowledge is already shared with the team, but try to capture as much of the rest as you can. It will cost a lot more to try to understand something later on.

Sunday, January 17, 2010

Changes in software - One of the biggest challenges of software development

Changes to software have been a problem since the dawn of computers, and they are still a challenge with every piece of software developed. Perhaps the biggest part of creating software is changing or expanding existing code. So handling changes is something we have to deal with all the time. It's the maintainability aspect of software development. The problem is that there are so many challenges involved, so many things that can go wrong. But do you know what the worst problem is? The fact that a lot of developers have very little focus on the issue in general, including the reasons behind it.

I tried to come up with a few off the top of my head, and easily listed almost thirty (even without going into nitty-gritty coding issues). No wonder it's hard to handle. There's no silver bullet to fix many of these issues, but as with everything else in life - you need to know about a problem to be able to do something about it.

Here are a number of things that affect and complicate changes in software:

Project

Many different developers over time
Legacy codebase
External system dependencies

Person

Developer with limited understanding of code in general
Developer with limited understanding of the project's code and architecture
Unmotivated developer
Changes done just before end of day/end of week/end of year
Different quality of developers
Misunderstanding the time it takes to write quality code
Not thoroughly understanding a problem

Code

The danger of quick fixes/hacks
Lack of understanding of how software rots
Unreadable code
High coupling in the system. Making a change in code that is used in many places carries a higher risk.
Complex architecture (Hard to understand or wrong for the problem at hand)
Duplicate code

Business

Illogical business logic
Unavailable business people

Process

Time (Quality - if time permits)
Bug acceptance
Developer not feeling the pain of fixing production bugs
Task-switching developer
Picking up problems caused by changes
Lack of testing
Big bang releases / seldom releases
Individual code ownership
Refactoring / Not refactoring
Adding additional features at end of test/iteration/project

This post would get way too long if I were to cover all the reasons in detail, so I'm rather going to cover various individual ones in separate posts.

Did you miss something on the list?

Thursday, January 14, 2010

Real programmers don't comment their code. If it was hard to write, it should be hard to understand

Now that's just a plain lie. Real programmers do comment their code - though only certain parts of it.

First of all, an important rule: Your code should be so readable that you don't need comments to understand what it does.

Comments are a problem because they
- might not get updated when the code changes.
- become an excuse to write unreadable code.

If you can't trust that the comment is correct, you'll have to go into a method to understand what it does. Once you do that you might have to step even further down into other methods, and soon you'll forget what you were doing initially.

Justifying unreadable code with comments is heresy. You should not need to write comments to explain what a piece of code does. What do you need to do if you have code that does need additional comments? Refactor! Often the text you would put in comments can be used as method names instead. Don't forget Joshua Kerievsky's tip in Refactoring to Patterns:

"What is the preferred size of small methods? I would say ten lines of code or fewer, with the majority of your methods using one to five lines of code. If you make the vast majority of a system's methods small, you can have a few methods that are larger, as long as they are simple to understand and don't contain duplication."

Don't be afraid of small methods with a readable signature. You have IntelliSense to help finish method names, and the performance hit of having many methods is negligible, so neither of those arguments holds any value. Does it feel like your classes end up with too many methods? There's a good chance they're doing too much. Remember to use the Single Responsibility Principle on both classes and methods - they should be responsible for doing only one thing, and should have one, and only one, reason to change.
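To illustrate turning comment text into method names, here is a small before-and-after sketch. The `Customer` type, the age and membership thresholds and the discount rate are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical customer record, invented for this illustration."""
    age: int
    years_as_member: int


# Before: a comment has to explain what the condition means.
def discount_before(customer: Customer) -> float:
    # Seniors and long-standing members get a discount
    if customer.age >= 67 or customer.years_as_member >= 10:
        return 0.10
    return 0.0


# After: the text of the comment has become method names.
def is_senior(customer: Customer) -> bool:
    return customer.age >= 67


def is_long_standing_member(customer: Customer) -> bool:
    return customer.years_as_member >= 10


def qualifies_for_discount(customer: Customer) -> bool:
    return is_senior(customer) or is_long_standing_member(customer)


def discount_after(customer: Customer) -> float:
    return 0.10 if qualifies_for_discount(customer) else 0.0
```

Both versions behave the same. The second one cannot drift out of sync with a comment, because there is no comment left to forget.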

When are comments appropriate?
- At major boundaries, for instance with libraries or at service contracts, giving context or similar.
- Clarification or expressing intent, if you fail to do it in code
- Warnings or particular notifications
- Regular expressions, if you have to use them
- TODOs, sparingly.
- When you have no other options.

There will always be some need for comments in your code. The rule should be to not use them, but if you have a good reason - go right ahead.

I think Uncle Bob says it well in Clean Code:

"The proper use of comments is to compensate for our failure to express ourself in code. Note that I used the word failure. I meant it. Comments are always failures. We must have them because we cannot always figure out how to express ourselves without them, but their use is not a cause for celebration. [...] Every time you express yourself in code, you should pat yourself on the back. Every time you write a comment, you should grimace and feel the failure of your ability of expression."

Sunday, January 10, 2010

Object Orientation revisited – Business layer patterns

Object orientation is a fascinating topic. It is one of the core concepts for a large share of developers today, yet the understanding of it and how much it is included in our day-to-day work vary greatly, even though many believe they all do the same thing. 'Cause everyone is working object oriented with an object oriented language, right?

We had a day back in December at Objectware where we revisited object orientation, and its implications for our work. The day was structured with some presentations first, a good deal of group work, a reflection on the work, followed by a general discussion in the end. We had a continuous focus on discussions and experience sharing throughout. I started off with a session on general everyone-should-know information, more of a recap or introduction if you will, and then brought the focus to the use of object orientation in structuring our business logic, centralized or delegated design, and so on. A few colleagues of mine followed on with topics concerning cohesion and coupling, information hiding and composition vs inheritance.

I thought I'd share some of the content of my presentation here. It's certainly not a new topic, it has been blogged and written about countless times, but I still meet a lot of variation in knowledge in this area.

First of all - how is this valuable to you? Knowing how to structure your business logic both in various forms of applications, but also varying internally in an application, is paramount in ending up with an application that doesn't fight you continuously. In designing every application we want to:

Create an architecture that meets all technical and operational demands
AND that solves all quality attributes like performance, security and maintainability well

The architect should set standards concerning the business logic layer, but I've seen more than one architecture transform into something very different because of varying knowledge on this. It's always important to know the main differences so you can make informed decisions. This is one of those areas that most software developers should know - especially those that work on applications containing more than a trivial amount of business logic.

Introducing OO

Object orientation is about core principles like abstraction, composition, inheritance, encapsulation, polymorphism and decoupling. Several things can be said about this style of programming, like

”Division of responsibilities for an application or system into individual reusable and self-sufficient objects, each containing the data and the behavior relevant to the object.”

”An object-oriented design views a system as a series of cooperating objects, instead of a set of routines or procedural instructions.”

In general it can be said that Object Orientation should be considered when:

”You want to model your application based on real world objects and actions, or you already have suitable objects and classes that match the design and operational requirements. The object-oriented style is also suitable if you must encapsulate logic and data together in reusable components or you have complex business logic that requires abstraction and dynamic behavior.”

(All quotes from Microsoft Application Architecture Guide, 2nd Edition)

And you continuously work to

find the right abstraction of the real world for your problem
find fitting objects
find the right place to put code
rework the code at all times to make sure it is correct based on your current understanding
favor low coupling
limit duplication
make each object work on one thing only - ensuring high cohesion
work to have a test friendly design
and generally try to keep your code as to-the-point as possible, all the time ensuring that adding further behavior and extending the current functionality is as painless as possible (without over engineering of course :) )
and certainly much more than this..

How you structure your business layer certainly has a say in how well and how easily you can achieve these things.
You traditionally have four main ways of structuring this logic:

Procedurally oriented with Transaction script or Table module.
or
Object oriented with Active record or Domain model.

Let's take a closer look at each..

Transaction script

”Organizes business logic by procedures where each procedure handles a single request from the presentation.”

Martin Fowler, Patterns of Enterprise Application Architecture

Transaction script follows this structure:

Note that this is an extremely simplified example. Using transaction script you certainly utilize all you know about class design and move logic into where it fits best, the main point I'm trying to make is that you have a central place to control the flow. The CreateOrder method controls what logic will happen from start to end.
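A minimal sketch of the idea, not a reconstruction of the original example: `create_order` plays the role of the CreateOrder method, and `OrderService`, `InMemoryOrderTable`, the stock and price dictionaries are hypothetical stand-ins for whatever components a real application would use.

```python
class InMemoryOrderTable:
    """Hypothetical stand-in for a data access component."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        row_id = len(self.rows) + 1
        self.rows.append({"id": row_id, **row})
        return row_id


class OrderService:
    """Transaction script sketch: one procedure per user action."""

    def __init__(self, order_table, stock, prices):
        self.order_table = order_table
        self.stock = stock    # product name -> units available
        self.prices = prices  # product name -> unit price

    def create_order(self, customer, product, quantity):
        # The script controls the whole flow, from start to end.
        # Step 1: validate the request.
        if quantity <= 0:
            raise ValueError("quantity must be positive")

        # Step 2: check that the goods are available.
        if self.stock.get(product, 0) < quantity:
            raise ValueError("not enough stock")

        # Step 3: calculate the order total.
        total = self.prices[product] * quantity

        # Step 4: update stock and store the order.
        self.stock[product] -= quantity
        return self.order_table.insert(
            {"customer": customer, "product": product,
             "quantity": quantity, "total": total}
        )
```

Reading `create_order` top to bottom tells you everything that happens for this request, which is exactly what makes the pattern easy to follow for simple logic.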

If you use this all the way, the business layer will consist of a number of procedures that each implement one action from the user interface. Good design in this context is about minimizing duplication at the same time as meeting all demands. The name has nothing to do with a database transaction, but refers to one monolithic logical operation. It is not uncommon to create one business component per database table.

The upsides are that it's easy to understand and maps well to the flow of a user story/use case. But it breaks down on complicated logic, can be hard to test as it does many things, and can lead to duplicated code.
Although popular in general, in the .NET world it was quickly overtaken in popularity by....

Table module

”A single instance that handles the business logic for all rows in a database table or view.”

Martin Fowler, Patterns of Enterprise Application Architecture

The Table module is built around a table of data. Its perspective is similar to that of Transaction script, except the focus is on a group of data. Operations are often designed as methods on an object that represents a table of rows. Because of that you always use a key or index to indicate the row in use. It is procedural, but has more of an object oriented focus than Transaction script.
Note that you don't need to encapsulate the DataSet in a custom class, it is common to just work directly with the DataSet as well.
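A rough sketch of the shape of a table module, using a plain dictionary of rows as a stand-in for a DataSet. The `OrderTable` class and its column names are hypothetical.

```python
class OrderTable:
    """Table module sketch: ONE instance handles the logic for ALL rows."""

    def __init__(self, rows):
        # rows maps an order id to a dict of column values,
        # playing the role a DataSet/DataTable would play in .NET.
        self.rows = rows

    def total_for(self, order_id):
        # Every operation takes a key to say which row it works on.
        row = self.rows[order_id]
        return row["unit_price"] * row["quantity"]

    def apply_discount(self, order_id, percent):
        row = self.rows[order_id]
        row["unit_price"] = row["unit_price"] * (100 - percent) / 100

    def total_for_customer(self, customer):
        # Logic across many rows lives in the same module.
        return sum(
            self.total_for(order_id)
            for order_id, row in self.rows.items()
            if row["customer"] == customer
        )
```

Note how the methods follow the columns of the table directly: rename a column and every method that touches it has to change.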

Table Module quickly became a success in .NET because of the great tool support available. Putting together a simple read-only application that shows data in grid form literally takes no time because of the focus Microsoft has had on the tooling aspect. Everything from GUI to database works smoothly for this. Generally you can say that RAD applications work great with this approach. And it's a fair compromise if you have little logic and don't need much abstraction over your data model - because there will be a very tight coupling between your data model and every part of the application.

A history lesson

The Table module pattern has many downsides though - even in .NET. If you have complex business logic, it won't cope with that very well. It's very data driven, and doesn't focus on the business side of things. You will end up with an application that is very tightly coupled to the database. It is poor at indicating relationships between objects, and polymorphism isn't exactly available. As long as you keep to the code generation and wizard support available, you'll be all right, but it can get complicated if you can't rely on that.

Microsoft had for many years an almost exclusive focus on this pattern - through its actions basically saying that you could solve any problem with a dataset. Object bigots found various ways around it, often by creating mappers that converted datasets to objects, by cleverly encapsulating datasets or by creating custom data mappers using the data reader directly. The focus since Visual Basic's heyday was on RAD applications, supporting novice developers, and not all that much more (Patterns & Practices shipped Enterprise Library 1.0 in 2005 though). The Microsoft community didn't seem to mature much more either, as there was an almost sole focus on software and tools that Microsoft shipped. From what I have come to understand this made the whole community much less used to object orientation than for instance the Java community became. And that is likely the main reason why the general Microsoft community is still so procedurally oriented.

Luckily things have changed, both in terms of third party tools and to some extent Microsoft's focus. Tools like NHibernate and the Castle Project led the way with support for good object relational mapping and dependency injection tools, as well as MonoRail as an early MVC framework. Microsoft has supported enterprise applications through the work of Patterns and Practices on Enterprise Library. However, simplicity and Enterprise Library have never been used in the same sentence. In recent times though, Microsoft has begun focusing on object relational mapping through LINQ to SQL and Entity Framework, its own MVC framework, ASP.NET MVC, its own IoC container, Unity, and continued support for composite UI applications with PRISM for WPF and Silverlight (following the Composite UI Application Block for WinForms).

So the tide has to some extent turned in the Microsoft community, and the support for the two remaining patterns is certainly improving continuously.

But let's take a closer look at the object oriented patterns:

Active Record

”An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data.”

Martin Fowler, Patterns of Enterprise Application Architecture

The example above uses Castle Active Record. As you can see, the object encapsulates one database row in principle, and the object contains both data and domain logic. It includes logic to directly manipulate the database, like Save or Delete, and also includes static methods that work on the entire collection, like Get, or GetOrdersCount.

The flow of logic is different in Active Record than in the two previous patterns in that the object contains the logic internally, making use of encapsulation and achieving higher cohesion. Instead of having the service dictate how the logic should flow, you ask the object to perform some domain logic, and although not shown here, it typically delegates responsibility to other objects.
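The shape of the pattern can be sketched as follows. This is not Castle Active Record and does not reproduce the original example; it is a hand-rolled illustration on top of Python's built-in `sqlite3` module, with a hypothetical `orders` table and `Order` class.

```python
import sqlite3


class Order:
    """Active Record sketch: one object wraps one row and persists itself."""

    # A shared in-memory database, created when the class is defined.
    _conn = sqlite3.connect(":memory:")
    _conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, amount REAL, shipped INTEGER)"
    )

    def __init__(self, customer, amount, shipped=False, id=None):
        self.id = id
        self.customer = customer
        self.amount = amount
        self.shipped = shipped

    # Domain logic lives next to the data...
    def ship(self):
        if self.shipped:
            raise ValueError("order already shipped")
        self.shipped = True

    # ...and so does the persistence logic.
    def save(self):
        if self.id is None:
            cursor = self._conn.execute(
                "INSERT INTO orders (customer, amount, shipped) VALUES (?, ?, ?)",
                (self.customer, self.amount, int(self.shipped)),
            )
            self.id = cursor.lastrowid
        else:
            self._conn.execute(
                "UPDATE orders SET customer = ?, amount = ?, shipped = ? "
                "WHERE id = ?",
                (self.customer, self.amount, int(self.shipped), self.id),
            )
        self._conn.commit()

    def delete(self):
        self._conn.execute("DELETE FROM orders WHERE id = ?", (self.id,))
        self._conn.commit()
        self.id = None

    # Class-level methods work on the entire collection.
    @classmethod
    def get(cls, order_id):
        row = cls._conn.execute(
            "SELECT id, customer, amount, shipped FROM orders WHERE id = ?",
            (order_id,),
        ).fetchone()
        if row is None:
            return None
        return cls(customer=row[1], amount=row[2], shipped=bool(row[3]), id=row[0])

    @classmethod
    def get_orders_count(cls):
        return cls._conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The caller asks the object to do the work (`order.ship()`, `order.save()`) instead of a service pulling the data out and deciding what happens next.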

Active Record is a useful pattern for two reasons: simplicity and tooling support. It's quite easy to understand, and as long as you use frameworks like Castle Active Record or LINQ to SQL, it's also fairly easy to use. For simple object models it works great, and as long as it's OK to have the object model closely mimic the data model, Active Record is a good choice.

There are a few things you should consider before choosing Active Record though. Because of its close connection to the data model, you have very limited support for designing your object model separately from the database. If you have a need for that, you should skip ahead to the next pattern. It's also a problem that it mixes responsibilities. The objects hold domain data and methods, but in addition you have attributes for mapping to the database, CRUD operations and static methods that work on the entire collection. And it sure isn't Persistence Ignorant (PI).

This is a popular pattern as well, and even though it seemed Microsoft tried to kill (http://bit.ly/8VkYnW) LINQ to SQL to avoid supporting it in addition to Entity Framework (more on this tool later), its popularity seems to have saved it for now.
If Active Record doesn't quite do it for you, and your complexity is high enough, you should take a closer look at the next pattern:

Domain Model

”An object model of the domain that incorporates both behavior and data”

Martin Fowler, Patterns of Enterprise Application Architecture

The domain model pattern is the one that best supports object orientation. The point is to separate the domain logic into classes that are only concerned with modeling the domain and the corresponding rules well. The model classes should not know how to persist themselves (as in Active Record) and shouldn't be coupled to infrastructure logic. The classes should be POCOs (Plain Old CLR Objects), enabling a higher abstraction from the data model. This also means that the business logic of the domain model is simple to test, and you can easily get high coverage on this most important area without worrying about infrastructure and the like.
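As a rough illustration of persistence ignorance, here is a sketch in Python rather than .NET. The `Order`, `OrderLine` and `InMemoryOrderRepository` classes and their rules are hypothetical; in a real application an OR/M-backed repository would take the place of the in-memory one.

```python
class OrderLine:
    def __init__(self, product, unit_price, quantity):
        self.product = product
        self.unit_price = unit_price
        self.quantity = quantity

    def subtotal(self):
        return self.unit_price * self.quantity


class Order:
    """Plain object: no base class, no mapping attributes, no save()."""

    def __init__(self, customer):
        self.customer = customer
        self.lines = []
        self.submitted = False

    def add_line(self, product, unit_price, quantity):
        # Business rules are enforced inside the model itself.
        if self.submitted:
            raise ValueError("cannot change a submitted order")
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.lines.append(OrderLine(product, unit_price, quantity))

    def total(self):
        return sum(line.subtotal() for line in self.lines)

    def submit(self):
        if not self.lines:
            raise ValueError("cannot submit an empty order")
        self.submitted = True


class InMemoryOrderRepository:
    """Persistence is a separate concern that the model knows nothing about."""

    def __init__(self):
        self._orders = {}

    def add(self, order):
        order_id = len(self._orders) + 1
        self._orders[order_id] = order
        return order_id

    def get(self, order_id):
        return self._orders[order_id]
```

Because `Order` has no dependency on a database, its rules can be tested by simply creating objects and calling methods.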

Domain model is the most complex to use, mostly because of the cost of mapping to the database (because of the impedance mismatch), and the complexity of new ways of thinking: disconnected objects, lazy loading, less direct SQL to tweak and, concerning the next major point of this blog post, unfamiliarity with delegated control. This complexity is biggest on first usage, and drops in subsequent projects.

Domain model shouldn't be used in all scenarios. Use it when you have a complex domain with complex business rules and when you want to be able to model the domain model free from database limitations. It is more complex to master than the other patterns, but in the right scenarios has great strengths.

To handle the persistence of your domain model you use an Object Relational Mapper (OR/M). This includes an API for CRUD operations, mapping between the data model and the domain model, and a query model with an associated language. The mapping is usually done either via external XML mapping files or some kind of fluent interface, where you state how the model maps to the database. A lot can be said about Object Relational Mappers, but this is not the place. One thing though - various people object to using domain model and object relational mapping for the wrong reasons. Be skeptical about objections concerning security, performance and SQL injection. More on that another time.

Much has been said about domain models. In recent years, the most influential books on the topic have been Martin Fowler's Patterns of Enterprise Application Architecture and Eric Evans' Domain Driven Design.

Centralized or delegated control

If you take a look at sequence diagrams of logic designed with the two procedural patterns, transaction script and table module, compared to the object oriented ones, active record and domain model, you will see two quite different information flows. The procedural ones use a centralized control style, whereas the object oriented ones use a delegated one.

The logic for transaction script or table module is controlled from one location. The script knows which steps the transaction needs to take to perform the task, and asks appropriate helper classes to solve each step. A sequence diagram will show you information going into helper classes, often with single parameters or a form of DTOs, and back again, then into new classes, and so on. You have a central point of control.

Active record and domain model work in a different way. Here the responsibility is typically delegated to one or more objects, which in turn delegate responsibility to other objects. A sequence diagram will show you a flow going into an object, then delegated into other objects, and so on, instead of going back and forth from the central location. In this way you have a delegated form of control.
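The two control styles can be contrasted in a small sketch that calculates the same order total both ways. The `Product`, `OrderLine` and `Order` classes, the tax rule and the function name are all hypothetical.

```python
class Product:
    def __init__(self, name, price, taxable):
        self.name = name
        self.price = price
        self.taxable = taxable

    def price_with_tax(self, tax_rate):
        # The product decides how tax applies to itself.
        return self.price * (1 + tax_rate) if self.taxable else self.price


class OrderLine:
    def __init__(self, product, quantity):
        self.product = product
        self.quantity = quantity

    def total(self, tax_rate):
        # The line delegates pricing to its product.
        return self.product.price_with_tax(tax_rate) * self.quantity


class Order:
    def __init__(self, lines):
        self.lines = lines

    def total(self, tax_rate):
        # Delegated control: the order asks each line, which asks its product.
        return sum(line.total(tax_rate) for line in self.lines)


def calculate_order_total_centralized(order, tax_rate):
    # Centralized control: one procedure pulls the data out of the
    # objects and makes every decision itself.
    total = 0
    for line in order.lines:
        price = line.product.price
        if line.product.taxable:
            price = price * (1 + tax_rate)
        total += price * line.quantity
    return total
```

Both give the same answer. In the centralized version you can read the whole rule in one place; in the delegated version each rule sits with the object it belongs to, so a change to how a product is taxed touches only `Product`.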

I think the procedural, or centralized, control style is more common in .NET than in Java. The main reason is the support and focus that Microsoft has had.

A research paper was published in IEEE Transactions on Software Engineering, called "Evaluating the Effect of a Delegated versus Centralized Control Style on the Maintainability of Object-Oriented Software", where about 150 senior, intermediate and junior developers, including a number of students, participated. The developers had to make various changes in both a delegated and a centralized design. The results:

"The results show that the most skilled developers, in particular, the senior consultants, require less time to maintain software with a delegated control style than with a centralized control style. However, more novice developers, in particular, the undergraduate students and junior consultants, have serious problems understanding a delegated control style, and perform far better with a centralized control style".

And then concluding:

"Thus, the maintainability of object-oriented software depends, to a large extent, on the skill of the developers who are going to maintain it. These results may have serious implications for object-oriented development in an industrial context: Having senior consultants design object-oriented systems may eventually pose difficulties unless they make an effort to keep the designs simple, as the cognitive complexity of 'expert' designs might be unmanageable for less skilled maintainers."

I think the conclusions about the delegated style of control also have a lot to do with familiarity. Since many .NET developers, senior ones included, have limited experience with it, a delegated style can take some time getting used to.

Have I covered everything then?

Since I put the heading in here, I'm sure you already know the answer to the question. There is one pattern I haven't mentioned yet, or an anti-pattern anyway. And a common one at that.

I've mentioned several times in this post how the .NET community has often had a procedural focus. I think that is the reason why this pattern seems to be so popular in .NET-land. Let's have a closer look at what I'm talking about.

Anemic Domain Model

The Anemic Domain Model looks like a domain model, has a rich structure of objects, but there's almost no behavior in the objects. The logic is typically controlled via a transaction script, with the model being simply data containers.

The examples are Hello World in complexity, but hopefully still capture the main difference between this and a regular domain model. As you can see, the logic has mostly been moved out of the domain model, and into a different class. It's quite common that these are referred to as xxHelper, xxManager, xxHandler, or other general terms.
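A sketch of what the anti-pattern tends to look like, not a reconstruction of the original example. `Order` and `OrderLine` are hypothetical data containers, and `OrderManager` is a made-up name in the xxManager tradition.

```python
from dataclasses import dataclass, field


@dataclass
class OrderLine:
    """Data only: no behavior."""
    unit_price: float
    quantity: int


@dataclass
class Order:
    """Looks like a domain object, but is just a data container."""
    customer: str
    lines: list = field(default_factory=list)
    submitted: bool = False


class OrderManager:
    """All the rules live outside the model, transaction script style."""

    def add_line(self, order, unit_price, quantity):
        if order.submitted:
            raise ValueError("cannot change a submitted order")
        order.lines.append(OrderLine(unit_price, quantity))

    def calculate_total(self, order):
        return sum(line.unit_price * line.quantity for line in order.lines)

    def submit(self, order):
        if not order.lines:
            raise ValueError("cannot submit an empty order")
        order.submitted = True
```

Nothing stops other code from setting `order.submitted = True` on an empty order or appending to `order.lines` directly, so the rules hold only as long as everyone remembers to go through `OrderManager`.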

This (anti-)pattern can be easier to understand for those struggling with delegated control, but that benefit comes at a cost. The other benefits are that you can brag to your friends about having a domain model (which you don't), and whatever benefits you get with transaction script. The problem is that you get the same problems as with transaction script - a challenge with complex code, duplication of code and logic that is harder to unit test. In addition you get the complexity of using an OR/M on top of that.

Conclusion

None of these patterns is always right, and none is always wrong. That's part of the fun of software development. You need to take a close look at what you need to build and then take an informed guess. Take into account the positive and negative sides, and when you have made the choice - make a real effort to diminish the effects of the negative sides.

Wednesday, January 6, 2010

The technical laws of December

Throughout December I had the pleasure of collecting more or less clever statements about computers and software development in general. My Twitter account had a steady flow of these up until Christmas.

In today’s society, we have to use so much of our energy filtering out knowledge, and it is interesting how cleverly written sentences can capture so much information. I’m going to elaborate on several of them in the time to come, but for now, here they are again.

Disclaimer: These aren't my quotes - they've been collected from a variety of places and people, and thus credit is due in many places. I apologize for not quoting correctly.

Dec1 tech law: Walking on water and developing software to specification are easy as long as both are frozen

Dec2 tech law: In theory, there is no difference between theory and practice, but in practice there is

Dec3 tech law: Real programmers don't comment their code. If it was hard to write, it should be hard to understand

Dec4 tech law: Computers are unreliable, but humans are even more unreliable. Any system which depends on human reliability is unreliable.

Dec5 tech law: There is never time to do it right, but always time to do it over

Dec6 tech law: Adding manpower to a late software project makes it later

Dec7 tech law: The degree of technical competence is inversely proportional to the level of management.

Dec8 tech law: The probability of bugs appearing is directly proportional to the number and importance of people watching

Dec9 tech law: If a program is useful, it will have to be changed. If a program is useless, it will have to be documented.

Dec10 tech law: Good enough isn't good enough, unless there is a deadline

Dec11 tech law: An expert is someone brought in at the last minute to share the blame

Dec12 tech law: The chances of a program doing what it's supposed to do is inversely proportional to the num lines of code used to write it

Dec13 tech law: profanity is the one language all computer users know

Dec14 tech law: All's well that ends.

Dec15 tech law: No matter how hard you work, the boss will only appear when you access the Internet.

Dec16 tech law: A meeting is an event at which the minutes are kept and the hours are lost.

Dec17 tech law: An expert is one who knows more and more about less and less until he knows absolutely everything about nothing.

Dec18 tech law: it's not a bug, it's an undocumented feature

Dec19 tech law: A complex system that works is invariably found to have evolved from a simple system that works

Dec20 tech law: The documented interfaces between standard software modules will have undocumented quirks

Dec21 tech law: Bugs will appear in one part of a working program when another 'unrelated' part is modified

Dec22 tech law: When designing a program to handle all possible dumb errors, nature creates a dumber user

Dec23 tech law: Build a system that even a fool can use and only a fool will want to use it

Dec24 tech law: The cleverness of technical laws is inversely proportional to the number of laws

Sunday, January 3, 2010

Walking on water and developing software to specification are easy as long as both are frozen

Developing software is pretty complex, but it’s not exactly rocket science, right? You find out what you want created, you get a number of software developers who have studied how the technicalities work, and after a time you end up with a good, working piece of software. Correct?

It's a shame how few reading this can actually agree with it. Why is that? And can anything be done about it? If you have a few decent developers (they don't have to be that good), a small set of requirements, a business expert they can discuss those requirements with... You know what - you just got yourself the perfect recipe for a good working system. Fingers crossed!

Unfortunately the previous description doesn't map to many real systems. One of the biggest problems is that requirements change. This affects all parts of software development, but no other part can influence the outcome as much as the process used. Back in the day, and still in practice in quite a few places today, people tried to scale the previously described way of developing software no matter the size of the project. You just collect and analyze all the requirements, find a number of developers to implement the system, get one or more architects based on some measure of experience to help design the structure, plan and discuss the requirements, implement it, test it and release it. That works to a certain extent, to a certain size, for certain organizations, but it has many limitations. I'm not going to try to cover those extensively now, but they include:

You've got one shot at coming up with the features for the system. Now guess what happens -> You will try to come up with as much as humanly possible, important or not
You have little idea how the system is actually going to look or feel or just how it can be used -> You will try to make the requirements cover everything. This isn't really a bad thing, but can be quite a complex process.
You price the project beforehand, to move the risk of development to a software company -> Do you think those requirements change sessions will be fun?
Your understanding of what the system can bring you, what you really need it to do, and just how it can do that the best -> Will change!

The whole point of this post has become very well understood in the industry - requirements change, and you'll have to cope with that one way or another. If you want a useful system that is. A number of smart people put their heads together back in 2001 and came up with the "Manifesto for Agile Software Development", and a number of principles behind it.

What was the cure they came up with? Besides a few other good points, it was the formalization of an old principle - tackle complexity by dividing a problem into smaller, manageable pieces. (Note: The original paper on the Waterfall model did cover the importance of a certain amount of iterative behavior, but I guess it was lost to quite a few people on the way.) If you’re trying to tackle all problems at once – or in other words not developing software in an iterative manner, make sure you have very good reasons for it. Unless the project is small, with clearly defined knowledge and boundaries, you’re going to have a hard time producing the system you really need. Do you do it for control? There is no real form of control in software besides working software.

Is process the only thing you have to think about when handling change in software development? Not by a long shot, but it is an important part of it. I will cover more areas soon.
