Don’t document your process!

Joe perfects his process for the perfect date

Yesterday, a Slashdot article asked an age-old question:

One of the worst problems [in my company] is a lack of process documentation. All knowledge is passed down via an oral tradition. Someone gets hit by a bus and that knowledge is lost forevermore. Now I know what I’ve seen in the past. There’s the big-binder-of-crap-no-one-reads method, usually used in conjunction with nobody-updates-this-crap-so-it’s-useless-anyway approach. I’ve been hearing good things about company wikis, and mixed reviews about Sharepoint and its intranet capabilities. And yes, I know that this is all a waste of time if there’s no follow-through from management. But assuming that the required support is there, how do you guys do process documentation?

This question seems to come up over and over again. The funny thing is that it almost always leads straight to a long discussion of techniques for gathering process documentation, and then a discussion of mechanisms for storing it. That’s the question I think the reader thought he was asking: how to “copy down” the process by looking at how the people on the team build the software and putting it into a “complete” set of documents, and whether to use a wiki, Sharepoint, a version control system or some other repository to hold those documents. And the discussion that followed from that Slashdot post should be pretty familiar to anyone who’s tried to solve this problem in real life. Some people talk about systems to store documents, others talk about the virtues of keeping them up to date, there’s a healthy dose of “write down what you’re currently doing” or suggestions for incident logging, an apparent epidemic of bus drivers who have it in for the one guy who knows how everything in the company works, and lots of talk of cataloging, updating and verifying.

I’ve been through that before, and I’ve bought into many of those ideas in the past. And you know what? It wasn’t particularly useful. I know that in many process circles, that’s a heretical idea. In fact, if you’d told me back in 1999 that it’s not useful, I would have laughed at you. Of course you’re supposed to start by documenting the complete process! How else do you know what you’re improving? There seems to be an unspoken rule that we’re supposed to be striving for a fully documented, constantly improving process. And in some shops, that does make sense. But it’s a very, very hard thing to do, and in practice it’s almost impossible to put in place from the ground up.

This is a really hard thing for a lot of software people to accept. Our nature, as programmer types, is to strive for complete systems. When we build software, we try to handle every possible special case. We’re overly pedantic and literal; it’s like exceptions and missing cases just get under our skin. So when we’re presented with the problem of how to improve the way a team or a company builds software, the first thing we want to do is come up with a system that describes the complete process for building software, mapping out every possible special case and exception that a project might run into. And it makes sense that we’d want to test that process to make sure it’s accurate, correct anything that turns out to be wrong, and put everything in a repository that gives us complete access to the one true way that we build software.

The problem is that people don’t really work that way. Process engineering suffers from a serious problem: it seems simple when you think about it in the abstract, but once you start trying to document precisely and completely how a team builds software, you run into an enormous number of special cases. Architecture is always finished and signed off before coding begins, right? Oh, wait, except for Joe’s project, where we did 30% of the architecture and started building the code for that while Louise and Bob worked on the next piece of architecture. Oh yeah, and then there’s that project that’s going to be broken into three phases, and we don’t really know how the third part is going to work.

I’ve seen that pattern many times, and it usually plays out in one of two ways. Either you end up with really general documentation that lays out a very general process that’s trivial to follow but doesn’t really provide any useful guidance (like a big chart that shows that testing comes after coding, which comes after design), or you end up with a tangled mess of special cases and alternative paths that seems to get updated every time there’s a new project. Both of those technically fulfill the goal of process documentation — which is great if your job was to document the process. But neither is particularly useful if your goal is to actually build better software.

There’s an easy solution to this problem: don’t document your software process. Or, at least, don’t start out by documenting the complete software process. Instead, take a step back and try to figure out what problems you’re facing. What about your process needs to be fixed? Do you have too many bugs? Do you deliver the software too late? Does your CFO complain that projects are too expensive? Do you deliver a build to your users, only to have them tell you that it looks fine and all, but wasn’t it supposed to do this other thing? Those are all different problems, and they have different solutions.

That’s what Jenny and I teach in our first book, Applied Software Project Management. We call it the “diagnose and fix” approach: first you figure out what problems are plaguing your projects, and then you put in very limited fixes that address the most painful problems that keep you from building better software. People don’t just wake up one day and say, “We’ve got to totally change the way we build software.” They don’t start documenting the software process because things are going just fine. People hate change, and they don’t start making changes to the way they build software unless they have a good reason. So look for that reason, find the pain that hurts the most, and make the smallest change that you can to fix that one problem without rocking the boat. Then find the next most painful thing, and put in the smallest change that you can to fix that. This is something you can keep doing indefinitely, in a way that doesn’t disrupt those parts of your projects that are working just fine. Because the odds are that there are plenty of things that the team is doing right! If it ain’t broke, don’t fix it.

So what about the question of how to actually document the process changes that you do want to make? That’s a very practical problem, and one that we had to handle in our book. After all, we do give you processes for planning, estimating, documenting, building and testing software. And we wanted to do it in a way that was programmer-friendly, with as little cognitive overhead as possible.

We decided to use process scripts — that’s scripts like an actor reads, not scripts like a shell runs — to describe our processes. We developed these scripts based on use cases (which we talk about in detail in the book). If you take a look at the use case page from the book’s companion website, you can see an example of a use case, followed by a typical script that you’d follow to develop use cases for your project. That particular script is very iterative, because use case development (like many great software practices) should be a highly iterative process. We’ve got examples of many scripts for the various practices and processes: ones for planning projects, reviewing deliverables, and building and testing software.
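To give you a feel for the format, here’s a rough sketch of what a process script might look like. (This is just my quick illustration, not the actual script from the book or the companion site, and a real one would be more detailed.)

    Script: Develop use cases
    1. The analyst interviews users and stakeholders to find out what they need the software to do.
    2. The analyst drafts a use case for each user goal, describing the normal sequence of interactions between the user and the system.
    3. The analyst walks through each draft with the users, adding alternative paths and exceptions as they come up.
    4. The team reviews the use cases, and the analyst fixes any problems the review turns up.
    5. Steps 3 and 4 repeat until the users and the team agree that the use cases are complete and correct.

Each step says who does what, in what order, so the script reads like stage directions. That’s what makes it easy to pick up, easy to follow and, just as important, easy to change when it stops matching reality.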

As for storing these scripts, I’ve done it all sorts of ways in the past: wikis, version control systems (both Subversion and VSS, depending on what was in place at the company), even plain old folders full of MS Word documents. The actual mechanics of storing the documents aren’t particularly interesting, and they’re pretty much interchangeable for process documentation. Processes shouldn’t change all that often, because change is very disruptive to a company. The changes should be small, incremental, and easily understood by the team… and the team should agree that they’re useful! Because the biggest problem with process changes — and several posts in the Slashdot thread bring this up — is that they don’t “stick”. But making those changes stick is easy: just make small changes that the team buys into, and that actually get you to build better software.

That’s easier said than done, of course. Lucky for us, too! Otherwise there wouldn’t be a market for our books or training.

How spending a little extra time and money on design might have saved Microsoft over a billion bucks

I really wanted an Xbox 360.

My old PS2 is showing its age, and I wanted to upgrade to a new system as soon as I finished the last few missions of GTA: Vice City Stories — especially now that it looks like Manhunt 2 won’t be coming out for PS2 any time soon. I’m a huge fan of the GTA series, and I’m especially psyched about GTA4. I grew up in Brooklyn, on a block that looks more than a little like a GTA4 screenshot.

But then something happened…

Viva Piñata

But a couple of weeks ago my plans changed. Jenny was happily thrashing away on Guitar Hero, when her TV screen just went blank. She looked down at her console, which had suddenly gone quiet. (That’s pretty noticeable, apparently, because the Xbox 360 is a really loud machine… which, as it turns out, is important to our story.) Much to her disappointment, she saw those three telltale LEDs that every Xbox owner dreads: the red ring of death.

Luckily, Jenny’s 360 lasted long enough that she could take advantage of the Xbox 360 service site that Microsoft launched earlier this month. But her poor console was just the latest in a long line of casualties. Some retailers estimate that 30% of Xbox 360s need repair, and we’ve seen plenty of anecdotal evidence that gamers are unhappy. It’s costing Microsoft sales and spooking investors. Microsoft is doing everything they can to fix the problem — they’ve extended the warranty to three years, and it’s costing them over a billion dollars. But it’s a real mess.

As much as I want a new console, I’m not going to buy a 360 until I can be reasonably sure that I won’t have to return it. By the time I eventually get one, hopefully they’ll have figured out how to make it quieter, too. I’m certain that I’m not the only one who’s decided to put off buying an Xbox. And that’s bad news for Microsoft.

So what can we, as software developers, learn from the Xbox 360 fiasco?

Productive meeting

Software people like us have a nasty habit of dismissing hardware problems as if they have nothing to do with us. We tend to think that designing software is really different from building hardware. And sure, there are definitely differences. We don’t have to worry about assembly lines, product getting damaged in shipment, or those pesky laws of physics that can prove to be such an irritating limitation when you have to design physical objects.

And it’s easy to dismiss the Xbox 360 failure as one of those unfortunate things that falls into that last category of physical faults. There’s a great Tech-On! article that gives us the dirt on exactly what’s caused the problem. It’s an excellent post-mortem on what amounts to terrible thermal design.

For those of you who’ve never taken a computer apart, here’s a little background information. Dealing with heat is an important part of modern computer design. Computer processors generate a lot of heat — so much that if you don’t come up with a way to get rid of it, they’ll fry themselves. So computer manufacturers will typically attach a heat sink to a processor. A heat sink is basically just a big radiator with fins or poles that lets air circulate and draw away the heat. (I once roasted a Pentium 4 processor by popping its heat sink off while the computer was running, just to see what would happen. It went “poof”.) A lot of processors run too hot even for a heat sink alone; in that case, you need to stick a fan on top of the heat sink to cool it off. That’s why some computers are so noisy: they need fans to keep them cool.

It turns out that the Xbox 360 generates far too much heat, and a lot of people speculate that when that heat builds up past a critical point it unseats the GPU (a separate processor that’s used for graphics). Microsoft has so far refused to comment on exactly what the problem is, but as time goes on there does seem to be some consensus forming about it. And that Tech-On! article seems to have found a smoking (heh) gun.

But that’s just the hardware stuff. What does that have to do with building better software?

The punchline for all of this came at the end of that Tech-On! article, and it’s why I think this whole incident is so interesting. Here’s what it said:

Finally, we opened the chassis of the Xbox 360 repaired in May 2007 and compared it with the other Xbox 360 we purchased in late 2005.

“Huh? The heat sinks and fans are completely identical, aren’t they?”

To our surprise, the composition of the repaired Xbox 360 looked completely the same as that of the Xbox 360 purchased in late 2005. It turned out that Microsoft provided repair without changing the Xbox 360’s thermo design at least until May 2007.

The repaired units weren’t replaced with ones that had a better design. They were the same — as far as they could tell, Microsoft just replaced a broken unit with one that hadn’t broken yet. That’s probably why we’re seeing various reports of repeated breakdowns.

What that tells me is that the design of the Xbox 360 is deeply flawed, and that design flaw has already cost Microsoft well over a billion dollars. And it’s that flawed design that can teach us a whole lot about our own software projects.

Shoddy workmanship

So what does this all mean for us developers? Well, for the more cynical among us, it could just mean a whole lot of job security. I’ve met COBOL programmers who charge ridiculous amounts of money to maintain aging systems. But while those jobs pay well, they sound tedious and awful to me. Does anyone really aspire to spend years patching an aging software system? Most programmers will tell you that maintaining old systems is the worst part of the job. If you love designing new and innovative software, then the last thing you want to do is get your career stuck in maintenance mode.

And that’s what Microsoft is learning with the Xbox 360. I’m not a thermal design expert, but I am absolutely positive that they could have come up with a different design that wouldn’t fail so often. And while it may have cost more money to design the system and build each unit, I sincerely doubt those extra costs would have added up to over a billion dollars. And maybe the extra design work would have delayed the launch… but now there are plenty of us who aren’t buying the system because we don’t want to be stung by the rampant quality problems.

Had Microsoft designed the system properly in the first place, they wouldn’t be in this mess now. And that’s the big lesson for us to learn. Oddly enough, it’s not a new lesson… in fact, it’s a pretty old one. One way to look at the Xbox thermal problem is to see it as a design defect that wasn’t caught until after the product was shipped.

Look what I found in an old 1997 issue of Windows Tech Journal… it’s an article by one of our favorite authors, Steve McConnell, called “Upstream Decisions, Downstream Costs”. The article lays out a scenario that most of us will recognize immediately: a fictional software company runs into problems because they don’t do enough planning up front, and end up getting buried with bugs, which cause awful delays. It also has a chart that anyone who’s read a few software engineering textbooks will recognize, showing that the earlier a bug is introduced in the project and the later it’s caught, the more expensive it is to fix.

So now we’ve seen a good, real-world situation where better design practices would have saved a whole lot of money. But what can we do about it in our own projects?

First and foremost, this gives us more ammunition when arguing with our coworkers and our bosses for more time to design our software. It’s really easy to get frustrated during the design phase of a software project, when a few people are generating a lot of paper or diagrams but nobody’s working on the code yet. That frustration tempts teams to cut design short, and that’s exactly the trap we pointed out in our first book, Applied Software Project Management: finding problems too late can sink projects. Luckily, there’s a relatively painless fix: adopt good review practices.

This is something that our friends in the open source world are really good at. Jenny and I talked about this in an ONLamp.com article we wrote last year called “What Corporate Projects Should Learn from Open Source”. A lot of high-profile, successful open source projects have very careful reviews, where they scrutinize scope and design decisions before they start coding. (To be fair, a lot of high-profile, successful closed source projects do the same, but we can’t just go to their websites and see their review results.)

So the moral of the story is that it often costs less to spend more time and money on design up front. And I bet there are some Microsoft shareholders that will agree.

Why “gold plating” is a lousy name

A few days ago I posted an answer to a question about gold plating and scope creep to the Head First PMP forum. I’m not surprised the question came up — people really seem to have trouble with the concept of gold plating. And I don’t think it’s because it’s a tough concept to get. I think it’s because it’s got a lousy name.

Gold plated silverware

In the usual gold plating scenario, a programmer adds features that were never requested because they’re “cool” or fun or seem like they’d be really useful. And sometimes they are — but more often, they’re just wasted effort, at least from the perspective of the person paying the programmer’s salary. Like I pointed out in my last post that mentioned gold plating, I completely sympathize. I’m definitely guilty of gold plating. There was one project I led about ten years ago where I created an entire scripting language, complete with interpreter, that was totally unnecessary. As far as I know, that product is still being used today, and not a single person has ever written one script for it. But it was definitely cool (or, at least, I thought so). More importantly, I really did think it would be useful, and make the software better. Classic gold plating.

On the surface, the “gold plating” analogy does seem intuitive, but it starts to break down under closer scrutiny. Think about what gets gold plated in real life: all sorts of stuff, from cheap jewelry to expensive pens, gets encrusted with, for lack of a better word, bling. And that’s what I pointed out in that forum post:

Gold plating is what we call it when the project team does work on the product to add features that the requirements didn’t call for, and that the stakeholder and customer didn’t ask for and don’t need. It’s called “gold plating” because of the tendency a lot of companies have to make a product more expensive by covering it in gold, without actually making any functional changes. (For example, there are plenty of watches and fountain pens you can buy from luxury companies that are identical to their cheaper versions, except that they’re covered in gold.)

This got me thinking about gold plating, and why it gives people so much trouble. Is gold plating in a software project actually similar to gold plating in real life? Or is it an odd, somewhat mismatched analogy?

To answer that question, we’ll need to take a step back and look at gold plating in the real world. There are a few different strains of gold plating, and they serve different purposes. First there’s the traditional, purely decorative gold plating. That’s the one we know and love: take an ordinary object, slap some gold on it, and charge a whole lot more. That’s the sort of product that’s associated with decadence and conspicuous consumption. It’s where the Gilded Age got its name.

Certainly, making a product arbitrarily more expensive is one way to sell more of it to a certain sort of consumer. But while there are plenty of modern examples of opulent gold plating, it’s fallen out of favor somewhat. More importantly, it’s not really a great analogy for gold plating in software.

What it’s been replaced with is a somewhat similar but definitely distinct way to enhance (read: sell “upscale” versions of) products. This “enhancement” is done by adding features that are actually useful, but go far beyond the needs of the typical consumers of the product.

Here’s an example: how many suburban homes really need an industrial refrigerator or a restaurant-quality range? Those appliances have been a selling point of “luxury” and “upscale” kitchens for years. But the difference between them and, say, gilded kitchen items is that the “professional” appliances are almost certainly worth the price — if you actually need them. Which you don’t, if you only use your kitchen to cook dinner for four every couple of days and thaw the occasional frozen turkey. But that’s not the point. The kitchen itself has been “upgraded” with cool but unnecessary items. And, in this case, it sells.

Canyonero

That’s certainly not the only example of products packed with features that are potentially useful, but which are, for the average owner of those products, essentially unnecessary (and by and large unused). There’s the slowly fading American love affair with the SUV; the top-of-the-line faucets, fixtures, and general home accouterments that litter the country’s McMansions; and pretty much everything in the Brookstone, Hammacher Schlemmer and Sharper Image catalogs. We’ve got our fill of amateur hill climbers with professional mountaineering gear, weekend golfers with $1,500 titanium clubs, and basement workshops stocked with industrial hardware used to build the occasional birdhouse. Every single one of those things, in the right hands, is almost certainly worth it. I’ve been playing bass guitar for about 20 years, and I know that there’s a big difference between the average $300 instrument and one that costs ten times as much. (Which is not to say you can’t spend $3,000 on a crappy bass, but that’s a different issue entirely.) Certainly, a beginner will see some small benefit from using a better instrument. But it’s probably not worth the price for someone who will only pick it up once every few months.

Neither of these two ideas is a perfect analogy for software gold plating. In some ways gold plating in software is a lot like gilded products. In other ways, it’s similar to the kind of overkill that people perform when they use “top-of-the-line” products unnecessarily. More accurately, it’s really a mixture of the two.

So what drives us to gold plating? We add unrequested (and eventually unused) features because we want to build stuff that’s cool, and it never occurs to us that we’re building something our users won’t need. I like to think of Google as the ultimate in gold plating. Their latest offering, Google Street View, is a great example of cool software that doesn’t seem to meet an obvious need. And I love it — I think they did a great job with it, and I’m honestly impressed with the way a whole lot of moving parts came together the way they did. Maybe someone will think of a really great use for it. But isn’t that a solution in search of a problem, by definition?

A good rule of thumb is that people will generally only pay for a product that’s useful. One of my favorite ways to describe quality is to consider two pieces of software. The first one is beautifully designed, very well built, very stable, never crashes, has a very intuitive user interface, is extremely secure, and is generally a pleasure to use — but it doesn’t do the job you need it to do. The other is terribly built, painful to use, crashes at least once a day, and does 50% of what you need. The second one is the one you’re going to use, because it’s the only one that actually meets your needs. And any feature in that software that doesn’t meet your needs (or the needs of any other users) is pure gold plating.

Which brings me back to the name. I don’t think “gold plating” does justice to the real phenomenon it refers to. It’s more than just gilding software by making it pretty (and/or more expensive). It’s about the way we genuinely feel that we’re making the software better by adding features that may be really cool, but that we, as programmers, simply fail to recognize as useless.

And the sooner we can figure out how to avoid doing that, the better our software will be.

(Luckily, we’ve got some really good tools to help us avoid gold plating. I’ll talk about them soon in another post.)