Nonfunctional Requirements Q&A

Non-functional requirements Q&A: answering readers’ questions about what nonfunctional requirements are and how to use them on a real software project.

Non-functional requirements: Planning out how well your software will work

A couple of months ago I wrote a post called Using nonfunctional requirements to build better software. It’s basically a step-by-step guide to an easy, practical technique for using nonfunctional requirements on a real software project: treat them the way a lot of Agile teams treat user stories, scenarios and other functional requirements, by sticking them on index cards and using them to do some lightweight planning.

Since then, I’ve gotten a lot of questions about nonfunctional requirements (or, as some people call them, non-functional requirements, nonbehavioral requirements, quality attributes, and probably half a dozen other names). Based on the questions I’ve been getting, a lot of people really seem to want a solid overview of exactly what they are:

  • What are nonfunctional requirements?
  • What goes into a good nonfunctional requirement?
  • Is there a nonfunctional requirements checklist that I can use?
  • How do I write down a nonfunctional requirement? Is there a nonfunctional requirements template I can use?

Luckily, Jenny and I addressed exactly those questions in our first book, Applied Software Project Management. Over the years I’ve gotten feedback from people who read it (and other writing I’ve done about requirements) telling me it helped them get a handle on the concepts behind nonfunctional requirements. So I’ll do a little requirements Q&A and address those questions one by one, drawing from the book where possible. I’ve also posted an O’Reilly Community blog post called Understanding nonfunctional requirements with some additional information, which should help get to the bottom of the issue.

Q: What are non-functional requirements?

Non-functional requirements (also called nonbehavioral requirements or quality attributes) describe how well a system performs its function. That’s fundamentally different from the typical functional requirements most of us are used to, which describe what the system does.

Here’s a quick example. Whenever I’m talking about requirements, I like to use the “search and replace” feature in a word processor or text editor, because we’re all familiar with how it works. A functional requirement for search and replace might describe how the case-sensitive matching works: “If the original text was all uppercase, then the replacement text must be inserted in all uppercase.” A nonfunctional requirement, on the other hand, might describe the performance: “It must be able to replace 1000 search terms in a 3MB document in under 250ms on one of our standard test VMs running at 50% load.”
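Just to make that contrast concrete, here’s a quick sketch in Java. The class and method names are made up for illustration, and the case rule is boiled down to its simplest form; the point is that the functional requirement can be checked with an example, while the nonfunctional requirement is a measurement you take on a known machine under a known load.

    // Hypothetical sketch, not real product code. The functional requirement says
    // WHAT the feature does; the nonfunctional requirement says HOW WELL it does it.
    public class SearchReplaceExample {

        // Simplified version of the case rule: if the original text is all uppercase,
        // insert the replacement in all uppercase too.
        static String replacePreservingCase(String text, String match, String replacement) {
            if (text.equals(text.toUpperCase())) {
                return text.replace(match.toUpperCase(), replacement.toUpperCase());
            }
            return text.replace(match, replacement);
        }

        public static void main(String[] args) {
            // Functional check: the behavior itself.
            String result = replacePreservingCase("FIND THE CAT", "cat", "dog");
            if (!result.equals("FIND THE DOG")) {
                throw new AssertionError("Case rule broken: " + result);
            }
            // The nonfunctional side ("1000 terms in a 3MB document in under 250ms
            // at 50% load") would be a separate, timed measurement on a test machine.
            System.out.println("Case-handling rule holds: " + result);
        }
    }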

Q: What goes into a good nonfunctional requirement?

A good nonfunctional requirement is one that makes it clear to everyone on the project — including the user (or someone who really understands what the user needs) — exactly how the software has to perform. Remember, a good requirement (functional or nonfunctional) is about understanding and addressing the needs of a user.

Here’s what Jenny and I wrote about nonfunctional requirements in our first book:

Users have implicit expectations about how well the software will work. These characteristics include how easy the software is to use, how quickly it executes, how reliable it is, and how well it behaves when unexpected conditions arise. The nonfunctional requirements define these aspects about the system.

– Applied Software Project Management, p. 113 (O’Reilly 2005)

It’s really easy for nonfunctional requirements to be unclear or ambiguous. The best way to make sure a nonfunctional requirement is clear and easy to use is to quantify it. So instead of saying that a task must be done quickly, write down the maximum number of seconds it can take. The maximum size of a database on disk, the number of hours per day a system must be available, and the number of concurrent users supported are all examples of requirements the software must satisfy without changing what it does.

Q: Is there a nonfunctional requirements checklist that I can use?

We put together a nonfunctional requirements checklist that I’ve used many times on real projects. It’s based on a list of nonfunctional requirements we included in Applied Software Project Management.

Here’s one thing to keep in mind about this (or any other) nonfunctional requirements checklist: as you’re reading it, you’ll probably find yourself thinking, “Wait a minute, all my software needs to be flexible (or efficient, or robust, etc.).” Yes, that’s true, of course. But are there specific nonfunctional requirements that affect your project in particular, above and beyond what you do on every project? That’s what this checklist is for, and that’s what you should be thinking about when you write down nonfunctional requirements.

  • Availability: Are there constraints on the system’s availability or uptime?
  • Efficiency: Are there resources the system needs to be careful about monopolizing?
  • Flexibility: Will the system need to be altered after deployment?
  • Portability: How easy is it to move the system to another platform?
  • Integrity: How sensitive is the project to data security, access, and privacy?
  • Robustness: Do error conditions need to be handled gracefully?
  • Scalability: Will the system need to handle a wide range of configuration sizes?
  • Usability: Are there specific constraints on how the users will understand, learn and use the software?

If you’re interested in using this on a real project, you can read more about it in that O’Reilly Broadcast post about non-functional requirements: I posted a relevant excerpt from Applied Software Project Management that goes into more detail about each of these.

Q: How do I write down a nonfunctional requirement? Is there a nonfunctional requirements template I can use?

Yes. We put together a software requirements specification template, and I’ve helped a lot of teams adopt it for their own projects over the years. When we put together our requirements templates, we put a lot of effort into streamlining them as much as possible. So this is a sort of “bare minimum” template for writing down use cases, functional requirements, and nonfunctional requirements.

Here’s an example of how we’d specify a nonfunctional requirement:

Name: Nonfunctional requirement #7: Performance constraints for search-and-replace
Summary: The search-and-replace feature must perform a search quickly.
Rationale: If a search is not fast enough, users will avoid using the software.
Requirements: A case-insensitive search-and-replace performed on a 3MB document, with twenty 30-character search terms each replaced by a different 30-character term, must take under 500ms on one of our standard testing VMs at 50% CPU load.
References: See use case #8: Search
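If you want to turn a requirement like that into something you can actually run, here’s one hypothetical way to do it in Java. This is only a sketch: the “document” is generated on the fly, and String.replace stands in for the real search-and-replace feature. What it shows is how a quantified requirement maps directly onto a pass/fail check.

    // Hypothetical sketch of nonfunctional requirement #7 as an executable check.
    public class PerformanceRequirement7 {

        public static void main(String[] args) {
            // Twenty distinct 30-character search terms, with 30-character replacements.
            String[] terms = new String[20];
            String[] replacements = new String[20];
            for (int i = 0; i < 20; i++) {
                terms[i] = String.format("SEARCHTERM%02d_0123456789ABCDEFG", i);
                replacements[i] = String.format("REPLACETXT%02d_0123456789ABCDEFG", i);
            }

            // Build a roughly 3MB document that contains all of the search terms.
            StringBuilder sb = new StringBuilder();
            while (sb.length() < 3 * 1024 * 1024) {
                for (String term : terms) {
                    sb.append("lorem ipsum dolor sit amet ").append(term).append(' ');
                }
            }
            String document = sb.toString();

            // Time the (stand-in) search-and-replace against the 500ms budget.
            long start = System.nanoTime();
            String result = document;
            for (int i = 0; i < 20; i++) {
                result = result.replace(terms[i], replacements[i]);
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;

            System.out.println("Elapsed: " + elapsedMs + " ms (budget: 500 ms)");
            if (elapsedMs > 500) {
                throw new AssertionError("Requirement #7 not met: " + elapsedMs + " ms");
            }
        }
    }

One caveat: the “standard testing VM at 50% CPU load” part can’t be captured in code, which is exactly why the requirement spells out the environment it has to be measured in.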

And that gets back to the blog post I mentioned earlier, Using nonfunctional requirements to build better software. If you’re working on a team that uses an agile process to build software, there’s a good chance that you already write down a lot of your requirements on index cards. In that post, I outline a method that I’ve had a lot of success with in my own projects.

I went into a lot more detail in that post, but here’s a quick recap. First, I write the requirement itself on the front of an index card:

Nonfunctional requirement index card (front)

and on the back I’ll write a specific test to make sure the requirement is implemented:

Nonfunctional requirement index card (back)

Then I use it for planning the project to make sure it actually gets included — you can see more about it in the post. As far as documenting nonfunctional requirements goes, that’s actually a really efficient way of doing it, and I’ve seen it work well on agile projects.

I hope that answers some of your questions about using nonfunctional requirements in software projects. Obviously, I’d be thrilled if you took a look at the requirements chapter in Applied Software Project Management — Jenny and I gave a lot of practical advice about how to use requirements on a software project. And I’ve got other requirements posts on Building Better Software as well. But if they don’t answer your questions, feel free to ask (or send them to us).

Using nonfunctional requirements to build better software

Understanding nonfunctional requirements — which some people call software quality attributes or nonbehavioral requirements — can make a big difference when you’re building software. But a lot of people have trouble taking a somewhat theoretical idea and applying it to a real-life project. Luckily, we’ve got an easy, practical technique to use nonfunctional requirements on a real software project.


How well does your program do… well, whatever it does?

I’ve wanted to write a post about nonfunctional requirements for a while, but I’ve been trying to find a good angle for talking about them: they can be really practical and useful on a software project, but it’s a little hard to get that practicality across.

Luckily, I’ve been spending a lot of time lately talking about architecture, since Jenny and I are going to give our Beautiful Teams talk at the ITARC New York conference next week. And that’s got me thinking a lot about how architects work. I’ve been asked more than once recently about what, exactly, the term “architecture” refers to. The quick answer is the textbook definition — designing, documenting and verifying the structure, components and properties of a system. But I always want to go beyond that. Any time I come across an interesting idea (and software architecture is full of them!), I try to come up with a way that it can help a developer out today, on a project that developer is working on. In fact, I’ve got a quick technique to help you do exactly that — it’s at the end of this post. And like many great software practices, it involves index cards.

So I started thinking about some common problems that software architects run into, especially junior ones who are still building up their experience. And that leads me straight to a problem that I’ve seen over and over again. A lot of people jump into architecture and design by starting with the question, “What’s this system going to do?” We’ve got a lot of very useful tools for that (like user stories and use cases). Obviously, you can’t design a system well without understanding what it does.

But I’ve had the opportunity to work with some very talented, very experienced software architects lately, and I’ve noticed something about how they approach designing a system that really separates a senior architect from a junior one. When one of these guys gets started on a system, they don’t just think about what it’s going to do. They also think about how it’s going to do whatever it does.

That’s a really subtle point, and it’s a very easy one to overlook. But it’s very important. Important enough, in fact, to have a name: nonfunctional requirements.

Most of us have read about nonfunctional requirements. In fact, it’s a pretty common interview question: “Name a few nonfunctional requirements.” Someone who took a class in software architecture recently might be able to rattle a few of them off (usability, reliability, performance, scalability, availability…). And lots of project managers and business analysts I talk to seem to be on an eternal quest for the perfect nonfunctional requirements template.

If you’re not familiar with nonfunctional requirements, here’s how Jenny and I defined them in our first book:

Users have implicit expectations about how well the software will work. These characteristics include how easy the software is to use, how quickly it executes, how reliable it is, and how well it behaves when unexpected conditions arise. The nonfunctional requirements define these aspects about the system. (The nonfunctional requirements are sometimes referred to as “nonbehavioral requirements” or “software quality attributes.”)

– Andrew Stellman & Jennifer Greene, Applied Software Project Management, chapter 6 (O’Reilly 2005)

And that’s a good starting point. But there’s an art to actually using nonfunctional requirements to make your software better. So how do we do that?

Thinking “how well” from the start

One of those senior architects I mentioned gave me a really good tip recently, one that really rings true. He told me, “Always think about performance from day one of your project, and test for it until you deliver.” Now, this particular person works on software tools used to design high-availability, high-performance server systems, so he thinks about performance a lot. But his point was that to design systems well, you need to think about performance — and other nonfunctional requirements — from the start.

Take a minute and think about that, because I think it’s an important point. I like it a lot for two reasons.

I like the fact that he’s thinking about how well the software works from the beginning of the project. I’m a firm believer in the old QA saying that “you can’t test quality in.” Yes, I know that saying rubs some people the wrong way because they think it sometimes lets people off the hook for testing at the end of the project. But there’s definitely a lot of truth in the idea that developers who think about quality from the beginning of the project build better software. If you design for performance, and if you then code for performance, then it’s pretty likely that you’ll end up with a more performant design than if you only start thinking about performance at the very end of the project.

The other thing I like is that he didn’t say, “Think about performance, scalability, usability, robustness, etc., from the beginning of the project.” He narrowed it down to the single quality attribute that was most important to his particular project. I’ve talked to a lot of developers, project managers, designers, testers and business analysts over the years about nonfunctional requirements, and what I often find is that people seem overwhelmed. There are so many facets to quality beyond what the software does that if you’re just trying to get started thinking about these things, it’s hard to know where to start.

Which leads me to my advice for developers. If you’re a programmer working on a project, here’s something that you can do today to improve the final product. Start with just three areas: performance, flexibility and reliability. I like these three because they’re easy to understand, and it’s not hard to brainstorm examples of how they can go wrong.

Start with some definitions. Here are ones that Jenny and I gave in Applied Software Project Management:

Performance: The performance constraints specify the timing characteristics of the software. Certain tasks or features are more time-sensitive than others; the nonfunctional requirements should identify those software functions that have constraints on their performance.

Flexibility: If the organization intends to increase or extend the functionality of the software after it is deployed, that should be planned from the beginning; it influences choices made during the design, development, testing, and deployment of the system.

Reliability: Reliability specifies the capability of the software to maintain its performance over time. Unreliable software fails frequently, and certain tasks are more sensitive to failure (for example, because they cannot be restarted, or because they must be run at a certain time).

Now, make them practical and useful to your project in three simple steps. You’ll need three index cards. Here’s what to do:

  1. On the front of each of the three index cards, write one type of nonfunctional requirement at the top. So on the first card, write “Performance”. On the second one, write “Flexibility”. And on the third one, write “Reliability”. Write these words on both the front and the back of the card. If bright colors grab your attention, use a bright-colored highlighter to highlight them. (Personally, they don’t really do anything for me.)
  2. Take each of the cards. On each one, write down the name of one feature of your software and what this particular attribute means for that feature. I like to use Search and Replace as an example whenever I talk about this sort of thing, because we’ve all used it and understand it. So on the Performance card I might write, “Search and replace: searching through a large document needs to be fast.”
    • Performance index card
  3. Here’s the hard part. On the back of each card, write down a single test that you can do to figure out how well your software meets that requirement. So on the back of the performance card I might write, “Replacing 100 occurrences of a 4-character string in a 25MB document has to take under 750 milliseconds.” (The more concrete you can make this test, the better this works.)
    • Performance index card (back)

Now that you’ve got those three cards, tack them up on your cubicle wall (or, even better, on your task board). Make sure the feature you wrote down in step #2 is facing forward. Make sure they’re someplace you’ll see them. Take just a minute or two each day to look at them and figure out if you’re headed in the right direction. What you’ll find more often than not is that as you’re designing your system, you won’t forget about those three things. Just spending a small amount of time writing down and thinking about these things can color your whole project.

Once you’ve moved from the design phase into the programming phase, flip the cards around so the test side is showing forward. (If you’re on an agile project with a three-week iteration, this might happen during the first week, but this works equally well for projects with a longer design phase.) As you’re writing the code for each of the features you wrote down, run that test by hand. It should only take a few minutes to do, and it will give you an idea of how well you’re doing. If you do this enough, you might end up figuring out a way to automate that test. If you do, and if you have a build server, then you can add it to the build. That way you’ll know any time you check in code that causes your project’s performance (or reliability, or flexibility) to degrade.
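For example, if your team happens to use JUnit, the test on the back of the performance card might eventually turn into something like this hypothetical test, which a build server can run on every check-in. TestDocuments and DocumentEditor are made-up names standing in for your own code; treat this as a sketch of the idea, not the one right way to do it.

    import org.junit.Test;
    import static org.junit.Assert.assertTrue;

    // Hypothetical automated version of the index card's back. Once it's in the
    // build, a failing check-in shows up the same way any other broken test does.
    public class PerformanceCardTest {

        @Test
        public void replacing100OccurrencesIn25MbDocumentStaysUnder750Ms() {
            String document = TestDocuments.load("large-25mb-sample.txt"); // hypothetical helper

            long start = System.nanoTime();
            new DocumentEditor(document).replaceAll("frog", "toad"); // a 4-character string
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;

            assertTrue("Took " + elapsedMs + " ms, budget is 750 ms", elapsedMs <= 750);
        }
    }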

Try doing that on your next project, and what you should find is that you spend more time thinking about these things. When opportunities to improve those three specific things come up, you’ll recognize them and take them. And that’s a great way to build better software.

How to hold a more effective code review

A lot of programmers feel like being asked to go to a code review is like being told by mom to eat our veggies. We’ll complain about it, and even if we do eventually swallow them we’re determined not to enjoy them.

It’s something I’ve seen over and over again: programmers groaning about having to go to a code review, usually because someone gets some big idea about making things better, and decides this is how you do it. There’s sometimes a little nervous joking at the beginning of the meeting about how nobody really wants to be there. And after it’s done, a lot of us get the distinct feeling that it was a waste of time.

The thing is, code reviews can be a really good thing. And not only that, they don’t have to be a chore. If you do them right, people on the team can start to appreciate them and even – heaven forbid – enjoy them.

So how do we make code reviews more palatable? We need to think about what motivates us as programmers. Programmers love to code. We love building things, and we love solving problems. But we hate anything that seems bureaucratic or tedious, and we definitely hate meetings. But most of all, we hate being in uncomfortable social situations that require us to confront people face-to-face. We’re not alone in that; most people hate situations like that.

I think that’s a pretty big part of why programmers intuitively dislike code reviews. It’s not that we’re afraid of putting our work out there for our peers to see. That’s actually something we look forward to: we love to show off code that we’ve worked so hard on, and we definitely appreciate the input from the people around us. But it’s almost never the person whose code is being reviewed who groans about it. No, it’s usually the people who are asked to attend the review. And I think I know why: it’s because we don’t like being asked to criticize the work of others, openly and without hesitation, for the good of the team. I think we naturally feel uncomfortable putting someone else in the position of having to openly confront their errors, because it’s so easy to imagine ourselves in that same position.

And that may be the secret to holding code reviews that people actually look forward to attending: make it about helping make the code better, not criticizing and finding its flaws. If you can come up with a way to avoid bitter arguments about tiny details and stubbornly held opinions, and instead concentrate on helping someone improve his or her code, then people will stop thinking about code reviews as an uncomfortable meeting and start thinking about them as a way to build better software.

That sounds like a tall order. How do you do it?

Well, you start out by thinking about one of the best things that can happen in a code review, something that I’ve seen many times. It usually happens about two thirds of the way through the review, about the time when the first signs of meeting fatigue are starting to set in. Someone points out another problem with the code, and a conversation starts up about an aspect of it that nobody really thought of. Uh-oh: it’s a bug, and a particularly nasty one at that. Then everyone kind of looks at each other with a weird mixture of relief and disgust, because we found a bug that a) definitely would have made it into production, and b) would have taken hours or days to track down and fix.

I’ve said this many times before: programmers are very practical people, especially when it comes to our own time. If something will save us time down the road, we’ll definitely do it. If you can show a programmer that a tool or technique (like a code review) saves more time than it costs, that’s a great way to get him or her to start thinking positively about it.

That doesn’t change the fact that a lot of people get nervous about code reviews, even people who have done them a bunch of times. So I spent a little time thinking of advice I’ve given about code reviews over the years. Some of this is pretty standard code review stuff, but I think it’s worth repeating because people have so many different views on how to do code reviews effectively. And I think that if you think about them, and get other people on your team thinking the same way, it could definitely help you hold effective code reviews.

So here’s some advice about holding better code reviews — you can think of them as code review tools (or even code review best practices, although I’m not a huge fan of that term) that can make your software better:

  • First things first: there are a lot of different ways to do a code review. Some people like to follow a very strict process. Personally, I like to keep them informal; the more it seems like an everyday conversation, the more work we actually get done.
  • It’s important to keep the meeting to under two hours — any more than that and meeting fatigue sets in. Most code review teams can handle between 200 and 400 lines of code in a two-hour review meeting. (Your mileage may vary.)
  • Don’t forget to distribute the code before the review, and make sure you give everyone enough time to actually look through it. Send around a PDF of the code (a2ps is a good way to make it readable, and it’s got stylesheets for almost every language). Make sure everyone also has access to the source, and that they know how to build and run it. Sometimes it’s a lot easier to prepare for a code review if you can actually debug your way through the code.
  • Make sure that everyone knows you appreciate their time. It’s easy to forget that, but it helps the team see the review as a useful tool and not just another boring meeting. Remember, you’re pulling half a dozen or more people into a room for two hours, plus preparation time — that’s the equivalent of taking a top developer off of all projects for two days. That’s also why it’s very important to choose a good block of code to review. Choose one that’s inherently risky: a difficult algorithm, code from a library that many other people depend on, an interface a lot of people will use, a particularly nasty bit of spaghetti code.
  • Code reviews and refactoring work really well together. Look for opportunities to extract methods, rename variables, replace literals with constants, etc.
  • Pay attention to OOP principles, especially encapsulation. Improving encapsulation is an easy and effective way to prevent bugs.
  • The code review isn’t just about bringing up issues — it should also be about giving some indication of how to resolve those issues. Ideally, the programmer whose code is being reviewed should be able to read through the results of the review and very quickly implement the fixes, because in the meeting we wrote down exactly what needs to be fixed and how to fix it. (We don’t actually have to write down lines of code, of course — just enough information so it’s clear what to do.) A good way to do this is to make the goal of the meeting to produce a log: a list of fixes that need to be made to the code.
  • Instead of storing the log in a spreadsheet, Word document, or wiki page (I’ve done all three), try having the moderator put the results of the review directly into the code as comments, each tagged with an easily searchable unique string like ‘// %%TODO: CODE REVIEW 8/28/08%%’ so the programmer can find them all (there’s a sketch of what that might look like after this list). The results of the review meeting can then be e-mailed out as a link to a diff report in the source repository. When the programmer goes back to update the code, he or she can alter the comments so they make sense in context; they can stay in the code, where they’ll make it clear to anyone maintaining it in the future why the code is the way it is.
  • A good way to focus the discussion is to guide the conversation away from arguments about coding in general, and towards what to write down in the log to resolve the current issue. Make a good effort to figure out how to resolve the issue: most can be resolved in the meeting. Any issue that can’t be resolved in a reasonable amount of time gets added to the log as an open issue.
  • I’ve always gotten a lot of mileage out of using a moderator. The moderator’s job is to keep track of the discussion, and keep the discussion on track. If people are getting off onto a tangent that won’t benefit the code, or if they’ve gotten into a disagreement where there are merits on both sides of the issue and it clearly won’t be resolved, the moderator should suggest that we log it as an open issue and move on. You can always follow up later and resolve the issue.
  • Some people get very strict about making sure that the moderator stays at arm’s length, and doesn’t get involved with the review. Personally, I think code reviews are hard enough without imposing arbitrary rules like that (because we’re laying someone’s code bare and dissecting it, which can be difficult for anyone who’s not used to it). We’re all adults here, and we can trust any of us not to “abuse” a moderator role. If the moderator has something to say, he or she should say it. If it’s easier, replace “moderator” with “note-taker” or something like that.
  • Don’t be pedantic, and try to avoid theoretical discussions. It’s really easy to get bogged down with a discussion that doesn’t go anywhere about whether this variable declaration should be here or there, whether this type of structure or that is slightly more efficient, if we could do something better in theory if we scrapped a large amount of code and rewrote it. If a discussion isn’t going to directly lead to a change, even if it’s an interesting topic, note it in the log as an open issue and move on. And definitely don’t point out spelling errors. A lot of grate programmers are lousy spelers.
  • Make sure variable names make sense, but don’t worry about naming conventions. Some people love underscores in front of parameters, some people hate them. I’m sure you can come up with at least three different “official” conventions for any programming language. There are few things less useful during a code review than an argument over whether to use PascalCase or camelCase.
  • One way people like to do reviews is to have people “read” the code – interpret it out loud. I’ve had some success with going around the table and having people take turns “reading” each chunk of code. If there’s a chunk that is difficult to “read”, it’s not clear enough and is a good candidate for refactoring.
  • Before you distribute the code to the team, run it through a static code analyzer (like FindBugs or FxCop) and fix issues that are found. There’s no need to waste discussion time on problems that a tool can catch and log for us.
  • I’ve had a lot of success with a kind of review called a “high impact inspection” (that’s a term that was developed at HP about fifteen years ago). Basically, it boils down to having everyone do the code review independently and e-mailing their issues to a moderator. The moderator puts the issues into one big list, sends them back out to everyone, and then the review meeting itself is focused on just going through those issues. Jenny and I developed a code review process similar to high impact inspections to let us hold inspections in teams outsourced to India (where time zone differences make it very difficult to regularly schedule two-hour meetings). We ran it a bunch of times, and it worked really well.
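Here’s a rough idea of what those in-code review comments might look like. This is an invented Java example, not code from a real review, and the suggested fixes echo the refactoring and magic-number advice above; the unique marker is what makes every open item easy to find later.

    // Hypothetical example of review results logged directly in the code, so the
    // programmer can find and fix them later. The unique marker makes every open
    // item easy to search for (for example: grep -rn "%%TODO: CODE REVIEW" src/).
    public class InvoiceCalculator {

        public double total(double[] prices, int[] quantities) {
            double total = 0;
            for (int i = 0; i < prices.length; i++) {
                // %%TODO: CODE REVIEW 8/28/08%% Extract the discount logic into its own
                // method (discountFor(quantity) or similar) so it can be tested on its own.
                double discount = (quantities[i] > 10) ? 0.05 : 0.0;

                // %%TODO: CODE REVIEW 8/28/08%% Replace the magic number 1.08 with a named
                // tax-rate constant. Open issue: should the tax rate be configurable?
                total += prices[i] * quantities[i] * (1 - discount) * 1.08;
            }
            return total;
        }
    }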

When Jenny and I were writing the section on code review techniques in our first book, Applied Software Project Management, we came up with a checklist of things that you should look for during a code review. That should give you a good starting point for coming up with a good review.

Good luck with your code reviews! If you end up with a good code review success (or failure!) story, I’d love to hear about it.

Unit testing and the narrowly averted Citicorp Center disaster

It was almost a disaster...

I was working on a project earlier today. Now, I typically do test-driven development, where I’ll build unit tests that verify each class first and then build the code for the class after the tests are done. But once in a while, I’ll do a small, quick and dirty project, and I’ll think to myself, “Do I really need to write unit tests?” And then, as I start building it, it’s obvious: yes, I do. It always comes at a point where I’ve added one or two classes, and I realize that I have no idea if those classes actually work. I’ll realize that I’ve written a whole bunch of code, and I haven’t tested any of it. And that starts making me nervous. So I turn around and start writing unit tests for the classes I’ve written so far… and I always find bugs. This time was no exception.
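In case you’ve never worked test-first, here’s the smallest illustration I can think of, with made-up names and JUnit assumed as the test framework: the test gets written before the class exists, and only then do you write the simplest class that makes it pass.

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    // Step 1: write the test for the class you're about to add. (At this point
    // WordCounter doesn't exist yet, so this won't even compile; that's the cue
    // to go write it.)
    public class WordCounterTest {
        @Test
        public void countsWhitespaceSeparatedWords() {
            assertEquals(3, WordCounter.count("non functional requirements"));
            assertEquals(0, WordCounter.count("   "));
        }
    }

    // Step 2: only then write the simplest class that makes the test pass.
    class WordCounter {
        static int count(String text) {
            if (text == null || text.trim().isEmpty()) {
                return 0;
            }
            return text.trim().split("\\s+").length;
        }
    }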

This time, for some reason, that exercise reminded me of the story of the nearly disastrous Citicorp Center building.

Citicorp Center was one of the last skyscrapers built in the New York City skyscraper and housing boom of the 1960s and 1970s. A lot of New Yorkers today probably don’t realize that it was actually one of the more interesting feats of structural engineering at the time. The building was built on a site occupied by St. Peter’s, a turn-of-the-century Lutheran church that would have to be demolished to make way for the skyscraper. The church agreed to let Citicorp demolish it, on one condition: that it be rebuilt on the same site.

The engineer, Bill LeMessurier, came up with an ingenious plan: put the base of the building up on columns, and cantilever the edge of the building over the church. Take a look at it on Google Maps’ Street View — you can pan up, navigate around, and see just how much of a structural challenge this was.

The building was completed in 1977. A year later, LeMessurier got a call from an engineering student studying the Citicorp building. Joe Morgenstern’s excellent 1995 New Yorker article about the building describes it like this:

The student wondered about the columns–there are four–that held the building up. According to his professor, LeMessurier had put them in the wrong place.

“I was very nice to this young man,” LeMessurier recalls. “But I said, ‘Listen, I want you to tell your teacher that he doesn’t know what the hell he’s talking about, because he doesn’t know the problem that had to be solved.’ I promised to call back after my meeting and explain the whole thing.”

Unfortunately, LeMessurier’s confidence turned out to be misplaced: when he looked into it, he found a serious problem, and the article describes it in all its gory detail. It’s a fascinating story, and I definitely recommend reading it — it’s a great example of how engineering projects can go wrong. It’ll probably seem eerily familiar to most experienced developers: after a project is done, someone uncovers something that seems to be a tiny snag, which turns out to be disastrous and requires a huge amount of rework.

Rework in a building isn’t pretty. In this case, it required a team to go through and weld steel plates over hundreds of bolted joints throughout the building, all over the weekends so nobody would find out and panic.

But what I found especially interesting about the story had to do with testing the building:

On Tuesday morning, August 8th, the public-affairs department of Citibank, Citicorp’s chief subsidiary, put out the long delayed press release. In language as bland as a loan officer’s wardrobe, the three-paragraph document said unnamed “engineers who designed the building” had recommended that “certain of the connections in Citicorp Center’s wind bracing system be strengthened through additional welding.” The engineers, the press release added, “have assured us that there is no danger.” When DeFord expanded on the handout in interviews, he portrayed the bank as a corporate citizen of exemplary caution–“We wear both belts and suspenders here,” he told a reporter for the News–that had decided on the welds as soon as it learned of new data based on dynamic-wind tests conducted at the University of Western Ontario.

There was some truth in all this. During LeMessurier’s recent trip to Canada, one of Alan Davenport’s assistants had mentioned to him that probable wind velocities might be slightly higher, on a statistical basis, than predicted in 1973, during the original tests for Citicorp Center. At the time, LeMessurier viewed this piece of information as one more nail in the coffin of his career, but later, recognizing it as a blessing in disguise, he passed it on to Citicorp as the possible basis of a cover story for the press and for tenants in the building.

Tests were at the center of this whole situation. It turned out that insufficient testing was done at the beginning of the project. Now, more tests were used to figure out how to handle the situation. Tests got them into the situation, and tests got them out.

So what does this have to do with software?

I have a hunch that anyone who’s done a lot of test-driven development will see the relevance pretty quickly. The quality of your software — whether it does its job or fails dramatically — depends on the quality of your tests. It’s easy to think that you’ve done enough testing, but once in a while your tests uncover a serious problem that would be painful — even disastrous — to repair. And as LeMessurier found, it’s easy to run tests that give a false sense of security because they’re based on faulty assumptions.

I’ve had arguments many times over my career with various people about how much testing to do. I can’t say that I’ve always handled them perfectly, but I have found a tactic that works. I point to the software and ask which of the features doesn’t have to work properly. But it’s good to remind myself how easy it is to question the importance of tests. It’s so easy, in fact, that I did it myself earlier today. And that’s why it’s important to have examples like Citicorp Center to remind us of how important testing can be.

How spending a little extra time and money on design might have saved Microsoft over a billion bucks

I really wanted an Xbox 360.

My old PS2 is showing its age, and I wanted to upgrade to a new system as soon as I finished the last few missions of GTA: Vice City Stories — especially now that it looks like Manhunt 2 won’t be coming out for PS2 any time soon. I’m a huge fan of the GTA series, and I’m especially psyched about GTA4. I grew up in Brooklyn, on a block that looks more than a little like a GTA4 screenshot.

But then something happened…

Viva Piñata

But a couple of weeks ago my plans changed. Jenny was happily thrashing away on Guitar Hero, when her TV screen just went blank. She looked down at her console, which had suddenly gone quiet. (That’s pretty noticeable, apparently, because the Xbox 360 is a really loud machine… which, as it turns out, is important to our story.) Much to her disappointment, she saw those three telltale LEDs that every Xbox owner dreads: the red ring of death.

Luckily, Jenny’s 360 lasted long enough so that she could take advantage of the Xbox 360 service site that Microsoft launched earlier this month. But her poor console was just the latest in a long line of casualties. Some retailers estimate that 30% of Xbox 360s need repair, and we’ve seen plenty of anecdotal evidence that gamers are unhappy. It’s costing Microsoft sales and spooking investors. Microsoft is doing everything they can to fix the problem — they’ve extended the warranty to three years, and it’s costing them over a billion dollars. But it’s a real mess.

As much as I want a new console, I still plan on getting a 360 — but not until I can be reasonably sure that I won’t have to return it. By the time I eventually get one, hopefully they’ll have figured out how to make it quieter, too. I’m certain that I’m not the only one who’s decided to put off buying an Xbox. And that’s bad news for Microsoft.

So what can we, as software developers, learn from the Xbox 360 fiasco?


Software people like us have a nasty habit of dismissing hardware problems as if they have nothing to do with us. We tend to think that designing software is really different from building hardware. And sure, there are definitely differences. We don’t have to worry about assembly lines, product getting damaged in shipment, or those pesky laws of physics that can prove to be such an irritating limitation when you have to design physical objects.

And it’s easy to dismiss the Xbox 360 failure as one of those unfortunate things that falls into that last category of physical faults. There’s a great Tech-On! article that gives us the dirt on exactly what’s caused the problem. It’s an excellent post-mortem on what amounts to terrible thermal design.

For those of you who’ve never taken a computer apart, here’s a little background information. Dealing with heat is an important part of modern computer design. Computer processors generate a lot of heat — so much that if you don’t come up with a way to get rid of it, they’ll fry themselves. So computer manufacturers will typically attach a heat sink to a processor. A heat sink is basically just a big radiator with fins or poles that lets air circulate and draw away the heat. (I once roasted a Pentium 4 processor by popping its heat sink off while the computer was running, just to see what would happen. It went “poof”.) A lot of processors are too hot even for heat sinks; in that case, you’ll need to stick a fan on top of it to cool it off. That’s why some computers are so noisy: they need fans to keep them cool.

It turns out that the Xbox 360 generates far too much heat, and a lot of people speculate that when that heat builds up past a critical point it unseats the GPU (a separate processor that’s used for graphics). Microsoft has so far refused to comment on exactly what the problem is, but as time goes on there does seem to be some consensus forming about it. And that Tech-On! article seems to have found a smoking (heh) gun.

But that’s just the hardware stuff. What does that have to do with building better software?

The punchline for all of this came at the end of that Tech-On! article, and it’s why I think this whole incident is so interesting. Here’s what it said:

Finally, we opened the chassis of the Xbox 360 repaired in May 2007 and compared it with the other Xbox 360 we purchased in late 2005.

“Huh? The heat sinks and fans are completely identical, aren’t they?”

To our surprise, the composition of the repaired Xbox 360 looked completely the same as that of the Xbox 360 purchased in late 2005. It turned out that Microsoft provided repair without changing the Xbox 360’s thermo design at least until May 2007.

The repaired units weren’t replaced with ones that had a better design. They were the same — as far as they could tell, Microsoft just replaced a broken unit with one that hadn’t broken yet. That’s probably why we’re seeing various reports of repeated breakdowns.

What that tells me is that the design of the Xbox 360 is deeply flawed, and that design flaw has already cost Microsoft well over a billion dollars. And it’s that flawed design that can teach us a whole lot about our own software projects.

Shoddy workmanship

So what does this all mean for us developers? Well, for the more cynical among us, it could just mean a whole lot of job security. I’ve met COBOL programmers who charge ridiculous amounts of money to maintain aging systems. But while those jobs pay well, they sound tedious and awful to me. Does anyone really aspire to spend years patching an aging software system? Most programmers will tell you that maintaining old systems is the worst part of the job. If you love designing new and innovative software, then the last thing you want to do is get your career stuck in maintenance mode.

And that’s what Microsoft is learning with the Xbox 360. I’m not a thermal design expert, but I am absolutely positive that they could have come up with a different design that wouldn’t fail so often. And while it may have cost more money to design the system and build each unit, I sincerely doubt those extra costs would have added up to over a billion dollars. And maybe the extra design work might have delayed the launch… but now there are plenty of us who aren’t buying the system because we don’t want to be stung by the rampant quality problems.

Had Microsoft designed the system properly in the first place, they wouldn’t be in this mess now. And that’s the big lesson for us to learn. Oddly enough, it’s not a new lesson… in fact, it’s a pretty old one. One way to look at the Xbox thermal problem is to see it as a design defect that wasn’t caught until after the product was shipped.

Look what I found in an old 1997 issue of Windows Tech Journal… it’s an article by one of our favorite authors, Steve McConnell, called “Upstream Decisions, Downstream Costs”. The article lays out a scenario that most of us will recognize immediately: a fictional software company runs into problems because they don’t do enough planning up front, and end up getting buried with bugs, which cause awful delays. It also has a chart that anyone who’s read a few software engineering textbooks will recognize, showing that the earlier a bug is introduced in the project and the later it’s caught, the more expensive it is to fix.

So now we’ve seen a good, real-world situation where better design practices would have saved a whole lot of money. But what can we do about it in our own projects?

First and foremost, this gives us more ammunition when arguing with our coworkers and our bosses for more time to design our software. It’s really easy to get frustrated during the design phase of a software project, when a few people are generating a lot of paper or diagrams but nobody’s working on the code yet. That’s one of the things that we pointed out in our first book, Applied Software Project Management — that finding problems too late can sink projects. Luckily, there’s a relatively painless fix: adopt good review practices.

This is something that our friends in the open source world are really good at. Jenny and I talked about this in an ONLamp.com article we wrote last year called “What Corporate Projects Should Learn from Open Source”. A lot of high-profile, successful open source projects have very careful reviews, where they scrutinize scope and design decisions before they start coding. (To be fair, a lot of high-profile, successful closed source projects do the same, but we can’t just go to their websites and see their review results.)

So the moral of the story is that it often costs less to spend more time and money on design up front. And I bet there are some Microsoft shareholders that will agree.