I was recently asked on my project to help predict, based on project history, how many defects the team would find in the remainder of the release. Almost every project plan I’ve been on includes a task to address outstanding defects found during testing, and I’ve even done the estimating myself in some cases, but I had always wondered how this time was, or should be, estimated. It turns out that I’m not the only one. I asked people on my project and on other projects how they would estimate the time to fix defects that had yet to be detected. Most of the answers boiled down to a ‘fudge factor’ that development provided, expressed as a percentage of total development time. The variables that affect defect counts on IT projects are so great in both number and variety that it seems impossible to isolate and quantify every relevant factor, so the fudge factor is an understandable compromise.
The following IEEE paper by Norman Fenton offers some perspectives on the various factors which feed into defect prediction:
His conclusion is that many existing models, which look at attributes such as module size, relative complexity, existing testing, and so on, provide only part of the answer to how defects can be predicted. His proposal, which relies on a more holistic approach and uses Bayesian Belief Networks to model defect prediction, is interesting and might be promising. But, practically speaking, I was asked on a Wednesday to provide the number by Friday. So, I left my Bayesian Belief Network building device in the closet. I couldn’t even use models that relied on lines of code to do my prediction; I doubt that even the development manager knows how many lines of code have been written on the project, and the product would probably launch before I was able to find the answer. Using lines of code to predict the defects that will be found seems silly to me anyway (even if you control for things like the language the code was written in, the experience of the development and test teams, etc.). One is just as likely to find a defect caused by a poorly written requirement as one caused by a badly written line of code.
The approach that I ultimately ended up using was to look at recent history. We took the test cases executed to date and the number of defects detected per test case in getting them to pass, then looked at the remaining test cases. Multiply the defects-per-test-case rate by the number of remaining test cases and you’ve got a rough and ready prediction. Some modifications you may or may not find helpful: if the period for which you’re predicting covers multiple cycles, take a look at the velocity of defect detection. You can do this by populating a table with the number of defects detected per test case per day. The numbers will spike periodically, but the overall trend should decline toward zero.
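A minimal sketch of that calculation in Python, with hypothetical numbers and a function name of my own invention:

```python
# Rough-and-ready prediction: multiply the defects-per-test-case rate
# observed so far by the number of test cases still to be executed.

def predict_remaining_defects(defects_found, cases_executed, cases_remaining):
    """Project remaining defects from the defect rate observed to date."""
    if cases_executed == 0:
        raise ValueError("need at least one executed test case")
    rate = defects_found / cases_executed  # defects per test case so far
    return rate * cases_remaining

# Hypothetical history: 120 defects over 300 executed cases, 200 cases left.
print(predict_remaining_defects(120, 300, 200))  # 0.4 * 200 = 80.0

# Velocity check across cycles: (defects, cases executed) per day.
# Daily figures spike, but the overall trend should decline toward zero.
daily = [(12, 20), (9, 22), (10, 25), (5, 24), (3, 26), (1, 25)]
velocity = [defects / cases for defects, cases in daily]
print([round(v, 2) for v in velocity])
```

The velocity table is the same data the prediction uses, just bucketed by day, so it costs little extra to track and tells you whether the flat rate is still a fair assumption.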
An approach that I think ought to be used is to identify requirements coverage, defects detected per requirement so far, and the requirements remaining to be tested. This requires good discipline around your traceability processes. It’s not perfect, but relying on test cases in your prediction model assumes that all your test cases have about the same level of complexity. Requirements, for the most part, are more atomic than test cases and vary less in complexity. One flaw with using requirements in your prediction model is that even when two requirements are similar in complexity, the solutions required to implement them may vary quite widely in complexity.
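The same logic applied to requirements traceability might look like this; the requirement IDs and defect counts are invented for illustration:

```python
# Hypothetical traceability data: requirement id -> defects logged
# against that requirement while its test cases were driven to pass.
defects_by_req = {"REQ-1": 3, "REQ-2": 0, "REQ-3": 5, "REQ-4": 1}
untested_reqs = ["REQ-5", "REQ-6", "REQ-7"]

# Defects per tested requirement, projected over the untested ones.
rate = sum(defects_by_req.values()) / len(defects_by_req)
predicted = rate * len(untested_reqs)
print(predicted)  # 2.25 defects/requirement * 3 requirements = 6.75
```

The only real prerequisite is a traceability matrix clean enough to attribute each defect to a requirement, which is exactly the discipline mentioned above.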
Another approach that I have heard about, but do not recommend, might be dubbed the “if you don’t find it, it doesn’t exist” approach. To reduce the number of defects found, you simply reduce the number of tests executed or the test duration. This definitely guarantees that you will find fewer defects, but it also almost certainly ensures that your project will never get out of User Acceptance Testing and deploy.
What do our readers think? How do you know how much is left to break and fix?