Updating some core concepts in software testing

Most software testing techniques were first developed in the 1970’s, when “large” programs were tiny compared to today.

Programmer productivity has grown dramatically over the years, a result of paradigmatic shifts in software development practice. Testing practice has evolved less dramatically and our productivity has grown less spectacularly. This divergence in productivity has profound implications—every year, testers impact less of the product. If we continue on this trajectory, our work will become irrelevant because its impact will be insignificant.

Over the past few years, several training organizations have created tester certifications. I don’t object in principle to certification, but the Body of Knowledge (BoK) underlying a certificate has broader implications. People look to BoKs as descriptions of good (or at least current) attitudes and practice.

I’ve been dismayed by the extent to which several BoKs reiterate the 1980’s. Have we really made so little progress?

When we teach the same basics that we learned, we provide little foundation for improvement. Rather than setting up the next generation to rebel against the same dumb ideas we worked around, we should teach our best alternatives, so that new testers’ rebellions will take them beyond what we have achieved.

One popular source of orthodoxy is my own book, Testing Computer Software (TCS). In this article, I would like to highlight a few of the still-influential assumptions or assertions in TCS, in order to reject them. They are out of date. We should stop relying on them.

Where TCS was different

I wrote TCS to highlight what I saw as best practices (of the 1980’s) in Silicon Valley, which were at odds with much of the received wisdom of the time:

  • Testers must be able to test well without authoritative (complete, trustworthy) specifications. I coined the phrase exploratory testing to describe this survival skill.
  • Testing should address all areas of potential customer dissatisfaction, not just functional bugs. Because matters of usability, performance, localizability, supportability, and (these days) security are critical factors in the acceptability of the product, test groups should become skilled at dealing with them. Just because something is beyond your current skill set doesn’t mean it’s beyond your current scope of responsibility.
  • It is neither uncommon nor unethical to defer (choose not to fix) known bugs. However, testers should research a bug or design weakness thoroughly enough and present it carefully enough to help the project team clearly understand the potential consequences of shipping with this bug.
  • Testers are not the primary advocates of quality. We provide a quality assistance service to a broader group of stakeholders who take as much pride in their work as we do.
  • The decision to automate a test is a matter of economics, not principle. It is profitable to automate a test (including paying the maintenance costs as the program evolves) if you would run the manual test so many times that the net cost of automation is less than manual execution; a break-even sketch follows this list. Many manual tests are not worth automating because they provide information that we don’t need to collect repeatedly.
  • Testers must be able to operate effectively within any software development lifecycle—the choice of lifecycle belongs to the project manager, not the test manager. In addition, the waterfall model so often advocated by testing consultants might be a poor choice for testers because the waterfall pushes everyone to lock down decisions long before vital information is in, creating both bad decisions and resistance to later improvement.
  • Testers should design new tests throughout the project, even after feature freeze. As long as we keep learning about the product and its risks, we should be creating new tests. The issue is not whether it is fair to the project team to add new tests late in the project. The issue is whether the bugs those tests could find will impact the customer.
  • We cannot measure the thoroughness of testing by computing simple coverage metrics or by creating at least one test per requirement or specification assertion. Thoroughness of testing means thoroughness of mitigation of risk. Every different way that the program could fail creates a role for another test.
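
To make the economic argument in the automation bullet concrete, here is a minimal sketch of the break-even calculation in Python. All of the costs and counts are hypothetical placeholders, not figures from TCS; in practice you would estimate them per project, and maintenance cost often dominates as the program evolves.

    def break_even_runs(automation_cost, maintenance_cost_per_release,
                        releases, manual_cost_per_run):
        """Number of manual runs at which automating the test pays off.

        All inputs are hypothetical estimates in the same currency unit:
          automation_cost              -- one-time cost to build the automated test
          maintenance_cost_per_release -- cost to keep it working, per release
          releases                     -- releases over which the test will live
          manual_cost_per_run          -- cost of one manual execution
        """
        total_automation_cost = automation_cost + maintenance_cost_per_release * releases
        return total_automation_cost / manual_cost_per_run

    # With these made-up estimates, automation pays for itself only if the
    # test would otherwise be run by hand more than about 47 times.
    runs = break_even_runs(automation_cost=400, maintenance_cost_per_release=50,
                           releases=6, manual_cost_per_run=15)
    print(f"Break-even at roughly {runs:.0f} manual runs")

Many manual tests fall below this threshold; that is the sense in which they are not worth automating.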

The popularity of these positions (and the ones they challenged) waxes and wanes, but at least they are seen as mainstream points of view.

Where TCS 3 would (will) be different

TCS Editions 1 and 2 were written in a moderate tone. In retrospect, my wording was sometimes so gentle that readers missed key points. In addition, some of TCS 2’s firm positions were simply mistaken:

  • It is not the primary purpose of testing to find bugs. (Nor is it the primary purpose of testing to help the project manager make decisions.) Testing is an empirical investigation conducted to provide stakeholders with information about the quality of the software under test. Stakeholders have different informational needs at different times, in different situations. The primary purpose of testing is to help those stakeholders gain the information they need.
  • Testers should not attempt to specify the expected result of every test. The orthodox view is that test cases must include expected results. There are many stories of bugs missed because the tester simply didn’t recognize the failure. I’ve seen this too. However, I have also seen cases in which testers missed bugs because they were too focused on verifying “expected” results to notice a failure the test had not been designed to address. You cannot specify all the results—all the behaviors and system/software/data changes—that can arise from a test. There is value in documenting the intent of a test, including results or behaviors to look for, but it is important to do so in a way that keeps the tester thinking and scanning for other results of the test instead of viewing the testing goal as verification against what is written.
  • Procedural documentation probably offers little training value. I used to believe testers would learn the product by following test scripts or by testing user documentation keystroke by keystroke. Some people do learn this way, but others (maybe most) learn more from designing and running their own experiments than from following instructions. In science education, we talk about this in terms of the value of constructivist and inquiry-based learning. There’s an important corollary to this that I’ve learned the hard way: when you create a test script and pass it to an inexperienced tester, she might be able to follow the steps you intended, but she won’t have the observational skills or insights that you would have if you were following the script yourself. Scripts might create a sequence of actions, but they don’t create cognition.
  • Software testing is more like design evaluation than manufacturing quality control. A manufacturing defect appears in an individual instance of a product (like badly wired brakes in a car). It makes sense to look at every instance in the same ways (regression tests) because any one might fail in a given way, even if the one before and the one after did not. In contrast, a design defect appears in every instance of the product. The challenge of design QC is to understand the full range of implications of the design, not to look for the same problem over and over.
  • Testers should not try to design all tests for reuse as regression tests. After they’ve been run a few times, a regression suite’s tests have one thing in common: the program has passed them all. In terms of information value, they might have offered new data and insights long ago, but now they’re just a bunch of tired old tests in a convenient-to-reuse heap. Sometimes (think of build verification testing), it’s useful to have a cheap heap of reusable tests. But we need other tests that help us understand the design, assess the implications of a weakness, or explore an issue by machine that would be much harder to explore by hand. These often provide their value the first time they are run—reusability is irrelevant and should not influence the design or decision to develop these tests.
  • Exploratory testing is an approach to testing, not a test technique. In scripted testing, a probably-senior tester designs tests early in the testing process and delegates them to programmers to automate or junior testers to run by hand. In contrast, the exploratory tester continually optimizes the value of her work by treating test-related learning, test design, test execution and test result interpretation as mutually supportive activities that run in parallel throughout the project. Exploration can be manual or automated. Explorers might or might not keep detailed records of their work or create extensive artifacts (e.g. databases of sample data or failure mode lists) to improve their efficiency. The key difference between scripting and exploration is cognitive—the scripted tester follows instructions; the explorer reinvents instructions as she stretches her knowledge base and imagination.
  • The focus of system testing should shift to reflect the strengths of programmers’ tests. Many testing books (including TCS 2) treat domain testing (boundary / equivalence analysis) as the primary system testing technique. To the extent that it teaches us to do risk-optimized stratified sampling whenever we deal with a large space of tests, domain testing offers powerful guidance. But the specific technique of checking single variables and combinations at their edge values is often handled well in unit and low-level integration tests, which are much more efficient than system tests; a minimal boundary-value sketch follows this list. If the programmers are actually testing this way, then system testers should focus on other risks and other techniques. When other people are doing an honest and serious job of testing in their way, a system test group that is so jealous of its independence that it refuses to consider their work is bound to waste time repeating simple tests, missing opportunities to try more complex tests focused on harder-to-assess risks.
  • Test groups should offer diverse, collaborating specialists. Test groups need people who understand the application under test, the technical environment in which it will run (and the associated risks), the market (and its expectations, demands, and support needs), the architecture and mechanics of tools to support the testing effort, and the underlying implementation of the code. You cannot find all this in any one person. You can build a group of strikingly different people, encourage them to collaborate and cross-train, and assign them to project areas that need what they know.
  • Testers may or may not work best in test groups. If you work in a test group, you probably get more testing training, more skilled criticism of your tests and reports, more attention to your test-related career path, and stronger moral support if you speak unwelcome truths to power. If you work in an integrated development group, you probably get more insight into the development of the product, more skilled criticism of the impact of your work, more attention to your broad technical career path, more cross-training with programmers, and less respect if you know lots about the application or its risks but little about how to write code. If you work in a marketing (customer-focused) group, you probably get more training in the application domain and in the evaluation of product acceptability and customer-oriented quality costs (such as support costs and lost sales), more attention to a management-directed career path, and more sympathy if programmers belittle you for thinking more like a customer than a programmer. Similarly, even if there is a cohesive test group, its character may depend on whether it reports to an executive focused on testing, support, marketing, programming, or something else. There is no steady-state best place for a test group. Each choice has costs and benefits. The best choice might be a fundamental reorganization every two years to diversify the perspectives of the long-term staff and the people who work with them.
  • We should abandon the idea, and the hype, of best practices. Every assertion that I’ve made here has been a reaction to another that is incompatible but has been popularly accepted. Testers provide investigative services to people who need information. Depending on the state of their project, the ways in which the product is being developed, and the types of information the people need, different practices will be more appropriate, more efficient, more conducive to good relations with others, more likely to yield the information sought—or less.
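
As promised in the domain testing bullet above, here is a minimal sketch of boundary (domain) testing at the unit level, in Python. The discount_rate function and its pricing tiers are hypothetical, invented purely for illustration; the point is that each equivalence partition is probed at the edge values where off-by-one defects cluster, and that programmers can do this cheaply in their own suites.

    def discount_rate(quantity: int) -> float:
        """Hypothetical pricing rule: 0-9 items -> 0%, 10-99 -> 5%, 100+ -> 10%."""
        if quantity < 0:
            raise ValueError("quantity must be non-negative")
        if quantity < 10:
            return 0.0
        if quantity < 100:
            return 0.05
        return 0.10

    def test_discount_boundaries():
        # Probe each partition at the edge values on both sides of its boundaries.
        assert discount_rate(0) == 0.0     # lowest valid input
        assert discount_rate(9) == 0.0     # just below the first boundary
        assert discount_rate(10) == 0.05   # on the first boundary
        assert discount_rate(99) == 0.05   # just below the second boundary
        assert discount_rate(100) == 0.10  # on the second boundary
        try:
            discount_rate(-1)              # just outside the valid domain
            raise AssertionError("expected ValueError for a negative quantity")
        except ValueError:
            pass

    test_discount_boundaries()

When the programmers already run edge checks like these in their unit suites, a system test group that repeats them at the system level is mostly re-buying information the team already has.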

This paper summarizes a talk I gave at KWSQA last month and was written for publication in their newsletter. For additional details, see my paper, The Ongoing Revolution in Software Testing, available at www.kaner.com/pdfs/TheOngoingRevolution.pdf.
