Becky and I are working on a new version of the BBST courses (BBST-2014). In the interim, to support the universities teaching the course, the AST, and the many people who are studying the materials on their own, we’re publishing some of the ideas we have for clarifying the course. The first two in this series focused on oracles in the Foundations course and on interactive grading as a way to give students more detailed feedback. This post is about follow-up testing, which we cover in the current BBST-BA’s slides 44-68.
Some failures are significant and they occur under circumstances that seem fairly commonplace. These will either be fixed, or there will be a compelling reason to not fix them.
Other failures need work. When you see them, there’s ambiguity about the significance of the underlying bug:
- Some failures look minor. Underlying the failure is a bug that might (or might not) be more serious than you’d imagine from the first failure.
- Some failures look rare or isolated. Underlying that failure is a bug that might (or might not) cause failures more often, or under a wider range of conditions, than you’d imagine from the first failure.
Investigating whether a minor-looking failure reflects a more serious bug
To find out whether the underlying bugs are more serious than the first failures make them look, we can do follow-up testing. But what tests?
That’s a question that Jack Falk, Hung Quoc Nguyen and I wrestled with many times. You can see a list of suggestions in Testing Computer Software. Slides 44-56 sort those ideas (and add a few) into four categories:
- Vary your behavior
- Vary the options and settings of the program
- Vary data that you load into the program
- Vary the software and hardware environment
Some students get confused by the categories. That is not shocking, because the categories are a somewhat idiosyncratic attempt to organize a broad collection of test ideas. But some students are confused enough that, on the exam, they don’t do a good job of generating test ideas from the categories.
For example, they have trouble with this question:
Suppose that you find a reproducible failure that doesn’t look very serious.
- Describe the four tactics presented in the lecture for testing whether the defect is more serious than it first appeared.
- As a particular example, suppose that the display got a little corrupted (stray dots on the screen, an unexpected font change, that kind of stuff) in Impress when you drag the scroll bar up and down. Describe four follow-up tests that you would run, one for each of the tactics that you listed above.
I’m not going to solve this puzzle for you, but the solution should be straightforward if you understand the categories.
The slides work through a slightly different example:
A program unexpectedly but slightly scrolls the display when you add two numbers:
- The task is entering numbers and adding
- The failure is the scrolling.
Let’s consider the categories in terms of this example:
1. Vary your behavior
When you run a test, you intentionally do some things as part of the test. For example, you might:
- enter some data into input variables
- write some data into data files that the program will read
- give the program commands
You might change any of these as part of your follow-up testing. (What happens if I do this instead of that?)
These follow-up tests might include changing the data or adding steps, substituting steps, or taking steps away.
For example, if adding one pair of numbers causes unexpected scrolling, suppose you try adding two numbers many times. Will the program scroll more, or scroll differently, as you repeat the test?
Suppose we modified the example so the program reads (and then adds) two numbers from a data file. Changing that data file would be another example of varying your behavior.
The slides give several additional examples.
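To make the repetition idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `ToyCalculator` is a hypothetical stand-in for the program under test, with a seeded bug that scrolls the display one line per addition. The point is only the shape of the follow-up test: repeat the same step a varying number of times and watch whether the symptom grows.

```python
class ToyCalculator:
    """Hypothetical stand-in for the program under test (not a real API)."""

    def __init__(self):
        # Number of lines the display has scrolled unexpectedly.
        self.scroll_offset = 0

    def add(self, a, b):
        # Seeded bug for illustration: every addition scrolls the display one line.
        self.scroll_offset += 1
        return a + b


def scroll_after_repeats(repeats, a=2, b=3):
    """Run the same addition `repeats` times and return the total unexpected scroll."""
    calc = ToyCalculator()
    for _ in range(repeats):
        calc.add(a, b)
    return calc.scroll_offset


def follow_up_by_repetition(repeat_counts=(1, 10, 100)):
    """Map each repeat count to the scroll offset seen after that many additions."""
    return {n: scroll_after_repeats(n) for n in repeat_counts}


if __name__ == "__main__":
    for n, offset in follow_up_by_repetition().items():
        print(f"{n} additions -> display scrolled {offset} line(s)")
```

If the offset grows with the number of repetitions, as it does in this toy, a failure that looked cosmetic after one addition might make the display unusable after a long session. That is a more persuasive report than the original.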
2. Vary the options and settings of the program
Think about Microsoft Word. Here are some examples of its options:
- Show (or don’t show) formatting marks on the screen
- Check spelling (or don’t check it) as you type
In addition, you might change a template that controls the formatting of new documents. Some examples of variables you might change in the template are:
- the default typeface
- the spacing between tab stops
- the color of the text
Which of these are “options” and which of these are “settings”? It doesn’t matter. The terminology will change from program to program. What doesn’t change is that these are persistent variables. Their value stays with the program from one document to another.
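One way to organize follow-up tests over persistent variables is to list them and generate variants to rerun the failing test under. The sketch below is only an illustration: the setting names and values are hypothetical labels modeled on the Word examples above, not real configuration keys of any program.

```python
from itertools import product

# Hypothetical persistent settings, named after the Word examples above.
SETTINGS = {
    "show_formatting_marks": [False, True],
    "check_spelling_as_you_type": [False, True],
    "default_typeface": ["Times New Roman", "Arial"],
}


def settings_combinations(settings):
    """Yield one dict per combination of setting values."""
    names = list(settings)
    for values in product(*(settings[name] for name in names)):
        yield dict(zip(names, values))


def one_at_a_time(settings, baseline):
    """Yield variants that differ from the baseline in exactly one setting."""
    for name, values in settings.items():
        for value in values:
            if value != baseline[name]:
                yield {**baseline, name: value}


if __name__ == "__main__":
    baseline = {name: values[0] for name, values in SETTINGS.items()}
    for variant in one_at_a_time(SETTINGS, baseline):
        print(variant)
```

Changing one setting at a time keeps the follow-up cheap and makes it easy to say which setting changed the failure. The full set of combinations grows quickly, so it is usually worth running only if the one-at-a-time variants suggest that settings interact.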
3. Vary data that you load into the program
This isn’t well worded. Students confuse what I’m talking about with the basic test data.
Imagine again testing Microsoft Word. Suppose that you are focused on the format of the document, so your test documents have lots of headings and tables and columns and headers and footers (etc.). If you change the document, save it, and load the revised one, that is data that you load into the program, but I think of that type of change as part of 1. Vary your behavior.
When Word starts up, it also loads some files that might not be intentional parts of your test. For example, it loads a dictionary, a help file, a template, and probably other stuff. Often, we don’t even think of these files (and how what they hold might affect memory or performance) when we design tests. Sometimes, changing one of these files can reveal interesting information.
4. Vary the software and hardware environment
For example,
- Change the operating system’s settings, such as the language settings
- Change the hardware (a different video card) or how the operating system works with the hardware (a different driver)
- Change hardware settings (same video card, different display resolution)
This is a type of configuration testing, but the goal here is to try to demonstrate a more impressive failure, not to assess the range of circumstances under which the bug will appear.
Investigating whether an obscure-looking failure will arise under more circumstances
In this case, the failure shows up under what appear to be special circumstances. Does it only show up under special circumstances?
Slides 58-68 discuss this, but some additional points are made on other slides or in the taped lecture. Bringing them together…
- Uncorner your corner cases
- Look for configuration dependence
- Check whether the bug is new to this version
- Check whether failures like this one already appear in the bug database
- Check whether bugs of this kind appear in other programs
Here are a few notes:
1. Uncorner your corner cases
Risk-based tests often use extreme values (boundary conditions). These are good for exposing a failure, but once you find the failure, try less extreme values. Demonstration of failure with a less extreme test will yield a more credible report.
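As a rough sketch of uncornering, suppose the original failure was found with an extreme input size. The code below tries a series of candidate values, ordered from least to most extreme, and reports the least extreme one that still fails. The `toy_fails` function and its 200-character threshold are hypothetical; in practice, `fails` would run the real test against the program.

```python
def least_extreme_failure(fails, candidates):
    """Return the first failing candidate, or None if none of them fail.

    `candidates` must be ordered from least extreme to most extreme.
    """
    for value in candidates:
        if fails(value):
            return value
    return None


def toy_fails(length):
    # Hypothetical bug: the program fails on any input longer than 200 characters.
    return length > 200


if __name__ == "__main__":
    # Assume the original test used an extreme 65,535-character input.
    candidates = [10, 100, 250, 1000, 10000, 65535]
    print(least_extreme_failure(toy_fails, candidates))
```

A report that says the program fails on a 250-character input is harder to dismiss as a corner case than one that says it fails at 65,535 characters.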
2. Look for configuration dependence
In this case, the question is whether the failure will appear on many configurations or just this one. Try it with more memory, with another version of the operating system (or on another OS altogether), etc.
3. Check whether the bug is new to this version
Does this bug appear in earlier versions of the program? If so, did users (or others) complain about it?
If the bug is new, especially if it was a side-effect of a fix to another bug, some people will take it much more seriously than a bug that has been around a long time but rarely complained about.
4. Check whether bugs like this one already appear in the bug database
One of the obvious ways to investigate whether a failure appears under more general circumstances than the ones in a specific test is to check the bug tracking database to see whether the failure has in fact occurred under other circumstances. It often takes some creativity and patience to search out reports of related failures (because they probably aren’t reported in exactly the same way as you’re thinking of the current failure), but if your database has lots of not-yet-fixed bugs, such a search is often worthwhile.
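Because related reports are rarely worded the way you would word them, it helps to search on several synonyms for the symptom. Here is a minimal sketch, assuming the bug database can be exported as a list of records with `id`, `title`, `description` and `status` fields. Those field names, and the sample records, are made up for illustration; a real tracker will have its own schema and query tools.

```python
def find_related_reports(reports, keywords, open_only=True):
    """Return the ids of reports whose title or description mentions any keyword."""
    wanted = [keyword.lower() for keyword in keywords]
    matches = []
    for report in reports:
        if open_only and report["status"] != "open":
            continue
        text = (report["title"] + " " + report["description"]).lower()
        if any(keyword in text for keyword in wanted):
            matches.append(report["id"])
    return matches


# Made-up sample records, for illustration only.
SAMPLE_REPORTS = [
    {"id": 101, "title": "Display jumps when summing",
     "description": "Screen shifts up one line after adding two values.",
     "status": "open"},
    {"id": 102, "title": "Crash on print",
     "description": "Printing a long document crashes the program.",
     "status": "open"},
    {"id": 103, "title": "Unexpected scrolling in totals view",
     "description": "The view moves after recalculation.",
     "status": "closed"},
]

if __name__ == "__main__":
    # Several wordings for the same symptom.
    print(find_related_reports(SAMPLE_REPORTS, ["scroll", "jump", "shift"]))
```

Searching only on "scroll" would miss report 101 in this sample, even though it describes the same symptom in different words.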
5. Check whether bugs of this kind appear in other programs
The failure you’re seeing might be caused by the code specifically written by this programmer, or the bug might be in a library used by the programmer. If it’s in the library, the same bug will be in other programs. Similarly, some programmers model (or copy) their code from textbook descriptions or from code they find on the web. If that code is incorrect, the error will probably show up in other programs.
If the error does appear in other programs, you might find discussions of the error (how to find it, when it appears, how serious it is) in discussions of those programs.
This post is partially based on work supported by NSF research grant CCLI-0717613 “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.” Any opinions, findings and conclusions or recommendations expressed in this post are those of the author and do not necessarily reflect the views of the National Science Foundation.