
Testing tours: Research for Best Practices?

Friday, June 24th, 2011

A few years ago, Michael Bolton wrote a blog post on “Testing Tours and Dashboards.” Back then, it had recently become fashionable to talk about “tours” as a fundamental class of tool in exploratory testing. Michael reminded readers of the unacknowledged history of this work.

Michael’s post also mentioned that some people were talking about running experiments on testing tours. Around that time, I heard a bunch about experiments that purported to compare testing tours. This didn’t sound like very good work to me, so I ignored it. Bad ideas often enjoy a burst of publicity, but if you leave them alone, they often fade away over time.

This topic has come up again recently, repeatedly. A discussion with someone today motivated me to finally publish a comment.

The comments have come up mainly in discussions or reviews of a test design course that I’m creating. The course (video) starts with a demonstration of a feature tour, followed by an inventory of many of the tours I learned from Mike Kelly, Elisabeth Hendrickson, James Bach, and Michael Bolton.

Why, the commenter asks, do I not explain which tours are better and which are worse? After all, hasn’t there been some research that shows that Tour X is better than Tour Y? This continues down one of two tracks:

  • Isn’t it irresponsible to ignore scientific research that demonstrates that some tours are much better than others or that some tours are ineffective?
  • Shouldn’t I be recommending that people do more of this kind of research? Wouldn’t this be a promising line of research? Couldn’t someone propose it to a corporate research department or a government agency that funds research? After all, this could be a scientific way to establish some Best Practices for exploratory testing (use the best tours, skip the worst ones).

This idea, using experiments to rank tours from best to worst, can certainly be made to sound impressive.

I don’t think this is a good idea. I’ll say that more strongly: even though this idea might be seductive to people who have training in empirical methods (or who are easily impressed by descriptions of empirical work), I think it reflects a fundamental lack of understanding of exploratory testing and of touring as a class of exploratory tools.

A tour is a directed search through the program. Find all the capabilities. Find all the claims about the product. Find all the variables. Find all the intended benefits. Find all the ways to get from A to B. Find all the X. Or maybe not ALL, but find a bunch.

This helps the tester achieve a few things:

  1. It creates an inventory of a class of attributes of the product under test. Later, the tester can work through the inventory, testing each one to some intended level of depth. This is what “coverage”-oriented testing is about. You can test N% of the program’s statements, or N% of the program’s features, or N% of the claims made for the product in its documentation–if you can list it, you can test it and check it off the list.
  2. It familiarizes the tester with this aspect of the product. Testing is about discovering quality-related information about the product. An important part of the process of discovery is learning what is in the product, how people can / will use it, and how it works. Tours give us different angles on that multidimensional learning problem.
  3. It provides a way for the tester to explain to someone else what she has studied in the product so far, what types of things she has learned and what basics haven’t yet been explored. In a field with few trustworthy metrics, this gives us a useful basis for reporting progress, especially progress early in the testing effort.
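The inventory idea in point 1 lends itself to a simple illustration. Here is a hypothetical sketch in Python (the item names and the helper are mine, not from the post) of how an inventory produced by a tour could later be used to report percent coverage:

```python
# Hypothetical sketch: a tour produces an inventory of items (claims,
# features, variables, ...); later testing works through the inventory
# and reports how much of it has been covered. All names are illustrative.

def coverage(inventory, tested):
    """Percentage of inventoried items that have been tested so far."""
    items = set(inventory)
    if not items:
        return 0.0
    return 100.0 * len(items & set(tested)) / len(items)

# A claims tour might yield an inventory of claims made in the documentation:
claims = ["saves files", "imports CSV", "prints reports", "undo works"]
tested_so_far = ["saves files", "imports CSV"]

print(f"claims coverage: {coverage(claims, tested_so_far):.0f}%")  # 50%
```

This is the sense in which “if you can list it, you can test it and check it off the list”: the tour supplies the list, and the coverage number is just bookkeeping against it.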

So which tour is better?

From a test-coverage perspective, I think that depends a lot on contract, regulation, and risk.

  1. To the extent that you have to know (and have to be able to tell people in writing) that all the X’s have been tested and all the X’s work, you need to know what all the X’s are and how to find them in the program. That calls for an X-tour. Which tour is better? The one you need for this product. That probably varies from product to product, no?
  2. Some programmers are more likely to make X-bugs than Y-bugs. Some programmers are sloppy about initializing variables. Some are sloppy about boundary conditions. Some are sloppy about thread interactions. Some are good coders but they design user interactions that are too confusing. If I’m testing Joe X’s code, I want to look for X-bugs. If Joe blows boundaries, I want to do a variable tour, to find all the variables so I can test all the boundaries. But if Joe’s problem is incomprehensibility, I want to do a benefit tour, to see what benefits people _should_ get from the program and how hard/confusing it is for users to actually get them. Which tour is better? That depends on which risks we are trying to mitigate, which bugs we are trying to find. And that varies from program to program, programmer to programmer, and on the same project, from time to time.

From a tester-learning perspective, people learn differently from each other.

  1. If I set 10 people with the task of learning what’s in a program, how it can be used, and how it works, those people would look for different types of information. They would be confused by different things. They would each find some things more interesting than others. They would already know some things that their colleagues didn’t.
  2. Which tour is better? The tour that helps you learn something you’re trying to learn today. Tomorrow, the best tour will be something different.

Testing is an infinite task. Define “distinct tests” this way: two tests are distinct if each can expose at least one bug that the other would miss. For a non-trivial program, there is an infinite number of distinct potential tests. The central problem of test design is boiling down this infinite set to a (relative to infinity) tiny collection of tests that we will actually use. Each test technique highlights a different subset of this infinity. In effect, each test technique represents a different sampling strategy from the space of possible tests.
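The definition of distinct tests above can be written down directly. A toy Python sketch (my own model, with made-up bug sets) represents each test by the set of bugs it could expose; two tests are distinct exactly when each set contains something the other lacks:

```python
# Toy model: represent each test by the set of bugs it can expose.
# Two tests are distinct if each can expose at least one bug
# that the other would miss.

def distinct(test_a, test_b):
    return bool(test_a - test_b) and bool(test_b - test_a)

boundary_test = {"off_by_one", "overflow"}
thread_test = {"deadlock", "race", "overflow"}
subset_test = {"overflow"}

print(distinct(boundary_test, thread_test))  # True: each catches bugs the other misses
print(distinct(boundary_test, subset_test))  # False: subset_test exposes nothing new
```

In this model, a test whose bug set is a subset of another’s is not distinct from it, which is why the interesting design problem is choosing a small sample of genuinely distinct tests from the infinite space.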

Tours help the tester gain insight into the multidimensional nature of this complex, infinite space. They help us imagine, envision, and, as we gain experience on the tour, prioritize the different sampling strategies we could use when we do more thorough, more intense testing after finishing the tours. So which tour is best? The ones that give the testers more insight and that achieve a greater stretch of the testers’ imagination. For this, some tours will work better for me, others for you.

The best tour isn’t necessarily the one that finds the most bugs, or the one that covers the most statements (or whichever coverage attribute you prefer). The best tour is the one that helps the individual human tester learn something new and useful. (That’s why we call it exploration. New and useful knowledge.) And that depends on what you already know, on what risks and complexities characterize this program, and on what the priorities are in this project.

A tour that is useful to you might be worthless to me. But that doesn’t stop it from being useful for you.

  • Rather than looking for a few “best” tours, I think it would be more interesting to develop guidance on how to do tour X given that you want to. (Do you know how to do a tour of uses of the program that trigger multithreaded operations in the system? Would it be interesting to know how?)
  • Rather than looking for a few “best” tours, I think it would be more interesting to develop a more diverse collection of tours that we can do, with more insight into what each can teach us.
  • Rather than seeking to objectify and quantify tours, I think we should embrace their subjectivity and the qualitative nature of the benefits they provide.

Pseudo-academics and best-practicers trivialize enough aspects of testing. They should leave this one alone.