A new brand of snake oil for software testing

Wednesday, May 19th, 2010

I taught a course last term on Quantitative Investment Modeling in Software Engineering to a mix of undergrad and grad students of computer science, operations research and business. We had a great time and learned a lot about the market, about modeling, and about automated exploratory testing (more on this type of exploratory testing at this year’s Conference of the Association for Software Testing…)

In the typical undergraduate science curriculum, most of the experimental design we teach is statistical. Given a clearly formulated hypothesis and a reasonably clearly understood oracle, we learn how to design experiments that control for confounding variables, so that we can decide whether our experimental effect was statistically significant. We also teach some instrumentation, but in most cases, the students learn how to use well-understood instruments as opposed to how to appraise, design, develop, calibrate and then apply them.

Our course was not so traditionally structured. In our course, each student had to propose and evaluate an investment strategy. We started with a lot of bad ideas. (Most small investors lose money. One of the attributes of our oracle is, “If it causes you to lose money, it’s probably a bad idea.”) We wanted to develop and demonstrate good ideas instead. We played with tools (some worked better than others) and wrote code to evolve our analytical capabilities, studied some qualitative research methods (hypothesis-formation is a highly qualitative task), ran pilot studies, and then eventually got to the formal-research stages that the typical lab courses start at.

Not surprisingly, the basics of designing a research program took about 1/3 of the course. With another course, I probably could have trained these students to be moderately-skilled EVALUATORS of research articles. (It is common in several fields to see this as a full-semester course in a doctoral program.)

Sadly, few CS doctoral programs (and even fewer undergrad programs) offer courses in the development or evaluation of research, or if they offer them, they don’t require them.

There is a widespread gap between having a little experience replicating other people’s experiments and seeing some work in a lab, on the one hand, and learning to do and evaluate research, on the other. This gap is the home court for truthiness. In the world of truthiness, it doesn’t matter whether the evidence in support of an absurd assertion is any good, as long as we can make it look to enough people as though good enough evidence exists. Respectable-looking research from apparently-well-credentialed people is hard for someone to dispute if, like most people in our field, one lacks training in critical evaluation of research.

The new brand of snake oil is “evidence-based” X, such as “evidence-based” methods of instruction or, in a recent proposal, evidence-based software testing. Maybe I’m mistaken in my hunch about what this is about, but the tone of the abstract (and what I’ve perceived in my past personal interactions with the speaker) raises some concerns.

Jon Bach addresses the tone directly. You’ll have to form your own personal assessments of the speaker. But I agree with Jon that this does not sound merely like advocacy of applying empirical research methods to help us improve the practice of testing, an idea that I rather like. Instead, the wording suggests a power play that seems to me to have less to do with research and more to do with the next generation of ISTQB marketing.

So let me talk here about this new brand of snake oil (“Evidence-Based!”), whether it is meant this way by this speaker or not.

The “evidence-based” game is an interesting one to play when most of the people in a community have limited training in research methods or research evaluation. This game has been recently fashionable in American education. In that context, I think it has been of greatest benefit to people who make money selling mediocritization. It’s not clear to me that this movement has added one iota of value to the quality of education in the United States.

In principle, I see 5 problems (or benefits, depending on your point of view). I say, “in principle” because of course, I have no insight into the personal motives and private ideas of Dr. Reid or his colleagues. I am raising a theoretical objection. Whether it is directly applicable to Dr. Reid and ISTQB is something you will have to decide yourself, and these comments are not sufficient to lead you to a conclusion.

  1. It is easy to promote forced results from worthless research when your audience has limited (or no) training in research methods, instrumentation, or evaluation of published research. And if someone criticizes the details of your methods, you can dismiss their criticisms as quibbling or theoretical. Too many people in the audience will be stuck making their decision about the merits of the objection on the personal persuasiveness of the speakers (which snake oil salesmen excel at) rather than on the underlying merits of the research.
  2. When one side has a lot of money (such as, perhaps, proceeds from a certification business), and a plan to use “research” results as a sales tool to make a lot more money, they can invest in “research” that yields promotable results. The work doesn’t have to be competent (see #1). It just has to support a conclusion that fits with the sales pitch.
  3. When the other side doesn’t have a lot of money, when the other side are mainly practitioners (not much time or training to do the research), and when competent research costs a great deal more than trash (see #2 and #5), the debates are likely to be one-sided. One side has “evidence” and if the other side objects, well, if they think the “evidence” is so bad, they should raise a bunch of money and donate a bunch of time to prove it. It’s an opportunity for well-funded con artists to take control of the (apparent) high road. They can spew impressive-looking trash at a rate that cannot possibly be countered by their critics.
  4. It is easy for someone to do “research” as a basis for rebranding and reselling someone else’s ideas. Thus, someone who has never had an original thought in his life can be promoted as the “leading expert” on X by publishing a few superficial studies of it. A certain amount of this goes on already in our field, but largely as idiosyncratic misbehavior by individuals. There is a larger threat. If a training organization will make more money (influence more standards, get its products mandated by more suckers) if its products and services have the support of “the experts”, but many of “the experts” are inconveniently critical, there is great marketing value in a vehicle for papering over the old experts with new-improved experts who have done impressive-looking research that gives “evidence-based” backing to whatever the training organization is selling. Over time, of course, this kind of plagiarism kills innovation by bankrupting the innovators. For companies that see innovation as a threat, however, that’s a benefit, not a problem. (For readers who are wondering whether I am making a specific allegation about any person or organization, I am not. This is merely a hypothetical risk in an academic’s long list of hypothetical risks, for you to think about in your spare time.)
  5. In education, we face a classic qualitative-versus-quantitative tradeoff. We can easily measure how many questions someone gets right or wrong on simplistic tests. We can’t so easily measure how deep an understanding someone has of a set of related concepts or how well they can apply them. The deeper knowledge is usually what we want to achieve, but it takes much more time and much more money and much more research planning to measure it. So instead, we often substitute the simplistic metrics for the qualitative studies. Sadly, when we drive our programs by those simplistic metrics, we optimize to them and we gradually teach to the superficial and abandon the depth. Many of us in the teaching community in the United States believe that over the past few years, this has had a serious negative impact on the quality of the public educational system and that this poses a grave threat to our long-term national competitiveness.

Most computer science programs treat system-level software testing as unfit for the classroom.

I think that software testing can have great value, that it can be very important, and that a good curriculum should have an emphasis on skilled software testing. But the popular mix of ritual, tedium, and moralizing that has been passed off by some people as testing for decades has little to offer our field, and even less for university instruction. I think ISTQB has been masterful at selling that mix. It is easy to learn and easy to certify. I’m sure that a new emphasis, “New! Improved! Now with Evidence!” could market the mix even better. Just as worthless, but with even better packaging.