July « 2016 « Cem Kaner, J.D., Ph.D.

Archive for July, 2016

Updating BBST to Version 4.0

Friday, July 15th, 2016

Becky Fiedler and I are designing the next generation of BBST. We’ll soon start the implementation of BBST-Foundations 4.0. This post is the first of a series soliciting design advice for BBST.

We’re looking for comments, including criticism and alternative suggestions: We are revising the core topics of BBST-Foundations. This post lays out our current thinking. Some parts are undoubtedly wrong.
We’re looking for recommendation-worthy readings. We point students to reading material. Some is required reading (we test the students on it, and/or make them apply it). Other readings are listed as recommended reading or pointed to in individual slides as references. We’re happy to point to good blog posts, white papers, magazine articles and conference talks, along with the usual books and formal papers. Our criteria are simple:
1. Is this relevant?
2. Is it written at a level that our students can understand?
3. Will they learn useful things from it?
We probably can’t include all good suggestions, but we do want to point students to a broader set of references, from a broader group of people, than Foundations 3.0.

This is a long article. I split it into 5 sections (separate posts) because you can’t read it all in one sitting. The other parts are at:

Background: What is BBST

Some readers of this post won’t be familiar with BBST. Here’s a bit of background. If you are already familiar with BBST, there’s nothing new here. Skip to the next section.

BBST is short for Black Box Software Testing. “Black box” is an old engineering term. When you analyze something as a “black box”, you look at its interaction with the world. You understand it in terms of the actions it initiates and the responses it makes to inputs. The value of studying something as a black box is the focus. You don’t get distracted by the implementation. You focus on the appropriateness and value of the behavior.

Hung Quoc Nguyen and I developed the first version of BBST right after we published Testing Computer Software 2.0 in 1993. (The 1993 and 1999 versions are the same book, different publishers.) The course became commercially successful. I cotaught the course with several people. Each of them helped me expand my coverage, deepen my treatment, improve my instructional style, and update my thinking. So did the Los Altos Workshops on Software Testing, which Brian Lawrence, Drew Pritsker and I cofounded, and the spinoffs of that, especially AWTA (Pettichord’s Austin Workshops on Software Testing), STMR (my workshops on the Software Test Managers Roundtable), WTST (Fiedler’s and my Workshops on Teaching Software Testing), and the much smaller set of WHETs (Bach’s and my Workshops on Heuristic & Exploratory Techniques).

Fifteen years ago, Bach and I were actively promoting the context-driven approach to software testing together. As part of our co-branding, we started signing BBST as Kaner-and-Bach and he let me incorporate some materials from RST into BBST. Our actual course development paths remained largely independent. Bach worked primarily on RST, while I worked primarily on BBST.

Florida Institute of Technology recruited me to teach software engineering, in 2000. In that setting, Fiedler and I wrote grant proposals and got funding from NSF to expand BBST and turn it into an online course. The “flipped classroom” (lectures on video, activities in the classroom) was invented independently and simultaneously by several instructional researchers/developers. Fiedler and I (with BBST 1.0) were among this earliest publishers within this group (see here and here and here).

We evolved BBST to 3.0 through a collaboration with the Association for Software Testing, and with several individual colleagues (especially, I think, Scott Barber, Doug Hoffman, and John McConda, but with significant guidance from many others including James Bach, Ajay Bhagwat, Bernie Berger, Rex Black, Michael Bolton, Tim Coulter, Jack Falk, Sabrina Fay, Elisabeth Hendrickson, Alan Jorgenson, Pat McGee, Hung Nguyen, Andy Tinkham — please see the opening slides in the Foundations course. It’s a long list.) For more on the history, including parts of the key NSF grant proposal and many other links, please see Evolution of the BBST Courses.

There are 4 BBST courses:

BBST Foundations, signed by Kaner & Bach. Course description here. Courseware available here, under a Creative Commons (free use/reuse) license. Course Workbook by Kaner & Fiedler (not free) here.
BBST Bug Advocacy, signed by Kaner & Bach. Course description here. Courseware available here, under a Creative Commons (free use/reuse) license. Course Workbook by Kaner & Fiedler (not free) here.
BBST Test Design, signed by Kaner & Fiedler. Course description here. Courseware available here, under a Creative Commons (free use/reuse) license. Course Workbook by Kaner & Fiedler coming very soon.
BBST Domain Testing, signed by Kaner & Fiedler. Course description here. Courseware not publicly available. Course Textbook by Kaner, Hoffman & Padmanabhan (not free) here.

We teach skills along with content. Along with testing skills and knowledge, we teach learning skills, communication skills and computing fundamentals.

To see how we analyze the depth of instruction for a part of the course, see BBST Courses: BBST Learning Objectives. To see how we balance the coverage of learning objectives across the courses, see Learning Objectives of the BBST Courses and then see Emphasis & Objectives of the Test Design Course.

Posted in BBST, education | Comments Off on Updating BBST to Version 4.0

Updating to BBST 4.0: Differences Between the Core BBST Courses and Domain Testing

Friday, July 15th, 2016

This is the second section of my post on BBST 4.0. The other parts are at:

Differences Between the Core BBST Courses and Domain Testing

Domain Testing focuses on one testing technique (domain testing), which is believed to be the most popular test technique in the known universe. (We hope to develop some additional one-technique-focused courses, but this one had to come first.) In this course, we help students go beyond awareness and basic understanding of the technique, to develop skill with it. Because it is more in-depth, and more focused on doing the technique well than on talking about it, the course structure had to change a bit.

Our most important change was the introduction of application videos, in which we demonstrate parts of the technique on real programs. More details on this below… I mention it here because this is the primary idea from Domain Testing that will come into BBST 4.0.

Structure of the Core BBST Courses

The original instructor-led BBST courses are organized around six lectures (about six hours of talk, divided into one-hour parts), with a collection of activities. To successfully complete a typical instructor-led course, a student spends about 12-15 hours per week for 4 weeks (48-60 hours total).

Most of the course time is spent on the activities:

Orientation activities introduce students to a key challenge considered in a lecture. The student puzzles through the activity for 30 to 90 minutes, typically before watching the lecture, then sees how the lecture approaches this type of problem. The typical Foundations course has two to four of these.
Application activities call for two to six hours of work. It applies ideas or techniques presented in a lecture or developed over several lectures. The typical Foundations course has one to two of these.
Multiple-choice quizzes help students identify gaps in their knowledge or understanding of key concepts in the course. These questions are tough because they are designed to be instructional or diagnostic (to teach you something, to deepen your knowledge of something, or to help you recognize that you don’t understand something) rather than to fairly grade you.
Various other discussions that help the students get to know each other better, chew on the course’s multiple-choice quiz questions, or consider other topics of current interest.
An essay-style final exam.
Interactive Grading. An interactive grading session is a one-on-one meeting, student with instructor, that lasts 1-to-2 hours (occasionally, 3 hours, ouch) and is focused on a specific exam or major assignment. In Foundations, we work through the exam. In Bug Advocacy, we focus on a student’s analysis of a bug report. In Test Design, we focus on the second course assignment, which applies a combination of risk-based testing and domain testing to OpenOffice.

Structure of Domain Testing

The Domain Testing course breaks from the BBST model in four ways:

Focus On One Technique: The Domain Testing course teaches one (1) technique. Our goal is to help students develop practical skill with this technique.
The workbook is a textbook on domain testing, not a course support book.
Application Videos: Along with the core lecture videos, we provide supplementary videos that show how we apply the technique to real software products. This is the primary instructional innovation in this course, and students have praised it highly. Each lesson comes with a set of application videos. Each video demonstrates the technique on one product. We currently work with 4 very different products: a financial application, a complex embedded software system (driving a high-end sewing machine), a graphic design program and a music composition program. As we teach a new aspect of domain testing, students can choose to watch how we apply it to any one (or more) of these programs. Then they apply it to the course’s software under test (currently, GnuCash, which is like an open source QuickBooks.)
Capstone Project Instead of an Exam: Because our focus is on skill development, we skip the multiple-choice questions and the exam. Students practiced the pieces of the task in the first three weeks. In the Project, they put it all together. We interactively grade the final project.

Posted in BBST, education | Comments Off on Updating to BBST 4.0: Differences Between the Core BBST Courses and Domain Testing

Updating to BBST 4.0: Learning Objectives and Structure of Foundations 3.0 (2010)

Friday, July 15th, 2016

This is the third section of my post on BBST 4.0. The other parts are at:

Learning Objectives and Structure of Foundations 3.0 (2010)

Most courses present content, help students develop skills, and foster student attitudes. The typical course identifies its central topic area (such as software testing or Java programming) and focuses on that, but also presents other content, skills and attitudes that are not directly on-topic but support the student who studies this area. For example, we see communication skills as important for software testers.

In Foundations 2.0 and 3.0, we decided that the central testing content should be the five most important issues in software testing. In our view, these were:

Information objectives drive the testing mission and strategy (Why do we test? What are we trying to learn and how are we trying to learn it?)
Oracles are heuristic (How can we know whether a program passed or failed a test?)
Coverage is a multidimensional concept (How can we determine how much testing has been done?)
Complete testing is impossible (Are we done yet?)
Measurement is important, but hard (How much testing have we completed and how well have we done it?)

Along the way, to teach these, we have to present definitions and foundational information.

Along with the central topic, we worked on a key set of supporting objectives:

Help students develop learning skills that are effective in the online-course context. Most professional-development courses are easy. Even if they present difficult material, they give the students feel-good activities that help the student feel as though they understand the material or can do the basic task, whether they actually learned it or not. This is good marketing (the students walk away feeling successful and happy) but we don’t think it’s good education. We prefer to give students tasks that are more challenging, that helps them understand how well they have learned the content and how well they can apply the skills. Most of our professional students finished their formal schooling years ago and so they are out of practice with academic skills. We have to help them rebuild their skills with effective reading, taking exams, writing assignments, providing peer reviews and coping with critical feedback from peers and instructors.
Foster the attitude that our assessments are helpfully hard without being risky
To be instructionally effective, assessments (exams, quizzes, labs, projects, writing assignments, etc.) should be hard enough that students have to stretch a bit in order to do them well, but not too hard or students will give up.Students come into Foundations with vastly different backgrounds. This is not like a third-year math course, in which the instructor knows that every student has two successful years of math behind them. Our hardest assessment-design goal is to create activities that are challenging for experienced professionals while still being motivating and instructionally useful for newcomers to the field.One of the course-design decisions that comes out of an intention to serve a very diverse group of students is that the assessments should be challenging, feedback should be honest but pass-fail decisions should be generous. Honest feedback sometimes requires direct and detailed criticism. Our goal is to deliver the criticism in a kind way. We are trying to teach these students, not demoralize them. While this is not possible with every student, the goal is to write in a way that helps them understand what is meant, rather than making them so defensive or embarrassed that they reject it.
- Our intention is that most students who take Foundations and try reasonably hard should “pass” the course. Most of the people who don’t pass the course should be people who abandon it, often because they had to switch their focus to some crisis at work. Someone who has to shift focus (and stop working on BBST) won’t pass the course, but we don’t want to say they “failed” it. That’s not what happened. They didn’t fail. They just didn’t finish. So rather than saying that someone “passed” a BBST course, we say they completed it. Rather than say they “failed” our course, we say that they didn’t complete it.
- Some students stay in the course to the end but don’t do the work, or do work that is really bad, or submit work that isn’t theirs (otherwise known as “cheating.”) These students don’t complete the course.
- Our standard, for students who make an honest effort, is this: Bug Advocacy and Test Design are a little harder than Foundations. If, at the end of the course, the instructor believes that a student has no hope of succeeding in the next class, we won’t tell them they “completed” Foundations.
- Most students who stay in the course do get credit for completing the course even if their work is not up to the average level of the class. They get honest feedback on each piece of work, but if they make an honest effort, we want to give them a “pass.”

Bug Advocacy and Test Design emphasize other supporting objectives, such as persuasive technical communication, time management and prioritization, and organization of complex, multidimensional datasets with simplifying themes. However, for students to have any hope with these, they need the basic learning skills that we try to help them with in Foundations.

Posted in BBST, education | Comments Off on Updating to BBST 4.0: Learning Objectives and Structure of Foundations 3.0 (2010)

Updating to BBST 4.0: What Should Change

Friday, July 15th, 2016

This is the 4th section of my post on BBST 4.0. The other parts are at:

What We Think Should Change

We are satisfied with the course standards and the supporting objectives (teaching students better learning skills). We want to make it easier for students to succeed by improving how we teach the course, but our underlying criteria for success aren’t going to change.

In contrast, we are not satisfied with the central content. Here are my notes:

1. Information objectives drive the testing mission and strategy

When you test a product, you have a defining objective. What are you trying to learn about the product? We’ll call this your information objective. Your mission is to achieve your information objective. So the information objective is more about what you want to know and the mission is more about what you will do. Your strategy describes your overall plan for actually doing/achieving it.

This is where we introduce context-driven testing. For example, sometimes testers will organize their work around bug-hunting but other times they will organize around getting an exciting product into the market as quickly as possible and making it improvable as efficiently as possible. The missions differ here, and so do the strategies for achieving them.

Critique:

The distinction between information objective and mission is too fine-grained. It led to unhelpful discussions and questions. We’re going to merge the two concepts into “mission.”
The presentation of context got buried in a mass of other details. We gave students an assignment, in which we described several contexts and they had to discuss how testing would differ in them. This was too hard for too many students. We have to provide a better foundation in the lecture and reading.
We must present more contexts and/or characterize them more explicitly. Back when we created BBST 3.0, we treated some development approaches as ideas rather than as contexts. The world has moved on, and what was, to some degree, hypothetical or experimental, has become established. For example:
- In BBST 3.0, when we described a testing role that would fit in an agile organization, we avoided agile-development terminology (for reasons that no longer matter). In retrospect, that decision was badly outdated when we made it. Several different contexts grew out of the Agile Manifesto. Books like Crispin & Gregory illustrate the culture of testing in those situations. In BBST 4.0, we will treat this as a normal context
- Similarly, a very rapid development process is common in which you ship an early version, polish aggressively based on customer feedback, and make ongoing incremental improvements rapidly. Whether we think this approach is wonderful or not, we should recognize it as a common context. A context-respecting tester who works for, or consults to, a company that develops software this way is going to have to figure out how to provide valuable testing services that fit within this model.
Context-driven testing isn’t for everyone. Context-driven testing requires the tester to adapt to the context—or leave. There are career implications here, but we didn’t talk much about them in BBST 3.0. The career issues were often visible in the classes, but there was no place in the course to work on them. This section (mission drives strategy) isn’t the place for them, but the course needs a place for career discussion.

Tentative decisions:

Continue with this as the course opener.
Merge information objectives and mission into one concept (mission) which drives strategy.
Tighten up on the set of definitions. Definitions / distinctions that aren’t directly applicable to the course’s main concepts (such as the distinction between black box testing and behavioral testing) must go away.
Create a separate, well-focused treatment of career paths in software testing. (This will become a supplementary video, probably presented in the lesson that addresses test automation.)

2. Oracles are heuristic

An oracle is a mechanism for determining whether a program passed or failed a test. In 1980, Elaine Weyuker pointed out that perfect oracles rarely exist and that in practice, we rely on partial oracles. Weyuker’s work wasn’t noticed by most of us in the testing community. The insight didn’t take hold until Doug Hoffman rediscovered it, presenting his Taxonomy of Test Oracles at Quality Week and then at an all-day session at the Los Altos Workshop on Software Testing (LAWST 5).

Foundations presented oracles from two conflicting perspectives, (a) Hoffman’s (and mine) and (b) James Bach & Michael Bolton’s.

We started with an example from Bach’s Rapid Software Testing course, then presented the Bach and Bolton’s “heuristic oracles” (or “oracle heuristics”). A heuristic is a fallible but reasonable and useful decision rule. The heuristic aspect of a heuristic oracle is the idea that the behavior of software should usually (but not always) be consistent with a reasonable expectation. Bach developed a list of reasonable expectations, such as the expectation that the current version of the software will behave similarly to a previous version. This is usually correct, but sometimes it is wrong because of design improvement. Thus it is a heuristic. After the explanation, students worked through a challenging group assignment.
Next, we presented Hoffman’s list of partial oracles and mentioned that these are useful supports for automated testing. We gave students some required readings, plus quiz questions and exam study guide questions but no assignment.

Critique:

This material worked well for RST. It did not work well in BBST. I published a critique: The Oracle Problem and the Teaching of Software Testing. and a more detailed analysis in the Foundations of Software Testing workbook. Here are some of my concerns:

The terminology created at least as much confusion as insight. When we tested them, students repeatedly confused the meaning of the word “oracles” and the meaning of the word “heuristics.” Many students treated the words as equivalent and made statements like “all heuristics are oracles” (which is, of course, not true).
The terminology is redundant and uninformative. Saying “heuristic oracle” is like saying “testly test.” The descriptor (heuristic, or testly) adds no new information. The word heuristic has been severely overused in software testing. A fallible decision rule is a heuristic. A cognitive tool that won’t always achieve the desired result is a heuristic. A choice to use a technological tool that won’t always achieve the desired result is a heuristic. Every decision testers make is rooted in heuristics because all of our decisions are made under uncertainty. Every tool we use can be called a heuristic because they are all imperfect and, given the impossibility of complete testing and the impossibility of perfect oracles, every tool we will ever use will be imperfect. This isn’t unique to software testing. Long before we started talking about this in software testing, Billy V. Koen’s award-winning introductions to engineering (often taught in general engineering courses) pointed this out that engineering reasoning and methods are rooted in heuristics. See Koen’s work here (my favorite presentation) and here and here. There is nothing more heuristic-like about oracles than there is about any other aspect of test design, or engineering in general.
Heuristic is a magic word. Magic words provide a name for something while relieving you of the need to think further about it. The core issue with oracles is not that they are fallible. The core issue is that they are incomplete.
- The nature of incompleteness: An oracle will focus a test on one aspect of the software under test (or a few aspects, but just a few). Testers (and automated tests) won’t notice that the program behaves well relative to those aspects but misbehaves in other ways. Therefore, you can notice a failure if it is one that you’re looking for, but you never know whether the program actually passed a test. You merely learn that it didn’t fail the test.
- Human observers don’t eliminate this incompleteness. If you give a test to a human observer, with a well-defined oracle, they are likely to behave like a machine. They pay attention to what the oracle steers them to and don’t notice anything else. The phenomenon of not noticing anything else has been studied formally (inattentional blindness). If you don’t steer the observer with an oracle, you can get more diversity. If multiple people do the testing, different people will probably pay attention to different things, so across a group of people you will probably see greater coverage of the variety of risks. However, each individual is a finite-capacity processor. They have limited attention spans. They can only pay attention to a few things. When people split their attention, they become less effective in each task. An individual observer can introduce variation by paying attention to different potential problems in different runs of the same test.
- I don’t think you achieve effective oracle diversity by choosing to “explore” (another magic word, though one that I am more fond of). I think you achieve it by deliberately focusing on some things today that you know you did not focus on yesterday. We can do that systematically by intentionally adopting different oracles at different times. Thinking of it this way, oracle diversity is at least as much a matter of disciplined test design as it is a basis for exploration. (I learned this, and its importance for the design of suites of automated tests, from Doug Hoffman.)
- No aspect of the word “heuristic” takes us into these details. The word is a distraction from the question: How can we compensate for inevitable incompleteness, (a) when we work with automated test execution and evaluation and (b) when we work with human observers?
There are two uses of oracles. We emphasized the one that is wrong for Foundations:
1. The test-design oracle: An oracle is an essential part of the design of a test, or of a suite of tests. This is especially important in the design of automated tests. The oracle defines what problems the tests can see (and what problems they are blind to). Anyone who is learning how tests are designed should be learning about building suites of imperfect tests using partial oracles.
2. The tester-expectations oracle: An oracle can be a useful component of a bug report. When you find a bug, you usually have to tell someone about it, and the telling often includes an explanation of why you think this behavior is wrong, and how important the wrongness is. For example, suppose you say, “this is wrong because it’s different from how it used to be.” You are relying on an expectation here and treating it as an oracle. The expectation is that the program’s behavior should stay the same unless the program is intentionally redesigned.
James Bach originally developed his list of tester expectations as a catalog of patterns of explanations that testers gave when they were asked to explain their bug reports. He recharacterized them as oracle heuristics (or heuristic oracles) years later. We think these are useful patterns, and when we create Bug Advocacy 4.0, we will probably include the ideas there (but maybe without the fancy oracular heuristical vocabulary).
In Foundations, Bach and Bolton’s excellent presentation of tester-expectation oracles drowned out most students’ awareness of test-design oracles. That is, what we see on Foundations 3.0 exams is that many students forget, or completely ignore, the course’s treatment of partial oracles, remembering only the tester-expectation oracles.

Tentative decisions:

All of the “heuristic” discussion of oracles is going away from Foundations.
We will probably introduce the notion of heuristics somewhere in the BBST series. There is value in teaching people that testers’ reasoning and decisions are rooted in heuristics, and that this is normal in all applied sciences. However, we will teach it as its own topic. It creates a distraction when pasted onto oracles.
We will tie the idea of partial oracles much more tightly to the design of suites of automated tests. We will focus on the problem that all oracles are incomplete, then transition into the questions:
- What are you looking for with this (test or) series of tests?
- What are these tests blind to?
- How will you look for those other potential problems?
- Can you detect these potential problems with code?
- If not, can you train people to look for them?
- Which problems are low-priority enough, or low-likelihood enough, that you should choose to run general exploratory tests and rely on luck and hope to help you stumble across them if they are there?
Years ago, many test tool vendors (and many consultants) promised complete test automation from tools that were laughably inadequate for that purpose. These people promised “complete testing” that would replace all (or almost all) of your expensive manual testers. Several of us (including me) argued that these tools were weaker than promised. We pointed out that the tools provided a very incomplete look at the behavior of the software and even that often came with an excessively-high maintenance cost. These discussions were often adversarial and unpleasant. Sometimes very unpleasant. Arguments like these leave mental scars. I think these old scars are at the root of some of the test-automation skepticism that we hear from some of our leading old-timers (and their students). I think BBST can present a much more constructive view of current automation-support technology by starting from the perspective of partial oracles.
Knott’s book on Mobile Application Testing provides an excellent overview of the variety of test-automation technologies, contrasting them in terms of the basis they use for evaluating the software (e.g. comparison to a captured screen) and the costs and benefits of that basis. These are all partial oracles. They come with their own costs, including tool cost, implementation difficulty, and maintenance cost and they come with their own limitations (what you can notice with this type of oracle and what you are blind to). I think this presentation of Knott’s, combined with updated material from Hoffman will probably become the core of the new oracle discussion.
I think this is where we’ll treat automation issues more generally, discussing Cohn’s test automation pyramid (see Fowler too) (the value of extensive unit testing and below-the-UI integration testing) and Knott’s inverted pyramid. If I spend the time on this that I expect to spend, then along with describing the pyramid and the inverted pyramid and summarizing the rationales, I expect to raise the following issues:
- For traditional applications (situations in which we expect the pyramid to apply), I think the pyramid model substantially underestimates the need for end-to-end testing. For example:
  - Performance testing is critical for many types of applications. To do this well, you need a lot of automated, end-to-end tests that model the behavior patterns of different categories of users.
  - Some bugs are almost impossible to discover with unit-level or service-level automated tests or with manual end-to-end tests or with many types of automated end-to-end tests. In my experience, wild pointers, stack overflows, race conditions and other timing problems seem to be most effectively found by high-volume automated system-level tests.
  - Security testing seems to require a combination of skilled manual attacks and routinized tests for standard vulnerabilities and high-volume automated probes.
  - As we get better at these three types of automated system-level tests, I suspect that we will develop better technology for automated functional tests.
- For mobile apps, Knott makes a persuasive case that the pyramid has to be inverted (more system-level testing). This is because of an explosion of configuration diversity (significant variations of hardware and system software across phones) and because some tasks are location-dependent, time-dependent, or connection-dependent.
  - I think there is some value in remembering that problems like this happened in the Windows and Unix worlds too. In DOS/Windows/Apple, we had to test application compatibility with printers printer-by-printer in each application. Same for compatibility with video cards, keyboards, rodents, network interfaces, sound cards, fonts, disk formats, disk drive interfaces, etc. I’ve heard of similar problems in Unix, driven more by variation in system hardware and architecture than by peripherals. Gradually, in the Windows/Apple worlds, the operating systems expanded their reach and presented standardized interfaces to the applications. Claims of successful standardization were often highly exaggerated at first, but gradually, variances between printer models (etc.) have become much simpler, smaller testing issues. We should look at the weakly-managed device variation in mobile phones as a big testing problem today, that we will have to develop testing strategies to deal with, while recognizing that those strategies will become obsolete as the operating systems mature. I think the historical perspective is important because of the risk that the mobile testers of today will become the test managers of tomorrow. There is a risk that they will manage for the problems they matured with, even though those problems have largely been solved by better technology and better O/S design. I think I see that today, with managers of my generation, some of whom seem still worried about the problems of the 1980’s/1990’s. Today’s students need historical perspective so that they can anticipate the evolution of system risks and plan to evolve with them.
  - How does this apply to the inverted pyramid? I think that it is inverted today as a matter of necessity, but that 20 years from now, the same automation pyramid that applies to other applications will apply equally well to mobile. As the systems mature, the distortion of testing processes that we need to cope with immaturity will gradually fall away.
Rather than arguing about whether automation is desirable (it is) and whether it is often extremely useful (it is) and whether it is overhyped (it is) and whether simplistic strategies that won’t work are still being peddled to people not sophisticated enough to realize what trouble they’re getting into (they are), I want people to come out of Foundations with a few questions. Here’s my current list. I’m sure it will change.:
- What types of information will we learn when we use this tool in this way?
- What types of information are we missing and how much do we care about which types?
- What other tools (or manual methods) can we use to fill in the knowledge gaps that we’ve identified?
- What are the costs of using this technology?
- Every added type of information comes at a cost. Are we willing to invest in a systematic approach to discovering a type of information about the software and if so, what will the cost be of that systematic approach? If the systematic approach is not feasible (too hard or too expensive), what manual methods for discovering the information are available, how effective are they, how expensive are they and how thorough a look are we willing to pay for?

Please send us good pointers to articles / blog posts that Foundations students can read to learn more about these questions (and ways to answer them). The course itself will go over this lightly. To address this in more depth would require a whole course, rather than a one-hour lecture. Students who are interested in this material (as every wise student will be) will learn most of what they learn on the topic from the readings.

A supplementary video:

The emphasis on automation raises a different cluster of issues having to do with the student’s career path. There used to be a solid career for a person who knew basic black box testing techniques and had general business knowledge. This person tested at the black box level, doing and designing exploratory and/or scripted tests. I think this will gradually die as a career. We have to note the resurgence of pieceworking, a type of employment where the worker is paid by the completed task (the “piece”) rather than by the time spent doing the task. If you can divide a task into many small pieces of comparable difficulty, you can spread them out across many workers. In general, I think pieceworking provides low incomes for people who are doing tasks that don’t require rare skills. Take a look at Soylent, “an add-in to Microsoft Word that uses crowd contributions to perform interactive document shortening, proofreading, and human-language macros.” Look over their site and read their paper. Think forward ten years. How much basic, system-level testing will be split out as piecework rather than done in-house or outsourced?

I’m not advocating for this change. I’m looking at a social context (that’s what context-driven people do) and saying that it looks like this is coming down the road, whether we like it or not.

There is already a big pay differential between traditional black box testers and testers who write code as part of their job. At Florida Tech, I see a big gap in starting salaries of our students. Students who go into traditional black box roles get much lower offers. As pieceworking gets more popular, I think the gap will grow wider. I think that students who are training today for a job as a traditional black-box tester are training for a type of job that will vanish or will stop paying a middle-class wage for North American workers. I think that introductory courses in software testing have a responsibility to caution students that they need to expand their skills if they want a satisfactory career in testing.

Some people will challenge my analysis. Certainly, I could be wrong. As Rob Lambert pointed out recently, people have been talking about the demise of generalist, black-box testers for years and years and they haven’t gone away yet. And certainly, for years to come, I believe there will be room for a few experts who thrive as consultants and a few senior testers who focus strictly on testing but are high-skill designers/planners. Those roles will exist, but how many?

Given my analysis (which could be wrong), I think it is the responsibility of teachers of introductory testing courses to caution students about the risk of working in traditional testing and the need to develop additional skills that can market well with an interest in testing.

Combining testing and programming skill is the obvious path and the one that probably opens the broadest set of doors.
Another path combines testing with deep business knowledge. If you know a lot about actuarial math and about the culture of actuaries, you might be able to provide very valuable services to companies that develop software for actuaries. However, your value for non-actuarial software might be limited.
For some types of products or services, you might add unusual value if you have expertise in human factors, or in accessibility, or in statistical process control, or in physics. Again, you offer unusual value for companies that need your combination of skills but perhaps only generic value for companies that don’t need your specific skills.
I don’t think that a quick, informed-amateur-level survey of popular topics in cognitive psychology will provide the kind of skills or the depth of skills that I have in mind.

It’s up to the student to figure out their career path, but it’s up to us to tell them that a career that has been solid for 50 years is changing character in ways they have to prepare for.

I’ve been trying to figure out how to focus a Lesson on this, or how to insert this into one of the other Lessons. I don’t think it will fit. There aren’t enough available hours in a 4-week online course. However, just as we gave application videos to students in the Domain Testing course (and some students watched only one video per Lesson while others watched several), I think I can create a watch-this-if-you-want-to video on career path that accompanies the lecture that addresses test automation.

3. Coverage is a multidimensional concept

My primary goal in teaching “coverage” was to open students’ eyes to the many different ways that they can measure coverage. Yes, there are structural coverage measures (statement coverage, branch coverage, subpath coverage, multicondition coverage, etc.) but there are also many black-box measures. For example, if you have a requirements specification, what percentage of the requirements items have you tested the program against?

Coverage is a measure, normally expressed as a percentage. There is a set of things that you could test: all the lines of code, or all the relevant flavors of smartphones’ operating systems, or all the visible features, or all the published claims about the program, etc. Pick your set. This is your population of possible test targets of a certain kind (target: thing you can test). The size of your set (the number of possible test targets) is your denominator. Count how many you’ve actually tested: that’s your numerator. Do the division and you get the proportion of possible tests of this kind that you have actually run. That’s your coverage measure. Statement coverage is the percentage of statements you’ve tested. Branch coverage is the percentage of branches you’ve tested. Visible-feature coverage is the percentage of visible features that you’ve tested thoroughly enough to say, “I’ve tested that one.”

Complete coverage is 100% coverage of all of the possible tests. The program is completely tested if there cannot be any undiscovered bugs (because you’ve run all the possible tests). For nontrivial programs, complete testing is impossible because the population of possible tests is infinite. So you can’t have complete coverage. You can only have partial coverage. And at that point, we teach you to subdivide the world into types of test targets and to think in terms of many test coverages. My main paper on testing coverage lists 101 different types of coverage. There are many more. Maybe you want to achieve 95% statement coverage and 2% mouse coverage (how many different rodents do you need to test?) and 100% visible feature coverage and so on. Evaluating the relative significance of the different types of coverage gives you a way to organize and prioritize your testing.

I had generally positive results teaching this in Foundations 1.0 and 2.0, but it was clear that some students didn’t understand programming concepts well enough to understand what the structural coverage measures actually were or how they might go about collecting the data. Around 2008, I realized that many of my university students (computer science majors) had this problem too. They had studied control structures, of course, but not in a way that they could easily apply to coverage measurement. In addition, they (and my non-programmer practitioner students) had repeatedly shown confusion about how computers represent data, how rounding error comes about in floating point calculations, why rounding error is inherent in floating point calculations, etc. These confusions had several impacts. For example some students couldn’t fathom overflows (including buffer overflows). Many students insisted that normal floating-point rounding errors were bugs. I had seen mistakes like these in real-life bug reports that damaged the credibility of the bug reporter. So, we decided that for Foundations 3.0, we would add basic information about programming and data representation to the course.

Critique:

The programming material was very helpful for some students. For university students, it built a bridge between their programming knowledge and their testing lessons. Confusions that I had seen in Foundations 2.0 simply went away. For non-programmer professional students, the results were more mixed. Some students gained a new dimension of knowledge. Other students learned the material well enough to be tested on it but considered it irrelevant and probably never used it. And some students learned very little from this material, whined relentlessly and were irate that they had to learn about code in a course on black box testing.
The programming material overshadowed other material in Lessons 4 and 5. Some students paid so much attention to this material that they learned it adequately, but missed just about everything else. Some students dropped the course out of fear of the material or because they couldn’t find enough time to work through this material at their pace.
Many students never understood that coverage is a measurement. Instead, they would insist that “statement coverage” means all statements have been tested, and generally, that X coverage means that all X’s have been tested. I think we could have done a much better job with this if we had more time.
The lecture also addressed the risk of driving testing to achieve high coverage of a specific kind. I taught Marick’s How to Misuse Code Coverage, which describes the measurement dysfunction that he saw in organizations that pushed their testing to achieve very high statement/branch coverage. The students handled this material well on exams. The programming instruction probably helped with this.

Tentative decisions:

Return to a lecture that emphasizes the measurement aspects of coverage, the multidimensional nature of coverage, and the risks of driving testing to achieve high coverage (rather than high information).
Reduce the programming instruction to an absolute minimum. We need to teach a little bit about control structures in order to explain structural coverage, but we’ll do it briefly. This will create the same problems for students as we had in Foundations 1.0 and 2.0.
Create a series of supplementary videos that describe data representation and control structures. I’ll probably use the same lecture material in my Introduction to Programming in Java (Computer Science 1001). We’ll encourage students to watch the videos, probably offer a quiz on the material to help them learn it, maybe offer a bonus question on the exam, but won’t require the students to look at it.

4. Complete testing is impossible

This lecture teaches students that it is impossible to completely test a program. In the course of this instruction, we teach some basic combinatorics (showing students how to calculation the number of possible tests of certain kinds) and expose them again, in a new way, to the wide variety of types of test targets (things you can test) and the difficulty of testing all of them. We teach the basics of data flows, the basics of path testing, and provide a real-life example of the kinds of bugs you can miss if you don’t do extensive path testing.

Critique:

Overall, I think this material was OK in Foundations 3.0.
The lecture introduces a strong example of a real-life application of high-volume automated testing but provides only the briefest introduction to the concept of high-volume automated testing. The study guide (list of potential exam questions) included questions on high-volume automated testing but we were reluctant to ask any of them because we treated the material too lightly.

Tentative decisions:

Cover essentially the same material.
Integrate more tightly with the preceding material on the many types of coverage.
Add a little more on high-volume automated testing.

5. Measurement is important, but difficult

The lecture introduces students to the basics of measurement theory, to the critical importance of construct validity, to the prevalent use of surrogate measures and the difficulties associated with this, and to measurement dysfunction and distortion. We illustrate measurement dysfunction with bug-count metrics.

Critique:

Overall, the section was reasonably effective in conveying what it was designed to convey, but that design was flawed.
The lecture exuded mistrust of bad metrics. It provided no positive inspiration for doing good measurement and very little constructive guidance. Back in 2001, Pettichord and Bach and I wrote, “Metrics that are not valid are dangerous.” We inscribed the words in Lessons Learned in Software Testing and in the statement of the Context-Driven Testing Principles.
- Back in the 1990’s (and still in 2001), we didn’t have much respect for the metrics work that had been done and we thought that bright people could easily make a lot of progress toward developing valid, useful metrics. We were wrong. The next dozen years taught me some humility. The context-driven testing community (and the broader software engineering community) made remarkably little progress on improving software-related measurement. Some of us (including me) have made some unsuccessful efforts. I don’t see any hint of a breakthrough on the horizon. Rather, I see more consultants who simply discourage clients and students from taking measurements or who present oversimplified descriptions of qualitative measurement that don’t convey the difficulties and the cost of doing qualitative measurement well. This stuff plays well at conferences but it’s not for me.
  - I am against bad measurement (and a lot of terrible measurement is being advocated in our field, and students need some help recognizing it). But I am not against measurement. Back in the 1990s and early 2000s, I discouraged people from taking measurement, because I thought a better set of metrics was around the corner. It’s not. Eventually, my very wise friend, Hung Quoc Nguyen counseled me that it was time for me to rethink my position. He argued that the need for data to guide management is still with us — that if we don’t have better measures available, we need to find better ways to work with what we’ve got. I have come to agree with his analysis.
  - Many people in our field are required to provide metrics to their management. We can help them improve what they provide. We can help them recognize problems and suggest better alternatives. But we aren’t helping them if we say that providing metrics is unethical and they need to say refuse to do it, even if that costs them their jobs. I wrote about this in Contexts differ: Recognizing the difference between wrong and Wrong and then in Metrics, Ethics, & Context-Driven Testing (Part 2).
  - Becky Fiedler and I have used qualitative approaches several times. We’re enthusiastic about them. See our paper on Putting the Context in Context-Driven Testing, for example. But we have no interest in misrepresenting the time and skill required for this kind of work or the limitations of it. This is a useful set of tools, but it is not a collection of silver bullets.

Tentative decisions:

We have to fundamentally rework this material. I will probably start from two places:
- In 2012, Nawwar Kabbani and I presented a long talk at CAST 2012 on Software Metrics: Threats to Validity.
- Becky and I also developed some nice tutorial material for my software metrics class at Florida Tech: On the Quality of Qualitative Measures.
I think these provide a better starting point for a presentation on metrics that:
- Introduces basic measurement theory
- Alerts people to the risks associated with surrogate measures, weak validity, and abusive use of metrics
- Explains measurement distortion and dysfunction
- Introduces basic qualitative measurement concepts
- Lays out the difficulties of doing good measurement, peeks at the state of metrics in some other fields (not better than ours), and presents a more positive view on ways to gain useful information from imperfect statistics.
The big challenge here, the enormous challenge, will be fitting this much content into a reasonably short lecture.

Posted in BBST, education, software metrics, testing | 1 Comment »

Updating to BBST 4.0: Financial Model & Concluding Thoughts

Friday, July 15th, 2016

This is the 5th section of my post on BBST 4.0. The other parts are at:

Financial Model

BBST 3.0 was developed with support from the National Science Foundation. That funding supported many student assistants and the logistics of the Workshops on Teaching Software Testing. It also paid for a portion (substantial, but well under half) of the time Becky and I spent on BBST development and the equipment and software we used to create it. We also received financial support from the Association for Software Testing and from Florida Institute of Technology. We all understood and agreed that with this level of public support, BBST should be open source.

We also had a vision of BBST becoming self-supporting, like Linux. Some of our thoughts are published here and here and here. That was an ambitious idea. I pursue many ambitious ideas. Some become reality, others don’t. This one didn’t.

BBST 3.0 will continue to be available under a Creative Commons License. We expect some organizations will continue teaching it. We think it’s still a good course, and we’re proud of it. To increase the lifespan for BBST 3.0, we spent an enormous amount of time over the past three years creating the BBST Workbooks. These update the assessment materials and some of the content significantly. The time we spent on BBST 3.0 Workbooks was opportunity cost for BBST 4.0. With limited time, we couldn’t start BBST 4.0 until now (now that the Test Design Workbook has gone to press).

For the future, BBST has to become a traditional commercial course. We just don’t see a way to create it or teach it without charging for it. Becky and I created Kaner, Fiedler & Associates, LLC to take over the maintenance and evolution of BBST. This is a business. It needs income, which we can then spend on instructors and on BBST.

We would very much appreciate your guidance for BBST 4.0, but please understand that the help you give us is going to a commercial project, not to an open source one.

Concluding Thoughts

The most difficult two challenges in instructional design are:

Assessment: Figuring out what the students understand, using activities (including exams) that help students learn more, and learn more deeply, during the assessment process
Focus: Limiting the scope of the course so that you can (a) fit the material in without cramming it in and (b) teach the material deeply enough that the students can get practical value from it.

Moving the programming content into supplementary instruction makes room for more content. The new content that we think is most important is a sympathetic survey of test automation, but with a frank presentation of the limitations of the common oracles. (We’re going to rely heavily on a great survey by Knott.) We can’t fit everything in that we want to include. Intimately connected to automation, in our view, is discussion of career path. We can’t fit both into the course, there’s just not enough time. Our tentative decision is to make the career path segment supplementary — we’re pretty confident that people who want this will get to it and will get what value they can from it. The automation material will be harder, which means we should support it with assessment.

We’d love to hear your thoughts on this. BBST Foundations / Bug Advocacy / Test Design took almost 6000 hours of our development time (not counting the time spent on the workbooks). Foundations 4.0 will take a lot of effort. We’d like to produce something that creates value for the profession. Your guidance will make that more likely.

Posted in BBST, education, testing | 2 Comments »

Cem Kaner, J.D., Ph.D.

Retired Professor of Software Engineering

Archive for July, 2016

Updating BBST to Version 4.0

Background: What is BBST

Updating to BBST 4.0: Differences Between the Core BBST Courses and Domain Testing

Differences Between the Core BBST Courses and Domain Testing

Structure of the Core BBST Courses

Structure of Domain Testing

Updating to BBST 4.0: Learning Objectives and Structure of Foundations 3.0 (2010)

Learning Objectives and Structure of Foundations 3.0 (2010)

Updating to BBST 4.0: What Should Change

What We Think Should Change

1. Information objectives drive the testing mission and strategy

Critique:

Tentative decisions:

2. Oracles are heuristic

Critique:

Tentative decisions:

A supplementary video:

3. Coverage is a multidimensional concept

Critique:

Tentative decisions:

4. Complete testing is impossible

Critique:

Tentative decisions:

5. Measurement is important, but difficult

Critique:

Tentative decisions:

Updating to BBST 4.0: Financial Model & Concluding Thoughts

Financial Model

Concluding Thoughts

Archives

Categories

Meta