Instructor’s Manual for the BBST Courses

September 13th, 2012

 

The BBST course series is open source — anyone can download the materials. Anyone can teach the courses.

Dr. Rebecca L. Fiedler designed the BBST Instructors Course to teach people who had taken BBST how to teach it. Over several years, we’ve been working on an Instructors’ Manual to support the course, publishing several drafts for review.

Dr. Fiedler, Doug Hoffman and I have finally finished the manual, a 357-page book. You can find it here, at the web page for the BBST Instructors Course. (Like BBST, the BBST instructors course materials are available to the world for free. Enjoy!)

As with all of the BBST work, we thank the National Science Foundation for its support: grants EIA-0113539 ITR/SY+PE “Improving the Education of Software Testers” and CCLI-0717613 “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.” (The views expressed in BBST and this blog reflect the opinions of the authors and not NSF.)

The Oracle Problem and the Teaching of Software Testing

September 11th, 2012

When I first studied testing, I learned that a test involved comparison of the test result to an expected result. The expected result was the oracle: the thing that would tell you whether the program passed or failed the test. We generalized this a bit, especially for automated testing–we would look to a reference program as our oracle. In that case, the oracle was the program that generates the expected results, rather than the expected results themselves. But the idea was the same: testing involved comparison with a known result.

The Oracle Problem

Oracle is an interesting choice of terminology, because the oracles of Greece (the original “oracles”) were mythological. And Greek tragedies are full of stories of people who misinterpreted what an oracle told them, and behaved (on the basis of their understanding) in ways that brought disaster on them.

If we define a software testing oracle as a tool that tells you whether the program passed your test, we are describing a myth–something that doesn’t exist. Relying on the oracle, you might make either of the classic mistakes of decision theory:

  • The miss: you believe the program has passed even though it did something wrong.
  • The false alarm: you believe the program has failed even though it has behaved appropriately.

So we soften the definition: a software testing oracle is a tool that helps you decide whether the program passed your test.

Seen this way, oracles are heuristic devices: they are useful tools that help us make decisions, but sometimes they point us to the wrong decision.

If you don’t have authoritative oracles (“authoritative” = an oracle that is always correct), then how can you test? How can you specify a test in a way that a junior tester or a computer can run the test and correctly tell you whether the program passed it?

The Instructional Problem

I’ve been emphasizing the oracle problem in my testing courses for about a dozen years. I see this as one of the defining problems of software testing: one of the reasons that skilled testing is a complex cognitive activity rather than a routine activity. Most of the time, I start my courses with a survey of the fundamental challenges of software testing, including an extended discussion of the oracle problem.

If you’ve seen the BBST-Foundations courses (for example, the Association for Software Testing teaches a version of this course), you’ve seen my introduction to the oracle problem and to heuristic oracles. Students typically work through one or more theoretical and/or practical labs in BBST. Once they understand the oracle problem, the course presents two approaches for using oracles:

  • One approach, that I associate with James Bach and Michael Bolton, lists 8 general types of expectations. For example, we expect a product to operate consistently across all of its features. We expect it to operate consistently from version to version, etc. (I’ll list the set of consistencies later.)
  • A different approach, that I associate with Doug Hoffman (but I think it’s been independently developed and followed by lots of people), lists specific heuristics. None of them is complete. Each focuses on a specific prediction about the results of a test and ignores other aspects of the test. For example, if we are testing a program that does calculations, checking whether it says 2+3=5 (5 is the expectation) is not complete, but it is useful. Testing whether we can invert an operation (take the square root of a square of a number) isn’t a complete test of a square-a-number function, but it is useful. (More examples below…)
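The two partial oracles in the second bullet can be sketched in a few lines of Python. This is a minimal illustration, not code from BBST: `square` and `buggy_square` are hypothetical stand-ins for a function under test, and the relative tolerance is an assumption chosen to absorb floating-point rounding. Each oracle checks one narrow prediction and ignores everything else.

```python
import math
from typing import Callable

Fn = Callable[[float], float]

def square(x: float) -> float:
    """Hypothetical function under test."""
    return x * x

def buggy_square(x: float) -> float:
    """Hypothetical faulty version: wrong only for large inputs."""
    return x * x if x < 1000 else x * x * 1.01

def known_value_oracle(f: Fn) -> bool:
    # One specific expectation (like 2+3=5): useful, but it covers a single input.
    return f(3.0) == 9.0

def inverse_oracle(f: Fn, x: float, rel_tol: float = 1e-9) -> bool:
    # Undo the operation: the square root of square(x) should give back |x|.
    y = f(x)
    if y < 0:
        return False  # a square can never be negative
    return math.isclose(math.sqrt(y), abs(x), rel_tol=rel_tol)

# The faulty version passes the known-value check but fails the inverse check.
print(known_value_oracle(buggy_square))      # True
print(inverse_oracle(buggy_square, 10_000))  # False
```

Neither check is complete on its own; for example, the inverse oracle says nothing about how long the call took.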

Bach and Bolton have done a good job of explaining their approach. Mike Kelly provides a great summary of it. In my lectures, I follow an explanatory approach that I learned from an early version of Bach’s RST course, and it works well. Students understand the consistencies and (generally) find them compelling. The list feels complete–any specific oracle you can think of can be classified as an example of one of their consistencies.

I think there are three problems with Bach and Bolton’s consistencies:

  1. These provide a useful way to think about a bug after you find it and try to report it. The list of consistencies can structure your thinking as you try to figure out how to explain to someone else why a particular program behavior feels wrong (what feels wrong about it?). However, even though I do find them helpful for evaluating test results, I don’t find them helpful for designing tests.
  2. I think they are particularly worthless for designing automated tests. Automated testing depends on oracles–the automated-testing-program that runs a zillion tests has to decide whether the software under test passed each test or not. The consistencies don’t guide testers (not me, not the testers I know, not the students I teach) toward oracle ideas that are specific enough to be programmed and used by an automaton.
  3. In my courses, the consistencies capture my students’ imagination and interfere with their thinking about oracles that would be useful for designing tests, especially automated tests.

It is the third problem that I have been wrestling with for several years and that will cause me to rewrite the BBST-Foundations course.

In exam after exam, when I give students a specific scenario that clearly involves automated testing and ask them to suggest oracles they would use to support their automated testing and how they would use them, they ramble through a memorized list of consistency heuristics and don’t come up with ideas–some students don’t come up with any ideas–for oracles that support the automation.

  • I tried to correct this by raising the issue in supplementary lectures. It didn’t work.
  • I went further, telling students that this was a classic problem in this course and that they needed to answer questions about oracles in test design with specific oracles. It didn’t work.
  • I went even further, telling students as part of the exam question itself that this question called for specific ideas about the oracles they would design into specific tests and that they shouldn’t rely on general descriptions of consistencies. It didn’t work.
  • Even when I gave students a set of exam questions in advance, and they drafted their answers with full benefit of time, course notes, lecture notes and videos, discussions with each other, and anything they could find on the web–even when these questions carried cautionary notes that this was a question about test automation and that they shouldn’t just present a consistency oracle–it still didn’t work.

They just keep giving back worthless ideas about test design and flunk that part of the exam.

Feh!

An Important Instructional Heuristic

When a few students give bad answers on an exam, the problem is in the students. They don’t understand the material well enough.

When a lot of students give bad answers on the exam, the problem is in the instruction. It’s the responsibility of the teacher to troubleshoot and fix this.

When a lot of students give bad answers that are weak in a consistent way, something specific in the instruction leads them down that path. In my experience, that something is often something the instructor is particularly attached to.

I like the consistencies a lot. But I think that in the Foundations course, they are an attractive nuisance (an almost-irresistible invitation to take a hazardous or counterproductive path).

Therefore

  • In the next generation of the Foundations course, I will probably drop the oracle consistencies approach altogether.
  • In the next generation of the Bug Advocacy course, I will probably add the oracle consistencies as a useful tool for persuasive bug report writing.

Appendix: More Details on Oracles

  1. The underlying problem: Oracles are necessarily incomplete
  2. Oracles are heuristics
  3. Bach and Bolton’s consistencies
  4. Hoffman’s approach
  5. Applying this to test automation

Oracles are Necessarily Incomplete

Back when dinosaurs roamed the earth, some testasauruses theorized that a properly designed software test involves:

  • a set of preconditions that specify the state of the software and system when you start the test
  • a set of procedures that specify what you do when you do the test
  • comparison of what the software under test does with a set of postconditions: the predicted state of the system under test after you run the test. This set of postconditions makes up the expected results of the test.
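The testasaurus model can be written down as data. The sketch below is a hypothetical illustration (the names `ScriptedTest`, `run_scripted_test`, and `add_with_side_effect` are invented here): a test carries preconditions, a procedure, and expected postconditions, and the verdict compares only the postconditions someone thought to list. The stray side effect in the example goes unnoticed, which is the kind of miss Hoffman describes below.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

State = Dict[str, Any]

@dataclass
class ScriptedTest:
    preconditions: State                # state of the system when the test starts
    procedure: Callable[[State], None]  # what you do during the test
    postconditions: State               # expected results: the predicted final state

def run_scripted_test(spec: ScriptedTest) -> bool:
    state = dict(spec.preconditions)
    spec.procedure(state)
    # Only the listed postconditions are compared; any other change is invisible.
    return all(state.get(key) == value for key, value in spec.postconditions.items())

def add_with_side_effect(state: State) -> None:
    state["result"] = state["a"] + state["b"]
    state["stray_file"] = "leftover.tmp"  # an unexpected side effect nobody specified

spec = ScriptedTest({"a": 2, "b": 2}, add_with_side_effect, {"result": 4})
print(run_scripted_test(spec))  # True, even though the procedure misbehaved
```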

We can call the postconditions the oracle or we can say that a program that generates the expected results is the oracle, but in either case, the testasauruses said, good testing involves comparing the program’s test behavior to expected results, and to do good testing, you need an oracle. (Fossils from this era have been preserved in IEEE Standard 829 on software test documentation.)

Elaine Weyuker’s (1980) On Testing Nontestable Programs shattered that view. Weyuker argued that “it is unusual for … an oracle to be pragmatically attainable or even to exist” (p. 3). Instead, she said, testers rely on partial oracles. For example:

  • A tester might recognize a result of a calculation as impossibly large even though she doesn’t know what the exact result should be. (You might not know offhand what 1.465732 x 2.74312 is, but if a program said 7,000,000 you could reject that as obviously wrong without doing any calculations.)
  • A tester might recognize behavior as inappropriate, even if she doesn’t know exactly how the program should behave.
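Weyuker’s first example is easy to automate as a partial oracle. In this sketch (an illustration with the hypothetical name `plausible_product`, assuming both factors are positive), rounding each factor down and up gives bounds that the true product must fall between. The oracle rejects 7,000,000 without knowing the exact answer, but it would also accept a wrong answer such as 5.0.

```python
import math

def plausible_product(a: float, b: float, reported: float) -> bool:
    """Partial oracle for a * b, assuming a > 0 and b > 0.

    It does not compute the exact product; it only checks that the reported
    value lies between the products of the rounded-down and rounded-up factors.
    """
    low = math.floor(a) * math.floor(b)
    high = math.ceil(a) * math.ceil(b)
    return low <= reported <= high

print(plausible_product(1.465732, 2.74312, 7_000_000))  # False: obviously wrong
print(plausible_product(1.465732, 2.74312, 5.0))        # True: wrong, but not caught
```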

Weyuker’s paper wasn’t widely noticed in the practitioner community. I don’t think we appreciated the extent of this problem until the Quality Week conference in 1998, when Doug Hoffman (A Taxonomy for Test Oracles) explained this problem and its implications this way:

Suppose that we specify a test by describing

    • the starting state of the system under test
    • the test inputs (the data and operations you use to carry out the test)
    • the expected test outputs

We can still make mistakes in interpreting the test results.

    • We might incorrectly decide that the program passed the test because its outputs matched the expected outputs but it misbehaved in some other way. For example, a program that adds 2+2 might get 4, but it is clearly broken in some way if it takes 10 hours to get that result of 4.
    • We might incorrectly decide that the program failed the test because its outputs did not match the expected results, when on more careful examination we would realize that it did the right thing. For example, imagine testing printing to a network printer with the expectation that the printer will print a specific page within 1 minute–but during the test, another computer sent a long document to the printer, so it didn’t get to the test document for a long time. This might be exactly the correct behavior under the circumstances, but it doesn’t match the expectation.

Most testers, doing manual testing, would probably not make either mistake. But an automated test would make both mistakes. So would a manual tester who was trying to exactly follow a fully-detailed script.

Doug argued that both types of mistakes were inevitable in testing because no one could fully specify the starting state of the system and no one could fully specify the ending state of the system. There are too many potentially-relevant variables. For example, suppose in your 2+2 test, you do specify the expected time for the test to complete:

  • Did you specify the contents of the stack? What if the program adds stuff to the stack but doesn’t remove it, or corrupts the stack in some other way?
  • Did you specify the contents of memory? Memory leaks are common bugs. And buffer overflows are a common example of a class of bug that corrupts memory.
  • Did you specify the contents of the hard disk? What if the program saves something or deletes something?
  • Did you specify what the printer would do during the test? What if the program sends something to the printer, even though it is not supposed to, or sends unauthorized email, etc.?

If you don’t have experience thinking about the diversity of ways that something can go wrong, but you have a bit of technical savvy, the Hewlett-Packard printer diagnostics can be eye-opening. You can find documentation of these in Management Information Bases (MIBs) published by HP. I find these at https://spp.austin.hp.com/SPP/Public/Sdk/SdkPublicDownload.aspx but if this source goes away, you can find third party sites like OiDView. For example, the MIB file for the LaserJet 9250c runs 8506 lines, documenting 176 commands, many of them with many possible parameters. A program can go wrong in hundreds (or thousands) of different ways.

From a diagnostic point of view, imagine running a test and checking the state of the printer. For example, you might check how much free memory there is, or how long the last command took to execute, or the most recent internal error code. Each diagnostic command that you run changes the state of the machine, and so the results of the next diagnostic are no longer looking at the system as it was right after the test completed.

So in practical terms, even if you could fully specify the state of the system after a test (you can’t, but pretend that you could), you still couldn’t check whether the system actually reached that state after the test because each of the diagnostics that you would run to check the state of the system would change the state. The next diagnostic tests the machine that is now in a different state. In practical terms, you can only run a few diagnostics as part of a test (maybe just one) before the diagnostics stop being informative. If these diagnostics don’t look for a problem in the right places, you won’t see it. This is sometimes called the Heisenbug problem, in honor of the Heisenberg Uncertainty Principle.
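The probe effect can be illustrated with a toy model. `ToyPrinter` below is entirely hypothetical (it is not based on any HP MIB); it assumes a clear-on-read error register and a memory query that itself consumes a little memory. Asking the same question twice gives two different answers, because the first diagnostic changed the state the second one observes.

```python
class ToyPrinter:
    """Hypothetical device whose diagnostics change its own state."""

    def __init__(self) -> None:
        self.last_error = 0
        self.free_memory = 1024

    def print_job(self, pages: int) -> None:
        self.free_memory -= pages
        if pages > 100:
            self.last_error = 42  # arbitrary error code for an oversized job

    def read_last_error(self) -> int:
        # Clear-on-read: reporting the error also resets it.
        code, self.last_error = self.last_error, 0
        return code

    def read_free_memory(self) -> int:
        self.free_memory -= 1  # assumption: the query itself uses a buffer
        return self.free_memory

printer = ToyPrinter()
printer.print_job(150)
print(printer.read_last_error(), printer.read_last_error())    # 42 0
print(printer.read_free_memory(), printer.read_free_memory())  # 873 872
```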

Oracles are Heuristics

Hoffman argued that no oracle can fully specify the postcondition state of the system under test and therefore no oracle is complete. Given that an oracle is incomplete, you might use the oracle and incorrectly conclude that the program failed the test when it didn’t, or that it passed the test when it actually failed. Either way, reliance on an oracle can lead you to the wrong conclusion.

A decision rule that is useful but not always correct is called a heuristic.

My favorite presentations of the ideas underlying heuristics were written by Billy V. Koen. See his book (I prefer the shorter and simpler ASEE early edition used in introductory engineering courses, but the current version is good too) and a wonderful historical article that he wrote for BBST.

The Bach / Bolton Consistency Heuristics

Imagine running a test. The program misbehaves. The tester notices the behavior and recognizes that something is wrong. What is it that makes the tester decide this is wrong behavior?

In Bach’s view (as I understand it from talking with him and teaching about this with him), what happens is that the tester makes a comparison between the behavior and some expectations about the ways the program should (or should not) behave. These comparisons might be conscious or unconscious, but Bach posits that they must happen because every explanation of why a program’s behavior has been evaluated as a misbehavior can be mapped to one of these types of consistency.

Here’s the list:

  • Consistent within product: Function behavior consistent with behavior of comparable functions or functional patterns within the product.
  • Consistent with comparable products: Function behavior consistent with that of similar functions in comparable products.
  • Consistent with history: Present behavior consistent with past behavior.
  • Consistent with our image: Behavior consistent with an image the organization wants to project.
  • Consistent with claims: Behavior consistent with documentation, specifications, or ads.
  • Consistent with standards or regulations: Behavior consistent with externally-imposed requirements.
  • Consistent with user’s expectations: Behavior consistent with what we think users want.
  • Consistent with purpose: Behavior consistent with product or function’s apparent purpose.

(If you reorder the list, you can use a mnemonic abbreviation to memorize it: HICCUPPS.)

For example, imagine that there is a program specification and that the program behaves differently from what you would predict from the specification. The behavior might be reasonable, but if it contradicts the specification, you should probably write a bug report. Your explanation of the problem in the report wouldn’t be “this is bad”. It would be “this is bad because it is inconsistent with the specification.”

The list is designed to cover every type of consistency-expectation that testers rely on. If Bach and Bolton realize the list is incomplete, they add a new type.

For the sake of argument, I will assume that this list is complete, i.e. that every rationale that a tester provides for why a program is misbehaving can be mapped to one of these 8 types of consistency.

I have seen it argued (mainly on Twitter) that this is the “right” list. That every other oracle can be mapped to this list (this oracle tests for this type of inconsistency) and therefore they are all special cases. If you know this list, the argument goes, you can derive (or imagine) (or something) all the oracles from it.

As far as I know, there is no empirical research to support the claim that testers in fact always rely on comparisons to expectations or that these particular categories of expectations map to the comparisons that go on in testers’ heads.

  • That assertion does not match my subjective impression of what happens in my head when I test. It seems to me that misbehaviors often strike me as obvious without any reference to an alternative expectation. One could counter this by saying that the comparison is implicit (unconscious) and maybe it is. But there is no empirical evidence of this, and until there is, I get to group the assertion with Santa Claus and the Tooth Fairy. Interesting, useful, but not necessarily true.
  • The assertion also does not match my biases about the nature of concept formation and categorical reasoning. As a graduate student, I studied cognition with Professor Lee R. Brooks. Some of his most famous work was on nonanalytic concept formation (see his 1978 chapter in Rosch & Lloyd’s classic Cognition & Categorization, or his 1984 paper with Larry Jacoby on nonanalytic cognition). A traditional view of cognition holds that we make many types of judgment on the basis of rules that put things into categories–something is this or that because of a set of rules that we consult either consciously or unconsciously. Bach and Bolton’s consistencies are examples of the kinds of categories that I think of when I think of this tradition. A very different view holds that we make judgments on the basis of similarity to exemplars. (An exemplar is a memorable example.) A person can learn arbitrarily many exemplars. Experts have probably learned many more than nonexperts and so they make better evaluations. One of the most interesting experiments in Lee’s lab required the experimental subject to make a judgment (saying which category something belonged to) and explain the judgment. The subjects in the experiments described what they said were their decision rules to explain each choice. But over a long series of decisions, you can ask whether these rules actually describe the judgments being made. The answer was negative. The subject would describe a rule that he had recently not followed and that he would again not follow later. Instead, the more accurate predictor of his decisions was the similarity of the thing he was categorizing to other things he had previously categorized. It appeared that unconscious processing was going on, but it was nonanalytic (similarity-based), not analytic (rule-based). I found, and still find, this line of results persuasive.

A list can be useful as a heuristic device, as a tool that helps you consciously think about a problem, whether the list describes the actual underlying psychology of testing or not.

But if it is to be a good heuristic device, it has to be more useful than not. As a tool for teaching oracles as part of test design, my experience is that the consistency list fails the utility criterion.

I don’t have any scientific research to back up my conclusion, just a lot of personal experience. But when dealing with a heuristic device that is not backed up by any scientific research (just a lot of personal experience), I get to rely on what I’ve got.

Doug Hoffman’s Approach

I first saw Doug talk about oracles in 1998, at Quality Week. That was the start of a long series of publications on oracles and the use of oracles in test automation. Along with the papers, I have the benefit of having taught courses on test automation with Doug and having talked at length with him while he struggled to get his ideas on paper.

Doug made two key points in 1998:

  • All oracles are heuristic (we’ve already covered that ground)
  • There are a lot of incomplete oracles available. Given that we have to rely on incomplete oracles (because no oracles are complete), we should think about what combinations of oracles we can use to learn interesting things about the software.

Doug’s work was so striking that we opened the Fifth Los Altos Workshop on Software Testing with it. That meeting became an intense, two-day, moderated debate between Doug and James Bach. We learned so much about managing difficult debates in that meeting that we were able to create what I think of as the current structure of LAWST, adopted in LAWST 6.

Doug has published several lists of specific types of oracles. Unfortunately, each of the ones I’ve read has its own idiosyncrasies that can be confusing, so I won’t try to restate them. Instead, I’ll work from BBST-Foundations-2013 (in preparation), which refines a list that I prepared for the current BBST Foundations with Doug’s coaching.

  • We use the constraint oracle to check for impossible values or impossible relationships. For example, an American ZIP code must be 5 or 9 digits. If you see something that is non-numeric or some other number of digits, it cannot be a ZIP code. A program that produces such a thing as a ZIP code has a bug.
  • We use the regression oracle to check results of the current test against results of execution of the same test on a previous version of the product.
  • We use self-verifying data as an oracle. In this case, we embed the correct answer in the test data. For example, if a protocol specifies that when a program sends a message to another program, the other one will return a specific response (or one of a few possible responses), the test could include the acceptable responses. An automated test would generate the message, then check whether the response was in the list or was the specific one in the list that is expected for this message under this circumstance.
  • We use a physical model as an oracle when we test a software simulation of a physical process. For example, does the movement of a character or object in a game violate the laws of gravity?
  • We use a business model the same way we use a physical model. If we have a model of a system, we can make predictions about what will happen when event X takes place. If the software emulates the business process as we intend, it should give us behavior that is consistent with those predictions. Of course, as with all heuristics, if the program “fails” the test, it might be the model that is wrong.
  • We use a statistical model to tell us that a certain behavior or sequence of behaviors is very unlikely, or very unlikely in response to a specific action. The behavior is not impossible, but it is suspicious. We can test whether the actual behavior in the test is within the tolerance limits predicted by the model. This is often useful for looking for patterns in larger sets of data (longer sequences of tests). For example, suppose we expect an eCommerce website to get 80% of its customers from the local area, but in beta trials of its customer-analysis software, the software reports that 70% of the transactions that day were from far away. Maybe this was a special day, but probably this software has a bug. If we can predict a statistical pattern (correlations among variables, for example), we can check for it.
  • Another type of statistical oracle starts with an input stream that has known statistical characteristics and then checks the output stream to see if it has the same characteristics. For example, send a stream of random packets, compute statistics of the set, and then have the target system send back the statistics of the data it received. If this is a large data set, this can save a lot of transmission time. Testing transmission using checksums is an example of this approach. (Of course, if a message has a checksum built into the message, that is self-verifying data.)
  • We use a state model to specify what the program does in response to an input that happens when it is in a known state. A full state model specifies, for every state the program can be in, how the program will respond (what state it will transition to) for every input.
  • We can build an interaction model to help us test the interaction between this program and another one. The model specifies how that program will behave in response to events in (actions of) this program and how this program will behave in response to actions of the other program. The automaton triggers the action, then checks the expected behavior.
  • We use calculation oracles to check the calculations of a program. For example, if the program adds 5 numbers, we can use some other program to add the 5 numbers and see what we get. Or we can add the numbers and then successively subtract one at a time to see if we get a zero.
  • The inverse oracle is often a special case of a calculation oracle (the square of the square root of 2 should be 2) but not always. For example, imagine taking a list that is sorted low to high, sorting it high to low and then sorting it low to high. Do we get back the same list?
  • The reference program generates the same responses to a set of inputs as the software under test. Of course, the behavior of the reference program will differ from the software under test in some ways (they would be identical in all ways only if they were the same program). For example, the time it takes to add 1000 numbers might be different in the reference program versus the software under test, but if they ultimately yield the same sum, we can say that the software under test passed the test.
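Several of these oracles take only a few lines of code each. The sketch below is illustrative, not taken from BBST: `sut_sum` is a hypothetical stand-in for the software under test, Python’s built-in `sum` plays the reference program, and the ZIP check assumes ZIP+4 values are stored as nine digits with no hyphen. Each function checks one narrow prediction; for instance, the inverse sort check would still pass a sort that silently dropped duplicate values.

```python
import re
from typing import Callable, List, Sequence

ZIP_PATTERN = re.compile(r"[0-9]{5}([0-9]{4})?")

def constraint_oracle_zip(value: str) -> bool:
    # Constraint oracle: a ZIP code is exactly 5 or 9 digits; anything else is impossible.
    return ZIP_PATTERN.fullmatch(value) is not None

def calculation_oracle(total: int, numbers: Sequence[int]) -> bool:
    # Calculation oracle: subtract the inputs one at a time; we should reach zero.
    remainder = total
    for n in numbers:
        remainder -= n
    return remainder == 0

def inverse_sort_oracle(sort_fn: Callable[..., List[int]], data: Sequence[int]) -> bool:
    # Inverse oracle: sort low-to-high, then high-to-low, then low-to-high again.
    ascending = sort_fn(data, reverse=False)
    round_trip = sort_fn(sort_fn(ascending, reverse=True), reverse=False)
    return round_trip == ascending

def reference_oracle(sut: Callable[[Sequence[int]], int],
                     reference: Callable[[Sequence[int]], int],
                     inputs: Sequence[int]) -> bool:
    # Reference program: only the final answers are compared, not timing or memory.
    return sut(inputs) == reference(inputs)

def sut_sum(numbers: Sequence[int]) -> int:
    """Hypothetical software under test: adds numbers in a loop."""
    total = 0
    for n in numbers:
        total += n
    return total

numbers = list(range(1, 1001))
print(constraint_oracle_zip("12345"), constraint_oracle_zip("1234A"))  # True False
print(calculation_oracle(sut_sum(numbers), numbers))                   # True
print(inverse_sort_oracle(sorted, [3, 1, 2, 2]))                       # True
print(reference_oracle(sut_sum, sum, numbers))                         # True
```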

You can probably imagine lots of other possibilities for this list.
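The input-stream statistical oracle from the list can be sketched with the standard library. Everything here is hypothetical: `transmit` simulates the channel (optionally dropping packets to mimic a bug), and the receiver is assumed to report back only a count, a mean packet size, and a CRC-32 checksum rather than the data itself.

```python
import random
import statistics
import zlib
from typing import Dict, List

def summarize(packets: List[bytes]) -> Dict[str, float]:
    # Compact statistics that stand in for the whole data set.
    return {
        "count": len(packets),
        "mean_size": statistics.fmean(len(p) for p in packets),
        "crc32": zlib.crc32(b"".join(packets)),
    }

def transmit(packets: List[bytes], drop_every: int = 0) -> List[bytes]:
    """Hypothetical channel; drop_every=n silently loses every n-th packet."""
    if drop_every <= 0:
        return list(packets)
    return [p for i, p in enumerate(packets, start=1) if i % drop_every != 0]

rng = random.Random(7)
packets = [bytes(rng.getrandbits(8) for _ in range(rng.randint(1, 64)))
           for _ in range(500)]
sent = summarize(packets)
print(summarize(transmit(packets)) == sent)                 # True
print(summarize(transmit(packets, drop_every=50)) == sent)  # False: count differs
```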

Applying This to Automation

What is special about these oracles is that they are programmable. You can create automated tests that will check the behavior of the program against the result predicted by (or predicted against by) any of these oracles or by any (well, probably, almost any) combination of these oracles.
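The state-model oracle from the list is a good example of what “programmable” means here. In this sketch, the media-player model and `PlayerUnderTest` (with an optional seeded bug) are hypothetical; the test drives the player with random events and, after every event, compares the player’s state with the state the model predicts.

```python
import random
from typing import Dict, Optional, Tuple

# Hypothetical state model: (current state, event) -> predicted next state.
MODEL: Dict[Tuple[str, str], str] = {
    ("stopped", "play"): "playing", ("stopped", "pause"): "stopped", ("stopped", "stop"): "stopped",
    ("playing", "play"): "playing", ("playing", "pause"): "paused", ("playing", "stop"): "stopped",
    ("paused", "play"): "playing", ("paused", "pause"): "paused", ("paused", "stop"): "stopped",
}

class PlayerUnderTest:
    """Hypothetical software under test, written independently of MODEL."""

    def __init__(self, seeded_bug: bool = False) -> None:
        self.state = "stopped"
        self.seeded_bug = seeded_bug

    def send(self, event: str) -> None:
        if event == "play":
            self.state = "playing"
        elif event == "pause" and self.state == "playing":
            self.state = "paused"
        elif event == "stop":
            if not (self.seeded_bug and self.state == "paused"):  # bug: stop ignored while paused
                self.state = "stopped"

def state_model_check(player: PlayerUnderTest, steps: int = 1000,
                      seed: int = 0) -> Optional[Tuple[int, str, str, str]]:
    rng = random.Random(seed)
    expected = "stopped"
    for step in range(steps):
        event = rng.choice(["play", "pause", "stop"])
        expected = MODEL[(expected, event)]
        player.send(event)
        if player.state != expected:
            return step, event, expected, player.state  # first mismatch found
    return None

print(state_model_check(PlayerUnderTest()))                 # None
print(state_model_check(PlayerUnderTest(seeded_bug=True)))  # first mismatch
```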

Given a programmable oracle, you can do high-volume automated testing. Have the test-design-and-execution program randomly generate inputs to the software under test and check whether the software responds the way the oracle predicts. You might use some type of model to drive the random number generator (making some events more likely than others). You might create a long random sequence of tests (e.g. regression tests) by randomly selecting which test to run next from a pool of already-built tests (each of which has an expected result that you can check against). Given an oracle, you can detect whatever failures that oracle can expose. For example, you might test with several oracles:

  • one oracle predicts how long an operation should take (or a range of possible durations). If the program takes substantially more or less time, that’s a problem.
  • another oracle can predict the calculation result of the operation (or the functional result if you’re doing something else, like sorting, that isn’t exactly a calculation)
  • another oracle might predict the amount of free memory, or at least might tell you whether a large data set (or memory-intensive calculation) should fit in memory. If so, you can detect memory leaks this way.
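A minimal sketch of high-volume random testing with three oracles, assuming a sorting function as the feature under test. `sut_sort` and `leaky_sort` are hypothetical, and the time and memory budgets are arbitrary placeholders you would tune for a real system. The functional oracle checks that the output is ordered and is a permutation of the input; the timing oracle checks only an upper bound; the memory oracle uses Python’s `tracemalloc` to see whether traced memory kept growing over the run.

```python
import random
import time
import tracemalloc
from collections import Counter
from typing import Callable, List, Tuple

_retained: List[bytearray] = []  # used only by the deliberately leaky variant

def sut_sort(xs: List[int]) -> List[int]:
    """Hypothetical software under test."""
    return sorted(xs)

def leaky_sort(xs: List[int]) -> List[int]:
    """Hypothetical faulty version: keeps about 100 KB alive per call."""
    _retained.append(bytearray(100_000))
    return sorted(xs)

def high_volume_test(fn: Callable[[List[int]], List[int]], runs: int = 200, seed: int = 1,
                     time_budget_s: float = 0.5,
                     growth_budget_bytes: int = 1_000_000) -> List[Tuple[int, str]]:
    rng = random.Random(seed)
    failures: List[Tuple[int, str]] = []
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        for i in range(runs):
            xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
            start = time.perf_counter()
            result = fn(xs)
            elapsed = time.perf_counter() - start
            ordered = all(a <= b for a, b in zip(result, result[1:]))
            if not ordered or Counter(result) != Counter(xs):  # functional oracle
                failures.append((i, "wrong result"))
            if elapsed > time_budget_s:                         # timing oracle
                failures.append((i, "too slow"))
        current, _ = tracemalloc.get_traced_memory()
        if current - baseline > growth_budget_bytes:            # memory oracle
            failures.append((runs, "memory grew"))
    finally:
        tracemalloc.stop()
    return failures

print(high_volume_test(sut_sort))        # []
print(high_volume_test(leaky_sort)[-1])  # (200, 'memory grew')
```

Each oracle still misses whole classes of bugs; a sort that wrote to a log file, for example, would pass all three.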

No matter what combination you choose, you will miss some types of errors. You cannot test all the dimensions of the result of a test with any oracle or any combination of oracles. But if you test a feature by machine using some oracles, then when you test that feature with a human painstakingly designing and running each test individually, that person will know that she doesn’t have to waste time checking whether certain types of bugs are there or not, because if they were there, the automaton would already have exposed them.

An Exam Question

Here’s an exam question (that students in the current version of BBST Foundations have often handled poorly):

Suppose you have written a test tool that allows you to feed commands and data to Microsoft Excel and to Open Office Calc and to see the results. The test tool is complete, and it works correctly. You have been asked to test a new version of Calc and told to automate all of your testing. What oracles would you use to help you find bugs? What types of information would you expect to get with each oracle?

Note: Don’t just echo back a consistency heuristic. Be specific in your description of a relevant oracle and of the types of information or bugs that you expect.

Look back at the Hoffman list and think of what oracles you could use for this test, to facilitate extensive automated testing.

Now look further back to the Bach / Bolton list and think of what oracles their list suggests that would work well for designing extensive automated testing.

For me, the Hoffman list works better. (And if I thought about additional oracles that were specific enough to support automation, I would add them to the Hoffman list, growing it into something that gets longer and longer the more I use it.) What about for you?

This post is partially based on work supported by NSF research grant CCLI-0717613, “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.” Any opinions, findings and conclusions or recommendations expressed in this post are those of the author and do not necessarily reflect the views of the National Science Foundation.

 

CAST 2012 Metrics Talk Posted

August 7th, 2012


I’ve posted the video and slide deck for the metrics talk Nawwar and I did at CAST. I hope you enjoy them.

 

Theses and Dissertations About Software Testing

April 17th, 2012

This week, Florida Tech’s Center for Software Testing Education & Research (my lab) published a bibliography of dissertations and theses focused on software testing (http://www.zotero.org/groups/cster_dissertations). To use the bibliography, you need an open source (free) tool called Zotero (http://www.zotero.org/).

What are these documents?

Theses are research reports written by graduate students as a final requirement for graduation. Doctoral theses are also called dissertations.

The typical thesis includes:

  • a literature review that describes a significant set of related research published by others
  • an idea (for example, an idea about how to improve testing, or how to create and assess a testing tool, or how to study how testing is really done, or how to teach it more effectively, or how to prove that some other idea about testing is wrong)
  • a description of the thesis methodology and technology (for example, how a test tool was designed, implemented, and studied)
  • a description of the results of the study.

Theses are evaluated by professors who are experts in the discipline and at least one who is not. For example, the supervisory committee for a doctoral student in Computer Science might include three professors of Computer Science, one professor of Biology and one of Business Administration. Of the three computer scientists, one (or two) would typically be expert in the subject matter of the thesis (e.g. software testing) and the others would probably be experts in other areas of computing that the work depends on. For example, a dissertation focused on testing databases might be supervised by an expert in testing, an expert in databases and an expert in research design.

Why I recommend them

Theses are designed to be read by someone who is not an expert in the field. Therefore, a thesis will typically organize a testing problem–including the most relevant research papers–in a way that a student or a mid-level testing practitioner can understand.

Of course, theses vary in quality. Some are written poorly. Some are researched poorly. Many present half-baked ideas (this is student work, not the work of an experienced practitioner or a professional researcher). But overall, I have found them good starting points when I start working in a new area or when I assign a student to an area that is new to her.

What’s in the bibliography

We have over 700 references, most from before 2007.

Each reference includes the basic bibliographic information (author, title, etc.). It also includes:

  • a URL.
    • If we found a free copy of the thesis online, we point to that.
    • Otherwise, if the thesis is listed in WorldCat, we point to that. WorldCat indexes many of the world’s public libraries. If your public or university library is on the Interlibrary Loan system, WorldCat will tell your reference librarian what library has a copy of the thesis, so you can borrow it. Interlibrary Loans are often free to the borrower. It’s not as convenient as free-on-the-web, but it’s still free.
    • If it’s not listed on WorldCat, we point to ProQuest (we often point to ProQuest in the notes as well). You might know this branch of ProQuest as University Microfilms. You can order dissertations from ProQuest but this is not always free (PQDT Open and ProQuest with Google Scholar publish some dissertations for free, possibly including some that we thought were available only commercially). Prices vary. I think $39 is a typical number. Because theses are of variable quality, I strongly suggest that you preview as much as you can (you can often download a chapter for free from ProQuest) or read an article that summarizes the thesis (see next section) before paying $39.
  • a related reference
  • an abstract (a short summary of the thesis)
    • We chose not to copy abstracts from Dissertation Abstracts (ProQuest) because we don’t want to risk a copyright fight. If we found a copy of a thesis online or if an author posted a copy of their thesis abstract online, we copied that abstract into the bibliographic record for the thesis.

Digging up this extra information takes a lot of time and painstaking work. We’re continuing to add more recent work, and expect to grow the collection significantly over the summer.

Acknowledgments

This bibliography was created primarily by Karishma Bhatia, Casey Doran, Pat McGee, Kasey Powers, Andy Tinkham, and Patricia Terol Tolsa.

This bibliography is a product of research that was supported by NSF Grants EIA-0113539 ITR/SY+PE: “Improving the Education of Software Testers” and CCLI-0717613 “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.” Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

WTST 2012: Workshop on Teaching Software Testing

November 27th, 2011

TEACHING SECURITY-RELATED SOFTWARE TESTING

WTST 2012: The 11th Annual Workshop on Teaching Software Testing

January 27-29, 2012

at the Harris Institute for Assured Information

Florida Institute of Technology, Melbourne, Florida

http://www.wtst.org.

Software testing is often described as a central part of software security, but it has a surprisingly small role in security-related curricula. Over the next 5 years, we hope to change this. If there is sufficient interest, we hope to focus WTSTs 2012-2016 on instructional support for teaching security-related testing.

OUR GOALS FOR WTST 2012

  • Survey the domain: What should we consider as part of “security-related software testing”?
  • Cluster the domain: What areas of security-related testing would fit well together in the same course?
  • Characterize some of the key tasks:
    • Some types of work are (or should be) routine. To do them well, an organization needs a clearly defined, repeatable process that is easy to delegate.
    • Other types are cognitively complex. Their broader goals might stay stable, but the details constantly change as circumstances and threats evolve.
    • And other types are centered on creating, maintaining and extending technology, such as tools to support testing.
  • Publish this overview (survey / clustering / characterization)
  • Apply for instructional development grants. We (CSTER) intend to apply for funding. We hope to collaborate with other institutions and practitioners and we hope to foster other collaborations that lead to proposals that are independent of CSTER.

UNDERLYING VIEWPOINT

The Workshop on Teaching Software Testing is concerned with the practical aspects of teaching university-caliber software testing courses to academic or commercial students.

We see software testing as a cognitively complex activity, an active search for quality-related information rather than a tedious collection of routines. We see it as more practical than theoretical, more creative than prescriptive, more focused on investigation than assurance (you can’t prove a system is secure by testing it), more technical than managerial, and more interested in exploring risks than defining processes.

We think testing is too broad an area to cover fully in a single course. A course that tries to teach too much will be too superficial to have any real value. Rather than designing a single course to serve as a comprehensive model, we think the field is better served with several designs for several courses.

We are particularly interested in online courses that promote deeper knowledge and skill. You can see our work on software testing at http://www.testingeducation.org/BBST. Online courses and courseware, especially Creative Commons courseware, make it possible for students to learn multiple perspectives and to study new topics and learn new skills on a schedule that works for them.

WHO SHOULD ATTEND

We invite participation by:

  • academics who have experience teaching courses on testing or security
  • practitioners who teach professional seminars on software testing or security
  • one or two graduate students
  • a few seasoned teachers or testers who are beginning to build their strengths in teaching software testing or security.

There is no fee to attend this meeting. You pay for your seat through the value of your participation. Participation in the workshop is by invitation based on a proposal. We expect to accept 15 participants with an absolute upper bound of 22.

TO APPLY TO ATTEND

Send an email to Cem Kaner (kaner@cs.fit.edu) by December 20, 2011.

Your email should describe your background and interest in teaching software testing or security. What skills or knowledge do you bring to the meeting that would be of interest to the other participants?

If you are willing to make a presentation, send an abstract. Along with describing the proposed concepts and/or activities, tell us how long the presentation will take, any special equipment needs, and what written materials you will provide. Along with traditional presentations, we will gladly consider proposed activities and interactive demonstrations.

We will begin reviewing proposals on December 1. We encourage early submissions. It is unlikely but possible that we will have accepted a full set of presentation proposals by December 20.

Proposals should be between two and four pages long, in PDF format. We will post accepted proposals to http://www.wtst.org.

We review proposals in terms of their contribution to knowledge of HOW TO TEACH software testing and security. We will not accept proposals that present a theoretical advance with weak ties to teaching and application. Presentations that reiterate materials you have presented elsewhere might be welcome, but it is imperative that you identify the publication history of such work.

By submitting your proposal, you agree that, if we accept your proposal, you will submit a scholarly paper for discussion at the workshop by January 15, 2012. Workshop papers may be of any length and follow any standard scholarly style. We will post these at http://www.wtst.org as they are received, for workshop participants to review before the workshop.

HOW THE MEETING WILL WORK

WTST is a workshop, not a typical conference.

  • We will have a few presentations, but the intent of these is to drive discussion rather than to create an archivable publication.
    • We are glad to start from already-published papers, if they are presented by the author and they would serve as a strong focus for valuable discussion.
    • We are glad to work from slides, mindmaps, or diagrams.
  • Some of our sessions will be activities, such as brainstorming sessions, collaborative searching for information, creating examples, evaluating ideas or work products, and lightning presentations (presentations limited to 5 minutes, plus discussion).
  • In a typical presentation, the presenter speaks 10 to 90 minutes, followed by discussion. There is no fixed time for discussion. Past sessions’ discussions have run from 1 minute to 4 hours. During the discussion, a participant might ask the presenter simple or detailed questions, describe consistent or contrary experiences or data, present a different approach to the same problem, or (respectfully and collegially) argue with the presenter.

Our agenda will evolve during the workshop. If we start making significant progress on something, we are likely to stick with it even if that means cutting or timeboxing some other activities or presentations.

Presenters must provide materials that they share with the workshop under a Creative Commons license, allowing reuse by other teachers. Such materials will be posted at http://www.wtst.org.

HOSTS

The hosts of the meeting are:

LOCATION AND TRAVEL INFORMATION

We will hold the meetings at

Harris Center for Assured Information, Room 327

Florida Tech, 150 W University Blvd,

Melbourne, FL

Airport

Melbourne International Airport is 3 miles from the hotel and the meeting site. It is served by Delta Airlines and US Airways. Alternatively, the Orlando International Airport offers more flights and more non-stops but is 65 miles from the meeting location.

Hotel

We recommend the Courtyard by Marriott – West Melbourne located at 2101 W. New Haven Avenue in Melbourne, FL.

Please call 1-800-321-2211 or 321-724-6400 to book your room by January 2. Be sure to ask for the special WTST rate of $89 per night. Tax is an additional 11%.

All reservations must be guaranteed with a credit card by January 2, 2012 at 6:00 pm. Rooms that have not been reserved by then will be released for general sale. After that date, reservations can be made only based on availability.

For additional hotel information, please visit http://www.wtst.org or the hotel website at http://www.marriott.com/hotels/travel/mlbch-courtyard-melbourne-west/

OUR INTELLECTUAL PROPERTY AGREEMENT

We expect to publish some outcomes of this meeting. Each of us will probably have our own take on what was learned. Participants (all people in the room) agree to the following:

  • Any of us can publish the results as we see them. None of us is the official reporter of the meeting unless we decide at the meeting that we want a reporter.
  • Any materials initially presented at the meeting or developed at the meeting may be posted to any of our web sites or quoted in any other of our publications, without further permission. That is, if I write a paper, you can put it on your web site. If you write a problem, I can put it on my web site. If we make flipchart notes, those can go up on the web sites too. None of us has exclusive control over this material. Restrictions of rights must be identified on the paper itself.
    • NOTE: Some papers are circulated that are already published or are headed to another publisher. If you want to limit republication of a paper or slide set, please note the rights you are reserving on your document. The shared license to republish is our default rule, which applies in the absence of an asserted restriction.
  • The usual rules of attribution apply. If you write a paper or develop an idea or method, anyone who quotes or summarizes your work should attribute it to you. However, many ideas will develop in discussion and will be hard (and not necessary) to attribute to one person.
  • Any publication of the material from this meeting will list all attendees as contributors to the ideas published as well as the hosting organization.
  • Articles should be circulated to WTST-2012 attendees before being published when possible. At a minimum, notification of publication will be circulated.
  • Any attendee may request that his or her name be removed from the list of attendees identified on a specific paper.
  • If you have information which you consider proprietary or otherwise shouldn’t be disclosed in light of these publication rules, please do not reveal that information to the group.

ACKNOWLEDGEMENTS

Support for this meeting comes from the Harris Institute for Assured Information at the Florida Institute of Technology, and Kaner, Fiedler & Associates, LLC.

Funding for WTST 1-5 came primarily from the National Science Foundation, under grant EIA-0113539 ITR/SY+PE “Improving the Education of Software Testers.” Partial funding for the Advisory Board meetings in WTST 6-10 came from the National Science Foundation, under grant CCLI-0717613 “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.”

Opinions expressed at WTST or published in connection with WTST do not necessarily reflect the views of NSF.

WTST is a peer conference in the tradition of the Los Altos Workshops of Software Testing.

 

Please update your links to this blog

November 20th, 2011

A few new posts will be coming soon. I’m hoping they won’t be missed.

I moved my blog to http://kaner.com over a year ago — If you’re still linking to the old site, please update it.

Thanks!

A welcome addition to the scholarship of exploratory software testing

November 16th, 2011

Juha Itkonen will be defending his dissertation on “Empirical Studies on Exploratory Software Testing” this Friday. I haven’t read the entire document, but what I have read looks very interesting.

Juha has been studying real-world approaches to software testing for about a decade (maybe longer–when I met him almost a decade ago, his knowledge of the field was quite sophisticated). I’m delighted to see this quality of academic work and wish him well in his final oral exam.

For a list of soon-to-be-Dr. Itkonen’s publications, see https://wiki.aalto.fi/display/~jitkonen@aalto.fi/Full+list+of+publications.

Emphasis & Objectives of the Test Design Course

October 8th, 2011

Becky and I are getting closer to rolling out Test Design. Here’s our current summary of the course:

Learning Objectives for Test Design

This is an introductory survey of test design. The course introduces students to:

  • Many (over 100) test techniques at a superficial level (what the technique is).
  • A detailed level of familiarity with a few techniques:
    • function testing
    • testing tours
    • risk-based testing
    • specification-based testing
    • scenario testing
    • domain testing
    • combination testing.
  • Ways to compare strengths of different techniques and select complementary techniques to form an effective testing strategy
  • Using the Heuristic Test Strategy Model for specification analysis and risk analysis
  • Using concept mapping tools for test planning.

I’m still spending about 6 hours per night on video edits, but our most important work is on the assessments. To a very large degree, my course designs are driven by my assessments. That’s because there’s such a strong conviction in the education community–which I share–that students learn much more from the assessments (from all the activities that demand that they generate stuff and get feedback on it) than from lectures or informal discussions. The lectures and slides are an enabling backdrop for the students’ activities, rather than the core of the course.

In terms of design decisions, deciding what I will hold my students accountable for knowing requires me to decide what I will hold myself accountable for teaching well.

If you’re intrigued by that way of thinking about course design, check out:

I tested the course’s two main assignments in university classrooms several times before starting on the course slides (and wrote 104 first-draft multiple-choice questions and maybe 200 essay questions). But now that the course content is almost complete, we’re revisiting (and of course rewriting) these materials. In the process, we’ve been gaining perspective.

I think the most striking feature of the new course is its emphasis on content.

Let me draw the contrast with a chart that compares the BBST courses (Foundations, Bug Advocacy, and Test Design) and some other courses still on the drawing boards:

A few definitions:

  • Course Skills: How to be an effective student. Working effectively in online courses. Taking tests. Managing your time.
  • Social Skills: Working together in groups. Peer reviews. Using collaboration tools (e.g. wikis).
  • Learning Skills: How to gather, understand, organize and be able to apply new information. Using lectures, slides, and readings effectively. Searching for supplementary information. Using these materials to form and defend your own opinion.
  • Testing Knowledge: Definitions. Facts and concepts of testing. Structures for organizing testing knowledge.
  • Testing Skills: How to actually do things. Getting better (through practice and feedback) at actually doing them.
  • Computing Fundamentals: Facts and concepts of computer science and computing-relevant discrete mathematics.

As we designed the early courses, Becky Fiedler and I placed a high emphasis on course skills and learning skills. Students needed to (re)learn how to get value from online video instruction, how to take tests, how to give peer-to-peer feedback, etc.

The second course, Bug Advocacy, emphasizes specific testing skills, but the specific skills are the ones involved in bug investigation and reporting. Even though these critical thinking, research, and communication skills have strong application to testing, they are foundational for knowledge-related work.

Test Design is much more about the content (testing knowledge). We survey (depending on how you count) 70 to 150 test techniques. We look for ways to compare and contrast them. We consider how to organize projects around combinations of a few techniques that complement each other (make up for each other’s weaknesses and blindnesses). The learning skills component is active reading. This is certainly generally useful, but its context and application are specification analysis.

Test Design is more like the traditional Software Testing Course firehose. Way too much material in way too little time, with lots of reference material to help students explore the underemphasized parts of the course when they need it on the job.

The difference is that we are relying on the students’ improved learning skills. The assignments are challenging. The labs are still works-in-progress and might not be polished until the third iteration of the course, but labs-plus-assignments bring home a bunch of lessons.

Whether the students’ skills are advanced enough to process over 500 slides efficiently, integrate material across sections, integrate required readings, and apply them to software — all within the course’s 4-week timeframe — remains to be seen.

This post is partially based on work supported by NSF research grant CCLI-0717613 “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.” Any opinions, findings and conclusions or recommendations expressed in this post are those of the author and do not necessarily reflect the views of the National Science Foundation.

Learning Objectives of the BBST Courses

September 19th, 2011

As I finish up the post-production and assessment-design for the Test Design course, I’m writing these articles as a set of retrospectives on the instructional design of the series.

For me, this is a transition point. The planned BBST series is complete, and it leaves us with lessons to harvest as we create courseware on software metrics, development of skills with specific test techniques, computer-related law/ethics (in progress), cybersecurity, research methods and instrumentation applied to quantitative finance, qualitative research methods, and analysis of requirements.

Instructional Design

I think course design is a multidimensional challenge, focused around seven core questions:

    1. Content: What information, or types of information do I want the students to learn?
    2. Skills: What skills do I want the students to develop?
    3. Level of Learning: What level of depth do I want the students to learn this material at?
    4. Learning activities: How will the course’s activities (including the assessments, such as assignments and exams) that I use support the students’ learning?
    5. Instructional Technologies: What technologies will I use to support the course?
    6. Assessment: How will I assess the course: How will I find out what the students have learned, and at what level?
    7. Improvement: How will I use the assessment results to guide my improvement of the course?

This collection of articles will probably skip around in these questions as I take inventory of my last 7 years of notes on online and hybrid course development.

Objectives of the BBST Courses

The changing nature of the objectives of the BBST courses.

Courses differ in more than content. They differ in the other things you hope students learn along with the content.

The BBST courses include a variety of types of assessment: quizzes, labs, assignments and exams. For instructional designers, the advantage of assessments is that we can find out what the students know (and don’t know) and what they can apply.

The BBST courses have gone through several iterations. Becky Fiedler and I used the performance data to guide the evolutions.

Based on what we learned, we place a higher emphasis in the early courses on course skills and learning skills and a greater emphasis in the later courses on testing skills.

A few definitions:

  • Course Skills: How to be an effective student. Working effectively in online courses. Taking tests. Managing your time.
  • Social Skills: Working together in groups. Peer reviews. Using collaboration tools (e.g. wikis).
  • Learning Skills: How to gather, understand, organize and be able to apply new information. Using lectures, slides, and readings effectively. Searching for supplementary information. Using these materials to form and defend your own opinion.
  • Testing Knowledge: Definitions. Facts and concepts of testing. Structures for organizing testing knowledge.
  • Testing Skills: How to actually do things. Getting better (through practice and feedback) at actually doing them.
  • Computing Fundamentals: Facts and concepts of computer science and computing-relevant discrete mathematics.

You can see the evolution of emphasis in the courses’ specific learning objectives.

Learning Objectives of the 3-Course BBST Set

  • Understand key testing challenges that demand thoughtful tradeoffs by test designers and managers.
  • Develop skills with several test techniques.
  • Choose effective techniques for a given objective under your constraints.
  • Improve the critical thinking and rapid learning skills that underlie good testing.
  • Communicate your findings effectively.
  • Work effectively online with remote collaborators.
  • Plan investments (in documentation, tools, and process improvement) to meet your actual needs.
  • Create work products that you can use in job interviews to demonstrate testing skill.

Learning Objectives for the First Course (Foundations)

This is the first of the BBST series. We address:

  • How to succeed in online classes
  • Fundamental concepts and definitions
  • Fundamental challenges in software testing

Improve academic skills

  • Work with online collaboration tools
    • Forums
    • Wikis
  • Improve students’ precision in reading
  • Create clear, well-structured communication
  • Provide (and accept) effective peer reviews
  • Cope calmly and effectively with formative assessments (such as tests designed to help students learn).

Learn about testing

  • Key challenges of testing
    • Information objectives drive the testing mission and strategy
    • Oracles are heuristic
    • Coverage is multidimensional
    • Complete testing is impossible
    • Measurement is important, but hard
  • Introduce you to:
    • Basic vocabulary of the field
    • Basic facts of data storage and manipulation in computing
    • Diversity of viewpoints
    • Viewpoints drive vocabulary

Learning Objectives for the Second Course (Bug Advocacy)

Bug reports are not just neutral technical reports. They are persuasive documents. The key goal of the bug report author is to provide high-quality information, well written, to help stakeholders make wise decisions about which bugs to fix.

Key aspects of the content of this course include:

  • Defining key concepts (such as software error, quality, and the bug processing workflow)
  • The scope of bug reporting (what to report as bugs, and what information to include)
  • Bug reporting as persuasive writing
  • Bug investigation to discover harsher failures and simpler replication conditions
  • Excuses and reasons for not fixing bugs
  • Making bugs reproducible
  • Lessons from the psychology of decision-making: bug-handling as a multiple-decision process dominated by heuristics and biases.
  • Style and structure of well-written reports

Our learning objectives include this content, plus improving your abilities / skills to:

  • Evaluate bug reports written by others
  • Revise / strengthen reports written by others
  • Write more persuasively (considering the interests and concerns of your audience)
  • Participate effectively in distributed, multinational workgroup projects that are slightly more complex than the one in Foundations

Learning Objectives for the Third Course (Test Design)

This is an introductory survey of test design. The course introduces students to:

  • Many (nearly 200) test techniques at a superficial level (what the technique is).
  • A detailed level of familiarity with a few techniques:
    • function testing
    • testing tours
    • risk-based testing
    • specification-based testing
    • scenario testing
    • domain testing
    • combination testing.
  • Ways to compare strengths of different techniques and select complementary techniques to form an effective testing strategy
  • Using the Heuristic Test Strategy Model for specification analysis and risk analysis
  • Using concept mapping tools for test planning.

 
This post is partially based on work supported by NSF research grant CCLI-0717613 “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.” Any opinions, findings and conclusions or recommendations expressed in this post are those of the author and do not necessarily reflect the views of the National Science Foundation.

A New Course on Test Design: The Bibliography

September 13th, 2011

Back in 2004, I started developing course videos on software testing and computer-related law/ethics. Originally, these were for my courses at Florida Tech, but I published them under a Creative Commons license so that people could incorporate the materials in their own courses.

Soon after that, a group of us (mainly, I think, Scott Barber, Doug Hoffman, Mike Kelly, Pat McGee, Hung Nguyen, Andy Tinkham, and Ben Simo) started planning the repurposing of the academic course videos for professional development. I put together some (failing) prototypes and Becky Fiedler took over the instructional design.

  • We published the first version of BBST-Foundations and taught the courses through AST (Association for Software Testing). It had a lot of rough edges, but people liked it a lot.
  • So Becky and I created course #2, Bug Advocacy, with a lot of help from Scott Barber (and many other colleagues). This was new material, a more professional effort than Foundations, but it took a lot of time.

That took us to a fork in the road.

  • I was working with students on developing skills with specific techniques (I worked with Giri Vijayaraghavan and Ajay Jha on risk-based testing; Sowmya Padmanabhan on domain testing; and several students with not-quite-successful efforts on scenario testing). Sowmya and I proved (not that we were trying to) that developing students’ testing skills was more complex than I’d been thinking. So, Becky and I were rethinking our skills-development-course designs.
  • On the other hand, AST’s Board wanted to pull together a “complete” introductory series in black box testing. Ultimately, we went that way.

The goal was a three-part series:

  1. A reworked Foundations that fixed many of the weaknesses of Version 1. We completed that one about a year ago (Becky and Doug Hoffman were my main co-creators, with a lot of design guidance from Scott Barber).
  2. Bug Advocacy, and
  3. a new course in Test Design.

Test Design is (finally) almost done (many thanks to Michael Bolton and Doug Hoffman). I’ll publish the lectures as we finish post-production on the videos. Lecture 1 should be up around Saturday.

Test Design is a survey course. We cover a lot of ground. And we rely heavily on references, because we sure don’t know everything there is to know about all these techniques.

To support the release of the videos, I’m publishing our references now. (The final course slides will have the references too, but those won’t be done until we complete editing the last video.):

  • As always, it has been tremendously valuable reading books and papers suggested by colleagues and rereading stuff I’ve read before. A lot of careful thinking has gone into the development and analysis of these techniques.
  • As always, I’ve learned a lot from people whose views differ strongly from my own. Looking for the correctness in their views–what makes them right, within their perspective and analysis, even if I disagree with that perspective–is something I see as a basic professional skill.
  • And as always, I’ve not only learned new things: I’ve discovered that several things I thought I knew were outdated or wrong. I can be confident that the video is packed with errors, but far fewer than there would have been a year ago, and none that I know about now.

So… here’s the reference list. Video editing will take a few weeks to complete–if you think we should include some other sources, please let me know. I’ll read them and, if appropriate, I’ll gladly include them in the list.

Active reading (see also Specification-based testing and Concept mapping)

All-pairs testing

See http://www.pairwise.org/ for more references generally and http://www.pairwise.org/tools.asp for a list of tools.

Alpha testing

See references on tests by programmers of their own code, or on relatively early testing by development groups. For a good overview from the viewpoint of the test group, see Schultz, C.P., Bryant, R., & Langdell, T. (2005). Game Testing All in One. Thomson Press.

Ambiguity analysis (See also specification-based testing)

Best representative testing (See domain testing)

Beta testing

Boundary testing (See domain testing)

Bug bashes

Build verification

  • Guckenheimer, S. & Perez, J. (2006). Software Engineering with Microsoft Visual Studio Team System. Addison Wesley.
  • Page, A., Johnston, K., & Rollison, B.J. (2009). How We Test Software at Microsoft. Microsoft Press.
  • Raj, S. (2009). Maximize your investment in automation tools. Software Testing Analysis & Review. http://www.stickyminds.com

Calculations

Note: There is a significant, relevant field: Numerical Analysis. The list here merely points you to a few sources I have personally found helpful, not necessarily to the top references in the field.

Combinatorial testing (see All-pairs testing)

Concept mapping

  • Hyerle, D.N. (2008, 2nd Ed.). Visual Tools for Transforming Information into Knowledge. Corwin.
  • Margulies, N., & Maal, N. (2001, 2nd Ed.). Mapping Inner Space: Learning and Teaching Visual Mapping. Corwin.
  • McMillan, D. (2010). Tales from the trenches: Lean test case design. http://www.bettertesting.co.uk/content/?p=253
  • McMillan, D. (2011). Mind Mapping 101. http://www.bettertesting.co.uk/content/?p=956
  • Moon, B.M., Hoffman, R.R., Novak, J.D., & Canas, A.J. (Eds., 2011). Applied Concept Mapping: Capturing, Analyzing, and Organizing Knowledge. CRC Press.
  • Nast, J. (2006). Idea Mapping: How to Access Your Hidden Brain Power, Learn Faster, Remember More, and Achieve Success in Business. Wiley.
  • Sabourin, R. (2006). X marks the test case: Using mind maps for software design. Better Software. http://www.stickyminds.com/BetterSoftware/magazine.asp?fn=cifea&id=90

Concept mapping tools:

Configuration coverage

Configuration / compatibility testing

Constraint checks

See our notes in BBST-Foundations’ presentation of Hoffman’s collection of oracles.

Constraints

Diagnostics-based testing

  • Al-Yami, A.M. (1996). Fault-Oriented Automated Test Data Generation. Ph.D. Dissertation, Illinois Institute of Technology.
  • Kaner, C., Bond, W.P., & McGee, P. (2004). High volume test automation. Keynote address: International Conference on Software Testing Analysis & Review (STAR East 2004). Orlando. http://www.kaner.com/pdfs/HVAT_STAR.pdf (The Telenova and Mentsville cases are both examples of diagnostics-based testing.)

Domain testing

  • Abramowitz, M., & Stegun, I.A. (1964). Handbook of Mathematical Functions. http://people.math.sfu.ca/~cbm/aands/frameindex.htm
  • Beizer, B. (1990). Software Testing Techniques (2nd Ed.). Van Nostrand Reinhold.
  • Beizer, B. (1995). Black-Box Testing. Wiley.
  • Binder, R. (2000). Testing Object-Oriented Systems. Addison-Wesley.
  • Black, R. (2009). Using domain analysis for testing. Quality Matters, Q3, 16-20. http://www.rbcs-us.com/images/documents/quality-matters-q3-2009-rb-article.pdf
  • Copeland, L. (2004). A Practitioner’s Guide to Software Test Design. Artech House.
  • Clarke, L.A. (1976). A system to generate test data and symbolically execute programs. IEEE Transactions on Software Engineering, 2, 208-215.
  • Clarke, L.A., Hassel, J., & Richardson, D.J. (1982). A close look at domain testing. IEEE Transactions on Software Engineering, 2, 380-390.
  • Craig, R. D., & Jaskiel, S. P. (2002). Systematic Software Testing. Artech House.
  • Hamlet, D. & Taylor, R. (1990). Partition testing does not inspire confidence. IEEE Transactions on Software Engineering, 16(12), 1402-1411.
  • Hayes, J.H. (1999). Input Validation Testing: A System-Level, Early Lifecycle Technique. Ph.D. Dissertation (Computer Science), George Mason University.
  • Howden, W. E. (1980). Functional testing and design abstractions. Journal of Systems & Software, 1, 307-313.
  • Jeng, B., & Weyuker, E.J. (1994). A simplified domain-testing strategy. ACM Transactions on Software Engineering and Methodology, 3(3), 254-270.
  • Jorgensen, P. C. (2008). Software Testing: A Craftsman’s Approach (3rd ed.). Taylor & Francis.
  • Kaner, C. (2004a). Teaching domain testing: A status report. Paper presented at the Conference on Software Engineering Education & Training. http://www.kaner.com/pdfs/teaching_sw_testing.pdf
  • Kaner, C., Padmanabhan, S., & Hoffman, D. (2012). Domain Testing: A Workbook. In preparation.
  • Myers, G. J. (1979). The Art of Software Testing. Wiley.
  • Ostrand, T. J., & Balcer, M. J. (1988). The category-partition method for specifying and generating functional tests. Communications of the ACM, 31(6), 676-686.
  • Padmanabhan, S. (2004). Domain Testing: Divide and Conquer. M.Sc. Thesis, Florida Institute of Technology. http://www.testingeducation.org/a/DTD&C.pdf
  • Schroeder, P.J. (2001). Black-box test reduction using input-output analysis. Ph.D. Dissertation (Computer Science). Illinois Institute of Technology.
  • Weyuker, E. J., & Jeng, B. (1991). Analyzing partition testing strategies. IEEE Transactions on Software Engineering, 17(7), 703-711.
  • Weyuker, E.J., & Ostrand, T.J. (1980). Theories of program testing and the application of revealing subdomains. IEEE Transactions on Software Engineering, 6(3), 236-245.
  • White, L.J., Cohen, E.I., & Zeil, S.J. (1981). A domain strategy for computer program testing. In Chandrasekaran, B., & Radicchi, S. (Eds.), Computer Program Testing (pp. 103-112). North Holland Publishing.
  • http://www.wikipedia.org/wiki/Stratified_sampling

Dumb monkey testing

  • Arnold, T. (1998). Visual Test 6. Wiley.
  • Nyman, N. (1998). Application testing with dumb monkeys. International Conference on Software Testing Analysis & Review (STAR West).
  • Nyman, N. (2000). Using monkey test tools. Software Testing & Quality Engineering, 2(1), 18-20.
  • Nyman, N. (2004). In defense of monkey testing. http://www.softtest.org/sigs/material/nnyman2.htm

Eating your own dogfood

  • Page, A., Johnston, K., & Rollison, B.J. (2009). How We Test Software at Microsoft. Microsoft Press.

Equivalence class analysis (see Domain testing)

Experimental design

  • Popper, K.R. (2002, 2nd Ed.). Conjectures and Refutations: The Growth of Scientific Knowledge. Routledge.
  • Shadish, W.R., Cook, T.D., & Campbell, D.T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference, 2nd Ed. Wadsworth.

Exploratory testing

Failure mode analysis (see also Guidewords and Risk-based testing)

Feature integration testing

Function testing

Function equivalence testing

  • Hoffman, D. (2003). Exhausting your test options. Software Testing & Quality Engineering, 5(4), 10-11.
  • Kaner, C., Falk, J., & Nguyen, H.Q. (2nd Edition, 2000). Testing Computer Software. Wiley.

Functional testing below the GUI

Guerrilla testing

  • Kaner, C., Falk, J., & Nguyen, H.Q. (2nd Edition, 2000). Testing Computer Software. Wiley.

Guidewords

Installation testing

Interoperability testing

Load testing

Localization testing

  • Bolton, M. (2006, April). Where in the world? Better Software. http://www.developsense.com/articles/2006-04-WhereInTheWorld.pdf
  • Chandler, H.M., & Deming, S.O. (in press, 2nd Ed.). The Game Localization Handbook. Jones & Bartlett Learning.
  • Ratzmann, M., & De Young, C. (2003). Galileo Computing: Software Testing and Internationalization. Lemoine International and the Localization Industry Standards Association. http://www.automation.org.uk/downloads/documentation/galileo_computing-software_testing.pdf
  • Savourel, Y. (2001). XML Internationalization and Localization. Sams Press.
  • Singh, N. & Pereira, A. (2005). The Culturally Customized Web Site: Customizing Web Sites for the Global Marketplace. Butterworth-Heinemann.
  • Smith-Ferrier, G. (2006). .NET Internationalization: The Developer’s Guide to Building Global Windows and Web Applications. Addison-Wesley Professional.
  • Uren, E., Howard, R. & Perinotti, T. (1993). Software Internationalization and Localization. Van Nostrand Reinhold.

Logical expression testing

  • Ammann, P., & Offutt, J. (2008). Introduction to Software Testing. Cambridge University Press.
  • Beizer, B. (1990). Software Testing Techniques (2nd Ed.). Van Nostrand Reinhold.
  • Copeland, L. (2004). A Practitioner’s Guide to Software Test Design. Artech House (see Chapter 5 on decision tables).
  • Jorgensen, P. (2008, 3rd Ed.). Software Testing: A Craftsman’s Approach. Auerbach Publications (see Chapter 7 on decision tables).
  • Marick, B. (2000). Testing for Programmers. http://www.exampler.com/testing-com/writings/half-day-programmer.pdf (Marick models the testing of logical expressions by considering common mistakes in designing or coding a series of related decisions.)
  • MULTI. Marick implemented his approach to testing logical expressions in a program, MULTI. Tim Coulter and his colleagues extended MULTI and published it (with Marick’s permission) at http://sourceforge.net/projects/multi/

Long-sequence testing

Mathematical oracle

See our notes in BBST-Foundations’ presentation of Hoffman’s collection of oracles.

Numerical analysis (see Calculations)

Paired testing

Pairwise testing (see All-Pairs testing)

Performance testing

Programming or software design

  • Roberts, E. (2005, 20th Anniversary Ed.). Thinking Recursively with Java. Wiley.

Psychological considerations

  • Bendor, J. (2005). The perfect is the enemy of the best: Adaptive versus optimal organizational reliability. Journal of Theoretical Politics, 17(1), 5-39.
  • Rohlman, D.S. (1992). The Role of Problem Representation and Expertise in Hypothesis Testing: A Software Testing Analogue. Ph.D. Dissertation, Bowling Green State University.
  • Teasley, B.E., Leventhal, L.M., Mynatt, C.R., & Rohlman, D.S. (1994). Why software testing is sometimes ineffective: Two applied studies of positive test strategy. Journal of Applied Psychology, 79(1), 142-155.
  • Whittaker, J.A. (2000). What is software testing? And why is it so hard? IEEE Software, Jan-Feb, 70-79.

Quicktests

Random testing

Regression testing

Requirements-based testing


  • Whalen, M.W., Rajan, A., Heimdahl, M.P.E., & Miller, S.P. (2006). Coverage metrics for requirements-based testing. Proceedings of the 2006 International Symposium on Software Testing and Analysis. http://portal.acm.org/citation.cfm?id=1146242
  • Wiegers, K.E. (1999). Software Requirements. Microsoft Press.

Risk-based testing

  • Bach, J. (1999). Heuristic risk-based testing. Software Testing & Quality Engineering. http://www.satisfice.com/articles/hrbt.pdf
  • Bach, J. (2000a). Heuristic test planning: Context model. http://www.satisfice.com/tools/satisfice-cm.pdf
  • Bach, J. (2000b). SQA for new technology projects. http://www.satisfice.com/articles/sqafnt.pdf
  • Bach, J. (2003). Troubleshooting risk-based testing. Software Testing & Quality Engineering, May/June, 28-32. http://www.satisfice.com/articles/rbt-trouble.pdf
  • Becker, S.A. & Berkemeyer, A. (1999). The application of a software testing technique to uncover data errors in a database system. Proceedings of the 20th Annual Pacific Northwest Software Quality Conference, 173-183.
  • Berkovich, Y. (2000). Software quality prediction using case-based reasoning. M.Sc. Thesis (Computer Science). Florida Atlantic University.
  • Bernstein, P.L. (1998). Against the Gods: The Remarkable Story of Risk. Wiley.
  • Black, R. (2007). Pragmatic Software Testing: Becoming an Effective and Efficient Test Professional. Wiley.
  • Clemen, R.T. (1996, 2nd ed.) Making Hard Decisions: An Introduction to Decision Analysis. Cengage Learning.
  • Copeland, L. (2004). A Practitioner’s Guide to Software Test Design. Artech House.
  • DeMarco, T., & Lister, T. (2003). Waltzing with Bears: Managing Risk on Software Projects. Dorset House.
  • Dorner, D. (1997). The Logic of Failure. Basic Books.
  • Gerrard, P. & Thompson, N. (2002). Risk-Based E-Business Testing. Artech House.
  • HAZOP Guidelines (2008). Hazardous Industry Planning Advisory Paper No. 8, NSW Government Department of Planning. http://www.planning.nsw.gov.au/plansforaction/pdf/hazards/haz_hipap8_rev2008.pdf
  • Hillson, D. & Murray-Webster, R. (2007). Understanding and Managing Risk Attitude. (2nd Ed.). Gower. http://www.risk-attitude.com/
  • Hubbard, D.W. (2009). The Failure of Risk Management: Why It’s Broken and How to Fix It. Wiley.
  • Jorgensen, A.A. (2003). Testing with hostile data streams. ACM SIGSOFT Software Engineering Notes, 28(2). http://cs.fit.edu/media/TechnicalReports/cs-2003-03.pdf
  • Jorgensen, A.A. & Tilley, S.R. (2003). On the security risks of not adopting hostile data stream testing techniques. 3rd International Workshop on Adoption-Centric Software Engineering (ACSE 2003), p. 99-103. http://www.sei.cmu.edu/reports/03sr004.pdf
  • Kaner, C. (2008). Improve the power of your tests with risk-based test design. Quality Assurance Institute QUEST conference. http://www.kaner.com/pdfs/QAIriskKeynote2008.pdf
  • Kaner, C., Falk, J., & Nguyen, H.Q. (2nd Edition, 2000a). Testing Computer Software. Wiley.
  • Neumann, P.G. (undated). The Risks Digest: Forum on Risks to the Public in Computers and Related Systems. http://catless.ncl.ac.uk/risks
  • Perrow, C. (1999). Normal Accidents: Living with High-Risk Technologies. Princeton University Press (but read this in conjunction with Robert Hedges’ review of the book on Amazon.com).
  • Petroski, H. (1992). To Engineer is Human: The Role of Failure in Successful Design. Vintage.
  • Petroski, H. (2004). Small Things Considered: Why There is No Perfect Design. Vintage.
  • Petroski, H. (2008). Success Through Failure: The Paradox of Design. Princeton University Press.
  • Pettichord, B. (2001). The role of information in risk-based testing. International Conference on Software Testing Analysis & Review (STAR East). http://www.stickyminds.com
  • Reason, J.T. (1997). Managing the Risks of Organizational Accidents. Ashgate Publishing.
  • Schultz, C.P., Bryant, R., & Langdell, T. (2005). Game Testing All in One. Thomson Press (discussion of defect triggers).
  • Software Engineering Institute’s collection of papers on project management, with extensive discussion of project risks. https://seir.sei.cmu.edu/seir/
  • Weinberg, G. (1993). Quality Software Management. Volume 2: First Order Measurement. Dorset House.

Rounding errors (see Calculations)

Scenario testing (See also Use-case-based testing)

Self-verifying data

Specification-based testing (See also active reading; See also ambiguity analysis)

State-model-based testing

  • Auer, A.J. (1997). State Testing of Embedded Software. Ph.D. Dissertation (Computer Science). Oulun Yliopisto (Finland).
  • Becker, S.A. & Whittaker, J.A. (1997). Cleanroom Software Engineering Practices. IDEA Group Publishing.
  • Buwalda, H. (2003). Action figures. Software Testing & Quality Engineering. March/April 42-27. http://www.logigear.com/articles-by-logigear-staff/245-action-figures.html
  • El-Far, I.K. (1999). Automated Construction of Software Behavior Models. M.Sc. Thesis, Florida Institute of Technology.
  • El-Far, I.K., & Whittaker, J.A. (2001). Model-based software testing. In Marciniak, J.J. (Ed.), Encyclopedia of Software Engineering. Wiley. http://testoptimal.com/ref/Model-based Software Testing.pdf
  • Jorgensen, A.A. (1999). Software Design Based on Operational Modes. Doctoral Dissertation, Florida Institute of Technology. https://cs.fit.edu/Projects/tech_reports/cs-2002-10.pdf
  • Katara, M., Kervinen, A., Maunumaa, M., Paakkonen, T., & Jaaskelainen, A. (2007). Can I have some model-based GUI tests please? Providing a model-based testing service through a web interface. Conference of the Association for Software Testing. http://practise.cs.tut.fi/files/publications/TEMA/cast07-final.pdf
  • Mallery, C.J. (2005). On the Feasibility of Using FSM Approaches to Test Large Web Applications. M.Sc. Thesis (EECS). Washington State University.
  • Page, A., Johnston, K., & Rollison, B.J. (2009). How We Test Software at Microsoft. Microsoft Press.
  • Robinson, H. (1999a). Finite state model-based testing on a shoestring. http://www.stickyminds.com/getfile.asp?ot=XML&id=2156&fn=XDD2156filelistfilename1%2Epdf
  • Robinson, H. (1999b). Graph theory techniques in model-based testing. International Conference on Testing Computer Software. http://sqa.fyicenter.com/art/Graph-Theory-Techniques-in-Model-Based-Testing.html
  • Robinson, H. Model-Based Testing Home Page. http://www.geocities.com/model_based_testing/
  • Rosaria, S., & Robinson, H. (2000). Applying models in your testing process. Information & Software Technology, 42(12), 815-24. http://www.harryrobinson.net/ApplyingModels.pdf
  • Schultz, C.P., Bryant, R., & Langdell, T. (2005). Game Testing All in One. Thomson Press.
  • Utting, M., & Legeard, B. (2007). Practical Model-Based Testing: A Tools Approach. Morgan Kaufmann.
  • Vagoun, T. (1994). State-Based Software Testing. Ph.D. Dissertation (Computer Science). University of Maryland College Park.
  • Whittaker, J.A. (1992). Markov Chain Techniques for Software Testing and Reliability Analysis. Ph.D. Dissertation (Computer Science). University of Tennessee.
  • Whittaker, J.A. (1997). Stochastic software testing. Annals of Software Engineering, 4, 115-131.

Stress testing

Task analysis (see also Scenario testing and Use-case-based testing)

  • Crandall, B., Klein, G., & Hoffman, R.R. (2006). Working Minds: A Practitioner’s Guide to Cognitive Task Analysis. MIT Press.
  • Diaper, D., & Stanton, N. (2004). The Handbook of Task Analysis for Human-Computer Interaction. Lawrence Erlbaum.
  • Ericsson, K.A. & Simon, H.A. (1993). Protocol Analysis: Verbal Reports as Data (Revised Edition). MIT Press.
  • Gause, D.C., & Weinberg, G.M. (1989). Exploring Requirements: Quality Before Design. Dorset House.
  • Hackos, J.T. & Redish, J.C. (1998). User and Task Analysis for Interface Design. Wiley.
  • Jonassen, D.H., Tessmer, M., & Hannum, W.H. (1999). Task Analysis Methods for Instructional Design.
  • Robertson, S. & Robertson, J. C. (2006, 2nd Ed.). Mastering the Requirements Process. Addison-Wesley Professional.
  • Schraagen, J.M., Chipman, S.F., & Shalin, V.I. (2000). Cognitive Task Analysis. Lawrence Erlbaum.
  • Shepherd, A. (2001). Hierarchical Task Analysis. Taylor & Francis.

Test design / test techniques (in general)

Test idea catalogs

Testing skill

Many of the references in this collection are about the development of testing skill. However, a few papers stand out, to me, as exemplars of papers that focus on activities or structures designed to help testers improve their day-to-day testing skills. We need more of these.

Tours

Usability testing

  • Cooper, A. (2004). The Inmates are Running the Asylum: Why High Tech Products Drive Us Crazy and How to Restore the Sanity. Pearson Education.
  • Cooper, A., Reimann, R. & Cronin, D. (2007). About Face 3: The Essentials of Interaction Design. Wiley.
  • Dumas, J.S. & Loring, B.A. (2008). Moderating Usability Tests: Principles and Practices for Interacting. Morgan Kaufmann.
  • Fiedler, R.L., & Kaner, C. (2009). Putting the context in context-driven testing (an application of Cultural Historical Activity Theory). Conference of the Association for Software Testing. http://www.kaner.com/pdfs/FiedlerKanerCast2009.pdf
  • Ives, B., Olson, M.H., & Baroudi, J.J. (1983). The measurement of user information satisfaction. Communications of the ACM, 26(10), 785-793. http://portal.acm.org/citation.cfm?id=358430
  • Krug, S. (2005, 2nd Ed.). Don’t Make Me Think: A Common Sense Approach to Web Usability. New Riders Press.
  • Kuniavsky, M. (2003). Observing the User Experience: A Practitioner’s Guide to User Research. Morgan Kaufmann.
  • Lazar, J., Feng, J.H., & Hochheiser, H. (2010). Research Methods in Human-Computer Interaction. Wiley.
  • Nielsen, J. (1994). Guerrilla HCI: Using discount usability engineering to penetrate the intimidation barrier. http://www.useit.com/papers/guerrilla_hci.html
  • Nielsen, J. (1999). Designing Web Usability. Peachpit Press.
  • Nielsen, J., & Loranger, H. (2006). Prioritizing Web Usability. New Riders.
  • Norman, D.A. (2010). Living with Complexity. MIT Press.
  • Norman, D.A. (1994). Things that Make Us Smart: Defending Human Attributes in the Age of the Machine. Basic Books.
  • Norman, D.A. & Draper, S.W. (1986). User Centered System Design: New Perspectives on Human-Computer Interaction. CRC Press.
  • Patel, M. & Loring, B. (2001). Handling awkward usability testing situations. Proceedings of the Human Factors and Ergonomics Society 45th Annual Meeting. 1772-1776.
  • Platt, D.S. (2006). Why Software Sucks. Addison-Wesley.
  • Rubin, J., Chisnell, D. & Spool, J. (2008). Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests. Wiley.
  • Smilowitz, E.D., Darnell, M.J., & Benson, A.E. (1993). Are we overlooking some usability testing methods? A comparison of lab, beta, and forum tests. Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting, 300-303.
  • Stone, D., Jarrett, C., Woodroffe, M. & Minocha, S. (2005). User Interface Design and Evaluation. Morgan Kaufmann.
  • Tullis, T. & Albert, W. (2008). Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics (Interactive Technologies). Morgan Kaufmann.

Use-case based testing (see also Scenario testing and Task analysis)

  • Adolph, S. & Bramble, P. (2003). Patterns for Effective Use Cases. Addison-Wesley.
  • Alexander, I., & Maiden, N. Scenarios, Stories, Use Cases: Through the Systems Development Life-Cycle.
  • Alsumait, A. (2004). User Interface Requirements Engineering: A scenario-based framework. Ph.D. dissertation (Computer Science), Concordia University.
  • Berger, B. (2001). The dangers of use cases employed as test cases. STAR West conference, San Jose, CA. http://www.testassured.com/docs/Dangers.htm
  • Charles, F.A. (2009). Modeling scenarios using data. STP Magazine. http://www.quality-intelligence.com/articles/Modelling%20Scenarios%20Using%20Data_Paper_Fiona%20Charles_CAST%202009_Final.pdf
  • Cockburn, A. (2001). Writing Effective Use Cases. Addison-Wesley.
  • Cohn, M. (2004). User Stories Applied: For Agile Software Development. Pearson Education.
  • Collard, R. (July/August 1999). Test design: Developing test cases from use cases. Software Testing & Quality Engineering, 31-36.
  • Hsia, P., Samuel, J., Gao, J., Kung, D., Toyoshima, Y., & Chen, C. (1994). Formal approach to scenario analysis. IEEE Software, 11(2), 33-41.
  • Jacobson, I. (1995). The use-case construct in object-oriented software engineering. In John Carroll (ed.) (1995). Scenario-Based Design. Wiley.
  • Jacobson, I., Booch, G. & Rumbaugh, J. (1999). The Unified Software Development Process. Addison-Wesley.
  • Jacobson, I., & Bylund, S. (2000). The Road to the Unified Software Development Process. Cambridge University Press.
  • Kim, Y. C. (2000). A Use Case Approach to Test Plan Generation During Design. Ph.D. Dissertation (Computer Science). Illinois Institute of Technology.
  • Kruchten, P. (2003, 3rd Ed.). The Rational Unified Process: An Introduction. Addison-Wesley.
  • Samuel, J. (1994). Scenario analysis in requirements elicitation and software testing. M.Sc. Thesis (Computer Science), University of Texas at Arlington.
  • Utting, M., & Legeard, B. (2007). Practical Model-Based Testing: A Tools Approach. Morgan Kaufmann.
  • Van der Poll, J.A., Kotze, P., Seffah, A., Radhakrishnan, T., & Alsumait, A. (2003). Combining UCMs and formal methods for representing and checking the validity of scenarios as user requirements. Proceedings of the South African Institute of Computer Scientists and Information Technologists on Enablement Through Technology. http://dl.acm.org/citation.cfm?id=954014.954021
  • Zielczynski, P. (2006). Traceability from use cases to test cases. http://www.ibm.com/developerworks/rational/library/04/r-3217/

User interface testing

User testing (see beta testing)

  • Albert, W., Tullis, T. & Tedesco, D. (2010). Beyond the Usability Lab: Conducting Large-Scale Online User Experience Studies. Morgan Kaufmann.
  • Wang, E., & Caldwell, B. (2002). An empirical study of usability testing: Heuristic evaluation vs. user testing. Proceedings of the Human Factors and Ergonomics Society 46th Annual Meeting. 774-778.

This post is partially based on work supported by NSF research grant CCLI-0717613, “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.” Any opinions, findings and conclusions or recommendations expressed in this post are those of the author and do not necessarily reflect the views of the National Science Foundation.