WTST 2012: Workshop on Teaching Software Testing

November 27th, 2011

TEACHING SECURITY-RELATED SOFTWARE TESTING

WTST 2012: The 11th Annual Workshop on Teaching Software Testing

January 27-29, 2012

at the Harris Institute for Assured Information

Florida Institute of Technology, Melbourne, Florida

http://www.wtst.org.

Software testing is often described as a central part of software security, but it has a surprisingly small role in security-related curricula. Over the next five years, we hope to change this. If there is sufficient interest, we will focus WTST 2012-2016 on instructional support for teaching security-related testing.

OUR GOALS FOR WTST 2012

  • Survey the domain: What should we consider as part of “security-related software testing”?
  • Cluster the domain: What areas of security-related testing would fit well together in the same course?
  • Characterize some of the key tasks:
    • Some types of work are (or should be) routine. To do them well, an organization needs a clearly defined, repeatable process that is easy to delegate.
    • Other types are cognitively complex. Their broader goals might stay stable, but the details constantly change as circumstances and threats evolve.
    • And other types are centered on creating, maintaining and extending technology, such as tools to support testing.
  • Publish this overview (survey / clustering / characterization)
  • Apply for instructional development grants. We (CSTER) intend to apply for funding. We hope to collaborate with other institutions and practitioners, and to foster other collaborations that lead to proposals independent of CSTER.

UNDERLYING VIEWPOINT

The Workshop on Teaching Software Testing is concerned with the practical aspects of teaching university-caliber software testing courses to academic or commercial students.

We see software testing as a cognitively complex activity, an active search for quality-related information rather than a tedious collection of routines. We see it as more practical than theoretical, more creative than prescriptive, more focused on investigation than assurance (you can’t prove a system is secure by testing it), more technical than managerial, and more interested in exploring risks than defining processes.

We think testing is too broad an area to cover fully in a single course. A course that tries to teach too much will be too superficial to have any real value. Rather than designing a single course to serve as a comprehensive model, we think the field is better served with several designs for several courses.

We are particularly interested in online courses that promote deeper knowledge and skill. You can see our work on software testing at http://www.testingeducation.org/BBST. Online courses and courseware, especially Creative Commons courseware, make it possible for students to learn multiple perspectives and to study new topics and learn new skills on a schedule that works for them.

WHO SHOULD ATTEND

We invite participation by:

  • academics who have experience teaching courses on testing or security
  • practitioners who teach professional seminars on software testing or security
  • one or two graduate students
  • a few seasoned teachers or testers who are beginning to build their strengths in teaching software testing or security.

There is no fee to attend this meeting. You pay for your seat through the value of your participation. Participation in the workshop is by invitation based on a proposal. We expect to accept 15 participants with an absolute upper bound of 22.

TO APPLY TO ATTEND

Send an email to Cem Kaner (kaner@cs.fit.edu) by December 20, 2011.

Your email should describe your background and interest in teaching software testing or security. What skills or knowledge do you bring to the meeting that would be of interest to the other participants?

If you are willing to make a presentation, send an abstract. Along with describing the proposed concepts and/or activities, tell us how long the presentation will take, any special equipment needs, and what written materials you will provide. Along with traditional presentations, we will gladly consider proposed activities and interactive demonstrations.

We will begin reviewing proposals on December 1. We encourage early submissions. It is unlikely but possible that we will have accepted a full set of presentation proposals by December 20.

Proposals should be between two and four pages long, in PDF format. We will post accepted proposals to http://www.wtst.org.

We review proposals in terms of their contribution to knowledge of HOW TO TEACH software testing and security. We will not accept proposals that present a theoretical advance with weak ties to teaching and application. Presentations that reiterate materials you have presented elsewhere might be welcome, but it is imperative that you identify the publication history of such work.

By submitting your proposal, you agree that, if we accept your proposal, you will submit a scholarly paper for discussion at the workshop by January 15, 2012. Workshop papers may be of any length and follow any standard scholarly style. We will post these at http://www.wtst.org as they are received, for workshop participants to review before the workshop.

HOW THE MEETING WILL WORK

WTST is a workshop, not a typical conference.

  • We will have a few presentations, but the intent of these is to drive discussion rather than to create an archivable publication.
    • We are glad to start from already-published papers, if they are presented by the author and they would serve as a strong focus for valuable discussion.
    • We are glad to work from slides, mindmaps, or diagrams.
  • Some of our sessions will be activities, such as brainstorming sessions, collaborative searching for information, creating examples, evaluating ideas or work products, and lightning presentations (presentations limited to 5 minutes, plus discussion).
  • In a typical presentation, the presenter speaks 10 to 90 minutes, followed by discussion. There is no fixed time for discussion. Past sessions’ discussions have run from 1 minute to 4 hours. During the discussion, a participant might ask the presenter simple or detailed questions, describe consistent or contrary experiences or data, present a different approach to the same problem, or (respectfully and collegially) argue with the presenter.

Our agenda will evolve during the workshop. If we start making significant progress on something, we are likely to stick with it even if that means cutting or timeboxing some other activities or presentations.

Presenters must provide materials that they are willing to share with the workshop under a Creative Commons license, allowing reuse by other teachers. Such materials will be posted at http://www.wtst.org.

HOSTS

The hosts of the meeting are:

LOCATION AND TRAVEL INFORMATION

We will hold the meetings at

Harris Center for Assured Information, Room 327

Florida Tech, 150 W University Blvd,

Melbourne, FL

Airport

Melbourne International Airport is 3 miles from the hotel and the meeting site. It is served by Delta Airlines and US Airways. Alternatively, the Orlando International Airport offers more flights and more non-stops but is 65 miles from the meeting location.

Hotel

We recommend the Courtyard by Marriott – West Melbourne located at 2101 W. New Haven Avenue in Melbourne, FL.

Please call 1-800-321-2211 or 321-724-6400 to book your room by January 2. Be sure to ask for the special WTST rate of $89 per night. Tax is an additional 11%.

All reservations must be guaranteed with a credit card by January 2, 2012 at 6:00 pm. If rooms are not reserved, they will be released for general sale. Following that date, reservations can be made only based on availability.

For additional hotel information, please visit http://www.wtst.org or the hotel website at http://www.marriott.com/hotels/travel/mlbch-courtyard-melbourne-west/

OUR INTELLECTUAL PROPERTY AGREEMENT

We expect to publish some outcomes of this meeting. Each of us will probably have our own take on what was learned. Participants (all people in the room) agree to the following:

  • Any of us can publish the results as we see them. None of us is the official reporter of the meeting unless we decide at the meeting that we want a reporter.
  • Any materials initially presented at the meeting or developed at the meeting may be posted to any of our web sites or quoted in any of our other publications, without further permission. That is, if I write a paper, you can put it on your web site. If you write a paper, I can put it on my web site. If we make flipchart notes, those can go up on the web sites too. None of us has exclusive control over this material. Restrictions of rights must be identified on the paper itself.
    • NOTE: Some papers are circulated that are already published or are headed to another publisher. If you want to limit republication of a paper or slide set, please note the rights you are reserving on your document. The shared license to republish is our default rule, which applies in the absence of an asserted restriction.
  • The usual rules of attribution apply. If you write a paper or develop an idea or method, anyone who quotes or summarizes your work should attribute it to you. However, many ideas will develop in discussion and will be hard (and not necessary) to attribute to one person.
  • Any publication of the material from this meeting will list all attendees as contributors to the ideas published as well as the hosting organization.
  • Articles should be circulated to WTST-2012 attendees before being published when possible. At a minimum, notification of publication will be circulated.
  • Any attendee may request that his or her name be removed from the list of attendees identified on a specific paper.
  • If you have information that you consider proprietary or that otherwise shouldn’t be disclosed under these publication rules, please do not reveal it to the group.

ACKNOWLEDGEMENTS

Support for this meeting comes from the Harris Institute for Assured Information at the Florida Institute of Technology, and Kaner, Fiedler & Associates, LLC.

Funding for WTST 1-5 came primarily from the National Science Foundation, under grant EIA-0113539 ITR/SY+PE “Improving the Education of Software Testers.” Partial funding for the Advisory Board meetings in WTST 6-10 came from the National Science Foundation, under grant CCLI-0717613, “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.”

Opinions expressed at WTST or published in connection with WTST do not necessarily reflect the views of NSF.

WTST is a peer conference in the tradition of the Los Altos Workshops on Software Testing.

 

Please update your links to this blog

November 20th, 2011

A few new posts will be coming soon. I’m hoping they won’t be missed.

I moved my blog to https://kaner.com over a year ago — If you’re still linking to the old site, please update it.

Thanks!

A welcome addition to the scholarship of exploratory software testing

November 16th, 2011

Juha Itkonen will be defending his dissertation on “Empirical Studies on Exploratory Software Testing” this Friday. I haven’t read the entire document, but what I have read looks very interesting.

Juha has been studying real-world approaches to software testing for about a decade (maybe longer–when I met him almost a decade ago, his knowledge of the field was quite sophisticated). I’m delighted to see this quality of academic work and wish him well in his final oral exam.

For a list of soon-to-be-Dr. Itkonen’s publications, see https://wiki.aalto.fi/display/~jitkonen@aalto.fi/Full+list+of+publications.

Emphasis & Objectives of the Test Design Course

October 8th, 2011

Becky and I are getting closer to rolling out Test Design. Here’s our current summary of the course:

Learning Objectives for Test Design

This is an introductory survey of test design. The course introduces students to:

  • Many (over 100) test techniques at a superficial level (what the technique is).
  • A detailed level of familiarity with a few techniques:
    • function testing
    • testing tours
    • risk-based testing
    • specification-based testing
    • scenario testing
    • domain testing
    • combination testing.
  • Ways to compare strengths of different techniques and select complementary techniques to form an effective testing strategy
  • Using the Heuristic Test Strategy Model for specification analysis and risk analysis
  • Using concept mapping tools for test planning.

I’m still spending about 6 hours per night on video edits, but our most important work is on the assessments. To a very large degree, my course designs are driven by my assessments. That’s because there’s such a strong conviction in the education community–which I share–that students learn much more from the assessments (from all the activities that demand that they generate stuff and get feedback on it) than from lectures or informal discussions. The lectures and slides are an enabling backdrop for the students’ activities, rather than the core of the course.

In terms of design decisions, deciding what I will hold my students accountable for knowing requires me to decide what I will hold myself accountable for teaching well.

If you’re intrigued by that way of thinking about course design, check out:

I tested the course’s two main assignments in university classrooms several times before starting on the course slides (and wrote 104 first-draft multiple-guess questions and maybe 200 essay questions). But now that the course content is almost complete, we’re revisiting (and of course rewriting) these materials. In the process, we’ve been gaining perspective.

I think the most striking feature of the new course is its emphasis on content.

Let me draw the contrast with a chart that compares the BBST courses (Foundations, Bug Advocacy, and Test Design) and some other courses still on the drawing boards:

A few definitions:

  • Course Skills: How to be an effective student. Working effectively in online courses. Taking tests. Managing your time.
  • Social Skills: Working together in groups. Peer reviews. Using collaboration tools (e.g. wikis).
  • Learning Skills: How to gather, understand, organize and be able to apply new information. Using lectures, slides, and readings effectively. Searching for supplementary information. Using these materials to form and defend your own opinion.
  • Testing Knowledge: Definitions. Facts and concepts of testing. Structures for organizing testing knowledge.
  • Testing Skills: How to actually do things. Getting better (through practice and feedback) at actually doing them.
  • Computing Fundamentals: Facts and concepts of computer science and computing-relevant discrete mathematics.

As we designed the early courses, Becky Fiedler and I placed a high emphasis on course skills and learning skills. Students needed to (re)learn how to get value from online video instruction, how to take tests, how to give peer-to-peer feedback, etc.

The second course, Bug Advocacy, emphasizes specific testing skills–but the specific skills are the ones involved in bug investigation and reporting. Though these critical thinking, research, and communication skills have strong application to testing, they are foundational for knowledge-related work in general.

Test Design is much more about the content (testing knowledge). We survey (depending on how you count) 70 to 150 test techniques. We look for ways to compare and contrast them. We consider how to organize projects around combinations of a few techniques that complement each other (make up for each other’s weaknesses and blindnesses). The learning-skills component is active reading; this is certainly useful in general, but here its context and application is specification analysis.

Test Design is more like the traditional Software Testing Course firehose. Way too much material in way too little time, with lots of reference material to help students explore the underemphasized parts of the course when they need it on the job.

The difference is that we are relying on the students’ improved learning skills. The assignments are challenging. The labs are still works-in-progress and might not be polished until the third iteration of the course, but labs-plus-assignments bring home a bunch of lessons.

Whether the students’ skills are advanced enough to process over 500 slides efficiently, integrate material across sections, integrate required readings, and apply them to software — all within the course’s 4-week timeframe — remains to be seen.

This post is partially based on work supported by NSF research grant CCLI-0717613, “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.” Any opinions, findings and conclusions or recommendations expressed in this post are those of the author and do not necessarily reflect the views of the National Science Foundation.

Learning Objectives of the BBST Courses

September 19th, 2011

As I finish up the post-production and assessment-design for the Test Design course, I’m writing these articles as a set of retrospectives on the instructional design of the series.

For me, this is a transition point. The planned BBST series is complete, with lessons to harvest as we create courseware on software metrics, development of skills with specific test techniques, computer-related law/ethics (in progress), cybersecurity, research methods and instrumentation applied to quantitative finance, qualitative research methods, and analysis of requirements.

Instructional Design

I think course design is a multidimensional challenge, focused around seven core questions:

    1. Content: What information, or types of information do I want the students to learn?
    2. Skills: What skills do I want the students to develop?
    3. Level of Learning: At what level of depth do I want students to learn this material?
    4. Learning activities: How will the course’s activities (including assessments, such as assignments and exams) support the students’ learning?
    5. Instructional Technologies: What technologies will I use to support the course?
    6. Assessment: How will I assess the course? How will I find out what the students have learned, and at what level?
    7. Improvement: How will I use the assessment results to guide my improvement of the course?

This collection of articles will probably skip around in these questions as I take inventory of my last 7 years of notes on online and hybrid course development.

Objectives of the BBST Courses

The changing nature of the objectives of the BBST courses.

Courses differ in more than content. They differ in the other things you hope students learn along with the content.

The BBST courses include a variety of types of assessment: quizzes, labs, assignments and exams. For instructional designers, the advantage of assessments is that we can find out what the students know (and don’t know) and what they can apply.

The BBST courses have gone through several iterations. Becky Fiedler and I used the performance data to guide the evolutions.

Based on what we learned, we place a higher emphasis in the early courses on course skills and learning skills and a greater emphasis in the later courses on testing skills.

A few definitions:

  • Course Skills: How to be an effective student. Working effectively in online courses. Taking tests. Managing your time.
  • Social Skills: Working together in groups. Peer reviews. Using collaboration tools (e.g. wikis).
  • Learning Skills: How to gather, understand, organize and be able to apply new information. Using lectures, slides, and readings effectively. Searching for supplementary information. Using these materials to form and defend your own opinion.
  • Testing Knowledge: Definitions. Facts and concepts of testing. Structures for organizing testing knowledge.
  • Testing Skills: How to actually do things. Getting better (through practice and feedback) at actually doing them.
  • Computing Fundamentals: Facts and concepts of computer science and computing-relevant discrete mathematics.

You can see the evolution of emphasis in the courses’ specific learning objectives.

Learning Objectives of the 3-Course BBST Set

  • Understand key testing challenges that demand thoughtful tradeoffs by test designers and managers.
  • Develop skills with several test techniques.
  • Choose effective techniques for a given objective under your constraints.
  • Improve the critical thinking and rapid learning skills that underlie good testing.
  • Communicate your findings effectively.
  • Work effectively online with remote collaborators.
  • Plan investments (in documentation, tools, and process improvement) to meet your actual needs.
  • Create work products that you can use in job interviews to demonstrate testing skill.

Learning Objectives for the First Course (Foundations)

This is the first of the BBST series. We address:

  • How to succeed in online classes
  • Fundamental concepts and definitions
  • Fundamental challenges in software testing

Improve academic skills

  • Work with online collaboration tools
    • Forums
    • Wikis
  • Improve students’ precision in reading
  • Create clear, well-structured communication
  • Provide (and accept) effective peer reviews
  • Cope calmly and effectively with formative assessments (such as tests designed to help students learn).

Learn about testing

  • Key challenges of testing
    • Information objectives drive the testing mission and strategy
    • Oracles are heuristic
    • Coverage is multidimensional
    • Complete testing is impossible
    • Measurement is important, but hard
  • Introduce you to:
    • Basic vocabulary of the field
    • Basic facts of data storage and manipulation in computing
    • Diversity of viewpoints
    • Viewpoints drive vocabulary

Learning Objectives for the Second Course (Bug Advocacy)

Bug reports are not just neutral technical reports. They are persuasive documents. The key goal of the bug report author is to provide high-quality information, well written, to help stakeholders make wise decisions about which bugs to fix.

Key aspects of the content of this course include:

  • Defining key concepts (such as software error, quality, and the bug processing workflow)
  • The scope of bug reporting (what to report as bugs, and what information to include)
  • Bug reporting as persuasive writing
  • Bug investigation to discover harsher failures and simpler replication conditions
  • Excuses and reasons for not fixing bugs
  • Making bugs reproducible
  • Lessons from the psychology of decision-making: bug-handling as a multiple-decision process dominated by heuristics and biases.
  • Style and structure of well-written reports

Our learning objectives include this content, plus improving your abilities / skills to:

  • Evaluate bug reports written by others
  • Revise / strengthen reports written by others
  • Write more persuasively (considering the interests and concerns of your audience)
  • Participate effectively in distributed, multinational workgroup projects that are slightly more complex than the one in Foundations

Learning Objectives for the Third Course (Test Design)

This is an introductory survey of test design. The course introduces students to:

  • Many (nearly 200) test techniques at a superficial level (what the technique is).
  • A detailed level of familiarity with a few techniques:
    • function testing
    • testing tours
    • risk-based testing
    • specification-based testing
    • scenario testing
    • domain testing
    • combination testing.
  • Ways to compare strengths of different techniques and select complementary techniques to form an effective testing strategy
  • Using the Heuristic Test Strategy Model for specification analysis and risk analysis
  • Using concept mapping tools for test planning.

 
This post is partially based on work supported by NSF research grant CCLI-0717613, “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.” Any opinions, findings and conclusions or recommendations expressed in this post are those of the author and do not necessarily reflect the views of the National Science Foundation.

A New Course on Test Design: The Bibliography

September 13th, 2011

Back in 2004, I started developing course videos on software testing and computer-related law/ethics. Originally, these were for my courses at Florida Tech, but I published them under a Creative Commons license so that people could incorporate the materials in their own courses.

Soon after that, a group of us (mainly, I think, Scott Barber, Doug Hoffman, Mike Kelly, Pat McGee, Hung Nguyen, Andy Tinkham, and Ben Simo) started planning the repurposing of the academic course videos for professional development. I put together some (failing) prototypes and Becky Fiedler took over the instructional design.

  • We published the first version of BBST-Foundations and taught the courses through AST (Association for Software Testing). It had a lot of rough edges, but people liked it a lot.
  • So Becky and I created course #2, Bug Advocacy, with a lot of help from Scott Barber (and many other colleagues). This was new material, a more professional effort than Foundations, but it took a lot of time.

That took us to a fork in the road.

  • I was working with students on developing skills with specific techniques (I worked with Giri Vijayaraghavan and Ajay Jha on risk-based testing; Sowmya Padmanabhan on domain testing; and several students with not-quite-successful efforts on scenario testing). Sowmya and I proved (not that we were trying to) that developing students’ testing skills was more complex than I’d been thinking. So, Becky and I were rethinking our skills-development-course designs.
  • On the other hand, AST’s Board wanted to pull together a “complete” introductory series in black box testing. Ultimately, we went that way.

The goal was a three-part series:

  1. A reworked Foundations that fixed many of the weaknesses of Version 1. We completed that one about a year ago (Becky and Doug Hoffman were my main co-creators, with a lot of design guidance from Scott Barber).
  2. Bug Advocacy, and
  3. a new course in Test Design.

Test Design is (finally) almost done (many thanks to Michael Bolton and Doug Hoffman). I’ll publish the lectures as we finish post-production on the videos. Lecture 1 should be up around Saturday.

Test Design is a survey course. We cover a lot of ground. And we rely heavily on references, because we sure don’t know everything there is to know about all these techniques.

To support the release of the videos, I’m publishing our references now. (The final course slides will include the references too, but those won’t be done until we finish editing the last video.)

  • As always, it has been tremendously valuable reading books and papers suggested by colleagues and rereading stuff I’ve read before. A lot of careful thinking has gone into the development and analysis of these techniques.
  • As always, I’ve learned a lot from people whose views differ strongly from my own. Looking for the correctness in their views–what makes them right, within their perspective and analysis, even if I disagree with that perspective–is something I see as a basic professional skill.
  • And as always, I’ve not only learned new things: I’ve discovered that several things I thought I knew were outdated or wrong. I can be confident that the video is packed with errors–but plenty fewer than there would have been a year ago and none that I know about now.

So… here’s the reference list. Video editing will take a few weeks to complete–if you think we should include some other sources, please let me know. I’ll read them and, if appropriate, I’ll gladly include them in the list.

Active reading (see also Specification-based testing and Concept mapping)

All-pairs testing

See http://www.pairwise.org/ for more references generally and http://www.pairwise.org/tools.asp for a list of tools.

Alpha testing

See references on tests by programmers of their own code, or on relatively early testing by development groups. For a good overview from the viewpoint of the test group, see Schultz, C.P., Bryant, R., & Langdell, T. (2005). Game Testing All in One. Thomson Press

Ambiguity analysis (See also specification-based testing)

Best representative testing (See domain testing)

Beta testing

Boundary testing (See domain testing)

Bug bashes

Build verification

  • Guckenheimer, S. & Perez, J. (2006). Software Engineering with Microsoft Visual Studio Team System. Addison Wesley.
  • Page, A., Johnston, K., & Rollison, B.J. (2009). How We Test Software at Microsoft. Microsoft Press.
  • Raj, S. (2009). Maximize your investment in automation tools. Software Testing Analysis & Review. http://www.stickyminds.com

Calculations

Note: There is a significant, relevant field: Numerical Analysis. The list here merely points you to a few sources I have personally found helpful, not necessarily to the top references in the field.

Combinatorial testing. See All-Pairs Testing

Concept mapping

  • Hyerle, D.N. (2008, 2nd Ed.). Visual Tools for Transforming Information into Knowledge, Corwin.
  • Margulies, N., & Maal, N. (2001, 2nd Ed.) Mapping Inner Space: Learning and Teaching Visual Mapping. Corwin.
  • McMillan, D. (2010). Tales from the trenches: Lean test case design. http://www.bettertesting.co.uk/content/?p=253
  • McMillan, D. (2011). Mind Mapping 101. http://www.bettertesting.co.uk/content/?p=956
  • Moon, B.M., Hoffman, R.R., Novak, J.D., & Canas, A.J. (Eds., 2011). Applied Concept Mapping: Capturing, Analyzing, and Organizing Knowledge. CRC Press.
  • Nast, J. (2006). Idea Mapping: How to Access Your Hidden Brain Power, Learn Faster, Remember More, and Achieve Success in Business. Wiley.
  • Sabourin, R. (2006). X marks the test case: Using mind maps for software design. Better Software. http://www.stickyminds.com/BetterSoftware/magazine.asp?fn=cifea&id=90

Concept mapping tools:

Configuration coverage

Configuration / compatibility testing

Constraint checks

See our notes in BBST Foundations’ presentation of Hoffman’s collection of oracles.

Constraints

Diagnostics-based testing

  • Al-Yami, A.M. (1996). Fault-Oriented Automated Test Data Generation. Ph.D. Dissertation, Illinois Institute of Technology.
  • Kaner, C., Bond, W.P., & McGee, P.(2004). High volume test automation. Keynote address: International Conference on Software Testing Analysis & Review (STAR East 2004). Orlando. https://kaner.com/pdfs/HVAT_STAR.pdf (The Telenova and Mentsville cases are both examples of diagnostics-based testing.)

Domain testing

  • Abramowitz & Stegun (1964), Handbook of Mathematical Functions. http://people.math.sfu.ca/~cbm/aands/frameindex.htm
  • Beizer, B. (1990). Software Testing Techniques (2nd Ed.). Van Nostrand Reinhold.
  • Beizer, B. (1995). Black-Box Testing. Wiley.
  • Binder, R. (2000). Testing Object-Oriented Systems: Addison-Wesley.
  • Black, R. (2009). Using domain analysis for testing. Quality Matters, Q3, 16-20. http://www.rbcs-us.com/images/documents/quality-matters-q3-2009-rb-article.pdf
  • Copeland, L. (2004). A Practitioner’s Guide to Software Test Design. Artech House.
  • Clarke, L.A. (1976). A system to generate test data and symbolically execute programs. IEEE Transactions on Software Engineering, 2, 208-215.
  • Clarke, L. A. Hassel, J., & Richardson, D. J. (1982). A close look at domain testing. IEEE Transactions on Software Engineering, 2, 380-390.
  • Craig, R. D., & Jaskiel, S. P. (2002). Systematic Software Testing. Artech House.
  • Hamlet, D. & Taylor, R. (1990). Partition testing does not inspire confidence. IEEE Transactions on Software Engineering, 16(12), 1402-1411.
  • Hayes, J.H. (1999). Input Validation Testing: A System-Level, Early Lifecycle Technique. Ph.D. Dissertation (Computer Science), George Mason University.
  • Howden, W. E. (1980). Functional testing and design abstractions. Journal of Systems & Software, 1, 307-313.
  • Jeng, B. & Weyuker, E.J. (1994). A simplified domain-testing strategy. ACM Transactions on Software Engineering, 3(3), 254-270.
  • Jorgensen, P. C. (2008). Software Testing: A Craftsman’s Approach (3rd ed.). Taylor & Francis.
  • Kaner, C. (2004a). Teaching domain testing: A status report. Paper presented at the Conference on Software Engineering Education & Training. https://kaner.com/pdfs/teaching_sw_testing.pdf
  • Kaner, C., Padmanabhan, S., & Hoffman, D. (2012) Domain Testing: A Workbook, in preparation.
  • Myers, G. J. (1979). The Art of Software Testing. Wiley.
  • Ostrand, T. J., & Balcer, M. J. (1988). The category-partition method for specifying and generating functional tests. Communications of the ACM, 31(6), 676-686.
  • Padmanabhan, S. (2004). Domain Testing: Divide and Conquer. M.Sc. Thesis, Florida Institute of Technology. http://www.testingeducation.org/a/DTD&C.pdf
  • Schroeder, P.J. (2001). Black-box test reduction using input-output analysis. Ph.D. Dissertation (Computer Science). Illinois Institute of Technology.
  • Weyuker, E. J., & Jeng, B. (1991). Analyzing partition testing strategies. IEEE Transactions on Software Engineering, 17(7), 703-711.
  • Weyuker, E.J., & Ostrand, T.J. (1980). Theories of program testing and the application of revealing subdomains. IEEE Transactions on Software Engineering, 6(3), 236-245.
  • White, L. J., Cohen, E.I., & Zeil, S.J. (1981). A domain strategy for computer program testing. In Chandrasekaran, B., & Radicchi, S. (Ed.), Computer Program Testing (pp. 103-112). North Holland Publishing.
  • http://www.wikipedia.org/wiki/Stratified_sampling

Dumb monkey testing

  • Arnold, T. (1998), Visual Test 6. Wiley.
  • Nyman, N. (1998). Application testing with dumb monkeys. International Conference on Software Testing Analysis & Review (STAR West).
  • Nyman, N. (2000), Using monkey test tools. Software Testing & Quality Engineering, 2(1), 18-20
  • Nyman, N. (2004). In defense of monkey testing. http://www.softtest.org/sigs/material/nnyman2.htm

Eating your own dogfood

  • Page, A., Johnston, K., & Rollison, B.J. (2009). How We Test Software at Microsoft. Microsoft Press.

Equivalence class analysis (see Domain testing)

Experimental design

  • Popper, K.R. (2002, 2nd Ed.). Conjectures and Refutations: The Growth of Scientific Knowledge. Routledge.
  • Shadish, W.R., Cook, T.D., & Campbell, D.T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference, 2nd Ed. Wadsworth.

Exploratory testing

Failure mode analysis: see also Guidewords and Risk-Based Testing.

Feature integration testing

Function testing

Function equivalence testing

  • Hoffman, D. (2003). Exhausting your test options. Software Testing & Quality Engineering, 5(4), 10-11
  • Kaner, C., Falk, J., & Nguyen, H.Q. (2nd Edition, 2000). Testing Computer Software. Wiley.

Functional testing below the GUI

Guerilla testing

  • Kaner, C., Falk, J., & Nguyen, H.Q. (2nd Edition, 2000). Testing Computer Software. Wiley.

Guidewords

Installation testing

Interoperability testing

Load testing

Localization testing

  • Bolton, M. (2006, April). Where in the world? Better Software. http://www.developsense.com/articles/2006-04-WhereInTheWorld.pdf
  • Chandler, H.M. & Deming, S.O (2nd Ed. in press). The Game Localization Handbook. Jones & Bartlett Learning.
  • Ratzmann, M., & De Young, C. (2003). Galileo Computing: Software Testing and Internationalization. Lemoine International and the Localization Industry Standards Association. http://www.automation.org.uk/downloads/documentation/galileo_computing-software_testing.pdf
  • Savourel, Y. (2001). XML Internationalization and Localization. Sams Press.
  • Singh, N. & Pereira, A. (2005). The Culturally Customized Web Site: Customizing Web Sites for the Global Marketplace. Butterworth-Heinemann.
  • Smith-Ferrier, G. (2006). .NET Internationalization: The Developer’s Guide to Building Global Windows and Web Applications. Addison-Wesley Professional.
  • Uren, E., Howard, R. & Perinotti, T. (1993). Software Internationalization and Localization. Van Nostrand Reinhold.

Logical expression testing

  • Ammann, P., & Offutt, J. (2008). Introduction to Software Testing. Cambridge University Press.
  • Beizer, B. (1990). Software Testing Techniques (2nd Ed.). Van Nostrand Reinhold.
  • Copeland, L. (2004). A Practitioner’s Guide to Software Test Design. Artech House (see Chapter 5 on decision tables).
  • Jorgensen, P. (2008, 3rd Ed.). Software Testing: A Craftsman’s Approach. Auerbach Publications (see Chapter 7 on decision tables).
  • Brian Marick (2000) modeled testing of logical expressions by considering common mistakes in designing/coding a series of related decisions. Testing for Programmers. http://www.exampler.com/testing-com/writings/half-day-programmer.pdf.
  • MULTI. Marick implemented his approach to testing logical expressions in a program, MULTI. Tim Coulter and his colleagues extended MULTI and published it (with Marick’s permission) at http://sourceforge.net/projects/multi/

Long-sequence testing

Mathematical oracle

See our notes in BBST Foundations’ presentation of Hoffman’s collection of oracles.

Numerical analysis (see Calculations)

Paired testing

Pairwise testing (see All-Pairs testing)

Performance testing

Programming or software design

  • Roberts, E. (2005, 20th Anniversary Ed.). Thinking Recursively with Java. Wiley.

Psychological considerations

  • Bendor, J. (2005). The perfect is the enemy of the best: Adaptive versus optimal organizational reliability. Journal of Theoretical Politics. 17(1), 5-39.
  • Rohlman, D.S. (1992). The Role of Problem Representation and Expertise in Hypothesis Testing: A Software Testing Analogue. Ph.D. Dissertation, Bowling Green State University.
  • Teasley, B.E., Leventhal, L.M., Mynatt, C.R., & Rohlman, D.S. (1994). Why software testing is sometimes ineffective: Two applied studies of positive test strategy. Journal of Applied Psychology, 79(1), 142-155.
  • Whittaker, J.A. (2000). What is software testing? And why is it so hard? IEEE Software, Jan-Feb. 70-79.

Quicktests

Random testing

Regression testing

Requirements-based testing

Requirements-based testing (continued)

  • Whalen, M.W., Rajan, A., Heimdahl, M.P.E., & Miller, S.P. (2006). Coverage metrics for requirements-based testing. Proceedings of the 2006 International Symposium on Software Testing and Analysis. http://portal.acm.org/citation.cfm?id=1146242
  • Wiegers, K.E. (1999). Software Requirements. Microsoft Press.

Risk-based testing

  • Bach, J. (1999). Heuristic risk-based testing. Software Testing & Quality Engineering. http://www.satisfice.com/articles/hrbt.pdf
  • Bach, J. (2000a). Heuristic test planning: Context model. http://www.satisfice.com/tools/satisfice-cm.pdf
  • Bach, J. (2000b). SQA for new technology projects. http://www.satisfice.com/articles/sqafnt.pdf
  • Bach, J. (2003). Troubleshooting risk-based testing. Software Testing & Quality Engineering, May/June, 28-32. http://www.satisfice.com/articles/rbt-trouble.pdf
  • Becker, S.A. & Berkemeyer, A. (1999). The application of a software testing technique to uncover data errors in a database system. Proceedings of the 20th Annual Pacific Northwest Software Quality Conference, 173-183.
  • Berkovich, Y. (2000). Software quality prediction using case-based reasoning. M.Sc. Thesis (Computer Science). Florida Atlantic University.
  • Bernstein, P.L. (1998). Against the Gods: The Remarkable Story of Risk. Wiley.
  • Black, R. (2007). Pragmatic Software Testing: Becoming an Effective and Efficient Test Professional. Wiley.
  • Clemen, R.T. (1996, 2nd ed.) Making Hard Decisions: An Introduction to Decision Analysis. Cengage Learning.
  • Copeland, L. (2004). A Practitioner’s Guide to Software Test Design. Artech House.
  • DeMarco, T. & Lister, T. (2003). Waltzing with Bears: Managing Risk on Software Projects. Dorset House.
  • Dorner, D. (1997). The Logic of Failure. Basic Books.
  • Gerrard, P. & Thompson, N. (2002). Risk-Based E-Business Testing. Artech House.
  • HAZOP Guidelines (2008). Hazardous Industry Planning Advisory Paper No. 8, NSW Government Department of Planning. http://www.planning.nsw.gov.au/plansforaction/pdf/hazards/haz_hipap8_rev2008.pdf
  • Hillson, D. & Murray-Webster, R. (2007). Understanding and Managing Risk Attitude. (2nd Ed.). Gower. http://www.risk-attitude.com/
  • Hubbard, D.W. (2009). The Failure of Risk Management: Why It’s Broken and How to Fix It. Wiley.
  • Jorgensen, A.A. (2003). Testing with hostile data streams. ACM SIGSOFT Software Engineering Notes, 28(2). http://cs.fit.edu/media/TechnicalReports/cs-2003-03.pdf
  • Jorgensen, A.A. & Tilley, S.R. (2003). On the security risks of not adopting hostile data stream testing techniques. 3rd International Workshop on Adoption-Centric Software Engineering (ACSE 2003), p. 99-103. http://www.sei.cmu.edu/reports/03sr004.pdf
  • Kaner, C. (2008). Improve the power of your tests with risk-based test design. Quality Assurance Institute QUEST conference. https://kaner.com/pdfs/QAIriskKeynote2008.pdf
  • Kaner, C., Falk, J., & Nguyen, H.Q. (2nd Edition, 2000a). Testing Computer Software. Wiley.
  • Neumann, P.G. (undated). The Risks Digest: Forum on Risks to the Public in Computers and Related Systems. http://catless.ncl.ac.uk/risks
  • Perrow, C. (1999). Normal Accidents: Living with High-Risk Technologies. Princeton University Press (but read this in conjunction with Robert Hedges’ review of the book on Amazon.com).
  • Petroski, H. (1992). To Engineer is Human: The Role of Failure in Successful Design. Vintage.
  • Petroski, H. (2004). Small Things Considered: Why There is No Perfect Design. Vintage.
  • Petroski, H. (2008). Success Through Failure: The Paradox of Design. Princeton University Press.
  • Pettichord, B. (2001). The role of information in risk-based testing. International Conference on Software Testing Analysis & Review (STAR East). http://www.stickyminds.com
  • Reason, J. T. (1997). Managing the Risks of Organizational Accidents. Ashgate Publishing.
  • Schultz, C.P., Bryant, R., & Langdell, T. (2005). Game Testing All in One. Thomson Press (discussion of defect triggers).
  • Software Engineering Institute’s collection of papers on project management, with extensive discussion of project risks. https://seir.sei.cmu.edu/seir/
  • Weinberg, G. (1993). Quality Software Management. Volume 2: First Order Measurement. Dorset House.

Rounding errors (see Calculations)

Scenario testing (See also Use-case-based testing)

Self-verifying data

Specification-based testing (See also active reading; See also ambiguity analysis)

State-model-based testing

  • Auer, A.J. (1997). State Testing of Embedded Software. Ph.D. Dissertation (Computer Science). Oulun Yliopisto (Finland).
  • Becker, S.A. & Whittaker, J.A. (1997). Cleanroom Software Engineering Practices. IDEA Group Publishing.
  • Buwalda, H. (2003). Action figures. Software Testing & Quality Engineering. March/April 42-27. http://www.logigear.com/articles-by-logigear-staff/245-action-figures.html
  • El-Far, I. K. (1999), Automated Construction of Software Behavior Models, Masters Thesis, Florida Institute of Technology, 1999.
  • El-Far, I. K. & Whittaker, J.A. (2001), Model-based software testing, in Marciniak, J.J. (2001). Encyclopedia of Software Engineering, Wiley. http://testoptimal.com/ref/Model-based Software Testing.pdf
  • Jorgensen, A.A. (1999). Software Design Based on Operational Modes. Doctoral Dissertation, Florida Institute of Technology. https://cs.fit.edu/Projects/tech_reports/cs-2002-10.pdf
  • Katara, M., Kervinen, A., Maunumaa, M., Paakkonen, T., & Jaaskelainen, A. (2007). Can I have some model-based GUI tests please? Providing a model-based testing service through a web interface. Conference of the Association for Software Testing. http://practise.cs.tut.fi/files/publications/TEMA/cast07-final.pdf
  • Mallery, C.J. (2005). On the Feasibility of Using FSM Approaches to Test Large Web Applications. M.Sc. Thesis (EECS). Washington State University.
  • Page, A., Johnston, K., & Rollison, B.J. (2009). How We Test Software at Microsoft. Microsoft Press.
  • Robinson, H. (1999a). Finite state model-based testing on a shoestring. http://www.stickyminds.com/getfile.asp?ot=XML&id=2156&fn=XDD2156filelistfilename1%2Epdf
  • Robinson, H. (1999b). Graph theory techniques in model-based testing. International Conference on Testing Computer Software. http://sqa.fyicenter.com/art/Graph-Theory-Techniques-in-Model-Based-Testing.html
  • Robinson, H. Model-Based Testing Home Page. http://www.geocities.com/model_based_testing/
  • Rosaria, S., & Robinson, H. (2000). Applying models in your testing process. Information & Software Technology, 42(12), 815-24. http://www.harryrobinson.net/ApplyingModels.pdf
  • Schultz, C.P., Bryant, R., & Langdell, T. (2005). Game Testing All in One. Thomson Press.
  • Utting, M., & Legeard, B. (2007). Practical Model-Based Testing: A Tools Approach. Morgan Kaufmann.
  • Vagoun, T. (1994). State-Based Software Testing. Ph.D. Dissertation (Computer Science). University of Maryland College Park.
  • Whittaker, J.A. (1992). Markov Chain Techniques for Software Testing and Reliability Analysis. Ph.D. Dissertation (Computer Science). University of Tennessee.
  • Whittaker, J.A. (1997). Stochastic software testing. Annals of Software Engineering, 4, 115-131.

Stress testing

Task analysis (see also Scenario testing and Use-case-based testing)

  • Crandall, B., Klein, G., & Hoffman, R.B. (2006). Working Minds: A Practitioner’s Guide to Cognitive Task Analysis. MIT Press.
  • Diaper, D. & Stanton, N. (2004). The Handbook of Task Analysis for Human-Computer Interaction. Lawrence Erlbaum.
  • Ericsson, K.A. & Simon, H.A. (1993). Protocol Analysis: Verbal Reports as Data (Revised Edition). MIT Press.
  • Gause, D.C., & Weinberg, G.M. (1989). Exploring Requirements: Quality Before Design. Dorset House.
  • Hackos, J.T. & Redish, J.C. (1998). User and Task Analysis for Interface Design. Wiley.
  • Jonassen, D.H., Tessmer, M., & Hannum, W.H. (1999). Task Analysis Methods for Instructional Design.
  • Robertson, S. & Robertson, J. C. (2006, 2nd Ed.). Mastering the Requirements Process. Addison-Wesley Professional.
  • Schraagen, J.M., Chipman, S.F., & Shalin, V.I. (2000). Cognitive Task Analysis. Lawrence Erlbaum.
  • Shepherd, A. (2001). Hierarchical Task Analysis. Taylor & Francis.

Test design / test techniques (in general)

Test idea catalogs

Testing skill

Many of the references in this collection are about the development of testing skill. However, a few papers stand out, to me, as exemplars of papers that focus on activities or structures designed to help testers improve their day-to-day testing skills. We need more of these.

Tours

Usability testing

  • Cooper, A. (2004). The Inmates are Running the Asylum: Why High Tech Products Drive Us Crazy and How to Restore the Sanity. Pearson Education.
  • Cooper, A., Reimann, R. & Cronin, D. (2007). About Face 3: The Essentials of Interaction Design. Wiley.
  • Dumas, J.S. & Loring, B.A. (2008). Moderating Usability Tests: Principles and Practices for Interacting. Morgan Kaufmann.
  • Fiedler, R.L., & Kaner, C. (2009). “Putting the context in context-driven testing (an application of Cultural Historical Activity Theory).” Conference of the Association for Software Testing. https://kaner.com/pdfs/FiedlerKanerCast2009.pdf
  • Ives, B., Olson, M.H., & Baroudi, J.J. (1983). The measurement of user information satisfaction. Communications of the ACM, 26(10), 785-793. http://portal.acm.org/citation.cfm?id=358430
  • Krug, S. (2005, 2nd Ed.). Don’t Make Me Think: A Common Sense Approach to Web Usability. New Riders Press.
  • Kuniavsky, M. (2003). Observing the User Experience: A Practitioner’s Guide to User Research. Morgan Kaufmann.
  • Lazar, J., Feng, J.H., & Hochheiser, H. (2010). Research Methods in Human-Computer Interaction. Wiley.
  • Nielsen, J. (1994). Guerrilla HCI: Using discount usability engineering to penetrate the intimidation barrier. http://www.useit.com/papers/guerrilla_hci.html
  • Nielsen, J. (1999). Designing Web Usability. Peachpit Press.
  • Nielsen, J. & Loranger, H. (2006). Prioritize Web Usability. MIT Press.
  • Norman, D.A. (2010). Living with Complexity. MIT Press.
  • Norman, D.A. (1994). Things that Make Us Smart: Defending Human Attributes in the Age of the Machine. Basic Books.
  • Norman, D.A. & Draper, S.W. (1986). User Centered System Design: New Perspectives on Human-Computer Interaction. CRC Press.
  • Patel, M. & Loring, B. (2001). Handling awkward usability testing situations. Proceedings of the Human Factors and Ergonomics Society 45th Annual Meeting. 1772-1776.
  • Platt, D.S. (2006). Why Software Sucks. Addison-Wesley.
  • Rubin, J., Chisnell, D. & Spool, J. (2008). Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests. Wiley.
  • Smilowitz, E.D., Darnell, M.J., & Benson, A.E. (1993). Are we overlooking some usability testing methods? A comparison of lab, beta, and forum tests. Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting, 300-303.
  • Stone, D., Jarrett, C., Woodroffe, M. & Minocha, S. (2005). User Interface Design and Evaluation. Morgan Kaufmann.
  • Tullis, T. & Albert, W. (2008). Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics (Interactive Technologies). Morgan Kaufmann.

Use-case based testing (see also Scenario testing and Task analysis)

  • Adolph, S. & Bramble, P. (2003). Patterns for Effective Use Cases. Addison-Wesley.
  • Alexander, I. & Maiden, N. Scenarios, Stories, Use Cases: Through the Systems Development Life-Cycle.
  • Alsumait, A. (2004). User Interface Requirements Engineering: A scenario-based framework. Ph.D. dissertation (Computer Science), Concordia University.
  • Berger, B. (2001). The dangers of use cases employed as test cases. STAR West conference, San Jose, CA. http://www.testassured.com/docs/Dangers.htm
  • Charles, F.A. (2009). Modeling scenarios using data. STP Magazine. http://www.quality-intelligence.com/articles/Modelling%20Scenarios%20Using%20Data_Paper_Fiona%20Charles_CAST%202009_Final.pdf
  • Cockburn, A.(2001). Writing Effective Use Cases. Addison-Wesley.
  • Cohn, M. (2004). User Stories Applied: For Agile Software Development. Pearson Education.
  • Collard, R. (July/August 1999). Test design: Developing test cases from use cases. Software Testing & Quality Engineering, 31-36.
  • Hsia, P., Samuel, J. Gao, J. Kung, D., Toyoshima, Y. & Chen, C. (1994). Formal approach to scenario analysis. IEEE Software, 11(2), 33-41.
  • Jacobson, I. (1995). The use-case construct in object-oriented software engineering. In John Carroll (ed.) (1995). Scenario-Based Design. Wiley.
  • Jacobson, I., Booch, G. & Rumbaugh, J. (1999). The Unified Software Development Process. Addison-Wesley.
  • Jacobson, I. & Bylund, S. (2000) The Road to the Unified Software Development Process. Cambridge University Press.
  • Kim, Y. C. (2000). A Use Case Approach to Test Plan Generation During Design. Ph.D. Dissertation (Computer Science). Illinois Institute of Technology.
  • Kruchten, P. (2003, 3rd Ed.). The Rational Unified Process: An Introduction. Addison-Wesley.
  • Samuel, J. (1994). Scenario analysis in requirements elicitation and software testing. M.Sc. Thesis (Computer Science), University of Texas at Arlington.
  • Utting, M., & Legeard, B. (2007). Practical Model-Based Testing: A Tools Approach. Morgan Kaufmann.
  • Van der Poll, J.A., Kotze, P., Seffah, A., Radhakrishnan, T., & Alsumait, A. (2003). Combining UCMs and formal methods for representing and checking the validity of scenarios as user requirements. Proceedings of the South African Institute of Computer Scientists and Information Technologists on Enablement Through Technology. http://dl.acm.org/citation.cfm?id=954014.954021
  • Zielczynski, P. (2006). Traceability from use cases to test cases. http://www.ibm.com/developerworks/rational/library/04/r-3217/

User interface testing

User testing (see beta testing)

  • Albert, W., Tullis, T. & Tedesco, D. (2010). Beyond the Usability Lab: Conducting Large-Scale Online User Experience Studies. Morgan Kaufmann.
  • Wang, E., & Caldwell, B. (2002). An empirical study of usability testing: Heuristic evaluation vs. user testing. Proceedings of the Human Factors and Ergonomics Society 46th Annual Meeting. 774-778.

This post is partially based on work supported by NSF research grant CCLI-0717613, “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.” Any opinions, findings and conclusions or recommendations expressed in this post are those of the author and do not necessarily reflect the views of the National Science Foundation.

Testing tours: Research for Best Practices?

June 24th, 2011

A few years ago, Michael Bolton wrote a blog post on “Testing Tours and Dashboards.” Back then, it had recently become fashionable to talk about “tours” as a fundamental class of tool in exploratory testing. Michael reminded readers of the unacknowledged history of this work.

Michael’s post also mentioned that some people were talking about running experiments on testing tours. Around that time, I heard a bunch about experiments that purported to compare testing tours. This didn’t sound like very good work to me, so I ignored it. Bad ideas often enjoy a burst of publicity, but if you leave them alone, they often fade away over time.

This topic has come up again recently, repeatedly. A discussion with someone today motivated me to finally publish a comment.

The comments have come up mainly in discussions or reviews of a test design course that I’m creating. The course (video) starts with a demonstration of a feature tour, followed by an inventory of many of the tours I learned from Mike Kelly, Elisabeth Hendrickson, James Bach, and Michael Bolton.

Why, the commentator asks, do I not explain which tours are better and which are worse? After all, hasn’t there been some research that shows that Tour X is better than Tour Y? This continues down one of two tracks:

  • Isn’t it irresponsible to ignore scientific research that demonstrates that some tours are much better than others or that some tours are ineffective?
  • Shouldn’t I be recommending that people do more of this kind of research? Wouldn’t this be a promising line of research? Couldn’t someone propose it to a corporate research department or a government agency that funds research? After all, this could be a scientific way to establish some Best Practices for exploratory testing (use the best tours, skip the worst ones).

This idea, using experiments to rank tours from best to worst, can certainly be made to sound impressive.

I don’t think this is a good idea. I’ll say that more strongly: even though this idea might be seductive to people who have training in empirical methods (or who are easily impressed by descriptions of empirical work), I think it reflects a fundamental lack of understanding of exploratory testing and of touring as a class of exploratory tools.

A tour is a directed search through the program. Find all the capabilities. Find all the claims about the product. Find all the variables. Find all the intended benefits. Find all the ways to get from A to B. Find all the X. Or maybe not ALL, but find a bunch.

This helps the tester achieve a few things:

  1. It creates an inventory of a class of attributes of the product under test. Later, the tester can work through the inventory, testing each one to some intended level of depth. This is what “coverage”-oriented testing is about. You can test N% of the program’s statements, or N% of the program’s features, or N% of the claims made for the product in its documentation–if you can list it, you can test it and check it off the list (see the sketch after this list).
  2. It familiarizes the tester with this aspect of the product. Testing is about discovering quality-related information about the product. An important part of the process of discovery is learning what is in the product, how people can / will use it, and how it works. Tours give us different angles on that multidimensional learning problem.
  3. It provides a way for the tester to explain to someone else what she has studied in the product so far, what types of things she has learned and what basics haven’t yet been explored. In a field with few trustworthy metrics, this gives us a useful basis for reporting progress, especially progress early in the testing effort.
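To make the checklist idea in point 1 concrete, here is a minimal Python sketch of inventory-based coverage tracking. The class, item names, and numbers are hypothetical illustrations of the idea, not anything taken from the BBST materials.

```python
# Minimal sketch (hypothetical names and data): a tour produces an inventory
# of items (features, claims, variables, ...); later testing checks them off,
# which supports a simple "N% of the listed items tested" progress report.

from dataclasses import dataclass, field


@dataclass
class Inventory:
    """One tour's inventory, e.g. the claims found during a claims tour."""
    name: str
    items: set = field(default_factory=set)
    tested: set = field(default_factory=set)

    def add(self, item: str) -> None:
        self.items.add(item)

    def mark_tested(self, item: str) -> None:
        if item in self.items:
            self.tested.add(item)

    def coverage(self) -> float:
        """Fraction of listed items tested so far (0.0 if nothing is listed)."""
        return len(self.tested) / len(self.items) if self.items else 0.0


# Example: a claims tour listed three claims; two have been tested so far.
claims = Inventory("claims from the user manual")
for c in ["imports CSV files", "exports PDF", "handles 10,000-row tables"]:
    claims.add(c)
claims.mark_tested("imports CSV files")
claims.mark_tested("exports PDF")
print(f"{claims.name}: {claims.coverage():.0%} covered")  # about 67% covered
```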

So which tour is better?

From a test-coverage perspective, I think that depends a lot on contract, regulation, and risk.

  1. To the extent that you have to know (and have to be able to tell people in writing) that all the X’s have been tested and all the X’s work, you need to know what all the X’s are and how to find them in the program. That calls for an X-tour. Which tour is better? The one you need for this product. That probably varies from product to product, no?
  2. Some programmers are more likely to make X-bugs than Y-bugs. Some programmers are sloppy about initializing variables. Some are sloppy about boundary conditions. Some are sloppy about thread interactions. Some are good coders but they design user interactions that are too confusing. If I’m testing Joe X’s code, I want to look for X-bugs. If Joe blows boundaries, I want to do a variable tour, to find all the variables so I can test all the boundaries. But if Joe’s problem is incomprehensibility, I want to do a benefit tour, to see what benefits people _should_ get from the program and how hard/confusing it is for users to actually get them. Which tour is better? That depends on which risks we are trying to mitigate, which bugs we are trying to find. And that varies from program to program, programmer to programmer, and on the same project, from time to time.

From a tester-learning perspective, people learn differently from each other.

  1. If I gave 10 people the task of learning what’s in a program, how it can be used, and how it works, those people would look for different types of information. They would be confused by different things. They would each find some things more interesting than others. They would already know some things that their colleagues didn’t.
  2. Which tour is better? The tour that helps you learn something you’re trying to learn today. Tomorrow, the best tour will be something different.

Testing is an infinite task. Define “distinct tests” this way: two tests are distinct if each can expose at least one bug that the other would miss. For a non-trivial program, there is an infinite number of distinct potential tests. The central problem of test design is boiling down this infinite set to a (relative to infinity) tiny collection of tests that we will actually use. Each test technique highlights a different subset of this infinity. In effect, each test technique represents a different sampling strategy from the space of possible tests.
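
To illustrate the “sampling strategy” point with a toy example (my own sketch, not anything from the research I’m criticizing): here are two ways of drawing a handful of test inputs for a single integer field. Each strategy selects a different tiny subset of the same effectively unbounded space of possible tests, and each can expose bugs the other would miss.

```python
import random

# Two sampling strategies over the same (huge) input space for one integer field.
# Each strategy picks a different tiny subset of the possible tests.

def boundary_sample(low, high):
    # Classic boundary-value picks: the edges and their immediate neighbors.
    return [low - 1, low, low + 1, high - 1, high, high + 1]

def random_sample(low, high, n=6, seed=None):
    # Uniform random picks from inside the valid range.
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(n)]

if __name__ == "__main__":
    LOW, HIGH = 1, 10_000
    print("boundary strategy:", boundary_sample(LOW, HIGH))
    print("random strategy:  ", random_sample(LOW, HIGH, seed=42))
```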

Tours help the tester gain insight into the multidimensional nature of this complex, infinite space. They help us imagine, envision, and, as we gain experience on the tour, prioritize the different sampling strategies we could use when we do more thorough, more intense testing after finishing the tours. So which tours are best? The ones that give the testers more insight and achieve a greater stretch of the testers’ imagination. For this, some tours will work better for me, others for you.

The best tour isn’t necessarily the one that finds the most bugs, or the one that covers the most statements of the product (or whichever coverage attribute you prefer). The best tour is the one that helps the individual human tester learn something new and useful. (That’s why we call it exploration. New and useful knowledge.) And that depends on what you already know, on what risks and complexities characterize this program, and on what the priorities are in this project.

A tour that is useful to you might be worthless to me. But that doesn’t stop it from being useful for you.

  • Rather than looking for a few “best” tours, I think it would be more interesting to develop guidance on how to do tour X given that you want to. (Do you know how to do a tour of uses of the program that trigger multithreaded operations in the system? Would it be interesting to know how?)
  • Rather than looking for a few “best” tours, I think it would be more interesting to develop a more diverse collection of tours that we can do, with more insight into what each can teach us.
  • Rather than seeking to objectify and quantify tours, I think we should embrace their subjectivity and the qualitative nature of the benefits they provide.

Pseudo-academics and best-practicers trivialize enough aspects of testing. They should leave this one alone.

A new brand of snake oil for software testing

May 19th, 2010

I taught a course last term on Quantitative Investment Modeling in Software Engineering to a mix of undergrad and grad students of computer science, operations research and business. We had a great time, and we learned a lot about the market, about modeling, and about automated exploratory testing (more on this type of exploratory testing at this year’s Conference of the Association for Software Testing…).

In the typical undergraduate science curriculum, most of the experimental design we teach is statistical. Given a clearly formulated hypothesis and a reasonably well-understood oracle, we learn how to design experiments that control for confounding variables, so that we can decide whether our experimental effect was statistically significant. We also teach some instrumentation, but in most cases, the students learn how to use well-understood instruments rather than how to appraise, design, develop, calibrate, and then apply them.

Our course was not so traditionally structured. In our course, each student had to propose and evaluate an investment strategy. We started with a lot of bad ideas. (Most small investors lose money. One of the attributes of our oracle is, “If it causes you to lose money, it’s probably a bad idea.”) We wanted to develop and demonstrate good ideas instead. We played with tools (some worked better than others) and wrote code to evolve our analytical capabilities, studied some qualitative research methods (hypothesis-formation is a highly qualitative task), ran pilot studies, and then eventually got to the formal-research stages that the typical lab courses start at.

Not surprisingly, the basics of designing a research program took about 1/3 of the course. With another course, I probably could have trained these students to be moderately-skilled EVALUATORS of research articles. (It is common in several fields to see this as a full-semester course in a doctoral program.)

Sadly, few CS doctoral programs (and even fewer undergrad programs) offer courses in the development or evaluation of research, or if they offer them, they don’t require them.

The widespread gap between having a little experience replicating other people’s experiments and seeing some work in a lab, on the one hand, and learning to do and evaluate research on the other hand — this gap is the home court for truthiness. In the world of truthiness, it doesn’t matter whether the evidence in support of an absurd assertion is any good, as long as we can make it look to enough people as though good enough evidence exists. Respectable-looking research from apparently well-credentialed people is hard to dispute if, like most people in our field, one lacks training in critical evaluation of research.

The new brand of snake oil is “evidence-based” X, such as “evidence-based” methods of instruction or, in a recent proposal, evidence-based software testing. Maybe I’m mistaken in my hunch about what this is about, but the tone of the abstract (and what I’ve perceived in my past personal interactions with the speaker) raises some concerns.

Jon Bach addresses the tone directly. You’ll have to form your own personal assessments of the speaker. But I agree with Jon that this does not sound merely like advocacy of applying empirical research methods to help us improve the practice of testing, an idea that I rather like. Instead, the wording  suggests a power play that seems to me to have less to do with research and more to do with the next generation of ISTQB marketing.

So let me talk here about this new brand of snake oil (“Evidence-Based!”), whether it is meant this way by this speaker or not.

The “evidence-based” game is an interesting one to play when most of the people in a community have limited training in research methods or research evaluation. This game has been recently fashionable in American education. In that context, I think it has been of greatest benefit to people who make money selling mediocritization. It’s not clear to me that this movement has added one iota of value to the quality of education in the United States.

In principle, I see 5 problems (or benefits, depending on your point of view). I say, “in principle” because of course, I have no insight into the personal motives and private ideas of Dr. Reid or his colleagues. I am raising a theoretical objection. Whether it is directly applicable to Dr. Reid and ISTQB is something you will have to decide yourself, and these comments are not sufficient to lead you to a conclusion.

  1. It is easy to promote forced results from worthless research when your audience has limited (or no) training in research methods, instrumentation, or evaluation of published research. And if someone criticizes the details of your methods, you can dismiss their criticisms as quibbling or theoretical. Too many people in the audience will be stuck judging the merits of the objection by the personal persuasiveness of the speakers (which snake oil salesmen excel at) rather than by the underlying merits of the research.
  2. When one side has a lot of money (such as, perhaps, proceeds from a certification business), and a plan to use “research” results as a sales tool to make a lot more money, they can invest in “research” that yields promotable results. The work doesn’t have to be competent (see #1). It just has to support a conclusion that fits with the sales pitch.
  3. When the other side doesn’t have a lot of money, when the other side are mainly practitioners (not much time or training to do the research), and when competent research costs a great deal more than trash (see #2 and #5), the debates are likely to be one-sided. One side has “evidence” and if the other side objects, well, if they think the “evidence” is so bad,  they should raise a bunch of money and donate a bunch of time to prove it. It’s an opportunity for well-funded con artists to take control of the (apparent) high road. They can spew impressive-looking trash at a rate that cannot possibly be countered by their critics.
  4. It is easy for someone to do “research” as a basis for rebranding and reselling someone else’s ideas. Thus, someone who has never had an original thought in his life can be promoted as the “leading expert” on X by publishing a few superficial studies of it.  A certain amount of this goes on already in our field, but largely as idiosyncratic misbehavior by individuals. There is a larger threat. If a training organization will make more money (influence more standards, get its products mandated by more suckers) if its products and services have the support of “the experts”, but many of “the experts” are inconveniently critical, there is great marketing value in a vehicle for papering over the old experts with new-improved experts who have done impressive-looking research that gives “evidence-based” backing to whatever the training organization is selling. Over time, of course, this kind of plagiarism kills innovation by bankrupting the innovators. For companies that see innovation as a threat, however, that’s a benefit, not a problem. (For readers who are wondering whether I am making a specific allegation about any person or organization, I am not. This is merely a hypothetical risk in an academic’s long list of hypothetical risks, for you to think about  in your spare time.)
  5. In education, we face a classic qualitative-versus-quantitative tradeoff. We can easily measure how many questions someone gets right or wrong on simplistic tests. We can’t so easily measure how deep an understanding someone has of a set of related concepts or how well they can apply them. The deeper knowledge is usually what we want to achieve, but it takes much more time and much more money and much more research planning to measure it. So instead, we often substitute the simplistic metrics for the qualitative studies. Sadly, when we drive our programs by those simplistic metrics, we optimize to them and we gradually teach to the superficial and abandon the depth. Many of us in the teaching community in the United States believe that over the past few years, this has had a serious negative impact on the quality of the public educational system and that this poses a grave threat to our long-term national competitiveness.

Most computer science programs treat system-level software testing as unfit for the classroom.

I think that software testing can have great value, that it can be very important, and that a good curriculum should have an emphasis on skilled software testing. But the popular mix of ritual, tedium, and moralizing that has been passed off by some people as testing for decades has little to offer our field, and even less for university instruction. I think ISTQB has been masterful at selling that mix. It is easy to learn and easy to certify. I’m sure that a new emphasis, “New! Improved! Now with Evidence!” could market the mix even better. Just as worthless, but with even better packaging.

Conference of the Association for Software Testing

March 18th, 2010

I’ll be keynoting at CAST on investment modeling and exploratory test automation.

In its essence, exploratory testing is about learning new things about the quality of the software under test. Exploratory test automation is about using software to help with that learning.

Testing the value of investment models is an interesting illustration because it is all about the quality of the software, it is intensely automated, and none of the tests are regression tests. This is one illustration of automated exploration. I’ll point to others in my talk and paper.
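
Here is a deliberately simplified sketch of what I mean by exploratory test automation. The evaluate_strategy function and the loss-based oracle are hypothetical stand-ins, not the models from the course or the talk; the point is the shape of the loop: every run generates a new scenario, nothing is a regression test, and the oracle only flags results that deserve a human’s attention.

```python
import random

# Hypothetical sketch of high-volume exploratory test automation.
# evaluate_strategy stands in for the software under test (for example, an
# investment model); the oracle flags runs that look like trouble.

def evaluate_strategy(prices, threshold):
    """Toy stand-in for the model under test: buy one share whenever the price dips below threshold."""
    cash, shares = 1000.0, 0
    for price in prices:
        if price < threshold and cash >= price:
            shares += 1
            cash -= price
    return cash + shares * prices[-1]  # final portfolio value

def explore(runs=1000, seed=0):
    rng = random.Random(seed)
    suspects = []
    for run in range(runs):
        # A new randomized scenario every run -- no test is ever repeated.
        prices = [round(rng.uniform(5, 50), 2) for _ in range(250)]
        threshold = rng.uniform(5, 50)
        value = evaluate_strategy(prices, threshold)
        # Simple oracle heuristics: "it lost half its money" or "the value is not a number."
        if value < 500.0 or value != value:  # value != value catches NaN
            suspects.append((run, round(threshold, 2), round(value, 2)))
    return suspects

if __name__ == "__main__":
    N = 1000
    flagged = explore(runs=N)
    print(f"{len(flagged)} of {N} runs flagged for follow-up investigation")
```

Each flagged run is a starting point for human investigation, and changing the generators or the oracle shifts the focus of the search.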

THERE’S STILL TIME TO SUBMIT YOUR OWN PROPOSAL FOR A PRESENTATION. (The official deadline is March 20, but if you’re a few days late, I bet the conference committee will still read your proposal.)

Here’s the link: http://conferences.associationforsoftwaretesting.org/

What’s different about CAST is that we genuinely welcome discussion and debate. Any participant can ask questions of any speaker. Any participant can state her or his counter-argument to a speaker. We’ll keep taking questions all day. Artificial time limits are not used to cut off discussion.  After all, the point of a conference is “conferring.”

See you there…

An award from ACM

March 2nd, 2010

The Association for Computing Machinery’s Computers & Society Special Interest Group just honored me with their “Person Who Made a Difference” award. Here’s what it’s about:

Making a Difference Award

This award is presented to an individual who is widely recognized for work related to the interaction of computers and society. The recipient is a leader in promoting awareness of ethical and social issues in computing. The recipients of this award and the award itself encourage responsible action by computer professionals.

This is a once-in-a-lifetime award, which goes to (at most) one person per year.

ACM has printed a biographic summary that highlights the work that led to this award. We’ll probably add an interview soon.

The award presentation and talk will probably be at the Computers, Freedom & Privacy conference in San Jose in June.

I’ll probably also be talking about this work at a QAI meeting in Chicago, this May.

A few new articles

December 26th, 2009

I finally found some time to update my website, posting links to some more of my papers and presentations.

There are a few themes:

  • Investment modeling as a new exemplar. Software testing helps us understand the quality of the product or service under test. There are generically useful approaches to test design, like quicktests and tours and other basic techniques, but I think we add our greatest value when we apply a deeper understanding of the application. Testing instructors don’t teach this well because it takes so long to build a classroom-wide understanding of an application that is complex enough to be interesting. For the last 14 months, I’ve been exploring investment modeling as a potential exemplar of deep and interesting testing within a field that many people can grasp quickly.
  • Exploratory test automation. I don’t understand why people say that exploratory testing is always manual testing. Doug Hoffman and I have been teaching “high-volume” test techniques for twelve years (I wrote about some of these back in Testing Computer Software); these techniques don’t involve regression testing or test-by-test scripting. We run them to explore new risks; we change our parameters to shift the focus of our search (to try something new or to go further in depth if we’re onto something interesting). This is clearly exploratory, but it is intensely automated. I’m now using investment modeling to illustrate this, and starting to work with Scott Barber to use performance modeling to illustrate it as well. Doug is working through a lot of historical approaches; perhaps the three of us can integrate our work, along with a lot of interesting work published by other folks, into something that more clearly conveys the general idea.
  • Instructional design: Teaching software testing. Rebecca Fiedler, Scott Barber and I have worked through a model for online education in software testing that fosters deeper learning than many other approaches. The Association for Software Testing has been a major testbed for this approach. We’ve also been doing a lot in academic institutions, comparing notes in detail with faculty at other schools.
  • The evolving law of software quality. Federal and state legislatures have failed to adopt laws governing software contracting and software quality. Because of this, American judges have had to figure out for themselves what legal rules should be applied–until Congress or the state legislatures finally get around to giving clear and constitutional guidance to the courts. This spring, the American Law Institute unanimously adopted the Principles of the Law of Software Contracts, which includes some positions that I’ve been advocating for 15 years. The set of papers below includes some discussion of the Principles. In addition, I’m kicking off a wiki-based project to update my book, Bad Software, to give customers good advice about their rights and their best negotiating tactics under the current legal regime. I’ll blog more about this later, looking for volunteers to help update the book.

Here’s the list of new stuff:

  1. Cem Kaner, “Exploratory test automation: Investment modeling as an example.” [SLIDES]. ImmuneIT, Amsterdam, October 2009.
  2. Cem Kaner, “Investment modeling: A software engineer’s approach.” [SLIDES]. Colloquium, Florida Institute of Technology, October 2009.
  3. Cem Kaner, “Challenges in the Evolution of Software Testing Practices in Mission-Critical Environments.” [SLIDES]. Software Test & Evaluation Summit/Workshop (National Defense Industrial Association), Reston VA, September 2009.
  4. Cem Kaner, “Approaches to test automation.” [SLIDES]. Research in Motion, Kitchener/Waterloo, September 2009.
  5. Cem Kaner, “Software Testing as a Quality-Improvement Activity” [SLIDES]. Lockheed Martin / IEEE Computer Society Webinar Series, September 2009.
  6. Rebecca L. Fiedler & Cem Kaner, “Putting the context in context-driven testing (an application of Cultural Historical Activity Theory)” [SLIDES]. Conference of the Association for Software Testing. Colorado Springs, CO., July 2009.
  7. Cem Kaner, “Metrics, qualitative measurement, and stakeholder value” [SLIDES]. Tutorial, Conference of the Association for Software Testing. Colorado Springs, CO., July 2009.
  8. Cem Kaner, “The value of checklists and the danger of scripts: What legal training suggests for testers.” [SLIDES]. Conference of the Association for Software Testing. Colorado Springs, CO., July 2009.
  9. Cem Kaner, “New rules adopted for software contracts.” [SLIDES]. Conference of the Association for Software Testing. Colorado Springs, CO., July 2009.
  10. Cem Kaner, “Activities in software testing education: a structure for mapping learning objectives to activity designs“. Software Testing Education Workshop (International Conference on Software Testing), Denver, CO, April 2009.
  11. Cem Kaner, “Plagiarism-detection software: Clashing intellectual property rights and aggressive vendors yield dismaying results.” [SLIDES] [VIDEO]. Colloquium, Florida Institute of Technology, October 2009.
  12. Cem Kaner, “Thinking about the Software Testing Curriculum.” [SLIDES]. Workshop on Integrating Software Testing into Programming Courses, Florida International University, March 2009.
  13. Cem Kaner (initial draft), “Dimensions of Excellence in Research“. Department of Computer Sciences, Florida Institute of Technology, Spring 2009.
  14. Cem Kaner, “Patterns of activities, exercises and assignments.” [SLIDES]. Workshop on Teaching Software Testing, Melbourne FL, January 2009.
  15. Cem Kaner & Rebecca L. Fiedler, “Developing instructor-coached activities for hybrid and online courses.” [SLIDES]. Workshop at Inventions & Impact 2: Building Excellence in Undergraduate Science, Technology, Engineering & Mathematics (STEM) Education, National Science Foundation / American Association for the Advancement of Science, Washington DC, August 2008.
  16. Cem Kaner, Rebecca L. Fiedler, & Scott Barber, “Building a free courseware community around an online software testing curriculum.” [SLIDES]. Poster Session at Inventions & Impact 2: Building Excellence in Undergraduate Science, Technology, Engineering & Mathematics (STEM) Education, National Science Foundation / American Association for the Advancement of Science, Washington DC, August 2008.
  17. Cem Kaner, Rebecca L. Fiedler, & Scott Barber, “Building a free courseware community around an online software testing curriculum.” [SLIDES]. MERLOT conference, Minneapolis, August 2008.
  18. Cem Kaner, “Authentic assignments that foster student communication skills” [SLIDES], Teaching Communication Skills in the Software Engineering Curriculum: A Forum for Professionals and Educators (NSF Award #0722231), Miami University, Ohio, June 2008.
  19. Cem Kaner, “Comments on the August 31, 2007 Draft of the Voluntary Voting System Guidelines.” Submitted to the United States Election Assistance Commission, May 2008.
  20. Cem Kaner and Rebecca L. Fiedler, “A cautionary note on checking software engineering papers for plagiarism.” IEEE Transactions on Education, vol. 51, issue 2, 2008, pp. 184-188.
  21. Cem Kaner, “Software testing as a social science,” [SLIDES] STEP 2000 Workshop on Software Testing, Memphis, May 2008.
  22. Cem Kaner & Stephen J. Swenson, “Good enough V&V for simulations: Some possibly helpful thoughts from the law & ethics of commercial software.” [SLIDES] Simulation Interoperability Workshop, Providence, RI, April 2008.
  23. Cem Kaner, “Improve the power of your tests with risk-based test design.” [SLIDES] QAI QUEST Conference, Chicago, April 2008
  24. Cem Kaner, “Risk-based testing: Some basic concepts.” [SLIDES] QAI Managers Workshop, QUEST Conference, Chicago, April 2008
  25. Cem Kaner, “A tutorial in exploratory testing.” [SLIDES] QAI QUEST Conference, Chicago, April 2008
  26. Cem Kaner, “Adapting Academic Course Materials in Software Testing for Industrial Professional Development.” [SLIDES] Colloquium, Florida Institute of Technology, March 2008
  27. Cem Kaner, “BBST at AST: Adaptation of a course in black box software testing.” [SLIDES]. Workshop on Teaching Software Testing, Melbourne FL, January 2008.
  28. Cem Kaner, “BBST: Evolving a course in black box software testing.” [SLIDES] BBST Project Advisory Board Meeting, January 2008

Principles of the Law of Software Contracts approved

May 21st, 2009

“The Proposed Final Draft of the Principles of the Law of Software Contracts was approved, subject to the discussion at the meeting and to editorial prerogative. Approval of the draft clears the way for publication of the official text of this project.” (from a report to members on the  actions taken this week at the American Law Institute’s annual meeting.)

I’ll talk more about these at CAST, this summer.

This document will influence court decisions across the United States. It is the counterbalance to the Uniform Computer Information Transactions Act, which vastly increased software sellers’ powers and virtually wiped out customers’ ability to hold companies accountable for bad software. UCITA passed in 2 states, then died because it was so widely seen as unbalanced.

The main provisions of the Principles that affect us:

(1) Companies will be required to reveal known defects at time of sale

(2) Reverse engineering will be more legally defensible. People will now have ALMOST as much right to reverse engineer software, in the United States, as they have for every other kind of product in the US. This brings us closer to international standards, making our development efforts less uncompetitive compared to most other high-tech countries.

I helped write the Principles. I wish I could give you more details about the discussion at the meeting (and will be able to by CAST). Unfortunately, I got a nasty virus last week and could not travel.

One comment. Earlier in the week, there was a lot of baloney on the web about a carefully timed letter from Microsoft and the Linux Foundation that (a) pleaded for delay because they said they needed more time to review the draft and (b) said that the disclosure requirements were very new and onerous.

Actually, the community has been aware of proposals for disclosure since 1995, when Ed Foster published widely-read articles on UCITA (then called Article 2B) in Infoworld, which were followed up by a lot of mass-media attention. There have been several follow-up reports to our community (software development, software testing) since then, including talks that I’ve given at previous CAST meetings.

In terms of awareness by LAWYERS, Microsoft has been involved in the drafting process for UCITA and the ALI Principles for longer than I have (I started working on this in 1995; I think they started in 1989). The Open Source communities have been more variable in their activism on these laws, but several attorneys within that community have been active. More to the point, the Principles specifically exempt open source software from the disclosure rule because the distribution models (and availability of code) are so different from traditional proprietary software. The Microsoft/Linux Foundation letter also complained that the ALI is treating all software transfer as if it were packaged software. This is a criticism that was applied to early drafts of UCITA (which Microsoft, Apple and IBM played heavy roles in writing) but that was largely cleared up before UCITA was introduced to state legislatures in 2001. The ALI Principles were started after that, well after everyone in the process understood the variety of distribution models for software. Letters like this make good copy on slashdot and in blogs whose authors don’t know much about the law or history of the work they’re blogging about, but as serious criticisms, they seem devoid of merit.

What is context-driven testing?

January 3rd, 2009

James, Bret and I published our definition of context-driven testing at http://www.context-driven-testing.com/.

Some people have found the definition too complex and have tried to simplify it, attempting to equate the approach with Agile development or Agile  testing, or with the exploratory style of software testing. Here’s another crack at a definition:

Context-driven testers choose their testing objectives, techniques, and deliverables (including test documentation) by looking first to the details of the specific situation, including the desires of the stakeholders who commissioned the testing. The essence of context-driven testing is project-appropriate application of skill and judgment. The Context-Driven School of testing places this approach to testing within a humanistic social and ethical framework.

Ultimately, context-driven testing is about doing the best we can with what we get. Rather than trying to apply “best practices,” we accept that very different practices (even different definitions of common testing terms) will work best under different circumstances.

Contrasting context-driven with context-aware testing.

Many testers think of their approach as context-driven because they take contextual factors into account as they do their work. Here are a few examples that might illustrate the differences between context-driven and context-aware:

  • Context-driven testers reject the notion of best practices, because that notion presents certain practices as appropriate independent of context. Of course it is widely accepted that any “best practice” might be inapplicable under some circumstances. However, when someone looks to best practices first and to project-specific factors second, that may be context-aware, but it is not context-driven.
  • Similarly, some people create standards, like IEEE Standard 829 for test documentation, because they think that it is useful to have a standard to lay out what is generally the right thing to do. This is not unusual, nor disreputable, but it is not context-driven. Standard 829 starts with a vision of good documentation and encourages the tester to modify what is created based on the needs of the stakeholders. Context-driven testing starts with the requirements of the stakeholders and the practical constraints and opportunities of the project. To the context-driven tester, the standard provides implementation-level suggestions rather than prescriptions.

Contrasting context-driven with context-oblivious, context-specific, and context-imperial testing.

To say “context-driven” is to distinguish our approach to testing from context-oblivious, context-specific, or context-imperial approaches:

  • Context-oblivious testing is done without a thought for the match between testing practices and testing problems. This is common among testers who are just learning the craft, or are merely copying what they’ve seen other testers do.
  • Context-specific testing applies an approach that is optimized for a specific setting or problem, without room for adjustment in the event that the context changes. This is common in organizations with longstanding projects and teams, wherein the testers may not have worked in more than one organization. For example, one test group might develop expertise with military software, another group with games. In the specific situation, a context-specific tester and a context-driven tester might test their software in exactly the same way. However, the context-specific tester knows only how to work within her or his one development context (MilSpec, or games), and s/he is not aware of the degree to which skilled testing will be different across contexts.
  • Context-imperial testing insists on changing the project or the business in order to fit the testers’ own standardized concept of “best” or “professional” practice, instead of designing or adapting practices to fit the project. The context-imperial approach is common among consultants who know testing primarily from reading books, or whose practical experience was context-specific, or who are trying to appeal to a market that believes its approach to development is the one true way.

Contrasting context-driven with agile testing.

Agile development models advocate for a customer-responsive, waste-minimizing, humanistic approach to software development and so does context-driven testing. However, context-driven testing is not inherently part of the Agile development movement.

  • For example, Agile development generally advocates for extensive use of unit tests. Context-driven testers will modify how they test if they know that unit testing was done well. Many (probably most) context-driven testers will recommend unit testing as a way to make later system testing much more efficient. However, if the development team doesn’t create reusable test suites, the context-driven tester will suggest testing approaches that don’t expect or rely on successful unit tests.
  • Similarly, Agile developers often recommend an evolutionary or spiral life cycle model with minimal documentation that is developed as needed. Many (perhaps most) context-driven testers would be particularly comfortable working within this life cycle, but it is no less context-driven to create extensively-documented tests within a waterfall project that creates big documentation up front.

Ultimately, context-driven testing is about doing the best we can with what we get. There might not be such a thing as Agile Testing (in the sense used by the agile development community) in the absence of effective unit testing, but there can certainly be context-driven testing.

Contrasting context-driven with standards-driven testing.

Some testers advocate favored life-cycle models, favored organizational models, or favored artifacts. Consider for example, the V-model, the mutually suspicious separation between programming and testing groups, and the demand that all code delivered to testers come with detailed specifications.

Context-driven testing has no room for this advocacy. Testers get what they get, and skilled context-driven testers must know how to cope with what comes their way. Of course, we can and should explain tradeoffs to people, make it clear what makes us more efficient and more effective, but ultimately, we see testing as a service to stakeholders who make the broader project management decisions.

  • Yes, of course, some demands are unreasonable and we should refuse them, such as demands that the tester falsify records, make false claims about the product or the testing, or work unreasonable hours. But this doesn’t mean that every stakeholder request is unreasonable, even some that we don’t like.
  • And yes, of course, some demands are absurd because they call for the impossible, such as assessing conformance of a product with contractually-specified characteristics without access to the contract or its specifications. But this doesn’t mean that every stakeholder request that we don’t like is absurd, or impossible.
  • And yes, of course, if our task is to assess conformance of the product with its specification, we need a specification. But that doesn’t mean we always need specifications or that it is always appropriate (or even usually appropriate) for us to insist on receiving them.

There are always constraints. Some of them are practical, others ethical. But within those constraints, we start from the project’s needs, not from our process preferences.

Context-driven techniques?

Context-driven testing is an approach, not a technique. Our task is to do the best testing we can under the circumstances–the more techniques we know, the more options we have available when considering how to cope with a new situation.

The set of techniques–or better put, the body of knowledge–that we need is not just a testing set. In this, we follow in Gerry Weinberg’s footsteps: Start to finish, we see a software development project as a creative, complex human activity. To know how to serve the project well, we have to understand the project, its stakeholders, and their interests. Many of our core skills come from psychology, economics, ethnography, and the other social sciences.

Closing notes

Reasonable people can advocate for standards-driven testing. Or for the idea that testing activities should be routinized to the extent that they can be delegated to less expensive and less skilled people who apply the routine directions. Or for the idea that the biggest return on investment today lies in improving those testing practices intimately tied to writing the code. These are all widely espoused views. However, even if their proponents emphasize the need to tailor these views to the specific situation, these views reflect fundamentally different starting points from context-driven testing.

Cem Kaner, J.D., Ph.D.
James Bach

Ed Foster is dead–A great loss for mass-market computing

July 29th, 2008

Ed Foster just died.

Ed was one of the great journalists of Silicon Valley. He listened. He read. He asked probing questions. He changed his mind when the evidence proved him wrong.  He understood the computer and information industries from (at least) a dozen perspectives. And he could explain their perspectives to each other.

Ed was part of the heart of Silicon Valley in the early years of the small computer revolution. He was a whirlwind of well-informed enthusiasm. He taught us about the culture and the values of the Valley.  Consumers, hackers, publishers, marketers were all entertained and informed by him. Directly and indirectly, he shaped our thinking about the potential of this new technology and the responsibilities of these new technologists within the technology-enthusiast society and the broader American society.

I think I first met Ed in the 1980’s at one of the trade shows, but I didn’t have the privilege of talking with him in depth, then of collaborating with him, until the mid-1990’s.

Ed was the first journalist to publicize a series of projects to rewrite the commercial laws governing computers and software.

These new laws were being presented to people, mainly to the broad legal community (no one else was listening in those early days; well, almost no one: Ed was listening), as if they were a careful balance of the rights of consumers, small business customers, small software developers, the open source community, and bigger software publishers and hardware makers. On the surface, they looked that way. Beneath that surface was a new legal regime designed to give software publishers, database publishers and large software consulting firms a panoply of new rights and defenses.

I was a newly-graduated lawyer when Ed’s comments alerted me to the new stuff on the horizon. I went to school intending to work on the law of software quality. Following up Ed’s leads shaped my career.

As Ed (with some help from me) caught a glimmer of the scope and significance of these proposals, Ed got to work. He read voraciously. He came to meetings. He interviewed and interviewed and interviewed people. He checked facts. He checked his assumptions and conclusions. He took advice from people who disagreed with him as well as from those who agreed. He wrote with care and credibility. His leadership brought dozens of other journalists into coverage of the nuts and bolts of development of highly technical commercial law–something almost never covered by the press. He helped them understand what they were seeing. He was A Force To Be Reckoned With, not because he represented people with power or money but because he did his homework and knew how to explain what he knew to ordinary readers. The most visible bills of this group were the Uniform Computer Information Transactions Act (which was ultimately rejected by 48 of 50 States) and the Uniform Electronic Transactions Act (which improved tremendously under the bright light of public scrutiny. You might know it better under its federal name, ESIGN. It governs electronic commerce in every State). Without Ed, neither result would have happened.

One of the proposals that Ed embraced with passion was the idea that software companies should be required to disclose their known defects.  There’s a natural justice in the idea that a company who knows about a bug but won’t tell its customers about it should be responsible to those customers for any losses caused by the known bug.  You can’t find every bug. But if you honestly and effectively disclose, your customers can at least work around the bugs you know about (or buy a product whose bugs are less serious). Ed could make this natural justice clear and obvious. The implementation of the idea (writing it into a set of laws) is complex–you can easily get lost in the difficult details–but Ed could stand above that and remind people why the implementation was worth the effort. I was initially a skeptic–I favored the idea but saw other matters as more critical. Ed turned my priorities around, leading me gradually to understand that there is no real competition and no hope for justice in a marketplace that allows vendors to hide fundamental information from the people who most need it.

The UCITA drafters dismissed this as naive, unreasonable, excessively burdensome, or impossible to do. But now that UCITA has failed, a major new drafting project (the American Law Institute’s Principles of the Law of Software Contracting) has picked up the idea. It will probably appear in serious legislation in 2012 or so, almost twenty years  after Ed started explaining it to people. I am sad that Ed will miss seeing this come to fruition. Among his many gifts to American society, this is an important one.

In this decade, Ed has been one of this industry’s extremely few consumer advocates. Since 2000, when consumer protection at the Federal level went dark and State-level protection continued to vanish in the never ending waves of tax cuts and litigation “reforms,” I have learned more about the pulse of consumer problems from Ed’s alerts than from any other source.

I teach courses on computer law and ethics these days, to budding software engineers. Ed’s work provided perfect starting points for many students. Some probably learned more about professionalism and ethics from Ed’s writing than from anything in their textbooks or my lectures.

Ed wrote as a voice of the conscience of an industry that needs to find its conscience.

I am writing through tears as I say that he will be missed.

— Cem Kaner

Keynote address at CAST

July 19th, 2008

This year’s theme at CAST was multidisciplinary approaches to testing. Tying together my experience in psychological research, legal practice, and testing, I gave the keynote address: The Value of Checklists and the Danger of Scripts: What Legal Training Suggests for Testers. At its core, the talk says this:

In the hands of professionals, checklists facilitate the exercise of judgment by the human professional. In contrast, scripts attempt to mechanize the task (whether a machine runs the test or a human being treated as a machine). Professionals often use checklists and, in my experience, rarely use scripts. When I went to law school, professors spoke often about reliance on script-like tools–in essence, they described them as paving stones on a fast highway to malpractice. My training and practice in psychology and law laid a foundation for my skepticism about linking scripting to “professionalism” in software testing.

Checklists (of many kinds) are fine tools to help testers prepare for exploratory testing or to document their work. Scripts are fine tools for bug reporting (“do these things to get the failure”) and necessary tools for dealing with regulators who demand them, but as test planning tools, they are an anti-professional practice.

Along with the overview, the talk provided several examples of different types of checklists, with thoughts on how to apply them to testing.

https://kaner.com/pdfs/ValueOfChecklists.pdf

Defining Exploratory Testing

July 14th, 2008

In January 2006, several of us worked on definitions of exploratory testing at the Exploratory Testing Research Summit, with follow-up work at the Workshop on Heuristic and Exploratory Techniques in May. We didn’t reach unanimous agreement at that meeting. However, later discussions based on the meeting’s notes yielded this definition:

“Exploratory software testing is a style of software testing that emphasizes the personal freedom and responsibility of the individual tester to continually optimize the value of her work by treating test-related learning, test design, test execution, and test result interpretation as mutually supportive activities that run in parallel throughout the project.”

I cannot present this as a consensus definition from WHET, both because (a) not everyone who came to WHET would agree and (b) several other people helped polish it. But I am uncomfortable presenting it as if it is primarily from James Bach, Jon Bach, Mike Bolton, and/or me. The authorship / endorsement list should be much broader.

If you would like to be identified either as one of the authors of this definition or as an endorser of it, can you please post a comment to this blog or send me an email (kaner@kaner.com)?

AST Instructors’ Tutorial at CAST in Toronto

May 28th, 2008

You’ve read about the Association for Software Testing’s free software testing courses. Now find out how you can get involved in teaching these for AST, for your company, or independently. This workshop will use presentations, lectures, and hands-on exercises to address the challenges of teaching online: Becky Fiedler, Scott Barber, and I will host the Live! AST Instructors Orientation Course Jumpstart Tutorial on July 17, 2008, in conjunction with this year’s Conference of the AST (CAST).

To register for this Tutorial, go to https://www.123signup.com/register?id=tbzrz

To register for CAST (early registration ends June 1)  go to http://www.cast2008.org/Registration

More Details

AST is developing a series of online courses, free to our members, each taught by a team of volunteer instructors. AST will grant a certificate in software testing to members who have completed 10 courses. (We hope to develop our 10th course by July 2009).

AST trains and certifies instructors: The main requirements for an AST member to be certified to teach an AST course are (a) completing the AST course on teaching online, (b) teaching the same course three times under supervision, (c) approval by the currently-certified instructors for that course, and (d) agreement to teach the course a few times for AST (for free).

As a certified instructor, you can offer the course for AST credit: for AST (for free), for your company, or on your own. You or your company can charge fees without paying AST any royalties or other fees. (AST can only offer each free course a few times per year–if demand outstrips that supply, instructors will have a business opportunity to fill that gap.)

This Tutorial, the day after CAST, satisfies the Instructors Orientation Course requirement for prospective AST-certified instructors.

This workshop will use presentations, lectures, and hands-on exercises to address the challenges of teaching online. (Bring your laptop and wireless card if you can.) The presenters will merge instructional theory and assessment theory to show you how they developed the AST-BBST online instructional model. Over lunch, Scott Barber will lead a panel discussion of AST members who are working on AST Instructor Certification.

Your registration includes a boxed lunch and light snacks in the morning and afternoon.

This workshop is partially based on research that was supported by NSF Grants EIA-0113539 ITR/SY+PE: “Improving the Education of Software Testers” and CCLI-0717613 “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.” Any opinions, findings and conclusions or recommendations expressed in this workshop are those of the presenter(s) and do not necessarily reflect the views of the National Science Foundation.

Sponsorship opportunities for CAST2008 still available

May 23rd, 2008

There are still sponsorship opportunities available for CAST08, the annual conference of the Association for Software Testing. The AST is a non-profit professional association with a mission of advancing the understanding and practice of software testing through conferences, LAWST™-style peer workshops, publications, training courses, web sites, and other services.

This is an exciting year for CAST. We are planning for 350 participants to join us this year at 89 Chestnut in Toronto, Canada. We’ve secured Gerald M. Weinberg, Cem Kaner, Robert Sabourin, and Brian Fish as keynote speakers related to the conference theme of Beyond the Boundaries: Interdisciplinary Approaches to Software Testing. Pre-conference tutorials are being given by Jerry Weinberg, Scott Barber, Hung Nguyen, and Julian Harty. The rest of the program is equally impressive in terms of content and speaker quality. As an added bonus, Jerry has scheduled the official launch of his new book Perfect Software and Other Testing Myths to be at CAST. These are just some of the reasons that CAST attracts some of the most well known and influential testers, testing consultants, and trainers of testers from around the globe to attend as participants.

CAST’s Sponsorship and Exhibitor Programs include what you’d expect from a conference in our industry, but probably at a lower price than you might be used to. There are packages where promotional material is delivered to participants on your behalf, meal and artifact (participant backpacks, proceedings CDs, etc.) sponsorship opportunities, as well as exhibitor booth options. Additionally, this year we have added a “Vendors & Service Providers Only” track at the conference. This is where you get to come and discuss, demonstrate, answer questions about, and/or introduce your products and services to interested CAST participants. We do require vendors to follow the same 2/3, 1/3 rule that CAST mandates for all sessions: you present for up to 2/3 of the total time slot, and you leave at least 1/3 of the time for facilitated questions and answers. Last year we piloted this idea with 2 sessions. Participants *loved* that the vendors were presenting then sticking around to field hard questions, and the vendors who presented reported that in addition to high quality leads and contacts, they gained significantly more valuable market information than they get as exhibitors or vendor-presenters at other conferences.

I hope you find all the information you need to decide what level of sponsorship is right for you in the link below, but if you do not find what you need, please do not hesitate to contact AST’s Executive Director and this year’s CAST Sponsorship Chairperson, Scott Barber at
executive.director@associationforsoftwaretesting.org

CAST 2008 Sponsorship and Exhibitor Program:
http://www.cast2008.org/sponsors

Software error and the meltdown of US finances

May 22nd, 2008

“LONDON, May 21 (Reuters) – A computer coding error led Moody’s Investors Service to assign incorrect triple-A ratings to a complex debt product that came to mark the peak of the credit boom, the Financial Times said on Wednesday.” (See www.forbes.com/reuters/feeds/reuters/2008/05/21/2008-05-21T075644Z_01_L21551923_RTRIDST_0_MOODYS-CPDOS.html. For more, see blogs.spectrum.ieee.org/riskfactor/2008/05/moodys_rating_bug_gives_credit.html and www.ft.com/cms/s/0/0c82561a-2697-11dd-9c95-000077b07658.html?nclick_check=1, or just search for Moody’s software error.)

This is the kind of stuff David Pels and I expected when we fought the Uniform Computer Information Transactions Act (UCITA) back in the 1990’s. UCITA was written as a shield for large software publishers, consulting firms, and other information publishers. It virtually wiped out liability for defects in information-industry products or services, expanded intellectual property rights well beyond what the Copyright Act and the patent laws provide, and helped companies find ways to expand their power to block reverse engineering of products to check for functional or security defects and to publicly report those defects.

UCITA was ultimately adopted only in Virginia and Maryland, rejected in all other American states, but largely imported into most states by judicial rulings (a fine example of “judicial activism”–imposing rules on a state even after its legislators rejected them. People who still whine about left-wing judicial activism are still stuck in the 1970’s).

David Pels and I wrote a book, “Bad Software,” on the law of software quality circa 1998. It provides a striking contrast between software customers’ rights in the 1990s and the vastly-reduced rights we have come to expect today, along with some background on the UCITA legislation (UCITA was then called “Article 2B”–as part of a failed effort to add a new Article to the Uniform Commercial Code). John Wiley originally published Bad Software, but they have let me post a free copy on the web. You can find it at http://www.badsoftware.com/wiki/

Political activism

May 19th, 2008

As you’ve been able to see for a while, from his campaign picture on my blog, I support Barack Obama for U.S. president.

This blog isn’t the vehicle that I want to use for political discussion. Instructional theory, engineering practice, and the legal context (the evolution of the law of software quality) yes. Election-year politics, no.

If you’re interested in my more political views, see my new blog at http://my.barackobama.com/page/community/blog/cemkaner

Program published for CAST (Conference of the Association for Software Testing)

April 25th, 2008

The CAST program is at http://www.associationforsoftwaretesting.org/drupal/CAST2008/Program

CAST is a special conference because of its emphasis on open discussion and critical thinking.

For example, all speakers, including keynotes, are subject to unlimited questioning. If the discussion of a keynote goes beyond the end of the keynote time, we either move the next session to a new room, or move the keynote discussion to a new room. For example, the “questioning” for a keynote by James Bach included a full critical counterpresentation by one of the people in the audience, followed by a long debate. Several 1-hour track presentations have morphed into 2-hour (moderated) discussions.

Four more presentations

March 30th, 2008

“Adapting Academic Course Materials in Software Testing for Industrial Professional Development.” [SLIDES] Colloquium, Florida Institute of Technology, March 2008

The Association for Software Testing and I have been adapting the BBST course for online professional development. This presentation updates my students and colleagues at work on what we’re doing to transfer fairly rigorous academic course materials and teaching methods to a practitioner audience.

These next three are reworkings of presentations I’ve given a few times before:

“Software testing as a social science,” [SLIDES] STEP 2000 Workshop on Software Testing, Memphis, May 2008.

Social sciences study humans, especially humans in society. The social scientist’s core question, for any new product or technology is, “What will be the impact of X on people?” Social scientists normally deal with ambiguous issues, partial answers, situationally specific results, diverse interpretations and values– and they often use qualitative research methods. If we think about software testing in terms of the objectives (why we test) and the challenges (what makes testing difficult) rather than the methods and processes, then I think testing is more like a social science than like programming or manufacturing quality control. As with all social sciences, tools are important. But tools are what we use, not why we use them.

“The ongoing revolution in software testing,” [SLIDES] October 2007

My intent in this talk is to challenge an orthodoxy in testing, a set of commonly accepted assumptions about our mission, skills, and constraints, including plenty that seemed good to me when I published them in 1988, 1993 or 2001. Surprisingly, some of the old notions lost popularity in the 1990’s but came back under new marketing with the rise of eXtreme Programming.

I propose we embrace the idea that testing is an active, skilled technical investigation. Competent testers are investigators—clever, sometimes mischievous researchers—active learners who dig up information about a product or process just as that information is needed.

I think that

  • views of testing that don’t portray testing this way are obsolete and counterproductive for most contexts and
  • educational resources for testing that don’t foster these skills and activities are misdirected and misleading.

“Software-related measurement: Risks and opportunities,” [SLIDES] October 2007

I’ve seen published claims that only 5% of software companies have metrics programs. Why so low? Are we just undisciplined and lazy? Most managers I know have tried at least one measurement program–and abandoned it because so many programs do more harm than good, at a high cost. This session has three parts:

  1. Measurement theory and how it applies to software development metrics (which, at their core, are typically human performance measures).
  2. A couple of examples of qualitative measurements that can drive useful behavior.
  3. (Consideration of client’s particular context–deleted.)

A few new papers and presentations

March 28th, 2008

I just posted a few papers to kaner.com. Here are the links and some notes:

Cem Kaner & Stephen J. Swenson, “Good enough V&V for simulations: Some possibly helpful thoughts from the law & ethics of commercial software.” Simulation Interoperability Workshop, Providence, RI, April 2008

What an interesting context for exploratory testers! Military application software that cannot be fully specified in advance, whose requirements and design will evolve continuously through the project. How should they test it? What is good enough? Stephen and I worked together to integrate ideas and references from both disciplines. The bibliography references a lot of very interesting papers on simulations that are available on the web.

Cem Kaner and Rebecca L. Fiedler, “A cautionary note on checking software engineering papers for plagiarism.” IEEE Transactions on Education, in press.

Journal of the Association for Software Testing hasn’t published its first issue yet, and we’re now rethinking our editorial objectives (more on that after the AST Board Meeting in April). One of the reasons is that over half of the papers submitted to the Journal were plagiarized. I’ve found significant degrees of plagiarism while reviewing submissions for other conferences and journals, and there’s too much in graduate theses/dissertations as well. One of the problems for scholarly publications and research supervisors is that current plagiarism-detection tools seem to promise more than they deliver. (Oh surprise, a software service that oversells itself!) This is the first of a series of papers on that problem.

Cem Kaner, “Improve the power of your tests with risk-based test design.” [SLIDES] QAI QUEST Conference, Chicago, April 2008

Conference keynote. My usual party line.

Cem Kaner, “A tutorial in exploratory testing.” [SLIDES] QAI QUEST Conference, Chicago, April 2008

Conference tutorial. This is a broader set of my slides on exploration than some people have seen before.

Cem Kaner, “BBST: Evolving a course in black box software testing.” [SLIDES] BBST Project Advisory Board Meeting, January 2008

Rebecca Fiedler and I lead a project to adapt a course on black box software testing from its traditional academic setting to commercial and academic-online settings. Much of the raw material (free testing videos) is at www.testingeducation.org/BBST. This is a status report to the project’s advisory board. If you’re interested in collaborating in this project, these slides will provide a lot more detail.

Cem Kaner, “I speak for the user: The problem of agency in software development,” Quality Software & Testing Magazine, April 2007

Short magazine article. Testers often appoint themselves as The User Representative. So do lots of other people. Who should take this role and why?

Cem Kaner & Sowmya Padmanabhan, “Practice and transfer of learning in the teaching of software testing,” [SLIDES] Conference on Software Engineering Education & Training, Dublin, July 2007.

This is a summary of Sowmya’s excellent M.Sc. thesis, “Domain Testing: Divide and Conquer.”

Writing Multiple Choice Test Questions

October 24th, 2007

SUMMARY

This is a tutorial on creating multiple choice questions, framed by Haladyna’s heuristics for test design and Anderson & Krathwohl’s update to Bloom’s taxonomy. My interest in computer-gradable test questions is to support teaching and learning rather than high-stakes examination. Some of the design heuristics are probably different for this case. For example, which is the more desirable attribute for a test question:

  a. defensibility (you can defend its fairness and appropriateness to a critic) or
  b. potential to help a student gain insight?

In high-stakes exams, (a) [defensibility] is clearly more important, but as a support for learning, I’d rather have (b) [support for insight].

This tutorial’s examples are from software engineering, but from my perspective as someone who has also taught psychology and law, I think the ideas are applicable across many disciplines.

The tutorial’s advice and examples specifically target three projects:

  • the quizzes and exams in the AST BBST (black box software testing) courses
  • the Open Certification exam
  • the engineering ethics learning units


STANDARDS SPECIFIC TO THE BBST AND OPEN CERTIFICATION QUESTIONS

1. Consider a question with the following structure:

Choose the answer:

  a. First option
  b. Second option

The typical way we will present this question is:

Choose the answer:

  a. First option
  b. Second option
  c. Both (a) and (b)
  d. Neither (a) nor (b)

  • If the correct answer is (c) then the examinee will receive 25% credit for selecting only (a) or only (b).

2. Consider a question with the following structure:

Choose the answer:

  a. First option
  b. Second option
  c. Third option

The typical way we will present this question is:

Choose the answer:

  a. First option
  b. Second option
  c. Third option
  d. (a) and (b)
  e. (a) and (c)
  f. (b) and (c)
  g. (a) and (b) and (c)

  • If the correct answer is (d), the examinee will receive 25% credit for selecting only (a) or only (b). Similarly for (e) and (f).
  • If the correct answer is (g) (all of the above), the examinee will receive 25% credit for selecting (d) or (e) or (f) but nothing for the other choices.

3. Consider a question with the following structure:

Choose the answer:

  a. First option
  b. Second option
  c. Third option
  d. Fourth option

The typical ways we might present this question are:

Choose the answer:

  a. First option
  b. Second option
  c. Third option
  d. Fourth option

OR

Choose the answer:

  a. First option
  b. Second option
  c. Third option
  d. Fourth option
  e. (a) and (c)
  f. (a) and (b) and (d)
  g. (a) and (b) and (c) and (d)

There will be a maximum of 7 choices.

The three combination choices can be any combination of two, three or four of the first four answers.

  • If the correct answer is like (e) (a pair), the examinee will receive 25% credit for selecting only (a) or only (b) and nothing for selecting a combination that includes (a) and (b) but also includes an incorrect choice.
  • If the correct answer is (f) (three of the four), the examinee will receive 25% credit for selecting a correct pair (if (a) and (b) and (d) are all correct, then any two of them get 25%) but nothing for selecting only one of the three or selecting a choice that includes two or three correct but also includes an incorrect choice.
  • If the correct answer is (g) (all correct), the examinee will receive a 25% credit for selecting a correct triple.
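To make those partial-credit rules concrete, here is a minimal sketch (in Python) of a grading function that follows them. The thresholds and option labels come from items 1-3 above; the function itself is my own illustration, not the actual BBST or Open Certification grading code.

    # A sketch of the partial-credit rules above (illustration only -- not the real
    # BBST / Open Certification grader). Each option is modeled as the set of simple
    # answers it asserts, e.g. "(a) and (b)" becomes frozenset({'a', 'b'}).

    def score(selected, keyed):
        """Return the credit for one question."""
        if selected == keyed:
            return 1.0  # exactly the keyed answer
        # 25% credit: a correct combination that falls one simple answer short of the
        # key (e.g. only (a) when the key is "(a) and (b)", or a correct pair when the
        # key is a triple). Anything that includes an incorrect simple answer gets 0.
        if selected and selected < keyed and len(selected) == len(keyed) - 1:
            return 0.25
        return 0.0

    keyed = frozenset('abd')               # e.g. the key is "(a) and (b) and (d)"
    print(score(frozenset('abd'), keyed))  # 1.0
    print(score(frozenset('ab'), keyed))   # 0.25 -- a correct pair
    print(score(frozenset('a'), keyed))    # 0.0  -- only one of the three
    print(score(frozenset('abc'), keyed))  # 0.0  -- includes the incorrect (c)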

DEFINITIONS AND EXAMPLES

Definitions

Here are a few terms commonly used when discussing the design of multiple choice questions. See the Reference Examples, below.

  • Test: In this article, the word “test” is ambiguous. Sometimes we mean a software test (an experiment that can expose problems in a computer program) and sometimes an academic test (a question that can expose problems in someone’s knowledge). In these definitions, “test” means “academic test.”

  • Test item: a test item is a single test question. It might be a multiple choice test question or an essay test question (or whatever).
  • Content item: a content item is a single piece of content, such as a fact or a rule, something you can test on.
  • Stem: The opening part of the question is called the stem. For example, “Which is the best definition of the testing strategy in a testing project? ” is Reference Example B’s stem.
  • Distractor: An incorrect answer. In Reference Example B, (b) and (c) are distractors.
  • Correct choice: The correct answer for Reference Example B is (a) “The plan for applying resources and selecting techniques to achieve the testing mission.”
  • The Question format: The stem is a complete sentence and asks a question that is answered by the correct choice and the distractors. Reference Example A has this format.
  • The Best Answer format: The stem asks a complete question. Most or all of the distractors and the correct choice are correct to some degree, but one of them is stronger than the others. In Reference Example B, all three answers are plausible but in the BBST course, given the BBST lectures, (a) is the best.
  • The Incomplete Stem format: The stem is an incomplete sentence that the correct choice and distractors complete. Reference Example C has this format.
  • Complex formats: In a complex-format question, the alternatives include simple answers and combinations of these answers. In Reference Example A, the examinee can choose (a) “We can never be certain that the program is bug free” or (d) which says that both (a) and (b) are true or (g) which says that all of the simple answers (a, b and c) are true.
  • Learning unit: A learning unit typically includes a limited set of content that shares a common theme or purpose, plus learning support materials such as a study guide, test items, an explicit set of learning objectives, a lesson plan, readings, lecture notes or video, etc.
  • High-stakes test: A test is high-stakes if there are significant benefits for passing the test or significant costs of failing it.

The Reference Examples

For each of the following, choose one answer.

A. What are some important consequences of the impossibility of complete testing?

  a. We can never be certain that the program is bug free.
  b. We have no definite stopping point for testing, which makes it easier for some managers to argue for very little testing.
  c. We have no easy answer for what testing tasks should always be required, because every task takes time that could be spent on other high importance tasks.
  d. (a) and (b)
  e. (a) and (c)
  f. (b) and (c)
  g. All of the above

B. Which is the best definition of the testing strategy in a testing project?

  a. The plan for applying resources and selecting techniques to achieve the testing mission.
  b. The plan for applying resources and selecting techniques to assure quality.
  c. The guiding plan for finding bugs.

C. Complete statement coverage means …

  a. That you have tested every statement in the program.
  b. That you have tested every statement and every branch in the program.
  c. That you have tested every IF statement in the program.
  d. That you have tested every combination of values of IF statements in the program.

D. The key difference between black box testing and behavioral testing is that:

  a. The test designer can use knowledge of the program’s internals to develop a black box test, but cannot use that knowledge in the design of a behavioral test because the behavioral test is concerned with behavior, not internals.
  b. The test designer can use knowledge of the program’s internals to develop a behavioral test, but cannot use that knowledge in the design of a black box test because the designer cannot rely on knowledge of the internals of the black box (the program).
  c. The behavioral test is focused on program behavior whereas the black box test is concerned with system capability.
  d. (a) and (b)
  e. (a) and (c)
  f. (b) and (c)
  g. (a) and (b) and (c)

E. What is the significance of the difference between black box and glass box tests?

  a. Black box tests cannot be as powerful as glass box tests because the tester doesn’t know what issues in the code to look for.
  b. Black box tests are typically better suited to measure the software against the expectations of the user, whereas glass box tests measure the program against the expectations of the programmer who wrote it.
  c. Glass box tests focus on the internals of the program whereas black box tests focus on the externally visible behavior.

ITEM-WRITING HEURISTICS

Several papers on the web organize their discussion of multiple choice tests around a researched set of advice from Haladyna, Downing & Rodriguez or the updated list from Haladyna (2004). I’ll do that too, tying their advice back to our needs in software testing.

Content Guidelines

1. Every item should reflect specific content and a single specific cognitive process, as called for in the test specifications (table of specifications, two-way grid, test blueprint).
2. Base each item on important content to learn; avoid trivial content.
3. Use novel material to measure understanding and the application of knowledge and skills.
4. Keep the content of an item independent from content of other items on the test.
5. Avoid overspecific and overgeneral content.
6. Avoid opinion-based items.
7. Avoid trick items.
8. Format items vertically instead of horizontally.

Style and Format Concerns

9. Edit items for clarity.
10. Edit items for correct grammar, punctuation, capitalization and spelling.
11. Simplify vocabulary so that reading comprehension does not interfere with testing the content intended.
12. Minimize reading time. Avoid excessive verbiage.
13. Proofread each item.

Writing the Stem

14. Make the directions as clear as possible.
15. Make the stem as brief as possible.
16. Place the main idea of the item in the stem, not in the choices.
17. Avoid irrelevant information (window dressing).
18. Avoid negative words in the stem.

Writing Options

19. Develop as many effective options as you can, but two or three may be sufficient.
20. Vary the location of the right answer according to the number of options. Assign the position of the right answer randomly.
21. Place options in logical or numerical order.
22. Keep options independent; choices should not be overlapping.
23. Keep the options homogeneous in content and grammatical structure.
24. Keep the length of options about the same.
25. “None of the above” should be used sparingly.
26. Avoid using “all of the above.”
27. Avoid negative words such as not or except.
28. Avoid options that give clues to the right answer.
29. Make all distractors plausible.
30. Use typical errors of students when you write distractors.
31. Use humor if it is compatible with the teacher; avoid humor in a high-stakes test.

Now to apply those to our situation.

CONTENT GUIDELINES

1. Every item should reflect specific content and a single specific cognitive process, as called for in the test specifications (table of specifications, two-way grid, test blueprint).

Here are the learning objectives from the AST Foundations course. Note the grid (the table), which lists the level of knowledge and skills in the course content and defines the level of knowledge we hope the learner will achieve. For discussions of level of knowledge, see my blog entries on Bloom’s taxonomy [1] [2] [3]:

Learning Objectives of the AST Foundations Course (Anderson / Krathwohl level in parentheses):

1. Familiar with basic terminology and how it will be used in the BBST courses. (Understand)
2. Aware of honest and rational controversy over definitions of common concepts and terms in the field. (Understand)
3. Understand there are legitimately different missions for a testing effort. Understand the argument that selection of mission depends on contextual factors. Able to evaluate relatively simple situations that exhibit strongly different contexts in terms of their implication for testing strategies. (Understand, simple evaluation)
4. Understand the concept of oracles well enough to apply multiple oracle heuristics to their own work and explain what they are doing and why. (Understand and apply)
5. Understand that complete testing is impossible. Improve ability to estimate and explain the size of a testing problem. (Understand, rudimentary application)
6. Familiarize students with the concept of measurement dysfunction. (Understand)
7. Improve students’ ability to adjust their focus from narrow technical problems (such as analysis of a single function or parameter) through broader, context-rich problems. (Analyze)
8. Improve online study skills, such as learning more from video lectures and associated readings. (Apply)
9. Improve online course participation skills, including online discussion and working together online in groups. (Apply)
10. Increase student comfort with formative assessment (assessment done to help students take their own inventory, think and learn rather than to pass or fail the students). (Apply)

For each of these objectives, we could list the items that we want students to learn. For example:

  • list the terms that students should be able to define
  • list the divergent definitions that students should be aware of
  • list the online course participation skills that students should develop or improve.

We could create multiple choice tests for some of these:

  • We could check whether students could recognize a term’s definition.
  • We could check whether students could recognize some aspect of an online study skill.

But there are elements in the list that aren’t easy to assess with a multiple choice test. For example, how can you tell whether someone works well with other students by asking them multiple choice questions? To assess that, you should watch how they work in groups, not read multiple-choice answers.

Now, back to Haladyna’s first guideline:

  • Use an appropriate type of test for each content item. Multiple choice is good for some, but not all.
  • If you use a multiple choice test, each test item (each question) should focus on a single content item. That might be a complex item, such as a rule or a relationship or a model, but it should be something that you and the student would consider to be one thing. A question spread across multiple issues is confusing in ways that have little to do with the content being tested.
  • Design the test item to assess the material at the right level (see the grid, above). For example, if you are trying to learn whether someone can use a model to evaluate a situation, you should ask a question that requires the examinee to apply the model, not one that just asks whether she can remember the model.

When we work with a self-contained learning unit, such as the individual AST BBST courses and the engineering ethics units, it should be possible to list most of the items that students should learn and the associated cognitive level.

However, for the Open Certification exam, the listing task is much more difficult because it is fair game to ask about any of the field’s definitions, facts, concepts, models, skills, etc. None of the “Body of Knowledge” lists are complete, but we might use them as a start for brainstorming about what would be useful questions for the exam.

The Open Certification (OC) exam is different from other high-stakes exams because the OC question database serves as a study guide. Questions that might be too hard in a surprise-test (a test with questions you’ve never seen before) might be instructive in a test database that prepares you for an exam derived from the database questions–especially when the test database includes discussion of the questions and answers, not just the barebones questions themselves.

2. Base each item on important content to learn; avoid trivial content.

The heuristic for Open Certification is: Don’t ask the question unless you think a hiring manager would actually care whether this person knew the answer to it.

3. Use novel material to measure understanding and the application of knowledge and skills.

That is, reword the idea you are asking about rather than using the same words as the lecture or assigned readings. This is important advice for a traditional surprise test because people are good matchers:

  • If I show you exactly the same thing that you saw before, you might recognize it as familiar even if you don’t know what it means.
  • If I want to be a nasty trickster, I can put exact-match (but irrelevant) text in a distractor. You’ll be more likely to guess this answer (if you’re not sure of the correct answer) because this one is familiar.

This is important advice for BBST because the student can match the words to the readings (in this open book test) without understanding them. In the open book exam, this doesn’t even require recall.

On the other hand, especially in the open book exams, I like to put exact matches in the stem. The stem is asking a question like, “What does this mean?” or “What can you do with this?” If you use textbook phrases to identify the “this,” then you are helping the student figure out where to look for possible answers. In the open book exam, the multiple choice test is a study aid. It is helpful to orient the student to something you want him to think about and read further about.

4. Keep the content of an item independent from content of other items on the test.

Suppose that you define a term in one question and then ask how to apply the concept in the next. The student who doesn’t remember the definition will probably be able to figure it out after reading the next question (the application).

It’s a common mistake to write an exam that builds forward without realizing that the student can read the questions and answer them in any order.

5. Avoid overspecific and overgeneral content.

The concern with questions that are overly specific is that they are usually trivial. Does it really matter what year Boris Beizer wrote his famous Software Testing Techniques? Isn’t it more important to know what techniques he was writing about and why?

There are some simple facts that we might expect all testers to know.

For example, what’s the largest ASCII code in the lower ASCII character set, and what character does it signify?

The boundary cases for ASCII might be core testing knowledge, and thus fair game.

However, in most cases, facts are easy to look up in books or with an electronic search. Before asking for a memorized fact, ask why you would care whether the tester had memorized that fact or not.

The concern with questions that are overly general is that they are also usually trivial–or wrong–or both.

6. Avoid opinion-based items.

This is obvious, right? A question is unfair if it asks for an answer that some experts would consider correct and rejects an answer that other experts would consider correct.

But we have this problem in testing.

There are several mutually exclusive definitions of “test case.” There are strong professional differences about the value of a test script or the utility of the V-model or even whether the V-model was implicit in the waterfall model (read the early papers) or a more recent innovation.

Most of the interesting definitions in our field convey opinions, and the Standards that assert the supposedly-correct definitions get that way by ignoring the controversies.

What tactics can we use to deal with this?

a. The qualified opinion.

For example, consider this question:

“The definition of exploratory testing is…”

and this answer:

“a style of software testing that emphasizes the personal freedom and responsibility of the individual tester to continually optimize the value of her work by treating test-related learning, test design, test execution, and test result interpretation as mutually supportive activities that run in parallel throughout the project.”

Is the answer correct or not?

Some people think that exploratory testing is bound tightly to test execution; they would reject the definition.

On the other hand, if we changed the question to,

“According to Cem Kaner, the definition of exploratory testing is…”

that long definition would be the right answer.

Qualification is easy in the BBST course because you can use the qualifier, According to the lecture. This is what the student is studying right now and the exam is open book, so the student can check the fact easily.

Qualification is more problematic for closed-book exams like the certification exam. In this general case, can we fairly expect students to know who prefers which definition?

The problem is that qualified opinions contain an often-trivial fact. Should we really expect students or certification-examinees to remember definitions in terms of who said what? Most of the time, I don’t think so.

b. Drawing implications

For example, consider asking a question in one of these ways:

  • If A means X, then if you do A, you should expect the following results.
  • Imagine two definitions of A: X and Y. Which bugs would you be more likely to expose if you followed X in your testing and which if you followed Y?
  • Which definition of X is most consistent with theory Y?

7. Avoid trick items.

Haladyna (2004, p. 104) reports work by Roberts that identified several types of (intentional or unintentional) tricks in questions:

    1. The item writer’s intention appeared to deceive, confuse, or mislead test takers.
    2. Trivial content was represented (which violates one of our item-writing guidelines)
    3. The discrimination among options was too fine.
    4. Items had window dressing that was irrelevant to the problem.
    5. Multiple correct answers were possible.
    6. Principles were presented in ways that were not learned, thus deceiving students.
    7. Items were so highly ambiguous that even the best students had no idea about the right answer.

Some other tricks that undermine accurate assessment:

    1. Put text in a distractor that is irrelevant to the question but exactly matches something from the assigned readings or the lecture.
    2. Use complex logic (such as not (A and B) or a double negative) — unless the learning being tested involves complex logic.
    3. Accurately qualify a widely discredited view: According to famous-person, the definition of X is Y, where Y is a definition no one accepts any more, but famous-person did in fact publish it.
    4. In the set of items for a question, leave grammatical errors in all but the second-best choice. (Many people will guess that the grammatically-correct answer is the one intended to be graded as correct.)

Items that require careful reading are not necessarily trick items. This varies from field to field. For example, my experience with exams for lawyers and law students is that they often require very precise reading. Testers are supposed to be able to do very fine-grained specification analysis.

Consider Example D:

D. The key difference between black box testing and behavioral testing is that:

The options include several differences that students find plausible. Every time I give this question, some students choose a combination answer (such as (a) and (b)). This is a mistake, because the question calls for “The key difference,” and that cannot be a collection of two or more differences.

Consider Example E:

E. What is the significance of the difference between black box and glass box tests?

A very common mistake is to choose this answer:

Glass box tests focus on the internals of the program whereas black box tests focus on the externally visible behavior.

The answer is an accurate description of the difference, but it says nothing about the significance of the difference. Why would someone care about the difference? What is the consequence of the difference?

Over time, students learn to read questions like this more carefully. My underlying assumption is that they are also learning or applying, in the course of this, skills they need to read technical documents more carefully. Those are important skills for both software testing and legal analysis and so they are relevant to the courses that are motivating this tutorial. However, for other courses, questions like these might be less suitable.

On a high-stakes exam, with students who had not had a lot of exam-preparation training, I would not ask these questions because I would not expect students to be prepared for them. On the high-stakes exam, the ambiguity of a wrong answer (might not know the content vs. might not have parsed the question carefully) could lead to the wrong conclusion about the student’s understanding of the material.

In contrast, in an instructional context in which we are trying to teach students to parse what they read with care, there is value in subjecting students to low-risk reminders to read with care.

STYLE AND FORMAT CONCERNS

8. Format items vertically instead of horizontally.

If the options are brief, you could format them as a list of items, one beside the next. However, these lists are often harder to read and it is much harder to keep formatting consistent across a series of questions.

9. Edit items for clarity.

I improve the clarity of my test items in several ways:

  • I ask colleagues to review the items.
  • I coteach with other instructors or with teaching assistants. They take the test and discuss the items with me.
  • I encourage students to comment on test items. I use course management systems, so it is easy to set up a question-discussion forum for students to query, challenge or complain about test items.

In my experience, it is remarkable how many times an item can go through review (and improvement) and still be confusing.

10. Edit items for correct grammar, punctuation, capitalization and spelling.

It is common for instructors to write the stem and the correct choice together when they first write the question. The instructor words the distractors later, often less carefully and in some way that is inconsistent with the correct choice. These differences become undesirable clues about the right and wrong choices.

11. Simplify vocabulary so that reading comprehension does not interfere with testing the content intended.

There’s not much point asking a question that the examinee doesn’t understand. If the examinee doesn’t understand the technical terms (the words or concepts being tested), that’s one thing. But if the examinee doesn’t understand the other terms, the question simply won’t reach the examinee’s knowledge.

12. Minimize reading time. Avoid excessive verbiage.

Students whose first language is not English often have trouble with long questions.

13. Proofread each item.

Despite editorial care, remarkably many simple mistakes survive review or are introduced by mechanical error (e.g. cutting and pasting from a master list to the test itself).

WRITING THE STEM

14. Make the directions as clear as possible.

Consider the following confusingly-written question:

A program will accept a string of letters and digits into a password field. After it accepts the string, it asks for a comparison string, and on accepting a new input from the customer, it compares the first string against the second and rejects the password entry if the strings do not match.

  1. There are 218340105584896 possible tests of 8-character passwords.
  2. This method of password verification is subject to the risk of input-buffer overflow from an excessively long password entry
  3. This specification is seriously ambiguous because it doesn’t tell us whether the program accepts or rejects/filters non-alphanumeric characters into the second password entry

Let us pretend that each of these answers could be correct. Which is correct for this question? Is the stem calling for an analysis of the number of possible tests, the risks of the method, the quality of the specification, or something else?

The stem should make clear whether the question is looking for the best single answer or potentially more than one, and whether the question is asking for facts, opinion, examples, reasoning, a calculation, or something else.

The reader should never have to read the set of possible answers to understand what the question is asking.
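As a side note on that example: the figure in its first option is easy to reconstruct, assuming “letters and digits” means the 26 lowercase letters, 26 uppercase letters and 10 digits, and that the password is exactly 8 characters long. The check below is my own arithmetic, not part of the original question.

    # Reconstructing the count of 8-character passwords over a 62-character alphabet
    # (26 lowercase + 26 uppercase + 10 digits), per the assumptions stated above.
    alphabet_size = 26 + 26 + 10             # 62
    password_length = 8
    print(alphabet_size ** password_length)  # 218340105584896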

15. Make the stem as brief as possible.

This is part of the same recommendation as Heuristic #12 above. If the entire question should be as short as possible (#12), the stem should be as short as possible.

However, “as short as possible” does not necessarily mean “short.”

Here are some examples:

  • The stem describes some aspect of the program in enough detail that it is possible to compute the number of possible software test cases. The choices include the correct answer and three miscalculations.
  • The stem describes a software development project in enough detail that the reader can see the possibility of doing a variety of tasks and the benefits they might offer to the project, and then asks the reader to prioritize some of the tasks. The choices are of the form, “X is more urgent than Y.”
  • The stem describes a potential error in the code, the types of visible symptoms that this error could cause, and then calls for selection of the best test technique for exposing this type of bug.
  • The stem quotes part of a product specification and then asks the reader to identify an ambiguity or to identify the most serious impact on test design an ambiguity like this might cause.
  • The stem describes a test, a failure exposed by the test, a stakeholder (who has certain concerns) who receives failure reports and is involved in decisions about the budget for the testing effort, and asks which description of the failure would be most likely to be perceived as significant by that stakeholder. An even more interesting question (faced frequently by testers in the real world) is which description would be perceived as significant (credible, worth reading and worth fixing) by Stakeholder 1 and which other description would be more persuasive for Stakeholder 2. (Someone concerned with next month’s sales might assess risk very differently from someone concerned with engineering / maintenance cost of a product line over a 5-year period. Both concerns are valid, but a good tester might raise different consequences of the same bug for the marketer than for the maintenance manager.)

Another trend for writing test questions that address higher-level learning is to write a very long and detailed stem followed by several multiple choice questions based on the same scenario.

Long questions like these are fair game (normal cases) in exams for lawyers, such as the Multistate Bar Exam. They are looked on with less favor in disciplines that don’t demand the same level of skill in quickly reading/understanding complex blocks of text. Therefore, for many engineering exams (for example), questions like these are probably less popular.

  • They discriminate against people whose first language is not English and who are therefore slower readers of complex English text, or more generally against anyone who is a slow reader, because the exam is time-pressed.
  • They discriminate against people who understand the underlying material and who can reach an application of that material to real-life-complexity circumstances if they can work with a genuine situation or a realistic model (something they can appreciate in a hands-on way) but who are not so good at working from hypotheticals that abstract out all information that the examiner considers inessential.
  • They can cause a cascading failure. If the exam includes 10 questions based on one hypothetical and the examinee misunderstands that one hypothetical, she might blow all 10 questions.
  • They can demoralize an examinee who lacks confidence/skill with this type of question, resulting in a bad score because the examinee stops trying to do well on the test.

However, in a low-stakes exam without time limits, those concerns are less important. The exam becomes practice for this type of analysis, rather than punishment for not being good at it.

In software testing, we are constantly trying to simplify a complex product into testable lines of attack. We ignore most aspects of the product and design tests for a few aspects, considered on their own or in combination with each other. We build explicit or implicit mental models of the product under test, and work from those to the tests, and from the tests back to the models (to help us decide what the results should be). Therefore, drawing out the implications of a complex system is a survival skill for testers and questions of this style are entirely fair game–in a low stakes exam, designed to help the student learn, rather than a high-stakes exam designed to create consequences based on an estimate of what the student knows.

16. Place the main idea of the item in the stem, not in the choices.

Some instructors adopt an intentional style in which the stem is extremely short and the question is largely defined in the choices.

The confusingly-written question in Heuristic #14 was an example of a case in which the reader can’t tell what the question is asking until he reads the choices. In #14, there were two problems:

  • the stem didn’t state what question it was asking
  • the choices themselves were fundamentally different, asking about different dimensions of the situation described in the stem rather than exploring one dimension with a correct answer and distracting mistakes. The reader had to guess / decide which dimension was of interest as well as deciding which answer might be correct.

Suppose we fix the second problem but still have a stem so short that you don’t know what the question is asking for until you read the options. That’s the issue addressed here (Heuristic #16).

For example, here is a better-written question that doesn’t pass muster under Heuristic #16:

A software oracle:

  1. is defined this way
  2. is defined this other way
  3. is defined this other other way

The better question under this heuristic would be:

What is the definition of a software oracle?

  1. this definition
  2. this other definition
  3. this other other definition

As long as the options are strictly parallel (they are alternative answers to the same implied question), I don’t think this is as serious a problem.

17. Avoid irrelevant information (window dressing).

Imagine a question that includes several types of information in its description of some aspect of a computer program:

  • details about how the program was written
  • details about how the program will be used
  • details about the stakeholders who are funding or authorizing the project
  • details about ways in which products like this have failed before

All of these details might be relevant to the question, but probably most of them are not relevant to any particular question. For example, to calculate the theoretically-possible number of tests of part of the program doesn’t require any knowledge of the stakeholders.

Information is irrelevant if you don’t need it to determine which option is the correct answer–unless the reader’s ability to wade through irrelevant information of this type, in order to get to the right underlying formula (or, more generally, the right approach to the problem), is part of what you are testing.

18. Avoid negative words in the stem.

Here are some examples of stems with negative structure:

  • Which of the following is NOT a common definition of software testing?
  • Do NOT assign a priority to a bug report EXCEPT under what condition(s)?
  • You should generally compute code coverage statistics UNLESS:

For many people, these are harder than questions that ask for the same information in a positively-phrased way.

There is some evidence that there are cross-cultural variations. That is, these questions are harder for some people than others because (probably) of their original language training in childhood. Therefore, a bad result on this question might have more to do with the person’s heritage than with their knowledge or skill in software testing.

However, the ability to parse complex logical expressions is an important skill for a tester. Programmers make lots of bugs when they write code to implement things like:

NOT (A OR B) AND C

So testers have to be able to design tests that anticipate the bug and check whether the programmer made it.

It is not unfair to ask a tester to handle some complex negation, if your intent is to test whether the tester can work with complex logical expressions. But if you think you are testing something else, and your question demands careful logic processing, you won’t know from a bad answer whether the problem was the content you thought you were testing or the logic that you didn’t consider.
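For instance, here is one way a tester might hunt for inputs that distinguish the intended condition from a plausible mis-coding of it. The “buggy” version below is a hypothetical De Morgan slip of my own invention, used only to illustrate the idea.

    # Enumerate the truth table for NOT (A OR B) AND C and compare it against a
    # hypothetical buggy version, (NOT A OR NOT B) AND C, to find the test inputs
    # that reveal the difference. (My own illustration, not from the original post.)
    from itertools import product

    def intended(a, b, c):
        return not (a or b) and c

    def buggy(a, b, c):
        return (not a or not b) and c   # a plausible De Morgan mistake

    revealing = [(a, b, c)
                 for a, b, c in product([False, True], repeat=3)
                 if intended(a, b, c) != buggy(a, b, c)]
    print(revealing)   # [(False, True, True), (True, False, True)]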

Another problem is that many people read negative sentences as positive. Their eyes glaze over when they see the NOT and they answer the question as if it were positive (Which of the following IS a common definition of software testing?). Unless you are testing for glazed eyes, you should make the negation as visible as possible. I use ITALICIZED ALL-CAPS BOLDFACE in the examples above.

WRITING THE CHOICES (THE OPTIONS)

19. Develop as many effective options as you can, but two or three may be sufficient.

Imagine an exam with 100 questions. All of them have two options. Someone who is randomly guessing should get 50% correct.

Now imagine an exam with 100 questions that all have four options. Under random guessing, the examinee should get 25%.

The issue of effectiveness is important because an answer that is not credible (not effective) won’t gain any guesses. For example, imagine that you saw this question on a quiz in a software testing course:

Green-box testing is:

  1. common at box manufacturers when they start preparing for the Energy Star rating
  2. a rarely-taught style of software testing
  3. a nickname used by automobile manufacturers for tests of hybrid cars
  4. the name of Glen Myers’ favorite book

I suspect that most students would pick choice 2 because 1 and 3 are irrelevant to the course and 4 is ridiculous (if it was a proper name, for example, “Green-box testing” would be capitalized.) So even though there appear to be 4 choices, there is really only 1 effective one.

The number of choices is important, as is the correction-for-guessing penalty, if you are using multiple choice test results to assign a grade or assess the student’s knowledge in a way that carries consequences for the student.

Both the number of choices and the final score matter much less if the quiz is for learning support rather than for assessment.

The Open Certification exam is for assessment and has a final score, but it is different from other exams in that examinees can review the questions and consider the answers in advance. Statistical theories of scoring just don’t apply well under those conditions.
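For reference, here is the arithmetic behind those guessing figures, along with the classic correction-for-guessing (formula-scoring) penalty. This is the generic textbook formula, shown only as an illustration; neither BBST nor Open Certification scores exams this way.

    # Expected raw score under pure random guessing, and the classic formula-scoring
    # correction: right answers minus wrong answers / (options per question - 1).
    # (Generic illustration only.)

    def expected_guessing_score(num_questions, options_per_question):
        return num_questions / options_per_question

    def corrected_score(num_right, num_wrong, options_per_question):
        return num_right - num_wrong / (options_per_question - 1)

    print(expected_guessing_score(100, 2))   # 50.0 with two options per question
    print(expected_guessing_score(100, 4))   # 25.0 with four options per question
    print(corrected_score(25, 75, 4))        # 0.0 -- pure guessing nets out to zero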

20. Vary the location of the right answer according to the number of options. Assign the position of the right answer randomly.

There’s an old rule of thumb–if you don’t know the answer, choose the second one in the list. Some inexperienced exam-writers tend to put the correct answer in the same location more often than if they varied location randomly. Experienced exam-writers use a randomization method to eliminate this bias.
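A minimal sketch of that kind of randomization follows; the function is my own, not taken from any particular exam tool, and it simply shuffles the options so the keyed answer’s position is uniformly random.

    # Shuffle the options so the position of the keyed answer varies randomly
    # from question to question. (My own sketch, not from any exam platform.)
    import random

    def randomize_options(correct, distractors):
        options = [correct] + list(distractors)
        random.shuffle(options)
        return options, options.index(correct)   # the options and the key's new position

    options, key_position = randomize_options(
        "The plan for applying resources and selecting techniques to achieve the testing mission.",
        ["The plan for applying resources and selecting techniques to assure quality.",
         "The guiding plan for finding bugs."])
    print(key_position)   # varies from run to run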

21. Place options in logical or numerical order.

The example that Haladyna gives is numeric. If you’re going to ask the examinee to choose the right number from a list of choices, then present them in order (like $5, $10, $20, $175) rather than randomly (like $20, $5, $175, $10).

In general, the idea underlying this heuristic is that the reader is less likely to make an accidental error (one unrelated to their knowledge of the subject under test) if the choices are ordered and formatted in the way that makes them as easy as possible to read quickly and understand correctly.

22. Keep options independent; choices should not be overlapping.

Assuming standard productivity metrics, how long should it take to create and document 100 boundary tests of simple input fields?

  1. 1 hour or less
  2. 5 hours or less
  3. between 3 and 7 hours
  4. more than 6 hours

These choices overlap. If you think the correct answer is 4 hours, which one do you pick as the correct answer?

Here is a style of question that I sometimes use that might look overlapping at first glance, but is not:

What is the best course of action in context C?

  1. Do X because of RY (the reason you should do Y).
  2. Do X because of RX (the reason you should do X, but a reason that the examinee is expected to know is impossible in context C)
  3. Do Y because of RY (the correct answer)
  4. Do Y because of RX

Two options tell you to do Y (the right thing to do), but for different reasons. One reason is appropriate, the other is not. The test is checking not just whether the examinee can decide what to do but whether she can correctly identify why to do it. This can be a hard question but if you expect a student to know why to do something, requiring them to pick the right reason as well as the right result is entirely fair.

23. Keep the options homogeneous in content and grammatical structure.

Inexperienced exam writers often accidentally introduce variation between the correct answer and the others. For example, the correct answer:

  • might be properly punctuated
  • might start with a capital letter (or not start with one) unlike the others
  • might end with a period or semi-colon (unlike the others)
  • might be present tense (the others in past tense)
  • might be active voice (the others in passive voice), etc.

The most common reason for this is that some exam authors write a long list of stems and correct answers, then fill the rest of the questions in later.

The nasty, sneaky tricky exam writer knows that test-wise students look for this type of variation and so introduces it deliberately:

Which is the right answer?

  1. this is the right answer
  2. This is the better-formatted second-best answer.
  3. this is a wrong answer
  4. this is another wrong answer

The test-savvy guesser will be drawn to answer 2 (bwaa-haaa-haa!)

Tricks are one way to keep down the scores of skilled guessers, but when students realize that you’re hitting them with trick questions, you can lose your credibility with them.

24. Keep the length of options about the same.

Which is the right answer?

  1. this is the wrong answer
  2. This is a really well-qualified and precisely-stated answer that is obviously more carefully considered than the others, so which one do you think is likely to be the right answer?
  3. this is a wrong answer
  4. this is another wrong answer

25. “None of the above” should be used sparingly.

As Haladyna points out, there is a fair bit of controversy over this heuristic:

  • If you use it, make sure that you make it the correct answer sometimes and the incorrect answer sometimes
  • Use it when you are trying to make the student actually solve a problem and assess the reasonability of the possible solutions

26. Avoid using “all of the above.”

The main argument against “all of the above” is that if there is an obviously incorrect option, then “all of the above” is obviously incorrect too. Thus, test-wise examinees can reduce the number of plausible options easily. If you are trying to statistically model the difficulty of the exam, or create correction factors (a “correction” is a penalty for guessing the wrong answer), then including an option that is obviously easier than the others makes the modeling messier.

In our context, we aren’t “correcting” for guessing or estimating the difficulty of the exam:

  • In the BBST (open book) exam, the goal is to get the student to read the material carefully and think about it. Difficulty of the question is more a function of difficulty of the source material than of the question.
  • In the Open Certification exam, every question appears on a public server, along with a justification of the intended-correct answer and public commentary. Any examinee can review these questions and discussions. Some will, some won’t, some will remember what they read and some won’t, some will understand what they read and some won’t–how do you model the difficulty of questions this way? Whatever the models might be, the fact that the “all of the above” option is relatively easy for some students who have to guess is probably a minor factor.

Another argument is more general. Several authors, including Haladyna, Downing, & Rodriguez (2002), recommend against the complex question that allows more than one correct answer. This makes the question more difficult and more confusing for some students.

Even though some authors recommend against it, our question construction adopts a complex structure that allows selection of combinations (such as “(a) and (b)” as well as “all of the above”) — because other educational researchers consider this structure a useful vehicle for presenting difficult questions in a fair way. See for example Wongwiwatthananukit, Popovich & Bennett (2000) and their references.

Note that in the BBST / Open Certification structure, the fact that there is a combination choice or an all of the above choice is not informative because most questions have these.

There is a particular difficulty with this structure, however. Consider this question:

Choose the answer:

  a. This is the best choice
  b. This is a bad choice
  c. This is a reasonable answer, but (a) is far better–or this is really a subset of (a), weak on its own but it would be the only correct one if (a) was not present.
  d. (a) and (b)
  e. (a) and (c)
  f. (b) and (c)
  g. (a) and (b) and (c)

In this case, the student will have an unfairly hard time choosing between (a) and (e). We have created questions like this accidentally, but when we recognize this problem, we fix it in one of these ways:

Alternative 1. Choose the answer:

    a. This is the best choice
    b. This is a bad choice
    c. This is a reasonable answer, but (a) is far better–or this is really a subset of (a), weak on its own but it would be the only correct one if (a) was not present.
    d. This is a bad choice
    e. (a) and (b)
    f. (b) and (c)
    g. (a) and (b) and (c)

    In this case, we make sure that (a) and (c) is not available for selection.

Alternative 2. Choose the answer:

    a. This is the best choice
    b. This is a bad choice
    c. This is a reasonable answer, but (a) is far better–or this is really a subset of (a), weak on its own but it would be the only correct one if (a) was not present.
    d. This is a bad choice

    In this case, no combinations are available for selection.

27. Avoid negative words such as not or except.

This is the same advice, for the options, as we provided in Heuristic #18 for the stem, for the same reasons.

28. Avoid options that give clues to the right answer.

Some of the mistakes mentioned by Haladyna, Downing, & Rodriguez (2002) are:

  • Broad assertions that are probably incorrect, such as always, never, must, and absolutely.
  • Choices that sound like words in the stem, or words that sound like the correct answer
  • Grammatical inconsistencies, length inconsistencies, formatting inconsistencies, extra qualifiers or other obvious inconsistencies that point to the correct choice
  • Pairs or triplet options that point to the correct choice. For example, if every combination option includes (a) (such as “(a) and (b)”, “(a) and (c)”, and “all of the above”), then it is pretty obvious that (a) is probably correct and any answer that excludes (a) (such as (b)) is probably wrong.

29. Make all distractors plausible.

This is important for two reasons:

  • If you are trying to do statistical modeling of the difficulty of the exam (“There are 4 choices in this question, therefore there is only a 25% chance of a correct answer from guessing”) then implausible distractors invalidate the model because few people will make this guess. However, in our tests, we aren’t doing this modeling so this doesn’t matter.
  • An implausible choice is a waste of space and time. If no one will make this choice, it is not really a choice. It is just extra text to read.

One reason that an implausible distractor is sometimes valuable is that some students do pick obviously unreasonable distractors. In my experience, this happens when the student is:

  • ill, and not able to concentrate
  • falling asleep, and not able to concentrate
  • on drugs or drunk, and not able to concentrate or temporarily afflicted with a very strange sense of humor
  • copying answers (in a typical classroom test, looking at someone else’s exam a few feet away) and making a copying mistake.

I rarely design test questions with the intent of including a blatantly implausible option, but I am an inept enough test-writer that a few slip by anyway. These aren’t very interesting in the BBST course, but I have found them very useful in traditional quizzes in the traditionally-taught university course.

30. Use typical errors of students when you write distractors.

Suppose that you gave a fill-in-the-blank question to students. In this case, for example, you might ask the student to tell you the definition rather than giving students a list of definitions to choose from. If you gathered a large enough sample of fill-in-the-blank answers, you would know what the most common mistakes are. Then, when you create the multiple choice question, you can include these as distractors. The students who don’t know the right answer are likely to fall into one of the frequently-used wrong answers.

I rarely have the opportunity to build questions this way, but the principle carries over. When I write a question, I ask “If someone was going to make a mistake, what mistake would they make?”
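For a computational question, such as the password-counting example earlier in this tutorial, that approach might look like the sketch below. The specific wrong formulas are my own guesses at plausible mistakes, not data collected from real students.

    # Deriving distractors from plausible mistakes for "How many 8-character
    # passwords of letters and digits are there?" (my own guesses at likely errors).
    correct = (26 + 26 + 10) ** 8              # 62^8, the keyed answer

    plausible_mistakes = {
        "forgot the uppercase letters": (26 + 10) ** 8,
        "ignored the digits": 26 ** 8,
        "multiplied instead of raising to a power": (26 + 26 + 10) * 8,
    }
    print(correct)
    print(plausible_mistakes)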

31. Use humor if it is compatible with the teacher; avoid humor in a high-stakes test.

Robert F. McMorris, Roger A. Boothroyd, & Debra J. Pietrangelo (1997) and Powers (2005) advocate for carefully controlled use of humor in tests and quizzes. I think this is reasonable in face-to-face instruction, once the students have come to know the instructor (or in a low-stakes test while students are getting to know the instructor). However, in a test that involves students from several cultures, who have varying degrees of experience with the English language, I think humor in a quiz can create more confusion and irritation than it is worth.

References

These notes summarize lessons that came out of the last Workshop on Open Certification (WOC 2007) and from private discussions related to BBST.

There’s a lot of excellent advice on writing multiple-choice test questions. Here are a few sources that I’ve found particularly helpful:

  1. Lorin Anderson, David Krathwohl, & Benjamin Bloom, A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom’s Taxonomy of Educational Objectives, Complete Edition, Longman Publishing, 2000.
  2. National Conference of Bar Examiners, Multistate Bar Examination Study Aids and Information Guides.
  3. Steven J. Burton, Richard R. Sudweeks, Paul F. Merrill, Bud Wood, How to Prepare Better Multiple-Choice Test Items: Guidelines for University Faculty, Brigham Young University Testing Services, 1991.
  4. Thomas M. Haladyna, Writing Test Items to Evaluate Higher Order Thinking, Allyn & Bacon, 1997.
  5. Thomas M. Haladyna, Developing and Validating Multiple-Choice Test Items, 3rd Edition, Lawrence Erlbaum, 2004.
  6. Thomas M. Haladyna, Steven M. Downing, Michael C. Rodriguez, A Review of Multiple-Choice Item-Writing Guidelines for Classroom Assessment, Applied Measurement in Education, 15(3), 309–334, 2002.
  7. Robert F. McMorris, Roger A. Boothroyd, & Debra J. Pietrangelo, Humor in Educational Testing: A Review and Discussion, Applied Measurement in Education, 10(3), 269-297, 1997.
  8. Ted Powers, Engaging Students with Humor, Association for Psychological Science Observer, 18(12), December 2005.
  9. The Royal College of Physicians and Surgeons of Canada, Developing Multiple Choice Questions for the RCPSC Certification Examinations.
  10. Supakit Wongwiwatthananukit, Nicholas G. Popovich, & Deborah E. Bennett, Assessing pharmacy student knowledge on multiple-choice examinations using partial-credit scoring of combined-response multiple-choice items, American Journal of Pharmaceutical Education, Spring, 2000.
  11. Bibliography and links on Multiple Choice Questions at http://ahe.cqu.edu.au/MCQ.htm


7th Workshop on Teaching Software Testing, January 18-20, 2008

October 13th, 2007

This year’s Workshop on Teaching Software Testing (WTST) will be January 18-20 in Melbourne, Florida.

WTST is concerned with the practical aspects of teaching university-caliber software testing courses to academic or commercial students.

This year, we are particularly interested in teaching testing online. How can we help students develop testing skills and foster higher-order thinking in online courses?

We invite participation by:

  • academics who have experience teaching testing courses
  • practitioners who teach professional seminars on software testing
  • academic or practitioner instructors with significant online teaching experience and wisdom
  • one or two graduate students
  • a few seasoned teachers or testers who are beginning to build their strengths in teaching software testing.

There is no fee to attend this meeting. You pay for your seat through the value of your participation. Participation in the workshop is by invitation based on a proposal. We expect to accept 15 participants with an absolute upper bound of 25.

WTST is a workshop, not a typical conference. Our presentations serve to drive discussion. The target readers of workshop papers are the other participants, not archival readers. We are glad to start from already-published papers, if they are presented by the author and they would serve as a strong focus for valuable discussion.

In a typical presentation, the presenter speaks 10 to 90 minutes, followed by discussion. There is no fixed time for discussion. Past sessions’ discussions have run from 1 minute to 3 hours. During the discussion, a participant might ask the presenter simple or detailed questions, describe consistent or contrary experiences or data, present a different approach to the same problem, or (respectfully and collegially) argue with the presenter. In 20 hours of formal sessions, we expect to cover six to eight presentations.

We also have lightning presentations, time-limited to 5 minutes (plus discussion). These are fun and they often stimulate extended discussions over lunch and at night.

Presenters must provide materials that they share with the workshop under a Creative Commons license, allowing reuse by other teachers. Such materials will be posted at http://www.wtst.org.

SUGGESTED TOPICS

There are few courses in software testing, but a large percentage of software engineering practitioners do test-related work as their main focus. Many of the available courses, academic and commercial, attempt to cover so much material that they are superficial and therefore ineffective for improving students’ skills or their ability to analyze and address problems of real-life complexity. Online courses might, potentially, be a vehicle for providing excellent educational opportunities to a diverse pool of students.

Here are examples of ideas that might help us learn more about providing testing education online in ways that realize this potential:

  • Instructive examples: Have you tried teaching testing online? Can you show us some of what you did? What worked? What didn’t? Why? What can we learn from your experience?
  • Instructive examples from other domains: Have you tried teaching something else online and learned lessons that would be applicable to teaching testing? Can you build a bridge from your experience to testing?
  • Instructional techniques, for online instruction, that help students develop skill, insight, appreciation of models and modeling, or other higher-level knowledge of the field. Can you help us see how these apply to testing-related instruction?
  • Test-related topics that seem particularly well-suited to online instruction: Do you have a reasoned, detailed conjecture about how to bring a topic online effectively? Would a workshop discussion help you develop your ideas further? Would it help the other participants understand what can work online and how to make it happen?
  • Lessons learned teaching software testing: Do you have experiences from traditional teaching that seem general enough to apply well to the online environment?
  • Moving from Face-to-Face to Online Instruction – How does one turn a face-to-face class into an effective online class? What works? What needs to change?
  • Digital Backpack – Students and instructors bring a variety of tools and technologies to today’s fully online or web-enhanced classroom. Which tools do today’s teachers need? How can those tools be used? What about students?
  • The Scholarship of Teaching and Learning – How does one research one’s own teaching? What methods capture improved teaching and learning or reveal areas needing improvement? How is this work publishable to meet promotion and tenure requirements?
  • Qualitative Methods – From sloppy anecdotal reports to rigorous qualitative design. How can we use qualitative methods to conduct research on the teaching of computing, including software testing?

TO ATTEND AS A PRESENTER

Please send a proposal BY DECEMBER 1, 2007 to Cem Kaner that identifies who you are, what your background is, what you would like to present, how long the presentation will take, any special equipment needs, and what written materials you will provide. Along with traditional presentations, we will gladly consider proposed activities and interactive demonstrations.

We will begin reviewing proposals on November 1. We encourage early submissions. It is unlikely but possible that we will have accepted a full set of presentation proposals by December 1.

Proposals should be between two and four pages long, in PDF format. We will post accepted proposals to http://www.wtst.org.

We review proposals in terms of their contribution to knowledge of HOW TO TEACH software testing. Proposals that present a purely theoretical advance in software testing, with weak ties to teaching and application, will not be accepted. Presentations that reiterate materials you have presented elsewhere might be welcome, but it is imperative that you identify the publication history of such work.

By submitting your proposal, you agree that, if we accept your proposal, you will submit a scholarly paper for discussion at the workshop by January 7, 2008. Workshop papers may be of any length and follow any standard scholarly style. We will post these at http://www.wtst.org as they are received, for workshop participants to review before the workshop.

TO ATTEND AS A NON-PRESENTING PARTICIPANT:

Please send a message BY DECEMBER 1, 2007, to Cem Kaner that describes your background and interest in teaching software testing. What skills or knowledge do you bring to the meeting that would be of interest to the other participants?

ADVISORY BOARD MEETING

Florida Tech’s Center for Software Testing Education & Research has been developing a collection of hybrid and online course materials for teaching black box software testing. We now have NSF funding to adapt these materials for implementation by a broader audience. We are forming an Advisory Board to guide this adaptation and the associated research on the effectiveness of the materials in diverse contexts. The Board will meet before WTST, on January 17, 2008. If you are interested in joining the Board and attending the January meeting, please read this invitation and submit an application.

Acknowledgements

Support for this meeting comes from the Association for Software Testing and Florida Institute of Technology.

The hosts of the meeting are:

Research Funding and Advisory Board for the Black Box Software Testing (BBST) Course

October 12th, 2007

Summary: With some new NSF funding, we are researching and revising BBST to make it more available and more useful to more people around the world. The course materials will continue to be available for free. If you are interested in joining an advisory board that helps us set direction for the course and the research surrounding the course, please contact me, describing your background in software-testing-related education, in education-related research, and your reason(s) for wanting to join the Board.

Starting as a joint project with Hung Quoc Nguyen in 1993, I’ve developed a broad set of course materials for black box software testing. The National Science Foundation approved a project (EIA-0113539 ITR/SY+PE “Improving the Education of Software Testers”) that evolved my commercial-audience course materials for an academic audience and researched learning issues associated with testing. The resulting course materials are at http://www.testingeducation.org/BBST, with lots of papers at http://www.testingeducation.org/articles and https://kaner.com/?page_id=7. The course materials are available for everyone’s use, for free, under a Creative Commons license.

During that research, I teamed up with Rebecca Fiedler, an experienced teacher (now an Assistant Professor of Education at St. Mary-of-the-Woods College in Terre Haute, Indiana, and also now my wife.) The course that Rebecca and I evolved turned traditional course design inside out in order to encourage students’ involvement, skill development and critical thinking. Rather than using class time for lectures and students’ private time for activities (labs, assignments, debates, etc.), we videotaped the lectures and required students to watch them before coming to class. We used class time for coached activities centered more on the students than the professor.

This looked like a pretty good teaching approach, our students liked it, and the National Science Foundation funded a project to extend this approach to developing course materials on software engineering ethics in 2006. (If you would like to collaborate with us on this project, or if you are a law student interested in a paid research internship, contact Cem Kaner.)

Recently, the National Science Foundation approved Dr. Fiedler’s and my project to improve the BBST course itself, “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.” With funding running from October 1, 2007 through 2010, our primary goals are:

  • develop and sustain a cadre of academic, in-house, and commercial instructors via:
    • creating and offering an instructor orientation course online;
    • establishing an ongoing online instructors’ forum; and
    • hosting a number of face-to-face instructor meetings
  • offer and evaluate the course at collaborating research sites (including both universities and businesses)
  • analyze several collections of in-class activities to abstract a set of themes / patterns that can help instructors quickly create new activities as needed; and
  • extend instructional support material including grading guides and a pool of exam questions for teaching the course.

All of our materials—such as videos, slides, exams, grading guides, and instructor manuals—are Creative Commons licensed. Most are available freely to the public. A few items designed to help instructors grade student work will be available at no charge, but only to instructors.

Several individuals and organizations have agreed to collaborate in this work, including:

  • AppLabs Technologies. Representative: Geetha Narayanan, CSQA, PMP; Shyam Sunder Depuru.
  • Aztechsoft. Representative: Ajay Bhagwat.
  • The Association for Software Testing. Representative: Michael Kelly, President.
    • AST is breaking the course into several focused, online mini-courses that run one month each. The courses are offered, for free, to AST members. AST is starting its second teaching of the Foundations course this week. We’ll teach Bug Advocacy in a month. As we develop these courses, we are training instructors who, once trained, will teach those courses for AST (free courses) as well as at their own school or company (for free or for a fee, as they choose).
  • Dalhousie University. Representative: Professor Morven Gentleman.
  • Huston-Tillotson University, Computer Science Department. Representative: Allen M. Johnson, Jr., Ph.D.
  • Microsoft. Representative: Marianne Guntow.
  • PerfTest Plus. Representative: Scott Barber.
  • Quardev Laboratories. Representative: Jonathan Bach.
  • University of Illinois at Springfield, Computer Sciences Program. Representative: Dr. Keith W. Miller.
  • University of Latvia. Representative: Professor Juris Borzovs.

If you would like to collaborate on this project as well:

  1. Please read our research proposal.
  2. Please consider your ability to make a financial commitment. We are not asking for donations (well, of course, we would love to get donations, but they are not required), but you or your company would have to absorb the cost of travel to Board of Advisor meetings, and you would probably come to the Workshop on Teaching Software Testing and/or the Conference of the Association for Software Testing. Additionally, teaching the course at your organization and collecting the relevant data would be at your expense. (My consultation to you on this teaching would be free, but if you needed me to fly to your site, that would be at your expense and might involve a fee.) We have a little bit of NSF money ($15,000 total for the three years) to subsidize travel to Board of Advisor meetings, so we can offset travel costs to a small degree, but it is very limited, and especially little is available for corporations.
  3. Please consider your involvement. What do you want to do?
    • Join the advisory board, help guide the project?
    • Collaborate on the project as a fellow instructor (and get instructor training)?
    • Come to the Workshop on Teaching Software Testing?
    • Help develop a Body of Knowledge to support the course materials?
    • Participate as a lecturer or on-camera discussant on the video courses?
    • Other stuff, such as …???
  4. Send me a note that covers items 1-3, introduces you, and describes your background and interest.

The first meeting of the Advisory Board is January 17, 2008, in Melbourne, Florida. We will host the Workshop on Teaching Software Testing (WTST 2008) from January 18-20. I’ll post a Call for Participation for WTST 2008 on this blog tomorrow.

New Discussion List: Review the ALI Principles of the Law of Software Contracts

July 24th, 2007

New discussion group:
http://groups.yahoo.com/group/review_sw_contract_law

At the Conference of the Association for Software Testing, I presented an introduction to the Principles of the Law of Software Contracts, a project-in-the-works of the American Law Institute that I believe will frame this decade’s (2006-2016) debate over the regulation / enforcement of software contracts in the USA. For more information, see my blog, https://kaner.com/?p=19 and my CAST slides, https://kaner.com/pdfs/Law%20of%20Software%20Contracting.pdf

The ALI is open to critical feedback of its drafts. The Principles project will continue for several years.

The ALI is an association of lawyers, not software developers. If you want the law to sensibly guide engineering practice, you’re going to have to help these lawyers figure out what the implications of their proposals are and, where those implications are bad, what tweaks they might make to achieve their objectives more cleanly.

Note: Free Software Contracts are software contracts too. The same laws that govern enforceability of Microsoft’s software licenses govern the free software licenses. If you don’t think that gives people a chance to make mischief for GPL and Creative Commons, well, think again.

My goal is for this list to help people develop feedback memos for review by the ALI drafting committee.

Membership in this list is open, but messages are tightly moderated. I expect most members will be software development practitioners or academics, with a few journalists and lawyers thrown in. I intend a relatively low volume list with zero spam and reasonably collegial discussion. I am not at all opposed to the idea of developing contradictory sets of comments for ALI (one group submits one set, another group submits another set that completely disagrees with the first) but I will push for development of comments that are well reasoned and well supported by facts.

— Cem Kaner

A first look at the proposed Principles of the Law of Software Contracts

June 24th, 2007

The American Law Institute (ALI) is writing a new Principles of the Law of Software Contracts (PLSC) to replace the failed Uniform Computer Information Transactions Act (UCITA). I recently attended ALI’s annual meeting, at which we reviewed the Principles, and I am giving a first status report at the Conference of the Association for Software Testing (CAST, July 9-10).
The Principles raise important issues for our community. For example:

  • They apply a traditional commercial rule to software contracting–a vendor who knows of a hidden material defect in its product but sells the product without disclosing the defect is liable to the customer for damage (including expenses) caused by the defect.
    • A defect is hidden if a reasonably observant person in the position of the buyer would not have found it in an inspection that one would expect a reasonably diligent customer to perform under the circumstances.
    • A defect is material if it is serious enough to be considered a significant breach of the contract.

    I think this is an excellent idea. It reflects the fact that no one can know all of the bugs in their product and lets vendors shield themselves from liability for defects they didn’t know about, but it demands that vendors reveal the serious bugs that they know about, so that customers can (a) make better informed purchase decisions and (b) avoid doing things that have serious effects.

I think we could help clarify it:

    • When should we hold a software company responsible for “knowing” about a defect? Is an irreproducible bug “known”? What about a bug that wasn’t found in the lab but a customer reported it? One customer reported it? 5000 customers reported it? How long should we allow the company to investigate the report before saying that they’ve had enough time to be held responsible for it?
    • What counts as disclosure to the customer? The bugs in Firefox and OpenOffice are disclosed in their open-to-the-public bug databases. Is this good enough? Maybe it is for these, but when I tried figuring out the bugs published in Apache’s database, for many reports I had no clue what the reporters were writing about. Serious problems were reported in ways that tied closely to the implementation and not at all to anything I would know to do or avoid. For their purpose (helping the developers troubleshoot the bug), these reports might have been marvelous. But for disclosure to the customer? What should our standards be?
    • What is a material defect anyway? Do the criteria differ depending on the nature of the product? Is a rarely occurring crash more material in a heart monitor than in a word processor? And if a bug corrupts data in a word processor, do we have different expectations from Word, OpenOffice Writer, Wordpad, and my 12-year-old niece’s StudentProjectEditor?
    • What about the idea of security by obscurity? That some security holes won’t be exploited if no one knows about them, and so we should give the vendor a chance to fix some bugs before they disclose them? This is a controversial idea, but there is evidence that at least some problems are much more exploited after they are publicized than before.
  • Another issue is reverse engineering. Historically, reverse engineering of products (all products–hardware, software, chemical, mechanical, whatever) has been fair game. American know-how has a lot of “building a better mousetrap” in it, and to build a better one, you start by reverse engineering the current one. There have been some very recent, very expansive rulings (such as Baystate v Bowers) that have enforced EULA-based restrictions against software reverse engineering so broad that they would exclude black box testing (for example, for a magazine review).
    • The Principles lay out several criteria under which unreasonable restrictions could be ruled unenforceable by a judge. To me, these seem like reasonable criteria: a contract clause is unenforceable if it conflicts with federal or state law or public policy (for example, as expressed in the constitution and statutes), is outrageously unfair, or would be considered an abuse of copyright or patent under (mainly) antitrust doctrines.
    • Does it make sense for us to identify solid examples of contexts in which reverse engineering should be permissible (for example, poses absolutely no threat to the vendor’s legitimate interests) and others in which the vendor might have more of a basis and rationale for seeking protection? We can identify these shades of gray with much more experience-based wisdom than lawyers who don’t even know what a loop is, let alone how to code their way out of one.

There are plenty of other issues. CAST will provide an opening forum for discussion–remember, CAST isn’t just a place for presentations. We make time for post-presentation discussion, for hours if the conference participants want to go that long.

Conference of the Association for Software Testing

May 29th, 2007

CAST (Conference of the Association for Software Testing) early bird registration ends this week. Register at http://associationforsoftwaretesting.org/conference/registration.html

I just got back from ICSE (the International Conference on Software Engineering). ICSE is the premier conference for academic software engineers–the conference acceptance standards are so high (only 5% of submitted papers are accepted) that a paper at ICSE is ranked by academic bean counters on a par with most computer science journals.

Many (20%?) of the ICSE sessions involved software testing. It was frustrating to sit through them because the speaker would talk for a while, say some outrageous things, some interesting things and some confusing things–and then at the end of the talk, we got 3 minutes to ask questions and argue with the speaker and the other participants. Some talks (such as an invited address on the Future of Software Testing) were clearly designed to be thought-provoking but with only 5 minutes for discussion, their impact evaporated quickly.

In AST, we think that testers, of all people, should critically evaluate the ideas that are presented to them and we think that testers’ conferences should encourage testers to be testers.

We designed CAST to foster discussion. Every talk–every keynote, every track talk, every tutorial–has plenty of time for discussion. If a talk provokes a strong discussion, we delay the next talk. If it looks as though the discussion will go for a long time, we move the next talk to another room, so people can choose to stay with the discussion or go to the next talk. I haven’t seen this at any other conference. It contributes tremendously to the overall quality of the sessions.

By the way, if you want to come to the meeting to challenge what one of the speakers is saying, you can prepare some remarks in advance. It is important that your response be courteous and appreciative of the effort the speaker has put into her presentation. If you have that, give Jon Bach some notice and you will almost certainly get floor time right after that speaker.

Another interesting event at CAST will be the full-day “tutorial” / workshop / debate on certification. We had a keynote last year on certification that stirred up so much discussion it nearly took over the rest of the day. This time, it looks like we’ll have representatives of all the main groups who certify software testers, plus an academic curmudgeon (might be me) who is skeptical of the value of all of them. If you have ever considered favoring certified software testers in your hiring practices or taking a certification exam to improve your marketability or credibility, this day will provide a six-way discussion that compares and contrasts the different approaches to credentialing software testers. I don’t think there has ever been a session like this at any other testing conference.

ASSOCIATED WORKSHOPS

Before CAST, the Workshop on Heuristic & Exploratory Techniques will explore boundary analysis (domain testing), how we actually decide what boundaries to consider and what values to use. This might sound straightforward, but in doing task analyses at some other meetings, we discovered that this is a remarkably complex problem. For more information, contact Jon Bach.

After CAST, the Workshop on Open Certification will meet to develop certification questions and work out the logistics of developing the Exam Server (we already have a Question Server running for the project). If you want to contribute to this project, contact Mike Kelly or Cem Kaner. For more info, see http://www.FreeTestingCertification.com

Open Certification at ACM SIGCSE

April 1st, 2007

Tim Coulter and I presented a poster session at the ACM SIGCSE conference.

Here is our poster (page 1) and (page 2)

Here is our paper: Creating an Open Certification Process

I was glad to see the Agile Alliance’s position on certification.

It’s unfortunate that several of the groups who present themselves as professional associations for software testers or software quality workers (e.g. American Society for Quality, British Computer Society, International Institute for Software Testing, Quality Assurance Institute) are selling certifications rather than warning people off of them as Agile Alliance is doing.

We created the Open Certification as an alternative approach. It still doesn’t measure skill. But it does offer four advantages:

  1. The large pool of questions will be public, with references. They will form a study guide.
  2. The pool is not derived from a single (antiquated) view of software testing. Different people with different viewpoints can add their own questions to the pool. If they have well-documented questions/answers, the questions will be accepted and can be included in a customizable exam.
  3. The exam can be run any time, anywhere. Instead of relying on a certificate, an employer can ask a candidate to retake the test and then discuss the candidate’s answers with her. The discussion will be more informative than any number of multiple-false answers.
  4. The exam is free.

The open certification is a bridge between the current certifications and the skill-based evaluations that interviewers should create for themselves or that might someday be available as exams (but are not available today).

The open certification exam isn’t available yet. We’re close to finishing the Question Server (see the paper), which is one of the key components. We’ll start work on the next piece at the Second Workshop on Open Certification, which will be right after CAST 2007. (More on both soon…)
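
Purely as an illustration of the idea of a public, customizable question pool (this is my own sketch in Python, not the actual Question Server design; the record fields, the sample question, and the build_exam() helper are all hypothetical), something like the following captures the flavor:

# A rough sketch of a public question pool from which a customizable exam could
# be assembled. The record fields, sample question, and filtering rules are
# hypothetical illustrations, not the actual Question Server design.
import random
from dataclasses import dataclass, field

@dataclass
class Question:
    stem: str                       # the question text
    options: dict                   # option letter -> option text
    correct: str                    # letter of the keyed answer
    references: list = field(default_factory=list)  # published sources to study
    viewpoint: str = "unspecified"  # e.g., "context-driven", "analytical"

POOL = [
    Question(
        stem="A test oracle is best described as...",
        options={"a": "a tool that runs tests automatically",
                 "b": "a mechanism for deciding whether observed behavior is a problem",
                 "c": "a list of test cases"},
        correct="b",
        references=["Kaner, Bach & Pettichord, Lessons Learned in Software Testing"],
        viewpoint="context-driven",
    ),
    # ... more publicly reviewed questions, each with references ...
]

def build_exam(pool, viewpoint=None, size=1, seed=None):
    """Assemble a customized exam by filtering and sampling the public pool."""
    candidates = [q for q in pool if viewpoint is None or q.viewpoint == viewpoint]
    random.Random(seed).shuffle(candidates)
    return candidates[:size]

for q in build_exam(POOL, viewpoint="context-driven", seed=42):
    print(q.stem, "| study:", "; ".join(q.references))

The point of the sketch is only that every question carries its references and viewpoint, so an employer or student can filter, regenerate, and discuss the questions rather than trust an opaque score.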

In the meantime, my recommendation is simple:

  • If you are a job candidate considering becoming certified, save your money. Take courses that teach you about the practice of testing rather than about how to pass an exam. (If you want a free course, go to www.testingeducation.org/BBST).
  • If you are an employer considering what qualifications testing candidates should have, my article on recruiting might help. If you can’t live with that and must use an examination, the following probably has as much validity as the certifications–and it has a high failure rate, so it must demonstrate high standards, right?
Test THIS Certification Exam
Professor Cem Kaner
is a
Purple People-Eating Magic Monkey
…with a Battle Rating of 9.1




Updated publications page

January 10th, 2007

I just updated my website to include my conference talks and tutorials from the last 6 months and a new NSF proposal on software testing education. https://kaner.com/articles.html


Schools of software testing

December 22nd, 2006

Every few months, someone asks why James Bach, Bret Pettichord, and I discuss the software testing community in terms of “schools” or suggests that the idea is misguided, arrogant, divisive, inaccurate, or otherwise A Bad Thing. The most recent discussion is happening on the software-testing list (the context-driven testing school’s list on yahoogroups.com). It’s time that I blogged a clarifying response.
Perhaps in 1993, I started noticing that many test organizations (and many test-related authors, speakers and consultants, including some friends and other colleagues I respected) relied heavily — almost exclusively — on one or two main testing techniques. In my discussions with them, they typically seemed unaware of other techniques or uninterested in them.

  • For example, many testers talked about domain testing (boundary and equivalence class analysis) as the fundamental test design technique. You can generalize the method far beyond its original scope (analysis of input fields) to a strategy for reducing the infinity of potential tests to a manageably small, powerful subset via a stratified sampling strategy (see the printer testing chapter of Testing Computer Software for an example of this type of analysis). This is a compelling approach–it yields well-considered tests and enough of them to keep you busy from the start to the end of the project. (A short Python sketch of this kind of boundary and equivalence-class sampling follows this list.)
  • As a competing example, many people talked about scenario testing as the fundamental approach. They saw domain tests as mechanical and relatively trivial. Scenarios went to the meaning of the specification (where there was one) and to the business value and business risk of the (commercial) application. You needed subject matter experts to do really good scenario testing. To some of my colleagues, this greater need for deep understanding of the application was proof in its own right that scenario testing was far more important than the more mechanistic-seeming domain testing.
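
To make the domain-testing bullet above concrete, here is a minimal Python sketch of boundary and equivalence-class sampling. The function under test (classify_age) and its valid range are hypothetical stand-ins, not from the original discussion:

# Hypothetical example: an input field that accepts ages from 0 to 120 inclusive.
# Domain testing partitions the infinite input space into equivalence classes and
# samples the boundaries of each class instead of testing every possible value.

def classify_age(age: int) -> str:
    """Toy function under test (a stand-in for a real input-field handler)."""
    if age < 0 or age > 120:
        raise ValueError("age out of range")
    return "minor" if age < 18 else "adult"

# Stratified sample: boundary values plus one interior representative per class.
test_values = {
    "below lower bound (invalid)": -1,
    "lower bound (valid)": 0,
    "interior of 'minor' class": 10,
    "upper boundary of 'minor' class": 17,
    "lower boundary of 'adult' class": 18,
    "interior of 'adult' class": 60,
    "upper bound (valid)": 120,
    "above upper bound (invalid)": 121,
}

for label, value in test_values.items():
    try:
        result = classify_age(value)
    except ValueError as error:
        result = f"rejected ({error})"
    print(f"{label:34} age={value:4} -> {result}")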

In 1995, James Bach and I met face-to-face for the first time. (We had corresponded by email for a long time, but never met.) We ended up spending half a day in a coffee shop at the Dallas airport comparing notes on testing strategy. He too had been listing these dominating techniques and puzzling over the extent to which they seemed to individually dominate the test-design thinking of many colleagues. James was putting together a list that he would soon publish in the first draft of the Satisfice Heuristic Test Strategy Model. I was beginning to use the list for consciousness-raising in my classes and consulting, encouraging people to add one or two more techniques to their repertoire.
James and I were pleasantly shocked to discover that our lists were essentially the same. Our names for the techniques were different, but the list covered the same approaches, and we had both seen the same tunnel vision (for each technique, one or more companies or groups that did reasonably good work–as measured by the quality of the bugs they were finding–yet relied primarily on just that one technique). I think it was in that discussion that I suggested a comparison to Thomas Kuhn’s notion of paradigms (for a summary, read this article in the Stanford Encyclopedia of Philosophy), which we studied at some length in my graduate program (McMaster University, Psychology Ph.D. program).
The essence of a paradigm is that it creates (or defines) a mainstream of thinking, providing insights and direction for future research or work. It provides a structure for deciding what is interesting, what is relevant, what is important–and implicitly, it defines limits, what is not relevant, not particularly interesting, maybe not possible or not wise. The paradigm creates a structure for solving puzzles, and people who solve the puzzles that the field sees as important are highly respected. Scientific paradigms often incorporate paradigmatic cases–exemplars–especially valuable examples that serve as models for future work or molds for future thought. At that meeting in 1995, or soon after that, we concluded that the set of dominant test techniques we were looking at could be thought of as paradigms for some / many people in the field.
This idea wouldn’t make sense in a mature science, because there is one dominating paradigm that creates a common (field-wide) vocabulary, a common sense of the history of the field, and a common set of cherished exemplars. However, in less mature disciplines that have not reached consensus, fragmentation is common. Here’s how the Stanford Encyclopedia of Philosophy summarizes it:

“In the postscript to the second edition of The Structure of Scientific Revolutions Kuhn says of paradigms in this sense that they are “the most novel and least understood aspect of this book” (1962/1970a, 187). The claim that the consensus of a disciplinary matrix is primarily agreement on paradigms-as-exemplars is intended to explain the nature of normal science and the process of crisis, revolution, and renewal of normal science. It also explains the birth of a mature science. Kuhn describes an immature science, in what he sometimes calls its ‘pre-paradigm’ period, as lacking consensus. Competing schools of thought possess differing procedures, theories, even metaphysical presuppositions. Consequently there is little opportunity for collective progress. Even localized progress by a particular school is made difficult, since much intellectual energy is put into arguing over the fundamentals with other schools instead of developing a research tradition. However, progress is not impossible, and one school may make a breakthrough whereby the shared problems of the competing schools are solved in a particularly impressive fashion. This success draws away adherents from the other schools, and a widespread consensus is formed around the new puzzle-solutions.”

James and I weren’t yet drawn to the idea of well-defined schools, because we see schools as having not only a shared perspective but also a missionary character (more on that later) and what we were focused on was the shared perspective. But the idea of competing schools allowed us conceptual leeway for thinking about, and describing, what we were seeing. We talked about it in classes and small meetings and were eventually invited to present it (Paradigms of Black Box Software Testing) as a pair of keynotes at the 16th International Conference and Exposition on Testing Computer Software (1999) and then at STAR West (1999).
At this point (1999), we listed 10 key approaches to testing:

  • Domain driven
  • Stress driven
  • Specification driven
  • Risk driven
  • Random / statistical
  • Function
  • Regression
  • Scenario / use case / transaction flow
  • User testing
  • Exploratory

There’s a lot of awkwardness in this list, but our intent was descriptive and this is what we were seeing. When people would describe how they tested to us, their descriptions often focused on only one (or two, occasionally three) of these ten techniques, and they treated the described technique(s), or key examples of it, as guides for how to design tests in the future. We were trying to capture that.
At this point, we weren’t explicitly advocating a school of our own. We had a shared perspective, we were talking about teaching people to pick their techniques based on their project’s context, and James’ Heuristic Test Strategy Model (go here for the current version) made this explicit, but we were still early in our thinking about it. In contrast, the list of dominating techniques, with minor evolution, captured a significant pattern in what each of us had observed over perhaps 7-10 years.
I don’t think that many people saw this work as divisive or offensive — some did and we got some very harsh feedback from a few people. Others were intrigued or politely bored.
Several people were confused by it, not least because the techniques on this list were far from mutually exclusive. For example, you can apply a domain-driven analysis of the variables of individual functions in an exploratory way. Is this domain testing, function testing or exploratory testing?

  • One answer is “yes” — all three.
  • Another answer is that it is whichever technique is dominant in the mind of the tester who is doing this testing.
  • Another answer is, “Gosh, that is confusing, isn’t it? Maybe this model of “paradigms” isn’t the right subdivider of the diverging lines of testing thought.”

Over time, our thinking shifted about which answer was correct. Each of these would have been my answer — at different times. (My current answer is the third one.)
Another factor that was puzzling us was the weakness of communication among leaders in the field. At conferences, we would speak the same words but with different meanings. Even the most fundamental terms, like “test case”, carried several different meanings–and we weren’t acknowledging the differences or talking about them. Many speakers would simply assume that everyone knew what term X meant, agreed with that definition, and agreed with whatever practices those definitions implied were good. The result was that we often talked past each other at conferences, disagreeing in ways that many people in the field, especially relative newcomers, found hard to recognize or understand.
It’s easy to say that all testing involves analysis of the program, evaluation of the best types of tests to run, design of the tests, execution, skilled troubleshooting, and effective communication of the results. Analysis. Evaluation. Design. Test. Execution. Troubleshooting. Effective Communication. We all know what those words mean, right? We all know what good analysis is, right? So, basically, we all agree, right?
Well, maybe not. We can use the same words but come up with different analyses, different evaluations, different tests, different ideas about how to communicate effectively, and so on.
Should we gloss over the differences, or look for patterns in them?
James Bach, Bret Pettichord and I muddled through this as we wrote Lessons Learned in Software Testing. We brought a variety of other people into the discussions but as I vaguely recall it, the critical discussions happened in the context of the book. Bret Pettichord put the idea into a first-draft presentation for the 2003 Workshop on Teaching Software Testing. He has polished it since, but it is still very much a work in progress.
I’m still not ready to publish my version because I haven’t finished the literature review that I’d want to publish with it. We were glad to see Bret go forward with his talks, because they opened the door for peer review that provides the foundation for more polished later papers.
The idea that there can be different schools of thought in a field is hardly a new one — just check the 1,180,000 search results you get from Google or the 1,230,000 results you get from Yahoo when you search for the quoted phrase “schools of thought”.
Not everyone finds the notion of divergent schools of thought a useful heuristic–for example, read this discussion of legal schools of thought. However, in studying, teaching and researching experimental psychology, the identification of some schools was extremely useful. Not everyone belonged to one of the key competing schools. Not every piece of research was driven by a competing-school philosophy. But there were organizing clusters of ideas with charismatic advocates that guided the thinking and work of several people and generated useful results. There were debates between leaders of different schools, sometimes very sharp debates, and those debates clarified differences, points of agreement, and points of open exploration.
As I think of schools of thought, a school of testing would have several desirable characteristics:

  • The members share several fundamental beliefs, broadly agree on vocabulary, and will approach similar problems in compatible ways
    • members typically cite the same books or papers (or books and papers that say the same things as the ones most people cite)
    • members often refer to the same stories / myths and the same justifications for their practices
  • Even though there is variation from individual to individual, the thinking of the school is fairly comprehensive. It guides thinking about most areas of the job, such as:
    • how to analyze a product
    • what test techniques would be useful
    • how to decide that X is an example of good work or an example of weak work (or not an example of either)
    • how to interpret test results
    • how much troubleshooting of apparent failures to do, why, and how much of that troubleshooting should be done by the testers
    • how to staff a test team
    • how to train testers and what they should be trained in
    • what skills (and what level of skill diversity) are valuable on the team
    • how to budget, how to reach agreements with others (management, programmers) on scope of testing, goals of testing, budget, release criteria, metrics, etc.
  • To my way of thinking, a school also guides thought in terms of how you should interact with peers
    • what kinds of argument are polite and appropriate in criticizing others’ work
    • what kinds of evidence are persuasive
    • they provide forums for discussion among school members, helping individuals refine their understanding and figure out how to solve not-yet-solved puzzles
  • I also see schools as proselytic
    • they think their view is right
    • they think you should think their view is right
    • they promote their view
  • I think that the public face of many schools is the face(s) of the identified leader(s). Bret’s attempt to characterize schools in terms of human representatives was intended as constructive and respectful to the schools involved.

I don’t think the testing community maps perfectly to this. For example (as the biggest example, in my view), very few people are willing to identify themselves as leaders or members of the other (other than context-driven) schools of testing. (I think Agile TDD is another school (the fifth of Bret’s four) and that there are clear thought-leaders there, but I’m not sure that they’ve embraced the idea that they are a school either.) Despite that, I think the notion of division into competing schools is a useful heuristic.

At my time of writing, I think the best breakdown of the schools is:

  • Factory school: emphasis on reduction of testing tasks to routines that can be automated or delegated to cheap labor.
  • Control school: emphasis on standards and on processes that enforce or rely heavily on those standards.
  • Test-driven school: emphasis on code-focused testing by programmers.
  • Analytical school: emphasis on analytical methods for assessing the quality of the software, including improvement of testability by improved precision of specifications and many types of modeling.
  • Context-driven school: emphasis on adapting to the circumstances under which the product is developed and used.

I think this division helps me interpret some of what I read in articles and what I hear at conferences. I think it helps me explain–or at least rationally characterize–differences to people who I’m coaching or training, who are just becoming conscious of the professional-level discussions in the field.
Acknowledging the imperfect mapping, it’s still interesting to ask, as I read something from someone in the field, whether it fits in any of the groups I think of as a school and if so, whether it gives me insight into any of that school’s answers to the issues in the list above–and if so, whether that insight tells me more that I should think about for my own approach (along with giving me better ways to talk about or talk to people who follow the other approach).
When Bret first started giving talks about 4 schools of software testing, several people reacted negatively:

  • they felt it was divisive
  • they felt that it created a bad public impression because it would be better for business (for all consultants) if it looks as though we all agree on the basics and therefore we are all experts whose knowledge and experience can be trusted
  • they felt that it was inappropriate because competent testers all pretty much agree on the fundamentals.

One of our colleagues (a friend) chastised us for this competing-schools analysis. I think he thought that we were saying this for marketing purposes, that we actually agreed with everyone else on the fundamentals and knew that we all agreed on the fundamentals. We assured him that we weren’t kidding. We might be wrong, but we were genuinely convinced that the field faced powerful disagreements even on the very basics. Our colleague decided to respond with a survey that asked a series of basic questions about testing. Some questions checked basic concepts. Others identified a situation and asked about the best course of action. He was able to get a fairly broad set of responses from a diverse group of people, many (most?) of them senior testers. The result was to highlight the broad disagreement in the field. He chose not to publish (I don’t think he was suppressing his results; I think it takes a lot of work to go from an informal survey to something formal enough to publish). Perhaps someone doing an M.Sc. thesis in the field would like to follow this up with a more formally controlled survey of an appropriately stratified sample across approaches to testing, experience levels, industry and perhaps geographic location. But until I see something better, what I saw in the summary of results given to me looked consistent with what I’ve been seeing in the field for 23 years–we have basic, fundamental, foundational disagreements about the nature of testing, how to do it, what it means to test, who our clients are, what our professional responsibilities are, what educational qualifications are appropriate, how to research the product, how to identify failure, how to report failure, what the value of regression testing is, how to assess the value of a test, etc., etc.
So is there anything we can learn from these differences?

  • One of the enormous benefits of competing schools is that they create a dialectic.
    • It is one thing for theoreticians who don’t have much influence in a debate to characterize the arguments and positions of other people. Those characterizations are descriptive, perhaps predictive. But they don’t drive the debate. I think this is the kind of school characterization that was being attacked on Volokh’s page.
    • It’s very different for someone in a long term debate to frame their position as a contrast with others and invite response. If the other side steps up, you get a debate that sharpens the distinctions, brings into greater clarity the points of agreement, and highlights the open issues that neither side is confident in. It also creates a collection of documented disagreements, documented conflicting predictions and therefore a foundation for scientific research that can influence the debate. This is what I saw in psychology (first-hand, watching leaders in the field structure their work around controversy).
    • We know full well that we’re making mistakes in our characterization of the other views. We aren’t intentionally making mistakes, and we correct the ones that we finally realize are mistakes, but nevertheless, we have incorrect assumptions and conclusions within our approach to testing and in our understanding of the other folks’ approaches to testing. Everybody else in the field is making mistakes too. The stronger the debate — I don’t mean the nastier, I mean the more seriously we do it, the better we research and/or consider our responses, etc. — the more of those mistakes we’ll collectively bring to the surface and the more opportunities for common ground we will discover. That won’t necessarily lead to the One True School ever, because some of the differences relate to key human values (for example, I think some of us have deep personal-values differences in our notions of the relative worth of the individual human versus the identified process). But it might lead to the next generation of debate, where a few things are indeed accepted as foundational by everyone and the new disagreements are better informed, with a better foundation of data and insight.
  • The dialectic doesn’t work very well if the other side won’t play.
    • Pretending that there aren’t deep disagreements won’t make them go away
  • Even if the other side won’t engage, the schools-approach creates a set of organizing heuristics. We have a vast literature. There are over 1000 theses and dissertations in the field. There are conferences, magazines, journals, lots of books, and new links (e.g. TDD) with areas of work previously considered separate. It’s not possible to work through that much material without imposing simplifying structures. The four-schools (I prefer five-schools) approach provides one useful structure. (“All models are wrong; some models are useful.” — George Box)

It’s 4:27 a.m. Reading what I’ve written above, I think I’m starting to ramble and I know that I’m running out of steam, so I’ll stop here.

Well, one last note.

It was never our intent to use the “schools” notion to demean other people in the field. What we are trying to do is to capture commonalities (of agreement and disagreement).

  • For example, I think there are a lot of people who really like the idea that some group (requirements analysts, business analysts, project managers, programmers) agrees on an authoritative specification, that the proper role of Testing is to translate the specification to powerful test cases, automate the tests, run them as regression tests, and report metrics based on these runs.
  • Given that there are a lot of people who share this view, I’d like to be able to characterize the view and engage it, without having to address the minor variations that come up from person to person. Call the collective view whatever you want. The key issues for me are:
    • Many people do hold this view
    • Within the class of people who hold this view, what other views do they hold or disagree with?
    • If I am mischaracterizing the view, it is better for me to lay it out in public and get corrected than to push it more privately to my students and clients and not get corrected
  • The fact is, certainly, that I think this view is defective. But I didn’t need to characterize it as a school to think it is often (i.e. in many contexts) a deeply misguided approach to testing. Nor do I need to set it up as a school to have leeway to publicly criticize it.
  • The value is getting the clearest and most authoritatively correct expression of the school that we can, so that if/when we want to attack it or coach it into becoming something else, we have the best starting point and the best peer review process that we can hope to achieve.

Comments are welcome. Maybe we can argue this into a better article.


Assessment Objectives. Part 3: Adapting the Anderson & Krathwohl taxonomy for software testing

December 9th, 2006

I like the Anderson / Krathwohl approach (simple summaries here and here and here). One of the key improvements in this update to Bloom’s taxonomy was the inclusion of different types of knowledge (The Knowledge Dimension) as well as different levels of knowledge (The Cognitive Process Dimension).

The original Anderson & Krathwohl model is a two-dimensional grid:

  • The Cognitive Process Dimension (columns): Remember, Understand, Apply, Analyze, Evaluate, Create
  • The Knowledge Dimension (rows): Factual knowledge, Conceptual knowledge, Procedural knowledge, Metacognitive knowledge

This is a useful model, but there are things we learn as software testers that don’t quite fit in the knowledge categories. After several discussions with James Bach, I’ve adopted the following revision as the working model that I use:

The Anderson & Krathwohl model modified for software testing keeps the same Cognitive Process Dimension (Remember, Understand, Apply, Analyze, Evaluate, Create) but expands the Knowledge Dimension to:

  • Facts
  • Concepts
  • Procedures
  • Cognitive strategies
  • Models
  • Skills
  • Attitudes
  • Metacognition

Here are my working definitions / descriptions:

Facts

  • A “statement of fact” is a statement that can be unambiguously proved true or false. For example, “James Bach was born in 1623” is a statement of fact. (But not true, for the James Bach we know and love.) A fact is the subject of a true statement of fact.
  • Facts include such things as:
    • Tidbits about famous people
    • Famous examples (the example might also be relevant to a concept, procedure, skill or attitude)
    • Items of knowledge about devices (for example, a description of an interoperability problem between two devices)

Concepts

  • A concept is a general idea. “Concepts are abstract in that they omit the differences of things in their extension, treating them as if they were identical.” (wikipedia: Concept).
  • In practical terms, we treat the following kinds of things as “concepts” in this taxonomy:
    • definitions
    • descriptions of relationships between things
    • descriptions of contrasts between things
    • description of the idea underlying a practice, process, task, heuristic (whatever)
  • Here’s a distinction that you might find useful.
    • Consider the oracle heuristic, “Compare the behavior of this program with a respected competitor and report a bug if this program’s behavior seems inconsistent with and possibly worse than the competitor’s.” (A short Python sketch of applying this heuristic follows this list.)
      • If I am merely describing the heuristic, I am giving you a concept.
      • If I tell you to make a decision based on this heuristic, I am giving you a rule.
    • Sometimes, a rule is a concept.
    • A rule is an imperative (“Stop at a red light”) or a causal relationship (“Two plus two yields four”) or a statement of a norm (“Don’t wear undershorts outside of your pants at formal meetings”).
    • The description / definition of the rule is the concept
    • Applying the rule in a straightforward way is application of a concept
    • The decision to puzzle through the value or applicability of a rule is in the realm of cognitive strategies.
    • The description of a rule in a formalized way is probably a model.
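
Here is a minimal Python sketch of applying the comparable-product oracle heuristic mentioned above. The two rounding functions are hypothetical stand-ins for “our program” and “the respected competitor”:

import math

# A minimal sketch of the comparable-product oracle heuristic: compare our
# program's behavior with a respected competitor's and flag disagreements as
# candidate bugs. Both rounding functions are hypothetical stand-ins.

def our_round(value: float) -> int:
    """'Our program': Python's built-in round(), which rounds halves to even."""
    return round(value)

def competitor_round(value: float) -> int:
    """'The respected competitor': rounds halves away from zero."""
    return int(math.floor(value + 0.5)) if value >= 0 else int(math.ceil(value - 0.5))

def comparable_product_oracle(inputs):
    """Yield inputs where the two products disagree; each one is a candidate
    bug to investigate, not an automatic verdict -- the heuristic is fallible."""
    for x in inputs:
        ours, theirs = our_round(x), competitor_round(x)
        if ours != theirs:
            yield x, ours, theirs

for x, ours, theirs in comparable_product_oracle([0.5, 1.5, 2.4, 2.5, -1.5]):
    print(f"Candidate bug: round({x}) -> ours={ours}, competitor={theirs}")

A disagreement flagged by this oracle is only a candidate bug; deciding whether the difference matters, and which product is wrong, is the kind of judgment this taxonomy files under cognitive strategies.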

Procedures

  • “Procedures” are algorithms. They include a reproducible set of steps for achieving a goal.
  • Consider the task of reporting a bug. Imagine that someone has
    • broken this task down into subtasks (simplify the steps, look for more general conditions, write a short descriptive summary, etc.) and
    • presented the tasks in a sequential order.
  • This description is intended as a procedure if the author expects you to do all of the steps in exactly this order every time.
  • This description is a cognitive strategy if it is meant to provide a set of ideas to help you think through what you have to do for a given bug, with the understanding that you may do different things in different orders each time, but find this a useful reference point as you go.

Cognitive Strategies

  • “Cognitive strategies are guiding procedures that students can use to help them complete less-structured tasks such as those in reading comprehension and writing. The concept of cognitive strategies and the research on cognitive strategies represent the third important advance in instruction.
    “There are some academic tasks that are “well-structured.” These tasks can be broken down into a fixed sequence of subtasks and steps that consistently lead to the same goal. The steps are concrete and visible. There is a specific, predictable algorithm that can be followed, one that enables students to obtain the same result each time they perform the algorithmic operations. These well-structured tasks are taught by teaching each step of the algorithm to students. The results of the research on teacher effects are particularly relevant in helping us learn how to teach students algorithms they can use to complete well-structured tasks.
    “In contrast, reading comprehension, writing, and study skills are examples of less-structured tasks — tasks that cannot be broken down into a fixed sequence of subtasks and steps that consistently and unfailingly lead to the goal. Because these tasks are less-structured and difficult, they have also been called higher-level tasks. These types of tasks do not have the fixed sequence that is part of well-structured tasks. One cannot develop algorithms that students can use to complete these tasks.”
    Gleefully pilfered from: Barak Rosenshine, Advances in Research on Instruction, Chapter 10 in J.W. Lloyd, E.J. Kameenui, and D. Chard (Eds.) (1997), Issues in educating students with disabilities, Mahwah, N.J.: Lawrence Erlbaum, pp. 197-221.
  • In cognitive strategies, we include:
    • heuristics (fallible but useful decision rules)
      • guidelines (fallible but common descriptions of how to do things)
      • good (rather than “best”) practices
  • The relationship between cognitive strategies and models:
    • deciding to apply a model and figuring out how to apply a model involve cognitive strategies
    • deciding to create a model and figuring out how to create models to represent or simplify a problem involve cognitive strategies

    BUT

    • the model itself is a simplified representation of something, done to give you insight into the thing you are modeling.
    • We aren’t sure that the distinction between models and the use of them is worthwhile, but it seems natural to us so we’re making it.

Models

  • A model is
    • A simplified representation created to make some aspects of the modeled object or system easier to understand, manipulate, or predict.
    • Expression of something we don’t understand in terms of something we (think we) understand.
  • A state-machine representation of a program is a model.
  • Deciding to use a state-machine representation of a program as a vehicle for generating tests is a cognitive strategy (see the sketch after this list).
  • Slavishly following someone’s step-by-step catalog of best practices for generating a state-machine model of a program in order to derive scripted test cases for some fool to follow is a procedure.
  • This definition of a model is a concept.
  • The assertion that Harry Robinson publishes papers on software testing and models is a statement of fact.
  • Sometimes, a rule is a model.
    • A rule is an imperative (“Stop at a red light”) or a causal relationship (“Two plus two yields four”) or a statement of a norm (“Don’t wear undershorts outside of your pants at formal meetings”).
    • A description / definition of the rule is probably a concept.
    • A symbolic or generalized description of a rule is probably a model.
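
To make the model / strategy / procedure distinction concrete, here is a minimal sketch of a toy state-machine model of a login dialog, together with a small generator that walks the model to enumerate candidate test sequences. This is our illustration, not drawn from Harry Robinson's papers or any other published source; the dialog, its states, its events, and the function names are assumptions made up for this example. The transition table is the model; choosing to derive tests by walking such a model is a cognitive strategy; mechanically following someone else's fixed recipe for doing so would be a procedure.

    # A toy state-machine model (hypothetical login dialog, made up for this example).
    # transitions[state][event] -> next state
    transitions = {
        "logged_out": {"enter_valid_credentials": "logged_in",
                       "enter_bad_credentials": "locked_warning"},
        "locked_warning": {"enter_valid_credentials": "logged_in",
                           "enter_bad_credentials": "locked_out"},
        "logged_in": {"log_out": "logged_out"},
        "locked_out": {},
    }

    def test_sequences(start, max_depth=3):
        """Enumerate event sequences up to max_depth by walking the model.
        Each sequence is a candidate test: drive the program through these
        events and check that it ends in the state the model predicts."""
        frontier = [(start, [])]
        for _ in range(max_depth):
            next_frontier = []
            for state, path in frontier:
                for event, next_state in transitions[state].items():
                    sequence = path + [event]
                    yield sequence, next_state
                    next_frontier.append((next_state, sequence))
            frontier = next_frontier

    if __name__ == "__main__":
        for sequence, expected in test_sequences("logged_out"):
            print(" -> ".join(sequence), "| expected final state:", expected)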

Skills

  • Skills are things that improve with practice.
    • Effective bug report writing is a skill, and includes several other skills.
    • Taking a visible failure and varying your test conditions until you find a simpler set of conditions that yields the same failure is skilled work. You get better at this type of thing over time (see the sketch after this list).
  • Entries into this section will often be triggered by examples (in instructional materials) that demonstrate skilled work, like “Here’s how I use this technique” or “Here’s how I found that bug.”
  • The “here’s how” might be classed as a:
    • procedure
    • cognitive strategy, or
    • skill
  • In many cases, it would be accurate and useful to class it as both a skill and a cognitive strategy.
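
As a deliberately simplified illustration of the failure-simplification skill mentioned above, here is a minimal sketch of a greedy loop that drops test conditions one at a time for as long as the failure still reproduces. The condition names and the still_fails check are hypothetical stand-ins for re-running the test against the real program; the skill lies in choosing what to vary and in interpreting the results, which this loop does not capture.

    def simplify(conditions, still_fails):
        """Greedily drop conditions that are not needed to reproduce the failure."""
        current = list(conditions)
        changed = True
        while changed:
            changed = False
            for condition in list(current):
                trial = [c for c in current if c != condition]
                if still_fails(trial):  # failure still reproduces without this condition
                    current = trial
                    changed = True
        return current

    if __name__ == "__main__":
        # Toy oracle: pretend the failure needs only these two conditions.
        def still_fails(conditions):
            return {"unicode filename", "network drive"}.issubset(conditions)

        start = ["unicode filename", "network drive", "read-only file",
                 "long path", "admin user"]
        print(simplify(start, still_fails))  # ['unicode filename', 'network drive']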

Attitudes

  • “An attitude is a persisting state that modifies an individual’s choices of action.” Robert M. Gagne, Leslie J. Briggs & Walter W. Wager (1992) “Principles of Instructional Design” (4th Ed), p. 48.
  • Attitudes are often based on beliefs (a belief is a proposition that is held as true whether or not it has been verified). Instructional materials often attempt to influence the student’s attitudes.
  • For example, when we teach students that complete testing is impossible, we might spin the information in different ways to influence student attitudes toward their work:
    • given the impossibility, testers must be creative and must actively consider what they can do at each moment that will yield the highest informational return for their project
    • given the impossibility, testers must conform to the carefully agreed procedures because these reflect agreements reached among the key stakeholders rather than diverting their time to the infinity of interesting alternatives
  • Attitudes are extremely controversial in our field and refusal to acknowledge legitimate differences (or even the existence of differences) has been the source of a great deal of ill will.
  • In general, if we identify an attitude or an attitude-related belief as something to include as an assessable item, we should expect to create questions that:
    • define the item without requiring the examinee to agree that it is true or valid
    • contrast it with a widely accepted alternative, without requiring the examinee to agree that it is better or preferable to the alternative
    • adopt it as the One True View, but with discussion notes that reference the controversy about this belief or attitude and make clear that this item will be accepted for some exams and bounced out of others.

Metacognition

  • Metacognition refers to the executive process that is involved in such tasks as:
    • planning (such as choosing which procedure or cognitive strategy to adopt for a specific task)
    • estimating how long it will take (or at least, deciding to estimate and figuring out what skill / procedure / slave-labor to apply to obtain that information)
    • monitoring how well you are applying the procedure or strategy
    • remembering a definition or realizing that you don’t remember it and rooting through Google for an adequate substitute
  • Much of context-driven testing involves metacognitive questions:
    • which test technique would be most useful for exposing what information, of what interest, to whom?
    • what areas are most critical to test next, in the face of this information about risks, stakeholder priorities, available skills, available resources?
  • Questions / issues that should get you thinking about metacognition are:
    • How to think about …
    • How to learn about …
    • How to talk about …
  • In the BBST course, the section on specification analysis includes a long metacognitive digression into active reading and strategies for getting good information value from the specification fragments you encounter, search for, or create.

Assessment Objectives. Part 2: Anderson & Krathwohl’s (2001) update to Bloom’s taxonomy

December 6th, 2006

Bloom’s taxonomy has been a cornerstone of instructional planning for 50 years. But there have been difficult questions about how to apply it.

The Bloom commission presented 6 levels of (cognitive) knowledge:

  • Knowledge (for example, can state or identify facts or ideas)
  • Comprehension (for example, can summarize ideas, restate them in other words, compare them to other ideas)
  • Application (for example, can use the knowledge to solve problems)
  • Analysis (for example, can identify patterns, identify components and explain how they connect to each other)
  • Synthesis (for example, can relate different things to each other, combine ideas to produce an explanation)
  • Evaluation (for example, can weigh costs and benefits of two different proposals)

For example, I “know” a fact (“the world is flat”) and I can prove that I know it by saying it (“The world is flat”). But I also know a procedure (“Do these 48 steps in this order to replicate this bug”) and I can prove that I know it by, er, ah — maybe it’s easier for me to prove I know it by DOING it than by saying it. (Have you ever tried to DO “the world is flat?”) Is it the same kind of thing to apply your knowledge of a fact as your knowledge of a procedure? What about knowing a model? If knowing a fact lets me say something and knowing a procedure helps me do something, maybe knowing a model helps me predict something. Say = do = predict = know?

Similarly, think about synthesizing or evaluating these different things. Is the type and level of knowledge really the same — would we test people’s knowledge in the same way — for these different kinds of things?

Extensive discussion led to upgrades, such as Anderson & Krathwohl’s and Marzano’s.

Rather than ordering knowledge on one dimension, from easiest-to-learn to hardest, the new approaches look at different types of information (facts, procedures, etc.) as well as different levels of knowledge (remember, apply, etc.).

I find the Anderson / Krathwohl approach (simple summaries here and here and here) more intuitive and easier to apply (YMMV, but that’s how it works for me…). Their model looks like this:

The Knowledge Dimension   | The Cognitive Process Dimension
                          | Remember | Understand | Apply | Analyze | Evaluate | Create
Factual knowledge         |          |            |       |         |          |
Conceptual knowledge      |          |            |       |         |          |
Procedural knowledge      |          |            |       |         |          |
Metacognitive knowledge   |          |            |       |         |          |

Metacognitive knowledge is knowing how to learn something. For example, much of what we know about troubleshooting and debugging and active reading is metacognitive knowledge.

Coming soon in this sequence:

  • Extending Anderson/Krathwohl for evaluation of testing knowledge
  • Assessment activities for certification in light of the Anderson/Krathwohl taxonomy

Next assessment sequence: Multiple Choice Questions: Design & Content.

Assessment Objectives. Part 1: Bloom’s Taxonomy

November 24th, 2006

This is the first of a set of articles on Assessment Objectives.

My primary objective in restarting my blog was to support the Open Certification Project for Software Testing by providing ideas on how we can go about developing it and a public space for discussion of those (and alternative) ideas.

I’m not opposed to professional certification or licensing in principle. But I do have four primary concerns:

  1. Certification involves some type of assessment (exams, interviews, demonstration of skilled work, evaluation of work products, etc.). Does the assessment measure what it purports to measure? What level of knowledge or skill does it actually evaluate? Is it a fair evaluation? Is it a reasonably accurate evaluation? Is it an evaluation of the performance of the person being assessed?
  2. Certification is often done with reference to a standard. How broadly accepted is this standard? How controversial is it? How relevant is it to competence in the field? Sometimes, a certification is more of a political effort–a way of recruiting supporters for a standard that has not achieved general acceptance. If so, is it being done with integrity–are the people studying for certification and taking the assessment properly informed?
  3. Certifiers can have many different motives. Is this part of an honest effort to improve the field? A profit-making venture to sell training and/or exams? A political effort to recruit relatively naive practitioners to a viewpoint by investing them in a credential that they would otherwise be unlikely to adopt?
  4. The certifier and the people who have been certified often make claims about the meaning of the certification. A certification might be taken as significant by employers, schools, businesses or government agencies who are attempting to do business with a qualified contractor, or by other people who are considering becoming certified. Is the marketing (by the certifier or by the people who have been certified and want to market themselves using it) congruent with the underlying value of the certificate?

The Open Certification Project for Software Testing is in part a reaction to the state of certification in our field. That state is variable–the story is different for each of the certifications available. But I’m not satisfied with any of them, nor are several colleagues. The underlying goals of the Open Certification project are partially to create an alternative certification and partially to create competitive pressure on other providers, to encourage them to improve the value and/or the marketing of their certification(s).
For the next few articles, I want to consider assessment.

In 1948, a group of “college examiners” gathered at an American Psychological Association meeting and decided to try to develop a theoretical foundation for evaluating whether a person knows something, and how well. The key product of that group was Bloom’s (1956) Taxonomy (see Wikipedia, The Encyclopedia of Educational Technology, the National Teaching & Learning Forum, Don Clark’s page, Teacher Tap, or just ask Google for a wealth of useful stuff). The Bloom Committee considered how we could evaluate levels of cognitive knowledge (distinct from psychomotor and affective) and proposed six levels:

  • Knowledge (for example, can state or identify facts or ideas)
  • Comprehension (for example, can summarize ideas, restate them in other words, compare them to other ideas)
  • Application (for example, can use the knowledge to solve problems)
  • Analysis (for example, can identify patterns, identify components and explain how they connect to each other)
  • Synthesis (for example, can relate different things to each other, combine ideas to produce an explanation)
  • Evaluation (for example, can weigh costs and benefits of two different proposals)

It turns out to be stunningly difficult to assess a student’s level of knowledge. All too often, we think we are measuring one thing while we actually measure something else.

For example, suppose that I create an exam that asks students:

“What are the key similarities and differences between domain testing and scenario testing? Describe two cases, one better suited for domain analysis and the other better suited for scenario, and explain why.”

This is obviously an evaluation-level question, right? Well, maybe. But maybe not. Suppose that a student handed in a perfect answer to this question:

  • Knowledge. Maybe students saw this question in a study guide (or a previous exam), developed an answer while they studied together, then memorized it. (Maybe they published it on the Net.) This particular student has memorized an answer written by someone else.
  • Comprehension. Maybe students prepared a sample answer for this question, or saw this comparison online or in the textbook, or the teacher made this comparison in class (including the explanation of the two key examples), and this student learned the comparison just well enough to be able to restate it in her own words.
  • Application. Maybe the comparison was given in class (or in a study guide, etc.) along with the two “classic” cases (one for domain, one for scenario) but the student has had to figure out for himself why one works well for domain and the other for scenario. He has had to consider how to apply the test techniques to the situations.

These cases reflect a very common problem. How we teach, how our students study, and what resources our students study from will impact student performance–what they appear to know–on exams even if they don’t make much of a difference to the underlying competence–how well they actually know it.

The distinction between competence and performance is fundamental in educational and psychological measurement. It also cuts both ways. In the examples above, performance appears to reflect a deeper knowledge of the material. What I often see in my courses is that students who know the material well underperform (get poor grades on my exams) because they are unfamiliar with strictly-graded essay exams (see my grading videos video#1 and video#2 and slides) or with well-designed multiple choice exams. The extensive discussions of racial bias and cultural bias in standardized exams are another example of the competence/performance distinction–some groups perform less well on some exams because of details of the method of examination rather than because of a difference in underlying knowledge.

When we design an assessment for certification:

  • What level of knowledge does the assessment appear to get at?
  • Could someone who knows less or knows the material less deeply perform as well as someone who knows it at the level we are trying to evaluate?
  • Might someone who knows this material deeply perform less well than we expect (for example, because they see ambiguities that a less senior person would miss)?

In my opinion, an assessment is not well designed, and should not be used for serious work, if questions like these are not carefully considered in its design.

Coming soon in this sequence — Assessment Objectives:

  • Anderson, Krathwohl et al. (2001) update the 1956 Bloom taxonomy.
  • Extending Anderson/Krathwohl for evaluation of testing knowledge
  • Assessment activities for certification in light of the Anderson/Krathwohl taxonomy

Next assessment sequence — Multiple Choice Questions: Design & Content.