Archive for the ‘testing’ Category

Software error and the meltdown of US finances

Thursday, May 22nd, 2008

“LONDON, May 21 (Reuters) – A computer coding error led Moody’s Investors Service to assign incorrect triple-A ratings to a complex debt product that came to mark the peak of the credit boom, the Financial Times said on Wednesday.” (See www.forbes.com/reuters/feeds/reuters/2008/05/21/2008-05-21T075644Z_01_L21551923_RTRIDST_0_MOODYS-CPDOS.html. For more, see blogs.spectrum.ieee.org/riskfactor/2008/05/moodys_rating_bug_gives_credit.html and www.ft.com/cms/s/0/0c82561a-2697-11dd-9c95-000077b07658.html?nclick_check=1 or just search for Moody’s software error.)

This is the kind of stuff David Pels and I expected when we fought the Uniform Computer Information Transactions Act (UCITA) back in the 1990’s. UCITA was written as a shield for large software publishers, consulting firms, and other information publishers. It virtually wiped out liability for defects in information-industry products or services, expanded intellectual property rights well beyond what the Copyright Act and the patent laws provide, and helped companies find ways to expand their power to block reverse engineering of products to check for functional or security defects and to publicly report those defects.

UCITA was ultimately adopted only in Virginia and Maryland, rejected in all other American states, but largely imported into most states by judicial rulings (a fine example of “judicial activism”–imposing rules on a state even after its legislators rejected them. People who still whine about left-wing judicial activism are still stuck in the 1970’s).

David Pels and I wrote a book, “Bad Software,” on the law of software quality circa 1998. It provides a striking contrast between software customers’ rights in the 1990s and the vastly-reduced rights we have come to expect today, along with some background on the UCITA legislation (UCITA was then called “Article 2B”–as part of a failed effort to add a new Article to the Uniform Commercial Code). John Wiley originally published Bad Software, but they have let me post a free copy on the web. You can find it at http://www.badsoftware.com/wiki/

Four more presentations

Sunday, March 30th, 2008

“Adapting Academic Course Materials in Software Testing for Industrial Professional Development.” [SLIDES] Colloquium, Florida Institute of Technology, March 2008

The Association for Software Testing and I have been adapting the BBST course for online professional development. This presentation updates my students and colleagues at work on what we’re doing to transfer fairly rigorous academic course materials and teaching methods to a practitioner audience.

These next three are reworkings of presentations I’ve given a few times before:

“Software testing as a social science,” [SLIDES] STEP 2000 Workshop on Software Testing, Memphis, May 2008.

Social sciences study humans, especially humans in society. The social scientist’s core question, for any new product or technology, is “What will be the impact of X on people?” Social scientists normally deal with ambiguous issues, partial answers, situationally specific results, diverse interpretations and values–and they often use qualitative research methods. If we think about software testing in terms of the objectives (why we test) and the challenges (what makes testing difficult) rather than the methods and processes, then I think testing is more like a social science than like programming or manufacturing quality control. As with all social sciences, tools are important. But tools are what we use, not why we use them.

“The ongoing revolution in software testing,” [SLIDES] October 2007

My intent in this talk is to challenge an orthodoxy in testing, a set of commonly accepted assumptions about our mission, skills, and constraints, including plenty that seemed good to me when I published them in 1988, 1993 or 2001. Surprisingly, some of the old notions lost popularity in the 1990’s but came back under new marketing with the rise of eXtreme Programming.

I propose we embrace the idea that testing is an active, skilled technical investigation. Competent testers are investigators (clever, sometimes mischievous researchers), active learners who dig up information about a product or process just as that information is needed.

I think that

  • views of testing that don’t portray testing this way are obsolete and counterproductive for most contexts and
  • educational resources for testing that don’t foster these skills and activities are misdirected and misleading.

“Software-related measurement: Risks and opportunities,” [SLIDES] October 2007

I’ve seen published claims that only 5% of software companies have metrics programs. Why so low? Are we just undisciplined and lazy? Most managers I know have tried at least one measurement program–and abandoned it, because so many programs do more harm than good, at a high cost. This session has three parts:

  1. Measurement theory and how it applies to software development metrics (which, at their core, are typically human performance measures).
  2. A couple of examples of qualitative measurements that can drive useful behavior.
  3. (Consideration of client’s particular context–deleted.)

A few new papers and presentations

Friday, March 28th, 2008

I just posted a few papers to kaner.com. Here are the links and some notes:

Cem Kaner & Stephen J. Swenson, “Good enough V&V for simulations: Some possibly helpful thoughts from the law & ethics of commercial software.” Simulation Interoperability Workshop, Providence, RI, April 2008

What an interesting context for exploratory testers! Military application software that cannot be fully specified in advance, whose requirements and design will evolve continuously through the project. How should they test it? What is good enough? Stephen and I worked together to integrate ideas and references from both disciplines. There are a lot of very interesting papers on simulations, on the web, referenced in the bibliography.

Cem Kaner and Rebecca L. Fiedler, “A cautionary note on checking software engineering papers for plagiarism.” IEEE Transactions on Education, in press.

The Journal of the Association for Software Testing hasn’t published its first issue yet, and we’re now rethinking our editorial objectives (more on that after the AST Board Meeting in April). One of the reasons is that over half of the papers submitted to the Journal were plagiarized. I’ve found significant degrees of plagiarism while reviewing submissions for other conferences and journals, and there’s too much in graduate theses/dissertations as well. One of the problems for scholarly publications and research supervisors is that current plagiarism-detection tools seem to promise more than they deliver. (Oh surprise, a software service that oversells itself!) This is the first of a series of papers on that problem.

Cem Kaner, “Improve the power of your tests with risk-based test design.” [SLIDES] QAI QUEST Conference, Chicago, April 2008

Conference keynote. My usual party line.

Cem Kaner, “A tutorial in exploratory testing.” [SLIDES] QAI QUEST Conference, Chicago, April 2008

Conference tutorial. This is a broader set of my slides on exploration than some people have seen before.

Cem Kaner, “BBST: Evolving a course in black box software testing.” [SLIDES] BBST Project Advisory Board Meeting, January 2008

Rebecca Fiedler and I lead a project to adapt a course on black box software testing from its traditional academic setting to commercial and academic-online settings. Much of the raw material (free testing videos) is at www.testingeducation.org/BBST. This is a status report to the project’s advisory board. If you’re interested in collaborating in this project, these slides will provide a lot more detail.

Cem Kaner, “I speak for the user: The problem of agency in software development,” Quality Software & Testing Magazine, April 2007

Short magazine article. Testers often appoint themselves as The User Representative. So do lots of other people. Who should take this role and why?

Cem Kaner & Sowmya Padmanabhan, “Practice and transfer of learning in the teaching of software testing,” [SLIDES] Conference on Software Engineering Education & Training, Dublin, July 2007.

This is a summary of Sowmya’s excellent M.Sc. thesis, “Domain Testing: Divide and Conquer.”

7th Workshop on Teaching Software Testing, January 18-20, 2008

Saturday, October 13th, 2007

This year’s Workshop on Teaching Software Testing (WTST) will be January 18-20 in Melbourne, Florida.

WTST is concerned with the practical aspects of teaching university-caliber software testing courses to academic or commercial students.

This year, we are particularly interested in teaching testing online. How can we help students develop testing skills and foster higher-order thinking in online courses?

We invite participation by:

  • academics who have experience teaching testing courses
  • practitioners who teach professional seminars on software testing
  • academic or practitioner instructors with significant online teaching experience and wisdom
  • one or two graduate students
  • a few seasoned teachers or testers who are beginning to build their strengths in teaching software testing.

There is no fee to attend this meeting. You pay for your seat through the value of your participation. Participation in the workshop is by invitation based on a proposal. We expect to accept 15 participants with an absolute upper bound of 25.

WTST is a workshop, not a typical conference. Our presentations serve to drive discussion. The target readers of workshop papers are the other participants, not archival readers. We are glad to start from already-published papers, if they are presented by the author and they would serve as a strong focus for valuable discussion.

In a typical presentation, the presenter speaks 10 to 90 minutes, followed by discussion. There is no fixed time for discussion. Past sessions’ discussions have run from 1 minute to 3 hours. During the discussion, a participant might ask the presenter simple or detailed questions, describe consistent or contrary experiences or data, present a different approach to the same problem, or (respectfully and collegially) argue with the presenter. In 20 hours of formal sessions, we expect to cover six to eight presentations.

We also have lightning presentations, time-limited to 5 minutes (plus discussion). These are fun and they often stimulate extended discussions over lunch and at night.

Presenters must provide materials that they share with the workshop under a Creative Commons license, allowing reuse by other teachers. Such materials will be posted at http://www.wtst.org.

SUGGESTED TOPICS

There are few courses in software testing, but a large percentage of software engineering practitioners do test-related work as their main focus. Many of the available courses, academic and commercial, attempt to cover so much material that they are superficial and therefore ineffective for improving students’ skills or ability to analyze and address problems of real-life complexity. Online courses might, potentially, be a vehicle for providing excellent educational opportunities to a diverse pool of students.

Here are examples of ideas that might help us learn more about providing testing education online in ways that realize this potential:

  • Instructive examples: Have you tried teaching testing online? Can you show us some of what you did? What worked? What didn’t? Why? What can we learn from your experience?
  • Instructive examples from other domains: Have you tried teaching something else online and learned lessons that would be applicable to teaching testing? Can you build a bridge from your experience to testing?
  • Instructional techniques, for online instruction, that help students develop skill, insight, appreciation of models and modeling, or other higher-level knowledge of the field. Can you help us see how these apply to testing-related instruction?
  • Test-related topics that seem particularly well-suited to online instruction: Do you have a reasoned, detailed conjecture about how to bring a topic online effectively? Would a workshop discussion help you develop your ideas further? Would it help the other participants understand what can work online and how to make it happen?
  • Lessons learned teaching software testing: Do you have experiences from traditional teaching that seem general enough to apply well to the online environment?
  • Moving from Face-to-Face to Online Instruction – How does one turn a face-to-face class into an effective online class? What works? What needs to change?
  • Digital Backpack – Students and instructors bring a variety of tools and technologies to today’s fully online or web-enhanced classroom. Which tools do today’s teachers need? How can those tools be used? What about students?
  • The Scholarship of Teaching and Learning – How does one research one’s own teaching? What methods capture improved teaching and learning or reveal areas needing improvement? How is this work publishable to meet promotion and tenure requirements?
  • Qualitative Methods – From sloppy anecdotal reports to rigorous qualitative design. How can we use qualitative methods to conduct research on the teaching of computing, including software testing?

TO ATTEND AS A PRESENTER

Please send a proposal BY DECEMBER 1, 2007 to Cem Kaner that identifies who you are, what your background is, what you would like to present, how long the presentation will take, any special equipment needs, and what written materials you will provide. Along with traditional presentations, we will gladly consider proposed activities and interactive demonstrations.

We will begin reviewing proposals on November 1. We encourage early submissions. It is unlikely but possible that we will have accepted a full set of presentation proposals by December 1.

Proposals should be between two and four pages long, in PDF format. We will post accepted proposals to http://www.wtst.org.

We review proposals in terms of their contribution to knowledge of HOW TO TEACH software testing. Proposals that present a purely theoretical advance in software testing, with weak ties to teaching and application, will not be accepted. Presentations that reiterate materials you have presented elsewhere might be welcome, but it is imperative that you identify the publication history of such work.

By submitting your proposal, you agree that, if we accept your proposal, you will submit a scholarly paper for discussion at the workshop by January 7, 2008. Workshop papers may be of any length and follow any standard scholarly style. We will post these at http://www.wtst.org as they are received, for workshop participants to review before the workshop.

TO ATTEND AS A NON-PRESENTING PARTICIPANT:

Please send a message BY DECEMBER 1, 2007, to Cem Kaner that describes your background and interest in teaching software testing. What skills or knowledge do you bring to the meeting that would be of interest to the other participants?

ADVISORY BOARD MEETING

Florida Tech’s Center for Software Testing Education & Research has been developing a collection of hybrid and online course materials for teaching black box software testing. We now have NSF funding to adapt these materials for implementation by a broader audience. We are forming an Advisory Board to guide this adaptation and the associated research on the effectiveness of the materials in diverse contexts. The Board will meet before WTST, on January 17, 2008. If you are interested in joining the Board and attending the January meeting, please read this invitation and submit an application.

Acknowledgements

Support for this meeting comes from the Association for Software Testing and Florida Institute of Technology.

The hosts of the meeting are:

Research Funding and Advisory Board for the Black Box Software Testing (BBST) Course

Friday, October 12th, 2007

Summary: With some new NSF funding, we are researching and revising BBST to make it more available and more useful to more people around the world. The course materials will continue to be available for free. If you are interested in joining an advisory board that helps us set direction for the course and the research surrounding the course, please contact me, describing your background in software-testing-related education, in education-related research, and your reason(s) for wanting to join the Board.

Starting as a joint project with Hung Quoc Nguyen in 1993, I’ve done a lot of development of a broad set of course materials for black box software testing. The National Science Foundation approved a project (EIA-0113539 ITR/SY+PE “Improving the Education of Software Testers”) that evolved my commercial-audience course materials for an academic audience and researched learning issues associated with testing. The resulting course materials are at http://www.testingeducation.org/BBST, with lots of papers at http://www.testingeducation.org/articles and http://kaner.com/?page_id=7. The course materials are available for everyone’s use, for free, under a Creative Commons license.

During that research, I teamed up with Rebecca Fiedler, an experienced teacher (now an Assistant Professor of Education at St. Mary-of-the-Woods College in Terre Haute, Indiana, and also now my wife). The course that Rebecca and I evolved turned traditional course design inside out in order to encourage students’ involvement, skill development and critical thinking. Rather than using class time for lectures and students’ private time for activities (labs, assignments, debates, etc.), we videotaped the lectures and required students to watch them before coming to class. We used class time for coached activities centered more on the students than the professor.

This looked like a pretty good teaching approach, our students liked it, and the National Science Foundation funded a project to extend this approach to developing course materials on software engineering ethics in 2006. (If you would like to collaborate with us on this project, or if you are a law student interested in a paid research internship, contact Cem Kaner.)

Recently, the National Science Foundation approved Dr. Fiedler’s and my project to improve the BBST course itself, “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.” With funding running from October 1, 2007 through 2010, our primary goals are:

  • develop and sustain a cadre of academic, in-house, and commercial instructors via:
    • creating and offering an instructor orientation course online;
    • establishing an ongoing online instructors’ forum; and
    • hosting a number of face-to-face instructor meetings
  • offer and evaluate the course at collaborating research sites (including both universities and businesses)
  • analyze several collections of in-class activities to abstract a set of themes / patterns that can help instructors quickly create new activities as needed; and
  • extend instructional support material including grading guides and a pool of exam questions for teaching the course.

All of our materials—such as videos, slides, exams, grading guides, and instructor manuals—are Creative Commons licensed. Most are available freely to the public. A few items designed to help instructors grade student work will be available at no charge, but only to instructors.

Several individuals and organizations have agreed to collaborate in this work, including:

  • AppLabs Technologies. Representatives: Geetha Narayanan, CSQA, PMP; Shyam Sunder Depuru.
  • Aztechsoft. Representative: Ajay Bhagwat.
  • The Association for Software Testing. Representative: Michael Kelly, President.
    • AST is breaking the course into several focused, online, mini-courses that run 1 month each. The courses are offered, for free, to AST members. AST is starting its second teaching of the Foundations course this week. We’ll teach Bug Advocacy in a month. As we develop these courses, we are training instructors who, after sufficient training, will teach the course(s) they are trained to teach for AST (free courses) as well as at their school or company (for free or fee, as they choose).
  • Dalhousie University. Representative: Professor Morven Gentleman.
  • Huston-Tillotson University, Computer Science Department. Representative: Allen M. Johnson, Jr., Ph.D.
  • Microsoft. Representative: Marianne Guntow.
  • PerfTest Plus. Representative: Scott Barber.
  • Quardev Laboratories. Representative: Jonathan Bach.
  • University of Illinois at Springfield, Computer Sciences Program. Representative: Dr. Keith W. Miller.
  • University of Latvia. Representative: Professor Juris Borzovs.

If you would like to collaborate on this project as well:

  1. Please read our research proposal.
  2. Please consider your ability to make a financial commitment. We are not asking for donations (well, of course, we would love to get donations, but they are not required) but you or your company would have to absorb the cost of travel to Board of Advisor meetings and you would probably come to the Workshop on Teaching Software Testing and/or the Conference of the Association for Software Testing. Additionally, teaching the course at your organization and collecting the relevant data would be at your expense. (My consultation to you on this teaching would be free, but if you needed me to fly to your site, that would be at your expense and might involve a fee.) We have a little bit of NSF money to subsidize travel to Board of Advisor meetings ($15,000 total for the three years) so we can subsidize travel to a small degree. But it is very limited, and especially little is available for corporations.
  3. Please consider your involvement. What do you want to do?
    • Join the advisory board, help guide the project?
    • Collaborate on the project as a fellow instructor (and get instructor training)?
    • Come to the Workshop on Teaching Software Testing?
    • Help develop a Body of Knowledge to support the course materials?
    • Participate as a lecturer or on-camera discussant on the video courses?
    • Other stuff, such as …???
  4. Send me a note that covers 1-3, introduces you, and describes your background and interest.

The first meeting of the Advisory Board is January 17, 2008, in Melbourne, Florida. We will host the Workshop on Teaching Software Testing (WTST 2008) from January 18-20. I’ll post a Call for Participation for WTST 2008 on this blog tomorrow.

Schools of software testing

Friday, December 22nd, 2006

Every few months, someone asks why James Bach, Bret Pettichord, and I discuss the software testing community in terms of “schools” or suggests that the idea is misguided, arrogant, divisive, inaccurate, or otherwise A Bad Thing. The most recent discussion is happening on the software-testing list (the context-driven testing school’s list on yahoogroups.com). It’s time that I blogged a clarifying response.
Perhaps in 1993, I started noticing that many test organizations (and many test-related authors, speakers and consultants, including some friends and other colleagues I respected) relied heavily — almost exclusively — on one or two main testing techniques. In my discussions with them, they typically seemed unaware of other techniques or uninterested in them.

  • For example, many testers talked about domain testing (boundary and equivalence class analysis) as the fundamental test design technique. You can generalize the method far beyond its original scope (analysis of input fields) to a strategy for reducing the infinity of potential tests to a manageably small, powerful subset via a stratified sampling strategy (see the printer testing chapter of Testing Computer Software for an example of this type of analysis). This is a compelling approach–it yields well-considered tests and enough of them to keep you busy from the start to the end of the project.
  • As a competing example, many people talked about scenario testing as the fundamental approach. They saw domain tests as mechanical and relatively trivial. Scenarios went to the meaning of the specification (where there was one) and to the business value and business risk of the (commercial) application. You needed subject matter experts to do really good scenario testing. To some of my colleagues, this greater need for deep understanding of the application was proof in its own right that scenario testing was far more important than the more mechanistic-seeming domain testing.
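
To make the domain-testing idea in the first bullet concrete, here is a minimal Python sketch of boundary and equivalence class analysis for a single input field. It is only an illustration under stated assumptions, not code from Testing Computer Software: the `IntField` class, the `domain_tests` helper, and the "copies" field with its 1 to 99 range are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntField:
    """Hypothetical integer input field that accepts low..high inclusive."""
    name: str
    low: int
    high: int


def domain_tests(field: IntField) -> list[tuple[int, bool]]:
    """Return (value, should_be_accepted) pairs for one field.

    One representative per equivalence class (below range, in range,
    above range), plus the values on and just outside each boundary.
    """
    mid = (field.low + field.high) // 2
    candidates = [
        field.low - 1,   # just below the lower boundary: invalid class
        field.low,       # lower boundary: valid class
        mid,             # interior representative of the valid class
        field.high,      # upper boundary: valid class
        field.high + 1,  # just above the upper boundary: invalid class
    ]
    # Remove duplicates (e.g. when low == high) while keeping order.
    unique = list(dict.fromkeys(candidates))
    return [(v, field.low <= v <= field.high) for v in unique]


# Hypothetical example: a "number of copies" field that accepts 1..99.
copies = IntField(name="copies", low=1, high=99)
for value, accepted in domain_tests(copies):
    print(f"{copies.name}={value}: expect {'accept' if accepted else 'reject'}")
```

The point of the sketch is the sampling: five values stand in for the whole range of integers the field might receive. A scenario tester would object that nothing here says whether printing 99 copies matters to anyone, which is exactly the disagreement described above.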

In 1995, James Bach and I met face-to-face for the first time. (We had corresponded by email for a long time, but never met.) We ended up spending half a day in a coffee shop at the Dallas airport comparing notes on testing strategy. He too had been listing these dominating techniques and puzzling over the extent to which they seemed to individually dominate the test-design thinking of many colleagues. James was putting together a list that he would soon publish in the first draft of the Satisfice Heuristic Test Strategy Model. I was beginning to use the list for consciousness-raising in my classes and consulting, encouraging people to add one or two more techniques to their repertoire.
James and I were pleasantly shocked to discover that our lists were essentially the same. Our names for the techniques were different, but the list covered the same approaches and we had both seen the same tunnel-vision (one or more companies or groups that did reasonably good work–as measured by the quality of bugs they were finding–that relied primarily on just this one technique, for each of the techniques). I think it was in that discussion that I suggested a comparison to Thomas Kuhn’s notion of paradigms (for a summary, read this article in the Stanford Encyclopedia of Philosophy), which we studied at some length in my graduate school (McMaster University, Psychology Ph.D. program).
The essence of a paradigm is that it creates (or defines) a mainstream of thinking, providing insights and direction for future research or work. It provides a structure for deciding what is interesting, what is relevant, what is important–and implicitly, it defines limits, what is not relevant, not particularly interesting, maybe not possible or not wise. The paradigm creates a structure for solving puzzles and people who solve the puzzles seen as important in the field are highly respected. Scientific paradigms often incorporate paradigmatic cases–exemplars–especially valuable examples that serve as models for future work or molds for future thought. At that meeting in 1995, or soon after that, we concluded that the set of dominant test techniques we were looking at could be thought of as paradigms for some / many people in the field.
This idea wouldn’t make sense in a mature science, because there is one dominating paradigm that creates a common (field-wide) vocabulary, a common sense of the history of the field, and a common set of cherished exemplars. However, in less mature disciplines that have not reached consensus, fragmentation is common. Here’s how the Stanford Encyclopedia of Philosophy summarizes it:

“In the postscript to the second edition of The Structure of Scientific Revolutions Kuhn says of paradigms in this sense that they are “the most novel and least understood aspect of this book” (1962/1970a, 187). The claim that the consensus of a disciplinary matrix is primarily agreement on paradigms-as-exemplars is intended to explain the nature of normal science and the process of crisis, revolution, and renewal of normal science. It also explains the birth of a mature science. Kuhn describes an immature science, in what he sometimes calls its ‘pre-paradigm’ period, as lacking consensus. Competing schools of thought possess differing procedures, theories, even metaphysical presuppositions. Consequently there is little opportunity for collective progress. Even localized progress by a particular school is made difficult, since much intellectual energy is put into arguing over the fundamentals with other schools instead of developing a research tradition. However, progress is not impossible, and one school may make a breakthrough whereby the shared problems of the competing schools are solved in a particularly impressive fashion. This success draws away adherents from the other schools, and a widespread consensus is formed around the new puzzle-solutions.”

James and I weren’t yet drawn to the idea of well-defined schools, because we see schools as having not only a shared perspective but also a missionary character (more on that later) and what we were focused on was the shared perspective. But the idea of competing schools allowed us conceptual leeway for thinking about, and describing, what we were seeing. We talked about it in classes and small meetings and were eventually invited to present it (Paradigms of Black Box Software Testing) as a pair of keynotes at the 16th International Conference and Exposition on Testing Computer Software (1999) and then at STAR West (1999).
At this point (1999), we listed 10 key approaches to testing:

  • Domain driven
  • Stress driven
  • Specification driven
  • Risk driven
  • Random / statistical
  • Function
  • Regression
  • Scenario / use case / transaction flow
  • User testing
  • Exploratory

There’s a lot of awkwardness in this list, but our intent was descriptive and this is what we were seeing. When people would describe how they tested to us, their descriptions often focused on only one (or two, occasionally three) of these ten techniques, and they treated the described technique(s), or key examples of it, as guides for how to design tests in the future. We were trying to capture that.
At this point, we weren’t explicitly advocating a school of our own. We had a shared perspective: we were talking about teaching people to pick their techniques based on their project’s context, and James’ Heuristic Test Strategy Model (go here for the current version) made this explicit. But we were still early in our thinking about it. In contrast, the list of dominating techniques, with minor evolution, captured a significant pattern in our observations over perhaps 7-10 years of each of our careers.
I don’t think that many people saw this work as divisive or offensive — some did and we got some very harsh feedback from a few people. Others were intrigued or politely bored.
Several people were confused by it, not least because the techniques on this list were far from mutually exclusive. For example, you can apply a domain-driven analysis of the variables of individual functions in an exploratory way. Is this domain testing, function testing or exploratory testing?

  • One answer is “yes” — all three.
  • Another answer is that it is whichever technique is dominant in the mind of the tester who is doing this testing.
  • Another answer is, “Gosh, that is confusing, isn’t it? Maybe this model of “paradigms” isn’t the right subdivider of the diverging lines of testing thought.”

Over time, our thinking shifted about which answer was correct. Each of these would have been my answer — at different times. (My current answer is the third one.)
Another factor that was puzzling us was the weakness of communication among leaders in the field. At conferences, we would speak the same words but with different meanings. Even the most fundamental terms, like “test case,” carried several different meanings–and we weren’t acknowledging the differences or talking about them. Many speakers would simply assume that everyone knew what term X meant, agreed with that definition, and agreed with whatever practices were impliedly good that went with those definitions. The result was that we often talked past each other at conferences, disagreeing in ways that many people in the field, especially relative newcomers, found hard to recognize or understand.
It’s easy to say that all testing involves analysis of the program, evaluation of the best types of tests to run, design of the tests, execution, skilled troubleshooting, and effective communication of the results. Analysis. Evaluation. Design. Test. Execution. Troubleshooting. Effective Communication. We all know what those words mean, right? We all know what good analysis is, right? So, basically, we all agree, right?
Well, maybe not. We can use the same words but come up with different analyses, different evaluations, different tests, different ideas about how to communicate effectively, and so on.
Should we gloss over the differences, or look for patterns in them?
James Bach, Bret Pettichord and I muddled through this as we wrote Lessons Learned in Software Testing. We brought a variety of other people into the discussions but as I vaguely recall it, the critical discussions happened in the context of the book. Bret Pettichord put the idea into a first-draft presentation for the 2003 Workshop on Teaching Software Testing. He has polished it since, but it is still very much a work in progress.
I’m still not ready to publish my version because I haven’t finished the literature review that I’d want to publish with it. We were glad to see Bret go forward with his talks, because they opened the door for peer review that provides the foundation for more polished later papers.
The idea that there can be different schools of thought in a field is hardly a new one — just check the 1,180,000 search results you get from Google or the 1,230,000 results you get from Yahoo when you search for the quoted phrase “schools of thought”.
Not everyone finds the notion of divergent schools of thought a useful heuristic–for example, read this discussion of legal schools of thought. However, in studying, teaching and researching experimental psychology, the identification of some schools was extremely useful. Not everyone belonged to one of the key competing schools. Not every piece of research was driven by a competing-school philosophy. But there were organizing clusters of ideas with charismatic advocates that guided the thinking and work of several people and generated useful results. There were debates between leaders of different schools, sometimes very sharp debates, and those debates clarified differences, points of agreement, and points of open exploration.
As I think of schools of thought, a school of testing would have several desirable characteristics:

  • The members share several fundamental beliefs, broadly agree on vocabulary, and will approach similar problems in compatible ways
    • members typically cite the same books or papers (or books and papers that say the same things as the ones most people cite)
    • members often refer to the same stories / myths and the same justifications for their practices
  • Even though there is variation from individual to individual, the thinking of the school is fairly comprehensive. It guides thinking about most areas of the job, such as:
    • how to analyze a product
    • what test techniques would be useful
    • how to decide that X is an example of good work or an example of weak work (or not an example of either)
    • how to interpret test results
    • how much troubleshooting of apparent failures is appropriate, why, and how much of that troubleshooting the testers should do
    • how to staff a test team
    • how to train testers and what they should be trained in
    • what skills (and what level of skill diversity) are valuable on the team
    • how to budget, how to reach agreements with others (management, programmers) on scope of testing, goals of testing, budget, release criteria, metrics, etc.
  • To my way of thinking, a school also guides thought in terms of how you should interact with peers
    • what kinds of argument are polite and appropriate in criticizing others’ work
    • what kinds of evidence are persuasive
    • they provide forums for discussion among school members, helping individuals refine their understanding and figure out how to solve not-yet-solved puzzles
  • I also see schools as proselytic
    • they think their view is right
    • they think you should think their view is right
    • they promote their view
  • I think that the public face of many schools is the face(s) of the identified leader(s). Bret’s attempts to characterize schools in terms of human representatives was intended as constructive and respectful to the schools involved.

I don’t think the testing community maps perfectly to this. For example (as the biggest example, in my view), very few people are willing to identify themselves as leaders or members of the other (other than context-driven) schools of testing. (I think Agile TDD is another school (the fifth of Bret’s four) and that there are clear thought-leaders there, but I’m not sure that they’ve embraced the idea that they are a school either.) Despite that, I think the notion of division into competing schools is a useful heuristic.

At my time of writing, I think the best breakdown of the schools is:

  • Factory school: emphasis on reduction of testing tasks to routines that can be automated or delegated to cheap labor.
  • Control school: emphasis on standards and processes that enforce or rely heavily on standards.
  • Test-driven school: emphasis on code-focused testing by programmers.
  • Analytical school: emphasis on analytical methods for assessing the quality of the software, including improvement of testability by improved precision of specifications and many types of modeling.
  • Context-driven school: emphasis on adapting to the circumstances under which the product is developed and used.

I think this division helps me interpret some of what I read in articles and what I hear at conferences. I think it helps me explain–or at least rationally characterize–differences to people who I’m coaching or training, who are just becoming conscious of the professional-level discussions in the field.
Acknowledging the imperfect mapping, it’s still interesting to ask, as I read something from someone in the field, whether it fits in any of the groups I think of as a school and if so, whether it gives me insight into any of that school’s answers to the issues in the list above–and if so, whether that insight tells me more that I should think about for my own approach (along with giving me better ways to talk about or talk to people who follow the other approach).
When Bret first started giving talks about 4 schools of software testing, several people reacted negatively:

  • they felt it was divisive
  • they felt that it created a bad public impression because it would be better for business (for all consultants) if it looks as though we all agree on the basics and therefore we are all experts whose knowledge and experience can be trusted
  • they felt that it was inappropriate because competent testers all pretty much agree on the fundamentals.

One of our colleagues (a friend) chastised us for this competing-schools analysis. I think he thought that we were saying this for marketing purposes, that we actually agreed with everyone else on the fundamentals and knew that we all agreed on the fundamentals. We assured him that we weren’t kidding. We might be wrong, but we were genuinely convinced that the field faced powerful disagreements even on the very basics. Our colleague decided to respond with a survey that asked a series of basic questions about testing. Some questions checked basic concepts. Others identified a situation and asked about the best course of action. He was able to get a fairly broad set of responses from a diverse group of people, many (most?) of them senior testers. The result was to highlight the broad disagreement in the field. He chose not to publish (I don’t think he was suppressing his results; I think it takes a lot of work to go from an informal survey to something formal enough to publish). Perhaps someone doing a M.Sc. thesis in the field would like to follow this up with a more formally controlled survey of an appropriately stratified sample across approaches to testing, experience levels, industry and perhaps geographic location. But until I see something better, what I saw in the summary of results given to me looked consistent with what I’ve been seeing in the field for 23 years–we have basic, fundamental, foundational disagreements about the nature of testing, how to do it, what it means to test, who our clients are, what our professional responsibilities are, what educational qualifications are appropriate, how to research the product, how to identify failure, how to report failure, what the value of regression testing is, how to assess the value of a test, etc., etc.
So is there anything we can learn from these differences?

  • One of the enormous benefits of competing schools is that they create a dialectic.
    • It is one thing for theoreticians who don’t have much influence in a debate to characterize the arguments and positions of other people. Those characterizations are descriptive, perhaps predictive. But they don’t drive the debate. I think this is the kind of school characterization that was being attacked on Volokh’s page.
    • It’s very different for someone in a long term debate to frame their position as a contrast with others and invite response. If the other side steps up, you get a debate that sharpens the distinctions, brings into greater clarity the points of agreement, and highlights the open issues that neither side is confident in. It also creates a collection of documented disagreements, documented conflicting predictions and therefore a foundation for scientific research that can influence the debate. This is what I saw in psychology (first-hand, watching leaders in the field structure their work around controversy).
    • We know full well that we’re making mistakes in our characterization of the other views. We aren’t intentionally making mistakes, and we correct the ones that we finally realize are mistakes, but nevertheless, we have incorrect assumptions and conclusions within our approach to testing and in our understanding of the other folks’ approaches to testing. Everybody else in the field is making mistakes too. The stronger the debate (I don’t mean the nastier; I mean the more seriously we do it, the better we research and/or consider our responses, etc.), the more of those mistakes we’ll collectively bring to the surface and the more opportunities for common ground we will discover. That won’t necessarily lead to the One True School ever, because some of the differences relate to key human values (for example, I think some of us have deep personal-values differences in our notions of the relative worths of the individual human versus the identified process). But it might lead to the next generation of debate, where a few things are indeed accepted as foundational by everyone and the new disagreements are better informed, with a better foundation of data and insight.
  • The dialectic doesn’t work very well if the other side won’t play.
    • Pretending that there aren’t deep disagreements won’t make them go away
  • Even if the other side won’t engage, the schools-approach creates a set of organizing heuristics. We have a vast literature. There are over 1000 theses and dissertations in the field. There are conferences, magazines, journals, lots of books, and new links (e.g. TDD) with areas of work previously considered separate. It’s not possible to work through that much material without imposing simplifying structures. The four-schools (I prefer five-schools) approach provides one useful structure. (“All models are wrong; some models are useful.” — George Box)

It’s 4:27 a.m. Reading what I’ve written above, I think I’m starting to ramble and I know that I’m running out of steam, so I’ll stop here.

Well, one last note.

It was never our intent to use the “schools” notion to demean other people in the field. What we are trying to do is to capture commonalities (of agreement and disagreement).

  • For example, I think there are a lot of people who really like the idea that some group (requirements analysts, business analysts, project manager, programmers) agree on an authoritative specification, that the proper role of Testing is to translate the specification to powerful test cases, automate the tests, run them as regression tests, and report metrics based on these runs.
  • Given that there are a lot of people who share this view, I’d like to be able to characterize the view and engage it, without having to address the minor variations that come up from person to person. Call the collective view whatever you want. The key issues for me are:
    • Many people do hold this view
    • Within the class of people who hold this view, what other views do they hold or disagree with?
    • If I am mischaracterizing the view, it is better for me to lay it out in public and get corrected than push it more privately to my students and clients and not get corrected
  • The fact is, certainly, that I think this view is defective. But I didn’t need to characterize it as a school to think it is often (i.e. in many contexts) a deeply misguided approach to testing. Nor do I need to set it up as a school to have leeway to publicly criticize it.
  • The value is getting the clearest and most authoritatively correct expression of the school that we can, so that if/when we want to attack it or coach it into becoming something else, we have the best starting point and the best peer review process that we can hope to achieve.

Comments are welcome. Maybe we can argue this into a better article.

Updating some core concepts in software testing

Tuesday, November 21st, 2006

Most software testing techniques were first developed in the 1970’s, when “large” programs were tiny compared to today.

Programmer productivity has grown dramatically over the years, a result of paradigmatic shifts in software development practice. Testing practice has evolved less dramatically and our productivity has grown less spectacularly. This divergence in productivity has profound implications—every year, testers impact less of the product. If we continue on this trajectory, our work will become irrelevant because its impact will be insignificant.

Over the past few years, several training organizations have created tester certifications. I don’t object in principle to certification but the Body of Knowledge (BoK) underlying a certificate has broader implications. People look to BoKs as descriptions of good (or at least current) attitudes and practice.

I’ve been dismayed by the extent to which several BoKs reiterate the 1980’s. Have we really made so little progress?

When we teach the same basics that we learned, we provide little foundation for improvement. Rather than setting up the next generation to rebel against the same dumb ideas we worked around, we should teach our best alternatives, so that new testers’ rebellions will take them beyond what we have achieved.

One popular source of orthodoxy is my own book, Testing Computer Software (TCS). In this article, I would like to highlight a few of the still-influential assumptions or assertions in TCS, in order to reject them. They are out of date. We should stop relying on them.

Where TCS was different

I wrote TCS to highlight what I saw as best practices (of the 1980’s) in Silicon Valley, which were at odds with much of the received wisdom of the time:

  • Testers must be able to test well without authoritative (complete, trustworthy) specifications. I coined the phrase, exploratory testing, to describe a survival skill.
  • Testing should address all areas of potential customer dissatisfaction, not just functional bugs. Because matters of usability, performance, localizability, supportability, (these days, security) are critical factors in the acceptability of the product, test groups should become skilled at dealing with them. Just because something is beyond your current skill set doesn’t mean it’s beyond your current scope of responsibility.
  • It is neither uncommon nor unethical to defer (choose not to fix) known bugs. However, testers should research a bug or design weakness thoroughly enough and present it carefully enough to help the project team clearly understand the potential consequences of shipping with this bug.
  • Testers are not the primary advocates of quality. We provide a quality assistance service to a broader group of stakeholders who take as much pride in their work as we do.
  • The decision to automate a test is a matter of economics, not principle. It is profitable to automate a test (including paying the maintenance costs as the program evolves) if you would run the manual test so many times that the net cost of automation is less than manual execution. Many manual tests are not worth automating because they provide information that we don’t need to collect repeatedly.
  • Testers must be able to operate effectively within any software development lifecycle—the choice of lifecycle belongs to the project manager, not the test manager. In addition, the waterfall model so often advocated by testing consultants might be a poor choice for testers because the waterfall pushes everyone to lock down decisions long before vital information is in, creating both bad decisions and resistance to later improvement.
  • Testers should design new tests throughout the project, even after feature freeze. As long as we keep learning about the product and its risks, we should be creating new tests. The issue is not whether it is fair to the project team to add new tests late in the project. The issue is whether the bugs those tests could find will impact the customer.
  • We cannot measure the thoroughness of testing by computing simple coverage metrics or by creating at least one test per requirement or specification assertion. Thoroughness of testing means thoroughness of mitigation of risk. Every different way that the program could fail creates a role for another test.
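The economic argument in the automation bullet above can be made concrete with a little arithmetic. The following is only a sketch under stated assumptions: the method name, the parameter names and the cost figures are hypothetical, all costs are in the same unit (hours, say), and the per-run cost of the automated test is assumed to already include its share of maintenance.

```ruby
# Smallest number of runs at which automating costs strictly less than
# running the test by hand. All names and figures here are hypothetical.
def break_even_runs(build_cost:, automated_cost_per_run:, manual_cost_per_run:)
  saving_per_run = manual_cost_per_run - automated_cost_per_run
  # If each automated run costs as much as a manual run, it never pays back.
  return nil unless saving_per_run.positive?

  # Solve build_cost + n * automated < n * manual for the smallest integer n.
  (build_cost / saving_per_run.to_f).floor + 1
end

runs = break_even_runs(build_cost: 8,
                       automated_cost_per_run: 0.5,
                       manual_cost_per_run: 1.5)
puts "Automation pays for itself from run #{runs} onward"
```

A test that would only ever be run two or three times falls well short of that threshold, which is the point of the bullet: many manual tests are not worth automating.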

The popularity of these positions (and the ones they challenged) waxes and wanes, but at least they are seen as mainstream points of view.

Where TCS 3 would (will) be different

TCS Editions 1 and 2 were written in a moderate tone. In retrospect, my wording was sometimes so gentle that readers missed key points. In addition, some of TCS 2’s firm positions were simply mistaken:

  • It is not the primary purpose of testing to find bugs. (Nor is it the primary purpose of testing to help the project manager make decisions.) Testing is an empirical investigation conducted to provide stakeholders with information about the quality of the software under test. Stakeholders have different informational needs at different times, in different situations. The primary purpose of testing is to help those stakeholders gain the information they need.
  • Testers should not attempt to specify the expected result of every test. The orthodox view is that test cases must include expected results. There are many stories of bugs missed because the tester simply didn’t recognize the failure. I’ve seen this too. However, I have also seen cases in which testers missed bugs because they were too focused on verifying “expected” results to notice a failure the test had not been designed to address. You cannot specify all the results (all the behaviors and system/software/data changes) that can arise from a test. There is value in documenting the intent of a test, including results or behaviors to look for, but it is important to do so in a way that keeps the tester thinking and scanning for other results of the test instead of viewing the testing goal as verification against what is written.
  • Procedural documentation probably offers little training value. I used to believe testers would learn the product by following test scripts or by testing user documentation keystroke by keystroke. Some people do learn this way, but others (maybe most) learn more from designing / running their own experiments than from following instructions. In science education, we talk about this in terms of the value of constructivist and inquiry-based learning. There’s an important corollary to this that I’ve learned the hard way—when you create a test script and pass it to an inexperienced tester, she might be able to follow the steps you intended, but she won’t have the observational skills or insights that you would have if you were following the script instead. Scripts might create a sequence of actions but they don’t create cognition.
  • Software testing is more like design evaluation than manufacturing quality control. A manufacturing defect appears in an individual instance of a product (like badly wired brakes in a car). It makes sense to look at every instance in the same ways (regression tests) because any one might fail in a given way, even if the one before and the one after did not. In contrast, a design defect appears in every instance of the product. The challenge of design QC is to understand the full range of implications of the design, not to look for the same problem over and over.
  • Testers should not try to design all tests for reuse as regression tests. After they’ve been run a few times, a regression suite’s tests have one thing in common: the program has passed them all. In terms of information value, they might have offered new data and insights long ago, but now they’re just a bunch of tired old tests in a convenient-to-reuse heap. Sometimes (think of build verification testing), it’s useful to have a cheap heap of reusable tests. But we need other tests that help us understand the design, assess the implications of a weakness, or explore an issue by machine that would be much harder to explore by hand. These often provide their value the first time they are run—reusability is irrelevant and should not influence the design or decision to develop these tests.
  • Exploratory testing is an approach to testing, not a test technique. In scripted testing, a probably-senior tester designs tests early in the testing process and delegates them to programmers to automate or junior testers to run by hand. In contrast, the exploratory tester continually optimizes the value of her work by treating test-related learning, test design, test execution and test result interpretation as mutually supportive activities that run in parallel throughout the project. Exploration can be manual or automated. Explorers might or might not keep detailed records of their work or create extensive artifacts (e.g. databases of sample data or failure mode lists) to improve their efficiency. The key difference between scripting and exploration is cognitive—the scripted tester follows instructions; the explorer reinvents instructions as she stretches her knowledge base and imagination.
  • The focus of system testing should shift to reflect the strengths of programmers’ tests. Many testing books (including TCS 2) treat domain testing (boundary / equivalence analysis) as the primary system testing technique. To the extent that it teaches us to do risk-optimized stratified sampling whenever we deal with a large space of tests, domain testing offers powerful guidance. But the specific technique—checking single variables and combinations at their edge values—is often handled well in unit and low-level integration tests. These are much more efficient than system tests. If the programmers are actually testing this way, then system testers should focus on other risks and other techniques. When other people are doing an honest and serious job of testing in their way, a system test group so jealous of its independence that it refuses to consider what has been done by others is bound to waste time repeating simple tests and thereby miss opportunities to try more complex tests focused on harder-to-assess risks.
  • Test groups should offer diverse, collaborating specialists. Test groups need people who understand the application under test, the technical environment in which it will run (and the associated risks), the market (and their expectations, demands, and support needs), the architecture and mechanics of tools to support the testing effort, and the underlying implementation of the code. You cannot find all this in any one person. You can build a group of strikingly different people, encourage them to collaborate and cross-train, and assign them to project areas that need what they know.
  • Testers may or may not work best in test groups. If you work in a test group, you probably get more testing training, more skilled criticism of your tests and reports, more attention to your test-related career path, and stronger moral support if you speak unwelcome truths to power. If you work in an integrated development group, you probably get more insight into the development of the product, more skilled criticism of the impact of your work, more attention to your broad technical career path, more cross-training with programmers, and less respect if you know lots about the application or its risks but little about how to write code. If you work in a marketing (customer-focused) group, you probably get more training in the application domain and in the evaluation of product acceptability and customer-oriented quality costs (such as support costs and lost sales), more attention to a management-directed career path, and more sympathy if programmers belittle you for thinking more like a customer than a programmer. Similarly, even if there is a cohesive test group, its character may depend on whether it reports to an executive focused on testing, support, marketing, programming, or something else. There is no steady-state best place for a test group. Each choice has costs and benefits. The best choice might be a fundamental reorganization every two years to diversify the perspectives of the long-term staff and the people who work with them.
  • We should abandon the idea, and the hype, of best practices. Every assertion that I’ve made here has been a reaction to another that is incompatible but has been popularly accepted. Testers provide investigative services to people who need information. Depending on the state of their project, the ways in which the product is being developed, and the types of information the people need, different practices will be more appropriate, more efficient, more conducive to good relations with others, more likely to yield the information sought—or less.

This paper has been a summary of a talk I gave at KWSQA last month and was written for publication in their newsletter. For additional details, see my paper, The Ongoing Revolution in Software Testing, available at www.kaner.com/pdfs/TheOngoingRevolution.pdf.

Final Exam for Software Testing 2 Class

Sunday, March 9th, 2003

People often ask me about the difference between commercial certification in software testing and university education. I explain that the tester certification exams typically test what people have memorized rather than what they can do.

Several people listen to this politely but don’t understand the distinction that I’m making.

I just finished teaching Software Testing 2 at Florida Tech. This is our first round with the course. Next year’s edition will be tougher. (As we work out the kinks, the course will cover more and better each time we teach it, for at least three more teachings.) Even though the course was relatively easy, I think our course’s final exam illustrates the difference between a certification exam and a university exam.

Comments welcome. Send them to me at weblog@kaner.com

FINAL EXAM–SOFTWARE TESTING 2
April 17 – 25, 2003
Due April 25, 2003. I will accept late exams, without late penalty, up to 5:00 p.m. on May 1. No exams will be accepted after 5:00 p.m. on May 1. It is essential that you work on this exam ALONE with no input or assistance from other people. You MAY NOT discuss your progress or results with other students in the class.

MAIN TASK
Use Ruby to build a high-volume automated test tool to check the mathematics in the OpenOffice spreadsheet.

Total points available = 100

RUBY PROGRAM DEVELOPMENT — 25 points
1. Your development of the Ruby program should be test-driven. Use testunit (or runit) to test the Ruby program. Show several iterations in the test-driven development.

PICK YOUR FUNCTIONS — 10 points

2. You will test OpenOffice 1.0 by comparing its results to results you get from Microsoft Excel.

2a. Choose five mathematical or financial functions that take one or two parameters.

2b. Choose five mathematical or financial functions that take many parameters (at least 3)

INDIVIDUAL FUNCTION TESTS — 25 Points

3. Your program should generate random inputs that you will feed as parameter values to the functions that you have selected:
For each function, run 100 tests as follows

* Generate the input(s) for this function. The set you use should be primarily valid, but you should try some invalid values as well.
* Determine whether a given input is a valid or invalid input and reflect this in your output
* Evaluate the function in OpenOffice
* Evaluate the function in Excel
* Compare the results
* Determine whether the results are sufficiently close
* Summarize the results, across all 100 tests of this function
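
The loop described in Part 3 might be sketched roughly as follows. This is an illustration, not a model answer: it does not drive OpenOffice or Excel at all. The two lambdas are hypothetical stand-ins for the two spreadsheets, square root is an arbitrary example function, and the tolerance and the one-in-ten share of invalid inputs are assumptions.

```ruby
# Hypothetical stand-ins for the two spreadsheets. A real tool would ask
# Excel and OpenOffice to evaluate the function instead.
REFERENCE  = ->(x) { Math.sqrt(x) }   # plays the role of Excel
UNDER_TEST = ->(x) { x**0.5 }         # plays the role of OpenOffice

# Square root is defined only for non-negative reals.
def valid_input?(x)
  x >= 0
end

# Relative comparison, with an absolute floor for results near zero.
def close_enough?(a, b, tolerance = 1e-9)
  (a - b).abs <= tolerance * [a.abs, b.abs, 1.0].max
end

def run_function_tests(count: 100, seed: 42)
  rng = Random.new(seed)              # seeded so a failing run can be replayed
  summary = { valid: 0, invalid: 0, mismatches: [] }
  count.times do
    # Mostly valid inputs, roughly one in ten deliberately invalid.
    x = rng.rand < 0.1 ? -rng.rand(1000.0) : rng.rand(1000.0)
    if valid_input?(x)
      summary[:valid] += 1
      expected = REFERENCE.call(x)
      actual = UNDER_TEST.call(x)
      summary[:mismatches] << x unless close_enough?(expected, actual)
    else
      # A fuller tool would also compare how each program reports the error.
      summary[:invalid] += 1
    end
  end
  summary
end

result = run_function_tests
puts "valid=#{result[:valid]} invalid=#{result[:invalid]} " \
     "mismatches=#{result[:mismatches].size}"
```

Keeping the inputs that produced mismatches, rather than only a count, makes it possible to rerun and troubleshoot individual failures afterward.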

CONSTRUCT FORMULAS — 20 Points

4. Now test formulas that combine functions from the 10 functions you have used so far.

4a. Create and test 5 interestingly complex formulas. Evaluate them with 100 tests each, as you did for functions in Part 3.

RANDOMLY CONSTRUCTED FORMULAS — 20 Points

5. Now test random formulas using the same 10 functions you have used so far.

5a. For 100 test cases, randomly create a formula, and randomly generate VALID input data. From here,

* Evaluate the formula in OpenOffice
* Evaluate the formula in Excel
* Compare the results
* Determine whether the results are sufficiently close
* Summarize the results of these 100 tests
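
One way to create the random formulas for Part 5 is to grow a small expression tree and then render it as formula text. The sketch below assumes a hypothetical table of six functions, chosen because each accepts any real number so that every generated formula stays valid. The Ruby lambdas are stand-ins for evaluation by a spreadsheet, and the argument separator is a parameter on the assumption that the two programs may expect different separators.

```ruby
# Hypothetical function table: spreadsheet name => [arity, Ruby stand-in].
FUNCTIONS = {
  "ABS"  => [1, ->(a) { a.abs }],
  "SIN"  => [1, ->(a) { Math.sin(a) }],
  "COS"  => [1, ->(a) { Math.cos(a) }],
  "ATAN" => [1, ->(a) { Math.atan(a) }],
  "MAX"  => [2, ->(a, b) { [a, b].max }],
  "MIN"  => [2, ->(a, b) { [a, b].min }]
}.freeze

# A leaf is a number; a node is [function_name, [argument, ...]].
# The root is always a function call, so no formula is just a bare number.
def random_formula(rng, depth, root: true)
  if depth.zero? || (!root && rng.rand < 0.3)
    return rng.rand(-100.0..100.0).round(3)
  end
  name = FUNCTIONS.keys.sample(random: rng)
  arity = FUNCTIONS[name][0]
  [name, Array.new(arity) { random_formula(rng, depth - 1, root: false) }]
end

# Render the tree as formula text.
def to_spreadsheet(node, separator: ",")
  return node.to_s unless node.is_a?(Array)
  name, args = node
  rendered = args.map { |arg| to_spreadsheet(arg, separator: separator) }
  "#{name}(#{rendered.join(separator)})"
end

# Evaluate the tree with the Ruby stand-ins.
def evaluate(node)
  return node unless node.is_a?(Array)
  name, args = node
  FUNCTIONS[name][1].call(*args.map { |arg| evaluate(arg) })
end

rng = Random.new(2003)
formulas = Array.new(100) { random_formula(rng, 3) }
sample = formulas.first
puts "=#{to_spreadsheet(sample)} evaluates to #{evaluate(sample)}"
```

Holding the formula as a tree rather than as a string means the same test case can be rendered once per spreadsheet and also inspected step by step when the two results disagree.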

BONUS PROBLEM — 20 Points

6. In questions 4 and 5, you probably discovered that you could supply a function with an input value that was valid, but then the function evaluated to a value that was not valid for the function that took this as input.

For example log (cosine (90 degrees)) is undefined. The initial input (90 degrees) is valid. Cosine evaluates to 0, which is valid, but log(0) is undefined and so cosine(90) is invalid as an input for log.

Describe a strategy that you would use to guarantee that the formula evaluates to a valid, numeric result.
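
One possible strategy, offered as a sketch rather than as the expected answer, is to evaluate each candidate formula from the inside out, check every intermediate value against the domain of the function that consumes it, and discard any input that fails. The function tables, the degree-based cosine, the base-10 log and the near-zero threshold below are all assumptions made for illustration. The threshold matters because floating-point cosine of 90 degrees is a tiny positive number rather than exactly 0.

```ruby
DEGREES = Math::PI / 180

# Hypothetical implementations; angles are taken in degrees.
IMPL = {
  cos: ->(x) { Math.cos(x * DEGREES) },
  log: ->(x) { Math.log10(x) }
}.freeze

# Hypothetical domain checks, one per function.
DOMAIN = {
  cos: ->(_x) { true },
  log: ->(x) { x > 1e-12 }   # treat values at or near zero as invalid
}.freeze

# Evaluate a nested formula such as [:log, [:cos, 90.0]] from the inside
# out. Returns nil as soon as an intermediate value falls outside the
# domain of the function that consumes it.
def checked_eval(node)
  return node unless node.is_a?(Array)
  name, arg = node
  value = checked_eval(arg)
  return nil if value.nil? || !DOMAIN[name].call(value)
  IMPL[name].call(value)
end

# Generate-and-reject: keep drawing inputs until one survives every check.
# The block builds the formula for a given input.
def find_valid_input(rng, attempts: 1000)
  attempts.times do
    x = rng.rand(0.0..360.0)
    return x unless checked_eval(yield(x)).nil?
  end
  nil
end

p checked_eval([:log, [:cos, 90.0]])   # rejected at the log step
x = find_valid_input(Random.new(7)) { |angle| [:log, [:cos, angle]] }
puts "log(cosine(#{x.round(2)} degrees)) is defined"
```

Rejection of this kind only guarantees validity with respect to the domains written into the table, so it is only as trustworthy as those domain definitions.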