Archive for the ‘standards’ Category

Schools of Software Testing: A Debate with Rex Black

Monday, August 24th, 2015

Last year, Rex and I had a debate at STPCon on the legitimacy and value of the concept of “schools of software testing.” We recently obtained a recording of the debates and merged it with slides to create a video. You can find additional slides and notes here: Kaner’s STPCon Debate Slides.

Rex Black expressed a few gripes about the concept of “schools” and about the way this concept has been applied in our field.

Schools or Strategies?

Rex’s first point — I think the central point of his presentation — is to acknowledge that there are different approaches to software testing but to argue that we should think of these as differences in strategy rather than divisions into schools. In his view (as I understand it), different people have different preferred strategies for dealing with testing situations. Some people can shift among strategies, choosing the best one for a given situation. As Rex sees it, looking at our field as a collection of schools is divisive and counterproductive.

I think his idea of conflicting (or alternative) strategies is appealing — plausible, but incomplete in a fundamental way.

The problem is that it ignores the social dynamics of the field, which is exactly what we are trying to capture with the idea of “schools.”

People tend to cluster. They find other people whose views or whose personal styles are compatible with theirs. They learn more from people in their cluster, they pay more attention to them, listen more closely to their advice and criticisms. Sometimes people cluster around an intellectually coherent point of view and organize their thinking about their work in terms of that view. At that point, we have the beginnings of a school. It is not just a strategy; it is an approach that is supported by a strong peer group.

This basic kind of clustering is so common that we barely notice it. Sometimes it becomes more pronounced and several of the clusters become more broadly influential.

Fields tend to swing between extremes of high (apparent) cohesiveness to high fragmentation. The evolution takes time.

  • At the high-apparent-cohesiveness extreme, everyone agrees (or pretends to agree) with a dominant view. There is not much controversy. Progress is incremental and not very creative.
  • At the high-fragmentation extreme, people have stopped listening to each other. They squander their creativity on better ways to promote an approach that they see as The One True Way, and to insult or shout down anyone who disagrees with it. There isn’t much progress at this extreme either. People are too busy scoring points about the basics of the field (or the basics of their controversy).

Several authors identify these extremes and describe them as unproductive. Neither extreme promotes an attitude of paying constructive attention to other views and gaining insights from them or taking risks to develop a new approach. (See my slides and notes for references.)

Between the extremes, you have creative tension and a lot of research (or skill development) that tries to get to the factual questions: what works, what happens, what costs, what benefits, what else can be done?

The idea that fields often organize themselves into schools is not controversial. It’s not something special to software testing. You see it in education, business, psychology, physics (etc., etc.)

It’s also common for the members of the dominant school to see themselves as the entire field. They often see other groups that try to differentiate themselves from the main stream as self-promoting spinoffs, as advocates for minor variations from core views that “everyone” shares. One of the reasons that people will intentionally form and announce a school is to create a rallying point. They want a place where likeminded people can share views without being drowned out by the dominating majority, a platform for publishing and refining their set of ideas.

These rallying points are even more important when the field is engaging in political work. When I say “political,” I mean anything involving power and control. For example, standards committees are political. My experience with the IEEE software engineering standards committees is that I can become a member but my views will have no impact on the standard. The way that people who hold minority views gain more impact is to organize, so that many people together intentionally say the same things. This has an impact. For example, agile approaches (I think of contextual thinking and exploratory testing/development as agile approaches) are much more acceptable than they were 20 years ago. That is largely because of advocacy by many people, speaking together. Political work requires political action.

You can hear more about social dynamics in the debate.

Unfortunate Misbehavior

Rex’s central argument is that the characterization of the field in terms of conflicting schools is inaccurate and would be better replaced by a description of alternative strategies. Along with this argument comes a complaint, which I see as the emotional charge behind his argument. He complains about harsh statements from some people who call themselves Context-Driven and call themselves leaders of the Context-Driven School. I think he’s well-justified in feeling that some people are behaving badly and that they have treated him badly.

If you see yourself as a member of the Context-Driven School, let me suggest that as individuals, we get to choose how far we go down the path of divisiveness:

  • We can choose to compare a school of thought to a religion, but we don’t have to say that.
  • We can choose to say that anyone who isn’t a proponent of the school can’t understand what we have to say, but we don’t have to say that.
  • We can choose to say that everyone belongs to a school (even the people who insist they do not), but we don’t have to say that.

Statements like these are not factual and, to the best of my knowledge, they are not rooted in facts. They reflect choices about how people with differing views should interact.

I think some of the people who say things like this would market themselves more honestly (and in my view, tarnish the Context-Driven Testing brand less) if they would identify themselves as the Rapid Software Testing (TM) school. I would disagree with their approach and their tone, but I wouldn’t feel obliged to assert that such views are not context-driven (see for example, my posts Censure People for Disagreeing with Us? and Context-Driven Testing is Not a Religion and Contexts Differ: Recognizing the Difference between Wrong and WRONG. )

More details of my responses to Rex’s complaints are in the debate itself and in the notes I prepared before the debate at Kaner’s STPCon Debate Slides.

— Cem Kaner

Credentialing in Software Testing: Elaborating on my STPCon Keynote

Thursday, May 9th, 2013

A couple of weeks ago, I talked about the state of software testing education (and software testing certification) in the keynote panel at STPCon. My comments on high-volume test automation and qualitative methods were more widely noticed, but I think the educational issues are more significant.

Here is a summary:

  1. The North American educational systems are in a state of transition.
  2. We might see a decoupling of formal instruction from credentialing.
  3. We are likely to see a dispersion of credentialing—-more organizations will issue more diverse credentials.
  4. Industrial credentials are likely to play a more significant role in the American economy (and probably have an increased or continued-high influence in many other places).

If these four predictions are accurate, then we have thinking to do about the kinds of credentialing available to software testers.

Transition

For much of the American population, the traditional university model is financially unsustainable. We are on the verge of a national credit crisis because of the immensity of student loan debt.

As a society, we are experimenting with a diverse set of instructional systems, including:

  • MOOCs (massive open online courses)
  • Traditionally-structured online courses with an enormous diversity of standards
  • Low-cost face-to-face courses (e.g. community colleges)
  • Industrial courses that are accepted for university credit
  • Traditional face-to-face courses

Across these, we see the full range from easy to hard, from no engagement with the instructor to intense personal engagement, from little student activity and little meaningful feedback to lots of both. There is huge diversity of standards between course structures and institutions and significant diversity within institutions.

  • Many courses are essentially self-study. Students learn from a book or a lecturer but they get no significant assignments, feedback or assessments. Many people can learn some topics this way. Some people can learn many topics this way. For most people, this isn’t a complete solution, but it could be a partial one.
  • Some of my students prosper most when I give them free rein, friendly feedback and low risk. In an environment that is supportive, provides personalized feedback by a human, but is not demanding, some students will take advantage of the flexibility by doing nothing, some students will get lost, and some students will do their best work.
  • The students who don’t do well in a low-demand situation often do better in a higher-demand course, and in my experience, many students need both—-flexibility in fields that capture their imagination and structure/demand in fields that are less engrossing or that a little farther beyond the student’s current knowledge/ability than she can comfortably stretch to.

There is increasing (enormous) political pressure to allow students to take really-inexpensive MOOCs and get course credit for these at more expensive universities. More generally, there is increasing pressure to allow students to transfer courses across institutions. Most universities allow students to transfer in a few courses, but they impose limits in order to ensure that they transfer their culture to their students and to protect their standards. However, I suspect strongly that the traditional limits are about to collapse. The traditional model is financially unsustainable and so, somewhere, somehow, it has to crack. We will see a few reputable universities pressured (or legislated) into accepting many more credits. Once a few do it, others will follow.

In a situation like this, schools will have to find some other way to preserve their standards—-their reputations, and thus the value of their degree for their graduates.

Seems likely to me that some schools will start offering degrees based on students’ performance on exit exams.

  • A high-standards institution might give a long and complex set of exams. Imagine paying $15,000 to take the exam series (and get grades and feedback) and another $15,000 if you pass, to get the degree.
  • At the other extreme, an institution might offer a suite of multiple-guess exams that can be machine-graded at a much lower cost.

The credibility of the degree would depend on the reputation of the exam (determined by “standards” combined with a bunch of marketing).

Once this system got working, we might see students take a series of courses (from a diverse collection of providers) and then take several degrees.

Maybe things won’t happen this way. But the traditional system is financially unsustainable. Something will have to change, and not just a little.

Decoupling Instruction from Credentialing

The vision above reflects a complete decoupling of instruction from credentialing. It might not be this extreme, but any level of decoupling creates new credentialing pressures / opportunities in industrial settings.

Instruction

Instruction consists of the courses, the coaching, the internships, and any other activities the students engage in to learn.

Credentialing

Credentials are independently-verifiable evidence that a person has some attribute, such as a skill, a type of knowledge, or a privilege.

There are several types of credentials:

  • A certification attests to some level of competency or privilege. For example,
    • A license to practice law, or to do plumbing, is a certification.
    • An organization might certify a person as competent to repair their equipment.
    • An organization might certify that, in their opinion, a person is competent to practice a profession.
  • A certificate attests that someone completed an activity
    • A certificate of completion of a course is a certificate
    • A university degree is a certificate
  • There are also formal recognitions (I’m sure there’s a better name for this…)
    • Awards from professional societies are recognitions
    • Granting someone an advanced type of membership (Senior Member or Fellow) in a professional society is a recognition
    • Election to some organizations (such as the American Law Institute or the Royal Academy of Science) is a recognition
    • I think I would class medals in this group
  • There are peer recognitions
    • Think of the nice things people say about you on Linked-In or Entaggle
  • There are workproducts or results of work that are seen as honors
    • You have published X many publications
    • You worked on the development team for X

The primary credentials issued by universities are certificates (degrees). Sometimes, those are also certifications.

Dispersion of Credentialing

Anyone can issue a credential. However, the prestige, credibility, and power of credentials vary enormously.

  • If you need a specific credential to practice a profession, then no matter who endorses some other credential, or how nicely named that other credential is, it still won’t entitle you to practice that profession.
  • Advertising that you have a specific credential might make you seem more prestigious to some people and less prestigious to other people.

It is already the case that university degrees vary enormously in meaning and prestige. As schools further decouple instruction from degrees, I suspect that this variation will be taken even more seriously. Students of mine from Asia, and some consultants, tell me this is already the case in some Asian countries. Because of the enormous variation in quality among universities, and the large number of universities, a professional certificate or certification is often taken more seriously than a degree from a university that an employer does not know and respect.

Industrial Credentials

How does this relate to software testing? Well, if my analysis is correct (and it might well not be), then we’ll see an increase in the importance and value of credentialing by private organizations (companies, rather than universities).

I don’t believe that we’ll see a universally-accepted credential for software testers. The field is too diverse and the divisions in the field are too deep.

I hope we’ll see several credentialing systems that operate in parallel, reflecting different visions of what people should know, what they should believe, what they should be able to do, what agreements they are willing to make (and be bound by) in terms of professional ethics, and what methods of assessing these things are appropriate and in what depth.

Rather than seeing these as mutually-exclusive competing standards, I imagine that some people will choose to obtain several credentials.

A Few Comments On Our Current State

Software Testing has several types of credentials today. Here are notes on a few. I am intentionally skipping several that feel (to me) redundant with these or about which I have nothing useful to say. My goal is to trigger thoughts, not survey the field.

ISTQB

ISTQB is currently the leading provider of testing certifications in the world. ISTQB is the front end of a community that creates and sells courseware, courses, exams and credentials that align with their vision of the software testing field and the role of education within it. I am not personally fond of the Body of Knowledge that ISTQB bases its exams on. Nor am I fond of their approach to examinations (standardized tests that, to my eyes, emphasize memorization over comprehension and skill). I think they should call their credentials certificates rather than certifications. And my opinion of their marketing efforts is that they are probably not legally actionable, but I think they are misleading. (Apart from those minor flaws, I think ISTQB’s leadership includes many nice people.)

It seems to me that the right way to deal with ISTQB is to treat them as a participant in a marketplace. They sell what they sell. The best way to beat it is to sell something better. Some people are surprised to hear me say that because I have published plenty of criticisms of ISTQB. I think there is lots to criticize. But at some point, adding more criticism is just waste. Or worse, distraction. People are buying ISTQB credentials because they perceive a need. Their perception is often legitimate. If ISTQB is the best credential available to fill their need, they’ll buy it. So, to ISTQB’s critics, I offer this suggestion.

Industrial credentialing will probably get more important, not less important, over the next 20 years. Rather than wasting everyone’s time whining about the shortcomings of current credentials, do the work needed to create a viable alternative.

Before ending my comments on ISTQB, let me note some personal history.

Before ASTQB (American ISTQB) formed, a group of senior people in the community invited me into a series of meetings focused on creating a training-and-credentialing business in the United States. This was a private meeting, so I’m not going to say who sponsored it. The discussion revolved around a goal of providing one or more certification-like credentials for software testers that would be (this is my summary-list, not theirs, but I think it reflects their goals):

  • reasonably attainable (people could affort to get the credential, and reasonably smart people who worked hard could earn it),
  • credible (intellectually and professionally supported by senior people in the field who have earned good reputations),
  • scalable (it is feasible to build an infrastructure to provide the relevant training and assessment to many people), and
  • commercially viable (sufficient income to support instructors, maintainers of the courseware and associated documentation, assessors (such as graders of the students and evaluators of the courses), some level of marketing (because a credential that no one knows about isn’t worth much), and in the case of this group, money left over for profit. Note that many dimensions of “commercial viability” come into play even if there is absolutely no profit motive—-the effort has to support itself, somehow).

I think these are reasonable requirements for a strong credential of this kind.

By this point, ISEB (the precursor to ISTQB) had achieved significant commercial success and gained wide acceptance. It was on people’s minds, but the committee gave me plenty of time to speak:

  • I talked about multiple-choice exams and why I didn’t like them.
  • I talked about the desirability of skill-based exams like Cisco’s, and the challenges of creating courses to support preparation for those types of exams.
  • I talked about some of the thinking that some of us had done on how to create a skill-based cert for testers, especially back when we were writing Lessons Learned.

But there was a problem in this. My pals and I had lots of scattered ideas about how to create the kind of certification system that we would like, but we had never figured out how to make it practical. The ideas that I thought were really good were unscalable or too expensive. And we knew it. If you ask today why there is no certification for context-driven testing, you might hear a lot of reasons, including principled-sounding attacks on the whole notion of certification. But back then, the only reason we didn’t have a context-driven certification was that we had no idea how to create one that we could believe in.

So, what I could not provide to the committee was a reasonably attainable, credible, scalable, commercially viable system—-or a plan to create one.

The committee, quite reasonably, chose to seek a practical path toward a credential that they could actually create. I left the committee. I was not party to their later discussions, but I was not surprised that ASTQB formed and some of these folks chose to work with it. I have never forgotten that they gave me every chance to propose an alternative and I did not have a practical alternative to propose.

(Not long after that, I started an alternative project, Open Certification, to see if we could implement some of my ideas. We did a lot of work in that project, but it failed. They really weren’t practical. We learned a lot, which in turn helped me create great courseware—-BBST—-and other ideas about certification that I might talk about more in the future. But the point that I am trying to emphasize here is that the people who founded ASTQB were open to better ideas, but they didn’t get them. I don’t see a reason to be outraged against them for that.)

The Old Boys’ Club

To some degree, your advancement in a profession is not based on what you know. It’s based on who you know and how much they like you.

We have several systems that record who likes like you, including commercial ones (LinkedIn), noncommercial ones (Entaggle), and various types of marketing structures created by individuals or businesses.

There are advantages and disadvantages to systems based on whether the “right” people like you. Networking will never go away, and never should, but it seems to me that

Credentials based on what you know, what you can do, or what you have actually done are a lot more egalitarian than those based on who says they respect you.

I value personal references and referrals, but I think that reliance on these as our main credentialing system is a sure path to cronyism and an enemy of independent thinking.

My impression is that some people in the community have become big fans of reputation-systems as the field’s primary source of credentials. In at least some of the specific cases, I think the individuals would have liked the system a whole lot less when they were less influential.

Miagi-do

I’ve been delighted to see that the Miagi-do school has finally come public.

Michael Larsen states a key view succinctly:

I distrust any certification or course of study that doesn’t, in some way, actually have a tester demonstrate their skills, or have a chance to defend their reasoning or rationale behind those skills.

In terms of the four criteria that I mentioned above, I think this approach is probably reasonably attainable, and to me, it is definitely credible. Whether it scalable and commercially viable has yet to be seen.

I think this is a clear and important alternative to ISTQB-style credentialing. I hope it is successful.

Other Ideas on the Horizon

There are other ideas on the horizon. I’m aware of a few of them and there are undoubtedly many others.

It is easy to criticize any specific credentialing system. All of them, now known or coming soon, have flaws.

What I am suggesting here is:

  • Industrial credentialing is likely to get more important whether you like it or not.
  • If you don’t like the current options, complaining won’t do much good. If you want to improve things, create something better.


This post is partially based on work supported by NSF research grant CCLI-0717613 ―Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing. Any opinions, findings and conclusions or recommendations expressed in this post are those of the author and do not necessarily reflect the views of the National Science Foundation.

Don’t censure people for disagreeing with us

Monday, October 15th, 2012

I just posted “Censure people for disagreeing with us?” to context-driven-testing.com.

I don’t usually cross-reference posts on that blog, but I feel pretty strongly about this….

 

 

 

What is context-driven testing?

Saturday, January 3rd, 2009

James, Bret and I published our definition of context-driven testing at http://www.context-driven-testing.com/.

Some people have found the definition too complex and have tried to simplify it, attempting to equate the approach with Agile development or Agile  testing, or with the exploratory style of software testing. Here’s another crack at a definition:

Context-driven testers choose their testing objectives, techniques, and deliverables (including test documentation) by looking first to the details of the specific situation, including the desires of the stakeholders who commissioned the testing. The essence of context-driven testing is project-appropriate application of skill and judgment. The Context-Driven School of testing places this approach to testing within a humanistic social and ethical framework.

Ultimately, context-driven testing is about doing the best we can with what we get. Rather than trying to apply “best practices,” we accept that very different practices (even different definitions of common testing terms) will work best under different circumstances.

Contrasting context-driven with context-aware testing.

Many testers think of their approach as context-driven because they take contextual factors into account as they do their work. Here are a few examples that might illustrate the differences between context-driven and context-aware:

  • Context-driven testers reject the notion of best practices, because they present certain practices as appropriate independent of context. Of course it is widely accepted that any “best practice” might be inapplicable under some circumstances. However, when someone looks to best practices first and to project-specific factors second, that may be context-aware, but not context-driven.
  • Similarly, some people create standards, like IEEE Standard 829 for test documentation, because they think that it is useful to have a standard to lay out what is generally the right thing to do. This is not unusual, nor disreputable, but it is not context-driven. Standard 829 starts with a vision of good documentation and encourages the tester to modify what is created based on the needs of the stakeholders. Context-driven testing starts with the requirements of the stakeholders and the practical constraints and opportunities of the project. To the context-driven tester, the standard provides implementation-level suggestions rather than prescriptions.

Contrasting context-driven with context-oblivious, context-specific, and context-imperial testing.

To say “context-driven” is to distinguish our approach to testing from context-oblivious, context-specific, or context-imperial approaches:

  • Context-oblivious testing is done without a thought for the match between testing practices and testing problems. This is common among testers who are just learning the craft, or are merely copying what they’ve seen other testers do.
  • Context-specific testing applies an approach that is optimized for a specific setting or problem, without room for adjustment in the event that the context changes. This is common in organizations with longstanding projects and teams, wherein the testers may not have worked in more than one organization. For example, one test group might develop expertise with military software, another group with games. In the specific situation, a context-specific tester and a context-driven tester might test their software in exactly the same way. However, the context-specific tester knows only how to work within her or his one development context (MilSpec) (or games), and s/he is not aware of the degree to which skilled testing will be different across contexts.
  • Context-imperial testing insists on changing the project or the business in order to fit the testers’ own standardized concept of “best” or “professional” practice, instead of designing or adapting practices to fit the project. The context-imperial approach is common among consultants who know testing primarily from reading books, or whose practical experience was context-specific, or who are trying to appeal to a market that believes its approach to development is the one true way.

Contrasting context-driven with agile testing.

Agile development models advocate for a customer-responsive, waste-minimizing, humanistic approach to software development and so does context-driven testing. However, context-driven testing is not inherently part of the Agile development movement.

  • For example, Agile development generally advocates for extensive use of unit tests. Context-driven testers will modify how they test if they know that unit testing was done well. Many (probably most) context-driven testers will recommend unit testing as a way to make later system testing much more efficient. However, if the development team doesn’t create reusable test suites, the context-driven tester will suggest testing approaches that don’t expect or rely on successful unit tests.
  • Similarly, Agile developers often recommend an evolutionary or spiral life cycle model with minimal documentation that is developed as needed. Many (perhaps most) context-driven testers would be particularly comfortable working within this life cycle, but it is no less context-driven to create extensively-documented tests within a waterfall project that creates big documentation up front.

Ultimately, context-driven testing is about doing the best we can with what we get. There might not be such a thing as Agile Testing (in the sense used by the agile development community) in the absence of effective unit testing, but there can certainly be context-driven testing.

Contrasting context-driven with standards-driven testing.

Some testers advocate favored life-cycle models, favored organizational models, or favored artifacts. Consider for example, the V-model, the mutually suspicious separation between programming and testing groups, and the demand that all code delivered to testers come with detailed specifications.

Context-driven testing has no room for this advocacy. Testers get what they get, and skilled context-driven testers must know how to cope with what comes their way. Of course, we can and should explain tradeoffs to people, make it clear what makes us more efficient and more effective, but ultimately, we see testing as a service to stakeholders who make the broader project management decisions.

  • Yes, of course, some demands are unreasonable and we should refuse them, such as demands that the tester falsify records, make false claims about the product or the testing, or work unreasonable hours. But this doesn’t mean that every stakeholder request is unreasonable, even some that we don’t like.
  • And yes, of course, some demands are absurd because they call for the impossible, such as assessing conformance of a product with contractually-specified characteristics without access to the contract or its specifications. But this doesn’t mean that every stakeholder request that we don’t like is absurd, or impossible.
  • And yes, of course, if our task is to assess conformance of the product with its specification, we need a specification. But that doesn’t mean we always need specifications or that it is always appropriate (or even usually appropriate) for us to insist on receiving them.

There are always constraints. Some of them are practical, others ethical. But within those constraints, we start from the project’s needs, not from our process preferences.

Context-driven techniques?

Context-driven testing is an approach, not a technique. Our task is to do the best testing we can under the circumstances–the more techniques we know, the more options we have available when considering how to cope with a new situation.

The set of techniques–or better put, the body of knowledge–that we need is not just a testing set. In this, we follow in Gerry Weinberg’s footsteps:  Start to finish, we see a software development project as a creative, complex human activity. To know how to serve the project well, we have to understand the project, its stakeholders, and their interests. Many of our core skills come from psychology, economics, ethnography, and the other socials sciences.

Closing notes

Reasonable people can advocate for standards-driven testing. Or for the idea that testing activities should be routinized to the extent that they can be delegated to less expensive and less skilled people who apply the routine directions. Or for the idea that the biggest return on investment today lies in improving those testing practices intimately tied to writing the code. These are all widely espoused views. However, even if their proponents emphasize the need to tailor these views to the specific situation, these views reflect fundamentally different starting points from context-driven testing.

Cem Kaner, J.D., Ph.D.
James Bach

Software Customer Bill of Rights

Wednesday, August 27th, 2003

As the software infrastructure has been going through chaos, reporters (and others) have been called me several times to ask what our legal rights are now and whether we should all be able to sue Microsoft (or other vendors who ship defective software or software that fails in normal use).

Unfortunately, software customer rights have eroded dramatically over the last ten years. Ten years ago, the United States Court of Appeals for the Third Circuit flatly rejected a software publisher’s attempts to enforce contract terms that it didn’t make available to the customer until after the customer ordered the software, paid for it, and took delivery. Citing sections of Uniform Commercial Code’s Article 2 (Law of Sales) that every law student works through in tedious detail in their contracts class, the Court said that the contract for sale is formed when the customer agrees to pay and the seller agrees to deliver the product. Terms presented later are proposals for modification to the contract. The customer has the right to keep the product and use it under the original terms, and refuse to accept the new, seller favorable terms. Other courts (such as the United States Court of Appeals for the First Circuit) cited this case as representative of the mainstream interpretation of Article 2. Under this decision, and several decisions before it, shrinkwrapped contracts and clickwrapped contracts (the ones you have to click “OK” to in order to install the product) would be largely unenforceable.

The software publishing community started aggressively trying to rewrite contract law in about 1988, after the United States Court of Appeal for the Fifth Circuit rejected a shrinkwrapped restriction on reverse engineering. That effort resulted in the Uniform Computer Information Transactions Act and a string of court decisions, starting in 1995, that make it almost impossible to hold a software company liable for defects in its product (unless the defect results in injury or death)– even defects that it knew about when it shipped the product — and also very difficult to hold a mass-market seller liable for false claims about its product. (For background, see InfoWorld and Kaner’s Software Engineering & UCITA in the section on Forcing Products Liability Suits into Arbitration).

So what should we do about this? There are some strong feelings to hold companies fully accountable for losses caused by their products’ defects.

I’d rather stand back from the current crisis, consider the legal debates over the last 10 years, and make some modest suggestions that could go a long way toward restoring integrity and trust — and consumer confidence, consumer excitement, and sales — in this stalled marketplace.

1. Let the customer see the contract before the sale. It should be easy for customers of mass-market software products and computer information contracts to compare the contract terms for a product, or for competing products, before they download, use, or pay for a product. (NOTE: This is not a radical principle. American buyers of all types of consumer products that cost more than $15 are entitled to see the contract (at a minimum, the warranties in the contract) before the sale).

2. Disclose known defects. The software company or service provider must disclose the defects that it knows about to potential customers, in a way that is likely to be understood by a typical member of the market for that product or service.

3. The product (or information service) must live up to the manufacturer’s and seller’s claims. A statement by the vendor (manufacturer or seller) about the product that is intended to describe the product to potential customers is a warranty, a promise that the product will work as described. Warranties by sellers are defined in UCC Article 2 Section 313. Manufacturer liability is clarified (manufacturers are liable for claims they make in ads and in the manual) in a set of clarifying amendments to Article 2 that have now been approved by the Permanent Editorial Board for the UCC, which will be probably introduced in state legislatures starting early in 2004. In addition, it is a deceptive trade practice in most states (perhaps all) to make claims about the product that are incorrect and make the product more attractive. For example, under the Uniform Deceptive Trade Practices Act, Section 2(5) it is unlawfully deceptive to represent “that goods or services have sponsorship, approval, characteristics, ingredients, uses, benefits, or quantities that they do not have.” UCITA was designed to pull software out of the scope of laws like this, which it did by defining software transactions as neither goods nor services but licenses. We should get rid of this cleverly created ambiguity.

4. User has right to see and approve all transfers of information from her computer. Before an application transmits any data from the user’s computer, the user should have the ability to see what’s being sent. If the message is encrypted, the user should be shown an unencrypted version. On seeing the message, the user should be able to refuse to send it. This may cause the application to cancel a transaction (such as a sale that depends on transmission of a valid credit card number), but transmission of data from the user’s machine without the user’s knowledge or in spite of the user’s refusal should be prosecutable as computer tampering.

5. A software vendor may not block customer from accessing his own data without court approval.

6. A software vendor may not prematurely terminate a license without court approval. The issue of vendor self-help (early termination of a software contract without a supporting court order) was debated at great length through the UCITA process. To turn off a customer’s access to software that runs on the customer’s machine, the vendor should get an injunction (a court order). However, perhaps a vendor should be able to deny a customer access to software running on the vendor’s machine without getting an injunction (though the unfairly-terminated customer should be allowed to get a court order to restore its access.)

7. Mass-market customers may criticize products, publish benchmark study results, and make fair use of a product. Some software licenses bar the customer from publishing criticisms of the product, or publishing comparisons of this product with others or using screenshots or product graphics to satirize or disparage the product or the company. Under the Copyright Act, you are allowed to reproduce part of a copyrighted work in order to criticize it, comment on it, teach from it, and so on. Software publishers shouldn’t be able to use “license” contracts to bar their mass-market customers from the type of free speech that the Federal laws (including the Copyright Act) have consistently protected.

8. The user may reverse engineer the software. Software licenses routinely ban reverse engineering, but American courts routinely say that reverse engineering is fair use, permissible under the Copyright Act. Recently, California courts have started enforcing no-reverse-engineering bans in software licenses. This is a big problem. Software publishers claim that reverse engineering is a way to steal their work. There are many legitimate, important uses of reverse engineering, such as exposing security holes in the software, exposing and fixing bugs (that the manufacturer might not fix because it is unwilling, unable, or no longer in business), exposing copyright violations or fraudulent claims by the manufacturer, or achieving interoperability (making the product work with another product or device). These benefit or protect the customer but do not help anyone unfairly compete with the manufacturer.

9. Mass-market software should be transferrable. Under the First Sale Doctrine, someone who buys a copyrighted product (like a book) can lend it, sell it, or give it away without having to get permission of the original publisher or author. Similarly, if you buy a car, you don’t have to get the car manufacturer’s permission to lend, sell, or donate your car. UCITA Section 503(2)allows mass-market software publishers to take away their customers’ rights to transfer software that they’ve paid for. It should not.

10. When software is embedded in a product, the law governing the product should govern the software. Think of the software that controls the fuel injectors in a car. Should the car manufacturer be allowed to license this software instead of supplying it under the basic contract for the sale of the car? (Paper 1) (Paper 2). Under extended pressure from the software industry, the Article 2 amendments specify that software (information) is not “goods” and so is not within the scope of Article 2, even though courts have been consistently applying Article 2 to packaged software transactions since 1970. In the 48 states that have not adopted UCITA, this amendment would mean that there is no law in that state that governs transactions in software. The courts would have to reason by analogy, either to UCITA or to UCC 2 or to something else. When a product includes both hardware (the car) and software (the fuel injector software, braking software, etc.), amended Article 2 allows the court to apply Article 2 to the hardware and other law to the software. Thus different warranty rules could apply and even though you could sell your car used without paying a fee to the manufacturer, you might not be able to transfer the car’s software without paying that fee. Vendors should not be able to play these kinds of games. “Embedded software” is itself a highly ambiguous term. In those cases in which it is unclear whether software is embedded or not, the law should treate the software as embedded.

SWEBOK Problems, Part 2

Friday, June 27th, 2003

I’m going through my detailed review of SWEBOK, in preparation for the June 30 comment deadline. The bulk of this blog entry is a page-by-page commentary / critique that I will submit to the SWEBOK review. Before that, here are some contextual comments.

Please get involved in this review process, which will close on June 30. Go to www.swebok.org to sign up and download swebok, and submit comments.

Time is short, and you might not be able to read all of SWEBOK in time to submit detailed comments. That’s OK. I recommend that you download it, skim the parts that are most interesting, realize the extent to which it excludes modern methods (such as agile development) and, if this bothers you, you can submit a very simple comment.

You can say something like:

“I have reviewed SWEBOK. I manage software development staff
and play a role in their training and supervision. SWEBOK does not
provide a good basis for the structure or detail of the knowledge
that I want my staff to have. It emphasizes attitudes and practices
that are not helpful on my projects and it downplays or skips
attitudes and practices that I consider essential. I consider this
document fundamentally flawed, and if I could vote to disapprove it,
I would.”

Obviously, you would tailor this to your circumstances.

=======

Overall Concerns with SWEBOK

=======

SWEBOK was created using a strange process. They started with the table of contents of the main software engineering textbooks — as if there is a strong relationship between software engineering as described in textbooks and software engineering as practiced in the field. From there, SWEBOK developed as delta’s from these books. SWEBOK is focused on “established traditional practices recommended by many organizations” and is intended to exclude “practices used only for certain types of software” and to exclude “innovative practices tested and used by some organizations and concepts still being developed and testing in research organizations.”

Somehow, we conclude that mutation testing is an established traditional practice that is widely recommended and used, but we exclude scenario testing. We conclude that massive tombs of test documentation are an established traditional practice widely followed, even though rants about bad test documentation are, to say the least, a common theme of comment in the community. And we exclude consideration of requirements analytical techniques (or project context considerations) that might help you make a sensible engineering determination of what types of documentation, at what level of depth, for what target reader, are worth the expense of creating and (possibly) maintaining them.

In the SWEBOK, page IX, we learn that the purpose of SWEBOK is to provide a “consensually-validated characterization.” In this, SWEBOK has failed utterly. Only a few people (about 500) were involved in the project. It alienated leading people, such as Grady Booch who recently said (in a post to the extremeprogramming listserv on yahoogroups, dated 5/31/2003)

“I was one of those 500 earlier reviewers – and
my comments were entirely negative. The SWEBOK
I reviewed was well-intentioned but misguided,
naive, incoherent, and just flat wrong in so
many dimensions.”

The Association for Computing Machinery was a co-authoring, co-sponsoring organization of SWEBOK at one point. But ACM eventually commissioned task forces to study the document and the rationale underlying the effort, and the result was a deeply critical evaluation and ACM’s withdrawal from the project.

ACM is the largest association of computing professionals in the world. How can it be said, with a straight face, that SWEBOK is a consensually-validated document when the ACM, including leaders of the ACM Special Interest Group in Software Engineering, determine that the approach to creating the document and the result are fundamentally flawed? See http://www.acm.org/serving/se_policy/ for details.

The SWEBOK response (front page of www.swebok.org) was this:

“The following motion was unanimously adopted
on April 18 2001.

“The Industrial Advisory Board of the Guide
to the Software Engineering Body of Knowledge
(SWEBOK) project recognizes that due process
was followed in the development of the Guide
(Trial Version) and endorses the position that
the Guide (Trial Version) is ready for field
trials for a period of two years.”

I love this phrasing. “Due process” has a fine, legalistic, officious ring to it. It sounds good, and (speaking as an attorney who has experience using lawyerly terms like “due process”) it will intimidate or silence some critics. But if your acceptance criterion is consensus, and you have obviously failed to achieve consensus, then a term like “due process” is just so much smoke to confuse the issue. If the process fails to produce the required product, the fact (if it is a fact) that the process was followed doesn’t make the failure a non-failure.

==========

Detailed Evaluation Comments

==========

Here are my page-by-page comments on the testing section of SWEBOK. I have reviewed other parts of SWEBOK and have concerns about them too, but life is short and precious and there is only so much of mine that I am willing to dedicate to a criticism of a fundamentally flawed piece of work.

===========

Page 69. The document praises the role of testing as a preventative technique throughout the lifecycle, but doesn’t consider test-driven development, which I believe is the single most important type of early testing.

============

Page 69. The document defines software testing as follows: “Software testing consists of the dynamic verification of the behavior of a program on a finite series of test cases, suitably selected from the usually infinite executions [sic] domain, against the specified expected behavior.”

In fact, a great deal of testing is done without specifying expected behavior. Here are three examples:

(1) Exploratory testing is done partially to discover the behavior.

(2) Some types of high volume random testing check for indicators of failure without having any model of expected behavior. (It would be ludicrous to say that their model of the expected behavior is that the program will not have memory leaks, stack corruption or other specific defects.)

(3) Most forms of user testing fail to involve comparison to specified behavior, and the user who protests that a certain behavior in a certain context is inappropriate, confusing or unacceptable, might well not be able to articulate her expectations, even after the failure, let along specify them in advance. (In many cases, expectation is driven by similarity to other experiences and we know from research in cognitive psychology, e.g. from Lee Brooks’ lab at McMaster, that many people would be unable to describe the similarity space that is the basis for their judgments.)

These types of test are widely used by testers, and they have been widely used for decades. Good testing sometimes involves comparison to specified expected behavior, but it often does not.

=============

Page 70. The document provides a laundry list of test techniques with no obvious selection or exclusion principle.

One of the oddities on page 70 is the assertion that “branch coverage is a popular test technique.” HUH? What makes this a technique? You achieve branch coverage by running any group of tests that take the program through each branch. We could achieve this test objective (achieve a certain level of coverage) via scenario tests, domain tests, various other types of tests. We could achieve the objective by running tests at the unit level or the fully integrated system level. SWEBOK says that coverage “should not be considered _per_se_ as the objective of testing.” I share that opinion — it is a poor objective. But it appears to be the objective of many people who drive their testing in order to achieve this result. The fact that the authors of SWEBOK don’t like coverage as an objective doesn’t make it a technique.

Another strange page 70 assertion is that test techniques used primarily to expose failures are primarily domain testing. SWEBOK says, “These techniques variously attempt to “breakâ€? the program, by running one [or more] test[s] drawn from identified classes of (deemed equivalent) executions. The leading principle underlying such techniques is being as much systematic as possible in identifying a representative set of program behaviors (generally in the form of subclasses of the input domain).”

Yes, domain testing is the most commonly described technique in textbooks. It is simple, easy to understand, and easy to teach. But risk-based testing, scenario testing, stress testing, specification-focused testing, high-volume automated testing, state-model-based testing, transaction-flow testing, and heuristic-based exploratory testing are other examples of testing techniques that go after bugs in the product. Why ignore these in favor of domain testing?

Additionally, even though the textbooks most often talk in terms of subclasses of input domains, it is important and fruitful to also analyze the program in terms of its output domains, its interfaces with other devices (disk, printer, etc.) and other processes, and its internal intermediate-result variables. By focusing students (or worse, professionals) on input domains to the exclusion of the others, we virtually blind them to important problems. As the ACM pointed out in its evaluation of SWEBOK, a “body of knowledge” should be focused on competent practice, not on the descriptions in introductory books.

SWEBOK (p. 70) also tells us that to avoid confusing test objectives and techniques, we must clearly distinguish between measures of the thoroughness of testing and measures of the software under test (such as measures of reliability). SWEBOK also tells us that when we conduct testing “in view of a specific purpose”, then that specific purpose is the “test objective.” SWEBOK lists examples of reliability measurement, usability evaluation, and contractor’s acceptance as important examples of objectives. I think those are fine objectives. But if a regulatory requirement specifies that I must achieve a certain type of coverage, and I design tests to meet that requirement, then meeting that coverage target IS my specific purpose for those tests. I can think of several circumstances under which achievement of a level of thoroughness of a certain type of testing IS the specific purpose for running a set of tests. What principled basis does SWEBOK have in (apparently) rejecting these as invalid objectives?

One (failing) rationale for deciding that achieving a certain level of (some type of) coverage is not a valid objective is that we strive to achieve coverage in order to help achieve something else, such as reliability. That sounds good (in spite of the fact that in some situations, we strive to achieve a certain level of coverage primarily in order to be able to say we achieved that level of coverage), but the reasoning generalizes inconveniently. For example, in many organizations, we do usability testing in order to help achieve customer acceptance. So usability evaluation should not be a valid test objective (because in some contexts, coverage is to reliability as, in other contexts, usability is to acceptance). But SWEBOK specifically blesses usability evaluation and contractor (customer) acceptance as valid test objectives.

A test objective is the objective that drives the design and execution of the tests. Different objectives are appropriate in different contexts. SWEBOK has no business dismissing some objectives as non-objectives.

=================

SWEBOK page 70 states that “Software testing is a very expensive and labor-intensive part of development. For this reason, tools are instrumental for automated test execution, test results logging and evaluation, and in general to support test activities. Moreover, in order to enhance cost-effectiveness ratio, a key issue has always been pushing test automation as much as possible.”

The idea that we should be “pushing test automation as much as possible” has been a source of much mischief and misunderstanding. I frequently hear from experienced testers that their highest bug find rates are achieved using manual or computer-assisted one-time-use tests. I don’t believe that it is to our advantage to stop doing this type of testing. Instead, I think we should be “pushing” cost-benefit analysis and implementing automation when it is cost-effective. For additional discussion of cost/benefit analysis for automation, see my papers, Architectures of Test Automation (https://kaner.com/testarch.html) and Avoiding Shelfware: A Manager’s View of Automated GUI Testing (https://kaner.com/pdfs/shelfwar.pdf).

The idea that we are actually automating testing is itself a misconception. Let’s consider the most common form of test “automation”, GUI regression-level “automation”. It involves these tasks

TASK / DONE BY

Analyze the specification and other docs for ambiguity or other indicators of potential error

–> Done by humans

Analyze the source code for potential errors or other things to test

–> Done by humans

Design test cases

–> Done by humans

Create test data

–> Done by humans

Run the tests the first time

–> Done by humans

Evaluate the first result

–> Done by humans

Report a bug from the first run

–> Done by humans

Debug the tests

–> Done by humans

Save the code

–> Done by humans

Save the results

–> Done by humans

Document the tests

–> Done by humans

Build a traceability matrix (tracing test cases back to specs or requirements)

–> Done by humans or by another tool (not the GUI tool)

Select the test cases to be run

–> Done by humans

Run the tests

–> The Tool does it

Record the results

–> The Tool does it

Evaluate the results

–>The Tool does it, but if there’s an apparent failure, a human re-evaluates the results.

Measure the results (e.g. performance measures)

–> Done by humans or by another tool (not the GUI tool)

Report errors

–> Done by humans

Update and debug the tests

–> Done by humans

When we see how many of the testing-related tasks are being done by people or, perhaps, by other testing tools, we realize that the GUI-level regression test tool doesn’t really automate testing. It just helps a human to do the testing. Rather than calling this “automated testing”, we should call it computer-assisted testing. I am not showing disrespect for this approach by calling it computer-assisted testing. Instead, I’m making a point–there are a lot of tasks in a testing project and we can get help from a hardware or software tool to handle any subset of them. GUI regression test tools handle some of these tasks very well. Other tools or approaches will handle a different subset

We should use tools in software testing, but we should not strive for complete automation. It is the wrong goal.

======================

Page 71-73 diagrams

These pages provide some diagrams of the structure of the rest of the testing chapter. Several of the items on these pages are troubling, but I’ll refer to them in the context of the more detailed discussions in the rest of the chapter.

=====================

Page 74, definitions of fault, failure and defect.

I don’t disagree with the definitions of fault and failure. However, SWEBOK equates “fault” and “defect”, where “fault” refers to the underlying cause of a malfunction.

I have two objections to the use of the word defect.

(a) First, in use, the word “defect” is ambiguous. For example, as a matter of law, a product is dangerously defective if it behaves in a way that would be unexpected by a reasonable user and that behavior results in injury. This is a failure-level definition of “defect.” Rather than trying to impose precision on a term that is going to remain ambiguous despite IEEE’s best efforts, our technical language should allow for the ambiguity.

(b) Second, the use of the word “defect” has legal implications. While some people advocate that we should use the word “defect” to refer to “bugs”, a bug-tracking database that contains frequent assertions of the form “X is a defect” may severely and unnecessarily damage the defendant software developer/publisher in court. In a suit based on an allegation that a product is defective (such as a breach of warranty suit, or a personal injury suit), the plaintiff must prove that the product is defective. If a problem with the program is labeled “defect” in the bug tracking system, that label is likely to convince the jury that the bug is a defect, even if a more thorough legal analysis would not result in classification of that particular problem as “defect” in the meaning of the legal system.

We should be cautious in the use of the word “defect”, recognize that this word will be interpreted in multiple ways by technical and nontechnical people, and recognize that a company’s use of the word in its engineering documents might unreasonably expose that company to legal liability.

=====================

Page 75, The Oracle Problem

As Doug Hoffman has pointed out, oracles are heuristic. When we use an oracle to determine that a program has passed or failed a test, we are comparing the program to some model or expectation on some number of dimensions. The program can fail on other dimensions that the oracle is blind to. For example, if we use Excel as the oracle for a spreadsheet under development, and evaluate the formula A1+A2, we might set cell A1 to 2 and cell A2 to 3 in both program and get 5 in both cases. In terms of the oracle, our spreadsheet has passed this test. But suppose the new spreadsheet took 5 hours to evaluate A1+A2. This is unacceptable, but the oracle is oblivious to it.

The characterization of oracles as tools to decide whether a program behaved correctly on a given test, without discussion of the inherent fallibility of all oracles, has led to serious misunderstandings.

=======================

Page 75, Testability

The third common meaning of testability in practice refers to the extend to which the program is easy to test and the test results are easy to interpret. Thus a highly testable program provides a high level of _control_ (the tester might be able to change data, start the program at any point, etc.) and a high level of _visibility_ (the tester can determine the state of the program, the value of specific variables, etc.)

This is widely used and it guides negotiation among testers and programmers regarding the support for testing that will be designed into a program.

=======================

Page 75, Test Levels (Unit, Integration, System)

SWEBOK says “Clearly, unit testing starts after coding is quite mature, for instance after a clean compile.”

This is 100% in disagreement with the practice of test-driven development, which requires the programmer to write a unit test immediately _before_ writing code that will enable the program to pass the test.

I think that test-driven development is the most important advance in the craft of testing of the past 30 years. This, more than any of the other flaws, illustrates the extent to which SWEBOK is blind to modern good practice.

========================

Page 75, Test Levels (Unit, Integration, System)

Much testing now involves API-level driving of a component developed by someone else. I think this is neither unit, nor integration, nor system testing.

========================

Page 75, Test Levels (Unit, Integration, System)

In test-driven development, the programmer implements a test and then writes the code needed for the program to pass the test. (More precisely, implement the test, run the program and see how it fails, write the simplest code that can pass the test, run the program and fix until the program passes the test, then refactor the code and retest.

The first use of these tests is to guide and check the initial implementation of a few lines of code. In that sense, they are “unit” tests and they are often referred to as unit tests. However, as programming tools evolve, these tests often look at the lines of code in the context of several other features. Using tools like Ward Cunningham’s FIT, for example, programmers might create many “unit” tests that are also “integration” (multi-variable, multi-function) and “system” (check whether an intended benefit will actually be provided to the end user).

I don’t think we should ban the use of the terms “unit”, “integration” and “system.” However, thinking about these as THE THREE LEVELS of testing, as defining the 3 targets of testing, leads to blind spots with respect to the nature of targets and the potential focus of individual tests.

========================

Page 75-77, Objectives of Testing

I think the categorization of testing concepts is strikingly odd. Here, I note the oddness of the list of testing objectives.

SWEBOK lists
– conformance testing, which it equates to functional testing
– reliability testing
– usability testing
– acceptance / qualification testing
– installation testing
– alpha and beta testing
– regression testing
– performance testing
– stress testing
– back-to-back testing
– recovery testing
– configuration testing, and
– usability testing.

This seems like a laundry list. Back-to-back testing looks more like a technique than an objective. Several others of these could be classed in different ways.

More important, what determines inclusion on this list?

For example, I think of objectives of testing as including:
– minimize liability risk
– decision support (help a project manager determine whether
to release the product)
– compliance with regulations or the expectations of a
regulatory inspector (this may or may not involve
conformance with a specification)
– assess and improve safety
– determine the nature of problems likely to arise in long
use of the product
– expose defects
– block premature release of a product
– improve the user experience

and many others.

In looking at the various lists in this document, I cannot divine a principle that governs inclusion versus exclusion.

As a teacher, I think that many things off the list are more important than the things on the list.

===================

Page 76. Regression testing

SWEBOK defines regression as

“the selective retesting of a system or component to verify that modifications have not caused unintended effects. In practice, the idea is to show that previously passed tests still do.” It then refers to “the assurance given by regression testing” [. . .] “that the software’s behavior is unchanged.”

An earlier version of SWEBOK noted that “regression testing” is commonly used to mean retesting the program to determine whether a bug was fixed. This is a popular definition. Why is it excluded?

Another common definition of regression testing is retesting the program to determine whether changes have caused fixed bugs to be re-broken.

If SWEBOK is a description of what is generally known and done, it should not select one definition and objective and exclude other common ones without even mentioning them.

Next, consider the idea that we run a bunch of tests again and again in order to assure ourselves that software behavior is unchanged. The regression test suite is a relatively tiny collection of tests that can only look at a relatively small proportion of the system’s behaviors. Our gamble is that the software’s behavior is not changing in ways missed by the regression tests. I have never seen a convincing theoretical argument that a regression test suite will expose most or all possible behavioral changes of a program and therefore I reject the notion that regression testing provides “assurance.”

The initial definition of regression testing is quite different from the idea of “assuring that system behavior is unchanged.” The definition is “verify that modifications have not caused unintended effects.” Let’s restate this definition in less antiquated terms — let’s talk about RISK instead of VERIFICATION.

That yields the idea that regression testing is done to mitigate the risk of unintended side effects of change.

A risk-based view of regression testing no longer requires us to use the same test, time and again, to study aspects of the program that have been previously tested. You can change data or combine a test with other tests or do other creative things to search for side effects. By varying the tests, you give yourself the chance to find previously-missed bugs — problems that were in the software all along but that you have missed with your tests so far — along with catching some side-effects. You are increasing coverage instead of mindlessly repeating the same old thing.

And under the risk-based view, you don’t fool yourself or defraud others with the idea that a small set of tests verifies some quality characteristic of the product. On page 75, SWEBOK approvingly noted Dijkstra’s insight that you can show the presence of bugs, but not their absence. This insight is flatly inconsistent with claims that we do any type of testing to “assure” or “verify”.

This conceptual contradiction illustrates the extent to which SWEBOK (as reflected in the testing section) seems to be more like a dumpster of testing concepts than like a conceptually coherent presentation.

As a dumpster (a disorganized collection of a miscellany of concepts) it is problematic because of the number and nature of things that have been kept out of the dumpster.

(NOTE: The analysis of regression testing above is mainly an analysis of system-level regression testing. I very much like the idea of creating an extensive suite of unit-level change-detectors, tests that we mainly create test-first, that cover every line and branch that we write. The difference between unit-level regression tests and system level is cost. The programmer runs the change-detector suite, every time she recompiles the code. If her change breaks the build, she fixes it immediately.

The labor cost associated with an independent tester discovering a regression error (which might be a bug in the program but is very often a test announcing that it must be changed to conform to a revised design) is quite high. Counting all people involved in the process, the time from failure through bug reporting to bug evaluation, prioritization, repair, and retesting will often total to an average of 4 labor-hours and or even higher (much higher) in some organizations.

In contrast, with the unit level change-detector suite, the programmer discovers the problem when she compiles the code, and immediately either fixes the code or the test. The labor cost is minimal. The practice is cheap enough that we can use it to support refactoring. The cost associated with traditional system-level regression testing is so high that we could not use it to support refactoring. The high communcation cost drives the cost of late change through the roof (one of the factors of the exponential growth curve of the cost of change over project time) whereas the absence of communication cost associated the unit-level change detectors allows us to make late changes at relatively low cost.

This is an important distinction within the description of a technique that SWEBOK says can be run at the system level or the unit level. We can do regression testing at either level, but their costs, benefits and uses are entirely different. It’s too bad that SWEBOK misses this point.

====================

Page 77 Test Techniques

This is another laundry list that excludes important current techniques, includes techniques that seem to be not widely used, and doesn’t expose any principled basis for inclusion or exclusion.

====================

Page 77 “Ad hoc testing”

SWEBOK says this (which it equates to exploratory testing) is the most widely practiced technique, and then advises a more systematic approach and then says that only experts should do ad hoc testing.

This is a blatant admission that the SWEBOK drafters simply don’t understand the most widely practiced approach to testing. SWEBOK cites my book as its source for testing “based on tester’s intuition and experience” and I believe I am the person who coined the term, “exploratory testing”, so let’s look at what this is. (Much of this material was developed by or with James Bach and many other colleagues over a 20 year period.)

First, exploratory testing involves simultaneous design, execution, and learning about the program. Rather than design tests and then run them, you do some testing, learn from them, learn from other sources, and base your design of next tests on your new insights. Your oracle (set of evaluation criteria) evolve as you learn more.

Second, every competent tester does exploratory testing. If you report a bug and the programmer tells you it was fixed, you do some testing around the fix. One test is the test that exposed the bug in the first place. But if you’re any good, you create additional tests to see if the fix is more general than the specific circumstances reported in the bug report, and to see if there were side effects. These are not pre-planned, pre-specified tests. They are designed, run, evaluated and extended in the moment. I use this testing situation as a basic training ground for junior testers (and classroom students). Surely, it is not something we would leave only to the experts. But SWEBOK tells us that “A more systematic approach is advised” and “ad hoc testing might be useful but only if the tester is really expert!”

There is such a thing as systematic exploratory testing.

I’m not trying to write SWEBOK here, but rather to support my assertion that SWEBOK seem to be clueless about a body of work that even it describes as the most widely practiced technique in the field.

If SWEBOK is intended to describe the current body of knowledge and practice in the field, its cluelessness about the most widely practiced approach is inexcusable.

===============

Page 80

My final comment on SWEBOK’s testing section has to do with its comments on test documentation.

“Documentation is an integral part of the formalization of the test process. The IEEE standard for Software Test Documentation [829] provides a good description of test documents and of their relationship with one another and with the testing process. . . . Test documentation should be produced and continually updated, at the same standards as other types of documentation in development.”

This is pious-sounding claptrap, religious doctrine rather than engineering. There is far too much of this in SWEBOK.

Is IEEE Standard 829 a good description?

In my experience, I have seen 829 applied by many commercial software companies or commercial companies that were developing software as part of their support process for their main business. I have never seen a case in which a commercial software application was benefitted more than it was harmed by application of standard 829. Several colleagues of mine have had the same experience. Bach, Pettichord and I discuss the problems in Lessons Learned in Software Testing.

Normally, an engineering body of knowledge includes assertions that are based on theory and tested by experiment. There was no theoretical basis underlying Standard 829. I am not aware of any experimental research of the costs and benefits associated with the application of 829.

Test document is expensive.

Testing is subject to a very difficult constraint. We have an infinity of potential tests and a very limited amount of time in which to imagine, create, document, and evaluate the results of running a few of those many possible tests.

Time spent generating paperwork is time not available for test implementation, test tool development, test execution and evaluation.

Good practice, therefore, probably pushes toward cost-benefit evaluation on a case by case basis. If a certain type of document is so valuable for the current project that it is worth taking time away from competing tasks in order to create the document, create the document.

Before we can pronounce that test documentation should be continually updated, we should discover why the test documentation is being created and how it will be used in the future. Maybe updating is called for. Maybe not. Maybe the documentation should be up to the standards of other documentation on the project, but maybe not. It depends on who will use the documents, and for what purpose.

=====================

In Sum

My time is limited.

I could write pages and pages more about the weaknesses of SWEBOK, but I think it would be pointless.

I agree with the ACM appraisal that the SWEBOK started with a fundamentally flawed approach. The result continues to be fundamentally flawed.

The call for comments on SWEBOK asked for appraisal of SWEBOK as it relates to teaching.

I teach courses in software testing. SWEBOK is not a good reference point for them.

SWEBOK’s criteria for inclusion and exclusion of topics is unsatisfactory. Many of the most important topics in my testing courses, (such as test driven development, API-level testing, scenario testing, skilled exploration, the difference in objectives and cost/benefit for unit-level regression suites and system-level regression suites, risk-based testing, a risk-based approach to domain testing instead of the stale, 40-year-old, boundary/equivalence approach documented in SWEBOK), effective bug reporting, using requirements analysis techniques to drive decisions about the types of artifacts to be generated, and on and on, are absent from SWEBOK. Much of what is present in SWEBOK is organized strangely, is dated, and many of the techniques (etc.) are marginal in terms of how often they are used and what value they actually provide.

I have appraised SWEBOK against my course notes, which I update regularly (and which I am updating again this summer). My conclusion was that SWEBOK’s flaws are so severe that, on balance, it is a less-than-worthless reference point for discovery of opportunities to improve the notes.

I also teach courses in software metrics.

Measurement, as studied in other fields, normally involves extensive study of validity of measures and threats to validity. One of the most important validity questions is how can we tell whether the measure actually measures what it purports to measure. What model or theory (and associated empirical support) relates the number we obtain (a complexity level of “10”) to the underlying attribute we are trying to measure?

Another critical question involves side effects of measurement. Robert Austin’s book, Measuring and Managing Performance in Organizations, discusses this in detail.

The review of measurement theory in SWEBOK (page 174 and on) skips lightly past these issues and provides a laundry list of metrics, including many that are invalid and unvalidated “metrics” — to the extent that it is clear what attribute they are actually intended to measure. The main value of the SWEBOK treatment of measurement is that it is concise. It makes an excellent “straw man”, something I can hand out and enthusiastically criticize. This is probably not the educational use we would hope to obtain from something that is SUPPOSED TO serve as the basis for a licensing exam.

=============

CLOSING ASSERTION

THERE IS NO BALLOTING PROCESS FOR SWEBOK THIS TIME. IF THERE WAS, I WOULD VOTE THAT SWEBOK SHOULD NOT BE ACCEPTED, WITH OR WITHOUT MODIFICATION.

— Cem Kaner
— Professor of Software Engineering
— Florida Institute of Technology

IEEE’s “Body of Knowledge” for Software Engineering

Tuesday, June 17th, 2003

SOFTWARE ENGINEERING’S “BODY OF KNOWLEDGE�

The IEEE Computer Society has been developing its own statement of the Software Engineering Body of Knowledge (SWEBOK). They are now calling for a review of SWEBOK, which you can participate in at www.swebok.org.
According to their Call for Reviewers (email, May 29, 2003:

“The purpose of the Guide is to characterize the contents of the software engineering discipline, to promote a consistent view of software engineering worldwide, to clarify the place of, and set the boundary of, software engineering with respect to other disciplines, and to provide a foundation for curriculum development and individual licensing material. All deliverables are available without any charge at www.swebok.org.”

SWEBOK pushes the traditional, documentation-heavy approaches. I have read several drafts of it over the years but I chose to not be involved in the official process because I believed that:

  • The document had little merit and probably wouldn’t get much better;
  • My comments wouldn’t have much influence.
  • These grand, in my view highly premature, efforts to standardize and regulate the field come and go but don’t really have enough influence to worry about.

In retrospect, I think that keeping away from SWEBOK was a mistake. I think it has the potential to do substantial harm. I urge you to get involved in the SWEBOK review, make your criticisms clear and explicit, and urge them in writing to abandon this project. Even though this will have little influence with the SWEBOK promoters, it will create a public record of controversy and protest. Because SWEBOK is being effectively pushed as a basis for licensing software engineers and evaluating / accrediting software engineering degree programs, a public record of controversy may play an important role.

LICENSING

Should software engineers be licensed as engineers?
One of the key reasons for the creation of the SWEBOK was to support political moves to license software engineers. This is from the SWEBOK Project Overview

“A core body of knowledge is pivotal to the development and accreditation of university curricula and the licensing and certification of professionals. Achieving consensus by the profession on a core body of knowledge is a key milestone in all disciplines and has been identified by the Coordinating Committee as crucial for the evolution of software engineering toward a professional status. The Guide to the Software Engineering Body of Knowledge project is an initiative completed under the auspices of this Committee to reach this consensus. “

In a series of studies, the Association for Computing Machinery recommended against licensing. I was a member of one of the ACM’s study panels, the one that considered the relationship between licensing and safety-critical software.
I think that licensing engineers in our profession today is premature and likely to do serious harm to the profession. I don’t say this lightly. I’ve thought about it for a long time, and from many perspectives (I am a full Professor of Software Engineering, an Attorney who has a strong interest in malpractice law, and a person who has almost 20 years experience in commercial software development (programming, designing user interfaces, testing, tech writing, managing programmers, testers, and writers, consulting, negotiating contracts, etc.).

SWEBOK

The SWEBOK is written as the basis for licensing exams for professional software engineers. If your state requires you to get a license to practice software engineering (and more will, if they are convinced that they can create fair exams based on a consensus document from the profession), the SWEBOK is the document you will have to study.
If the SWEBOK is the basis for the licensing exam, the practices in the SWEBOK will be treated as the basis for malpractice lawsuits. People who do what is called good practice in SWEBOK will be able to defend their practices in court if they are ever sued for malpractice. People who adopt what might be much better practices, but practices that conflict with the SWEBOK, will risk harsh criticism in court. As the basis for a licensing exam, SWEBOK becomes as close to an Official Statement of the approved practices of the field as a licensed profession is going to get.

So what’s in this SWEBOK?

The IEEE SWEBOK is a statement of “generally accepted practices�, which are defined as “established traditional practices recommended by many organizations.� SWEBOK is NOT a document intended to include “specialized� practices, which are “practices used only for certain types of software� nor for “advanced and research� practices, which are “innovative practices tested and used only by some organizations and concepts still being developed and tested in research organizations.
I am most familiar with SWEBOK’s treatments of software testing, software quality and metrics. It endorses practices that I consider wastefully bureaucratic, document-intensive, tedious, and in commercial software development, not likely to succeed. These are the practices that some software process enthusiasts have tried and tried and tried and tried and tried to ram down the throats of the software development community, with less success than they would like.
By promoting these document-centered, rigid practices in a document that serves as the basis for licensing of software engineers, the SWEBOK committee can drive adoption of these practices to a much greater degree than practitioners have accepted voluntarily.
The Association for Computing Machinery assessed SWEBOK and concluded it was seriously flawed and that ACM (originally a partner in development of the SWEBOK) should withdraw from the process. SWEBOK was ultimately adopted as a “consensus� document based on votes from fewer than 350 reviewers, in the face of criticism and walkout by the largest association of computing professionals in the world.

IN SUM

Only 500 people participated in the development of SWEBOK and many of them voiced deep criticisms of it. The balloted draft was supported by just over 300 people (of a mere 340 voting). Within this group were professional trainers who stand to make substantial income from pre-licensing-exam and pre-certification-exam review courses, consulting/contracting firms who make big profits from contracts that (such as government contracts) that specify gold-plated software development processes (of course you need all this process documentation—the IEEE standards say you need it!), and academics who have never worked on a serious development project. There were also experienced, honest people with no conflicts of interest, but when there are only a few hundred voices, the voices of vested interests can exert a substantial influence on the result.
I don’t see a way to vote on the 2003 version of SWEBOK. If I did, I would urge you to vote NO.
But even though you cannot vote to disapprove this document, you can review it, criticize it, and make clear the extent to which it does fails reflect the better practices in your organization.
To the extent that it is clear that there is no consensus around the SWEBOK, engineering societies will be less likely to rely on it in developing licensing exams (and less likely to push ahead with plans to license software engineers), and judges and juries will be less likely to conclude that “It says so in the SWEBOK. That must be what the best minds in the profession have decided is true.�
Please, go to www.swebok.org ASAP.
Comments at www.swebok.org are welcome until July 1, 2003.