Why propose an advanced certification in software testing?

Wednesday, March 26th, 2014

A couple of weeks ago, I posted A proposal for an advanced certification in software testing. There were plenty of comments, on the blog, on Twitter, and in private email to me.

I think the best way to respond to these is with a series of posts, each one focused on a different issue. This first one goes to the fundamental question, Why should we create such a thing?

I used to see certifications as irrelevant (and misleading)

For a long time, when people asked me whether they should get certified in software testing, I said no. I would say that, in my opinion, there is no value in the current certifications.

I know more good testers who are not certified than good ones who are certified. I feel as though I’ve met a whole lot of clueless fools who carry testing certifications.

Many of the exam-review courses teach to the exam and present an oversimplified and outdated view of the field. I think that, from a what-will-you-learn perspective, taking them is a waste of time and money.

It used to seem obvious to me that certification must be irrelevant to a tester’s career.

The market proved me wrong

Unfortunately, my predictions that the community would see the ISTQB/ASQ/QAI-type credential as irrelevant were proved wrong.

The fact that hundreds of thousands of people in our field have decided to get certified demonstrates, in and of itself, that the credential is widely perceived as relevant.

I think that a willingness to discover and publish that you were mistaken is one of the critical traits of a scientist. The history of science is the story of a never-ending stream of ideas that were well supported at the time but were later proved wrong. They were replaced with better ideas that were more useful and better supported, and those were proved wrong too.

It seems to me that I can’t be a great tester (or an adequate scientist) if I am reluctant to subject my beliefs and ideas to the same level of criticism that I apply to the work of others.

In retrospect, I realize that I misread the evolution of certification in 1990 through 2010.

  • The testing community had a growing core of people who had decided to do this work as a career. They believed they were committed to doing good work and that they were good at what they did.
  • The demand for testing services was exploding, with floods of new people who had little background, varying levels of commitment and increasingly inflated salary expectations.
  • Many of the people who saw themselves as professionals were getting tired of being characterized as unskilled, clueless bureaucrats by so many other people in the development community.
  • Many of the people involved in recruiting testers or setting their pay scales don’t know enough about testing to tell the good ones from incompetents who can spin persuasive resumes and interviews.
  • In this environment, even if you are a test manager with really good hiring instincts, you still have the challenge of justifying the salaries you want to pay to people who don’t understand your staff.

Certification was sold as a formal credential, something that demonstrates (at a minimum) that you are committed enough to the field to go through the hassle of getting certified. It was also sold as proof that you are at least familiar with the basics of the field and that you are good enough at precision reading to pass a formal exam.

If there is no stronger credential in the field, it is easy to see this as better than nothing.

I think that some of the get-certified sales pitches go far beyond this, saying or implying that certification demonstrates that a person has genuine professional competence. I think that goes far beyond what any of these certifications could possibly attest to, but I think that’s the impression that is sometimes encouraged.

We can argue about the motivation and about the marketing. We can speculate endlessly about why someone would spend good money on exam-prep courses so they could get one or more of these certifications.

I think it is more useful to ask whether we can give them better value for their time and money.

One approach: Open Certification

My interest in creating a better alternative to the current certifications is not new. Back in 2006, Mike Kelly and I started hosting workshops to plan an “Open Certification”. The idea was to create a huge, open pool of multiple-choice questions and to examine candidates via a random stratified sample of questions from the pool. However, there were some insurmountable problems:

  • We were determined to not be tied to one proprietary body of knowledge. But consider this example: Suppose we are willing to accept six different widely-used definitions of “test case.” Which one is the right one for this exam? And what if the student encounters (and answers on the basis of) Definition 7? How do we say that one is wrong?
    • The obvious way to deal with this is to write the question to say “Famous Person 1’s definition of test case is …” but what do people have to do to prepare for such an exam? Do they have to memorize 6 different definitions and the names of the people we tie those definitions to? Almost no one could pass such an exam. And even if you could pass it, all the memorizing you would have to do in order to pass it would be an abuse of your time.
  • We were determined, back then, to do something extremely cheap or free. But the development and maintenance costs for the software and questions were going to be very high. Even if we could get volunteer labor to create the first drafts of the exam (and exam site), we would need to do a lot of sustaining engineering. People were going to have to be paid.
  • The exam would be free but with this complex a series of questions, how long would it be before training companies started selling exam prep courses? The cost of the exams is not the big cost factor in the other certifications. It is the cost of the training. Were we kidding ourselves about making a difference here?
  • Finally, there was the most difficult problem. Even if the exam was successful, it would still be a bunch of multiple-choice questions. Our approach to certification wouldn’t be offering any better evidence of deep knowledge or skill than the others.
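To make the "random stratified sample" idea concrete, here is a minimal sketch in Python. It is not what the Open Certification project built. The pool layout (a list of dicts with "id" and "topic" keys), the topic names, and the function name stratified_sample are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(pool, per_topic, seed=None):
    """Draw per_topic[topic] questions at random from each topic stratum.

    pool      -- list of question dicts; each has a "topic" key (assumed layout)
    per_topic -- dict mapping topic name -> number of questions to draw
    seed      -- optional seed so a given exam can be reproduced
    """
    rng = random.Random(seed)

    # Group the pool into strata, one list of questions per topic.
    by_topic = defaultdict(list)
    for question in pool:
        by_topic[question["topic"]].append(question)

    exam = []
    for topic, count in per_topic.items():
        candidates = by_topic.get(topic, [])
        if len(candidates) < count:
            raise ValueError(f"not enough questions for topic {topic!r}")
        # Sample without replacement inside the stratum.
        exam.extend(rng.sample(candidates, count))

    # Shuffle so questions are not grouped by topic on the exam.
    rng.shuffle(exam)
    return exam


# Hypothetical pool: three topics with ten questions each.
pool = [
    {"id": f"{topic}-{i}", "topic": topic}
    for topic in ("test design", "bug advocacy", "domain testing")
    for i in range(10)
]
exam = stratified_sample(
    pool,
    {"test design": 3, "bug advocacy": 2, "domain testing": 2},
    seed=1,
)
```

Stratifying by topic guarantees that every candidate sees the same mix of subject areas even though the individual questions differ. It does nothing about the deeper problem described above: the result is still a multiple-choice exam.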

I forget his exact words, but Mike laid out an important criterion early in the project. If we couldn’t be confident of developing something clearly better than the alternative we were replacing, we shouldn’t bother doing it. As we proceeded, it became clearer and clearer that we were creating something that might be cheaper, but that probably wasn’t better.

Eventually, we pulled the plug on Open Certification.

But that was not abandonment of the idea of a better certification. It was a recognition that we didn’t have a better idea, yet.

In parallel with the Open Certification project, I was transforming BBST from a purely academic course to a very student-challenging industrial course.

One of the really valuable outcomes of the Open Certification meetings was a “standard” for drafting challenging multiple-choice test questions. I applied this to the BBST courses, creating a suite of quiz questions that BBST’s graduates have come to know and love.

But we didn’t stop with multiple-choice. We used multiple-choice as a tutorial tool, not as the core examiner. BBST demanded a much higher level of knowledge and skill than I knew how to get from multiple-choice exams. I concluded that something along these lines was a better way to go.

Another alternative

Rather than trying to replace the ASQ/ISTQB/QAI approach, I think we can build on it.

  • Let people get one of those credentials. Or let them get some other credential that is challenging but that approaches the field in less simplistic terms. Treat their credential-from-training as a baseline.
  • From here, let the tester present a portfolio of evidence that s/he can do more than just pass an exam or two—that s/he can actually do competent work in the field.

The person who can demonstrate both mastery of basic training and a competent portfolio gets an advanced certification.

I think this gives us two important advances:

  • It breaks out of the ideological stranglehold that a few vendors have had on credentialing in our field.
  • It presents a richer view of the capabilities and contributions of the person who carries the credential.

This isn’t perfect, but it’s better. I think that has some value.

 

A proposal for an advanced certification in software testing

Monday, March 3rd, 2014

This is a draft of a proposal to create a more advanced, more credible credential (certification) in software testing.

The core idea is a certification based on a multidimensional collection of evidence of education, experience, skill and good character.

  • I think it is important to develop a credential that is useful and informative.
    • I think we damage the reputation of the field if we create a certification that requires only a shallow knowledge of software testing.
    • I think we damage the value of the certification if we exaggerate how much knowledge or skill is required to obtain it.
  • I think it is important to find a way to tolerate different approaches to software testing, and different approaches to training software testers. This proposal is not based on any one favored “body of knowledge” and it is not tied to any one ideology or group of vendors.

The idea presented here is imperfect—as are the other certifications in our field. It can be gamed—as can the others. Someone who is intent on gaining a credential via cheating and fraud can probably get away with it for a while—but the others have security risks too. This certification does not assure that the certified person is competent—neither do the others. The certification does not subject the certified person to formal professional accountability for their work—neither do the others—and even though certificate holders say that they will follow a code of ethics, we have no mechanism for assuring that they do or punishing them if they don’t—and neither do the others.

With all these we-don’t-do-thises and we-don’t-promise-thats, you might think I’m kidding about this being a real proposal. I’m not.

Even if we agree that this proposed certification lacks the kinds of powers that could be bestowed by law or magic, I think it can provide useful information and that it can create incentives that favor higher ethics in job-seeking and, eventually, professional practice. It is not perfect, but I think it is far better than what we have now.

The Proposal

This credential is based on a collection of several different types of evidence that, taken together, indicate that the certificate holder has the knowledge and skill needed to competently perform the usual services provided by a software tester.

Here are the types of evidence. As you read this, imagine that the Certification Body hosts a website that will permanently post a publicly-viewable dossier (a collection of files) for every person certified by that body. The dossier would include everything submitted by an applicant for certification, plus some additional material. Here’s what we’d find in the file.

Authorization by the Applicant

As part of the application, the applicant for Certification would grant the Certification Board permission to publish all of the following materials. The applicant would also sign a legal waiver that would shield the Board from all types of legal action by the applicant / Certified Tester arising out of publication of the materials described below. The waiver will also authorize the Board to exercise its judgment in assessing the application and will shield the Board from legal action by the applicant if the Board decides, in its unfettered discretion, to reject the applicant’s application or to later cancel the applicant’s Certification.

Education (Academic)

The Certified Tester should have at least a minimum level of formal education. The baseline that I imagine is a bachelor’s-level degree in a field relevant to software testing.

  • Some fields, such as software engineering, are obviously relevant to software testing. But what about others like accounting, mathematics, philosophy, physics, psychology, or technical writing? We would resolve this by requiring the applicant for certification to explain in writing how and why her or his education has proved to be relevant to her or his experiences as a tester and why it should be seen as relevant education for someone in the field.
  • The requirement for formal education should be waived if the applicant requests waiver and justifies the request on the basis of a sufficient mix of practical education and professional achievement.

Education (Practical)

The Certified Tester should have successfully completed a significant amount of practical training in software testing. Most of this training would typically be course-based, typically commercial training. Some academic courses in software testing would also qualify. A non-negotiable requirement is successful completion of at least some courses that are considered advanced. “Successful” completion means that the student completed an exam or capstone project that a student who doesn’t know the material would not pass.

  • There is an obvious accreditation issue here. Someone has to decide which courses are suitable and which are advanced.
  • I think that many different types of courses and different topics might be suitable as part of the practical training. For example, suppose we required 100 classroom-hours of training (1 training day = 6 classroom hours). Perhaps 60 of those hours could be in related fields (programming, software metrics, software-related law, project accounting, etc.) but a core would have to be explicitly focused on testing.
  • I think the advanced course hours (24 classroom hours?) would have to be explicitly advanced software testing courses.
  • There is no requirement that these courses come from any particular vendor or that they follow any particular software testing or software development ideology.
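The classroom-hour arithmetic in the example above can be sketched as a small check. This is only an illustration of the tentative numbers (100 total hours, at most 60 counted from related fields, 24 advanced hours, 6 classroom hours per training day). The course categories, the names Course and training_summary, and the choice to count advanced hours toward the 100-hour total are my assumptions, not settled parts of the proposal.

```python
from dataclasses import dataclass

# Tentative figures from the example in the text; none of them are final.
HOURS_PER_TRAINING_DAY = 6
MIN_TOTAL_HOURS = 100
MAX_RELATED_FIELD_HOURS = 60      # cap on hours counted from related fields
MIN_ADVANCED_TESTING_HOURS = 24   # the "(24 classroom hours?)" suggestion


@dataclass
class Course:
    title: str
    hours: float
    category: str  # assumed labels: "testing", "advanced_testing", "related"


def training_summary(courses):
    """Total the hours per category and compare them with the thresholds."""
    related = sum(c.hours for c in courses if c.category == "related")
    testing = sum(c.hours for c in courses if c.category == "testing")
    advanced = sum(c.hours for c in courses if c.category == "advanced_testing")

    # Related-field hours above the cap do not count toward the total.
    counted = min(related, MAX_RELATED_FIELD_HOURS) + testing + advanced
    return {
        "counted_hours": counted,
        "advanced_hours": advanced,
        "meets_total": counted >= MIN_TOTAL_HOURS,
        "meets_advanced": advanced >= MIN_ADVANCED_TESTING_HOURS,
    }


# Hypothetical training record, expressed in training days.
courses = [
    Course("Programming fundamentals", 10 * HOURS_PER_TRAINING_DAY, "related"),
    Course("Software metrics", 2 * HOURS_PER_TRAINING_DAY, "related"),
    Course("Introductory test design", 3 * HOURS_PER_TRAINING_DAY, "testing"),
    Course("Advanced domain testing", 4 * HOURS_PER_TRAINING_DAY, "advanced_testing"),
]
summary = training_summary(courses)
```

In this made-up record the applicant has 72 related-field hours, of which only 60 count, plus 18 testing hours and 24 advanced hours, for 102 counted hours. The accreditation question of which courses belong in which category is left entirely to the caller.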

Examination

The Certified Tester should have successfully completed a proctored, advanced, examination in software testing.

  • This requirement anticipates competing exams offered by several different groups that endorse different approaches to software testing. Our field does not have agreement on one approach or even one vocabulary. The appearance of agreement that shows up in industry “standards” is illusory. As a matter of practice (I think, often good practice), the standards are routinely ignored by practitioners. Examinations that adopt or endorse these standards should be welcome but not mandatory.

Which exams are suitable and which are advanced?

There is an obvious accreditation issue here. Someone has to decide which exams are suitable and which are advanced.

I am inclined to tentatively define an advanced exam as one that requires as minimum prerequisites (a) successful completion of a specified prior exam and (b) additional education and experience. For example, ISTQB Foundations would not qualify but an ISTQB Advanced or Expert exam might. Similarly, BBST:Foundations would not qualify but BBST:Bug Advocacy might and BBST:Domain Testing definitely should.

An exam might be separate from a course or it might be a final exam in a sufficiently advanced course.

For an exam to be used by a Certified Tester, the organization that offers and grades the exam must provide the Certification Board with a copy of a sample exam. The organization must attest under penalty of perjury that they believe the sample is fairly representative of the scope and difficulty of the actual current exam. This sample will appear on the Certification Board’s website, and be accessible as a link from the Certified Tester’s dossier. (Thus, the dossier doesn’t show the Certified Tester’s actual exam but it does show an exam that is comparable to the actual one.)

What about the reliability and the validity of the exams?

Let me illustrate the problem with two contrasting examples:

  • I think it is fair to characterize ISTQB as an organization that is striving to create highly reliable exams. To achieve this, they are driven toward questions that have unambiguously correct answers. Even in sample essay questions I have seen for the Expert exam, the questions and the expected answers are well-grounded in a published, relatively short, body of knowledge. I think this is a reasonable and respectable approach to assessment and I think that exams written this way should be considered acceptable for this certification.
  • The BBST assessment philosophy emphasizes several other principles over reliability. We expect answers to be clearly written, tightly focused on the question that was asked, with a strong logical argument in favor of whatever position the examinee takes in her or his answer, that demonstrates relevant knowledge of the field. We expect a diversity of points of view. I think it gives the examiner greater insight into the creativity and depth of knowledge of the examinee. I think this is also a reasonable and respectable approach to assessment that we should also consider acceptable for this certification.

There is a tradeoff between these approaches. Approaches like ISTQB’s are focused on the reliability of the exam, especially on between-grader reliability. This is an important goal. The BBST exams are not focused on this. For certification purposes, we would expect to improve BBST reliability by using paired grading (two examiners) but this is imperfect. I would not expect the same level of reliability in BBST exams that ISTQB achieves. However, in my view of the assessment of cognitively complex skills, I believe the BBST approach achieves greater validity. Complicating the issue, there are problems in measuring both the reliability and the validity of education-related exams.

The difference here is not just a difference of examination style. I believe it reflects a difference in ideology.

Somehow, the Certification Board will have to find a way to accredit some exams as “sufficiently serious” tests of knowledge even though one is obviously more reliable than the other, one is obviously more tightly based on a published body of knowledge than the other, etc.

Somehow, the Certification Board will have to find a way to refuse to accredit some exams even though they have the superficial form of an exam. In general, I suspect that the Certification Board will cast a relatively broad net and that if groups like ASQ and QAI offer advanced exams, those exams will probably qualify. Similarly, I suspect that a final exam in a graduate-level university course that is an “advanced” software testing course (prerequisite being successful completion of an earlier graduate-level course in testing) would qualify.

Professional Achievement

Professional achievements include publications, honors (such as awards), and other things that indicate that the candidate did something at a professional level.

An applicant for certification does not have to include any professional achievements. However, if the applicant provides them, they will become part of the applicant’s dossier and will be publicly visible.

Some decisions will lie in the discretion of the Certification Board. For example, the Certification Board:

  • might or might not accept an applicant’s academic background as sufficiently relevant (or as sufficiently complete)
  • might or might not accept an applicant’s training-experience portfolio as sufficient or as containing enough courses that are sufficiently related to software testing

In such cases, the Certification Board will consider the applicant’s professional achievements as additional evidence of the applicant’s knowledge of the field.

References

The applicant will provide at least three letters of endorsement from other people who have stature in the field. These letters will be public, part of the Certified Tester’s dossier. An endorsement is a statement from a person that, in that person’s opinion, the applicant has the knowledge, skills and character needed to competently provide the services of a professional software tester. The letter should provide additional details that establish that the endorser knows the knowledge, skill and character of the applicant well enough to credibly make an endorsement.

  • A person of stature is someone who is experienced in the field and respected. For example, the person might be (this is not a complete list)
    • personally known to the Certification Board
    • a Certified Tester
    • a Senior Member or Distinguished Member or Fellow of ACM, ASQ, or IEEE
  • If one of the endorsers withdraws his or her endorsement, that withdrawal will be published in the Certified Tester’s dossier along with the original endorsement (now marked “withdrawn”) and the Certified Tester will be required to get a new endorser.
  • If one of the apparent endorsers contacts the Certification Board and asserts that s/he did not write an endorsement for an applicant and that s/he does not endorse the applicant, and if the apparent endorser provides credible proof of identity, that letter will be published in the Certified Tester’s dossier along with the original letter (now marked “disputed”).

Professional Experience

The applicant will provide a detailed description of his or her professional history that includes at least N years of relevant experience.

  • The applicant must attest that this description is true and not materially incomplete. It will be published as part of the dossier. Potential future employers will be able to check the claims made here against the claims made in the applicant’s application for work with them.
  • The descriptions of relevant positions will include descriptions of the applicant’s role(s) and responsibilities, including typical tasks s/he performed in that position
  • The applicant’s years of relevant experience and years of formal education will interact: Someone with more formal education that is relevant to the field will be able to become certified with less relevant experience (but never less than K years of experience).
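The interaction between education and experience could be expressed as a simple rule. The proposal leaves N and K open and does not say how much each year of education should offset, so the sketch below keeps N and K as parameters and treats the offset (credit_per_education_year) as a purely hypothetical knob. The function name and the numbers in the usage lines are illustrative only.

```python
def required_experience_years(n_years, k_years, relevant_education_years,
                              credit_per_education_year=0.5):
    """Return the experience an applicant needs, given relevant education.

    n_years -- baseline requirement (the N in the text, left unspecified there)
    k_years -- absolute floor (the K in the text, left unspecified there)
    credit_per_education_year -- hypothetical offset; not part of the proposal
    """
    if k_years > n_years:
        raise ValueError("the floor K cannot exceed the baseline N")
    reduced = n_years - credit_per_education_year * relevant_education_years
    # Education can lower the requirement, but never below K.
    return max(k_years, reduced)


# Illustrative values only: N=5 and K=2 are NOT proposed numbers.
no_degree = required_experience_years(5, 2, relevant_education_years=0)
bachelors = required_experience_years(5, 2, relevant_education_years=4)
doctorate = required_experience_years(5, 2, relevant_education_years=8)
```

With these made-up inputs, more relevant education lowers the requirement until it reaches the floor K and then stops. Whatever rule the Certification Board eventually adopts, writing it down this explicitly would let applicants see in advance how their education will be weighed.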

Continuing Education

The candidate must engage in professional activities, including ongoing study, to keep the certification.

Code of Ethics

The candidates must agree to abide by a specific Code of Ethics, such as the ACM code. We should foresee this as a prelude to creating an enforcement structure in which a Certified Tester might be censured or certification might be publicly canceled for unethical conduct.

Administrative Issues

Somehow, we have to form a Certification Board. The Board will have to charge a fee for application because the website, the accrediting activities, evaluation of applications, marketing of the certification, etc., will cost money.

Benefits

This collection of material does not guarantee competence, but it does present a multidimensional view of the capability of an experienced person in the field. It speaks to a level of education and professional involvement and to the credibility of self-assertions made when someone applies for a job, submits a paper for publication, etc. I think that the public association of the endorser with the people s/he endorses will encourage most possible endorsers to think carefully about who they want to be permanently publicly identified with. I think the existence of the dossier will discourage exaggeration and fraud by the Certified Tester.

It is not perfect, but I think it will be useful, and better than what I think we have now.

This is not a certification of a baseline of competence in the way that certifications (licenses) work in fields like law, engineering, plumbing, and cosmetology. Those are regulated professions in which the certified person is subject to penalties and civil litigation for conduct that falls below baseline. Software engineering (including software testing) is not a regulated profession, there is no such cause of action in the courts as “software engineering malpractice,” and there are no established penalties for incompetence. There is broad disagreement in the field about whether such regulations should exist (for example, the Association for Computing Machinery strongly opposes the licensing of software engineers while the IEEE seems inclined to support it) and the creation of this certification does not address the desirability of such regulation.

The Current Goal: A Constructive Discussion

This article is a call for discussion. It is not yet a call for action, though I expect we’ll get there soon.

This article follows up an article I wrote last May about credentialing systems. I identified several types of credentials in use in our field and suggested four criteria for a better credential:

  • reasonably attainable (people could afford to get the credential, and reasonably smart people who worked hard could earn it),
  • credible (intellectually and professionally supported by senior people in the field who have earned good reputations),
  • scalable (it is feasible to build an infrastructure to provide the relevant training and assessment to many people), and
  • commercially viable (sufficient income to support instructors, maintainers of the courseware and associated documentation, assessors (such as graders of the students and evaluators of the courses), some level of marketing (because a credential that no one knows about isn’t worth much), and in the case of this group, money left over for profit. Note that many dimensions of “commercial viability” come into play even if there is absolutely no profit motive: the effort has to support itself, somehow).

I think the proposal in this article sketches a system that would meet those criteria.

A more detailed draft of this proposal was reviewed at the 2014 Workshop on Teaching Software Testing. We did not debate alternative proposals or attempt to reach consensus. The ideas in this paper are not the product of WTST. Nor are they the responsibility of any participant at WTST. However, I am here acknowledging the feedback I got at that meeting and thanking the participants: Scott Allman, Janaka Balasooriya, Rex Black, Jennifer Brock, Reetika Datta, Casey Doran, Rebecca L. Fiedler, Scott Fuller, Keith Gallagher, Dan Gold, Douglas Hoffman, Nawwar Kabbani, Chris Kenst, Michael Larsen, Jacek Okrojek, Carol Oliver, Rob Sabourin, Mike Sowers, and Andy Tinkham. Payson Hall has also questioned the reasoning and offered useful suggestions.

To this point, we have been discussing whether these ideas are worthwhile in principle. That’s important and that discussion should continue.

We have not yet begun to tackle the governance and implementation issues raised by this proposal. It is probably time to start thinking about that.

  • I’m positively impressed by (what I know of) the governance model of ISTQB and wonder whether we should follow that model.
  • I would expect to be an active supporter/contributor to the governance of this project (for example an active member of the governing Board). However—just as I helped found AST but steadfastly refused to run for President of AST—I believe we can find a better choice than me for chief executive of the project.

Comments?

On the design of advanced courses in software testing

Sunday, January 19th, 2014

This year’s Workshop on Teaching Software Testing (WTST 2014) is on teaching advanced courses in software testing. During the workshop, I expect we will compare notes on how we design/evaluate advanced courses in testing and how we recognize people who have completed advanced training.

This post is an overview of one of the two presentations I am planning for WTST.

This presentation will consider the design of the courses. The actual presentation will rely heavily on examples, mainly from BBST (Foundations, Bug Advocacy, Test Design), from our new Domain Testing course, and from some of my non-testing courses, especially statistics and metrics. The slides that go with these notes will appear at the WTST site in late January or early February.

In the education community, a discussion like this would come as part of a discussion of curriculum design. That discussion would look more broadly at the context of the curriculum decisions, often considering several historical, political, socioeconomic, and psychological issues. My discussion is more narrowly focused on the selection of materials, assessment methods and teaching-style tradeoffs in a specialized course in a technical field. The broader issues come into play, but I find it more personally useful to think along six narrower dimensions:

  • content
  • institutional considerations
  • skill development
  • instructional style
  • expectations of student performance
  • credentialing

Content

In terms of the definition of “advanced”, I think the primary agreement in the instructional community is that there is no agreement about the substance of advanced courses. A course can be called advanced if it builds on other courses. Under this institutional definition, the ordering of topics and skills (introductory to advanced) determines what is advanced, but that ordering is often determined by preference or politics rather than by principle.

I am NOT just talking here about fields whose curricula involve a lot of controversy. Let me give an example. I am currently teaching Applied Statistics (Computer Science 2410). This is parallel in prerequisites and difficulty to the Math department’s course on Mathematical Statistics (MTH 2400). When I started teaching this, I made several assumptions about what my students would know, based on years of experience with the (1970s to 1990s) Canadian curriculum. I assumed incorrectly that students would learn very early about the axioms underlying algebra; this was often taught as Math 100 (1st course in the university curriculum). Here, it seems common to find that material in 3rd year. I also assumed incorrectly that my students would be very experienced in the basics of proving theorems. Again mistaken, and to my shock, many CS students will graduate, having taken several required math courses, with minimal skills in formal logic or theorem proof. I’m uncomfortable with these choices (in the “somebody moved my cheese” sense of the word “uncomfortable”). It doesn’t feel right, but I am confident that these students studied other topics instead, topics that I would consider 3rd-year or 4th-year. Even in math, curriculum design is fluid and topics that some of us consider foundational, others consider advanced.

In a field like ours (testing) that is far more encumbered with controversy, there is a strong argument for humility when discussing what is “foundational” and what is “advanced”.

Institutional Considerations

In my experience, one of the challenges in teaching advanced topics is that many students will sign up who lack basic knowledge and skills, or who expect to use this course as an opportunity to relitigate what they learned in their basic course(s). This is a problem in commercial and university courses, but in my experience, it is much easier to manage in a university because of the strength and visibility of the institutional support.

To make space for advanced courses, institutions that designate a course as advanced are likely to

  • state and enforce prerequisites (courses that must be taken, or knowledge/skill that must be demonstrated before the student can enrol in the advanced course)
  • accept transfer credit (a course can be designated as equivalent to one of the institution’s courses and serve as a prerequisite for the advanced course)

The designation sets expectations. Typically, this gives instructors room to:

  1. limit class time spent rehashing foundational material
  2. address topics that go beyond the foundational material (whatever material this institution has designated as foundational)
  3. tell students who do not know the foundational material (or who cannot apply it to the content of the advanced course) that it is their responsibility to catch up to the rest of the class, not the course’s responsibility to slow down for them
  4. demand an increased level of individual performance from the students (not just work products on harder topics, but better work products that the student produces with less handholding from the instructor)

Note clearly that in an institution like a university, the decisions about what is foundational, what is advanced, and what prerequisites are required for a particular course are made by groups of instructors, not by the administrators of the institution. This is an idealized model–it is common for institutional administrators to push back, encouraging instructors to minimize the number of prerequisites they demand for any particular course and encouraging instructors to take a broader view of equivalence when evaluating transfer credits. But at its core, the administration adopts structures that support the four benefits that I listed above (and probably others). I think this is the essence of what we mean by “protecting the standards” of the institution.

Skill Development

I think of a skill as a type of knowledge that you can apply (you use it, rather than describe it) and your application (your performance) improves with deliberate practice.

Students don’t just learn content in courses. They learn how to learn, how to investigate and find/create new ideas or knowledge on their own, how to find and understand the technical material of their field, how to critically evaluate ideas and data, how to communicate what they know, how to work with other students, and so on. Every course teaches some of these to some degree. Some courses are focused on these learning skills.

Competent performance in a professional field involves skills that go beyond the learning skills. For example, skills we must often apply in software testing include:

  • many test design techniques (domain testing, specification-based testing, etc.). Testers get better with these through a combination of theoretical instruction, practice, and critical feedback
  • many operational tasks (setting up test systems, running tests, noting what happened)
  • many advanced communication skills (writing that combines technical, persuasive and financial considerations)

Taxonomies like Bloom’s make the distinction between memorizable knowledge and application (which I’m describing as skill here). Some courses, and some exams, are primarily about memorizable knowledge and some are primarily about application.

In general, in my own teaching, I think of courses that focus on memorizable knowledge as survey courses (broad and shallow). I think of survey courses as foundational rather than advanced.

Most survey courses involve some application. The student learns to apply some of the content. In many cases, the student can’t understand the content without learning to apply it at least to simple cases. (In our field, I think domain testing–boundary and equivalence class analysis–is like this.) It seems to me that courses run on a continuum, defined by how much they emphasize learning things you can remember and describe versus learning ways to apply knowledge more effectively. I think of a course that is primarily a survey course as a survey course, even if it includes some application.

Instructional Style

Lecture courses are probably the easiest to design and the easiest to sell. Commercial and university students seem to prefer courses that involve a high proportion of live lecture.

Lectures are effective for introducing students to a field. They introduce vocabulary (not that students remember much of it–they forget most of what they learn in lecture). They convey attitudes and introduce students to the culture of the field. They can give students the sense that this material is approachable and worth studying. And they entertain.

Lectures are poor vehicles for application of the material (there’s little space for students to try things out, get feedback and try them again).

In my experience, they are usually also poor vehicles for critical thinking (evaluating the material). Some lecturers develop a style that demands critical thinking from the students (think of law schools) but I think this requires very strong cultural support. Students understand, in law school, that they will flunk out if they come to class unprepared and are unwilling or unable to present and defend ideas quickly, in response to questions that might come from a professor at any time. Lawyers view the ability to analyze, articulate and defend in real time as a core skill in their field and so this approach to teaching is considered appropriate. In other fields that don’t prioritize oral argumentation so highly, a professor who relied on this teaching style and demanded high performance from every student, would be treated as unusual and perhaps inappropriate.

As students progress from basic to advanced, the core experiences they need to support further progress also change, from lecture to activities that require them to do more–more applications to increasingly complex tasks, more critical evaluation of what they are doing, what others are doing, and what they are being told to do or to accept as correct or wise. Fewer things are correct. More are better-for-these-reasons or better-for-these-purposes.

Expectations of Student Performance

More advanced courses demand that students take more responsibility for the quality of their work:

  • The students expect, and tolerate, less specific instructions. If they don’t understand the instructions, the students understand that it is their responsibility to ask for clarification or to do other research to fill in the blanks.
  • The students don’t expect (or know they are not likely to get) worked examples that they can model their answers from or rubrics (step-by-step evaluation guides) that they can use to structure their answers. These are examples of scaffolding, instructional support structures to help junior students accomplish new things. They are like the training wheels on bicycles. Eventually, students have to learn to ride without them. Not just how to ride down this street for these three blocks, but how to ride anywhere without them. Losing the scaffolding is painful for many students and some students protest emphatically that it is unfair to take these away. I think the trend in many universities has been to provide more scaffolding for longer. This cuts back on student appeals and seems to please accreditors (university evaluators) but I think this delays students’ maturation in their field (and generally in their education).

One of the puzzles of commercial instruction is how to assess student performance. We often think of assessment in terms of passing or failing a course. However, assessment is more broadly important, for giving a student feedback on how well she knows the material or how well she does a task. There has been so much emphasis on high-stakes assessment (you pass or you fail) in academic instruction that many students don’t understand the concept of formative assessment (assessment primarily done to give the student feedback in order to help the student learn). This is a big issue in university instruction too, but my experience is that commercial students are more likely to be upset and offended when they are given tough tasks and told they didn’t perform well on them. My experience is that they will make more vehement demands for training wheels in the name of fairness, without being willing to accept the idea that they will learn more from harder and less-well-specified tasks.

Things are not so well specified at work. More advanced instruction prepares students more effectively for the uncertainties and demands of real life. I believe that preparation involves putting students into uncertain and demanding situations, helping them accept this as normal, and helping them learn to cope with situations like these more effectively.

Credentialing

Several groups offer credentials in our field. I wrote recently about credentialing in software testing at http://kaner.com/?p=317. My thoughts on that will come in a separate note to WTST participants, and a separate presentation.

Last call for WTST 2014

Sunday, November 24th, 2013

This year’s Workshop on Teaching Software Testing is focused on designing and teaching advanced courses in software testing. It is in sunny Florida, in late January 2014. Right after WTST, we will teach a 5-day pilot of the Domain Testing course. You can apply to attend either one.

We expect the WTST discussion to flow down two paths. At this point, we are not sure which will dominate:

1. What are the characteristics of a genuinely “advanced” testing course?

What are people teaching or designing at this level and what design decisions and assessment decisions are they making? What courses should we be designing?

2. What should the characteristics be for an advanced certification in software testing?

I’ve been criticizing the low bar set by ISTQB’s, QAI’s, and ASQ’s certifications for over 15 years. From about 1996 to (maybe it was) 2003, I worked with several colleagues on ideas for a better certification. As I pointed out recently, those ideas failed. We couldn’t find a cost-effective solution that met our quality standards. I moved on to other challenges, such as creating the BBST series. Some others adopted a more critical posture toward certification in general.

Looking back, I think the same problems that motivated thousands of testers (and employers) to seek a credentialing system for software testers are still with us. The question, I think, is not whether we need a good credentialing system. The question is whether we can get a credentialing system that is good.

From some discussions about advanced course design, I think we are likely to see a discussion of advanced credentialing at WTST. The idea that ties this discussion to WTST is that the credential would be based at least partially on performance in advanced courses.

I don’t know whether this discussion will go very far, whether it will be a big part of the meeting itself or just the after-meeting dinners, or whether anyone will come to any agreements. But if you are interested in participating in a constructive discussion about a very hard problem, this might be a good meeting.

To apply to come to WTST, please send me a note (kaner@cs.fit.edu).

For more information about WTST, see http://wtst.org/. For more on the first pilot teaching of the Domain Testing course, which we will teach immediately following WTST, see http://bbst.info.

The “Failure” of Udacity

Saturday, November 23rd, 2013

If you are not aware of it, Udacity is a huge provider of a type of online courses called MOOCs (Massive Open Online Courses). Recently, a founder of Udacity announced that he was disappointed in Udacity’s educational results and was shifting gears from general education to corporate training.

I was brought into some discussions of this among academics and students. A friend suggested that I slightly revise one of my emails for general readership on my blog. So here goes.

My note is specifically a reaction to two articles:

Udacity offers free or cheap courses. My understanding is that it has a completion rate of 10% (only 10% of the students who start, finish) and a pass rate of 5%. This is not a surprising number. Before there were MOOCs, numbers like this were reported for other types of online education in which students set their own pace or worked with little direct interaction with the instructor. For example, I heard that Open University (a school for which I have a lot of respect) had numbers like this.

I am not sure that 10% (or 5%) is a bad rate. If the result is that thousands of people get opportunities that they would otherwise not have had, that’s an important benefit—even if only 5% find the time to make full use of those opportunities.

In general, I’m a fan of open education. When I interviewed for a professorship at Florida Tech in 1999, I presented my goal of creating open courseware for software testing (and software engineering education generally). NSF funded this in 2001. The result has been the BBST course series, used around the world in commercial and academic courses.

Software testing is a great example of the need for courses and courseware that don’t fit within the traditional university stream. I don’t believe that we will see good undergraduate degree programs in software testing. Instead, advanced testing-specific education will come from training companies and professional societies, perhaps under the supervision/guidance of some nonprofits formed for this purpose, either in universities (like my group, the Center for Software Testing Education & Research) or in the commercial space (like ISTQB). As I wrote in a recent post, I believe we have to develop a better credentialing system for software testing.

We are going to talk about this in the Workshop on Teaching Software Testing (WTST 13, January 24-26, 2014). The workshop is focused on Teaching Advanced Courses in Software Testing. It seems clear from preparatory discussions that this topic will be a springboard for discussions of advanced credentials.

Back to the MOOCs.

Udacity (and others) have earned some ill-will in the instructional community. There have been several types of irritants, such as:

  • Some advocates of MOOCs have pushed the idea that MOOCs will eliminate most teaching positions. After all, if you can get a course from one of the world’s best teachers, why settle for second best? The problem with this is that it assumes that teaching = lectures. For most students, this is not true. Students learn by doing things and getting feedback. By writing essays and getting feedback. By writing code and getting feedback. By designing tests and getting feedback. The student activities (running them, coaching students through them, critiquing student work, suggesting follow-up activities for individuals to try next) do not easily scale. I spent about 15 hours this week in face-to-face meetings with individual students, coaching them on statistical analysis or software testing. Next week I will spend about 15 hours in face-to-face meetings with local students or Skype sessions with online students. This is hard work for me, but my students tell me they learn a lot from this. When people dismiss the enormous work that good teachers put into creating and supporting feedback loops for their students, especially people who stand to make money from convincing customers and investors that this work is irrelevant, those teachers sometimes get annoyed.
  • Some advocates of MOOCs, and several politicians and news columnists, have pushed the idea that this type of education can replace university education. After all, if you can educate a million students at the same time (with one of the world’s best teachers, no less), why bother going to a brick-and-mortar institution? It is this argument that fails when 95% of the students flunk out or drop out. But I think it fails worse when you consider what these students are learning. How hard are the tests they are taking or the assignments they are submitting? How carefully graded is the work—not just how accurate is the grading, though that can certainly be a big issue with computerized grading—but also, how informative is the feedback from grading? Students pay attention to what you tell them about their work. They learn a lot from that, if you give them something to learn from. My impression is that many of the tests/exams are superficial and that much of the feedback is limited and mechanical. When university teachers give this quality of feedback, students complain. They know they should get better than that at school.
  • Proponents of MOOCs typically ignore or dismiss the social nature of education. Students learn a lot from each other. Back when I paid attention to the instructional-research literature, I used to read studies that reported graduating students saying they learned more from each other than from the professors. There are discussion forums in many (most? all?) MOOCs, but from what I’ve seen and been told by others, these are rarely or never well moderated. A skilled instructor keeps forum discussions on track, moves off-topic posts to another forum, asks questions, challenges weak answers, suggests readings and follow-up activities. I haven’t seen or heard of that in the MOOCs.

As far as I can tell, in the typical MOOC course, students get lectures that may have been fantastically expensive to create, but they get little engagement in the course beyond the lectures. They are getting essentially-unsupervised online instruction. And that “instruction” seems to be a technologically-fancier way of reading a book. A fixed set of material flows from the source (the book or the video lecture) to the student. There are cheaper, simpler, and faster ways to read a book.

My original vision for the BBST series was much like this. But by 2006, I had abandoned the idea of essentially-unsupervised online instruction and started working on the next generation of BBST, which would require much more teacher-with-student and student-with-student engagement.

There has been relentless (and well-funded) hype and political pressure to drive universities to offer credit for courses completed on Udacity and platforms like it. Some schools have succumbed to the pressure.

The political pressure on universities that arises from this model has been to push us to lower standards:

  • lower standards of interaction (students can be nameless cattle herded into courses where no one will pay attention to them)
  • lower standards of knowledge expectation (trivial, superficial machine grading of the kind that can scale to a mass audience)
  • lower standards of instructional design (good design starts from considering what students should learn and how to shepherd them through experiences that will help them achieve those learning objectives. Lecture plans are not instructional design, even if the lectures are well-funded, entertaining and glitzy.)

Online instruction doesn’t have to be simplistic, but when all that the public see in the press is well-funded hype that pushes technoglitz over instructional quality, people compare what they see with what is repeated uncritically as if it were news.

The face-to-face model of instruction doesn’t scale well enough to meet America’s (or the world’s) socioeconomic needs. We need new models. I believe that online instruction has the potential to be the platform on which we can develop the new models. But the commoditizing of the instructor and the cattle-herding of the students that have been offered by the likes of Udacity are almost certainly not the answer.

Quality – which I measure by how much students learn – costs money. Personal interaction between students and instructors, and significant assignments that are carefully graded and receive detailed feedback, cost money. It is easy to hire cheap assistants or unqualified adjuncts but it takes more than a warm body to provide high quality feedback. (There are qualified adjuncts, but the law of supply and demand has an effect when adjunct pay is low.)

The real cost of education is not the money. Yes, that is hugely significant. But it is as nothing compared to the years of life that students sacrifice to get an education. The cost of time wasted is irrecoverable.

In the academic world, there are some excellent online courses and there has been a lot of research on instructional effectiveness in these courses. Many online courses are more effective—students learn more—than face-to-face courses that cover the same material. But these are also more intense, for the teacher and the students. The students, and their teachers, work harder.

Becky Fiedler and I formed Kaner Fiedler Associates to support the next generation of BBST courses. We started the BBST effort with a MOOC-like vision of a structure that offers something for almost nothing. Our understanding evolved as we created generations of open courseware.

I think we can create high-quality online education that costs less than traditional schooling. I think we can improve the ways institutions recognize students’ preexisting knowledge, reducing the cost (but not the quality) of credentials. But cost reduction and value improvement do not mean “free” or even “cheap.” The price has to be high enough to sustain the course development, the course maintenance, and the costs of training, providing and supervising good instructors. There is, as far as we can tell, no good substitute for this.

Credentialing in Software Testing: Elaborating on my STPCon Keynote

Thursday, May 9th, 2013

A couple of weeks ago, I talked about the state of software testing education (and software testing certification) in the keynote panel at STPCon. My comments on high-volume test automation and qualitative methods were more widely noticed, but I think the educational issues are more significant.

Here is a summary:

  1. The North American educational systems are in a state of transition.
  2. We might see a decoupling of formal instruction from credentialing.
  3. We are likely to see a dispersion of credentialing: more organizations will issue more diverse credentials.
  4. Industrial credentials are likely to play a more significant role in the American economy (and probably have an increased or continued-high influence in many other places).

If these four predictions are accurate, then we have thinking to do about the kinds of credentialing available to software testers.

Transition

For much of the American population, the traditional university model is financially unsustainable. We are on the verge of a national credit crisis because of the immensity of student loan debt.

As a society, we are experimenting with a diverse set of instructional systems, including:

  • MOOCs (massive open online courses)
  • Traditionally-structured online courses with an enormous diversity of standards
  • Low-cost face-to-face courses (e.g. community colleges)
  • Industrial courses that are accepted for university credit
  • Traditional face-to-face courses

Across these, we see the full range from easy to hard, from no engagement with the instructor to intense personal engagement, from little student activity and little meaningful feedback to lots of both. There is huge diversity of standards between course structures and institutions and significant diversity within institutions.

  • Many courses are essentially self-study. Students learn from a book or a lecturer but they get no significant assignments, feedback or assessments. Many people can learn some topics this way. Some people can learn many topics this way. For most people, this isn’t a complete solution, but it could be a partial one.
  • Some of my students prosper most when I give them free rein, friendly feedback and low risk. In an environment that is supportive, provides personalized feedback by a human, but is not demanding, some students will take advantage of the flexibility by doing nothing, some students will get lost, and some students will do their best work.
  • The students who don’t do well in a low-demand situation often do better in a higher-demand course, and in my experience, many students need both: flexibility in fields that capture their imagination and structure/demand in fields that are less engrossing or that are a little farther beyond the student’s current knowledge/ability than she can comfortably stretch to.

There is increasing (enormous) political pressure to allow students to take really-inexpensive MOOCs and get course credit for these at more expensive universities. More generally, there is increasing pressure to allow students to transfer courses across institutions. Most universities allow students to transfer in a few courses, but they impose limits in order to ensure that they transfer their culture to their students and to protect their standards. However, I suspect strongly that the traditional limits are about to collapse. The traditional model is financially unsustainable and so, somewhere, somehow, it has to crack. We will see a few reputable universities pressured (or legislated) into accepting many more credits. Once a few do it, others will follow.

In a situation like this, schools will have to find some other way to preserve their standards: their reputations, and thus the value of their degree for their graduates.

It seems likely to me that some schools will start offering degrees based on students’ performance on exit exams.

  • A high-standards institution might give a long and complex set of exams. Imagine paying $15,000 to take the exam series (and get grades and feedback) and another $15,000 if you pass, to get the degree.
  • At the other extreme, an institution might offer a suite of multiple-guess exams that can be machine-graded at a much lower cost.

The credibility of the degree would depend on the reputation of the exam (determined by “standards” combined with a bunch of marketing).

Once this system got working, we might see students take a series of courses (from a diverse collection of providers) and then take several degrees.

Maybe things won’t happen this way. But the traditional system is financially unsustainable. Something will have to change, and not just a little.

Decoupling Instruction from Credentialing

The vision above reflects a complete decoupling of instruction from credentialing. It might not be this extreme, but any level of decoupling creates new credentialing pressures / opportunities in industrial settings.

Instruction

Instruction consists of the courses, the coaching, the internships, and any other activities the students engage in to learn.

Credentialing

Credentials are independently-verifiable evidence that a person has some attribute, such as a skill, a type of knowledge, or a privilege.

There are several types of credentials:

  • A certification attests to some level of competency or privilege. For example,
    • A license to practice law, or to do plumbing, is a certification.
    • An organization might certify a person as competent to repair their equipment.
    • An organization might certify that, in their opinion, a person is competent to practice a profession.
  • A certificate attests that someone completed an activity
    • A certificate of completion of a course is a certificate
    • A university degree is a certificate
  • There are also formal recognitions (I’m sure there’s a better name for this…)
    • Awards from professional societies are recognitions
    • Granting someone an advanced type of membership (Senior Member or Fellow) in a professional society is a recognition
    • Election to some organizations (such as the American Law Institute or the Royal Academy of Science) is a recognition
    • I think I would class medals in this group
  • There are peer recognitions
    • Think of the nice things people say about you on LinkedIn or Entaggle
  • There are workproducts or results of work that are seen as honors
    • You have published X many publications
    • You worked on the development team for X

The primary credentials issued by universities are certificates (degrees). Sometimes, those are also certifications.

Dispersion of Credentialing

Anyone can issue a credential. However, the prestige, credibility, and power of credentials vary enormously.

  • If you need a specific credential to practice a profession, then no matter who endorses some other credential, or how nicely named that other credential is, it still won’t entitle you to practice that profession.
  • Advertising that you have a specific credential might make you seem more prestigious to some people and less prestigious to other people.

It is already the case that university degrees vary enormously in meaning and prestige. As schools further decouple instruction from degrees, I suspect that this variation will be taken even more seriously. Students of mine from Asia, and some consultants, tell me this is already the case in some Asian countries. Because of the enormous variation in quality among universities, and the large number of universities, a professional certificate or certification is often taken more seriously than a degree from a university that an employer does not know and respect.

Industrial Credentials

How does this relate to software testing? Well, if my analysis is correct (and it might well not be), then we’ll see an increase in the importance and value of credentialing by private organizations (companies, rather than universities).

I don’t believe that we’ll see a universally-accepted credential for software testers. The field is too diverse and the divisions in the field are too deep.

I hope we’ll see several credentialing systems that operate in parallel, reflecting different visions of what people should know, what they should believe, what they should be able to do, what agreements they are willing to make (and be bound by) in terms of professional ethics, and what methods of assessing these things are appropriate and in what depth.

Rather than seeing these as mutually-exclusive competing standards, I imagine that some people will choose to obtain several credentials.

A Few Comments On Our Current State

Software Testing has several types of credentials today. Here are notes on a few. I am intentionally skipping several that feel (to me) redundant with these or about which I have nothing useful to say. My goal is to trigger thoughts, not survey the field.

ISTQB

ISTQB is currently the leading provider of testing certifications in the world. ISTQB is the front end of a community that creates and sells courseware, courses, exams and credentials that align with their vision of the software testing field and the role of education within it. I am not personally fond of the Body of Knowledge that ISTQB bases its exams on. Nor am I fond of their approach to examinations (standardized tests that, to my eyes, emphasize memorization over comprehension and skill). I think they should call their credentials certificates rather than certifications. And my opinion of their marketing efforts is that they are probably not legally actionable, but I think they are misleading. (Apart from those minor flaws, I think ISTQB’s leadership includes many nice people.)

It seems to me that the right way to deal with ISTQB is to treat them as a participant in a marketplace. They sell what they sell. The best way to beat it is to sell something better. Some people are surprised to hear me say that because I have published plenty of criticisms of ISTQB. I think there is lots to criticize. But at some point, adding more criticism is just waste. Or worse, distraction. People are buying ISTQB credentials because they perceive a need. Their perception is often legitimate. If ISTQB is the best credential available to fill their need, they’ll buy it. So, to ISTQB’s critics, I offer this suggestion.

Industrial credentialing will probably get more important, not less important, over the next 20 years. Rather than wasting everyone’s time whining about the shortcomings of current credentials, do the work needed to create a viable alternative.

Before ending my comments on ISTQB, let me note some personal history.

Before ASTQB (American ISTQB) formed, a group of senior people in the community invited me into a series of meetings focused on creating a training-and-credentialing business in the United States. This was a private meeting, so I’m not going to say who sponsored it. The discussion revolved around a goal of providing one or more certification-like credentials for software testers that would be (this is my summary-list, not theirs, but I think it reflects their goals):

  • reasonably attainable (people could afford to get the credential, and reasonably smart people who worked hard could earn it),
  • credible (intellectually and professionally supported by senior people in the field who have earned good reputations),
  • scalable (it is feasible to build an infrastructure to provide the relevant training and assessment to many people), and
  • commercially viable (sufficient income to support instructors, maintainers of the courseware and associated documentation, assessors (such as graders of the students and evaluators of the courses), some level of marketing (because a credential that no one knows about isn’t worth much), and in the case of this group, money left over for profit. Note that many dimensions of “commercial viability” come into play even if there is absolutely no profit motive: the effort has to support itself, somehow).

I think these are reasonable requirements for a strong credential of this kind.

By this point, ISEB (the precursor to ISTQB) had achieved significant commercial success and gained wide acceptance. It was on people’s minds, but the committee gave me plenty of time to speak:

  • I talked about multiple-choice exams and why I didn’t like them.
  • I talked about the desirability of skill-based exams like Cisco’s, and the challenges of creating courses to support preparation for those types of exams.
  • I talked about some of the thinking that some of us had done on how to create a skill-based cert for testers, especially back when we were writing Lessons Learned.

But there was a problem in this. My pals and I had lots of scattered ideas about how to create the kind of certification system that we would like, but we had never figured out how to make it practical. The ideas that I thought were really good were unscalable or too expensive. And we knew it. If you ask today why there is no certification for context-driven testing, you might hear a lot of reasons, including principled-sounding attacks on the whole notion of certification. But back then, the only reason we didn’t have a context-driven certification was that we had no idea how to create one that we could believe in.

So, what I could not provide to the committee was a reasonably attainable, credible, scalable, commercially viable system, or a plan to create one.

The committee, quite reasonably, chose to seek a practical path toward a credential that they could actually create. I left the committee. I was not party to their later discussions, but I was not surprised that ASTQB formed and some of these folks chose to work with it. I have never forgotten that they gave me every chance to propose an alternative and I did not have a practical alternative to propose.

(Not long after that, I started an alternative project, Open Certification, to see if we could implement some of my ideas. We did a lot of work in that project, but it failed. The ideas really weren’t practical. We learned a lot, which in turn helped me create great courseware (BBST) and other ideas about certification that I might talk about more in the future. But the point that I am trying to emphasize here is that the people who founded ASTQB were open to better ideas, but they didn’t get them. I don’t see a reason to be outraged at them for that.)

The Old Boys’ Club

To some degree, your advancement in a profession is not based on what you know. It’s based on who you know and how much they like you.

We have several systems that record who likes you, including commercial ones (LinkedIn), noncommercial ones (Entaggle), and various types of marketing structures created by individuals or businesses.

There are advantages and disadvantages to systems based on whether the “right” people like you. Networking will never go away, and never should, but it seems to me that

Credentials based on what you know, what you can do, or what you have actually done are a lot more egalitarian than those based on who says they respect you.

I value personal references and referrals, but I think that reliance on these as our main credentialing system is a sure path to cronyism and an enemy of independent thinking.

My impression is that some people in the community have become big fans of reputation-systems as the field’s primary source of credentials. In at least some of the specific cases, I think the individuals would have liked the system a whole lot less when they were less influential.

Miagi-do

I’ve been delighted to see that the Miagi-do school has finally gone public.

Michael Larsen states a key view succinctly:

I distrust any certification or course of study that doesn’t, in some way, actually have a tester demonstrate their skills, or have a chance to defend their reasoning or rationale behind those skills.

In terms of the four criteria that I mentioned above, I think this approach is probably reasonably attainable, and to me, it is definitely credible. Whether it is scalable and commercially viable has yet to be seen.

I think this is a clear and important alternative to ISTQB-style credentialing. I hope it is successful.

Other Ideas on the Horizon

There are other ideas on the horizon. I’m aware of a few of them and there are undoubtedly many others.

It is easy to criticize any specific credentialing system. All of them, now known or coming soon, have flaws.

What I am suggesting here is:

  • Industrial credentialing is likely to get more important whether you like it or not.
  • If you don’t like the current options, complaining won’t do much good. If you want to improve things, create something better.


This post is partially based on work supported by NSF research grant CCLI-0717613, “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.” Any opinions, findings and conclusions or recommendations expressed in this post are those of the author and do not necessarily reflect the views of the National Science Foundation.

A new brand of snake oil for software testing

Wednesday, May 19th, 2010

I taught a course last term on Quantitative Investment Modeling in Software Engineering to a mix of undergrad and grad students of computer science, operations research and business. We had a great time, we learned a lot about the market, about modeling, and about automated exploratory testing (more on this type of exploratory testing at this year’s Conference of the Association for Software Testing…)

In the typical undergraduate science curriculum, most of the experimental design we teach to undergraduates is statistical. Given a clearly formulated hypothesis and a reasonably clearly understood oracle, we learn how to design experiments that control for confounding variables, so that we can decide whether our experimental effect was statistically significant. We also teach some instrumentation, but in most cases, the students learn how to use well-understood instruments as opposed to how to appraise, design, develop, calibrate and then apply them.

Our course was not so traditionally structured. In our course, each student had to propose and evaluate an investment strategy. We started with a lot of bad ideas. (Most small investors lose money. One of the attributes of our oracle is, “If it causes you to lose money, it’s probably a bad idea.”) We wanted to develop and demonstrate good ideas instead. We played with tools (some worked better than others) and wrote code to evolve our analytical capabilities, studied some qualitative research methods (hypothesis-formation is a highly qualitative task), ran pilot studies, and then eventually got to the formal-research stages that the typical lab courses start at.

Not surprisingly, the basics of designing a research program took about 1/3 of the course. With another course, I probably could have trained these students to be moderately-skilled EVALUATORS of research articles. (It is common in several fields to see this as a full-semester course in a doctoral program.)

Sadly, few CS doctoral programs (and even fewer undergrad programs) offer courses in the development or evaluation of research, or if they offer them, they don’t require them.

There is a widespread gap between having a little experience replicating other people’s experiments and seeing some work in a lab, on the one hand, and learning to do and evaluate research, on the other. This gap is the home court for truthiness. In the world of truthiness, it doesn’t matter whether the evidence in support of an absurd assertion is any good, as long as we can make it look to enough people as though good enough evidence exists. Respectable-looking research from apparently-well-credentialed people is hard to dispute if, like most people in our field, one lacks training in critical evaluation of research.

The new brand of snake oil is “evidence-based” X, such as “evidence-based” methods of instruction or, in a recent proposal, evidence-based software testing. Maybe I’m mistaken in my hunch about what this is about, but the tone of the abstract (and what I’ve perceived in my past personal interactions with the speaker) raises some concerns.

Jon Bach addresses the tone directly. You’ll have to form your own personal assessments of the speaker. But I agree with Jon that this does not sound merely like advocacy of applying empirical research methods to help us improve the practice of testing, an idea that I rather like. Instead, the wording suggests a power play that seems to me to have less to do with research and more to do with the next generation of ISTQB marketing.

So let me talk here about this new brand of snake oil (“Evidence-Based!”), whether it is meant this way by this speaker or not.

The “evidence-based” game is an interesting one to play when most of the people in a community have limited training in research methods or research evaluation. This game has been recently fashionable in American education. In that context, I think it has been of greatest benefit to people who make money selling mediocritization. It’s not clear to me that this movement has added one iota of value to the quality of education in the United States.

In principle, I see 5 problems (or benefits, depending on your point of view). I say, “in principle” because of course, I have no insight into the personal motives and private ideas of Dr. Reid or his colleagues. I am raising a theoretical objection. Whether it is directly applicable to Dr. Reid and ISTQB is something you will have to decide yourself, and these comments are not sufficient to lead you to a conclusion.

  1. It is easy to promote forced results from worthless research when your audience has limited (or no) training in research methods, instrumentation, or evaluation of published research. And if someone criticizes the details of your methods, you can dismiss their criticisms as quibbling or theoretical. Too many people in the audience will be stuck making their decision about the merits of the objection on the personal persuasiveness of the speakers (which snake oil salesmen excel at) rather than on the underlying merits of the research.
  2. When one side has a lot of money (such as, perhaps, proceeds from a certification business), and a plan to use “research” results as a sales tool to make a lot more money, they can invest in “research” that yields promotable results. The work doesn’t have to be competent (see #1). It just has to support a conclusion that fits with the sales pitch.
  3. When the other side doesn’t have a lot of money, when the other side are mainly practitioners (not much time or training to do the research), and when competent research costs a great deal more than trash (see #2 and #5), the debates are likely to be one-sided. One side has “evidence” and if the other side objects, well, if they think the “evidence” is so bad, they should raise a bunch of money and donate a bunch of time to prove it. It’s an opportunity for well-funded con artists to take control of the (apparent) high road. They can spew impressive-looking trash at a rate that cannot possibly be countered by their critics.
  4. It is easy for someone to do “research” as a basis for rebranding and reselling someone else’s ideas. Thus, someone who has never had an original thought in his life can be promoted as the “leading expert” on X by publishing a few superficial studies of it. A certain amount of this goes on already in our field, but largely as idiosyncratic misbehavior by individuals. There is a larger threat. If a training organization will make more money (influence more standards, get its products mandated by more suckers) if its products and services have the support of “the experts”, but many of “the experts” are inconveniently critical, there is great marketing value in a vehicle for papering over the old experts with new-improved experts who have done impressive-looking research that gives “evidence-based” backing to whatever the training organization is selling. Over time, of course, this kind of plagiarism kills innovation by bankrupting the innovators. For companies that see innovation as a threat, however, that’s a benefit, not a problem. (For readers who are wondering whether I am making a specific allegation about any person or organization, I am not. This is merely a hypothetical risk in an academic’s long list of hypothetical risks, for you to think about in your spare time.)
  5. In education, we face a classic qualitative-versus-quantitative tradeoff. We can easily measure how many questions someone gets right or wrong on simplistic tests. We can’t so easily measure how deep an understanding someone has of a set of related concepts or how well they can apply them. The deeper knowledge is usually what we want to achieve, but it takes much more time and much more money and much more research planning to measure it. So instead, we often substitute the simplistic metrics for the qualitative studies. Sadly, when we drive our programs by those simplistic metrics, we optimize to them and we gradually teach to the superficial and abandon the depth. Many of us in the teaching community in the United States believe that over the past few years, this has had a serious negative impact on the quality of the public educational system and that this poses a grave threat to our long-term national competitiveness.

Most computer science programs treat system-level software testing as unfit for the classroom.

I think that software testing can have great value, that it can be very important, and that a good curriculum should have an emphasis on skilled software testing. But the popular mix of ritual, tedium, and moralizing that has been passed off by some people as testing for decades has little to offer our field, and even less for university instruction. I think ISTQB has been masterful at selling that mix. It is easy to learn and easy to certify. I’m sure that a new emphasis, “New! Improved! Now with Evidence!” could market the mix even better. Just as worthless, but with even better packaging.

What is context-driven testing?

Saturday, January 3rd, 2009

James, Bret and I published our definition of context-driven testing at http://www.context-driven-testing.com/.

Some people have found the definition too complex and have tried to simplify it, attempting to equate the approach with Agile development or Agile testing, or with the exploratory style of software testing. Here’s another crack at a definition:

Context-driven testers choose their testing objectives, techniques, and deliverables (including test documentation) by looking first to the details of the specific situation, including the desires of the stakeholders who commissioned the testing. The essence of context-driven testing is project-appropriate application of skill and judgment. The Context-Driven School of testing places this approach to testing within a humanistic social and ethical framework.

Ultimately, context-driven testing is about doing the best we can with what we get. Rather than trying to apply “best practices,” we accept that very different practices (even different definitions of common testing terms) will work best under different circumstances.

Contrasting context-driven with context-aware testing.

Many testers think of their approach as context-driven because they take contextual factors into account as they do their work. Here are a few examples that might illustrate the differences between context-driven and context-aware:

  • Context-driven testers reject the notion of best practices, because they present certain practices as appropriate independent of context. Of course it is widely accepted that any “best practice” might be inapplicable under some circumstances. However, when someone looks to best practices first and to project-specific factors second, that may be context-aware, but not context-driven.
  • Similarly, some people create standards, like IEEE Standard 829 for test documentation, because they think that it is useful to have a standard to lay out what is generally the right thing to do. This is not unusual, nor disreputable, but it is not context-driven. Standard 829 starts with a vision of good documentation and encourages the tester to modify what is created based on the needs of the stakeholders. Context-driven testing starts with the requirements of the stakeholders and the practical constraints and opportunities of the project. To the context-driven tester, the standard provides implementation-level suggestions rather than prescriptions.

Contrasting context-driven with context-oblivious, context-specific, and context-imperial testing.

To say “context-driven” is to distinguish our approach to testing from context-oblivious, context-specific, or context-imperial approaches:

  • Context-oblivious testing is done without a thought for the match between testing practices and testing problems. This is common among testers who are just learning the craft, or are merely copying what they’ve seen other testers do.
  • Context-specific testing applies an approach that is optimized for a specific setting or problem, without room for adjustment in the event that the context changes. This is common in organizations with longstanding projects and teams, wherein the testers may not have worked in more than one organization. For example, one test group might develop expertise with military software, another group with games. In the specific situation, a context-specific tester and a context-driven tester might test their software in exactly the same way. However, the context-specific tester knows only how to work within her or his one development context (MilSpec or games), and s/he is not aware of the degree to which skilled testing will be different across contexts.
  • Context-imperial testing insists on changing the project or the business in order to fit the testers’ own standardized concept of “best” or “professional” practice, instead of designing or adapting practices to fit the project. The context-imperial approach is common among consultants who know testing primarily from reading books, or whose practical experience was context-specific, or who are trying to appeal to a market that believes its approach to development is the one true way.

Contrasting context-driven with agile testing.

Agile development models advocate for a customer-responsive, waste-minimizing, humanistic approach to software development and so does context-driven testing. However, context-driven testing is not inherently part of the Agile development movement.

  • For example, Agile development generally advocates for extensive use of unit tests. Context-driven testers will modify how they test if they know that unit testing was done well. Many (probably most) context-driven testers will recommend unit testing as a way to make later system testing much more efficient. However, if the development team doesn’t create reusable test suites, the context-driven tester will suggest testing approaches that don’t expect or rely on successful unit tests.
  • Similarly, Agile developers often recommend an evolutionary or spiral life cycle model with minimal documentation that is developed as needed. Many (perhaps most) context-driven testers would be particularly comfortable working within this life cycle, but it is no less context-driven to create extensively-documented tests within a waterfall project that creates big documentation up front.

Ultimately, context-driven testing is about doing the best we can with what we get. There might not be such a thing as Agile Testing (in the sense used by the agile development community) in the absence of effective unit testing, but there can certainly be context-driven testing.

Contrasting context-driven with standards-driven testing.

Some testers advocate favored life-cycle models, favored organizational models, or favored artifacts. Consider, for example, the V-model, the mutually suspicious separation between programming and testing groups, and the demand that all code delivered to testers come with detailed specifications.

Context-driven testing has no room for this advocacy. Testers get what they get, and skilled context-driven testers must know how to cope with what comes their way. Of course, we can and should explain tradeoffs to people, make it clear what makes us more efficient and more effective, but ultimately, we see testing as a service to stakeholders who make the broader project management decisions.

  • Yes, of course, some demands are unreasonable and we should refuse them, such as demands that the tester falsify records, make false claims about the product or the testing, or work unreasonable hours. But this doesn’t mean that every stakeholder request is unreasonable, even some that we don’t like.
  • And yes, of course, some demands are absurd because they call for the impossible, such as assessing conformance of a product with contractually-specified characteristics without access to the contract or its specifications. But this doesn’t mean that every stakeholder request that we don’t like is absurd, or impossible.
  • And yes, of course, if our task is to assess conformance of the product with its specification, we need a specification. But that doesn’t mean we always need specifications or that it is always appropriate (or even usually appropriate) for us to insist on receiving them.

There are always constraints. Some of them are practical, others ethical. But within those constraints, we start from the project’s needs, not from our process preferences.

Context-driven techniques?

Context-driven testing is an approach, not a technique. Our task is to do the best testing we can under the circumstances–the more techniques we know, the more options we have available when considering how to cope with a new situation.

The set of techniques–or better put, the body of knowledge–that we need is not just a testing set. In this, we follow in Gerry Weinberg’s footsteps: Start to finish, we see a software development project as a creative, complex human activity. To know how to serve the project well, we have to understand the project, its stakeholders, and their interests. Many of our core skills come from psychology, economics, ethnography, and the other social sciences.

Closing notes

Reasonable people can advocate for standards-driven testing. Or for the idea that testing activities should be routinized to the extent that they can be delegated to less expensive and less skilled people who apply the routine directions. Or for the idea that the biggest return on investment today lies in improving those testing practices intimately tied to writing the code. These are all widely espoused views. However, even if their proponents emphasize the need to tailor these views to the specific situation, these views reflect fundamentally different starting points from context-driven testing.

Cem Kaner, J.D., Ph.D.
James Bach