I’m going through my detailed review of SWEBOK, in preparation for the June 30 comment deadline. The bulk of this blog entry is a page-by-page commentary / critique that I will submit to the SWEBOK review. Before that, here are some contextual comments.
Please get involved in this review process, which will close on June 30. Go to www.swebok.org to sign up and download swebok, and submit comments.
Time is short, and you might not be able to read all of SWEBOK in time to submit detailed comments. That’s OK. I recommend that you download it, skim the parts that are most interesting, realize the extent to which it excludes modern methods (such as agile development) and, if this bothers you, you can submit a very simple comment.
You can say something like:
“I have reviewed SWEBOK. I manage software development staff
and play a role in their training and supervision. SWEBOK does not
provide a good basis for the structure or detail of the knowledge
that I want my staff to have. It emphasizes attitudes and practices
that are not helpful on my projects and it downplays or skips
attitudes and practices that I consider essential. I consider this
document fundamentally flawed, and if I could vote to disapprove it,
I would.”
Obviously, you would tailor this to your circumstances.
=======
Overall Concerns with SWEBOK
=======
SWEBOK was created using a strange process. They started with the table of contents of the main software engineering textbooks — as if there is a strong relationship between software engineering as described in textbooks and software engineering as practiced in the field. From there, SWEBOK developed as delta’s from these books. SWEBOK is focused on “established traditional practices recommended by many organizations” and is intended to exclude “practices used only for certain types of software” and to exclude “innovative practices tested and used by some organizations and concepts still being developed and testing in research organizations.”
Somehow, we conclude that mutation testing is an established traditional practice that is widely recommended and used, but we exclude scenario testing. We conclude that massive tombs of test documentation are an established traditional practice widely followed, even though rants about bad test documentation are, to say the least, a common theme of comment in the community. And we exclude consideration of requirements analytical techniques (or project context considerations) that might help you make a sensible engineering determination of what types of documentation, at what level of depth, for what target reader, are worth the expense of creating and (possibly) maintaining them.
In the SWEBOK, page IX, we learn that the purpose of SWEBOK is to provide a “consensually-validated characterization.” In this, SWEBOK has failed utterly. Only a few people (about 500) were involved in the project. It alienated leading people, such as Grady Booch who recently said (in a post to the extremeprogramming listserv on yahoogroups, dated 5/31/2003)
“I was one of those 500 earlier reviewers – and
my comments were entirely negative. The SWEBOK
I reviewed was well-intentioned but misguided,
naive, incoherent, and just flat wrong in so
many dimensions.”
The Association for Computing Machinery was a co-authoring, co-sponsoring organization of SWEBOK at one point. But ACM eventually commissioned task forces to study the document and the rationale underlying the effort, and the result was a deeply critical evaluation and ACM’s withdrawal from the project.
ACM is the largest association of computing professionals in the world. How can it be said, with a straight face, that SWEBOK is a consensually-validated document when the ACM, including leaders of the ACM Special Interest Group in Software Engineering, determine that the approach to creating the document and the result are fundamentally flawed? See http://www.acm.org/serving/se_policy/ for details.
The SWEBOK response (front page of www.swebok.org) was this:
“The following motion was unanimously adopted
on April 18 2001.
“The Industrial Advisory Board of the Guide
to the Software Engineering Body of Knowledge
(SWEBOK) project recognizes that due process
was followed in the development of the Guide
(Trial Version) and endorses the position that
the Guide (Trial Version) is ready for field
trials for a period of two years.”
I love this phrasing. “Due process” has a fine, legalistic, officious ring to it. It sounds good, and (speaking as an attorney who has experience using lawyerly terms like “due process”) it will intimidate or silence some critics. But if your acceptance criterion is consensus, and you have obviously failed to achieve consensus, then a term like “due process” is just so much smoke to confuse the issue. If the process fails to produce the required product, the fact (if it is a fact) that the process was followed doesn’t make the failure a non-failure.
==========
Detailed Evaluation Comments
==========
Here are my page-by-page comments on the testing section of SWEBOK. I have reviewed other parts of SWEBOK and have concerns about them too, but life is short and precious and there is only so much of mine that I am willing to dedicate to a criticism of a fundamentally flawed piece of work.
===========
Page 69. The document praises the role of testing as a preventative technique throughout the lifecycle, but doesn’t consider test-driven development, which I believe is the single most important type of early testing.
============
Page 69. The document defines software testing as follows: “Software testing consists of the dynamic verification of the behavior of a program on a finite series of test cases, suitably selected from the usually infinite executions [sic] domain, against the specified expected behavior.”
In fact, a great deal of testing is done without specifying expected behavior. Here are three examples:
(1) Exploratory testing is done partially to discover the behavior.
(2) Some types of high volume random testing check for indicators of failure without having any model of expected behavior. (It would be ludicrous to say that their model of the expected behavior is that the program will not have memory leaks, stack corruption or other specific defects.)
(3) Most forms of user testing fail to involve comparison to specified behavior, and the user who protests that a certain behavior in a certain context is inappropriate, confusing or unacceptable, might well not be able to articulate her expectations, even after the failure, let along specify them in advance. (In many cases, expectation is driven by similarity to other experiences and we know from research in cognitive psychology, e.g. from Lee Brooks’ lab at McMaster, that many people would be unable to describe the similarity space that is the basis for their judgments.)
These types of test are widely used by testers, and they have been widely used for decades. Good testing sometimes involves comparison to specified expected behavior, but it often does not.
=============
Page 70. The document provides a laundry list of test techniques with no obvious selection or exclusion principle.
One of the oddities on page 70 is the assertion that “branch coverage is a popular test technique.” HUH? What makes this a technique? You achieve branch coverage by running any group of tests that take the program through each branch. We could achieve this test objective (achieve a certain level of coverage) via scenario tests, domain tests, various other types of tests. We could achieve the objective by running tests at the unit level or the fully integrated system level. SWEBOK says that coverage “should not be considered _per_se_ as the objective of testing.” I share that opinion — it is a poor objective. But it appears to be the objective of many people who drive their testing in order to achieve this result. The fact that the authors of SWEBOK don’t like coverage as an objective doesn’t make it a technique.
Another strange page 70 assertion is that test techniques used primarily to expose failures are primarily domain testing. SWEBOK says, “These techniques variously attempt to “breakâ€? the program, by running one [or more] test[s] drawn from identified classes of (deemed equivalent) executions. The leading principle underlying such techniques is being as much systematic as possible in identifying a representative set of program behaviors (generally in the form of subclasses of the input domain).”
Yes, domain testing is the most commonly described technique in textbooks. It is simple, easy to understand, and easy to teach. But risk-based testing, scenario testing, stress testing, specification-focused testing, high-volume automated testing, state-model-based testing, transaction-flow testing, and heuristic-based exploratory testing are other examples of testing techniques that go after bugs in the product. Why ignore these in favor of domain testing?
Additionally, even though the textbooks most often talk in terms of subclasses of input domains, it is important and fruitful to also analyze the program in terms of its output domains, its interfaces with other devices (disk, printer, etc.) and other processes, and its internal intermediate-result variables. By focusing students (or worse, professionals) on input domains to the exclusion of the others, we virtually blind them to important problems. As the ACM pointed out in its evaluation of SWEBOK, a “body of knowledge” should be focused on competent practice, not on the descriptions in introductory books.
SWEBOK (p. 70) also tells us that to avoid confusing test objectives and techniques, we must clearly distinguish between measures of the thoroughness of testing and measures of the software under test (such as measures of reliability). SWEBOK also tells us that when we conduct testing “in view of a specific purpose”, then that specific purpose is the “test objective.” SWEBOK lists examples of reliability measurement, usability evaluation, and contractor’s acceptance as important examples of objectives. I think those are fine objectives. But if a regulatory requirement specifies that I must achieve a certain type of coverage, and I design tests to meet that requirement, then meeting that coverage target IS my specific purpose for those tests. I can think of several circumstances under which achievement of a level of thoroughness of a certain type of testing IS the specific purpose for running a set of tests. What principled basis does SWEBOK have in (apparently) rejecting these as invalid objectives?
One (failing) rationale for deciding that achieving a certain level of (some type of) coverage is not a valid objective is that we strive to achieve coverage in order to help achieve something else, such as reliability. That sounds good (in spite of the fact that in some situations, we strive to achieve a certain level of coverage primarily in order to be able to say we achieved that level of coverage), but the reasoning generalizes inconveniently. For example, in many organizations, we do usability testing in order to help achieve customer acceptance. So usability evaluation should not be a valid test objective (because in some contexts, coverage is to reliability as, in other contexts, usability is to acceptance). But SWEBOK specifically blesses usability evaluation and contractor (customer) acceptance as valid test objectives.
A test objective is the objective that drives the design and execution of the tests. Different objectives are appropriate in different contexts. SWEBOK has no business dismissing some objectives as non-objectives.
=================
SWEBOK page 70 states that “Software testing is a very expensive and labor-intensive part of development. For this reason, tools are instrumental for automated test execution, test results logging and evaluation, and in general to support test activities. Moreover, in order to enhance cost-effectiveness ratio, a key issue has always been pushing test automation as much as possible.”
The idea that we should be “pushing test automation as much as possible” has been a source of much mischief and misunderstanding. I frequently hear from experienced testers that their highest bug find rates are achieved using manual or computer-assisted one-time-use tests. I don’t believe that it is to our advantage to stop doing this type of testing. Instead, I think we should be “pushing” cost-benefit analysis and implementing automation when it is cost-effective. For additional discussion of cost/benefit analysis for automation, see my papers, Architectures of Test Automation (https://kaner.com/testarch.html) and Avoiding Shelfware: A Manager’s View of Automated GUI Testing (https://13j276.p3cdn1.secureserver.net/pdfs/shelfwar.pdf).
The idea that we are actually automating testing is itself a misconception. Let’s consider the most common form of test “automation”, GUI regression-level “automation”. It involves these tasks
TASK / DONE BY
Analyze the specification and other docs for ambiguity or other indicators of potential error
–> Done by humans
Analyze the source code for potential errors or other things to test
–> Done by humans
Design test cases
–> Done by humans
Create test data
–> Done by humans
Run the tests the first time
–> Done by humans
Evaluate the first result
–> Done by humans
Report a bug from the first run
–> Done by humans
Debug the tests
–> Done by humans
Save the code
–> Done by humans
Save the results
–> Done by humans
Document the tests
–> Done by humans
Build a traceability matrix (tracing test cases back to specs or requirements)
–> Done by humans or by another tool (not the GUI tool)
Select the test cases to be run
–> Done by humans
Run the tests
–> The Tool does it
Record the results
–> The Tool does it
Evaluate the results
–>The Tool does it, but if there’s an apparent failure, a human re-evaluates the results.
Measure the results (e.g. performance measures)
–> Done by humans or by another tool (not the GUI tool)
Report errors
–> Done by humans
Update and debug the tests
–> Done by humans
When we see how many of the testing-related tasks are being done by people or, perhaps, by other testing tools, we realize that the GUI-level regression test tool doesn’t really automate testing. It just helps a human to do the testing. Rather than calling this “automated testing”, we should call it computer-assisted testing. I am not showing disrespect for this approach by calling it computer-assisted testing. Instead, I’m making a point–there are a lot of tasks in a testing project and we can get help from a hardware or software tool to handle any subset of them. GUI regression test tools handle some of these tasks very well. Other tools or approaches will handle a different subset
We should use tools in software testing, but we should not strive for complete automation. It is the wrong goal.
======================
Page 71-73 diagrams
These pages provide some diagrams of the structure of the rest of the testing chapter. Several of the items on these pages are troubling, but I’ll refer to them in the context of the more detailed discussions in the rest of the chapter.
=====================
Page 74, definitions of fault, failure and defect.
I don’t disagree with the definitions of fault and failure. However, SWEBOK equates “fault” and “defect”, where “fault” refers to the underlying cause of a malfunction.
I have two objections to the use of the word defect.
(a) First, in use, the word “defect” is ambiguous. For example, as a matter of law, a product is dangerously defective if it behaves in a way that would be unexpected by a reasonable user and that behavior results in injury. This is a failure-level definition of “defect.” Rather than trying to impose precision on a term that is going to remain ambiguous despite IEEE’s best efforts, our technical language should allow for the ambiguity.
(b) Second, the use of the word “defect” has legal implications. While some people advocate that we should use the word “defect” to refer to “bugs”, a bug-tracking database that contains frequent assertions of the form “X is a defect” may severely and unnecessarily damage the defendant software developer/publisher in court. In a suit based on an allegation that a product is defective (such as a breach of warranty suit, or a personal injury suit), the plaintiff must prove that the product is defective. If a problem with the program is labeled “defect” in the bug tracking system, that label is likely to convince the jury that the bug is a defect, even if a more thorough legal analysis would not result in classification of that particular problem as “defect” in the meaning of the legal system.
We should be cautious in the use of the word “defect”, recognize that this word will be interpreted in multiple ways by technical and nontechnical people, and recognize that a company’s use of the word in its engineering documents might unreasonably expose that company to legal liability.
=====================
Page 75, The Oracle Problem
As Doug Hoffman has pointed out, oracles are heuristic. When we use an oracle to determine that a program has passed or failed a test, we are comparing the program to some model or expectation on some number of dimensions. The program can fail on other dimensions that the oracle is blind to. For example, if we use Excel as the oracle for a spreadsheet under development, and evaluate the formula A1+A2, we might set cell A1 to 2 and cell A2 to 3 in both program and get 5 in both cases. In terms of the oracle, our spreadsheet has passed this test. But suppose the new spreadsheet took 5 hours to evaluate A1+A2. This is unacceptable, but the oracle is oblivious to it.
The characterization of oracles as tools to decide whether a program behaved correctly on a given test, without discussion of the inherent fallibility of all oracles, has led to serious misunderstandings.
=======================
Page 75, Testability
The third common meaning of testability in practice refers to the extend to which the program is easy to test and the test results are easy to interpret. Thus a highly testable program provides a high level of _control_ (the tester might be able to change data, start the program at any point, etc.) and a high level of _visibility_ (the tester can determine the state of the program, the value of specific variables, etc.)
This is widely used and it guides negotiation among testers and programmers regarding the support for testing that will be designed into a program.
=======================
Page 75, Test Levels (Unit, Integration, System)
SWEBOK says “Clearly, unit testing starts after coding is quite mature, for instance after a clean compile.”
This is 100% in disagreement with the practice of test-driven development, which requires the programmer to write a unit test immediately _before_ writing code that will enable the program to pass the test.
I think that test-driven development is the most important advance in the craft of testing of the past 30 years. This, more than any of the other flaws, illustrates the extent to which SWEBOK is blind to modern good practice.
========================
Page 75, Test Levels (Unit, Integration, System)
Much testing now involves API-level driving of a component developed by someone else. I think this is neither unit, nor integration, nor system testing.
========================
Page 75, Test Levels (Unit, Integration, System)
In test-driven development, the programmer implements a test and then writes the code needed for the program to pass the test. (More precisely, implement the test, run the program and see how it fails, write the simplest code that can pass the test, run the program and fix until the program passes the test, then refactor the code and retest.
The first use of these tests is to guide and check the initial implementation of a few lines of code. In that sense, they are “unit” tests and they are often referred to as unit tests. However, as programming tools evolve, these tests often look at the lines of code in the context of several other features. Using tools like Ward Cunningham’s FIT, for example, programmers might create many “unit” tests that are also “integration” (multi-variable, multi-function) and “system” (check whether an intended benefit will actually be provided to the end user).
I don’t think we should ban the use of the terms “unit”, “integration” and “system.” However, thinking about these as THE THREE LEVELS of testing, as defining the 3 targets of testing, leads to blind spots with respect to the nature of targets and the potential focus of individual tests.
========================
Page 75-77, Objectives of Testing
I think the categorization of testing concepts is strikingly odd. Here, I note the oddness of the list of testing objectives.
SWEBOK lists
– conformance testing, which it equates to functional testing
– reliability testing
– usability testing
– acceptance / qualification testing
– installation testing
– alpha and beta testing
– regression testing
– performance testing
– stress testing
– back-to-back testing
– recovery testing
– configuration testing, and
– usability testing.
This seems like a laundry list. Back-to-back testing looks more like a technique than an objective. Several others of these could be classed in different ways.
More important, what determines inclusion on this list?
For example, I think of objectives of testing as including:
– minimize liability risk
– decision support (help a project manager determine whether
to release the product)
– compliance with regulations or the expectations of a
regulatory inspector (this may or may not involve
conformance with a specification)
– assess and improve safety
– determine the nature of problems likely to arise in long
use of the product
– expose defects
– block premature release of a product
– improve the user experience
and many others.
In looking at the various lists in this document, I cannot divine a principle that governs inclusion versus exclusion.
As a teacher, I think that many things off the list are more important than the things on the list.
===================
Page 76. Regression testing
SWEBOK defines regression as
“the selective retesting of a system or component to verify that modifications have not caused unintended effects. In practice, the idea is to show that previously passed tests still do.” It then refers to “the assurance given by regression testing” [. . .] “that the software’s behavior is unchanged.”
An earlier version of SWEBOK noted that “regression testing” is commonly used to mean retesting the program to determine whether a bug was fixed. This is a popular definition. Why is it excluded?
Another common definition of regression testing is retesting the program to determine whether changes have caused fixed bugs to be re-broken.
If SWEBOK is a description of what is generally known and done, it should not select one definition and objective and exclude other common ones without even mentioning them.
Next, consider the idea that we run a bunch of tests again and again in order to assure ourselves that software behavior is unchanged. The regression test suite is a relatively tiny collection of tests that can only look at a relatively small proportion of the system’s behaviors. Our gamble is that the software’s behavior is not changing in ways missed by the regression tests. I have never seen a convincing theoretical argument that a regression test suite will expose most or all possible behavioral changes of a program and therefore I reject the notion that regression testing provides “assurance.”
The initial definition of regression testing is quite different from the idea of “assuring that system behavior is unchanged.” The definition is “verify that modifications have not caused unintended effects.” Let’s restate this definition in less antiquated terms — let’s talk about RISK instead of VERIFICATION.
That yields the idea that regression testing is done to mitigate the risk of unintended side effects of change.
A risk-based view of regression testing no longer requires us to use the same test, time and again, to study aspects of the program that have been previously tested. You can change data or combine a test with other tests or do other creative things to search for side effects. By varying the tests, you give yourself the chance to find previously-missed bugs — problems that were in the software all along but that you have missed with your tests so far — along with catching some side-effects. You are increasing coverage instead of mindlessly repeating the same old thing.
And under the risk-based view, you don’t fool yourself or defraud others with the idea that a small set of tests verifies some quality characteristic of the product. On page 75, SWEBOK approvingly noted Dijkstra’s insight that you can show the presence of bugs, but not their absence. This insight is flatly inconsistent with claims that we do any type of testing to “assure” or “verify”.
This conceptual contradiction illustrates the extent to which SWEBOK (as reflected in the testing section) seems to be more like a dumpster of testing concepts than like a conceptually coherent presentation.
As a dumpster (a disorganized collection of a miscellany of concepts) it is problematic because of the number and nature of things that have been kept out of the dumpster.
(NOTE: The analysis of regression testing above is mainly an analysis of system-level regression testing. I very much like the idea of creating an extensive suite of unit-level change-detectors, tests that we mainly create test-first, that cover every line and branch that we write. The difference between unit-level regression tests and system level is cost. The programmer runs the change-detector suite, every time she recompiles the code. If her change breaks the build, she fixes it immediately.
The labor cost associated with an independent tester discovering a regression error (which might be a bug in the program but is very often a test announcing that it must be changed to conform to a revised design) is quite high. Counting all people involved in the process, the time from failure through bug reporting to bug evaluation, prioritization, repair, and retesting will often total to an average of 4 labor-hours and or even higher (much higher) in some organizations.
In contrast, with the unit level change-detector suite, the programmer discovers the problem when she compiles the code, and immediately either fixes the code or the test. The labor cost is minimal. The practice is cheap enough that we can use it to support refactoring. The cost associated with traditional system-level regression testing is so high that we could not use it to support refactoring. The high communcation cost drives the cost of late change through the roof (one of the factors of the exponential growth curve of the cost of change over project time) whereas the absence of communication cost associated the unit-level change detectors allows us to make late changes at relatively low cost.
This is an important distinction within the description of a technique that SWEBOK says can be run at the system level or the unit level. We can do regression testing at either level, but their costs, benefits and uses are entirely different. It’s too bad that SWEBOK misses this point.
====================
Page 77 Test Techniques
This is another laundry list that excludes important current techniques, includes techniques that seem to be not widely used, and doesn’t expose any principled basis for inclusion or exclusion.
====================
Page 77 “Ad hoc testing”
SWEBOK says this (which it equates to exploratory testing) is the most widely practiced technique, and then advises a more systematic approach and then says that only experts should do ad hoc testing.
This is a blatant admission that the SWEBOK drafters simply don’t understand the most widely practiced approach to testing. SWEBOK cites my book as its source for testing “based on tester’s intuition and experience” and I believe I am the person who coined the term, “exploratory testing”, so let’s look at what this is. (Much of this material was developed by or with James Bach and many other colleagues over a 20 year period.)
First, exploratory testing involves simultaneous design, execution, and learning about the program. Rather than design tests and then run them, you do some testing, learn from them, learn from other sources, and base your design of next tests on your new insights. Your oracle (set of evaluation criteria) evolve as you learn more.
Second, every competent tester does exploratory testing. If you report a bug and the programmer tells you it was fixed, you do some testing around the fix. One test is the test that exposed the bug in the first place. But if you’re any good, you create additional tests to see if the fix is more general than the specific circumstances reported in the bug report, and to see if there were side effects. These are not pre-planned, pre-specified tests. They are designed, run, evaluated and extended in the moment. I use this testing situation as a basic training ground for junior testers (and classroom students). Surely, it is not something we would leave only to the experts. But SWEBOK tells us that “A more systematic approach is advised” and “ad hoc testing might be useful but only if the tester is really expert!”
There is such a thing as systematic exploratory testing.
I’m not trying to write SWEBOK here, but rather to support my assertion that SWEBOK seem to be clueless about a body of work that even it describes as the most widely practiced technique in the field.
If SWEBOK is intended to describe the current body of knowledge and practice in the field, its cluelessness about the most widely practiced approach is inexcusable.
===============
Page 80
My final comment on SWEBOK’s testing section has to do with its comments on test documentation.
“Documentation is an integral part of the formalization of the test process. The IEEE standard for Software Test Documentation [829] provides a good description of test documents and of their relationship with one another and with the testing process. . . . Test documentation should be produced and continually updated, at the same standards as other types of documentation in development.”
This is pious-sounding claptrap, religious doctrine rather than engineering. There is far too much of this in SWEBOK.
Is IEEE Standard 829 a good description?
In my experience, I have seen 829 applied by many commercial software companies or commercial companies that were developing software as part of their support process for their main business. I have never seen a case in which a commercial software application was benefitted more than it was harmed by application of standard 829. Several colleagues of mine have had the same experience. Bach, Pettichord and I discuss the problems in Lessons Learned in Software Testing.
Normally, an engineering body of knowledge includes assertions that are based on theory and tested by experiment. There was no theoretical basis underlying Standard 829. I am not aware of any experimental research of the costs and benefits associated with the application of 829.
Test document is expensive.
Testing is subject to a very difficult constraint. We have an infinity of potential tests and a very limited amount of time in which to imagine, create, document, and evaluate the results of running a few of those many possible tests.
Time spent generating paperwork is time not available for test implementation, test tool development, test execution and evaluation.
Good practice, therefore, probably pushes toward cost-benefit evaluation on a case by case basis. If a certain type of document is so valuable for the current project that it is worth taking time away from competing tasks in order to create the document, create the document.
Before we can pronounce that test documentation should be continually updated, we should discover why the test documentation is being created and how it will be used in the future. Maybe updating is called for. Maybe not. Maybe the documentation should be up to the standards of other documentation on the project, but maybe not. It depends on who will use the documents, and for what purpose.
=====================
In Sum
My time is limited.
I could write pages and pages more about the weaknesses of SWEBOK, but I think it would be pointless.
I agree with the ACM appraisal that the SWEBOK started with a fundamentally flawed approach. The result continues to be fundamentally flawed.
The call for comments on SWEBOK asked for appraisal of SWEBOK as it relates to teaching.
I teach courses in software testing. SWEBOK is not a good reference point for them.
SWEBOK’s criteria for inclusion and exclusion of topics is unsatisfactory. Many of the most important topics in my testing courses, (such as test driven development, API-level testing, scenario testing, skilled exploration, the difference in objectives and cost/benefit for unit-level regression suites and system-level regression suites, risk-based testing, a risk-based approach to domain testing instead of the stale, 40-year-old, boundary/equivalence approach documented in SWEBOK), effective bug reporting, using requirements analysis techniques to drive decisions about the types of artifacts to be generated, and on and on, are absent from SWEBOK. Much of what is present in SWEBOK is organized strangely, is dated, and many of the techniques (etc.) are marginal in terms of how often they are used and what value they actually provide.
I have appraised SWEBOK against my course notes, which I update regularly (and which I am updating again this summer). My conclusion was that SWEBOK’s flaws are so severe that, on balance, it is a less-than-worthless reference point for discovery of opportunities to improve the notes.
I also teach courses in software metrics.
Measurement, as studied in other fields, normally involves extensive study of validity of measures and threats to validity. One of the most important validity questions is how can we tell whether the measure actually measures what it purports to measure. What model or theory (and associated empirical support) relates the number we obtain (a complexity level of “10”) to the underlying attribute we are trying to measure?
Another critical question involves side effects of measurement. Robert Austin’s book, Measuring and Managing Performance in Organizations, discusses this in detail.
The review of measurement theory in SWEBOK (page 174 and on) skips lightly past these issues and provides a laundry list of metrics, including many that are invalid and unvalidated “metrics” — to the extent that it is clear what attribute they are actually intended to measure. The main value of the SWEBOK treatment of measurement is that it is concise. It makes an excellent “straw man”, something I can hand out and enthusiastically criticize. This is probably not the educational use we would hope to obtain from something that is SUPPOSED TO serve as the basis for a licensing exam.
=============
CLOSING ASSERTION
THERE IS NO BALLOTING PROCESS FOR SWEBOK THIS TIME. IF THERE WAS, I WOULD VOTE THAT SWEBOK SHOULD NOT BE ACCEPTED, WITH OR WITHOUT MODIFICATION.
— Cem Kaner
— Professor of Software Engineering
— Florida Institute of Technology