This is the first of a set of articles on Assessment Objectives.
My primary objective in restarting my blog was to support the Open Certification Project for Software Testing by providing ideas on how we can go about developing it and a public space for discussion of those (and alternative) ideas.
I’m not opposed to professional certification or licensing in principle. But I do have four primary concerns:
- Certification involves some type of assessment (exams, interviews, demonstration of skilled work, evaluation of work products, etc.). Does the assessment measure what it purports to measure? What level of knowledge or skill does it actually evaluate? Is it a fair evaluation? Is it a reasonably accurate evaluation? Is it an evaluation of the performance of the person being assessed?
- Certification is often done with reference to a standard. How broadly accepted is this standard? How controversial is it? How relevant is it to competence in the field? Sometimes, a certification is more of a political effort–a way of recruiting supporters for a standard that has not achieved general acceptance. If so, is it being done with integrity–are the people studying for certification and taking the assessment properly informed?
- Certifiers can have many different motives. Is this part of an honest effort to improve the field? A profit-making venture to sell training and/or exams? A political effort to recruit relatively naive practitioners to a viewpoint by investing them in a credential tied to ideas they would otherwise be unlikely to adopt?
- The certifier and the people who have been certified often make claims about the meaning of the certification. A certification might be taken as significant by employers, schools, businesses or government agencies who are attempting to do business with a qualified contractor, or by other people who are considering becoming certified. Is the marketing (by the certifier or by the people who have been certified and want to market themselves using it) congruent with the underlying value of the certificate?
The Open Certification Project for Software Testing is in part a reaction to the state of certification in our field. That state is variable–the story is different for each of the certifications available. But I’m not satisfied with any of them, and neither are several of my colleagues. The underlying goals of the Open Certification project are partially to create an alternative certification and partially to create competitive pressure on other providers, to encourage them to improve the value and/or the marketing of their certification(s).
For the next few articles, I want to consider assessment.
In 1948, a group of “college examiners” gathered at an American Psychological Association meeting and decided to try to develop a theoretical foundation for evaluating whether a person knows something, and how well. The key product of that group was Bloom’s (1956) Taxonomy (see Wikipedia, The Encyclopedia of Educational Technology, the National Teaching & Learning Forum, Don Clark’s page, Teacher Tap, or just ask Google for a wealth of useful stuff). The Bloom Committee considered how we could evaluate levels of cognitive knowledge (as distinct from psychomotor and affective knowledge) and proposed six levels:
- Knowledge (for example, can state or identify facts or ideas)
- Comprehension (for example, can summarize ideas, restate them in other words, compare them to other ideas)
- Application (for example, can use the knowledge to solve problems)
- Analysis (for example, can identify patterns, identify components and explain how they connect to each other)
- Synthesis (for example, can relate different things to each other, combine ideas to produce an explanation)
- Evaluation (for example, can weigh costs and benefits of two different proposals)
It turns out to be stunningly difficult to assess a student’s level of knowledge. All too often, we think we are measuring one thing while we actually measure something else.
For example, suppose that I create an exam that asks students:
“What are the key similarities and differences between domain testing and scenario testing? Describe two cases, one better suited for domain analysis and the other better suited for scenario, and explain why.”
This is obviously an evaluation-level question, right? Well, maybe. But maybe not. Suppose that a student handed in a perfect answer to this question:
- Knowledge. Maybe students saw this question in a study guide (or a previous exam), developed an answer while they studied together, then memorized it. (Maybe they published it on the Net.) This particular student has memorized an answer written by someone else.
- Comprehension. Maybe students prepared a sample answer for this question, or saw this comparison online or in the textbook, or the teacher made this comparison in class (including the explanation of the two key examples), and this student learned the comparison just well enough to be able to restate it in her own words.
- Application. Maybe the comparison was given in class (or in a study guide, etc.) along with the two “classic” cases (one for domain, one for scenario) but the student has had to figure out for himself why one works well for domain and the other for scenario. He has had to consider how to apply the test techniques to the situations.
These cases reflect a very common problem. How we teach, how our students study, and what resources our students study from will impact student performance–what they appear to know–on exams, even if they don’t make much of a difference to the underlying competence–how well they actually know it.
The distinction between competence and performance is fundamental in educational and psychological measurement. It also cuts both ways. In the examples above, performance appears to reflect a deeper knowledge of the material than the student actually has. What I often see in my courses is that students who know the material well underperform (get poor grades on my exams) because they are unfamiliar with strictly-graded essay exams (see my grading videos, video #1 and video #2, and slides) or with well-designed multiple choice exams. The extensive discussion of racial bias and cultural bias in standardized exams is another example of the competence/performance distinction–some groups perform less well on some exams because of details of the method of examination rather than because of a difference in underlying knowledge.
When we design an assessment for certification:
- What level of knowledge does the assessment appear to get at?
- Could someone who knows less or knows the material less deeply perform as well as someone who knows it at the level we are trying to evaluate?
- Might someone who knows this material deeply perform less well than we expect (for example, because they see ambiguities that a less senior person would miss)?
In my opinion, an assessment is not well designed, and should not be used for serious work, if questions like these are not carefully considered in its design.
Coming soon in this sequence on Assessment Objectives:
- Anderson, Krathwohl, et al. (2001) update the 1956 Bloom taxonomy.
- Extending Anderson/Krathwohl for evaluation of testing knowledge
- Assessment activities for certification in light of the Anderson/Krathwohl taxonomy