On the Quality of Qualitative Measures

Cem Kaner, J.D., Ph.D. & Rebecca L. Fiedler, M.B.A., Ph.D.

This is an informal first draft of an article that will summarize some of the common guidance on the quality of qualitative measures.

  • The immediate application of this article is to Kaner’s courses on software metrics and software requirements analysis. Students would be well-advised to read this summary of his lectures carefully (yes, this stuff is probably on the exam).
  • The broader application is to the increasingly large group of software development practitioners who are considering using qualitative measures as a replacement for many of the traditional software metrics. For example, we see a lot of attention to qualitative methods in the agenda of the 2014 Conference of the Association for Software Testing. We won’t be able to make it to CAST this year, but perhaps these notes will provide some additional considerations for their discussions.

On Measurement

Managers have common and legitimate informational needs that skilled measurement can help with. They need information in order to (for example…)

  • Compare staff
  • Compare project teams
  • Calculate actual costs
  • Compare costs across projects or teams
  • Estimate future costs
  • Assess and compare quality across projects and teams
  • Compare processes
  • Identify patterns across projects and trends over time

Executives need these types of information, whether we know how to provide them or not.

Unfortunately, there are strong reasons to be concerned about the use of traditional metrics to answer questions like these. These are human performance measures. As such, they must be used with care or they will cause dysfunction (Austin, 1996). That has been a serious real-life problem (e.g. Hoffman 2000). The empirical basis supporting several of them has been substantially exaggerated (Bossavit, 2014). Many of the managers who use them know so little about mathematics that they don’t understand what their measurements mean and their primary uses are to placate management or intimidate staff. Many of the consultants who give talks and courses advocating metrics also seem to know little about mathematics or about measurement theory. They seem unable to distinguish strong questions from weak ones, unable to discuss the underlying validity of the measures they advocate, and so they seem reliant on appeals to authority, on the intimidating quality of published equations, and on the dismissal of the critic as a nitpicker or an apologist for undisciplined practices.

In sum, there are problems with the application of traditional metrics in our field. It is no surprise that people are looking for alternatives.

In a history of psychological research methods, Kurt Danziger (1994) discusses the distorting impact of quantification on psychological measurement. (See especially his Chapter 9, From quantification to methodolatry.) Researchers designed experiments that looked more narrowly at human behavior, ignoring (designing out of the research) those aspects of behavior or experience that they could not readily quantify and interpret in terms of statistical models.

“All quantitative data is based upon qualitative judgments.”
(Trochim, 2006 at http://www.socialresearchmethods.net/kb/datatype.php)

Qualitative methods might sometimes provide a richer description of a project or product that is less misleading, easier to understand, and more effective as a source of insight. However, there are problems with the application of qualitative approaches.

  • Qualitative reports are, at their core, subjective.
  • They are subject to bias at every level (how the data are gathered or selected, stored, analyzed, interpreted and reported). This is a challenge for every qualitative researcher, but it is especially significant in the hands of an untrained researcher.
  • They are based on selected data.
  • They aren’t very helpful for making comparisons or for providing quantitative estimates (like, how much will this cost?).
  • They are every bit as open to abuse as quantitative methods.
  • And it costs a lot of effort to do qualitative measurement well.

We are fans of measurement (qualitative or quantitative) when it is done well and we are unenthusiastic about measurement (qualitative or quantitative) when it is done badly or sold overenthusiastically to people who aren’t likely to understand what they’re buying.

Because this paper won’t propagandize qualitative measurement as the unquestioned embodiment of sweetness and light, some readers might misunderstand where we are coming from. So here is a little about our background.

  • As an undergraduate, Kaner studied mainly mathematics and philosophy. He also took two semesters of coursework with Kurt Danziger. We only recently read Danziger (1994) and realized how profoundly Danziger has influenced Kaner’s professional development and perspective. As a doctoral student in experimental psychology, Kaner did some essentially-qualitative research (Kaner et al., 1978) but most of his work was intensely statistical, applying measurement theory to human perception and performance (e.g. Kaner, 1983). He applied qualitative methods to client problems as a consultant in the 1990’s. His main stream of published critiques of traditional quantitative approaches started in 1999 (Kaner, 1999a, 1999b). He wrote explicitly about the field’s need to use qualitative measures in 2002. He started giving talks titled “Software Testing as a Social Science” in 2004, explicitly identifying most software engineering measures as human performance measures subject to the same types of challenges as we see in applied measurement in psychology and in organizational management.
  • Fiedler’s (2006, 2007) dissertation used Cultural-Historical Activity Theory (CHAT)–a framework for analyzing and organizing qualitative investigations–to examine portfolio management software in universities. Kaner & Fiedler started applying CHAT to scenario test design in 2007. We presented a qualitative-methods tutorial and a long paper with detailed pointers to the literature at CAST in 2009 and at STPCon in Spring 2013. We continue to use and teach these ideas and have been working for years on a book relating qualitative methods to the design of scenario tests.

We aren’t new to qualitative methods. This is not a shiny new fad for us. We are enthusiastic about increasing the visibility and use of these methods but we are keenly aware of the risk of over-promoting a new concept to the mainstream in ways that dilute the hard parts until all that remains are buzzwords and rituals. (For us, the analogies are Total Quality Management, Six Sigma, and Agile Development.)

Perhaps some notes on what makes qualitative measures “good” (and what doesn’t) might help slow that tide.

No, This is Not Qualitative

Maybe you have heard a recommendation to make project status reporting more qualitative. To do this, you create a dashboard with labels and faces. The labels identify an area or issue of concern, such as how buggy the software is. Instead of numbers, you use colored faces, because these are supposedly more meaningful. A red frowny-face says, There is trouble here. A yellow neutral-face says, Things seem OK, nothing particularly good or bad to report now. And a green smiley-face says, Things are going well. You could add more differentiation by having a brighter red with a crying-face or a screaming-or-cursing-face and a brighter green with a happy-laughing face.

See, there are no numbers on this dashboard, so it is not quantitative, right?

Wrong.

The faces are ordered from bad to good. You can easily assign numerals to these, 1 for red-screaming-face through 5 for green-laughing-face. You can talk about the “average” (median) score across all the categories of information, and you can even draw graphs of the change of confidence (or whatever you map to happyfacedness) from week to week across the project.

This might not be very good quantitative measurement but as qualitative measurement it is even worse. It uses numbers (symbols that are equivalent to 1 to 5) to show status without emphasizing the rich detail that should be available to explain and interpret the situation.
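The point is easy to demonstrate. Here is a minimal sketch (the face names, the reporting categories, and the scores are invented for illustration): once each face is mapped to its ordinal rank, the dashboard can be summarized exactly like any other ordinal metric.

```python
from statistics import median

# Each face maps to an ordinal rank, 1 (worst) through 5 (best).
FACE_SCALE = {
    "red-screaming": 1,
    "red-frowny": 2,
    "yellow-neutral": 3,
    "green-smiley": 4,
    "green-laughing": 5,
}

# A hypothetical weekly status report: one face per area of concern.
status = {
    "bugginess": "red-frowny",
    "schedule": "yellow-neutral",
    "test progress": "green-smiley",
}

# The "average" (median) score across the categories, an ordinal statistic.
scores = [FACE_SCALE[face] for face in status.values()]
print(median(scores))  # prints 3
```

The faces never escape being numbers: a median of the weekly scores, or a trend graph of them, is a metric in every meaningful sense.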

When you watch a consultant present this as qualitative reporting, send him away. Tell him not to come back until he actually learns something about qualitative measures.

OK, So What is Qualitative?

A qualitative description of a product or process is a detail-rich, multidimensional story (or collection of stories) about it. (Creswell, 2012; Denzin & Lincoln 2011; Patton, 2001).

For example, if you are describing the value of a product, you might present examples of cases in which it has been valuable to someone. The example wouldn’t simply say, “She found it valuable.” The example would include a description of what made it valuable, perhaps how the person used it, what she replaced with it, what made this one better than the last one, and what she actually accomplished with it. Other examples might cover different uses. Some examples might be of cases in which the product was not useful, with details about that. Taken together, the examples create an overall impression of a pattern – not just the bias of the data collector spinning the tale he or she wants to tell. For example, the pattern might be that most people who try to do THIS with the product are successful and happy with it, but most people who try to do THAT with it are not, and many people who try to use this tool after experience with this other one are likely to be confused in the following ways …

When you describe qualitatively, you are describing your perceptions, your conclusions, and your analysis. You back it up with examples that you choose, quotes that you choose, and data that you choose. Your work should be meticulously and systematically even-handed.  This work is very time-consuming.

Quantitative work is typically easier, less ambiguous, requires less-detailed knowledge of the product or project as a whole, and is therefore faster.

If you think your qualitative measurement methods are easier, faster and cheaper than the quantitative alternatives, you are probably not doing the qualitative work very well.

Quality of Qualitative

In quantitative measurement, questions about the value of a measure boil down to questions of validity and reliability.

A measurement is valid to the extent that it provides a trustworthy description of the attribute being measured. (Shadish, Cook & Campbell, 2001)

A measurement is reliable to the extent that repeating the same operations (measuring the same thing in the same ways) yields the same (or similar) results.

In qualitative work, the closest concept corresponding to validity is credibility (Guba & Lincoln, 1989). The essential question about the credibility of a report of yours is, Why should someone else trust your work? Here are examples of some of the types of considerations associated with credibility.

Examples of Credibility-Related Considerations

The first and most obvious consideration is whether you have the background (knowledge and skill) to be able to collect, interpret and explain this type of data.

Beyond that, several issues come up frequently in published discussions of credibility. (Our presentation is based primarily on Agostinho, 2005; Creswell, 2012; Erlandson et al., 1993; Finlay, 2006; and Guba & Lincoln, 1989.)

  • Did you collect the data in a reasonable way?
    • How much detail?: Students of ours work with qualitative document analysis tools, such as ATLAS.ti, Dedoose, and NVivo. These tools let you store large collections of documents (such as articles, slides, and interview transcripts), pictures, web pages, and videos (https://en.wikipedia.org/wiki/Computer_Assisted_Qualitative_Data_Analysis_Software). We are now teaching scenario testers to use the same types of tools. If you haven’t worked with one of these, imagine a concept-mapping tool that allows you to save all the relevant documents as sub-documents in the same document as the map and allows you to show the relationships among them not just with a two-dimensional concept map but with a multidimensional network, a set of linkages from any place in any document to any place in any other document.

    As you see relevant information in a source item, you can code it. Coding means applying meaningful tags to the item, so that you can see later what you were thinking now. For example, you might code parts of several documents as illustrating high or low productivity on a project. You can also add comments to these examples, explaining for later review what you think is noteworthy about them. You might also add a separate memo that describes your ideas about what factors are involved in productivity on this project, and another memo that discusses a different issue, such as notes on individual differences in productivity that seem to be confounding your evaluation of tool-caused differences. Later, you can review the materials by looking at all the notes you’ve made on productivity—all the annotated sources and all your comments.

    You have to find (or create) the source materials. For example, you might include all the specification-related documents associated with a product, all the test documentation, user manuals from each of your competitors, all of the bug reports on your product and whatever customer reports you can capture for other products, interviews with current users, including interviews with extremely satisfied users, users who abandoned the product and users who still work with the product but hate it. Toss in status reports, comments in the source code repository, emails, marketing blurbs, and screen shots. All these types of things are source materials for a qualitative project.

    You have to read and code the material. Often, you read and code with limited or unsophisticated understanding at first. Your analytical process (and your ongoing experience with the product) gives you more insight, which causes you to reread and recode material. The researcher typically works through this type of material in several passes, revising the coding structure and adding new types of comments (Patton, 2001). New information and insights can cause you to revise your analysis and change your conclusions.

    The final report gives a detailed summary of the results of this analysis.

    • Prolonged engagement: Did you spend enough time at the site of inquiry to learn the culture, to “overcome the effects of misinformation, distortion, or presented ‘fronts’, to establish rapport and build the trust necessary to overcome constructions, and to facilitate immersing oneself in and understanding the context’s culture”?
    • Persistent observation: Did you observe enough to focus on the key elements and to add depth? The distinction between prolonged engagement and persistent observation is the difference between having enough time to make the observations and using that time well.
  • Triangulation and convergence: “Triangulation leads to credibility by using different or multiple sources of data (time, space, person), methods (observations, interviews, videotapes, photographs, documents), investigators (single or multiple), or theory (single versus multiple perspectives of analysis).” (Erlandson et al. 1993, pp. 137-138). “The degree of convergence attained through triangulation suggests a standard for evaluating naturalistic studies. In other words, the greater the convergence attained through the triangulation of multiple data sources, methods, investigators, or theories, the greater the confidence in the observed findings. The convergence attained in this manner, however, never results in data reduction but in an expansion of meaning through overlapping, compatible constructions emanating from different vantage points.” (Erlandson et al. 1993, p. 139).
  • Are you summarizing the data fairly?
  • How are you managing your biases (people are often not conscious of the effects of their biases) as you select and organize your observations?
  • Are you prone to wishful thinking or to trying to please (or displease) people in power?
    • Peer debriefing: Did you discuss your ideas with one or more disinterested peers who gave constructively critical feedback and questioned your ideas, methods, motivation, and conclusions?
    • Disconfirming case analysis: Did you look for counter-examples? Did you revise your working hypotheses in light of experiences that were inconsistent with them?
    • Progressive subjectivity: As you observed situations or created and looked for data to assess models, how much did you pay attention to your own expectations? How much did you consider the expectations and observations of others? An observer who affords too much privilege to his or her own ideas is not paying attention.
    • Member checks: If you observed / measured / evaluated others, how much did you involve them in the process? How much influence did they have over the structures you would use to interpret the data (what you saw or heard or read) that you got from them? Do they believe you accurately and honestly represented their views and their experiences? Did you ask?
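The coding workflow described above (tag passages, comment on them, then review everything on a topic) can be sketched without any specialized tool. The snippet below is an illustrative toy, not how ATLAS.ti, Dedoose, or NVivo work internally; the sources, excerpts, and code names are all invented. Each coded segment records its source, the excerpt, the tags applied, and an optional comment, and a simple filter retrieves everything coded on a topic.

```python
from dataclasses import dataclass

@dataclass
class CodedSegment:
    source: str       # which document the excerpt came from
    excerpt: str      # the passage you highlighted
    codes: list       # the tags you applied to it
    comment: str = "" # your note on why this passage matters

# Hypothetical coded data from a study of tool-related productivity.
segments = [
    CodedSegment("status-report-12.txt",
                 "The team closed 40 bugs this week after switching tools.",
                 ["productivity-high", "tool-change"],
                 "Possible tool effect, but two new testers also joined this week."),
    CodedSegment("interview-anna.txt",
                 "I spend half my day fighting the configuration screens.",
                 ["productivity-low", "usability"]),
]

def segments_with_code(segments, prefix):
    """Return every segment carrying a code that starts with the given prefix."""
    return [s for s in segments if any(c.startswith(prefix) for c in s.codes)]

# Later review: pull up everything you coded as productivity-related,
# with your annotations attached.
for seg in segments_with_code(segments, "productivity"):
    print(seg.source, seg.codes, seg.comment)
```

Recoding in a later pass amounts to revising the tags and comments on existing segments; the value of the tools is that they keep these links navigable across hundreds of source documents.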

Transferability

The concerns that underlie transferability are much the same as for external validity (or generalization validity) of traditional metrics:

  • If you or someone else did a comparable study in a different setting, how likely is it that you would make the same observations (see similar situations, tradeoffs, examples, etc.)?
  • How well would your conclusions apply in a different setting?

When evaluating a research report, thorough description is often a key element. The reader doesn’t know what will happen when someone tries a similar study in the future, so they (and you) probably cannot authoritatively predict generalizability (whether people in other settings will see the same things). However, if you describe what you saw well enough, in enough detail and with enough attention to the context, then when someone does perform a potentially-comparable study somewhere else, they will probably be able to recognize whether they are seeing things that are similar to what you were seeing.

Over time, a sense of how general something is can build as multiple similar observations are recorded in different settings.

Dependability

The concerns that underlie dependability are similar to those for internal validity of traditional metrics. The core question is whether your work is methodologically sound.

Qualitative work is more exploratory than quantitative (at least, more exploratory than quantitative work is traditionally described). You change what you do as you learn more or as you develop new questions. Therefore consistency of methodology is not an ultimate criterion in qualitative work, as it is for some quantitative work.

However, a reviewer can still ask how well (methodologically) you do your work. For example:

    • Do you have the necessary skills and are you applying them?
    • If you lack skills, are you getting help?
    • Do you keep track of what you’re doing and make your methodological changes deliberately and thoughtfully? Do you use a rigorous and systematic approach?

Many of the same ideas that we mentioned under credibility apply here too, such as prolonged engagement, persistent observation, effort to triangulate, disconfirming case analysis, peer debriefing, member checks and progressive subjectivity. These all describe how you do your work.

  • As issues of credibility, we are asking whether you and your work are worth paying attention to. Your attention to methodology and fairness reflects on your character and trustworthiness.
  • As issues of methodology, we are asking more about your skill than about your heart.

Confirmability

Confirmability is as close to reliability as qualitative methods get, but the qualitative approach does not rest as firmly on reliability. The quantitative measurement model is mechanistic. It assumes that under reasonably similar conditions, the same acts will yield the same results. Qualitative researchers are more willing to accept the idea that, given what they know (and don’t know) about the dynamics of what they are studying, under seemingly-similar circumstances, the same things might not happen next time.

We assess reliability by taking repeated measurements (do similar things and see what happens). We might assess confirmability as the ability to be confirmed rather than whether the observations were actually confirmed. From that perspective, if someone else works through your data:

  • Would they see the same things as you?
  • Would they generally agree that things you see as representative are representative and things that you see as idiosyncratic are idiosyncratic?
  • Would they be able to follow your analysis, find your records, understand your ways of classifying things and agree that you applied what you said you applied?
  • Does your report give your reader enough raw data for them to get a feeling for the confirmability of your work?

In Sum

Qualitative measurements tell a story (or a bunch of stories). The skilled qualitative researcher relies on transparency in methods and data to tell persuasive stories. Telling stories that can stand up to scrutiny over time takes enormous work. This work can have great value, but to do it, you have to find time, gain skill, and master some enabling technology. Shortchanging any of these areas can put your credibility at risk as decision-makers rely on your stories to make important decisions.

References

S. Agostinho (2005, March). “Naturalistic inquiry in e-learning research“, International Journal of Qualitative Methods 4 (1).

R. D. Austin (1996). Measuring and Managing Performance in Organizations. Dorset House.

L. Bossavit (2014). The Leprechauns of Software Engineering: How folklore turns into fact and what to do about it. Leanpub.

J. Creswell (2012, 3rd ed.). Qualitative Inquiry and Research Design: Choosing Among Five Approaches. Sage Publications.

K. Danziger (1994). Constructing the Subject: Historical Origins of Psychological Research. Cambridge University Press.

N.K. Denzin & Y.S. Lincoln (2011, 4th ed.) The SAGE Handbook of Qualitative Research. Sage Publications.

D.A. Erlandson, E.L. Harris, B.L. Skipper & S.D. Allen (1993). Doing Naturalistic Inquiry: A Guide to Methods. Sage Publications.

R. L. Fiedler (2006). “In transition”: An activity theoretical analysis examining electronic portfolio tools’ mediation of the preservice teacher’s authoring experience. Unpublished Ph.D. dissertation, University of Central Florida (Publication No. AAT 3212505).

R.L. Fiedler (2007). “Portfolio authorship as a networked activity“. Paper presented at the Society for Information Technology and Teacher Education.

R. L. Fiedler & C. Kaner (2009). “Putting the context in context-driven testing (an application of Cultural Historical Activity Theory)“. Conference of the Association for Software Testing. Colorado Springs, CO.

L. Finlay (2006). “‘Rigour’, ‘ethical integrity’ or ‘artistry’? Reflexively reviewing criteria for evaluating qualitative research.” British Journal of Occupational Therapy, 69 (7), 319-326.

E.G. Guba & Y.S. Lincoln (1989). Fourth Generation Evaluation. Sage Publications.

D. Hoffman (2000). “The darker side of metrics,” presented at Pacific Northwest Software Quality Conference, Portland, OR.

C. Kaner (1983). Auditory and visual synchronization performance over long and short intervals, Doctoral Dissertation: McMaster University.

C. Kaner (1999a). “Don’t use bug counts to measure testers.” Software Testing & Quality Engineering, May/June, 1999, p. 80.

C. Kaner (1999b). “Yes, but what are we measuring?” (Invited address) Pacific Northwest Software Quality Conference, Portland, OR.

C. Kaner (2002). “Measuring the effectiveness of software testers.”15th International Software Quality Conference (Quality Week), San Francisco, CA.

C. Kaner (2004). “Software testing as a social science.” IFIP Working Group 10.4 meeting on Software Dependability, Siena, Italy.

C. Kaner & R. L. Fiedler (2013). “Qualitative Methods for Test Design“. Software Test Professionals Conference (STPCon), San Diego, CA.

C. Kaner, B. Osborne, H. Anchel, M. Hammer & A.H. Black (1978). “How do fornix-fimbria lesions affect one-way active avoidance behavior?” 86th Annual Convention of the American Psychological Association, Toronto, Canada.

M.Q. Patton (2001, 3rd ed.). Qualitative Research & Evaluation Methods. Sage Publications.

W.R. Shadish, T.D. Cook & D.T. Campbell (2001, 2nd ed.). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Cengage.

W.M.K. Trochim (2006). Research Methods Knowledge Base.

Wikipedia (2014). https://en.wikipedia.org/wiki/Computer_Assisted_Qualitative_Data_Analysis_Software

 

4 Responses to “On the Quality of Qualitative Measures”

  1. Hi Cem

    In the main I agree with the majority of what you have written here with regards to the use of qualitative measurements. I especially like the section on coding, since this is something I am investigating to make my own qualitative reporting better.

    A couple of points I would like to pick up on.

    “When you watch a consultant present this as qualitative reporting, send him away. Tell him not to come back until he actually learns something about qualitative measures.”

    I feel the context here is important, rather than making a sweeping statement such as the one above. I use the smiley face system, but not as an overall judgement of the quality of the product; more as guidance on how ‘I’ feel about the product. Emotions play a key part in how we think about a product (http://steveo1967.blogspot.co.uk/2010/07/emotional-tester-part-2.html). Could I do this as a scale of liking from 1-10? Maybe. However, as a quick and dirty measure of how I (or anyone on the team) was feeling during that particular time, it could provide valuable information to someone that matters. (This is a form of coding in accordance with basic social science measurements.)

    My other point is that it is difficult to get measurements correct, especially quantitative ones, and your suggestions of how we should go about it are detailed. However, I do not see any examples of how to apply this in the practical world, where we are constrained by time, people, and money. If I produced an academic quantitative report for testing that had been done, it would not be simple for anyone to understand the important issues quickly, which for me goes back to the dashboard style of reporting. What is missing from this article is who the target audience is, since how and what you report will differ. At a base level, ‘quick and frugal’ dashboard reporting with links to further and more detailed reporting, as you describe above, is a useful compromise. Those who make the decisions based upon your reporting can then have access to both styles; if something looks not right at the high level, they can drill down to the detail. My method is to have this high-level overview quantitative-style dashboard (note I say style, which is to imply it may not be as fully quantitative as you have so carefully written in the article) which links down to stories about what happened and provides further, more detailed quantitative information.

    To summarise: what you have written is a great guide to what quantitative measuring is, and I appreciate that. I would love to see some context as to how to apply this to the ‘real world’ rather than an academic report.

    John, I don’t object to dashboards. My concern is that I am seeing these presented as qualitative reports, by people who present themselves as knowledgeable about qualitative measures, who contrast these with "metrics". Metrics are, by definition, statistics based on the data. The items on the dashboards are numbers expressed in pictures. They reflect a simple ordinal scale. They carry no texture. They are equivalent to your 1-10 scale. They are metrics. They are not qualitative reports.

    It is common to present metrics in summary form, with links to supporting data. There are many summarizing formats for metrics, including dashboards. I like dashboards. I teach Applied Probability & Statistics (a required course for our software engineering and computer science majors). The course presents dashboards early. I point students to Stephen Few’s books, Information Dashboard Design (http://www.amazon.com/Information-Dashboard-Design-At-Glance/dp/1938377001) and Show Me the Numbers: Designing Tables and Graphs to Enlighten (http://www.amazon.com/Show-Me-Numbers-Designing-Enlighten/dp/0970601972 ), to Tufte’s Visual Display of Quantitative Information (http://www.amazon.com/Visual-Display-Quantitative-Information/dp/0961392142 ), and to McDaniel and McDaniel’s The Accidental Analyst (http://www.amazon.com/Accidental-Analyst-Show-Your-Data/dp/1477432264 ).

    I like dashboards, but I don’t pretend they are qualitative measures. I don’t tell people that "metrics" are "bad" and they should instead use "qualitative measures" like dashboards. I don’t think that’s what you’re doing, but I am seeing this kind of posturing.

    Your final paragraphs (starting with "My other point") are confusing because they repeatedly refer to quantitative measurement. I am not writing at all about quantitative measurement. I am writing about qualitative measurement.

    • I don’t think that you can use qualitative measures as the backbone of a test management process that is based on measurement. I think I would rely on a mix of quantitative and qualitative measures.
    • I think there are serious problems with the current crop of quantitative measures (software metrics). However, I think the solution in many cases will have to be to improve how we work with quantitative information, not to replace it completely with stories. That is, we don’t need "no more metrics". We need better metrics.
    • But this post isn’t about quantitative metrics. It’s about qualitative measures.

    You asked for guidance about how to use qualitative methods in the ‘real world’ rather than the academy. I don’t think the standards change. I think the stories you tell will be more credible or less. I think that if you do poor research to support your stories, you will eventually be humiliated and lose credibility. I think there is a disdain for qualitative approaches among many managers because they have too often heard (or spoken) statements that were made to sound authoritative but were really backed by little or no actual research. If you want to create a well-supported story that describes a complex system in a way that respects its complexity, the criteria that I describe are going to be relevant to your work and to the evaluation by people who look at your work.

    I am not trying to pretend that these are "standards". I am not writing this to encourage ISTQB, IEEE and ISO to pretend to legislate minimum standards for qualitative measurements so that people can memorize them for certification exams. I am saying that in the community of people who use qualitative methods, these are the dimensions that appear commonly in their discussions of what makes work weaker or stronger.

    My personal uses of qualitative methods in industry have been to

    • study patterns of customer satisfaction and dissatisfaction with a software product that I was managing,
    • study ways that users interacted with a telephone system’s operator’s console, to provide guidance as I designed a new generation of console,
    • study the communication quality of bug reports over a 5 year period in a company that had suffered several serious recalls involving bugs that had been discovered, reported, but not fixed,
    • study the quality of collections of individual workproducts as I did staff-performance evaluations,
    • look for patterns in court decisions about reckless driving to determine which types of clusters of evidence were persuasive with California judges (I was a prosecutor at the time—this was applied research, not academic),
    • understand the ways people use classes of products so that I could design suites of scenario tests that would be relevant and effective.

    I’ve probably used qualitative methods in industry other ways, but these are the examples that come to mind today.

    I think there are plenty of discussions of using qualitative methods in industry. If you are interested, I would suggest asking Bing about publications on human-computer interaction. You might look for the intersection of this with cultural-historical activity theory as a starting point to narrow your initial search.

    The primary guidance that I am writing on the application of qualitative methods in testing is on scenario test design. Design, not control. I think that quantitative measures are often better for control. As to scenario test design, this is part of the same series as The Domain Testing Workbook (http://www.amazon.com/Domain-Testing-Workbook-Cem-Kaner/dp/0989811905 ). Rebecca Fiedler and I have been working on The Scenario Testing Workbook for a long time. Our ideas are developed enough that we are going to teach it as one-half of a one-semester graduate course on advanced test design. That will reveal the next generation of serious problems in our thinking about how to explain/teach/do this type of work. Hopefully, a year later, we’ll have a book.

    For now, I teach qualitative methods in my university courses on

    • Software Requirements (students use qualitative methods to discover and evaluate possible requirements for a software product) (as far as I can tell, what I am teaching is consistent with the blog post and immediately applicable to that type of work in industry) and
    • Software Metrics (students learn these in overview format as a different way of looking at project information).

    I keep thinking about how to design a BBST-like set of courses on software-related statistics and software-related measurement. Maybe after the scenario book and the next generation of BBST, this will be the next project.

    — Cem

    • Thank you for the detailed reply, Cem; it clarified some of the issues.

      To clarify your confusion with the other point (forgive my proofreading skills and auto-correcting spell checker!): it should have been about qualitative measures, which is what I feel you picked up on in your reply. To summarize your point, I feel we need to use qualitative stories backed up with quantitative measurements to make them of value. Numbers without stories and stories without numbers are as much use to those who matter as a chocolate fire-screen.

  2. Good article. A few random thoughts…

    1)
    I agree with the ideas presented in the article. I think that qualitative measures are very valuable and a nice supplement (but not replacement) to “traditional metrics” (that are constructed, presented, and used properly).

    2)
    “Qualitative measurements tell a story (or a bunch of stories).”
    My current title is “Enterprise Quality – Metrics and Reporting Lead”. However, I simply tell people, “I help others answer questions and tell stories with data”. But, I wonder…can’t quantitative metrics be wrapped in a story, too? I’m always hesitant to simply present “numbers”. I’d rather use numbers to support my story.

    3)
    “All quantitative data is based upon qualitative judgments.”
    That reminds me of this (paraphrased) idea: “Even heavily scripted testing involves some level of unscripted exploration”.

    4)
    It occurred to me that this article itself is qualitative. This article describes your perceptions, conclusions, and analysis about “qualitative measurements”. You back all this up with examples, quotes, and data that you choose from your selected references. I assume that, during research, you had a prolonged engagement, made persistent observations, and used triangulation and convergence. Prior to publishing the article, I assume you had it peer debriefed, looked for disconfirming case studies, adjusted for your own progressive subjectivity, and conducted member checks. You hint at this in the Transferability section, when you state, “When evaluating a research report, thorough description is often a key element”. After I realized that the article itself was qualitative, I could more clearly see how the ideas within could also be applied to measurements.

    5)
    I think the real problem is this:
    “Quantitative work is typically easier, …, and … faster.”
    “Telling [qualitative] stories that can stand up to scrutiny over time takes enormous work.”
    While I may be up to the task/challenge, sadly, I think that many others are not. They prefer the “easy” way. The “fast” way. Even if it is not the “better” way. Convincing them otherwise is a challenge.

    Thanks!
