The Bipolar Diagnosis: How Reliable Is Reliable?

John McManamy Health Guide
    Last week, I posted “Is the Diagnosis Worse Than the Illness?” in response to reader comments, in particular one from Donna, who wrote: “Honestly, sometimes I think being diagnosed did me in.”


    Misdiagnosis is common, not to mention overdiagnosis and underdiagnosis. A psychiatrist, after all, unlike a radiologist, can’t exactly pull up the equivalent of a mammogram. If only, but wait ...


    How reliable is a mammogram in the first place? It turns out that reading a mammogram is a highly interpretive exercise, especially in regard to women under age 40. Fair enough, but you would expect two experts, based on the same evidence, to make the same diagnostic call, right?


    Not so fast. “Reliability” is a measure of clinical agreement, expressed as a “kappa value” that adjusts for the agreement two raters would reach by chance alone. If, for example, two or more radiologists perfectly agree on the results of the same mammogram, we have a kappa of 1.0. Any kappa value above .8 is regarded as “almost perfect.” Above .6 to .8 is “substantial,” above .4 to .6 is “moderate,” and above .2 to .4 is “fair.”
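
    To make the kappa arithmetic concrete, here is a minimal sketch in Python. It assumes Cohen's kappa, the usual two-rater form; the studies cited in this post may have used other variants. The function names and the ten mammogram readings are invented for illustration, and the label for values of .2 or below is my own placeholder, since that band is not named above.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty lists of ratings")
    n = len(rater_a)
    # Share of cases where the two raters made the same call.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, given how often each rater uses each label.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


def describe(kappa):
    """Map a kappa to the bands quoted in the text."""
    if kappa > 0.8:
        return "almost perfect"
    if kappa > 0.6:
        return "substantial"
    if kappa > 0.4:
        return "moderate"
    if kappa > 0.2:
        return "fair"
    return "below fair"  # placeholder label, not from the text


# Hypothetical data: two radiologists read the same ten mammograms.
reader_1 = ["recall"] * 4 + ["clear"] * 6
reader_2 = ["recall"] * 3 + ["clear", "recall"] + ["clear"] * 5

k = cohen_kappa(reader_1, reader_2)
print(f"{k:.2f}", describe(k))  # 0.58 moderate
```

    Note that the two readers agree on eight of ten films, yet the kappa is only .58. Half of that raw agreement would be expected by chance, which is why kappas run lower than the simple percentage of matching calls.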


    These values tend to apply across all of medicine, though less prevalent conditions are expected to yield lower kappas. Incidentally, in regard to mammograms, a 2010 meta-analysis of ten studies found mean kappas ranging from .21 to .74.


    Switching specialties ...


    There is a lot about psychiatry that invites criticism, but its detractors tend to fall into the fallacy of idealizing physical medicine. Thus, according to the late Thomas Szasz, who spoke out against abuses in psychiatry: 


    There is no blood or other biological test to ascertain the presence or absence of a mental illness, as there is for most bodily diseases.


    Dr Szasz went further by asserting there is no such thing as mental illness.


    Dr Szasz came to prominence back in the 1960s, before the biological component to mental illness was widely recognized. The DSM of the era, according to a 1994 article by Stuart Kirk and Herb Kutchins, had a mean reliability of .52 across 18 disease categories. This was considered “not good” and “not uniformly high” by Robert Spitzer writing in 1974.


    The implication was that psychiatry could do a lot better, and that Dr Spitzer was the man to do it. Dr Spitzer was responsible for the historic DSM-III of 1980, which achieved unprecedented approval thanks to his claims of “much greater reliability.” 


    The signature feature of the DSM-III and those to follow was the ubiquitous symptom checklist. In theory, two psychiatrists in a room observing the same patient, working off the same checklists, could be counted on to reach the same diagnostic conclusion at least most of the time.


    Dr Spitzer’s reliability data appeared to support this claim, but a subsequent study found the DSM-III yielded an “average weighted kappa” of .61. Better than the previous DSM, but hardly a monumental breakthrough.


    In 1987, the American Psychiatric Association released the DSM-III-R, which preserved the checklists in much their original form, as did the DSM-IV of 1994, and more recently the DSM-5 of 2013.


    The only real surprise in the DSM-5 was the set of reliability studies published the year before. Are you ready? Depression had a ridiculously low kappa of .28, down from its ridiculously high kappa of .80 in 1980. Keep in mind, the depression symptom checklist has remained virtually unchanged all that time.


    In a similar fashion, bipolar I has a current kappa of .56 compared to a 1994 kappa of .69. Schizophrenia, generalized anxiety disorder, and other diagnoses experienced similar sharp drops in reliability.


    The American Psychiatric Association attributed these declines to more sophisticated research techniques, but one can’t help but wonder how much cooking of the data went into those earlier kappas. At any rate, we now have kappas that Robert Spitzer, writing 40 years ago, would have considered “not good.”


    So how reliable is reliable? The psychiatric establishment seems to believe that current kappas reflect a state of “reliable enough.” An editorial in the Jan 2012 American Journal of Psychiatry noted that a kappa of .8 or above would be considered “miraculous” and .6 to .8 “cause for celebration.” The commentary deems .4 to .6 “realistic” and .2 to .4 “acceptable.”


    According to the editorial: 


    It is unrealistic to expect that the quality of psychiatric diagnoses can be much greater than that of diagnoses in other areas of medicine, where diagnoses are largely based on evidence that can be directly observed. 


    Indeed, when you look at the kappas for mammograms, you can see the editorial writers have a valid point. It’s not just psychiatry where diagnostic uncertainty abounds.


    But this comes as cold comfort to Donna and all the rest of us grappling with the repercussions of our psychiatric labels. Mental illness may indeed be a riddle wrapped in a mystery inside an enigma, but this is no excuse for psychiatry to try to keep solving it using the same old checklists from a bygone era.


    We deserve a lot better.


    More to come ...

Published On: December 15, 2013