
The Authors of Noise on Variation in Medical Diagnoses

Posted on 19th May 2021 by Mark Skinner

In Noise, Daniel Kahneman, author of Thinking, Fast and Slow, and Cass R. Sunstein, co-author of Nudge, team up with Olivier Sibony, an expert on strategic thinking, to examine the worrying phenomenon of 'noise' and how it affects our decision making. In this piece, the authors dissect how damaging noise can be in the medical profession.

A number of years ago, a good friend of ours (let’s call him Paul) was diagnosed with high blood pressure by his primary care doctor (we will call him Dr. Jones). The doctor advised Paul to try medications. Dr. Jones prescribed a diuretic, but it had no effect. Paul’s blood pressure did not go down. A few weeks later, Dr. Jones responded with a second medication, a calcium channel blocker. It had only a modest effect.

That baffled Dr. Jones. After three months of weekly office visits, Paul’s blood pressure readings were slightly lower but still too high, and the next step was unclear. Paul was anxious and Dr. Jones was troubled, not least because Paul was a relatively young man in good health. Dr. Jones contemplated trying a third medication.

At that point, Paul happened to move to a new city, where he consulted a new primary care doctor (we will call him Dr. Smith). Paul told Dr. Smith the story of his continuing struggles with high blood pressure. Dr. Smith immediately responded, “Buy a home blood pressure kit, and see what the readings are. I don’t think you have high blood pressure at all. You probably just have White Coat Syndrome – your blood pressure goes up in doctors’ offices!”

Paul did as he was told, and sure enough, his blood pressure was normal at home. It has been normal ever since (and a month after Dr. Smith told him about White Coat Syndrome, it became normal in doctors’ offices as well).

A central task of doctors is to make diagnoses – to decide whether a patient has some kind of illness, and if so, to identify it. Often diagnosis requires some kind of judgment. When there are differences between the judgments doctors make on the same case, it is an example of noise – variability in professional judgments that should be identical.

In medicine as in other domains, some judgments are routine and largely mechanical. It’s usually easy for specialists to determine whether someone has a broken toe. At the same time, many people know that when doctors do exercise judgment, they might err; it is standard to advise patients to “get a second opinion.” In some hospitals, a second opinion is even mandatory.

Some patients (including Paul) have been astonished to see how much the second opinion diverges from the first. But the biggest surprise is not the existence of noise in the medical profession. It is its sheer magnitude. Consider a few examples:

  • Tuberculosis (TB) is one of the most widespread and deadly diseases worldwide, infecting over 10 million people and killing almost 2 million in 2016. A widely used method for detecting TB is a chest X-ray, which allows examination of the lungs for the empty space caused by the TB bacteria. Variability in diagnosis of TB has been a well-documented issue for almost 75 years.
  • There is wide variability in radiologists’ ability to detect breast cancer from screening mammograms. A large study found that false-negative rates among radiologists ranged from very low (the radiologist was correct every time) to very high (the radiologist incorrectly said the mammogram was normal much of the time). Similarly, false-positive rates ranged from near zero to high (in many cases, the radiologist said the mammogram showed cancer when it was normal).
  • Coronary angiograms, which show blood flow through the arteries of the heart, are one of the primary methods used to evaluate patients for heart disease. The process might seem mechanical. But variability in interpreting angiograms has been well documented for over 40 years, potentially leading to unnecessary procedures.
  • When pathologists analyze skin lesions for the presence of melanoma – the most dangerous form of skin cancer – there is significant noise. In one study, eight pathologists reviewing each case were unanimous only 62% of the time. Another study, at an oncology center, found that diagnostic accuracy for melanoma was only 64%, meaning that roughly one in every three lesions was misdiagnosed.

These are examples of noise between doctors, who disagreed with one another. But individual doctors are sometimes inconsistent too. A doctor might not agree with himself! In a radiology study, doctors who assessed the same case a second time offered a different view, and thus disagreed with themselves, between 8% and 40% of the time. When assessing the degree of blockage in angiograms, 22 physicians disagreed with themselves between 63% and 92% of the time. These are examples of occasion noise: variability in the judgment of the same case, by the same professional, on two separate occasions.

Another study offers a clue about some of the sources of occasion noise. Doctors are significantly more likely to order breast and colon cancer screenings early in the morning than late in the afternoon. As a result, patients with appointments later in the day were less likely to be referred for, and to receive, guideline-recommended cancer screening.

How can we explain such findings? One answer is that physicians almost inevitably run behind in clinic. To keep up, they may skip discussions about preventive health measures, such as cancer screening or nutrition counseling. Decision fatigue likely also contributes: toward the end of their shifts, doctors default to the easier course. This is a robust finding. For example, at the end of the day, primary care physicians (PCPs) show higher rates of inappropriate antibiotic prescribing, lower rates of influenza vaccination, and higher rates of opioid prescribing for back pain. Clinicians also wash their hands appropriately less often toward the end of hospital shifts.

In medicine, as in all judgments, many strategies might help reduce noise and errors. Obtaining second or multiple opinions exemplifies an important noise-reduction strategy: the aggregation of multiple judgments. That’s an effective approach. Consensus conferences bring together physicians of one or multiple specialties – who may have differed in their interpretation of patients’ imaging or findings – to discuss challenging patient cases with the goal of ultimately agreeing upon one diagnosis. Such conferences are an excellent idea.

A more ambitious approach is to use guidelines that standardize criteria for diagnosis, and thus to reduce the role of judgment altogether. Perhaps the most famous example is the Apgar score, developed in 1952 by the obstetric anesthesiologist Virginia Apgar. Her score has become a standard tool in assessing newborn babies. Apgar measures the baby’s Appearance (skin color), Pulse (heart rate), Grimace (reflexes), Activity (muscle tone), and Respiration (breathing rate and effort). In the Apgar test, each of these five is given a score. The test still involves some judgment, but it is straightforward to apply. As a result, Apgar scoring greatly reduces noise. In many other areas, too, standardized guidelines are the best path forward. Whether we are speaking of strep throat, heart disease, lung cancer, or COVID-19, such guidelines can reduce the role of judgment and cut both error and noise.

A third strategy is to use technology. Doctors are now using deep learning algorithms and artificial intelligence to reduce noise. Such algorithms have been used to detect lymph node metastases in women with breast cancer. The best of these have been found to be superior to the best pathologist, and of course algorithms are not noisy. Deep learning algorithms have also been used, with significant success, to detect eye problems associated with diabetes. Artificial intelligence is also being pursued as a potential means of improving the accuracy of mammogram reading. It seems clear that in the future, the medical profession will increasingly rely on algorithms.

What is true of medical judgments is also true of the judgments professionals make in all fields. There is noise in the judgments of asylum judges who decide who is admitted to the country, of bail judges who decide who will be free and who will be behind bars while awaiting trial, and of federal judges who sentence convicted offenders. There is noise in patent offices, in child protection services, and in forensic science laboratories. There is noise in forecasting, in recruiting, and in personnel evaluation. There is noise in the day-to-day judgments of corporations and in their strategic moves. There is noise in the policy decisions of elected officials. In short, wherever there is judgment, there is noise – and more of it than you think.

As the case of medicine illustrates, when noise is identified, it can be reduced, and judgments can be improved. Yet noise remains largely unmeasured and unrecognized. Noise: A Flaw in Human Judgment offers an in-depth analysis of the problem of noise, illustrated by case studies in many fields, and describes a large set of tools that can be deployed to reduce it. But above all, it is a call to action. In government, in business, and elsewhere, noise causes tragic errors and rampant injustice. It is a large societal problem. We should do much more to address it.
