DOCTOR'S ORDERS: HOW DO WE KNOW THEY HELP?
by Drs. Victor DeGruttola and Thomas Fleming

(Dr. Fleming is Chairman, Department of Biostatistics, University of Washington)

Deliberation about the optimal degree of Federal involvement in provision of health care and other social services has dominated the news, but obscured one central point: however such programs are managed, getting the right treatment to patients depends on consensus concerning standards of evidence. And as a nation we are far from having reached consensus even on the mix of professional skills that are required for evaluation of evidence in an era of increasingly complex information systems. In this sense, we face a challenge similar to that of the former Soviet Union trying to create a business culture with no standard notion of accounting. Without a widely disseminated ability to consider and evaluate information, it will be difficult to hold institutions and professional associations accountable for decisions that affect everyone. The potential for waste of scarce resources is vast, especially in the most politicized arenas of health care.

A process for evaluating evidence in criminal proceedings long distinguished England from the rest of Europe, and was an important factor in the development of modern notions of Civil Society. English Law formed a basis for that island's unique development of curbs on organized power; and ideas about due process set a standard that continues to spread throughout the world. Such a process depends on the belief that any full-fledged citizen can comprehend and evaluate testimony.

Fairness in distribution of health care--getting useful treatments to all who could benefit, but limiting access to treatments that have no benefit or even do positive harm--depends equally on widespread understanding of basic principles of evidence. The notion that patients' interests could be fundamentally protected by leaving all decisions to professional associations was shattered by the infamous Tuskegee study of syphilis in African-American men. Although patient and other constituency groups have become increasingly involved in approval of research practices and new drugs, and even development of standards of medical care, formal training in quantitative methods is rarely seen as an important qualification either among policy-makers or their critics. Yet more and more, statistical argument has become the basis for evaluation of evidence and development of policy; at least one AIDS organization, the Treatment Activist Group, has developed a high degree of appreciation for the role of statistical methods in protecting patients' interests.

Although demands by patient advocates for better evaluation of quantitative evidence are unusual, all Americans have a vested interest in the question they pose: How do we know when a treatment is worth taking? It is no surprise that industry representatives sometimes favor a relaxed standard of evidence, and in the climate of deregulation, government has increasingly attempted to accommodate industry. In fact, Newt Gingrich has even applied pressure from Congress, speaking of the job losses that result from FDA regulation. In addition to the FDA, federal research agencies involved in areas like AIDS must demonstrate a spirit of cooperation with industry to maintain access to interesting compounds. Thus arises a world of interlocking interests, which may not always promote a spirit of critical inquiry.

An example of this dynamic was provided by the National Task Force on AIDS Drug Development, set up to speed the development of new AIDS drugs. This highly visible Task Force includes heads of federal agencies and the pharmaceutical industry; its chairman, the Assistant Secretary of Health, was appointed by the President. Although the committee was intended to represent all important interest groups, no one with skills in quantitative evaluation was included--a situation comparable to an imagined first major audit of a privatized factory in Volgograd, in which no one present had actually studied accounting. As expected, this Task Force has considered the question of what constitutes evidence of drug benefit. This issue turns on the question of how well we understand the mechanisms of disease and drug action. Important testimony at a recent meeting propounded the view that changes in viral burden (based on scientific wizardry that permits counting the number of viral particles in blood) explained virtually all of the treatment benefit in a study comparing early versus delayed use of AZT. A message that has been touted at every important AIDS meeting in the past year, this finding has enormous appeal. If the mechanisms of drug action on disease were completely understood, one could license a huge range of drugs based on quick studies of viral burden, and use them in successive combinations. One might use these measures of virus as triggers: whenever these values appear to increase, switch to new drug combinations. If they worked, such approaches might be worth the potential cost in resources and toxicity. If not, however, we could end up with a lot of new treatments and no way of knowing whether the net effect on length or quality of life is even positive. Did the analysis in question really provide support for a full understanding of the role of viral burden? Unfortunately, it does not hold up.

First, if a drug has a beneficial mechanism that is partly mediated through effects on measures of viral burden, and a harmful one that is not, the degree to which viral burden appears to explain benefit increases with greater degrees of harm. In simple words, the more harmful the side effects of the drug, the more completely its benefit appears to be explained. And even if the drug's mechanisms of action were completely understood, any statistician can tell you that such estimates wobble enormously. In fact, we have already seen in AIDS how misleading such markers can be. For example, British and French investigators persevered with a placebo-controlled study of AZT long after all laboratory measures had suggested drug benefit. Long-term results from this landmark trial demonstrated that early and prolonged AZT therapy is not clinically superior to delayed use of AZT, thus establishing the hazard of relying on early marker-based prediction of efficacy. A trial of early versus delayed use of AZT in the U.S., analyzed in the Department, produced similar findings.
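The first point above can be made concrete with a toy calculation. The sketch below is only an illustration under assumptions that are not in the testimony or the trial: effects are treated as simply additive, the "proportion explained" is taken to be the marker-mediated benefit divided by the net observed benefit, and all numbers and function names are hypothetical.

```python
# Toy additive model (an assumption for illustration, not the Task Force analysis).
# A drug has:
#   - a benefit that acts through the marker (e.g. lower viral burden),
#   - a benefit that acts through some other route,
#   - a harm that also bypasses the marker.
# The apparent "proportion of benefit explained by the marker" is taken here as
#   marker-mediated benefit / net observed benefit.


def proportion_explained(benefit_via_marker, benefit_other, harm_other):
    """Apparent share of the net benefit attributed to the marker."""
    net_benefit = benefit_via_marker + benefit_other - harm_other
    if net_benefit <= 0:
        raise ValueError("no net benefit to explain")
    return benefit_via_marker / net_benefit


# Hypothetical units of benefit; only the harm changes from row to row.
BENEFIT_VIA_MARKER = 2.0
BENEFIT_OTHER = 2.0

for harm in (0.0, 1.0, 2.0, 3.0):
    share = proportion_explained(BENEFIT_VIA_MARKER, BENEFIT_OTHER, harm)
    print(f"harm = {harm:.0f}  ->  apparent proportion explained = {share:.0%}")
```

In this toy setting the marker truly accounts for half of the drug's benefit, yet the apparent proportion climbs from 50% with no harm to 100% when the harm cancels the unmediated benefit, and beyond 100% when the harm is larger still.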

AIDS is only one of many disease settings where patients' interests are threatened. Consider our experience with the drugs encainide and flecainide used in cardiovascular disease. Demonstrating that these drugs suppress cardiac arrhythmias, which often occur after heart attacks and increase the risk of sudden death, led to the use of these drugs by 200,000 Americans per year. Fortunately, a proper long-term study in 2000 patients established that the drugs actually increased deaths by a factor of three, in spite of the positive effect on arrhythmias. Such results imply that their widespread use caused an excess of 4000 deaths per year. If such studies are not carried out in the future, will Americans have reason for belief that approved and prescribed drugs are beneficial?
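The arithmetic linking these three figures can be checked with a short sketch. It assumes a simple model in which excess deaths equal the number treated times the baseline annual risk times (risk ratio minus one); the baseline risk it recovers is only what the article's figures imply under that model, not a number reported here, and the function names are hypothetical.

```python
# Back-of-envelope check of the figures quoted above.
# Assumed model (not stated in the article):
#   excess deaths = treated * baseline_risk * (risk_ratio - 1)


def excess_deaths(treated_per_year, baseline_risk, risk_ratio):
    """Deaths per year beyond those expected without the drug."""
    return treated_per_year * baseline_risk * (risk_ratio - 1)


def implied_baseline_risk(treated_per_year, excess_per_year, risk_ratio):
    """Baseline annual death risk that makes the quoted figures consistent."""
    return excess_per_year / (treated_per_year * (risk_ratio - 1))


TREATED_PER_YEAR = 200_000  # Americans using the drugs each year (from the text)
RISK_RATIO = 3              # deaths increased by a factor of three (from the text)
EXCESS_PER_YEAR = 4_000     # excess deaths per year (from the text)

baseline = implied_baseline_risk(TREATED_PER_YEAR, EXCESS_PER_YEAR, RISK_RATIO)
print(f"implied baseline annual risk: {baseline:.1%}")
print(f"excess deaths at that risk: "
      f"{excess_deaths(TREATED_PER_YEAR, baseline, RISK_RATIO):.0f}")
```

Under this simple model the quoted figures are consistent with a baseline risk of about one death per hundred untreated patients per year, which a tripling would raise to about three per hundred.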

While Task Force members openly delighted in evidence purportedly showing viral burden is a "surrogate" for clinical benefit, critical review would have been absent except for one remarkable fact. Some patient activists had insisted on the presence of statisticians at the meeting. Although our objections met with a chilly reception, the cautionary arguments at least got heard. More importantly, so did an important alternative to evaluating therapies based on changes in viral burden, i.e., perform large studies in advanced patients--the ones most in need--and show that these drugs actually slow disease.

Will other such alliances form to question profitable orthodoxies; can appreciation for quantitative ideas grow? Teaching people formal methods of statistical reasoning may be a tall order in a society that increasingly uses TV images as its reference point. Yet statistical thinking does not always rely on complex math or elaborate logic. Everyone uses the basics of such reasoning whenever deciding what time to leave the office to be reasonably sure of making a flight. Education that builds on a basis of common sense about quantifying uncertainty can reach a wide audience with varying levels of skill. Without such an attempt, the public's ability to assure that its interests are being met could shrink to the size of a sound bite in the nightly news.

Last modified $Date: 1995/09/15 13:38:57 $ by Ribika Moses moses@hsph.harvard.edu