FRAUD AND THE STATISTICIAN'S ROLE IN PROTECTING THE
INTEGRITY OF CLINICAL RESEARCH

by Dr. Paul Catalano


There has been much upheaval in the world of clinical trials lately over fraud and scientific misconduct. This article outlines some possible reasons why fraud occurs and discusses how sound statistical practice can deter misconduct and limit the biases it produces, both in large-scale multi-center studies and in smaller single-institution investigations.

But first, a bit of history: in 1990, a data manager for the National Surgical Adjuvant Breast and Bowel Project (NSABP), a large U.S. multi-center cancer cooperative group, noticed a discrepancy in a breast cancer patient's date of surgery on two data forms submitted from St. Luc Hospital in Montreal. The story from here is long and complex, but the discovery ultimately led to a lengthy series of audits and investigations into all St. Luc patients entered on a dozen NSABP protocols during the period 1977-1990. Out of 1,511 St. Luc cases examined, 115 falsifications of data were discovered, involving a total of 99 patients. Most of the instances involved altering eligibility information so that ineligible patients could be entered on NSABP clinical trials. The physician/scientist responsible for the misrepresentations was debarred from receiving federal grants and prohibited from serving on any review boards or advisory committees for eight years.

Much of the recent publicity concerned not the misconduct itself (which was investigated during 1991-1992 and made public in 1993) but the NSABP's delay in reporting how removing the St. Luc data affected the scientific outcomes of the studies. The actions taken against the NSABP (including the removal of two senior clinical directors and the chief biostatistician) remain a source of controversy in the scientific community; for example, Richard Peto presented a highly critical view of the politics of the incident at a recent meeting of the Society for Clinical Trials.

From a statistical standpoint, though, it is interesting to note the nature of the misrepresentations in the St. Luc data: no randomization assignments were breached and no outcome data were altered; in all but one case, only eligibility data were affected. Furthermore, all assigned treatments were given as specified and all patients were followed per protocol. Fortunately, re-analysis of the affected clinical trials after removing all St. Luc data (including cases that had been entered properly) produced no qualitative change in the studies' scientific conclusions.

What might be some of the reasons for falsifying patient information in clinical research? The issues are complex and multi-faceted. It is no secret that medical researchers have much to gain, personally and professionally, by publishing positive scientific findings. Thus there may be a built-in bias toward enrolling patients who are expected to do "best" on trial. In addition, the researcher may have a stake in the therapy under investigation (pharmaceutical companies, for example, have a clear vested interest in their own products). Study incentives, such as being paid or otherwise credited for entering patients on trial, may also encourage misrepresentation. In a more patient-centric light, many physicians view clinical research as a mechanism for providing excellent medical care and follow-up for their patients, so entering patients on trial can be a very attractive option with potential gain for both researcher and patient. Hence there may be many incentives for anything from "stretching the rules" of patient eligibility to outright fraud in reporting clinical outcomes.

There are several statistical and data management procedures that can protect against data alteration in a multi-center clinical trial. Central data collection and management, together with routine auditing of institution records by the cooperative group, are probably the most effective safeguards for reliable and accurate reporting of patient information. Eligibility checks by data management staff and other quality control procedures also provide a level of oversight that makes it quite difficult for investigators to break protocol guidelines or falsify data. In fact, central NSABP data management was responsible for the original discovery of the altered St. Luc eligibility information.
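The kind of check that caught the St. Luc discrepancy, comparing the same field across two independently submitted forms, is easy to automate. The following is a minimal Python sketch, assuming each form's data have already been loaded into a dictionary keyed by patient ID. The function name, the field name surgery_date, the form names, and the patient records are all hypothetical illustrations, not any cooperative group's actual system.

```python
from datetime import date

def find_date_discrepancies(form_a, form_b, field="surgery_date"):
    """Return (patient_id, value_on_a, value_on_b) for every patient whose
    `field` disagrees between two independently submitted forms.

    form_a, form_b: dicts mapping patient ID -> dict of recorded fields.
    A patient (or field) present on only one form is also reported,
    with None standing in for the missing value.
    """
    discrepancies = []
    for pid in sorted(set(form_a) | set(form_b)):
        value_a = form_a.get(pid, {}).get(field)
        value_b = form_b.get(pid, {}).get(field)
        if value_a != value_b:
            discrepancies.append((pid, value_a, value_b))
    return discrepancies

# Hypothetical records from a registration form and a later follow-up form.
registration = {"P001": {"surgery_date": date(1990, 3, 2)},
                "P002": {"surgery_date": date(1990, 4, 10)}}
follow_up = {"P001": {"surgery_date": date(1990, 3, 2)},
             "P002": {"surgery_date": date(1990, 5, 10)}}

for pid, a, b in find_date_discrepancies(registration, follow_up):
    print(f"{pid}: registration says {a}, follow-up says {b}")
```

A flagged record is not evidence of fraud by itself; most such discrepancies will be clerical errors, and the point is simply that they are routed to a person for query.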

Such quality control measures also protect against simple carelessness and human error in data recording. Indeed, carelessness in recording results and keeping records is probably far more common than any deliberate alteration of patient information or outcome data.

Even though the St. Luc data were altered, the results of the studies were unaffected. Why were the studies robust to the St. Luc alterations? First, the trials were large-scale, multi-center studies, which kept the impact of any one institution small. In large cooperative group trials, often only a small percentage of cases are enrolled through any single hospital. Thus, in addition to making study conclusions more generalizable, incorporating many institutions reduces the influence of any one center or physician, especially if treatment assignments are balanced within institution. Second, because patients were randomized to treatment, ineligible patients were allocated to the treatment arms by chance, so entering them served only to dilute any true treatment difference that may have existed. Randomization also prevented bias from skewing the patient population on any specific treatment arm. (This, of course, would not have been a safeguard if outcome data had been altered.) Third, the studies enrolled large numbers of patients and therefore retained high statistical power for treatment comparisons even after some of the data had to be removed from the analysis. In the end, the scientific integrity of the research was preserved largely through sound statistical design and modern clinical trials methodology.
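The dilution and power arguments can be made concrete with a back-of-the-envelope calculation. The sketch below is a simplified illustration, not the analysis the NSABP performed: it assumes a continuous outcome with a common standard deviation in both arms, assumes ineligible patients derive no benefit from treatment, and uses the normal approximation to the power of a two-sided two-sample test. The function names and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def diluted_effect(true_effect, frac_ineligible):
    """Expected observed treatment difference when a fraction of enrolled
    patients are ineligible and (by assumption) gain nothing from treatment.
    Randomization spreads those patients across arms by chance, so on
    average they shrink the difference toward zero rather than bias it."""
    return (1.0 - frac_ineligible) * true_effect

def approx_power(effect, sd, n_per_arm, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample comparison of
    means with common standard deviation `sd` and `n_per_arm` per arm.
    Ignores the negligible chance of rejecting in the wrong direction."""
    std_normal = NormalDist()
    se = sd * sqrt(2.0 / n_per_arm)           # standard error of the difference
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(abs(effect) / se - z_crit)

# Hypothetical scenario: true effect of 0.2 SD, 10% of patients ineligible.
true_effect, sd, frac = 0.2, 1.0, 0.10
observed = diluted_effect(true_effect, frac)
for n in (100, 1000):
    print(f"n={n} per arm: power {approx_power(true_effect, sd, n):.3f} "
          f"without dilution, {approx_power(observed, sd, n):.3f} with it")
```

With these hypothetical numbers, 10% ineligible enrollment lowers power only slightly in the larger trial, while the smaller trial is badly underpowered either way; the same amount of dilution is far more damaging when the sample is small.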

Most research is conducted not in large cooperative groups but in individual institutions, often entirely by individual investigators. What safeguards can statisticians use to protect the integrity of these studies? Because much (if not all) of the control over the conduct of a single-investigator study rests with that investigator, small-scale research poses a much harder problem for monitoring what gets recorded and ultimately forwarded to the statistician for analysis. There may also be the honest "mistake" of judging some aspects of the research, or particular endpoints, not relevant. Or the investigator may regard particular patients as outliers and exclude them from the analysis. The investigator may not perceive such sins of omission as outright scientific fraud, but they can nonetheless produce misleadingly positive results when only the "interesting" outcomes advance to publication. The statistician may often be the only person in a position to question study results at the level of the raw data.

Collecting all of the relevant data is therefore crucial to describing the study population precisely and reporting the results of a small-scale study accurately. One way to accomplish this is to work with the investigator BEFORE the data are collected to clearly define the study protocol, objectives, and primary outcomes. Another is to establish a central computer database for all study records. This last point may seem trivial and obvious, but keeping a direct line to the primary data allows the statistician to monitor the conduct of the trial and establishes standardized procedures for the investigators to follow. It also helps ensure that all study subjects are registered as participants in the research.
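A central database also makes it straightforward to reconcile the list of registered subjects against the dataset actually submitted for analysis, so that quietly dropped "outliers" or unregistered cases surface before publication. Below is a minimal Python sketch, assuming subject IDs can be read from both sources; the function name and the subject IDs are hypothetical.

```python
def reconcile(registered_ids, analyzed_ids):
    """Compare the central registry with the dataset submitted for analysis.

    Returns two sorted lists:
      missing    - registered subjects absent from the analysis dataset
                   (possible unreported exclusions, e.g. "outliers")
      unexpected - subjects in the analysis dataset who were never registered
    """
    registered = set(registered_ids)
    analyzed = set(analyzed_ids)
    missing = sorted(registered - analyzed)
    unexpected = sorted(analyzed - registered)
    return missing, unexpected

# Hypothetical registry and submitted analysis dataset.
registry = ["S01", "S02", "S03", "S04", "S05"]
submitted = ["S01", "S02", "S04", "S06"]

missing, unexpected = reconcile(registry, submitted)
print("Registered but not analyzed:", missing)        # ['S03', 'S05']
print("Analyzed but never registered:", unexpected)   # ['S06']
```

Each subject on the first list should then be accounted for with a documented, protocol-specified reason for exclusion, and each subject on the second list should be traced back to the primary records.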

Recent news coverage may suggest otherwise, but none of this is meant to imply that fraud in clinical research is rampant. Its prevalence is difficult to estimate, but many believe that, by and large, investigators maintain high scientific standards and want to do proper research. Nonetheless, clinical studies should be as robust as possible to misrepresentation of data. This article has outlined some ways in which sound statistical practice and forethought in design can minimize the impact of fraud in clinical research.

Last modified 1995/09/14 13:38:57 by Ribika Moses (moses@hsph.harvard.edu)