Too Many Events
A recent consultancy exposed me to a situation that I'd never thought about before, but one that is probably not rare. To preserve confidentiality and to sharpen the issues, I have here changed some of the details.
My client had developed New Drug (ND), and they sponsored a randomized, double-blind, Phase 3 trial in patients who all had a Disease associated with increased mortality. An Old Drug (OD) is widely used in the treatment of patients with Disease, but OD is incompletely effective in protecting patients from the Disease-related increase in mortality. In the trial, patients with Disease were randomized to receive either OD (ROD) or the sponsor's New Drug (RND). The intended analysis was on the basis of intention-to-treat (ITT).
During the trial, ND became available for another indication. Patients in my client's trial began to drop out of randomized treatment, apparently (for most of them) to switch to open-label ND (OLND). Patients who thus switched from RND to OLND were making no change, but those who switched from ROD to OLND were likely to be receiving a survival benefit that would distort the intention-to-treat analysis. As events accrued early in the trial, the trial's statistical power was likely to increase, but after a certain point, as more and more events no longer informed the ND/OD comparison, the statistical power was going to decrease with each additional event. This was new.
The point was not that after unblinding at the end of the trial, some alternative analysis might have produced a better p-value. That's true all the time. The point was that before the trial started, the estimates that were used in power calculations could have been used to estimate the optimal time to stop the trial. Moreover, some of those estimates (for example, the hazard function describing the rate at which subjects would be lost to follow-up) could have been refined during the trial without breaking the blind, directing the trialists to a new optimal stopping time. Such on-the-fly adjustment is not different in principle from what is done in event-driven trials, where the stopping time is adjusted in response to refined estimates of the hazard function for the events of the primary endpoint.
A weakly analogous situation arises when simple event-counting is used for overlong periods. For example, it is ordinary to compare acute interventions on the basis of crude 90-day mortality. If one instead chose 90-year mortality, no two interventions could ever be distinguished, because nearly every patient in either arm would have died by then. That is why the analysis of longer-term outcomes always relies on some sort of time-to-event statistic.
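The event-counting point can be made concrete with a toy calculation. This is only a sketch: the function name `crude_mortality_gap`, the constant hazards, and the values 2.0 and 1.0 deaths per patient-year are all assumptions chosen for illustration, not figures from any trial.

```python
import math

def crude_mortality_gap(hazard_a, hazard_b, years):
    """Difference in cumulative mortality between two arms at a fixed horizon.

    Assumes a constant (exponential) hazard in each arm, which is a
    simplification chosen only to keep the arithmetic transparent.
    """
    died_a = 1.0 - math.exp(-hazard_a * years)
    died_b = 1.0 - math.exp(-hazard_b * years)
    return died_a - died_b

# Hypothetical hazards, in deaths per patient-year.
for label, years in [("90 days", 90 / 365.25), ("90 years", 90.0)]:
    gap = crude_mortality_gap(2.0, 1.0, years)
    print(f"{label}: difference in crude mortality = {gap:.3g}")
```

Under these assumed hazards the 90-day gap is substantial, while the 90-year gap is numerically indistinguishable from zero, since essentially everyone in both arms has died.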
In a trial like the ND/OD trial, there appears to be no ITT-compatible analysis scheme to cope with the anti-informative late events. It is ordinary for accumulating events to provide diminishing returns as one creeps into the upper end of a sigmoid, but here accumulating events are of value for a while and then become actually harmful to the statistical soundness of the analysis. It is appropriate to want to stop such a trial (before breaking the blind) when its informativeness is likely to be maximal.
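The rise and fall of power can be sketched in a few lines. What follows is a rough illustration under strong assumptions: all patients enter at time zero, hazards are constant, patients leave ROD for OLND at a constant rate and immediately acquire the ND hazard, there is no other loss to follow-up, and the expected log-rank z-score is taken from a large-sample approximation (a weighted sum of log hazard ratios over the expected events). The function name `power_curve` and every parameter value are hypothetical.

```python
import math
from statistics import NormalDist

def power_curve(n_per_arm, h_od, h_nd, switch_rate, t_max, dt=0.01, alpha=0.05):
    """Approximate ITT log-rank power after each follow-up time up to t_max.

    Hazards and switch_rate are per patient-year. Returns (time, power) pairs.
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    unswitched = 1.0  # fraction of the ROD arm alive and still on OD
    switched = 0.0    # fraction of the ROD arm alive and on open-label ND
    treated = 1.0     # fraction of the RND arm alive
    num = 0.0         # accumulated (log hazard ratio) x (information)
    info = 0.0        # accumulated information, roughly events / 4
    curve = []
    for i in range(1, int(round(t_max / dt)) + 1):
        ctrl_alive = unswitched + switched
        # Population-level hazard in the ROD arm: a mix of OD and ND hazards.
        h_ctrl = (h_od * unswitched + h_nd * switched) / ctrl_alive
        events = n_per_arm * dt * (h_ctrl * ctrl_alive + h_nd * treated)
        p = treated / (treated + ctrl_alive)  # share of the risk set on RND
        weight = p * (1 - p) * events
        num += math.log(h_ctrl / h_nd) * weight
        info += weight
        # Euler step for the three compartments.
        newly_switched = switch_rate * unswitched * dt
        unswitched -= (h_od + switch_rate) * unswitched * dt
        switched += newly_switched - h_nd * switched * dt
        treated -= h_nd * treated * dt
        expected_z = num / math.sqrt(info)
        curve.append((i * dt, normal.cdf(expected_z - z_crit)))
    return curve

# Hypothetical inputs: 1000 patients per arm, OD hazard 0.20/yr,
# ND hazard 0.14/yr, and ROD patients switching to OLND at 0.5/yr.
curve = power_curve(1000, 0.20, 0.14, 0.5, t_max=10.0)
best_t, best_power = max(curve, key=lambda tp: tp[1])
print(f"approximate power peaks at {best_power:.2f} after {best_t:.1f} years")
print(f"approximate power after 10.0 years: {curve[-1][1]:.2f}")
```

With `switch_rate` set to zero, the same approximation gives power that only increases with follow-up, which is the ordinary diminishing-returns case. Once switching is turned on, the late events add to the denominator of the z-score while contributing almost nothing to the numerator, so the curve peaks and then declines.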
I initially modeled the process with a spreadsheet, but now I've produced a more general program (updated 2013-04-08). One can plug in values for the population sizes, hazard-function parameters, and so on, and watch the statistical power of a trial rise and then fall. The weakening of the test statistics is generally not dramatic, but the important observation is that a trial can go on for months, wasting time, money, and patient events, while its scientific result is quietly becoming weaker instead of stronger.
(Versions of the program before 220.127.116.11 had a bug that led to occasional crashes during initialization. It is now fixed.)
Page revised: 2016-06-27 10:14