International Clinical Trials, Summer 2011

The Profitable Pause

Clinical development teams can use interim analyses to improve efficiency in drug development, providing both cost and ethical benefits, as David Underwood at Quanticate demonstrates

Interim analyses are widely used in clinical trials and they offer companies the opportunity to stop studies early in cases where it appears that the primary objective will not be achieved (often referred to as ‘stopping for futility’). They also offer the opportunity to stop studies early when there is clear evidence that the primary objective has already been met (often referred to as ‘stopping for efficacy’ or ‘stopping for success’). This article shows how the Bayesian statistical framework is highly appropriate for planning and executing interim analyses. The formulation of decision rules on the basis of posterior and predictive probability is introduced and evaluated using a case study. Simulations are included to assess the merits of the different decision rules and the relative timing of the interim analysis. Using these methods in clinical drug development can result in efficient studies that make the best use of resources while improving the likelihood of success.


Where studies are designed within the traditional hypothesis testing framework, the most common approach to decision-making at the interim analysis is to perform a test of the primary hypothesis based on the data gathered up to that point in time. If a statistically significant result is obtained, the study is stopped for success. The difficulty with this approach is that the potential for a false positive outcome from the study is increased. This is because there are now two (or more) opportunities to declare success (each time an interim analysis is performed, and also at the end of the study). To prevent this, an adjustment is usually made to the significance level (alpha) at which the hypothesis is assessed. As a result, the overall sample size for the study (if the study is not stopped at the interim analysis) usually needs to be increased, otherwise there is the potential for loss of power when the hypothesis is tested at the end of the study. In other words, there is a penalty to pay for introducing the option of an early stop. There is, therefore, a trade-off to be made between the potential benefit of being able to stop the study early, with an overall reduction in sample size, and the increased sample size needed if it becomes necessary to run the study to completion.


The Bayesian framework lends itself quite naturally to application in interim analysis. The basic structure of the Bayesian approach involves:
  • Having a belief about the likely magnitude of effect of the new chemical entity (NCE), and being able to express how confident one is that one’s belief is correct (prior belief)
  • Gathering some data to explore what the likely magnitude of effect of the NCE might be (likelihood)
  • Updating the belief about the effect of the NCE, based on the data collected (posterior belief)
For now, we will assume that the prior position is one of ignorance (called a ‘non-informative’ prior) – that is, all possible values of the magnitude of effect are equally likely.

When applying Bayesian approaches to the area of interim analysis, we simply go round this cycle. The first time round, our prior belief is that of ignorance. We then collect some data, up to the first interim analysis, from which we derive our first posterior belief. This becomes our prior belief for the next cycle; we gather some more data, conduct another analysis combining our new prior belief and our new data, and update our posterior belief. This continues until we have enough posterior belief to conclude that either our NCE is effective, or that collecting more data would be unlikely to result in success (that is, any further efforts are futile). This approach is illustrated in Figure 1.
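This update cycle can be sketched in a few lines of code. The example below uses the standard normal-normal conjugate update, assuming (for simplicity) normally distributed effect estimates with known variances; the batch estimates and variances are invented for illustration and are not figures from the case study.

```python
# Bayesian update cycle: the posterior after one batch of data becomes
# the prior for the next batch. Normal-normal conjugate update.
def update_normal(prior_mean, prior_var, data_mean, data_var):
    """Combine a normal prior with a normal likelihood."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / data_var)
    post_mean = post_var * (prior_mean / prior_var + data_mean / data_var)
    return post_mean, post_var

# Start from a vague ("non-informative") prior on the effect
mean, var = 0.0, 1e6

# Two batches of data, e.g. up to the interim analysis and after it
# (each: observed effect estimate and the variance of that estimate)
for batch_mean, batch_var in [(0.35, 0.07), (0.42, 0.07)]:
    mean, var = update_normal(mean, var, batch_mean, batch_var)

print(round(mean, 3), round(var, 3))  # final posterior belief
```

Because the batches carry equal information here, the posterior mean lands midway between the two batch estimates, with roughly half the variance of either one.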


How should a clinical development team decide whether they have gathered enough evidence to stop the trial? There are two simple approaches that are straightforward to use.

Posterior Probability
The first is to calculate, at the interim analysis, the posterior probability that the true difference between the test compound and the control is greater than the target effect. Given the data accrued so far, this answers the questions: 'what is the probability that our test compound is efficacious?' and 'what is the probability that our test compound delivers the effect that we need?' If these probabilities are sufficiently high (or low), the study can be stopped and no further data need be collected.
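With a flat prior and a normal approximation to the likelihood, this posterior probability reduces to a normal tail area. The sketch below assumes a known standard deviation; the interim numbers (an observed standardised difference of 0.45 with standard error 0.26, roughly 30 subjects per group) are purely illustrative.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def posterior_prob_exceeds(est_diff, se, target):
    """P(true difference > target | data): with a flat prior and a normal
    likelihood, the posterior is Normal(est_diff, se**2)."""
    return 1.0 - norm_cdf((target - est_diff) / se)

# Illustrative interim numbers (not from the article)
p_effective = posterior_prob_exceeds(0.45, 0.26, 0.0)  # better than control?
p_target = posterior_prob_exceeds(0.45, 0.26, 0.4)     # exceeds the target?
print(round(p_effective, 3), round(p_target, 3))
```

In this hypothetical case the compound is very likely better than control, but the evidence that it exceeds the target effect is far weaker, which is exactly the distinction the two questions above draw out.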

Predictive Probability
A second approach calculates the predictive probability of achieving a successful result at the end of the study. This can be particularly helpful if statistical criteria for determining the success or failure of the study have been clearly defined. The predictive probability answers the question: 'what is the probability, given the data that we have gathered so far, and the planned additional number of subjects to be recruited, that at the end of the study we will meet our criteria for success?' If this predictive probability at the interim is low, the study can be stopped for futility, since there is enough evidence to show that it is unlikely the study will be successful if it is continued through to the end. It is worth noting at this point that even if the planned analysis at the end of the study is to be performed in a hypothesis testing (Frequentist) framework, and not a Bayesian one, it is still possible to use Bayesian predictive probability to make decisions at an interim analysis, without requiring an adjustment to the significance level (alpha).
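A minimal Monte Carlo sketch of this calculation, under a normal approximation with a flat prior: draw a plausible true effect from the interim posterior, simulate the data still to come, and count how often the combined final estimate would clear the success criterion. The interim estimate, standard errors and the 0.9 success threshold are illustrative assumptions, not figures from the article.

```python
import math, random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def predictive_prob_success(d1, se1, se2, n_sims=20000, seed=1):
    """Monte Carlo predictive probability that the final analysis succeeds,
    where success means posterior P(effect > 0) >= 0.9, i.e. the final
    estimate divided by its SE reaches Phi^-1(0.9). Normal approximation,
    flat prior throughout."""
    z_success = 1.2816                        # Phi^-1(0.9)
    prec1, prec2 = 1.0/se1**2, 1.0/se2**2
    se_rest = math.sqrt(1.0/(prec2 - prec1))  # SE of the estimate from the
                                              # data still to be collected
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        theta = rng.gauss(d1, se1)            # draw truth from interim posterior
        d_rest = rng.gauss(theta, se_rest)    # simulate the remaining data
        d_final = (prec1*d1 + (prec2 - prec1)*d_rest) / prec2
        hits += d_final / se2 >= z_success    # would the final analysis succeed?
    return hits / n_sims

# Illustrative: a weak interim signal (0.1 SD units) after 30 per group,
# with 80 per group planned in total
p = predictive_prob_success(0.1, (2/30)**0.5, (2/80)**0.5)
print(round(p, 3))
```

A weak interim signal like this one yields a predictive probability of success of roughly 30 per cent; whether that triggers a futility stop depends on the threshold the team has pre-agreed.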

With both approaches, it should be noted that there is a risk of making a wrong decision at the interim stage. The study could be stopped for futility when, in truth, the test compound is effective, and continuing the study may have demonstrated this. Alternatively, the study could be stopped early for success when, in truth, the test compound is not effective, and the further studies that follow then result in failure.

The risk of making a wrong decision depends on the timing of the interim analysis (early interim analysis offers the opportunity for greater cost savings, but with an increased risk of a wrong decision) and the threshold probabilities for stop/continue decisions at the interim analysis. Therefore, it is important before the study begins to agree on what the planned stopping thresholds will be, as well as to assess up front the risks of making wrong decisions. This assessment will be illustrated in the following case study.


In the case study, the desired treatment effect will be expressed in terms of the effect size (ES), where ES = treatment difference/standard deviation. For example, a test compound that delivers a difference in treatment means (compared to a reference therapy) of 3.5 points, where the standard deviation of the outcome measure is 10 points, has an ES of 0.35.

Suppose you have an NCE that is about to enter its first efficacy study. It is unknown at this stage what effect the NCE is likely to deliver, but an effect of 0.4 or higher would mean the NCE was well positioned for your particular indication. You have already agreed your decision rules at the end of the study as follows:
  • Success: the posterior probability that the NCE is better than placebo is at least 0.9
  • Failure: otherwise
The study has been designed as a parallel group study comparing NCE to placebo, and 80 subjects per group are planned, based on previous work which has shown that this sample size has a good chance of success if the true effect is 0.4 or greater, and is unlikely to lead to success if the true effect is close to 0. Table 1 summarises the operating characteristics of the design without an interim analysis.
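Operating characteristics like those summarised in Table 1 can be computed in closed form under a normal approximation with a known standard deviation and a flat prior. The sketch below reproduces the general pattern the article describes (roughly 90 per cent success at a true ES of 0.4, and about a 10 per cent false positive rate at ES = 0); the exact figures depend on the modelling assumptions.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_success(true_es, n_per_group, z_success=1.2816):
    """Chance that the end-of-study rule (posterior P(ES > 0) >= 0.9 under
    a flat prior) declares success. With a normal approximation this
    requires estimate/SE >= Phi^-1(0.9) = 1.2816."""
    se = math.sqrt(2.0 / n_per_group)   # SE of the effect-size estimate
    return 1.0 - norm_cdf(z_success - true_es / se)

# Operating characteristics of the fixed design, 80 subjects per group
for es in (0.0, 0.4):
    print(es, round(prob_success(es, 80), 3))
```

Under these assumptions, the fixed design with 80 subjects per group gives close to a 90 per cent chance of success at a true ES of 0.4, and a 10 per cent chance of a false positive conclusion at ES = 0.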

Table 1 shows that, if the true effect of the NCE is 0.4 or higher, we have more than a 90 per cent chance of success, based on our planned sample size and decision rules. If, on the other hand, the truth is that the NCE is no different to the comparator (ES=0), there is only a 10 per cent chance of incorrectly concluding that the NCE is effective.

However, because it is not clear what effect the NCE is likely to have, the clinical development team want to include an interim analysis to enable the study to be stopped early for futility – if it is clear that the NCE is unlikely to show a benefit over placebo; or for efficacy – if there is strong evidence that the NCE will deliver an effect greater than 0.4. The team decide for logistical reasons that one interim analysis is sufficient, and feel that an interim analysis after 30 subjects per group have completed the study would be an appropriate point to look at the data. The proposed decision rules for the interim analysis are as follows:
  1. Efficacy: stop if the posterior probability, based on interim data and a non-informative prior, that the true effect size is at least 0.4 is greater than 0.6 (that is, if there is at least a 60 per cent chance that the true effect is greater than the target effect, stop for efficacy)
  2. Futility: stop if the predictive probability of obtaining a successful outcome if the study continues to the end is less than 0.2 (that is, stop if the chance of getting a successful outcome at the end of the study is less than 20 per cent)
Note that the interim efficacy stopping rule looks for evidence that the compound delivers more than the target effect (ES greater than 0.4), and therefore uses a lower evidence threshold (posterior probability greater than 0.6) than the end-of-study success rule, which assesses whether the compound is better than placebo (ES greater than 0) and demands a higher evidence threshold (posterior probability greater than 0.9). Overall, the interim rule is the harder hurdle to clear.


Before the final study design is agreed, it is important to evaluate whether these proposed rules will lead to appropriate decision-making. We want to maximise the chance of stopping for futility if the NCE is not effective, but ensure that we don’t stop for futility if the NCE works as well as we hope. Equally, we want to minimise the chance of stopping for efficacy if the NCE is no better than placebo, but allow ourselves a good chance of stopping for efficacy if the effect of the NCE exceeds our hopes.

We can simulate the probability that these decision rules would lead us to make a decision to stop or continue the study, under the assumption of various possible values for the actual true effect size. In Figure 2, the x-axis represents these possible true effect sizes (with values greater than 0, indicating a benefit of the NCE), and the y-axis shows the chance of making the various decisions at the interim analysis. The results are based on 5,000 simulations.
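A self-contained sketch of such a simulation, under a normal approximation with a known standard deviation and flat priors: the decision thresholds follow the rules above, but the closed-form predictive probability and the numerical details are assumptions of this sketch rather than the article's exact code.

```python
import math, random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def interim_decision(d1, n1=30, n_total=80, target=0.4,
                     eff_threshold=0.6, fut_threshold=0.2):
    """Classify an interim effect-size estimate d1 under the two rules."""
    se1 = math.sqrt(2.0 / n1)        # SE of the estimate at the interim
    se2 = math.sqrt(2.0 / n_total)   # SE if the study runs to the end
    # Rule 1 (efficacy): posterior P(true ES > target) under a flat prior
    if 1.0 - norm_cdf((target - d1) / se1) > eff_threshold:
        return "efficacy"
    # Rule 2 (futility): predictive probability of end-of-study success,
    # where success means posterior P(ES > 0) >= 0.9 (final z >= Phi^-1(0.9))
    z_success = 1.2816
    prec1, prec2 = 1.0/se1**2, 1.0/se2**2
    prec_rest = prec2 - prec1
    sd_pred = (prec_rest / prec2) * math.sqrt(se1**2 + 1.0/prec_rest)
    p_pred = 1.0 - norm_cdf((z_success*se2 - d1) / sd_pred)
    if p_pred < fut_threshold:
        return "futility"
    return "continue"

def operating_chars(true_es, n_sims=5000, seed=2):
    """Proportion of simulated trials reaching each interim decision."""
    rng = random.Random(seed)
    se1 = math.sqrt(2.0 / 30)
    counts = {"efficacy": 0, "futility": 0, "continue": 0}
    for _ in range(n_sims):
        counts[interim_decision(rng.gauss(true_es, se1))] += 1
    return {k: round(v / n_sims, 3) for k, v in counts.items()}

oc_null = operating_chars(0.0)    # NCE no better than placebo
oc_target = operating_chars(0.4)  # NCE delivers the target effect
print(oc_null)
print(oc_target)
```

With these assumptions, the simulated proportions are broadly consistent with the figures quoted in the text: around a 40 per cent chance of an efficacy stop when the true ES is 0.4, and just over half the trials stopping for futility when the true ES is 0.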

The green long-dashed line shows us that if the NCE is worse than the placebo (true effect of less than 0), it is very unlikely (less than a four per cent chance) that the study would be stopped early for efficacy. However, if the NCE is in truth better than hoped, with an effect size of 0.4 or higher, there is a reasonable chance of stopping for efficacy (40 per cent for an ES of 0.4, but greater than 80 per cent for an ES of 0.7 or higher).

The red short-dashed line shows us that if the NCE is worse than the placebo, there is a more than 50 per cent chance that the study will be stopped at the interim analysis for futility. However, if the effect of the NCE is 0.4 or greater, it is unlikely the study would be stopped for futility (less than eight per cent chance).

For completeness, the solid orange line shows us the chance of deciding to continue the study to the end. As might be expected, this is most likely to happen where the true effect shows a small to moderate benefit over placebo. From Figure 2, we can see that the proposed decision rules should lead to sensible decision-making at the interim analysis.


We also need to check that including the interim analysis doesn’t greatly affect the chances of making appropriate decisions at the end of the study. Table 2 shows the chance of concluding a success or failure at the end of the study. Here, success is defined as either stopping for efficacy at the interim, or continuing at the interim and meeting the success criteria at the end of the study. Likewise, failure is defined as stopping for futility at the interim, or continuing at the interim and failing to meet the success criteria at the end of the study.

Comparing the results in Table 2 with those in Table 1, we can see that including the interim analysis has not significantly affected the chance of an overall successful outcome. The chance of concluding a successful study, if in truth the ES is 0.4, has fallen from 90 to 86 per cent, primarily because of the risk of incorrectly stopping at the interim, but this is only a small cost compared to the potential benefit of early stopping. The chance of concluding a successful study when in truth the compound is no better than placebo (ES=0) has only marginally increased from 10 to 11 per cent, and this is unlikely to be of concern.


Finally, the team are interested in whether the interim analysis should be performed at an earlier or later point in the study. Often, the timing of interim analyses is driven by considerations such as recruitment rates and the collection of appropriate endpoint data. However, it is useful to look at the statistical benefits of different timings for the interim analyses. We can reproduce Figure 2 and include graphs for different sample sizes at the interim. In Figure 3, we can see the chance of different outcomes at the interim analysis when the interim is performed after n1=15, 30 or 45 subjects per group. The total sample size if the study runs to completion remains fixed at 80 subjects per group. The results are based on 5,000 simulations.

In Figure 3, the solid lines, short-dashed lines and mixed dashed lines represent 15, 30 and 45 subjects per group at the interim analysis respectively. Red shows the chance of stopping for futility, green the chance of stopping for efficacy, and orange the chance of continuing the study to complete 80 subjects per group.

If the interim analysis is conducted after 15 subjects per group instead of 30 per group, we can see from Figure 3 that the chance of stopping early for an NCE that has no benefit is much lower (for example, 44 per cent versus 55 per cent when the true ES=0). Similarly, for strong effect sizes of greater than 0.5, the chance of stopping for success is much lower if the interim is performed after 15 instead of 30 subjects per group. So, although there may be a perceived advantage in conducting the interim analysis earlier, the penalty is that the chances of stopping the study early (when it would be appropriate to do so) are somewhat reduced.

We can also see that by conducting the interim analysis after 45 instead of 30 subjects per group, our chances of being able to stop early are increased – although not by a great amount, and therefore it is likely that the benefit of an earlier interim with 30 subjects per group will outweigh the slightly improved operating characteristics of waiting for 45 subjects per group.

Graphs such as Figure 3 can be produced for a variety of different sample sizes, and also a variety of different rules for interim decision-making. This can help to facilitate discussion among clinical development teams and ensure that a statistically robust design is selected, together with an appropriately planned and timed interim analysis.


It has been shown that the Bayesian statistical framework is highly appropriate for planning and executing interim analyses. The concepts of posterior probability and predictive probability are intuitive for making decisions about continuation or early stopping, and can be used at interim analyses even if the final planned analysis is to be performed in the classical Frequentist hypothesis testing framework. Simulations of proposed designs can help assess the performance of different decision rules and assist in determining the sample size and timing of the interim analysis. Bayesian approaches are often simpler to interpret than Frequentist methods and allow teams to weigh the evidence in support of different effects. These methods make the best use of resources while improving the likelihood of success.

David Underwood is CEO and Chairman of Quanticate. He has been in the pharmaceutical industry for over 30 years, starting his career as a statistician at GlaxoSmithKline. David started his own company 15 years ago to provide specialist biometrics services, and fully understands that data and their interpretation are the final product of clinical trials, and that their importance cannot be overstated. Part of this remit is the provision of statistical consultancy expertise to the industry. He is delighted to present this paper on behalf of the statistical consultancy team on the use of interim analysis to improve the efficiency of drug development.
©2000-2011 Samedan Ltd.