

International Clinical Trials

Clinical development teams can use interim analyses to improve
efficiency in drug development, providing both cost and ethical
benefits, as David Underwood at Quanticate demonstrates
Interim analyses are widely used in clinical trials and they offer
companies the opportunity to stop studies early in cases where it
appears that the primary objective will not be achieved (often referred
to as ‘stopping for futility’). They also offer the opportunity to stop
studies early when there is clear evidence that the primary objective
has already been met (often referred to as ‘stopping for efficacy’ or
‘stopping for success’). This article shows how the Bayesian
statistical framework is highly appropriate for planning and executing
interim analyses. The formulation of decision rules on the basis of
posterior and predictive probability is introduced and evaluated using
a case study. Simulations are included to assess the merits of the
different decision rules and the relative timing of the interim
analysis. Using these methods in clinical drug development can result
in efficient studies that make the best use of resources while
improving the likelihood of success.
INTERIM ANALYSIS IN THE TRADITIONAL (FREQUENTIST) FRAMEWORK
Where studies are designed within the traditional hypothesis testing
framework, the most common approach to decision-making at the interim
analysis is to perform a test of the primary hypothesis based on the
data gathered up to that point in time. If a statistically significant
result is obtained, the study is stopped for success. The difficulty
with this approach is that the potential for a false positive outcome
from the study is increased. This is because there are now two (or
more) opportunities to declare success (each time an interim analysis
is performed, and also at the end of the study). To prevent this, an
adjustment is usually made to the significance level (alpha) at which
the hypothesis is assessed. As a result, the overall sample size for
the study (if the study is not stopped at the interim analysis) usually
needs to be increased, otherwise there is the potential for loss of
power when the hypothesis is tested at the end of the study. In other
words, there is a penalty to pay for introducing the option of an early
stop. There is, therefore, a trade-off to be made between the potential
benefit of being able to stop the study early, with an overall
reduction in sample size, versus the increased sample size if it
becomes necessary to run the study to completion.
INTERIM ANALYSIS IN THE BAYESIAN FRAMEWORK
The Bayesian framework lends itself quite naturally to application in
interim analysis. The basic structure of the Bayesian approach
involves:
• Having a belief about the likely magnitude of effect of the new chemical entity (NCE), and being able to express how confident one is that one’s belief is correct (prior belief)
• Gathering some data to explore what the likely magnitude of effect of the NCE might be (likelihood)
• Updating the belief about the effect of the NCE, based on the data collected (posterior belief)
For now, we will assume that the prior position is one of ignorance
(called a ‘non-informative’ prior) – that is, all possible values of
the magnitude of effect are equally likely.
When applying Bayesian approaches to the area of interim analysis, we
simply go round this cycle. The first time round, our prior belief is
that of ignorance. We then collect some data, up to the first interim
analysis, from which we derive our first posterior belief. This becomes
our prior belief for the next cycle; we gather some more data, conduct
another analysis combining our new prior belief and our new data, and
update our posterior belief. This continues until we have enough
posterior belief to conclude that either our NCE is effective, or that
collecting more data would be unlikely to result in success (that is,
any further efforts are futile). This approach is illustrated in Figure
1.
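This update cycle can be sketched numerically. Below is a minimal illustration, assuming a normal model with known observation variance and a conjugate normal prior; the function name and the numbers are ours, chosen purely for illustration:

```python
def update_normal(prior_mean, prior_var, data_mean, obs_var, n):
    """Conjugate normal update: combine a normal prior with the mean of n
    observations (known per-observation variance) to get the posterior."""
    data_var = obs_var / n  # variance of the sample mean
    post_var = 1.0 / (1.0 / prior_var + 1.0 / data_var)
    post_mean = post_var * (prior_mean / prior_var + data_mean / data_var)
    return post_mean, post_var

# Cycle 1: near-non-informative prior (huge variance), then 30 observations
# with mean 0.35 -- the posterior essentially equals the data
mean1, var1 = update_normal(0.0, 1e6, 0.35, 1.0, 30)

# Cycle 2: that posterior becomes the prior; 50 more observations with
# mean 0.42 pull the belief towards the pooled estimate
mean2, var2 = update_normal(mean1, var1, 0.42, 1.0, 50)
```

Each pass through the cycle shrinks the posterior variance, reflecting the growing weight of evidence as data accumulates.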
COLLECT DATA: CHECK. NOW WHAT?
How should a clinical development team decide whether they have
gathered enough evidence to stop the trial? There are two simple
approaches that are straightforward to use.
Posterior Probability
The first is to calculate, at the interim analysis, the posterior
probability that the true difference between the test compound and the
control is greater than the target effect. This can answer the
questions, given the accrued data so far, ‘what is the probability that
our test compound is efficacious?’ and ‘what is the probability that
our test compound delivers the effect that we need?’ If these
probabilities are sufficiently high (or low), the study can be stopped
and no further data needs to be collected.
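For a normal posterior, both of these questions reduce to a tail probability. A brief sketch using Python's standard library, with illustrative posterior values of our own (not figures from the article):

```python
from statistics import NormalDist

def posterior_prob_exceeds(post_mean, post_sd, target):
    """P(true treatment difference > target) under a normal posterior."""
    return 1.0 - NormalDist(post_mean, post_sd).cdf(target)

# Illustrative posterior for the effect size: mean 0.45, sd 0.18
p_efficacious = posterior_prob_exceeds(0.45, 0.18, 0.0)  # better than control?
p_target_met = posterior_prob_exceeds(0.45, 0.18, 0.4)   # delivers the effect we need?
```

Here the compound is very likely better than control, but the evidence that it reaches the full target effect is weaker, which is exactly the distinction the two questions are designed to draw out.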
Predictive Probability
A second approach calculates the predictive probability of achieving a
successful result at the end of the study. This can be particularly
helpful if the statistical criteria for determining the success or
failure of the study have been clearly defined. The predictive
probability answers the question: ‘what is the probability, given the
data that we have gathered so far, and the planned additional number of
subjects to be recruited, that at the end of the study we will meet our
criteria for success?’ If this predictive probability at the interim is
low, the study can be stopped for futility, since there is enough
evidence to show that it is unlikely the study will be successful if it
is continued through to the end. It is worth noting at this point that
even if the planned analysis at the end of the study is to be performed
in a hypothesis testing (Frequentist) framework, and not Bayesian, it
is still possible to use Bayesian predictive probability to make
decisions at an interim analysis, without requiring an adjustment to
the significance level (alpha).
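One way to compute a predictive probability is by Monte Carlo: draw a plausible true effect from the interim posterior, simulate the remaining subjects, and apply the end-of-study rule. A sketch of this idea, assuming an effect-size scale (observation sd = 1, two parallel groups, so an observed difference in means has variance 2/n), a non-informative prior, and an end-of-study rule of the form P(effect > 0 | all data) > 0.9; all names are ours:

```python
import random
from statistics import NormalDist

def predictive_prob_success(interim_mean, n_done, n_total,
                            success_level=0.9, n_sims=20000, seed=1):
    """Monte Carlo predictive probability of end-of-study success."""
    rng = random.Random(seed)
    n_remaining = n_total - n_done
    interim_sd = (2.0 / n_done) ** 0.5   # posterior sd at the interim
    final_sd = (2.0 / n_total) ** 0.5    # posterior sd at study end
    successes = 0
    for _ in range(n_sims):
        # Draw a plausible true effect from the interim posterior
        true_effect = rng.gauss(interim_mean, interim_sd)
        # Simulate the mean of the remaining subjects' data
        rem_mean = rng.gauss(true_effect, (2.0 / n_remaining) ** 0.5)
        # Pool interim and remaining data into the final posterior mean
        final_mean = (n_done * interim_mean + n_remaining * rem_mean) / n_total
        if 1.0 - NormalDist(final_mean, final_sd).cdf(0.0) > success_level:
            successes += 1
    return successes / n_sims
```

For example, `predictive_prob_success(0.35, 30, 80)` estimates the chance that a study showing an observed effect size of 0.35 after 30 of 80 planned subjects per group will still meet its success rule at the end.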
With both approaches, it should be noted that there is a risk that a
wrong decision could be made at the interim stage. This could result in
the study being stopped for futility when actually in truth the test
compound is effective, and if the study had continued, this may have
been demonstrated. Alternatively the study could be stopped early for
success, but in truth the test compound is not effective, and further
studies are then conducted which result in failure.
The risk of making a wrong decision depends on the timing of the
interim analysis (early interim analysis offers the opportunity for
greater cost savings, but with an increased risk of a wrong decision)
and the threshold probabilities for stop/continue decisions at the
interim analysis. Therefore, it is important before the study begins to
agree on what the planned stopping thresholds will be, as well as to
assess up front the risks of making wrong decisions. This assessment
will be illustrated in the following case study.
USING POSTERIOR & PREDICTIVE PROBABILITY TO MAKE DECISIONS AT AN INTERIM ANALYSIS
In the case study, the desired treatment effect will be expressed in
terms of the effect size (ES); where ES = treatment difference/standard
deviation. For example, a test compound that delivers a difference in
treatment means (compared to a reference therapy) of 3.5
points, where the standard deviation of the outcome measure is 10 points, has an ES of 0.35.
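The worked example is a one-line calculation:

```python
def effect_size(treatment_diff, sd):
    """Standardised effect size: difference in treatment means divided by
    the standard deviation of the outcome measure."""
    return treatment_diff / sd

es = effect_size(3.5, 10.0)  # 0.35, as in the worked example
```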
Suppose you have an NCE that is about to enter its first efficacy study.
It is unknown at this stage what effect the NCE is likely to deliver,
but an effect of 0.4 or higher would mean the NCE was well positioned
for your particular indication. You have already agreed your decision
rules at the end of the study as follows:
• Success: the posterior probability that the NCE is better than placebo is at least 0.9
• Failure: otherwise
The study has been designed as a parallel group study comparing NCE to
placebo, and 80 subjects per group are planned, based on previous work
which has shown that this sample size has a good chance of success if
the true effect is 0.4 or greater, and is unlikely to lead to success
if the true effect is close to 0. Table 1 summarises the operating
characteristics of the design without an interim analysis.
Table 1 shows that, if the true effect of the NCE is 0.4 or higher, we
have more than a 90 per cent chance of success, based on our planned
sample size and decision rules. If, on the other hand, the truth is
that the NCE is no different to the comparator (ES=0), there is only a
10 per cent chance of incorrectly concluding that the NCE is effective.
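Operating characteristics of this kind can be computed in closed form under a normal model. A sketch, assuming an effect-size scale (observation sd = 1), a non-informative prior and the end-of-study success rule above; the function name is ours:

```python
from statistics import NormalDist

def prob_end_success(true_es, n_per_group=80, success_level=0.9):
    """Chance that the end-of-study rule P(ES > 0 | data) > success_level
    is met, for a given true effect size (closed form, no simulation)."""
    sd = (2.0 / n_per_group) ** 0.5       # sd of the observed effect size
    z = NormalDist().inv_cdf(success_level)
    c = z * sd                            # observed ES needed for success
    return 1.0 - NormalDist(true_es, sd).cdf(c)
```

Under these assumptions, `prob_end_success(0.4)` is roughly 0.89 and `prob_end_success(0.0)` is 0.10, in line with the chances quoted for Table 1.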
However, because it is not clear what effect the NCE is likely to have,
the clinical development team want to include an interim analysis to
enable the study to be stopped early for futility – if it is clear that
the NCE is unlikely to show a benefit over placebo; or for efficacy –
if there is strong evidence that the NCE will deliver an effect greater
than 0.4. The team decide for logistical reasons that one interim
analysis is sufficient, and feel that an interim analysis after 30
subjects per group have completed the study would be an appropriate
point to look at the data. The proposed decision rules for the interim
analysis are as follows:
• Efficacy: stop if the posterior probability, based on interim data and a non-informative prior, that the true effect size is at least 0.4 is greater than 0.6 (that is, if there is at least a 60 per cent chance that the true effect is greater than the target effect, stop for efficacy)
• Futility: stop if the predictive probability of obtaining a successful outcome if the study continues to the end is less than 0.2 (that is, stop if the chance of getting a successful outcome at the end of the study is less than 20 per cent)
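These two rules can be written down as a simple decision function. A sketch with the case study's thresholds as defaults; the function and argument names are ours:

```python
def interim_decision(p_effect_above_target, pred_prob_success,
                     efficacy_threshold=0.6, futility_threshold=0.2):
    """Apply the interim rules: stop for efficacy if P(ES > target) exceeds
    the efficacy threshold, stop for futility if the predictive probability
    of end-of-study success falls below the futility threshold, else continue."""
    if p_effect_above_target > efficacy_threshold:
        return "stop for efficacy"
    if pred_prob_success < futility_threshold:
        return "stop for futility"
    return "continue"
```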
It should be noted that the interim efficacy stopping rule looks for
evidence that the compound delivers more than the target effect (ES
greater than 0.4), and therefore uses a lower evidence threshold
(posterior probability greater than 0.6) than the end-of-study success
rule, which assesses only whether the compound is better than placebo
(ES greater than 0) and commands a higher evidence threshold (posterior
probability greater than 0.9). Because its effect target is the more
demanding one, the interim rule is overall the harder hurdle to achieve.
MAKING THE RIGHT DECISIONS
Before the final study design is agreed, it is important to evaluate
whether these proposed rules will lead to appropriate decision-making.
We want to maximise the chance of stopping for futility if the NCE is
not effective, but ensure that we don’t stop for futility if the NCE
works as well as we hope. Equally, we want to minimise the chance of
stopping for efficacy if the NCE is no better than placebo, but allow
ourselves a good chance of stopping for efficacy if the effect of the
NCE exceeds our hopes.
We can simulate the probability that these decision rules would lead us
to make a decision to stop or continue the study, under the assumption
of various possible values for the actual true effect size. In Figure
2, the x-axis represents these possible true effect sizes (with values
greater than 0 indicating a benefit of the NCE), and the y-axis shows
the chance of making the various decisions at the interim analysis. The
results are based on 5,000 simulations.
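A simulation in this spirit can be sketched as follows, assuming the same normal effect-size model and non-informative prior as above, with the predictive probability computed in closed form; the numbers it produces are approximate and will not exactly reproduce Figure 2:

```python
import random
from statistics import NormalDist

def predictive_prob(d1, n1, n_total, success_level=0.9):
    """Closed-form predictive probability of end-of-study success given an
    observed effect size d1 after n1 subjects per group (sd = 1 scale)."""
    n2 = n_total - n1
    c = NormalDist().inv_cdf(success_level) * (2.0 / n_total) ** 0.5
    pred_sd = (n2 / n_total) * (2.0 / n1 + 2.0 / n2) ** 0.5
    return 1.0 - NormalDist(d1, pred_sd).cdf(c)

def simulate_interim(true_es, n1=30, n_total=80, n_sims=5000, seed=2):
    """Estimate the chance of each interim decision for one true effect size."""
    rng = random.Random(seed)
    sd1 = (2.0 / n1) ** 0.5  # sd of the observed interim effect size
    counts = {"efficacy": 0, "futility": 0, "continue": 0}
    for _ in range(n_sims):
        d1 = rng.gauss(true_es, sd1)
        # Efficacy rule: P(ES > 0.4 | interim data) > 0.6
        if 1.0 - NormalDist(d1, sd1).cdf(0.4) > 0.6:
            counts["efficacy"] += 1
        # Futility rule: predictive probability of final success < 0.2
        elif predictive_prob(d1, n1, n_total) < 0.2:
            counts["futility"] += 1
        else:
            counts["continue"] += 1
    return {k: v / n_sims for k, v in counts.items()}
```

Calling `simulate_interim` across a grid of true effect sizes traces out curves like those in Figure 2, and varying `n1` (for example 15, 30 and 45 subjects per group) yields the kind of timing comparison discussed later.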
The green long-dashed line shows us that if the NCE is worse than the
placebo (true effect of less than 0), it is very unlikely (less than a
four per cent chance) that the study would be stopped early for
efficacy. However, if the NCE is in truth better than hoped, with an
effect size of 0.4 or higher, there is a reasonable chance of stopping
for efficacy (40 per cent for an ES of 0.4, but greater than 80 per
cent for an ES of 0.7 or higher).
The red short-dashed line shows us that if the NCE is worse than the
placebo, there is a more than 50 per cent chance that the study will be
stopped at the interim analysis for futility. However, if the effect of
the NCE is 0.4 or greater, it is unlikely the study would be stopped
for futility (less than eight per cent chance).
For completeness, the solid orange line shows us the chance of deciding
to continue the study to the end. As might be expected, this is most
likely to happen where the true effect shows a small to moderate
benefit over placebo. From Figure 2, we can see that the proposed
decision rules should lead to sensible decision-making at the interim
analysis.
THE INTERIM EFFECT ON OVERALL STUDY OUTCOME
We also need to check that including the interim analysis doesn’t
greatly affect the chances of making appropriate decisions at the end
of the study. Table 2 shows the chance of concluding a success or
failure at the end of the study. Here, success is defined as either
stopping for efficacy at the interim, or continuing at the interim and
meeting the success criteria at the end of the study. Likewise, failure
is defined as stopping for futility at the interim, or continuing at
the interim and failing to meet the success criteria at the end of the
study.
Comparing the results in Table 2 with those in Table 1, we can see that
including the interim analysis has not significantly affected the
chance of an overall successful outcome. The chance of concluding a
successful study, if in truth the ES is 0.4, has fallen from 90 to 86
per cent, primarily because of the risk of incorrectly stopping at the
interim, but this is only a small hit compared to the potential benefit
of early stopping. The chance of concluding a successful study when in
truth the compound is no better than placebo (ES=0) has only marginally
increased from 10 to 11 per cent, and this is unlikely to be of
concern.
TIMING IS EVERYTHING
Finally, the team are interested in whether the interim analysis should
be performed at an earlier or later point in the study. Often, the
timing of interim analyses is driven by considerations such as
recruitment rates and the collection of appropriate endpoint data.
However, it is useful to look at the statistical benefits of different
timings for the interim analyses. We can reproduce Figure 2 and include
graphs for different sample sizes at the interim. In Figure 3, we can
see the chance of different outcomes at the interim analysis when the
interim is performed after n1=15, 30 or 45 subjects per group. The
total sample size if the study runs to completion remains fixed at 80
subjects per group. The results are based on 5,000 simulations.
In Figure 3, the solid lines, short-dashed lines and mixed dashed lines
represent 15, 30 and 45 subjects per group at the interim analysis
respectively. Red shows the chance of stopping for futility, green the
chance of stopping for efficacy, and orange the chance of continuing
the study to complete 80 subjects per group.
If the interim analysis is conducted after 15 subjects per group
instead of 30 per group, we can see from Figure 3 that the chance of
stopping early for futility when the NCE has no benefit is much lower
(for example, 44 per cent versus 55 per cent when the true ES=0). Similarly,
for strong effect sizes of greater than 0.5, the chance of stopping for
success is much lower if the interim is performed after 15 instead of
30 subjects per group. So, although there may be a perceived advantage
in conducting the interim analysis earlier, the penalty is that the
chances of stopping the study early (when it would be appropriate to do
so) are somewhat reduced.
We can also see that by conducting the interim analysis after 45
instead of 30 subjects per group, our chances of being able to stop
early are increased – although not by a great amount, and therefore it
is likely that the benefit of an earlier interim with 30 subjects per
group will outweigh the slightly improved operating characteristics of
waiting for 45 subjects per group.
Graphs such as Figure 3 can be produced for a variety of different
sample sizes, and also a variety of different rules for interim
decision-making. This can help to facilitate discussion among clinical
development teams and ensure that a statistically robust design is
selected, together with an appropriately planned and timed interim
analysis.
CONCLUSION
It has been shown that the Bayesian statistical framework is very
appropriate for planning and executing interim analyses. The concepts
of posterior probability and predictive probability are intuitive for
making decisions about continuation or early stopping, and can be used
at interim analyses even if the final planned analysis is to be
performed in the classical Frequentist hypothesis testing framework.
Simulations of proposed designs can help assess the performance of
different decision rules and assist in the determination of the sample
size and timing of the interim analysis. Bayesian approaches are often
simpler to interpret than Frequentist methods and allow teams to
consider the evidence in support of different effects. These methods
make the best use of the resources while ensuring successful results
are achievable.
