An Official Journal of the European Federation of Medical Informatics

About EJBI Editorial Board Instructions for Authors Browse EJBI Special Issues Sponsorship & Ads Contact
ISSN 1801-5603
English
Czech English

Methods of the Survival Analysis

1. Faculty of Medicine and Dentristy, Palacky University, Olomouc, Czech Republic

Supervisor: Doc. Zdeněk Valenta, M.Sc., M.S., PhD.

Abstract

The survival analysis is a set of statistical methods dealing with time-to-event data. In biomedical applications the event of interest is usually relapse of the disease or death. A special feature of the survival analysis is censoring and truncation of data. When censoring or truncation occurs some information about the patients’ survival is lost, e.g. some patients are lost to follow-up or the study ends before all the patients die. The survival analysis methods are used for estimation of the survival time distribution, for identification of risk factors that affect the survival time, and also for predicting the survival time when risk factors are present. Survival analysis methods have been further developed by the means of counting processes and martingale theory. Univariate survival analysis methods have been extended to multivariate setting. The multivariate survival analysis covers the field where independence between survival times cannot be assumed. Multi-state models and frailty models represent the two main approaches of multivariate methods.

Keywords: survival analysis, survival function, hazard function, cumulative hazard function, censoring, truncation, Kaplan-Meier estimator, Nelson-Aalen estimator, Cox PH model, partial likelihood, counting process, history, filtration, martingale, competing risk, multi-state models, frailty models

1. Introduction

The survival analysis is a collection of statistical methods for analyzing time-to-event data. The commencement of the survival analysis dates back to the 18th century when analyses of mortality experience of human populations started. During the World War II, the survival analysis focused on engineering – reliability of military equipment was being analyzed. After the World War II the interest turned towards economics and medicine. In 1960s, after the fundamental article of E. L. Kaplan and P. Meier [9] had been published, medical applications of the survival analysis shifted to the center of statistical focus.

2. Basic concepts in the survival analysis

2.1 The survival and hazard function

Let X be the time until some specified event occurs, i.e. X is a non-negative real valued random variable having continuous distribution with finite expectation. There are several functions characterizing the distribution of X:
  •  The probability density of X: f(x), x≥0.

  •  The survival function:

where F(x) is the cumulative distribution function. The survival function describes the probability of an individual surviving beyond time x (experiencing the event after time x).
  • The hazard function:


for all x>0. The hazard function represents a conditional probability rate at which an individual alive at time will experience an event in the next instant. There is a close relationship between the hazard and the survival functions:


  •   The cumulative hazard function:

Thus


If X is a discrete random variable taking values x1<x2<... with associated probability mass function f(x1)=P(X=x1), i=1,2,..., the survival function is

the hazard x1 at is

where S(x-)=limt→x-S(t). The survival function and probability mass function can be also written as (see [8])

More generally, the distribution of X may have both discrete and continuous components. The approach to the discrete and continuous parts can be unified through the notion of a product integral: Let λc(x) be the continuous component of the hazard function, and let λ1, λ2,... be the discrete components at times x1<x2<... .The overall survival function is then

and the cumulative hazard function is

Let dΛ(x) be a differential increment of the cumulative hazard over the interval [x,x+dx):

The survival function in the discrete, continuous, or mixed cases can then be written as

where


is the product integral [8].

 

2.2 Censoring and truncation

Survival data possess a special feature of censoring, compared to other statistical data. Censoring is used when the survival time is not known exactly, the event is only known to have occurred within some time interval. There are several types of censoring: right, left and interval. In biomedical applications, right censoring is the most common type of censoring. It occurs when the survival time is incomplete on the right-hand side of the follow-up period, i.e. the study ends before all patients experience the event or a patient is lost to follow-up (dies due to reasons other than the event of interest, withdraws from the study, moves to another city, etc.).

Let X1,X2,...,Xn be independent and identically distributed (i.i.d.) survival times and C1,C2,...Cn be i.i.d. censoring times. The lifetime Xi of the i-th individual will be known if, and only if, Xi<Ci. If Ci<Xi the event time will be censored at Ci. Thus it is convenient to represent the survival experience of a group of patients by the pairs of random variables  (Ti, δi), where Ti=min(Xi,Ci), δi=I(Xi<Ci), and I is an indicator of the event's occurring, having value one if the event occurs, and zero otherwise.

Another feature, common in survival data, is a truncation. Truncation occurs when only those individuals whose event time lies within a certain time interval (TL, TR) are observed. For left truncation, TR=∞, in case of right truncation, TL=0. Individuals, whose event time is not in this interval, are not observed and no information on these subjects is available. This is in contrast to censoring where there is at least partial information available on each patient. When data are truncated, a conditional distribution has to be used in constructing the likelihood (see [10]).

A critical assumption for the likelihood construction is the independence of lifetimes and censoring times. Censoring is said to be independent if the failure rates that apply to individuals on trial at each time t > 0 are the same as those that would have applied had there been no censoring [8]. Thus the requirement is that at each time t,

where Y(t)=1 indicates that the individual is at risk of failure at time t (has neither failed nor been censored prior to t).

2.3 Counting processes and martingales

An alternative approach to develop inference procedures for censored data involves counting processes. A counting process N={N(t),t≥0} is a stochastic process with N(0)=0, whose value at time t counts the number of events that have occurred in the interval  (0,t]. The sample paths (realizations) of N are nondecreasing, right-continuous step functions that jump whenever an event (or events) occur. In the counting process formulation, the pair of variables (Ti, δi) introduced in Section 2.2 is replaced with the pair of functions Ni(t),Yi(t),i=1,...n, where

Ni(t) is a counting process, while Yi(t) is a predictable process, i.e. a process whose value at time t is known infinitesimally before t, at time t-. This process has left-continuous sample paths. Right-censored survival data are included in this formulation as a special case: Ni(t)=I(Tit,δi=1) and Yi(t)=I(Ti≥t).

To deal with all on-study information of each patient, a term history (or filtration) is used. A history, denoted {Ft,t≥0} is a σ-algebra generated by Ni and Yi:
where Yi(s+)=limu→s+Yi(u). Thus Ft contains the information up to and including time t. The information in Ft increases with increasing time on study, i.e.  for s≤t [4]. Let dNi(t) denote the increment of Ni over the time interval [t,t+dt).
dNi(t)=Ni((t+dt)-)-Ni(t-).

For each t>0, let

denote the full history of the processes Ni(s),Yi(s),i=1,...,n, up to but not including t. Then (see [4]):

where λi(t) is the hazard function. The process

is called the intensity process. At each fixed t, this process is a random variable which approximates the number of jumps by Ni over (0,t]. In fact, ENi(t)=EΛi(t) and thus


For any given i, define the process


                          (1)
Equivalently, the process can be defined

where dMi(t)=dNi(t)-Yi(t)λi(t)dt.

It can be seen that


for all t and


for all s≤t [4]. A process that satisfies these (equivalent) conditions is a martingale. According to the Doob-Meier decomposition theorem (see [4]), any counting process may be uniquely decomposed as a sum of a martingale and a compensator C which is a predictable, right-continuous process with C(0)=0. As an example, according to (1)
 
(2)
where Mi(t) is the counting process martingale corresponding to Ni(t) and Λi(t) is the compensator of the counting process Ni with respect to the filtration Ft-. In terms of differential increments, the process (2) can be equivalently written as
 
dNi(t)=dMi(t)+Yi(t)λi(t)dt.

The approach using martingale methods is very useful in yielding results for censored and truncated data, especially for calculating and verifying asymptotic properties of test statistics and estimators.

3. Non-parametric and semi-parametric models

A principle objective of the survival analysis focuses on estimation of basic quantities (the survival and hazard function) based on censored data. To analyze survival data parametrically, assumptions about the distribution of the failure times would have to be made. To avoid such assumptions, it is common to use non-parametric models. The simplest non-parametric estimate of a distribution function is the empirical distribution function

when a continuous distribution is estimated by a discrete one. For an uncensored sample of n distinct failure times, the empirical survival function is then estimated by Sn(t)=1–Fn(t). The only problem with this approach is the censoring – it is not taken into account in standard statistical methods. Important steps in the development of appropriate methods were done by Kaplan and Meier [9] and Cox [3].

3.1 Kaplan-Meier and Nelson-Aalen estimators

The Kaplan-Meier estimator (called also the product-limit estimator) estimates the survival function by

where there are di events observed at time ti and Ri is the number of individuals still at risk at time ti (uncensored survivors just before ti). The variance of the estimator can be estimated using Greenwood's formula (see [11]):

The product-limit estimator can also be used to estimate the cumulative hazard function:


An alternative estimator of the cumulative hazard function was proposed by Nelson in 1972 [12] and rediscovered by Aalen in 1978 [1]:
 
The variance of the Nelson-Aalen estimator was estimated by Aalen using counting process techniques and is given by

Based on the Nelson-Aalen estimator of the cumulative hazard function, an alternative estimator of the survival function becomes


Suppose now that n individuals from a homogeneous population are put on a study at time 0. Let Ni be the counting process and Yi be the at-risk process of the i-th individual, as described in Section 2.3. Let
 
and

N.(t) denotes the total number of observed failures in the interval, while Y.(t) is the number of individuals in the entire study group that are at risk at time t. The Nelson-Aalen estimator of the cumulative hazard can be written in the counting process notation as
where J(u)=I(Y.(u)>0) with the convention that 0/0 is interpreted as 0 [8]. The Kaplan-Meier estimator of the survival function is then

A common interest is to compare two or more samples, i.e. to test whether there is a significant difference in survival experience of distinct groups of patients. Several generalizations of standard non-parametric tests have been developed to deal with censored and truncated data. The most common tests are the log-rank test, Gehan-Wilcoxon test and Peto-Peto test. For more information, see e.g. [10].

3.2 The Cox model

In clinical studies we typically examine the association of several risk factors with the occurrence of the event of interest. Various patients' characteristics may be associated with patients' survival experience (e.g. age, sex, blood pressure, ...). The Cox proportional hazards model has become a popular approach to modeling covariate effects on survival. In this model the intensity process (hazard) for the i-th subject is
λi(t)=Yi(t)λ0(t)exp(βTXi),
 
where Yi(t) is the at-risk process, λ0 is the baseline hazard (common to all individuals in the study population), Xi is the vector of covariates of individual i and β is a vector of unknown regression parameters. In this model, the ratio of hazard functions of two individuals is constant (the baseline hazard λ0(t) is canceled out), thus the temporal effect is separated from the effect of the covariates. Estimation of the regression coefficients is based on maximizing of the partial likelihood function, which was introduced by Cox in 1972 [3]. The partial likelihood function for β reads

where t1<...<tk are the uncensored failure times of the study group, R(tj) is the set of subjects at risk of failure at time t-j (just prior to time tj), and Xj denotes the covariate vector for an individual failing at tj. The partial likelihood function is treated as a standard likelihood, and inference is carried out by usual means.

4. Multivariate survival analysis

In most clinical applications the univariate survival analysis assumes that the observed survival times are mutually independent (i.i.d. failure times). In practice, however, dependence can occur for very different kinds of data, e.g. survival of twins or other several individuals, similar organs, recurrent events or multi-state events. The multivariate survival analysis covers the field where independence between survival times cannot be assumed. According to [7], the various approaches to analyzing multivariate survival data fall into four main categories: multi-state models, frailty models, marginal modeling and non-parametric methods. The data structure should be considered as well. The data can be parallel (where the number of failures is fixed by the design of the study) or longitudinal (where the number of failures is random for each object under study). The data sets are classified into six types: several individuals, similar organs, recurrent events, repeated measurements, different events and competing risks. Relation of the data types to the two main approaches of analysis (multi-state and frailty models) is described in Table 1. Only these two approaches to analyzing multivariate survival data are presented in this paper. For more information on marginal and non-parametric methods, see e.g. [7], [16], or [8].

Tab. 1. Overview of data types and approaches; x means relevant, blank not relevant. Adopted from [7].

Type of data
Multi-state 
Frailty 
Several individuals
xx
Similar organs
xx
Recurrent events
xx
Repeated measurements 
 x
Different events
x 
Competing risks
x 

 

4.1 Competing risk and multi-state models

Multi-state models are commonly used for describing the development of longitudinal data. They model stochastic processes, which at any time point occupy one of a set of discrete states. In medicine, the states can be e.g. healthy, diseased, and dead. A change of state is called a transition. The competing risk model is an example of multi-state modeling. In competing risks, various causes of death "compete" in the life of patient, and occurrence of one event precludes occurrence of the other events. There are generally three areas of interest in the analysis of competing risks [8]:

  1. Studying the relationship between a vector of covariates and the rate of occurrence of specific types of failure.

  2. Analyzing whether patients at high risk of one type of failure are also at high risk for others.

  3. Estimating the risk of one type of failure after removing others.

Suppose that individuals under study can experience any one of m distinct failure types. For each individual, the underlying failure time T and a covariate vector X are known. The overall hazard function at time t is

To model competing risks, a cause-specific hazard function is considered:

for j=1,...,m, J is a random variable representing the type of failure, and t>0. In words, λj(t,X) specifies the rate of type j failures, given X and in the presence of all other failure types [8]. If only one of the m failure types can occur, then
due to the law of total probability.

It is possible to calculate the Kaplan-Meier estimator for each type of failure separately, but it is difficult to give this a survival function interpretation and therefore this is not recommended [7]. Instead, generalizations of the Kaplan-Meier and Nelson-Aalen estimators can be made (see e.g. [8]). The generalized estimator includes all causes of failure and is usually denoted the Aalen-Johanson estimator.

The Cox model for the cause-specific hazard functions can be considered:

Both the baseline hazards λ0j and the regression coefficients βj vary arbitrarily over the m failure types. Estimation and comparison of the coefficients  βj can be conducted by applying asymptotic likelihood techniques individually to the m factors.

A traditional approach to multi-state models is based on the Markov models. Consider first a homogeneous population with no covariates. Let A(t) be the state occupied at time t,t≥0, with probability model of A(t) being the Markov process. The individuals under study move among m>1 discrete states. If a randomly chosen individual is in state i at time t-, the transition rate (or intensity) from i to j at time t is given by

which holds for all A(u),0≤u<t with A(t-)=i and i,j {1,...,m}, j≠i. The process is memoryless in that only the current state occupied is relevant in specifying the transition rates [8]. In the continuous case, ij(t)=λij(t)dt for all i,j=1,...,m, so that λij(t),i≠j is the continuous-time intensity function for i-to-j transitions. Estimation of the cumulative intensity functions Λij(t) proceeds as follows [8]: consider a possibly right-censored sample of n individuals. For k=1,...,n, let Nijk(t) be the right continuous process that counts the number of observed direct i-to-j transitions for k-th individual, i,j=1,...m,i≠j. Let Yik(t) be the corresponding at-risk process. Define the filtration process as

for k=1,...n; i,j=1,...m and suppose that censoring is independent, so that


which must hold for all i, j, k and t>0. The Nelson-Aalen estimator of Λij(t) is then given by

for all i≠j.

When the vector of covariates X is present, the continuous-time modulated Markov model can be specified for the underlying intensity function

Parametric and semi-parametric models for λijk are obtained analogously as earlier and may be found in [8].

4.2 Frailty models

Frailty models represent an extension of the Cox proportional hazards model. The concept of frailty provides a way to introduce random effects into the model to account for association (correlation) and unobserved heterogeneity. This heterogeneity may be difficult to assess but is nevertheless of a great importance. The frailty is an unobserved random factor that modifies multiplicatively the hazard function of an individual or a group of individuals. The key idea of these models is that individuals most "frail" die earlier than the others [16]. The frailty models are relevant to lifetimes of several individuals, similar organs and repeated measurements. They are not generally relevant for the case of different events [7].

First, bivariate models will be considered. Let

S12(t1,t2)=P(T1≥t1, T2≥t2)
be the joint survival function for the two survival times T1 and T2 where S12(t,t) is the probability that both subjects under study will be alive at time t.

The marginal survival functions are then

S1(t1)=P(T1≥t1)=S12(t1,0)
 
S2(t2)=P(T2≥t2)=S12(0,t2).

If T1 and T2 are independent, S12(t1,t2)=S1(t1)S2(t2). The joint hazard function is

 

and the marginal hazards are

for i=1,2. To address heterogeneity in the survival times it is assumed that the lifetimes are conditionally independent, i.e. T1 and T2 are independent given the random effect Z called frailty:

Usually, the frailty is assumed to act multiplicatively on the hazard, so that

for some baseline hazard λ0i(t) and baseline survival function S0i(t) (when known covariates Xi are present, the hazard may be expressed as λ0i(ti)=λ0i(ti)exp(βTXi) through the Cox regression model). Under the assumption of multiplicative frailty, the cumulative hazards are  Λi(ti)=ZΛ0i(ti). The conditional joint survival function is then

As the frailty Z is an unobserved effect, it needs to be ‘integrated out' of the survival function. This is done by the Laplace transform, which is defined for a random variable Z as

where g(z) is the probability density of Z. For the bivariate survival function thus

where the Laplace transform of g(z) is evaluated at s01(t1)+Λ02(t2).

In many applications, the frailty Z is assumed to follow some distribution with the explicit Laplace transform. A standard (and most widely used) distribution for frailty is the gamma distribution. The random variable Z is gamma distributed with parameters k and 
 

if its probability density function is

with


The gamma function in the denominator of the probability density function is defined as

It satisfies Γ(k+1)=kΓ(k). The gamma distribution fits very well to failure data and is also convenient from computational and analytical point of views [19].

Suppose the common frailty component Z has a gamma distribution with parameters k=Θ=1/σ2. The Laplace transform of the gamma density is then
which leads to (see [7])

To extend the bivariate model to a multivariate one, consider a set of clustered data where for the j-th individual in the i-th group (or cluster) there are the observation times tij and the vector of covariates Xij. The assumption is, again, that given Xij and a random effect Zi, the mi lifetimes in the group i are independent. Thus the joint distribution of these lifetimes given Zi is the product of the marginal distributions given Zi. The marginal hazards then satisfy
 
When the hazards are modeled using the Cox proportional hazards,

If the cluster-specific random effects Zi have independent gamma distributions, then the unconditional survival for the mi lifetimes in cluster i is
 
where 
 
 
This can be solved using the Laplace transform (see [14])
 

where

Different choices of distribution for the frailty Z are possible, e.g. the family of positive stable distributions or the PVF (power variance function) family. For more information about these, see [7]. The frailty Z may also be treated non-parametrically. Although it is desirable to have completely non-parametric estimate of the survival function, the estimates are mathematically complicated and are not of major importance [7].
 
Statistical models that use counting process notation and are convenient for these types of analyses are slightly different from those used until now. In the previously used models, the intensity process λ(t) at the follow-up time t given the covariates X was

In this expression it is assumed that jumps in N are of a unit size only. However, recurrent and correlated failure time data include jumps of a size greater than one (more than one event can be recorded for an individual at a specific follow-up time). Thus it is natural to model the mean jump in N across time:
in the cumulative intensity process. The Cox-type model for the intensity process is then
For more details, see [8].

5. Conclusion

The survival analysis is a collection of specific statistical methods. In this paper, a short overview of these methods was presented. The standard univariate models were extended to multivariate models dealing with parallel and longitudinal data. The two major multivariate concepts were introduced: multi-state and frailty models.

Acknowledgement

The paper has been supported by the project SVV-2010-265513.

References

[1]
Aalen O. O.: Nonparametric Inference for a Family of Counting Processes. Annals of Statistics 6 (1978), 701-726. 
[2]Andersen,P. K. Gill R. D.> Cox's regression model for counting processes: a large sample study. The Annals of Statistics 10 (1982), 1100-1120.
[3]Cox  D. R. : Regression Models and Life-Tables. Journal of the Royal Statistical Society B 34 (1972), 187-220.
[4]Fleming  T. R., Harrington D. P.: Counting Processes and Survival Analysis. John Wiley & Sons, New York, 1991.
[5]Fürstová J.: Multivariate Methods of Survival Analysis. Doktorandský den 2010, Matfyzpress, Praha, 2010.
[6]Gill R. D.: Understanding Cox's regression model: a martingale approach. Journal of the American Statistical Association 79 (1984), 441-447.
[7]Hougaard P.: Analysis of Multivariate Survival Data. Springer, New York, 2000.
[8]Kalbfleisch J. D., Prentice R. L.: The Statistical Analysis of Failure Time Data. John Wiley & Sons, New York, 2002.
[9]Kaplan E. L., Meier P.: Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association 53 (1958), 457-481.
[10]Klein J. P., Moeschberger M. L.: Survival Analysis. Techniques for Censored and Truncated Data. Springer, New York, 2003.
[11]Miller R. G., Gong G., Muñoz A.: Survival Analysis. John Wiley & Sons, New York, 1998.
[12]Nelson W.: Theory and Applications of Hazard Plotting for Censored Failure Data. Technometrics 14 (1972), 945-965.
[13]Prentice R. L., Williams B. J., Peterson A. V.: On the regression analysis of multivariate failure time data. Biometrika 68 (1981), 373-379.
[14]Rodríguez G.: Multivariate Survival Models. available at http://data.princeton.edu/, cited on April 10, 2010.
[15]Self  S. G., Prentice R. L.: Commentary on Andersen and Gill's "Cox's regression model for counting processes: a large sample study". The Annals of Statistics 10 (1982), 1121-1124.
[16]Therneau T. M., Grambsch P. M.: Modeling Survival Data. Extending the Cox Model. Springer, New York, 2000.
[17]Vaupel  J. W., Manton K. G., Stallard E.: The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 16 (1979), 439-454.
[18]Wei L. J., Lin D. Y., Weissfeld L.: Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. Journal of the American Statistical Association 84 (1989), 1065-1073.
[19] 
Wienke A.: Frailty Models in Survival Analysis. Habilitation. Martin-Luther-Universität Halle-Wittenberg, 2007. available at http://sundoc.bibliothek.uni-halle.de/habil-online/, cited on September 30, 2009.

Jana Fürstová
Faculty of Medicine and Dentristy
Palacky University
Tř. Svobody 8
771 26  Olomouc
Czech Republic
e-mail:jana.furstova@email.cz