
How To Do Survival Analysis In R


Survival analysis, also called time-to-event analysis, models the expected duration of time until an event happens, for example the failure of a mechanical system or a death. In this tutorial the event is a customer churning, and the time is the account length.

Model fitting and methods used:

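# Install the packages once if needed:
# install.packages(c("BDgraph", "survival"))
library(BDgraph)   # provides the churn data set
library(survival)  # Surv(), survfit() and coxph()
data(churn)        # load the data
head(churn)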

##   State Account.Length Area.Code Int.l.Plan VMail.Plan VMail.Message Day.Mins
## 1    KS            128       415         no        yes            25    265.1
## 2    OH            107       415         no        yes            26    161.6
## 3    NJ            137       415         no         no             0    243.4
## 4    OH             84       408        yes         no             0    299.4
## 5    OK             75       415        yes         no             0    166.7
## 6    AL            118       510        yes         no             0    223.4
##   Day.Calls Day.Charge Eve.Mins Eve.Calls Eve.Charge Night.Mins Night.Calls
## 1       110      45.07    197.4        99      16.78      244.7          91
## 2       123      27.47    195.5       103      16.62      254.4         103
## 3       114      41.38    121.2       110      10.30      162.6         104
## 4        71      50.90     61.9        88       5.26      196.9          89
## 5       113      28.34    148.3       122      12.61      186.9         121
## 6        98      37.98    220.6       101      18.75      203.9         118
##   Night.Charge Intl.Mins Intl.Calls Intl.Charge CustServ.Calls Churn
## 1        11.01      10.0          3        2.70              1 False
## 2        11.45      13.7          3        3.70              1 False
## 3         7.32      12.2          5        3.29              0 False
## 4         8.86       6.6          7        1.78              2 False
## 5         8.41      10.1          3        2.73              3 False
## 6         9.18       6.3          6        1.70              0 False

dim(churn)

## [1] 3333   20

# Set up the Surv() object: time = account length, event = churn
dat <- churn[, c("Account.Length", "Churn")]

# Code the event indicator as 0 (False: not churned, censored) and 1 (True: churned)
dat$Churn <- as.numeric(churn$Churn == "True")
survdat <- Surv(time = dat$Account.Length, event = dat$Churn)
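A toy illustration of how Surv() encodes censoring, using three made-up accounts rather than the churn data: an account that had not churned when observation stopped contributes a right-censored time. This is a minimal sketch; `toy_time` and `toy_event` are hypothetical names.

```r
library(survival)

# Hypothetical toy data (not from the churn set): three accounts observed
# for 5, 8 and 12 time units. event = 1 means the account churned;
# event = 0 means it was still active when observation stopped (censored).
toy_time  <- c(5, 8, 12)
toy_event <- c(1, 0, 1)

toy <- Surv(time = toy_time, event = toy_event)
print(toy)  # censored times are printed with a trailing "+", e.g. 8+

# Internally, a right-censored Surv object is a two-column matrix
# with columns "time" and "status"
status <- unclass(toy)[, "status"]
print(status)
```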

# Fit the Kaplan-Meier estimate of the survival function (no covariates)
fit <- survfit(survdat ~ 1, se = TRUE)
plot(fit, main = "Survival Function Plot", xlab = "Time", ylab = "Survival Probability")
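survfit(survdat ~ 1) produces the Kaplan-Meier estimate of the survival function. To make that estimate concrete, here is a sketch that recomputes it in base R on made-up data; `km_by_hand()` is a hypothetical helper written for illustration, not part of the survival package. At each distinct event time t_i the running survival probability is multiplied by (1 - d_i / n_i), where d_i is the number of churn events at t_i and n_i is the number of accounts still at risk just before t_i (accounts censored at t_i are counted as at risk, the usual convention).

```r
# Hypothetical helper: Kaplan-Meier estimate in base R, for illustration only.
# time  = follow-up times; event = 1 for an observed event, 0 if censored.
km_by_hand <- function(time, event) {
  event_times <- sort(unique(time[event == 1]))
  # number still at risk just before each event time (censored ties included)
  n_risk  <- sapply(event_times, function(t) sum(time >= t))
  # number of events at each event time
  n_event <- sapply(event_times, function(t) sum(time == t & event == 1))
  # product-limit step: multiply by (1 - d_i / n_i) at each event time
  surv <- cumprod(1 - n_event / n_risk)
  data.frame(time = event_times, n.risk = n_risk, n.event = n_event, surv = surv)
}

# Made-up toy data (not from the churn set)
toy_time  <- c(2, 3, 3, 5, 8, 8, 9, 12)
toy_event <- c(1, 1, 0, 1, 0, 1, 1, 0)
km <- km_by_hand(toy_time, toy_event)
print(km)
```

On this toy data the estimate steps down to 0.875, 0.75, 0.6, 0.45 and 0.225 at times 2, 3, 5, 8 and 9; the censored account at time 12 does not create a step.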

Median time-to-churn

fit

## Call: survfit(formula = survdat ~ 1, se = TRUE)
##
##       n  events  median 0.95LCL 0.95UCL
##    3333     483     201     188      NA

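The fit output shows that the estimated median time to churn is 201, with a 95% confidence interval whose lower limit is 188. The upper limit is NA because the lower confidence band of the survival curve never falls to 0.5 within the observed account lengths.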
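The median reported by survfit is read off the Kaplan-Meier curve as the first time at which the estimated survival probability falls to 0.5 or below. A minimal sketch of that rule on made-up step-function values; `median_from_km()` is a hypothetical helper, and it does not reproduce survfit's handling of edge cases such as a curve that sits exactly at 0.5 over an interval.

```r
# Hypothetical helper: median survival time from a Kaplan-Meier step function.
# km_time = event times, km_surv = estimated survival just after each time.
median_from_km <- function(km_time, km_surv) {
  below <- which(km_surv <= 0.5)
  if (length(below) == 0) return(NA_real_)  # curve never reaches 0.5
  km_time[min(below)]                        # first time with S(t) <= 0.5
}

# Made-up Kaplan-Meier values (not from the churn data)
km_time <- c(2, 3, 5, 8, 9)
km_surv <- c(0.875, 0.75, 0.6, 0.45, 0.225)
print(median_from_km(km_time, km_surv))

# A curve that never falls to 0.5 gives NA; the same idea applied to the
# lower confidence band is why an upper confidence limit can be NA
print(median_from_km(c(4, 7), c(0.9, 0.7)))
```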

Cox proportional hazards method

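# Fit the Cox proportional hazards model; "." expands to every column of
# cox_dat not already used inside Surv(), so time and event are not covariates
cox_dat <- transform(churn, Churn = dat$Churn)   # Churn as the 0/1 event indicator
fitt <- coxph(Surv(Account.Length, Churn) ~ ., data = cox_dat)
summary(fitt)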

## Call:
## coxph(formula = Surv(time = dat$Account.Length, event = dat$Churn) ~
##     ., data = churn)
##
##   n= 3333, number of events= 483
##
##                      coef  exp(coef)   se(coef)      z Pr(>|z|)   
## StateAL         3.727e-01  1.452e+00  6.815e-01  0.547 0.584509   
## StateAR         7.263e-01  2.067e+00  6.578e-01  1.104 0.269536   
## StateAZ         3.591e-01  1.432e+00  7.679e-01  0.468 0.640011   
## StateCA         1.501e+00  4.484e+00  6.710e-01  2.236 0.025333 * 
## StateCO         3.684e-01  1.445e+00  6.714e-01  0.549 0.583169   
## StateCT         9.994e-01  2.717e+00  6.487e-01  1.541 0.123376   
## StateDC         5.992e-01  1.821e+00  7.352e-01  0.815 0.415027   
## StateDE         9.039e-01  2.469e+00  6.734e-01  1.342 0.179495   
## StateFL         6.055e-01  1.832e+00  6.818e-01  0.888 0.374456   
## StateGA         7.639e-01  2.147e+00  6.832e-01  1.118 0.263471   
## StateHI        -1.553e-01  8.562e-01  8.200e-01 -0.189 0.849830   
## StateIA         6.270e-01  1.872e+00  8.220e-01  0.763 0.445642   
## StateID         9.225e-01  2.516e+00  6.724e-01  1.372 0.170049   
## StateIL         2.504e-01  1.285e+00  7.335e-01  0.341 0.732838   
## StateIN         8.284e-01  2.290e+00  6.709e-01  1.235 0.216887   
## StateKS         9.738e-01  2.648e+00  6.441e-01  1.512 0.130580   
## StateKY         1.377e+00  3.965e+00  6.829e-01  2.017 0.043688 * 
## StateLA         4.333e-01  1.542e+00  7.687e-01  0.564 0.572958   
## StateMA         1.335e+00  3.802e+00  6.545e-01  2.040 0.041311 * 
## StateMD         9.443e-01  2.571e+00  6.314e-01  1.496 0.134760   
## StateME         1.257e+00  3.515e+00  6.487e-01  1.938 0.052649 . 
## StateMI         1.252e+00  3.497e+00  6.344e-01  1.973 0.048456 * 
## StateMN         5.042e-01  1.656e+00  6.498e-01  0.776 0.437808   
## StateMO         4.839e-01  1.622e+00  6.955e-01  0.696 0.486547   
## StateMS         1.210e+00  3.353e+00  6.414e-01  1.886 0.059262 . 
## StateMT         1.612e+00  5.013e+00  6.418e-01  2.512 0.012010 * 
## StateNC         1.687e-01  1.184e+00  6.618e-01  0.255 0.798794   
## StateND         1.417e-01  1.152e+00  7.112e-01  0.199 0.842124   
## StateNE         3.729e-01  1.452e+00  7.373e-01  0.506 0.613038   
## StateNH         1.024e+00  2.784e+00  6.713e-01  1.525 0.127203   
## StateNJ         1.453e+00  4.275e+00  6.297e-01  2.307 0.021041 * 
## StateNM         3.356e-01  1.399e+00  7.127e-01  0.471 0.637719   
## StateNV         1.187e+00  3.278e+00  6.396e-01  1.856 0.063450 . 
## StateNY         7.184e-01  2.051e+00  6.377e-01  1.127 0.259918   
## StateOH         8.039e-01  2.234e+00  6.629e-01  1.213 0.225212   
## StateOK         6.394e-01  1.895e+00  6.744e-01  0.948 0.343108   
## StateOR         7.061e-01  2.026e+00  6.564e-01  1.076 0.282024   
## StatePA         1.153e+00  3.168e+00  6.791e-01  1.698 0.089500 . 
## StateRI         2.414e-01  1.273e+00  7.100e-01  0.340 0.733853   
## StateSC         1.166e+00  3.210e+00  6.472e-01  1.802 0.071541 . 
## StateSD         7.118e-01  2.038e+00  6.808e-01  1.046 0.295734   
## StateTN         8.526e-01  2.346e+00  7.342e-01  1.161 0.245514   
## StateTX         1.101e+00  3.007e+00  6.295e-01  1.749 0.080331 . 
## StateUT         8.669e-01  2.379e+00  6.628e-01  1.308 0.190896   
## StateVA        -3.732e-01  6.885e-01  7.339e-01 -0.508 0.611116   
## StateVT         3.576e-01  1.430e+00  6.831e-01  0.524 0.600624   
## StateWA         1.180e+00  3.255e+00  6.409e-01  1.842 0.065517 . 
## StateWI         4.867e-01  1.627e+00  6.941e-01  0.701 0.483203   
## StateWV         6.826e-01  1.979e+00  6.619e-01  1.031 0.302395   
## StateWY         2.866e-01  1.332e+00  6.710e-01  0.427 0.669252   
## Area.Code       4.925e-04  1.000e+00  1.105e-03  0.446 0.655668   
## Int.l.Planyes   1.329e+00  3.777e+00  1.094e-01 12.152  < 2e-16 ***
## VMail.Planyes  -2.304e+00  9.988e-02  5.165e-01 -4.460 8.19e-06 ***
## VMail.Message   4.801e-02  1.049e+00  1.590e-02  3.020 0.002527 **
## Day.Mins       -3.636e-01  6.952e-01  2.727e+00 -0.133 0.893932   
## Day.Calls       1.262e-03  1.001e+00  2.299e-03  0.549 0.582966   
## Day.Charge      2.193e+00  8.959e+00  1.604e+01  0.137 0.891270   
## Eve.Mins        3.242e-01  1.383e+00  1.415e+00  0.229 0.818755   
## Eve.Calls       1.872e-03  1.002e+00  2.413e-03  0.776 0.437909   
## Eve.Charge     -3.759e+00  2.332e-02  1.665e+01 -0.226 0.821364   
## Night.Mins     -3.205e-01  7.258e-01  7.682e-01 -0.417 0.676546   
## Night.Calls     1.513e-03  1.002e+00  2.457e-03  0.616 0.537914   
## Night.Charge    7.186e+00  1.320e+03  1.707e+01  0.421 0.673777   
## Intl.Mins      -4.607e+00  9.980e-03  4.510e+00 -1.021 0.307019   
## Intl.Calls     -7.837e-02  9.246e-01  2.179e-02 -3.596 0.000323 ***
## Intl.Charge     1.724e+01  3.081e+07  1.670e+01  1.032 0.301906   
## CustServ.Calls  3.318e-01  1.393e+00  2.947e-02 11.257  < 2e-16 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
##
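The exp(coef) column is e raised to the coef column, which is the hazard ratio. As a sketch, the snippet below recomputes a few hazard ratios from the coef and se(coef) values printed above, together with Wald 95% confidence intervals exp(coef ± 1.96 × se). The numbers are copied by hand from the table, so they carry its rounding; in a live session, summary(fitt)$conf.int returns these intervals directly.

```r
# coef and se(coef) copied from the summary(fitt) table above
cox_tab <- data.frame(
  term = c("Int.l.Planyes", "VMail.Planyes", "VMail.Message",
           "Intl.Calls", "CustServ.Calls"),
  coef = c(1.329, -2.304, 0.04801, -0.07837, 0.3318),
  se   = c(0.1094, 0.5165, 0.01590, 0.02179, 0.02947)
)

z95 <- qnorm(0.975)                                    # about 1.96
cox_tab$HR    <- exp(cox_tab$coef)                     # hazard ratio = exp(coef)
cox_tab$lower <- exp(cox_tab$coef - z95 * cox_tab$se)  # lower 95% limit
cox_tab$upper <- exp(cox_tab$coef + z95 * cox_tab$se)  # upper 95% limit
print(cox_tab, digits = 3)
```

An interval lying entirely above 1 (Int.l.Planyes, VMail.Message, CustServ.Calls) or entirely below 1 (VMail.Planyes, Intl.Calls) corresponds to a term that is significant at the 5% level.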

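We can see from the output that, at the 5% significance level (p < 0.05), the significant terms are: 1) six of the State dummies (StateCA, StateKY, StateMA, StateMI, StateMT and StateNJ), 2) Int.l.Planyes, 3) VMail.Planyes, 4) VMail.Message, 5) Intl.Calls and 6) CustServ.Calls. These variables are therefore good candidates for a more parsimonious model.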
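The Pr(>|z|) column comes from a Wald test: z = coef / se(coef), and the two-sided p-value is 2 × P(Z > |z|) for a standard normal Z. A small sketch that recomputes it for a few rows of the table, using the 5% level (p < 0.05); the estimates are copied by hand from the output, so they carry its rounding.

```r
# Estimates and standard errors copied from the summary(fitt) table above
term   <- c("StateCA", "StateHI", "Intl.Calls", "Day.Mins")
est    <- c(1.501, -0.1553, -0.07837, -0.3636)
se_est <- c(0.6710, 0.8200, 0.02179, 2.727)

z <- est / se_est          # Wald statistic
p <- 2 * pnorm(-abs(z))    # two-sided p-value under a standard normal

wald <- data.frame(term, z = round(z, 3), p = signif(p, 3),
                   significant_5pct = p < 0.05)
print(wald)
```

StateCA and Intl.Calls come out below 0.05 while StateHI and Day.Mins do not, matching the significance codes in the table; small differences from the printed z values come from rounding in the copied numbers.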

Summarizing the results

The survival function starts at 1 and decreases over time, and the estimated median time to churn is 201. Hazard and survival move in opposite directions: when the hazard increases, the survival probability decreases, and when the hazard decreases, the survival probability increases. The exponential of a coefficient, exp(coef), is its hazard ratio (HR): the factor by which the hazard is multiplied for a one-unit increase in a numeric covariate, or for a factor level relative to the baseline level, holding the other covariates fixed. The terms that are significant at the 5% level (Int.l.Planyes, VMail.Planyes, VMail.Message, Intl.Calls, CustServ.Calls and six State dummies) can therefore be interpreted as follows.

Int.l.Planyes: HR > 1 (3.777), so having an international plan increases the hazard and the risk of customer churn (lower survival probability).

VMail.Planyes: HR < 1 (0.0999), so having a voicemail plan reduces the hazard and the risk of customer churn (higher survival probability).

VMail.Message: HR > 1 (1.049 per additional voicemail message), so the hazard and the risk of customer churn increase (lower survival probability).

Intl.Calls: HR < 1 (0.9246 per additional international call), so the hazard and the risk of customer churn decrease (higher survival probability).

CustServ.Calls: HR > 1 (1.393 per additional customer service call), so the hazard and the risk of customer churn increase (lower survival probability).

StateCA, StateKY, StateMA, StateMI, StateMT and StateNJ: all have HR > 1 relative to the baseline state (AK), ranging from 3.497 (MI) to 5.013 (MT), so customers in these states have a higher hazard and a higher risk of churn (lower survival probability).
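Why does HR > 1 mean lower survival? Under the proportional hazards assumption the hazard is h(t | x) = HR × h0(t), so the cumulative hazard is also multiplied by HR and, since S(t) = exp(-H(t)), the survival curve becomes S0(t)^HR. The sketch below applies this to a made-up baseline survival curve (not estimated from the churn data), using the hazard ratios 3.777 (Int.l.Planyes) and 0.9246 (one additional international call) from the table.

```r
# Made-up baseline survival curve S0(t) at a few times (illustrative only)
t_grid <- c(50, 100, 150, 200)
S0     <- c(0.95, 0.88, 0.78, 0.62)

# Under proportional hazards: h(t | x) = HR * h0(t), hence S(t | x) = S0(t)^HR
hr <- c(baseline = 1, Int.l.Planyes = 3.777, one_more_Intl_call = 0.9246)
surv_curves <- sapply(hr, function(h) S0^h)  # one column per hazard ratio
rownames(surv_curves) <- t_grid
print(round(surv_curves, 3))
```

Because every S0(t) lies between 0 and 1, raising it to a power above 1 pulls the curve down and a power below 1 pushes it up, which is exactly the HR > 1 versus HR < 1 reading used above.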


Gaurav Kumar

An engineering graduate with a Master's degree in Data Science and expertise in machine learning, deep learning and data visualization. Outside work, I am a fun-loving person with hobbies such as reading, music and sports.