Survival analysis, also called ‘Time to Event Analysis’, estimates the expected time until an event of interest occurs, such as the failure of a mechanical system or a death. Here the event of interest is customer churn.
Model fitting and method used:
# Install (if needed) and load the packages below
library(BDgraph)   # provides the churn data set
library(survival)  # Surv(), survfit(), coxph()
data(churn)  # Load the data
head(churn)
##   State Account.Length Area.Code Int.l.Plan VMail.Plan VMail.Message Day.Mins
## 1    KS            128       415         no        yes            25    265.1
## 2    OH            107       415         no        yes            26    161.6
## 3    NJ            137       415         no         no             0    243.4
## 4    OH             84       408        yes         no             0    299.4
## 5    OK             75       415        yes         no             0    166.7
## 6    AL            118       510        yes         no             0    223.4
##   Day.Calls Day.Charge Eve.Mins Eve.Calls Eve.Charge Night.Mins Night.Calls
## 1       110      45.07    197.4        99      16.78      244.7          91
## 2       123      27.47    195.5       103      16.62      254.4         103
## 3       114      41.38    121.2       110      10.30      162.6         104
## 4        71      50.90     61.9        88       5.26      196.9          89
## 5       113      28.34    148.3       122      12.61      186.9         121
## 6        98      37.98    220.6       101      18.75      203.9         118
##   Night.Charge Intl.Mins Intl.Calls Intl.Charge CustServ.Calls Churn
## 1        11.01      10.0          3        2.70              1 False
## 2        11.45      13.7          3        3.70              1 False
## 3         7.32      12.2          5        3.29              0 False
## 4         8.86       6.6          7        1.78              2 False
## 5         8.41      10.1          3        2.73              3 False
## 6         9.18       6.3          6        1.70              0 False
dim(churn)
## [1] 3333 20
# Set up Surv() object
dat <- churn[, c("Account.Length", "Churn")]
# Recode Churn as 0 (False) and 1 (True) for the event indicator
dat$Churn <- as.numeric(churn$Churn) - 1
survdat <- Surv(time = dat$Account.Length, event = dat$Churn)
# Fit the model to the data
fit <- survfit(survdat ~ 1, se = TRUE)
plot(fit, main = "Survival Function Plot", xlab = "Time", ylab = "Survival Probability")
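For reference, survfit() with its default settings computes the Kaplan-Meier estimate of the survival function. The notation below is introduced here and is not part of the original code: t_i are the distinct observed churn times, d_i is the number of customers who churn at t_i, and n_i is the number of customers still at risk just before t_i.

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Customers who have not churned (Churn = 0) are treated as censored: they count in n_i up to their Account.Length but never in d_i.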
Median time-to-churn
fit
## Call: survfit(formula = survdat ~ 1, se = TRUE)
##
## n events median 0.95LCL 0.95UCL
## 3333 483 201 188 NA
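The fit output shows an estimated median time to churn of 201, in the units of Account.Length, with a lower 95% confidence limit of 188. The upper limit is NA because the upper confidence band of the survival curve never falls to 0.5 within the observed range.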
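The median can also be read off programmatically instead of from the printed output. The following is a minimal sketch on a small made-up data set, not the churn data; the names toy, toy_fit, toy_median and toy_q are hypothetical.

```r
library(survival)

# Hypothetical toy data (NOT the churn data): status 1 = event, 0 = censored
toy <- data.frame(
  time   = c(5, 8, 12, 12, 20, 25, 30, 41),
  status = c(1, 0, 1, 1, 0, 1, 1, 0)
)

# Kaplan-Meier fit with no covariates, as in the churn example
toy_fit <- survfit(Surv(time, status) ~ 1, data = toy)

# Median and its 95% confidence limits from the summary table
toy_median <- summary(toy_fit)$table[c("median", "0.95LCL", "0.95UCL")]

# The same median via the quantile method for survfit objects
toy_q <- quantile(toy_fit, probs = 0.5)

print(toy_median)
print(toy_q$quantile)
```

On this toy data the estimated survival probability first drops below 0.5 at time 25, so both approaches report a median of 25. Applied to the churn fit, the same calls would be made on the fit object.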
Cox proportional hazards method
#Fit the Cox proportional hazards model
fitt<-coxph(Surv(time=dat$Account.Length,event = dat$Churn)~., data=churn)
summary(fitt)
## Call:
## coxph(formula = Surv(time = dat$Account.Length, event = dat$Churn) ~
## ., data = churn)
##
## n= 3333, number of events= 483
##
## coef exp(coef) se(coef) z Pr(>|z|)
## StateAL 3.727e-01 1.452e+00 6.815e-01 0.547 0.584509
## StateAR 7.263e-01 2.067e+00 6.578e-01 1.104 0.269536
## StateAZ 3.591e-01 1.432e+00 7.679e-01 0.468 0.640011
## StateCA 1.501e+00 4.484e+00 6.710e-01 2.236 0.025333 *
## StateCO 3.684e-01 1.445e+00 6.714e-01 0.549 0.583169
## StateCT 9.994e-01 2.717e+00 6.487e-01 1.541 0.123376
## StateDC 5.992e-01 1.821e+00 7.352e-01 0.815 0.415027
## StateDE 9.039e-01 2.469e+00 6.734e-01 1.342 0.179495
## StateFL 6.055e-01 1.832e+00 6.818e-01 0.888 0.374456
## StateGA 7.639e-01 2.147e+00 6.832e-01 1.118 0.263471
## StateHI -1.553e-01 8.562e-01 8.200e-01 -0.189 0.849830
## StateIA 6.270e-01 1.872e+00 8.220e-01 0.763 0.445642
## StateID 9.225e-01 2.516e+00 6.724e-01 1.372 0.170049
## StateIL 2.504e-01 1.285e+00 7.335e-01 0.341 0.732838
## StateIN 8.284e-01 2.290e+00 6.709e-01 1.235 0.216887
## StateKS 9.738e-01 2.648e+00 6.441e-01 1.512 0.130580
## StateKY 1.377e+00 3.965e+00 6.829e-01 2.017 0.043688 *
## StateLA 4.333e-01 1.542e+00 7.687e-01 0.564 0.572958
## StateMA 1.335e+00 3.802e+00 6.545e-01 2.040 0.041311 *
## StateMD 9.443e-01 2.571e+00 6.314e-01 1.496 0.134760
## StateME 1.257e+00 3.515e+00 6.487e-01 1.938 0.052649 .
## StateMI 1.252e+00 3.497e+00 6.344e-01 1.973 0.048456 *
## StateMN 5.042e-01 1.656e+00 6.498e-01 0.776 0.437808
## StateMO 4.839e-01 1.622e+00 6.955e-01 0.696 0.486547
## StateMS 1.210e+00 3.353e+00 6.414e-01 1.886 0.059262 .
## StateMT 1.612e+00 5.013e+00 6.418e-01 2.512 0.012010 *
## StateNC 1.687e-01 1.184e+00 6.618e-01 0.255 0.798794
## StateND 1.417e-01 1.152e+00 7.112e-01 0.199 0.842124
## StateNE 3.729e-01 1.452e+00 7.373e-01 0.506 0.613038
## StateNH 1.024e+00 2.784e+00 6.713e-01 1.525 0.127203
## StateNJ 1.453e+00 4.275e+00 6.297e-01 2.307 0.021041 *
## StateNM 3.356e-01 1.399e+00 7.127e-01 0.471 0.637719
## StateNV 1.187e+00 3.278e+00 6.396e-01 1.856 0.063450 .
## StateNY 7.184e-01 2.051e+00 6.377e-01 1.127 0.259918
## StateOH 8.039e-01 2.234e+00 6.629e-01 1.213 0.225212
## StateOK 6.394e-01 1.895e+00 6.744e-01 0.948 0.343108
## StateOR 7.061e-01 2.026e+00 6.564e-01 1.076 0.282024
## StatePA 1.153e+00 3.168e+00 6.791e-01 1.698 0.089500 .
## StateRI 2.414e-01 1.273e+00 7.100e-01 0.340 0.733853
## StateSC 1.166e+00 3.210e+00 6.472e-01 1.802 0.071541 .
## StateSD 7.118e-01 2.038e+00 6.808e-01 1.046 0.295734
## StateTN 8.526e-01 2.346e+00 7.342e-01 1.161 0.245514
## StateTX 1.101e+00 3.007e+00 6.295e-01 1.749 0.080331 .
## StateUT 8.669e-01 2.379e+00 6.628e-01 1.308 0.190896
## StateVA -3.732e-01 6.885e-01 7.339e-01 -0.508 0.611116
## StateVT 3.576e-01 1.430e+00 6.831e-01 0.524 0.600624
## StateWA 1.180e+00 3.255e+00 6.409e-01 1.842 0.065517 .
## StateWI 4.867e-01 1.627e+00 6.941e-01 0.701 0.483203
## StateWV 6.826e-01 1.979e+00 6.619e-01 1.031 0.302395
## StateWY 2.866e-01 1.332e+00 6.710e-01 0.427 0.669252
## Area.Code 4.925e-04 1.000e+00 1.105e-03 0.446 0.655668
## Int.l.Planyes 1.329e+00 3.777e+00 1.094e-01 12.152 < 2e-16 ***
## VMail.Planyes -2.304e+00 9.988e-02 5.165e-01 -4.460 8.19e-06 ***
## VMail.Message 4.801e-02 1.049e+00 1.590e-02 3.020 0.002527 **
## Day.Mins -3.636e-01 6.952e-01 2.727e+00 -0.133 0.893932
## Day.Calls 1.262e-03 1.001e+00 2.299e-03 0.549 0.582966
## Day.Charge 2.193e+00 8.959e+00 1.604e+01 0.137 0.891270
## Eve.Mins 3.242e-01 1.383e+00 1.415e+00 0.229 0.818755
## Eve.Calls 1.872e-03 1.002e+00 2.413e-03 0.776 0.437909
## Eve.Charge -3.759e+00 2.332e-02 1.665e+01 -0.226 0.821364
## Night.Mins -3.205e-01 7.258e-01 7.682e-01 -0.417 0.676546
## Night.Calls 1.513e-03 1.002e+00 2.457e-03 0.616 0.537914
## Night.Charge 7.186e+00 1.320e+03 1.707e+01 0.421 0.673777
## Intl.Mins -4.607e+00 9.980e-03 4.510e+00 -1.021 0.307019
## Intl.Calls -7.837e-02 9.246e-01 2.179e-02 -3.596 0.000323 ***
## Intl.Charge 1.724e+01 3.081e+07 1.670e+01 1.032 0.301906
## CustServ.Calls 3.318e-01 1.393e+00 2.947e-02 11.257 < 2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
The output shows that the following terms are significant at the 5% level: 1) State, for six of its levels (CA, KY, MA, MI, MT and NJ), 2) Int.l.Planyes, 3) VMail.Planyes, 4) VMail.Message, 5) Intl.Calls and 6) CustServ.Calls. A more parsimonious model could therefore be built from these variables.
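Hazard ratios and significance flags can also be pulled out of a coxph fit programmatically. The following is a minimal sketch that uses the lung data set shipped with the survival package rather than the churn model; the object names m, s and hr_table are illustrative.

```r
library(survival)

# Illustrative model on the lung data bundled with survival (not the churn model)
m <- coxph(Surv(time, status) ~ age + sex, data = lung)
s <- summary(m)

# Hazard ratios with 95% confidence intervals and Wald p-values
hr_table <- data.frame(
  HR      = s$conf.int[, "exp(coef)"],
  lower95 = s$conf.int[, "lower .95"],
  upper95 = s$conf.int[, "upper .95"],
  p_value = s$coefficients[, "Pr(>|z|)"]
)

# Flag terms whose Wald test is significant at the 5% level
hr_table$significant <- hr_table$p_value < 0.05

print(hr_table)
```

For a factor with many levels, such as State, each row of such a table compares one level with the reference level, so individual rows can be significant while others are not.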
Summarizing the results
The survival function starts at 1 and decreases over time. The estimated median time to churn is 201. State (for some of its levels), Int.l.Planyes, VMail.Planyes, VMail.Message, Intl.Calls and CustServ.Calls are significant at the 5% level. A higher hazard makes the survival function fall faster, and a lower hazard makes it fall more slowly. The exponential of a coefficient gives the hazard ratio, so the significant variables can be read as follows:
Int.l.Planyes: hazard ratio > 1 (3.777), so the hazard is higher and churn is more likely (lower survival probability).
VMail.Planyes: hazard ratio < 1 (0.0999), so the hazard is lower and churn is less likely (higher survival probability).
VMail.Message: hazard ratio > 1 (1.049), so the hazard is higher and churn is more likely (lower survival probability).
Intl.Calls: hazard ratio < 1 (0.925), so the hazard is lower and churn is less likely (higher survival probability).
CustServ.Calls: hazard ratio > 1 (1.393), so the hazard is higher and churn is more likely (lower survival probability).
StateCA, StateKY, StateMA, StateMI, StateMT, StateNJ: all have hazard ratios > 1 (about 3.5 to 5.0) relative to the reference state AK, the level omitted from the output, so the hazard is higher and churn is more likely (lower survival probability).
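The link between a coefficient, its hazard ratio and the survival curve can be written out explicitly. This is the standard form of the Cox model; the symbols h_0, S_0, β and x are notation introduced here, with x_j one covariate and β_j its coefficient.

```latex
h(t \mid x) = h_0(t)\, \exp(\beta^\top x), \qquad
\mathrm{HR}_j = \frac{h(t \mid x_j + 1)}{h(t \mid x_j)} = \exp(\beta_j), \qquad
S(t \mid x) = S_0(t)^{\exp(\beta^\top x)}
```

Because S_0(t) lies between 0 and 1, a larger value of exp(β'x) pushes S(t | x) down, which is why a hazard ratio above 1 goes with a lower survival probability and a ratio below 1 with a higher one. For example, the reported coefficient of 0.3318 for CustServ.Calls gives exp(0.3318) ≈ 1.393, so each additional customer service call is associated with a churn hazard roughly 39% higher, holding the other covariates fixed.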