Quick Guide To Survival Analysis Using Kaplan Meier Curve (With Python Code)


Survival analysis is frequently used in the pharmaceutical sector. It analyses the length of time that elapses before an event of interest, such as death, occurs. The Kaplan Meier estimator is a non-parametric estimator of the survival function that is computed from lifetime data. In medical research, it is frequently used to measure the fraction of patients living for a certain amount of time after treatment.

Here, we will implement survival analysis using the Kaplan Meier estimate to assess how likely a patient is to survive for at least one year after a heart attack.
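To make the estimator concrete before turning to lifelines, here is a minimal pure-Python sketch of the product-limit formula behind it: at every time where a death is observed, the running survival probability is multiplied by (1 - deaths / number at risk). The function name kaplan_meier and the five toy patients are made up for illustration; they are not part of the dataset or of lifelines.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each observed event time.

    durations: follow-up time of each subject
    events:    1 if the event (death) was observed, 0 if censored
    """
    # Only times with at least one observed event change the estimate
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t
        at_risk = sum(1 for d in durations if d >= t)
        # Observed events exactly at time t
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: follow-up in months, 1 = died, 0 = censored
for t, s in kaplan_meier([5, 6, 6, 2, 4], [1, 0, 1, 1, 0]):
    print(t, round(s, 3))
# 2 0.8
# 5 0.533
# 6 0.267
```

Note how the patient censored at month 4 lowers the number at risk for later times without counting as a death. That is the reason an event indicator is needed alongside the survival time.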


About the dataset

The dataset can be downloaded from the following link. It gives the details of the patient’s heart attack and condition.

Code Implementation

Install the lifelines library from the command line, then import all the libraries required for this project.

pip install lifelines

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statistics
from sklearn.impute import SimpleImputer
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test
from scipy import stats

Reading the Data

df = pd.read_csv("echocardiogram.csv")
df.head()

Data Pre-Processing

Let us check for missing values and impute them with mean values.

# Impute missing values in the numeric columns with the column mean
mean = SimpleImputer(missing_values = np.nan, strategy = 'mean')
Columns = ['age', 'pericardialeffusion', 'fractionalshortening', 'epss', 'lvdd', 'wallmotion-score']
X = mean.fit_transform(df[Columns])
df_X = pd.DataFrame(X, columns = Columns, index = df.index)
# Keep the follow-up time and status columns as they are
keep = ['survival', 'alive']
df_keepcolumn = df[keep]
df = pd.concat([df_keepcolumn, df_X], axis = 1)
# Drop the rows where survival or alive is still missing
df = df.dropna()
print(df.isnull().sum())
print(df.shape)
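For readers unfamiliar with mean imputation, the idea behind the 'mean' strategy can be sketched for a single column in plain Python: missing entries are replaced by the mean of the observed ones. The helper impute_mean and the sample ages are hypothetical and only illustrate the idea; the tutorial itself relies on SimpleImputer.

```python
import math
from statistics import mean


def impute_mean(values):
    """Replace NaN entries with the mean of the non-missing entries."""
    observed = [v for v in values if not math.isnan(v)]
    fill = mean(observed)
    return [fill if math.isnan(v) else v for v in values]


# Hypothetical ages with one missing value
ages = [60.0, float("nan"), 70.0]
print(impute_mean(ages))  # [60.0, 65.0, 70.0]
```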

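Create a new column, dead, which is the inverse of alive. It serves as the event indicator for the Kaplan Meier fit: 1 means the death was observed, and 0 means the patient was still alive at the end of follow-up (a censored observation).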

df.loc[df.alive == 1, 'dead'] = 0
df.loc[df.alive == 0, 'dead'] = 1
df.groupby('dead').count()

Kaplan Meier Curve

kmf = KaplanMeierFitter()
X = df['survival']  # follow-up time in months
Y = df['dead']      # event indicator: 1 = death observed, 0 = censored
kmf.fit(X, event_observed = Y)
kmf.plot()
plt.title("Kaplan Meier estimates")
plt.xlabel("Month after heart attack")
plt.ylabel("Survival")
plt.show()

From the plot we can see that the survival rate decreases as the number of months increases. The Kaplan Meier estimate is 1 for the initial days following the heart attack. It gradually decreases to around 0.05 after 50 months.

print("The median survival time :",kmf.median_survival_time_)

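The median survival time of patients is 29 months, which is the time by which the estimated survival probability has dropped to 0.5. The KM_estimate column printed below gives the estimated probability of surviving beyond each time point after the heart attack.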

print(kmf.survival_function_)
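
The printed table is a step function, so two quantities can be read off it directly: the probability of surviving beyond a given month (for example month 12, the one-year mark this guide is interested in) and the median survival time, which is the first time at which the estimate drops to 0.5 or below. The sketch below does both on a small made-up curve; the helper names and the numbers are illustrative assumptions, not output from the echocardiogram data.

```python
import math


def survival_at(curve, t):
    """Step-function lookup: value at the last event time <= t (1.0 before any event)."""
    prob = 1.0
    for time, surv in curve:
        if time > t:
            break
        prob = surv
    return prob


def median_survival_time(curve):
    """First time at which the survival estimate is <= 0.5, or infinity if it never is."""
    for time, surv in curve:
        if surv <= 0.5:
            return time
    return math.inf


# Hypothetical (month, survival probability) pairs, sorted by month
curve = [(6, 0.9), (12, 0.75), (24, 0.55), (30, 0.45), (48, 0.2)]
print(survival_at(curve, 12))       # 0.75
print(median_survival_time(curve))  # 30
```

With a fitted lifelines model, the one-year lookup should be available as kmf.predict(12), so these helpers are only needed to understand what the numbers mean.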
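
Kaplan Meier Curve Using Age Group

Next, we split the patients at the median age and fit a separate curve for each group.

age_group = df['age'] < statistics.median(df['age'])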
ax = plt.subplot(111)
kmf.fit(X[age_group], event_observed = Y[age_group], label = 'below 62')
kmf.plot(ax = ax)
kmf.fit(X[~age_group], event_observed = Y[~age_group], label = 'above 62')
kmf.plot(ax = ax)
plt.title("Kaplan Meier estimates by age group")
plt.xlabel("Month after heart attack")
plt.ylabel("Survival")
plt.show()

The Kaplan Meier estimate for the age group below 62 is higher for the first 24 months after the heart attack. After that, the survival rate is similar to that of the age group above 62.

Kaplan Meier Curve Using Wallmotion Score

Since the difference between the age groups is small, it is worth analysing the data by wallmotion-score group as well.

score_group = df['wallmotion-score'] < statistics.median(df['wallmotion-score'])
ax = plt.subplot(111)
kmf.fit(X[score_group], event_observed = Y[score_group], label = 'Low score')
kmf.plot(ax = ax)
kmf.fit(X[~score_group], event_observed = Y[~score_group], label = 'High score')
kmf.plot(ax = ax)
plt.title("Kaplan Meier estimates by wallmotion-score group")
plt.xlabel("Month after heart attack")
plt.ylabel("Survival")
plt.show()
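
Eyeballing two curves does not tell us whether the gap between them could be due to chance. The log-rank test is the usual check, and lifelines ships it as logrank_test, which is already imported above. As a rough illustration of what that test computes, here is a pure-Python sketch on made-up data; the function name logrank, the toy groups, and the one-degree-of-freedom chi-square p-value via math.erfc are assumptions of this sketch, not code from lifelines.

```python
import math


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value)."""
    # Distinct times at which an event was observed in either group
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e == 1}
        | {t for t, e in zip(times_b, events_b) if e == 1}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for d in times_a if d >= t)  # at risk in group A
        n_b = sum(1 for d in times_b if d >= t)  # at risk in group B
        d_a = sum(1 for d, e in zip(times_a, events_a) if d == t and e == 1)
        d_b = sum(1 for d, e in zip(times_b, events_b) if d == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        # Deaths in group A minus the number expected if both groups shared one curve
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical groups: everyone in A dies before anyone in B
chi2, p = logrank([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(round(chi2, 2), p < 0.05)
```

A small p-value suggests the two survival curves differ by more than chance would explain. For the real analysis, passing the two groups' durations and event indicators to logrank_test is the more practical route.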

Conclusion

In this article, we discussed survival analysis using the Kaplan Meier estimate and saw how the resulting survival plots describe the distribution of survival times. We then compared the survival rates of different age groups and wallmotion-score groups after a heart attack. Survival analysis is a broad topic, and it is worth exploring in more detail.
