
Quick Guide To Survival Analysis Using Kaplan Meier Curve (With Python Code)


Survival analysis is widely used in the pharmaceutical sector. It studies the time that elapses until an event of interest, such as death, occurs. The Kaplan Meier estimator is a non-parametric estimator of the survival function, computed from lifetime data. In medical research, it is frequently used to measure the fraction of patients living for a given length of time after treatment.
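For reference, the estimator is usually written in its standard product-limit form. Here t_i are the distinct times at which at least one event was observed, d_i is the number of events at t_i, and n_i is the number of subjects still at risk just before t_i. The symbols are the usual textbook notation, not names taken from this article.

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Censored subjects never contribute to d_i, but they count towards n_i for as long as they are under observation.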

Here, we will implement survival analysis using the Kaplan Meier estimate to assess how likely a patient is to survive for at least one year after a heart attack.
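Before turning to a library, it may help to see the estimate computed by hand. The sketch below is a minimal pure-Python version, assuming a small made-up sample: at every time with an observed death, the running survival probability is multiplied by one minus the share of at-risk patients who died. The function names and the toy data are illustrative assumptions and do not come from the echocardiogram dataset.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct death time.

    durations: follow-up time of each subject
    events:    1 if the death was observed, 0 if the subject was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Subjects still under observation just before time t
        at_risk = sum(1 for d in durations if d >= t)
        # Multiply by the share of at-risk subjects who survive time t
        survival *= 1 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve

def survival_at(curve, t):
    """Step-function lookup: value at the last death time that is <= t."""
    prob = 1.0
    for time, s in curve:
        if time > t:
            break
        prob = s
    return prob

# Hypothetical toy data (months of follow-up), for illustration only
durations = [5, 8, 12, 12, 20, 30]
events = [1, 0, 1, 1, 0, 1]

curve = kaplan_meier(durations, events)
one_year = survival_at(curve, 12)
print(curve)
print("Estimated probability of surviving beyond 12 months:", round(one_year, 3))
```

Note how the patient censored at month 8 lowers the at-risk count at month 12 without ever counting as a death.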

About the dataset

The dataset can be downloaded from the following link. It gives the details of the patient’s heart attack and condition.

Code Implementation

Install lifelines, then import all the libraries required for this project.

pip install lifelines

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statistics
from sklearn.impute import SimpleImputer
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test
from scipy import stats

Reading the Data

df = pd.read_csv("echocardiogram.csv")
df.head()

Data Pre-Processing

Let us check for missing values and impute them with mean values.

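# Impute missing values in the numeric feature columns with the column mean
mean = SimpleImputer(missing_values = np.nan, strategy = 'mean')
Columns = ['age', 'pericardialeffusion', 'fractionalshortening', 'epss', 'lvdd', 'wallmotion-score']
X = mean.fit_transform(df[Columns])
df_X = pd.DataFrame(X,
                    columns = Columns)
# Keep the survival time and status columns unchanged
keep = ['survival', 'alive']
df_keepcolumn = df[keep]
df = pd.concat([df_keepcolumn, df_X], axis = 1)
# Drop rows where survival or alive is still missing
df = df.dropna()
# Confirm that no missing values remain
print(df.isnull().sum())
print(df.shape)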
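To see what the imputation step does, it can be useful to count missing values before and after on a small example. The sketch below uses a made-up two-column frame and plain pandas; filling with the column mean is the same idea as the mean strategy of SimpleImputer, although it is not the code used in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; values are illustrative, not from echocardiogram.csv
toy = pd.DataFrame({
    "age": [60.0, np.nan, 70.0, 65.0],
    "epss": [10.0, 12.0, np.nan, 14.0],
})

# Missing values per column before imputation
print(toy.isnull().sum())

# Replace each missing value with the mean of its column
imputed = toy.fillna(toy.mean())

# Missing values per column after imputation
print(imputed.isnull().sum())
print(imputed)
```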

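Create a new column, dead, to serve as the event indicator for the Kaplan Meier fit: 1 if the patient died, 0 if the patient was still alive.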

df.loc[df.alive == 1, 'dead'] = 0
df.loc[df.alive == 0, 'dead'] = 1
df.groupby('dead').count()
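The event indicator matters because patients who are still alive are censored: we only know that they survived at least as long as their recorded survival time. Assuming alive is coded strictly as 0 or 1 with no missing values, the same column can also be built in one vectorized step, as in this small sketch on made-up data.

```python
import pandas as pd

# Hypothetical toy column: 1 = still alive at the end of follow-up, 0 = died
toy = pd.DataFrame({"alive": [1, 0, 0, 1, 0]})

# Event indicator: 1 = death observed, 0 = censored
toy["dead"] = 1 - toy["alive"]

print(toy)
print(toy["dead"].value_counts())
```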

Kaplan Meier Curve

kmf = KaplanMeierFitter()
X = df['survival']   # follow-up time in months
Y = df['dead']       # event indicator: 1 = died, 0 = censored
kmf.fit(X, event_observed = Y)
kmf.plot()
plt.title("Kaplan Meier estimates")
plt.xlabel("Month after heart attack")
plt.ylabel("Survival")
plt.show()

From the plot we can see that the estimated survival probability decreases as the number of months increases. The Kaplan Meier estimate is 1 for the initial days following the heart attack. It gradually decreases to around 0.05 after 50 months.

print("The median survival time :",kmf.median_survival_time_)

The median survival time of patients is 29 months, that is, the time by which the estimated survival probability has dropped to 0.5. Given below is the KM_estimate, which gives the probability of survival over time after the heart attack.

print(kmf.survival_function_)

To compare age groups, split the patients at the median age and fit one curve per group.

age_group = df['age'] < statistics.median(df['age'])
ax = plt.subplot(111)
kmf.fit(X[age_group], event_observed = Y[age_group], label = 'below 62')
kmf.plot(ax = ax)
kmf.fit(X[~age_group], event_observed = Y[~age_group], label = 'above 62')
kmf.plot(ax = ax)
plt.title("Kaplan Meier estimates by age group")
plt.xlabel("Month after heart attack")
plt.ylabel("Survival")
plt.show()

Kaplan Meier Curve Using Wallmotion Score

In the previous plot, the Kaplan Meier estimate for the age group below 62 is higher for the first 24 months after the heart attack. After that, the survival rate is similar to that of the age group above 62. Since the difference between the age groups is small, it is worth analysing the data by wallmotion-score group as well.

# Split the patients at the median wallmotion-score and fit one curve per group
score_group = df['wallmotion-score'] < statistics.median(df['wallmotion-score'])
ax = plt.subplot(111)
kmf.fit(X[score_group], event_observed = Y[score_group], label = 'Low score')
kmf.plot(ax = ax)
kmf.fit(X[~score_group], event_observed = Y[~score_group], label = 'High score')
kmf.plot(ax = ax)
plt.title("Kaplan Meier estimates by wallmotion-score group")
plt.xlabel("Month after heart attack")
plt.ylabel("Survival")
plt.show()

Conclusion

In this article, we discussed survival analysis using the Kaplan Meier estimate. The Kaplan Meier survival plots help us see how survival time is distributed across patients. We also compared the survival of different age groups and wallmotion-score groups after a heart attack. Finally, it is advisable to look into survival analysis in more detail.


Ankit Das

A data analyst with expertise in statistical analysis and data visualization, ready to serve the industry using various analytical platforms. I look forward to gaining in-depth knowledge of machine learning and data science. Outside work, you can find me as a fun-loving person with hobbies such as sports and music.