Most of us have used visualization tools such as Power BI, Tableau, Salesforce, and Splunk, and libraries such as matplotlib, plotly, ggplot, and bamboolib. But how many of us have seen a code editor that helps us build visualizations without writing any code? Interesting, right?
This is possible with the SandDance extension for Visual Studio Code. SandDance helps us visualize our data, drill down by filtering, and generate 3D graphs with a single click. Let us see how to get started with SandDance in Visual Studio Code.
This article will cover:
- Data transformations
- Explanation of the dataset
- Loading dataset and viewing it with SandDance
- Visualizations and insights
Prerequisites:
- Visual Studio Code
- Python extension in Visual Studio Code
- SandDance extension in Visual Studio Code
- Titanic dataset
Data transformations:
- Survived (0 = No, 1 = Yes)
- PClass (1 = 1st class, 2 = 2nd class, 3 = 3rd class)
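The two recodings listed above can be sketched in a few lines of Python. This is a minimal sketch using only the standard library, not necessarily how the dataset for this article was prepared. It assumes the CSV header uses the column names `Survived` and `PClass` exactly as spelled in this article; some copies of the dataset spell the second one differently, so adjust the key if needed. The helper names `recode_row` and `recode_csv` are made up for illustration.

```python
import csv
import io

# Label mappings taken from the two transformations listed above.
SURVIVED_LABELS = {"0": "No", "1": "Yes"}
PCLASS_LABELS = {"1": "1st class", "2": "2nd class", "3": "3rd class"}


def recode_row(row):
    """Return a copy of one CSV row with readable Survived/PClass labels."""
    out = dict(row)
    # Unknown or blank values are passed through unchanged instead of failing.
    out["Survived"] = SURVIVED_LABELS.get(row["Survived"], row["Survived"])
    out["PClass"] = PCLASS_LABELS.get(row["PClass"], row["PClass"])
    return out


def recode_csv(src, dst):
    """Read CSV text from `src`, recode every row, and write CSV text to `dst`."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        writer.writerow(recode_row(row))


# Toy input for illustration only; these are not records from the real dataset.
toy_csv = io.StringIO("PassengerId,Survived,PClass\n1,0,3\n2,1,1\n")
output = io.StringIO()
recode_csv(toy_csv, output)
lines = output.getvalue().splitlines()
print(lines)
```

With real files, the same function can be called with two file objects opened with `newline=""`, as the `csv` module documentation recommends.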
Explanation of the dataset:
- PassengerId – unique id for every passenger
- Survived – whether the passenger survived the accident or not (Yes/No)
- PClass – information on the passenger class (1st, 2nd or 3rd class)
- Name – name of the passengers
- Sex – gender of the passenger
- Age – age of the passenger
- SibSp – number of siblings/spouses aboard
- Parch – number of parents/children aboard
- Ticket number – ticket number
- Fare – fare paid for the ticket
- Cabin number – cabin number
- Embarked – point of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
Loading dataset and viewing it with SandDance:
- File -> Open File… -> navigate to and select the Titanic dataset
- Once the dataset is loaded, right-click the dataset file and select “View in SandDance”.
Visualizations and insights:
When you view the dataset using SandDance, this is what it looks like.
Before we start with any visualizations, let’s see what all the icons on the page mean.
- From figure 1 we can see that there were more men than women on the ship, but the thing to notice is that the survival ratio of women was higher than that of men. Let’s dig deeper and see what else can be understood.
Figure 1: Column chart for Sex
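The comparison read off the column chart can also be checked numerically. Below is a minimal sketch, assuming the rows have already been loaded as dictionaries (for example with `csv.DictReader`) and that `Survived` holds the recoded values "Yes" and "No". The function name `survival_rate_by` is hypothetical and the rows are toy data, so the numbers say nothing about the real passengers.

```python
from collections import defaultdict


def survival_rate_by(rows, key):
    """Return {group: share of rows with Survived == "Yes"} for one column."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        group = row[key]
        totals[group] += 1
        if row["Survived"] == "Yes":
            survivors[group] += 1
    return {group: survivors[group] / totals[group] for group in totals}


# Toy rows for illustration only; not records from the real dataset.
toy_rows = [
    {"Sex": "male", "Survived": "No"},
    {"Sex": "male", "Survived": "No"},
    {"Sex": "male", "Survived": "No"},
    {"Sex": "male", "Survived": "Yes"},
    {"Sex": "female", "Survived": "Yes"},
    {"Sex": "female", "Survived": "No"},
]
rates = survival_rate_by(toy_rows, "Sex")
print(rates)
```

A group can be larger and still have the lower survival rate, which is the pattern figure 1 shows for men.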
By isolating the female column and dividing it further by passenger class (PClass), figure 2 shows that about 50% of the women travelling in 3rd class died, whereas most of the women travelling in 1st and 2nd class survived.
Figure 2: Column chart for females based on PClass
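The isolate-then-split step can be expressed as a filter followed by a count. The sketch below assumes rows are dictionaries with the recoded `PClass` and `Survived` labels; `isolate` and `count_by` are hypothetical helper names, not SandDance functions, and the rows are toy data.

```python
from collections import Counter


def isolate(rows, column, value):
    """Keep only the rows whose `column` equals `value`."""
    return [row for row in rows if row[column] == value]


def count_by(rows, *columns):
    """Count rows for every combination of values found in `columns`."""
    return Counter(tuple(row[c] for c in columns) for row in rows)


# Toy rows for illustration only; not records from the real dataset.
toy_rows = [
    {"Sex": "female", "PClass": "1st class", "Survived": "Yes"},
    {"Sex": "female", "PClass": "3rd class", "Survived": "Yes"},
    {"Sex": "female", "PClass": "3rd class", "Survived": "No"},
    {"Sex": "male", "PClass": "3rd class", "Survived": "No"},
]
women = isolate(toy_rows, "Sex", "female")
counts = count_by(women, "PClass", "Survived")
for (pclass, survived), n in sorted(counts.items()):
    print(pclass, survived, n)
```

Each key of `counts` corresponds to one colored segment of one column in a chart like figure 2.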
Now let’s add one more layer of detail by using the Select tab to highlight females below the age of 18. Figure 3 shows that most females below the age of 18 were travelling in 3rd class, and that roughly as many of them died as survived.
Figure 3: Overview of females below 18
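The same selection can be written as a small filter. This sketch assumes `Age` arrives as text from a CSV reader and that some cells may be blank, so rows without a usable age are left out of the selection rather than treated as age 0. The names `parse_age` and `females_under` are hypothetical, and the rows are toy data.

```python
def parse_age(value):
    """Convert an Age cell to float, or None when it is blank or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def females_under(rows, age_limit=18):
    """Return female passengers whose age is known and below `age_limit`."""
    selected = []
    for row in rows:
        age = parse_age(row.get("Age"))
        if row["Sex"] == "female" and age is not None and age < age_limit:
            selected.append(row)
    return selected


# Toy rows for illustration only; not records from the real dataset.
toy_rows = [
    {"Sex": "female", "Age": "4", "PClass": "3rd class"},
    {"Sex": "female", "Age": "", "PClass": "3rd class"},    # unknown age
    {"Sex": "female", "Age": "30", "PClass": "1st class"},  # adult
    {"Sex": "male", "Age": "17.5", "PClass": "2nd class"},  # not female
]
young_women = females_under(toy_rows)
print(young_women)
```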
- Figure 4 shows an overview of the passengers who boarded at Cherbourg, Queenstown and Southampton, obtained by faceting the column chart of sex by Embarked. We can see that most of the passengers boarded the ship at Southampton, and on closer inspection around 75% of the men who boarded the Titanic at Southampton died. Almost all the men who boarded at Queenstown also died in the accident.
Figure 4: Faceting column chart of sex based on embarked
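Faceting amounts to grouping the rows by one column and then counting inside each group. Here is a minimal sketch of that idea, assuming dictionary rows with the `Embarked` codes C, Q and S and the recoded `Survived` labels. `facet_counts` and `death_share` are hypothetical helpers, and the toy rows are constructed only to show the arithmetic.

```python
from collections import Counter, defaultdict


def facet_counts(rows, facet, *columns):
    """Group rows by `facet`, then count value combinations of `columns` per group."""
    panels = defaultdict(Counter)
    for row in rows:
        panels[row[facet]][tuple(row[c] for c in columns)] += 1
    return dict(panels)


def death_share(panel, sex):
    """Share of passengers of one sex in a panel who did not survive, or None if empty."""
    died = panel[(sex, "No")]
    lived = panel[(sex, "Yes")]
    total = died + lived
    return died / total if total else None


# Toy rows for illustration only; not records from the real dataset.
toy_rows = (
    [{"Embarked": "S", "Sex": "male", "Survived": "No"}] * 3
    + [{"Embarked": "S", "Sex": "male", "Survived": "Yes"}]
    + [{"Embarked": "S", "Sex": "female", "Survived": "Yes"}]
    + [{"Embarked": "Q", "Sex": "male", "Survived": "No"}]
)
panels = facet_counts(toy_rows, "Embarked", "Sex", "Survived")
print(death_share(panels["S"], "male"))
print(death_share(panels["Q"], "male"))
```

Each entry of `panels` plays the role of one facet in the chart, and the share is computed only among passengers of the chosen sex within that facet.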
- Figure 5 shows the passenger class of the people who survived. Let’s take a closer look at each graph separately. The colors tell us that in 1st class most people embarked at Cherbourg and Southampton, whereas 2nd class is dominated by people from Southampton. 3rd class looks like a mix of people from all 3 locations.
Figure 5: Column chart of sex faceted by PClass
- Observing figure 6, we can find some anomalies in the fares that people paid for the different classes. Have a look at the region encircled in the figure. This passenger paid a very low fare yet travelled in first class. If you click on that cell and look up his name on the internet, you’ll find that he wasn’t satisfied with his ticket, so the crew upgraded him.
Figure 6: Tree map of PClass based on fare
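One simple way to surface such fare anomalies outside the chart is to sort a class by fare and look at the cheapest tickets. This is only a sketch of that idea, not what SandDance does internally. It assumes a `Fare` column holding numeric text and the recoded `PClass` labels; `cheapest_in_class` is a hypothetical helper and the passenger names are placeholders.

```python
def cheapest_in_class(rows, pclass, n=3):
    """Return the `n` lowest-fare passengers travelling in `pclass`."""
    in_class = [
        row for row in rows
        # Skip rows with a missing fare so float() below cannot fail on blanks.
        if row["PClass"] == pclass and row.get("Fare") not in (None, "")
    ]
    return sorted(in_class, key=lambda row: float(row["Fare"]))[:n]


# Toy rows for illustration only; names and fares are placeholders.
toy_rows = [
    {"Name": "Passenger A", "PClass": "1st class", "Fare": "80.0"},
    {"Name": "Passenger B", "PClass": "1st class", "Fare": "5.0"},
    {"Name": "Passenger C", "PClass": "1st class", "Fare": ""},
    {"Name": "Passenger D", "PClass": "3rd class", "Fare": "7.25"},
    {"Name": "Passenger E", "PClass": "1st class", "Fare": "52.0"},
]
for row in cheapest_in_class(toy_rows, "1st class", n=2):
    print(row["Name"], row["Fare"])
```

A first-class fare far below the rest of the class, like Passenger B in the toy data, is the kind of cell worth clicking on in the treemap.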
- Figure 7 shows a 3D graph of the first-class passengers with the fare paid on the Z-axis. It’s interesting to note that, on average, people who embarked at Cherbourg paid more for a first-class ticket than the others.
Figure 7: 3D graph of the fare paid by 1st class people
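The "on average" comparison can be backed by a per-port mean. The sketch below assumes dictionary rows with a numeric-text `Fare` column, the `Embarked` codes C, Q and S, and the recoded `PClass` labels. `mean_fare_by_port` is a hypothetical helper and the fares are toy values.

```python
from collections import defaultdict
from statistics import mean


def mean_fare_by_port(rows, pclass="1st class"):
    """Return {embarkation code: mean fare} for passengers in `pclass`."""
    fares = defaultdict(list)
    for row in rows:
        # Rows with a blank fare are ignored rather than counted as zero.
        if row["PClass"] == pclass and row.get("Fare") not in (None, ""):
            fares[row["Embarked"]].append(float(row["Fare"]))
    return {port: mean(values) for port, values in fares.items()}


# Toy rows for illustration only; not fares from the real dataset.
toy_rows = [
    {"PClass": "1st class", "Embarked": "C", "Fare": "100"},
    {"PClass": "1st class", "Embarked": "C", "Fare": "80"},
    {"PClass": "1st class", "Embarked": "S", "Fare": "50"},
    {"PClass": "1st class", "Embarked": "S", "Fare": "30"},
    {"PClass": "3rd class", "Embarked": "S", "Fare": "8"},
]
averages = mean_fare_by_port(toy_rows)
print(averages)
```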
Exploratory data analysis (EDA) is a crucial part of the data science pipeline, and it pays to use tools that provide a lot of functionality with little coding effort. Better and quicker visualizations lead to more efficient decision making. One of the major benefits of SandDance is how easy it is to drill down to a focused view of any graph and to isolate parts of a graph for further analysis.