Most technical blogs and books will tell you there are basically three types of people in emerging tech, and in data analytics specifically:
- Those who can code well
- Those who know the business well
- Those who know both coding and business
The people who are most in demand are the third type, but they are also the rarest.
Throughout my career, I have focused on becoming a Type 3 analyst. I don’t enjoy writing complex code, but I do want to be someone who can draw interesting insights out of data. I have always focused on becoming someone who can link the data back to the problem statement and present insights that help make a decision.
The majority of an analyst’s work happens in statistical software such as SAS, Python or R, along with Excel, PowerPoint and so on.
Below is my personal list of six things that have helped me be a good analyst over my 12-year analytics career:
- Always Quality-Check Your Work: This is the most important activity you will do when performing any data analysis. It includes making sure there are no errors or warnings in your code, that Excel formulas calculate correctly, and that there are no spelling mistakes or logical errors. In my first job, one person would write the SAS code for an analysis and another person would QA that code together with the analyst. Remember, a line of code is going to extract relevant information from the data, which in turn will drive business decisions, so it’s very important that the line of code is checked thoroughly.
- Always Comment Your Code: Always write a short description of what your code is doing, so that if anyone has to run your program in your absence they know what each line of code does. It’s also helpful when you revisit the program in the future to make changes.
- Do A Proper Exploratory Data Analysis Of Your Data: For numerical/continuous variables, always check the number of missing observations, the average, the standard deviation, and the minimum and maximum values. Such information helps identify anomalies in the data at an early stage. For categorical variables, check the distribution by category. Looking at the categories of a categorical variable tells you whether any cleanup is required, such as combining categories to get a more even distribution, and whether the variable has values for all required categories. In model development, this is one of the main steps and is crucial for variable pre-processing.
- Look At The Record Level To Validate If A Variable Has Been Created Correctly: Let’s say you have data on five customers’ incomes and you want to create a variable Income_Flag which takes the value 1 if a customer has an income above 1500. After coding, you have Cust IDs 1 and 5 tagged as 0 and the rest tagged as 1. To verify the variable was created correctly, pick a few customer IDs at random (this example has just 5 observations, but in real life we deal with datasets with a huge number of observations, and sometimes we are interested in a particular set of customers) and check whether the Income_Flag values have been tagged correctly. This is a very effective technique, and it’s especially useful when a variable is created from the sum of several other variables.
- Format Output Well: Colouring, text size, table formatting, presentation formatting, charts, legends and so on. People might not remember what you said on a particular slide, but they will always remember a badly formatted graph with uneven axis values, unformatted text and a poor choice of font. If you have done some brilliant analysis that saves your organisation millions in cost each year, but you don’t present it in a format and language senior management can easily understand, trust me, all your hard work is going to be wasted. Always pay attention to choosing the right kind of graph, table format, font and colour combination to make your output and insights visually attractive.
- Remember The Approximate Mix Of Your Business Portfolio: This is in terms of products, geographies, etc., and it can help you with QA. For example, suppose you have been asked to fetch the total number of customers in a particular product category for your organisation, and your code fetches a number that is 20% more than the usual number. Keeping the rough portfolio numbers in mind lets you spot that the figure is off and recheck your code before the mistake makes it into the analysis.
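As a minimal sketch of the quality checks in the first tip, assuming pandas is available; the toy dataset below is purely illustrative:

```python
import pandas as pd

# Toy dataset for illustration only
df = pd.DataFrame({
    "cust_id": [1, 2, 2, 4],
    "income": [1200.0, None, 1800.0, 2500.0],
})

# Basic quality checks before any analysis starts
assert df["cust_id"].notna().all(), "cust_id has missing values"

duplicates = df[df["cust_id"].duplicated(keep=False)]
print(f"{len(duplicates)} rows share a duplicated cust_id")  # 2 here
print(f"{df['income'].isna().sum()} missing income values")  # 1 here
```

Checks like these take seconds to write and catch the kind of silent data problems that otherwise surface only after the results have been presented.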
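The commenting tip might look like this in practice; the file name, column names and threshold below are illustrative placeholders, not a prescribed convention:

```python
# Purpose : flag customers above an income threshold for follow-up analysis.
# Input   : a CSV with columns cust_id, income (file name is a placeholder).
# Output  : list of cust_ids whose income exceeds the threshold.
import csv

INCOME_THRESHOLD = 1500  # business-agreed cutoff; confirm before reuse


def high_income_customers(path):
    """Return cust_ids whose income exceeds INCOME_THRESHOLD."""
    with open(path, newline="") as f:
        # csv reads every field as a string, so cast income explicitly
        return [row["cust_id"]
                for row in csv.DictReader(f)
                if float(row["income"]) > INCOME_THRESHOLD]
```

A header block stating purpose, input and output, plus a comment on any non-obvious line, is usually enough for a colleague to run the program unaided.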
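The exploratory checks in the EDA tip can be sketched in pandas; the extract below is made up for illustration:

```python
import pandas as pd

# Made-up extract for illustration
df = pd.DataFrame({
    "income": [1200.0, 1800.0, None, 2500.0, 900.0],
    "segment": ["A", "A", "B", "C", "B"],
})

# Numerical variable: missing count plus summary statistics
print("missing income:", df["income"].isna().sum())
print(df["income"].describe())  # count, mean, std, min, quartiles, max

# Categorical variable: distribution by category
print(df["segment"].value_counts(dropna=False))
```

Two method calls cover most of what the tip asks for: `describe()` gives the numerical summary, and `value_counts()` shows whether any categories are sparse enough to merge.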
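The portfolio-mix sanity check from the last tip can be turned into a small helper; the expected count and tolerance below are hypothetical values, stand-ins for the rough numbers you actually keep in mind:

```python
# EXPECTED_CUSTOMERS and TOLERANCE are hypothetical placeholder values
EXPECTED_CUSTOMERS = 50_000  # approximate size of this product category
TOLERANCE = 0.10             # flag anything more than 10% off


def sanity_check(fetched, expected=EXPECTED_CUSTOMERS, tol=TOLERANCE):
    """Return True when the fetched count is within tolerance of expectations."""
    return abs(fetched - expected) / expected <= tol


print(sanity_check(52_000))  # True: within 10% of the usual number
print(sanity_check(60_000))  # False: 20% above, recheck the code first
```

Wiring a check like this into the end of an extraction script turns an informal gut feeling into an automatic guard against reporting a number that is wildly off.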
If there is anything else apart from the above (and I am sure there is), please share. I know most of this is common sense, but in our ever-busy lives as analysts we forget to do the basics and sometimes end up creating a mess. If we stick to the basics and always do the right thing, I am sure it will leave us with some free time we can use however we want.