While the COVID-19 pandemic has had a huge impact on people, functions and processes in innumerable ways, it has also accelerated the adoption of digital transformation across business and social sectors. The industry needs to ramp up quickly on the skills required to manage this rapid digitalisation. One of these critical skills is Data Engineering: in fact, the DICE report of 2020 labelled Data Engineering (DE) the fastest-growing tech job, with 45% year-on-year growth.
Praxis Business School, a pioneer of formal Business Analytics/Data Science education in India, is launching a 9-month full-time post-graduate program in Data Engineering to address the business need for people with these skills. The course is supported by industry giants Genpact and LatentView, who are providing industry inputs and know-how to strengthen the program.
Analytics India Magazine caught up with Anandi Thyagarajan of Genpact, Ramesh Hariharan of LatentView and Charanpreet Singh of Praxis Business School to understand the subject of Data Engineering and details about the program that Praxis is launching.
Anandi Thyagarajan is the Vice President, Global Analytics, AI & Risk Operations Leader, Banking & Capital Markets at Genpact. She has more than 22 years of cross-functional experience in Banking and Financial Services across global organisations and geographies. At Genpact, she drives the advanced analytics, journey mapping, machine learning, artificial intelligence, and robotics capabilities to build an operating model for banks that is geared to deliver transformational impact on business outcomes. Apart from work, she is a certified master black belt, a design thinking practitioner and a fitness enthusiast.
Ramesh Hariharan is the co-founder of LatentView and currently heads the data and technology practice. With deep expertise in building large scale data ecosystems for advanced analytics and real-time analytics, he has been instrumental in driving and delivering on crucial transformation initiatives for clients across diverse domains. Apart from work, he is an avid reader and likes to keep himself updated on the latest technology trends.
Prof. Charanpreet Singh is the founder and director of Praxis Business School Foundation. He has been a part of the corporate world for 20 years in industries as varied as Cryogenics, Steel, International Trade, Consulting and IT with organisations such as British Oxygen, Tata Steel, PwC and Compaq-HP. At HP he was Country Manager, Marketing for SMB when he decided to switch to his first passion, academics, and embarked on a mission to set up a Business School of the highest quality in the country. Charanpreet believes in creating an academic environment that encourages learning through doing, discussion and debate, and envisions Praxis as a centre of excellence for the delivery of education in skills required for the digital world.
Compared to roles like Data Science and Business Analytics, Data Engineering is a newer profile. Can you tell me a little about what a Data Engineer really does? How is it different from Data Science and why is there such a buzz around this domain?
Anandi: With the advent of artificial intelligence and machine learning in business, profiles such as data scientist, business/data analysts, and data engineers are now emerging into individual specialised skill sets as described below:
– Data Scientists: develop and use advanced analytical techniques such as neural networks, decision trees, clustering, etc., to derive business insights from data. They have deep expertise in artificial intelligence, machine learning, advanced statistics, and data handling.
– Business/Data Analysts: translate quantitative data into a form that can be understood even by non-technical employees across the enterprise. They have expertise in programming languages such as Python or R, statistical tools such as Excel, SQL, SAS, data handling fundamentals, business reporting, and modelling.
– Data Engineers: are responsible for data plumbing, integration, cleansing, pairing, and data preparation for operational or analytical purposes. They have expertise in the construction, development, and maintenance of data architecture and big data. Good data engineering is a critical element of any advanced analytics team that needs to design, build, and continuously improve analytics infrastructure.
The buzz around data engineering is based on the critical role it plays in real-time analytics and the massive gap between the growing demand for these skills and the supply of trained professionals.
Ramesh: Data engineering has always been around. We see data engineering as a complementary and essential component of analytics platforms. In the last decade, analytics has driven great value. As organisations go digital, engineering data platforms is the right step to ensure that there is enough data capital for enterprises to tap into customer intent, provide competitive intelligence and make recommendations for upsell/cross-sell. The buzz is becoming evident as the need is far greater and clearer than ever before. Without engineering a reliable and robust data platform, there will be no way to scale up analytics!
Charanpreet: The way we describe this to students is that Data Engineers are focused on building infrastructure for data generation, whereas Data Scientists focus on advanced techniques using statistics, machine learning, and deep learning to draw insights from that generated data. So in a sense, data scientists are the internal clients of data engineers. While this is clear to practitioners, students and aspirants need to be educated so that they understand which of these roles suits them better.
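To make the "cleansing and data preparation" work described above a little more concrete, here is a minimal sketch in Python. It is an illustration only and is not taken from the interviewees or the Praxis curriculum; the sample records, the column names and the `clean_rows` helper are all hypothetical. It shows a data engineer's side of the handover: trimming and normalising raw values, dropping incomplete and duplicate rows, and producing records that an analyst or data scientist can use directly.

```python
import csv
import io

# Hypothetical raw export from a source system (assumed data, for illustration).
RAW_CSV = """customer_id,email,amount
101, Alice@Example.com ,250.00
102,bob@example.com,
101, Alice@Example.com ,250.00
103,carol@example.com,99.5
"""


def clean_rows(raw_text):
    """Trim whitespace, normalise email case, drop incomplete and duplicate rows."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        email = row["email"].strip().lower()
        amount = row["amount"].strip()
        if not amount:
            # Incomplete record: no amount, so it is skipped.
            continue
        record = (int(row["customer_id"]), email, float(amount))
        if record in seen:
            # Exact duplicate of a record already kept.
            continue
        seen.add(record)
        cleaned.append(
            {"customer_id": record[0], "email": record[1], "amount": record[2]}
        )
    return cleaned


# The "internal client" (an analyst or data scientist) consumes prepared data.
prepared = clean_rows(RAW_CSV)
total = sum(r["amount"] for r in prepared)
```

In a real platform the same steps would typically run on a distributed engine and write to a warehouse or data lake; the standard-library version here only shows the shape of the work.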
What are the top 5 things a Data Engineer needs to be good at?
Anandi: We at Genpact look for candidates that not only have the ability to maintain and govern various types of data across various infrastructures but can also ensure that data quality levels are maintained. Thus, data engineers should aim at being proficient in handling big data and comprehending the domain relevance of data. Some of the key skill sets required in data engineering are:
– Ability to integrate with diverse APIs, understanding of multiple programming languages such as Java, SQL, SAS, Python, proficiency in data handling frameworks such as Hadoop, MapReduce, Pig, Hive, Apache Spark, NoSQL, and Data Streaming.
– Knowledge of Python, MATLAB, R, and familiarity with distributed storage and processing tools like Hadoop or Spark is a big plus.
– Ability to devise and implement processes that improve data reliability, efficiency, and quality, and identify, acquire, and assimilate new data for processing by data scientists.
– Understanding of domain-specific regulatory directives (HIPAA for Healthcare) and geography-specific compliance guidelines (GDPR for EU region) on the utilisation of Personally Identifiable Information (PII) and other sensitive data.
– Ability to partner with data scientists and domain experts to ensure feature selection leads to the development of responsible and ethical AI/ML models.
Ramesh: Engineering & Data, obviously! While Anandi has explained the skills in a comprehensive manner, some of the on-ground tools, techniques, technologies, and concepts that Data Engineers need to be good at are:
– Scripting (Shell, Python, Scala, Go)
– Logic (Data Structures & Algorithms)
– Pipelines (Airflow, Nifi, etc.)
– Platforms & Tools (Cloud: AWS, Azure, GCP, etc.; Distributed Systems: Spark, Kafka; Data Warehouses; Containers, etc.)
– Concepts (Data Warehousing, Data Lake, Streaming, Observability)
Charanpreet: In addition to having a thorough knowledge of the tools and technologies, I feel that to be a successful data engineer, you need to be good at traits such as desire and ability to learn, critical thinking, creative thinking, problem-solving, communication, collaboration, and ethical judgment. We can call them soft skills — or fundamental traits — that go a long way in ensuring that you have a sustainable career in a rapidly evolving, complex, multi-layered role like data engineering.
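As a small illustration of the "Pipelines" item in Ramesh's list, the sketch below models a pipeline as tasks with dependencies and runs them in a valid order. It is a rough sketch under stated assumptions, not the API of Airflow or Nifi: it uses only Python's standard-library `graphlib` module (Python 3.9+), and the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "validate": {"extract"},
    "load": {"clean", "validate"},
}


def run_pipeline(deps, tasks):
    """Run each task only after all of its dependencies have run."""
    executed = []
    for name in TopologicalSorter(deps).static_order():
        tasks[name]()
        executed.append(name)
    return executed


# Stand-in task bodies that only record that they ran.
log = []
tasks = {name: (lambda n=name: log.append(n)) for name in dependencies}
order = run_pipeline(dependencies, tasks)
```

Here `extract` always runs first and `load` always runs last, while `clean` and `validate` may run in either order because neither depends on the other. Production schedulers add retries, scheduling and monitoring on top of this basic dependency ordering.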
How is your program structured? Does one really require a full-time 9-month program for learning and getting a job in DE?
Charanpreet: Let me answer your second question first. Praxis was the first institute in the country to offer a formal, full-time, one-year program in Business Analytics. It will be the first to offer a full-time program in Data Engineering as well. This is driven by the belief that subjects like data science, analytics, and data engineering are complex, and that for a sustainable career in these fields, a well-structured, full-time, in-class immersive program is the right option. You invest a period of nine months for a career that will give you handsome returns for the next 20-30 years.
The Praxis program is designed to arm students with the concepts, tools, techniques, and technologies that enable a seamless absorption into the world of data engineering. We offer students a comprehensive experience and a deep-dive that ensures extensive coverage, rigour, and hands-on lab work to equip them with the know-how of existing tools and technologies for Data Management & Data Modelling. It also introduces them to the paradigms of Distributed Systems and Cloud Computing. The participants will get to work on a Capstone project that sees them taking data from a legacy system and migrating it to a big-data platform hosted on the Cloud.
The program is structured into three trimesters covering, respectively, modules on:
– Working with traditional data
– Engineering platforms for Big Data
– Running Enterprise Business on Cloud
Two important features of the program are the incredible industry partners we have been able to find in Genpact and LatentView, and the comprehensive Campus Placement Program we have in place for our students.
Why did you decide to partner with Praxis for this program? How are you adding value to the course?
Anandi: At Genpact, we believe that talent is one of the critical factors for success. As the adoption rate of augmented intelligence-led solutions is skyrocketing, the demand for data engineers is increasing exponentially. To strengthen our talent pool of data engineers, we leverage our partnerships with premier institutions like Praxis.
Through this partnership, we aim to nurture a new generation of students who can apply their knowledge of data engineering, big data management, and data governance to solve real-world business challenges. Praxis can leverage Genpact as a centre of excellence for AI/ML and advanced analytics, allowing students to gain valuable industry experience and making their curriculum more industry-centric.
Ramesh: We believe Praxis provides a strong ecosystem for developing an engineering and analytics mindset. We recruit from Praxis for our data science requirements and believe that Praxis has the capability to create a sound program in data engineering.
LatentView has decided to be an academic partner for the Praxis data engineering program and provide the necessary support to bridge the gap between academia and industry.
Who should seek a career in Data Engineering? What does the future look like for someone who embarks on it at this time?
Anandi: One of the recent challenges in the AI/ML field has been the question, “Why do 87% of data science projects never make it into production?” (Source). This has shifted the focus to how to enable seamless operationalisation of AI/ML models in any production environment (on-premise, cloud-enabled, or hybrid). This means data engineers have great opportunities, from taking up pivotal roles like ML engineers and ML architects to orchestrating the availability and accessibility of appropriate data for data scientists. Some of the other roles include ensuring that the code for AI/ML algorithms follows software engineering principles and defining the overall architecture strategy for operationalising and maintaining AI/ML models.
Further, as the demand for data engineers is going up, they should also gain expertise in handling big data on cloud-based technologies like AWS, Azure, Google Cloud Platform, Cloudera, and others. Advanced technologies will play a much more significant role for data engineers in the years to come.
Ramesh: Anyone interested in building data platforms or instrumenting workflows and decision making can seek out a career in data engineering. There is a pressing need to revisit the state of data platforms. We see enterprises at all levels taking serious initiatives to re-architect and modernise their data ecosystems to become nimbler and more scalable. This is an excellent opportunity to contribute in the coming years!
Charanpreet: For us, a good aspirant is a graduate who has a tech background (academic and/or workplace-related) or a keen interest in tech, has the curiosity and keenness to learn new things, likes to solve problems, and is serious about a career in the data field. Our ecosystem and the rigour of the program will take care of the rest. Our objective is to make the student capable of transitioning successfully into the data engineering domain.
Any last words of advice to the aspirants?
Anandi: Analytics-focused enterprises need to have the right kind of talent – ones who can innovate business models and set up a data strategy and infrastructure. The rapid pace of change in the artificial intelligence and advanced analytics world calls for a culture of continuous learning from data engineers. This will enable data engineers to remain up to date with cutting-edge technologies in their field, understand domain-specific trends, and stay cognizant of best practices for developing data infrastructures. A curiosity to know how things work and improve them is an essential trait for anyone aspiring to be a data engineer.
Ramesh: Algorithms without accurate data won’t help in decision-making. Even worse, they could lead to bad decisions. It is essential to acknowledge that, and to recognise how critical your future role as a data engineering leader will be in enabling faster and smarter decisions.
Charanpreet: If you like technology, are fascinated by how artificial intelligence and analytics are changing the world, and would like to be a part of the team that makes all this happen — look seriously at a career in data engineering. Invest your time in getting trained in a structured manner in skills that are in great demand and be an architect of the change happening all around us.