The data engineer builds the framework for the business’s data analytics pipeline, which makes it a key position in the space. To excel, a data engineer needs a solid grasp of programming languages like Python and Java, along with a core understanding of data structures, databases and business goals. Python has recently emerged as the most in-demand language to learn.
Another programming language that often goes unmentioned in discussions of data engineering is Scala. Though it has grown in popularity in recent years, it is not yet seen as being as important as other popular languages. But is Scala beneficial to a data engineer? And should data engineers really spend their time learning it?
What is Scala?
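Scala is a statically typed programming language that combines object-oriented and functional programming and runs on the Java Virtual Machine (JVM). Scala 2.13.7 was made available just a few months ago. The chief highlights of this release were TASTy reader support for Scala 3.1, support for JDK 16 record syntax in Java sources, and improved Android compatibility.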
Advantages of using Scala
Scala comes with certain advantages that have driven its adoption by big names in the tech space.
- As Scala runs on the JVM, Java and Scala stacks can be mixed for seamless integration.
- It offers data-parallel operations on collections, actors for concurrency and distribution, and futures for asynchronous programming.
- We can mix multiple traits into a class in Scala to combine their interface and their behaviour.
- Structural data types are represented through case classes in Scala.
- The type system of Scala supports generic classes, variance annotations, abstract type members, compound types and more.
- Scala’s concise syntax makes it well suited to big data processing frameworks.
- The Scala Library Index (Scaladex) is a searchable index of all published Scala libraries. Through it, a developer can query more than 175,000 releases of Scala libraries.
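To make the points about traits and case classes more concrete, here is a minimal sketch. All names in it (`Describable`, `Validated`, `Dataset`, `summary`) are hypothetical and chosen only for illustration. It shows one case class mixing in two traits and then being taken apart with pattern matching.

```scala
// Illustrative sketch only: every name here is hypothetical.

// Two small traits, each contributing an interface plus some behaviour.
trait Describable {
  def name: String
  def describe: String = s"source '$name'"
}

trait Validated {
  def rowCount: Long
  def isValid: Boolean = rowCount > 0
}

// A case class models structural data and mixes in both traits.
// Its constructor parameters implement the abstract members above.
final case class Dataset(name: String, rowCount: Long)
    extends Describable with Validated

object TraitsAndCaseClasses {
  // Pattern matching works directly on the case class structure.
  def summary(d: Dataset): String = d match {
    case Dataset(n, 0L)   => s"$n is empty"
    case Dataset(n, rows) => s"$n has $rows rows"
  }

  def main(args: Array[String]): Unit = {
    val orders = Dataset("orders", 1200L)
    val empty  = orders.copy(rowCount = 0L) // copy is generated for case classes

    println(orders.describe) // source 'orders'
    println(summary(orders)) // orders has 1200 rows
    println(empty.isValid)   // false
  }
}
```

The case class gets equality, `copy` and pattern-matching support without extra code, while the traits keep the shared behaviour reusable across other classes.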
Why should a data engineer go for it?
In a YouTube video, Zach Wilson, tech lead at Airbnb, points out some important reasons why learning Scala matters and how it can help data engineers in their careers. Some of them are:
- Many big tech companies like Netflix and Airbnb have bet heavily on Scala and write a lot of their pipelines in it, which indicates they will have a strong need for data engineers who know Scala.
- Scala is statically typed, whereas Python is dynamically typed. Because the compiler checks types before a job runs, this type safety provides an extra layer of protection.
- Spark itself is written in Scala, so writing Spark jobs in Scala is the native way of writing them.
- Scala allows data engineers to adopt a software engineering mindset. You are not just writing an SQL pipeline; you have to think about unit testing, integration testing, continuous integration, and similar practices.
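The type-safety and testing points above can be sketched in a few lines of plain Scala. This is an illustration under stated assumptions, not code from the video: the `Order` record and the `totalRevenue` function are hypothetical, and an ordinary `Seq` stands in for a real pipeline such as a Spark job.

```scala
// Illustrative sketch only: Order and totalRevenue are hypothetical names.

// Each field has a declared type, so malformed records are rejected
// by the compiler rather than discovered when the pipeline runs.
final case class Order(orderId: Long, quantity: Int, unitPrice: BigDecimal)

object TypedPipeline {
  // A small, pure transformation that is easy to unit test.
  def totalRevenue(orders: Seq[Order]): BigDecimal =
    orders.map(o => o.unitPrice * BigDecimal(o.quantity)).sum

  def main(args: Array[String]): Unit = {
    val orders = Seq(
      Order(1L, 2, BigDecimal("100.00")),
      Order(2L, 3, BigDecimal("50.50"))
    )

    // A tiny check in the spirit of a unit test.
    assert(totalRevenue(orders) == BigDecimal("351.50"))

    // The next line would not compile, because "two" is not an Int:
    // Order(3L, "two", BigDecimal("10.00"))

    println(totalRevenue(orders))
  }
}
```

In a dynamically typed language, passing a string where a number is expected typically surfaces only when that line executes. Here the compiler rejects it before the job is ever deployed.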
Still not widely adopted
Wilson, in the same video, also points out certain reasons why learning Scala may not be beneficial to a data engineer.
- Scala is difficult to learn.
- It is not widely adopted. Only approximately 10% of data engineering job listings require knowledge of Scala. For someone who is not applying to those jobs, there is little point in learning Scala.
In the end, it depends on the data engineer’s needs and career goals. If they want to build their career in companies that largely use Scala, it would make sense to learn the language well. If they want to build a software engineering mindset that can help them solve analytical problems in the future, learning Scala is a good bet.