Why should data engineers learn Scala?

Though not widely adopted, Scala can be a worthwhile language for data engineers to learn.

The data engineer builds the framework for a business’s data analytics pipeline, which makes it a key position in the space. To excel, a data engineer needs a solid grip on programming languages like Python and Java, along with a core understanding of data structures, databases and business goals. Python has recently emerged as the most in-demand language to learn.

Another programming language that is often overlooked in discussions of data engineering is Scala. Though it has grown in popularity in the recent past, it is not yet considered as important as the more widely used languages. But is Scala beneficial to a data engineer? And should data engineers really spend their time learning it?

What is Scala?

Scala supports functional as well as object-oriented programming. Its static types help avoid bugs in complex applications, and its Java Virtual Machine (JVM) and JavaScript runtimes help build high-performance systems.

Scala 2.13.7 was made available just a few months ago. The chief highlights of this release were support for Scala 3.1 in the TASTy reader, support for JDK 16 record syntax in Java sources, and improved Android compatibility.

Advantages of using Scala

Scala comes with certain advantages that have led to its adoption by big names in the tech space.

  • As Scala runs on the JVM, Java and Scala stacks can be mixed for seamless integration.
  • It offers data-parallel operations on collections, actors for concurrency and distribution, and futures for asynchronous programming.
  • We can mix multiple traits into a class in Scala to combine their interface and their behaviour.
  • Structural data types are represented through case classes in Scala.
  • The type system of Scala supports generic classes, variance annotations, abstract type members, compound types and more.
  • Scala has a simple, concise structure, which makes it suitable for big data processing.
  • The Scala Library Index (Scaladex) is a searchable map of all published Scala libraries. A developer can query more than 175,000 releases of Scala libraries.
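
To make a few of these points concrete, here is a minimal sketch of trait mixing, case classes and a generic class with a variance annotation. Every name in it (Named, Sized, Event, Batch, TraitsDemo) is hypothetical and chosen only for illustration, and the code relies on nothing beyond the Scala standard library.

```scala
// All names below are hypothetical and chosen only for illustration.

// Two small traits, each contributing both an interface and behaviour.
trait Named {
  def name: String
  def label: String = s"[$name]"
}

trait Sized {
  def bytes: Long
  def isLargerThan(limit: Long): Boolean = bytes > limit
}

// A case class models an immutable record and mixes in both traits.
final case class Event(name: String, bytes: Long) extends Named with Sized

// A generic class; the + is a variance annotation that makes Batch covariant.
final case class Batch[+A](items: List[A]) {
  def map[B](f: A => B): Batch[B] = Batch(items.map(f))
}

object TraitsDemo {
  val events: List[Event] =
    List(Event("click", 20L), Event("view", 35L), Event("click", 5L))

  // Group the events by name and sum the bytes within each group.
  val bytesByName: Map[String, Long] =
    events.groupBy(_.name).map { case (n, es) => n -> es.map(_.bytes).sum }

  // Behaviour inherited from Named is available on every Event.
  val labels: Batch[String] = Batch(events).map(_.label)

  // Case classes can be taken apart with pattern matching.
  def describe(e: Event): String = e match {
    case Event("click", b) => s"click of $b bytes"
    case Event(other, _)   => s"other event: $other"
  }

  def main(args: Array[String]): Unit = {
    println(bytesByName)
    events.map(describe).foreach(println)
  }
}
```

Because Event is a case class, equality, a readable toString and pattern-matching support come for free, which is much of why case classes are a natural fit for records moving through a pipeline.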

Why should a data engineer go for it?

In a YouTube video, Zach Wilson, tech lead at Airbnb, points out some reasons why learning Scala is important and how it can help data engineers in their careers. Read some of them here:

  • Many big tech companies like Netflix and Airbnb have bet heavily on Scala and write a lot of their pipelines in it, which indicates a strong need for data engineers who know Scala.
  • Scala is statically typed and checks types at compile time, whereas Python checks them only at runtime. This type safety provides an extra layer of protection.
  • Spark is written in Scala, so writing Spark jobs in Scala is the native way of writing them.
  • Scala allows data engineers to adopt a software engineering mindset. You are not just writing an SQL pipeline; you have to think about unit testing, integration testing, continuous integration, and similar points.
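
The type-safety and testing points can be illustrated with a small sketch. It is not taken from Wilson's video, and it deliberately uses plain Scala collections instead of Spark so that it runs without any dependencies. The Booking record, its fields and the PipelineSketch object are hypothetical names, and the code assumes Scala 2.13 or later (for partitionMap).

```scala
// Hypothetical record and field names, used only for illustration.
final case class Booking(id: Int, nights: Int, pricePerNight: Double)

object PipelineSketch {
  // Parse one comma-separated row. A malformed row becomes a Left carrying
  // the reason, so bad input has to be handled explicitly by the caller.
  def parse(row: String): Either[String, Booking] =
    row.split(",").map(_.trim) match {
      case Array(id, nights, price) =>
        try Right(Booking(id.toInt, nights.toInt, price.toDouble))
        catch {
          case _: NumberFormatException => Left(s"bad number in row: $row")
        }
      case _ => Left(s"wrong column count in row: $row")
    }

  // A pure transformation with no I/O, which keeps it easy to unit test.
  // Passing anything other than a Seq[Booking] is rejected by the compiler.
  def revenue(bookings: Seq[Booking]): Double =
    bookings.map(b => b.nights * b.pricePerNight).sum

  def main(args: Array[String]): Unit = {
    val rows = List("1, 2, 100.0", "2, x, 80.0", "3, 1, 50.5")
    // Split the parse results into errors and successfully typed records.
    val (errors, bookings) = rows.map(parse).partitionMap(identity)
    errors.foreach(e => println(s"skipped: $e"))
    println(s"revenue: ${revenue(bookings)}")
  }
}
```

Keeping parsing and transformation as separate pure functions is one way to apply the software engineering mindset described above: each function can be checked in isolation with a handful of assertions before it is wired into a larger job.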

Still not widely adopted

Wilson, in the same video, also points out certain reasons why learning Scala may not be beneficial to a data engineer. 

  • Scala is difficult to learn.
  • It is not widely adopted. Among data engineering job listings, only about 10% require knowledge of Scala. For someone who is not applying to those jobs, there is little point in learning it.

In the end, it depends on the data engineer’s needs and career goals. If they want to build a career at companies that rely heavily on Scala, it makes sense to learn the language well. If they want to develop a software engineering mindset that can help them solve analytical problems in the future, learning Scala is a good bet.

Sreejani Bhattacharyya
I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good. Reach me at sreejani.bhattacharyya@analyticsindiamag.com
