SAS vs R vs Python: for many, this is not even the right question, since all three do an excellent job at what they set out to do. Years ago we saw similar debates over Mac vs Windows vs Linux, and today we know that there is a place for all three.
However, for budding professionals looking to build a career in Data Science, and for organizations building advanced analytics capabilities, this is an important question. Here we discuss some of the key aspects: ease of learning, data management capabilities, advanced modeling, graphical capabilities, big data applications and cost-effectiveness.
Cost, Upgrades and Support
This is one area where R and Python have the upper hand: both are open-source, and this has been a key factor in their phenomenal rise in usage.
SAS, on the other hand, is expensive licensed software, but it comes with excellent support. In critical areas and delicate scenarios where there is no room for experimentation, SAS has proven itself indispensable. Given the cost, however, SAS is often out of reach for small organizations, especially startups.
Most distributions of R and Python have no formal support system and come with absolutely no warranty, but the silver lining is their vibrant user communities. These rapidly evolving communities comprise people from various walks of life (academicians, students, programmers, analysts), all contributing and troubleshooting, and so improving what R and Python can do through packages and libraries.
Given the considerable cost advantage, small-to-mid-sized companies prefer to go with R or Python. R is preferred by companies that are primarily focused on advanced analytics, and it has pretty much become a lingua franca for Data Science. Python is preferred by tech companies that need end-to-end integration and want to develop analytics-based applications using analytics-friendly libraries.
Cost is considered one of the key reasons SAS is being eclipsed by R and Python, and we expect this trend to continue.
Ease of Learning
SAS is not difficult to learn. In fact, of the three it is perhaps the easiest, and it can be picked up by anyone without prior programming knowledge. Its ability to run SQL code, combined with macros and other native features, makes learning SAS child's play for individuals with basic SQL know-how.
Overall learning difficulty is low-to-medium
R has a steeper learning curve than SAS and Python. Because it is a full programming language rather than a procedure-driven tool, it calls for proficiency and some programming orientation. If not handled carefully, even a minor task can become Herculean and require complex lines of code.
Overall learning difficulty is medium-to-high
Although it started out as a scripting language, Python gets full marks for the simplicity and flexibility it gives users, along with its intuitive syntax. Given this, and its analytics-friendly libraries, Python has become a new phenomenon in the data analytics toolkit. While it doesn't take long to pick up Python, it does take some time to master.
Overall learning difficulty is medium
Data Handling and Management
When it comes to data handling and management on standalone systems, SAS is safe, smooth and the stronger option.
As data sizes keep increasing, we have to be careful about how the languages and software we use allocate memory. R has a major disadvantage here: by default it holds the data it works on in RAM, so even small exercises can take time to run, depending on your machine's memory. Data manipulation itself, however, is easy, especially with packages like plyr and dplyr.
In our experience this is much less of a problem in Python. With libraries like pandas and NumPy, among many others, data handling and basic analysis are a breeze in Python.
On data manipulation itself, however, SAS, R and Python all fare equally well.
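To make the point about pandas and NumPy concrete, here is a minimal sketch of a typical filter, group and aggregate step. The `sales` table and its column names are hypothetical, invented purely for illustration; only the pandas and NumPy calls themselves are real.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records, invented purely for illustration.
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "East", "South", "East"],
        "units": [12, 7, 5, 9, 14, 3],
        "unit_price": [2.5, 4.0, 2.5, 3.0, 4.0, 3.0],
    }
)

# Derive a new column with a vectorised operation (no explicit loop).
sales["revenue"] = np.round(sales["units"] * sales["unit_price"], 2)

# Keep rows with at least 5 units, then total units and revenue per region,
# ordered from the highest-revenue region to the lowest.
summary = (
    sales[sales["units"] >= 5]
    .groupby("region", as_index=False)
    .agg(total_units=("units", "sum"), total_revenue=("revenue", "sum"))
    .sort_values("total_revenue", ascending=False)
    .reset_index(drop=True)
)

print(summary)
```

The same steps map roughly onto a `PROC SQL` query in SAS or a dplyr pipeline in R, which is why the three tools end up comparable on pure data manipulation.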
Graphical and Visualization Utilities
In data science, visualization and graphical capabilities are important for understanding the data better. R wins hands-down here, with packages like ggplot2, lattice, ggvis, RGIS etc. Python has good graphical capabilities too, with packages like Matplotlib and VisPy, but compared to R they are still labyrinthine.
Base SAS has recently worked on improving its graphical capabilities, but the options available still do not match those in R and Python. The graph packages of SAS are also not as well documented as those of R and Python.
Hence, R definitely takes a lead here.
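For readers who have not used Matplotlib, the following is a minimal sketch of a labelled bar chart. The category names and counts are made up for illustration, and the `Agg` backend is chosen only so the script runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: no display needed
import matplotlib.pyplot as plt

# Made-up values, purely for illustration.
categories = ["A", "B", "C"]
values = [4, 7, 2]

# Create the figure and axes explicitly, then draw and label step by step.
fig, ax = plt.subplots(figsize=(5, 3))
bars = ax.bar(categories, values, color="steelblue")
ax.set_title("Hypothetical counts per category")
ax.set_xlabel("Category")
ax.set_ylabel("Count")
fig.tight_layout()

# Render to an in-memory PNG; pass a file path instead to save to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Each element (figure, axes, title, labels) is set through a separate call, which gives fine control but also shows why simple plots can feel more verbose than in a grammar-of-graphics package.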
Advanced Modeling
While SAS, R and Python are on the same footing for standard statistical and modeling capabilities, R and Python outpace SAS hands-down when it comes to advanced algorithms like machine learning and other more nuanced options.
R was designed to make statisticians' lives easy, so it has some field-specific advantages, with more than 7500 contributed packages and a list that keeps growing daily. These are distributed through CRAN, the Comprehensive R Archive Network. Thanks to R's ever-active and enterprising community, a number of the most recent techniques and experimental programs are available in R but not in SAS.
As for Python, with an incredible number of libraries like NumPy, pandas, SciPy, scikit-learn and Matplotlib, there is no strong reason to rate it below R.
As the Data Science field keeps advancing, SAS has some quick catching up to do if it wants to stay in the game for the long run.
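As an illustration of how little code a machine-learning workflow takes with these libraries, here is a minimal sketch using scikit-learn. The choice of the bundled Iris dataset, a random forest, and the specific split and seed values are our assumptions for the example, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Small example dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows for evaluation; the seed is arbitrary
# and fixed only to make the run repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a random forest and score it on the held-out rows.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

print(f"Hold-out accuracy: {accuracy:.2f}")
```

Swapping in a different estimator usually means changing only the line that constructs `model`, since scikit-learn estimators share the same `fit` and `predict` interface.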
Big Data Applications
When it comes to Big Data, most organizations look for end-to-end applications rather than ad-hoc or standalone analysis. This is where Python steals the thunder from R and SAS. It also helps that, alongside Scala and Java, Python is one of the main languages supported by Spark on Hadoop.
R, like Python, also integrates very well with Hadoop, and offers great parallelization and large-scale machine learning capabilities for analytics.
In recent years SAS has also come up with options to run analytics inside Hadoop (in-memory) without moving the data out of the cluster. But given the flexibility of open-source platforms, R and Python remain the first preference for Data Science professionals, and SAS needs to move at a faster pace to stay relevant.
Overall, while both R and Python score over SAS here, Python has a slight edge over R because it is a great platform for developing applications and putting them into a production environment.
Conclusion: Opportunities and Future Outlook
While SAS skills were once the passport to the world of analytics, SAS is slowly losing ground and doesn't command the same position anymore. Open-source has been gaining strong ground for the last couple of years, and the trend is expected to continue.
For analytics aspirants, it has become very important to add at least one open-source skill alongside SAS to their quiver. Over the last year, we have particularly seen candidates with R skills in addition to SAS knowledge gain the upper hand in job opportunities, command better salaries and target a wider variety of jobs. For aspirants from a technical background, Python offers great opportunities to build on their existing experience and make headway into Big Data analytics.
It is difficult to make a conclusive argument about these three technology platforms, because the choice among them depends on parameters like the nature of the industry, strength of the user community, flexibility of usage and integration, future outlook, etc. But most of our clients, even the BFSI ones where we thought SAS would be indispensable, are introducing R and Python in different pockets of their organizations to leverage the benefits that open-source has to offer. We definitely see a future where SAS will co-exist, maybe not on an equal footing, with newer technologies like R and Python.