A lot has happened in the big data ecosystem in recent years. Firstly, the market has become seriously crowded with big data infrastructure vendors. While open source data infrastructure (MongoDB, Apache Hadoop, Apache Kafka, etc.) emerged as a valid answer to the big data problem, legacy data infrastructure providers (Oracle, IBM, HPE, Teradata) latched onto the opportunity to provide more viable, managed offerings. These legacy players also addressed their top challenges in supporting new and growing requirements:
- Improved scalability
- Deeper integration with Hadoop and NoSQL
- Better performance
Secondly, this “crowding up” has led to understandable confusion among customers, amplified by the fact that big data is a relatively recent and complex technology. It’s difficult to ascertain which features really matter for a given business use case.
The big data vendor landscape can broadly be divided into two kinds of players: the old guard (Oracle, HPE, Teradata, etc.) and the new entrants (Cloudera, Hortonworks, Pivotal, etc.).
It’s important to note that, given how new this technology is, the traditional database players (the old guard) moved very quickly to capture this area. Rightly so, as the demand for traditional database management solutions had plateaued and the advent of big data was seen as a disruption in this space. By our estimates, there are at least 50 players currently in the market with enterprise-grade big data offerings.
While early days were marked by vendors adopting and offering built-to-use big data solutions, what we are currently seeing is an all-out battle between these key players. The winner can easily take it all for years to come.
We analysed a few big data vendors to find out what is really beneficial for customers and how each one stacks up. This is a qualitative comparison, and the benefits of each solution may differ based on the customer’s specific use case, industry and other factors.
Deployment model: Cloud…any cloud
Being available in the cloud is a big feature, and most large vendors today have moved there. Traditional players like Microsoft, Oracle, HPE Vertica and Amazon were quick to make the move, even where the new entrants were not.
What differentiates these cloud big data vendors today is being cloud agnostic. Microsoft HDInsight runs on Azure and Amazon EMR runs on AWS, so each is tied to its own cloud. This is where HPE Vertica scores big.
The idea behind expanded cloud integration is to “enable users to run Vertica in the cloud, that is relevant to them, without being locked into a particular cloud.” This multi-cloud integration is in line with HPE’s strategy of helping customers accelerate their digital transformation. The latest release, HPE Vertica 8, adds support for Microsoft Azure Cloud and expands AWS support with access to S3.
On a global level, traditional database players have built custom hardware around commodity components, and their core revenues are derived from huge maintenance costs. Given their draconian pricing models and vendor lock-in periods, these database giants’ supremacy is now being challenged by cheaper, more cost-effective options.
Vertica has one of the more moderately priced licenses compared to other players (which also require a great deal of configuration effort). Apart from the big licensing fees charged by other vendors, most deployments also need large teams to deploy and run them. Vertica, on the other hand, requires few specialists, and its configuration requirements can be handled by traditional developers. It’s here that HPE Vertica has made a huge play and is nipping at competitors’ heels with the most price-effective option.
Moreover, the subscription-based model lowers the entry barrier for small players, and it seems to have paid off well in India. It allows SMBs to adopt the platform by paying less upfront, without worrying about “vendor lock-in.”
Cross-platform big data tools score higher here, and this is where most players falter. New entrants like Cloudera and Hortonworks are tied closely to Hadoop, and Databricks is built around Spark. Vertica 8, on the other hand, has extensive integrations with Apache Spark, HDFS and Kafka, which means that customers can analyse data where it is, without transforming or moving it.
Vertica is packed with built-in predictive modelling, sentiment and geospatial analytics capabilities that give it an edge over competitors. The latest release also adds native support for machine learning: parallel machine-learning algorithms have been brought inside Vertica, so that users can analyse data and make predictions without exporting it out of the database. Most platforms today do not offer built-in data science capabilities.
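To make the idea of in-database analytics concrete, here is a minimal sketch of fitting a straight line using nothing but SQL aggregates, so that only five summary numbers leave the database instead of every row. It uses Python's built-in SQLite module purely as a stand-in; it is not Vertica's machine-learning API, and the table and column names (`sales`, `ad_spend`, `revenue`) are hypothetical.

```python
import sqlite3


def fit_line_in_database(conn, table, x_col, y_col):
    """Fit y = slope * x + intercept by least squares using SQL aggregates only.

    The identifiers are interpolated directly into the SQL, so they must come
    from trusted code, never from user input.
    """
    query = f"""
        SELECT COUNT(*),
               SUM({x_col}),
               SUM({y_col}),
               SUM({x_col} * {y_col}),
               SUM({x_col} * {x_col})
        FROM {table}
    """
    # The database scans the rows; only these five numbers are returned.
    n, sum_x, sum_y, sum_xy, sum_xx = conn.execute(query).fetchone()
    if n < 2:
        raise ValueError("need at least two rows to fit a line")
    denominator = n * sum_xx - sum_x * sum_x
    if denominator == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = (n * sum_xy - sum_x * sum_y) / denominator
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# Hypothetical example data held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (ad_spend REAL, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 3), (2, 5), (3, 7), (4, 9)],
)

slope, intercept = fit_line_in_database(conn, "sales", "ad_spend", "revenue")
predicted = slope * 5 + intercept  # prediction for an unseen ad_spend of 5
```

The design point is the same one the release notes make: the heavy lifting happens next to the data, and the client only handles the fitted model.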
Here are our final thoughts. If you need sheer number-crunching ability on heavy data sets and prefer a SQL database, go for Vertica: data loads quickly and it is best for heavy-duty queries. In many ways, Vertica suits small and big enterprises alike, and it is best for intensive business intelligence. If you want a true columnar storage option, Vertica is the ideal fit, as it also has the best analytics platform capabilities.
With data management headed for the cloud, the database wars will now be fought in the cloud rather than on premises. Competitors such as Amazon Redshift, Cloudera and Snowflake Computing offer more cloud elasticity.
No matter where we stand today, the course from here on will decide much of big data’s future. Very soon, other players will adopt these features and be competition ready. The space itself might see changes, given how quickly big data technology is advancing. There’s no clear winner as of now, but that might change very soon.