There are only 28 days in February this year, but for some companies, it’s still an action-packed month. Just a bit more than two weeks after announcing its blockbuster $1B Series G funding round, Databricks is today announcing that it’s available on Google Cloud, with an implementation that integrates with that cloud’s home-grown data services, including Google Cloud Storage, BigQuery and Looker.
Brick by brick
The joint press release from Google and Databricks says the latter is now “the only unified data platform across all three clouds.” While that may seem a bit hyperbolic, Databricks does indeed offer a premium Apache Spark-based platform that customers can make themselves at home with, on any one, or any combination, of the big three clouds.
Beyond the core Databricks Runtime (a proprietary implementation of Spark that the company has long said is 7x faster than the open source distribution), the platform features Delta Lake and Delta Engine for data “lakehouse” implementations, SQL Analytics for ad hoc querying and data visualization, and MLflow for managing machine learning operations that combine Spark ML with Databricks’ other facilities.
ZDNet spoke with David Meyer, Databricks’ SVP of Product Management, and Kevin Ichhpurani, Google Cloud’s Corporate VP. The two explained how the service is offered and how it integrates with other services on Google Cloud, and shared their vision for the partnership.
The Amazon Web Services implementation of the service is an offering from Databricks on the AWS Marketplace, whereas Azure Databricks is in fact a first-party offering from Microsoft. The Google Cloud version of Databricks lies somewhere in between.
On the one hand, it is a marketplace offering, and not Google-branded, making it somewhat like the AWS version. On the other hand, three things set it apart: Google Cloud and Databricks are executing a joint go-to-market effort; Google Cloud Marketplace offerings are billed together with other Google Cloud services, making Google the vendor of record; and Databricks on Google Cloud offers tight integrations with other Google services, the result of close partnership and collaboration. All of this makes the Databricks on Google Cloud offering functionally similar to Azure Databricks.
At a more detailed level, Databricks’ integration with Google’s data services includes pre-built connectors for BigQuery, Google Cloud Storage, and Pub/Sub. Also, machine learning workflows running on Databricks can leverage Google’s AI Platform as a compute service for training and as a hosting service for model deployment.
Speaking of compute and deployment, the Google Cloud implementation of Databricks is the first to run in a completely container-based fashion, leveraging Kubernetes natively. That means there’s also tight integration with Google Kubernetes Engine, and a corresponding ability for Databricks clusters to be provisioned and spun up very quickly.
Should GOOG stay or should it go?
While it’s all well and good for Databricks to run on all three major clouds, it brings up the question of Google Cloud’s position in that cohort. Revenue-wise, Google Cloud is still a distant number three, leading some in the industry to question Google’s staying power in the market, given that its core advertising business is so much bigger.
But, in the data and analytics ecosystem, Google Cloud has VIP status. For example, data warehouse darling Snowflake runs on Google Cloud, as does the cloud service for geo-distributed operational database CockroachDB. If that’s not enough, the press release lists Collibra, Confluent, Fishtown Analytics, Fivetran, Immuta, Informatica, Infoworks, MongoDB, Privacera, Qlik, Tableau and Trifacta as members of Google’s partner network.
Clearly, Google Cloud is serious about data, analytics and AI. As such, Databricks was smart to add it to its cloud collection. It’s one more proverbial brick in the wall, as the company readies itself for the next big thing.