At its Ignite conference in Orlando today, Microsoft released a new version of its core relational database, SQL Server. The new version, SQL Server 2019, takes important capabilities from its previous releases (including the ability to run on Linux and in containers, and its PolyBase technology for connecting to big data storage systems) and greatly expands on them. SQL Server 2019 leverages PolyBase for full-on data virtualization and combines its Linux/container compatibility with Kubernetes (K8s) to deliver a new technology called Big Data Clusters (BDC).
As I covered the BDC technology in a previous post from last year’s Ignite conference, when the preview of SQL Server 2019 was announced, I’ll not repeat all the details here. In broad strokes, though, BDC is a K8s-based, multi-clustered implementation of SQL Server, combined with Apache Spark, YARN and HDFS (the Hadoop Distributed File System) to deliver a single platform that can take on OLTP (Online Transaction Processing), data lake and even machine learning requirements. It can be deployed to any K8s cluster, on-premises or in the cloud, including on Microsoft’s own Azure Kubernetes Service (AKS).
In a briefing with ZDNet, Microsoft’s Corporate Vice President, Azure Data Engineering, Rohan Kumar, took me through many of SQL Server 2019’s new capabilities beyond BDC, allowing me to focus on them in this post. So here goes.
To begin with, although the Linux and container compatibility is a big part of what makes BDC possible, so too is an expanded capability set in PolyBase. While PolyBase can still connect to Hadoop clusters and Azure storage, it can now connect to other SQL Server instances too. That’s what allows the BDC master node to communicate with the BDC compute, data and storage pools, and what allows the nodes in the storage pool to connect to data in the co-located HDFS storage. Microsoft also provides Azure Data Studio, a new cross-platform tool for T-SQL querying, notebook development and even running Spark jobs on BDC deployments, to tie everything together.
But that’s not the extent of PolyBase’s enhanced portfolio and importance. PolyBase can also connect to Oracle, Teradata, MongoDB and, therefore, Azure Cosmos DB. In addition, PolyBase can connect to any data source for which the customer possesses an ODBC driver. And, with that capability, Microsoft now sees SQL Server as able to take on data virtualization workloads, too. In other words, SQL Server can be a one-stop shop for connecting to multi-platform OLTP, NoSQL, data warehouse and data lake workloads. But because much of the connectivity is virtual, data can remain in its native repository, with as much of the query work as possible being “pushed down” (i.e. delegated) to the remote platform.
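To make the "pushed down" idea concrete, here is a minimal conceptual sketch in Python. It is not PolyBase and uses no SQL Server API: an in-memory SQLite database stands in for the remote platform, and the `orders` table, its columns and both function names are hypothetical, invented purely for illustration. The sketch compares pulling every row back and filtering locally against delegating the filter to the remote engine.

```python
import sqlite3

# Stand-in for a remote data source. In-memory SQLite is an assumption made
# for illustration only; the table and its contents are hypothetical.
remote = sqlite3.connect(":memory:")
remote.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
remote.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 10.0), ("west", 25.0), ("east", 40.0), ("north", 5.0)],
)


def query_without_pushdown(conn, region):
    """Fetch every row from the remote source, then filter on the local side."""
    rows = conn.execute(
        "SELECT id, region, amount FROM orders ORDER BY id"
    ).fetchall()
    matches = [row for row in rows if row[1] == region]
    return matches, len(rows)  # (result, rows transferred)


def query_with_pushdown(conn, region):
    """Delegate the filter to the remote engine so only matches come back."""
    rows = conn.execute(
        "SELECT id, region, amount FROM orders WHERE region = ? ORDER BY id",
        (region,),
    ).fetchall()
    return rows, len(rows)  # (result, rows transferred)


local_rows, local_transferred = query_without_pushdown(remote, "east")
pushed_rows, pushed_transferred = query_with_pushdown(remote, "east")

print(local_rows == pushed_rows)              # same answer either way
print(local_transferred, pushed_transferred)  # but fewer rows cross the wire
```

Both approaches return the same rows; the difference is how much data has to leave the remote system to get there, which is the motivation for delegating as much of the query as possible.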
Don’t neglect the (data)base
But what about the core engine, and down-and-dirty performance? There’s much to report here as well.
- SQL Server’s TempDB can now be configured as a memory-optimized database (using SQL Server’s In-Memory OLTP technology), which can vastly improve performance
- SQL Server gains persistent memory capabilities
- SQL Server’s query optimizer is now smarter, and able to allocate more resources as queries are processed
- The SQL Server platform also now supports UTF-8 character encoding (which may sound esoteric and picayune, but Microsoft’s got a long blog post explaining why it’s important and a longtime-requested feature)
- Developers can now execute Java code, in addition to code written in R and Python, from SQL Server scripts and stored procedures
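On the UTF-8 item above, one reason an encoding choice can matter is storage size. The short Python sketch below is only an illustration, assuming the alternative is a two-byte-per-character Unicode encoding such as UTF-16; it does not measure anything inside SQL Server, and the sample strings and the `encoded_sizes` helper are my own.

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for the given text.

    "utf-16-le" is used so that no byte-order mark inflates the count.
    """
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical sample values, written with escapes so the code points are explicit.
samples = {
    "ascii": "SQL Server 2019",
    "accented": "S\u00e3o Paulo",                       # Latin text, one accented letter
    "cjk": "\u30c7\u30fc\u30bf\u30d9\u30fc\u30b9",      # Japanese katakana, six characters
}

sizes = {name: encoded_sizes(text) for name, text in samples.items()}

for name, (utf8_bytes, utf16_bytes) in sizes.items():
    print(f"{name}: UTF-8 = {utf8_bytes} bytes, UTF-16 = {utf16_bytes} bytes")
```

For the mostly-ASCII strings, UTF-8 needs roughly half the bytes; for the katakana string it needs more. So the benefit depends on the data being stored.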
And, on the data availability and security fronts, there’s more:
- Accelerated Database Recovery helps deal with long-running transactions that are interrupted by power cuts or other severe disruptions, allowing the database to be recovered more quickly, independent of where in the transaction the failure occurred
- A new feature, called Secure Enclaves, enhances SQL Server’s Always Encrypted feature to be compatible with comparison operators beyond simple equality checks
Permutations and Combinations
Heading back to the BDC capabilities for a moment, I need to point out that SQL Server 2019 isn’t the only Microsoft product to integrate Apache Spark. Spark is a key component in two different HDInsight cluster types; Cosmos DB integrates Spark as well (though that capability is still in preview); and Spark is the very essence of Azure Databricks. Beyond those options, though, Microsoft is today, at Ignite, announcing a new Azure service called Synapse Analytics, which represents a third major rev of Azure SQL Data Warehouse combined with — you guessed it — Apache Spark. I cover Azure Synapse Analytics in a separate post in which I explain that Microsoft is effectively releasing two Spark/SQL Server technology mashups on the same day, at the same event. In that post I also try to explain the differences between the SQL Server 2019 and Azure Synapse Analytics use cases.
This confusion aside, the key takeaway with SQL Server 2019 is this: SQL Server practitioners continue to receive updates to the core platform and accessible adaptations of newer technologies from around the database and analytics world. It’s been that way for about three decades, and it doesn’t look like it’s going to stop any time soon. There’s power in a platform that defines its own culture and ecosystem, and Microsoft takes its care and feeding very seriously.
Brust is a Microsoft Data Platform MVP and has done work for various Microsoft data technology/product teams.