Every spring, Microsoft’s Build conference has served as a vehicle for numerous Microsoft data platform announcements, most of them pertaining to the Azure cloud. Although this year’s event is virtual, the data platform announcements are still abundant. This year, though, they seem less about raw capabilities and more about fit, finish, and integration. The timing seems good: In the era of the COVID-19 pandemic, customers need lower-friction routes for implementing solutions more than they need shiny new standalone capabilities.
Synapses connect neurons; Azure Synapse connects services
Azure Synapse Analytics is, by far, the major nexus of integration in the Azure Analytics stack. Announced at Microsoft’s Ignite conference in November, Synapse is the next iteration of what was Azure SQL Data Warehouse. Going well beyond a re-brand, though, Synapse also integrates data lake and data science capabilities.
These extra features are made possible through the integration of Apache Spark and Synapse SQL, a standalone query engine that allows queries in the company’s Transact-SQL (T-SQL) language to be executed directly against files in Azure Data Lake Storage (ADLS). Synapse provides an outer layer of integration, too, since its Synapse Studio browser-based development environment brings Azure Data Factory and Power BI under the Synapse umbrella. And all of this can be accessed via provisioned infrastructure or in an on-demand/serverless fashion.
When these important new features were announced at Ignite, it was in the context of a private preview release. Today, at Build 2020, Microsoft is announcing that they are advancing to public preview.
Microsoft taps HTAP
The Synapse announcements aren’t just about wider availability of previously-announced features, however. The company is announcing, also in public preview, new “Azure Synapse Link” functionality, a cloud-native implementation of hybrid transactional-analytical processing (HTAP). The HTAP moniker describes platforms that can deliver analytics on existing operational data, without that data needing to be transformed or moved.
In the case of Synapse Analytics, this is achieved by extending the standalone T-SQL query capabilities beyond ADLS, to work over data stored in Azure operational data services, beginning with Cosmos DB, which makes data available to Synapse Link in a columnar structure. The result is that the T-SQL query engine can query data in the lake and in operational stores, making the hybrid capabilities a reality. Microsoft has also committed to making Synapse Link available for Azure SQL Database, Azure Database for PostgreSQL, and Azure Database for MySQL.
Cosmos DB: NoSQL for developers
Speaking of Cosmos DB, Microsoft has a series of announcements there too. For instance, new autoscale (originally named autopilot) and serverless modes of operation are being announced, allowing better alignment of billing to active usage. Autoscale works by adjusting provisioned request units between 10% and 100% of a customer-declared maximum, based on demand. Serverless, meanwhile, implements per-operation compute pricing.
These options should make Cosmos DB more price-effective and generally appealing to the broad range of developers, many of whom have smaller workloads than Cosmos DB’s geographically-distributed capabilities were previously positioned to accommodate. Cosmos DB is now being positioned as “Microsoft’s fast NoSQL database with open APIs at any scale.”
In other words, Microsoft is grooming Cosmos DB as its developer-friendly NoSQL database (perhaps to compete with MongoDB, and certainly with Amazon’s DynamoDB) that accommodates massive global web-scale applications but can also be cost-effective for much smaller ones. And along those developer-friendly lines, Microsoft is also announcing a new release of the Java SDK for Cosmos DB (version 4) that should make programming against Cosmos much more streamlined and easy for the huge stable of developers in the Java ecosystem. New Cosmos DB Change Feed capabilities, like delete functionality, should sweeten the deal as well; so too should a new point-in-time backup/restore feature.
All new Cosmos DB features will be generally available (GA) this summer.
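To make the autoscale range described earlier concrete, here is a minimal Python sketch of the clamping arithmetic. It is only an illustration of the 10%-to-100% rule as stated in this article: the function name `autoscale_throughput` and its parameters are hypothetical, not part of any Cosmos DB SDK, and the real service may scale in different increments or on a different schedule.

```python
def autoscale_throughput(max_ru: int, demand_ru: float) -> int:
    """Return the RU/s an autoscale-style policy would provision.

    Assumption (from the article's description, not from the service's
    documentation): provisioned throughput follows demand but is clamped
    to the range [10% of max_ru, max_ru].
    """
    if max_ru <= 0:
        raise ValueError("max_ru must be positive")
    floor_ru = max_ru // 10  # 10% of the customer-declared maximum
    # Clamp demand into [floor_ru, max_ru]
    return int(min(max(demand_ru, floor_ru), max_ru))


if __name__ == "__main__":
    # Hypothetical workload samples against a declared maximum of 4,000 RU/s
    for demand in (50, 2500, 9000):
        print(demand, autoscale_throughput(4000, demand))
```

Under these assumptions, an idle workload is still held at the 10% floor rather than at zero, which is the gap the serverless, per-operation mode is meant to cover.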
Open-source databases, too
In addition to Microsoft’s proprietary NoSQL database, the company is adding new capabilities to two of its three open-source relational database services as well: Azure Database for MySQL and Azure Database for PostgreSQL.
Both platforms gain Azure Active Directory authentication, Private Link, and three-year reserved instances pricing, all of which are GA. In addition, data encryption at rest with customer-managed keys (“BYOK”) will be released in preview next month. On top of those features, Database for PostgreSQL gets two more: logical decoding with wal2json (released in preview) and online migration to Azure Database for PostgreSQL Hyperscale using Azure Database Migration Service (GA).
Edge and beyond
At Build, Microsoft is also announcing that its SQL Edge product is now available in public preview. Announced at last year’s Build conference under the name “Azure SQL Database Edge,” the product is a version of Azure SQL Database that can run on small edge devices, including those based on ARM processors. SQL Edge also integrates a specially-implemented version of Azure Stream Analytics.
With SQL Edge’s public preview, it’s important to note that Microsoft’s T-SQL language now works at the edge, on-premises and in the cloud, on relational and NoSQL operational data, the data warehouse, the data lake and in HTAP implementations. All of which builds (ahem) on the integration theme in this year’s announcements, and the ethos of scaling both up and down.