Today is a pretty big day for advanced analytics on the Azure cloud, marked by a virtual event titled “Shape Your Future with Azure Data and Analytics,” which will feature Microsoft CEO Satya Nadella. During the event, the company will announce general availability (GA) of the latest version of its flagship cloud analytics service, Azure Synapse Analytics. It will also announce the public preview of a new companion data governance service called Azure Purview.
I’ve reported on Azure Synapse before. It is both an evolution of the former Azure SQL Data Warehouse and a complete revamp of that service to include significant Apache Spark-based data lake functionality. Synapse also sports integration with Azure Data Factory for data prep/data engineering, Power BI for business intelligence, Azure Machine Learning for AI, Cosmos DB and Azure Data Share. Until today, the data lake features and these integrations were in public preview; starting today, they’re GA.
But while the Synapse GA is significant, it makes even more acute the need for a solid first-party data governance solution on the Azure cloud. Yes, there was some semblance of this in Azure Data Catalog (ADC), but that service was more focused on metadata management than true data governance. While ADC could inventory, search and tag data sources, data sets and the columns/fields within them, it lacked important data classification and other governance capabilities, which limited its ability to help customers comply with data protection regulations like the European Union’s GDPR and California’s CCPA.
To be fair, the first-party catalog offerings on Amazon Web Services (AWS) and Google Cloud have underwhelmed as well. Perhaps that’s why Alation, a well-respected independent data catalog provider, announced yesterday a partnership with AWS. According to Alation’s press release, the partnership will allow use of Alation to “search & discover and govern data across AWS services including Amazon Redshift, Amazon EMR, Amazon Simple Storage Service (Amazon S3), AWS Glue, Amazon Relational Database Service (Amazon RDS), and Amazon Athena.”
Competitor issues aside, though, Microsoft customers and experts in its ecosystem have been told, literally for years now, that an ADC update was coming. Today, it has finally arrived. And just as the evolution of SQL Data Warehouse resulted in a rebranding, so too has the evolution of ADC, with its successor christened Azure Purview.
According to a blog post penned by Microsoft corporate vice president Julia White, Purview “helps discover all data across your organization, track lineage of data, and create a business glossary wherever it is stored: on-premises, across clouds, in SaaS applications, and in Power BI.” The integration with Power BI is important since that service had already introduced data governance features of its own, including integration with Microsoft Information Protection. Indeed, Purview will offer this as well, allowing users to apply sensitivity labels defined in Microsoft 365 Compliance Center to assets in the Purview catalog, just as Power BI users can do with that service’s reports and other assets.
In tandem with the Microsoft Information Protection integration, Purview will feature “AI classifiers that automatically look for Personally Identifiable Information (PII), sensitive data, and pinpoint out-of-compliance data,” according to White’s blog post. These capabilities should go a long way toward addressing the GDPR/CCPA compliance needs I mentioned above.
Purview will also, of course, integrate with Synapse, which should be a boon to key Microsoft customers who have built strategic solutions on that platform. In another blog post, Microsoft’s Chris Stetkiewicz describes how FedEx is using Synapse for its FedEx Surround project. Surround combines data from package scans (as many as a dozen scans for each of the 6 million packages FedEx delivers daily) with traffic and weather data to predict disruptions and remediate them by re-routing deliveries. The blog post goes on to explain that FedEx Surround will support the distribution of COVID-19 vaccines, leveraging IoT sensor data to monitor their location, map traffic conditions along the route and ensure the vaccine doses are preserved within the necessary temperature range.
Purview began as a multi-year internal effort to assist in Microsoft’s own digital transformation and privacy compliance efforts. Those needs are clearly non-trivial, so even if Purview is new to the public, it apparently arrives having already undergone rigorous internal use and testing at Microsoft. Mike Flasko, director of products for Azure Purview, said, “As we modernize and work through our own needs, we’ve learned a lot about what it takes to digitally transform Microsoft and manage data privacy.”
Flasko has been around the Microsoft data stack for a very long time. I first met him over a decade ago in a software design review on the Microsoft campus, when I was CTO at a Microsoft Gold Partner consultancy. Flasko is highly technical and has experience helming the revamp of Azure data-related services; in fact, he did so for the current version of Azure Data Factory, which was a night-and-day improvement over its predecessor.
All together now
Microsoft seems to get that offering a bunch of disparate data and analytics cloud services isn’t sufficient to enable its customers’ digital transformation efforts. Instead, it needs to help customers use those services together and provide the integrations and ancillary capabilities that make that possible. As CVP White says in her blog post, “too many businesses…have silos of skills and silos of technologies, not just silos of data.” The evolution of Azure SQL Data Warehouse to Synapse Analytics was designed to address that fragmentation. Hopefully the aptly named Purview will address it further.
Microsoft is a customer of Brust’s advisory firm, Blue Badge Insights. Brust is also a Microsoft Data Platform MVP and a member of Microsoft’s Regional Director program.