Privately held Informatica is announcing the closing of its deal to acquire Compact Solutions, a provider of a metadata lineage tool that will eventually plug into the company’s Enterprise Data Catalog offering. Compact Solutions is a nearly 20-year-old company that, like many tool providers, began life as a consulting firm and eventually pivoted to packaged software based on tools it was using in its client engagements.
Currently numbering roughly 50 staff, Compact Solutions has a development center in Krakow, Poland. It has been an Informatica partner since 2018, and the two companies share roughly a dozen joint customers.
The context for all this is the push to build a more complete data catalog. It’s a challenge that has snowballed given that in most organizations the data estate has grown more scattered than ever, spanning not only multiple data types but, increasingly, multiple clouds plus on-premises systems. As with the traditional push for master data management, there is the need to identify the golden copy of data. And with growing security, risk mitigation, and data privacy mandates, there is the urgency to collect comprehensive data lineage that shows not only the provenance of the source, but how, when, and where data is consumed, and by whom.
As Big on Data bro Andrew Brust has pointed out, data catalogs have spread like wildfire, and they come in many different flavors: different products do different things. With them has come an explosion of data lineage information: metadata management tools collect it, and so do many analytics tools and data warehouses. It gets to the point where there are so many sources of data lineage that data officers are often dealing with Rashomon-like scenarios of choosing which slice of reality to designate as the definitive source(s).
That’s where Informatica, with its Enterprise Data Catalog (EDC), comes in, and where Compact Solutions fills a gap. Informatica is not promising that its catalog is the catalog to end all catalogs; rather, it positions its offering as complementing those, such as AWS Glue or Azure Data Catalog, that are masters of their own domains. EDC can ingest the content from these and integrate it with metadata from the rest of the enterprise, making, in effect, a “catalog of catalogs.”
Compact Solutions accesses metadata from sources that have proven elusive to most data catalog offerings. While most catalogs can take in metadata that is static, Compact Solutions’ tool, MetaDex, can parse dynamic scripts, such as stored procedures in databases, or even simple SQL statements that are executed conditionally. It supports roughly a couple dozen metadata sources, including the specific flavors of SQL from SQL Server, Oracle, and Teradata; BI and analytics apps from providers like SAP, SAS, and Microsoft; third-party ETL tools; and mainframe COBOL code. In so doing, it fills in a missing piece of Informatica’s Enterprise Data Catalog: support for these more difficult data sources.
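To make the idea of pulling lineage out of conditionally executed SQL a bit more concrete, here is a deliberately simplified Python sketch. It is not how MetaDex or EDC works internally (neither company has published that detail here); the function name `extract_lineage`, the regular expressions, and the sample table names are all hypothetical, and a production tool would rely on a full, dialect-aware SQL parser rather than pattern matching. The sketch only shows that both branches of an IF/ELSE in a stored procedure body contribute lineage edges, even though only one branch runs at any given time.

```python
import re

# Hypothetical patterns for illustration only; a real lineage tool would use
# a dialect-aware SQL parser instead of regular expressions.
_TARGET = re.compile(r"\binsert\s+into\s+([\w.]+)", re.IGNORECASE)
_SOURCE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_lineage(sql):
    """Return sorted (source_table, target_table) pairs found in INSERT ... SELECT statements."""
    edges = set()
    for statement in sql.split(";"):
        target = _TARGET.search(statement)
        if not target:
            continue  # no INSERT INTO in this statement, so no write target
        for source in _SOURCE.findall(statement):
            edges.add((source.lower(), target.group(1).lower()))
    return sorted(edges)


# Made-up stored procedure body: which INSERT runs depends on a parameter,
# so static inspection has to record the lineage of both branches.
procedure_body = """
IF @region = 'EU'
    INSERT INTO dw.sales_eu SELECT * FROM staging.orders o JOIN ref.customers c ON o.cust_id = c.id;
ELSE
    INSERT INTO dw.sales_row SELECT * FROM staging.orders;
"""

for source, target in extract_lineage(procedure_body):
    print(f"{source} -> {target}")
```

Even this toy version hints at why such sources are hard for catalogs: the lineage is buried in procedural code rather than declared in a schema, and each SQL dialect adds its own syntax on top.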
Informatica plans to wrap the MetaDex functionality into EDC and apply the technology to tap other emerging sources, such as the Jupyter notebooks that are used for developing machine learning models in Python, or interfaces such as PySpark. By adding these elusive data sources, Informatica’s catalog could also provide the trail of breadcrumbs to make ML models explainable. Informatica is also looking to utilize Compact’s Poland R&D center as a beachhead to tap more technical talent in Eastern Europe.