Introduction and Motivation
Blockchain technology has laid a strong foundation for applications in asset management, healthcare, finance, and insurance. Analytics over the data held by a blockchain network supports efficient data management, analysis, privacy, quality assurance, access, and integration in heterogeneous environments.
The role of blockchain in data privacy is becoming more prominent as advances in quantum computing threaten to weaken present encryption technologies and leave them more susceptible to attack. The volume of data that blockchain networks store is also increasing rapidly, which makes it worth asking how blockchain technology can play a leading role in Data Governance.
Its ability to record failed transactions as well as successful ones points to the large potential of Bitcoin in financial and banking services.
In this blog, we look at the key functionalities that allow blockchain to serve as a data governance framework and walk through an example using Google Cloud Platform.
The blockchain architecture provides a data store with a range of governance functionalities at a time when regulatory mechanisms like GDPR have come into force. It is known as a data processing platform that removes the need for a centralized authority. Moreover, as append-only, permanent data storage, it helps ensure high-quality data across organizational units, data security, and consistency, and helps manage regulatory risk. Its characteristic features are discussed below:
Transparency – Data stored on a blockchain is accessible to all participants with an internet connection.
Immutability – This feature follows from the distributed consensus process and supports a public audit trail. Because all transactions in the blockchain network are stored as immutable records, durability is ensured: nothing can be deleted or modified.
Consistency – Data stored under a distributed consensus protocol gives a single source of truth across the blockchain network. This ensures consistency, one of the essential characteristics of any governance framework.
Equal rights – Disintermediation gives every participant in the network the same rights to access and manipulate the blockchain. Depending on the consensus protocol, these rights are governed by the computation power or stake owned by the participant.
Availability – Every node in the blockchain network can keep a full replica of the blockchain data, which remains available as long as the nodes remain available.
Data Provenance – The blockchain serves as a historic document, storing all transactions and data lineage. The history of data needs and migrations is tracked, and each change is tied to its author. In addition, any rollbacks to a previous state are accounted for. The framework further provides access to data lineage where the data needs to be moved and operated upon across organizational lines.
Traceability – Audit trail and data lineage add to the traceability of data.
Compliance – Blockchain provides sufficient attributes to follow standards, conventions, or regulations and similar rules relating to data quality.
Confidentiality – The framework restricts access to authorized users, thereby providing confidentiality.
Credibility – The framework lends credibility: the data it holds is regarded as true and believable by users.
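Several of the properties above (immutability, provenance, traceability) come from chaining records together by hash. The following is a minimal single-process sketch of that idea, not a real blockchain: there is no network or consensus, and the names `Ledger`, `Block`, and `block_hash` are hypothetical and chosen only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List, Tuple

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, author: str, payload: str, prev_hash: str) -> str:
    """Hash the block contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "author": author, "payload": payload, "prev": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass
class Block:
    index: int
    author: str      # who made the change (provenance)
    payload: str     # what changed
    prev_hash: str   # link to the previous block
    hash: str


@dataclass
class Ledger:
    blocks: List[Block] = field(default_factory=list)

    def append(self, author: str, payload: str) -> Block:
        """Append-only write: each new block commits to the one before it."""
        prev_hash = self.blocks[-1].hash if self.blocks else GENESIS_HASH
        index = len(self.blocks)
        block = Block(index, author, payload, prev_hash,
                      block_hash(index, author, payload, prev_hash))
        self.blocks.append(block)
        return block

    def is_valid(self) -> bool:
        """Recompute every hash; any edit to an earlier block breaks the chain."""
        prev_hash = GENESIS_HASH
        for i, b in enumerate(self.blocks):
            if b.index != i or b.prev_hash != prev_hash:
                return False
            if b.hash != block_hash(b.index, b.author, b.payload, b.prev_hash):
                return False
            prev_hash = b.hash
        return True

    def lineage(self) -> List[Tuple[int, str, str]]:
        """Audit trail: every change in order, tied to its author."""
        return [(b.index, b.author, b.payload) for b in self.blocks]
```

Appending two records and then editing the first one in place makes `is_valid()` return `False`, which is the tamper-evidence that the audit trail relies on.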
In fact, a Bitcoin network resembles a database in the way it provides Data Governance functionalities: it embeds the sequence of data transition steps within itself. As illustrated in the figure below, the Bitcoin network behaves like a database in preserving consistency and validity, with complete state, history, and governing expectations that can be shared among multiple stakeholders and operated on independently.
Applications that utilize blockchain technology are:
Currency (e.g., Bitcoin and micro-payments), contracts (e.g., escrow and automated insurance processes based on agreed terms), and asset management tools (e.g., land registry and digital coupons).
Data Governance is needed in cryptocurrencies, particularly for transaction fees (e.g., smart contract executions) and tokenized assets (value or equity). Organizations using blockchain as a Data Governance framework should therefore ensure that the content of a smart contract is GDPR compliant. In addition, the public key of a blockchain account, which counts as Personally Identifiable Information (PII), should be protected using any of the following:
Mixing keys (especially in UTXO-based blockchains), value transfer (e.g., zero-knowledge proofs/arguments), and blind signatures and data payloads (e.g., encryption and read permissions as assets).
While several studies have addressed a range of governance issues of data on blockchains, there is a lack of a comprehensive approach to deal with those issues and effectively orchestrate data management processes. This indicates the need for a novel governance framework for both a blockchain platform and a blockchain-based application.
Blockchain As Data Governance Accelerator
Blockchain serves global financial services, including but not limited to insurance and investments. Through its cost-competitive operational models, it can also accelerate and enhance the risk-management functions of businesses.
- Master Data Management (MDM) and blockchain can benefit from mutual integration.
- MDM can use blockchain for Data Distribution and Data Governance, giving blockchain technology access to high-quality master data.
Data Governance and blockchain work together to offer insurance customers more secure products. For instance, Coinbase, a large Bitcoin wallet provider, is known for protecting its users with insurance against employee theft and hacking.
Along with MDM, the foremost capability of blockchain as a Data Governance framework lies in building an ecosystem of increased trust:
- Access to third-party controls, with enhanced risk management
- Decentralized ledger systems with full auditing features
- Timely alerts and notifications about changes
- Acts as a guarantor for the integrity of a digital representation of a physical entity.
- Acts as a Data Chain of Custody
- Helps in the reinterpretation of Events
- Source of Truth and Doubt
- Ability to create blockchain-based data escrow by a combination of cryptographic techniques like secret sharing with smart contracts, where the encrypted and published data is available to a critical number of stakeholders.
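The data escrow idea in the last point relies on secret sharing: a decryption key is split so that only a threshold number of stakeholders can reconstruct it together. Below is a minimal sketch of Shamir's secret sharing over a prime field. It is meant as an illustration under simplifying assumptions (the secret is an integer smaller than the chosen prime, and the smart contract that would distribute shares is left out); the function names are hypothetical and the code is not hardened for production use.

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1  # a Mersenne prime; all arithmetic is done modulo this value

Share = Tuple[int, int]  # (x, y) point on the secret polynomial


def split_secret(secret: int, threshold: int, num_shares: int) -> List[Share]:
    """Split `secret` so that any `threshold` shares can recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    if not 1 <= threshold <= num_shares:
        raise ValueError("threshold must be between 1 and num_shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, num_shares + 1)]


def recover_secret(shares: List[Share]) -> int:
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `split_secret(key, threshold=3, num_shares=5)`, any three of the five stakeholders can call `recover_secret` on their combined shares to get the key back, while fewer than three shares reveal nothing useful about it.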
One use of blockchain for Data Governance (DG) in the supply chain industry is certifying food products to ensure their authenticity. The metadata about the certificates can be stored on a blockchain, and a buyer can verify a purchased product by checking its certificate against the metadata stored on the blockchain.
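The certificate check described above can be sketched as a fingerprint comparison. In this illustration an in-memory dictionary stands in for on-chain storage, and the class name `CertificateRegistry` and the metadata fields are hypothetical; a real deployment would write the fingerprint through a smart contract or transaction instead.

```python
import hashlib
import json
from typing import Dict


def fingerprint(metadata: Dict[str, str]) -> str:
    """Deterministic SHA-256 fingerprint of certificate metadata."""
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class CertificateRegistry:
    """In-memory stand-in for the blockchain that stores certificate metadata."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def register(self, certificate_id: str, metadata: Dict[str, str]) -> None:
        """Record a certificate once; existing entries are never overwritten."""
        if certificate_id in self._records:
            raise ValueError(f"certificate {certificate_id!r} is already registered")
        self._records[certificate_id] = fingerprint(metadata)

    def verify(self, certificate_id: str, metadata: Dict[str, str]) -> bool:
        """A buyer recomputes the fingerprint and compares it with the record."""
        recorded = self._records.get(certificate_id)
        return recorded is not None and recorded == fingerprint(metadata)
```

A buyer who receives a product with certificate metadata calls `verify`; any altered field (for example a changed origin) produces a different fingerprint and the check fails.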
Blockchain Data Governance with GCP
The following figure illustrates a conventional/traditional Data Governance Architecture with Google Cloud.
The Data Catalog API supports the ingestion of technical metadata from non-Google Cloud data assets as well.
In addition, its integration with Cloud Data Loss Prevention (Cloud DLP) enables users to run Cloud DLP inspection jobs on BigQuery. This, in turn, helps to automatically create Data Catalog tags for identifying PII data.
Blockchain.com’s tech team began hosting some of its IT infrastructure within GCP Compute Engine instances and added Google Cloud Platform’s Managed Services:
GCP makes it easy to get the basics of security right. Google Cloud goes above and beyond to protect data, infrastructure, and services from external threats, while internally, the permission model integrated with Google Workspace gives granular control over access rights.
Public blockchain data is freely available in BigQuery through the Google Cloud Public Datasets Program for eight different cryptocurrencies, referred to here as Google's crypto public datasets.
Cloud Spanner – Spanner's globally distributed databases allow fast scaling with no downtime and provide high availability and strong consistency with low operational overhead. This cost-effective solution helps to ingest raw blocks from Ethereum nodes in real time, transform that data, and persist it in Google Cloud Spanner. It is also capable of restoring huge databases in just hours.
Cloud Identity and Access Management (Cloud IAM) and VPC firewalls allow Blockchain.com to lock down access to resources according to the least-privilege principle and implement defense in depth.
Stackdriver – Its logging and monitoring capability enables us to be alerted to any unusual activities in real time.
Cloud Identity-Aware Proxy (Cloud IAP) – Used for user identity verification, both within the customer-facing part of the platform and within its back-office application environments.
Authentication – A simple authentication mechanism that enables access to applications based on Google Workspace (formerly G Suite) accounts.
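As an illustration of how the public blockchain datasets in BigQuery mentioned above might be queried, here is a minimal sketch using the `google-cloud-bigquery` client library. The table name `bigquery-public-data.crypto_bitcoin.transactions` and the columns `block_timestamp` and `block_timestamp_month` are assumptions that should be checked against the current dataset schema. Running the query needs a GCP project with credentials configured and may incur query costs.

```python
import datetime
from typing import List, Tuple

# Assumed table and column names; verify them against the dataset schema.
BITCOIN_TRANSACTIONS = "bigquery-public-data.crypto_bitcoin.transactions"


def build_daily_count_query(table: str = BITCOIN_TRANSACTIONS) -> str:
    """Build a parameterized query counting transactions per day in one month."""
    return (
        "SELECT DATE(block_timestamp) AS day, COUNT(*) AS tx_count\n"
        f"FROM `{table}`\n"
        "WHERE block_timestamp_month = @month\n"
        "GROUP BY day\n"
        "ORDER BY day"
    )


def daily_transaction_counts(month: datetime.date) -> List[Tuple[datetime.date, int]]:
    """Run the query; `month` is assumed to be the first day of the month."""
    # Imported here so the query builder can be used without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses Application Default Credentials
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("month", "DATE", month)]
    )
    rows = client.query(build_daily_count_query(), job_config=job_config).result()
    return [(row.day, row.tx_count) for row in rows]
```

Passing the month as a query parameter rather than formatting it into the SQL string keeps the query safe from injection and easy to reuse for other months.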
Further, watch this YouTube video to build a “Blockchain on GCP using Hyperledger Fabric and Composer”.
- Latency in establishing consensus.
- Sensitive data needs to be forgotten.
- Due to high storage volume, features like auto-deletion beyond a certain volume of data have to be incorporated. However, they make other blockchain functions more complex.
- Network governance could break down.
- A monopoly, or a single organization controlling the data ecosystem of a blockchain (generation, access, and regulation of all data), limits its value. Hence the network should encourage the involvement and participation of peers.
- Reading blockchain transactions involves receipt-based transient synchronous communication that does not directly return results or indicate whether the transaction was successful.
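The last point has a practical consequence for client code: submitting a transaction only yields a receipt, and the outcome has to be looked up separately. The sketch below simulates this pattern with a hypothetical `SimulatedChain` class; no real blockchain client API is implied. A real client would also wait or back off between polls.

```python
import itertools
from typing import Dict, Optional


class SimulatedChain:
    """Toy ledger that confirms a transaction after a fixed number of polls."""

    def __init__(self, confirm_after_polls: int = 2) -> None:
        self._ids = itertools.count(1)
        self._polls: Dict[str, int] = {}
        self._payloads: Dict[str, str] = {}
        self._confirm_after = confirm_after_polls

    def submit(self, payload: str) -> str:
        """Return a receipt only; the result of the transaction is not known yet."""
        receipt = f"tx-{next(self._ids)}"
        self._polls[receipt] = 0
        self._payloads[receipt] = payload
        return receipt

    def status(self, receipt: str) -> Optional[bool]:
        """None while pending, True once the transaction is confirmed."""
        if receipt not in self._polls:
            raise KeyError(f"unknown receipt {receipt!r}")
        self._polls[receipt] += 1
        if self._polls[receipt] < self._confirm_after:
            return None
        return True


def wait_for_result(chain: SimulatedChain, receipt: str, max_polls: int = 10) -> bool:
    """Poll with the receipt until the outcome is known or polls run out."""
    for _ in range(max_polls):
        outcome = chain.status(receipt)
        if outcome is not None:
            return outcome
    raise TimeoutError(f"no result for {receipt!r} after {max_polls} polls")
```

The caller keeps the receipt returned by `submit` and passes it to `wait_for_result`; bounding the number of polls avoids waiting forever when consensus is slow.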
Future Work & Unanswered Questions
Blockchain technology looks promising as a next-generation Data Governance framework, with the following enhanced business functionalities:
- Better Decision Making with consistency, completeness, and accuracy
- Operational Efficiencies with fact-based decisions that become real-time events.
- Improved data understanding and lineage (removal of confusion, adding clarity and meaning)
- Data alignment leading to Regulatory Compliance
- Increased Revenue obtained by added data confidence and insights sharing.
Despite this, much work and research remains to be done on:
- Strategizing end-to-end Machine Learning pipelines with online and batch/stream processing of events with blockchain
- Can we integrate AI/ML algorithms with blockchain events (a live cryptocurrency feed) along with events from disparate data sources (say, IoT)?
- How do we ensure the privacy and decentralization of non-blockchain events in the same pipeline?
- To that end, we plan to add Data Catalog and DLP to the architecture.
- How do we assemble and ensure fair data for blockchain (cryptocurrency exchanges) events and build fair ML models?
Note: Blockchain for Data and Model Governance (US20200082302A1, Application US16/128,359)
Building ML models involves key processes that others in the team or organization need to understand in order to collaborate. Some of the crucial requirements for model understanding and feature sharing across teams within the organization include:
- Features causing bias, model sensitivity, or target leaks.
- Steps to build the model, datasets used for training, validation, and testing.
- Model interpretability explaining the factors for the Model’s behavior.
- Model explainability and accountability for all use-cases that could alleviate risk and prevent time and effort in Model re-development.
- How can blockchain help to codify accountability?
In addition, carrying out the model development process on the blockchain helps governance by:
- Giving the analytic model its own entity, life, structure, and description, with detailed documentation.
- Helping to create models with more explainability and less bias, with an increased focus on delivering ethical and explainable AI solutions.
- Increasing the scope for future work by creating essential assets for the organization.
- Facilitating an analytic tracking document (ATD) and an agile model development process that can be used by parties outside the development organization. It also helps regulatory bodies in model audits.
Blockchain techniques are uniquely suited to data governance systems.
Blockchain technology comes with Data Governance built in: its ability to maintain history lets it honor privacy and the proper usage of data. In general, blockchain networks are particularly well suited to peer organizations or business units that cooperate in a mutually beneficial manner, even when regulations such as those mentioned above control the movement or sharing of data among them.
The role of blockchain in Model Governance can transform the entire AI/ML pipeline. It records not only model-level information and cause-effect relations but also accountability details for the data scientists, big data, analytics, and QA professionals behind each change, transformation, re-training, validation, and test undertaken in each story of each sprint.