With digital transformation, we are stepping through a wormhole into a different timeline, redefining the way we do business and consume services. Most of us now prefer digital transactions and the door delivery of goods.
As we increasingly move towards distributed work environments — perhaps our homes — firms will look to embrace distributed agile delivery practices for Information Technology solutions.
One such practice is continuous integration and continuous delivery (CI/CD), where quality software is delivered at frequent intervals through automated ways of detecting, pulling, building, and unit testing code.
Integrating data quality into the change life-cycle of the organization is important for better operational outcomes from the solutions built.
In continuous integration, code review is most often optional, but enabling it as a best practice allows certain data quality pre-checks to be performed once during the test builds. Validation routines such as precision and format conformance can be easily spotted in this review. Frameworks like Gerrit support these features.
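As a minimal sketch of such review-time pre-checks, the two routines below (illustrative names, not from any specific tool) test format conformance against an ISO-date pattern and the decimal precision of a numeric field:

```java
import java.math.BigDecimal;
import java.util.regex.Pattern;

// Hypothetical pre-check helpers that a review-time test build could run.
public class DataQualityPreChecks {

    // Format conformance: does the value match the expected ISO date pattern?
    private static final Pattern ISO_DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    public static boolean conformsToIsoDate(String value) {
        return value != null && ISO_DATE.matcher(value).matches();
    }

    // Precision: does a numeric value stay within the allowed decimal places?
    public static boolean withinPrecision(String value, int maxScale) {
        try {
            return new BigDecimal(value).scale() <= maxScale;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(conformsToIsoDate("2020-06-15")); // true
        System.out.println(withinPrecision("12.345", 2));    // false
    }
}
```

Wiring checks like these into the review build gives reviewers a data quality signal before the change is merged.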
Another aspect of continuous integration is continuous unit testing, where smaller builds are tested in isolation for basic functions. In a typical data lifecycle such as “POSMAD”, planning for data includes modeling it for better outcomes from the data the organization acquires.
Data quality checks during data modeling catch costly errors at the planning stage of product or solution development. Even in modern databases such as graph databases, one needs to decide which entities become nodes and which become edges. Unit-testing frameworks like JUnit can be leveraged for unit tests when coding in Java.
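To make the node-versus-edge decision concrete, here is a hedged sketch (the Customer/Product/PURCHASED names are purely illustrative) where entities become nodes, a relationship becomes an edge, and an integrity check at modeling time rejects edges that reference missing nodes:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative graph-model sketch: Customer and Product become nodes,
// while the PURCHASED relationship is modeled as an edge between them.
public class GraphModelSketch {

    record Edge(String from, String label, String to) {}

    private final Set<String> nodes = new HashSet<>();
    private final List<Edge> edges = new ArrayList<>();

    public void addNode(String id) {
        nodes.add(id);
    }

    // Integrity check at modeling time: an edge may only reference existing nodes.
    public void addEdge(String from, String label, String to) {
        if (!nodes.contains(from) || !nodes.contains(to)) {
            throw new IllegalArgumentException("Edge references a missing node");
        }
        edges.add(new Edge(from, label, to));
    }

    public int edgeCount() {
        return edges.size();
    }
}
```

A JUnit test over such a model would assert both the happy path and the rejection of dangling edges, which is exactly the kind of structural data quality error that is cheap to catch during planning.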
In continuous unit testing, there can be test cases specific to the following dimensions of data quality:
- Consistency routines — Structural consistency between data structures to avoid loss of data
- Precision routines — The precision of numeric fields, including the number of decimal places
- Validity routines — Conformance to Data Type and a specific Format
- Integrity routines — The structural or relational quality of data sets
- Uniqueness routines associated with having non-duplicate values and identifiers
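Two of the dimensions above, uniqueness and validity, can be sketched as plain unit-testable routines (the field names and range limits here are assumptions for illustration):

```java
import java.util.HashSet;
import java.util.List;

// Sketch of unit-level checks for two data quality dimensions:
// uniqueness (no duplicate identifiers) and validity (type and range conformance).
public class DimensionChecks {

    // Uniqueness: every identifier in the list appears exactly once.
    public static boolean identifiersUnique(List<String> ids) {
        return new HashSet<>(ids).size() == ids.size();
    }

    // Validity: the value parses as an integer within an assumed allowed range.
    public static boolean validQuantity(String value) {
        try {
            int quantity = Integer.parseInt(value);
            return quantity >= 0 && quantity <= 10_000;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

Each routine maps naturally to one or more JUnit test cases, so the data quality dimensions become part of the continuous unit testing suite rather than a separate manual step.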
Continuous delivery encompasses continuous integration and continuous testing. These concepts are translated into features made available through an integrated framework and toolsets. A single data quality solution can be leveraged for complete test coverage, including unit, integration, and functional tests.
The available test environment often holds contextual data that can be profiled, and the profiling results from the quality assessment provide a platform to explore and analyze data quality using data quality validation routines, or checks. The selection of validation routines varies across the data lifecycle and the software development lifecycle.
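A minimal column profiler over such contextual test data might gather the null count, distinct count, and maximum length per column; these are assumed metrics for illustration, not the output of any particular data quality tool:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of profiling one column of contextual test data; the resulting
// metrics feed the data quality assessment and the choice of validation routines.
public class ColumnProfiler {

    public record Profile(long nullCount, long distinctCount, int maxLength) {}

    public static Profile profile(List<String> column) {
        long nulls = 0;
        Set<String> distinct = new HashSet<>();
        int maxLength = 0;
        for (String value : column) {
            if (value == null) {
                nulls++;
                continue;
            }
            distinct.add(value);
            maxLength = Math.max(maxLength, value.length());
        }
        return new Profile(nulls, distinct.size(), maxLength);
    }
}
```

Profiles like this make it easier to decide which routine to apply where: a high null count suggests completeness checks, while a surprising distinct count points at uniqueness or validity routines.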
Summarizing the key aspects —
💭 What if your organization is actively embracing agile practices & toolsets — should the data quality practices mature as well?
🔑 In a Distributed & Disciplined Agile environment, Data Quality Management can be integrated with the DevOps integration tools to support Continuous Integration & Delivery.
🔑 In such fast-paced deployments, data quality test automation is required in both the unit and integration test stages.
🔑 Often, data quality tools have their own code repository, versioning, integration, and deployment capabilities, and integrating them with CI/CD toolsets can be a challenge.
🔑 Automating data quality management by running pre-built or templated rules helps developers integrate feedback faster.
🔑 Integration with test management tools would also help raise data quality issues and assign them to developers and data owners.