White Paper 1: Data Profiling and Remediation
Welcome to the
COSOL BUSINESS GUIDE TO DATA MIGRATION
COSOL is an Australian-based organisation with 20+ years of experience in complex ERP system upgrades, migrations, and consolidations. Through this experience, COSOL has developed a strong reputation for end-to-end data migrations, which remains at the very core of COSOL’s expertise. This paper is the first in a series summarising lessons learned, to provide insight for executives who are faced with a possible large-scale program of this nature within their own organisation.
Our point of view
Experience shows that data migration is a critical success factor for these programs to realise the value in their business cases. The axiom “garbage in, garbage out” remains as relevant today as it was when computers were first invented. COSOL strongly believes:
- That digital transformation is well underway, and every board is, and should be, concerned about how to become a truly digital enterprise;
- That strong enterprise data foundations are required to enable adoption of digital solutions, including advanced analytics, robotic process automation, machine learning and artificial intelligence, which are the next frontiers of productivity and market competitiveness;
- That enterprise data is the glue, the fact base, that drives decision making and business improvement, allowing organisations to meet stakeholder expectations in a timely and efficient manner; and
- That for organisations to succeed, data must be treated as a mission-critical asset. It is the single biggest success factor in a digital transformation journey, and most organisations are ill-prepared because of many islands of disconnected data of unknown and/or poor quality.
Introduction to data migration
Unfortunately, data migration typically first comes to attention in a negative light, when it is exposed as a major risk to a program with a major cost attached to remediating the data and mitigating the risk. This is a symptom of data being under-valued in a traditional business model that has survived on paper-based systems and manual data entry, a model that is not sustainable in the digital age.
As a “domain”, data can and should be managed before, during and after a major program. Doing so strengthens an organisation’s overall digital capability and mitigates future risks and costs by ensuring data remains evergreen.
Data migration is best explained as three distinct phases:
- Pre-migration: data profiling and remediation (White Paper #1)
- During migration: data standardisation and loading (White Paper #2)
- Post-migration: data reconciliation and archiving (White Paper #3)
This paper focuses on the pre-migration phase.
The pre-migration phase prepares your organisation’s data for migration. Establishing data owners who can contribute to and approve Data Quality Management (DQM) objectives and guiding principles will define the profiling and remediation requirements of the pending data migration.
Data profiling is the systematic analysis of the source data based on the requirements given. The desired outcome of the profiling activity is to provide a correct and complete model for the target solution. Once data has been assessed, the results of the profiling will drive the data remediation by determining the actions needed to cleanse and prepare the source data.
Data remediation can involve correcting duplicate, incomplete, inaccurate, and corrupt records in existing systems. There may also be a requirement for the standardisation and harmonisation of records to align with the requirements of the new system. There are six dimensions of data quality as outlined in Table 1: Six Dimensions of Data Quality shown below.
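To make the profiling step concrete, the sketch below computes two simple quality metrics over a set of source records: duplicate key counts and per-field completeness. It is a minimal illustration only; the record layout, field names (`vendor_id`, `postcode`) and thresholds are assumptions, not a prescribed COSOL method.

```python
from collections import Counter

# Hypothetical vendor master extract; field names are illustrative only.
records = [
    {"vendor_id": "V001", "name": "ACME Pty Ltd", "postcode": "4000"},
    {"vendor_id": "V002", "name": "Acme Pty Ltd", "postcode": ""},
    {"vendor_id": "V001", "name": "ACME Pty Ltd", "postcode": "4000"},
]

def profile(rows, key_field):
    """Return basic profiling metrics: duplicated keys and per-field completeness."""
    key_counts = Counter(r[key_field] for r in rows)
    # Keys appearing more than once indicate candidate duplicate records.
    duplicates = {k: n for k, n in key_counts.items() if n > 1}
    # Completeness: fraction of rows with a non-blank value in each field.
    completeness = {
        field: sum(1 for r in rows if r[field].strip()) / len(rows)
        for field in rows[0]
    }
    return duplicates, completeness

dups, completeness = profile(records, "vendor_id")
print(dups)          # duplicated vendor_id values and their counts
print(completeness)  # a postcode completeness below 1.0 flags missing values
```

Results like these give data owners a quantified baseline from which remediation actions can be assigned, rather than anecdotal impressions of data quality.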
When and where to start
A common practice, and misconception, in traditional, less digitally mature businesses is that a program, together with some technology, will fix whatever issues are encountered, and that once fixed they will stay fixed for good. Two things happen as a result:
- The program risks and costs increase significantly due to a lack of data readiness; and
- The program will, by nature, focus only on the minimum work required to achieve its objectives, often described as a new system going live. This approach most often does not ensure the business has a sustaining capability. As a result, data will sadly fall into a state of disrepair over time, and the benefits of the transformation will not be sustained.
Better practice is to acknowledge that this is first and foremost a business issue, and that the business must take ownership of, and accountability for, its data as a strategic asset.
Data owners, along with the organisation’s process owners, should meet regularly in an enterprise process and data owners forum to discuss, review and agree on the performance, quality and remediation activities needed to continually improve business performance.
In-program vs. pre-program data remediation
As mentioned above, a common misconception is that program commencement is needed to kick off data profiling and remediation.
Per Figure 3: Common vs. Best Practice Approach to Data Migration, a better practice is for data owners to commence data profiling as soon as practical to understand the baseline quality of their data.
Data quality can be improved ahead of a data migration program, which reduces time, effort, cost and risk for the program itself, and further strengthens the sustaining capability to maintain data quality beyond program completion.
Initial data quality efforts should focus on the top three of the six data quality dimensions shown in Table 1: Six Dimensions of Data Quality.
|Dimension|Description|
|---|---|
|Duplication|Duplicate records may exist (e.g. specific master data may exist multiple times, with each instance varying from the original master data naming convention). Duplicates are to be deleted and relationships amended to link to the surviving row.|
|Redundancy|Data that is no longer current. Redundant records should be identified in source systems and corrected accordingly (e.g. active vendors without an invoice in the last 14 months).|
|Standardisation|The data to be migrated must conform to an approved or conventional standard (e.g. a master data standard). Cross-validation of datasets across table structures against the agreed standards must be monitored and rectified as appropriate until go-live.|
|Incorrect|The data holds an incorrect business value (e.g. an entitlement amount is incorrect, a bank account name is incorrect, addresses including postcodes are incorrect, or field values are not aligned to their originally planned usage patterns).|
|Integrity|Relationships are not maintained correctly, such as orphaned records (e.g. an account balance for an account that no longer exists). Relationships across tables and application functional areas must be maintained (e.g. organisational structure, accounts, reports-to, etc.).|
|Completeness|A measure of data content quality, expressed as the percentage of columns or fields that should have values but do not, or that should be left blank but contain values (e.g. first names not in the preferred name column).|
Table 1: Six Dimensions of Data Quality
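As a small illustration of remediating the duplication and standardisation dimensions together, the sketch below applies a simple naming convention and collapses records that become identical under it, retaining a mapping from retired identifiers to the surviving row so relationships can be re-linked. The field names and the standardisation rule are assumptions for illustration only.

```python
# Illustrative remediation sketch: standardise a name field, then collapse duplicates.
# The rule (trim, collapse whitespace, upper-case) is an assumed convention.
def standardise(name):
    """Apply a simple naming convention to a free-text name."""
    return " ".join(name.split()).upper()

vendors = [
    {"vendor_id": "V001", "name": "Acme  Pty Ltd"},
    {"vendor_id": "V007", "name": "ACME PTY LTD"},  # duplicate under the standard
    {"vendor_id": "V002", "name": "Beta Mining"},
]

surviving = {}
merged_ids = {}  # retired id -> surviving id, so related records can be re-linked
for row in vendors:
    key = standardise(row["name"])
    if key in surviving:
        merged_ids[row["vendor_id"]] = surviving[key]["vendor_id"]
    else:
        surviving[key] = row

print(len(surviving))  # 2 surviving rows
print(merged_ids)      # {'V007': 'V001'}
```

Note that the merge mapping is as important as the deletion itself: without it, transactional records pointing at the retired row would become orphans, violating the integrity dimension.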
Data profiling and data remediation takeaways
- Data will typically be exposed as a major risk and cost in large scale digital transformations due to unknown and/or poor quality. If you don’t know the quality of your data today, this will likely be true for you.
- Improve your readiness by:
- Establishing strong data ownership and data governance immediately. These roles are needed now and will be required into your digital future. Start now and begin to develop that capability.
- Ensuring the data owners define data quality objectives as their priority task. These objectives are used for profiling the data and baselining your current situation. Without this, your organisation and any pending programs are flying blind.
- Starting data remediation as soon as possible. This is the most common pitfall, where organisations leave it to the ‘program’ to fix the problem. Data owners can, and should, resolve duplicate data and redundant data in existing systems. There is no dependency on new systems for this to occur, and this exercise alone helps develop a critical enduring capability.
We hope that these guidelines help you in some way with your digital transformation journey in this critical, yet often overlooked area of data migration.
The next white paper in this series will focus on the next phase – data standardisation and loading.
Appendix A: Data owner and data steward roles and responsibilities guide
Data owner:
- Has overall responsibility, ownership, and authority for a set of business data, usually in their area of business expertise.
- Is typically directly affected by the data’s accuracy, integrity, and timeliness in their day-to-day activities.
- Approves/endorses the data migration results based on the reviews performed by the data stewards.
- Nominates and/or endorses data stewards.
Data steward:
- Works with or uses the data on a day-to-day basis, and is involved in creating and maintaining it (sometimes with the assistance of data custodians).
- Is affected the most by poor-quality data but is also best placed to resolve data quality issues.
- Governance
  - Sign off the data reconciliation plan
  - Sign off trial data migrations
  - Approve the mitigation plan for data quality issues unlikely to be cleansed by go-live
  - Sign off the final data migration
  - Maintain data quality post go-live
- Defining the target data set
  - Participate in development of data migration requirements (including data extraction and mapping rules)
- Data quality / cleansing activities
  - Participate in data quality reviews led by the data migration team
  - Assign actions from data quality reviews
  - Perform data cleansing and data collection
  - Profile and analyse pre-load and post-load data
- Data reconciliation
  - Participate in data reconciliation planning
  - Contribute to and validate the data reconciliation plan
  - Perform data reconciliation for the trial and go-live data loads
  - Send endorsement to data owners confirming that the data has been cleansed, validated, and reconciled, with support from the data migration team
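The reconciliation activities above can be sketched as a pair of automated checks comparing a source extract with the corresponding target load: matching record counts and matching control totals. This is a minimal sketch under assumed field names (`account`, `balance`); real reconciliation plans typically add many more checks per data object.

```python
# Minimal reconciliation sketch: compare record counts and a control total
# between a source extract and the target load. Field names are illustrative.
source = [{"account": "1000", "balance": 250.0}, {"account": "2000", "balance": 100.0}]
target = [{"account": "1000", "balance": 250.0}, {"account": "2000", "balance": 100.0}]

def reconcile(src, tgt, amount_field):
    """Return named pass/fail checks a data steward could report to the data owner."""
    return {
        "row_count_match": len(src) == len(tgt),
        "control_total_match": round(sum(r[amount_field] for r in src), 2)
                               == round(sum(r[amount_field] for r in tgt), 2),
    }

result = reconcile(source, target, "balance")
print(result)  # both checks True when the extracts match
```

Running such checks on every trial load, not just the final one, gives the data owner an evidence trail to support the go-live endorsement.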