This post goes through several tasks that typically take place, or at least start, during the first several days of this effort.
Reach out to all business users to identify all legacy data sources.
- Frequently this process is more difficult than it sounds. In addition to the main systems, people maintain separate Excel workbooks, Access databases, other files, and even notebooks that may or may not be officially allowed. While these files might not be allowed, they usually facilitate a feature that the current systems do not support but that is critical to running the business. I worked on one project for a retailer where the legacy system that held all the fixed asset information was really old and incredibly difficult to query and develop reports for. The really important day-to-day work, reporting and queries, and some maintenance took place in an Access database.
Get access to all of the legacy data sources
- Getting access to legacy data sources is also sometimes harder than it should be. There are security issues, and oftentimes many steps of approval need to take place before access is granted. It’s important to start this process right away, even if some systems don’t need to be accessed until well into the future.
Generate data statistics/data profile reports on every single relevant table/field across all legacy data sources
- Profiling reports provide a wealth of information about the data across all of the disparate systems. They are a good reference point for creating the data mapping specifications as well as a starting point for figuring out what additional analysis could be performed.
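As a rough illustration of what a basic profile report collects, here is a minimal Python sketch that computes per-field statistics (row count, missing values, distinct values, value lengths, most common values) over a small in-memory extract. The `profile_rows` function and the sample asset records are hypothetical, made up for this example; a real effort would run against actual extracts from each legacy source and would likely use a dedicated profiling tool.

```python
from collections import Counter

def profile_rows(rows):
    """Return basic per-field statistics for a list of dict records."""
    fields = sorted({name for row in rows for name in row})
    report = {}
    for field in fields:
        values = [row.get(field) for row in rows]
        # Treat None and blank or whitespace-only values as missing.
        present = [v for v in values if v is not None and str(v).strip() != ""]
        counts = Counter(present)
        report[field] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "min_length": min((len(str(v)) for v in present), default=0),
            "max_length": max((len(str(v)) for v in present), default=0),
            "top_values": counts.most_common(3),
        }
    return report

# Hypothetical sample rows standing in for a legacy table extract.
sample_rows = [
    {"asset_id": "A100", "location": "Chicago", "cost": "1200.00"},
    {"asset_id": "A101", "location": "", "cost": "85.50"},
    {"asset_id": "A102", "location": "Chicago", "cost": None},
]

profile = profile_rows(sample_rows)
for field, stats in profile.items():
    print(field, stats)
```

Even this small amount of output tends to raise useful questions, such as why a field is mostly empty or why a code field has far more distinct values than expected.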
Ask business users about concerns about the data.
- Identifying the known pain points, the known gaps, etc. will give an idea of what types of data quality issues are already known. It will also provide insight into additional data quality checks that should potentially be performed.
Learn about current data governance standards
- Learning about their current data governance and data standards is important for knowing what types of ongoing data checks the organization currently performs. More often than not, there aren’t any data governance standards or processes that are run against the legacy systems. In the cases where there are, incorporate those checks into the analysis reporting and analyze the current procedures for holes. When applying the organization’s current rules to the data assessment effort, it is frequently discovered that there are many places where the defined standards aren’t being met.
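One way to fold existing standards into the analysis is to express each standard as a small rule and report which records break it. The sketch below assumes two made-up rules (`asset_id` must be present, `postal_code` must be five digits); the rule names, fields, and `check_rules` helper are hypothetical, and the real rules would come from whatever the organization has documented.

```python
import re

# Hypothetical rules; each takes a record and returns True when it passes.
RULES = {
    "asset_id is present": lambda r: bool((r.get("asset_id") or "").strip()),
    "postal_code is 5 digits": lambda r: re.fullmatch(
        r"\d{5}", r.get("postal_code") or ""
    ) is not None,
}

def check_rules(rows, rules):
    """Return, for each rule, the indexes of the rows that fail it."""
    failures = {name: [] for name in rules}
    for index, row in enumerate(rows):
        for name, rule in rules.items():
            if not rule(row):
                failures[name].append(index)
    return failures

# Hypothetical sample records.
rows = [
    {"asset_id": "A100", "postal_code": "60614"},
    {"asset_id": "", "postal_code": "6061"},
    {"asset_id": "A102", "postal_code": None},
]

failures = check_rules(rows, RULES)
for name, failed in failures.items():
    print(f"{name}: {len(failed)} failing row(s) at {failed}")
```

Keeping the rules in one table makes it easy to add checks as new concerns come out of conversations with the business.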
Generate duplicate candidate reports
- Duplicate data is almost always an issue, and it is a difficult one to resolve. Duplicate candidate analysis needs to start as soon as possible to get the business thinking about rules for resolving duplicates, whether those rules are automated, manual, a programmatic/manual hybrid, etc.
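A simple starting point for a duplicate candidate report is to group records on a normalized match key. The sketch below is one possible approach, not a complete matching strategy: it assumes hypothetical vendor records with `id`, `name`, and `postal_code` fields, and it treats two records as candidates when their cleaned-up name and postal code agree.

```python
import re
from collections import defaultdict

def match_key(record):
    """Build a normalized key from name and postal code (illustrative fields)."""
    # Lowercase, drop punctuation, and collapse repeated whitespace.
    name = re.sub(r"[^a-z0-9 ]", "", record["name"].lower())
    name = " ".join(name.split())
    return (name, record["postal_code"].strip())

def duplicate_candidates(records):
    """Return lists of record ids that share the same match key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical sample records.
records = [
    {"id": 1, "name": "Acme Supply, Inc.", "postal_code": "60614"},
    {"id": 2, "name": "ACME  SUPPLY INC", "postal_code": "60614 "},
    {"id": 3, "name": "Acme Supply Inc", "postal_code": "60601"},
    {"id": 4, "name": "Lakeview Hardware", "postal_code": "60657"},
]

candidates = duplicate_candidates(records)
print(candidates)  # [[1, 2]]
```

Records 1 and 2 are flagged because they differ only in case, punctuation, and spacing. Record 3 is left out because its postal code differs, which is exactly the kind of case the business needs to weigh in on when deciding how strict the matching rules should be.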
Start to talk with target system experts about concerns
- Armed with profile reports and some initial data analysis reports, start to talk with the target system experts and the business about the business requirements for the new system and the concerns that they have. Use these concerns to begin to put together the next level of analysis reports.
Generate additional in-depth data analysis reports
- With general data statistics in hand, build additional reports that build upon the discussions with the legacy and target system experts. At this point, the team should start to have an idea of some of the main data quality issues and start to develop a deeper understanding of what is contained within the legacy systems.
Start to develop an ongoing data cleansing process
- With the findings that the preliminary analysis and profiling reports uncover, it is possible to start to hash out with the team what the overall data cleansing process will look like for the various issues that need to be resolved.
These starting tasks provide a good foundation for a successful data migration project, as they allow the data team to quickly provide the business with a lot of knowledge about the underlying data and allow the business to begin developing a strategy for how the data should look in the future.
If you have any questions or comments about this post, or about data quality and migration projects in general, please give me a call at 773.549.6945 or email me at firstname.lastname@example.org