These data sources will come with a litany of data issues. There will be missing data, overlapping data sets, and data duplicated both across and within systems. In addition to the data quality issues, each legacy system's data will need to be transformed into the target format.
There are three possible places where this work can happen: the legacy systems, the target system, or an intermediary location.
The first option is to do the work in the legacy systems, and that is usually a terrible idea. Whenever the legacy systems had been selected for this work, we were brought in to rescue a failing data migration effort. It only works when the process is fairly simple, involves just a single legacy system, and runs on fairly modern database technology.
The second option, using the target database, is never practical because the target is either not ready in time or too unstable when the bulk of the data migration work needs to take place.
That leaves the third option: an intermediary place to do the data migration work. This intermediary is a centralized data repository, or hub, that can connect to all of the disparate systems. A data hub is an effective way to manage all of the data quality and transformation issues on data migration projects, and there are several reasons why centralizing the data in a repository between the legacy systems and the target works so well.
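To make the hub idea concrete, here is a minimal sketch of staging data from two legacy extracts into one shared repository table. The source names, column names, and mappings are all hypothetical; an in-memory SQLite database stands in for whatever repository technology the project actually uses.

```python
import sqlite3

# Hypothetical extracts from two legacy systems; note the column
# names differ per source, which is exactly what the hub normalizes.
legacy_crm = [{"cust_name": "Acme Corp", "country": "US"}]
legacy_erp = [{"name": "Beta GmbH", "land": "DE"}]

def stage(conn, source, rows, mapping):
    """Map one source's columns into the shared staging schema,
    tagging every row with its originating system."""
    for row in rows:
        conn.execute(
            "INSERT INTO staging_customer (source_system, name, country) "
            "VALUES (?, ?, ?)",
            (source, row[mapping["name"]], row[mapping["country"]]),
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging_customer (source_system TEXT, name TEXT, country TEXT)"
)
stage(conn, "CRM", legacy_crm, {"name": "cust_name", "country": "country"})
stage(conn, "ERP", legacy_erp, {"name": "name", "country": "land"})

rows = conn.execute(
    "SELECT source_system, name, country FROM staging_customer "
    "ORDER BY source_system"
).fetchall()
```

Once every system's data sits in one schema with a source tag, the comparisons and cleanups described below become simple queries instead of cross-team coordination exercises.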
The main reason is that a centralized data hub allows data to be compared across systems and within a system. This is critical for addressing the duplicate data that will inevitably be encountered. On one project, the client was attempting to migrate customers from many legacy systems across the world into a single ERP system. The legacy data was in various languages and maintained by different data teams around the world. The original plan was for a technical resource knowledgeable about each legacy system to put that system's data into a standard template.
Each template would then be loaded into the target ERP independently of the others. The client attempted one wave with this method, and it went poorly. One of the major issues was that the same customers existed across the different legacy systems, which created many duplicates in the new target ERP. When our team joined the project, one of our first activities was to set up a central repository that could identify duplicate candidates and consolidate true duplicates within and across the legacy systems and the target ERP. Once we were able to address the duplicate data, the next phase of the project went much more smoothly.
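Duplicate-candidate identification of the kind described above usually starts with a normalized match key. This is a simplified sketch with made-up records and a deliberately crude normalization; a real project would layer fuzzy matching and human review on top of it.

```python
import re
from collections import defaultdict

def match_key(name, country):
    """Crude normalization: lowercase, strip punctuation and common
    legal suffixes, collapse whitespace, uppercase the country code."""
    n = re.sub(r"[^\w\s]", "", name.lower())
    n = re.sub(r"\b(inc|corp|gmbh|ltd|llc|co)\b", "", n)
    n = " ".join(n.split())
    return (n, country.upper())

# Illustrative staged records from different source systems.
records = [
    {"id": 1, "source": "CRM-US", "name": "Acme Corp.", "country": "us"},
    {"id": 2, "source": "ERP-EU", "name": "ACME, Corp", "country": "US"},
    {"id": 3, "source": "ERP-EU", "name": "Beta GmbH", "country": "DE"},
]

# Group record ids by match key; any key with more than one id is a
# duplicate-candidate group for review and consolidation.
candidates = defaultdict(list)
for r in records:
    candidates[match_key(r["name"], r["country"])].append(r["id"])

duplicate_groups = [ids for ids in candidates.values() if len(ids) > 1]
```

Because every record carries its source system, the same pass finds duplicates both within one legacy system and across systems, which is precisely what the independent per-template loads could not do.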
The centralized repository also enables a quicker reaction to specification changes. One of my colleagues likes to say that data migration projects are all about specification changes: managing, tracking, and quickly reacting to them. It is not uncommon to receive specification changes driven by data design, MDM, and data cleansing efforts right up to, or even after, go-live. Even the best up-front profiling and analysis can't eliminate the specification changes that happen during complex implementations; the majority of data migration work revolves around an unexpected and constant stream of them. Having one place and one team to address spec changes streamlines the process.
Consider the same project where separate teams initially populated a standard template. The specifications for the template changed several times during the initial phase, and each change had to be communicated to multiple teams, in different parts of the world, speaking different languages. The simplest change turned into a game of telephone where the end result differed from the starting message. Once the repository was in place, communicating specification changes became simple, and we could react to them quickly across the disparate systems. Development was also faster because one team could replicate and reuse the same routines across all of the legacy systems.
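One way to picture the reuse described above: keep the transformation rules in a single, centrally maintained spec that every legacy feed runs through. The field names and rules here are illustrative, not from the project, but the shape shows why a spec change becomes a one-place edit rather than a multi-team broadcast.

```python
# One shared transformation spec, maintained centrally. When the target
# template changes, only this mapping changes -- every legacy feed picks
# up the new rules on its next run. Rules and field names are made up.
SPEC = {
    "name": lambda v: v.strip().title(),
    "country": lambda v: {"USA": "US", "GER": "DE"}.get(v, v),
}

def transform(row):
    """Apply the current spec to one staged row from any source system."""
    return {field: rule(row.get(field, "")) for field, rule in SPEC.items()}

# Staged rows from two different legacy systems, same routine for both.
legacy_rows = [
    {"name": "  acme corp ", "country": "USA"},  # from system A
    {"name": "beta gmbh", "country": "GER"},     # from system B
]
target_rows = [transform(r) for r in legacy_rows]
```

Contrast this with the original approach, where the same rule change had to be re-implemented, in parallel, by every regional team.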
The centralized repository provides many other advantages for migration projects as well, including removing the load from production environments, easier post-conversion reconciliation, pre-conversion test runs, and testing on stationary sets of data, all of which I've outlined here.
Please reach out to me at firstname.lastname@example.org if you have any questions, comments, or want to discuss any other data issues that you might be encountering.