One of the key deliverables of the data assessment is to figure out where each major data area resides and what the actual data values are in each column. The only way to accomplish this is through data profiling. Sometimes this portion of the data assessment phase is referred to as the data profiling phase, as the deliverable and data profiling output closely correspond to each other. Once the initial round of data profiling is complete, the team can use the results to drive the remaining phases of the project.
During the data mapping phase, the project team will take the data profiling results and use them to put together the mapping specifications. Thus, the team will be using real data to drive the specifications and be able to see all combinations of currently used values when coming up with the transformation specifications. This is opposed to trying to figure out the rules by just using front-end screens and a small sample data dump, hunting and pecking at individual fields, and scrounging up out-of-date documentation. All of these techniques can be used to help create the mapping specifications, but without using the profiling results, the initial version of the mapping specifications will be significantly incomplete. If we used the above example, the team would recognize that when mapping the item status, the values A, X, F, R, OT, and null would need to be taken into account, but the documentation and front-end screen might have listed only A and X as valid values.
The data profiling results are also used to assist with the data cleansing and data enhancement portions of the data migration. Data profiling identifies some of the outlying values, strange patterns, and missing values that need to be cleaned up. In the above example results, the low and high effective date values are suspect. There are also some invalid values that should be researched. There are also email addresses in the phone number field, as well as strange phone number patterns that should be cleansed.
Data profiling is also a helpful tool that is frequently used during several aspects of the post-conversion reconciliation. It is possible to validate counts, translated values, missing values, etc., by running data profiling across the disparate systems and comparing the results. Using a general data profiling process to help identify extraction, translation, and load issues is usually easier than putting together individual field-by-field tests. Field-by-field and row-by-row tests are still potentially needed, but profiling can eliminate the need for some of those tests.