Easy to replicate the legacy and target systems' table structures
The data repository needs to allow for the automatic creation of the legacy and target data structures without requiring a DBA or hand-written SQL scripts that convert flat file structures into tables. Throughout a project, data sources are discovered that need to be accessed immediately. If it takes the data migration team more than a few minutes to set up the metadata to bring in multiple data sources, the data repository is not as agile as it needs to be.
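As a sketch of what "automatic creation" can look like, the structure of a delimited extract can be inferred from its header row and turned into a repository table with no DDL written by hand. This uses SQLite as a stand-in for the repository; the file contents and table name are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical legacy extract; in practice this would be a file on disk.
legacy_extract = io.StringIO(
    "cust_id,cust_name,open_date\n"
    "1001,ACME CORP,2019-01-15\n"
    "1002,GLOBEX,2020-07-03\n"
)

conn = sqlite3.connect(":memory:")
reader = csv.reader(legacy_extract)
columns = next(reader)  # infer the table structure from the header row

# Build the table automatically -- no DBA, no hand-written DDL script.
conn.execute(
    f"CREATE TABLE legacy_customers ({', '.join(c + ' TEXT' for c in columns)})"
)
conn.executemany(
    f"INSERT INTO legacy_customers VALUES ({', '.join('?' for _ in columns)})",
    reader,
)
row_count = conn.execute("SELECT COUNT(*) FROM legacy_customers").fetchone()[0]
```

The same few lines work for the next source that shows up, which is the point: setup time per source stays at minutes, not days.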
Easy to create additional table structures that reside only in the repository
In order for the repository to serve as an easy-to-use sandbox for the data migration team, it also needs to make it easy to create additional tables that are not defined in either the legacy or target systems. The team is going to receive all sorts of specification changes, data enhancement/cleansing spreadsheets, and cross-reference information throughout the course of the project. To handle these changes, the data migration team needs to be able to create, drop, and alter data structures on the fly in response to requests from the functional team, the business team, and their own analysis.
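A typical repository-only structure is a cross-reference table loaded from a spreadsheet. The sketch below (again using SQLite; the table, codes, and scenario are hypothetical) shows the create/alter/drop lifecycle the paragraph describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A cross-reference table that exists only in the repository, created on
# the fly from a hypothetical cleansing spreadsheet.
conn.execute("CREATE TABLE xref_region (legacy_code TEXT, target_code TEXT)")
conn.executemany(
    "INSERT INTO xref_region VALUES (?, ?)",
    [("IL-N", "MIDWEST"), ("IL-S", "MIDWEST"), ("NY-1", "NORTHEAST")],
)

# A new requirement arrives mid-project: track who supplied each mapping.
conn.execute("ALTER TABLE xref_region ADD COLUMN source TEXT")
conn.execute("UPDATE xref_region SET source = 'functional team'")

rows = conn.execute(
    "SELECT legacy_code, target_code, source FROM xref_region ORDER BY legacy_code"
).fetchall()

# And when the spreadsheet is superseded, drop the table and start over.
conn.execute("DROP TABLE xref_region")
```

None of this should require a change-control ticket; the sandbox exists precisely so the team can react to these requests same-day.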
Gracefully handle bad data
Data migration projects are all about handling data, and if the migration processes can't easily handle bad data, it's going to be difficult for the project to be successful. The repository has to handle character data in numeric fields, invalid dates, etc. gracefully and without losing rows. If rows are lost upon insert into the repository, the integrity of the data is lost and further analysis of the legacy data becomes convoluted.
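One common way to guarantee this is to stage every column as text so nothing is rejected at load time, then flag suspect values with queries afterward. A minimal sketch, with hypothetical order data, of loading bad values without losing rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stage every column as TEXT so character data in numeric fields and
# invalid dates load cleanly instead of being rejected.
conn.execute("CREATE TABLE stg_orders (order_id TEXT, amount TEXT, order_date TEXT)")
conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", [
    ("A100", "250.00", "2021-03-04"),
    ("A101", "N/A",    "2021-03-05"),   # character data in a numeric field
    ("A102", "75.50",  "0000-00-00"),   # invalid date
])

# All three rows survive the load; bad amounts are flagged, not dropped.
loaded = conn.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0]
bad_amounts = conn.execute(
    "SELECT order_id FROM stg_orders WHERE amount GLOB '*[^0-9.]*'"
).fetchall()
```

Because the row with `N/A` is still in the repository, it can be reported back to the business team with full context instead of silently disappearing from the analysis.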
Easy to load data in and get data out
Moving data from environment to environment is the core procedure of a data migration project. If it's difficult to get data in or out of the data repository, the project is going to have problems, and the repository's purpose is defeated in the first place. The data repository needs to facilitate getting data in and out with minimal effort. When possible, a direct database connection should be used to bring the legacy data in. In some legacy environments, mainframes especially, that's not always possible. However, importing and exporting flat, EBCDIC, or any other file format should be a simple process. There could be thousands of tables and files, and if it takes more than a small amount of time to get at that data, loading can quickly devolve into a time-consuming process.
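When a direct connection isn't available, a mainframe extract often arrives as a fixed-width file. The sketch below (the layout, data, and table are hypothetical stand-ins for a real copybook-driven spec) shows how simple round-tripping should be: parse the file into the repository, then dump any table back out as CSV.

```python
import csv
import io
import sqlite3

# Hypothetical fixed-width extract, a common mainframe-style layout;
# the column positions would normally come from the copybook or file spec.
extract = ["1001ACME CORP IL", "1002GLOBEX    NY"]
layout = [("cust_id", 0, 4), ("cust_name", 4, 14), ("state", 14, 16)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE legacy_cust (cust_id TEXT, cust_name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO legacy_cust VALUES (?, ?, ?)",
    ([line[start:end].strip() for _, start, end in layout] for line in extract),
)

# Getting data out should be just as easy: dump any table to CSV.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow([name for name, _, _ in layout])
writer.writerows(conn.execute("SELECT * FROM legacy_cust"))
```

With thousands of files, the win is that only the `layout` list changes per file; the load and export code stays identical.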
Easy to build components that analyze the data once it’s in the repository
It is incredibly important that the repository be a place where it's easy to query, combine, harmonize, and separate data, both within a single system and across the disparate systems. If the repository doesn't facilitate this cross-system analysis easily, its effectiveness will diminish significantly.
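With legacy and target structures side by side in one repository, a typical cross-system question becomes a single query. A sketch with hypothetical customer tables: which legacy customers have not yet landed in the target?

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE legacy_cust (cust_id TEXT, name TEXT);
    CREATE TABLE target_cust (party_id TEXT, name TEXT);
    INSERT INTO legacy_cust VALUES ('1001', 'ACME CORP'), ('1002', 'GLOBEX');
    INSERT INTO target_cust VALUES ('P-77', 'ACME CORP');
""")

# One query spanning both systems: legacy customers with no target match.
missing = conn.execute("""
    SELECT l.cust_id, l.name
    FROM legacy_cust l
    LEFT JOIN target_cust t ON t.name = l.name
    WHERE t.party_id IS NULL
""").fetchall()
```

The matching key here is a simple name equality for illustration; in practice the join would usually go through a cross-reference table maintained in the repository.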
If you have any questions or comments regarding this post or your current data project, please email me at email@example.com or call me at 773.549.6945.