This is the third and final post in a series of posts to break down the questions from my Queries on Queries talk. The full talk is available here.
Is your solution reusable?
Migrations feel like one-off processes, but teams that migrate once usually migrate again.
Have you ensured that as much of your solution as possible can be reused? Do you have a shared library of migration tools that your whole team can access? When you create new functionality, are you thinking about ways to make it usable in your next project?
On any technology project you will generally benefit from designing for reusability. In my comments on the question about repeatability, I mentioned that people are tempted to see migration work as fundamentally one-off, but you need to plan for many runs. That question is about repeating the same project; this one is about recycling parts of this project in another.
To a consultant, the value of reuse should be obvious: we like to sell projects to new clients based on successful projects for other clients. For that I want libraries of tools designed to be rapidly assembled to meet a new client’s needs.
But even when I was the client, I was moving similar data into the same systems over and over. I created API libraries, and rough interfaces, to handle some of that work so I didn’t have to do the same tedious work again and again.
In both cases those libraries are only useful if whoever needs them knows they exist, has access to them, and can figure out how to leverage them.
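A small example of the kind of tool that can earn a place in a shared library: a minimal Python sketch, assuming source records arrive as dictionaries, of a mapping-driven transform. The reusable part is the generic `apply_mapping` function; each new project supplies only its own mapping table. All function and field names here are hypothetical.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# A mapping entry: target field -> (source field, optional transform function).
# These names are hypothetical; adapt them to your own shared library.
FieldRule = Tuple[str, Optional[Callable[[Any], Any]]]


def apply_mapping(record: Dict[str, Any], mapping: Dict[str, FieldRule]) -> Dict[str, Any]:
    """Build a target record from a source record using a declarative mapping."""
    result: Dict[str, Any] = {}
    for target_field, (source_field, transform) in mapping.items():
        value = record.get(source_field)
        # Only call the transform when there is a value to transform.
        result[target_field] = transform(value) if transform and value is not None else value
    return result


# A common transform that tends to recur across projects.
def strip_or_none(value: Any) -> Optional[str]:
    text = str(value).strip()
    return text or None


# Project-specific configuration: the only part that changes per migration.
CONTACT_MAPPING: Dict[str, FieldRule] = {
    "last_name": ("surname", strip_or_none),
    "email": ("email_address", lambda v: str(v).strip().lower()),
    "legacy_id": ("id", str),
}

if __name__ == "__main__":
    source = {"id": 42, "surname": "  Smith ", "email_address": "JS@Example.com"}
    print(apply_mapping(source, CONTACT_MAPPING))
    # prints {'last_name': 'Smith', 'email': 'js@example.com', 'legacy_id': '42'}
```

Keeping the mapping declarative also means a teammate can read and adapt it without digging through the transform logic, which helps with the discoverability problem described above.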
Is your migration testable?
All good processes are rigorously tested.
Do you have an automated testing solution that validates your process? Can you tell if the data migrated accurately after each test run? Do your tests cover the positive and negative cases?
Testing migrations is hard. Testing software is hard. The testing tools developers are most familiar with are unit testing tools, which test one very small thing at a time. Multi-system data comparison is not their forte. The tools that do exist for such work tend to be expensive, so complex that creating tests is nearly as hard as creating the migration jobs themselves, or both.
But just because testing is hard does not mean you shouldn’t do what you can within the budget and time you have. When you cannot use something like MuleSoft’s MUnit, you can still write queries that sanity check the migrated and generated data. You can select records for spot checking that cover the edge cases you know about, plus some that represent the primary use cases. You can also look for records in invalid states that would violate your new validation rules.
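As a rough illustration of sanity-check queries, here is a Python sketch that uses an in-memory SQLite database as a stand-in for both the source extract and the target system. The table names, columns, and the "status is required" rule are hypothetical. Each check is written so that a healthy migration returns no rows, which makes the checks easy to rerun after every test cycle.

```python
import sqlite3

# In-memory stand-ins for the source extract and the migrated target.
# Table and column names are hypothetical; real checks would run against
# your actual source export and target system.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_accounts (legacy_id TEXT PRIMARY KEY, name TEXT, status TEXT);
    CREATE TABLE target_accounts (id INTEGER PRIMARY KEY, legacy_id TEXT, name TEXT, status TEXT);
    INSERT INTO source_accounts VALUES
        ('A1', 'Acme', 'Active'), ('A2', 'Globex', 'Closed'), ('A3', 'Initech', 'Active');
    INSERT INTO target_accounts (legacy_id, name, status) VALUES
        ('A1', 'Acme', 'Active'), ('A2', 'Globex', NULL);
    """
)

# Each check is a query that should return zero rows when the migration is healthy.
CHECKS = {
    # Records in the source that never arrived in the target.
    "missing_in_target": """
        SELECT s.legacy_id FROM source_accounts s
        LEFT JOIN target_accounts t ON t.legacy_id = s.legacy_id
        WHERE t.legacy_id IS NULL
    """,
    # Records that would violate a (hypothetical) new rule: status is required.
    "missing_status": "SELECT legacy_id FROM target_accounts WHERE status IS NULL",
    # Source values that changed in transit.
    "name_mismatch": """
        SELECT s.legacy_id FROM source_accounts s
        JOIN target_accounts t ON t.legacy_id = s.legacy_id
        WHERE s.name <> t.name
    """,
}


def run_checks(connection: sqlite3.Connection) -> dict:
    """Return a mapping of check name -> list of offending legacy IDs."""
    return {
        name: [row[0] for row in connection.execute(sql)]
        for name, sql in CHECKS.items()
    }


if __name__ == "__main__":
    for name, failures in run_checks(conn).items():
        print(f"{name}: {'OK' if not failures else failures}")
```

In a real project the two sides usually live in different systems, so you would load extracts of each into a common place (or query each through its own interface) before comparing. The pattern of one named query per condition that should come back empty still applies, and the records you hand-pick for spot checks can be added as their own checks.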
Is your work fixable?
Migrated data often needs to be updated after the jobs have all run.
Do you have a plan to fix your data if errors are found post-migration? Does your plan include ensuring you have external IDs, or other connections, so you can update all records of every type? Have you validated that this plan will work in practice?
When you do a data migration, everything is deterministic, so it feels like perfection is possible. But when you’re moving millions of records that were entered by humans, extracted by humans, mapped by humans, validated by humans, and represent human behaviors, there is a lot of room for human error.
You can either pretend your process is good enough to squeeze out the error, or build a process that allows you to fix the errors that slip through. Obviously I don’t believe the first is possible, so I encourage the second.
Make sure you can go back and update anything. If you’re migrating into a database that allows for a lot of easy changes – great. If you’re migrating into a financial system – make sure you understand the rules for editing.
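A minimal sketch of a post-migration fix pass, assuming every migrated record carries its legacy identifier in an external-ID column. It uses Python and an in-memory SQLite table as a stand-in for the target system; the `contacts` table, the `LEG-` IDs, and `apply_fixes` are all hypothetical. The important detail is that corrections are keyed on the external ID, and any correction that matches no record is reported rather than silently dropped.

```python
import sqlite3

# Hypothetical target table: every migrated row carries the legacy system's ID
# in an external-ID column, so any record can be found again after go-live.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contacts (
        id INTEGER PRIMARY KEY,
        external_id TEXT NOT NULL UNIQUE,
        email TEXT
    );
    INSERT INTO contacts (external_id, email) VALUES
        ('LEG-001', 'ann@example.com'),
        ('LEG-002', 'bob@exmaple.com');
    """
)


def apply_fixes(connection: sqlite3.Connection, fixes: dict) -> list:
    """Apply email corrections keyed by external ID.

    Returns the external IDs that matched no record, so they can be
    investigated instead of silently ignored.
    """
    unmatched = []
    with connection:  # commit all updates together, or roll back on error
        for external_id, email in fixes.items():
            cursor = connection.execute(
                "UPDATE contacts SET email = ? WHERE external_id = ?",
                (email, external_id),
            )
            if cursor.rowcount == 0:
                unmatched.append(external_id)
    return unmatched


if __name__ == "__main__":
    corrections = {"LEG-002": "bob@example.com", "LEG-999": "ghost@example.com"}
    print("unmatched:", apply_fixes(conn, corrections))
    print(conn.execute("SELECT external_id, email FROM contacts ORDER BY id").fetchall())
```

In a system with stricter editing rules, the same keyed lookup still tells you which record to correct, but the correction itself may have to follow that system’s rules rather than be a simple in-place update.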
Planning for mistakes you don’t want to have makes it far easier to recover from those mistakes when they appear.