<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Data-Migrations on Spinning Code</title>
    <link>https://spinningcode.org/tags/data-migrations/</link>
    <description>Recent content in Data-Migrations on Spinning Code</description>
    <generator>Hugo -- 0.152.2</generator>
    <language>en-US</language>
    <lastBuildDate>Mon, 23 Oct 2023 01:55:06 +0000</lastBuildDate>
    <atom:link href="https://spinningcode.org/tags/data-migrations/feed.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Salesforce Data Migration Pattern</title>
      <link>https://spinningcode.org/2023/10/a-salesforce-data-migration-pattern/</link>
      <pubDate>Mon, 23 Oct 2023 01:55:06 +0000</pubDate>
       <guid isPermaLink="false">https://spinningcode.org/?p=2097</guid> 
      <description>Over the last few years I’ve done a number of large data migrations into Salesforce, and developed a pattern I like to follow.</description>
      <content:encoded><![CDATA[<p>Loading large amounts of data into Salesforce is a non-trivial exercise. While traditional databases can often be loaded in nearly any order, or with just a few simple considerations for foreign keys, Salesforce’s platform behaviors require several special considerations.</p>
<p>Over the last few years I’ve done a number of large data migrations into Salesforce, and developed a pattern I like to follow. This pattern allows me to load data efficiently at any scale. While the implementation details will vary, you can adapt this pattern to your projects.</p>
<p>Efficiency matters more the larger your project; for a small project, this is overkill. If you are loading 1,000 Contacts it will probably take you longer to set up my process than to just format the file in Excel and load it through Data Loader. But if you need to load hundreds of thousands of records, or millions, across lots of different objects, this pattern can save hours or even days.</p>
<h2 id="migration-process-overview">Migration Process Overview</h2>
<p>The general concept is that you’ll run your migration in two major phases:</p>
<ol>
<li><strong>Prepare the data in a staging database.</strong></li>
<li><strong>Load the data into Salesforce.</strong></li>
</ol>
<p><figure>
  <a href="https://lh4.googleusercontent.com/XAowaCgtZ8KVRhd2apma0_S-_Ru0XGSpi_q-6cehO0hNV7Nl9GtaaSl6K47ti4gvbfMWZqBZ9fPmBwnQh-4Bo_bBZOQcKxji2j0JlyMu2UZOQPTMQE1TE48TfMM7XTQ9xVVdL3i3yxdzLs6yPldEopM" target="_blank" rel="noopener noreferrer">
    <img class="rcf-image external-image" src="https://lh4.googleusercontent.com/XAowaCgtZ8KVRhd2apma0_S-_Ru0XGSpi_q-6cehO0hNV7Nl9GtaaSl6K47ti4gvbfMWZqBZ9fPmBwnQh-4Bo_bBZOQcKxji2j0JlyMu2UZOQPTMQE1TE48TfMM7XTQ9xVVdL3i3yxdzLs6yPldEopM" alt="Diagram of a two stage migration, first from the source data into a Salesforce staging database, and then from the staging database into Salesforce." loading="lazy" />
  </a>
</figure></p>
<h2 id="salesforce-schema-mirror-staging-database">Salesforce Schema Mirror Staging Database</h2>
<p>The key to this process is that staging database in the middle.</p>
<p>In my experience having a database that is a clone of Salesforce’s schema allows you to fully prepare the data prior to loading. It also gives you a source of truth when handling partially loaded data.</p>
<p>Salesforce is slow to load compared to most traditional databases. A staging database you can load quickly gives you a chance to insert steps into your process that are hard to add in other contexts. These steps allow for testing, speed enhancements, and error recovery.</p>
<p>Some ETL tools make a staging database easy to build, others do not. If you aren’t sure how to build such a database (or it seems like a huge effort to re-create all those tables), you can use <a href="/2022/05/getting-started-with-salesforce2sql/">Salesforce2Sql</a> – that’s why I created it. It will clone your Salesforce org&rsquo;s schema into any of its supported databases.</p>
<h3 id="testing-and-error-recovery">Testing and Error Recovery</h3>
<p>The staging database lets you test for errors after you do your initial conversion; before you load it into Salesforce. You can leverage reporting and scripting engines designed for that database. You can log and error trap during your loading process far more gracefully than the Salesforce APIs support by default.</p>
<p>I often add one extra table beyond the object mirrors: a logging table. It gives me a place to write the rows from Salesforce error files, and to record the time each process takes to run. During testing I can see exactly what errors my process encounters at the record level, and measure the running time.</p>
<p>This database will also give you a place to trace what has, and has not, been loaded into Salesforce. More about how to implement this and drive performance to come.</p>
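<p>A sketch of such a logging table, using SQLite as a stand-in for the staging database (the table and column names here are my own invention, not a standard):</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real staging database
conn.execute("""
    CREATE TABLE migration_log (
        id INTEGER PRIMARY KEY,
        job_name TEXT,       -- which load job produced this entry
        record_key TEXT,     -- legacy Id of a failed record, if any
        message TEXT,        -- error text from the Salesforce error file
        started_at REAL,
        finished_at REAL
    )
""")

def log_job(job_name, errors, started_at, finished_at):
    """Record a job's running time plus one row per record-level error."""
    conn.execute(
        "INSERT INTO migration_log (job_name, started_at, finished_at) "
        "VALUES (?, ?, ?)",
        (job_name, started_at, finished_at))
    for legacy_id, message in errors:
        conn.execute(
            "INSERT INTO migration_log (job_name, record_key, message, "
            "started_at, finished_at) VALUES (?, ?, ?, ?, ?)",
            (job_name, legacy_id, message, started_at, finished_at))
    conn.commit()
```

With a table like this, a single query shows every error from a test run alongside the timings you need for planning.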
<h2 id="transform-the-data">Transform the Data</h2>
<p>Using the tool of your choice, create a process to transform the data from the source data into your staging database. How you do this stage could be a series of posts by itself. For my ideas on a good process for this I suggest <a href="/2023/06/queries-on-data/">my Queries on Queries talk</a>.</p>
<p>Your process will have a mountain of small details – I often describe it as &ldquo;hard <em>and</em> boring&rdquo;. Done well, this is your best point in the process for testing your work. Test thoroughly! You should run this process so many times you lose count.</p>
<h3 id="salesforce-migration-keys">Salesforce Migration Keys</h3>
<p>One important detail is that you will want to leave the main record <code>Id</code> field null. Your legacy Id goes into a legacy Id field, but the main Id field should be empty. We’ll use that in the loading stage to determine which records were successfully loaded and which need follow up attention.</p>
<p>Every object you are migrating should have a legacy Id field that links back to your source database. These should generally be text fields, set to be external Ids and unique. These fields will both help with the migration itself, but also the validation process – and should you need to, you will be able to update the data post-migration using those same keys.</p>
<p>To handle references between records use the legacy Ids as the lookup Id values. For example, on a Salesforce Contact there is an <code>AccountId</code> field to reference the parent account. The Account’s legacy Id should be in <code>AccountId</code>. Often this value is already in your foreign key fields so it can be a real time saver in your transformation build. We’ll see in a minute how we use those to resolve to new Salesforce Ids as we load data.</p>
<h3 id="data-cleaning">Data Cleaning</h3>
<p>This is also the time and place to do whatever data cleansing you plan to do in your process. You can do that work post-launch as well (mostly). I highly recommend this cleansing be automated for large data sets. If you can&rsquo;t automate it, do it pre-migration in your old system, or post-migration in Salesforce.</p>
<h3 id="pre-load-data-validation">Pre-Load Data Validation</h3>
<p>Using the staging database your transformed data can be fully validated before you load it.</p>
<ul>
<li>Check your references: Make sure all your lookup fields are populated with valid data.</li>
<li>Check your record counts: Do you have the expected number of records in every table?</li>
<li>Check your critical fields: All data points are created equal, but some are more equal than others. Check those a few extra times.</li>
</ul>
<p>If you have the time and resources, you can write scripts and other automations to run these tests for you. The more the better.</p>
<h2 id="loading-salesforce">Loading Salesforce</h2>
<p>Finally, all that data you just transformed and staged is ready for high volume loading. For each object you run two steps:</p>
<ol>
<li>Insert the data via the Bulk API (<em>Insert, not Upsert!</em>), recording the start and end time and all errors in your log table.</li>
<li>Update the records in your staging database to add the new Salesforce Id into the source record&rsquo;s Id column (the one I told you to leave blank before).</li>
</ol>
<p>When there are no null values left in the Id column, you have loaded all your data. If there are records that refuse to load, for any reason, you will know because the Id will be null. If you logged the errors you can see why.</p>
<p>You will also use those Ids in later jobs to update the reference Ids. Remember, we put the legacy Id into your reference fields, when you actually load the data you need to replace that legacy Id with the actual Salesforce Id.</p>
<p>When possible you should build these load jobs to only load records without a Salesforce Id already assigned. That will allow you to safely re-run the job if it encounters errors that lead to partial success (like record locking, see below).</p>
<h3 id="why-not-upsert">Why Not Upsert?</h3>
<p>People used to loading small amounts of data will be tempted to use Salesforce&rsquo;s upsert command. The benefit is that it allows you to use those legacy Id values directly instead of swapping in the newly generated Ids. But as record volumes grow, upsert performance drops – I’ve had projects where I measured it at ⅓ the speed of insert, and I’ve heard of projects where it got far worse than that. The larger the dataset, the more important it is to use Insert.</p>
<h2 id="playing-nice-with-salesforce">Playing Nice with Salesforce</h2>
<p>To make sure your data loads correctly, and efficiently, there are three more important details you still need to plan for:</p>
<ol>
<li>Automations and Sharing Rules</li>
<li>Object Load Order</li>
<li>Record Load Order</li>
</ol>
<h3 id="automations-and-sharing-rules">Automations and Sharing Rules</h3>
<p>Automations take time to run, even small amounts of time add-up when loading large amounts of data. To the degree possible, you want automations off. Some automations you want to replicate in your transformation process – particularly if it’s a simple field value or record creation. Some automations you want to defer and run later, like custom roll up values via <a href="https://github.com/SFDO-Community/declarative-lookup-rollup-summaries">DLRS</a>, NPSP rollups, or similar approaches. And some automations you cannot disable at all.</p>
<p>Sharing calculations in Salesforce are really a special-purpose automation. Just not one you often think about unless you’re doing manual sharing. Like all automations in Salesforce, the more data you load, the larger the impact of these calculations. Salesforce allows you to <a href="https://help.salesforce.com/s/articleView?id=sf.security_sharing_defer_sharing_calculations.htm&amp;type=5">defer these calculations and run them in the future</a>. The more complex your security setup, the more impact this will have (open security models can generally ignore this consideration).</p>
<p>The person doing the data loading needs to work with the folks who implemented those automations to map out which can be disabled, which can be deferred, and which must be tolerated.</p>
<h3 id="object-load-order">Object Load Order</h3>
<p>In Salesforce, object load order is critical. You cannot disable or defer assignment of required references. So you need to understand the object hierarchy and relationships.</p>
<p>Generally you start with objects that have no dependencies: e.g. Account, Campaign, Product, Lead.</p>
<p>Then proceed to objects that have relationships to those: e.g. Contact.</p>
<p>Then to objects that can have relationships to objects from those previous layers: e.g. Opportunity, Account Contact Relation, Campaign Member.</p>
<p>When possible, test running two objects in parallel. The exact combination that is most efficient will vary with org details and data volumes. My experience is that you will usually be able to run objects in 4-5 groups, with two or three objects loading in parallel.</p>
<p>Ideally we’d just load records and never go back to update them, but if there are circular references or record hierarchies you’ll need to update records after insert. Plan that second pass into your sequence.</p>
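<p>The layering above amounts to a topological sort of the object dependency graph. A sketch, with an invented dependency map (your org’s required lookups will differ):</p>

```python
# Hypothetical dependency map: each object lists the objects it needs first.
DEPENDS_ON = {
    "Account": [],
    "Campaign": [],
    "Contact": ["Account"],
    "Opportunity": ["Account", "Contact"],
    "CampaignMember": ["Campaign", "Contact"],
}

def load_groups(depends_on):
    """Group objects into waves; everything in a wave can load in parallel."""
    remaining = dict(depends_on)
    groups = []
    while remaining:
        ready = sorted(obj for obj, deps in remaining.items()
                       if all(dep not in remaining for dep in deps))
        if not ready:
            # A cycle means some lookups must be filled by a second pass.
            raise ValueError("circular reference - plan an update pass")
        groups.append(ready)
        for obj in ready:
            del remaining[obj]
    return groups
```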
<h4 id="users">Users</h4>
<p>Salesforce Users are a special case. If you have a security model where record ownership is important, you need to load Users first. If you have an open security model, I recommend loading Users last – and the smallest number of Users possible. Remember, Salesforce bans User deletion, so you must be as careful as possible about loading them. I <em>never</em> like to load Experience Cloud Users if I can avoid it – thousands of accounts that will never be used but cannot be deleted is sub-optimal.</p>
<h3 id="record-load-order-and-record-locks">Record Load Order and Record Locks</h3>
<p>Salesforce has aggressive record locking to deal with concurrent edits and updates across relationships. Great for day-to-day operation; frustrating when you&rsquo;re loading data.</p>
<p>The first place people often encounter this is when they go to load Opportunities. Opportunity bulk loads can run into massive problems with Account records being locked because another Opportunity is being loaded for the same Account in a parallel process. If you <a href="https://developer.salesforce.com/blogs/2014/02/the-salesforce-bulk-api-maximizing-parallelism-and-throughput-performance-when-integrating-or-loading-large-data-volumes">sort the records by the locking parent record</a> you can often reduce, if not eliminate, your record locking issues.</p>
<ul>
<li>Sort Contacts and Opportunities by Account.</li>
<li>Sort Campaign Member by Campaign.</li>
<li>Some Objects benefit from sorting by multiple fields: e.g. <a href="https://developer.salesforce.com/blogs/engineering/2014/08/managing-task-locks-data-loads">Tasks which you want to sort by WhoId and WhatId</a>.</li>
</ul>
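<p>A minimal sketch of the sorting idea, using invented record keys: sort child rows by their locking parent before batching, so rows that lock the same parent land in the same batch instead of colliding across parallel batches.</p>

```python
# Hypothetical Opportunity rows: (legacy Id, parent Account legacy Id).
opportunities = [
    ("OPP-1", "ACC-B"),
    ("OPP-2", "ACC-A"),
    ("OPP-3", "ACC-B"),
    ("OPP-4", "ACC-A"),
]

# Sort by the locking parent so records for the same Account are adjacent.
opportunities.sort(key=lambda row: row[1])

def batches(rows, size):
    """Chunk the sorted rows into Bulk API sized batches."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]
```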
<p><strong>Use Serial Mode only as a last resort.</strong> Serial mode is ⅕ the speed of Parallel mode most of the time. There <em>are</em> situations that call for it. But it should never be your default go-to solution. Try everything else first before resorting to serial mode. Since you have tracking of which records were loaded or failed, if you design your load job carefully you can just re-run to resolve small numbers of record locks.</p>
<h3 id="extra-sorting-trick">Extra Sorting Trick</h3>
<p>It turns out that, in many cases, the way data gets entered over time gathers it into useful patterns, so sorting data by a date field can radically reduce record lock contention. If you cannot figure out what field to sort by (often because sorting by one field causes locking issues on another object), try sorting by a date field and see if that helps.</p>
<p><em>Warning: depending on your data patterns, it can make the problem vastly worse too.</em></p>
<h2 id="mock-runs">Mock Runs</h2>
<p>A mock run is a test load into a sandbox in which you go through all the steps of loading the data – starting with extracting it from the source system.</p>
<p>I personally recommend at least two full test mocks of your process.</p>
<p>If you’re working on a tight budget that may not be feasible (migrations are the first place project leaders trim budgets, and the first place users complain about errors), but that doesn’t mean multiple tests aren’t valuable.</p>
<p>The first test will go poorly, but you’ll learn a lot. The second test will, hopefully, go far better, but you will still learn a great deal.</p>
<p>In your testing you should expect to find places where your mappings are wrong, your transformations are incorrect, your testing is inadequate, your load order doesn’t work, you have source data patterns not accounted for, and more. Make time for good testing, you’ll thank yourself later.</p>
<h2 id="final-considerations">Final Considerations</h2>
<p>Large volume data loading in Salesforce is a deep topic. Long as this article is, I’ve left out a lot of details. I designed this pattern to support high speed loads, rigorous testing, and error recovery, but within each step of this pattern I could write articles this long or longer. You should continue to research the topic and adapt your implementation to your project.</p>
<p>A few sample topics you might consider:</p>
<ul>
<li><a href="https://help.salesforce.com/s/articleView?id=000385636&amp;type=1">How to set audit fields.</a></li>
<li>How to <a href="https://help.salesforce.com/s/articleView?id=000385540&amp;type=1">enable and disable triggers in production</a>.</li>
<li>Wipe and reload an org&rsquo;s data.</li>
</ul>
<p>You may even need to do something I’ve never encountered before.</p>
<p>But in any large volume Salesforce data load, the general pattern outlined here will serve you well.</p>
]]></content:encoded>
    </item>
    <item>
      <title>The Queries Part 3 of 3</title>
      <link>https://spinningcode.org/2023/07/the-queries-part-3-of-3/</link>
      <pubDate>Tue, 25 Jul 2023 00:31:59 +0000</pubDate>
       <guid isPermaLink="false">https://spinningcode.org/?p=2067</guid> 
      <description>More questions you can use to challenge your team to improve your migration game.</description>
      <content:encoded><![CDATA[<p>This is the third and final post in <a href="/tags/queries-on-queries/">a series of posts</a> to break down the questions from my Queries on Queries talk. <a href="/2023/06/queries-on-data/">The full talk is available here</a>.</p>
<h1 id="is-your-solution-reusable">Is your solution reusable?</h1>
<blockquote>
<p><em>Migrations feel like one off processes, but teams that migrate once usually migrate again.</em></p>
<p>Have you ensured that as much of your solution as possible can be reused? Do you have a shared library of migration tools that your whole team can access? When you create new functionality are you thinking about ways to make it usable in your next project?</p>
</blockquote>
<p>On any technology project you will generally benefit from designing for re-usability. I mentioned in my comments on the question about repeatability that people get tempted to see migration work as fundamentally one-off, but you need to plan for many runs. That question is focused on repeating the same project, this is about recycling parts of this project in another.</p>
<p>To a consultant, the value of reuse should be obvious: we like to sell projects to new clients based on successful projects for other clients. For that I want libraries of tools designed so they can be rapidly assembled to meet a new client’s needs.</p>
<p>But even when I <em>was</em> the client, I was moving similar data into the same systems over and over. I created API libraries, and rough interfaces, to handle some of that work so I didn’t have to do the same tedious work again and again.</p>
<p>In both cases those libraries are only useful if whoever needs them knows they exist, has access to them, and can figure out how to leverage them.</p>
<h1 id="is-your-migration-testable">Is your migration testable?</h1>
<blockquote>
<p><em>All good processes are rigorously tested.</em></p>
<p>Do you have an automated testing solution that validates your process? Can you tell if the data migrated accurately after each test run? Do your tests cover the positive and negative cases?</p>
</blockquote>
<p>Testing migrations is hard. Testing software is hard. The testing tools developers are most familiar with are unit testing tools, which test one very small thing at a time. Multi-system data comparison is not their forté. The tools that do exist for such work tend to be quite expensive and/or so complex that the task of creating tests is nearly as hard as the task of creating the migration jobs themselves.</p>
<p>But just because testing is hard does not mean you shouldn’t do what you can do within the budget and time you have. When you cannot use something like MuleSoft’s MUnit you can still create queries that sanity check the migrated and generated data. You select records for spot checking that cover edge cases you are aware of, and some that represent primary use cases. You can look for records that create invalid data states that would violate your new validation rules.</p>
<h1 id="is-your-work-fixable">Is your work fixable?</h1>
<blockquote>
<p><em>Migrated data often needs to be updated after the jobs have all run.</em></p>
<p>Do you have a plan to fix your data if errors are found post migration? Does your plan include ensuring you have external Ids, or other connections, to be able to update all records of every type? Have you validated this plan will work in practice?</p>
</blockquote>
<p>When you do a data migration, because everything is deterministic, it feels like perfection is possible. But when you’re moving millions of records that were entered by humans, extracted by humans, mapped by humans, validated by humans, and represent human behaviors, there is a lot of room for human error.</p>
<p>You can either pretend your process is good enough to squeeze out the error, or build a process that allows you to fix the errors that slip through. Obviously I don’t believe the first is possible, so I encourage the second.</p>
<p>Make sure you can go back and update anything. If you’re migrating into a database that allows for a lot of easy changes – great. If you’re migrating into a financial system – make sure you understand the rules for editing.</p>
<p>Planning for mistakes you don’t want to have makes it far easier to recover from those mistakes when they appear.</p>
]]></content:encoded>
    </item>
    <item>
      <title>The Queries Part 2 of 3</title>
      <link>https://spinningcode.org/2023/07/the-queries-part-2-of-3/</link>
      <pubDate>Mon, 17 Jul 2023 13:32:00 +0000</pubDate>
       <guid isPermaLink="false">https://spinningcode.org/?p=2055</guid> 
      <description>More questions you can use to challenge your team to improve your migration game.</description>
      <content:encoded><![CDATA[<p>This is the second in <a href="/tags/queries-on-queries/">a series of posts</a> to break down the questions from my Queries on Queries talk. <a href="/2023/06/queries-on-data/">The full talk is available here</a>.</p>
<h2 id="is-your-work-repeatable">Is your work repeatable?</h2>
<blockquote>
<p><em>You will need to do this more than once.</em></p>
<p>Is your process designed so you can run it over and over without error? Can you easily erase test attempts and start over from a clean slate? Do you have the capacity to do all the practice runs you need to complete your project successfully and on schedule?</p>
</blockquote>
<p>Because a migration is fundamentally a one-way operation, designed to move data once, it’s tempting to build the whole process as a one-off affair. I’ve seen (even used) migration processes that required hours or days of hand polishing data to get it to load – this is a terrible way to do the job.</p>
<p>A good migration process should be automated. To automate anything you need to test it. If you test something you should expect it to fail many times before it works. And when it fails you need to run it again and again until it works.</p>
<p>By their very nature data migrations create data – in a target system no less – so you need a way to roll everything back to a pre-run state for each subsequent test. I like to use a staging database for the main complex parts of my migrations. I created <a href="/2022/05/getting-started-with-salesforce2sql/">Salesforce2Sql</a> just to make that so easy no one would be tempted to skip the step. When I create processes in an ETL, I like to have jobs start by deleting the data related to that job from the staging database, so I can make <a href="https://developer.salesforce.com/blogs/engineering/2013/01/implementing-idempotent-operations-with-salesforce">idempotent jobs</a> as much as possible. Run, test, adjust, repeat. If you know how many times you ran your migration process, you didn’t run the jobs enough.</p>
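<p>The delete-first shape of an idempotent staging job can be sketched like this (SQLite stands in for the staging database; names are my own):</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Contact (Legacy_Id__c TEXT UNIQUE, LastName TEXT)")

def run_contact_job(source_rows):
    """Delete this job's prior output, then rebuild it - same result every run."""
    conn.execute("DELETE FROM Contact")
    conn.executemany(
        "INSERT INTO Contact (Legacy_Id__c, LastName) VALUES (?, ?)",
        source_rows)
    conn.commit()

rows = [("SRC-1", "First"), ("SRC-2", "Second")]
run_contact_job(rows)
run_contact_job(rows)  # re-running does not duplicate or corrupt data
```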
<h2 id="is-your-work-measurable">Is your work measurable?</h2>
<blockquote>
<p><em>To know you moved all the data, you must know how much data is going in and how much should come out.</em></p>
<p>Can you accurately predict your output data volume based on the input size? Do you have valid estimates of the running time required for each stage based on the data volumes? Are the estimates of expected data set size from a reliable source?</p>
</blockquote>
<p>It seems like knowing how to measure your work should be obvious, but in truth most interesting migrations are not a simple record-in, record-out – they involve splitting records, combining tables, filtering data, converting tables to fields, fields to tables, and other similar adjustments. But the only way to know if you got it all to work out right is to work out the math wherever you can.</p>
<p>It’s also important to know how long a process will take. Sometimes a few thousand records here or there doesn’t matter much, but sometimes that is a matter of hours. Particularly when running samples it’s important to know the average running time. I’m working on a project right now where we know that the first 3 million records will load in about 6 hours, the last 45,000 records will take 12 hours.</p>
<p>In that project we’ve worked out those running times, and we have a good understanding of total record counts. In other projects we thought we knew, only to discover the person giving us the source record counts was talking about per-year counts instead of the total expected migration size. But with per-record estimates we can adjust expectations quickly when information changes.</p>
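<p>Per-record rates make re-estimation trivial. A sketch (the rates below are simply the ones implied by the figures above, not measured values):</p>

```python
def estimated_hours(record_count, records_per_hour):
    """A per-record rate lets you re-estimate instantly when counts change."""
    return record_count / records_per_hour

# 3 million records at ~500k/hour is the 6-hour load described above;
# the slow tail runs at roughly 3,750/hour.
fast_load = estimated_hours(3_000_000, 500_000)
slow_tail = estimated_hours(45_000, 3_750)
```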
<h2 id="do-you-scope-your-data-migrations-carefully">Do you scope your data migrations carefully?</h2>
<blockquote>
<p><em>Limiting bad data in your system allows for better decisions in the future.</em></p>
<p>Do you only load data into the new system that you truly need? Can you easily spot the difference between new and old records? Are there data points getting loaded that have no use case or maintenance plan in the target system?</p>
</blockquote>
<p>Everyone wants to keep all their data. My entire career I have understood that storage is cheap, and big data is king. AI driven data analytics have been around for a few years, and now we have all the attention on generative AIs, both benefit from large data sets.</p>
<p>All these tools are great, but they aren’t magic.</p>
<p>Big data processing, whether it be AI driven or not, is all about correlations. If you give a correlation engine bad data, it will give you bad results. Garbage in is still garbage out.</p>
<p>You only want to migrate data that’s good.
You only want to migrate data that’s useful.
You only want to migrate data that you will maintain.</p>
<p>So before you start a migration make sure you know your data will fall into those categories. Organizations can always archive data they don’t migrate.</p>
<p>There are other reasons more data isn’t always better.</p>
<p>If your system, or data archive, is ever breached, that presents a risk to your organization. Privacy laws are steadily tightening, increasing the chances you will have to admit to your audience that you were the cause of their information falling into the hands of bad actors.</p>
<p>Also, old data is often bad data. Colleges often have the email address used by their applicants squirreled away in their alumni systems. How useful do you think the AOL address I used in 1997 is to Hamilton College today? If they use it, they will fail to reach me. It provides them no value, but does provide them the chance to make mistakes. The same is true of old phone numbers, addresses, and more.</p>
<p>Keep the good stuff, let go of the stuff you don’t need.</p>
]]></content:encoded>
    </item>
    <item>
      <title>The Queries Part 1 of 3</title>
      <link>https://spinningcode.org/2023/07/the-queries-part-1-of-3/</link>
      <pubDate>Mon, 10 Jul 2023 13:00:00 +0000</pubDate>
       <guid isPermaLink="false">https://spinningcode.org/?p=2050</guid> 
      <description>Questions you can use to challenge your team to improve your migration game.</description>
      <content:encoded><![CDATA[<p>This is the first in <a href="/tags/queries-on-queries/">a series of posts</a> to break down the questions from my Queries on Queries talk. <a href="/2023/06/queries-on-data/">The full talk is available here</a>.</p>
<h2 id="are-your-tools-good-enough">Are your tools good enough?</h2>
<blockquote>
<p><em>Our migrations live and die by our tools.</em></p>
<p>Are your tools built for the scale of your project? Do they empower you to do your best work or impede rapid progress? Would a new tool serve you better now or in the future?</p>
</blockquote>
<p>Having <a href="https://www.hseblog.com/importance-of-selecting-the-right-tools-for-the-job/">the right tools</a> is critical to any job. In data migration we primarily talk about ETLs (<strong>E</strong>xtract, <strong>T</strong>ransform, and <strong>L</strong>oad): tools like <a href="https://www.jitterbit.com/">Jitterbit</a>, <a href="https://www.informatica.com/">Informatica</a>, <a href="https://www.mulesoft.com">Mulesoft</a>, <a href="https://www.talend.com/">Talend</a>, etc. We also use additional tools to help support the process: a task tracker like <a href="https://www.atlassian.com/software/jira">Jira</a>, a Data Modeler like <a href="https://www.lucidchart.com/pages/">Lucidchart</a>, staging database prep like <a href="/2022/05/getting-started-with-salesforce2sql/">Salesforce2Sql</a>, and more.</p>
<p>It’s easy to say that <a href="https://thecontentauthority.com/blog/what-does-it-is-a-poor-workman-who-blames-his-tools-mean">it’s a poor carpenter who blames his tools</a>, but anyone who has spent time with actual carpenters knows they care a great deal about what tools they use. They might be able to make do with poor tools, but they will do their best work with the right tools for the job.</p>
<p>Each tool you use needs to meet your team’s needs. It should play to your strengths, support the kinds of projects you do, and have an eye to the future. A tool that works great for a team of declarative Salesforce consultants might drive developers crazy. A tool that works great for tens of thousands of records might struggle with millions; a tool scaled for tens of millions of records may be overly complex for a project of 30,000.</p>
<p>Make sure you’re using the tools that let you do your best work, now and in the future.</p>
<h2 id="do-you-make-the-data-atomic-for-processing">Do you make the data atomic for processing?</h2>
<blockquote>
<p><em>Smaller pieces of data are easier to track, manipulate, and test.</em></p>
<p>Do you divide the source data into its constituent parts? Can you process individual pieces of data easily and cleanly? Can you stop your process after each stage to validate the results?</p>
</blockquote>
<p>It can be tempting to process data as it comes: handling whole rows of data in the form they were provided and treating fields as single data points. In practice, exports may have extra rows or columns to deal with related records. Organizations may have encoded multiple points of data into single fields – like ticket names that include a show name, date, and time. Fields can also contain semi-structured data, like Joomla’s use of arbitrary JSON blobs.</p>
<p>To process this data it is often easier and clearer to extract it from these structures prior to direct processing. It’s not always necessary, but doing this clean up of structure – like creating interstitial database tables or predictable data objects – can greatly ease the rest of your job.</p>
<p>Like many problems in software engineering, it’s easier to do good work when you are operating on atomic pieces. Think about the right ways to pull your data into constituent parts when they aren’t there already.</p>
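<p>As an illustration of pulling one of those composite fields apart (the ticket-name layout here is invented; real source data will need its own parsing rules and handling for rows that don’t fit):</p>

```python
import re

def split_ticket_name(name):
    """Split a composite ticket name into show, date, and time.

    Assumes a hypothetical 'Show - YYYY-MM-DD HH:MM' layout; rows that
    don't match are passed through for manual review.
    """
    match = re.fullmatch(r"(.+) - (\d{4}-\d{2}-\d{2}) (\d{2}:\d{2})", name)
    if not match:
        return {"show": name, "date": None, "time": None}
    show, date, time = match.groups()
    return {"show": show, "date": date, "time": time}
```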
<h2 id="can-you-process-samples-of-your-data-set">Can you process samples of your data set?</h2>
<blockquote>
<p><em>When you have lots of data you need to test small parts to be sure your process works.</em></p>
<p>Do you know how to create and run small segments of your total input? Are your segments made up of complete and valid samples? Does your sample include all the errors and edge cases your data set will throw at your process?</p>
</blockquote>
<p>If you are working with small data sets your sample can be all the data. But when you have a large data set you need to test your process with samples. When you have a multi-step migration you likely need to test the second phase while the first phase is still under construction – again a sample data set is critical.</p>
<p>Having valid test data, that covers all your edge cases, is critical to making sure you have a working solution.</p>
<p>A few years ago I worked on a project that involved a two-stage migration for a membership organization with some 600,000 active contacts. Every one of them needed to be migrated into Salesforce and then into Drupal. To test the Drupal migration we needed samples of all the types of membership statuses we would see, which involved hand-creating several hundred records. At the next Salesforce Commons Sprint I raised the idea of needing a better tool for this kind of work, a question that eventually helped lead to <a href="https://www.linkedin.com/in/paulprescod/">Paul Prescod</a>&rsquo;s <a href="https://developer.salesforce.com/podcast/2021/07/episode-89-snowfakery-data-generation-with-paul-prescod">creation of Snowfakery</a>. Snowfakery will build you testing data sets of any size and complexity to make sure your processes will succeed.</p>
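<p>When you already have a full export, one simple way to build that kind of sample is to pull a few records for each distinct case – a rough sketch, with a hypothetical <code>membership_status</code> field standing in for whatever column distinguishes your edge cases:</p>

```python
def sample_by_status(records, status_field="membership_status", per_status=1):
    """Return a small sample containing up to `per_status` records for
    each distinct value of `status_field`, so every case the full data
    set contains is represented in a test run.
    """
    buckets = {}
    for record in records:
        key = record.get(status_field)
        bucket = buckets.setdefault(key, [])
        if len(bucket) < per_status:
            bucket.append(record)
    # Flatten the per-status buckets into one sample list.
    return [r for bucket in buckets.values() for r in bucket]


contacts = [
    {"id": 1, "membership_status": "active"},
    {"id": 2, "membership_status": "lapsed"},
    {"id": 3, "membership_status": "active"},
    {"id": 4, "membership_status": None},  # missing data is an edge case too
]
sample = sample_by_status(contacts)
```

<p>A sample like this covers every status your data actually contains; a generator like Snowfakery goes further by also covering cases your current export happens to lack.</p>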
]]></content:encoded>
    </item>
    <item>
      <title>Queries on Queries: Improve your data migration</title>
      <link>https://spinningcode.org/2023/06/queries-on-data/</link>
      <pubDate>Fri, 30 Jun 2023 00:05:52 +0000</pubDate>
       <guid isPermaLink="false">https://spinningcode.org/?p=2045</guid> 
      <description>Queries on Queries presents opinionated questions to help you evaluate and improve your data migration process and practices</description>
      <content:encoded><![CDATA[<p>Last week I gave my <a href="/2021/07/sc-dug-july-2021-queries-on-queries/">Queries on Queries talk</a>, intended to help you improve your data migration process, as a webinar for <a href="https://attainpartners.com/">Attain Partners</a>.  It’s a revised and improved version from the last time I gave it.</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/HfxGiHlzACE?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

<p>These questions aren’t like the <a href="https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-steps-to-better-code/">old Joel Test</a> (which is still useful), where the right answer is “yes”. These questions are designed to point you in a direction but allow your answers to change over time. I generally answer these questions with a paragraph, not a word. Use these questions as a challenge to make you and your team better.</p>
<p>Over the next few weeks I’m planning to <a href="/tags/queries-on-queries/">publish a series</a> that will include each query and why I think it’s useful in helping you think about how to improve your process.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Salesforce Data Migration Lessons</title>
      <link>https://spinningcode.org/2023/03/salesforce-data-migrations/</link>
      <pubDate>Wed, 29 Mar 2023 14:00:00 +0000</pubDate>
       <guid isPermaLink="false">https://spinningcode.org/?p=2012</guid> 
      <description>Three lessons I wish I knew when I started doing Salesforce Data Migrations.</description>
      <content:encoded><![CDATA[<p>Last week I was part of <a href="https://attainpartners.com/event/data-migration-for-education-institutions-and-nonprofits/">a webinar for Attain Partners</a> talking about Salesforce data migrations. One of the questions the moderator, <a href="https://www.linkedin.com/in/ericmagnuson/">Eric Magnuson</a>, asked was the three lessons I&rsquo;d learned doing data migration work.</p>
<p>The answers I chose weren&rsquo;t so much from my first data migration projects as from more recent ones. Those early migrations were tiny by my current standards. Back then, I was mainly the consumer of migrated data. When I did migrations they were small and manual. The intellectual process was similar, but the scale meant I had lots left to learn about large data projects. The lessons I chose are focused on what I wish I knew when doing migrations for clients with vastly larger data sets – when I became the producer of migrated data.</p>
<p>My three lessons:</p>
<ol>
<li>Manage expectations from the start.</li>
<li>Don&rsquo;t be a hero.</li>
<li>Understand your tools, and use them well.</li>
</ol>
<h2 id="manage-expectations-from-the-start">Manage Expectations from the Start</h2>
<p>This still surprises me, but it&rsquo;s true: one of the biggest reasons data migrations get into trouble is bad expectations.</p>
<p>People tend to think data migrations are easy. You create a map from the old system to the new, run some processes to convert the data, and load it into Salesforce. But in practice, that map might involve thousands of details. Those conversions are hard to get right for every variation of the old data. And loading large amounts of data is never as easy as it looks.</p>
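<p>The mental model most people carry is something like this sketch – the field names are hypothetical, and real maps run to thousands of entries:</p>

```python
# A naive field map: legacy column name -> Salesforce field name.
# (Hypothetical names; a real migration map is far larger.)
FIELD_MAP = {
    "first_name": "FirstName",
    "last_name": "LastName",
    "email": "Email",
}


def convert_row(row):
    """Convert one legacy row into a Salesforce-shaped record.

    The hard work hides in everything this sketch skips: type
    coercion, picklist values, relationship lookups, and every
    malformed variation the old data contains.
    """
    return {sf_field: row.get(old_field) for old_field, sf_field in FIELD_MAP.items()}


record = convert_row(
    {"first_name": "Ada", "last_name": "Lovelace", "email": "ada@example.com"}
)
```

<p>The gap between that simple picture and the real work is exactly where the bad expectations come from.</p>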
<p>When people think something is easy, and the outcome turns out less than perfect, they get mad.</p>
<p>The problem is, data is always messy. We do migrations when we replace systems. We replace systems when the old one has problems. Problems in a system lead to data errors.</p>
<p>That reality is compounded by the nature of big system switches. The freshly migrated data lands in a system the primary users are still learning. Those factors lead to confusion, mistakes, errors, and misunderstandings.</p>
<p>From the very first conversation I tell our clients to expect migration errors. Our first mock run of the migration is messy – sometimes very messy. The whole point is to find errors. The second mock run will be much better, but still imperfect; we&rsquo;ll find more errors. No matter how many times we do test runs, there is only so much we can do with imperfect inputs.</p>
<p>We can make your data better than it is, but we can&rsquo;t make it perfect. If you expect perfect data, you will be disappointed. I want you to be thrilled by your new system, and that means you need to understand your migrated data will have flaws.</p>
<h2 id="dont-be-a-data-hero">Don&rsquo;t Be A Data Hero</h2>
<p>When I first did data migrations I was often a one-person show. I was responsible for figuring out all the details, implementing the process, reporting to the client, and fixing all the flaws. It&rsquo;s a common story for people doing migrations.</p>
<p>We find ourselves up late at night, working through piles of detail, trying to make sure the client is satisfied. It encourages a hero mentality: I&rsquo;ll make the project successful through sheer will.</p>
<p>And most of us can do it. Being a data hero isn&rsquo;t that hard if you put the hours in. It is, however, miserable.</p>
<p>People doing migrations need, and deserve, support. Now that I&rsquo;m leading a team of people doing migrations I have added the support I should have had. I created a space for us to come together and talk about our projects.</p>
<p>We ask each other for help and suggestions. We offer ideas for how to improve our processes. We talk about ways to address client concerns. And yes, we complain to one another. But mostly what we do is make sure that no one is alone. Everyone, myself included, has support and back up.</p>
<p>A team is stronger than a person. We don&rsquo;t need to be heroes. We do better work, and are happier people, when we support one another. Good work, by happy people, makes for successful projects.</p>
<h2 id="understand-your-migration-tools">Understand Your Migration Tools</h2>
<p>Data processing tools are code generators – understanding that allows you to use them well.</p>
<p>Both parts of that make sense once you say it:</p>
<ul>
<li>Tools that allow you to design an arbitrary process that takes inputs and generates outputs are obviously writing code at some level.</li>
<li>If you understand any tool better, you will use it better. That&rsquo;s true of a hammer, a screwdriver, or a piece of software.</li>
</ul>
<p>I learned how to migrate data from people who weren&rsquo;t formally trained developers. They were using the tools the best way they knew how, but didn&rsquo;t have the background to apply software development best practices.</p>
<p>When I combined the tool usage they taught me with the software engineering practices I already knew, I created vastly superior solutions. Our team now creates processes that are easier to set up, run faster, and allow us to fix all errors (even ones we miss until after launch).</p>
<h2 id="apply-lessons-our-salesforce-data-migrations">Applying These Lessons to Our Salesforce Data Migrations</h2>
<p>I apply these lessons to the Salesforce data migrations I lead in my work at Attain Partners. I combine that with my <a href="/2021/07/sc-dug-july-2021-queries-on-queries/">queries on queries</a> review process, and am constantly building better solutions for our clients, our team, and myself.</p>
]]></content:encoded>
    </item>
    <item>
      <title>SC DUG July 2021 - Queries on Queries</title>
      <link>https://spinningcode.org/2021/07/sc-dug-july-2021-queries-on-queries/</link>
      <pubDate>Fri, 16 Jul 2021 12:30:00 +0000</pubDate>
       <guid isPermaLink="false">https://spinningcode.org/?p=1680</guid> 
      <description>For the July 2021 SC DUG, I gave my new talk titled &amp;#34;Queries on Queries&amp;#34; which poses questions to ask yourself when migrating data between systems.</description>
      <content:encoded><![CDATA[<p>For the July 2021 SC DUG, I gave my new talk titled &ldquo;Queries on Queries&rdquo; which poses questions to ask yourself when migrating data between systems. Data migrations are often critical to project success, but all too often they are treated as a throw-away process. This talk is intentionally platform agnostic, building from my experience with both Drupal and Salesforce.</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/4yyRwjO3nXA?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

<p>If you would like to join us please <a href="https://www.meetup.com/SC-Drupal-Users-Group/">check out our upcoming events</a> on MeetUp. You’ll find our meeting times and, once you RSVP, remote connection information.</p>
<p>We frequently use these presentations to practice new talks, heavily revised versions, and new ideas with a friendly audience. So if some of the content of these videos seems a bit rough, please understand that is part of the point. If you want to see a polished version, check out our group members’ talks at camps and cons.</p>
<p>If you are interested in giving a practice talk, leave me a comment here, <a href="https://www.drupal.org/u/acrosman/">contact me through Drupal.org</a>, or find me on Drupal Slack. We’re excited to hear new voices and ideas. We want to support the community, and that means you.</p>
]]></content:encoded>
    </item>
  </channel>
</rss>
