Drupal Salesforce Suite Custom Field Mapping Types

The Drupal 8 Salesforce Suite allows you to map Drupal entities to Salesforce objects using a 1-to-1 mapping. To do this it provides a series of field mapping types that let you select how you want to relate the data between the two systems. Each field type provides its own handling to help ensure the data moves correctly in each direction.

As of this writing the suite provides six usable field mapping types:

  • Properties — The most common type to handle mapping data fields.
  • Record Type — A special handler to support Salesforce record type settings when needed.
  • Related IDs — Handles translating SFIDs to Drupal Entity IDs when two objects are related in both systems.
  • Related Properties — For handling properties across a relationship (when possible).
  • Constant — A constant value on the Drupal side that can be pushed to Salesforce.
  • Token — A value set via Drupal Token.

There is a seventh called Broken to handle mappings that have changed and need a fallback until it's fixed. The salesforce_examples module also includes a very simple example called Hardcoded that shows how to create a mapping with a fixed value (similar to, but less powerful than, the Constant field).

These six handle the vast majority of use cases, but not all. Fortunately the suite was designed using Drupal 8 annotated plugins, so you can add your own as needed. There is an example in the suite's example module, and you can review the code of the ones that are included, but I think some people would find an overview helpful.

As an example I'm using the plugin I created to add support for related entities to the webform submodule of the suite (I'm referencing the patch in #10 because that's current as of this writing, but you should actually use whatever version is most recent or has been accepted).

Like all good annotated plugins, to tell Drupal about it all we have to do is create the file in the right place. In this case that is: [my_module_root]/src/Plugin/SalesforceMappingField/[ClassName].php, or more specifically: salesforce_webform/src/Plugin/SalesforceMappingField/WebformEntityElements.php

At the top of the file we need to define the namespace and add some use statements.

<?php
 
namespace Drupal\salesforce_webform\Plugin\SalesforceMappingField;
 
use Drupal\Core\Entity\EntityInterface;
use Drupal\Core\Form\FormStateInterface;
use Drupal\salesforce_mapping\Entity\SalesforceMappingInterface;
use Drupal\salesforce_mapping\SalesforceMappingFieldPluginBase;
use Drupal\salesforce_mapping\MappingConstants;

Next we need to provide the required annotation for the plugin manager to use. In this case it just provides the plugin’s ID, which needs to be unique across all plugins of this type, and a translated label.

/**
 * Adapter for Webform elements.
 *
 * @Plugin(
 *   id = "WebformEntityElements",
 *   label = @Translation("Webform entity elements")
 * )
 */

Now we define the class itself which must extend SalesforceMappingFieldPluginBase.

class WebformEntityElements extends SalesforceMappingFieldPluginBase {

With those things in place we can start the real work.  The mapping field plugins are made up of a few parts: 

  • The configuration form elements which display on the mapping settings edit form.
  • A value function to provide the actual outbound value from the field.
  • Nice details to limit when the mapping should be used, and support dependency management.

The buildConfigurationForm function returns an array of form elements. The base class provides some basic pieces of that array that you should plan to use and modify. So first we call the function on that parent class, and then make our changes:

 /**
   * {@inheritdoc}
   */
  public function buildConfigurationForm(array $form, FormStateInterface $form_state) {
    $pluginForm = parent::buildConfigurationForm($form, $form_state);
 
    $options = $this->getConfigurationOptions($form['#entity']);
 
    if (empty($options)) {
      $pluginForm['drupal_field_value'] += [
        '#markup' => t('No available webform entity reference elements.'),
      ];
    }
    else {
      $pluginForm['drupal_field_value'] += [
        '#type' => 'select',
        '#options' => $options,
        '#empty_option' => $this->t('- Select -'),
        '#default_value' => $this->config('drupal_field_value'),
        '#description' => $this->t('Select a webform entity reference element.'),
      ];
    }
    // Just allowed to push.
    $pluginForm['direction']['#options'] = [
      MappingConstants::SALESFORCE_MAPPING_DIRECTION_DRUPAL_SF => $pluginForm['direction']['#options'][MappingConstants::SALESFORCE_MAPPING_DIRECTION_DRUPAL_SF],
    ];
    $pluginForm['direction']['#default_value'] =
      MappingConstants::SALESFORCE_MAPPING_DIRECTION_DRUPAL_SF;
    return $pluginForm;
 
  }

In this case we are using a helper function to get a list of webform entity reference elements on the webform being mapped (the details are in the patch and unimportant to this discussion). We then make those fields the list of Drupal fields for the settings form. The array we got from the parent class already provides a list of Salesforce fields in $pluginForm['salesforce_field'], so we don't have to worry about that part. Since the salesforce_webform module is push-only on its mappings, this plugin was designed to be push-only as well, and so it limits the direction options to just push. The default set of options is:

'#options' => [
    MappingConstants::SALESFORCE_MAPPING_DIRECTION_DRUPAL_SF => t('Drupal to SF'),
    MappingConstants::SALESFORCE_MAPPING_DIRECTION_SF_DRUPAL => t('SF to Drupal'),
    MappingConstants::SALESFORCE_MAPPING_DIRECTION_SYNC => t('Sync'),
 ],

And you can limit those any way that makes sense for your plugin.

With the form array completed, we now move on to the value function. This is generally the most interesting part of the plugin since it does the work of actually setting the value returned by the mapping.

  /**
   * {@inheritdoc}
   */
  public function value(EntityInterface $entity, SalesforceMappingInterface $mapping) {
    $element_parts = explode('__', $this->config('drupal_field_value'));
    $main_element_name = reset($element_parts);
    $webform = $this->entityTypeManager->getStorage('webform')->load($mapping->get('drupal_bundle'));
    $webform_element = $webform->getElement($main_element_name);
    if (!$webform_element) {
      // This reference field does not exist.
      return;
    }
 
    try {
 
      $value = $entity->getElementData($main_element_name);
 
      $referenced_mappings = $this->mappedObjectStorage->loadByDrupal($webform_element['#target_type'], $value);
      if (!empty($referenced_mappings)) {
        $mapping = reset($referenced_mappings);
        return $mapping->sfid();
      }
    }
    catch (\Exception $e) {
      return NULL;
    }
  }

In this case we are finding the entity referred to in the webform submission, loading any mapping objects that may exist for that entity, and returning the Salesforce ID of the mapped object if it exists.  Yours will likely need to do something very different.

There are actually two related functions defined by the plugin interface, defined in the base class, and available for override as needed for setting pull and push values independently:

  /**
   * An extension of ::value, ::pushValue does some basic type-checking and
   * validation against Salesforce field types to protect against basic data
   * errors.
   *
   * @param \Drupal\Core\Entity\EntityInterface $entity
   * @param \Drupal\salesforce_mapping\Entity\SalesforceMappingInterface $mapping
   *
   * @return mixed
   */
  public function pushValue(EntityInterface $entity, SalesforceMappingInterface $mapping);
 
  /**
   * An extension of ::value, ::pullValue does some basic type-checking and
   * validation against Drupal field types to protect against basic data
   * errors.
   *
   * @param \Drupal\salesforce\SObject $sf_object
   * @param \Drupal\Core\Entity\EntityInterface $entity
   * @param \Drupal\salesforce_mapping\Entity\SalesforceMappingInterface $mapping
   *
   * @return mixed
   */
  public function pullValue(SObject $sf_object, EntityInterface $entity, SalesforceMappingInterface $mapping);
 

But be careful overriding them directly. The base class provides some useful handling of various data types that need massaging between Drupal and Salesforce, and you may lose that if you aren't careful. I encourage you to look at the details of both pushValue and pullValue before working on those.

Okay, with the configuration and values handled, we just need to deal with programmatically telling Drupal when it can pull and push these fields. Most of the time you don't need to do this, but you can simplify some of the processing by overriding pull() and push() to make sure they have the right response hard coded instead of derived from other sources. In this case pulling the field would be bad, so we block that:

  /**
   * {@inheritdoc}
   */
  public function pull() {
    return FALSE;
  }

Also, we only want this mapping to appear as an option if the site has the webform module enabled. Without it there is no point in offering it at all. The plugin interface provides a function called isAllowed() for this purpose:

  /**
   * {@inheritdoc}
   */
  public static function isAllowed(SalesforceMappingInterface $mapping) {
    return \Drupal::service('module_handler')->moduleExists('webform');
  }

You can also use that function to limit a field even more tightly based on the mapping itself.

To further ensure the configuration of this mapping entity defines its dependencies correctly, we can define additional dependencies in getDependencies(). Again here we are tied to the Webform module, and we should enforce that during config exports:

  /**
   * {@inheritdoc}
   */
  public function getDependencies(SalesforceMappingInterface $mapping) {
    return ['module' => ['webform']];
  }

And that is about it. Once the class exists and is properly set up, all you need to do is rebuild the caches and you should see your new mapping field as an option on your Salesforce mapping objects (at least when isAllowed() is returning TRUE).

Bypass Pantheon Timeouts for Drupal 8

Pantheon is an excellent hosting service for both Drupal and WordPress sites. But to make their platform work and scale well they have built a number of limits into the platform, including process time limits and memory limits. These are large enough for the vast majority of projects, but from time to time they will run you into trouble on large jobs.

For data loading and updates their official answer is typically to copy the database to another server, run your job there, and copy the database back onto their server. That's fine if you can afford to freeze updates to your production site, set up a process to mirror changes into your temporary copy, or absorb some other project overhead that can be limiting and challenging. But sometimes that's not an option, or the data load takes too long for that to be practical on a regular basis.

I recently needed to do a very large import of records into a Drupal database, and so I started to play around with solutions that would allow me to ignore those time limits. We were looking at needing to do about 50 million data writes, and the running time was initially over a week to complete the job.

Since Drupal's batch system was created to solve this exact problem it seemed like a good place to start. For this solution you need a file you can load and parse in segments, like a CSV file, which you can read one line at a time. It does not have to represent the final state: you can use this approach to load data directly if the processing is quick, or you can serialize each record into a table or a queue job to process later.

One quick note about the code samples: I wrote these based on the service-based approach outlined in my post about batch services and the batch service module I discussed there. It could be adapted to a more traditional batch job, but I like the clarity the wrapper provides for breaking this back down for discussion.

The general concept here is that we upload the file and then progressively process it from within a batch job. The code samples below provide two classes to achieve this. The first is a form that provides a managed file field, which creates a file entity that can be reliably passed to the batch processor. From there the batch service takes over and uses a bit of basic PHP file handling to load the file into a database table. If you need to do more than load the data into the database directly (say, create complex entities or other tasks) you can set up a second phase to run through the values and do that heavier lifting.

To get us started the form includes this managed file:

   $form['file'] = [
     '#type' => 'managed_file',
     '#name' => 'data_file',
     '#title' => $this->t('Data file'),
     '#description' => $this->t('CSV format for this example.'),
     '#upload_location' => 'private://example_pantheon_loader_data/',
     '#upload_validators' => [
       'file_validate_extensions' => ['csv'],
     ],
   ];

The managed file form element automagically gives you a file entity, and the value in the form state is the ID of that entity. This file will be temporary and have no references once the process is complete, so depending on your site setup the file will eventually be purged. Which all means we can pass the values straight through to our batch processor:

$batch = $this->dataLoaderBatchService->generateBatchJob($form_state->getValues());
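For context, a minimal version of that submit handler might look like the sketch below. The service property matches the call above and comes from the batch service wrapper approach; the hand-off through batch_set() is the standard way to queue a batch from a form, and is an assumption here.

  public function submitForm(array &$form, FormStateInterface $form_state) {
    // Hand all the submitted values, including the managed file ID, to the
    // batch service and let it build the batch array.
    $batch = $this->dataLoaderBatchService->generateBatchJob($form_state->getValues());

    // Queue the batch for processing.
    batch_set($batch);
  }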

When the data file is small enough, a few thousand rows at most, you can load them all right away without the need of a batch job. But that runs into both time and memory concerns, and the whole point of this is to avoid those. With this approach we can ignore those limits and we're only constrained by Pantheon's upload file size. If the file is too large to upload you can move it to the server via SFTP and read it directly from there, so while the managed file is an easy way to load the file you have other options.

As we set up the file for processing in the batch job, we really need the file path, not the ID. The main reason to use a managed file is that Drupal can reliably give us the file path on a Pantheon server without us needing to know anything about where they have things stashed. Since we're about to use generic PHP functions for file processing we need to know that path reliably:

$fid = array_pop($data['file']);
$fileEntity = File::load($fid);
$ops = [];

if (empty($fileEntity)) {
  $this->logger->error('Unable to load file data for processing.');
  return [];
}
$filePath = $this->fileSystem->realpath($fileEntity->getFileUri());
$ops = ['processData' => [$filePath]];

Now we have a file, and since it's a CSV we can load a few rows at a time, process them, and then start again.

Our batch processing function needs to track two things in addition to the file: the header values and the current file position. So in the first pass we initialize the position to zero and then load the first row as the header. For every pass after that we need to find the point where we left off. For this we use generic PHP file functions for loading and seeking to the current location:

// Old-school file handling.
$path = array_pop($data);
$file = fopen($path, "r");
...
fseek($file, $filePos);

// Each pass we process 100 lines; if you have to do something complex
// you might want to reduce that number.
for ($i = 0; $i < 100; $i++) {
  $row = fgetcsv($file);
  if (!empty($row)) {
    $data = array_combine($header, $row);
    $rowData = [
      'col_one' => $data['field_name'],
      'data' => serialize($data),
      'timestamp' => time(),
    ];
    $row_id = $this->database->insert('example_pantheon_loader_tracker')
      ->fields($rowData)
      ->execute();

    // If you're setting up for a queue you include something like this.
    // $queue = $this->queueFactory->get('example_pantheon_loader_remap');
    // $queue->createItem($row_id);
  }
  else {
    break;
  }
}
$filePos = (float) ftell($file);
$context['finished'] = $filePos / filesize($path);

The example code just dumps this all into a database table. This can be useful as a raw data loader if you need to add a large data set to an existing site that’s used for reference data or something similar.  It can also be used as the base to create more complex objects. The example code includes comments about generating a queue worker that could then run over time on cron or as another batch job; the Queue UI module provides a simple interface to run those on a batch job.
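If you go the queue route, the worker itself is just a small plugin. This is a rough sketch rather than code from the example module: the plugin ID matches the queue name used in the comments above, but the module namespace, class name, and the ID column on the tracking table are assumptions.

<?php

namespace Drupal\example_pantheon_loader\Plugin\QueueWorker;

use Drupal\Core\Queue\QueueWorkerBase;

/**
 * Processes rows staged in the tracking table by the batch loader.
 *
 * @QueueWorker(
 *   id = "example_pantheon_loader_remap",
 *   title = @Translation("Example Pantheon loader remap"),
 *   cron = {"time" = 60}
 * )
 */
class ExampleLoaderRemap extends QueueWorkerBase {

  /**
   * {@inheritdoc}
   */
  public function processItem($data) {
    // $data is the row ID created during the batch load. Load the staged row
    // and do the heavier lifting (entity creation, etc.) here.
    $record = \Drupal::database()
      ->select('example_pantheon_loader_tracker', 't')
      ->fields('t')
      ->condition('id', $data)
      ->execute()
      ->fetchAssoc();

    if ($record) {
      $values = unserialize($record['data']);
      // ... create or update entities from $values.
    }
  }

}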

I've run this process for several hours at a stretch. Pantheon does have issues with system errors if a batch job is left to run for extreme lengths of time (I ran into problems on some runs after 6-8 hours), so a prep phase that loads everything into the database, followed by processing from a queue or something else that is easier to restart, has been more reliable.

Docksal Pantheon Setup from Scratch

I recently had reason to switch over to using Docksal for a project, and on the whole I really like it as a good easy solution for getting a project specific Drupal dev environment up and running quickly. But like many dev tools the docs I found didn’t quite cover what I wanted because they made a bunch of assumptions.

Most assumed either I was starting a generic project or that I was starting a Pantheon specific project – and that I already had Docksal experience. In my case I was looking for a quick emergency replacement environment for a long-running Pantheon project.

Fairly recently Docksal added support for a project init command that helps set up projects for Acquia, Pantheon, and Platform.sh, but pull init isn't really well documented and requires a few preconditions.

Since I had to run a dozen Google searches, and ask several friends for help, to make it work I figured I’d write it up.

Install Docksal

First follow the basic Docksal installation instructions for your host operating system. Once that completes, if you are using Linux as the host OS log out and log back in (it just added your user to a group and you need that access to start up docker).

Add Pantheon Machine Token

Next you need to have a Pantheon machine token so that terminus can run within the new container you're about to create. If you don't have one already, follow Pantheon's instructions to create one and save it someplace safe (like your password manager).

Once you have a machine token you need to tell Docksal about it. There are instructions for that (though they aren't in the instructions for setting up Docksal with pull init); basically you add the token to your docksal.env file:

SECRET_TERMINUS_TOKEN="HASH_VALUE_PROVIDED_BY_PANTHEON_HERE"

Also, if you are using Linux you should note that those instructions linked above say the file goes in $HOME/docksal/docksal.env, but you really want $HOME/.docksal/docksal.env (note the dot in front of docksal to hide the directory).

Setup SSH Key

With the machine token in place you are almost ready to run the setup command; there is just one more precondition. If you haven't been using Docker or Docksal, they don't know about your SSH key yet, and pull init assumes it's around. So you need to tell Docksal to load it by running:
fin ssh-key add  

If the whole setup is new, you may also need to create your key and add it to Pantheon. Once you have done that, if you are using a default SSH key name and location it should be picked up automatically (I have not tried this yet on Windows, so mileage there may vary – if you know the answer please leave me a comment). It is also a good idea to make sure the key itself is working right by getting the git clone command from your Pantheon dashboard and trying a manual clone on the command line (delete the clone once it's done; this is just to prove you can get through).

Run Pull Init

Now finally you are ready to run fin pull init: 

fin pull init --hostingplatform=pantheon --hostingsite=[site-machine-name] --hosting-env=[environment-name]

Docksal will now setup the site, maybe ask you a couple questions, and clone the repo. It will leave a couple things out you may need: database setup, and .htaccess.

Add .htaccess as needed

Pantheon uses nginx. Docksal's formula uses Apache. If you don't keep a .htaccess file in your project (and while there is no reason not to, some Pantheon setups don't keep any extra files around) you need to put it back. If you don't have a copy handy, copy and paste the content from the Drupal project repo: https://git.drupalcode.org/project/drupal/blob/8.8.x/.htaccess

Finally, you need to tell Drupal where to find the Docksal copy of the database. For that you need a settings.local.php file. Your project likely has a default version of this, which may contain things you may or may not want, so adjust as needed. Docksal creates a default database (named "default") and provides a database user named "user" with a password of "user". The host name is "db". So into your settings.local.php file you need to include database settings at the very least:

<?php
$databases = [
  'default' => [
    'default' => [
      'database' => 'default',
      'username' => 'user',
      'password' => 'user',
      'host' => 'db',
      'port' => '',
      'driver' => 'mysql',
      'prefix' => '',
    ],
  ],
];

With the database now fully linked up to Drupal, you can now ask Docksal to pull down a copy of the database and a copy of the site files:

fin pull db

fin pull files

In the future you can also pull down code changes:

fin pull code

Bonus points: do this on a server.

On occasion it’s useful to have all this setup on a remote server not just a local machine. There are a few more steps to go to do that safely.

First you may want to enable Basic HTTP Auth just to keep away from the prying eyes of Googlebot and friends. There are directions for that step (you'll want the Apache instructions). Next you need to make sure that Docksal is actually listening to the host's requests and that they are forwarded into the containers. Lots of blog posts say to run DOCKSAL_VHOST_PROXY_IP=0.0.0.0 fin reset proxy, but it turns out that fin reset proxy has been removed; instead you want:

DOCKSAL_VHOST_PROXY_IP=0.0.0.0 fin system reset

Next you need to add the vhost to the docksal.env file we were working with earlier:

 VIRTUAL_HOST="test.example.org"

Run fin up to get Docksal to pick up the changes (this section is based on these old instructions).

Now you need to add either a DNS entry someplace, or update your machine’s /etc/hosts file to look in the right place (the public IP address of the host machine).
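If you go the hosts file route, the entry is just the host machine's public IP followed by the name you set in VIRTUAL_HOST; the address below is a documentation placeholder.

203.0.113.25   test.example.org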

Anything I missed?

If you think I missed anything feel free to let me know. Particularly Windows users, feel free to let me know changes related to doing things there. I'll try to work those in if I don't get to figuring that out on my own in the near future.

FormAssembly Dynamic Parameter Signing

For a project I've been working on recently we needed to create a module that provides secure redirects from a Drupal site to FormAssembly. Overall the module does a number of things, but handling dynamic parameter signing was the thing that took the most time.

FormAssembly provides a variety of great features for creating flexible forms that integrate with Salesforce. One of the more popular features is its ability to pull data from Salesforce to prefill fields on a form. But the downside is that it is easy to create forms that leak information from Salesforce into those forms, and create privacy risks.

To address this, FormAssembly allows 3rd party tools to securely sign URLs that contain parameters (often Salesforce IDs) that could otherwise be used to extract information through an iteration attack or other basic approaches. This secure signing process can be done statically, but for most interesting projects you want to sign the URLs dynamically. The dynamic signing process allows you to alter the parameters on the fly and set an expiration date to limit the value of a stolen link. Our project required this approach.

But the dynamic signing process has a couple of sharp corners. First, it's rarely done outside of Salesforce, so there aren't a lot of code samples around, and none that I could find in PHP. Second, FormAssembly is very open and honest about the fact that they do not provide support for this feature. So I had to create my own process from the documentation they provide. The docs are good, but very Salesforce centric, with all code samples in Apex.

The process involves preparing the data for signature, generating an HMAC-SHA256 with a form-specific pre-shared key (in binary mode), converting the result to a string using base64, and finally URL encoding that string.

Their convention for preparing the data is straightforward. You format all parameters as just their key and value strung together: key1Value1key2Value2

The interesting part is the actual HMAC-SHA256, which needs to be generated in binary mode. That is often the default mode in other languages, but not in PHP (in fact most PHP devs I've talked to don't realize the last parameter to hash_hmac() is useful; if you are doing this in another language check out this collection of examples).

From there you encode the output in base64 (which results in a 44 character hash) and URL encode the hash to make sure it's URL safe, which leaves you a few characters longer.

Finally you add your hash to the query string, and you're ready to go.
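Pulling those steps together in PHP looks roughly like the sketch below. The parameter names, the form URL, and the name of the signature parameter are all placeholders (check FormAssembly's documentation for the ones your form expects); the pre-shared key comes from the form's settings in FormAssembly.

<?php
// Illustrative values; use your form's real field names and key.
$pre_shared_key = 'FORM_SPECIFIC_KEY_FROM_FORMASSEMBLY';
$params = [
  'tfa_1' => '003000000000001AAA',
  'tfa_2' => 'some-other-value',
];

// String the keys and values together: key1Value1key2Value2.
$data = '';
foreach ($params as $key => $value) {
  $data .= $key . $value;
}

// HMAC-SHA256 in binary mode: the TRUE fourth parameter is the critical part.
$raw_hash = hash_hmac('sha256', $data, $pre_shared_key, TRUE);

// Base64 encode (44 characters) and URL encode to make it URL safe.
$signature = urlencode(base64_encode($raw_hash));

// Append the signature to the query string.
$url = 'https://app.formassembly.com/forms/view/12345?' . http_build_query($params) . '&signature=' . $signature;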

To help anyone else who needs to do this, I generalized this part of the solution and tossed it into a Gist.

SC DUG February 2019

Will Jackson – Local Development in Docksal

For the SC DUG meeting this month Will Jackson from Kanopi Studios gave a talk about using Docksal for local Drupal development. Will has the joy of working with some of the Docksal developers and has become an advocate for the simplicity and power Docksal provides.

We frequently use these presentations to practice new presentations, try out heavily revised versions, and test out new ideas with a friendly audience. If you want to see a polished version check out our group members' talks at camps and cons. So if some of the content of these videos seems a bit rough, please understand we are all learning all the time and we are open to constructive feedback.

If you would like to join us please check out our upcoming events on Meetup for meeting times, locations, and connection information.

SC DUG September 2018

Chris Zietlow – Using Machine Learning to Improve UX

This fall the South Carolina Drupal User's Group started using Zoom as part of all our meetings. Sometimes the technology has worked better than others, but when it works in our favor we are recording the presentations and sharing them when we can.

Chris Zietlow presented back in September about using Machine Learning to Improve UX.

We frequently use these presentations to practice new presentations and test out new ideas. If you want to see a polished version hunt group members out at camps and cons. So if some of the content of these videos seems a bit rough please understand we are all learning all the time and we are open to constructive feedback.

If you would like to join us please check out our upcoming events on Meetup for meeting times, locations, and connection information.

Drupal 8 Batch Services

For this month’s South Carolina Drupal User Group I gave a talk about creating Batch Services in Drupal 8. As a quick side note we are trying to include video conference access to all our meetings so please feel free to join us even if you cannot come in person.

Since Drupal 8 was first released I have been frustrated by the fact that Drupal 8 batch jobs were basically untouched from previous versions. There is nothing strictly wrong with that approach, but it has never felt right to me, particularly when doing things in a batch job that I might also want to do in another context – that work really should be in a service, and I should write those core tasks first. After several frustrating experiences trying to find a solution I liked, I finally created a module that provides an abstract class that can be used to create a service that handles this problem a bit more elegantly. The project also includes an example module to provide a sample service.

Some of the text in the slides got cut off by the Zoom video window, so I uploaded them to SlideShare as well:


Quick Batch Overview

If you are new to Drupal batches there are lots of articles around that go into details of traditional implementations, so this will be a super quick overview.

To define a batch you generate an array in a particular format – typically as part of a form submit process – and pass that array to batch_set(). The array defines some basic messages, a list of operations, a function to call when the batch is finished, and optionally a few other details. The minimal array would be something like:

  <?php
  // Setup final batch array.
  $batch = [
    'title' => 'Page title',
    'init_message' => 'Opening message',
    'operations' => [],
    'finished' => '\some\class\namespace\and\name::finishedBatch',
  ];

The interesting part should be in that operations array, which is a list of tasks to be run, but getting all your functions setup and the batch array generated can often be its own project.

Each operation is a function that implements callback_batch_operation(), and the data to feed that function. The callbacks are just functions that have a final parameter that is an array reference, typically called $context. The function can either perform all the needed work on the provided parameters, or perform part of that work and update the $context['finished'] value to a number between 0 and 1. Once finished reaches 1 (or isn't set at the end of the function) batch declares that task complete and moves on to the next one in the queue. Once all tasks are complete it calls the function provided as the finished value of the array that defined the batch.
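As a quick illustration (not tied to any specific module; the function and variable names here are made up), a traditional operation callback following that pattern might look like this:

function example_batch_process_items(array $items, &$context) {
  // Set up the sandbox on the first pass.
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = count($items);
  }

  // Work through a small slice of the items on each pass.
  $slice = array_slice($items, $context['sandbox']['progress'], 10);
  foreach ($slice as $item) {
    // ... do the real work for this item here.
    $context['results'][] = $item;
    $context['sandbox']['progress']++;
  }

  // Report progress as a number between 0 and 1; batch stops calling this
  // operation once it reaches 1.
  $context['finished'] = $context['sandbox']['max'] ? $context['sandbox']['progress'] / $context['sandbox']['max'] : 1;
}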

The finish function implements callback_batch_finish(), which means it accepts three parameters: $success, $results, and $operations. $success is TRUE when all tasks completed without error; $results is an array of data you can feed into the $context array during processing; $operations is your operations list again.
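And a matching finish callback, again with made-up names, could be as simple as:

function example_batch_finished($success, $results, $operations) {
  if ($success) {
    // Report how much work was recorded in $results during processing.
    drupal_set_message(t('Processed @count items.', ['@count' => count($results)]));
  }
  else {
    drupal_set_message(t('The batch did not complete; check the logs for details.'), 'error');
  }
}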

Those functions are all expected to be static methods on classes or, more commonly, a function defined in a procedural code block imported from a separate file (which can be provided in the batch array).

My replacement batch service

It's those blocks of procedural code and classes of nothing but static methods that bug me so much. Admittedly the batch system is convenient and works well enough to handle major tasks for lots of modules. But in Drupal 8 we have a whole suite of services and plugins that are designed to be run in specific contexts that batch does not provide by default. While we can access the Drupal service container and get the objects we need, the batch code always feels clunky and out of place within a well structured module or project. What's more, I have often created batches that benefit from having the key tasks be functions of a service, not just code specific to the batch process.

So after several attempts to force batches and services to play nice together I finally created this module to force a marriage. There are places which required a bit of compromise, but I think I have most of that contained in the abstract class so I don't have to worry about it on a regular basis. That makes my final code, with its complex logic and processing, far cleaner and easier to maintain.

The Batch Service Interface module provides an interface and an abstract class that implements parts of it: abstract class AbstractBatchService implements BatchServiceInterface. The developer extending that class only needs to define a service that handles generating a list of operations that call local methods of the service, and the finish batch function (also as a local method). Nearly everything else is handled by the parent class.

The implementation I provided in the example submodule ends up being four simple methods. Even in more complex jobs all the real work can be contained in methods that are isolated from the oddities of batch processing.

<?php

namespace Drupal\batch_example;
use Drupal\node\Entity\Node;
use Drupal\batch_service_interface\AbstractBatchService;

/**
 * Class ExampleBatchService logs the name of nodes with id provided on form.
 */
class ExampleBatchService extends AbstractBatchService {

  /**
   * Must be set in child classes to be the service name so the service can
   * bootstrap itself.
   *
   * @var string
   */
  protected static $serviceName = 'batch_example.example_batch';

  /**
   * Data from the form as needed.
   */
  public function generateBatchJob($data) {
    $ops = [];
    for ($i = 0; $i < $data['message_count']; $i++ ) {
      $ops[] = [
        'logMessage' => ['MessageIndex' => $i + 1],
      ];
    }

    return $this->prepBatchArray($this->t('Logging Messages'), $this->t('Starting Batch Processing'), $ops);
  }

  public function logMessage($data, &$context) {

    $this->logger->info($this->getRandomMessage());

    if (!isset($context['results']['message_count'])) {
      $context['results']['message_count'] = 0;
    }
    $context['results']['message_count']++;

  }

  public function doFinishBatch($success, $results, $operations) {
    drupal_set_message($this->t('Logged %count quotes', ['%count' => $results['message_count']]));
  }

  public function getRandomMessage() {
    $messages = [
      // list of messages to select from
    ];

    return $messages[array_rand($messages)];

  }

}

There is the oddity that you have to tell the service its own name so it can bootstrap itself. If there is a way around that I'd love to know it. But really you have one line of code that's a bit strange; everything else is now a fairly clear call and response.

One of the nice upsides to this solution is you could write tests for the service that look and feel just like any other service's tests. The methods can all be called directly, and you are not trying to run tests against a procedural code block or a class that is nothing but static methods.
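For example, a kernel test for the example service could call the public methods directly. This is only a sketch; it assumes the example submodule's service name from the code above and that prepBatchArray() returns a standard batch array with an 'operations' key.

<?php

namespace Drupal\Tests\batch_example\Kernel;

use Drupal\KernelTests\KernelTestBase;

/**
 * Exercises the example batch service like any other service.
 */
class ExampleBatchServiceTest extends KernelTestBase {

  public static $modules = ['batch_service_interface', 'batch_example'];

  public function testGenerateBatchJob() {
    $service = $this->container->get('batch_example.example_batch');
    $batch = $service->generateBatchJob(['message_count' => 3]);

    // One operation per requested message.
    $this->assertCount(3, $batch['operations']);
  }

}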

I would love to hear ideas about ways I could make this solution stronger. So please drop me a comment or send me a patch.

Related core efforts

There is an effort to try to do similar things in core, but they look like they have some distance left to travel. Obviously once that work is complete it is likely to be better than what I have created, but in the meantime my service allows for a new level of abstraction without waiting for core’s updates to be complete.

Waterfall-like Agile-ish Projects

In software just about all project management methodologies get labeled one of two things: Agile or Waterfall. There are formal definitions of both labels, but in practice few companies stick to those definitions particularly in the world of consulting. For people who really care about such things, there are actually many more methodologies out there but largely for marketing reasons we call any process that’s linear in nature Waterfall, and any that is iterative we call Agile.

Classic cartoon of a tree swing being built poorly because every team saw it differently.
Failure within project teams leading to disasters is so common and basic that not only is there a cartoon about it but there is a web site dedicated to generating your own versions of that cartoon (http://projectcartoon.com/).

Among consultants I have rarely seen a company that is truly 100% agile or 100% waterfall. In fact I've rarely seen a shop that's close enough to the formal structures of those methodologies to really accurately claim to be one or the other. Nearly all consultancies are some kind of blend of a linear process with stages (sometimes called "a waterfall phase" or "a planning phase") followed by an iterative process with lots of non-developer input into partially completed features (often called an "agile phase" or "build phase"). Depending on the agency they might cut up the planning into the start of each sprint or they might move it all to the beginning as a separate project phase. Done well it can allow you to merge the highly complex needs of an organization with the predefined structures of an existing platform. Done poorly it can look like you tried to force a square peg into a round hole. You can see evidence of this around the internet in the articles trying to help you pick a methodology and in the variations on Agile that have been attempted to try to adapt the process to the reality many consultants face.

In 2001 the Agile Manifesto changed how we talk about project management. It challenged standing doctrine about how software development should be done and moved away from trying to mirror manufacturing processes. As the methodology around agile evolved, and proved itself impressively effective for certain projects, it drew adherents and advocates who preach Agile and Scrum structures as rigid rules to be followed. Meanwhile older project methodologies were largely relabeled “Waterfall” and dragged through the mud as out of date and likely to lead to project failure.

But after all this time Agile hasn't actually won as the only truly useful process, because it doesn't actually work for all projects and all project teams. Particularly among consulting agencies that work on complex platforms like Drupal and Salesforce, you find that regardless of the label the company uses they probably have a mix of linear planning and iterative development – or they fail a lot.

Agile works best when you start from scratch and you have a talented team trying to solve a unique problem. Anytime you are building on a mature software platform you are at least a few hundred thousand hours into development before you have your first meeting. These platforms have large feature sets that deliver lots of the functionality needed for most projects just through careful planning and basic configuration – that’s the whole point of using them. So on any enterprise scale data system you have to do a great deal of planning before you start creating the finished product.

If you don’t plan ahead enough to have a generalized, but complete, picture of what you’re building you will discover very large gaps after far too many pieces have been built to elegantly close them, or your solution will have been built far more generically than needed – introducing significant complexity for very little gain. I’ve seen people re-implement features of Drupal within other features of Drupal just to deal with changing requirements or because a major feature was skipped in planning. So those early planning stages are important, but they also need to leave space for new insights into how best to meet the client’s need and discovery of true errors after the planning stage is complete.

Once you have a good plan the team can start to build. But you cannot simply hand a developer the design and say “do this” because your “this” is only as perfect as you are and your plan does not cover all the details. The developer will see things missed during planning, or have questions that everyone else knows but you didn’t think to write down (and if you wrote down every answer to every possible question, you wrote a document no one bothered to actually read). The team needs to implement part of the solution, check with the client to make sure it’s right, adjust to mistakes, and repeat – a very agile-like process that makes waterfall purists uncomfortable because it means the plan they are working from will change.

In all this you also have a client to keep happy and help make successful – that's why they hired someone in the first place. Giving them a plan that shows you know what they want reassures them early in the project that you share their vision for the final solution. Being able to see that plan come together, while having chances to refine the details, allows you to deliver the best product you are able.

Agile was supposed to fix all our problems, but didn't. The methodologies used before were supposed to prevent all the problems that agile was trying to fix, but didn't. But by using waterfall-like planning at the start of your project with agile-ish implementation you can combine the best of both approaches, giving you the best chances for success. We all do it; it is about time we all admit it is what we do.

Cartoon of a developer reviewing all the things he has done (technical specs, unit tests, configuration, permissions, API updates) and then saying "Just one small detail: I need to code it."
Cartoon from CommitStrip

Thoughts on Hacktoberfest 2018

This year I took part in Hacktoberfest. Partially to see what all the fuss is about, partially to get myself involved in projects I didn’t know about, and partially for the free t-shirt (which do come in men’s and women’s cuts).  If you haven’t run into this project before it’s an effort by Digital Ocean to get people to participate in open source projects. Once you sign up they count all public pull requests you make on Github toward a goal of 5. I participated both as a developer, and by tagging a few issues on my own projects so people would find them.

As a developer:

It was a great excuse to go find new projects and look at ways I can contribute. While I've had plenty of experience on open source projects, often they have been outside Github or are repos I have commit access to – so I don't open a lot of pull requests on Github. That meant that Hacktoberfest was a chance to find new projects and practice a basic process for contributing code to other teams.

In that regard it was a pretty good success. I opened six PRs on four different projects. Mostly they were small stuff like linting code, updating packages, or tweaking a README file.  

In terms of drawing me into projects we’ll see. I did keep up with one after I finished the 5 required (hence having six PRs), but I didn’t dive into anything truly hard on that project.  

In terms of getting me to provide truly useful code, I think that was limited. The largest piece of code I wrote was initially rejected, so I re-wrote it in a different style, and then it was re-written by the project maintainer the day after he accepted the PR. He was really nice about it, and it helped him get something done that had been on the to-do list for a long time, but even that was example code to be used in classrooms (which was why he was so concerned about style – he didn't want it to be idiomatically correct for Python, he wanted it to be clear to beginners).

It did give me a chance to play around in other people’s code bases and I did resolve some issues for people that would have otherwise lingered longer than they already had.  It also forced me to meet other people’s standards, lint to their specifications, and pass their automated tests – all good things for everyone to do now and again to see if there are solutions you like better than the ones you use every day.

As a project owner:

Once I got through the contributions I needed to get a shirt, I figured I'd look over my own projects to see if there were issues I could label for beginners to help them find ways to get started. I labeled several issues as both Hacktoberfest and good first issues. Almost all the ones I flagged as good first issues got PRs opened – sometimes more than one.

I got two problems solved that I wouldn't have known how to solve without a bit of research, and those were great. But most of the PRs were simple things that took me longer to solve collaboratively than it would have taken me to solve myself. That's okay, in part because some of my PRs caused the same problem for their project maintainers, and because it forced me to finally learn how to set up CircleCI so the code gets checked and tested automatically when PRs are opened in the future.

What I don't think it did was get anyone truly interested in the project and helping it move forward over time. So while I solved a couple small problems, I did not gain new help that is going to keep engaging. That made it useful as a sprint, but not useful for helping build great projects.

But even if there is room for improvement my shirt is ordered and on the way.

We can do better

This week, for the second time in a year, the unacceptable behavior of a high profile man in the Drupal community has been the topic of public discussion and debate. This time the organizations involved acted more clearly and rapidly, if imperfectly. The issue came to the forefront during the #metoo campaign, and again showed that the Drupal community reflects the world around us. I listened to friends and colleagues respond in various ways to the events, and to the recognition by several men that they had been allowing this behavior to go on right in front of them for years without intervention. It is clear that in Drupal, like in all parts of our society, we can – and must – do better.

As I reflected on these discussions this weekend I read back through some posts from Danah Boyd I’d read a while ago and stashed with my list of ideas of topics for blog posts. In particular I read her comments from last March on how failures to understand people’s hate fuels it and then a piece I had initially missed from July on change in the tech community. Her experiences and perspective are worthy of a few minutes read in their own right, but this week her views seem particularly timely.

One of the things that struck me about Boyd’s piece from July was her clear simple ask:

…what I want from men in tech boils down to four Rs: Recognition. Repentance. Respect. Reparation.

To me the first two are painfully obvious, and most men who care about these issues have been working through those for a while now (too many men still need to learn to care at all). I count myself among the group of men who care, and this post is mostly directed at that group of peers.

More and more you will hear men acknowledge they believe women are telling the truth, recognizing there are more stories we don’t hear than we hear, and apologizing for their own actions or inactions in the past. But of course just believing people and saying sorry doesn’t get us very far. The next two on Boyd’s list are the places where real forward looking change comes from.

Respect should be easy, but too often it is the first place we get into trouble. It is the part of her call to action that will always be true no matter what future progress we make on these issues. Respect is an ongoing act requiring constant care, attention, and effort. Meaning to be respectful is not the same as actually being respectful. It requires actively listening to the ideas of women and people of color and considering them as fully as you do anyone else's. It means tracking in yourself when you fail to listen and making the personal change required to do better going forward. It includes monitoring our own behavior in meetings, hallway interactions, and one-on-one discussions to make sure you understand how you are being perceived differently by different people – your friendly or silly gesture to one colleague could be insulting or threatening to another. Respect is not something special that women and people of color are suddenly asking for, it's something we all already knew we should be extending to all our colleagues but too often fail to show. And when we fail to show true respect for coworkers – regardless of why we failed or which demographic categories they fall into – it's our responsibility to recognize it and repent.

Finally Boyd also calls for Reparations. Reparations is a word that lots of us fear for no particularly defensible reason, since it's just an attempt to undo some of the harm we've benefited from. And in this case her ask is so direct, plain, and frankly easy that I'm giving her the last words:

Every guy out there who wants to see tech thrive owes it to the field to actively seek out and mentor, support, fund, open doors for, and otherwise empower women and people of color. No excuses, no self-justifications, no sexualized bullshit. Just behavior change. Plain and simple. If our sector is about placing bets, let’s bet on a better world. And let’s solve for social equity.