Simple Electron Starter

Earlier today I pushed my Electron Simple Starter to GitHub. It depends only on Electron, Electron Debug, and ESLint (with no specific settings; you can add those yourself). All the basic pieces are in place to encourage good security practices. It runs without warnings or errors, and puts in place all the plumbing you need for Electron’s current inter-process communication, default overrides, and process sandboxing to help you write a secure app.

Off and on I’ve been playing at writing simple programs in JavaScript using Electron. As a long-time web developer, the idea of writing a web app that can be compiled to a native application across a wide swath of major operating systems has massive appeal.

But when I started to write Electron apps to scratch various itches, I was quickly annoyed at the number of security warnings I got when following the project tutorials. The PHP community used to ignore bad security in tutorials to the detriment of the web, so it bugs me to see that behavior crop up in other places. With my most recent side project – a Salesforce API exploration tool – I finally decided I was overdue in figuring out how to resolve all the warnings that the main project’s basic quick start triggers. Using a combination of this secure electron project template and the main project’s security tutorial, I finally got there.

Then I wanted to scratch a different itch, which hasn’t really gone anywhere, cause all the work to get started in a secure way felt like a mountain to climb again. The secure electron template is too opinionated for me to use directly for a small toy project, and ElectronForce has all kinds of other code already in place, so I spun my wheels for a while. Then I finally bit the bullet and extracted the bits I needed for the next project. Once I realized I had a fairly clean baseline, I figured I would probably want it again soon (I create projects frequently to explore an idea or scratch an itch), so I created a new project template that sets the baseline and is fairly unopinionated. My goal is to have something I can grab to start writing a simple application quickly.

While I’ve made some effort to secure this project baseline, you are still responsible for your project’s security. Please feel free to use my template, but understand that you still have to follow best practices to keep your app secure, and those will change over time. The Electron project will inevitably evolve and change its security system again, and I will not promise to keep up. Also, this is a template, not a library: when some future Electron release adds features I didn’t use, you’ll need to update your project yourself.

If some specific piece of this template confuses you, please feel free to ask either here or on GitHub. I can try to explain as best I am able, and maybe you’ll inspire another post sometime in the future to cover it in depth.

SC DUG March 2020

This month’s SC DUG featured Chris from MindGrub and Kaylan from Pantheon talking about Load Testing.

Launching a website can be a nerve-wracking experience, oftentimes with developers working right up to the wire trying to finish that one last feature. If only there were a crystal ball that would show you a vision of how your site would fare when the masses were set loose upon it.

Good news for you, there is! Load testing.

View the slides from this talk.

We frequently use these presentations to practice new presentations, try out heavily revised versions, and test out new ideas with a friendly audience. If you want to see a polished version, check out our group members’ talks at camps and cons. So if some of the content of these videos seems a bit rough, please understand we are all learning all the time and we are open to constructive feedback.

If you would like to join us, please check out our upcoming events on Meetup for meeting times, locations, and remote connection information.

SC DUG February 2020

Suggestions for learning the skills we all need to advance.

This month for SC DUG I gave a talk on the importance of self-directed learning for professional development as a developer — or really any other modern career. It was an extension and revision of my December blog post on the same topic. The presentation runs a hair over 30 minutes, and parts of the discussion are included as well.

On Being Self-Taught

Eventually we are all mostly self-taught.

From time to time conversations come up among developers, and other fellow travelers, about being self-taught vs getting formal training. Over time I’ve come to realize that the further and further you get into your career, the less the distinction means anything; eventually we are all mostly self-taught.

I’ve written before about the value of my liberal arts education and I stand by my assertion that what I learned in that setting was, and is, valuable to my life and work. But just because something was useful to life does not mean it was the only way to acquire the skills. It’s a good way for many people, but far from the only way.

For anyone in a technical field, and most professional fields really, to succeed over time you need to learn new tools, skills, and techniques. The tools I knew when I graduated college are all largely outmoded or significantly upgraded, and I’ve had to learn a variety of technologies that didn’t exist in 2001.

Within the Drupal community lots of people talk about being self-taught, sometimes with pride, sometimes with embarrassment, but in truth very few people were formally trained on the platform. Lots of very successful developers in the Drupal community (and beyond) have degrees in fields like religion and art history, not computer science, and have taught themselves how to do awesome things. In fact, I’ll argue that just about every Drupaler taught themselves most of what they know about Drupal. How they did that can vary widely, but we are a community with few formal training programs and lots of people who stumbled into Drupal trying to solve a non-technical problem. Even advanced workshops at conferences dig deep into one small area and expect you to generalize that knowledge to your projects, which I count as self-teaching. For example, I had a friend ask the other day about how to control the PDO connection settings in Drupal 7 (which I didn’t know how to do, but knew they were similar to Drupal 8), so I sent him my Drupal 8 instructions and he figured it out from there. He’s now taught himself how to do what he needed for that project and in the process generalized the approach for whatever he may need next time.

So then it is important for all of us to find, and hopefully share, techniques for self-teaching — even for those who have some kind of formal training. Here are my suggestions for people who are starting out and haven’t yet found the pattern that works for them:

  1. Assume your first solution is wrong. Most of us have stumbled, or will stumble, our way through a project where we don’t really know what we’re doing and don’t have a lot of support. We usually learn a great deal in the process, and launching those projects can feel pretty good cause you’ve succeeded at something hard. It is easy to get into the habit of assuming the solutions from that project were correct because they worked. In truth those projects are really rough around the edges, and just because we got it to work does not mean the solution was good. Assuming the first solution is good enough forever is how you become an expert beginner, which then takes a lot of effort to undo. Once you have a working solution, step back and see if you can think of a better one, or see if you can now guess better search terms to find out whether someone else wrote up a different solution to the same problem. Admit your work could be better and try to improve it.
  2. Learn a few more programming languages. Most people who are self-taught from the start, and even some who have a BA/BS in Computer Science, only know 2 or 3 programming languages (PHP, JS, and CSS+HTML are often the only languages new people learn at first). One of the courses I took by chance in college forced me to learn 8 in 16 weeks. It was grueling, miserable, and darned useful. I can still learn a new language in just a couple of weeks, and rarely do I hit a language construct I don’t recognize. You don’t need to go that far. When I first started out a mentor told me you should learn a new language every year, and for several years I did. Some of those, not the languages I learned in college, are the ones I use most day-to-day. All told I’ve spent time writing code in more than twenty different languages. That many isn’t terribly useful, but the more languages you learn, the better you understand the elements of your primary language.
  3. Learn basic algorithms and how to measure complexity. The kind of thinking that goes into formal algorithms will help you be a better developer overall; badly thought-through processes are where I tend to see the largest gaps between developers with and without formal training. Any college-level CS program will put you through an algorithms course that teaches a variety of specific algorithms and forces you to understand their structures. If you didn’t go through one of those programs, this is probably the course that will help you the most. Granted, most of us rarely rewrite these algorithms, as on modern platforms some library or another will provide a better version than we are likely to craft for our project. But learning what they are, when they are used, and how to understand their performance is useful for any project that involves lots of data or processing. MIT has a version of their algorithms course from 2011 online, or find one through another provider. Even if you just watch the lectures (really watching, not just vaguely having them on while cooking and cleaning), you can learn a great deal of useful information. I learned a lot watching those lectures as it refreshed and updated my understanding of the topics.
  4. Find and learn from mentors. Notice I used a plural there; you should try to find a few people willing to help you learn your profession, and more generally help you learn to advance in your field. Most of us benefit from learning from the experiences of multiple people, and who we need to learn from changes over time. I had the great experience of having a few wonderful mentors when I was first starting out, and much of the advice they gave me still serves me well. Some of it contradicted, and resolving those contradictions forced me to learn to do things my own way and find my own solutions.
  5. Learn other platforms. This is both a protection against future shifts in the market, and also a way to see how things work from outside your current professional bubble. Drupal developers can learn a lot from writing a WordPress plugin, or better yet an add-on for a platform in another language (think about Plone, Gatsby, or Hugo). Or try to learn to work with a platform like Salesforce or AWS. Other platforms have different communities, different learning styles, and different patterns. Like understanding additional languages, different platforms help you broaden your understanding and provide insights you can bring back to your main work.
  6. Learn to give and take criticism. Part of learning is getting feedback on your work, and part of being on a team is sharing feedback with others. If you took art or music classes in high school or college you probably learned some of the basic lessons you need here, but if you didn’t, consider taking one now at your local community college or art center. The arts are wonderful for getting experience with criticism. For all that art is often open to interpretation, it also requires specific skills. If you play off-key, it sounds wrong. If your sculpture collapses under its own weight, the project failed. If your picture’s subject is out of focus, you need to re-shoot it. Sure, there are brilliant artists who can violate all the rules, but if you have never experienced an art critique you are not one of those artists. The experience of getting direct, blunt, and honest feedback will help you understand its value and how to give that feedback yourself.
  7. Share what you think you know. We learn a great deal when we teach others, both because it forces us to refine our thinking and understanding so we can explain it, and because learners ask questions we cannot answer off the top of our heads. This can be user group or conference presentations, internal trainings for your team, mentoring junior developers, writing a blog, or anything else that gets you from learning to teaching. It’s okay if you’re not 100% right; that’s part of how we learn. A few years ago I was doing a joint project with a junior developer who asked me a lot of questions, and pushed hard when she thought I was making mistakes. When she asked why I was selecting a solution or setting a pattern, she was never satisfied with “because that’s the best way to do it.” She wanted me to explain why that was the best way. If I couldn’t walk her through it right away, I went back and hunted for reference material to explain it, or if that failed I tested her counter ideas against my plans to see if I was missing something. While I was usually right, I wasn’t always, and we did make changes based on her feedback. More importantly, it forced me to show my work in fine detail, which was a good exercise for me and gave her insights to help her do better work.
  8. Find your own patterns. At the start I said this list was for people who didn’t have their own patterns yet. In the long run of your career you need to figure out what you need to know to get to where you want to go next. Eventually you will need to find a pattern that works for you and the life you are living. No one can tell you what that is, nor how to learn it all yourself. Experiment with learning styles, areas of work, roles, and types of projects as much as you are able until you feel your way to the right solutions for you.

Some Things Every Developer Should Read

Every developer needs to be learning new things all the time. We need to have a good grounding in the ideas that have come before us and those that are emerging around us within our field, and we need to understand the larger social impact of our work. To be fair this isn’t just true of developers, but we are a field that has a bad habit of arguing our work is totally new and original – which is rarely true in any meaningful way.

Part of that push for constant learning is to read books about software development, about life, about business, about writing, and really anything that pushes how and what you think. There isn’t really a settled canon of things every developer must have read, but there are definitely some works whose ideas you need to be familiar with (even if you haven’t read them) to keep up in conversations over lunch at a conference, like The Mythical Man-Month and the Agile Manifesto (both of which came up over a lunch at my last conference). There is also a much larger set of works that many developers benefit from reading even if they aren’t really about software.

So here is my list of works that I think every developer should read at least once in their life. I’m breaking the list into two pieces: specific books I think every developer should actually read, and categories of things that every developer should be on the lookout to read on a regular basis. I’d love to hear suggestions about what should be added and what I should have read myself.

Specific Books

These are some specific books that I’ve found helpful for making me a better developer. None of the ones here are directly about programming – that’s not an accident. Many of the books I’ve read about development and the creation of software were helpful and I am glad to have spent time with them but these are works that helped me think more broadly about the craft and the when, why, and how I write code.

  • The Design of Everyday Things: While many of the specific recommendations for how to fix some devices are out-of-date and flawed (I will never forget reading that all office phones needed to be easy to use was a two-line digital display, while working at a desk with proof that the display had not solved the problem), the general philosophy holds up. Also, it will make you justifiably angry at every poorly installed door you struggle to open.
  • Zen and the Art of Motorcycle Maintenance: This is a long, slow, dense read, and if I hadn’t been assigned it in college I probably never would have gotten through it. But I find myself coming back to its discussion of quality all the time.
  • Technically Wrong: This is the newest book on this list, and it makes the cut cause the ideas are so important. I had the opportunity to see Sara Wachter-Boettcher speak at DrupalCon before she published the book and it was one of those talks that changed how lots of people in the room think about software and the role of our work in people’s lives.
  • The Elements of Style (aka “Strunk and White”): I was assigned to read this in a history course in college and then again when I was promoted to Web Director at AFSC – cause the Director of Communications wanted to be sure I remembered the importance of writing well in that job (I did, but I re-skimmed the book anyway). If everyone wrote English the way Strunk and White suggest, writing would be much better on the whole – although fairly boring. Read it not because their rules are perfect; read it because you should think about the rules of writing when you break them.
  • Blah Blah Blah: This is something you should read after you’ve read a lot about writing clearly since it rather pitches the reverse concept: sometimes a rough picture is clearer and more useful than a carefully written description.

Things that push and inspire

These are examples of works that are helpful to push boundaries. The specific examples I cite may or may not be valuable to you, but they are examples of things within the category I’m describing.

Books that show you code you’d never write:

These are books that talk about code and concepts that either have no application to your work, or are ways of showing off what a language can do – even when you shouldn’t follow the example. I have two examples here from opposite ends of the spectrum:

  • If Hemingway Wrote JavaScript: This book is brilliant. It’s a series of examples of how various writers might have completed common CS class programming assignments using JavaScript: DO NOT WRITE CODE LIKE THIS. It shows off how wonderfully flexible JavaScript is as a language, and presents some of the worst possible solutions to those problems along the way. It is a great read that will teach almost anyone a few things about JavaScript, and push you to think about different ways to approach a problem. It is a light, fast, and entertaining read.
  • Beautiful Code: This is a fairly dry read. It is made up of a series of examples of truly well-written code: WRITE CODE LIKE THIS. It looks at code in a variety of languages, not all of which are in common use, solving problems that are sometimes familiar and sometimes rare (in part because of the code you’re reading). It is a slow, dense, and at times boring read, but it’ll give you insight into what other people will find graceful in your work.

A college textbook

I mean actually read an actual textbook. Sure, if you went to college someone assigned bits and pieces of these things to read; maybe they even assigned all the chapters in order. But how many people actually did all the reading in order? That doesn’t mean the professors weren’t right that those things are useful – they do not get assigned for the professors’ health (and if you think professors get rich off assigning their own books you need to spend time learning about the economics of academic publishing).

I have a couple around that I kept as reference after college, and one or two I’ve now actually read end to end. I will grant you these are not exciting works. I don’t do this as a regular reading habit, but it’s worth having done. If you don’t have one around, most libraries still have books and often a few reasonable high level textbooks.

Articles by someone whose ideas you don’t like

If you aren’t used to doing this I should be clear that this is about getting outside your comfort zone on topics you think you know a lot about, and reading things that push you to justify – or better yet modify – your understanding of the field. This doesn’t have to mean reading material you find deeply troubling or revisiting ideas you know are morally reprehensible. Bothering with ideas from ideologies of hate is unlikely to be beneficial for most of us; think closer to home here. In this category I’m talking about things like project management practices you think are overblown: love Agile, read people who point out where it routinely fails; hate Agile, read some material about why it always works when done right. For me it is sometimes reading people who still argue technology is morally neutral. But often the best approach is to find a topic you are still trying to resolve your own thinking on, absorb some uncomfortable ideas, and wrestle with them.

Drupal Salesforce Suite Custom Field Mapping Types

The Drupal 8 Salesforce Suite allows you to map Drupal entities to Salesforce objects using a 1-to-1 mapping. To do this it provides a series of field mapping types that allow you to select how you want to relate the data between the two systems. Each field type provides handling to help ensure the data is handled correctly on each side of the system.

As of this writing the suite provides six usable field mapping types:

  • Properties — The most common type to handle mapping data fields.
  • Record Type — A special handler to support Salesforce record type settings when needed.
  • Related IDs — Handles translating SFIDs to Drupal Entity IDs when two objects are related in both systems.
  • Related Properties — For handling properties across a relationship (when possible).
  • Constant — A constant value on the Drupal side that can be pushed to Salesforce.
  • Token — A value set via Drupal Token.

There is a seventh called Broken to handle mappings that have changed and need a fallback until they are fixed. The salesforce_examples module also includes a very simple example called Hardcoded that shows how to create a mapping with a fixed value (similar to, but less powerful than, the Constant field).

These six handle the vast majority of use cases but not all. Fortunately the suite was designed using Drupal 8 annotated plugins, so you can add your own as needed. There is an example in the suite’s example module, and you can review the code of the ones that are included, but I think some people would find an overview helpful.

As an example I’m using the plugin I created to add support for related entities to the webform submodule of the suite (I’m referencing the patch in #10 cause that’s current as of this writing, but you should actually use whatever version is most recent or has been accepted).

As with all good annotated plugins, all we have to do to tell Drupal about it is create the file in the right place. In this case that is: [my_module_root]/src/Plugin/SalesforceMappingField/[ClassName].php or more specifically: salesforce_webform/src/Plugin/SalesforceMappingField/WebformEntityElements.php

At the top of the file we need to define the namespace and add some use statements.

<?php
 
namespace Drupal\salesforce_webform\Plugin\SalesforceMappingField;
 
use Drupal\Core\Entity\EntityInterface;
use Drupal\Core\Form\FormStateInterface;
use Drupal\salesforce_mapping\Entity\SalesforceMappingInterface;
use Drupal\salesforce_mapping\SalesforceMappingFieldPluginBase;
use Drupal\salesforce_mapping\MappingConstants;

Next we need to provide the required annotation for the plugin manager to use. In this case it just provides the plugin’s ID, which needs to be unique across all plugins of this type, and a translated label.

/**
 * Adapter for Webform elements.
 *
 * @Plugin(
 *   id = "WebformEntityElements",
 *   label = @Translation("Webform entity elements")
 * )
 */

Now we define the class itself which must extend SalesforceMappingFieldPluginBase.

class WebformEntityElements extends SalesforceMappingFieldPluginBase {

With those things in place we can start the real work.  The mapping field plugins are made up of a few parts: 

  • The configuration form elements which display on the mapping settings edit form.
  • A value function to provide the actual outbound value from the field.
  • Nice details to limit when the mapping should be used, and support dependency management.

The buildConfigurationForm function returns an array of form elements. The base class provides some basic pieces of that array that you should plan to use and modify. So first we call the function on that parent class, and then make our changes:

  /**
   * {@inheritdoc}
   */
  public function buildConfigurationForm(array $form, FormStateInterface $form_state) {
    // Start from the form elements the base class already provides.
    $pluginForm = parent::buildConfigurationForm($form, $form_state);

    $options = $this->getConfigurationOptions($form['#entity']);

    if (empty($options)) {
      $pluginForm['drupal_field_value'] += [
        '#markup' => $this->t('No available webform entity reference elements.'),
      ];
    }
    else {
      $pluginForm['drupal_field_value'] += [
        '#type' => 'select',
        '#options' => $options,
        '#empty_option' => $this->t('- Select -'),
        '#default_value' => $this->config('drupal_field_value'),
        '#description' => $this->t('Select a webform entity reference element.'),
      ];
    }
    // This plugin is push only, so keep just the Drupal-to-Salesforce option.
    $pluginForm['direction']['#options'] = [
      MappingConstants::SALESFORCE_MAPPING_DIRECTION_DRUPAL_SF => $pluginForm['direction']['#options'][MappingConstants::SALESFORCE_MAPPING_DIRECTION_DRUPAL_SF],
    ];
    $pluginForm['direction']['#default_value'] =
      MappingConstants::SALESFORCE_MAPPING_DIRECTION_DRUPAL_SF;
    return $pluginForm;
  }

In this case we are using a helper function to get us a list of entity reference fields on this plugin (details are in the patch and unimportant to this discussion). We then make those fields the list of Drupal fields for the settings form. The array we got from the parent class already provides a list of Salesforce fields in $pluginForm['salesforce_field'] so we don’t have to worry about that part. Since the salesforce_webform module is push-only on its mappings, this plugin was designed to be push-only as well, and so it limits the direction options to push only. The default set of options is:

'#options' => [
    MappingConstants::SALESFORCE_MAPPING_DIRECTION_DRUPAL_SF => t('Drupal to SF'),
    MappingConstants::SALESFORCE_MAPPING_DIRECTION_SF_DRUPAL => t('SF to Drupal'),
    MappingConstants::SALESFORCE_MAPPING_DIRECTION_SYNC => t('Sync'),
 ],

And you can limit those any way that makes sense for your plugin.
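As a rough illustration of limiting the options, here is a minimal, standalone sketch that keeps only an allowed subset of a direction options array. The string keys and the limit_direction_options() helper are placeholders invented for this example; in a real plugin you would use the MappingConstants values and the array supplied by the parent form.

```php
<?php

// Placeholder keys standing in for the MappingConstants direction values
// (assumed for illustration only).
const DIRECTION_DRUPAL_SF = 'drupal_sf';
const DIRECTION_SF_DRUPAL = 'sf_drupal';
const DIRECTION_SYNC = 'sync';

/**
 * Keeps only the allowed directions, preserving the original labels and order.
 *
 * Hypothetical helper, not part of the Salesforce Suite.
 */
function limit_direction_options(array $options, array $allowed): array {
  return array_intersect_key($options, array_flip($allowed));
}

$options = [
  DIRECTION_DRUPAL_SF => 'Drupal to SF',
  DIRECTION_SF_DRUPAL => 'SF to Drupal',
  DIRECTION_SYNC => 'Sync',
];

// A plugin that can push and sync, but never pull on its own.
$limited = limit_direction_options($options, [DIRECTION_DRUPAL_SF, DIRECTION_SYNC]);
```

Filtering by key rather than rebuilding the array by hand means the labels provided by the base class are reused as-is.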

With the form array completed, we now move on to the value function. This is generally the most interesting part of the plugin since it does the work of actually setting the value returned by the mapping.

  /**
   * {@inheritdoc}
   */
  public function value(EntityInterface $entity, SalesforceMappingInterface $mapping) {
    // Composite elements are stored as "element__part"; use the main element.
    $element_parts = explode('__', $this->config('drupal_field_value'));
    $main_element_name = reset($element_parts);
    $webform = $this->entityTypeManager->getStorage('webform')->load($mapping->get('drupal_bundle'));
    $webform_element = $webform->getElement($main_element_name);
    if (!$webform_element) {
      // This reference field does not exist.
      return NULL;
    }

    try {
      // The submitted value is the ID of the referenced entity.
      $value = $entity->getElementData($main_element_name);

      $referenced_mappings = $this->mappedObjectStorage->loadByDrupal($webform_element['#target_type'], $value);
      if (!empty($referenced_mappings)) {
        // Return the Salesforce ID of the first mapped object found.
        $referenced_mapping = reset($referenced_mappings);
        return $referenced_mapping->sfid();
      }
    }
    catch (\Exception $e) {
      return NULL;
    }
    // The referenced entity has not been mapped to Salesforce.
    return NULL;
  }

In this case we are finding the entity referred to in the webform submission, loading any mapping objects that may exist for that entity, and returning the Salesforce ID of the mapped object if it exists.  Yours will likely need to do something very different.

There are actually two related functions declared by the plugin interface, implemented in the base class, and available for override as needed for setting pull and push values independently:

  /**
   * An extension of ::value, ::pushValue does some basic type-checking and
   * validation against Salesforce field types to protect against basic data
   * errors.
   *
   * @param \Drupal\Core\Entity\EntityInterface $entity
   * @param \Drupal\salesforce_mapping\Entity\SalesforceMappingInterface $mapping
   *
   * @return mixed
   */
  public function pushValue(EntityInterface $entity, SalesforceMappingInterface $mapping);
 
  /**
   * An extension of ::value, ::pullValue does some basic type-checking and
   * validation against Drupal field types to protect against basic data
   * errors.
   *
   * @param \Drupal\salesforce\SObject $sf_object
   * @param \Drupal\Core\Entity\EntityInterface $entity
   * @param \Drupal\salesforce_mapping\Entity\SalesforceMappingInterface $mapping
   *
   * @return mixed
   */
  public function pullValue(SObject $sf_object, EntityInterface $entity, SalesforceMappingInterface $mapping);
 

But be careful overriding them directly. The base class provides some useful handling of various data types that need massaging between Drupal and Salesforce; you may lose that if you aren’t careful. I encourage you to look at the details of both pushValue and pullValue before working on those.

Okay, with the configuration and values handled, we just need to deal with programmatically telling Drupal when it can pull and push these fields. Most of the time you don’t need to do this, but you can simplify some of the processing by overriding pull() and push() to make sure they have the right response hard-coded instead of derived from other sources. In this case pulling the field would be bad, so we block that:

  /**
   * {@inheritdoc}
   */
  public function pull() {
    return FALSE;
  }

Also, we only want this mapping to appear as an option if the site has the webform module enabled. Without it there is no point in offering it at all. The plugin interface provides a function called isAllowed() for this purpose:

  /**
   * {@inheritdoc}
   */
  public static function isAllowed(SalesforceMappingInterface $mapping) {
    return \Drupal::service('module_handler')->moduleExists('webform');
  }

You can also use that function to limit a field even more tightly based on the mapping itself.

To further ensure the configuration of this mapping entity defines its dependencies correctly, we can define additional dependencies in getDependencies(). Again, here we are tied to the Webform module and we should enforce that during any config exports:

  /**
   * {@inheritdoc}
   */
  public function getDependencies(SalesforceMappingInterface $mapping) {
    return ['module' => ['webform']];
  }

And that is about it. Once the class exists and is properly set up, all you need to do is rebuild the caches and you should see your new mapping field as an option on your Salesforce mapping objects (at least when isAllowed() is returning true).

Bypass Pantheon Timeouts for Drupal 8

Pantheon is an excellent hosting service for both Drupal and WordPress sites. But to make their platform work and scale well they have a number of limits built into the platform. These include process time limits and memory limits. While they are large enough for the majority of projects, large projects can have trouble.

For data loading their official answer is typically to copy the database to another server, run your job there, and copy the database back onto their server. That’s fine if you can afford to freeze updates to your production site. And have the time to set up a process to mirror changes into your temporary copy. And can afford some additional project overhead. But sometimes those things are not an option. Or the data load takes too long, or happens too often, for that to be practical on a regular basis.

I recently needed to do a very large import of records into Drupal on a Pantheon hosted site. To minimize user impact I started to play around with solutions that would allow me to ignore those time limits. We were looking at doing about 50 million data writes for the project. When I first estimated the process the running time was over a week.

The Outline

Since Drupal’s batch system was created to solve this exact problem it seemed like a good place to start. For this solution you need a file you can load and parse in segments, like a CSV file, which you can read one line at a time. It does not have to represent the final state of your data. While you can actually load the data raw, you can also load each record into a table or a queue to process later.

One quick note about the code samples: I wrote these based on the service-based approach outlined in my post about batch services and the batch service module I discussed there. It could be adapted to a traditional batch job, but I like the clarity the wrapper provides for this discussion.

The general concept here is that we upload the file and then progressively process it from within a batch job. My code samples below provide two classes to achieve this. The first is a form that provides a managed file field which creates a file entity that can be reliably passed to the batch processor. From there the batch service uses a bit of basic PHP file handling to copy data into the database. If you need to do more than load the data into the database directly (say create complex entities or other tasks) you can set up a second phase to run through the values to do that heavier lifting.

Load your file

To get us started the form includes this managed file:

   $form['file'] = [
     '#type' => 'managed_file',
     '#name' => 'data_file',
     '#title' => $this->t('Data file'),
     '#description' => $this->t('CSV format for this example.'),
     '#upload_location' => 'private://example_pantheon_loader_data/',
     '#upload_validators' => [
       'file_validate_extensions' => ['csv'],
     ],
   ];

The managed file form element automagically gives you a file entity. The value in the form state is the id of that entity. This file will be temporary and have no references once the process is complete and so depending on your site setup the file will eventually be purged. Which adds up to mean we can pass all the values straight through to our batch processor:

$batch = $this->dataLoaderBatchService->generateBatchJob($form_state->getValues());

If the data file is small, a few thousand rows at most, you can load it right away. But that runs into both time and memory concerns and the whole point of this is to avoid those. With my approach we can ignore those and we’re only limited by Pantheon’s upload file size. If the file size is too large for that you can upload the file via sftp and read from the file system, so you have options.

As we set up the file for processing in the batch job, we really need the file path, not the ID. The main reason to use the managed file is that it lets us reliably get the file path on a Pantheon server without needing to know where they have things stashed. Since we’re about to use generic PHP functions for file processing we need to know that path reliably:

$fid = array_pop($data['file']);
$fileEntity = File::load($fid);
$ops = [];

if (empty($fileEntity)) {
  $this->logger->error('Unable to load file data for processing.');
  return [];
}
$filePath = $this->fileSystem->realpath($fileEntity->getFileUri());
$ops = ['processData' => [$filePath]];

Create your batch

Now we have a file and know where it is. Since it’s a CSV we can load a few rows at a time, process them, and then loop back.

Our batch processing function needs to track two things in addition to the file: the header values and the current file position. So in the first pass we initialize the position to zero and then load the first row as the header. For every pass after that we need to find the point where we left off. For this we use generic PHP file functions for loading and seeking the current location:

// Old-school file handling.
$path = array_pop($data);
$file = fopen($path, "r");
...
fseek($file, $filePos);

// Each pass we process 100 lines, if you have to do something complex
// you might want to reduce the run.
for ($i = 0; $i < 100; $i++) {
  $row = fgetcsv($file);
  if (!empty($row)) {
    $data = array_combine($header, $row);
    $rowData = [
      'col_one' => $data['field_name'],
      'data' => serialize($data),
      'timestamp' => time(),
    ];
    $row_id = $this->database->insert('example_pantheon_loader_tracker')
      ->fields($rowData)
      ->execute();

    // If you're setting up for a queue you include something like this.
    // $queue = $this->queueFactory->get('example_pantheon_loader_remap');
    // $queue->createItem($row_id);
  }
  else {
    break;
  }
}
// Note where this pass stopped, release the handle, and report progress
// as the fraction of the file read so far.
$filePos = (float) ftell($file);
fclose($file);
$context['finished'] = $filePos / filesize($path);

The example code just dumps this all into a database table. This can be useful as a raw data loader if you need to add a large data set to an existing site that’s used for reference data or something similar. It can also be used as the base to create more complex objects. The example code includes comments about generating a queue worker to run on cron or as another batch job. The Queue UI module provides a simple interface to run those as a batch job.
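For anyone adapting this outside the batch service wrapper, the seek-and-resume pattern can be sketched in plain PHP. This is a minimal sketch under stated assumptions, not the module's code: read_csv_header() and read_csv_chunk() are hypothetical helper names, and the temp file stands in for the uploaded CSV. The returned position is the value you would carry between passes (for example in the batch sandbox), and finished mirrors the fraction stored in $context['finished'].

```php
<?php

/**
 * Reads the header row and returns it with the byte offset that follows it.
 *
 * Hypothetical helper for illustration only.
 */
function read_csv_header(string $path): array {
  $file = fopen($path, 'r');
  if ($file === FALSE) {
    throw new RuntimeException("Unable to open $path");
  }
  // Explicit delimiter, enclosure, and escape arguments avoid relying on defaults.
  $header = fgetcsv($file, 0, ',', '"', '\\');
  $position = ftell($file);
  fclose($file);
  return ['header' => $header, 'position' => $position];
}

/**
 * Reads up to $limit rows starting at byte offset $position.
 *
 * Hypothetical helper for illustration only. Rows whose column count does
 * not match the header (blank lines, for instance) are skipped.
 */
function read_csv_chunk(string $path, array $header, int $position, int $limit = 100): array {
  $file = fopen($path, 'r');
  if ($file === FALSE) {
    throw new RuntimeException("Unable to open $path");
  }
  fseek($file, $position);

  $rows = [];
  for ($i = 0; $i < $limit; $i++) {
    $row = fgetcsv($file, 0, ',', '"', '\\');
    if ($row === FALSE) {
      // End of file.
      break;
    }
    if (count($row) !== count($header)) {
      continue;
    }
    $rows[] = array_combine($header, $row);
  }

  $position = ftell($file);
  fclose($file);
  $size = filesize($path);

  return [
    'rows' => $rows,
    'position' => $position,
    // Fraction of the file consumed so far, capped at 1.0.
    'finished' => $size > 0 ? min(1.0, (float) $position / $size) : 1.0,
  ];
}

// Demo: a throwaway three-row file processed two rows per pass.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "id,name\n1,Ada\n2,Grace\n3,Linus\n");

$state = read_csv_header($path);
$loaded = 0;
do {
  $chunk = read_csv_chunk($path, $state['header'], $state['position'], 2);
  $state['position'] = $chunk['position'];
  // In the real job each row would be inserted into the tracking table here.
  $loaded += count($chunk['rows']);
} while ($chunk['finished'] < 1.0);

echo "Loaded $loaded rows\n";
unlink($path);
```

Reopening the file on every pass looks wasteful, but each batch pass is a separate request, so the byte offset is the only state that can survive between them.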

Final Considerations

I’ve run this process for several hours at a stretch. Pantheon does have issues with system errors if a batch job is left to run for extreme lengths of time. I ran into problems on some runs after 6-8 hours of run time. So a prep load into the database, followed by queue processing that can be restarted, has been more reliable.

Docksal Pantheon Setup from Scratch

I recently had reason to switch over to using Docksal for a project, and on the whole I really like it as a good easy solution for getting a project specific Drupal dev environment up and running quickly. But like many dev tools the docs I found didn’t quite cover what I wanted because they made a bunch of assumptions.

Most assumed either I was starting a generic project or that I was starting a Pantheon specific project – and that I already had Docksal experience. In my case I was looking for a quick emergency replacement environment for a long-running Pantheon project.

Fairly recently Docksal added support for a project init command that helps set up for Acquia, Pantheon, and Platform.sh, but pull init isn’t really well documented and requires a few preconditions.

Since I had to run a dozen Google searches, and ask several friends for help, to make it work I figured I’d write it up.

Install Docksal

First follow the basic Docksal installation instructions for your host operating system. Once that completes, if you are using Linux as the host OS, log out and log back in (the installer just added your user to a group and you need that access to start up Docker).

Add Pantheon Machine Token

Next you need to have a Pantheon machine token so that terminus can run within the new container you’re about to create. If you don’t have one already follow Pantheon’s instructions to create one and save it someplace safe (like your password manager).

Once you have a machine token you need to tell Docksal about it. There are instructions for that (but they aren’t in the instructions for setting up Docksal with pull init). Basically, you add the key to your docksal.env file:

SECRET_TERMINUS_TOKEN="HASH_VALUE_PROVIDED_BY_PANTHEON_HERE"

Also, if you are using Linux you should note that those instructions linked above say the file goes in $HOME/docksal/docksal.env, but you really want $HOME/.docksal/docksal.env (note the dot in front of docksal to hide the directory).

Setup SSH Key

With the machine token in place you are almost ready to run the setup command, just one more precondition. If you haven’t been using Docker or Docksal they don’t know about your SSH key yet, and pull init assumes it’s around. So you need to tell Docksal to load it by running:

fin ssh-key add

If the whole setup is new, you may also need to create your key and add it to Pantheon. Once you have done that, if you are using a default SSH key name and location it should pick it up automatically (I have not tried this yet on Windows so mileage there may vary – if you know the answer please leave me a comment). It also is a good idea to make sure the key itself is working right by getting the git clone command from your Pantheon dashboard and trying a manual clone on the command line (delete the clone once it’s done, this is just to prove you can get through).

Run Pull Init

Now finally you are ready to run fin pull init: 

fin pull init --hosting-platform=pantheon --hosting-site=[site-machine-name] --hosting-env=[environment-name]

Docksal will now set up the site, maybe ask you a couple questions, and clone the repo. It will leave a couple things out you may need: database setup, and .htaccess.

Add .htaccess as needed

Pantheon uses nginx. Docksal’s formula uses Apache. If you don’t keep a .htaccess file in your project (and while there is no reason not to, some Pantheon setups don’t keep any extra stuff around) you need to put it back. If you don’t have a copy handy, copy and paste the content from the Drupal project repo: https://git.drupalcode.org/project/drupal/blob/8.8.x/.htaccess

Finally, you need to tell Drupal where to find the Docksal copy of the database. For that you need a settings.local.php file. Your project likely has a default version of this, which may contain things you may or may not want so adjust as needed. Docksal creates a default database (named default) and provides a user named…“user”, which has a password of “user”.  The host’s name is ‘db’. So into your settings.local.php file you need to include database settings at the very least:

<?php
$databases = array(
  'default' =>
    array(
      'default' =>
      array(
        'database' => 'default',
        'username' => 'user',
        'password' => 'user',
        'host' => 'db',
        'port' => '',
        'driver' => 'mysql',
        'prefix' => '',
      ),
    ),
);

With the database now fully linked up to Drupal, you can now ask Docksal to pull down a copy of the database and a copy of the site files:

fin pull db

fin pull files

In the future you can also pull down code changes:

fin pull code

Bonus points: do this on a server.

On occasion it’s useful to have all this setup on a remote server not just a local machine. There are a few more steps to go to do that safely.

First you may want to enable Basic HTTP Auth just to keep the site away from the prying eyes of Googlebot and friends. There are directions for that step (you’ll want the Apache instructions). Next you need to make sure that Docksal is actually listening to the host’s requests and that they are forwarded into the containers. Lots of blog posts say to run DOCKSAL_VHOST_PROXY_IP=0.0.0.0 fin reset proxy. But it turns out that fin reset proxy has been removed; instead you want:

DOCKSAL_VHOST_PROXY_IP=0.0.0.0 fin system reset

Next you need to add the vhost to the docksal.env file we were working with earlier:

 VIRTUAL_HOST="test.example.org"

Run fin up to get Docksal to pick up the changes (this section is based on these old instructions).

Now you need to add either a DNS entry someplace, or update your machine’s /etc/hosts file to look in the right place (the public IP address of the host machine).

Anything I missed?

If you think I missed anything feel free to let me know. Windows users in particular, feel free to let me know about changes related to doing things there. I’ll try to work those in if I don’t get to figuring that out on my own in the near future.

FormAssembly Dynamic Parameter Signing

For a project I’ve been working on recently we had need to create a module that provides secure redirects from a Drupal site to FormAssembly. Overall the module does a number of things, but handling dynamic parameter signing was the thing that took the most time.

FormAssembly provides a variety of great features for creating flexible forms that integrate with Salesforce. One of the more popular features is its ability to pull data from Salesforce to prefill fields on a form. But the downside is that it is easy to create forms that leak information from Salesforce into those forms, and create privacy risks.

To address this, FormAssembly allows 3rd party tools to securely sign URLs that contain parameters (often Salesforce IDs) that could be used to extract information through an iteration attack and other basic approaches. This secure signing process can be done statically but for most interesting projects you want to sign the URLs dynamically. The dynamic signing process allows you to alter the parameters on the fly and set an expiration date to limit the value of a stolen link. Our project required this approach.

But the dynamic signing process has a couple sharp corners. First, it’s rarely done outside of Salesforce so there aren’t a lot of code samples around, and none that I could find in PHP. Second, FormAssembly is very open and honest about the fact that they do not provide support on this feature. So I had to create my own process from the documentation they provide. The docs are good, but very Salesforce centric, with all code samples in Apex.

The process involves preparing the data for signature, generating an HMAC-SHA256 with a form-specific pre-shared key (in binary mode), converting the result to a string using base64, and finally URL encoding that string.

Their convention for preparing the data is straightforward. You format all parameters as just their key and value strung together: key1Value1key2Value2

The interesting part is the actual HMAC-SHA256, which needs to be generated in binary mode. That is often the default mode in other languages, but not in PHP (in fact most PHP devs I’ve talked to don’t realize the last parameter to hash_hmac() is useful). If you are doing this in another language, check out this collection of examples.

From there you encode the output in base64 (which results in a 44 character hash), and URL encode the hash to make sure it’s URL safe, which leaves you with a string a few characters longer.

Finally you add your hash to the query string, and you’re ready to go.
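Strung together, those steps look roughly like the following. This is a minimal sketch of the steps as described here, not the code from my module: sign_parameters() is a hypothetical name, the key and parameters are made-up example values, the parameters are signed in array order, and the "signature" query parameter name is an assumption you should check against FormAssembly's documentation.

```php
<?php

/**
 * Signs a set of URL parameters following the steps described above.
 *
 * Hypothetical helper for illustration only.
 */
function sign_parameters(array $params, string $secretKey): string {
  // Step 1: string keys and values together: key1Value1key2Value2.
  $message = '';
  foreach ($params as $key => $value) {
    $message .= $key . $value;
  }

  // Step 2: HMAC-SHA256 in binary mode. The final TRUE is what switches
  // hash_hmac() from its default hex output to raw bytes.
  $raw = hash_hmac('sha256', $message, $secretKey, TRUE);

  // Step 3: base64 encode the 32 raw bytes, giving a 44 character string.
  $encoded = base64_encode($raw);

  // Step 4: URL encode so characters like +, / and = survive the query string.
  return urlencode($encoded);
}

// Example values only; the real key comes from the form's settings.
$params = ['key1' => 'Value1', 'key2' => 'Value2'];
$signature = sign_parameters($params, 'example-pre-shared-key');

// The signature is already URL encoded, so it is appended by hand rather
// than passed through http_build_query(), which would encode it again.
// "signature" as the parameter name is an assumption.
$query = http_build_query($params) . '&signature=' . $signature;
echo $query, "\n";
```

Leaving the last argument of hash_hmac() at its default gives you a 64 character hex string instead, which base64 encodes without complaint and produces a signature that will never match.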

To help anyone else who needs to do this, I generalized this part of the solution and tossed it into a Gist.

SC DUG February 2019

Will Jackson – Local Development in Docksal

For the SC DUG meeting this month Will Jackson from Kanopi Studios gave a talk about using Docksal for local Drupal development. Will has the joy of working with some of the Docksal developers and has become an advocate for the simplicity and power Docksal provides.

We frequently use these presentations to practice new presentations, try out heavily revised versions, and test out new ideas with a friendly audience. If you want to see a polished version check out our group members’ talks at camps and cons. So if some of the content of these videos seems a bit rough please understand we are all learning all the time and we are open to constructive feedback.

If you would like to join us please check out our upcoming events on Meetup for meeting times, locations, and connection information.