Thinking About Languages

I’ve been thinking about languages a lot lately.

Mostly the kind I encounter in my work, programming languages, but also human languages and the intersection between the two.

My thinking runs parallel to what I talked about bringing from my Drupal experience to my Salesforce work. I have been considering what I’ve taken from working in one programming language that made me better in others.

A few weeks ago I asked some friends in various professional circles about their thoughts on programming languages. “What’s your personal theory about learning new languages? Have you ever learned enough? How often should you learn a new one? How do you count languages you last used 5+ years ago? What’s reasonable to expect of other developers?”

The responses ranged widely, but all generally agreed that we benefit from learning more languages throughout our careers, particularly when we are starting out. Not surprisingly, the clarity of thinking varied as well, with more experienced programmers being clearer about the value they had taken from the different languages they had used over time.

My Experience with Languages

During my first job after college, Frank Weber, an AFSC volunteer who served as a mentor of mine, suggested that I learn a new language every year. As I work with developers who are earlier in their careers than I am, I have been trying to decide how important Frank’s advice was: I’ve settled on extremely important.

Portrait of Kahlil Gibran
I saw this portrait of Kahlil Gibran on display at the Telfair Academy. His writings influenced my thinking about language and poetry, but always in my native language, not his.

As a kid I was taught and learned a small number of languages like BASIC, Logo, and HyperTalk. In college, I learned several more, including 8 in the course of a 16-week semester – a rough but useful experience. After college, when Frank told me to learn a language a year, I did that for several years. My career has largely centered on languages I taught myself post-college.

As a result, at one point or another, I have written programs in more than 20 different programming languages. I can learn new languages quickly, in part because it is rare for me to encounter a language that has constructs I haven’t used previously. Along the way I discovered I can even help people use a language I don’t know just based on the patterns I know from other languages.

On the other hand, I have tried to learn three human languages in addition to English: French, Russian, and Spanish – all went terribly. And my inconsistent attempts to learn some ASL have fallen in the middle: I had some success, but never put in enough effort to learn it well.

That dichotomy makes it tempting to emphasize the differences between these two types of language. In ways they are very different. Human languages evolve naturally over time; machine languages have rigidly defined rules. Human languages try to help us express the full range of our experience; machine languages are designed to get a machine to complete tasks.

But many concepts connect the two sets. Machine languages, like spoken languages, are tools created by people to express themselves. We label their structures as grammar in both, because we stole the formalization concepts from traditional languages when we invented programming languages. They both evolve over time. At this point they both borrow concepts from one another.

Growing Language Understanding

The thing that shifted for me recently was listening to a Hidden Brain episode, Watch Your Mouth, about human languages. The first section featured Lera Boroditsky talking about the impact spoken languages have on how people think. She cites examples of cultural understandings of ideas around direction and gender. There are similar ideas about how our emotions are linked to the language we use to describe them.

That had very direct correlations to programming. Hearing those discussions catalyzed the thinking I was already doing.

How I see and solve a problem in code can vary greatly with the language and platform I’m using. A Python application running on a server can use a radically different approach from a solution built in Apex on Salesforce. JavaScript running in a browser requires different solutions and details from solving that same problem in JavaScript running in a Node environment. Java largely frees you from worrying about memory handling, while many languages in the C family force you to consider the storage implications of nearly every byte. At times, when trying to work through a problem in one framework, it can help to think of how I’d solve it in another better suited to the task.

The second half of the Watch Your Mouth episode is an interview with linguist John McWhorter about his work studying how language changes over time. He pushes back on the notion that we should all use the version of English someone tried to teach us in school:

“It’s a matter of fashion, pure and simple. People do need to be taught what the socially acceptable forms are. But what we should teach is not that the good way is logical and the way that you’re comfortable doing it is illogical. It should just be, here is the natural way, then there’s some things that you’re supposed to do in public because that’s the way it is, whether it’s fair or not.”

John McWhorter

Programming languages have fashions and patterns – and they are rarely fair. A new language like Go will become popular in part because it’s new and used by a popular company. An older language will go out of fashion like C++ or Java, because newer languages feel more graceful even if there are technical trade-offs for that grace. They evolve, more formally than English but less formally than people pretend. Just because there is a standards committee does not mean that committee is purely selecting the “best” solution.

Technical Intersections Between Languages

XKCD with an Engineer thinking he can control the flow of languages

There is also a point where all these languages intersect: Unicode. Unicode tries to give us ways to express human languages in machine form – all written language. Every developer should understand how language is stored in files because our code is stored in that format, our servers talk to each other in that format, and humans will express themselves to our applications in that format. 
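To make that concrete, here is a tiny Python sketch of the difference between text and the bytes that represent it: text in memory is a sequence of Unicode code points, and it only becomes bytes (usually UTF-8) when written to a file or sent across the network.

# Text is code points; files and network traffic are bytes.
text = "naïve café 你好"

encoded = text.encode("utf-8")      # str -> bytes for storage and transmission
decoded = encoded.decode("utf-8")   # bytes -> str when reading it back

print(len(text))     # number of code points
print(len(encoded))  # number of bytes, larger because non-ASCII characters need several
assert decoded == text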

My Theory on Programming Language Acquisition

Like plumbers, electricians, doctors, nurses, and really anyone with a career: Developers need to master the tools and techniques of their trade. Programming languages are tools, and the techniques we learn from one make us better in another. We do not have to like the language designs we encounter, but we need to understand them. 

Every developer should teach themselves a new language on a regular basis, either for work or for their own education. We should pick languages that are uncomfortable for us. Some should be new and popular, some should be old and foundational.

So if you’re a developer, go learn a new language. If you’re not a developer, go learn something different from what you use every day. It’ll force you out of your comfort zone and get you to learn more than you expect.

Languages I’ve written one or more programs in:

Just for fun, here’s the list of languages I can remember writing something substantive in:

  1. C
  2. C++
  3. C#
  4. Objective-C
  5. Python
  6. Ruby (with and without Rails)
  7. Java
  8. JavaScript
  9. PHP (versions 3-8, which bear only a passing resemblance to each other)
  10. ASP
  11. Logo
  12. BASIC (for Apple ][, and DEC Rainbow)
  13. VisualBasic
  14. VB.Net
  15. VBScript
  16. VBA
  17. HyperTalk
  18. AppleScript
  19. Apex
  20. R
  21. Scheme
  22. Haskell
  23. Perl
  24. Prolog
  25. SQL, T-SQL, SOQL, and SOSL (arguably not full languages, which is why the different flavors are listed as one)
  26. HTML/CSS (together they are mostly Turing complete, so they arguably count)
  27. XML/XSLT

What I Brought from Drupal to Salesforce

If you look over this blog, or know me, you will see that I now have reasonably significant experience as both a Salesforce and Drupal developer. The last couple of weeks I have been thinking about what from my Drupal experience supports my work as a Salesforce developer.

I think there are three parallels that are encouraged in Drupal developers that helped me learn to be a good Salesforce developer quickly.

  1. Embrace, don’t fight, Platform Constraints
  2. Extend the Platform’s Strengths
  3. Leverage Events

Embrace Salesforce Platform Constraints

Both Drupal and Salesforce run in constrained environments. Web applications, regardless of their purpose, have to protect themselves against bad actors and bad neighbors. There are execution timeouts and memory limits for both platforms. Salesforce adds a variety of additional limits and governors, but they are all logical extensions of those constraints on memory and time. Lots of other platforms allow developers to ignore resource use until they reach a crisis point. Don’t believe me? Just check the memory used by Chrome, Electron apps, or any other Chromium-based application.

Working in resource-constrained environments forces you to think through how to use the resources you do have efficiently. While these platforms aren’t like working on hardware with highly limited resources, they can still test your resourcefulness.

Drupal and Salesforce both provide ways to run large jobs across many processing contexts. New developers on both platforms often treat batch and queueable operations as a last resort, but learning to use those solutions is critical to success on most interesting projects. When you try to avoid them, you create solutions that appear to work but fail at scale.
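To illustrate the general idea outside either platform, here is a minimal Python sketch of chunked processing; the real mechanisms are Batchable and Queueable Apex on Salesforce and the Batch and Queue APIs in Drupal, where each chunk runs in its own execution context.

# Platform-agnostic sketch: break a large job into bounded chunks so no single
# execution blows past a time or memory limit.
from collections import deque

def process_in_batches(record_ids, handle_record, batch_size=200):
    queue = deque(record_ids)
    while queue:
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        for record_id in batch:
            handle_record(record_id)
        # A real platform would checkpoint here and schedule the next chunk
        # as a separate execution, rather than looping in one process.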

Coming to Salesforce from Drupal, I already knew and understood the importance of asynchronous batch processing. So when solutions needed batch processing it was second nature to learn that part of the platform.

Extend the Platform’s Strengths

For all Salesforce’s push and marketing to avoid code, Salesforce developers are often taught that once you write code you just do everything in your code. The interfaces you can use to extend the platform’s existing solutions are treated as advanced topics. But when you work with Drupal, you are encouraged from the start to create modules that build on and extend the platform’s existing strengths. Drupal developers are encouraged to leverage the features and utilities all around them.

This has always been true, but even more so after Drupal’s move to leverage Symfony plugins and services. As a developer used to extending the platform, I came to Salesforce looking for ways to extend its declarative tools.

Often Salesforce developers create powerful solutions built purely in code, triggered by record changes or simple buttons. They look past Apex Actions that extend the declarative Flow automations, platform events, and other tools that extend the system. But when you embrace a platform’s basic structures you often create more flexible solutions than you could with pure code.

The mindset of extending a platform, which I brought with me from my Drupal work, helps me create tools and solutions that are designed to adapt over time.

Leverage Events

Event-driven architectures are not new, but their popularity continues to grow. Where platforms used to follow informal patterns that equated to event systems, now we see formal event structures being built to replace old habits.

Drupal and Salesforce both have had event frameworks for a long time: Drupal had hooks (events by naming convention), Salesforce had triggers.

Both have seen major upgrades to their event patterns in recent versions. Salesforce added platform events, which are still making their full power clear to a lot of developers. Symfony brought proper events to Drupal in version 8 and continues to help push the platform forward.

I have learned to leverage the event systems on both platforms. Understanding them as tightly constrained state machines, and learning to push them to their limits, helps me get the most from both platforms.

My experience with Drupal hooks and events has made it obvious to me when to leverage Salesforce’s Platform events. As Salesforce increases the number of places you can use them in their declarative tools, it increases the value of this approach.

So What?

As a developer, what you learn in one part of your career can make you stronger in the next. As a field we’re not actually that creative. Even if the details are different, the concepts will often carry forward because they are built on the same fundamentals. Whatever platform you are using today, learn how to make it sing – it’ll help you learn the next faster and better.

Getting Started with Salesforce2Sql

Salesforce2Sql is a tool dedicated to doing one thing very well: mirroring Salesforce schema.

A little over a year ago I started work on Salesforce2Sql to help support people doing data work with Salesforce. Salesforce2Sql is a simple Electron app that allows you to clone the schema of a Salesforce org into an SQL database (currently MySQL/MariaDB and Postgres are supported). These mirrors are useful when you want to stage data during migrations into or out of Salesforce.

When is Salesforce2Sql useful

Salesforce2Sql is a tool dedicated to doing one thing very well: mirroring Salesforce schema. It does not attempt to extract data or convert data in any way.

Salesforce provides excellent APIs for data import and export, but they work best with some prep work first. Salesforce2Sql gives you a staging database that mimics your Salesforce schema. In these schema mirrors you can prepare all your data for high speed processing off-platform. Anyone who is looking to move data into or out of Salesforce at high speeds and large volumes benefits from this setup.

I have seen people write large complex ETL jobs meant to go right from one source system into Salesforce. These jobs can be hard to test and they are slow to run. They generally don’t give you an easy way to review the data before you push to Salesforce. By landing the data in a database clone you can break the jobs into stages, review transformations before they are loaded, and run thousands of iterations for testing instead of dozens.

Setup

Salesforce2Sql is built and released for MacOS, Windows, and Linux (most testing is on Mac and Windows). You can download the installers from the current release and follow the standard patterns for your OS. From there start the application to get to work.

You will also need API access to a Salesforce org and a database to create your schema within.

Basic Salesforce2Sql Use

As of this writing Salesforce2Sql uses the old security token connection method. I would like to add OAuth2 support as well, but haven’t gotten that done; contributions are welcome. So with the application running, and your security token in hand, click the big “Create New Connection” button on the left side of the main interface.

Salesforce2Sql Main Screen

Step 1: Fetch all objects

Once connected, click “Fetch Objects” and the tool will download a list of every object in your org. There will be several hundred, so the next step is to select which ones you want to mirror. You will notice Salesforce2Sql selected defaults for you (well, I did). It will select all custom objects and, based on your org’s structure, guess which standard objects to select. There is a search box at the top right to help you find any others you’d like to add by simply checking the box.

Step 2: Fetch all fields

Click the next button to move to the Proposed Schema tab, and then the “Fetch Details” button. Now Salesforce2Sql will query every field on every object you just selected (this may take a moment, but honestly I find it much faster than I expected when I first started this project). Once that is complete you can either save the schema to JSON for later re-use, or click Next to move to the “Generated Database” tab.

Step 3: Generate tables

This is the last step. Click “Create Tables” and Salesforce2Sql will ask you for your database credentials. Once you click okay on this final screen the tool will attempt to create all those tables for you. Again, this may take a couple of minutes if you have a large schema. Once the process is complete you can also save the SQL statements for editing and/or later re-use.

That’s it, you now have a database with a schema that matches your org’s structure.

Preferences

There are a few preferences you might want to experiment with (although I tried to pick smart defaults) when building mirrors. For me the right choices depend on my use case.

Preference screen from the application with sections for picklist settings, index settings, other defaults, and theme.

Picklists

Salesforce picklists are a bit of a special beast. The obvious choice is to make a picklist into a SQL enum to support validation of data. But not all picklists in Salesforce are restricted, and they aren’t always required. By default the tool will use enum for restricted picklists, varchar for unrestricted picklists, and add a blank value to all picklists (since it can’t easily determine whether a given picklist is required across all page layouts).
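To make that default rule concrete, here is a rough sketch in Python (illustrative only; Salesforce2Sql itself is JavaScript, and the field keys here follow Salesforce’s describe results):

# Illustrative mapping from a picklist field's describe data to a column type.
def picklist_column_type(field):
    values = [entry["value"] for entry in field["picklistValues"]]
    values.append("")  # blank allowed, since the field may not be required
    if field.get("restrictedPicklist"):
        quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
        return f"ENUM({quoted})"
    return "VARCHAR(255)"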

If the picklist values in your org are pretty much set, the default settings make a lot of sense. If the picklist values in your org are likely to change you might want to make them all into regular varchar columns.

Auto Indexing

The next important section contains the index settings. While it is possible to over-index a database, I think the three sets of default indexes are pretty good guesses: Id columns, external Ids, and picklists. The ones you are most likely not to care about are the enums, and therefore the ones you might consider disabling if you aren’t processing on those fields at all. As of version 0.7.0, Id columns are case-sensitive by default even in MySQL, since Salesforce Ids are case sensitive.

Additional Settings

The other defaults section lets you pick a few other system behaviors. The most common to fuss with are the first and last in the box.

By default it expects lookup fields to be 18-character Salesforce Ids – because that’s what they will be in Salesforce. But during a data migration some people like to put legacy Ids into these fields (I recommend a proper legacy Id field marked as an external Id), so I give the option to use 255 characters instead of 18.

Salesforce also has two categories of fields that are commonly ignored in a migration and that you may wish to keep out of your database just for ease of use: the audit fields (createdBy and the like) and read-only fields (like formulas). The final two checkboxes in that other defaults section let you keep those fields out of your clone schema.

The middle two fields control the behavior of field defaults – generally I like these two settings as they are in just about every use case, but you may feel differently.

Salesforce2Sql uses Bootswatch themes for design elements. The preference pane also lets you pick a different look-and-feel from their theme list.

Final Notes

There are a couple other details worth knowing.

First, if the process runs into a problem where the SQL engine complains about row size limits, Salesforce2Sql will automatically switch all varchar fields to TEXT fields in an attempt to reduce the row size. That will override all preference settings for those fields.

Second, Knex.js – which Salesforce2Sql uses to handle the actual SQL writing – adds indexes as a table alter even when they could be part of the create statement. This makes the process a bit slow, and it means that if there are errors during the creation of indexes you may see some errors in the interface but still be left with a pretty-good schema clone.

Finally, yes there is lots of room for improvement. I work on this project when I can, or when I need a bug fixed for my own work. I am excited to get suggestions, ideas, feedback, documentation edits, and code submissions.

Project Estimates Tool 2.0

A few years ago I wrote a piece about project time estimation and created an estimating tool. My goal was to get project managers to listen to the fact that estimates were inherently a guess, not a promise. The tool I created took a series of project tasks, the estimated time range for each, and a level of estimator confidence. It then ran a Monte Carlo simulation with those tasks and generated a histogram of possible outcomes.

Five years later I still use it for project estimates. But I have grown tired of its interface weaknesses and needed to add cost estimation to keep it useful. So I recently heavily revised the tool and posted an updated version (the old version is still available here).

The interface is still very utilitarian (pull requests welcome), but this version makes it much easier to adjust the tasks. Much more importantly, it now also estimates costs, not just time.

Histogram of time estimations
XY Scatter plot of cost projections, in a nice bell curve shape

The new version is faster than the previous one, and it adjusts the graph type based on the range of possible outcomes.

Each task now includes inputs for min and max time, confidence, and hourly cost of that task. So if different people at different bill rates are part of the project it can still give you useful numbers.

The histograms broke down when faced with too many bars. So I settled on an XY scatter approach to help visualize the broader range that the cost estimator made normal.

Please give it a try, I’m always open to feedback, suggestions, and pull requests.

Costs Estimates

For this version I added the ability to include a unit cost for each task. The first tool worked just fine when you were estimating the tasks for one person who had one billing rate (or where hourly costs aren’t important). In practice, teams need to be able to do an estimate across all work streams, and different roles will have different billing rates.

This version includes a rate for each task and a graph of projected project costs.

Why I Created A Project Estimator

I wrote the original when I was struggling with project managers who would take any estimate you gave them as a range, pick a number, and promise the client (and themselves) we would hit it. To them an estimate was a promise – one that had to be kept. That led me to badly overestimate projects so that the lowest end of my range would be a safe number – but that’s just a different form of bad estimation.

Histogram of time estimates.
This graph from the original version helped convince PMs that estimates weren’t promises.

I had a good amount of experience providing estimates, and had read a lot on the topic. I knew there were teams that did better and I wanted to help our team improve.

The original tool was loosely inspired by one Joel Spolsky described ten years earlier. He has several important ideas in his process regardless of your project methodology. But his idea of using Monte Carlo simulations had stuck with me since that article was new. After failing to find a tool that included it, I wrote my own.

Are the Project Estimates Any Good?

Fundamentally the simulations are only as good as the estimates provided. For every project where I have been able to compare my simulated estimates to final hours, my work fell within one standard deviation of the median.

The confidence measure helps more than I expected. Originally, I added the measure of confidence because I needed something to determine how often the simulator should assume people are just plain wrong – and by how much. While I could have hard-coded a solution, I did not know how to pick good values. I knew that my confidence varies by task. I also knew the less confident I am, the more likely I am to be wildly off. So I decided to make confidence an estimator-provided variable, and use that to pick the size of overruns.

For every 10% you reduce the confidence, the simulation will allow the upper bound of the estimate to increase by the size of the entered upper bound. On a task you estimate at 7-10 hours, a 90% confident estimate will allow overruns up to 20 hours (just in the 10% of runs that aren’t in the 7-10 range), and 30 hours for an 80% estimate.
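If it helps to see that rule in code, here is a minimal Python sketch of the sampling logic (the actual tool is a JavaScript app running in the browser, so this is purely an illustration of the idea):

import random

def sample_task_hours(min_h, max_h, confidence):
    """One Monte Carlo draw for a single task.

    With probability `confidence` the result lands inside the estimated range;
    otherwise it overruns, with the cap growing by one extra upper bound for
    every 10% of missing confidence (90% -> up to 2x max, 80% -> up to 3x max).
    """
    if random.random() < confidence:
        return random.uniform(min_h, max_h)
    overrun_cap = max_h * (1 + (1 - confidence) / 0.1)
    return random.uniform(max_h, overrun_cap)

def simulate_project(tasks, runs=10_000):
    """Total hours and cost for each simulated run across all tasks."""
    results = []
    for _ in range(runs):
        hours = cost = 0.0
        for task in tasks:
            h = sample_task_hours(task["min"], task["max"], task["confidence"])
            hours += h
            cost += h * task["rate"]
        results.append((hours, cost))
    return results

# Example: a single 7-10 hour task at 90% confidence and $150/hour.
outcomes = simulate_project([{"min": 7, "max": 10, "confidence": 0.9, "rate": 150}])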

That extra box also immediately helped me feel comfortable with my estimates. Knowing that the simulator would offset optimism bias for me, I could stop trying to do that myself. My estimates can use tighter ranges, trusting the software to offset expected bias.

A Value of the Graphs in Project Estimates

The graph has turned out to be the most important feature. Initially I included it because I wanted to play with D3 and have something more impressive than numbers to show. What I discovered was a reminder of the importance of data visualizations – even simple ones.

As I said before I created this tool when working with project managers who simplified all estimate ranges to a single number and held everyone to that number. The first time I presented numbers from the simulator those project managers picked the median and complained I made it too hard. The median was better than what we had before, but not enough to treat as a promise.

When I started presenting the graph those same people immediately started to change how they talked about the project. By visualizing the impact of uncertainty over several tasks they could see that the project might run far over my estimate – or far under. The more uncertainty, the longer the tail on the graph.

Suddenly they were comfortable talking about risks from overruns, finding ways to help clients understand the possible risks, and being understanding when a task proved harder than expected.

The graphs tell the story, and empower the team to have an honest and productive conversation about project estimates.

Estimate your own project timeline.

Announcing Two New Snowfakery Faker Providers

For the last two years I’ve been fortunate to serve as a leader of the Salesforce Open Source Commons Data Generation Toolkit project. That project has produced and inspired a variety of efforts, including Snowfakery and a collection of starter recipes.

This week, along with my colleague Allison Letts, Salesforce’s Paul Prescod (the creator of Snowfakery), and our fellow project contributor Jung Mun, I helped create two new Faker providers for Snowfakery: faker-nonprofit and faker-edu.

What These Faker Providers do

The use of Snowfakery is growing. The more we use it, the more we want the data tailored to specific projects, and the more we find places where the Faker project’s providers do not have quite what we want. In particular, the project does not (well, did not) have providers for nonprofit- and education-specific data.

The Nonprofit provider currently just provides organization names:

$ snowfakery snowfakery_nonprofit_example.recipe.yml --target-count 10 nonprofit
nonprofit(id=1, nonprofit_name=Eastern Animal Asscociation)
nonprofit(id=2, nonprofit_name=1st Animal Foundation)
nonprofit(id=3, nonprofit_name=Upper Peace Alliance)
nonprofit(id=4, nonprofit_name=Southern Peace Home)
nonprofit(id=5, nonprofit_name=Unity Home)
nonprofit(id=6, nonprofit_name=Western Peace Home)
nonprofit(id=7, nonprofit_name=Upper History Foundation)
nonprofit(id=8, nonprofit_name=Upper Friends Committee)
nonprofit(id=9, nonprofit_name=Eastern Pets Center)
nonprofit(id=10, nonprofit_name=Northern Animal Foundation)

For the Education provider we have a bit more. You can generate college names, departments, and faculty titles.

$ snowfakery snowfakery_edu_example.recipe.yml
Account(id=1, Name=South Carolina University)
Contact(id=1, FirstName=Roberto, LastName=Stanton, Title=Associate Professor of Microbiology & Immunology)
Account(id=2, Name=French)

The providers can, of course, run as standard Faker community providers. Once you are set up with Faker, just add the new providers with pip:

pip install faker-edu faker-nonprofit

Then you can use the libraries in your code:

from faker import Faker
import faker_edu

fake = Faker()
fake.add_provider(faker_edu.Provider)

for _ in range(10):
    print(fake.institution_name())

from faker import Faker
import faker_nonprofit

fake = Faker()
fake.add_provider(faker_nonprofit.Provider)

for _ in range(10):
    print(fake.nonprofit_name())

How we got here

A few months ago I posted on how to extend Faker to create nonprofit organization names. Allison took that as a starting point to create a similar project to generate names of colleges, departments, and academic titles (and a greatly improved phone number generator, but that’s not included since it’s more general). Both of these projects were good proofs of concept, but they were rough around the edges. So this week, with Paul’s guidance and Jung’s input, we contributed more polished versions to the community.

During a virtual working session on Wednesday we restructured the projects, cleaned up code, added sample recipes for Snowfakery, and published them to PyPI. By publishing these as Faker providers on PyPI, and not as Snowfakery plugins, they are available to a wider audience. And by having them owned by a larger open source community, we expect them to enjoy long-term support.

Both are still just getting started. For example, the nonprofit provider still just generates organization names, but it would benefit from job titles, program names, and more. The EDU provider right now just handles colleges, but it is expected to generate other education-related data in the future. We also have a plan for a third provider to help improve the diversity of the names generated by Faker, to make our fake data more representative of real communities.

The Data Generation Toolkit project has been a great example of what happens when you bring people from a wide variety of backgrounds together to solve technical problems. Like the larger Data Generation Toolkit team Paul, Allison, Jung, and I all have different backgrounds, skills, and experiences. By coming together we are able to help each other find better solutions than any one of us would have found on our own.

It’s been an exciting week, and I’m looking forward to more to come.

Snowfakery Custom Plugins Part 2

Last week I posted part 1 of this series on creating plugins for Snowfakery. This second installment covers creating custom Faker Providers to use as plugins for Snowfakery.

Why create a Faker Provider instead of a Snowfakery Plugin

Python’s Faker library provides many useful things, but not everything you might want. Fortunately, it is designed to be extended, and Snowfakery is designed to make it possible for those extensions to be project specific when you need them to be (or when it makes it easier for your team to share recipes).

Sometimes you need all the features of a Snowfakery plugin, like maintaining state. But often you just need to fill a gap left by the Faker community offerings. In that case, a Faker Provider may offer longer-term benefits, like re-usability in other projects and useful helper functions, that make it the stronger choice.

Things you’ll need

You do not need to have reviewed everything in Part 1, but you will want to look at the project structure section and to make sure you have the following tools:

  1. Python 3 (any recent-ish version should be fine), and the experience to read and write simple Python scripts (they aren’t any harder to follow than YML just different).
  2. Snowfakery 1.12 or later. Note: if you have CCI installed it contains Snowfakery but on Windows you may need extra setup.
  3. The Faker module for Python (probably via: pip3 install Faker). Not strictly required but can help you with testing.
  4. A code editor you like that supports both Python and YML (which is pretty much anything good).
  5. You probably want experience working on at least one or two Snowfakery recipes.

Reminder about project structures

In part 1 we built a project around the built-in search patterns of Snowfakery.

Snowfakery looks for plugins in a select number of places:

  • The Python path.
  • In a plugins directory in the same directory as the recipe.
  • In a plugins directory below the current working directory.
  • A sub-directory of the user’s home directory called .snowfakery/plugins.
Recipe directory layout

For this example the plugins directory will live within the folder with our recipe (called recipes), but you can move the plugins directory up a level and it will work just as well.

To get started, in your project create a recipes directory. Next create a plugins directory within recipes. Then create a faker_nonprofit directory within plugins. You can ignore the snowHelper and snow_product.yml examples from part 1 in the screenshot on the right, but that’s generally what we’re doing.

Create your first Snowfakery Faker Provider

Faker itself does not have a community-supported generator of nonprofit organization names, and when you work with nonprofits a lot, sometimes you need those. That is exactly what we’ll create in just a minute.

In the plugins directory you created in part 1, create a directory for your new provider with the pattern faker_[my_service_name], in my case faker_nonprofit. Then in that new directory create a file named __init__.py; this defines the Faker provider Python module.

You can see the full working example here but I’ll walk through the outline.

The file opens by providing a doc string and importing the faker.providers module.

"""Provider for Faker which adds fake nonprofit names, and program names."""
import faker.providers

The next several lines of the faker_nonprofit module provide arrays of words to use in the generation of fake names. Depending on what your provider does this may or may not be useful to you (but it’s very common).
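For illustration, those lists look something like this (placeholder values consistent with the sample output later in this post, not the project’s actual arrays):

# Hypothetical word lists; the real module uses longer, curated arrays.
PREFIXES = ["Southern", "Northern", "Eastern", "Western", "Upper", "1st", ""]
TOPICS = ["Unity", "Friends", "Peace", "Animal", "History", "Pets"]
SUFFIXES = ["Community", "Committee", "Alliance", "Foundation", "Center", "Home"]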

Then we define the Provider itself as a class that extends the BaseProvider class, with whatever methods you want to call:

class Provider(faker.providers.BaseProvider):
    """Provider for Faker which adds fake nonprofit information."""

    def nonprofit_name(self):
        """Fake nonprofit names."""
        prefix = self.random_element(PREFIXES)
        suffix = self.random_element(SUFFIXES)
        topic = self.random_element(TOPICS)
        return " ".join([prefix, topic, suffix]).strip()

Those random element selections are from the arrays of words I mentioned a minute ago. Basically we’re just building a name from some selected words.

A note about tests

Unlike Snowfakery plugins, Faker projects have existing patterns for how to set up tests (they are not totally consistent, but there are patterns out there). So the complete code on GitHub includes a testing module as well, and you should consider something similar for yours (particularly if you are considering making it available to the wider community). It will allow you to test your provider generically, not just when it’s running through Snowfakery.
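A generic test can be as small as the following sketch (assuming pytest and that the faker_nonprofit package is importable from your test environment; the real test module in the repo is more thorough):

# test_faker_nonprofit.py - minimal generic test of the provider.
from faker import Faker

import faker_nonprofit

def test_nonprofit_name_returns_nonempty_string():
    fake = Faker()
    fake.add_provider(faker_nonprofit.Provider)
    name = fake.nonprofit_name()
    assert isinstance(name, str)
    assert name.strip()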

Linking your Provider to a Snowfakery Recipe

Now that we have a local Faker Provider, we need to connect it to our recipe.

A very simple recipe (in our recipe directory) should demonstrate the output quite nicely:

- plugin: faker_nonprofit.Provider

- object: Account
  fields:
    Name:
      fake: nonprofit_name

Run the recipe through Snowfakery and you can see the name appears believable:

$ snowfakery recipes/sample_recipe.yml
Account(id=1, Name=Southern Unity Community)

The project uses a slightly larger recipe here that uses both plugins and generates more than one object. The full recipe in the repo will create one Salesforce Account object that has a Fake Nonprofit name, a custom field for a Main Service that will be one of our snowy puns, an address, and two related contacts with all their basic information included.

$ snowfakery recipes/sample_recipe.yml 
Account(id=1, Name=Upper Friends Committee, Main_Service__c=Snowmanage, BillingStreet=4773 Giles Plains Suite 878, BillingCity=South Daniel, BillingState=Virginia, BillingPostalCode=34516, BillingCountry=United States, ShippingStreet=7844 Hester Shore Apt. 299, ShippingCity=Maynardview, ShippingState=Indiana, ShippingPostalCode=86323, ShippingCountry=United States, Phone=956.673.3002x471, Fax=+1-786-744-2112x36239, RecordType=Organization)
Contact(id=1, AccountId=Account(1), Salutation=Misc., FirstName=Isaac, LastName=Barr, Email=joypeters@example.com, Phone=+1-808-508-0989x418, MobilePhone=(987)475-7200x8072, Title=Tour manager, Birthdate=1982-03-27)
Contact(id=2, AccountId=Account(1), Salutation=Mx., FirstName=Erica, LastName=Lopez, Email=angel01@example.org, Phone=(011)243-1677x868, MobilePhone=(079)466-5474x52399, Title=Research officer, political party, Birthdate=2000-07-07)

If you have interest in seeing the Nonprofit Provider made into a more complete tool and released as its own project please let me know.

Snowfakery Custom Plugins Part 1

Last November I wrote a bit about creating Salesforce data with Snowfakery.  I’ve continued to use the tool for work, provide feedback to the project maintainer, and help the Salesforce Open Source Commons Data Generation Toolkit Project as we started to build a library of sample recipes. Hopefully I will have more to say on that after the next Community Sprint.

Snowfakery not only gives you a way to create carefully shaped relational data sets of nearly any size, it also allows you to create plugins to extend its abilities. Those plugins come in two flavors: Snowfakery Plugins and Faker Providers.

For more technical details you may want to read the project’s documentation. My intention here is to provide an end-to-end example of how to make them work.

This article started out as one long piece but to keep it focused I’ve decided to break it into two parts:

  • Part 1 covers Snowfakery Plugins.
  • Part 2 covers creating custom Faker Providers for Snowfakery projects.

The code for both parts is on GitHub if you want to see the project as a whole.

Things you’ll need

  1. Python 3 (any recent-ish version should be fine), and the experience to read and write simple Python scripts (they aren’t any harder to follow than YML just different).
  2. Snowfakery 1.12 or later. Note: if you have CCI installed it contains Snowfakery but on Windows you may need extra setup.
  3. A code editor you like that supports both Python and YML (which is pretty much anything good).
  4. You probably want experience working on at least one or two Snowfakery recipes.

Snowfakery Plugin Project Structure

This setup just talks about Snowfakery recipes on their own, not within a larger project, but the concepts are the same even if the details are different.

Snowfakery looks for plugins in a select number of places:

  • The Python path.
  • In a plugins directory in the same directory as the recipe.
  • In a plugins directory below the current working directory.
  • A sub-directory of the user’s home directory called .snowfakery/plugins.
Recipe directory layout

For this example the plugins directory will live within the folder with our recipe (called recipes), but you can move the plugins directory up a level and it will work just as well.

To get started, in your project create a recipes directory. Next create a plugins directory within recipes. Until part 2 you can ignore the faker_nonprofit directory in the screenshot on the right, but that’s generally what we’re doing.

Create Your First Snowfakery Plugin

In the Snowfakery community we use a lot of snow-based puns to name things. So to help create fake sounding products for our projects we might need a simple plugin to generate us new words that match our general naming convention.

In the plugins directory create a new file called snowHelper.py, and copy the following Python code into your editor:

from snowfakery import SnowfakeryPlugin

class SnowPunnary(SnowfakeryPlugin):
    class Functions:
        def snowpunner(self, word):
            return 'Snow' + word

The code here is pretty straightforward, if a little nested. We are loading the SnowfakeryPlugin class from Snowfakery itself, and then extending that class to create our plugin. Snowfakery assumes that the plugin has a subclass to hold your plugin functions (called Functions) and that you add your functions to that subclass. Your functions can have parameters (here, the word being punned on) to accept inputs from other parts of the recipe.

Our SnowProduct Recipe

Now we need a recipe that will actually use our plugin. In the recipes directory create a file called snow_product.yml and copy in the following YAML code:

- plugin: snowHelper.SnowPunnary
 
- object: Product
  count: 10
  fields:
    Name:
      SnowPunnary.snowpunner: ${{fake.word}}

In the first line we load our plugin using Python’s module naming convention – because it is getting loaded as a Python module. The pattern here is the file name (without file extension) then the class name. In the last line we then call the plugin’s function by referencing the class name and the function name.

You can run the file directly in Snowfakery and see the outputs:

$ snowfakery snow_product.yml
Product(id=1, Name=Snowplantary)
Product(id=2, Name=Snowbadary)
Product(id=3, Name=Snowmillionary)
Product(id=4, Name=Snoweffortary)
Product(id=5, Name=Snowgreenary)
Product(id=6, Name=Snowbehaviorary)
Product(id=7, Name=Snowcouldary)
Product(id=8, Name=Snowforceary)
Product(id=9, Name=Snowyesary)
Product(id=10, Name=Snowcompanyary)

I’m going to show, but not go into great depth on, one more detail: plugins can save state. Snowfakery provides a mechanism for tracking context variables between calls, which allows you to track current state. So we can have ours count the number of times the snowpunner function has been called and return that count from another function:

from snowfakery import SnowfakeryPlugin

class SnowPunnary(SnowfakeryPlugin):
    class Functions:
        def snowpunner(self, word):
            context_vars = self.context.context_vars()
            context_vars.setdefault("count", 0)
            context_vars["count"] += 1
            return 'Snow' + word

        def currentCounter(self):
            context_vars = self.context.context_vars()
            return context_vars["count"]

Then update the recipe like this:

- object: Product
  count: 10
  fields:
    Name:
      SnowPunnary.snowpunner: ${{fake.word}}
    Index:
      ${{SnowPunnary.currentCounter()}}

Notice that to call the function without a parameter we use the formula syntax. Run it again and we see the new index that shows the count:

$ snowfakery snow_product.yml
Product(id=1, Name=1: Snowimpact, Index=1)
Product(id=2, Name=2: Snowfund, Index=2)
Product(id=3, Name=3: Snowdark, Index=3)
Product(id=4, Name=4: Snowteach, Index=4)
Product(id=5, Name=5: Snowteam, Index=5)
Product(id=6, Name=6: Snowsummer, Index=6)
Product(id=7, Name=7: Snownew, Index=7)
Product(id=8, Name=8: Snowperform, Index=8)
Product(id=9, Name=9: Snowonto, Index=9)
Product(id=10, Name=10: Snowmodel, Index=10)

It is important to remember that while it would be possible to add a value to a context variable on each iteration, that would cause Snowfakery to consume more memory with each iteration. Snowfakery is designed to generate records by the hundreds of millions if asked, and it does so while consuming very little extra memory – you can do things in context variables that would break down on larger runs.

In part 2, I talk about creating a custom Faker Provider and loading it into a Snowfakery recipe.

SC DUG April 2021 – Getting Started with Electron

This month I gave a talk at the South Carolina Drupal User Group on Getting Started with Electron. Electron allows you to use your web developer skills to create desktop applications. I based this talk on some of my recent side projects and the Electron Project Starter I posted at the end of last year.

If you would like to join us, please check out our upcoming events on MeetUp for meeting times, locations, and remote connection information.

We frequently use these presentations to practice new talks, try out heavily revised versions, and test new ideas with a friendly audience. So if some of the content of these videos seems a bit rough, please understand we are all learning all the time and we are open to constructive feedback. If you want to see a polished version, check out our group members’ talks at camps and cons.

If you are interested in giving a practice talk, leave me a comment here, contact me through Drupal.org, or find me on Drupal Slack. We’re excited to hear new voices and ideas. We want to support the community, and that means you.

Salesforce Electron Starter

Back in August I created an Electron project starter that provides a template for Electron projects, with the goal of outlining how to follow the current best practices for writing secure Electron apps. I had extracted it from a couple of personal projects I work on from time to time; one of those is ElectronForce – a tool to explore Salesforce orgs. Because I get ideas for things I want to try against the Salesforce APIs from time to time, I have now created a derivative project that is set up to create apps that leverage JSForce to interact with Salesforce orgs.

Thus I would like to introduce Electron Salesforce Base.

Like my generic project starter, it is intended to be a jumping-off point that handles some of the basic elements of a project. It’s a bit more opinionated because it comes with a little more plumbing in place – there is a simple interface and it will actually log into orgs (assuming you have your security token).

The interface is built using a Bootstrap dark theme from Bootswatch, and is set up to follow the Airbnb ESLint standards (with a couple small tweaks). The interface generates two windows, one that is meant to be the main interface and includes the controls to log into your org, and a second that is meant to keep a running log of events.

The main thread is fully isolated from the render threads, with all requests and data being passed back and forth using Electron’s current inter-process communication methodology: the ipcMain object in the main thread and the contextBridge in the render thread. There is no access to remote in the render thread (actually, remote is fully disabled, as it should be), and the preload.js file largely serves to filter IPC requests to maintain thread isolation. Currently the main thread of this project isn’t what I would call graceful, and I’m actually working on a refactor for ElectronForce to improve the IPC listener definitions (readers with examples of good design patterns are more than welcome to offer suggestions; whatever I settle on will likely get folded back into this project eventually).

To help understand the general pattern that’s emerging as people get better at sandboxing in Electron (and Electron gets better at demanding it), I find it helpful to think of Electron apps in a client-server model: two largely separate applications with a well-defined API for communication between them. You’ll see that reflected here, and in my other recent Electron apps. You can implement whatever you’d like in each layer and just pass messages back and forth. This also means you can totally refactor one part of your application without worrying about the other. So if you hate my proto-interface, dump it and build something better.

If you look at the code for this base project, then look at ElectronForce, you will see the render thread provides all the details of the interface – including the use of render-friendly libraries like jQuery and a collection of helper functions to make life a little easier – makes API calls (with a filter list provided in preload.js), and then handles responses from the main thread. In main.js you see all the IPC listeners defined, which pass the needed data to JSForce to make the API calls before handing back structured data for the interface to render.

As you dig through the code you may notice various @TODO statements that are notes to myself about places with obvious room for improvement. I’m always happy to get suggestions, as comments here, issues there, or as pull requests to help resolve those notes with better solutions.

Generate Sample Data for Salesforce NPSP

Snowfakery is a fairly new tool from the team at Salesforce.org. It is designed to generate any amount of arbitrary relational data you can cook up. The Salesforce Open Source Commons Data Generation Toolkit Project has been starting to work on including it in our work (you can read more about my relationship to these projects on the Attain blog) and as part of that, and as part of my actual day job, I’ve been playing with its recipe language, reviewing the documentation, and generally trying to make sure I can use it well.

Snowfakery itself is excellent, but since it’s new there’s a lot of work left to be done on the documentation. The existing docs focus on the generic ways you could use Snowfakery and outline all the features, but I often need it to create very specific things: data for Salesforce orgs with NPSP installed for our nonprofit clients. The docs aren’t currently great at getting you started with that, so this post is intended to partially fill that gap.

The recipe provided here is based on what I’ve learned from my first real successes getting Snowfakery to work on a real-world project.

To give credit where due: my first real breakthrough was when Snowfakery’s creator, Paul Prescod, pointed me to a branch in the repo that had some starter NPSP examples. Paul’s samples are helpful, but they are incomplete and not super well commented. While my solution leverages his work, it simplifies things and takes some shortcuts. My goal is functional and understandable, not perfect.

This recipe creates 30 gifts: 20% from companies, 80% from households. They are spread randomly over a set of preexisting Campaigns, and the recipe creates the needed Accounts, Contacts, Opportunities, and Payments. It also assumes you are generally comfortable with the major NPSP objects.

This solution is assembled from three files: the main recipe file, snowfakery_npsp_basic_recipe.yml, for the objects you want to create, and two layered macro files used to generate often-repeated objects, npsp_macros.yml and sf_standard_macros.yml. Both are modified from the versions in the example branch of the repo. The standard Salesforce objects macro file is derived from this one, and the NPSP objects macro file is derived from this one.

The code for all of them is in a gist, and embedded below (they are numbered in the gist to control display order; remove the numbers before actually using them). There are inline comments to explain the details. Once you have local copies of all three files (and have Snowfakery installed) you can generate a JSON file that matches the data described:
$ snowfakery --output-format=JSON --output-file gifts.json snowfakery_npsp_basic_recipe.yml

If you are using CumulusCI for your project, you can have CCI load the data directly into your org:

cci task run generate_and_load_from_yaml -o generator_yaml ./datasets/snowfakery_npsp_basic_recipe.yml -o num_records 30 -o num_records_tablename Account --org my_project_sandbox