For the last two years I’ve been fortunate to serve as a leader of the Salesforce Open Source Commons Data Generation Toolkit project. That project has produced and inspired a variety of efforts, including Snowfakery and a collection of starter recipes.
This week, along with my colleague Allison Letts, Salesforce’s Paul Prescod (the creator of Snowfakery), and our fellow project contributor Jung Mun, I helped create two new faker providers for Snowfakery:
What These Faker Providers do
The use of Snowfakery is growing. The more we use it, the more we want the data tailored to specific projects. And the more we find places where the Faker project’s providers do not have quite what we want. In particular the project does not (well did not) have providers for nonprofits and education specific data.
The Nonprofit provider currently just provides organization names:
$ snowfakery snowfakery_nonprofit_example.recipe.yml --target-count 10 nonprofit
nonprofit(id=1, nonprofit_name=Eastern Animal Asscociation)
nonprofit(id=2, nonprofit_name=1st Animal Foundation)
nonprofit(id=3, nonprofit_name=Upper Peace Alliance)
nonprofit(id=4, nonprofit_name=Southern Peace Home)
nonprofit(id=5, nonprofit_name=Unity Home)
nonprofit(id=6, nonprofit_name=Western Peace Home)
nonprofit(id=7, nonprofit_name=Upper History Foundation)
nonprofit(id=8, nonprofit_name=Upper Friends Committee)
nonprofit(id=9, nonprofit_name=Eastern Pets Center)
nonprofit(id=10, nonprofit_name=Northern Animal Foundation)
For the Education provider we have a bit more. You can generate college names, departments, and faculty titles.
$ snowfakery snowfakery_edu_example.recipe.yml
Account(id=1, Name=South Carolina University)
Contact(id=1, FirstName=Roberto, LastName=Stanton, Title=Associate Professor of Microbiology & Immunology)
The providers can of course run as a standard faker community provider. Once you are setup with Faker just add the new providers with pip:
pip install faker-edu faker-nonprofit
Then you can use the libraries in your code:
from faker import Faker import faker_edu fake = Faker() fake.add_provider(faker_edu.Provider) for _ in range(10): print(fake.institution_name())
from faker import Faker import faker_nonprofit fake = Faker() fake.add_provider(faker_nonprofit.Provider) for _ in range(10): print(fake.nonprofit_name())
How we got here
A few months ago I posted on how to extend Faker to create nonprofit organization names. Allison took that as a starting point to create a similar project to generate names of colleges, departments, and academic titles (and a greatly improved phone number generator but that’s not included since it’s more general). Both of these projects were good proof-of-concept but were rough around the edges. So this week, with Paul’s guidance and Jung’s input, we contributed the more polished versions to the community.
During a virtual working session on Wednesday we restructured the projects, cleaned up code, added sample recipes for Snowfakery, and published them to PyPi. By publishing these as Faker providers on PyPi, and not as Snowfakery plugins, they are available to a wider audience. By having them owned by a larger open source community we are expecting them to enjoy long-term support.
Both are still just getting started. For example the nonprofit one still just generates organization names, but would benefit from job titles, program names, and more. The EDU provider right now just handles colleges, but is expected to generate other education related data in the future. We also have a plan for a third provider to help improve the diversity of the names generated by Faker to make our fake data more representative of real communities.
The Data Generation Toolkit project has been a great example of what happens when you bring people from a wide variety of backgrounds together to solve technical problems. Like the larger Data Generation Toolkit team Paul, Allison, Jung, and I all have different backgrounds, skills, and experiences. By coming together we are able to help each other find better solutions than any one of us would have found on our own.
It’s been an exciting week, and I’m looking forward to more to come.