How and why to use CumulusCI to define a standard retrieval process to get changes from an org into source control.
The main focuses of CumulusCI are automating the creation and configuration of orgs for different stages of the product lifecycle and automating build and release processes. One area I always wanted to explore more was having CumulusCI be more opinionated about how you get changes from an org into version control.
The current agnostic approach to retrieving metadata was intentional. We supported developers who were very opinionated about their daily development tooling and IDE. We built CumulusCI to let them create fully configured scratch orgs, then use whatever tooling they wanted (VS Code, Salesforce CLI, IlluminatedCloud, etc.) to retrieve changes and commit them to version control.
But maybe that was the wrong approach? Maybe projects should be more prescriptive about how things are retrieved from a development environment. In this post, I’ll show how to use the built-in functionality in CumulusCI to define custom retrieve tasks and flows that retrieve metadata, modify it, and place it in the right location in the repository. Since it’s built using CumulusCI tasks and flows, this custom retrieve process is portable and repeatable for anyone working on the product.
There are three main categories of needs that make a compelling case for a project to define a consistent process for retrieving changes from an org into source control: flexibility, developer productivity, and bringing all product stakeholders into the development process.
There are a number of reasons it makes sense to be able to split your files into different directories to handle different use cases. Even if you don’t need that complexity today, it’s quite likely you’ll encounter the need at some point down the road.
Managing modular file structures becomes more difficult the more modules you have. If everyone is free to retrieve work to version control however they want, how does everyone stay in sync about what goes where as the project evolves?
Building a consistent retrieve process is ideal for modular file structures because the retrieve process can be configured to put things in the right places. No more option flags, menu diving, etc to get changes committed to source control!
Salesforce does some funny things to some metadata types, many of which are not desirable for source control based development. This is especially true when doing composable development across many different orgs.
For example, retrieved record types include <picklistValues> elements for every picklist field on the object, listing every picklist value that existed in the org the metadata was retrieved from. If you then try to deploy that record type to another org with different picklist values, the deployment will fail. In most cases I’ve seen, the intent was never to filter picklist values for the record type; the full list of every picklist value was just an artifact of the platform.
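To make that concrete, here’s a sketch of the kind of XML you get back. The Commercial record type and Industry picklist are hypothetical names, but the shape matches what the Metadata API returns:

<?xml version="1.0" encoding="UTF-8"?>
<RecordType xmlns="http://soap.sforce.com/2006/04/metadata">
    <fullName>Commercial</fullName>
    <active>true</active>
    <label>Commercial</label>
    <!-- One picklistValues block per picklist field on the object,
         enumerating every value present in the org it came from -->
    <picklistValues>
        <picklist>Industry</picklist>
        <values>
            <fullName>Agriculture</fullName>
            <default>false</default>
        </values>
        <values>
            <fullName>Banking</fullName>
            <default>false</default>
        </values>
    </picklistValues>
</RecordType>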
With a consistent retrieval workflow, you can automatically manipulate record type XML as it’s retrieved, stripping out <picklistValues> elements for picklist fields whose values you don’t intend to restrict. This simple change makes the record type far easier to deploy to any org without errors.
There are countless other examples of potential manipulation of incoming changes from an org. You could run prettier on metadata, tokenize references to usernames, etc.
This may be a bit controversial to fierce advocates for developer autonomy, but I’m coming around to the belief that developer productivity is enhanced by having consistent processes for getting changes into source control.
How many troubleshooting calls have you been on amongst developers over merge conflicts, bugs, or other issues outside of normal planned work that ultimately came down to differences about how things got into version control?
I’d argue that in this instance, a standard retrieval process provides direct benefits to developer productivity. Less to remember, less to type, and less to make an all too human error.
One of the biggest challenges the Salesforce ecosystem faces over the coming years is how to truly bring code-based developers, declarative developers, and admins together as full participants in the product lifecycle. I’m growing to believe that having a standard retrieval process is an essential component to achieving that vision.
If it were easy to give non-code stakeholders an environment where they could work their declarative magic, then press a button to get everything they did into source control, collaboration between code and low-code would become much easier. Building new tools and user experiences would become much easier too.
Alright, hopefully I’ve convinced you of the potential value of a standard retrieval process for your project. Now, how do you go about doing it?
Let’s consider a typical CumulusCI managed package project which contains the package source as well as collections of pre and post install metadata and a dataset. You’d have a directory structure like:
force-app/ # Package Source
unpackaged/pre/record-types # Metadata deployed before package source
unpackaged/post/common # Common post-install global configuration
unpackaged/config/analytics # Optional reports and dashboards pack
By default, CumulusCI automatically deploys that entire structure in the right order when creating an org. But what happens when you edit the record types, global configuration, and reports and dashboards after that?
CumulusCI has two sets of tasks that can be used to retrieve metadata from an org. Here are command line examples showing how each could retrieve record types into unpackaged/pre/record-types:
cci task run retrieve_unpackaged --package-xml path/to/package.xml --path unpackaged/pre/record-types
cci task run retrieve_packaged --package RecordTypePackage --path unpackaged/pre/record-types
cci task run list_changes --include RecordType:
cci task run retrieve_changes --include RecordType: --path unpackaged/pre/record-types
You could use these commands to do retrieves to the right places, but you have to always remember the right command, the right metadata filter, and the right path every time you retrieve.
CumulusCI makes it easy to use simple YAML syntax to define new custom tasks for your project, with a friendly name, description, task group, and default options baked into the project’s cumulusci.yml stored in source control.
The following, added to the project’s cumulusci.yml file, creates new tasks to list and retrieve source-tracked changes across the entire directory structure.
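Here’s a sketch of what that configuration could look like. The task names and metadata filters are illustrative assumptions you’d adapt to your own modules; the class paths point at CumulusCI’s source tracking tasks (ListChanges and RetrieveChanges):

tasks:
    list_pre_record_types:
        description: List changed record types in the default org
        class_path: cumulusci.tasks.salesforce.sourcetracking.ListChanges
        group: Retrieval
        options:
            include: "RecordType:"
    retrieve_pre_record_types:
        description: Retrieve changed record types to unpackaged/pre/record-types
        class_path: cumulusci.tasks.salesforce.sourcetracking.RetrieveChanges
        group: Retrieval
        options:
            include: "RecordType:"
            path: unpackaged/pre/record-types
    retrieve_config_analytics:
        description: Retrieve changed reports and dashboards to unpackaged/config/analytics
        class_path: cumulusci.tasks.salesforce.sourcetracking.RetrieveChanges
        group: Retrieval
        options:
            include: "Report:,Dashboard:"
            path: unpackaged/config/analytics
    # ...plus matching tasks for the packaged source (force-app) and the
    # post-install config (unpackaged/post/common), e.g. list_packaged_source /
    # retrieve_packaged_source and list_post_common / retrieve_post_common.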
With those custom tasks defined in cumulusci.yml, the cci task list command shows a new Retrieval group at the end, with all the commands to list and retrieve changes in each module’s metadata.
Now, for example, the command to fetch all the changes to reports and dashboards in the analytics pack is simply cci task run retrieve_config_analytics, and it’s easily discoverable by anyone on the project.
But retrieving everything still means running four different commands, one each for the pre, packaged, post, and optional analytics modules.
Orchestration is important. Doing something in the wrong order or forgetting an option flag can introduce unnecessary errors. Let’s build two CumulusCI flows to make it easier to list and retrieve all the metadata across all the modules.
The following, added to cumulusci.yml, creates the list and retrieve flows by reusing the custom tasks.
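A sketch of what those flow definitions might look like, with the flow names and step numbering as illustrative assumptions and the steps referencing the custom tasks sketched above:

flows:
    list_all_changes:
        description: List source tracked changes for every module in the repository
        group: Retrieval
        steps:
            1:
                task: list_pre_record_types
            2:
                task: list_packaged_source
            # ...one step per module's list_* task
    retrieve_all_changes:
        description: Retrieve source tracked changes for every module into its directory
        group: Retrieval
        steps:
            1:
                task: retrieve_pre_record_types
            2:
                task: retrieve_packaged_source
            3:
                task: retrieve_post_common
            4:
                task: retrieve_config_analytics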
Now, you should see the new flows in cci flow list.
Just like the tasks, anyone using CumulusCI can now discover these flows, nicely grouped at the bottom of the list of flows, and inspect them with cci flow info retrieve_all_changes.
Now, we could do development across our different modules and retrieve all of it with a single command: cci flow run retrieve_all_changes.
Now that we have a common pipeline for all retrieves that sends files to the right location, we could start applying logic to do cleanups like removing picklist values from the retrieved record types.
For example, you could use CumulusCI’s built-in remove_metadata_xml_elements task to automate that XML manipulation. You could use capture_sample_data to capture a complex relational dataset from the org into source control.
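Here’s a sketch of how that cleanup could be wired in. The task name, file glob, and XPath below are assumptions you’d adjust to however record types land in your repository; remove_metadata_xml_elements is driven by a list of path/xpath pairs:

tasks:
    clean_record_types:
        description: Strip picklistValues elements from retrieved record types
        class_path: cumulusci.tasks.metadata.modify.RemoveElementsXPath
        group: Retrieval
        options:
            elements:
                # The glob and XPath are hypothetical; adjust them to where
                # record types are stored in your layout and to the picklists
                # you actually intend to restrict.
                - path: unpackaged/pre/record-types/**/*.recordType-meta.xml
                  xpath: ./ns:picklistValues
    # Then add clean_record_types as a final step of the retrieve_all_changes
    # flow so the cleanup runs on every retrieve.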
The key is getting everything flowing through common paths so you have a common place to inject any logic you need.
While not for everyone, building a standard automated process for retrieving changes from an org into source control offers Salesforce development teams many benefits: flexibility, productivity and ease of use, and an open door to bringing all product stakeholders into the development process.
CumulusCI provides the necessary framework components to build a robust retrieval process that can be iteratively improved and expanded through source control.