so

Building off the merge-base commit

Volume 7, Issue 18; 16 Feb 2023

TIL: It’s possible to persuade GitHub Actions to build off the merge-base commit, but it took me a little while to get there.

The situation is this: you’ve got a public-facing repository that allows anyone to fork it and make a pull request (PR). When someone makes a PR, you want to have your continuous integration (CI) system build and test the code in that PR. If the code in question doesn’t pass the test suite or if the documentation contains markup errors, you want to know that up front.

That’s easy enough to do. I’m using GitHub Actions at the moment, but I’ve done it with other CI tools. Bear in mind, however, that there’s a security issue here. If you accept PRs from the public and run them automatically in your build environment, you’re potentially allowing someone to inject malicous code into your build environment. That might allow them to exfiltrate secret tokens or other private parts of your system.

My approach to mitigating this problem is to checkout the PR, copy the “source code” parts of the PR (and explicitly not the build scripts themselves) to some temporary location, checkout the main branch, copy the source code parts from the PR over that branch, and run the build.

That’s great, I’m running the PR sources against the very latest build system and there’s a substantially lower risk that a malicious PR can run arbitrary code in my build environment.

Good good.

Except.

A few key words in that earlier paragraph are “the very latest build system”. What if there have been build system changesThe real-world example that motivated this effort goes as follows. User forks version 123 of the repository. Version 123 of the build system constructed an image from some data and the documentation refers to that image. Time passes, in version 124 of the repository, the image is removed. Now the source files from the PR forked from version 123 are running on the build system from version 124. The PR doesn’t change anything about the documentation, so the documentation from that PR still contains the image reference. But the image no longer exists which makes the build fail. ☹ that cause the PR to fail? Because the PR build fails, it’s harder to review what’s in the PR. And it’s riskier to accept it. How could we prevent this?

One obvious answer is: tell everyone with an open PR to rebase off the current head of the main branch. Perhaps that’s the right answer, but it isn’t always the most convenient or the quickest.

My idea was to update the build system so that instead of building off the current head of main, it would build off what was the head of main when the PR branch was created.

With a little digging about and a little bit of help from Mastodon, I learned that what I wanted was the merge-base of the main branch and the PR branch. But my first dozen or so attempts to make that work in CI all failed with some version of “fatal: Not a valid commit name” for either the branch name or the commit hash.

Eventually, I worked out that the problem is that when the CI system checks out the PR branch it’s checking out a particular commit so it has a “detached HEAD”. The fix is to checkout the main branch explicitly before attempting to find the merge base. The steps are:

- name: Checkout the pull request
  uses: actions/checkout@v3
  with:
    ref: ${{ github.event.pull_request.head.sha }}
    fetch-depth: 0

- name: Get the merge-base for this PR
  id: get_mb
  run: |
    git checkout main
    mb=$(git merge-base main ${{ github.event.pull_request.head.sha }})
    echo "mbase=$mb" >> $GITHUB_OUTPUT

I’m doing this in check_branch job before the main job so that I can pass the merge branch hash on to other steps.

Then, in the main job, I get the PR, save the source bits, get the main branch at the merge base commit, restore the source bits, and it’s off to the races:

- name: Get the pull request
  uses: actions/checkout@v3
  with:
    ref: ${{ github.event.pull_request.head.sha }}

- name: Save the sources
  run: echo Whatever you need to do here to save the “source” parts

- name: Checkout the merge base branch
  uses: actions/checkout@v3
  with:
    ref: ${{ needs.check_branch.outputs.mbase }}

- name: Restore the sources
  run: echo Whatever you need to do here to restore the “source” parts

The rest of the build follows as before.

I learned some new things, which is cool, and I think this will make the PR builds more reliable.

It’s not perfect, however. Keep in mind that with this change, I’m explicitly building with an older version of the main branch; or a potentially older version anyway. That means any improvements that have been made (new tests or updated stylesheets, for example) won’t be reflected in the PR build.

You can’t, as they say, have your cake and eat it too.

#GitHub #GitHub Actions #TIL