Semantic versioning? Nah, just break your users
What is Semantic Versioning, and why is it useful? At a very high level, Semantic Versioning is a way of numbering releases so that the version itself tells you whether a change is breaking, additive, or a bug fix. The version number is split into three parts, Major, Minor and Patch, and the scheme can be summarised as:
Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes
MINOR version when you add functionality in a backward compatible manner
PATCH version when you make backward compatible bug fixes
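To make those three rules concrete, here is a minimal Python sketch of bumping a version according to the kind of change. The `Version` class and the `"breaking"` / `"feature"` / `"fix"` labels are illustrative names of my own, and the sketch deliberately ignores pre-release and build metadata (e.g. `1.0.0-rc.1`), which the full spec also covers.

```python
from dataclasses import dataclass
from typing import Literal

# Illustrative labels for the three kinds of change SemVer distinguishes.
ChangeKind = Literal["breaking", "feature", "fix"]


@dataclass(frozen=True, order=True)
class Version:
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Accepts "MAJOR.MINOR.PATCH", optionally prefixed with "v" (e.g. "v4.3.6").
        major, minor, patch = (int(part) for part in text.removeprefix("v").split("."))
        return cls(major, minor, patch)

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"

    def bump(self, change: ChangeKind) -> "Version":
        if change == "breaking":
            # Incompatible API change: new major, reset minor and patch.
            return Version(self.major + 1, 0, 0)
        if change == "feature":
            # Backward compatible addition: new minor, reset patch.
            return Version(self.major, self.minor + 1, 0)
        # Backward compatible bug fix: only the patch number moves.
        return Version(self.major, self.minor, self.patch + 1)
```

For example, `Version.parse("4.3.6").bump("breaking")` gives `5.0.0`, whereas a feature bump gives `4.4.0`. Because the dataclass orders fields as a tuple, `4.10.0` correctly compares as newer than `4.9.9`.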
Semantic Versioning (SemVer) is widely accepted as being a good way of versioning things. It helps users plan upgrades and understand how much effort is involved. I like SemVer, and I wish everyone would use it, but alas, they do not.
Why am I writing about this now?
On 19th August 2024, GitHub announced a breaking change to one of their actions: upload-artifact. The announcement gave two weeks’ notice of the change. It’s great that GitHub published this on their blog, but from what I can tell, that was the only communication about it. The runner emitted no warnings in the build-up to the change, nor was there any communication to maintainers informing them of it. None of this would really have mattered had the change shipped as a new major version.
Like a lot of people, we pin our GitHub Actions to major versions (we don’t expect them to break within a major version, so why not stay up to date?). This was our downfall.
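Why does a major-version pin offer no protection here? By convention, action maintainers move a major tag such as `v4` to point at the newest release in that major line, so the pin silently follows every new minor and patch release. The sketch below models that resolution in Python; the tag list is hypothetical and `resolve_major_pin` is an illustrative helper, not part of any GitHub tooling.

```python
def _version_key(tag: str) -> tuple[int, ...]:
    # "v4.10.0" -> (4, 10, 0), so tags compare numerically rather than as strings.
    return tuple(int(part) for part in tag.removeprefix("v").split("."))


def resolve_major_pin(pin: str, release_tags: list[str]) -> str:
    """Return the newest release a floating major pin such as "v4" would follow."""
    major = int(pin.removeprefix("v"))
    candidates = [tag for tag in release_tags if _version_key(tag)[0] == major]
    if not candidates:
        raise ValueError(f"no releases found for {pin}")
    return max(candidates, key=_version_key)


# Hypothetical release history for an action.
tags = ["v3.1.3", "v4.3.0", "v4.3.1", "v4.4.0", "v5.0.0"]
```

Here `resolve_major_pin("v4", tags)` returns `"v4.4.0"`: the pin moves onto the new minor release without anyone touching the workflow. If that minor release contains a breaking change, it reaches every user pinned to `v4` at once.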
The breaking change
The breaking change went live on 2nd September 2024, and it took us by surprise. We deploy some software to Azure Functions using Pulumi, so our pipeline looks something like this:
Build .NET Solution -> Test .NET Solution -> Publish .NET Solution -> Upload Artifacts -> Download Artifacts -> Run Pulumi and deploy to Azure
When we ran a routine release yesterday, we immediately saw that it had failed and that we were getting errors. This didn’t make much sense to us: the release in question was small, with nothing in it that should fundamentally break our system.
Then began the head scratching. All of our function triggers were missing, despite the deployment being “successful”. The error from Azure was:
Unable to load startup extension ‘Startup’ (Type: ‘Microsoft.Azure.WebJobs.Extensions.FunctionMetadataLoader.Startup, Microsoft.Azure.WebJobs.Extensions.FunctionMetadataLoader, Version=1.0.0.0, Culture=neutral, PublicKeyToken=551316b6919f366c’). The type does not exist. Please validate the type and assembly names.
This made even less sense: the underlying infrastructure was completely unchanged, as were the build and the deployment. Why had this suddenly started to break? After some searching around, we discovered that this error likely indicated the .azurefunctions directory was missing from the deployed code. Strange.
We then doubted everything and ran the same build steps locally to reproduce the output (I’m a firm believer in being able to replicate pipeline steps locally when diagnosing issues like this). We ran the build script, inspected the output, and sure enough, the .azurefunctions directory was present. Something in the environment must have changed.
The next thing to investigate was the artifacts produced by the pipeline in GitHub Actions; luckily, it’s really easy to download them from a failed run. We downloaded the artifacts, inspected them, and sure enough, the .azurefunctions directory was missing. Finally, we had something concrete to go on. A quick search for “missing .azurefunctions in zip” eventually led me to a GitHub issue that detailed the breaking change, along with quite a few people unhappy about it. We followed the instructions there to make sure the .azurefunctions directory was included, and our software was up and running again.
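The manual check we did, downloading the artifact and looking for the directory, could be turned into a pipeline step that fails fast instead of producing a “successful” deployment with no triggers. Here is a minimal Python sketch, assuming the artifact is available locally as a zip file; `archive_contains_dir` and `require_dir` are hypothetical helper names, not part of GitHub’s or Azure’s tooling.

```python
import zipfile
from pathlib import PurePosixPath


def archive_contains_dir(zip_path: str, directory: str) -> bool:
    """Return True if any entry in the zip sits under a directory with this name, at any depth."""
    with zipfile.ZipFile(zip_path) as archive:
        # The zip format uses "/" as its path separator, so PurePosixPath splits entries correctly.
        return any(directory in PurePosixPath(name).parts for name in archive.namelist())


def require_dir(zip_path: str, directory: str) -> None:
    """Stop the pipeline with a clear message when the directory is missing."""
    if not archive_contains_dir(zip_path, directory):
        raise SystemExit(f"{zip_path} is missing the {directory}/ directory")
```

Calling `require_dir("artifact.zip", ".azurefunctions")` (the file name is a placeholder) before the Pulumi step would have turned our confusing runtime error into an explicit build failure pointing straight at the artifact.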
What could have been done differently?
The obvious argument here is that we should pin to specific versions of actions rather than major versions, and I mostly agree. When working with npm packages, I am especially careful to pin versions; it’s quite a volatile ecosystem, and I don’t trust the majority of packages not to break. However, I didn’t expect GitHub to break their own actions without publishing a new major version. Perhaps in future I won’t be so trusting.
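If you do switch to exact pins, it helps to catch floating references before they slip back in. Below is a rough Python sketch that scans workflow text for `uses: owner/repo@ref` lines and flags any ref that is neither an exact `MAJOR.MINOR.PATCH` tag nor a full 40-character commit SHA (git tags can be moved, so a SHA is the strictest option). The regex and the `floating_pins` name are my own assumptions; it is a line-based approximation, not a YAML parser.

```python
import re

# Matches "uses: owner/repo@ref" or "- uses: owner/repo/path@ref" at the start of a line.
# Limitation (assumed acceptable for a sketch): quoted values and docker:// refs are not matched.
USES_PATTERN = re.compile(
    r"^[ \t]*(?:-[ \t]*)?uses:[ \t]*([\w.-]+/[\w./-]+)@([\w.-]+)", re.MULTILINE
)
FULL_SHA = re.compile(r"[0-9a-f]{40}")
EXACT_TAG = re.compile(r"v?\d+\.\d+\.\d+")


def floating_pins(workflow_text: str) -> list[tuple[str, str]]:
    """Return (action, ref) pairs whose ref could move underneath you, e.g. "v4" or "main"."""
    flagged = []
    for action, ref in USES_PATTERN.findall(workflow_text):
        if not (FULL_SHA.fullmatch(ref) or EXACT_TAG.fullmatch(ref)):
            flagged.append((action, ref))
    return flagged
```

Run against a workflow containing `actions/checkout@v4` and `actions/upload-artifact@v4.3.1`, only the checkout step would be flagged.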
The change itself is actually quite sensible, and the reasoning behind it holds up. A lot of tooling puts credentials in hidden files such as .tool/credentials, so automatically excluding hidden files and directories mitigates the risk of those credentials leaking into uploaded artifacts. But there are better ways to roll out a change like this, and better ways to communicate breaking changes when they are necessary.
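The catch is that a blanket “exclude hidden paths” rule cannot tell a credentials file from a build output directory that merely starts with a dot. The Python sketch below approximates such a rule (any path component beginning with `.`). It is my own simplification, not upload-artifact’s actual implementation; the `include_hidden` flag loosely mirrors the opt-in the action added, and the file names are made up.

```python
from pathlib import PurePosixPath


def is_hidden(path: str) -> bool:
    """True if any component of a relative path starts with '.', e.g. '.tool/credentials'."""
    return any(part.startswith(".") for part in PurePosixPath(path).parts)


def files_to_upload(paths: list[str], include_hidden: bool = False) -> list[str]:
    """Drop hidden paths unless the caller explicitly opts in."""
    if include_hidden:
        return list(paths)
    return [path for path in paths if not is_hidden(path)]


# Hypothetical publish output: one real secret, one hidden directory the app needs.
published = ["host.json", "MyFunctions.dll", ".azurefunctions/example.dll", ".tool/credentials"]
```

By default, `files_to_upload(published)` keeps only `host.json` and `MyFunctions.dll`: the credentials are dropped as intended, but so is `.azurefunctions`, which is exactly what broke our deployment.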