Dec 20, 2022 · Team Hephy tutorials · UPDATE: Jan 2, 2023
In today’s post, I will show how Flux users can easily keep Flux up to date, as well as some harder ways that may satisfy a different set of requirements. The main requirement should be that Flux is upgraded, like anything else in GitOps, by merging a Pull Request. There is actually a video of me merging TWO pull requests!
I wanted to put the videos right at the top, so here:
It’s most important to understand that the pull request is opened automatically. You just merge it, and of course don’t forget to also upgrade your
flux CLI to the new version, now that it’s in use on the cluster. The video goes on to explain that you have other options besides this Auto-PR method to upgrade Flux, namely
flux bootstrap, which is discussed later in this article.
However, I think this automatic Pull Request method is the best and most “GitOps-y” way to upgrade Flux, and I like the idea that I can just show up and merge the pull request, and I’m already watching GitHub for new pull requests, so for me, this behavior is great!
There are several different ways to achieve this automatic PR experience. We may not cover all the ways here today. I hope that we will have shown enough different ways to understand how this is GitOps, and over the next few posts I hope we’ll show generally how users can manage software with Flux, whether it is some Helm chart, or your own software package, or even Flux itself.
Today, we show how I am managing Flux itself! We absolutely don’t need to worry about Kubernetes or anything else. Just merge a PR!
Bootstrapping is one of the most fundamental ideas to comprehend in Flux. When a Flux installation is bootstrapped, Flux manages Flux itself via GitOps. That is what is meant by “bootstrapping” – the Flux controllers are provisioned with enough privilege to upgrade themselves without intervention, and also to manage an entire cluster as a whole.
The fifth principle of GitOps, known as the “Closed Loop”, is not officially recognized among the four principles of GitOps defined by the neutral standards body OpenGitOps, so it is not a strict requirement of GitOps. It is nonetheless an important idea to understand and observe if one wishes to manage the total GitOps lifecycle without adding another, potentially competing delivery paradigm.
The
flux bootstrap command is idempotent, so upgrading Flux in the cluster is a simple matter of upgrading the CLI to the newest version, and running
flux bootstrap again with the same parameters to perform an upgrade directly on the main branch.
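As a minimal sketch, an upgrade-by-bootstrap session could look like the following. The org and repository names, branch, and cluster path are placeholders; substitute the exact parameters from your original bootstrap, and note that a GitHub PAT with repo admin rights must be available in the environment:

```shell
# Upgrade the flux CLI (Homebrew shown here; see the Flux install docs
# for the script, apt, or binary-release alternatives)
brew upgrade fluxcd/tap/flux

# Sanity-check the CLI against the running cluster components
flux check

# Re-run bootstrap with the SAME parameters used at install time.
# "my-org", "my-fleet", and the path below are hypothetical examples.
flux bootstrap github \
  --owner=my-org \
  --repository=my-fleet \
  --branch=main \
  --path=clusters/production
```

Because the command is idempotent, re-running it commits the updated component manifests to the main branch and the controllers reconcile themselves to the new version.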
This simple method is the easiest to understand; however, the manual intervention step creates yet another periodic admin task. When this method is used, the question of when users should upgrade Flux may have only not-great answers: do we upgrade only whenever we have time, or when we feel like it? Do we upgrade whenever we notice that Flux has a new release, or only when it’s really important, like a CVE? One thing’s for sure: an admin’s GitHub PAT with Repo Admin permission is required for a full
bootstrap, and that makes it prohibitive to repeat this action too frequently.
How can users get to know as soon as possible when there is a new release of Flux available, and then be able to simply upgrade right then and there, by merging a PR that gets generated automatically, with as little friction as possible?
You can automate the components manifest update with GitHub Actions and open a PR when there is a new Flux version available. For more details please see the Flux GitHub Action docs and Automate Flux updates.
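A sketch of such a workflow, following the pattern from the Flux docs, is below. The schedule, cluster path, and action versions are assumptions to adjust for your repository; the workflow regenerates the component manifests with the latest CLI and opens a PR only when something changed:

```yaml
name: update-flux
on:
  workflow_dispatch:
  schedule:
    - cron: "0 * * * *" # check hourly; adjust to taste
jobs:
  components:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Flux CLI
        uses: fluxcd/flux2/action@main
      - name: Regenerate component manifests
        run: |
          # Re-export the Flux components with the freshly installed CLI;
          # the path is a placeholder for your bootstrap path.
          flux install --export > ./clusters/production/flux-system/gotk-components.yaml
      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v4
        with:
          branch: update-flux
          commit-message: Update Flux components
          title: Update Flux components
          body: Automated Flux component update. Merge to upgrade the cluster.
```

If the regenerated manifests are identical to what is already in Git, the create-pull-request step makes no PR, so the hourly schedule stays quiet between Flux releases.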
If owning a new GitHub Action configuration in this way also has you worried, because it sounds like just another thing to manage and scale (one per cluster that runs Flux), then don’t fret: there is an easier way. You can follow the Renovate docs for a self-hosted GitHub Action example (which is also still somewhat complicated to own), or you can use the GitHub-hosted SaaS version of Renovate, which has an easy-to-follow onboarding workflow that also works directly from Git and is configured and automated via Pull Request!
The prime advantage of Renovate Bot is that it supports more workflows than only upgrading Flux. Renovate will also scan public Helm Repositories based on the definition in your Flux
HelmReleases, and file individual PRs recommending those packages for upgrade whenever a new version becomes available. Renovate is competitive with Dependabot and, at the time of this writing, offers more direct integrations with Flux.
A disadvantage is that, for now at least, Renovate still has incomplete coverage of the ways that Flux users can consume Helm repositories. Once Renovate adds support for Flux
HelmRepository resources with
spec.type set to
oci, there should be no problem at all. This option has the flexibility to be used self-hosted in offline environments, on GitLab, or other scenarios where GitOps-driven automated upgrades by Pull Request are a desirable add-on for your developer platform.
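For reference, this is the shape of resource that falls into that coverage gap, sketched here with podinfo’s public chart registry; the resource name and namespace are assumptions:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  # The OCI variant of HelmRepository that Renovate did not yet
  # cover at the time of writing
  type: oci
  interval: 10m
  url: oci://ghcr.io/stefanprodan/charts
```

A HelmRelease can reference a chart from this repository exactly as it would from a classic HTTP Helm repository.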
It should not be under-emphasized that Flux recommends using Git as a Single Source of Truth. There are no surprises when Git is the only source of truth. OCI repositories as GitOps sources are simply a surrogate for Git, filtered through CI, so they are not inconsistent with this recommendation. We can say the commit stream is managed by CI, and a process which produces an OCI artifact on successful completion is a managed delivery experience built on Flux. To further complicate matters, we can also talk about Flux as a Service, or the concept of “managed Flux”.
For users of many clusters, or with many Flux installations across different environments, there is a bigger question of how Flux itself is managed at scale. It is a strategic decision to work with another, separate management platform such as Terraform, Azure Arc, or OpenShift, so depending on your environment you may already have access to Flux as a platform-native managed service. You may decide that managing Flux outside of Git is contradictory, or it may be attractive to have Flux managed on your behalf and get upgrades without any work from you. Different operator-based services can offer different guarantees and provide a variety of experiences; we intentionally focus only on those that behave according to the principle of Git as a Single Source of Truth.
Besides Flux managing itself either directly or indirectly, as we have already demonstrated quite handily through GitHub Actions and Renovate, Flux upgrades can also be managed externally through any other CI process or workflow tool that is Git-aware and can update Git, and this does not present any conflict whatsoever with the principles of GitOps. Flux remains the agent in charge of applying its own definition, and Kubernetes provides the necessary support so that Flux can safely upgrade itself without issue or interruption.
That would not necessarily require a cluster admin, except that Flux manages custom resource definitions and other resources at a cluster level, a job that cannot easily be separated out from Flux’s own upgrade lifecycle, since Flux uses CRDs itself as well.
Flux is designed to work in a multi-tenant system. There can be scenarios where, for example, Flux being provisioned with all the permissions it needs to upgrade itself, automatically and without external intervention, is considered an actual risk and security liability, or an obstacle to central management. This is a common point of debate about the intended design of Flux.
While we might recommend that Flux manage itself, this pattern is not commonly acceptable for other software downstream of Flux. Software generally should not manage itself (unless you are building for the Kubernetes operator pattern, and perhaps even then), so we can recommend a different pattern, one that also may lead to some new challenges and a few difficult questions later.
The most common scenario which I have seen replace Flux’s own bootstrap is using Terraform and Flux together. When clusters are provisioned from a Terraform plan, they are also provisioned with a Flux installation via the Terraform provider for Flux, which essentially bootstraps Flux in the normal way, but with Terraform. This is done by the admin or PIC (person-in-charge) who, frankly, at this point may usually have been best advised to abandon the Terraform plan altogether, saving the state somewhere until the cluster needs to be destroyed.
We don’t usually recommend replaying Terraform plans on a loop in GitOps fashion without careful attention to Kubernetes. It is inadvisable unless special care is taken to ensure that no plan update accidentally deletes a running Kubernetes cluster, as this can create a major incident and many headaches! Sometimes, since stateful workloads are not necessarily accounted for by Terraform, and since immutable resources cannot be upgraded in place, a plan will recommend deleting the Kubernetes cluster.
This is a known issue when working with Terraform. So when GitOps’ing your Terraform plans, take care to look out for this important detail.
Terraform state basically does not account for the state inside of Kubernetes clusters, unless the in-cluster resources are themselves created and tracked through Terraform providers.
At this point, Flux upgrades can be done via Terraform, or with any other GitOps-based automation. It is an implementation detail which process goes on managing the Flux infrastructure via GitOps. You can use a Pull-Request-based workflow to let Flux manage itself, as Flux has some cluster-level permissions to manage the Flux custom resources across the cluster. Or you may have policies which prohibit that cluster-admin-level credential from getting a central foothold in your environment, and instead manage Fluxes on a per-namespace or per-tenant level.
These implementation details are beyond the scope of this guide to upgrading Flux via Pull Request! More resources are available on the FluxCD website, and within the many links throughout this article.
Refer to the Additional Best Practices for Shared Cluster Multi-tenancy in Flux’s Security Best Practices. This is the comprehensive guide to security architecture and to the design of multi-tenant security for Flux.
Flux users may wish to perform upgrades of Flux’s GitOps Toolkit controllers, or other software similarly packaged for distribution on Kubernetes, without intervention. There are some parameters to vary, and the new
OCIRepository feature, which Flux introduced ahead of the anticipated GA release of Flux 2.0, enables the broadest coverage and ability to vary those functional parameters.
The parameters along which we can vary our disposition are: (1) is there a need for a pull request (or can we skip it, and go fully automated)? And (2) does the automated upgrade warrant any rollback or audit requirements (should we make every upgrade a commit, or is that all noise)?
The
ImageUpdateAutomation feature permits Flux users to monitor an
ImageRepository for newly published versions, and evaluate whether they are valid for automated upgrades that generate a commit according to an
ImagePolicy definition. This is the most flexible and powerful mechanism in Flux for performing automated upgrades in a pure GitOps fashion, and without requiring any admin intervention.
As soon as a new image version is observed by Flux’s Image Reflector, if the policy determines that it is the newest, then a commit is automatically pushed, where the image reference is updated to the newer version in Git.
The
OCIRepository source enables Flux’s automation to be used not only for application runtime images, but now also for OCI artifacts that carry the YAML manifests. This gives a degree of convenience and installation safety that was previously most available to Flux users via Helm, now without the requirement to use Helm. Flux provides powerful new degrees of freedom and enables better security with each new feature release!
When the requirement is for fully automated releases with an audit trail, and any change must come in the form of a commit (for example, so that it can be rolled back in place whenever there is an error), Flux’s
ImageUpdateAutomation is a great option. Flux requires write access to the Git repository in order to maintain Git as the Single Source of Truth, so that whenever Flux observes a new image for deployment, it updates the
OCIRepository definition directly with a reference to the new latest tag.
In the example below, a simple Semantic Versioning scheme is in use for the tags, but this can easily be configured to a numeric or any other sortable pattern if the tags are not SemVer. The podinfo app is deployed from an
OCIRepository, which is kept updated by Flux’s
ImageUpdateAutomation. A kyaml setter marks the location of the image reference, so Flux’s automation can find and update it. The
OCIRepository is recreated as an
ImageRepository, so the
ImageUpdateAutomation can observe it through the lens of the
ImagePolicy.
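A sketch of the full set of resources could look like this. The podinfo manifests at ghcr.io are real, but the resource names, namespace, Git path, and SemVer range are assumptions, and the image API versions should be checked against your installed Flux release:

```yaml
# The deployed artifact: podinfo's YAML manifests published as an OCI artifact.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 10m
  url: oci://ghcr.io/stefanprodan/manifests/podinfo
  ref:
    # The kyaml setter comment lets ImageUpdateAutomation rewrite this tag
    tag: 6.2.0 # {"$imagepolicy": "flux-system:podinfo:tag"}
---
# Watch the same OCI repository for newly published tags
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  image: ghcr.io/stefanprodan/manifests/podinfo
  interval: 10m
---
# Decide which new tags are eligible: any SemVer release >= 6.0.0
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
  name: podinfo
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: podinfo
  policy:
    semver:
      range: ">=6.0.0"
---
# Push a commit to Git whenever the policy selects a newer tag
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: flux-system
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        name: fluxcdbot
        email: fluxcdbot@users.noreply.github.com
      messageTemplate: "Automated update: podinfo"
    push:
      branch: main
  update:
    path: ./clusters/production
    strategy: Setters
```

Note that the image automation controllers are optional components; they can be enabled at bootstrap time with --components-extra=image-reflector-controller,image-automation-controller.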
This is the explicit, policy-based version of Flux’s automation. There are a number of scenarios where this might not work, or might require some simple tweaks. One common issue with this policy is that pushing directly to the main branch for automated deployment would require us to permit an exception to the protected branch policy, and that may be undesirable. The easy alternative that makes one small adjustment here is to Auto-PR the changes, since that can be done without circumventing branch protection!
Another option is to skip the commits altogether, and while this is still driven by policy, the directives are not separate from the resource itself. The next section covers how to inline an upgrade policy directly in the spec of GitOps Toolkit source resources like OCI and Git.
In
OCIRepository and
GitRepository resources, there are SemVer range capabilities in both source kinds (Git, OCI), and while it is not explicitly called out in the
HelmRepository docs, since Helm’s default is SemVer, users of
HelmRelease have also long had this capability: to select a SemVer range or wildcard in their Helm chart template definition.
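As an illustration, an OCIRepository can carry its upgrade policy inline as a SemVer range, so the newest eligible tag is reconciled directly with no commit written back to Git; resource names here are assumptions:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 10m
  url: oci://ghcr.io/stefanprodan/manifests/podinfo
  ref:
    # Inline policy: always reconcile the newest tag in this SemVer range.
    # Nothing is pushed to Git when a new version is picked up.
    semver: ">=6.0.0"
```

The equivalent for Git sources is a spec.ref.semver matching release tags, and for HelmReleases a SemVer range or wildcard in the chart version field.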
So when the inline automation is used, there is no pull request and no commit to roll back: the two tests we proposed earlier are both negative. In an environment with many, very frequent releases, this can be a better strategy, since those commits produce a lot of noise, and developers shouldn’t have to rebase their commits all the time due to automation.
The releases themselves will already produce an audit log at their source. It is perhaps unnecessary to duplicate that information in a separate audit log, reproduced explicitly for every environment in Git. Both solutions are presented because YMMV: your specific requirements at any given juncture may dictate whether one solution or the other is more appropriate. Consider both options when you are designing environments.