How k8s finalizers work - Fri, Jun 3, 2022
In this blog post I take a closer look at how kubernetes finalizers work, starting from a problem that caused me quite a headache.
The problem
In the project I was working on at the time we used GitOps with fluxcd and helm. With this deployment technique, flux creates a custom resource of type helm release. Flux watches for changes to these resources and renders the helm chart whenever one changes, which in turn results in the reconciliation of the rendered kubernetes objects.
So in order to test whether this reconciliation works from scratch, I decided to delete the namespace with kubectl delete namespace my-ns. But the namespace did not get deleted; it was stuck in the state Terminating:
NAME STATUS AGE
my-ns Terminating 10d
Trying to use the flags --grace-period=0 --force did not help either.
Looking at the status of the namespace, it turned out that it was waiting for its finalizers to complete. As described here, kubernetes updates every object in the namespace that has a metadata.finalizers field by adding a metadata.deletionTimestamp field with the time the deletion started. The controller responsible for such an object should notice this update and perform its cleanup, or finalization. If the controller succeeds, it should remove the finalizer key from the metadata to indicate that finalization has succeeded; only then will kubernetes actually delete the object. Conclusion: some objects in the namespace seemed to be stuck in finalization.
Examining the remaining objects
So I took a look at all the objects that remained in that namespace. The only object remaining was the helm release custom resource:
kubectl get helmreleases.helm.toolkit.fluxcd.io -n my-ns
NAME AGE READY STATUS
my-app 84d True Release reconciliation succeeded
Looking at the details, it turned out that the resource still contained a finalizer, indicating that the controller had not yet finished cleaning it up:
metadata:
finalizers:
- finalizers.fluxcd.io
Taking a look at the helm controller
The helm controller is responsible for deleting these custom resources. So I took a look at its source code to find possible reasons for the controller not deleting the helm release. Here is the function responsible for handling resource deletion, written in Go:
// reconcileDelete deletes the v1beta2.HelmChart of the v2beta1.HelmRelease,
// and uninstalls the Helm release if the resource has not been suspended.
func (r *HelmReleaseReconciler) reconcileDelete(ctx context.Context, hr v2.HelmRelease) (ctrl.Result, error) {
	r.recordReadiness(ctx, hr)

	// Delete the HelmChart that belongs to this resource.
	if err := r.deleteHelmChart(ctx, &hr); err != nil {
		return ctrl.Result{}, err
	}

	// Only uninstall the Helm Release if the resource is not suspended.
	if !hr.Spec.Suspend {
		getter, err := r.buildRESTClientGetter(ctx, hr)
		if err != nil {
			return ctrl.Result{}, err
		}
		run, err := runner.NewRunner(getter, hr.GetStorageNamespace(), ctrl.LoggerFrom(ctx))
		if err != nil {
			return ctrl.Result{}, err
		}
		if err := run.Uninstall(hr); err != nil && !errors.Is(err, driver.ErrReleaseNotFound) {
			return ctrl.Result{}, err
		}
		ctrl.LoggerFrom(ctx).Info("uninstalled Helm release for deleted resource")
	} else {
		ctrl.LoggerFrom(ctx).Info("skipping Helm uninstall for suspended resource")
	}

	// Remove our finalizer from the list and update it.
	controllerutil.RemoveFinalizer(&hr, v2.HelmReleaseFinalizer)
	if err := r.Update(ctx, &hr); err != nil {
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}
Even though I was not a Go developer, it became apparent that the finalizer is only removed after a successful uninstall, and that the uninstall is skipped entirely if the resource is suspended:
if !hr.Spec.Suspend {
...
} else {
ctrl.LoggerFrom(ctx).Info("skipping Helm uninstall for suspended resource")
}
Once more it turned out that being able to read source code is a very handy skill. The source code also revealed that the controller logs should have contained a message indicating the reason preventing deletion. Unfortunately, I did not have access to those logs.
The solution
The solution was now obvious: the helm release needed to be suspended. There are a couple of ways to do it. First, you could use the flux command line tool:
$ flux suspend helmrelease my-app -n my-ns
A second approach would be to patch the helm release:
$ kubectl patch helmreleases.helm.toolkit.fluxcd.io my-app -n my-ns \
--type=merge -p '{"spec":{"suspend": true}}'
The same could be done using kubectl edit.
There is however another solution that will work regardless of what the controller does on finalization: remove the finalizers field from the metadata of the object. That indicates to kubernetes that finalization has finished and it can delete the object. Various methods for that can be found in this stackoverflow post.
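One common method is a JSON merge patch that nulls out the list (per the merge-patch semantics of RFC 7386, a null value removes the member). As a sketch, this Go snippet only builds and prints the patch body one would send, for example via kubectl patch --type=merge:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// finalizerRemovalPatch builds the JSON merge patch body that clears
// the finalizers list. It does not talk to any cluster; it only
// constructs the document.
func finalizerRemovalPatch() ([]byte, error) {
	patch := map[string]any{
		"metadata": map[string]any{
			"finalizers": nil,
		},
	}
	return json.Marshal(patch)
}

func main() {
	b, _ := finalizerRemovalPatch()
	fmt.Println(string(b)) // {"metadata":{"finalizers":null}}
}
```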
Conclusion
If a kubernetes resource is stuck in Terminating, pending finalization can be one of the causes. To find out why finalization does not complete, a look at the controller managing the resource might yield more information and a more fitting solution for deleting the resource. A more drastic option is to remove the finalizers field from the metadata of those resources.