Readability of gitlab-ci yaml - Wed, Mar 30, 2022
Expressiveness and readability of gitlab-ci yaml
Creating and maintaining CI/CD processes in a GitLab .gitlab-ci.yml file for non-trivial pipelines quite often results in code that is hard to read and understand.
One of the main issues I see is that YAML's goals are centered around expressing data, while the main purpose of GitLab's CI YAML is the definition of processes. So while the data in a .gitlab-ci.yml is represented fine, the resulting process is often not obvious when writing these definitions.
GitLab has added a few web-based tools to help edit these files, including a Pipeline editor, a linter and the possibility to simulate pipelines. This helps but does not really make these definitions expressive.
Increase expressiveness with a fluent api
Since I started to pursue the art of writing easy-to-read and expressive code, I have wondered what a more expressive CI/CD process definition would look like. So I took the following example and tried to transform it into an imaginary, more Python-style, fluent form. Here's the example in YAML:
build-helm-charts:
  stage: build
  image:
    name: alpine/helm
    entrypoint: ["/bin/sh", "-c"]
  script:
    - helm -f ./test/test_values.yaml template > ./charts.yaml
  artifacts:
    paths:
      - charts.yaml
    expire_in: 1 day
  tags:
    - k8s
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
If you are familiar with GitLab CI/CD then this example might not look too complicated. Nevertheless, here are three points with regard to its expressiveness:
- The definition does not mention some of the main concepts of a CI/CD configuration: Pipelines, Jobs and Runners
- The rules section does not really express what it is for
- It is not obvious what the specific rules in the rules section mean
There are far more complex definitions out there, so you can imagine that things can get much less readable.
So here is the first of two examples in which I tried to make the definition more readable:
project().add().job("build-helm-charts")
    .toStage("build")
    .runningScript("helm -f ./test/test_values.yaml template > ./charts.yaml")
    .In(Image("alpine/helm").withEntrypoint("/bin/sh", "-c"))
    .toPipelines(matchingRules(RuleSet(Rule.OnMergeRequest, Rule.OnSchedule)))
    .onRunners().taggedWith("k8s")
    .storingArtifacts("charts.yaml").until("1 day")
This definition can simply be read from left to right. The mere addition of prepositions like to and on, as well as the use of GitLab terminology, makes it more obvious what the process looks like. In addition, the named rules are much more expressive than if statements on unobvious variable names.
Moving a bit further away from data and toward programming, we could even decouple some of the nested statements like this:
helm = Image("alpine/helm")
helm.override_entry_point_with("/bin/sh", "-c")
helm_template = Script("helm -f ./test/test_values.yaml template > ./charts.yaml")
artifacts = [Artifact(Path("charts.yaml"))]
rules = RuleSet(Rule.OnMergeRequest, Rule.OnSchedule)

project().add().job("build-helm-charts")
    .toStage("build")
    .runningScript(helm_template).In(helm)
    .toPipelines(matchingRules(rules))
    .onRunners().taggedWith("k8s")
    .storingArtifacts(artifacts).until("1 day")
Although this code is a little longer, the final job definition is more concise and a bit more readable. And although my dream API will probably never make it into a feature request, dreaming of it was inspiring nevertheless ;-)
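To get a feeling for how little is needed to back such a fluent form, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions, not an existing GitLab API: the class and method names follow the imaginary API of the first example, while Project, to_dict and to_list are hypothetical helpers I added so that the chain produces a dictionary with the same structure as the YAML job.

```python
from enum import Enum


class Rule(Enum):
    # Hypothetical rule names mapped to the `if` expressions of the YAML example.
    OnMergeRequest = '$CI_PIPELINE_SOURCE == "merge_request_event"'
    OnSchedule = '$CI_PIPELINE_SOURCE == "schedule"'


class RuleSet:
    def __init__(self, *rules):
        self.rules = rules

    def to_list(self):
        return [{"if": rule.value} for rule in self.rules]


def matchingRules(rule_set):
    # Purely for readability: returns its argument unchanged.
    return rule_set


class Image:
    def __init__(self, name):
        self.data = {"name": name}

    def withEntrypoint(self, *command):
        self.data["entrypoint"] = list(command)
        return self


class Job:
    def __init__(self, name):
        self.name = name
        self.data = {}

    def toStage(self, stage):
        self.data["stage"] = stage
        return self

    def runningScript(self, *commands):
        self.data["script"] = list(commands)
        return self

    def In(self, image):
        self.data["image"] = image.data
        return self

    def toPipelines(self, rule_set):
        self.data["rules"] = rule_set.to_list()
        return self

    def onRunners(self):
        # Filler word: changes nothing, only keeps the chain readable.
        return self

    def taggedWith(self, *tags):
        self.data["tags"] = list(tags)
        return self

    def storingArtifacts(self, *paths):
        self.data["artifacts"] = {"paths": list(paths)}
        return self

    def until(self, expire_in):
        # Assumes storingArtifacts() was called earlier in the chain.
        self.data["artifacts"]["expire_in"] = expire_in
        return self


class Project:
    def __init__(self):
        self.jobs = []

    def add(self):
        # Filler word, like onRunners().
        return self

    def job(self, name):
        new_job = Job(name)
        self.jobs.append(new_job)
        return new_job

    def to_dict(self):
        return {job.name: job.data for job in self.jobs}


project = Project()
(
    project.add().job("build-helm-charts")
    .toStage("build")
    .runningScript("helm -f ./test/test_values.yaml template > ./charts.yaml")
    .In(Image("alpine/helm").withEntrypoint("/bin/sh", "-c"))
    .toPipelines(matchingRules(RuleSet(Rule.OnMergeRequest, Rule.OnSchedule)))
    .onRunners().taggedWith("k8s")
    .storingArtifacts("charts.yaml").until("1 day")
)
config = project.to_dict()
```

Every method returns the job itself, which is all the fluent style requires. The surrounding parentheses are needed because Python does not allow a line to start with a dot otherwise. The resulting config dictionary could then be written out with a YAML library to obtain a regular .gitlab-ci.yml.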
Conclusion
While .gitlab-ci.yml syntax is very powerful, complex CI/CD definitions tend to become hard to read and hard to maintain. This is even more true when these definitions make use of more advanced YAML features like anchors and aliases.
The challenge here is to strike a balance between keeping it simple, so that the code remains easy to understand, and minimizing duplication through abstractions and generic constructs.