Use GitLab Runner to build a CI/CD environment in an ACS cluster

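GitLab Runner is an open-source application written in Go that acts as the agent running CI/CD jobs from GitLab. CI jobs are a typical periodic compute workload, which makes them a good fit for the on-demand usage and fast elasticity of Container Compute Service (ACS). Running CI jobs on ACS simplifies capacity planning and lowers the overall cost of holding resources. The elasticity of the cloud can also significantly increase the number of CI jobs that run concurrently. This topic describes production practices and recommended configurations for using GitLab Runner together with ACS compute features.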

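Background information

GitLab Runner is an open-source project that runs the jobs of GitLab CI/CD pipelines. It is a job execution framework that runs outside GitLab and provides several executors, such as Shell and Kubernetes. You can use it to run CI jobs in a private environment or in the cloud and report the results back to GitLab.

The core configuration of GitLab Runner has two parts:
- The configuration and initialization settings of the Manager Pod.
- The configuration of the Kubernetes executor.

For more information, see Configure JiHu GitLab Runner.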

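Workflow

This topic walks through the following workflow:
1. Install GitLab Runner in an ACS cluster.
2. Build container images with Docker-in-Docker or Kaniko.
3. Reduce CI costs with ACS elasticity policies.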

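Prerequisites

You have connected to the Kubernetes cluster by using kubectl, and Helm is installed on the same machine. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.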

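Before you continue, a quick pre-flight check can prevent a failed installation later. The following is a minimal sketch. It assumes that `kubectl` and `helm` are on your `PATH` and that the current kubeconfig context points at the ACS cluster. The function name `preflight` is hypothetical.

```shell
# Hypothetical helper: verify the CLIs used in this topic and cluster access.
preflight() {
  for bin in kubectl helm; do
    # command -v succeeds if the name resolves to a command or shell function
    if ! command -v "$bin" >/dev/null 2>&1; then
      echo "missing tool: $bin" >&2
      return 1
    fi
  done
  # kubectl cluster-info exits non-zero when the API server cannot be reached
  if ! kubectl cluster-info >/dev/null 2>&1; then
    echo "cannot reach the cluster with the current kubeconfig" >&2
    return 1
  fi
  echo "ok"
}
```

Run `preflight` in your terminal. It prints `ok` when both tools are available and the cluster responds. Otherwise it names the first problem it found.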
Installation

This topic uses gitlab-runner 17.3.1, which corresponds to chart version 0.68.1. For the general gitlab-runner installation steps, see JiHu GitLab Runner Helm Chart.

Obtain the GitLab Runner chart package.

Add the GitLab Helm repository.

```
helm repo add gitlab https://charts.gitlab.io
```

Update the chart repository.
```
helm repo update gitlab
```

Pull the gitlab-runner chart and extract it.

```
helm pull gitlab/gitlab-runner --version 0.68.1 && tar zvxf gitlab-runner-0.68.1.tgz
```

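The extracted `gitlab-runner/` directory contains the chart's default `values.yaml`, which you can copy and edit. You can also write the chart defaults straight to a local file with `helm show values`. The sketch below wraps that command. The helper name `fetch_default_values` is hypothetical, and the chart name and version follow this topic.

```shell
# Hypothetical helper: save the chart's default values so you can edit them
# as your own values.yaml.
fetch_default_values() {
  out="${1:-values.yaml}"
  # helm show values prints the chart's values.yaml to stdout
  helm show values gitlab/gitlab-runner --version 0.68.1 > "$out" || return 1
  # Treat an empty file as a failure instead of silently continuing
  [ -s "$out" ] || { echo "no values written to $out" >&2; return 1; }
}
```

For example, `fetch_default_values values.yaml` produces a file that you can then trim down to the settings shown in the next steps.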
Register the runner.
In the GitLab console, create a runner for your project or group to generate a runner token, and record the token.

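The runner token is a credential. If you prefer not to keep it in plain text in values.yaml, the GitLab Runner Helm chart documents a `runners.secret` option. This option points to an existing Secret with the keys `runner-registration-token` and `runner-token`. The sketch below assumes that option. The helper name and the Secret name are hypothetical. Keeping the token in a Secret also keeps it stable across Manager Pod restarts, which matters for the FAQ at the end of this topic.

```shell
# Hypothetical helper: store the runner token in a Kubernetes Secret.
# Assumes the chart's runners.secret option reads the keys
# runner-registration-token and runner-token from this Secret.
store_runner_token() {
  secret_name="$1"          # e.g. gitlab-runner-secret (hypothetical name)
  token="$2"                # the glrt-... token from the GitLab console
  namespace="${3:-default}"
  if [ -z "$secret_name" ] || [ -z "$token" ]; then
    echo "usage: store_runner_token <secret-name> <token> [namespace]" >&2
    return 1
  fi
  # Render the Secret client-side and apply it, so re-running updates it in place.
  # runner-registration-token stays empty; only runner-token is used.
  kubectl create secret generic "$secret_name" \
    --namespace "$namespace" \
    --from-literal=runner-registration-token="" \
    --from-literal=runner-token="$token" \
    --dry-run=client -o yaml | kubectl apply -f -
}
```

After creating the Secret, leave `runnerToken` unset in values.yaml and pass `--set runners.secret=gitlab-runner-secret` (or the equivalent `runners.secret` key in values.yaml) when you install. Check this key against the values.yaml of chart 0.68.1 before you rely on it.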
Initialize values.yaml.

```
## GitLab Runner Image
##
## By default it's using registry.gitlab.com/gitlab-org/gitlab-runner:alpine-v{VERSION}
## where {VERSION} is taken from Chart.yaml from appVersion field
##
## ref: https://gitlab.com/gitlab-org/gitlab-runner/container_registry/29383?orderBy=NAME&sort=asc&search[]=alpine-v&search[]=
##
## Note: If you change the image to the ubuntu release
##       don't forget to change the securityContext;
##       these images run on different user IDs.
##
...

## The GitLab Server URL (with protocol) that want to register the runner against
## ref: https://docs.gitlab.com/runner/commands/index.html#gitlab-runner-register
##
gitlabUrl: https://jihulab.com/

## DEPRECATED: The Registration Token for adding new Runners to the GitLab Server.
##
## ref: https://docs.gitlab.com/ee/ci/runners/new_creation_workflow.html
##
# runnerRegistrationToken: ""

## The Runner Token for adding new Runners to the GitLab Server. This must
## be retrieved from your GitLab instance. It is the token of an already registered runner.
## ref: (we don't have docs for that yet, but we want to use an existing token)
##
## Replace with the runner token you obtained in the GitLab console
runnerToken: "glrt-t3_sz6xxxxxxxxxDsWF77"
#

## Unregister all runners before termination
##
## Updating the runner's chart version or configuration will cause the runner container
## to be terminated and created again. This may cause your GitLab instance to reference
## non-existent runners. Un-registering the runner before termination mitigates this issue.
## ref: https://docs.gitlab.com/runner/commands/index.html#gitlab-runner-unregister
##
unregisterRunners: true

## When stopping the runner, give it time to wait for its jobs to terminate.
##
## Updating the runner's chart version or configuration will cause the runner container
## to be terminated with a graceful stop request. terminationGracePeriodSeconds
## instructs Kubernetes to wait long enough for the runner pod to terminate gracefully.
## ref: https://docs.gitlab.com/runner/commands/#signals
terminationGracePeriodSeconds: 3600

## Set the certsSecretName in order to pass custom certificates for GitLab Runner to use.
## Provide resource name for a Kubernetes Secret Object in the same namespace,
## this is used to populate the /home/gitlab-runner/.gitlab-runner/certs/ directory
## ref: https://docs.gitlab.com/runner/configuration/tls-self-signed.html#supported-options-for-self-signed-certificates-targeting-the-gitlab-server
##
# certsSecretName:

## Configure the maximum number of concurrent jobs
## ref: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-global-section
##
concurrent: 10

...

## For RBAC support:
rbac:
  ## Specifies whether a Role and RoleBinding should be created
  ## If this value is set to `true`, `serviceAccount.create` should also be set to either `true` or `false`
  ##
  create: true
  ## Define the generated serviceAccountName when create is set to true
  ## It defaults to "gitlab-runner.fullname" if not provided
  ## DEPRECATED: Please use `serviceAccount.name` instead
  generatedServiceAccountName: ""
...

## Configuration for the Pods that the runner launches for each new job
##
runners:
  # runner configuration, where the multi line string is evaluated as a
  # template so you can specify helm values inside of it.
  #
  # tpl: https://helm.sh/docs/howto/charts_tips_and_tricks/#using-the-tpl-function
  # runner configuration: https://docs.gitlab.com/runner/configuration/advanced-configuration.html
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        image = "alpine"

  ## Absolute path for an existing runner configuration file
  ## Can be used alongside "volumes" and "volumeMounts" to use an external config file
  ## Active if runners.config is empty or null
  configPath: ""
...
```

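Configuration item	Description
gitlabUrl	The full URL of the JiHu GitLab server that the runner registers with, for example `https://gitlab.example.com`.
runnerToken	The token generated when you registered the runner in the previous step. The token is the identity of each runner. The Manager uses it to associate the Pods, Secrets, and other resources that it creates.
rbac	Whether to enable RBAC support. If you set `rbac.create: true`, the chart creates the related RBAC resources and ServiceAccount automatically.
concurrent	The maximum number of jobs that run concurrently. The default value is 10. The Manager Pod consumes some memory for each job it manages, so increase the resource specifications of the Manager Pod if you set a high concurrency.
unregisterRunners	Runs `unregister` when the Manager Pod exits. This setting is mainly intended for runners registered with runnerRegistrationToken. It prevents jobs from being orphaned when a new token is generated on every registration. For details, see the FAQ.
runners.config	The runner configuration, given as a multi-line string that serves as the runner's configuration template. Adjust the executor settings in it as needed.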

Install GitLab Runner in the ACS cluster.

```
helm install --namespace default gitlab-runner -f values.yaml --version 0.68.1 gitlab/gitlab-runner
```

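This topic later updates the same release with helm upgrade. Because of that, you may prefer a single idempotent command for both the first installation and later changes. The sketch below uses `helm upgrade --install` and then waits for the Manager Deployment to roll out. It assumes that the Deployment is named `gitlab-runner` after the release, as the Pod name in the next step suggests. The helper name `deploy_runner` is hypothetical.

```shell
# Hypothetical helper: install or upgrade the release, then wait for rollout.
deploy_runner() {
  namespace="${1:-default}"
  values="${2:-values.yaml}"
  # upgrade --install creates the release on the first run and upgrades it afterwards
  helm upgrade --install gitlab-runner gitlab/gitlab-runner \
    --namespace "$namespace" \
    --version 0.68.1 \
    -f "$values" || return 1
  # Block until the Manager Deployment is available, or fail after 5 minutes
  kubectl --namespace "$namespace" rollout status deployment/gitlab-runner --timeout=300s
}
```

For example, run `deploy_runner default values.yaml` after every change to values.yaml.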
Run the following command to check the status of GitLab Runner.
```
kubectl get pod | grep gitlab
```
Expected output:
```
gitlab-runner-7c5b4xxxxx-xxxxx   1/1     Running     0          5m17s
```
If the Pod is in the `Running` state, as in the preceding output, the installation is complete.

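A `Running` Pod does not by itself prove that the runner authenticated with GitLab. A wrong `gitlabUrl` or token typically shows up in the Manager logs. You can also check that the runner appears as online in the GitLab console. The sketch below just tails those logs. The helper name `runner_logs` is hypothetical.

```shell
# Hypothetical helper: print the most recent log lines of the Manager Pod.
runner_logs() {
  namespace="${1:-default}"
  lines="${2:-50}"   # number of lines to show
  # deployment/<name> lets kubectl pick a Pod of the Deployment for you
  kubectl --namespace "$namespace" logs deployment/gitlab-runner --tail="$lines"
}
```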
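Build images

This section combines several ACS features to build images in two ways: with Docker-in-Docker and with Kaniko. For the sample project, see Java demo.

Build images in Docker-in-Docker mode

Redefine the runner configuration in values.yaml.

You can also adjust the runner configuration by modifying the ConfigMap named gitlab-runner in the same namespace. However, we recommend updating the runner configuration with helm upgrade, because manual edits to the ConfigMap are overwritten by the next helm upgrade.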

```
...
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        image = "registry.cn-hangzhou.aliyuncs.com/acs-demo-ns/docker:27-dind"
        # dockerd needs privileged mode, which must be enabled for ACS Pods
        privileged = true
        # Resource values are strings in the runner configuration
        cpu_limit = "2"
        cpu_request = "2"
        memory_limit = "4Gi"
        memory_request = "4Gi"
        ephemeral_storage_request = "30Gi"
        ephemeral_storage_limit = "30Gi"
      # In-memory emptyDir that holds the dockerd TLS client certificates
      [[runners.kubernetes.volumes.empty_dir]]
        name = "docker-certs"
        mount_path = "/certs/client"
        medium = "Memory"
      [runners.feature_flags]
        FF_USE_POD_ACTIVE_DEADLINE_SECONDS = true
...
```

The configuration items are described as follows.

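Configuration item	Description
image	The image of the build container in Docker-in-Docker mode. This example uses the dind image provided by the Docker community.
privileged	Whether the runner Pod runs in privileged mode. Set it to true. Note: privileged mode must be enabled for ACS Pods. Submit a ticket to enable it.
cpu_request, cpu_limit	The CPU specification of the build container. By default, ACS provides a minimum of 0.25 vCPU. Adjust the value as needed.
memory_request, memory_limit	The memory specification of the build container. By default, ACS provides a minimum of 0.5 GiB. Adjust the value as needed.
ephemeral_storage_request, ephemeral_storage_limit	The size of the ephemeral storage. By default, ACS provides 30 GiB of ephemeral storage free of charge. When it creates an ACS Pod, ACS uses this space to cache images automatically, which speeds up the start of the next job. If you need more storage, increase it with these settings.
FF_USE_POD_ACTIVE_DEADLINE_SECONDS	Whether to enable the `activeDeadlineSeconds` feature flag. When enabled, the Pod's `activeDeadlineSeconds` is set to the job timeout. This prevents job Pods that lose contact for unknown reasons from remaining in the cluster.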

For more configuration items, see Kubernetes executor.

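Once a job from this runner is running, you can check that FF_USE_POD_ACTIVE_DEADLINE_SECONDS took effect by reading `activeDeadlineSeconds` from the job Pod. Job Pods created by the Kubernetes executor are typically named with a `runner-` prefix. The value should roughly match the job timeout configured in GitLab. The sketch below is hypothetical: you pass it the job Pod's name.

```shell
# Hypothetical helper: print activeDeadlineSeconds of a job Pod, or fail if unset.
job_pod_deadline() {
  pod="$1"
  namespace="${2:-default}"
  [ -n "$pod" ] || { echo "usage: job_pod_deadline <pod> [namespace]" >&2; return 1; }
  deadline=$(kubectl --namespace "$namespace" get pod "$pod" \
    -o jsonpath='{.spec.activeDeadlineSeconds}') || return 1
  if [ -z "$deadline" ]; then
    echo "activeDeadlineSeconds is not set on $pod" >&2
    return 1
  fi
  echo "$deadline"
}
```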
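Create the .gitlab-ci.yml file to configure the build process.

This example does not declare the dind container as a service container. Instead, it starts the dockerd process directly in the build container.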

```
image: registry.cn-hangzhou.aliyuncs.com/acs-demo-ns/docker:27-dind

stages:
  - build

variables:
  # dockerd runs inside the build container itself (started in before_script),
  # so the Docker CLI reaches it over TCP on localhost instead of the default
  # /var/run/docker.sock socket. Port 2376 is the TLS port.
  DOCKER_HOST: tcp://localhost:2376
  # Tell Docker where to create the TLS certificates. dockerd generates them on
  # startup and writes the client certificates to /certs/client, the emptyDir
  # volume mounted by the runner configuration (config.toml).
  DOCKER_TLS_CERTDIR: "/certs"
  # These are usually set by the image entrypoint, but the Kubernetes executor
  # doesn't run entrypoints:
  # https://gitlab.com/gitlab-org/gitlab-runner/-/issues/4125
  DOCKER_TLS_VERIFY: 1
  DOCKER_CERT_PATH: "$DOCKER_TLS_CERTDIR/client"

before_script:
  - echo "before task"
  # Start dockerd in the background and give it time to initialize
  - sh /usr/local/bin/dockerd-entrypoint.sh &
  - sleep 10s

build_image:
  stage: build
  tags:
    - demo
  script:
    - docker info
    # Assumes the project's GitLab container registry; replace the image name
    # and credentials if you push to another registry
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build --network host -t "$CI_REGISTRY_IMAGE:v1.0.0" -f Dockerfile .
    - docker push "$CI_REGISTRY_IMAGE:v1.0.0"
```

The steps are described as follows.

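Step	Description
before_script	Starts the dockerd process and waits for it to finish initializing.
build_image	The actual image build steps. Important: build with the host network mode (`--network host`) so that the build reuses the container's network to communicate with external networks.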

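The fixed `sleep 10s` in before_script is a guess. If dockerd starts more slowly, `docker info` fails, and if it starts faster, the job wastes time. A more robust option is to poll until the daemon answers, with an upper bound. The sketch below assumes that the job's DOCKER_HOST and TLS variables are set so that `docker info` reaches the local dockerd. The function name `wait_for_dockerd` is hypothetical.

```shell
# Hypothetical helper: wait until the dockerd started in before_script answers.
wait_for_dockerd() {
  timeout="${1:-60}"   # seconds to wait before giving up
  waited=0
  # docker info succeeds only once the daemon accepts connections
  until docker info >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "dockerd not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

You can keep this function in a script in the repository (for example a hypothetical `ci/wait-for-dockerd.sh`) and call it right after dockerd is started. You can also inline the `until` loop directly in `before_script` in place of `sleep 10s`.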
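Build images in unprivileged mode with Kaniko

Kaniko is an open-source tool that builds container images without a Docker daemon and without privileged mode. It is suitable when access to Docker is restricted or when images must be built inside Kubernetes.

Configure values.yaml.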

```
...
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        ephemeral_storage_request = "30Gi"
        ephemeral_storage_limit = "30Gi"
        # Lets jobs set Pod labels through KUBERNETES_POD_LABELS_* variables
        pod_labels_overwrite_allowed = ".*"
      [runners.feature_flags]
        FF_USE_POD_ACTIVE_DEADLINE_SECONDS = true
...
```

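Create the .gitlab-ci.yml file to configure the build process.

Important

The Kubernetes executor runs job scripts through a shell inside the build container, so the image must be able to run `sh`. This is why the debug version of the Kaniko executor image is used here.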

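```
stages:
  - build

variables:
  # Run this job on best-effort instances. These overrides require
  # pod_labels_overwrite_allowed in the runner configuration.
  KUBERNETES_POD_LABELS_1: "alibabacloud.com/compute-class=general-purpose"
  KUBERNETES_POD_LABELS_2: "alibabacloud.com/compute-qos=best-effort"

build_image:
  stage: build
  image:
    name: registry.cn-hangzhou.aliyuncs.com/acs-demo-ns/kaniko-executor:v1.21.0-amd64-debug
    # Clear the image entrypoint so the runner can start a shell
    entrypoint: [""]
  tags:
    - demo
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"
  rules:
    # The destination uses CI_COMMIT_TAG, which is empty unless the pipeline runs for a tag
    - if: $CI_COMMIT_TAG
```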

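To push to a registry, Kaniko needs credentials. It reads Docker-style credentials from `/kaniko/.docker/config.json`. The sketch below writes that file from a registry address, user name, and password. The helper name `write_kaniko_auth` is hypothetical. It relies only on `printf`, `base64`, and `tr`, which the debug image's shell provides.

```shell
# Hypothetical helper: write Kaniko's registry credentials file.
write_kaniko_auth() {
  registry="$1"
  user="$2"
  password="$3"
  config="${4:-/kaniko/.docker/config.json}"
  mkdir -p "$(dirname "$config")" || return 1
  # "auth" is base64("user:password") with line breaks removed
  auth=$(printf '%s:%s' "$user" "$password" | base64 | tr -d '\n')
  printf '{"auths":{"%s":{"auth":"%s"}}}\n' "$registry" "$auth" > "$config"
}
```

In `build_image`, you could run the same commands in `script` before `/kaniko/executor`. Pass GitLab's predefined `$CI_REGISTRY`, `$CI_REGISTRY_USER`, and `$CI_REGISTRY_PASSWORD` variables when you push to the project's container registry.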
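Reduce CI costs with ACS elasticity policies

You can use the best-effort instances provided by ACS in the following ways to further reduce the cost of jobs in your CI/CD pipelines.

Use best-effort instances globally or per project.

We recommend using the best-effort type as the global default and overriding it with other instance types only for special scenarios.

Configure the instance type globally in the runners configuration.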

```
...
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        ...
        # Allow projects to override Pod labels with variables
        pod_labels_overwrite_allowed = ".*"
        [runners.kubernetes.pod_labels]
          "app" = "acs-gitlab-runner"
          "alibabacloud.com/compute-class" = "general-purpose"
          # Use the best-effort type
          "alibabacloud.com/compute-qos" = "best-effort"
...
```

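You can also configure the instance type per repository in .gitlab-ci.yml, for example with the variables shown in "Build images in unprivileged mode with Kaniko". These overrides take effect only if the runner configuration sets pod_labels_overwrite_allowed to a regular expression that the label values match.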

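To confirm which instance type your job Pods actually request, you can list them with their ACS labels as extra columns. The sketch below selects Pods by the `app=acs-gitlab-runner` label from the global runner configuration above. The helper name is hypothetical.

```shell
# Hypothetical helper: show job Pods with their compute-class and compute-qos labels.
show_compute_labels() {
  namespace="${1:-default}"
  # -L adds a column with the value of each given label
  kubectl --namespace "$namespace" get pods \
    -l app=acs-gitlab-runner \
    -L alibabacloud.com/compute-class \
    -L alibabacloud.com/compute-qos
}
```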
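Configure a ResourcePolicy to make elastic scaling more predictable.

A ResourcePolicy lets you define a custom scheduling policy. With its cost-first policy, CI jobs use best-effort instances first. When this instance type is sold out in the region, scheduling automatically falls back to default instances, so that CI jobs remain available.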

```
apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: rp-demo
  namespace: default
spec:
  selector: # Pods with the label app=acs-gitlab-runner follow this scheduling policy
    app: acs-gitlab-runner
  units: # Units are tried in the order in which they are listed
  - resource: acs # First, request best-effort resources
    podLabels:
      alibabacloud.com/compute-class: general-purpose
      alibabacloud.com/compute-qos: best-effort
  - resource: acs # If best-effort inventory is insufficient, request default resources
    podLabels:
      alibabacloud.com/compute-class: general-purpose
      alibabacloud.com/compute-qos: default
```

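One way to apply this policy is to save the YAML to a file (the name `rp-demo.yaml` is hypothetical) and read it back from the cluster. The sketch below uses `kubectl get -f`, which looks up the objects defined in the file, so you do not need to know the CRD's plural resource name.

```shell
# Hypothetical helper: apply a ResourcePolicy manifest and print what the cluster stored.
apply_resource_policy() {
  file="${1:-rp-demo.yaml}"
  [ -f "$file" ] || { echo "file not found: $file" >&2; return 1; }
  kubectl apply -f "$file" || return 1
  # get -f resolves the objects (kind, name, namespace) declared in the file
  kubectl get -f "$file" -o yaml
}
```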
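FAQ

How do I handle leftover Pods in the cluster after the Manager Pod is recreated or restarted?

Cause

One possible cause is that the Manager Pod started with a different RunnerToken after it was restarted or recreated. The RunnerToken is the runner's unique identity. Once it changes, the Manager can no longer take over the existing job Pods.

Solution

Recreate the runner in the GitLab console and persist its RunnerToken in the configuration file used for the installation.
If the runner registers with a registrationToken at startup, mount a Secret and persist the token generated by the first registration in it. Each restart then reads the existing token first.
We recommend enabling the FF_USE_POD_ACTIVE_DEADLINE_SECONDS feature flag, which adds a TTL to each job Pod as a fallback cleanup mechanism.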