Build a CI/CD environment with GitLab Runner in an ACS cluster

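GitLab Runner is an open-source application written in Go. It is the agent that runs CI/CD jobs from GitLab. CI jobs are a typical time-bounded compute workload, which makes them a good fit for the on-demand usage and fast scaling of Container Compute Service (ACS). Running them on ACS simplifies capacity planning and lowers the overall cost of holding resources, and the elasticity of the cloud lets you run many more CI jobs concurrently. This topic describes production practices and recommended configurations for running GitLab Runner on ACS compute.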

Background

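GitLab Runner is an open-source project that executes jobs in GitLab CI/CD pipelines. As an external job execution framework, it provides a variety of executors, such as Shell and Kubernetes, so that you can run CI jobs in a private environment or in the cloud and report the results back to GitLab.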

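The core configuration of GitLab Runner has two parts: the configuration and initialization settings of the Manager Pod, and the configuration of the Kubernetes executor. For more information, see Configuring JiHu GitLab Runner.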
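As a rough illustration of how the two parts map onto the Helm chart, the skeleton below assumes the layout of the gitlab-runner chart values.yaml used later in this topic: top-level keys such as gitlabUrl, runnerToken, and concurrent configure the Manager Pod, while the TOML string under runners.config configures the Kubernetes executor that creates the job Pods. The URL and token are placeholders, not working values.

```yaml
# Part 1: Manager Pod settings (top-level chart values)
gitlabUrl: https://gitlab.example.com/   # placeholder GitLab URL
runnerToken: "glrt-xxxxxxxx"             # placeholder runner token
concurrent: 10                           # maximum number of jobs the Manager runs at once

# Part 2: Kubernetes executor settings
# (a TOML string rendered into the runner's config.toml)
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        image = "alpine"
```

The complete values.yaml used in this topic appears in the installation steps.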

Workflow

The following figure shows the overall workflow of this topic.

[Figure: overall workflow of this topic]

Prerequisites

You have connected to the Kubernetes cluster with kubectl. For details, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.

Installation

This topic uses gitlab-runner 17.3.1, which corresponds to chart version 0.68.1. For the general gitlab-runner installation procedure, see JiHu GitLab Runner Helm Chart.

  1. Obtain the GitLab Runner chart package.

    1. Add the GitLab Helm repository.

      helm repo add gitlab https://charts.gitlab.io
    2. Update the repository index.

      helm repo update gitlab
    3. Download and extract the gitlab-runner chart.

      helm pull gitlab/gitlab-runner --version 0.68.1 && tar zvxf gitlab-runner-0.68.1.tgz
  2. Register a runner.

    In the GitLab console, create a runner for your project or group to generate a runner token, and record the token.

  3. Initialize values.yaml.


    ## GitLab Runner Image
    ##
    ## By default it's using registry.gitlab.com/gitlab-org/gitlab-runner:alpine-v{VERSION}
    ## where {VERSION} is taken from Chart.yaml from appVersion field
    ##
    ## ref: https://gitlab.com/gitlab-org/gitlab-runner/container_registry/29383?orderBy=NAME&sort=asc&search[]=alpine-v&search[]=
    ##
    ## Note: If you change the image to the ubuntu release
    ##       don't forget to change the securityContext;
    ##       these images run on different user IDs.
    ##
    ...
    
    ## The GitLab Server URL (with protocol) that want to register the runner against
    ## ref: https://docs.gitlab.com/runner/commands/index.html#gitlab-runner-register
    ##
    gitlabUrl: https://jihulab.com/
    
    ## DEPRECATED: The Registration Token for adding new Runners to the GitLab Server.
    ##
    ## ref: https://docs.gitlab.com/ee/ci/runners/new_creation_workflow.html
    ##
    # runnerRegistrationToken: ""
    
    ## The Runner Token for adding new Runners to the GitLab Server. This must
    ## be retrieved from your GitLab instance. It is the token of an already registered runner.
    ## ref: (we don't have docs for that yet, but we want to use an existing token)
    ##
    runnerToken: "glrt-t3_sz6xxxxxxxxxDsWF77"
    #
    
    ## Unregister all runners before termination
    ##
    ## Updating the runner's chart version or configuration will cause the runner container
    ## to be terminated and created again. This may cause your Gitlab instance to reference
    ## non-existent runners. Un-registering the runner before termination mitigates this issue.
    ## ref: https://docs.gitlab.com/runner/commands/index.html#gitlab-runner-unregister
    ##
    unregisterRunners: true
    
    ## When stopping the runner, give it time to wait for its jobs to terminate.
    ##
    ## Updating the runner's chart version or configuration will cause the runner container
    ## to be terminated with a graceful stop request. terminationGracePeriodSeconds
    ## instructs Kubernetes to wait long enough for the runner pod to terminate gracefully.
    ## ref: https://docs.gitlab.com/runner/commands/#signals
    terminationGracePeriodSeconds: 3600
    
    ## Set the certsSecretName in order to pass custom certificates for GitLab Runner to use.
    ## Provide resource name for a Kubernetes Secret Object in the same namespace,
    ## this is used to populate the /home/gitlab-runner/.gitlab-runner/certs/ directory
    ## ref: https://docs.gitlab.com/runner/configuration/tls-self-signed.html#supported-options-for-self-signed-certificates-targeting-the-gitlab-server
    ##
    # certsSecretName:
    
    ## Configure the maximum number of concurrent jobs
    ## ref: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-global-section
    ##
    concurrent: 10
    
    ...
    
    ## For RBAC support:
    rbac:
      ## Specifies whether a Role and RoleBinding should be created
      ## If this value is set to `true`, `serviceAccount.create` should also be set to either `true` or `false`
      ##
      create: true
      ## Define the generated serviceAccountName when create is set to true
      ## It defaults to "gitlab-runner.fullname" if not provided
      ## DEPRECATED: Please use `serviceAccount.name` instead
      generatedServiceAccountName: ""
    ...
    
    ## Configuration for the Pods that the runner launches for each new job
    ##
    runners:
      # runner configuration, where the multi line string is evaluated as a
      # template so you can specify helm values inside of it.
      #
      # tpl: https://helm.sh/docs/howto/charts_tips_and_tricks/#using-the-tpl-function
      # runner configuration: https://docs.gitlab.com/runner/configuration/advanced-configuration.html
      config: |
        [[runners]]
          [runners.kubernetes]
            namespace = "{{.Release.Namespace}}"
            image = "alpine"
    
      ## Absolute path for an existing runner configuration file
      ## Can be used alongside "volumes" and "volumeMounts" to use an external config file
      ## Active if runners.config is empty or null
      configPath: ""
    ...

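    The parameters are described below.

    gitlabUrl: The full URL of the GitLab server that the runner registers with, for example https://gitlab.example.com.

    runnerToken: The token generated when you registered the runner in the previous step. The token identifies each runner; the Manager uses it to associate the Pods, Secrets, and other resources that it creates.

    rbac: Whether to enable RBAC support. When enabled, the related RBAC resources are created automatically.

      rbac:
        create: true

    concurrent: The maximum number of jobs that run concurrently. Default: 10. The Manager Pod consumes memory to manage jobs, so if you set a high concurrency, increase the resource specifications of the Manager Pod accordingly.

    unregisterRunners: Unregisters the runner when the Manager Pod exits. This setting is mainly intended for runners that are registered with runnerRegistrationToken, to prevent jobs from losing contact with the runner because a new token is generated on every restart. For details, see FAQ.

    runners.config: The runner configuration, given as a multi-line string that is rendered as a template. Adjust the executor settings here as needed.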

  4. Install GitLab Runner in the ACS cluster.

    helm install --namespace default gitlab-runner -f values.yaml --version 0.68.1 gitlab/gitlab-runner
  5. Run the following command to check the status of GitLab Runner.

    kubectl get pod | grep gitlab

    Expected output:

    gitlab-runner-7c5b4xxxxx-xxxxx   1/1     Running     0          5m17s

    If the Pod is in the Running state as shown, the installation is complete.
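If you raise concurrent, the Manager Pod needs more resources to track the additional jobs. A minimal sketch, assuming that the chart's top-level resources key sets the requests and limits of the Manager Pod (check the values.yaml of chart 0.68.1); the concurrency and resource figures are illustrative only, not measured recommendations.

```yaml
# Illustrative values only; size them from your own monitoring data.
concurrent: 50

# Resources of the Manager Pod itself, not of the job Pods it creates
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: "1"
    memory: 1Gi
```

Apply the change with helm upgrade, using the same release name, namespace, and chart version as the install command, for example: helm upgrade --namespace default gitlab-runner -f values.yaml --version 0.68.1 gitlab/gitlab-runner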

Build images

This section builds images in two ways, Docker-in-Docker and Kaniko, and uses several ACS features along the way. For the sample project, see Java demo.

Build images in Docker-in-Docker mode

  1. Redefine the runner configuration in values.yaml.

    You can also change the runner configuration by editing the ConfigMap named gitlab-runner in the same namespace, but we recommend updating the configuration with helm upgrade.
    ...
    runners:
      config: |
        [[runners]]
          [runners.kubernetes]
            namespace = "{{.Release.Namespace}}"
            image = "registry.cn-hangzhou.aliyuncs.com/acs-demo-ns/docker:27-dind"
            privileged = true
            # Resource quantities are strings in config.toml
            cpu_limit = "2"
            cpu_request = "2"
            memory_limit = "4Gi"
            memory_request = "4Gi"
            ephemeral_storage_request = "30Gi"
            ephemeral_storage_limit = "30Gi"
          # In-memory volume that holds the TLS client certificates generated by dockerd
          [[runners.kubernetes.volumes.empty_dir]]
            name = "docker-certs"
            mount_path = "/certs/client"
            medium = "Memory"
          # feature_flags is a table, so it uses single brackets
          [runners.feature_flags]
            FF_USE_POD_ACTIVE_DEADLINE_SECONDS = true
    ...

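    The parameters are described below.

    image: The image used by the build container in Docker-in-Docker mode. This example uses the dind image provided by the Docker community.

    privileged: Whether the Pods created by the runner run in privileged mode. Set it to true.

    Note

    Privileged mode must be enabled for your ACS Pods before you use this setting. Submit a ticket to enable it.

    cpu_request, cpu_limit: The CPU specification of the build container. The smallest specification that ACS offers is 0.25 vCPU. Adjust the value as needed.

    memory_request, memory_limit: The memory specification of the build container. The smallest specification that ACS offers is 0.5 GiB. Adjust the value as needed.

    ephemeral_storage_request, ephemeral_storage_limit: The ephemeral storage of the Pod. By default, ACS provides 30 GiB of storage free of charge and uses it to cache images when it creates an ACS Pod, which speeds up the start of the next job. If you need more storage, increase these values.

    FF_USE_POD_ACTIVE_DEADLINE_SECONDS: Whether to enable this feature flag. When it is enabled, the activeDeadlineSeconds of the job Pod is set to the job timeout, so job Pods that lose contact with the runner for unknown reasons do not remain in the cluster.

    For more parameters, see Kubernetes executor.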

  2. Create the .gitlab-ci.yml file to configure the build.

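    This example does not declare the dind container as a service. Instead, it starts the dockerd process directly in the build container. For details about the sample project, see Sample project.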

    image: registry.cn-hangzhou.aliyuncs.com/acs-demo-ns/docker:27-dind
    
    stages:
      - build
    
    variables:
      # dockerd runs inside the job container itself (started in before_script),
      # so the client reaches it over TCP on localhost instead of the default
      # /var/run/docker.sock socket. Port 2376 is the TLS port.
      DOCKER_HOST: tcp://localhost:2376
      # Directory in which dockerd generates its TLS certificates at startup.
      # The client certificates are written to /certs/client, which is the
      # emptyDir volume mounted by the runner configuration (config.toml).
      DOCKER_TLS_CERTDIR: "/certs"
      # These are usually specified by the entrypoint, however the
      # Kubernetes executor doesn't run entrypoints
      # https://gitlab.com/gitlab-org/gitlab-runner/-/issues/4125
      DOCKER_TLS_VERIFY: 1
      DOCKER_CERT_PATH: "$DOCKER_TLS_CERTDIR/client"
    
    before_script:
      - echo "before task"
      # Start dockerd in the background and give it time to initialize
      - sh /usr/local/bin/dockerd-entrypoint.sh &
      - sleep 10s
    
    build_image:
      stage: build
      tags:
        - demo   # must match a tag assigned to the runner
      script:
        - docker info
        - docker build --network host -t demo:v1.0.0 -f Dockerfile .
        # Replace demo:v1.0.0 with the full address of an image in your registry
        # and run docker login first; a bare name refers to Docker Hub.
        - docker push demo:v1.0.0

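    The steps are described below.

    before_script: Starts the dockerd process and waits for it to finish initializing.

    build_image: Runs the actual image build.

    Important

    The build must use host network mode (--network host) so that it reuses the network of the job container to communicate with external networks.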
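The fixed sleep 10s in before_script can be too short when dockerd starts slowly and wasteful when it starts quickly. The fragment below is a hedged alternative: it polls the daemon until it answers and sets an explicit job timeout, which, with FF_USE_POD_ACTIVE_DEADLINE_SECONDS enabled, also bounds the lifetime of the job Pod. The retry count, interval, and 1h timeout are assumptions to adjust; the fragment replaces only before_script and the build_image job of the file above.

```yaml
before_script:
  - sh /usr/local/bin/dockerd-entrypoint.sh &
  # Poll for up to about 60s (30 tries x 2s) until the daemon answers over TLS
  - |
    for i in $(seq 1 30); do
      docker info >/dev/null 2>&1 && break
      sleep 2
    done
  # Fails the job with a clear error if dockerd never became ready
  - docker info

build_image:
  stage: build
  # Job timeout; with the feature flag on, it also sets the Pod's activeDeadlineSeconds
  timeout: 1h
  tags:
    - demo
  script:
    # Same build and push commands as in the job above
    - docker build --network host -t demo:v1.0.0 -f Dockerfile .
    - docker push demo:v1.0.0
```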

Build images in non-privileged mode with Kaniko

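Kaniko is an open-source tool that builds container images without a Docker daemon and without privileged mode. It is suitable when access to Docker is restricted or when images must be built inside Kubernetes.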

  1. Configure values.yaml.

    ...
    runners:
      config: |
        [[runners]]
          [runners.kubernetes]
            namespace = "{{.Release.Namespace}}"
            ephemeral_storage_request = "30Gi"
            ephemeral_storage_limit = "30Gi"
            # Lets jobs set Pod labels through KUBERNETES_POD_LABELS_* variables
            pod_labels_overwrite_allowed = ".*"
          [runners.feature_flags]
            FF_USE_POD_ACTIVE_DEADLINE_SECONDS = true
    ...
  2. Create the .gitlab-ci.yml file to configure the build.

    Important

    GitLab CI runs job commands through a shell, so the base image must be able to run sh. This is why the debug version of the kaniko executor image is used here.

    stages:
      - build
    
    variables:
      # Run the job on a best-effort ACS instance (requires pod_labels_overwrite_allowed)
      KUBERNETES_POD_LABELS_1: "alibabacloud.com/compute-class=general-purpose"
      KUBERNETES_POD_LABELS_2: "alibabacloud.com/compute-qos=best-effort"
    
    build_image:
      stage: build
      image:
        name: registry.cn-hangzhou.aliyuncs.com/acs-demo-ns/kaniko-executor:v1.21.0-amd64-debug
        entrypoint: [""]
      tags:
        - demo
      script:
        - /kaniko/executor
          --context "${CI_PROJECT_DIR}"
          --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
          --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"
      # CI_COMMIT_TAG is set only in tag pipelines, so run this job only for tags
      rules:
        - if: $CI_COMMIT_TAG

Reduce CI costs with ACS elasticity policies

You can use ACS best-effort instances in the following ways to further reduce the cost of jobs in your CI/CD pipelines.

  1. Use best-effort instances globally or per project.

    We recommend using best-effort instances as the global default and overriding the instance type only for special cases.
    1. Configure it globally in the runners configuration.

      ...
      runners:
        config: |
          [[runners]]
            [runners.kubernetes]
              ...
              # Allow projects to override Pod labels through CI/CD variables
              pod_labels_overwrite_allowed = ".*"
            # pod_labels is a table, so it uses single brackets
            [runners.kubernetes.pod_labels]
              "app" = "acs-gitlab-runner"
              "alibabacloud.com/compute-class" = "general-purpose"
              "alibabacloud.com/compute-qos" = "best-effort" # request best-effort instances
      ...
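    2. You can also configure it per repository in .gitlab-ci.yml, for example with the variables used in Build images in non-privileged mode with Kaniko. This works only if the runner configuration sets pod_labels_overwrite_allowed.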

  2. Configure a ResourcePolicy to make elastic scaling more reliable.

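    A ResourcePolicy defines a custom scheduling policy. With its cost-first policy, CI jobs use best-effort instances first. When best-effort instances are sold out in the region, scheduling automatically falls back to default instances so that CI jobs keep running.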

    apiVersion: scheduling.alibabacloud.com/v1alpha1
    kind: ResourcePolicy
    metadata:
      name: rp-demo
      namespace: default
    spec:
      selector: # Pods with the label app=acs-gitlab-runner follow this scheduling policy
        app: acs-gitlab-runner
      units: # Units are tried in the order listed
      - resource: acs # First, request best-effort resources
        podLabels:
          alibabacloud.com/compute-class: general-purpose
          alibabacloud.com/compute-qos: best-effort
      - resource: acs # If best-effort stock is insufficient, request default resources
        podLabels:
          alibabacloud.com/compute-class: general-purpose
          alibabacloud.com/compute-qos: default
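For the special cases mentioned above, a single job can override the global best-effort default. The sketch below assumes that the runner configuration sets pod_labels_overwrite_allowed, as in the global configuration above. The job name release_build, the replacement app label value, and the script are hypothetical. If the ResourcePolicy above is also applied, it selects Pods by app: acs-gitlab-runner and may still try best-effort first, so the sketch also changes the app label to keep the Pod out of the policy's selector; verify this interaction in your cluster.

```yaml
release_build:            # hypothetical job name
  stage: build
  tags:
    - demo
  variables:
    # Override the global labels for this job only
    KUBERNETES_POD_LABELS_1: "alibabacloud.com/compute-class=general-purpose"
    KUBERNETES_POD_LABELS_2: "alibabacloud.com/compute-qos=default"
    # Hypothetical value: moves the Pod out of the ResourcePolicy selector
    KUBERNETES_POD_LABELS_3: "app=acs-gitlab-runner-default"
  script:
    - echo "runs on a default-QoS ACS instance"
```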

FAQ

When the Manager Pod is rebuilt or restarted, how do I handle leftover Pods in the cluster?

Cause

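One possible cause is that the Manager Pod was restarted or rebuilt with a different runner token than before. The runner token uniquely identifies a runner, so after it changes, the Manager can no longer take over the existing job Pods.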

Solution

  • Create the runner in the GitLab console and persist its runner token in the installation configuration file.

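  • If the runner is registered with a registration token during startup, mount a Secret and persist the token produced by the first registration in it, so that every restart reads the existing token first.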

  • Enable the FF_USE_POD_ACTIVE_DEADLINE_SECONDS feature flag to give each job Pod a TTL as a fallback cleanup mechanism.
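Keeping the runner token in a Secret can be sketched as follows. This assumes the gitlab-runner chart's runners.secret value, which names an existing Secret that contains the keys runner-token and runner-registration-token; verify this against the values.yaml of your chart version. The Secret name gitlab-runner-secret is hypothetical, and the token is a placeholder.

```yaml
# Secret holding the runner token (stringData avoids manual base64 encoding)
apiVersion: v1
kind: Secret
metadata:
  name: gitlab-runner-secret   # hypothetical name
  namespace: default           # same namespace as the Helm release
type: Opaque
stringData:
  runner-registration-token: ""          # left empty; the runner token is used instead
  runner-token: "glrt-xxxxxxxxxxxxxxxx"  # placeholder
```

After creating the Secret, set runners.secret to gitlab-runner-secret in values.yaml, remove runnerToken, and apply the change with helm upgrade. Every restart of the Manager Pod then reads the same token.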