Serverless Kubernetes offers pod-level elasticity: pods start within seconds, are billed per second, and can be created at a rate of 2,000 pods per minute. This makes it an increasingly popular choice for running Argo workflows. This topic describes how to run Argo workflows on Elastic Container Instance (ECI) by using an Alibaba Cloud Container Service for Kubernetes (ACK) cluster.
Set up Kubernetes and Argo
- Set up an Alibaba Cloud Serverless Kubernetes cluster.
  - (Recommended) Create an ACK Serverless cluster. For more information, see Create an ACK Serverless cluster.
  - Create an ACK managed cluster and deploy the ack-virtual-node controller to create virtual nodes. For more information, see Create an ACK managed cluster and Deploy the virtual node controller and use it to create Elastic Container Instance-based pods.
- Deploy Argo in the Kubernetes cluster.
  - (Recommended) Install the ack-workflow component. For more information, see Argo Workflows.
  - Deploy Argo manually. For more information, see the Argo Quick Start.
- Install the Argo CLI. For more information, see argo-workflows.
Optimize basic resource configurations
By default, after you deploy Argo, no resource requests or limits are specified for the pods of the two core components, argo-server and workflow-controller. Without requests or limits, Kubernetes assigns these pods the BestEffort Quality of Service (QoS) class, the lowest class, which makes them susceptible to Out of Memory (OOM) kills or pod evictions when resources are insufficient. We recommend that you adjust the resources for these two component pods based on your cluster's scale. As a starting point, set their requests or limits to 2 vCPU and 4 GiB of memory or higher.
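The recommendation above can be sketched as a Deployment fragment. This is a minimal illustration, not the component's shipped configuration: it assumes the container is named workflow-controller, as in the upstream Argo manifests, and the names may differ in an ack-workflow installation. The same resources section would apply to argo-server.

```yaml
# Fragment of a Deployment spec; only the resources section matters here.
# Assumption: the container is named "workflow-controller" (upstream default).
spec:
  template:
    spec:
      containers:
      - name: workflow-controller
        resources:
          requests:
            cpu: "2"      # 2 vCPU, the starting point suggested above
            memory: 4Gi
          limits:
            cpu: "2"
            memory: 4Gi
```

When every container in a pod has requests equal to limits for both CPU and memory, Kubernetes places the pod in the Guaranteed QoS class, which is the least likely to be evicted under resource pressure.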
Use OSS as an artifact repository
By default, Argo uses MinIO as its artifact repository. In a production environment, the stability of the artifact repository is critical. The ack-workflow component supports using Alibaba Cloud Object Storage Service (OSS) as a durable and reliable artifact repository. For instructions on how to configure OSS as your artifact repository, see Configuring Alibaba Cloud OSS.
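For orientation, an OSS artifact repository in upstream Argo Workflows is typically declared in the workflow controller ConfigMap, roughly as sketched below. The ConfigMap name and namespace follow the upstream defaults, and the endpoint, bucket, and Secret names are placeholders invented for this example. Follow the linked topic for the exact steps that apply to ack-workflow.

```yaml
# Sketch only. Assumptions: upstream ConfigMap name and namespace;
# placeholder endpoint, bucket, and Secret names.
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo
data:
  artifactRepository: |
    oss:
      endpoint: http://oss-cn-hangzhou-internal.aliyuncs.com
      bucket: my-argo-artifacts          # hypothetical bucket name
      accessKeySecret:                   # Secret key holding the AccessKey ID
        name: my-oss-credentials
        key: accessKey
      secretKeySecret:                   # Secret key holding the AccessKey secret
        name: my-oss-credentials
        key: secretKey
```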
After completing the configuration, use the following example to create a workflow and verify the setup.
- Save the following content as workflow-oss.yaml.

      apiVersion: argoproj.io/v1alpha1
      kind: Workflow
      metadata:
        generateName: artifact-passing-
      spec:
        entrypoint: artifact-example
        templates:
        - name: artifact-example
          steps:
          - - name: generate-artifact
              template: whalesay
          - - name: consume-artifact
              template: print-message
              arguments:
                artifacts:
                # bind message to the hello-art artifact
                # generated by the generate-artifact step
                - name: message
                  from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"
        - name: whalesay
          container:
            image: docker/whalesay:latest
            command: [sh, -c]
            args: ["cowsay hello world | tee /tmp/hello_world.txt"]
          outputs:
            artifacts:
            # generate hello-art artifact from /tmp/hello_world.txt
            # artifacts can be directories as well as files
            - name: hello-art
              path: /tmp/hello_world.txt
        - name: print-message
          inputs:
            artifacts:
            # unpack the message input artifact
            # and put it at /tmp/message
            - name: message
              path: /tmp/message
          container:
            image: alpine:latest
            command: [sh, -c]
            args: ["cat /tmp/message"]

- Create the workflow.

      argo -n argo submit workflow-oss.yaml

- View the execution result of the workflow.

      argo -n argo list

  Expected output:

      NAME                     STATUS      AGE   DURATION   PRIORITY   MESSAGE
      artifact-passing-2746t   Succeeded   3m    30s        0
Choose an executor
Each Argo worker pod contains at least two containers:
- The main container: This is your application container, where your business logic runs.
- The wait container: Argo injects this system component into the pod as a sidecar. Its core responsibilities are:
  - Startup phase
    - Load artifacts and inputs that the main container depends on.
  - Running phase
    - Wait for the main container to exit, then kill any associated sidecar containers.
    - Collect outputs and artifacts from the main container and report its status.
The executor acts as a bridge, allowing the wait container to access and control the main container. Argo abstracts this into the ContainerRuntimeExecutor interface, which defines the following operations:
- GetFileContents: Gets output parameters (outputs/parameters) from the main container.
- CopyFile: Gets output artifacts (outputs/artifacts) from the main container.
- GetOutputStream: Gets the standard output (including standard error) of the main container.
- Wait: Waits for the main container to exit.
- Kill: Kills associated sidecar containers.
- ListContainerNames: Lists the names of the containers within the pod.
Argo supports multiple executors with different underlying mechanisms, all designed for standard Kubernetes architectures. Because the architecture of Serverless Kubernetes differs from standard Kubernetes, you must choose a compatible executor. We recommend using the Emissary executor for running Argo in a Serverless Kubernetes environment. The following table details the available executors:
| Executor | Description |
| --- | --- |
| Emissary | Shares files through an emptyDir volume. This executor is recommended because it depends only on standard emptyDir volumes. |
| Kubernetes API | Uses the Kubernetes API to perform its functions. Because its functionality is incomplete and it can put pressure on the Kubernetes control plane at scale, it is not recommended. |
| PNS | Relies on process namespace (PID) sharing and requires privileged permissions. Serverless Kubernetes enforces a higher level of security isolation and does not support privileged containers. Therefore, this executor is incompatible. |
| Docker | Uses the Docker CLI to perform its functions, which requires direct access to the underlying Docker container runtime. Because Serverless Kubernetes does not expose underlying nodes, you cannot access the Docker daemon on the node. Therefore, this executor is incompatible. |
| Kubelet | Uses the Kubelet Client API to perform its functions, which requires access to the underlying Kubelet component on the node. Because Serverless Kubernetes does not expose underlying nodes, you cannot access the Kubelet component. Therefore, this executor is incompatible. |
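As a rough sketch, in Argo Workflows versions that still ship several executors (v3.3 and earlier), the executor is selected with the containerRuntimeExecutor key of the workflow controller ConfigMap. The ConfigMap name and namespace below are the upstream defaults and are assumptions for your installation. In v3.4 and later, the other executors were removed and Emissary is the only one, so no setting is needed.

```yaml
# Sketch for Argo Workflows v3.3 and earlier.
# Assumption: upstream ConfigMap name and namespace.
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo
data:
  containerRuntimeExecutor: emissary
```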
Schedule Argo tasks to ECI
An ACK Serverless cluster automatically schedules all pods to ECI, so no extra configuration is needed. For an ACK managed cluster, you must configure it to schedule pods to ECI. For more information, see Schedule pods to an x86-based virtual node.
The following YAML example demonstrates how to use a label for scheduling:
- Add the label alibabacloud.com/eci: "true": This label automatically schedules the pod to ECI.
- (Optional) Specify {"schedulerName": "eci-scheduler"}: This is recommended. During an upgrade or change of the virtual node, the admission webhook might be briefly unavailable. This setting prevents pods from being scheduled to regular nodes during this temporary unavailability.
    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: parallelism-limit1-
    spec:
      entrypoint: parallelism-limit1
      parallelism: 10
      podSpecPatch: '{"schedulerName": "eci-scheduler"}' # Schedule the pod to ECI.
      podMetadata:
        labels:
          alibabacloud.com/eci: "true" # Use a label to schedule the pod to ECI.
      templates:
      - name: parallelism-limit1
        steps:
        - - name: sleep
            template: sleep
            withSequence:
              start: "1"
              end: "10"
      - name: sleep
        container:
          image: alpine:latest
          command: ["sh", "-c", "sleep 30"]
Improve pod creation success rate
In a production environment, an Argo workflow often involves multiple compute pods. The failure of any single pod can cause the entire workflow to fail. If your workflow success rate is low, you may need to perform multiple reruns, which impacts execution efficiency and increases costs. Therefore, you should adopt strategies to improve the pod creation success rate:
- In your Argo workflow definition:
  - Configure a retry strategy (retryStrategy) so that failed steps are retried automatically.
- When creating ECI pods:
  - Configure multiple zones to prevent pod creation failures due to insufficient inventory in a single zone. For more information, see Deploy pods in multiple zones.
  - Specify multiple instance specifications to avoid creation failures due to insufficient inventory of a specific instance type. For more information, see Create pods by specifying multiple specifications.
  - Specify vCPU and memory requirements instead of a specific instance type. ECI automatically matches your request to an available instance specification based on current inventory.
  - Use instance specifications with 2 vCPU and 4 GiB of memory or more. These are enterprise-grade instances with dedicated resources, which ensures stable performance.
  - Configure a pod fault handling policy to define whether to retry pod creation upon failure. For more information, see Configure a fault handling policy for a pod.

The following is a sample configuration:
- Edit the eci-profile ConfigMap to configure multiple zones.

      kubectl edit -n kube-system cm eci-profile

  In the data section, configure vSwitchIds with the IDs of multiple vSwitches:

      data:
        # ...other configurations...
        vSwitchIds: vsw-2ze23nqzig8inprou****,vsw-2ze94pjtfuj9vaymf**** # Specify multiple vSwitch IDs to configure multiple zones.
        vpcId: vpc-2zeghwzptn5zii0w7****
        # ...other configurations...

- Use multiple strategies to improve the success rate when you create a pod.
  - Use the k8s.aliyun.com/eci-use-specs annotation to specify multiple instance specifications. In this example, three specifications are listed. ECI attempts to match them in order: ecs.c6.large, ecs.c5.large, and then 2-4Gi (2 vCPU, 4 GiB memory).
  - Use the k8s.aliyun.com/eci-schedule-strategy annotation to set the multi-zone scheduling strategy. This example uses VSwitchRandom, which schedules pods randomly across the configured zones.
  - Configure the retryStrategy to set the Argo retry policy. This example sets retryPolicy: "Always", which retries all failed steps.
  - Use the k8s.aliyun.com/eci-fail-strategy annotation to set the pod fault handling policy. This example uses fail-fast. If pod creation fails, the system immediately reports an error, and the pod status becomes ProviderFailed. The higher-level orchestration system can then decide whether to retry or schedule the pod to a regular node.

      apiVersion: argoproj.io/v1alpha1
      kind: Workflow
      metadata:
        generateName: parallelism-limit1-
      spec:
        entrypoint: parallelism-limit1
        parallelism: 10
        podSpecPatch: '{"schedulerName": "eci-scheduler"}'
        podMetadata:
          labels:
            alibabacloud.com/eci: "true"
          annotations:
            k8s.aliyun.com/eci-use-specs: "ecs.c6.large,ecs.c5.large,2-4Gi"
            k8s.aliyun.com/eci-schedule-strategy: "VSwitchRandom"
            k8s.aliyun.com/eci-fail-strategy: "fail-fast"
        templates:
        - name: parallelism-limit1
          steps:
          - - name: sleep
              template: sleep
              withSequence:
                start: "1"
                end: "10"
        - name: sleep
          retryStrategy:
            limit: "3"
            retryPolicy: "Always"
          container:
            image: alpine:latest
            command: [sh, -c, "sleep 30"]
Optimize pod costs
ECI supports multiple billing methods. By choosing the right billing method for your workload, you can significantly reduce your compute costs.
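As one hedged illustration, fault-tolerant Argo steps are a common fit for preemptible (spot) instances. The fragment below assumes the ECI spot annotations k8s.aliyun.com/eci-spot-strategy and k8s.aliyun.com/eci-spot-price-limit. Neither the annotations nor the values appear in this topic, so verify them against the ECI documentation before use. Because spot instances can be reclaimed, this approach is best combined with a retryStrategy.

```yaml
# Workflow spec fragment. Assumption: ECI spot annotations as named below;
# the price limit value is an arbitrary placeholder.
spec:
  podMetadata:
    labels:
      alibabacloud.com/eci: "true"
    annotations:
      k8s.aliyun.com/eci-spot-strategy: "SpotWithPriceLimit"  # cap the hourly price
      k8s.aliyun.com/eci-spot-price-limit: "0.5"              # hypothetical limit
```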
For more information on cost optimization methods, see the following topics:
Accelerate pod creation
A pod's startup time is often dominated by the image pull, a process dependent on image size and network speed. To accelerate pod creation, ECI provides an image cache feature. By creating an image cache from an image in advance, you can reduce or eliminate download time for subsequent pods that use the cache.
There are two types of image caches:
- Automatic creation: This feature is enabled by default in ECI. When you create an ECI pod, if an exact image cache is not found, ECI automatically creates one from the pod's image.
- Manual creation: You can manually create image caches by using a Custom Resource Definition (CRD).

We recommend that you manually create an image cache before you run high-concurrency Argo tasks. After the image cache is created, you can specify it in your pod definition and set the pod's image pull policy to IfNotPresent. This allows the pod to skip the image pull step during startup, accelerating pod creation, reducing the runtime of Argo tasks, and lowering operational costs. For more information, see Use ImageCache to accelerate the creation of pods.
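A manually created image cache might look like the sketch below. It assumes the ImageCache custom resource in the eci.alibabacloud.com/v1 API group with the fields shown. The resource name, cache size, and retention period are illustrative values, so check the linked topic for the authoritative schema.

```yaml
# Sketch only. Assumptions: ImageCache CRD schema as shown;
# name, size, and retention values are illustrative.
apiVersion: eci.alibabacloud.com/v1
kind: ImageCache
metadata:
  name: argo-imagecache-sample   # hypothetical name
spec:
  images:
  - alpine:latest                # image used by the workflow steps in this topic
  imageCacheSize: 20             # cache size in GiB
  retentionDays: 7               # days to keep the cache
```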
If you have already run the preceding examples, ECI has automatically created an image cache for you. You can log on to the Elastic Container Instance console to check the image cache status. You can use the following YAML to create a workflow that leverages the existing cache and test the pod startup speed.
    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: parallelism-limit1-
    spec:
      entrypoint: parallelism-limit1
      parallelism: 100
      podSpecPatch: '{"schedulerName": "eci-scheduler"}'
      podMetadata:
        labels:
          alibabacloud.com/eci: "true"
        annotations:
          k8s.aliyun.com/eci-use-specs: "ecs.c6.large,ecs.c5.large,2-4Gi"
          k8s.aliyun.com/eci-schedule-strategy: "VSwitchRandom"
          k8s.aliyun.com/eci-fail-strategy: "fail-fast"
      templates:
      - name: parallelism-limit1
        steps:
        - - name: sleep
            template: sleep
            withSequence:
              start: "1"
              end: "100"
      - name: sleep
        retryStrategy:
          limit: "3"
          retryPolicy: "Always"
        container:
          imagePullPolicy: IfNotPresent
          image: alpine:latest
          command: [sh, -c, "sleep 30"]
After the workflow is created, you can check the ECI pod's events to see the ID of the matched image cache. The events also show that the image pull process was skipped during pod startup.
Events:
Type Reason Age From Message
---- ------ --- ---- -------
Normal SuccessfulHitImageCache 2m22s EciService [eci.imagecache]Successfully hit image cache imc-2zeedp8bxor2kcxxx, eci will be scheduled with this image cache.
Normal Pulled 2m12s kubelet Container image "registry.cn-hangzhou.aliyuncs.com/acs/argoexec:v3.3-0d060b7" already present on machine
Normal Created 2m11s kubelet Created container init
Normal Started 2m11s kubelet Started container init
Normal Pulled 2m11s kubelet Container image "registry.cn-hangzhou.aliyuncs.com/acs/argoexec:v3.3-0d060b7" already present on machine
Normal Created 2m11s kubelet Created container wait
Normal Started 2m10s kubelet Started container wait
Normal Pulled 2m10s kubelet Container image "alpine:latest" already present on machine
Normal Created 2m10s kubelet Created container main
Normal Started 2m10s kubelet Started container main
Accelerate data loading
Argo is widely used in AI inference, where tasks often access large datasets. In compute-storage separation architectures, data loading efficiency directly impacts task duration and cost. Concurrent data access from many Argo tasks can create a storage bottleneck. For example, when concurrent Argo tasks load data from OSS and the OSS bucket's bandwidth is saturated, each compute node becomes blocked at the data loading stage. This increases task duration and compute costs while reducing efficiency.
Fluid, a data acceleration service, solves this problem. Before you run a batch computation, you create and preload a Fluid dataset, which pre-caches data from OSS onto a small number of cache nodes. When you then start your concurrent Argo tasks, they read data from the cache nodes instead of directly from OSS. The cache nodes effectively expand the available bandwidth from OSS, improving data loading efficiency for the compute nodes. This approach boosts Argo task performance and reduces running costs. For more information about Fluid, see Overview of Fluid.
The following example demonstrates how to use Fluid to load a 10 GB test file from OSS and calculate its MD5 hash across 100 concurrent tasks.
- Deploy Fluid.
  - Log on to the ACK console.
  - In the left-side navigation pane, choose Marketplace > Marketplace.
  - Find and click the ack-fluid card.
  - On the ack-fluid page, click Deploy.
  - In the panel that appears, select your target cluster, configure the parameters, and click OK.

    After the deployment is complete, you are redirected to the release details page for ack-fluid. If you return to the Helm page, you can see that the status of ack-fluid is Deployed. You can also run a kubectl command to verify that Fluid was deployed successfully.

        $ kubectl get pod -n fluid-system
        NAME                                  READY   STATUS    RESTARTS   AGE
        dataset-controller-6f9967d766-pm22l   1/1     Running   0          5m18s
        fluid-webhook-5777b78c-8mt4h          1/1     Running   0          5m18s
- Prepare test data.

  After deploying Fluid, use a Fluid dataset to accelerate data access. Before you proceed, upload a 10 GB test file to your OSS bucket.
  - Generate a test file.

        dd if=/dev/zero of=/test.dat bs=1G count=10

  - Upload the test file to your OSS bucket. For more information, see Simple upload.
- Create an accelerated dataset.
  - Create the Dataset and JindoRuntime resources.

        kubectl -n argo apply -f dataset.yaml

    The following is an example of the dataset.yaml file. Replace the placeholder AccessKey and OSS bucket information with your values.

        apiVersion: v1
        kind: Secret
        metadata:
          name: access-key
        stringData:
          fs.oss.accessKeyId: *************** # The AccessKey ID that has permissions to access the OSS bucket.
          fs.oss.accessKeySecret: ****************** # The AccessKey secret that has permissions to access the OSS bucket.
        ---
        apiVersion: data.fluid.io/v1alpha1
        kind: Dataset
        metadata:
          name: serverless-data
        spec:
          mounts:
          - mountPoint: oss://oss-bucket-name/ # The path to your OSS bucket.
            name: demo
            path: /
            options:
              fs.oss.endpoint: oss-cn-shanghai-internal.aliyuncs.com # The endpoint of the OSS bucket.
            encryptOptions:
            - name: fs.oss.accessKeyId
              valueFrom:
                secretKeyRef:
                  name: access-key
                  key: fs.oss.accessKeyId
            - name: fs.oss.accessKeySecret
              valueFrom:
                secretKeyRef:
                  name: access-key
                  key: fs.oss.accessKeySecret
        ---
        apiVersion: data.fluid.io/v1alpha1
        kind: JindoRuntime
        metadata:
          name: serverless-data
        spec:
          replicas: 10 # The number of JindoRuntime cache nodes to create.
          podMetadata:
            annotations:
              k8s.aliyun.com/eci-use-specs: ecs.g6.2xlarge # Specify a suitable instance specification.
              k8s.aliyun.com/eci-image-cache: "true"
            labels:
              alibabacloud.com/eci: "true"
          worker:
            podMetadata:
              annotations:
                k8s.aliyun.com/eci-use-specs: ecs.g6.2xlarge # Specify a suitable instance specification.
          tieredstore:
            levels:
            - mediumtype: MEM # Cache medium type. Use MEM for memory, or LoadRaid0 if you specify an instance with local disks.
              volumeType: emptyDir
              path: /local-storage # Cache path.
              quota: 12Gi # Maximum cache capacity.
              high: "0.99" # High watermark for storage capacity.
              low: "0.99" # Low watermark for storage capacity.

    Note: This example uses the memory of ECI pods as data cache nodes. Because each ECI pod has a dedicated VPC network interface, its bandwidth is not affected by other pods.

  - View the results.
    - Check the status of the dataset. A PHASE of Bound indicates successful creation.

          kubectl -n argo get dataset

      Expected output:

          NAME              UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
          serverless-data   10.00GiB         0.00B    24.00GiB         0.0%                Bound   92s

    - Check the pod information. You can see that 10 JindoRuntime cache nodes have been created by the dataset.

          kubectl -n argo get pods

      Expected output:

          NAME                               READY   STATUS    RESTARTS   AGE
          ack-workflow-ddd86b88c-r8fcj       1/1     Running   0          100m
          argo-server-84d69d65dd-1f2hj       1/1     Running   0          100m
          serverless-data-jindofs-master-0   1/1     Running   0          10m
          serverless-data-jindofs-worker-0   1/1     Running   0          9m20s
          serverless-data-jindofs-worker-1   1/1     Running   0          9m19s
          serverless-data-jindofs-worker-2   1/1     Running   0          9m19s
          serverless-data-jindofs-worker-3   1/1     Running   0          9m19s
          serverless-data-jindofs-worker-4   1/1     Running   0          9m19s
          serverless-data-jindofs-worker-5   1/1     Running   0          9m19s
          serverless-data-jindofs-worker-6   1/1     Running   0          9m19s
          serverless-data-jindofs-worker-7   1/1     Running   0          9m19s
          serverless-data-jindofs-worker-8   1/1     Running   0          9m19s
          serverless-data-jindofs-worker-9   1/1     Running   0          9m19s
- Preload the data.

  After the dataset is ready, create a DataLoad resource to trigger data preloading.
  - Create the DataLoad resource.

        kubectl -n argo apply -f dataload.yaml

    The following is an example of the dataload.yaml file:

        apiVersion: data.fluid.io/v1alpha1
        kind: DataLoad
        metadata:
          name: serverless-data-warmup
          namespace: argo
        spec:
          dataset:
            name: serverless-data
            namespace: argo
          loadMetadata: true

  - Check the progress of the DataLoad operation.

        kubectl -n argo get dataload

    The expected output shows that even though the test file is 10 GB, the data preloading process is very fast.

        NAME                     DATASET           PHASE      AGE   DURATION
        serverless-data-warmup   serverless-data   Complete   30s   14s
- Run the Argo workflow.

  After the data is preloaded, you can run the concurrent Argo tasks. For best results, combine this approach with an image cache.
  - Prepare the Argo workflow configuration file, argo-test.yaml.

    The following is an example of the argo-test.yaml file:

        apiVersion: argoproj.io/v1alpha1
        kind: Workflow
        metadata:
          generateName: parallelism-fluid-
        spec:
          entrypoint: parallelism-fluid
          parallelism: 100
          podSpecPatch: '{"terminationGracePeriodSeconds": 0, "schedulerName": "eci-scheduler"}'
          podMetadata:
            labels:
              alibabacloud.com/fluid-sidecar-target: eci
              alibabacloud.com/eci: "true"
            annotations:
              k8s.aliyun.com/eci-use-specs: 8-16Gi
          templates:
          - name: parallelism-fluid
            steps:
            - - name: domd5sum
                template: md5sum
                withSequence:
                  start: "1"
                  end: "100"
          - name: md5sum
            container:
              imagePullPolicy: IfNotPresent
              image: alpine:latest
              command: ["sh", "-c", "cp /data/test.dat /test.dat && md5sum test.dat"]
              volumeMounts:
              - name: data-vol
                mountPath: /data
            volumes:
            - name: data-vol
              persistentVolumeClaim:
                claimName: serverless-data

  - Create the workflow.

        argo -n argo submit argo-test.yaml

  - View the execution result of the workflow.

        argo -n argo list

    Expected output:

        NAME                      STATUS    AGE   DURATION   PRIORITY   MESSAGE
        parallelism-fluid-56g2q   Running   8s    8s         0

    You can use the kubectl get pod -n argo --watch command to monitor the pod execution progress. In this example, the 100 Argo tasks are completed in about 2 to 4 minutes.

        parallelism-fluid-56g2q-412240702    0/3   Completed   0   3m17s
        parallelism-fluid-56g2q-563802762    0/3   Completed   0   3m19s
        parallelism-fluid-56g2q-693297214    0/3   Completed   0   3m17s
        parallelism-fluid-56g2q-615226358    0/3   Completed   0   3m18s
        parallelism-fluid-56g2q-982629280    0/3   Completed   0   3m20s
        parallelism-fluid-56g2q-918428816    0/3   Completed   0   3m16s
        parallelism-fluid-56g2q-3815880026   0/3   Completed   0   3m18s
        parallelism-fluid-56g2q-2992875428   0/3   Completed   0   3m19s
        parallelism-fluid-56g2q-3800105418   0/3   Completed   0   3m19s
        parallelism-fluid-56g2q-1897482410   0/3   Completed   0   3m17s

    In contrast, running the same Argo tasks without data acceleration (loading the 10 GB test file directly from OSS) takes about 14 to 15 minutes to calculate the MD5 hash.

        parallelism-fluid-fdr2j-2392572892   0/2   Completed   0   14m
        parallelism-fluid-fdr2j-1295033972   0/2   Completed   0   14m
        parallelism-fluid-fdr2j-2462229879   0/2   Completed   0   14m
        parallelism-fluid-fdr2j-4192350503   0/2   Completed   0   14m
        parallelism-fluid-fdr2j-4157125527   0/2   Completed   0   14m
        parallelism-fluid-fdr2j-4173654052   0/2   Completed   0   14m
        parallelism-fluid-fdr2j-1270167245   0/2   Completed   0   14m
        parallelism-fluid-fdr2j-1595813521   0/2   Completed   0   14m
        parallelism-fluid-fdr2j-1829788936   0/2   Completed   0   14m

    This comparison shows that Fluid improves computing efficiency and significantly reduces costs.