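This topic provides usage examples for the AGS command-line tool.
Prerequisites
- You have created a Kubernetes cluster. See Create a managed Kubernetes cluster.
- You have connected to the Kubernetes cluster. See Connect to a Kubernetes cluster by using kubectl.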
Log
List
Use the --limit parameter to set how many Workflow entries are listed.
[root@iZwz92q9h36kv8posr0i6uZ ~]# ags remote list --limit 8
+-----------------------+-------------------------------+------------+
| JOB NAME | CREATE TIME | JOB STATUS |
+-----------------------+-------------------------------+------------+
| merge-6qk46 | 2020-09-02 16:52:34 +0000 UTC | Pending |
| rna-mapping-gpu-ck4cl | 2020-09-02 14:47:57 +0000 UTC | Succeeded |
| wgs-gpu-n5f5s | 2020-09-02 13:14:14 +0000 UTC | Running |
| merge-5zjhv | 2020-09-02 12:03:11 +0000 UTC | Succeeded |
| merge-jjcw4 | 2020-09-02 10:44:51 +0000 UTC | Succeeded |
| wgs-gpu-nvxr2 | 2020-09-01 22:18:44 +0000 UTC | Succeeded |
| merge-4vg42 | 2020-09-01 20:52:13 +0000 UTC | Succeeded |
| rna-mapping-gpu-2ss6n | 2020-09-01 20:34:45 +0000 UTC | Succeeded |
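If the listing needs to be processed by a script, the pipe-delimited table can be parsed with a few lines of Python. The following is a minimal sketch that assumes the three-column layout shown above; the helper name parse_job_table and the shortened SAMPLE text are illustrative and are not part of AGS.

```python
SAMPLE = """\
+-----------------------+-------------------------------+------------+
| JOB NAME | CREATE TIME | JOB STATUS |
+-----------------------+-------------------------------+------------+
| merge-6qk46 | 2020-09-02 16:52:34 +0000 UTC | Pending |
| rna-mapping-gpu-ck4cl | 2020-09-02 14:47:57 +0000 UTC | Succeeded |
| wgs-gpu-n5f5s | 2020-09-02 13:14:14 +0000 UTC | Running |
"""


def parse_job_table(text):
    """Parse the table printed by `ags remote list` into a list of dicts.

    Hypothetical helper: it assumes exactly three columns
    (job name, create time, job status), as in the sample above.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines such as +-----+
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "JOB NAME":
            continue  # skip the header row and anything unexpected
        rows.append({"name": cells[0], "created": cells[1], "status": cells[2]})
    return rows


if __name__ == "__main__":
    # Print only the jobs that are still running.
    for job in parse_job_table(SAMPLE):
        if job["status"] == "Running":
            print(job["name"], job["created"])
```

In practice the text would come from the captured output of ags remote list rather than from a constant.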
Integrated kubectl commands
[root@iZwz92q9h36kv8posr0i6uZ ~]# ags get test-v2
Name: test-v2
Namespace: default
ServiceAccount: default
Status: Running
Created: Thu Nov 22 11:06:52 +0800 (2 minutes ago)
Started: Thu Nov 22 11:06:52 +0800 (2 minutes ago)
Duration: 2 minutes 46 seconds
STEP PODNAME DURATION MESSAGE
● test-v2
└---● bcl2fq test-v2-2716811808 2m
[root@iZwz92q9h36kv8posr0i6uZ ~]# ags kubectl describe pod test-v2-2716811808
Name: test-v2-2716811808
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: cn-shenzhen.i-wz9gwobtqrbjgfnqxl1k/192.168.0.94
Start Time: Thu, 22 Nov 2018 11:06:52 +0800
Labels: workflows.argoproj.io/completed=false
workflows.argoproj.io/workflow=test-v2
Annotations: workflows.argoproj.io/node-name=test-v2[0].bcl2fq
workflows.argoproj.io/template={"name":"bcl2fq","inputs":{},"outputs":{},"metadata":{},"container":{"name":"main","image":"registry.cn-hangzhou.aliyuncs.com/dahu/curl-jp:1.2","command":["sh","-c"],"ar...
Status: Running
IP: 172.16.*.***
Controlled By: Workflow/test-v2
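The ags kubectl command shows the Pod status information reported by describe pod. AGS supports all native kubectl commands.
Integrated ossutil commands
After AGS is initialized, you can use the following commands to upload and list files.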
[root@iZwz92q9h36kv8posr0i6uZ ~]# ags oss cp test.fq.gz oss://my-test-shenzhen/fasq/
Succeed: Total num: 1, size: 690. OK num: 1(upload 1 files).
average speed 3000(byte/s)
0.210685(s) elapsed
[root@iZwz92q9h36kv8posr0i6uZ ~]# ags oss ls oss://my-test-shenzhen/fasq/
LastModifiedTime Size(B) StorageClass ETAG ObjectName
2020-09-02 17:20:34 +0800 CST 690 Standard 9FDB86F70C6211B2EAF95A9B06B14F7E oss://my-test-shenzhen/fasq/test.fq.gz
Object Number is: 1
0.117591(s) elapsed
The ags oss command lets you upload and download files, among other operations. AGS supports all native ossutil commands.
View Workflow resource usage
securityContext support
Create a file named arguments-security-context.yaml, copy the following content into it, and run the ags submit arguments-security-context.yaml command to bind the corresponding PSP (PodSecurityPolicy) for permission control.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: test
spec:
  arguments: {}
  entrypoint: test-security-
  templates:
  - inputs: {}
    metadata: {}
    name: test-security-
    outputs: {}
    parallelism: 1
    steps:
    - - arguments: {}
        name: bcl2fq
        template: bcl2fq
  - container:
      args:
      - id > /tmp/yyy;echo `date` > /tmp/aaa;ps -e -o comm,euid,fuid,ruid,suid,egid,fgid,gid,rgid,sgid,supgid
        > /tmp/ppp;ls -l /tmp/aaa;sleep 100;pwd
      command:
      - sh
      - -c
      image: registry.cn-hangzhou.aliyuncs.com/dahu/curl-jp:1.2
      name: main
      resources: #don't use too much resources
        requests:
          memory: 320Mi
          cpu: 1000m
    inputs: {}
    metadata: {}
    name: bcl2fq
    outputs: {}
    securityContext:
      runAsUser: 800
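Define automatic retries in YAML
Bash commands sometimes fail for unknown reasons and succeed when they are simply run again. AGS provides a YAML-based automatic retry mechanism: when a command in a Pod fails, the Pod is automatically started again, and you can set the maximum number of retries.
Create a file named arguments-auto-retry.yaml, copy the following content into it, and run the ags submit arguments-auto-retry.yaml command to configure automatic retries for the Workflow.
# This example demonstrates the use of retries for a single container.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: retry-container-
spec:
  entrypoint: retry-container
  templates:
  - name: retry-container
    retryStrategy:
      limit: 10
    container:
      image: python:alpine3.6
      command: ["python", -c]
      # fail with a 66% probability
      args: ["import random; import sys; exit_code = random.choice([0, 1, 1]); sys.exit(exit_code)"]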
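To get a feel for what limit: 10 means here, the retry loop can be modeled in plain Python. This is only a sketch of the semantics, not the controller's implementation. It assumes that the limit counts retries after the first attempt (so up to 11 attempts in total) and that attempts fail independently with probability 2/3, as random.choice([0, 1, 1]) suggests. The function names are illustrative.

```python
from fractions import Fraction


def run_with_retries(task, retry_limit):
    """Call task() until it returns exit code 0 or the retries are used up.

    Assumption: retry_limit counts retries AFTER the first attempt,
    so at most retry_limit + 1 attempts are made.
    Returns (last_exit_code, attempts_made).
    """
    attempts = 0
    while True:
        attempts += 1
        exit_code = task()
        if exit_code == 0 or attempts > retry_limit:
            return exit_code, attempts


def prob_all_attempts_fail(p_fail, retry_limit):
    """Probability that the first attempt and every retry fail (independent attempts)."""
    return p_fail ** (retry_limit + 1)


if __name__ == "__main__":
    # A task that fails twice and then succeeds.
    codes = iter([1, 1, 0])
    print(run_with_retries(lambda: next(codes), retry_limit=10))

    # Chance that the example Workflow still fails after all retries.
    p = prob_all_attempts_fail(Fraction(2, 3), 10)
    print(p, float(p))
```

Under these assumptions the container in the example ends in failure only when all 11 attempts fail, which happens with probability (2/3)^11 = 2048/177147, roughly 1.2%.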
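Retry an entire Workflow from the most recent failed step
During a Workflow run, a step in the job sometimes fails. In that case you may want to retry the Workflow from the failed node, much like resuming an interrupted transfer from a breakpoint.
Run a Workflow on ECI
For details about ECI, see Elastic Container Instance (ECI).
Before you configure ECI, install AGS. See Download and install AGS.
View the actual resource usage and peak usage of a Workflow
The ags workflow controller automatically obtains the actual per-minute resource usage of each Pod through metrics-server, and computes the total usage as well as the peak usage of each Pod.
Run the ags get steps-jr6tw --metrics command to view the actual resource usage and peak usage of the Workflow.
➜ ags get steps-jr6tw --metrics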
Name: steps-jr6tw
Namespace: default
ServiceAccount: default
Status: Succeeded
Created: Tue Apr 16 16:52:36 +0800 (21 hours ago)
Started: Tue Apr 16 16:52:36 +0800 (21 hours ago)
Finished: Tue Apr 16 19:39:18 +0800 (18 hours ago)
Duration: 2 hours 46 minutes
Total CPU: 0.00275 (core*hour)
Total Memory: 0.04528 (GB*hour)
STEP PODNAME DURATION MESSAGE CPU(core*hour) MEMORY(GB*hour) MaxCpu(core) MaxMemory(GB)
✔ steps-jr6tw 0 0 0 0
└---✔ hello1 steps-jr6tw-2987978173 2h 0.00275 0.04528 0.000005 0.00028
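The units in this output (core*hour, GB*hour, and per-Pod maxima) follow from aggregating per-minute samples. The sketch below shows one way such figures could be derived from a list of one-minute samples. It illustrates the arithmetic only and is not the controller's actual code; the function name summarize_usage is hypothetical.

```python
def summarize_usage(samples_per_minute):
    """Aggregate one-minute usage samples for a single Pod.

    samples_per_minute: usage values (cores, or GB for memory),
    one value per minute of runtime.
    Returns (total, peak), where total is in unit*hour
    (each sample covers 1/60 of an hour) and peak is the largest sample.
    """
    if not samples_per_minute:
        return 0.0, 0.0
    total = sum(samples_per_minute) / 60.0
    peak = max(samples_per_minute)
    return total, peak


if __name__ == "__main__":
    # A Pod that uses a steady 0.5 core for two hours.
    cpu_samples = [0.5] * 120
    total, peak = summarize_usage(cpu_samples)
    print(f"Total CPU: {total} (core*hour), MaxCpu: {peak} (core)")
```

The Workflow-level totals would then be the sum of the per-Pod totals.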
Set Workflow priority
When some jobs are already running and an urgent job needs to run, you can assign high, medium, or low priority to Workflows. High-priority jobs preempt the resources of low-priority jobs.
- You can set a high priority for a Pod. For example:
Create a file named arguments-high-priority-taskA.yaml, copy the following content into it, and run the ags submit arguments-high-priority-taskA.yaml command to set a high priority for task A.
apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
- You can set a medium priority for a Pod. For example:
Create a file named arguments-high-priority-taskB.yaml, copy the following content into it, and run the ags submit arguments-high-priority-taskB.yaml command to set a medium priority for task B.
apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: medium-priority
value: 100
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
- You can also set a high priority for an entire Workflow. For example:
Create a file named arguments-high-priority-Workflow.yaml, copy the following content into it, and run the ags submit arguments-high-priority-Workflow.yaml command to set a high priority for all Pods in the Workflow.
apiVersion: argoproj.io/v1alpha1
kind: Workflow                          # new type of k8s spec
metadata:
  generateName: high-proty-             # name of the workflow spec
spec:
  entrypoint: whalesay                  # invoke the whalesay template
  podPriorityClassName: high-priority   # workflow level priority
  templates:
  - name: whalesay                      # name of the template
    container:
      image: ubuntu
      command: ["/bin/bash", "-c", "sleep 1000"]
      resources:
        requests:
          cpu: 3
Consider a Workflow that contains two Pods, one with medium priority and the other with high priority. In this case, the high-priority Pod can preempt the resources of the lower-priority Pod.
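The effect of the priority values can be pictured with a toy model. The sketch below is a deliberate simplification: it only orders Pods by priority and admits them while CPU is available. The real Kubernetes scheduler also evicts running Pods and takes many other factors into account. The Pod names, the function admit_by_priority, and the 4-core capacity are assumptions made for illustration. The priority values 1000000 and 100 are taken from the PriorityClass definitions above.

```python
def admit_by_priority(pods, cpu_capacity):
    """Toy model: admit Pods in descending priority order until CPU runs out.

    pods: list of dicts with "name", "priority", and "cpu" (requested cores).
    Returns (admitted_names, pending_names).
    """
    admitted, pending = [], []
    free = cpu_capacity
    for pod in sorted(pods, key=lambda p: p["priority"], reverse=True):
        if pod["cpu"] <= free:
            admitted.append(pod["name"])
            free -= pod["cpu"]
        else:
            pending.append(pod["name"])
    return admitted, pending


if __name__ == "__main__":
    pods = [
        {"name": "task-b", "priority": 100, "cpu": 3},      # medium-priority
        {"name": "task-a", "priority": 1000000, "cpu": 3},  # high-priority
    ]
    # Hypothetical node with 4 free cores: only one 3-core Pod fits.
    admitted, pending = admit_by_priority(pods, cpu_capacity=4)
    print("admitted:", admitted, "pending:", pending)
```

With only enough CPU for one of the two Pods, the high-priority task gets the resources and the medium-priority task waits.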
Workflow Filter
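With ags get workflow, you can use a filter to list only the Pods in a specified state, which is useful for large Workflows.
Use Autoscaler with the Agility Edition
- You have a VPC.
- You have a vSwitch.
- You have configured a security group.
- You have obtained the internal APIServer address of the Agility Edition cluster.
- You have decided on the instance types of the nodes to be added.
- You have created an ECS instance that has Internet access.
$ ags config autoscaler
Enter the corresponding values as prompted:
Please input vswitchs with comma separated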
vsw-hp3cq3fnv47bpz7x58wfe
Please input security group id
sg-hp30vp05x6tlx13my0qu
Please input the instanceTypes with comma separated
ecs.c5.xlarge
Please input the new ecs ssh password
xxxxxxxx
Please input k8s cluster APIServer address like(192.168.1.100)
172.24.61.156
Please input the autoscaling mode (current: release. Type enter to skip.)
Please input the min size of group (current: 0. Type enter to skip.)
Please input the max size of group (current: 1000. Type enter to skip.)
Create scaling group successfully.
Create scaling group config successfully.
Enable scaling group successfully.
Succeed
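After the configuration is complete, log on to the Auto Scaling console to view the scaling group that was created.
Configure and use the ags configmap
In this example, hostNetwork is used by default.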