Best practices for 3ds Max DAG jobs

1. Preparation

1.1. Choose a region

All Alibaba Cloud services used here must be in the same region.

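1.2. Activate services

1.3. Create an image

For the detailed steps, see the cluster image documentation, and follow that document's steps exactly when creating the image. After the image is created, you can obtain the corresponding image information as shown below.

[Figure: image]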

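1.4. Upload assets

For testing, you can download the free asset pack officially provided for 3ds Max.

Use the OSSBrowser tool to upload the render assets to the target OSS bucket, as shown in the figure below:

[Figure: upload]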

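1.5. Install the BatchCompute SDK

On the machine that will submit jobs, install the BatchCompute SDK library; skip this step if it is already installed. On Linux, run the following command; on Windows, refer to the documentation.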

pip install batchcompute
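
As a quick sanity check after installation, the following minimal sketch tests whether the batchcompute module can be found by the current interpreter. The helper name sdk_installed is hypothetical, and the snippet assumes Python 3 (it relies on importlib.util); it only checks that the module is importable, not that your credentials work.

```python
import importlib.util


def sdk_installed(module_name="batchcompute"):
    """Return True if the named top-level module can be located."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Prints True once "pip install batchcompute" has succeeded
    # for the interpreter that runs this check.
    print("batchcompute SDK installed:", sdk_installed())
```

If the check prints False, the SDK was probably installed for a different Python interpreter than the one used to submit jobs.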

2. Write the work script

work.py

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import os
import math
import sys
import re
import argparse
NOTHING_TO_DO = 'Nothing to do, exit'

def _calcRange(a,b, id, step):
  start = min(id * step + a, b)
  end  = min((id+1) * step + a-1, b)
  return (start, end)

def _parseContinuedFrames(render_frames, total_nodes, id=None, return_type='list'):
  '''
  Parse a continuous frame range, e.g. 1-10
  '''
  [a,b]=render_frames.split('-')
  a=int(a)
  b=int(b)
  #print(a,b)
  step = int(math.ceil((b-a+1)*1.0/total_nodes))
  #print('step:', step) 
  mod =  (b-a+1) % total_nodes
  #print('mod:', mod)
  if mod==0 or id < mod:
    (start, end) = _calcRange(a,b, id, step)
    #print('--->',start, end)
    return (start, end) if return_type!='list' else list(range(start, end+1))
  else:
    a1 =  step * mod + a
    #print('less', a1, b, id)
    (start, end) = _calcRange(a1 ,b, id-mod, step-1)
    #print('--->',start, end)
    return (start, end)  if return_type!='list' else list(range(start, end+1))

def _parseIntermittentFrames(render_frames, total_nodes, id=None):
  '''
  Parse non-continuous frames, e.g. 1,3,8-10,21
  '''
  a1=render_frames.split(',')
  a2=[]
  for n in a1:
    a=n.split('-')
    a2.append(list(range(int(a[0]),int(a[1])+1)) if len(a)==2 else [int(a[0])])
  a3=[]
  for n in a2: 
    a3=a3+n
  #print('a3',a3)
  step = int(math.ceil(len(a3)*1.0/total_nodes))
  #print('step',step)
  mod =  len(a3) % total_nodes
  #print('mod:', mod)
  if mod==0 or id < mod:
    (start, end) = _calcRange(0, len(a3)-1, id, step) 
    #print(start, end)
    a4= a3[start: end+1] 
    #print('--->', a4)
    return a4
  else:
    #print('less',  step * mod  , len(a3)-1, id)
    (start, end) = _calcRange( step * mod   ,len(a3)-1, id-mod, step-1)
    if start > len(a3)-1:
      print(NOTHING_TO_DO)
      sys.exit(0)
    #print(start, end)
    a4= a3[start: end+1] 
    #print('--->', a4)
    return a4
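
def parseFrames(render_frames, return_type='list', id=None, total_nodes=None):
    '''
    @param render_frames {string}: the full set of frames to render. Use "-" for a range and "," to separate non-continuous frames, e.g. 1,3,5-10
    @param return_type {string}: one of [list, range]. list example: [1,2,3]; range example: (1,3).
            Note: only honored for a plain "a-b" range; when render_frames contains ",", the result is always a list.
    @param id: node ID, starting from 0. Leave unset in production; it is read from the environment variable BATCH_COMPUTE_DAG_INSTANCE_ID.
    @param total_nodes: total number of nodes. Leave unset in production; it is read from the environment variable BATCH_COMPUTE_DAG_INSTANCE_COUNT.
    '''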
    if id==None:
      id=os.environ['BATCH_COMPUTE_DAG_INSTANCE_ID']
    if type(id)==str:
      id = int(id)
    if total_nodes==None:
      total_nodes = os.environ['BATCH_COMPUTE_DAG_INSTANCE_COUNT']
    if type(total_nodes)==str:
      total_nodes = int(total_nodes)
    if re.match(r'^(\d+)\-(\d+)$',render_frames):
      # 1-2
      # continued frames
      return _parseContinuedFrames(render_frames, total_nodes, id, return_type)
    else:
      # intermittent frames
      return _parseIntermittentFrames(render_frames, total_nodes, id)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
          formatter_class = argparse.ArgumentDefaultsHelpFormatter,
          description = 'python script for 3ds Max DAG job',
          usage='work.py [<args>]',
      )

    parser.add_argument('-s', '--scene_file', action='store', type=str, required=True, help = 'the name of the scene file, with the .max suffix.')
    parser.add_argument('-i', '--input', action='store', type=str, required=True, help = 'the local directory in the VM that holds the scene file, eg: D:')
    parser.add_argument('-o', '--output', action='store', type=str, required=True, help = 'the local directory that render results are written to.')
    parser.add_argument('-f', '--frames', action='store', type=str, required=True, help = 'the frames to be rendered, eg: "1-10".')
    parser.add_argument('-t', '--retType', action='store', type=str, default="test.jpg", help = 'the file name of the render result, eg: xxx.jpg/xxx.png.')
    args = parser.parse_args()

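    frames = list(parseFrames(args.frames))
    if len(frames) > 1 and frames == list(range(frames[0], frames[-1] + 1)):
        # contiguous slice: pass it to 3dsmaxcmd as "first-last"
        framestr = '%d-%d' % (frames[0], frames[-1])
    else:
        # single frame, or non-contiguous slice passed as a comma-separated list
        # (assumes 3dsmaxcmd accepts frame lists such as 1,3,8)
        framestr = ','.join(map(str, frames))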

    s = "cd \"C:\\Program Files\\Autodesk\\3ds Max 2018\\\" && "
    s +='3dsmaxcmd.exe -o="%s%s" -frames=%s "%s\\%s"' % (args.output, args.retType, framestr, args.input, args.scene_file)
    print("exec: %s" % s)

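    rc = os.system(s)
    # os.system returns the exit code directly on Windows, but a wait status
    # (exit code in the high byte) on POSIX
    sys.exit(rc if os.name == 'nt' else rc >> 8)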
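
To make the frame splitting in work.py easier to follow, here is a minimal standalone sketch of the same idea for a continuous range: the first few nodes take one extra frame when the frames do not divide evenly. The function name split_frames is hypothetical and not part of work.py, and unlike work.py it returns an empty list for a node that has no frames left, which is a simplification made for this sketch.

```python
def split_frames(first, last, total_nodes, node_id):
    """Return the frames that node `node_id` (0-based) should render.

    Frames first..last (inclusive) are divided as evenly as possible:
    each node gets `base` frames, and the first `extra` nodes get one more.
    """
    total = last - first + 1
    base, extra = divmod(total, total_nodes)
    # Nodes before this one have taken node_id * base frames,
    # plus one extra frame for each of them that is among the first `extra`.
    start = first + node_id * base + min(node_id, extra)
    count = base + (1 if node_id < extra else 0)
    return list(range(start, start + count))


if __name__ == "__main__":
    # Frames 1-10 over 3 nodes: node 0 gets 4 frames, nodes 1 and 2 get 3 each.
    for node_id in range(3):
        print(node_id, split_frames(1, 10, 3, node_id))
```

In a real job, work.py takes the node ID and node count from the environment variables BATCH_COMPUTE_DAG_INSTANCE_ID and BATCH_COMPUTE_DAG_INSTANCE_COUNT rather than from function arguments.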

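Notes:

  • work.py only needs to be uploaded to an OSS bucket; you do not run it manually. All of its arguments are passed in by the job submission script.

  • The 3ds Max installation directory in work.py (the cd "C:\Program Files\Autodesk\3ds Max 2018\" command) must be changed to match where 3ds Max was installed when the image was created.

  • The scene_file argument of work.py is the scene file, e.g. Lighting-CB_Arnold_SSurface.max.

  • The input argument of work.py is the location in the VM that the assets are mapped to, e.g. the D: drive.

  • The output argument of work.py is the local path that render results are written to, e.g. C:\tmp\.

  • The frames argument of work.py is the frames to render, e.g. 1.

  • The retType argument of work.py is the file name of the render result, e.g. test.jpg. When multiple frames are rendered, the frames are named test000.jpg, test001.jpg, and so on.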
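
To see how these arguments end up in the render command, the sketch below rebuilds the command string the same way work.py does. The function name build_render_command and its max_dir parameter are hypothetical and introduced only for illustration; the 3ds Max 2018 path is the one used in work.py and should match your image.

```python
def build_render_command(output_dir, ret_type, framestr, input_dir, scene_file,
                         max_dir="C:\\Program Files\\Autodesk\\3ds Max 2018\\"):
    """Compose the Windows command line that changes into the 3ds Max
    directory and invokes 3dsmaxcmd.exe, mirroring the string in work.py."""
    cmd = 'cd "%s" && ' % max_dir
    cmd += '3dsmaxcmd.exe -o="%s%s" -frames=%s "%s\\%s"' % (
        output_dir, ret_type, framestr, input_dir, scene_file)
    return cmd


if __name__ == "__main__":
    # Example values taken from the notes above.
    print(build_render_command("C:\\tmp\\", "test.jpg", "1-4",
                               "D:", "Lighting-CB_Arnold_SSurface.max"))
```

Printing the command locally before building the image is a cheap way to confirm that the output path ends with a backslash and that the scene file path resolves as expected.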


3. Write the job submission script

test.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from batchcompute import Client, ClientError
from batchcompute.resources import (
    ClusterDescription, GroupDescription, Configs, Networks, VPC,
    JobDescription, TaskDescription, DAG,Mounts,
    AutoCluster,Disks,Notification,
)
import time
import argparse

from batchcompute import CN_SHANGHAI as REGION # adjust to the region you use
access_key_id = "xxxx" # your access key id
access_key_secret = "xxxx" # your access key secret

instance_type = "ecs.g5.4xlarge" # instance type; adjust to your workload

image_id = "m-xxx"

workossPath = "oss://xxxxx/work/work.py"

client = Client(REGION, access_key_id, access_key_secret)

def getAutoClusterDesc(InstanceCount):
    auto_desc = AutoCluster()

    auto_desc.ECSImageId = image_id

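    # Keep the environment when a task fails. Enable this only while debugging:
    # a kept environment continues to incur charges, so release it manually in time.
    auto_desc.ReserveOnFail = False

    # Instance type
    auto_desc.InstanceType = instance_type

    # Pay-as-you-go (on-demand) instances
    auto_desc.ResourceType = "OnDemand"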

    #Configs
    configs = Configs()
    #Configs.Networks
    networks  = Networks()
    vpc = VPC()

    # If both CidrBlock and VpcId are set, the CidrBlock of that VPC must match the CidrBlock passed in here
    vpc.CidrBlock = '172.26.0.0/16'
    # vpc.VpcId = "vpc-8vbfxdyhx9p2flummuwmq"

    networks.VPC = vpc
    configs.Networks = networks

    # System disk type (cloud_efficiency/cloud_ssd) and size (in GB)
    configs.add_system_disk(size=40, type_='cloud_efficiency')

    # Data disk: type (must match the system disk type), size (in GB), and mount point
    # case1: Linux environment
    # configs.add_data_disk(size=40, type_='cloud_efficiency', mount_point='/path/to/mount/')

    # Number of instances
    configs.InstanceCount = InstanceCount
    auto_desc.Configs = configs
    return auto_desc

def getTaskDesc(inputOssPath, outputossPath, scene_file, frames, retType, clusterId, InstanceCount):
    taskDesc = TaskDescription()

    timestamp = time.strftime("%Y_%m_%d_%H_%M_%S", time.localtime())
    inputLocalPath = "D:"
    outputLocalPath = "C:\\\\tmp\\\\" + timestamp + "\\\\"
    outputossBase = outputossPath + timestamp + "/"
    stdoutOssPath = outputossBase + "stdout/" #your stdout oss path
    stderrOssPath = outputossBase + "stderr/" #your stderr oss path
    outputossret = outputossBase + "ret/"

    taskDesc.InputMapping = {inputOssPath: inputLocalPath}
    taskDesc.OutputMapping = {outputLocalPath: outputossret}

    taskDesc.Parameters.InputMappingConfig.Lock = True

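    # Stdout location: whatever the program prints is uploaded to this OSS path in real time
    taskDesc.Parameters.StdoutRedirectPath = stdoutOssPath

    # Stderr location: exceptions and errors raised by the program are uploaded to this OSS path in real time
    taskDesc.Parameters.StderrRedirectPath = stderrOssPath

    # Command line that starts the program.
    # PackagePath holds the executable or binary package referenced by CommandLine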
    taskDesc.Parameters.Command.PackagePath = workossPath
    taskDesc.Parameters.Command.CommandLine = "python work.py -i %s -o %s -s %s -f %s -t %s" % (inputLocalPath, outputLocalPath, scene_file, frames, retType)

    # Task timeout
    taskDesc.Timeout = 86400

    # Number of instances the task needs
    taskDesc.InstanceCount = InstanceCount

    # Number of retries after the task fails
    taskDesc.MaxRetryCount = 3

    if clusterId:
        # Submit the job to a fixed cluster
        taskDesc.ClusterId = clusterId
    else:
        # Submit the job to an auto cluster
        taskDesc.AutoCluster = getAutoClusterDesc(InstanceCount)

    return taskDesc


def getDagJobDesc(inputOssPath, outputossPath, scene_file, frames, retType, clusterId = None, instanceNum = 1):
    job_desc = JobDescription()
    dag_desc = DAG()

    job_desc.Name = "testBatch"
    job_desc.Description = "test 3dMAX job"
    job_desc.Priority = 1

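    # Whether the whole job fails as soon as one instance fails
    job_desc.JobFailOnInstanceFail = False

    # Whether the job is released automatically right after it finishes successfully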
    job_desc.AutoRelease = False
    job_desc.Type = "DAG"

    render = getTaskDesc(inputOssPath, outputossPath, scene_file, frames, retType, clusterId, instanceNum)

    # Add the task
    dag_desc.add_task('render', render)

    job_desc.DAG = dag_desc
    return job_desc

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        formatter_class = argparse.ArgumentDefaultsHelpFormatter,
        description = 'python script for 3ds Max DAG job',
        usage='test.py [<args>]',
    )

    parser.add_argument('-n','--instanceNum', action='store',type = int, default = 1,help = 'the number of parallel instances.')
    parser.add_argument('-s', '--scene_file', action='store', type=str, required=True, help = 'the name of the scene file, with the .max suffix.')
    parser.add_argument('-i', '--inputoss', action='store', type=str, required=True, help = 'the OSS directory that holds the scene file.')
    parser.add_argument('-o', '--outputoss', action='store', type=str, required=True, help = 'the OSS directory to upload the result files to.')
    parser.add_argument('-f', '--frames', action='store', type=str, required=True, help = 'the frames to be rendered, eg: "1-10".')
    parser.add_argument('-t', '--retType', action='store', type=str, default = "test.jpg", help = 'the file name of the render result, eg: xxx.jpg/xxx.png.')
    parser.add_argument('-c', '--clusterId', action='store', type=str, default=None, help = 'the ID of the fixed cluster to render on.')

    args = parser.parse_args()

    try:
        job_desc = getDagJobDesc(args.inputoss, args.outputoss, args.scene_file, args.frames,args.retType, args.clusterId, args.instanceNum)
        # print job_desc
        job_id = client.create_job(job_desc).Id
        print('job created: %s' % job_id)
    except ClientError as e:
        print (e.get_status_code(), e.get_code(), e.get_requestid(), e.get_msg())

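Notes:

  • The configuration values near the top of the script (REGION, access_key_id, access_key_secret, instance_type, image_id, workossPath) must be adapted to your environment: the AK values are the AccessKey of your account; image_id is the ID of the image created in 1.3; workossPath is the OSS location of the work.py from step 2.

  • instanceNum is the number of nodes that take part in the render job; the default is 1. When it is set to more than one node, work.py splits the frames evenly among them.

  • scene_file is the scene file to render; it is passed to work.py.

  • inputoss is the OSS location the assets were uploaded to, i.e. the OSS location from 1.4.

  • outputoss is the OSS location the final results are uploaded to.

  • frames is the frames of the scene file to render; it is passed to work.py. 3ds Max does not support skip-frame rendering; only consecutive frames, such as 1-10, are allowed.

  • retType is the name of the render result; it is passed to work.py. The default is test.jpg, which yields test000.jpg.

  • clusterId is the ID of the fixed cluster, when a fixed cluster is used for rendering.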
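
Because test.py appends a timestamp to outputoss, it helps to know where the logs and results of a run will land. The sketch below reproduces that path derivation from getTaskDesc; the function name derive_output_paths and the fixed timestamp in the usage example are hypothetical and only for illustration.

```python
import time


def derive_output_paths(output_oss, timestamp=None):
    """Return the OSS locations a run writes to, following the same
    outputoss + timestamp + "/" layout that test.py builds."""
    if timestamp is None:
        timestamp = time.strftime("%Y_%m_%d_%H_%M_%S", time.localtime())
    base = output_oss + timestamp + "/"
    return {
        "stdout": base + "stdout/",   # program output (print)
        "stderr": base + "stderr/",   # errors and exceptions
        "result": base + "ret/",      # rendered frames
    }


if __name__ == "__main__":
    paths = derive_output_paths("oss://bcs-test-sh/test/", "2020_01_01_12_00_00")
    for name in ("stdout", "stderr", "result"):
        print(name, paths[name])
```

Note that outputoss is expected to end with "/", since the timestamp is appended to it directly.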

4. Submit the job

Based on the example scripts above, run the following command:

python test.py -s Lighting-CB_Arnold_SSurface.max -i oss://bcs-test-sh/3dmaxdemo/Scenes/Lighting/ -o oss://bcs-test-sh/test/ -f 1-1 -t 123.jpg

Example run result:

[Figure: result]

[Figure: picture]
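
To spread the same render over several nodes, the submission command only needs the -n option and a wider frame range. The sketch below assembles such a command as an argument list using the options defined in test.py; the helper name build_submit_argv is hypothetical, and the example values (3 nodes, frames 1-10) are illustrative rather than taken from a real run.

```python
import sys


def build_submit_argv(scene_file, input_oss, output_oss, frames,
                      ret_type="test.jpg", instance_num=1, cluster_id=None):
    """Build the argument list for running test.py with the given options."""
    argv = [sys.executable, "test.py",
            "-s", scene_file,
            "-i", input_oss,
            "-o", output_oss,
            "-f", frames,
            "-t", ret_type,
            "-n", str(instance_num)]
    if cluster_id:
        # Only pass -c when rendering on a fixed cluster.
        argv += ["-c", cluster_id]
    return argv


if __name__ == "__main__":
    argv = build_submit_argv("Lighting-CB_Arnold_SSurface.max",
                             "oss://bcs-test-sh/3dmaxdemo/Scenes/Lighting/",
                             "oss://bcs-test-sh/test/",
                             "1-10", ret_type="123.jpg", instance_num=3)
    print(" ".join(argv[1:]))
    # To actually submit, pass argv to subprocess.call(argv).
```

Passing the arguments as a list avoids shell quoting problems with scene file names that contain spaces.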