Last updated: 2019-09-30 15:51
Job management lets you write Scala, Python, and Java jobs via spark-submit scripts, as well as SQL via Spark SQL, submit them to a cluster for execution, and view the results. Jobs can also be added to a workflow for periodic, scheduled execution.
Entry point: https://hbase.console.aliyun.com/hbase/cn-shanghai/workspace/job
When creating a job, you must first select a cluster that can run it. The job type can be "SparkJob" or "SparkSQL".
The content of a Spark job is the command-line arguments for spark-submit. Because of the platform and runtime environment, the supported arguments are a subset of those accepted by the official spark-submit, and the --master parameter does not need to be configured. The format and supported options are as follows:
[Options] <app jar | python file | R file> [app arguments]
Option | Description |
---|---|
--class CLASS_NAME | Your application's main class (for Java / Scala apps). |
--jars JARS | Comma-separated list of jars to include on the driver and executor classpaths. |
--py-files PY_FILES | Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps. |
--files FILES | Comma-separated list of files to be placed in the working directory of each executor. File paths of these files in executors can be accessed via SparkFiles.get(fileName) (see the sketch after this table). |
--driver-memory MEM | Memory for driver (e.g. 1000M, 2G) (Default: 1024M). |
--driver-cores NUM | Number of cores used by the driver, only in cluster mode (Default: 1). |
--executor-cores NUM | Number of cores per executor (Default: 1). |
--executor-memory MEM | Memory per executor (e.g. 1000M, 2G) (Default: 1G). |
--num-executors NUM | Number of executors to launch (Default: 2). |
--name NAME | A name of your application. |
--conf PROP=VALUE | Arbitrary Spark configuration property. |
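As an illustration of the --files option above: a file distributed this way can be opened from executor code through SparkFiles.get. The sketch below assumes the job was submitted with --files lookup.txt; the file name and application logic are hypothetical, not taken from this document.

```scala
import scala.io.Source
import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

// Reads a file distributed with "--files lookup.txt" inside executor tasks.
object FilesDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("files-demo").getOrCreate()
    val result = spark.sparkContext.parallelize(1 to 4).mapPartitions { iter =>
      // SparkFiles.get resolves the local path of the distributed file on each executor.
      val path = SparkFiles.get("lookup.txt")
      val lineCount = Source.fromFile(path).getLines().size
      iter.map(i => s"task item $i saw $lineCount lines in lookup.txt")
    }
    result.collect().foreach(println)
    spark.stop()
  }
}
```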
Example: a simple Java Pi demo:
--class org.apache.spark.examples.SparkPi
--driver-memory 2G
--driver-cores 1
--executor-memory 2G
--executor-cores 2
--num-executors 1
--name pi
/examples_2.11-2.3.2.jar
10000
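For reference, below is a minimal sketch of what a Pi-style main class such as the one referenced above might look like. It is an approximation of the well-known SparkPi example, not code taken from this document.

```scala
import scala.math.random
import org.apache.spark.sql.SparkSession

// Monte Carlo estimate of Pi, in the style of org.apache.spark.examples.SparkPi.
object SparkPi {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("pi").getOrCreate()
    // The application argument ("10000" in the job above) controls the number of partitions.
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt
    val count = spark.sparkContext
      .parallelize(1 until n, slices)
      .map { _ =>
        val x = random * 2 - 1
        val y = random * 2 - 1
        if (x * x + y * y <= 1) 1 else 0
      }
      .reduce(_ + _)
    println(s"Pi is roughly ${4.0 * count / (n - 1)}")
    spark.stop()
  }
}
```

Packaged into a jar, such a class would be submitted with the kind of options shown in the example above.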
Job parameters support date and time variables that are replaced at run time. For example, a job that passes the parameter ${yyyyMMdd+2} would have it replaced with "20191002" when it actually runs (assuming a scheduled date of 2019-09-30). The supported variable formats are listed below, and a sketch of such a job follows the list.
- yyyyMMdd -> ${yyyyMMdd}, ${yyyyMMdd+n}, or ${yyyyMMdd-n}
- yyyy-MM-dd -> ${yyyy-MM-dd}, ${yyyy-MM-dd+n}, or ${yyyy-MM-dd-n}
- yyyy/MM/dd -> ${yyyy/MM/dd}, ${yyyy/MM/dd+n}, or ${yyyy/MM/dd-n}
- timestamp -> ${timestamp}, ${timestamp+n}, or ${timestamp-n}

Here +n adds n days and -n subtracts n days; n must be a positive integer.
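For instance, a hypothetical job (the class name, jar path, and argument below are illustrative only, not from this document) could pass yesterday's date to the application as its first argument:

--class com.example.DailyReport
--executor-memory 2G
--num-executors 2
--name daily-report
/daily-report.jar
${yyyyMMdd-1}

At run time the last line would be replaced with the previous day's date (for example "20190929" for a scheduled date of 2019-09-30), and the application can read it as args(0).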
When you click Run, if the current region has multiple clusters, you can also choose a different cluster to run the job on.