The analytics cluster uses the HttpFS service for users to upload and manage job artifacts (jar packages, Python files, and so on) on the server side.
1. Obtain the HttpFS service address from the analytics cluster console, for example:
HttpFS: http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000
2. Usage notes:
3. Upload local jar packages or Python files to the Spark server
Create the directory /resourcesdir:
curl -i -X PUT "http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000/webhdfs/v1/resourcesdir?op=MKDIRS&user.name=resource"
Upload a jar:
Upload the local ./examples/jars/spark-examples_2.11-2.3.2.jar to /resourcesdir/spark-examples_2.11-2.3.2.jar on HttpFS:
curl -i -X PUT -T ./examples/jars/spark-examples_2.11-2.3.2.jar "http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000/webhdfs/v1/resourcesdir/spark-examples_2.11-2.3.2.jar?op=CREATE&data=true&user.name=resource" -H "Content-Type:application/octet-stream"
Upload the local ./examples/src/main/python/pi.py to /resourcesdir/pi.py on HttpFS:
curl -i -X PUT -T ./examples/src/main/python/pi.py "http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000/webhdfs/v1/resourcesdir/pi.py?op=CREATE&data=true&user.name=resource" -H "Content-Type:application/octet-stream"
List the files under the /resourcesdir/ directory on HttpFS:
curl -i "http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000/webhdfs/v1/resourcesdir/?op=LISTSTATUS&user.name=resource"
The Spark service uses Apache Livy (LivyServer) as the job management service, supporting submission of jar jobs (including streaming) and Python jobs.
1. Obtain the LivyServer service address from the analytics cluster console, for example:
LivyServer: http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998
2. Submit a job
Save the following request body as livy_pi.json (referenced by the command below):
{
    "file": "/resourcesdir/spark-examples_2.11-2.3.2.jar",
    "className": "org.apache.spark.examples.SparkPi",
    "driverMemory": "1g",
    "executorMemory": "1g",
    "conf": {
        "spark.executor.instances": "1",
        "spark.executor.cores": "1"
    }
}
Command:
curl -H "Content-Type: application/json" -X POST -d @livy_pi.json http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches |python -m json.tool
Sample output:
[root@master]# curl -H "Content-Type: application/json" -X POST -d @livy_pi.json http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches |python -m json.tool
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 368 100 145 100 223 4815 7405 --:--:-- --:--:-- --:--:-- 7689
{
    "appId": null,
    "appInfo": {
        "driverLogUrl": null,
        "sparkUiUrl": null
    },
    "id": 1,
    "log": [
        "stdout: ",
        "\nstderr: ",
        "\nYARN Diagnostics: "
    ],
    "state": "starting"
}
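The same submission can be done from Python; a minimal sketch of POST /batches with the request body above, assuming the placeholder LivyServer address from this document and the requests library:
import json
import requests

# Placeholder LivyServer address from this document; replace it with the address from your console.
LIVY = "http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998"

payload = {
    "file": "/resourcesdir/spark-examples_2.11-2.3.2.jar",
    "className": "org.apache.spark.examples.SparkPi",
    "driverMemory": "1g",
    "executorMemory": "1g",
    "conf": {"spark.executor.instances": "1", "spark.executor.cores": "1"}
}

# POST /batches creates the batch job; the response carries the batch id used for status queries.
r = requests.post(f"{LIVY}/batches", data=json.dumps(payload),
                  headers={"Content-Type": "application/json"})
batch = r.json()
print(batch["id"], batch["state"])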
Command to submit the Python job:
curl -X POST --data '{"file": "/resourcesdir/pi.py"}' -H "Content-Type: application/json" http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches
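Per the Livy REST API, the batch request body can also carry fields such as args (command-line arguments passed to the job) and pyFiles (additional Python files); for example, passing a partition count to pi.py could look like the following (the value 100 is only illustrative):
curl -X POST --data '{"file": "/resourcesdir/pi.py", "args": ["100"]}' -H "Content-Type: application/json" http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches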
3. Query the job status
You can check it via the LivyServer API or the Spark UI.
Command:
curl http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches/1/state | python -m json.tool
Sample output:
[root@master t-apsara-spark-2.2.2]# curl http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches/1/state | python -m json.tool
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 26 100 26 0 0 1904 0 --:--:-- --:--:-- --:--:-- 2000
{
    "id": 1,
    "state": "success"
}
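To wait for completion programmatically, the state endpoint can be polled; a minimal Python sketch, assuming the placeholder LivyServer address and the batch id 1 from the sample above:
import time
import requests

LIVY = "http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998"
batch_id = 1  # id returned when the batch was created

# Poll GET /batches/{id}/state until the batch reaches a terminal state.
while True:
    state = requests.get(f"{LIVY}/batches/{batch_id}/state").json()["state"]
    print(state)
    if state in ("success", "dead", "killed", "error"):
        break
    time.sleep(5)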
4. References
Livy community documentation: https://livy.incubator.apache.org/
Spark community documentation: http://spark.apache.org/docs/2.3.2/
Aliyun official demo: https://github.com/aliyun/aliyun-apsaradb-hbase-demo/tree/master/spark