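The Ganos Spark module lets users process and analyze large-scale geospatial data on the Apache Spark distributed system. It provides a set of interfaces in the Spark environment for loading, analyzing, and saving data. Ganos Spark offers data analysis models at several levels. The most basic is the GeometryRDD model, which converts between the SimpleFeature type in Ganos data and the RDD model in Spark. On top of GeometryRDD, Ganos Spark uses SparkSQL to define a set of UDTs, UDFs, and UDAFs for representing spatial data, so that users can query and analyze data with SQL-like structured queries. The overall Ganos Spark framework is as follows:

[Figure: overall framework of Ganos Spark]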

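1. Obtain the Ganos Spark toolkit

First, download the GanosSparkSDK development package from this link: Ganos Spark driver

Add the following dependencies to the pom file in the project directory: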
<!-- Ganos Spark -->
<dependency>
  <groupId>com.aliyun.ganos</groupId>
  <artifactId>ganos-spark-runtime</artifactId>
  <version>1.0-SNAPSHOT</version>
  <scope>system</scope>
  <systemPath>../ganos-spark-runtime-1.0-SNAPSHOT.jar</systemPath>
</dependency>
<!-- Spark -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_${scala.binary.version}</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-catalyst_${scala.binary.version}</artifactId>
  <version>${spark.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-reflect</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_${scala.binary.version}</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-yarn_${scala.binary.version}</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>io.netty</groupId>
  <artifactId>netty-all</artifactId>
  <version>4.1.18.Final</version>
</dependency>
</section>
<section class="section" id="section-rua-gtr-dw5">
<h2 class="title sectiontitle" id="title-wk3-62n-tqp">2. Query HBase Ganos with Ganos Spark</h2>
<p class="p" id="p-4ej-6bp-42l">Once the development environment is configured, you can follow the example below to connect to HBase Ganos through Ganos Spark and query data:</p>
<pre class="pre codeblock language-scala" id="codeblock-dn3-mav-36l"><code>package com.aliyun.ganos

import com.aliyun.ganos.spark.GanosSparkKryoRegistrator
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

object GanosSparkDemo {

  def main(args: Array[String]): Unit = {
    // Only log errors from the "org" and "com" packages
    Logger.getLogger("org").setLevel(Level.ERROR)
    Logger.getLogger("com").setLevel(Level.ERROR)

    // HBase connection parameters; POINT is the catalog name
    val params = Map(
      "hbase.catalog" -&gt; "POINT",
      "hbase.zookeepers" -&gt; "zookeeper-address", // replace with your ZooKeeper address
      "geotools" -&gt; "true")

    // Initialize the SparkSession
    val sparkSession = SparkSession.builder
      .appName("Simple Application")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.sql.crossJoin.enabled", "true")
      .config("spark.kryo.registrator", classOf[GanosSparkKryoRegistrator].getName)
      .master("local[*]")
      .getOrCreate()

    // Load the AIS data source
    val dataFrame = sparkSession.read
      .format("ganos")
      .options(params)
      .option("ganos.feature", "AIS")
      .load()

    // Query all data
    dataFrame.createOrReplaceTempView("ais")
    val r = sparkSession.sql("SELECT * FROM ais")
    r.show()

    // Spatio-temporal query
    val r1 = sparkSession.sql("SELECT * FROM ais WHERE st_contains(st_makeBBOX(70.00000,11.00000,75.00000,14.00000), geom)")
    r1.show()

    // Write the query result back to HBase Ganos
    r1.write.format("ganos").options(params).option("ganos.feature", "result").save()
  }
}</code></pre>

The output is as follows:

<div class="p" id="p-u6i-jyk-yv7"><img class="image break" id="image-sgg-san-yk6" src="//static-aliyun-doc.oss-cn-hangzhou.aliyuncs.com/assets/img/zh-CN/3088854851/p88500.png"></div>
<p class="p" id="p-bft-rfd-oqz">For the spatial functions supported by Ganos Spark, see <a href="https://help.aliyun.com/document_detail/129915.html">Ganos Spark functions</a>.</p>
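To make the spatial filter in the query above concrete, the following plain-Scala sketch mimics what a predicate such as st_contains(st_makeBBOX(70, 11, 75, 14), geom) selects for point data. It is not the Ganos implementation and uses no Spark or Ganos classes: BBox, Point, and BBoxFilterSketch are hypothetical names introduced only for this illustration, the argument order (lower X, lower Y, upper X, upper Y) is an assumption, and treating points on the box edge as not contained is also an assumption about the predicate's boundary behavior.

```scala
// Hypothetical helper types for illustration only; they are not part of Ganos Spark.
final case class Point(x: Double, y: Double)

// Assumed argument order: lower X, lower Y, upper X, upper Y.
final case class BBox(minX: Double, minY: Double, maxX: Double, maxY: Double) {
  // Assumption: a point lying exactly on the edge of the box is not contained.
  def contains(p: Point): Boolean =
    p.x > minX && p.x < maxX && p.y > minY && p.y < maxY
}

object BBoxFilterSketch {
  // Keep only the points inside the box, as the WHERE clause does for table rows.
  def filter(box: BBox, points: Seq[Point]): Seq[Point] =
    points.filter(box.contains)

  def main(args: Array[String]): Unit = {
    val box = BBox(70.0, 11.0, 75.0, 14.0)
    val points = Seq(Point(72.5, 12.0), Point(80.0, 12.0), Point(70.0, 12.0))
    // Print the points that pass the bounding-box filter.
    filter(box, points).foreach(println)
  }
}
```

In the Spark job itself, the same selection is written declaratively in SQL and evaluated by Ganos Spark against the geom column, so no such helper is needed there.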
</section>
<section class="section" id="section-1sb-wzo-isv">
<h2 class="title sectiontitle" id="title-08f-73b-us8">3. Use Ganos Spark in Jupyter</h2>
<p class="p" id="p-z6f-3yn-1dh">Ganos Spark provides a toolkit that lets users query and visualize data in a Jupyter environment.</p>
<p class="p" id="p-mvd-fcy-gj2">First, download the Ganos Spark Leaflet toolkit: <a href="https://tst-ganos-bj-public.oss-cn-beijing.aliyuncs.com/hbase/driver_jar/spark/gas-spark-jupyter-leaflet-1.0-SNAPSHOT.jar">Leaflet tool</a></p>
<p class="p" id="p-1fz-4ih-z5e">Open a console and perform the following steps:</p>
<div class="p" id="p-cp1-vqt-6bq">(1) Install Jupyter, using the command that matches your Python installation:
<pre class="pre codeblock" id="codeblock-li9-pid-0y1"><code>$ pip install --upgrade jupyter</code></pre>
<pre class="pre codeblock" id="codeblock-ik0-9rd-xet"><code>$ pip3 install --upgrade jupyter</code></pre>
</div>
<div class="p" id="p-b7i-k61-p4l">(2) Set the SPARK_HOME environment variable, use toree to add a kernel named "Ganos Spark Test", and start Jupyter:
<pre class="pre codeblock" id="codeblock-ez5-te9-dk5"><code>$ jars="ganos-spark-runtime-1.0-SNAPSHOT.jar,ganos-spark-jupyter-leaflet-1.0-SNAPSHOT.jar"
$ jupyter toree install --replace --user --kernel_name "Ganos Spark Test" --spark_home=${SPARK_HOME} --spark_opts="--master local[*] --jars $jars"
$ jupyter notebook</code></pre>
</div>

<p class="p" id="p-yhi-kdg-v3b">After the server starts, open <a href="http://localhost:8888">http://localhost:8888</a> in a browser to enter the Jupyter console and create a Ganos Spark Test session.
</p>
<p class="p" id="p-f01-45v-pw6">(3) Load HBase Ganos data</p>
<p class="p" id="p-e91-2lj-yvq">a) Create a Spark session:</p>
<div class="p" id="p-j6s-90g-ygd"><img class="image break" id="image-ry0-64x-thj" src="//static-aliyun-doc.oss-cn-hangzhou.aliyuncs.com/assets/img/zh-CN/4088854851/p88508.png"></div>
<p class="p" id="p-k60-9qc-2az">b) Query HBase Ganos data with Spark SQL:</p>
<div class="p" id="p-fzp-1yv-rrn"><img class="image break" id="image-03r-4tf-1cr" height="600" src="//static-aliyun-doc.oss-cn-hangzhou.aliyuncs.com/assets/img/zh-CN/4088854851/p88507.png" width="600"></div>
<div class="p" id="p-9co-ioo-urn">c) Display the data in Leaflet:<img class="image break" id="image-w70-76t-btn" src="//static-aliyun-doc.oss-cn-hangzhou.aliyuncs.com/assets/img/zh-CN/4088854851/p88504.png"></div>
<p class="p" id="p-56k-ucx-hwj">You can download the complete notebook to try it out: <a href="https://tst-ganos-bj-public.oss-cn-beijing.aliyuncs.com/hbase/driver_jar/spark/GASSpark%E6%B5%8B%E8%AF%95.ipynb">GanosSpark测试.ipynb</a></p>
</section>
</div>
</article>
</main>