You can add a Spark data source for fast, batch data imports. This topic describes how to add a Spark data source.
Prerequisites
-
You have a Lindorm instance with the Lindorm Tunnel Service (LTS) engine.
-
You have a Lindorm instance with Lindorm Distributed Processing System (LDPS) activated. For more information, see Create an instance.
Procedure
In the Lindorm console
-
Log on to the Lindorm console.
-
On the Instances page, click the ID of the instance that uses the LTS engine.
-
In the left-side navigation pane, click Data Sources.
-
On the Compute Engine Data Source tab, click Add Data Source.
-
In the Add Data Source dialog box, configure the parameters described in the following table.
Parameter
Description
Instance Type
Select Lindorm.
Region
Select the region of the target Lindorm instance.
Instance ID
Select the ID of the target Lindorm instance.
Note
-
Ensure that LDPS is activated for the target Lindorm instance. For more information, see Activate the service.
-
Ensure that the target Lindorm instance and the Lindorm instance that uses the LTS engine are in the same virtual private cloud (VPC). To connect instances across different VPCs, see Connect VPCs.
-
Click Determine. A status of Associated indicates that the Spark data source is successfully added.
In Lindorm Tunnel Service (LTS)
-
Log on to LTS. For more information, see Activate and log on to LTS.
-
In the left-side navigation pane, choose .
-
On the Add Data Source page, configure the parameters described in the following table.
Parameter
Description
Name
Enter lts_bulkload_spark.
Data source type
Select Spark.
Data source parameters
Configure the parameters for the Spark data source.
{ "virtualClusterName": "token", "hdfsUri": "hdfs://nn1:8020,nn2:8020", "sparkEndpoint": "http://192.168.XX.XX:10099" }
-
virtualClusterName: The token for the LDPS JAR address. To obtain the token, go to the Database Connections page in the Lindorm console, click the Compute Engine tab, and copy the value in the Token field.
-
hdfsUri: The HDFS connection address of the Lindorm instance. The format is hdfs://nn1:8020,nn2:8020.
Note: To obtain the nn1 and nn2 values for the connection address, see Connect to and use LindormDFS with an open-source HDFS client. The hdfs-site information contains the nn1 and nn2 addresses.
-
sparkEndpoint: The LDPS JAR VPC address. To obtain this address, go to the Database Connections page in the Lindorm console, click the Compute Engine tab, and copy the value in the JAR VPC Address field.
-
Click Add.
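The Data source parameters value can be assembled and sanity-checked before you paste it into the form. The following Python sketch is illustrative only: the function name build_spark_source_params is hypothetical and is not part of LTS or LDPS, and the validation rules (a non-empty token, host:port NameNode addresses, an endpoint that starts with http:// or https://) are assumptions inferred from the sample value shown above, not documented constraints.

```python
import json
import re


def build_spark_source_params(token, namenodes, spark_endpoint):
    """Return a JSON string shaped like the sample Data source parameters value.

    Hypothetical helper for illustration; all checks below are assumptions
    based on the sample value, not rules enforced by LTS.
    """
    if not token:
        raise ValueError("token must not be empty")
    if not namenodes:
        raise ValueError("at least one NameNode address is required")
    for address in namenodes:
        # Assumed shape: host:port, for example nn1:8020.
        if not re.fullmatch(r"[^\s,:/]+:\d+", address):
            raise ValueError("unexpected NameNode address: " + address)
    # Assumed shape: an HTTP(S) URL, as in the sample sparkEndpoint value.
    if not spark_endpoint.startswith(("http://", "https://")):
        raise ValueError("sparkEndpoint should start with http:// or https://")

    params = {
        "virtualClusterName": token,
        # Join the NameNode addresses with commas after a single hdfs:// prefix.
        "hdfsUri": "hdfs://" + ",".join(namenodes),
        "sparkEndpoint": spark_endpoint,
    }
    return json.dumps(params)


if __name__ == "__main__":
    # Placeholder values; replace them with the values from your own instance.
    print(build_spark_source_params(
        "token", ["nn1:8020", "nn2:8020"], "http://192.168.0.1:10099"))
```

Generating the value with json.dumps avoids hand-typed quoting or comma mistakes in the JSON. The placeholder values in the usage lines must be replaced with the Token, nn1 and nn2 addresses, and JAR VPC Address of your own instance.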