User-created Spark SQL data source

更新时间:
复制 MD 格式

Add a Spark SQL data source to connect Quick BI to your Spark SQL database for data analysis and visualization. Quick BI can connect to Spark SQL databases (version 3.1.2) over a public network or through an Alibaba Cloud VPC.

Prerequisites

  • Ensure that your network is connected:

    • To connect to your Spark SQL database (version 3.1.2) over a public network, add the Quick BI IP addresses to the database whitelist. For more information, see Add a security group rule.

    • To connect to your Spark SQL database (version 3.1.2) over an internal network, use one of the following methods to establish a connection between your data source and Quick BI:

      • If your Spark SQL database is deployed on an Alibaba Cloud ECS instance, you can connect to it through an Alibaba Cloud VPC.

      • You can also set up a bastion host and access the database through an SSH tunnel.

  • You have the username and password for your user-created Spark SQL database.

Limitations

Quick BI supports Spark SQL version 3.1.2, and its underlying Hive Metastore must be version 2.0 or later.

Procedure

  1. Log on to the Quick BI console.

  2. Follow these steps to add a data source:

    1. Go to the page for creating a data source. For more information, see Create a data source.

    2. On the User-Created Data Sources tab, select Spark SQL.image

  3. In the Configure Connection dialog box, configure the parameters based on your business requirements.image

    Parameter

    Description

    Display name

    The display name for the data source.

    Enter a name without special characters or leading or trailing spaces.

    Database address

    The IP address or domain name of the Spark SQL database host.

    Port

    The database connection port.

    Database

    The name of the database you specified during deployment.

    Username and password

    The logon credentials for the Spark SQL database. Ensure that the user has the CREATE, INSERT, UPDATE, and DELETE permissions on the tables in the database.

    VPC data source

    If your database is in an Alibaba Cloud VPC, enable the VPC data source option.

    Select Instance if your database is on an ECS instance, or select CLB if you are connecting through a Classic Load Balancer (CLB).

    • Instance

      • Purchaser AccessId: The AccessKey ID of the account that owns the ECS instance.

        For more information, see Obtain an AccessKey.

        Note

        This AccessKey must have read permissions on the target instance. If the AccessKey also has write permissions on the security group, Quick BI automatically adds its IP addresses to the whitelist. Otherwise, you must add them manually. For more information, see Create a custom policy.

      • Purchaser AccessKey: The AccessKey secret corresponding to the AccessKey ID.

        For more information, see Obtain an AccessKey.

      • Instance ID: The ID of the ECS instance. You can find this on the Instances page of the ECS console.

        For more information, see View instance information.

      • Region: The region where the ECS instance is located, visible in the upper-left corner of the ECS console.

        For more information, see View instance information.

    • CLB

      • Purchaser AccessId: The AccessKey ID of the account that owns the CLB instance. This account must have read permissions on the instance.

        For more information, see View AccessKey information of a RAM user.

      • Purchaser AccessKey: The AccessKey secret that corresponds to the AccessKey ID. Log on to the RAM console to obtain the AccessKey secret.

        For more information, see View AccessKey information of a RAM user.

      • Instance ID: The ID of the CLB instance. Log on to the CLB console and obtain the instance ID from the instance list.

      • Region: The region where the instance is located. Log on to the CLB console and view the region in the upper-left corner of the Instances page.

    SSH

    If you select SSH, you must configure the following parameters:

    To access the database via an SSH tunnel, you need a bastion host. Contact your system administrator for the required information.

    • SSH host: Enter the IP address of the bastion host.

    • SSH username: The username used to log on to the bastion host.

    • SSH password: The password for the username.

    • SSH port: The port number of the bastion host. The default value is 22.

    For more information, see Connect to a Linux instance by using a password.

    Initial SQL

    When enabled, this feature lets you configure SQL statements to execute when a connection is initialized.

    Each statement runs after a connection is established. Only SET statements are allowed. Separate multiple statements with a semicolon (;).

  4. Click Test Connection to test the connectivity to the data source.

  5. After the connection test succeeds, click OK.

    The data source then appears in the data source list.

Next steps

After creating the data source, you can create a dataset and analyze data.