Configure GBase 8c Input Component

The GBase 8c input component retrieves data from a GBase 8c data source. To synchronize data from a GBase 8c data source to another data source, first configure the GBase 8c input component to read the source data, and then configure the destination data source for synchronization. This topic describes how to configure the GBase 8c input component.

Prerequisites

  • You have created a GBase 8c data source. For detailed instructions, see .

  • The account you use to configure the properties of the GBase 8c input component must have read-through permission for the data source. If you do not have this permission, request it for the data source. For more information, see Request, renew, and revoke data source permissions.

Procedure

  1. On the Dataphin homepage, in the top menu bar, select Development > Data Integration.

  2. On the Integration page, select Project from the top menu bar. If you are in Dev-Prod mode, select an environment.

  3. In the navigation pane on the left, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop to open its configuration page.

  4. In the upper-right corner of the page, click Component Library to open the Component Library panel.

  5. In the left navigation pane of the Component Library panel, select Input. In the input component list on the right, locate the GBase 8c component and drag it to the canvas.

  6. Click the icon on the GBase 8c input component card to open the GBase 8c Input Configuration dialog box.

  7. In the GBase 8c Input Configuration dialog box, configure the following parameters.

    Parameter

    Description

    Step Name

    The name of the GBase 8c input component. Dataphin automatically generates the step name. You can modify it as needed. Naming conventions are as follows:

    • Can contain only Chinese characters, letters, underscores (_), and numbers.

    • Cannot exceed 64 characters in length.

    Datasource

    The data source drop-down list displays all GBase 8c data sources, including those for which you have read-through permission and those for which you do not. Click the copy icon to copy the current data source name.

    • For a data source for which you do not have read-through permission, click Apply next to the data source to request the permission. For detailed steps, see Request, renew, and revoke data source permissions.

    • If you do not have a GBase 8c data source, click New Data Source to create a data source. For more information, see .

    Schema (Optional)

    Supports selecting tables across schemas. Select the schema where the table resides. If not specified, the system uses the schema configured in the data source by default.

    Source Table Quantity

    Select the number of source tables: Single Table or Multiple Tables.

    • Single Table: Use this for scenarios where you synchronize business data from one table to one destination table.

    • Multiple Tables: Use this for scenarios where you synchronize business data from multiple tables to the same destination table. When writing data from multiple tables to the same data table, the system uses a union algorithm.

      For more information about UNION, see Intersection (INTERSECT), Union (UNION), and Exception (EXCEPT).

    Table Matching Method

    Select General Rules or Database Regular Expression.

    Note

    Configure this option only when you select Multiple Tables for Source Table Quantity.

    Table

    Select the source table:

    • If you select Single Table for Source Table Quantity, enter table name keywords to search, or enter the exact table name and click Precise Search. After you select a table, the system automatically checks its status. Click the copy icon to copy the name of the selected table.

    • If you select Multiple Tables for Source Table Quantity, enter different expressions to add tables based on the table matching method.

      • If you select General Rules for the table matching method, enter table expressions in the input box to filter for tables with the same structure. The system supports enumeration, regular expression-like, and mixed forms. For example, table_[001-100];table_102;.

      • If you select Database Regular Expression for the table matching method, enter a regular expression that the current database supports. The system matches tables in the database against this regular expression. The expression is evaluated again each time the task runs, so tables that newly match it are included in the synchronization.

      After entering the expression, click Precise Search to view the list of matched tables in the Confirm Match Details dialog box.
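The General Rules expression above combines ranges and enumerated names. The following is a minimal sketch of how such an expression might expand into table names. It assumes that `[001-100]` denotes an inclusive, zero-padded numeric range and that entries are separated by semicolons; the function name `expand_table_expression` is hypothetical, and the actual matching logic in Dataphin may differ.

```python
import re


def expand_table_expression(expression: str) -> list[str]:
    """Expand an expression such as 'table_[001-100];table_102;' into names.

    Assumption: [start-end] is an inclusive numeric range that keeps the
    zero padding of the start value.
    """
    names: list[str] = []
    for part in expression.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate the trailing semicolon
        match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", part)
        if match is None:
            names.append(part)  # plain enumerated table name
            continue
        prefix, start, end, suffix = match.groups()
        width = len(start)  # preserve zero padding, for example 001
        for number in range(int(start), int(end) + 1):
            names.append(f"{prefix}{number:0{width}d}{suffix}")
    return names


tables = expand_table_expression("table_[001-100];table_102;")
print(len(tables), tables[0], tables[99], tables[-1])
```

Use Precise Search in the dialog box to confirm the tables that the system actually matches.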

    Split Key (Optional)

    The system partitions data based on the configured split key field. Use this with the concurrency configuration to read data concurrently. You can use any column of the source table as the split key. For better transfer performance, we recommend a primary key or an indexed column.

    Important

    If the split key is a date/time column, the system finds the minimum and maximum values and performs a brute-force split of the total time range according to the concurrency. This does not guarantee that data is evenly distributed across the splits.
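To illustrate why a range-based split can be uneven, here is a minimal sketch that cuts a time range into equal-width intervals. The function name `split_time_range` is hypothetical, and this is an assumption about the general technique, not the system's actual implementation.

```python
from datetime import datetime


def split_time_range(minimum: datetime, maximum: datetime, concurrency: int):
    """Cut [minimum, maximum] into `concurrency` equal-width intervals."""
    step = (maximum - minimum) / concurrency
    bounds = [minimum + step * i for i in range(concurrency)] + [maximum]
    # Pair each boundary with the next one to form (low, high) intervals.
    return list(zip(bounds[:-1], bounds[1:]))


ranges = split_time_range(datetime(2024, 1, 1), datetime(2024, 1, 5), 4)
for low, high in ranges:
    print(low, "->", high)
```

Each interval covers the same amount of time, but not necessarily the same number of rows. If most rows fall on one day, one reader receives most of the data.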

    Batch Read Count (Optional)

    The number of data records read at one time. When reading data from the source database, configure a specific batch read count (such as 1024 records) instead of reading records one by one. This reduces interactions with the data source, improves I/O efficiency, and lowers network latency.
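The effect of a batch read count can be sketched with a standard database cursor. The example below uses Python's built-in `sqlite3` module as a local stand-in for the source database; the table `orders` and the helper `read_in_batches` are hypothetical and are not part of Dataphin.

```python
import sqlite3


def read_in_batches(cursor, batch_size=1024):
    """Yield lists of rows, fetching up to `batch_size` rows per call."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
connection.executemany(
    "INSERT INTO orders (id) VALUES (?)", [(i,) for i in range(2500)]
)

cursor = connection.execute("SELECT id FROM orders ORDER BY id")
batch_sizes = [len(batch) for batch in read_in_batches(cursor)]
print(batch_sizes)  # [1024, 1024, 452]
```

With 2,500 rows and a batch size of 1024, the reader makes three fetch calls instead of 2,500 single-row reads.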

    Input Filter (Optional)

    Enter filter information for input fields, such as ds=${bizdate}. Input Filter applies to the following two scenarios:

    • A fixed subset of data.

    • Parameter filtering.
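The two scenarios can be sketched as follows. The example uses Python's `string.Template`, which happens to share the `${name}` placeholder syntax, to show how a parameter such as `bizdate` might be substituted at run time. The `region` column, the `yyyyMMdd` value, and the helper `render_filter` are assumptions for illustration; the actual substitution is performed by the platform.

```python
from string import Template


def render_filter(filter_text: str, parameters: dict[str, str]) -> str:
    """Replace ${name} placeholders in a filter with parameter values."""
    return Template(filter_text).substitute(parameters)


# Scenario 1: a fixed subset of data (no parameters to replace).
fixed_filter = render_filter("region = 'east'", {})

# Scenario 2: parameter filtering, resolved for each run.
parameter_filter = render_filter("ds=${bizdate}", {"bizdate": "20240101"})

print(fixed_filter)      # region = 'east'
print(parameter_filter)  # ds=20240101
```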

    Output Fields

    The Output Fields area displays all fields from the selected table and those matching the filter conditions. It supports the following operations:

    • Field Management: If you do not need to output certain fields to downstream components, delete the corresponding fields:

      • Delete Single Field: To delete a few fields, click the delete icon in the Actions column of each field that you do not need.

      • Batch Delete Fields: To delete many fields, click Field Management. In the Field Management dialog box, select multiple fields, click the left-move icon to move them from the selected input fields to the unselected input fields, and then click OK.


    • Batch Add: Click Batch Add. This supports batch configuration in JSON, TEXT, and DDL formats.

      Note

      After batch addition is complete, clicking OK overwrites the configured field information.

      • Batch configure in JSON format. For example:

        // Example:
        [{
            "index": 0,
            "name": "id",
            "type": "int(10)",
            "mapType": "Long",
            "comment": "comment1"
        },
        {
            "index": 1,
            "name": "user_name",
            "type": "varchar(255)",
            "mapType": "String",
            "comment": "comment2"
        }]

        Note

        Index indicates the column number of the specified object, name indicates the field name after import, and type indicates the field type after import.

        For example, "index":3,"name":"user_id","type":"String" means to import the fourth column from the file, with the field name user_id and field type String.

      • Batch configure in TEXT format. For example:

        // Example:
        0,id,int(10),Long,comment1
        1,user_name,varchar(255),String,comment2

        • The row delimiter separates information for each field. The default is a line feed (\n). It supports line feed (\n), semicolon (;), and period (.).

        • The column delimiter separates the attributes of a field, such as the field name and the field type. The default, and the only supported delimiter, is a comma (,). The field type is optional.

      • Batch configure in DDL format. For example:

        CREATE TABLE tablename (
            user_id serial,
            username VARCHAR(50),
            password VARCHAR(50),
            email VARCHAR(255),
            created_on TIMESTAMP
        );

    • New Output Field: Click +New Output Field. Fill in Column, Type, and Remarks as prompted on the page, and select Mapping Type. After configuring the current row, click the save icon to save it.

  8. Click OK to complete the property configuration for the GBase 8c input component.
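
The JSON and TEXT batch formats in step 7 describe the same field attributes. As a rough illustration of how the two relate, the following sketch converts TEXT rows into the JSON structure. It assumes each row carries five attributes in the order index, name, type, mapping type, and comment, as in the examples; the helper `text_to_fields` is hypothetical and is not a Dataphin API.

```python
import json


def text_to_fields(text, row_delimiter="\n", column_delimiter=","):
    """Convert TEXT-format rows into a list of JSON-style field entries.

    Assumption: every row has exactly five attributes. Types that contain
    the column delimiter, such as decimal(10,2), are not handled here.
    """
    fields = []
    for row in text.split(row_delimiter):
        row = row.strip()
        if not row:
            continue  # skip empty rows, for example after a trailing delimiter
        index, name, column_type, map_type, comment = row.split(column_delimiter)
        fields.append(
            {
                "index": int(index),
                "name": name,
                "type": column_type,
                "mapType": map_type,
                "comment": comment,
            }
        )
    return fields


text = "0,id,int(10),Long,comment1\n1,user_name,varchar(255),String,comment2"
fields = text_to_fields(text)
print(json.dumps(fields, indent=2))
```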