Configure the Databricks input component

Updated: 2026-06-02 18:35:11

The Databricks input component reads data from a Databricks data source. To synchronize Databricks data to other data sources, configure the Databricks input component as the source, then configure the target data source.

Prerequisites

Procedure

  1. In the top navigation bar of the Dataphin homepage, choose Develop > Data Integration.

  2. In the top navigation bar, select a project. In Dev-Prod mode, also select an environment.

  3. In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the target batch pipeline to open its configuration page.

  4. In the upper-right corner, click Component Library to open the Component Library panel.

  5. In the Component Library panel, select Input, find Databricks, and drag it to the canvas.

  6. Click the icon on the Databricks component to open the Databricks Input Configuration dialog box.

  7. In the Databricks Input Configuration dialog box, configure the parameters.

    Parameter

    Description

    Step Name

    The component name. Dataphin auto-generates a name that you can modify. Naming rules:

    • Can contain only Chinese characters, letters, digits, and underscores (_).

    • Maximum 64 characters.
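The naming rules above can be expressed as a simple validation check. This is a minimal sketch, not Dataphin's own validator: it assumes "letters" means ASCII letters, approximates "Chinese characters" with the CJK Unified Ideographs block (U+4E00 to U+9FFF), and uses a hypothetical helper named `is_valid_step_name`.

```python
import re

# Assumed character set: CJK Unified Ideographs, ASCII letters, digits, underscore.
# Length 1 to 64; Dataphin's real validation may accept a different character set.
_STEP_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if name satisfies the assumed step-name rules."""
    return _STEP_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_step_name("Databricks_input_1"))  # True
print(is_valid_step_name("读取_orders"))          # True
print(is_valid_step_name("bad-name"))            # False: hyphen is not allowed
print(is_valid_step_name("a" * 65))              # False: longer than 64 characters
```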

    Datasource

    Lists all Databricks data sources and project-level data sources in the current Dataphin instance, including those for which you do not have read-through permission. Click the copy icon to copy the data source name.

    To request read-through permission for a data source, click Request next to it. For more information, see Request data source permissions.

    If no Databricks data source exists, click Create Data Source. For more information, see Create a Databricks data source.

    Time Zone

    The time zone used to process time-formatted data. Defaults to the time zone of the selected data source and cannot be modified.

    Note

    For tasks created before V5.1.2, you can select either Data Source Default Configuration or Channel Configuration Time Zone. The default is Channel Configuration Time Zone.

    • Data Source Default Configuration: uses the time zone of the selected data source.

    • Channel Configuration Time Zone: uses the time zone set in Properties > Channel Configuration for the current task.
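The following sketch models how the effective time zone could be chosen and then applied to a UTC timestamp. It is illustrative only: `effective_time_zone`, the `created_before_v512` flag, the option constants, and the sample UTC+8 offset are assumptions, and fixed UTC offsets stand in for named time zones.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical labels mirroring the two options shown for pre-V5.1.2 tasks.
DATA_SOURCE_DEFAULT = "Data Source Default Configuration"
CHANNEL_CONFIG = "Channel Configuration Time Zone"


def effective_time_zone(created_before_v512, data_source_tz, channel_tz,
                        option=CHANNEL_CONFIG):
    """Pick the time zone used to process time-formatted values."""
    if not created_before_v512:
        # Newer tasks: always the selected data source's time zone.
        return data_source_tz
    # Older tasks: the selected option decides; the default is the channel setting.
    return data_source_tz if option == DATA_SOURCE_DEFAULT else channel_tz


source_tz = timezone(timedelta(hours=8))  # example: data source configured as UTC+8
channel_tz = timezone.utc                 # example: Channel Configuration set to UTC
instant = datetime(2026, 6, 2, 10, 35, tzinfo=timezone.utc)

tz = effective_time_zone(False, source_tz, channel_tz)
print(instant.astimezone(tz).isoformat())  # 2026-06-02T18:35:00+08:00
```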

    Schema (optional)

    Select the schema that contains the table to read. If you leave this empty, the schema configured in the data source is used.

    When a project is selected as the data source, the schema is automatically determined by the project.
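A minimal sketch of the schema fallback described above, assuming a hypothetical helper `qualified_table_name` that returns a backtick-quoted `schema.table` name. It does not model catalogs, and it assumes names contain no backticks.

```python
from typing import Optional


def qualified_table_name(table: str,
                         schema: Optional[str],
                         data_source_schema: str,
                         project_schema: Optional[str] = None) -> str:
    """Return a backtick-quoted schema.table name (assumes no backticks in names)."""
    if project_schema is not None:
        # A project is selected as the data source: the project determines the schema.
        resolved = project_schema
    elif schema:
        # Schema explicitly selected in the component.
        resolved = schema
    else:
        # Schema left empty: use the schema configured in the data source.
        resolved = data_source_schema
    # Backticks quote identifiers in Databricks SQL.
    return f"`{resolved}`.`{table}`"


print(qualified_table_name("orders", None, "default"))               # `default`.`orders`
print(qualified_table_name("orders", "sales", "default"))            # `sales`.`orders`
print(qualified_table_name("orders", "sales", "default", "proj_a"))  # `proj_a`.`orders`
```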

    Table

    Select the table to read. Search by keyword, or enter the exact table name and click Exact Match. After you select a table, the system automatically validates it. Click the copy icon to copy the table name.

    Shard Key (optional)

    Splits the source data by the specified column so that it can be read concurrently. Use this parameter together with the concurrency setting. For best performance, use a primary key or an indexed column as the shard key.

    Important

    For date and time columns, the system computes the shards from the total time range and the concurrency, so rows are not guaranteed to be evenly distributed across shards.
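To see why date and time shards can be uneven, consider a range-based split. This sketch assumes shard boundaries are derived only from the column's value range and the concurrency; Dataphin's actual splitting logic may differ, and `split_int_range`, `split_time_range`, and `to_predicate` are hypothetical helpers.

```python
from datetime import datetime


def split_int_range(low, high, concurrency):
    """Split the closed range [low, high] into at most `concurrency`
    half-open ranges [start, end) that cover every value exactly once."""
    size, extra = divmod(high - low + 1, concurrency)
    ranges, start = [], low
    for i in range(concurrency):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty ranges when there are fewer values than readers
            ranges.append((start, end))
        start = end
    return ranges


def split_time_range(start, end, concurrency):
    """Split [start, end) into `concurrency` ranges of equal duration.
    Durations are equal; the number of rows in each range need not be."""
    step = (end - start) / concurrency
    edges = [start + step * i for i in range(concurrency)] + [end]
    return list(zip(edges[:-1], edges[1:]))


def to_predicate(column, lo, hi):
    # Each concurrent reader would add one such condition to its own query.
    return f"{column} >= {lo} AND {column} < {hi}"


# Prints four predicates, from "id >= 1 AND id < 26" to "id >= 76 AND id < 101".
for lo, hi in split_int_range(1, 100, 4):
    print(to_predicate("id", lo, hi))

# Skewed timestamps: 97 of 100 rows fall on the last day of the range.
rows = [datetime(2026, 1, d) for d in (1, 2, 3)] + [datetime(2026, 1, 4, 12)] * 97
shards = split_time_range(datetime(2026, 1, 1), datetime(2026, 1, 5), 4)
counts = [sum(lo <= r < hi for r in rows) for lo, hi in shards]
print(counts)  # [1, 1, 1, 97]
```

Equal-width time ranges keep the split cheap to compute, but as the counts show, one reader can end up with almost all of the work when data is concentrated in a narrow period.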

    Batch Read Count (optional)

    The number of records to read per batch, for example, 1,024. Reading in batches reduces the number of interactions with the data source, improves I/O efficiency, and lowers network latency.
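The effect of Batch Read Count can be illustrated with Python's DB-API `fetchmany`. This sketch uses an in-memory SQLite database as a stand-in for Databricks, and `BATCH_READ_COUNT` and `read_in_batches` are hypothetical names. With a batch size of 1,024, reading 5,000 rows produces 5 batches rather than 5,000 single-row reads; with a networked driver, fewer fetches usually means fewer round trips.

```python
import sqlite3

BATCH_READ_COUNT = 1024  # hypothetical name for the Batch Read Count setting


def read_in_batches(conn, query, batch_size=BATCH_READ_COUNT):
    """Yield lists of up to batch_size rows; each yield is one fetch."""
    cursor = conn.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty list means the result set is exhausted
            break
        yield batch


# In-memory SQLite database standing in for the Databricks source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(5000)])

sizes = [len(b) for b in read_in_batches(conn, "SELECT id FROM orders")]
print(sizes)  # [1024, 1024, 1024, 1024, 904]
```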

    Input Filter (optional)

    A Databricks-compatible condition expression to filter source data.

    Note

    • Enter only the condition after WHERE. Do not include the WHERE keyword.

    • You can use system global variables, such as the data timestamp ${bizdate}.
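The following sketch shows how a filter condition and a system variable such as ${bizdate} might be combined into the final query. It is an assumption for illustration: `build_query` is a hypothetical helper, variable substitution uses Python's `string.Template`, and the `20260601` value for bizdate is only an example. Because the component adds the WHERE keyword itself, a filter that already starts with WHERE is rejected.

```python
import re
from string import Template


def build_query(table, columns, input_filter, variables):
    """Hypothetical sketch: append an Input Filter condition after WHERE."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if not input_filter or not input_filter.strip():
        return sql  # no filter: read all rows
    if re.match(r"\s*where\b", input_filter, re.IGNORECASE):
        # The component supplies WHERE itself; "WHERE WHERE ..." would be invalid SQL.
        raise ValueError("Enter only the condition after WHERE")
    # Replace ${name} placeholders such as ${bizdate}; a missing variable raises KeyError.
    condition = Template(input_filter).substitute(variables)
    return f"{sql} WHERE {condition}"


q = build_query("sales.orders", ["id", "amount"],
                "ds = '${bizdate}' AND amount > 0",
                {"bizdate": "20260601"})
print(q)
# SELECT id, amount FROM sales.orders WHERE ds = '20260601' AND amount > 0
```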

    Output Fields

    Displays all fields matching the selected table and filter conditions. Remove fields you do not want to pass to downstream components.

    Note

    The data source table does not support hierarchical classification.

    • Delete a single field: Click the delete icon in the Operation column.

    • Delete multiple fields in batches: Click Field Management. In the Field Management dialog box, select the fields to remove, click the left-arrow icon to move them to the unselected list, and then click OK.
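A minimal sketch of what removing output fields does to the rows passed downstream. The field names, the `removed` set, and the `project` helper are hypothetical.

```python
def project(rows, output_fields):
    """Keep only output_fields, in their listed order, from each row dict."""
    return [{name: row[name] for name in output_fields} for row in rows]


all_fields = ["id", "amount", "ds", "internal_note"]
removed = {"internal_note"}  # for example, a field moved to the unselected list
output_fields = [f for f in all_fields if f not in removed]

rows = [{"id": 1, "amount": 9.5, "ds": "20260601", "internal_note": "x"}]
print(project(rows, output_fields))
# [{'id': 1, 'amount': 9.5, 'ds': '20260601'}]
```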

  8. Click OK to save the Databricks input component configuration.
