Configure an OSS input component

Last updated: 2026-06-23 13:16:43

The OSS input component reads data from OSS data sources. To synchronize data from an OSS data source to other data sources, configure the OSS input component as the source first, and then configure the destination data source.

Prerequisites

  • An OSS data source is created. For more information, see Create an OSS data source.

  • The account used to configure the OSS input component properties has the read-through permission on the data source. If you do not have the permission, request it. For more information, see Request permissions on a data source.

Procedure

  1. In the top navigation bar of the Dataphin homepage, choose Develop > Data Integration.

  2. In the top navigation bar of the integration page, select a project. If the project uses Dev-Prod mode, also select an environment.

  3. In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the batch pipeline that you want to develop to open its configuration page.

  4. Click Component Library in the upper-right corner of the page to open the Component Library panel.

  5. In the left-side navigation pane of the Component Library panel, select Inputs. Find the OSS component in the input component list on the right and drag it to the canvas.

  6. Click the icon on the OSS input component card to open the OSS Input Configuration dialog box.

  7. In the OSS Input Configuration dialog box, configure the following parameters.

    Parameter

    Description

    Step Name

    The name of the OSS input component. Dataphin automatically generates a step name. You can modify the name based on your business scenario. The name must meet the following requirements:

    • The name can contain only Chinese characters, letters, underscores (_), and digits.

    • The name cannot exceed 64 characters in length.

    Data Source

    Select an OSS data source configured in Dataphin that meets the following conditions:

    • The data source type is OSS Data Source.

    • The account used to configure the properties has the read-through permission on the data source. If you do not have the permission, request it. For more information, see Request permissions on a data source.

    You can also click Create next to Data Source to go to the planning module and add a data source. For more information, see Create an OSS data source.

    Object Prefix

    The name of the OSS object from which to read data. You can specify multiple object names. For example, if a bucket contains a data folder with the phin.txt file, set the Object Prefix to data/phin.txt to synchronize a specific file. To synchronize all files in a folder, use a wildcard character, such as data/*.

    File Type

    The system supports reading files in Text, CSV, xls, and xlsx formats. The remaining parameters depend on the selected file type and are configured in the next step.

    Output Fields

    The fields that the component outputs. You can add output fields manually in either of the following ways:

    • Click Batch Add.

      • Configure in JSON format, for example:

        [{"index": 0, "name": "user_id", "type": "String"},
         {"index": 1, "name": "user_name", "type": "String"}]
        Note

        index is the zero-based position of the column in the source file, name is the field name after import, and type is the field type after import. For example, "index":3,"name":"user_id","type":"String" imports the fourth column of the file as a field named user_id of type String.

      • Configure in TEXT format, for example:

        0,user_id,String
        1,user_name,String
        • The row delimiter separates the entries for different fields. The default value is a line feed (\n). Line feeds (\n), semicolons (;), and periods (.) are supported.

        • The column delimiter separates the index, field name, and field type within an entry. The default value is a comma (,).

    • Click Create Output Field, enter Source Index and Column, and select a Type as prompted. For Text and CSV files, Source Index is required and must be the zero-based numeric index of the column that contains the field.

    You can also perform the following operations on added fields:

    • Drag the icon next to a field to change its position.

    • Click the edit icon in the Actions column to edit an existing field.

    • Click the delete icon in the Actions column to delete an existing field.
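    The JSON and TEXT formats of Batch Add describe the same field list. The following Python sketch is only an illustration, not Dataphin's implementation: the helper names parse_text_fields and map_row are hypothetical, type conversion is ignored, and it assumes each TEXT entry has the form index,name,type as in the example above. It converts TEXT entries into the JSON shape and shows how a zero-based index selects a column from one source row.

```python
import json

def parse_text_fields(text, row_delim="\n", col_delim=","):
    """Hypothetical helper: turn TEXT-format entries (index,name,type)
    into the list-of-dicts shape used by the JSON format."""
    fields = []
    for entry in text.split(row_delim):
        entry = entry.strip()
        if not entry:
            continue  # skip blank entries, e.g. after a trailing delimiter
        index, name, field_type = entry.split(col_delim)
        fields.append({"index": int(index), "name": name, "type": field_type})
    return fields

def map_row(row, fields):
    """Hypothetical helper: pick the zero-based columns listed in `fields`."""
    return {f["name"]: row[f["index"]] for f in fields}

json_fields = json.loads(
    '[{"index": 0, "name": "user_id", "type": "String"},'
    ' {"index": 1, "name": "user_name", "type": "String"}]'
)
text_fields = parse_text_fields("0,user_id,String\n1,user_name,String")
print(text_fields == json_fields)  # True
print(map_row(["u001", "alice", "extra"], text_fields))
# {'user_id': 'u001', 'user_name': 'alice'}
```

    Columns that no field references (here, the third one) are simply not output.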

  8. Configure the parameters specific to the selected File Type.

    Text and CSV formats

    Parameter

    Description

    Column Delimiter

    The column delimiter of the file. Defaults to a comma (,).

    Row Delimiter

    The row delimiter of the file. Defaults to a line feed (\n).

    File Encoding

    The encoding format of the source file. Supported values: UTF-8 and GBK.

    Null Value

    The string that represents a null value. Source values that match this string are converted to null.

    Compression Format

    The compression format of the files. Leave this parameter empty (default) if the files are not compressed. Supported formats:

    • zip

    • gzip

    • bzip2

    • lzo

    • lzo_deflate

    First Row Content Type

    The content type of the first row. Supported values: Data Content or Column Name.
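    To make the Text and CSV parameters concrete, the following Python sketch shows how they could shape the rows that are read. It is an assumption-laden illustration, not Dataphin's reader: read_text_file is a hypothetical helper, only gzip decompression is sketched, CSV quoting and escaping are ignored, and it assumes that selecting Column Name for First Row Content Type means the first row is not read as data.

```python
import gzip

def read_text_file(raw, col_delim=",", row_delim="\n", encoding="utf-8",
                   null_value=None, compression=None, first_row_is_header=False):
    """Hypothetical helper: show how the Text/CSV parameters shape the rows read.

    Only gzip is decompressed here, and CSV quoting/escaping is ignored.
    """
    if compression == "gzip":
        raw = gzip.decompress(raw)
    elif compression is not None:
        raise ValueError(f"compression {compression!r} is not covered by this sketch")
    text = raw.decode(encoding)  # File Encoding, e.g. "utf-8" or "gbk"
    # Split into rows, dropping empty ones such as the one after a trailing delimiter.
    rows = [line.split(col_delim) for line in text.split(row_delim) if line]
    if first_row_is_header:
        rows = rows[1:]  # First Row Content Type = Column Name: not read as data
    # Null Value: values equal to the configured string become None.
    return [[None if value == null_value else value for value in row] for row in rows]

data = "id,name\n1,alice\n2,NULL\n".encode("utf-8")
print(read_text_file(gzip.compress(data), compression="gzip",
                     null_value="NULL", first_row_is_header=True))
# [['1', 'alice'], ['2', None]]
```

    "NULL" is only an example; any string configured as the Null Value is matched the same way.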

    Xls and xlsx formats

    Parameter

    Description

    Sheet Selection

    You can select sheets to read by name or index. If you want to read multiple sheets, make sure that they have the same data format.

    • By Name: Enter the Sheet Name of each sheet to read.

    • By Index: Enter the Sheet Index of each sheet to read. Indexes start from 0.

    Data Content Start Row

    Specify the starting row of the data content. The default value is 1, which means data starts from the first row. To skip the first N rows, set this value to N+1.

    Data Content End Row

    Specify the ending row of the data content. If not specified, the system reads to the last row that contains data.

    Export Sheet Name

    Specifies whether to export the name of the sheet that each row of data comes from. The exported content is {sheet name}.

    File Encoding

    The system supports UTF-8 and GBK encoding.

    Compression Format

    The system supports zip, gzip, bzip2, lzo, and lzo_deflate compression formats.

    Null Value Conversion

    You can specify any string to be converted to a Null value.
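    Note that sheet indexes start from 0 while data rows start from 1. The following Python sketch illustrates how Sheet Selection, Data Content Start Row, Data Content End Row, and Export Sheet Name could interact. It is not Dataphin's implementation: select_rows is a hypothetical helper, the workbook is modeled as an in-memory dict of sheet name to rows instead of a real xls/xlsx file, and it assumes that the end row is inclusive and that the exported sheet name is appended as an extra column.

```python
def select_rows(sheets, names=None, indexes=None, start_row=1, end_row=None,
                export_sheet_name=False):
    """Hypothetical helper. `sheets` maps sheet name -> list of rows, in
    workbook order. Sheet indexes are 0-based; start_row/end_row are
    1-based and inclusive (an assumption)."""
    if names is None and indexes is None:
        raise ValueError("select sheets by name or by index")
    all_names = list(sheets)  # dicts keep insertion (workbook) order
    chosen = names if names is not None else [all_names[i] for i in indexes]
    result = []
    for name in chosen:
        rows = sheets[name]
        stop = len(rows) if end_row is None else end_row  # default: last data row
        for row in rows[start_row - 1:stop]:
            # Optionally append the source sheet name as an extra column.
            result.append(list(row) + [name] if export_sheet_name else list(row))
    return result

workbook = {
    "2025": [["id", "amount"], ["1", "10"], ["2", "20"]],
    "2026": [["id", "amount"], ["3", "30"]],
}
# Skip the header row (N = 1, so start at row N+1 = 2) in sheets 0 and 1.
print(select_rows(workbook, indexes=[0, 1], start_row=2, export_sheet_name=True))
# [['1', '10', '2025'], ['2', '20', '2025'], ['3', '30', '2026']]
```

    Because both sheets share the same layout, their rows can be read together, which is why multiple selected sheets must have the same data format.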

  9. Click OK to complete the property configuration of the OSS input component.

What to do next

After you configure the input component, configure downstream components to complete data synchronization. For more information, see Development description of the integration component library.
