Add ID Column

The Add ID Column component processes data tables. It inserts a unique ID column as the first column of a table and assigns a serial number to each row, which makes individual rows easier to identify and manage.

Algorithm description

The algorithm supports tables of up to 1,000,000,000 rows × 1,023 columns.

Configure the component

Method 1: Use the GUI

Add the Add ID Column component to the Designer workflow. Configure the component parameters in the pane on the right.

| Parameter type | Parameter | Description |
| --- | --- | --- |
| Parameters Setting | All Selected by Default | By default, all columns are selected. Extra columns do not affect the prediction result. |
| Parameters Setting | Serial number | The name of the ID column. The default value is append_id. |
| Execution Tuning | Number of computing cores | The number of cores. |
| Execution Tuning | Memory per core | The memory size of each core, in MB. The value must be in the range of (1, 65536). |

Method 2: Use PAI commands

Configure the Add ID Column component parameters using PAI commands. You can run PAI commands in the SQL script component. For more information, see SQL Script.

PAI -name AppendId
    -project algo_public
    -DinputTableName=maple_test_appendid_basic_input
    -DoutputTableName=maple_test_appendid_basic_output;

| Parameter | Required | Default value | Description |
| --- | --- | --- | --- |
| inputTableName | Yes | None | The name of the input table. |
| selectedColNames | No | All columns | The columns in the input table that are used for training. Separate column names with commas (,). Columns of the INT and DOUBLE types are supported. If the input is in sparse format, STRING columns are also supported. |
| inputTablePartitions | No | All partitions | The partitions in the input table that are used for training. The following formats are supported: Partition_name=value, or name1=value1/name2=value2 (multi-level format). Note: If you specify multiple partitions, separate them with commas (,). |
| outputTableName | Yes | None | The output table. |
| IDColName | No | append_id | The name of the ID column. |
| lifecycle | No | None | The output table lifecycle. |
| coreNum | No | System allocated | The number of cores. |
| memSizePerCore | No | System allocated | The memory size of each core, in MB. The value must be in the range of (1, 65536). |
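
The parameters listed above can also be assembled into a command programmatically. The following Python sketch is illustrative only: `build_append_id_command` and `format_partitions` are hypothetical helper names, not part of any PAI SDK, and the code only produces command text that you would still run yourself in an SQL Script component. It emits only the parameters documented in this topic and omits any optional parameter that is not set, so the documented defaults apply.

```python
from typing import Dict, List, Optional, Sequence


def format_partitions(partitions: Sequence[Dict[str, str]]) -> str:
    """Render partition specs in the format described in this topic.

    Each dict is one partition: its items become name=value pairs joined
    with "/" (multi-level format). Multiple partitions are joined with ",".
    """
    return ",".join(
        "/".join(f"{name}={value}" for name, value in spec.items())
        for spec in partitions
    )


def build_append_id_command(
    input_table: str,
    output_table: str,
    id_col_name: Optional[str] = None,
    selected_col_names: Optional[Sequence[str]] = None,
    input_partitions: Optional[Sequence[Dict[str, str]]] = None,
    lifecycle: Optional[int] = None,
    core_num: Optional[int] = None,
    mem_size_per_core: Optional[int] = None,
) -> str:
    """Build the text of a PAI AppendId command (hypothetical helper)."""
    if not input_table or not output_table:
        raise ValueError("inputTableName and outputTableName are required")

    # Required parameters always come first.
    args: List[str] = [
        f"-DinputTableName={input_table}",
        f"-DoutputTableName={output_table}",
    ]
    # Optional parameters are added only when the caller sets them.
    if id_col_name is not None:
        args.append(f"-DIDColName={id_col_name}")
    if selected_col_names:
        args.append("-DselectedColNames=" + ",".join(selected_col_names))
    if input_partitions:
        args.append("-DinputTablePartitions=" + format_partitions(input_partitions))
    if lifecycle is not None:
        args.append(f"-Dlifecycle={lifecycle}")
    if core_num is not None:
        args.append(f"-DcoreNum={core_num}")
    if mem_size_per_core is not None:
        args.append(f"-DmemSizePerCore={mem_size_per_core}")

    lines = ["PAI -name AppendId", "    -project algo_public"]
    lines += ["    " + arg for arg in args]
    return "\n".join(lines) + ";"
```

Calling `build_append_id_command("maple_test_appendid_basic_input", "maple_test_appendid_basic_output")` yields the same command text as the sample shown earlier in this method. Passing, for example, `id_col_name="row_id"` adds `-DIDColName=row_id`. The name `row_id` is made up for illustration.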

Example

PAI -name AppendId
    -project algo_public
    -DinputTableName=maple_test_appendid_basic_input
    -DoutputTableName=maple_test_appendid_basic_output;
  • Data generation

    | col0 | col1 | col2 | col3 | col4 |
    | --- | --- | --- | --- | --- |
    | 10 | 0.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
    | 11 | 1.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | false |
    | 12 | 2.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
    | 13 | 3.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
    | 14 | 4.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |

  • Output table

    | append_id | col0 | col1 | col2 | col3 | col4 |
    | --- | --- | --- | --- | --- | --- |
    | 0 | 10 | 0.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
    | 1 | 11 | 1.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | false |
    | 2 | 12 | 2.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
    | 3 | 13 | 3.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
    | 4 | 14 | 4.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
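
To make the transformation in this example concrete, the following Python sketch reproduces it on in-memory rows. It is a conceptual model only, not the component's implementation. `append_id` is a hypothetical function, and numbering rows from 0 in input order is an assumption read off the output table above. This topic does not state how the component orders rows of a large distributed table, or what it does when the ID column name already exists.

```python
from typing import Any, Dict, List


def append_id(rows: List[Dict[str, Any]], id_col_name: str = "append_id") -> List[Dict[str, Any]]:
    """Return new rows with a serial-number column inserted as the first column."""
    # Assumption of this sketch: a name clash is rejected rather than overwritten.
    if any(id_col_name in row for row in rows):
        raise ValueError(f"column {id_col_name!r} already exists in the input")
    # Dicts keep insertion order, so listing the ID key first makes it the
    # first column. enumerate() numbers the rows 0, 1, 2, ... in input order.
    return [{id_col_name: i, **row} for i, row in enumerate(rows)]


# The five input rows from the example, with col3 kept as the displayed string.
TIMESTAMP = "Thu Oct 01 00:00:00 CST 2015"
input_rows = [
    {"col0": 10, "col1": 0.0, "col2": "aaaa", "col3": TIMESTAMP, "col4": True},
    {"col0": 11, "col1": 1.0, "col2": "aaaa", "col3": TIMESTAMP, "col4": False},
    {"col0": 12, "col1": 2.0, "col2": "aaaa", "col3": TIMESTAMP, "col4": True},
    {"col0": 13, "col1": 3.0, "col2": "aaaa", "col3": TIMESTAMP, "col4": True},
    {"col0": 14, "col1": 4.0, "col2": "aaaa", "col3": TIMESTAMP, "col4": True},
]

output_rows = append_id(input_rows)
for row in output_rows:
    print(row)
```

Each printed row starts with the `append_id` key, followed by the original columns unchanged, which matches the layout of the output table. The input rows themselves are not modified.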