Feature Scaling - Platform for AI (PAI) - Alibaba Cloud Help Center

Overview

The feature scaling component can:

Apply one of the following scaling functions: log2, log10, ln, abs, or sqrt.
Process data in both dense and sparse formats.
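To show what each scaling function does to a dense column, here is a minimal Python sketch that maps the function names to Python's `math` module. The names `SCALE_FUNCTIONS` and `scale_values` are hypothetical, and this is not PAI's implementation. This topic does not document how PAI handles values outside a function's domain, such as log2 of 0 or sqrt of a negative number, so the sketch assumes they become `None`.

```python
import math
from typing import Callable, Dict, List, Optional

# Hypothetical mapping from the function names in this topic to Python's
# math functions. This is an illustration, not PAI's implementation.
SCALE_FUNCTIONS: Dict[str, Callable[[float], float]] = {
    "log2": math.log2,
    "log10": math.log10,
    "ln": math.log,      # natural logarithm
    "abs": abs,
    "sqrt": math.sqrt,
}


def scale_values(values: List[float], method: str = "log2") -> List[Optional[float]]:
    """Apply one scaling function to every value of a dense column.

    Assumption: values outside the function's domain map to None here,
    because this topic does not document PAI's behavior for them.
    """
    func = SCALE_FUNCTIONS[method]
    result: List[Optional[float]] = []
    for v in values:
        try:
            result.append(float(func(v)))
        except ValueError:  # math raises ValueError for domain errors
            result.append(None)
    return result


print(scale_values([1, 8, 1024], "log2"))  # [0.0, 3.0, 10.0]
print(scale_values([-4, 9], "sqrt"))       # [None, 3.0]
```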

Component configuration

You can configure the feature scaling component in one of the following ways.

Method 1: Use the console

On the pipeline page of Designer, configure the component parameters on the following tabs.

Tab	Parameter	Description
Fields setting	Scaled features	The features to scale.
	Label column	If you specify this parameter, you can view the x-y distribution histogram of features against the target variable.
	Is k:v,k:v sparse feature	Specifies whether the training data is sparse. Sparse data is typically stored in a single field instead of multiple fields.
	Keep original transformed features	Specifies whether to keep the original features. If this option is selected, new scaled features are created with the scale_ prefix.
Parameters setting	Scaling function	The scaling function to apply. Valid values: log2, log10, ln, abs, and sqrt.
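The Keep original transformed features option controls where the scaled values are written. The following minimal sketch, which assumes log2 scaling, illustrates both modes on a single row: with the option selected, the original column is kept and a `scale_`-prefixed column is added; without it, the original column is overwritten, as the Results section of the example below shows. The function `scale_row` and the `age` column in the sample row are hypothetical.

```python
import math
from typing import Dict, List


def scale_row(row: Dict[str, float], cols: List[str],
              keep_original: bool) -> Dict[str, float]:
    """Apply log2 to the selected columns of one row (illustrative sketch).

    keep_original=True  -> add scale_<col> and keep <col> unchanged.
    keep_original=False -> overwrite <col> with its scaled value.
    Assumption: only the documented scale_ naming is mirrored here; this is
    not PAI's implementation.
    """
    out = dict(row)  # copy so the input row is not modified
    for col in cols:
        scaled = math.log2(row[col])
        if keep_original:
            out["scale_" + col] = scaled
        else:
            out[col] = scaled
    return out


row = {"nr_employed": 4096.0, "age": 30.0}  # hypothetical sample row
print(scale_row(row, ["nr_employed"], keep_original=True))
# {'nr_employed': 4096.0, 'age': 30.0, 'scale_nr_employed': 12.0}
print(scale_row(row, ["nr_employed"], keep_original=False))
# {'nr_employed': 12.0, 'age': 30.0}
```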

Method 2: Use the CLI

Configure the component parameters by running a PAI command. You can use the SQL Script component to run the command.

PAI -name fe_scale_runner -project algo_public
    -Dlifecycle=28
    -DscaleMethod=log2
    -DscaleCols=nr_employed
    -DinputTable=pai_dense_10_1
    -DoutputTable=pai_temp_2262_20380_1;
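If you generate this command for many tables, you can assemble it from a dictionary of parameters. The following minimal sketch reproduces the command above. The helper `build_fe_scale_command` is hypothetical; only the parameter names, the valid scaleMethod values, and the `-D<name>=<value>` layout are taken from this topic. The sketch checks only inputTable and outputTable as required parameters.

```python
from typing import Dict

# Valid scaleMethod values listed in the parameter table of this topic.
VALID_METHODS = {"log2", "log10", "ln", "abs", "sqrt"}


def build_fe_scale_command(params: Dict[str, object]) -> str:
    """Assemble a fe_scale_runner command from -D parameters (hypothetical helper)."""
    for name in ("inputTable", "outputTable"):
        if name not in params:
            raise ValueError(f"missing required parameter: {name}")
    method = params.get("scaleMethod", "log2")  # log2 is the documented default
    if method not in VALID_METHODS:
        raise ValueError(f"unsupported scaleMethod: {method}")
    lines = ["PAI -name fe_scale_runner -project algo_public"]
    for name, value in params.items():  # dicts keep insertion order
        lines.append(f"    -D{name}={value}")
    return "\n".join(lines) + ";"


cmd = build_fe_scale_command({
    "lifecycle": 28,
    "scaleMethod": "log2",
    "scaleCols": "nr_employed",
    "inputTable": "pai_dense_10_1",
    "outputTable": "pai_temp_2262_20380_1",
})
print(cmd)
```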

Parameter	Required	Description	Default
inputTable	Yes	The name of the input table.	N/A
inputTablePartitions	No	The partitions of the input table to process. Specify a partition in the format `Partition_name=value`. For multi-level partitions, use the format `name1=value1/name2=value2`. Separate multiple partitions with commas (,).	All partitions of the input table.
outputTable	Yes	The output table for the scaled results.	N/A
scaleCols	Yes	The features to scale. The component automatically filters out sparse features. Only numeric features can be selected.	N/A
labelCol	No	The label column. If you specify this parameter, you can view the x-y distribution histogram of features against the target variable.	N/A
categoryCols	No	The columns to be treated as categorical features. These columns are not scaled.	""
scaleMethod	No	The scaling method. Valid values: log2, log10, ln, abs, and sqrt.	log2
scaleTopN	No	If scaleCols is not specified, the system automatically selects the top N features to scale.	10
isSparse	No	Specifies whether the features are sparse and stored in the k:v format.	dense (not sparse)
itemSpliter	No	The delimiter for items in a sparse feature.	,
kvSpliter	No	The delimiter between a key and a value in a sparse feature item.	:
lifecycle	No	The lifecycle of the output table, in days.	7
coreNum	No	The number of nodes. The value must be a positive integer in the range of [1, 9999]. Use this parameter together with the memSizePerCore parameter.	Automatically allocated.
memSizePerCore	No	The memory size of each core, in MB. The value must be a positive integer in the range of [2048, 65536].	Automatically allocated.
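For sparse input, itemSpliter separates the items of a feature and kvSpliter separates each key from its value. The following minimal sketch scales only the values and keeps the keys, using the documented defaults `,` and `:`. The function `scale_sparse` and its parameter names are hypothetical and only mirror itemSpliter and kvSpliter. This is not PAI's implementation, and skipping empty items is an assumption.

```python
import math
from typing import Callable


def scale_sparse(feature: str, func: Callable[[float], float] = math.log2,
                 item_spliter: str = ",", kv_spliter: str = ":") -> str:
    """Scale the values of one sparse k:v feature string, keeping the keys.

    item_spliter and kv_spliter mirror the itemSpliter and kvSpliter
    parameters. Assumption: empty items (for example, from a trailing
    delimiter) are skipped.
    """
    items = []
    for item in feature.split(item_spliter):
        if not item:
            continue
        key, value = item.split(kv_spliter, 1)  # split at the first delimiter only
        items.append(f"{key}{kv_spliter}{func(float(value))}")
    return item_spliter.join(items)


print(scale_sparse("a:8,b:1024"))                                 # a:3.0,b:10.0
print(scale_sparse("a=4;b=16", item_spliter=";", kv_spliter="="))  # a=2.0;b=4.0
```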

Examples

Input data

Use the following SQL script to generate the input data.

create table if not exists pai_dense_10_1 as
select
    nr_employed
from bank_data limit 10;

Parameter settings

Select nr_employed as the feature to scale. Only numeric features are supported. For Scaling Function, select log2.

Results

nr_employed
12.352071021075528
12.34313018339218
12.285286613666395
12.316026916036957
12.309533196497519
12.352071021075528
12.316026916036957
12.316026916036957
12.309533196497519
12.316026916036957
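Because log2 is invertible, you can sanity-check these results by raising 2 to each scaled value, which recovers the original nr_employed values up to floating-point rounding. All results lie between 12 and 13, so the original values lie between 2^12 = 4096 and 2^13 = 8192. A minimal sketch:

```python
import math

# log2-scaled nr_employed values copied from the Results table above.
scaled = [
    12.352071021075528, 12.34313018339218, 12.285286613666395,
    12.316026916036957, 12.309533196497519, 12.352071021075528,
    12.316026916036957, 12.316026916036957, 12.309533196497519,
    12.316026916036957,
]

# Undo the log2 scaling: x = 2 ** log2(x).
recovered = [2 ** v for v in scaled]
for v, x in zip(scaled, recovered):
    print(f"{v:.6f} -> {x:.1f}")
```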