Singular value decomposition

Singular value decomposition (SVD) is a matrix factorization technique in linear algebra that generalizes the diagonalization of normal matrices. It decomposes a matrix X into three components, X = U S V', where V' is the transpose of V. SVD is widely used in signal processing and statistics.

How it works

SVD factorizes an input matrix X (m rows × n columns) into three output matrices:

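| Output | Dimensions | Description |
|---|---|---|
| U | m × sgNum | Left singular vectors, one per column. The columns are orthonormal. |
| S | sgNum × sgNum | Diagonal matrix with the singular values on the diagonal. |
| V | n × sgNum | Right singular vectors, one per column. The columns are orthonormal. |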

Here, sgNum is the number of singular values actually computed (which may be less than the requested k), m is the number of rows in the input table, and n is the number of columns.
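
In matrix notation, the shapes above fit together as sketched below. This is the standard truncated SVD identity rather than anything specific to this component. The symbols σ_i for the singular values are notation introduced here, and the approximation becomes an equality when sgNum equals the rank of X.

```latex
X_{m \times n} \;\approx\; U_{m \times \mathrm{sgNum}} \; S_{\mathrm{sgNum} \times \mathrm{sgNum}} \; V_{n \times \mathrm{sgNum}}^{\top},
\qquad
S = \operatorname{diag}\!\left(\sigma_1, \dots, \sigma_{\mathrm{sgNum}}\right),
\quad \sigma_i \ge 0,
\qquad
U^{\top} U = V^{\top} V = I_{\mathrm{sgNum}} .
```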

Configure the component

Use the Machine Learning Platform for AI console

| Tab | Parameter | Description |
|---|---|---|
| Fields setting | Feature columns | Columns storing key-value pairs. Separate keys from values with a colon (:), and separate multiple key-value pairs with a comma (,). |
| Parameters setting | Number of reserved singular values | The top N singular values to compute. Computes all singular values by default. |
| | Accuracy error | The allowed error precision for convergence. |
| Tuning | Number of nodes | Number of compute nodes. Must be used with Memory size per node. Valid values: 1–9999 (positive integer). |
| | Memory size per node | Memory allocated to each node, in MB. Must be used with Number of nodes. Valid values: 1024–64 × 1024 (positive integer). |
| | Lifetime | Lifecycle of the output table, in days. |

Use commands

Submit the SVD job from the command line:

PAI -name svd
    -project algo_public
    -DinputTableName=bank_data
    -DselectedColNames=col0
    -DenableSparse=true
    -Dk=5
    -DoutputUTableName=u_table
    -DoutputVTableName=v_table
    -DoutputSTableName=s_table;

Input parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| inputTableName | Yes | | Input table used for training. |
| selectedColNames | No | All columns | Comma-separated list of columns to include. Use STRING columns for sparse input; use INT or DOUBLE columns for dense input. |
| inputTablePartitions | No | All partitions | Partitions to read from the input table. Format: partition_name=value. For multi-level partitions: name1=value1/name2=value2. Separate multiple partitions with commas. |
| enableSparse | No | false | Set to true if the input data is in sparse key-value format. |
| itemDelimiter | No | Space | Delimiter between key-value pairs in sparse format. |
| kvDelimiter | No | : | Delimiter between keys and values in sparse format. |
| k | Yes | | Number of singular values to compute. The actual number returned may be less than k. |
| tol | No | 1.0e-06 | Convergence error threshold. |
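
To make the sparse key-value format concrete, here is a minimal Python sketch of how one row could be split using the default itemDelimiter (space) and kvDelimiter (colon). The helper names parse_sparse_row and to_dense are hypothetical and are not part of the component. The sketch also assumes that keys are zero-based integer column indices, which the parameter descriptions do not state explicitly.

```python
def parse_sparse_row(row, item_delimiter=" ", kv_delimiter=":"):
    """Parse one sparse row such as '0:3.9 2:0.5' into {column_index: value}.

    Assumption: keys are integer column indices (hypothetical helper,
    not the component's own parser).
    """
    entries = {}
    for item in row.split(item_delimiter):
        if not item:  # skip empty pieces produced by repeated delimiters
            continue
        key, value = item.split(kv_delimiter, 1)
        entries[int(key)] = float(value)
    return entries


def to_dense(rows, n_cols=None):
    """Expand sparse rows into a dense list of lists, filling gaps with 0.0."""
    parsed = [parse_sparse_row(r) for r in rows]
    if n_cols is None:
        # Infer the width from the largest key seen, assuming zero-based keys.
        n_cols = max(max(p) for p in parsed if p) + 1
    return [[p.get(j, 0.0) for j in range(n_cols)] for p in parsed]
```

For example, to_dense(["0:1 2:3", "1:2"]) yields a 2 × 3 matrix in which every column not named in a row is zero.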

Output parameters

| Parameter | Required | Description |
|---|---|---|
| outputUTableName | Yes | Output table for the U matrix (m × sgNum). |
| outputSTableName | Yes | Output table for the S matrix (sgNum × sgNum). |
| outputVTableName | Yes | Output table for the V matrix (n × sgNum). |

Resource parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| coreNum | No | System default | Number of cores. Must be used with memSizePerCore. Valid values: 1–9999 (positive integer). |
| memSizePerCore | No | System default | Memory per core, in MB. Must be used with coreNum. Valid values: 1024–64 × 1024 (positive integer). |
| lifecycle | No | | Lifecycle of the output table, in days (positive integer). |

Example

This example runs SVD on a sparse input table with six rows whose keys span column indices 0 through 9, computing the top 5 singular values.

Step 1: Create the input table.

DROP TABLE IF EXISTS svd_test_input;
CREATE TABLE svd_test_input
AS
SELECT *
FROM
(
  SELECT '0:3.9079 2:0.0009 3:0.0416 4:0.17664 6:0.36460 8:0.091330' AS col0
  UNION ALL
  SELECT '0:0.09229 2:0.4872172 5:0.5267 8:0.4544 9:0.23317' AS col0
  UNION ALL
  SELECT '1:0.8312 3:0.9317 5:0.5680 7:0.5560 9:0.0508' AS col0
  UNION ALL
  SELECT '2:0.767 5:0.01891 8:0.25235' AS col0
  UNION ALL
  SELECT '0:0.29819 2:0.87598086 6:0.5315568' AS col0
  UNION ALL
  SELECT '0:0.920260 2:0.5154311513 4:0.8104 5:0.188420 8:0.88' AS col0
) a;

Step 2: Run the SVD job.

PAI -name svd
    -project algo_public
    -DinputTableName=svd_test_input
    -DselectedColNames=col0
    -DenableSparse=true
    -Dk=5
    -DoutputUTableName=u_table
    -DoutputVTableName=v_table
    -DoutputSTableName=s_table;
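
As a rough local sanity check on a job like this, the largest singular value of a small dense matrix can be estimated with power iteration. The sketch below is standard-library Python and is not the algorithm the component uses. The function name top_singular_value and the fixed iteration count are assumptions chosen for illustration.

```python
import math


def top_singular_value(matrix, iterations=500):
    """Estimate the largest singular value of a dense matrix (list of rows).

    Runs power iteration on X^T X. This is an illustrative sketch only;
    the iteration count is an arbitrary assumption, not a tuned value.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # unit-length starting vector
    for _ in range(iterations):
        # w = X v
        w = [sum(x * y for x, y in zip(row, v)) for row in matrix]
        # z = X^T w
        z = [sum(matrix[i][j] * w[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(c * c for c in z))
        if norm == 0.0:  # X v vanished, e.g. for an all-zero matrix
            return 0.0
        v = [c / norm for c in z]
    # For a unit vector v close to the top right singular vector, ||X v|| is sigma_1.
    w = [sum(x * y for x, y in zip(row, v)) for row in matrix]
    return math.sqrt(sum(c * c for c in w))
```

The estimate can then be compared with the largest value stored in the S output table. The layout of that table is not described here, so how to read the value back is left to the reader.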