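For an undirected graph G, the node clustering coefficient measures how densely the neighborhood of each node is connected. A star network has a density of 0, and a fully connected network has a density of 1. This topic describes the Node Clustering Coefficient component provided by PAI-Designer (formerly PAI-Studio).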
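
The two boundary cases above can be checked with a short sketch. It assumes the standard local clustering coefficient definition (links among a node's neighbors divided by the number of possible neighbor pairs); the function name `node_density` and the convention of returning 0.0 for nodes with fewer than two neighbors are illustrative assumptions, not the component's documented behavior.

```python
from itertools import combinations


def node_density(edges):
    """Return {node: density} for an undirected graph given as (u, v) pairs."""
    # Build an adjacency map, ignoring edge direction and self-loops.
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    density = {}
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            # No neighbor pairs exist; 0.0 is this sketch's own convention.
            density[node] = 0.0
            continue
        # Count the neighbor pairs that are themselves connected.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        density[node] = links / (k * (k - 1) / 2)
    return density


# Star network: center "c" with three leaves that are not linked to each other.
star = [("c", "a"), ("c", "b"), ("c", "d")]
# Fully connected network on four nodes.
complete = [("a", "b"), ("a", "c"), ("a", "d"),
            ("b", "c"), ("b", "d"), ("c", "d")]

star_center = node_density(star)["c"]
complete_all = set(node_density(complete).values())
print(star_center)   # 0.0
print(complete_all)  # {1.0}
```

In the star network none of the center's neighbors are linked, so its density is 0; in the fully connected network every neighbor pair is linked, so every node has a density of 1.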

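In PAI-Designer (formerly PAI-Studio), you can configure the parameters of the Node Clustering Coefficient component either visually or with a PAI command.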

Visual configuration

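| Tab | Parameter | Description |
| IO/Fields Setting | Start Vertex | The column of the edge table that holds the start vertex. |
| IO/Fields Setting | End Vertex | The column of the edge table that holds the end vertex. |
| Parameters Setting | Maximum Vertex Degree | If the degree of a node exceeds this value, sampling is applied. Optional. Default: 500. |
| Tuning | Number of Workers | The number of nodes on which the job runs in parallel. A larger value increases parallelism but also increases framework communication overhead. |
| Tuning | Memory per Worker | The maximum amount of memory a single job can use. The system allocates 4096 MB to each job by default. If actual usage exceeds this value, an OutOfMemory exception is thrown. |
| Tuning | Data Split Size | The size of each data split. Default: 64. |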

PAI command

PAI -name NodeDensity
    -project algo_public
    -DinputEdgeTableName=NodeDensity_func_test_edge
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id
    -DoutputTableName=NodeDensity_func_test_result
    -DmaxEdgeCnt=500;
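| Parameter | Required | Description | Default |
| inputEdgeTableName | | The name of the input edge table. | |
| inputEdgeTablePartitions | | The partitions of the input edge table. | The entire table is read |
| fromVertexCol | | The column of the input edge table that holds the start vertex. | |
| toVertexCol | | The column of the input edge table that holds the end vertex. | |
| outputTableName | | The name of the output table. | |
| outputTablePartitions | | The partitions of the output table. | |
| lifecycle | | The lifecycle of the output table. | |
| maxEdgeCnt | | If the degree of a node exceeds this value, sampling is applied. | 500 |
| workerNum | | The number of nodes on which the job runs in parallel. A larger value increases parallelism but also increases framework communication overhead. | Not set |
| workerMem | | The maximum amount of memory a single job can use. The system allocates 4096 MB to each job by default. If actual usage exceeds this value, an OutOfMemory exception is thrown. | 4096 |
| splitSize | | The size of each data split. | 64 |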

Example

  1. Generate the training data.
    drop table if exists NodeDensity_func_test_edge;
    create table NodeDensity_func_test_edge as
    select * from
    (
      select '1' as flow_out_id, '2' as flow_in_id from dual
      union all
      select '1' as flow_out_id, '3' as flow_in_id from dual
      union all
      select '1' as flow_out_id, '4' as flow_in_id from dual
      union all
      select '1' as flow_out_id, '5' as flow_in_id from dual
      union all
      select '1' as flow_out_id, '6' as flow_in_id from dual
      union all
      select '2' as flow_out_id, '3' as flow_in_id from dual
      union all
      select '3' as flow_out_id, '4' as flow_in_id from dual
      union all
      select '4' as flow_out_id, '5' as flow_in_id from dual
      union all
      select '5' as flow_out_id, '6' as flow_in_id from dual
      union all
      select '5' as flow_out_id, '7' as flow_in_id from dual
      union all
      select '6' as flow_out_id, '7' as flow_in_id from dual
    )tmp;
    drop table if exists NodeDensity_func_test_result;
    create table NodeDensity_func_test_result
    (
      node string,
      node_cnt bigint,
      edge_cnt bigint,
      density double,
      log_density double
    );
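    The corresponding graph structure is shown in the following figure.
    [Figure: graph structure for the node clustering coefficient example]
  2. View the training result.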
    +------+----------+----------+---------+-------------+
    | node | node_cnt | edge_cnt | density | log_density |
    +------+----------+----------+---------+-------------+
    | 1    | 5        | 4        | 0.4     | 1.45657     |
    | 2    | 2        | 1        | 1.0     | 1.24696     |
    | 3    | 3        | 2        | 0.66667 | 1.35204     |
    | 4    | 3        | 2        | 0.66667 | 1.35204     |
    | 5    | 4        | 3        | 0.5     | 1.41189     |
    | 6    | 3        | 2        | 0.66667 | 1.35204     |
    | 7    | 2        | 1        | 1.0     | 1.24696     |
    +------+----------+----------+---------+-------------+
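
The node_cnt, edge_cnt, and density columns above can be re-derived from the edge list. The sketch below is an independent check, not the component's implementation: it assumes node_cnt is the number of neighbors of a node, edge_cnt is the number of edges among those neighbors, and density = edge_cnt / (node_cnt * (node_cnt - 1) / 2), which matches every row of the table. The helper name `neighbor_stats` is hypothetical. The log_density column is not reproduced because its formula is not given in this topic.

```python
from itertools import combinations

# Edge list copied from NodeDensity_func_test_edge (flow_out_id, flow_in_id).
EDGES = [
    ("1", "2"), ("1", "3"), ("1", "4"), ("1", "5"), ("1", "6"),
    ("2", "3"), ("3", "4"), ("4", "5"), ("5", "6"), ("5", "7"), ("6", "7"),
]


def neighbor_stats(edges):
    """Return {node: (node_cnt, edge_cnt, density)} for an undirected edge list."""
    # Treat every edge as undirected.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    stats = {}
    for node in sorted(adj):
        nbrs = adj[node]
        node_cnt = len(nbrs)
        # Edges whose two endpoints are both neighbors of this node.
        edge_cnt = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        pairs = node_cnt * (node_cnt - 1) / 2
        # Round to five decimals to match the precision shown in the table.
        density = round(edge_cnt / pairs, 5) if pairs else 0.0
        stats[node] = (node_cnt, edge_cnt, density)
    return stats


stats = neighbor_stats(EDGES)
for node, (node_cnt, edge_cnt, density) in stats.items():
    print(node, node_cnt, edge_cnt, density)
```

For example, node 1 has five neighbors (2, 3, 4, 5, 6) with four edges among them (2-3, 3-4, 4-5, 5-6), so its density is 4 / 10 = 0.4.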