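The edge clustering coefficient algorithm computes, for each edge in an undirected graph G, how dense the neighborhood around that edge is. This topic describes the Edge Clustering Coefficient component provided by PAI-Studio.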

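PAI-Studio lets you configure the parameters of the Edge Clustering Coefficient component either in the visualized interface or by running a PAI command.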

Visualized method

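| Tab | Parameter | Description |
| --- | --- | --- |
| IO/Fields Setting | Start Node | The column in the edge table that contains the start vertex of each edge. |
| | End Node | The column in the edge table that contains the end vertex of each edge. |
| Execution Tuning | Number of Processes | The number of nodes that run the job in parallel. A larger value gives higher parallelism but also increases the framework communication overhead. |
| | Process Memory | The maximum amount of memory that a single job can use. By default, the system allocates 4096 MB to each job. If actual usage exceeds this value, an OutOfMemory exception is thrown. |
| | Data Split Size | The size of each data split. Default value: 64. |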

PAI command method

PAI -name EdgeDensity
    -project algo_public
    -DinputEdgeTableName=EdgeDensity_func_test_edge
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id
    -DoutputTableName=EdgeDensity_func_test_result;
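| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputEdgeTableName | | The name of the input edge table. | |
| inputEdgeTablePartitions | | The partitions of the input edge table. | The full table is read |
| fromVertexCol | | The column in the input edge table that contains the start vertex of each edge. | |
| toVertexCol | | The column in the input edge table that contains the end vertex of each edge. | |
| outputTableName | | The name of the output table. | |
| outputTablePartitions | | The partitions of the output table. | |
| lifecycle | | The lifecycle of the output table. | |
| workerNum | | The number of nodes that run the job in parallel. A larger value gives higher parallelism but also increases the framework communication overhead. | Not set |
| workerMem | | The maximum amount of memory that a single job can use. By default, the system allocates 4096 MB to each job. If actual usage exceeds this value, an OutOfMemory exception is thrown. | 4096 |
| splitSize | | The size of each data split. | 64 |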
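
As an illustration of how the optional parameters listed above attach to the command, the following Python sketch assembles the command text. The helper name `build_edge_density_command` is hypothetical and not part of PAI. It only formats a string using the `-D<name>=<value>` pattern seen in the command above, and it does not submit anything.

```python
# Optional parameter names taken from the parameter table above.
OPTIONAL_PARAMS = {
    "inputEdgeTablePartitions",
    "outputTablePartitions",
    "lifecycle",
    "workerNum",
    "workerMem",
    "splitSize",
}


def build_edge_density_command(input_table, from_col, to_col, output_table, **optional):
    """Return the text of a PAI EdgeDensity command (hypothetical helper)."""
    unknown = set(optional) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    params = {
        "inputEdgeTableName": input_table,
        "fromVertexCol": from_col,
        "toVertexCol": to_col,
        "outputTableName": output_table,
    }
    params.update(optional)

    lines = ["PAI -name EdgeDensity", "    -project algo_public"]
    lines += [f"    -D{name}={value}" for name, value in params.items()]
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    print(build_edge_density_command(
        "EdgeDensity_func_test_edge",
        "flow_out_id",
        "flow_in_id",
        "EdgeDensity_func_test_result",
        workerMem=4096,
        splitSize=64,
    ))
```

Rejecting unknown names up front is a design choice of this sketch: a misspelled parameter is caught before the command is run.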

Example

  1. Generate the training data.
    drop table if exists EdgeDensity_func_test_edge;
    create table EdgeDensity_func_test_edge as
    select * from
    (
      select '1' as flow_out_id,'2' as flow_in_id from dual
      union all
      select '1' as flow_out_id,'3' as flow_in_id from dual
      union all
      select '1' as flow_out_id,'5' as flow_in_id from dual
      union all
      select '1' as flow_out_id,'7' as flow_in_id from dual
      union all
      select '2' as flow_out_id,'5' as flow_in_id from dual
      union all
      select '2' as flow_out_id,'4' as flow_in_id from dual
      union all
      select '2' as flow_out_id,'3' as flow_in_id from dual
      union all
      select '3' as flow_out_id,'5' as flow_in_id from dual
      union all
      select '3' as flow_out_id,'4' as flow_in_id from dual
      union all
      select '4' as flow_out_id,'5' as flow_in_id from dual
      union all
      select '4' as flow_out_id,'8' as flow_in_id from dual
      union all
      select '5' as flow_out_id,'6' as flow_in_id from dual
      union all
      select '5' as flow_out_id,'7' as flow_in_id from dual
      union all
      select '5' as flow_out_id,'8' as flow_in_id from dual
      union all
      select '7' as flow_out_id,'6' as flow_in_id from dual
      union all
      select '6' as flow_out_id,'8' as flow_in_id from dual
    )tmp;
    drop table if exists EdgeDensity_func_test_result;
    create table EdgeDensity_func_test_result
    (
      node1 string,
      node2 string,
      node1_edge_cnt bigint,
      node2_edge_cnt bigint,
      triangle_cnt bigint,
      density double
    );
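    The following figure shows the corresponding graph structure. [Figure: graph structure for the edge clustering coefficient example]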
  2. View the training result.
    1,2,4,4,2,0.5
    2,3,4,4,3,0.75
    2,5,4,7,3,0.75
    3,1,4,4,2,0.5
    3,4,4,4,2,0.5
    4,2,4,4,2,0.5
    4,5,4,7,3,0.75
    5,1,7,4,3,0.75
    5,3,7,4,3,0.75
    5,6,7,3,2,0.66667
    5,8,7,3,2,0.66667
    6,7,3,3,1,0.33333
    7,1,3,4,1,0.33333
    7,5,3,7,2,0.66667
    8,4,3,4,1,0.33333
    8,6,3,3,1,0.33333
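
The rows above can be cross-checked locally with a short Python sketch. It rests on an assumption inferred from the sample output, not on a documented formula: `triangle_cnt` is the number of common neighbors of the two endpoints, and `density` is `triangle_cnt` divided by the smaller of the two endpoint degrees. The function name `edge_density` is hypothetical and is not the component's implementation.

```python
from collections import defaultdict

# Edge list copied from the EdgeDensity_func_test_edge table in step 1.
EDGES = [
    ("1", "2"), ("1", "3"), ("1", "5"), ("1", "7"),
    ("2", "5"), ("2", "4"), ("2", "3"),
    ("3", "5"), ("3", "4"),
    ("4", "5"), ("4", "8"),
    ("5", "6"), ("5", "7"), ("5", "8"),
    ("7", "6"), ("6", "8"),
]


def edge_density(edges):
    """Map each edge (u, v) to (deg_u, deg_v, triangle_cnt, density)."""
    # Build an undirected adjacency structure.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    result = {}
    for u, v in edges:
        deg_u, deg_v = len(neighbors[u]), len(neighbors[v])
        # Each common neighbor closes one triangle through the edge (u, v).
        triangles = len(neighbors[u] & neighbors[v])
        # Assumption inferred from the sample output: divide by the smaller degree.
        density = triangles / min(deg_u, deg_v)
        result[(u, v)] = (deg_u, deg_v, triangles, density)
    return result


if __name__ == "__main__":
    for (u, v), (deg_u, deg_v, tri, dens) in sorted(edge_density(EDGES).items()):
        print(f"{u},{v},{deg_u},{deg_v},{tri},{round(dens, 5)}")
```

The sketch keeps each edge in the orientation of the input table, so some rows appear with the endpoints swapped relative to the component output (for example `1,3` here versus `3,1` above). The degree, triangle, and density values should still match after accounting for the swap.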