模块度(Modularity)是一种评估社区网络结构的指标,用来评估社区内部连接相对于社区之间连接的紧密程度,通常模块度为0.3以上表示社区划分质量较为合适。Modularity组件能够输出图的模块度。
配置组件
方法一:可视化方式
在Designer工作流页面添加Modularity组件,并在界面右侧配置相关参数:
参数类型  | 参数  | 描述  | 
字段设置  | 源顶点列  | 边表的起点所在列。  | 
起始点标签列  | 边表起点的群组。  | |
目标顶点列  | 边表的终点所在列。  | |
目标点标签列  | 边表终点的群组。  | |
执行调优  | 进程数  | 作业并行执行的节点数。数字越大并行度越高,但是框架通讯开销会增大。  | 
进程内存  | 单个作业可使用的最大内存量,单位:MB,默认值为4096。 如果实际使用内存超过该值,会抛出  | 
方法二:PAI命令方式
使用PAI命令配置Modularity组件参数。您可以使用SQL脚本组件进行PAI命令调用,详情请参见场景4:在SQL脚本组件中执行PAI命令。
PAI -name Modularity
    -project algo_public
    -DinputEdgeTableName=Modularity_func_test_edge
    -DfromVertexCol=flow_out_id
    -DfromGroupCol=group_out_id
    -DtoVertexCol=flow_in_id
    -DtoGroupCol=group_in_id
    -DoutputTableName=Modularity_func_test_result;参数  | 是否必选  | 默认值  | 描述  | 
inputEdgeTableName  | 是  | 无  | 输入边表名。  | 
inputEdgeTablePartitions  | 否  | 全表读入  | 输入边表的分区。  | 
fromVertexCol  | 是  | 无  | 输入边表的起点所在列。  | 
fromGroupCol  | 是  | 无  | 输入边表起点的群组。  | 
toVertexCol  | 是  | 无  | 输入边表的终点所在列。  | 
toGroupCol  | 是  | 无  | 输入边表终点的群组。  | 
outputTableName  | 是  | 无  | 输出表名。  | 
outputTablePartitions  | 否  | 无  | 输出表的分区。  | 
lifecycle  | 否  | 无  | 输出表的生命周期。  | 
workerNum  | 否  | 未设置  | 作业并行执行的节点数。数字越大并行度越高,但是框架通讯开销会增大。  | 
workerMem  | 否  | 4096  | 单个worker可使用的最大内存量,单位:MB,默认值为4096。 如果实际使用内存超过该值,会抛出  | 
splitSize  | 否  | 64  | 数据切分的大小,单位:MB。  | 
使用示例
步骤中SQL脚本组件均去勾选使用Script模式和是否由系统添加Create Table语句。
添加SQL脚本组件,输入以下SQL语句生成训练数据。
drop table if exists Modularity_func_test_edge; create table Modularity_func_test_edge as select * from ( select '1' as flow_out_id,'3' as group_out_id,'2' as flow_in_id,'3' as group_in_id union all select '1' as flow_out_id,'3' as group_out_id,'3' as flow_in_id,'3' as group_in_id union all select '1' as flow_out_id,'3' as group_out_id,'4' as flow_in_id,'3' as group_in_id union all select '2' as flow_out_id,'3' as group_out_id,'3' as flow_in_id,'3' as group_in_id union all select '2' as flow_out_id,'3' as group_out_id,'4' as flow_in_id,'3' as group_in_id union all select '3' as flow_out_id,'3' as group_out_id,'4' as flow_in_id,'3' as group_in_id union all select '4' as flow_out_id,'3' as group_out_id,'6' as flow_in_id,'7' as group_in_id union all select '5' as flow_out_id,'7' as group_out_id,'6' as flow_in_id,'7' as group_in_id union all select '5' as flow_out_id,'7' as group_out_id,'7' as flow_in_id,'7' as group_in_id union all select '5' as flow_out_id,'7' as group_out_id,'8' as flow_in_id,'7' as group_in_id union all select '6' as flow_out_id,'7' as group_out_id,'7' as flow_in_id,'7' as group_in_id union all select '6' as flow_out_id,'7' as group_out_id,'8' as flow_in_id,'7' as group_in_id union all select '7' as flow_out_id,'7' as group_out_id,'8' as flow_in_id,'7' as group_in_id )tmp ;对应的数据结构图:

添加SQL脚本组件,输入以下PAI命令进行训练,并和步骤 1中添加的组件连线。
drop table if exists ${o1}; PAI -name Modularity -project algo_public -DinputEdgeTableName=Modularity_func_test_edge -DfromVertexCol=flow_out_id -DfromGroupCol=group_out_id -DtoVertexCol=flow_in_id -DtoGroupCol=group_in_id -DoutputTableName=${o1};运行此工作流。运行完成后右击步骤 2中添加的组件,选择查看数据 > SQL脚本的输出,查看训练结果。
| val | | ------------------- | | 0.42307692766189575 |