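Feature configuration
In the rank stage, the scoring model service has to be called. Before that call, the user and item feature data must be fetched from a feature storage source. In some cases the fetched features need further processing, such as feature engineering: generating new features from existing ones, or combining existing features. This processing is implemented by FeatureOp.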
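Hologres
The following example walks through the configuration in detail.
In the configuration below, live_feed is the scene name. FeatureConfs supports configuring multiple scenes; this example configures only one.
FeatureLoadConfs defines how features are fetched. It is a list, so multiple feature loading steps can be configured. Each step consists of fetch logic and feature processing logic: FeatureDaoConf defines the fetch logic, and Features defines the feature transformation logic.
FeatureDaoConf specifies where the features are stored and how they are fetched.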
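AdapterType: data source type. Currently supported: hologres, redis, tablestore
HologresName: name of the configured Hologres instance; the name can be found in HologresConfs
FeatureKey: the value used to look up feature data in the table. FeatureKey indicates which field of user or item the lookup value comes from. For example, user:uid takes the uid attribute of user, and item:pair_id takes the pair_id attribute of item
UserFeatureKeyName: the column name in the table; the value of this column is matched against FeatureKey (the item steps in the example use ItemFeatureKeyName in the same way)
HologresTableName: table name
UserSelectFields: the fields to select for user features
ItemSelectFields: the list of fields to select for item features; * selects all columns of the table
FeatureStore: where the fetched features are stored. Both user and item have a Properties field, which is a map used to hold feature data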
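Fetching user features can be understood as the following query, where ${FeatureKey} stands for the value that FeatureKey resolves to:
SELECT ${UserSelectFields} FROM ${HologresTableName} WHERE ${UserFeatureKeyName} = ${FeatureKey}
UserSelectFields and ItemSelectFields are determined by the features in the training samples: before calling the model, the same features as in the sample data must be constructed.
Features are fetched using data that is already available, usually the uid or the item id, written as user:uid and item:id respectively. To look up by any other field, that field must exist in Properties. In the example, item:pair_id and item:matchmaker_id work because pair_id and matchmaker_id are stored in the item's Properties.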
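The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the engine's actual implementation: the dict-based user/item model (an "id" entry plus a "Properties" map) and the function names resolve_feature_key and build_user_query are hypothetical, and the %s placeholder assumes a driver that binds the resolved value as a query parameter.

```python
from typing import Any, Dict


def resolve_feature_key(feature_key: str, user: Dict[str, Any], item: Dict[str, Any]) -> Any:
    """Resolve a FeatureKey such as 'user:uid' or 'item:pair_id' to a value.

    Assumed (hypothetical) data model: each object is a dict with an 'id'
    entry and a 'Properties' map.
    """
    source, _, field = feature_key.partition(":")
    obj = {"user": user, "item": item}[source]
    # 'uid' and 'id' refer to the object's own identifier; any other field
    # has to be present in Properties already.
    if field in ("uid", "id"):
        return obj["id"]
    return obj["Properties"][field]


def build_user_query(dao_conf: Dict[str, str]) -> str:
    """Build the SELECT for one user load step, with the key as a bind parameter."""
    return "SELECT {fields} FROM {table} WHERE {key_column} = %s".format(
        fields=dao_conf["UserSelectFields"],
        table=dao_conf["HologresTableName"],
        key_column=dao_conf["UserFeatureKeyName"],
    )


if __name__ == "__main__":
    conf = {
        "FeatureKey": "user:uid",
        "UserFeatureKeyName": "uid",
        "HologresTableName": "recom_user_features_processed_holo_online",
        "UserSelectFields": "sex,age",
    }
    user = {"id": "10001", "Properties": {}}
    print(build_user_query(conf), resolve_feature_key(conf["FeatureKey"], user, {}))
```

Binding the resolved key as a parameter, rather than formatting it into the SQL string, is a design choice of this sketch.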
When AsynLoadFeature = true, the steps in FeatureLoadConfs are executed asynchronously and concurrently, which reduces response time (RT). If the steps in FeatureLoadConfs are independent of each other, setting AsynLoadFeature to true improves fetch performance.
"FeatureConfs": {
  "live_feed": {
    "AsynLoadFeature": true,
    "FeatureLoadConfs": [
      {
        "FeatureDaoConf": {
          "AdapterType": "hologres",
          "HologresName": "holo-pai",
          "FeatureKey": "user:uid",
          "UserFeatureKeyName": "uid",
          "HologresTableName": "recom_user_features_processed_holo_online",
          "UserSelectFields": "rids_count,sex,alladdfriendnum,allpayrosenum,getgiftnum7d,friendnum7d,talknum7d,start_age,end_age,start_height,end_height,lowest_education,lowest_salary,height,wealth,age,living_condition,education,headstatus,marriage,professionid,provinceid,role,salary,socialtag,facevalue",
          "FeatureStore": "user"
        },
        "Features": []
      },
      {
        "FeatureDaoConf": {
          "AdapterType": "hologres",
          "HologresName": "holo-pai",
          "ItemFeatureKeyName": "uid",
          "FeatureKey": "item:pair_id",
          "HologresTableName": "recom_user_features_processed_holo_online",
          "ItemSelectFields": "uid, rids_count as rids2_count,sex as guestsex,alladdfriendnum as alladdfriendnum2,allpayrosenum as allpayrosenum2, getgiftnum7d as getgiftnum7d2,friendnum7d as friendnum7d2,talknum7d as talknum7d2,start_age as start_age2,end_age as end_age2,start_height as start_height2,end_height as end_height2,lowest_education as lowest_education2,lowest_salary as lowest_salary2,height as height2,wealth as wealth2,age as age2,living_condition as living_condition2,education as education2,headstatus as headstatus2,marriage as marriage2,professionid as professionid2,provinceid as provinceid2,role as role2,salary as salary2,socialtag as socialtag2,facevalue as facevalue2",
          "FeatureStore": "item"
        },
        "Features": []
      },
      {
        "FeatureDaoConf": {
          "AdapterType": "hologres",
          "HologresName": "holo-pai",
          "ItemFeatureKeyName": "cupid_id",
          "FeatureKey": "item:matchmaker_id",
          "HologresTableName": "recom_red_features_processed_holo_online",
          "ItemSelectFields": "cupid_id, cupid_id as redid,sex as redsex,role as role1,good_num as good_num1,mid_num as mid_num1,bad_num as bad_num1,total_access as total_access1,duration as duration1,jubaohongniangshu as jubaohongniangshu1,jubaohongniangzongshu as jubaohongniangzongshu1",
          "FeatureStore": "item"
        },
        "Features": []
      }
    ]
  }
}
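The effect of AsynLoadFeature, set to true in the configuration above, can be illustrated with a small sketch: independent load steps run in a thread pool instead of one after another. Everything here is hypothetical (the make_loader and load_features names, the fake sleeping loaders, and merging all results into a single map); it only shows why concurrent execution shortens total latency when the steps do not depend on each other.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Loader = Callable[[], Dict[str, object]]


def make_loader(features: Dict[str, object], delay: float) -> Loader:
    """Build a fake load step that sleeps to imitate a storage round trip."""
    def load() -> Dict[str, object]:
        time.sleep(delay)
        return dict(features)
    return load


def load_features(loaders: List[Loader], asyn_load_feature: bool) -> Dict[str, object]:
    """Run every load step and merge the returned features into one map."""
    if asyn_load_feature and loaders:
        # Independent steps run concurrently; pool.map keeps the input order.
        with ThreadPoolExecutor(max_workers=len(loaders)) as pool:
            results = list(pool.map(lambda load: load(), loaders))
    else:
        results = [load() for load in loaders]
    merged: Dict[str, object] = {}
    for features in results:
        merged.update(features)
    return merged


if __name__ == "__main__":
    steps = [make_loader({"age": 30}, 0.1), make_loader({"sex": 1}, 0.1)]
    for flag in (False, True):
        start = time.perf_counter()
        load_features(steps, flag)
        print(f"AsynLoadFeature={flag}: {time.perf_counter() - start:.2f}s")
```

With sequential execution the total time is roughly the sum of the step latencies; with concurrent execution it approaches the latency of the slowest step.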
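Note: when fetching item features, the first field in ItemSelectFields must be the field configured in ItemFeatureKeyName. item is a list, and when data is fetched concurrently, the returned rows are matched back to the items through ItemFeatureKeyName.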
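Why the first selected field has to be the key can be seen in a minimal sketch of the matching step. The function attach_item_features and the dict-based item model are hypothetical, and the sketch assumes the key column is used only for matching while the remaining columns are written into Properties.

```python
from typing import Any, Dict, List, Tuple


def attach_item_features(
    items: List[Dict[str, Any]],
    key_field: str,
    columns: List[str],
    rows: List[Tuple[Any, ...]],
) -> None:
    """Write each fetched row into the Properties of the matching item.

    columns lists the selected field names in ItemSelectFields order, so
    columns[0] / row[0] is the ItemFeatureKeyName column.
    """
    # Rows may come back in any order, so index them by the key column.
    rows_by_key = {row[0]: row for row in rows}
    for item in items:
        row = rows_by_key.get(item["Properties"].get(key_field))
        if row is None:
            continue  # no feature row was found for this item
        for name, value in zip(columns[1:], row[1:]):
            item["Properties"][name] = value


if __name__ == "__main__":
    items = [
        {"id": "i1", "Properties": {"pair_id": "p1"}},
        {"id": "i2", "Properties": {"pair_id": "p2"}},
    ]
    attach_item_features(items, "pair_id", ["uid", "age2"], [("p2", 31), ("p1", 25)])
    print(items)
```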
After the features are fetched, they can be processed further, for example to generate new features or to preprocess existing ones.
"Features": [
  {
    "FeatureType": "raw_feature",
    "FeatureName": "article_id",
    "FeatureSource": "item:id",
    "FeatureStore": "item"
  },
  {
    "FeatureType": "raw_feature",
    "FeatureName": "item_elapse_time",
    "FeatureSource": "item:item_ctime",
    "Normalizer": "time_ln",
    "RemoveFeatureSource": true,
    "FeatureStore": "item"
  }
]
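FeatureType: name of the FeatureOp. raw_feature generates a new feature directly from a raw feature
FeatureName: name of the newly generated feature
FeatureSource: the raw (source) feature
FeatureStore: the object in which the new feature is stored, user or item
Normalizer: optional; name of the normalization operation. Currently only time_ln is available
RemoveFeatureSource: optional; whether to delete the raw feature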
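As a rough sketch of how a raw_feature entry might be applied, the following Python reads the source value, optionally normalizes it, writes it under FeatureName, and optionally removes the source. The dict-based object model and apply_raw_feature are hypothetical. The definition of time_ln is not given in this document; the version below (natural log of the seconds elapsed since the source timestamp, plus one) is purely an assumption for illustration.

```python
import math
import time
from typing import Any, Dict, Optional


def time_ln(value: Any, now: Optional[float] = None) -> float:
    """ASSUMED normalizer: ln(elapsed seconds since `value` + 1)."""
    now = time.time() if now is None else now
    return math.log(max(now - float(value), 0.0) + 1.0)


NORMALIZERS = {"time_ln": time_ln}


def apply_raw_feature(
    conf: Dict[str, Any],
    objects: Dict[str, Dict[str, Any]],
    now: Optional[float] = None,
) -> None:
    """Apply one Features entry; objects maps 'user' / 'item' to dicts."""
    source_name, _, source_field = conf["FeatureSource"].partition(":")
    source = objects[source_name]
    # 'id' / 'uid' mean the object's identifier; other fields live in Properties.
    if source_field in ("id", "uid"):
        value = source["id"]
    else:
        value = source["Properties"][source_field]
    normalizer = conf.get("Normalizer")
    if normalizer:
        value = NORMALIZERS[normalizer](value, now)
    objects[conf["FeatureStore"]]["Properties"][conf["FeatureName"]] = value
    if conf.get("RemoveFeatureSource"):
        source["Properties"].pop(source_field, None)


if __name__ == "__main__":
    item = {"id": "a1", "Properties": {"item_ctime": time.time() - 3600}}
    apply_raw_feature(
        {"FeatureType": "raw_feature", "FeatureName": "item_elapse_time",
         "FeatureSource": "item:item_ctime", "Normalizer": "time_ln",
         "RemoveFeatureSource": True, "FeatureStore": "item"},
        {"item": item},
    )
    print(item["Properties"])
```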
Redis
Next, a Redis example.
Features are stored in Redis as key-value pairs, and the value has the form "key1:value1,key2:value2".
AdapterType: the data source, here redis
RedisName: name of the Redis data source; the name can be found in RedisConfs
RedisPrefix: key prefix
FeatureKey: the value used to construct the Redis key. FeatureKey indicates which field of user or item the value comes from. For example, user:uid takes the uid attribute of user, and item:id takes the id attribute of item
FeatureStore: where the fetched features are stored. Both user and item have a Properties field, which is a map used to hold feature data
"FeatureConfs": {
  "home_feed": {
    "FeatureLoadConfs": [
      {
        "FeatureDaoConf": {
          "AdapterType": "redis",
          "RedisName": "user_redis",
          "RedisPrefix": "UF_V2_",
          "FeatureKey": "user:uid",
          "FeatureStore": "user"
        },
        "Features": []
      },
      {
        "FeatureDaoConf": {
          "AdapterType": "redis",
          "RedisName": "item_redis",
          "RedisPrefix": "IF_V2_FM_",
          "FeatureKey": "item:id",
          "FeatureStore": "item"
        },
        "Features": [
          {
            "FeatureType": "raw_feature",
            "FeatureName": "article_id",
            "FeatureSource": "item:id",
            "FeatureStore": "item"
          },
          {
            "FeatureType": "raw_feature",
            "FeatureName": "item_elapse_time",
            "FeatureSource": "item:item_ctime",
            "Normalizer": "time_ln",
            "RemoveFeatureSource": true,
            "FeatureStore": "item"
          }
        ]
      }
    ]
  }
}
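The example above also includes transformation operations under Features.
FeatureType: name of the FeatureOp. raw_feature generates a new feature directly from a raw feature
FeatureName: name of the newly generated feature
FeatureSource: the raw (source) feature
FeatureStore: the object in which the new feature is stored, user or item
Normalizer: optional; name of the normalization operation
RemoveFeatureSource: optional; whether to delete the raw feature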
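The Redis layout used in this section (a key derived from RedisPrefix and the FeatureKey value, and a value of the form "key1:value1,key2:value2") can be sketched as follows. The helper names are hypothetical, and two details are assumptions: the key is taken to be the prefix directly followed by the resolved value, and parsed values are kept as strings.

```python
from typing import Dict


def build_redis_key(redis_prefix: str, key_value: str) -> str:
    """ASSUMPTION: the Redis key is RedisPrefix followed by the resolved value."""
    return f"{redis_prefix}{key_value}"


def parse_feature_value(raw: str) -> Dict[str, str]:
    """Parse 'key1:value1,key2:value2' into a feature map (values stay strings)."""
    features: Dict[str, str] = {}
    for pair in raw.split(","):
        name, separator, value = pair.partition(":")
        if not separator:
            continue  # skip empty or malformed segments
        features[name.strip()] = value.strip()
    return features


if __name__ == "__main__":
    print(build_redis_key("UF_V2_", "10001"))
    print(parse_feature_value("age:25,sex:1"))
```

The parsed map would then be merged into the Properties of the object named by FeatureStore.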
OTS (tablestore)
The OTS configuration is similar to the Hologres one. A sample configuration:
"FeatureConfs": {
  "home_feed": {
    "FeatureLoadConfs": [
      {
        "FeatureDaoConf": {
          "AdapterType": "tablestore",
          "TableStoreName": "",
          "FeatureKey": "user:uid",
          "UserFeatureKeyName": "uid",
          "TableStoreTableName": "",
          "UserSelectFields": "",
          "FeatureStore": "user"
        },
        "Features": []
      },
      {
        "FeatureDaoConf": {
          "AdapterType": "tablestore",
          "TableStoreName": "",
          "FeatureKey": "item:id",
          "ItemFeatureKeyName": "item_id",
          "TableStoreTableName": "",
          "ItemSelectFields": "",
          "FeatureStore": "item"
        },
        "Features": []
      }
    ]
  }
}
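AdapterType: data source type, fixed to tablestore here
TableStoreName: name of the configured tablestore instance; the name can be found in TableStoreConfs
FeatureKey: the value used to look up feature data in the table. FeatureKey indicates which field of user or item the lookup value comes from
UserFeatureKeyName: the column name in the table; the value of this column is matched against FeatureKey
TableStoreTableName: table name
UserSelectFields: the list of fields to select for user features; * selects all columns of the table. Required when FeatureStore = user
ItemSelectFields: the list of fields to select for item features; * selects all columns of the table. Required when FeatureStore = item
FeatureStore: where the fetched features are stored. Both user and item have a Properties field, which is a map used to hold feature data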
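The conditional requirement above (UserSelectFields for FeatureStore = user, ItemSelectFields for FeatureStore = item) lends itself to a small pre-flight check. The following is a sketch only; validate_tablestore_dao_conf is a hypothetical helper, not part of the engine, and it checks just the rules stated in this list.

```python
from typing import Dict, List

SELECT_FIELDS_BY_STORE = {"user": "UserSelectFields", "item": "ItemSelectFields"}


def validate_tablestore_dao_conf(conf: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one tablestore FeatureDaoConf."""
    problems: List[str] = []
    if conf.get("AdapterType") != "tablestore":
        problems.append("AdapterType must be tablestore")
    store = conf.get("FeatureStore")
    if store not in SELECT_FIELDS_BY_STORE:
        problems.append("FeatureStore must be user or item")
    elif not conf.get(SELECT_FIELDS_BY_STORE[store]):
        # The select list that matches FeatureStore must be non-empty.
        problems.append(
            f"{SELECT_FIELDS_BY_STORE[store]} is required when FeatureStore = {store}"
        )
    return problems


if __name__ == "__main__":
    sample = {"AdapterType": "tablestore", "FeatureStore": "item", "ItemSelectFields": ""}
    print(validate_tablestore_dao_conf(sample))
```

Run against the sample configuration as written, with its empty select lists, the check reports a problem for each step, which is a reminder that the sample is a template to be filled in.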
FeatureStore (feature platform)
{
  "FeatureConfs": {
    "rank_v1": {
      "AsynLoadFeature": true,
      "FeatureLoadConfs": [
        {
          "FeatureDaoConf": {
            "AdapterType": "featurestore",
            "FeatureStoreName": "pairec-fs",
            "FeatureKey": "user:uid",
            "FeatureStoreModelName": "rank_v1",
            "FeatureStoreEntityName": "user",
            "FeatureStore": "user"
          }
        }
      ]
    }
  }
}
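AdapterType: data source type, fixed to featurestore here
FeatureStoreName: name of the configured featurestore instance; the name can be found in FeatureStoreConfs
FeatureKey: the value used to fetch feature data from the feature platform. It is used as the join_id value of the entity named by FeatureStoreEntityName when looking up data
FeatureStoreModelName: name of the model in the feature platform
FeatureStoreEntityName: name of the entity in the feature platform
FeatureStore: where the fetched features are stored. Both user and item have a Properties field, which is a map used to hold feature data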