You can use numeric rules to validate the numeric data in your datasets.
Configuration example
{
  "datasets": [
    {
      "type": "Table",
      "tables": [
        "tb_d_spec_demo"
      ],
      "filter": "dt='$[yyyymmdd]' AND hh='$[hh24-1/24]'",
      "dataSource": {
        "name": "odps_first",
        "envType": "Dev"
      }
    }
  ],
  "rules": [
    {
      "avg(size) between 100 and 300"
    }, {
      "duplicate_count(product_id) = 0"
    }, {
      "max(size) <= 500"
    }, {
      "min(size) >= 50"
    }, {
      "row_count > 0"
    }, {
      "sum(discount) < 120"
    }
  ],
  "computeResource": {
    "id": 2001
  }
}
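To make the intent of these rules concrete, the following Python sketch evaluates the same six checks against a few in-memory rows. It is only an illustration, not how the service itself runs the rules. The sample rows are hypothetical, `between` is assumed to be inclusive on both ends, and `duplicate_count` is assumed to count every row whose value occurs more than once.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample rows standing in for tb_d_spec_demo; not real data.
rows = [
    {"product_id": "p1", "size": 120, "discount": 30},
    {"product_id": "p2", "size": 250, "discount": 40},
    {"product_id": "p3", "size": 400, "discount": 20},
]

sizes = [r["size"] for r in rows]
id_counts = Counter(r["product_id"] for r in rows)

# Assumption: duplicate_count counts rows whose value occurs more than once.
duplicate_count = sum(n for n in id_counts.values() if n > 1)

# Each entry mirrors one rule from the configuration example.
# Assumption: "between" includes both bounds.
results = {
    "avg(size) between 100 and 300": 100 <= mean(sizes) <= 300,
    "duplicate_count(product_id) = 0": duplicate_count == 0,
    "max(size) <= 500": max(sizes) <= 500,
    "min(size) >= 50": min(sizes) >= 50,
    "row_count > 0": len(rows) > 0,
    "sum(discount) < 120": sum(r["discount"] for r in rows) < 120,
}

for rule, passed in results.items():
    print(f"{rule}: {'pass' if passed else 'fail'}")
```

With this sample data every rule passes. Changing a value, for example raising one `size` above 500, makes the corresponding check fail.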
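Problem data retention
For the four metrics duplicate_count, duplicate_percent, distinct_count, and distinct_percent, you can enable problem data retention. The following example enables it by setting collectFailedRows to true.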
{
  "datasets": [
    {
      "type": "Table",
      "tables": [
        "tb_d_spec_demo"
      ],
      "filter": "dt='$[yyyymmdd]' AND hh='$[hh24-1/24]'",
      "dataSource": {
        "name": "odps_first",
        "envType": "Dev"
      }
    }
  ],
  "rules": [
    {
      "duplicate_percent(number_employees) < 5%",
      "collectFailedRows": true
    }
  ],
  "computeResource": {
    "id": 2001
  }
}
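As a rough picture of what retained problem data could contain for a duplicate check, the Python sketch below collects the rows whose value in a column appears more than once. This is an assumption about which rows count as problem data, not a description of the service's actual output. The helper `duplicate_rows` and the sample rows are hypothetical.

```python
from collections import Counter

def duplicate_rows(rows, column):
    """Return the rows whose value in `column` appears more than once."""
    counts = Counter(r[column] for r in rows)
    return [r for r in rows if counts[r[column]] > 1]

# Hypothetical sample data, not taken from tb_d_spec_demo.
rows = [
    {"company": "a", "number_employees": 10},
    {"company": "b", "number_employees": 25},
    {"company": "c", "number_employees": 10},
    {"company": "d", "number_employees": 40},
]

# Rows that a duplicate check would flag under the assumption above.
failed = duplicate_rows(rows, "number_employees")

# Share of flagged rows, compared against the 5% threshold from the rule.
duplicate_percent = len(failed) / len(rows) * 100
rule_passed = duplicate_percent < 5
```

Here two of the four rows share the value 10, so the rule fails and those two rows are the ones that would be kept for inspection.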
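Metric list
avg: average value of the field
row_count: number of data rows
sum: sum of the field values
min: minimum value of the field
max: maximum value of the field
distinct_count: number of unique values
distinct_percent: number of unique values / number of data rows × 100%
table_size: storage size of the data
duplicate_count: number of data rows with duplicate values
duplicate_percent: number of data rows with duplicate values / number of data rows × 100%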
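The uniqueness metrics in the list above can be sketched in Python as follows. The function name `uniqueness_metrics` is hypothetical, and the sketch assumes that a row counts as a duplicate whenever its value occurs more than once in the column. The service may define this differently.

```python
from collections import Counter

def uniqueness_metrics(values):
    """Compute distinct and duplicate metrics for one column of values."""
    row_count = len(values)
    if row_count == 0:
        raise ValueError("percent metrics are undefined for an empty column")
    counts = Counter(values)
    distinct_count = len(counts)
    # Assumption: every row whose value occurs more than once is a duplicate row.
    duplicate_count = sum(n for n in counts.values() if n > 1)
    return {
        "row_count": row_count,
        "distinct_count": distinct_count,
        "distinct_percent": distinct_count / row_count * 100,
        "duplicate_count": duplicate_count,
        "duplicate_percent": duplicate_count / row_count * 100,
    }

metrics = uniqueness_metrics([1, 1, 2, 3])
```

For the column `[1, 1, 2, 3]` there are 4 rows and 3 unique values, so distinct_percent is 75. Two rows carry the repeated value 1, so under the assumption above duplicate_percent is 50.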