您可以使用空值规则来对数据中某个字段的空值情况进行校验。
配置示例
{
"datasets": [
{
"type": "Table",
"tables": [
"tb_d_spec_demo"
],
"filter": "dt=$[yyyymmdd]/hh=$[hh24-1/24]",
"dataSource": {
"name": "odps_first",
"envType": "Dev"
}
}
],
"rules": [
{
"missing_count(birthday) = 0"
}, {
"missing_percent(gender) < 5%"
}, {
"missing_count(first_name) = 0",
"missing" : {
"regex": "(?:N/A)"
}
}, {
"missing_count(first_name) = 0",
"missing" : {
"values": [
"n/a",
"NA",
"none"
]
}
}, {
"missing_percent(email_address) = 0%"
}
],
"computeResource": {
"id": 2001
}
}
定义空值规则
执行空值规则校验时,系统会扫描数据中的指定字段,统计出指定字段值为空的数据行数或者数据行数占比。
系统会将NULL值当作空值统计。
可以通过
missing_percent
来校验空值行数的占比。
示例一:校验birthday
字段为NULL
的数据。
{
"datasets": [
{
"type": "Table",
"tables": [
"tb_d_spec_demo"
],
"filter": "dt=$[yyyymmdd]/hh=$[hh24-1/24]",
"dataSource": {
"name": "odps_first",
"envType": "Dev"
}
}
],
"rules": [
{
"missing_count(birthday) = 0"
}
],
"computeResource": {
"id": 2001
}
}
示例二:定义number_employees
字段为NULL
的行数不能超过监控总行数的5%。
{
"datasets": [
{
"type": "Table",
"tables": [
"tb_d_spec_demo"
],
"filter": "dt=$[yyyymmdd]/hh=$[hh24-1/24]",
"dataSource": {
"name": "odps_first",
"envType": "Dev"
}
}
],
"rules": [
{
"missing_percent(number_employees) < 5%"
}
],
"computeResource": {
"id": 2001
}
}
保留问题数据
如果规则的collectFailedRows
设置为true
,则在校验不通过(状态为warn
或者fail
)时,会将指定字段为NULL的数据自动保存到问题数据表中供后续问题排查使用。
{
"datasets": [
{
"type": "Table",
"tables": [
"tb_d_spec_demo"
],
"filter": "dt=$[yyyymmdd]/hh=$[hh24-1/24]",
"dataSource": {
"name": "odps_first",
"envType": "Dev"
}
}
],
"rules": [
{
"missing_percent(number_employees) < 5%",
"collectFailedRows": true
}
],
"computeResource": {
"id": 2001
}
}
该文章对您有帮助吗?