Numeric rules

You can use numeric rules to validate the numeric values in your data.

Configuration example

{
  "datasets": [
    {
      "type": "Table",
      "tables": [
        "tb_d_spec_demo"
      ],
      "filter": "dt='$[yyyymmdd]' AND hh='$[hh24-1/24]'",
      "dataSource": {
        "name": "odps_first", 
        "envType": "Dev"
      }
    }
  ],
  "rules": [
    {
      "assertion": "avg(size) between 100 and 300"
    }, {
      "assertion": "duplicate_count(product_id) = 0"
    }, {
      "assertion": "max(size) <= 500"
    }, {
      "assertion": "min(size) >= 50"
    }, {
      "assertion": "row_count > 0"
    }, {
      "assertion": "sum(discount) < 120"
    }
  ],
  "computeResource": {
    "id": 2001
  }
}

Retaining problem data

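For the four metrics duplicate_count, duplicate_percent, distinct_count, and distinct_percent, you can enable retention of problem data by setting collectFailedRows to true on the rule.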

{
  "datasets": [
    {
      "type": "Table",
      "tables": [
        "tb_d_spec_demo"
      ],
      "filter": "dt='$[yyyymmdd]' AND hh='$[hh24-1/24]'",
      "dataSource": {
        "name": "odps_first", 
        "envType": "Dev"
      }
    }
  ],
  "rules": [
    {
      "assertion": "duplicate_percent(number_employees) < 5%",
      "collectFailedRows": true
    }
  ],
  "computeResource": {
    "id": 2001
  }
}

Metric list

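  • avg: average value of the field

  • row_count: number of data rows

  • sum: sum of the field values

  • min: minimum value of the field

  • max: maximum value of the field

  • distinct_count: number of distinct values

  • distinct_percent: number of distinct values / number of data rows × 100%

  • table_size: storage size of the data

  • duplicate_count: number of data rows with duplicated values

  • duplicate_percent: number of data rows with duplicated values / number of data rows × 100%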
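
The metric definitions above can be illustrated with a minimal Python sketch that computes them for a single in-memory column. This is not how the service evaluates rules. The function name numeric_metrics and the sample data are hypothetical, table_size is omitted because it depends on storage, and the sketch assumes that every row whose value occurs more than once counts as a duplicate row, which the list above does not spell out.

```python
from collections import Counter


def numeric_metrics(values):
    """Compute illustrative column metrics for a non-empty list of numbers."""
    if not values:
        raise ValueError("values must contain at least one row")

    row_count = len(values)
    total = sum(values)
    counts = Counter(values)  # value -> number of rows holding that value

    distinct_count = len(counts)
    # Assumption: all rows sharing a repeated value are counted as duplicates.
    duplicate_count = sum(n for n in counts.values() if n > 1)

    return {
        "row_count": row_count,
        "sum": total,
        "avg": total / row_count,
        "min": min(values),
        "max": max(values),
        "distinct_count": distinct_count,
        "distinct_percent": distinct_count * 100 / row_count,
        "duplicate_count": duplicate_count,
        "duplicate_percent": duplicate_count * 100 / row_count,
    }


# Hypothetical sample column; the value 100 appears in two rows.
sizes = [50, 100, 100, 200, 300]
metrics = numeric_metrics(sizes)

# Checks written in the spirit of the rules in the configuration example.
avg_ok = 100 <= metrics["avg"] <= 300
max_ok = metrics["max"] <= 500
min_ok = metrics["min"] >= 50
```

With this sample, a rule such as duplicate_count = 0 would fail under the stated assumption, because two rows share the value 100.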