Introduction to partition filter expression

更新时间:
复制 MD 格式

Dataphin Data Quality partition filter expressions control which partitions quality rules validate, based on the data production date. The system provides built-in expressions and supports custom expressions for flexible partition matching.

Usage recommendations

  • If you do not need to specify validation partitions and want to validate whichever partitions your code updates, use the code check trigger schedule - task update partition. No partition filter expression is required because the system automatically identifies updated partitions.

    Note: The task update partition pattern does not support triggering volatility validation rules that require specific partitions, such as partition size, partition row count, or field statistics.

  • For cases where the trigger date is unrestricted and you need to validate specific partitions on new arrivals or daily, use the business date ds='${yyyyMMdd}', execution date ds='$[yyyyMMdd]', or monthly ds='${yyyyMM}' partition directly.

  • To validate on specific dates such as the first or last day of the month or business days, enable conditional scheduling. The system evaluates public calendar conditions (month, week, day, business day, holiday, tag) and task type (timed or manual scheduling) to determine whether to run validation.

Format of partition filter expression

The syntax is:

column1=${FORMAT[OPERATE NUM DELTA]} and columnN=$[FORMAT[OPERATE NUM DELTA]];

Example:

ds=${yyyyMMdd-1M} and city='cn-hangzhou' and hour=${HH:mm:ss};

Format Description:

  • ${expression}:

    • Calculates based on the business date of data production. For example, data produced on 20220101 uses that date for calculation.

    • Basic expressions such as ${yyyy}, ${MM}, ${dd} can be combined. For example, ds=${yyyy}-${MM}-${dd} yields the partition condition ds=2022-01-01 for data produced on 20220101. OPERATE can be applied to each basic expression.

  • $[expression]:

    • Calculates based on the current execution date, suitable for near real-time tasks at the hour and minute level. For example, if the execution date is 20220101, the calculation uses that date.

    • Basic expressions such as $[yyyy], $[MM], $[dd] can be concatenated. For example, ds=$[yyyy]-$[MM]-$[dd] produces the partition condition ds=2022-01-01 for data produced on 20220101. OPERATE can be applied to each basic expression.

  • column1……columnN:

    • Partition fields. Multi-level partitions can be linked with "and" to create compound partition filter expressions. Multi-level "or" expressions are not supported.

  • FORMAT: The partition format string, which supports the following format strings:

    • yyyy: Year.

    • MM: Month.

    • dd: Day.

    • ld: Last day of the month.

    • HH: 24-hour format.

    • hh: 12-hour format.

    • mm: Minute.

    • ss: Second.

  • OPERATE: The operator, supporting the following:

    • +: Increase. For example, +1y means adding one year, +1M means adding one month.

    • -: Decrease. For example, -1y means subtracting one year, -1M means subtracting one month.

    • ~: Specify. For example, ~3M means specifying the third month, ~1q means specifying the last month of the first quarter.

    • %: Modulus. For example, %1m means resetting the month to zero, %1h means resetting the hour to zero.

  • NUM: The number used to control the quantity of DELTA.

  • DELTA: The identifier, supporting the following:

    Identifier

    Meaning

    Type

    Example

    y

    Year

    Number

    2009

    M

    Month in year

    Text & Number

    July & 07

    d

    Day in month

    Number

    10

    h

    Hour in am/pm (1-12)

    Number

    12

    H

    Hour in day (0-23)

    Number

    0

    m

    Minute in hour

    Number

    30

    s

    Second in minute

    Number

    55

    S

    Millisecond

    Number

    978

    E

    Day in week

    Text

    • 1 (Sunday)

    • 2 (Monday)

    • 3 (Tuesday)

    • 4 (Wednesday)

    • 5 (Thursday)

    • 6 (Friday)

    • 7 (Saturday)

    D

    Day in year

    Number

    189

    F

    Day of week in month

    Number

    2 (2nd Wed in July)

    w

    Week in year

    Number

    27

    W

    Week in month

    Number

    2

    a

    AM/PM marker

    Text

    PM

    k

    Hour in day (1-24)

    Number

    24

    K

    Hour in am/pm (0-11)

    Number

    0

    z

    Time zone

    Text

    Pacific Standard Time

    Q

    Quarter, first month

    Number

    Month=1,4,7,10

    q

    Quarter, last month

    Number

    Month=3,6,9,12

Calculation logic of partition filter expression

Calculation example 1: Date expression, execution condition, and trigger condition description

ds=${yyyyMMld-1d};

When partition data for May is produced, the calculation proceeds as follows:

  1. Step 1: Calculate yyyyMMld, which is the last day of the given month. For May, this is the 31st, resulting in: 20220531.

  2. Step 2: Apply the DELTA, which in this case is subtracting one day. Therefore, 20220531 minus one day is 20220530.

    The final result is 20220530.

Calculation example 2:

ds=$[yyyyMMdd-1M];

With $[], the calculation uses the execution date instead of the business date. Suppose today is May 3, 2022, and data for April 3 is being backfilled:

  • Step 1: Calculate yyyyMMdd. Based on the current execution time of May 3, the result is 20220503.

  • Step 2: Apply the DELTA, which is -1M, resulting in the same day of the previous month. The result is 20220403.

When backfilling data for April 3, the rule will be triggered. If backfilling data for May 3 or March 3, the rule will not be triggered. $[] is primarily used for execution-date-based partition expressions.

The rule triggers when backfilling data for April 3, but not for May 3 or March 3. The $[] syntax is used for partition expressions based on the execution date.

Types of built-in partition filter expressions

  • Business Date (Time): Calculated based on the business date of data production. For example, data produced on 20220101 uses that date for calculation.

  • Execution Date (Time): Uses the task's scheduled execution time from the trigger schedule (the scheduled time applies even if actual execution is delayed by upstream dependencies or resource issues); timed schedules use the scheduled time T.

Partition Filter Expression

New Name

ds=${yyyyMMdd}

Business Date

ds=${yyyyMMdd} and hour=${HH}

Business Date Time

ds=${HHmmss}

Business Time

ds=$[yyyyMMdd]

Execution Date

ds=$[yyyyMMdd] and hour $[HH]

Execution Date Time

ds=$[HHmmss]

Execution Time

ds=$[yyyyMMdd - 1d]

Day Before Execution Date

full table

Full Table Scan

Common business examples

Produce last month's salary on a specific day each month

For example, if salaries are issued on the 15th of each month and the partition field is formatted as yyyyMM, configure the following:

  • Schedule condition configuration: Date - belongs to - 15th.

    image.png

  • Partition filter expression configuration:

ds=$[yyyyMM-1M];

Produce this month's salary on the last day of each month

If salaries are distributed on the last day of each month and the partition field is formatted as yyyyMM, configure the following:

  • Schedule condition configuration: Date - belongs to - last day of the month.

    image.png

  • Partition filter expression configuration:

ds=$[yyyyMM];

Minute-level task checks at the last minute of each hour

For minute-level tasks with the format yyyyMMdd HH:mm, the last minute of each hour is the 59th minute. Use the following partition filter expression: ds=$[yyyyMMdd HH:59];