Configure job resources


You can configure resources for a job before it starts or modify them while it is running. This topic describes how to configure job resources and the parameters for each mode.

Precautions

After you configure resources, you must restart the job for the changes to take effect.

Procedure

  1. Go to the resource configuration page.

    1. Log on to the Realtime Compute for Apache Flink console.

    2. In the Actions column of the target workspace, click Console.

    3. On the O&M > Deployments page, click the name of the target job.

    4. On the Configuration tab, click Edit on the right side of the Resources section.

  2. Modify the job resource information.

    • Basic mode: You specify the total resources (CPU and total JVM memory) for each TaskManager. The system then evenly distributes these resources among all slots based on the taskmanager.numberOfTaskSlots setting. This mode is sufficient for most simple jobs. For details, see Basic mode (coarse-grained).

    • Expert mode: You configure the resources for each Slot Sharing Group (SSG). Flink then calculates the required specifications for each slot and dynamically requests matching TaskManagers and slots from the resource pool. For complex jobs where coarse-grained allocation may cause low resource utilization, you can use fine-grained resource control to tune each operator. This improves resource utilization and helps meet throughput requirements. For details, see Expert mode (fine-grained).

      Note

      Expert mode is supported only for SQL jobs.

    For more information about concepts such as TaskManager, JobManager, Task, and slot, see the Apache Flink Architecture documentation.

  3. Click Save.

  4. Restart the job.
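
The even split that basic mode applies can be pictured with a few lines of arithmetic. The following Python sketch is only an illustration: `SlotShare` and `per_slot_share` are hypothetical names, and the code shows the division described for basic mode, not how the service enforces it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotShare:
    """Resources nominally available to one slot (hypothetical helper type)."""
    cpu_cores: float
    memory_gib: float


def per_slot_share(tm_cpu_cores: float, tm_memory_gib: float, slots_per_tm: int) -> SlotShare:
    """Evenly split one TaskManager's CPU and memory across its slots."""
    if slots_per_tm < 1:
        raise ValueError("slots_per_tm must be at least 1")
    return SlotShare(tm_cpu_cores / slots_per_tm, tm_memory_gib / slots_per_tm)


# A TaskManager with 2 cores and 8 GiB, divided among 4 slots.
print(per_slot_share(tm_cpu_cores=2.0, tm_memory_gib=8.0, slots_per_tm=4))
# SlotShare(cpu_cores=0.5, memory_gib=2.0)
```

Because every slot receives the same share, a single heavy operator cannot be given more resources than a light one in this mode, which is the limitation that expert mode addresses.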

Basic mode (coarse-grained)

The following parameters are available:

  • Parallelism: The overall parallelism of the job.

  • JobManager CPU: For stable operation, a JobManager requires at least 0.5 cores and 2 GiB of memory. We recommend 1 core and 4 GiB. The maximum value is 16 cores.

  • JobManager Memory: The value ranges from 2 to 64 GiB.

  • TaskManager CPU: For stable operation, a TaskManager requires at least 0.5 cores and 2 GiB of memory. We recommend 1 core and 4 GiB. The maximum value is 16 cores.

  • TaskManager Memory: The value ranges from 2 to 64 GiB.

  • Slots per TaskManager: The number of slots for each TaskManager.
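
If you generate or review configurations outside the console, the documented ranges can be checked up front. The following Python sketch is illustrative only: `check_resources` and the constant names are hypothetical, and the limits are copied from the parameter descriptions above. Your workspace limits may differ if you have requested an increase.

```python
from typing import List

# Limits taken from the parameter descriptions in this topic.
MIN_CPU_CORES = 0.5
MAX_CPU_CORES = 16.0
MIN_MEMORY_GIB = 2.0
MAX_MEMORY_GIB = 64.0


def check_resources(role: str, cpu_cores: float, memory_gib: float) -> List[str]:
    """Return a list of problems; an empty list means both values are in range."""
    problems = []
    if not MIN_CPU_CORES <= cpu_cores <= MAX_CPU_CORES:
        problems.append(
            f"{role} CPU {cpu_cores} is outside {MIN_CPU_CORES}-{MAX_CPU_CORES} cores"
        )
    if not MIN_MEMORY_GIB <= memory_gib <= MAX_MEMORY_GIB:
        problems.append(
            f"{role} memory {memory_gib} is outside {MIN_MEMORY_GIB}-{MAX_MEMORY_GIB} GiB"
        )
    return problems


# The recommended 1 core / 4 GiB passes; 0.25 cores / 128 GiB fails both checks.
print(check_resources("JobManager", 1.0, 4.0))
print(check_resources("TaskManager", 0.25, 128.0))
```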

You can use the following formulas to calculate resource requirements:

  • Number of CUs = MAX(Total CPU for JobManager and TaskManagers, Total memory for JobManager and TaskManagers / 4)

  • Actual number of TaskManagers = ceil(Parallelism / Slots per TaskManager)

  • Actual slots per TaskManager = ceil(Parallelism / Actual number of TaskManagers)

Note
  • Round division results up to the nearest integer.

  • Resource configurations cannot exceed the default maximum limits. To request an increase to these limits, submit a ticket.

  • You can also set the taskmanager.numberOfTaskSlots parameter in the Other Configurations field within the Running Parameters Configuration section on the job's Configuration tab. This setting has the same effect as the Slots per TaskManager field. If both are set, the parameter takes precedence.
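
The precedence rule in the last note can be expressed as a small lookup. This Python sketch is a hypothetical model for reasoning about which value wins. `effective_slots_per_tm` is not a console or Flink API, and the dictionary stands in for the key-value pairs entered in Other Configurations.

```python
from typing import Mapping

SLOTS_KEY = "taskmanager.numberOfTaskSlots"


def effective_slots_per_tm(field_value: int, other_configs: Mapping[str, str]) -> int:
    """Return the slot count that applies: the explicit parameter wins over the field."""
    raw = other_configs.get(SLOTS_KEY)
    return int(raw) if raw is not None else field_value


# Only the Slots per TaskManager field is set.
print(effective_slots_per_tm(4, {}))                # 4
# The parameter is also set, so it overrides the field.
print(effective_slots_per_tm(4, {SLOTS_KEY: "2"}))  # 2
```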

For example, assume you set the parallelism to 12 and the slots per TaskManager to 4.

In this example, JobManager CPU is 2 cores, JobManager Memory is 4 GiB, TaskManager CPU is 2 cores, and TaskManager Memory is 4 GiB.

After the job starts, the Realtime Compute for Apache Flink console shows that the actual number of TaskManagers is 3 and that each TaskManager has 4 slots.

The actual number of TaskManagers and slots per TaskManager are calculated as follows:

  1. Actual number of TaskManagers = ceil(Configured parallelism / Configured slots per TaskManager) = ceil(12 / 4) = 3.

  2. Actual slots per TaskManager = Parallelism / Actual number of TaskManagers = 12 / 3 = 4.
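
The formulas in this section can be combined into one small calculation. The following Python sketch is a hedged illustration: `BasicModePlan` and `plan_basic_mode` are hypothetical names, and the CU figure assumes that "total" means the JobManager plus all TaskManagers added together, with no extra rounding applied.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BasicModePlan:
    task_managers: int
    slots_per_task_manager: int
    compute_units: float


def plan_basic_mode(
    parallelism: int,
    slots_per_tm: int,
    jm_cpu: float,
    jm_memory_gib: float,
    tm_cpu: float,
    tm_memory_gib: float,
) -> BasicModePlan:
    """Apply the basic-mode formulas; division results are rounded up."""
    task_managers = math.ceil(parallelism / slots_per_tm)
    actual_slots = math.ceil(parallelism / task_managers)
    # Assumption: totals are the JobManager plus every TaskManager.
    total_cpu = jm_cpu + task_managers * tm_cpu
    total_memory_gib = jm_memory_gib + task_managers * tm_memory_gib
    compute_units = max(total_cpu, total_memory_gib / 4)
    return BasicModePlan(task_managers, actual_slots, compute_units)


# The example from this section: parallelism 12, 4 slots per TaskManager,
# JobManager 2 cores / 4 GiB, TaskManager 2 cores / 4 GiB.
plan = plan_basic_mode(12, 4, 2.0, 4.0, 2.0, 4.0)
print(plan.task_managers, plan.slots_per_task_manager, plan.compute_units)
```

For the example values, the sketch yields 3 TaskManagers with 4 slots each. Under the stated assumption, the CPU total is 2 + 3 × 2 = 8 cores and the memory total is 4 + 3 × 4 = 16 GiB, so the CU count is MAX(8, 16 / 4) = 8.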

Expert mode (fine-grained)

Note
  • Expert mode is supported only for SQL jobs.

  • If you modify the SQL or resource configuration after a job is deployed, you must fetch the resource plan graph again to ensure the job starts properly.

Configure basic resources

The following parameters are available:

  • JobManager CPU: For stable operation, a JobManager requires at least 0.5 cores and 2 GiB of memory.

  • JobManager Memory: Unit: GiB. For example, 4 GiB. The minimum value is 2 GiB and the maximum value is 64 GiB.

  • Slots per TaskManager: Not applicable.

Configure slot resources

  1. In expert mode, click Get Plan Now to fetch the resource plan graph.

  2. Click the Edit icon on the slot box that you want to change. The generated resource plan graph displays multiple slot boxes, each containing VERTEX operator information and a PARALLELISM value.

  3. Modify the slot configuration. In the dialog box, you can configure the CPU, heap memory, off-heap memory, and parallelism parameters.

    The parallelism you set here applies to all operators within this Slot Sharing Group. After you save the configuration, the system automatically:

    • Sets the same parallelism for all operators in this Slot Sharing Group.

    • Allocates the required memory for the state backend, Python, and operators based on the job's computation logic. This allocation is automatic.

    • Note
      • For a Source node, we recommend setting the parallelism to a divisor of its partition count so that every subtask reads the same number of partitions. For example, if a Kafka topic has 16 partitions, set the parallelism to 16, 8, or 4 to avoid data skew.

      • Setting the parallelism of a Source node too low can cause a bottleneck, as one Source may read too much data and reduce job throughput.

      • For other nodes, set the parallelism according to their data traffic, assigning higher parallelism to nodes with more traffic.

  4. Click OK.
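
The Source-node recommendation in the note above amounts to picking a divisor of the partition count. The following Python sketch lists the candidates; `candidate_source_parallelisms` is a hypothetical helper, and `max_parallelism` is an assumed upper bound that you choose based on your available resources.

```python
from typing import List


def candidate_source_parallelisms(partition_count: int, max_parallelism: int) -> List[int]:
    """Return the divisors of partition_count up to max_parallelism, largest first."""
    if partition_count < 1 or max_parallelism < 1:
        raise ValueError("partition_count and max_parallelism must be at least 1")
    upper = min(partition_count, max_parallelism)
    return [p for p in range(upper, 0, -1) if partition_count % p == 0]


# A Kafka topic with 16 partitions.
print(candidate_source_parallelisms(16, 16))  # [16, 8, 4, 2, 1]
```

The smallest values are valid divisors but are usually poor choices, because a very low Source parallelism can become the bottleneck described in the note.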

Configure operator resources

By default, all operators share a single Slot Sharing Group, which prevents you from configuring their resources individually. To configure resources for a specific operator, enable Multiple SSG mode. This mode assigns an independent slot to each operator, allowing you to configure its resources on that slot.

  1. On the Configuration tab, click Edit in the Resources section, and set Resource mode to expert mode.

  2. (Optional) If no resource plan is displayed, click Get Plan Now.

    By default, the generated resource plan graph shows all operators within a single slot box.

  3. Turn on the Multiple SSG mode switch and then click Re-fetch.

    This action splits the operators in the sharing group into individual slots.

  4. Click the Edit icon on the slot box that corresponds to the target operator, and then modify the operator resources.

    In the Modify slot dialog box, you can configure the CPU, heap memory, off-heap memory, and parallelism parameters.

  5. Click OK.

Operator parallelism, chaining strategy, and State TTL

Note

Configuring State TTL is supported only in Ververica Runtime (VVR) 8.0.7 and later versions.

You can configure the parallelism, chaining strategy, and State TTL for individual operators.

  1. Click the Expand icon on the target VERTEX box.

    After expansion, the VERTEX box displays each operator node, its PARALLELISM value, and an Edit icon next to each operator.

    Note

    To set the parallelism for all operators in a VERTEX at once, click the Edit icon on that VERTEX.

  2. Click the Edit icon for the operator.

  3. Configure the operator resources.

    The parameters are described as follows:

    • Parallelism: The parallelism for the operator.

    • Chaining strategy: Chaining connects multiple operators into a single task, improving performance by reducing data transfer and serialization overhead. However, you can break a chain to gain finer control over the execution flow. The following strategies are available:

      • ALWAYS (default): The operator can always be chained with upstream and downstream operators.

      • HEAD: The current operator acts as the head of a chain. It is not chained with upstream operators but remains chained with downstream operators.

      • NEVER: The current operator is not chained with any upstream or downstream operators.

    • State TTL: You can set the expiration time in seconds, minutes, hours, or days. By default, the operator inherits the job's state expiration time, which defaults to 1.5 days. To configure the job-level expiration time, see Configure running parameters.

      Note
      • This feature is supported only in Ververica Runtime (VVR) 8.0.7 and later.

      • TTL configuration is supported only for stateful operators.

      • State expiration is an approximate cleanup mechanism. The system does not guarantee that expired state is removed immediately after the TTL elapses. The actual cleanup time depends on background state access patterns and cleanup policies.

  4. Click OK.
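
To see how the three chaining strategies interact, it can help to model a simple linear pipeline. The following Python sketch is a simplified illustration, not Flink's actual chaining logic: it considers only the strategy values described above and ignores other conditions that affect chaining in practice, such as matching parallelism. The operator names and the `build_chains` function are hypothetical.

```python
from enum import Enum
from typing import List, Optional, Tuple


class ChainingStrategy(Enum):
    ALWAYS = "ALWAYS"
    HEAD = "HEAD"
    NEVER = "NEVER"


def build_chains(operators: List[Tuple[str, ChainingStrategy]]) -> List[List[str]]:
    """Group a linear sequence of operators into chains based on strategy only."""
    chains: List[List[str]] = []
    previous: Optional[ChainingStrategy] = None
    for name, strategy in operators:
        # An operator joins the upstream chain only if it is ALWAYS and the
        # upstream operator still allows downstream chaining (ALWAYS or HEAD).
        can_join = (
            previous is not None
            and previous is not ChainingStrategy.NEVER
            and strategy is ChainingStrategy.ALWAYS
        )
        if can_join:
            chains[-1].append(name)
        else:
            chains.append([name])
        previous = strategy
    return chains


pipeline = [
    ("source", ChainingStrategy.ALWAYS),
    ("calc", ChainingStrategy.ALWAYS),
    ("agg", ChainingStrategy.HEAD),
    ("sink_writer", ChainingStrategy.ALWAYS),
    ("committer", ChainingStrategy.NEVER),
]
print(build_chains(pipeline))
# [['source', 'calc'], ['agg', 'sink_writer'], ['committer']]
```

In this model, setting an operator to HEAD splits the chain in front of it, while NEVER isolates the operator in its own task.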
