Create a Spark SQL task

Create a Spark SQL offline computing task in Dataphin.

Prerequisites

The Spark SQL service is enabled in the Hadoop computing source of your project. For more information, see .

Procedure

  1. On the Dataphin homepage, select Development > Data Development from the top menu bar.

  2. On the Development page, select your project from the top menu bar. In Dev-Prod mode, also select the environment.

  3. In the left-side navigation pane, select Data Processing > Compute Task. In the Compute Tasks list, click the create icon and select Spark SQL.

  4. In the New Spark SQL Task dialog box, configure the parameters.

    • Task Name

      Enter a name for the offline computing task. The name can be up to 256 characters long and cannot contain vertical bars (|), forward slashes (/), backslashes (\), colons (:), question marks (?), angle brackets (<>), asterisks (*), or double quotation marks (").

    • Schedule Type

      Select the schedule type of the task. Valid values:

      • Recurring Task: The task is automatically added to the system's recurring schedule.

      • One-Time Task: The task runs only when you trigger it manually.

    • Select Directory

      Select the folder in which to store the task.

      If no suitable folder exists, create one as follows:

      1. Click the icon above the task list on the left to open the Create Folder dialog box.

      2. In the Create Folder dialog box, enter a Name for the folder and select its location under Select Directory.

      3. Click Confirm.

    • Use Template

      Specify whether to use a code template. If you turn on Use Template, you must also select a Template and its Version.

      Code templates streamline development. Template code is read-only; you only need to configure the required parameters to complete your code. For more information, see Create an offline computing template.

    • Description

      Enter a brief description of the task. The description can be up to 1,000 characters long.

  5. Click Confirm.

  6. In the code editor on the Spark SQL task tab, write the code for the offline computing task. Then, click Precompile above the code editor to check the syntax of your code.

  7. After precompilation, click Run to execute the code.

  8. In the right-side sidebar, click Property. In the Property panel, configure the task's Basic Information, Runtime Parameter, Spark Resource Settings, Scheduling Properties (for recurring tasks), Schedule Dependency (for recurring tasks), Runtime Configuration, and Resource Configuration.

    • Basic Information

      Define the task name, assign an owner, and add a description. For configuration details, see Configure basic task information.

    • Runtime Parameter

      If your task uses parameter variables, define their values here. During scheduling, the variables are automatically replaced with the specified values. For configuration details, see Configure runtime parameters for an offline task.

    • Spark Resource Settings

      If the Spark SQL service of the project's computing source uses the Kyuubi service type in both the production and development environments, Spark resource settings take effect only in the production environment.

      Dynamic Resource Allocation: Specifies whether to enable dynamic resource allocation.

      • Enable: The system dynamically allocates resources to the Spark SQL task.

      • Disable: Configure the Spark resource parameters manually. For descriptions of these parameters, see Configure Spark SQL task parameters.

      Note

      If the Spark SQL service of the project's computing source uses the Thrift Server service type, Spark resource settings are not applicable.

    • Scheduling Properties (for recurring tasks)

      For a Recurring Task, configure its scheduling properties in addition to Basic Information. For configuration details, see Configure scheduling properties.

    • Schedule Dependency (for recurring tasks)

      For a Recurring Task, configure its schedule dependency in addition to Basic Information. For configuration instructions, see Configure schedule dependency.

    • Runtime Configuration

      Configure task-level runtime timeout and rerun policies based on your business needs. If not configured, the tenant-level defaults are used. For configuration instructions, see Configure runtime settings for computing tasks.

    • Resource Configuration

      Specify the resources that scheduled instances of the task consume. Resources are isolated between resource groups. Spark SQL tasks use shared resources and do not support custom resource groups. For more information, see Configure resource settings for computing tasks.

  9. On the Spark SQL task tab, save and submit the task.

    1. Click the save icon to save the code.

    2. Click the submit icon to submit the code for review.

  10. On the Submitting Log page, confirm the Submission Content and Pre-check results, and enter remarks. For more information, see Instructions for submitting offline computing tasks.

  11. After you review the information, click Confirm And Submit to complete the submission.
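The naming and description limits from step 4 can be checked locally before you fill in the New Spark SQL Task dialog box. The following Python sketch is not part of Dataphin: the function name `validate_task_fields` is hypothetical, rejecting an empty name is an assumption, and characters are counted with Python's `len()`, which may not match how the console counts multi-byte characters.

```python
# Hypothetical pre-check that mirrors the limits stated in step 4.
# Assumption: characters are counted with len(), which may not match how
# the Dataphin console counts multi-byte characters.
FORBIDDEN_NAME_CHARS = frozenset('|/\\:?<>*"')
MAX_NAME_LENGTH = 256
MAX_DESCRIPTION_LENGTH = 1000


def validate_task_fields(name: str, description: str = "") -> list[str]:
    """Return a list of problems; an empty list means the fields pass these checks."""
    problems = []
    # Assumption: an empty name is rejected, since step 4 requires a task name.
    if not name:
        problems.append("task name is empty")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(
            f"task name has {len(name)} characters; the limit is {MAX_NAME_LENGTH}"
        )
    # Report each forbidden character once, in a stable (sorted) order.
    bad = sorted(FORBIDDEN_NAME_CHARS.intersection(name))
    if bad:
        problems.append("task name contains forbidden characters: " + " ".join(bad))
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append(
            f"description has {len(description)} characters; "
            f"the limit is {MAX_DESCRIPTION_LENGTH}"
        )
    return problems


print(validate_task_fields("dws_order_daily"))  # []
print(validate_task_fields("orders/daily?"))
# ['task name contains forbidden characters: / ?']
```

Returning a list of problems rather than raising on the first failure lets you report every violation at once.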
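Step 8 notes that runtime parameter variables are replaced with their configured values during scheduling. The sketch below illustrates that kind of substitution in Python. It assumes a `${name}` placeholder syntax, and `substitute_parameters`, `ods_orders`, and `bizdate` are illustrative names only; it does not reproduce Dataphin's actual parameter engine.

```python
import re

# Assumption: parameters are written as ${name} in the SQL. Confirm the
# placeholder syntax in the runtime-parameter documentation for your project.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_parameters(sql: str, values: dict[str, str]) -> str:
    """Replace each ${name} placeholder with its configured value.

    Raises KeyError when the SQL references a parameter with no value, instead
    of silently leaving the placeholder in the generated statement.
    """

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value configured for parameter {name!r}")
        return values[name]

    # Passing a function (not a string) as the replacement inserts values
    # literally, so backslashes in a value are not treated as escapes.
    return PLACEHOLDER.sub(replace, sql)


# Hypothetical table and parameter names, for illustration only.
template = "SELECT * FROM ods_orders WHERE ds = '${bizdate}'"
print(substitute_parameters(template, {"bizdate": "20240101"}))
# SELECT * FROM ods_orders WHERE ds = '20240101'
```

Values are inserted verbatim, so any quoting (such as the single quotes around `${bizdate}` in the template) stays the responsibility of the SQL itself.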
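The conditions under which the Spark resource settings in step 8 take effect can be summarized as a small decision function. This is a hypothetical Python sketch: the service-type strings, the `prod`/`dev` environment names, and the mapping of the Dynamic Resource Allocation switch onto the standard Spark property `spark.dynamicAllocation.enabled` are assumptions, not Dataphin behavior confirmed by this page. It covers only the Kyuubi and Thrift Server cases the document describes.

```python
from typing import Optional

# Hypothetical identifiers for the two service types the document mentions.
KYUUBI = "kyuubi"
THRIFT_SERVER = "thrift_server"


def effective_spark_conf(
    service_type: str,
    environment: str,
    dynamic_allocation: bool,
    manual_conf: Optional[dict[str, str]] = None,
) -> Optional[dict[str, str]]:
    """Return the Spark resource settings that take effect, or None if they are ignored.

    Encodes the rules in step 8: with Thrift Server the settings do not apply;
    with Kyuubi (used in both environments) they apply only in production.
    """
    if service_type == THRIFT_SERVER:
        return None
    if service_type != KYUUBI:
        # The document does not describe other service types, so refuse to guess.
        raise ValueError(f"unsupported service type: {service_type!r}")
    if environment != "prod":
        return None
    if dynamic_allocation:
        # Assumption: the switch maps onto Spark's standard dynamic-allocation property.
        return {"spark.dynamicAllocation.enabled": "true"}
    # Manual mode: pass the user-supplied parameters through, but keep
    # dynamic allocation switched off even if manual_conf tries to enable it.
    return {**(manual_conf or {}), "spark.dynamicAllocation.enabled": "false"}


print(effective_spark_conf(KYUUBI, "dev", False, {"spark.executor.memory": "4g"}))  # None
print(effective_spark_conf(KYUUBI, "prod", False, {"spark.executor.memory": "4g"}))
# {'spark.executor.memory': '4g', 'spark.dynamicAllocation.enabled': 'false'}
```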
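The Runtime Configuration fallback in step 8 (task-level settings when configured, tenant-level defaults otherwise) can be sketched as a field-by-field merge. The `RuntimeConfig` and `RerunPolicy` types and their fields are hypothetical, and falling back per setting rather than for the whole configuration is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class RerunPolicy:
    max_attempts: int      # hypothetical field
    interval_minutes: int  # hypothetical field


@dataclass(frozen=True)
class RuntimeConfig:
    # None means "not configured at this level".
    timeout_minutes: Optional[int] = None
    rerun: Optional[RerunPolicy] = None


def resolve_runtime_config(task: RuntimeConfig, tenant: RuntimeConfig) -> RuntimeConfig:
    """Use each task-level setting when present; otherwise fall back to the tenant default.

    Assumption: the fallback happens per setting (timeout and rerun independently).
    """
    resolved = {}
    for f in fields(RuntimeConfig):
        task_value = getattr(task, f.name)
        resolved[f.name] = task_value if task_value is not None else getattr(tenant, f.name)
    return RuntimeConfig(**resolved)


tenant_defaults = RuntimeConfig(timeout_minutes=180, rerun=RerunPolicy(3, 10))
task_settings = RuntimeConfig(timeout_minutes=60)
print(resolve_runtime_config(task_settings, tenant_defaults))
# RuntimeConfig(timeout_minutes=60, rerun=RerunPolicy(max_attempts=3, interval_minutes=10))
```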

What to do next

  • In Dev-Prod mode, after submitting your task, go to the release list to publish the task to the production environment. For more information, see Manage release tasks.

  • In Basic mode, the Spark SQL task is added to the production schedule after submission. Go to the Operation Center to view your published tasks. For more information, see View and manage script tasks and View and manage one-time tasks.