Create an Impala SQL offline computing task in Dataphin.
Prerequisites
Before you create an Impala SQL task, enable the Impala task feature for the Hadoop compute source. For more information, see .
Procedure
-
On the Dataphin homepage, choose Development > Data Development from the top menu bar.
-
On the Development page, select a project in the top menu bar. If you use the Dev-Prod mode, select an environment.
-
In the navigation pane on the left, choose Data Processing > Script Task. In the Script Task list, click the icon and select Impala SQL.
-
In the Create Impala SQL Task dialog box, configure the parameters.
Parameter
Description
Task Name
Enter a name for the offline computing task.
The name can be up to 256 characters in length. It cannot contain vertical bars (|), forward slashes (/), backslashes (\), colons (:), question marks (?), angle brackets (<>), asterisks (*), or double quotation marks (").
Schedule Type
Select a schedule type for the task:
-
Recurring Task: The task runs automatically on a schedule.
-
Manual Task: The task must be triggered manually.
Select Directory
Select a folder to store the task.
To create a new folder, follow these steps:
-
Above the list of computing tasks on the left, click the icon to open the Create Folder dialog box.
-
In the Create Folder dialog box, enter a Name for the folder and select a folder location as needed.
-
Click OK.
Use Template
Enable Use Template to create the task from a code template. You must also select a template and a template version.
Code templates streamline development. The template code is read-only, so you only need to configure the template parameters. For more information, see Create an offline computing template.
Description
Enter a brief description of the task. The description can be up to 1,000 characters in length.
-
Click OK.
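The Task Name rules described above can be checked locally before you fill in the dialog box. The following is a minimal Python sketch and is not part of Dataphin: the function name check_task_name and the constant names are hypothetical, and only the 256-character limit and the list of forbidden characters come from the parameter description.

```python
from typing import List

# Characters that the Task Name rule above forbids.
FORBIDDEN_CHARS = set('|/\\:?<>*"')
# Maximum Task Name length stated in the rule above.
MAX_NAME_LENGTH = 256


def check_task_name(name: str) -> List[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(
            f"name has {len(name)} characters; the limit is {MAX_NAME_LENGTH}"
        )
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("name contains forbidden characters: " + " ".join(bad))
    return problems


if __name__ == "__main__":
    # Hypothetical candidate names, used only for illustration.
    for candidate in ("daily_orders_agg", "orders/2024:agg", "x" * 300):
        print(repr(candidate[:20]), check_task_name(candidate))
```

An empty result means that the name satisfies both stated rules. The dialog box may enforce additional checks that this sketch does not cover.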
-
On the Impala SQL task tab, write the code for the offline computing task in the code editor. Then, click Precompile above the code editor to check the syntax of your Impala SQL code.
-
After the code is precompiled, click Run above the code editor.
-
In the sidebar on the right, click Property. In the Property panel, configure the Basic Information, Runtime Parameter, Scheduling Properties (for recurring tasks), Schedule Dependency (for recurring tasks), Runtime Configuration, and Resource Configuration for the task.
-
Basic Information
Set the basic information for the task, such as the name, owner, and description. For more information, see Configure basic information for a task.
-
Runtime Parameter
If your task uses parameter variables, assign values here. The variables are automatically replaced with the specified values when the task is scheduled. For more information, see Configure runtime parameters for an offline task.
-
Scheduling Properties (for recurring tasks)
If the schedule type is Recurring Task, configure the scheduling properties in addition to Basic Information. For more information, see Configure scheduling properties.
-
Schedule Dependency (for recurring tasks)
If the schedule type is Recurring Task, configure the schedule dependencies in addition to Basic Information. For more information, see Configure schedule dependencies.
-
Runtime Configuration
Set the runtime timeout and failure retry policy for the task. If not configured, the task inherits the tenant-level defaults. For more information, see Configure runtime settings for a computing task.
-
Resource Configuration
Assign a scheduling resource group to the task. The task consumes the resource quota of this group when scheduled. For more information, see Configure resource settings for a computing task.
-
On the Impala SQL task tab, save and submit the task.
-
Click the icon above the code editor to save the code.
-
Click the icon above the code editor to submit the code.
-
On the Submitting Log page, review the Submission Content and the Pre-check results. Then, enter your comments. For more information, see Submit an offline computing task.
-
After you review the information, click Confirm and Submit.
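To illustrate how the Runtime Parameter setting in the Property panel relates to the task code, the following Python sketch imitates the replacement of parameter variables locally. It is an illustration under stated assumptions, not Dataphin's implementation: the ${bizdate} placeholder syntax, the demo_db.orders table, its columns, and the render_task_sql function are all hypothetical. Dataphin performs the real replacement when the task is scheduled.

```python
from string import Template

# Hypothetical Impala SQL task body. The table, the columns, and the
# ${bizdate} variable are illustrative assumptions, not Dataphin defaults.
TASK_SQL = Template("""\
SELECT order_date, COUNT(*) AS order_cnt
FROM demo_db.orders
WHERE order_date = '${bizdate}'
GROUP BY order_date
""")


def render_task_sql(template: Template, params: dict) -> str:
    """Replace every ${name} placeholder with its assigned value.

    Template.substitute() raises KeyError when a placeholder has no value,
    which flags a parameter variable that was left unassigned.
    """
    return template.substitute(params)


if __name__ == "__main__":
    # Preview the statement with a sample value for the parameter variable.
    print(render_task_sql(TASK_SQL, {"bizdate": "20240101"}))
```

A preview like this can help confirm that every variable used in the code has a value assigned before you submit the task.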
What to do next
-
If you use the Dev-Prod mode, you must publish the task to the production environment after the task is submitted. For more information, see Manage published tasks.
-
If you use the Basic mode, the submitted Impala SQL task can be scheduled in the production environment. You can go to the Operation Center to view the published task. For more information, see Manage integration and computing tasks and Manage one-time tasks.
Appendix: Switch task types
If the offline engine for your project is a Hadoop compute source and the Impala task feature is enabled, you can switch between the Impala SQL and Hive SQL task types.
-
On the Dataphin homepage, choose Development > Data Development from the top menu bar.
-
On the Development page, select a project in the top menu bar. If you use the Dev-Prod mode, select an environment.
-
In the navigation pane on the left, choose Data Processing > Compute Job. From the Compute Job list, select the target Impala SQL task.
-
Click the icon next to the Impala SQL task and select Change Type.
-
In the Change Type dialog box, select Hive SQL and click OK. This switches the task type.