How to create a Dataphin general-purpose project

Prerequisites

Before you begin, make sure that you meet the following requirements:

If you plan to develop stream-batch integration tasks, create a compute source that supports this feature first. For more information, see:
- Create a Ververica Flink compute source.
- Create a Blink compute source.
If you select MaxCompute as the compute engine for Dataphin and need to use features such as standardized modeling, ad-hoc queries, and MaxCompute SQL compute tasks, create a MaxCompute compute source before you create a project workspace. For more information, see Create a MaxCompute compute source.

When your compute engine is MaxCompute, you can also create a Hologres compute source. After you bind a Hologres compute source to your project, you can use features such as ad-hoc queries and HOLOGRES_SQL compute tasks. For more information, see Create a Hologres compute source.

Background information

Dataphin supports two development modes for projects:

Dev-Prod mode: Creates both a development environment (Dev project) and a production environment (Prod project). The Prod project ensures data security in the production environment. Recommended if you have strong governance requirements, a large number of data developers with clearly defined roles, and a higher budget for computing and storage.
Basic mode: Creates a single Basic project that serves as a unified development and production environment. Offers a streamlined data production process. Recommended if you prioritize development efficiency, have flexible developer roles with overlapping responsibilities, and have a limited budget for computing and storage.

Limits

If you need to associate your project with Platform for AI (PAI) to create scheduling tasks, upgrade PAI-Studio to Designer and migrate your experiments from the old Studio console to the new Designer. For more information, see Migrate Studio experiments to Designer.

Permissions

Super administrators, system administrators, and sector architects can create projects.
Super administrators, system administrators, and sector architects can enable or disable permission requests for reading from and writing to data tables.

Procedure

In the top navigation bar of the Dataphin homepage, choose Planning > Projects.
On the Project Management page, click Create General-purpose Project to open the Create Project dialog box.
In the Create Project dialog box, select Dev-Prod mode or Basic mode and then click Next.

Important
You cannot upgrade a project from Basic mode to Dev-Prod mode. Basic mode also carries the risk of direct changes to the production environment. Choose the mode carefully.

If you select Basic mode, manage project members carefully to maintain the stability of data production.

In the Create Project dialog box, configure the parameters.

The parameters for Dev-Prod mode and Basic mode are the same. The following example uses Dev-Prod mode.

Category	Parameter	Description
Owning sector	Data sector	Select the data sector to which the project belongs.
Basic information	Common English name	Enter the common English name of the project. The name can contain letters, digits, and underscores (_), cannot start with LD_, and cannot exceed 64 characters. The English name of a development environment project ends with _dev by default. Note: When the compute engine is MaxCompute, the common English name of the project should match the name of the corresponding MaxCompute project.
	Common name	Enter the common name of the project. The name can contain Chinese characters, digits, letters, underscores (_), and hyphens (-), cannot start with LD_, and cannot exceed 64 characters.
	Compute source type	Select a compute source type and then select the corresponding compute source. Important: A compute source that is already bound to a project cannot be bound to another project. The Dev and Prod projects must use the same compute source. When the Dataphin compute engine is initialized as MaxCompute, you can select MaxCompute or Hologres as the offline engine. If you select MaxCompute, you can also enable machine learning PAI. Dataphin integrates with Platform for AI (PAI) to provide basic algorithm scheduling capabilities. When you create a visual modeling workspace in PAI, select a MaxCompute-based compute resource group. For more information, see Platform for AI (PAI). If you enable PAI, configure the following parameters. PAI region: Select the same region as your Dataphin instance. Access method: Select the access method for PAI. You can choose VPC or public network access. AccessKey ID and AccessKey Secret: Enter the AccessKey ID and AccessKey Secret of the account used to access PAI. PAI project name: Select the PAI project. The MaxCompute project bound to the current Dataphin project should be the same as the MaxCompute project bound to PAI.
	Default project resource group	Tasks created in this project use the default resource group configured here for scheduling. This setting is available only if an offline compute engine is enabled for the project. You can also customize the resource group for individual tasks during task configuration. You can only select a resource group that is in the Normal state, is used for daily task scheduling, and is associated with the current project. If you change the default resource group here, tasks whose scheduling resource group is set to Default Project Resource Group will automatically use the new resource group. If you do not want to automatically update the resource group, assign a specific custom resource group to the task. For more information, see Configure compute resources for a task. Note: This feature is available only if custom resource groups are enabled for the tenant. For more information, see Resource group overview. If no custom resource group applies, the tenant's public scheduling resource group (the tenant's default resource group) is used, and resource contention may occur during peak scheduling times.
	Description	Enter a brief description for the project. The description can be up to 128 characters in length.
Business information	Workspace type	Specifies the characteristics of development tasks and data outputs for the project. The default is Application layer. The available types are: Intermediate layer: Typically used for storing and processing data to provide consistent, accurate, and clean data. Source layer: Typically used to store raw data integrated from business systems, which serves as a source for subsequent processing and development. Application layer: Oriented towards business needs, this layer defines and generates diverse, customized data metrics for different use cases. Common layer: Typically used to store common aggregated data, such as summary data for a specific dimension within a data domain.
Security settings	Sandbox whitelist	Add the IP addresses or domain names that this project's integration, Shell, and Python tasks need to access. Access address: The address that tasks in the project need to access. Port: The port number for the access address. Wildcards (*) are supported. Description: A brief description of the access address. Note: For a whitelisted item, you can click the icon in the Actions column to remove it. After removal, the data sources, Shell tasks, and Python tasks in the project can no longer access the corresponding IP address or domain name.
	Global security settings	Security settings let you apply fine-grained controls over data security and access. They also let you enable or disable Spark tasks and set the authentication mode for them. For more information, see Security settings.
	Data result download (download approval)	Dataphin supports downloading business data. You can configure whether data at the project level can be downloaded. Once downloaded, data is no longer under system control. You can add watermarks to promote data security and prevent unauthorized sharing. For more information, see Configure data downloads. Important: Only users with non-visitor roles can download data results to their local machines.
	Data permission approval	Data permission approval policies let you set different approval rules for different data sensitivity levels. This enables approvers to focus on highly sensitive data and bypass approvals for public data, reducing the burden of permission management. For more information, see Configure data permissions.
	Asset security policy	After installation, you can use data security policies to protect sensitive data. You can modify these settings under Governance > Data Security > Project Security Policy. For more information, see Project security policy.
Commit settings	Code review	Disabled by default. If enabled, you must also configure a code reviewer. When code review is enabled, compute tasks in this project must be reviewed before they are committed. The code reviewer is set to project administrator by default, but you can also customize the selection to include multiple members as reviewers.
Publish settings	Publish approval	If enabled, you must configure the Approval settings. The publishing process for objects in this project will then require approval. Specify approver: The request is approved if any approver accepts it and rejected if any approver denies it. You can select project administrator or Custom. If you select Custom, you must select one to ten approvers. Specify approval template: Approvals are processed according to the selected approval template. If no suitable template is available, click + New Template to go to the Approval Templates page and create a new one. For more information, see Create and manage approval templates.
Task parameter configuration	Default Flink task parameter configuration	After you enable the real-time engine, you can enter Flink-related parameter configurations in the text box. When you create Flink tasks in this project, these parameters are applied by default. Parameters must be in a key-value format: `key:value`. For example: taskmanager.numberOfTaskSlots:1.
More settings	Default function menu	After you select the data sector for the project, the system selects corresponding function menus by default based on your chosen workspace type. You can modify the selection based on your business needs. Note: The default function menu is not supported when you select a Hologres compute engine.
	Periodic scheduling for production environment (labeled Periodic scheduling for Basic projects)	Enable (automatic task scheduling): The status of new instances generated by periodic tasks in this project matches the task's status. Historical instances are not affected. Disable (paused task scheduling): New instances generated by periodic tasks in this project are set to a paused state. Historical instances are not affected. Disabling periodic scheduling can have serious consequences. Proceed with caution. Note: In a development environment, the instance status changes from Not Running to Paused by default.
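
The naming and length rules listed in the table can be checked before you fill in the dialog box. The following is a minimal Python sketch, not part of Dataphin. The function names are hypothetical, and the sketch assumes that "letters" means ASCII letters for the common English name, that "Chinese characters" means the basic CJK Unified Ideographs range, and that the LD_ prefix check is case-sensitive. The documentation does not specify any of these details.

```python
import re

MAX_NAME_LENGTH = 64
MAX_DESCRIPTION_LENGTH = 128
RESERVED_PREFIX = "LD_"

# Assumption: "letters" means ASCII letters for the common English name.
ENGLISH_NAME_CHARS = re.compile(r"[A-Za-z0-9_]+")
# Assumption: "Chinese characters" means the basic CJK Unified Ideographs range.
COMMON_NAME_CHARS = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]+")


def _name_problems(name, allowed_chars):
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if not allowed_chars.fullmatch(name):
        problems.append("contains unsupported characters or is empty")
    if name.startswith(RESERVED_PREFIX):  # assumption: case-sensitive check
        problems.append("starts with LD_")
    if len(name) > MAX_NAME_LENGTH:
        problems.append("exceeds 64 characters")
    return problems


def check_english_name(name):
    """Check the common English name: letters, digits, underscores."""
    return _name_problems(name, ENGLISH_NAME_CHARS)


def check_common_name(name):
    """Check the common name: also allows Chinese characters and hyphens."""
    return _name_problems(name, COMMON_NAME_CHARS)


def check_description(text):
    """Check the project description length limit."""
    if len(text) > MAX_DESCRIPTION_LENGTH:
        return ["exceeds 128 characters"]
    return []
```

For example, check_english_name("sales_dw") returns an empty list, while check_english_name("LD_sales") reports the reserved prefix.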

Click OK to create the project.
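
The key:value format described for the default Flink task parameter configuration can be checked before you paste the text into the dialog box. The sketch below is a hypothetical Python helper, not a Dataphin API. It assumes one parameter per line and splits each line at the first colon so that values may themselves contain colons. The documentation does not state either detail.

```python
def parse_flink_params(text):
    """Parse 'key:value' lines into a dict, raising ValueError on bad lines."""
    params = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: split at the first colon only.
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"line {line_no}: expected key:value, got {raw!r}")
        params[key] = value
    return params


if __name__ == "__main__":
    # Uses the example parameter from the table.
    print(parse_flink_params("taskmanager.numberOfTaskSlots:1"))
```

A line without a colon, or with an empty key or value, raises ValueError with the offending line number, which makes malformed entries easy to find.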

Next steps

After creating the project, go to the data development module to begin data development. For more information, see Data development overview.