FAQ

Last updated: 2026-03-17 20:34:13

This page answers frequently asked questions about Data Lake Formation (DLF).

When should I use Data Lake Formation?

DLF is well suited for the following scenarios:

  • Data analysis and machine learning in the cloud

  • Building a data lake in the cloud

  • Reducing O&M costs

What is the difference between Data Lake Formation and OSS?

Object Storage Service (OSS) and DLF serve different roles and are designed to work together:

| Dimension | OSS | DLF |
| --- | --- | --- |
| Primary purpose | Stores raw data files in the cloud | Manages permissions and metadata views over data stored in OSS |
| Access control | Bucket- and object-level policies | Fine-grained permissions for big data analysis and machine learning scenarios |

In short, OSS is the backend storage that DLF sits on top of. DLF adds the governance and discovery layer that makes big data in the cloud accessible and manageable at scale.

How do I apply for the public preview qualification of Data Lake Formation?

You must use an Alibaba Cloud account to apply. RAM users cannot apply for public preview qualification.

To apply:

  1. Log on to the Alibaba Cloud console with your Alibaba Cloud account.

  2. Enter the required information about your enterprise.

  3. After your application is approved, log on to the DLF console to start using the service.

Note: The application must be submitted by the primary Alibaba Cloud account holder. RAM users cannot submit a public preview application.

How am I charged for using Data Lake Formation?

During the public preview period, DLF is billed as follows:

  • Computing resources: No charge during public preview.

  • Metadata storage: A free monthly quota of metadata tables and objects is included. Usage that exceeds the free quota is billed accordingly.

After DLF is commercialized (moved to general availability), charges will apply based on the computing resources consumed when importing data.

How do I estimate the number of CUs for a data import template?

A compute unit (CU) is the resource unit used by each data import task in DLF. One CU provides:

  • 2 vCPUs

  • 8 GiB memory

During the public preview, you can apply for a maximum of 40 CUs. To use more than 40 CUs, submit a ticket to request a quota increase.
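The CU arithmetic above can be sketched in a few lines of Python. This is an illustrative calculation only, not part of any DLF SDK: the function and constant names are hypothetical, and the figures (2 vCPUs and 8 GiB per CU, 40 CU preview limit) are taken from this page.

```python
# Figures from this FAQ; names are hypothetical and for illustration only.
VCPUS_PER_CU = 2
MEMORY_GIB_PER_CU = 8
PREVIEW_MAX_CUS = 40


def cu_resources(cus: int) -> tuple:
    """Return (vCPUs, memory in GiB) provided by the given number of CUs."""
    if cus < 1:
        raise ValueError("cus must be a positive integer")
    return cus * VCPUS_PER_CU, cus * MEMORY_GIB_PER_CU


def needs_quota_ticket(cus: int) -> bool:
    """True when the request exceeds the public preview limit of 40 CUs."""
    return cus > PREVIEW_MAX_CUS


vcpus, memory_gib = cu_resources(PREVIEW_MAX_CUS)
print(f"{PREVIEW_MAX_CUS} CUs = {vcpus} vCPUs, {memory_gib} GiB memory")
```

At the preview limit of 40 CUs, this works out to 80 vCPUs and 320 GiB of memory across all import tasks.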

How do I use Spark to read data from a DLF data lake?

Spark can read data from a DLF data lake only on Alibaba Cloud E-MapReduce (EMR), which integrates with DLF. For more information, see EMR and DLF data lake solution. Self-managed Hadoop or Spark clusters cannot be integrated with DLF.
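As a rough sketch, assuming a PySpark job that runs on an EMR cluster whose Spark is already configured to use DLF as its metadata catalog (that setup is described in the EMR and DLF solution and is not shown here), reading a table could be an ordinary Spark SQL query. The helper functions and the database and table names below are hypothetical.

```python
def build_preview_query(database: str, table: str, limit: int = 10) -> str:
    """Build a Spark SQL query that previews rows from one table."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    # Backticks quote the identifiers so unusual names still parse.
    return f"SELECT * FROM `{database}`.`{table}` LIMIT {limit}"


def preview_table(spark, database: str, table: str, limit: int = 10):
    """Run the preview query on an existing SparkSession; returns a DataFrame."""
    return spark.sql(build_preview_query(database, table, limit))


# Usage on the EMR cluster (assumption: demo_db.events is a hypothetical table
# registered in the DLF catalog):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.enableHiveSupport().getOrCreate()
#   preview_table(spark, "demo_db", "events").show()
```

The session is passed in rather than created inside the helper, so the same function works in a notebook, a submitted job, or a test with a stand-in session.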
