FAQ
This page answers frequently asked questions about Data Lake Formation (DLF).
When should I use Data Lake Formation?
DLF is well suited for the following scenarios:
Data analysis and machine learning in the cloud
Building a data lake in the cloud
Reducing O&M costs
What is the difference between Data Lake Formation and OSS?
Object Storage Service (OSS) and DLF serve different roles and are designed to work together:
| Dimension | OSS | DLF |
|---|---|---|
| Primary purpose | Stores raw data files in the cloud | Manages permissions and metadata views over data stored in OSS |
| Access control | Bucket- and object-level policies | Fine-grained permissions for big data analysis and machine learning scenarios |
In short, OSS is the backend storage that DLF sits on top of. DLF adds the governance and discovery layer that makes big data in the cloud accessible and manageable at scale.
How do I apply for the public preview qualification of Data Lake Formation?
You must use an Alibaba Cloud account to apply. RAM users cannot apply for public preview qualification.
To apply:
Log on to the Alibaba Cloud console with your Alibaba Cloud account.
Enter the required information about your enterprise.
After your application is approved, log on to the DLF console to start using the service.
Note: The application must be submitted by the primary Alibaba Cloud account holder. RAM users cannot submit a public preview application.
How am I charged for using Data Lake Formation?
During the public preview period, DLF is billed as follows:
Computing resources: No charge during public preview.
Metadata storage: A free monthly quota of metadata tables and objects is included. Usage that exceeds the free quota is billed accordingly.
After DLF is commercialized (moved to general availability), charges will apply based on the computing resources consumed when importing data.
How do I estimate the number of CUs for a data import template?
A compute unit (CU) is the resource unit used by each data import task in DLF. One CU provides:
2 vCPUs
8 GiB memory
During the public preview, you can apply for a maximum of 40 CUs. If you need more computing resources than the 40 CU limit allows, submit a ticket to request an increase.
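The CU figures above reduce to simple arithmetic. The following is a minimal sketch, not an official DLF sizing method: the helper names `cu_resources` and `cus_needed` are hypothetical, and the idea of sizing a task from a vCPU and memory requirement is an assumption made for illustration. Only the per-CU resources (2 vCPUs, 8 GiB) and the 40 CU preview limit come from this page.

```python
import math
from dataclasses import dataclass

# Values stated on this page.
VCPUS_PER_CU = 2
MEMORY_GIB_PER_CU = 8
PREVIEW_MAX_CUS = 40


@dataclass(frozen=True)
class CuResources:
    cus: int
    vcpus: int
    memory_gib: int
    exceeds_preview_limit: bool


def cu_resources(cus: int) -> CuResources:
    """Translate a CU count into total vCPUs and memory (hypothetical helper)."""
    if cus < 1:
        raise ValueError("cus must be a positive integer")
    return CuResources(
        cus=cus,
        vcpus=cus * VCPUS_PER_CU,
        memory_gib=cus * MEMORY_GIB_PER_CU,
        exceeds_preview_limit=cus > PREVIEW_MAX_CUS,
    )


def cus_needed(vcpus: int, memory_gib: int) -> int:
    """Smallest CU count that covers both requirements (assumed sizing rule)."""
    by_cpu = math.ceil(vcpus / VCPUS_PER_CU)
    by_memory = math.ceil(memory_gib / MEMORY_GIB_PER_CU)
    return max(by_cpu, by_memory, 1)


if __name__ == "__main__":
    needed = cus_needed(vcpus=10, memory_gib=64)
    plan = cu_resources(needed)
    print(plan)
    if plan.exceeds_preview_limit:
        print("Above the public preview limit; a ticket is required.")
```

Taking the larger of the two ceilings ensures that neither the vCPU nor the memory requirement is under-provisioned, because each CU bundles both resources in a fixed ratio.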
How do I use Spark to read data from a DLF data lake?
You can use Spark to read data from a DLF data lake only when DLF is integrated with Alibaba Cloud E-MapReduce (EMR). For more information, see EMR and DLF data lake solution. Integration with self-managed Hadoop or Spark clusters is not supported.