Data lineage allows you to track relationships between tables and fields. This helps you trace data origins, manage your data assets, and analyze the impact of job failures on upstream and downstream dependencies. Hologres is deeply integrated with DataWorks, enabling you to manage data lineage for Hologres through the Data Map module.
Background
Data Map is a module in DataWorks that provides an enterprise data catalog based on metadata. It offers features such as viewing metadata details and managing data lineage and data categories. Data Map helps you discover, understand, and use your data. For more information, see Overview.
Limitations
- The Data Map feature is supported only on Hologres V1.1 or later. If your instance runs an earlier version, see Troubleshoot upgrade preparation failures or join the Hologres DingTalk group for assistance. For more details, see How do I get more online support?
- You can view data lineage only in DataWorks Standard Edition or later.
- After you configure a Hologres metadata crawler in Data Map, it takes about one hour for the data lineage to appear.
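The version requirement above can be checked programmatically before you configure a crawler. The following is a minimal sketch, not an official API: the helper names `parse_version` and `supports_data_map` are hypothetical, and the sketch assumes that the version string you obtain for your instance contains a dotted version number such as 1.1.0.

```python
import re

# Minimum Hologres version that supports Data Map, as stated in the limitations.
MIN_VERSION = (1, 1)


def parse_version(version_text):
    """Extract the first dotted version number from a string.

    Assumption: the string contains something like "1.1" or "1.3.24".
    The exact format of the version string for your instance may differ.
    """
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def supports_data_map(version_text):
    """Return True if the major.minor version is at least V1.1."""
    return parse_version(version_text)[:2] >= MIN_VERSION


if __name__ == "__main__":
    for text in ("Hologres 1.1.0", "Hologres 0.10.42", "V1.3"):
        print(text, "->", supports_data_map(text))
```

Comparing tuples of integers avoids the common mistake of comparing version strings alphabetically, where "0.10" would sort after "0.9".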
Hologres data lineage
Follow these steps to view the data lineage between Hologres tables in Data Map.
- Collect and ingest Hologres metadata.
  Use the metadata collection feature to import metadata from your Hologres data source into Data Map for centralized management. For more information, see Metadata collection.
  After metadata collection is complete, go to the Data Overview page to view statistics for the Hologres databases and tables in the current region where a metadata crawler is configured, including the total number of databases and the total number of tables. For more information, see View overall data.
  To find a specific table, see Metadata retrieval.
- View lineage details.
  After you find the target table, click its name to go to the table details page. Here, you can view basic information, output information, and data lineage for the table. For more information, see View the details of a table.
  For example, the details page of an internal Hologres table contains the Details, Lineage, and Instructions tabs. On the Details tab, the Field Information sub-tab displays the name and data type of each field, such as o_orderkey (bigint) and o_custkey (integer). The left panel shows basic information together with the Technical Information and Business Information sections. To view the table's lineage, open the Lineage tab.
Data lineage with MaxCompute
Data Map also lets you view the data lineage between MaxCompute and Hologres. Click the link for the Hologres foreign table to view details of the corresponding MaxCompute table.
You can navigate to the MaxCompute table details page from this link only if the project that contains the MaxCompute table is bound to the target DataWorks workspace.
In the Technical Information section of the table details page, the External Table field displays the name of the associated MaxCompute table, such as default.weather. Click this link to go to the corresponding MaxCompute table details page.
You can view table lineage information on this page. For example, the page can show that, in the DataWorks Scheduling module, data is written from a Hologres foreign table to a MaxCompute table. To view field-level lineage, go to the Field Lineage tab. As a further example, on the details page for the Hologres foreign table public.weather2, the left side displays basic information (such as the data source type Hologres, the database, and the owner) and technical information (the foreign table path). On the right, the Lineage Information tab shows a graph of the foreign table mapping between public.weather2 and the MaxCompute table default.weather.
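To make the idea of upstream and downstream lineage concrete, the following sketch models table lineage as a directed graph and walks it in both directions. It is only an illustration of the concept, not how Data Map stores or exposes lineage. The `LineageGraph` class is hypothetical, the edge direction between default.weather and public.weather2 is an assumption, and the tables weather_daily and weather_report are invented for the example.

```python
from collections import defaultdict, deque


class LineageGraph:
    """A toy table-lineage graph: an edge points from a source table to a target table."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)

    def add_edge(self, source, target):
        """Record that data flows from source to target."""
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _walk(start, neighbors):
        """Breadth-first search that returns every table reachable from start."""
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt != start and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def downstream_of(self, table):
        """Tables that are affected if this table is late or wrong."""
        return self._walk(table, self._downstream)

    def upstream_of(self, table):
        """Tables that this table's data originates from."""
        return self._walk(table, self._upstream)


graph = LineageGraph()
# Assumed direction: the MaxCompute table feeds the Hologres foreign table.
graph.add_edge("maxcompute:default.weather", "hologres:public.weather2")
# Hypothetical downstream tables, invented for this example.
graph.add_edge("hologres:public.weather2", "hologres:public.weather_daily")
graph.add_edge("hologres:public.weather_daily", "hologres:public.weather_report")

if __name__ == "__main__":
    print(sorted(graph.downstream_of("hologres:public.weather2")))
    print(sorted(graph.upstream_of("hologres:public.weather_report")))
```

Walking downstream from a table corresponds to impact analysis for a failed job, and walking upstream corresponds to tracing where a table's data came from.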
Data lineage with Flink
You can view the data lineage between Hologres and Flink in the Realtime Compute for Apache Flink console. For more information, see View data lineage.