Comparison of supported open formats


DLF REST Catalog provides a fully managed metadata service that addresses the concurrency, performance, and governance limitations of self-managed FileSystem catalogs.

FileSystem catalog: lightweight to start, limited in production

The FileSystem catalog organizes table metadata using a directory structure, such as warehouse/dbName.db/tableName. It requires no external services and works out of the box, making it a convenient starting point.
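The directory layout can be sketched in Python. This is a minimal sketch that assumes a local directory stands in for object storage. `table_path`, `create_table`, and `list_tables` are hypothetical helpers and not part of any catalog API. It shows that in a FileSystem catalog the directory tree itself is the metadata: creating a table means creating a directory, and discovering tables means listing one.

```python
import tempfile
from pathlib import Path


def table_path(warehouse: Path, db: str, table: str) -> Path:
    # Hypothetical helper: warehouse/dbName.db/tableName, as described above.
    return warehouse / f"{db}.db" / table


def create_table(warehouse: Path, db: str, table: str) -> Path:
    # Creating a table is only creating its directory; no service is involved.
    path = table_path(warehouse, db, table)
    path.mkdir(parents=True, exist_ok=False)
    return path


def list_tables(warehouse: Path, db: str) -> list[str]:
    # Discovering tables means listing the database directory.
    db_dir = warehouse / f"{db}.db"
    return sorted(p.name for p in db_dir.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    warehouse = Path(tmp) / "warehouse"
    create_table(warehouse, "sales", "orders")
    create_table(warehouse, "sales", "customers")
    print(list_tables(warehouse, "sales"))  # ['customers', 'orders']
```

On object storage there are no real directories, so each `iterdir()` in this sketch would roughly correspond to a list request over a key prefix.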

In production, however, it runs into fundamental constraints:

  • Unsafe concurrent writes: It relies on Object Storage rename operations to simulate commits. Because these operations are not atomic, concurrent writes to the same table can cause file renaming conflicts and data loss.

  • Compaction tied to write jobs: Without a centralized metadata service, compaction must run inside write jobs. This consumes write resources, complicates resource planning, and reduces stability.

  • Slow table lifecycle operations: Creating, deleting, or renaming a table requires traversing a large number of files, a slow and error-prone process that worsens at scale.

  • High-latency metadata reads: All metadata retrieval depends on list operations in Object Storage, resulting in high latency and high costs for large tables.

  • No visibility or governance: It lacks production-grade capabilities such as monitoring, storage overviews, access control, and hot/cold data management.
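The concurrent-write problem can be reproduced with a toy model. This is a minimal sketch that assumes, as described above, that the storage layer offers no atomic "fail if the target already exists" commit. `ObjectStore`, `CatalogService`, and the `snapshot/snapshot-N` naming are hypothetical and do not represent DLF or any storage API. Two writers read the same latest snapshot and then commit. The rename-style commit silently keeps only the last writer, while a centralized service that commits with compare-and-set on a version number rejects the stale commit so the writer can retry.

```python
import threading


class ObjectStore:
    """Toy object store (hypothetical). put() overwrites silently: there is no
    atomic "fail if the key already exists" check."""

    def __init__(self) -> None:
        self.objects: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self.objects[key] = value

    def list(self, prefix: str) -> list[str]:
        return sorted(k for k in self.objects if k.startswith(prefix))


def latest_snapshot(store: ObjectStore) -> int:
    # FileSystem-style: find the newest snapshot by listing objects.
    ids = [int(k.rsplit("-", 1)[1]) for k in store.list("snapshot/snapshot-")]
    return max(ids, default=0)


class CatalogService:
    """Toy centralized catalog (hypothetical): commits are compare-and-set."""

    def __init__(self) -> None:
        self.version = 0
        self.snapshots: dict[int, str] = {}
        self._lock = threading.Lock()

    def commit(self, expected_version: int, data: str) -> bool:
        with self._lock:
            if self.version != expected_version:
                return False  # another writer committed first
            self.version += 1
            self.snapshots[self.version] = data
            return True


def filesystem_race() -> dict[str, str]:
    store = ObjectStore()
    # Writers A and B both read the latest snapshot before either commits.
    base_a = latest_snapshot(store)
    base_b = latest_snapshot(store)
    store.put(f"snapshot/snapshot-{base_a + 1}", "writer A")
    store.put(f"snapshot/snapshot-{base_b + 1}", "writer B")  # overwrites A
    return store.objects


def catalog_race() -> dict[int, str]:
    catalog = CatalogService()
    base_a = catalog.version
    base_b = catalog.version
    catalog.commit(base_a, "writer A")               # succeeds: version 0 -> 1
    if not catalog.commit(base_b, "writer B"):       # rejected: expected 0, found 1
        catalog.commit(catalog.version, "writer B")  # retry on top of version 1
    return catalog.snapshots


print(filesystem_race())  # {'snapshot/snapshot-1': 'writer B'}
print(catalog_race())     # {1: 'writer A', 2: 'writer B'}
```

A real retry would also need to re-check that writer B's changes still apply on top of writer A's snapshot. The sketch omits that step to keep the contrast visible.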

Standard REST protocol

FileSystem catalog: Metadata stored in file system directories requires list operations for retrieval. These are slow, costly, and create strong dependencies on the underlying storage, limiting extensibility.

DLF REST catalog: Provides lightweight, fast metadata reads and writes via an open, standard REST API. Java and Python SDKs reduce integration complexity across multi-language environments.
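The difference in access pattern can be illustrated with a self-contained sketch that runs a toy catalog server on localhost. The `/v1/databases/{db}/tables/{table}` path, the JSON fields, and `get_table` are all hypothetical. They do not describe the actual DLF REST API or its Java and Python SDKs. The point is that the client asks a service for a table's metadata in one request instead of listing objects in storage.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import quote, unquote
from urllib.request import urlopen

# Hypothetical metadata held by the toy server; not the DLF schema.
TABLES = {("sales", "orders"): {"latest_snapshot": 42, "schema_id": 3}}


class CatalogHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        parts = self.path.strip("/").split("/")
        meta = None
        # Hypothetical path shape: v1 / databases / {db} / tables / {table}
        if len(parts) == 5 and parts[:2] == ["v1", "databases"] and parts[3] == "tables":
            meta = TABLES.get((unquote(parts[2]), unquote(parts[4])))
        if meta is None:
            self.send_error(404)
            return
        body = json.dumps(meta).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:  # keep the demo quiet
        pass


def get_table(base_url: str, db: str, table: str) -> dict:
    # One HTTP request returns the table's metadata; no object-storage listing.
    url = f"{base_url}/v1/databases/{quote(db)}/tables/{quote(table)}"
    with urlopen(url) as resp:
        return json.load(resp)


server = HTTPServer(("127.0.0.1", 0), CatalogHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"
print(get_table(base, "sales", "orders"))  # {'latest_snapshot': 42, 'schema_id': 3}
server.shutdown()
server.server_close()
```

Because the service owns the metadata, the cost of this lookup does not depend on how many data files the table has, which is the property the list-based approach lacks.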