Best practices for performing operations on OSS foreign tables

更新时间:
复制 MD 格式

OSS foreign tables let you query data in Object Storage Service (OSS) directly from AnalyticDB for PostgreSQL without importing it first. Query performance depends on how well the optimizer understands your data and how your objects are organized in OSS. Follow these practices to get the most out of parallel scans, accurate query plans, and efficient file layouts.

If these optimizations still leave performance short of your requirements, consider using partitioned OSS foreign tables, which can significantly reduce scan scope by pruning irrelevant partitions at query time.

View execution plans

Use EXPLAIN to inspect the execution plan for any query that involves OSS foreign tables:

EXPLAIN SELECT COUNT(*) FROM oss_lineitem_orc WHERE l_orderkey > 14062498;

Review the plan to verify that the query optimizer is choosing efficient join orders and scan methods, especially for complex queries on multiple tables.

Collect statistics

AnalyticDB for PostgreSQL does not automatically collect statistics for OSS foreign tables. Unlike regular tables stored locally, OSS foreign tables live in OSS, so the database has no automatic mechanism to sample them. Without up-to-date statistics, the query optimizer works from estimates and may choose an inefficient plan — for example, selecting the wrong join order or broadcast strategy for queries that join multiple tables.

Run ANALYZE manually whenever your data changes significantly:

  1. Check the current execution plan:

    EXPLAIN <table_name>;
  2. Collect statistics:

    ANALYZE <table_name>;
  3. Check the updated execution plan to confirm the optimizer now generates a more efficient plan:

    EXPLAIN <table_name>;

Split large objects

OSS foreign tables support parallel scans across multiple nodes. Each node scans a separate object, so if your data consists of a small number of large files, most nodes sit idle. Split large files into smaller ones so each node has a separate object to scan in parallel — this significantly improves query throughput.

The following example shows how to split a large CSV file on Linux.

  1. Check the number of rows in the file:

    wc -l <csv_file>
  2. Split the file into smaller files, each containing N rows:

    split -l N <csv_file>

    Replace N with the number of rows per output file. The split command preserves row boundaries, so no row is divided across multiple output files.

View object information

To inspect the objects that back an OSS foreign table, run:

SELECT * FROM get_oss_table_meta('<OSS FOREIGN TABLE>');

Replace <OSS FOREIGN TABLE> with the name of the OSS foreign table.

What's next