Read and write external data files by using oss_fdw

更新时间:
复制 MD 格式

The oss_fdw extension is a foreign data wrapper (FDW) for PolarDB for PostgreSQL that maps Object Storage Service (OSS) objects to foreign tables. With oss_fdw, you read and write OSS data through standard SQL — ideal for archiving historical data, read-only data, and cold data to reduce storage costs.

OSS is a secure, cost-effective, and highly reliable cloud storage service with 99.995% data availability.

Prerequisites

  • OSS is activated and a bucket is created. What is OSS?

  • Your PolarDB for PostgreSQL cluster runs one of these engine versions:

    • PostgreSQL 16 (revision version 2.0.16.6.2.0 or later)

    • PostgreSQL 14 (revision version 2.0.14.5.3.0 or later)

    • PostgreSQL 11 (revision version 2.0.11.2.1.0 or later)

Note

View the revision version in the console or run SHOW polardb_version; to query it. To upgrade, see Version management.

Limitations

oss_fdw foreign tables support SELECT, INSERT, and TRUNCATE only. UPDATE and DELETE are not supported. Once data is written to OSS, it can be read but not modified in place.

Foreign table options reference

Use these options when creating an OSS foreign table:

Option Description Example value
dir Maps the foreign table to an OSS directory. Each INSERT creates a new file in the directory. 'archive/'
prefix Maps the foreign table to a filename prefix. Each INSERT creates a new file with the prefix. 'prefix/file_'
format Data format. Default: csv. 'csv'
compressiontype Compression algorithm. Default: none. Valid values: gzip, zstd. 'gzip'
compressionlevel Compression level. Higher levels produce smaller files but consume more CPU. '9'
Note

Specify either dir or prefix to define the OSS path for the foreign table.

Install the extension

CREATE EXTENSION oss_fdw;

Create a foreign data server

Define the connection between your PolarDB cluster and an OSS bucket.

CREATE SERVER ossserver
FOREIGN DATA WRAPPER oss_fdw
OPTIONS (
    host 'oss-cn-xxx.aliyuncs.com',
    bucket 'mybucket',
    id 'xxx',
    key 'xxx'
);

The server options are:

Parameter Description
host OSS endpoint
bucket OSS bucket name
id AccessKey ID of your Alibaba Cloud account
key AccessKey secret of your Alibaba Cloud account

Map a foreign table to an OSS directory

  1. Create an OSS foreign table mapped to a directory.

       CREATE FOREIGN TABLE t1_oss (
           id INT,
           f FLOAT,
           txt TEXT
       )
       SERVER ossserver
       OPTIONS (dir 'archive/');
  2. Insert data. Data is written to the archive/ directory. Query the table to verify:

       INSERT INTO t1_oss VALUES (generate_series(1,100), 0.1, 'hello');
       EXPLAIN SELECT COUNT(*) FROM t1_oss;
                                  QUERY PLAN
       -----------------------------------------------------------------
        Aggregate  (cost=6.54..6.54 rows=1 width=8)
          ->  Foreign Scan on t1_oss  (cost=0.00..6.40 rows=54 width=0)
                Directory on OSS: archive/
                Number Of OSS file: 1
                Total size of OSS file: 1292 bytes
       (5 rows)
    
       SELECT COUNT(*) FROM t1_oss;
        count
       -------
          100
       (1 row)
  3. Each subsequent INSERT creates a new file in the same OSS directory.

       INSERT INTO t1_oss VALUES (generate_series(1,100), 0.1, 'hello');
    
       EXPLAIN SELECT COUNT(*) FROM t1_oss;
                                   QUERY PLAN
       -------------------------------------------------------------------
        Aggregate  (cost=12.07..12.08 rows=1 width=8)
          ->  Foreign Scan on t1_oss  (cost=0.00..11.80 rows=108 width=0)
                Directory on OSS: archive/
                Number Of OSS file: 2
                Total size of OSS file: 2584 bytes
       (5 rows)
    
       SELECT COUNT(*) FROM t1_oss;
        count
       -------
          200
       (1 row)
  4. Run TRUNCATE to remove all OSS files mapped to the foreign table.

       TRUNCATE t1_oss;
    
       SELECT COUNT(*) FROM t1_oss;
       WARNING:  does not match any file in oss
        count
       -------
            0
       (1 row)

Map a foreign table to a directory prefix

  1. Create a foreign table with the prefix option.

       CREATE FOREIGN TABLE t2_oss (
           id INT,
           f FLOAT,
           txt TEXT
       )
       SERVER ossserver
       OPTIONS (prefix 'prefix/file_');
  2. Each INSERT creates a new file with the specified prefix.

       INSERT INTO t2_oss VALUES (generate_series(1,100), 0.1, 'hello');
    
       INSERT INTO t2_oss VALUES (generate_series(1,100), 0.1, 'hello');
    
       EXPLAIN SELECT COUNT(*) FROM t2_oss;
                                   QUERY PLAN
       -------------------------------------------------------------------
        Aggregate  (cost=12.07..12.08 rows=1 width=8)
          ->  Foreign Scan on t2_oss  (cost=0.00..11.80 rows=108 width=0)
                Directory on OSS: prefix/file_
                Number Of OSS file: 2
                Total size of OSS file: 2584 bytes
       (5 rows)
    
       SELECT COUNT(*) FROM t2_oss;
        count
       -------
          200
       (1 row)

Specify the storage format

oss_fdw stores data in CSV format by default. Use the format option to specify a different format. Each INSERT writes data in the specified format to an OSS file.

CREATE FOREIGN TABLE t3_oss (
    id INT,
    f FLOAT,
    txt TEXT
)
SERVER ossserver
OPTIONS (dir 'archive_csv/', format 'csv');

List files for an OSS foreign table

  1. Create a foreign table and run three INSERT statements to write data to three OSS files.

       CREATE FOREIGN TABLE t4_oss (
           id INT,
           f FLOAT,
           txt TEXT
       )
       SERVER ossserver
       OPTIONS (dir 'archive_file_list/');
    
       INSERT INTO t4_oss VALUES (generate_series(1,10000), 0.1, 'hello');
       INSERT INTO t4_oss VALUES (generate_series(1,10000), 0.1, 'hello');
       INSERT INTO t4_oss VALUES (generate_series(1,10000), 0.1, 'hello');
  2. Call oss_fdw_list_file() to list files associated with a foreign table. Optionally pass the schema name (defaults to public).

       SELECT * FROM oss_fdw_list_file('t4_oss');
                          name                    |  size
       -------------------------------------------+--------
        archive_file_list/_t4_oss_783053364762580 | 148894
        archive_file_list/_t4_oss_783053364849053 | 148894
        archive_file_list/_t4_oss_783053366496328 | 148894
       (3 rows)
    
       SELECT * FROM oss_fdw_list_file('t4_oss', 'public');
                          name                    |  size
       -------------------------------------------+--------
        archive_file_list/_t4_oss_783053364762580 | 148894
        archive_file_list/_t4_oss_783053364849053 | 148894
        archive_file_list/_t4_oss_783053366496328 | 148894
       (3 rows)

Compress data with gzip or Zstandard

The compressiontype option sets the compression algorithm. Default: none. Valid values: gzip and zstd.

The compressionlevel option controls the trade-off between file size and CPU usage. Higher levels produce smaller files and reduce network transfer volume, but require more CPU.

Algorithm Compression level range Default level Version requirement
gzip 1 to 9 6 All supported engine versions
Zstandard (zstd) -7 to 22 6 PostgreSQL 14 (revision version 14.9.13.0 or later)

gzip compression

gzip compression levels range from 1 to 9, with a default of 6.

CREATE FOREIGN TABLE t5_oss (
    id INT,
    f FLOAT,
    txt TEXT
)
SERVER ossserver
OPTIONS (
    dir 'archive_file_compression/',
    compressiontype 'gzip',
    compressionlevel '9'
);

INSERT INTO t5_oss VALUES (generate_series(1,10000), 0.1, 'hello');
INSERT INTO t5_oss VALUES (generate_series(1,10000), 0.1, 'hello');
INSERT INTO t5_oss VALUES (generate_series(1,10000), 0.1, 'hello');

Compare compressed and uncompressed file sizes:

SELECT * FROM oss_fdw_list_file('t4_oss');
                   name                    |  size
-------------------------------------------+--------
 archive_file_list/_t4_oss_741147680906121 | 148894
 archive_file_list/_t4_oss_741147680965631 | 148894
 archive_file_list/_t4_oss_741147681201236 | 148894
(3 rows)

SELECT * FROM oss_fdw_list_file('t5_oss');
                        name                         | size
-----------------------------------------------------+-------
 archive_file_compression/_t5_oss_741147752563794.gz | 23654
 archive_file_compression/_t5_oss_741147752633713.gz | 23654
 archive_file_compression/_t5_oss_741147752828680.gz | 23654
(3 rows)

Zstandard compression

Note

Zstandard (zstd) compression is supported only on clusters running PostgreSQL 14 (revision version 14.9.13.0 or later).

Zstandard compression levels range from -7 to 22, with a default of 6.

CREATE FOREIGN TABLE t6_oss (
    id INT,
    f FLOAT,
    txt TEXT
)
SERVER ossserver
OPTIONS (
    dir 'archive_file_zstd/',
    compressiontype 'zstd',
    compressionlevel '9'
);

INSERT INTO t6_oss VALUES (generate_series(1,10000), 0.1, 'hello');
INSERT INTO t6_oss VALUES (generate_series(1,10000), 0.1, 'hello');
INSERT INTO t6_oss VALUES (generate_series(1,10000), 0.1, 'hello');

Compare compressed and uncompressed file sizes:

SELECT * FROM oss_fdw_list_file('t4_oss');
                   name                    |  size
-------------------------------------------+--------
 archive_file_list/_t4_oss_741147680906121 | 148894
 archive_file_list/_t4_oss_741147680965631 | 148894
 archive_file_list/_t4_oss_741147681201236 | 148894
(3 rows)

SELECT * FROM oss_fdw_list_file('t6_oss');
                     name                      | size
-----------------------------------------------+------
 archive_file_zstd/_t6_oss_748106174612293.zst | 6710
 archive_file_zstd/_t6_oss_748106174700206.zst | 6710
 archive_file_zstd/_t6_oss_748106174866829.zst | 6710
(3 rows)

Compression size comparison

File sizes for 10,000 rows (3 columns) at compression level 9:

Compression File size per file File extension Reduction
None 148,894 bytes (none) --
gzip (level 9) 23,654 bytes .gz ~84%
Zstandard (level 9) 6,710 bytes .zst ~95%

Remove the extension

DROP EXTENSION oss_fdw;