Object tables map files stored in OSS into a queryable Hologres table—similar to an external table, but purpose-built for unstructured data such as images, PDFs, audio, and video. Hologres V4.0 and later support this feature.
With object tables, you can:
Query OSS file metadata using standard SQL without moving data out of OSS.
Process unstructured data end-to-end by combining object tables with Dynamic Tables and AI Functions—no external embedding services required.
Incrementally detect file changes to avoid reprocessing unchanged files and reduce compute costs.
Prerequisites
Before you begin, ensure that you have:
A Hologres instance running V4.0 or later. To upgrade, see Upgrade an instance
An OSS bucket containing the files you want to access
A RAM role with read access to the OSS bucket. To find the role ARN (Alibaba Cloud Resource Name), go to the Resource Access Management (RAM) console, click Identity Management > Role, and then click the role name
Limitations
Only classic network endpoints are supported.
Only manual refresh is supported. There is no automatic refresh.
By default, object tables scan only the top-level directory of the specified path. Subdirectories are not included. To scan subdirectories recursively, run:
NoteRecursive scans consume more resources than top-level scans. Set
pathto the most specific directory that contains your target files.ALTER DATABASE <db_name> SET hg_experimental_enable_oss_meta_recursive = on;
Create an object table
Object tables have a fixed schema—you do not specify columns when creating one.
Syntax:
CREATE OBJECT TABLE [IF NOT EXISTS] [schema_name.]<table_name>
WITH (
-- Required
path = '<oss_path>',
oss_endpoint = '<oss_endpoint>',
role_arn = '<role_arn>',
-- Optional
[orientation = 'column | row | row,column',]
[table_group = '<table_group_name>',]
[distribution_key = '<column_name>[,...]',]
[clustering_key = '<column_name>[:asc][,...]',]
[storage_mode = 'hot | cold',]
[event_time_column = '<column_name>[,...]',]
[bitmap_columns = '<column_name>[,...]',]
[dictionary_encoding_columns = '<column_name>[,...]',]
[time_to_live_in_seconds = '<non_negative_integer>']
);After creating an object table, refresh it once to load data:
REFRESH OBJECT TABLE [schema_name.]<table_name>;Required parameters
Optional parameters
Parameter | Description | Example |
| The OSS directory path containing your files. The object table reads metadata from files in this directory. |
|
| The OSS classic network endpoint for the bucket region. For endpoints by region, see . | |
| The ARN of a RAM role with read access to the OSS bucket. |
|
Parameter | Default | Description |
|
| Storage format. Supported values: |
| Default table group | The table group and shard count for the object table. |
|
| The distribution key. |
|
| The clustering key. |
|
| Storage mode. Supported values: |
|
| |
|
| The bitmap index columns. |
|
| Dictionary encoding columns. |
|
| Data lifecycle for the table in seconds. |
Query an object table
After the initial refresh, query an object table like any standard table:
SELECT * FROM <object_table_name>;Object tables have the following fixed columns:
To view the DDL or storage properties of an object table, use either of the following:
-- View DDL
SELECT hg_dump_script('[schema_name.]<object_table_name>');
-- View storage properties
SELECT * FROM hologres.hg_table_properties WHERE table_name = '<object_table_name>';Refresh an object table
Refresh an object table to sync file metadata from OSS. Only manual refresh is supported.
REFRESH OBJECT TABLE [schema_name.]<table_name>;Refreshing consumes compute resources. To minimize cost, set path to the most specific directory that contains your target files.
Manage an object table
Rename or move an object table
-- Rename
ALTER OBJECT TABLE [IF EXISTS] <table_name> RENAME TO <new_name>;
-- Move to a different schema
ALTER OBJECT TABLE [IF EXISTS] [schema_name.]<table_name> SET SCHEMA <new_schema>;Delete an object table
-- Moves to the recycle bin if it is enabled; otherwise, permanently deletes
DROP OBJECT TABLE [IF EXISTS] <table_name>;
-- Permanently deletes without moving to the recycle bin
DROP OBJECT TABLE [IF EXISTS] <table_name> FORCE;If the database recycle bin is enabled, dropping an object table moves it to the recycle bin automatically. The object table can be recovered using the recycle bin recovery command, and it remains an object table after recovery. For details, see Recycle bin.
The FORCE option permanently deletes the object table. Recovery is not possible.
Usage example
This example shows how to use an Object Table to parse PDF documents. Follow these steps.
Preparations
Prepare data
Download 80 prospectuses (PDF) from the Bojin Large Language Model Challenge—Financial Qwen-14B Dataset.
Log on to the OSS console. . Upload the downloaded PDF files to the bucket path. For upload details, see .
Prepare accounts
Log on to the Resource Access Management (RAM) console. Create an Alibaba Cloud RAM role and grant it OSS permissions.
We recommend granting the AliyunOSSReadOnlyAccess permission.
Grant login and Hologres access permissions to the RAM role.
Alibaba Cloud account (root account)
Update the trust policy for the RAM role. Update these parameters:
Action: Set to
sts:AssumeRole.Service: Set to
hologres.aliyuncs.com.
{ "Statement": [ { "Action": "sts:AssumeRole", "Effect": "Allow", "Principal": { "RAM": [ "acs:ram::1866xxxx:root" ], "Service": [ "hologres.aliyuncs.com" ] } } ], "Version": "1" }RAM user (sub-account)
Grant permissions to the RAM user.
On the page, click Create Policy, and select the Script Editor mode to create a policy. For more information, see Create a custom policy.
Hologres uses this policy to check whether the RAM user has permission to create the RAM role. Policy content:
{ "Version": "1", "Statement": [ { "Effect": "Allow", "Action": "hologram:GrantAssumeRole", "Resource": "<RoleARN>" } ] }On the page, click Add Permissions in the Actions
Grant permissions to the RAM role.
Update the trust policy for the RAM role. Update these parameters:
Action: Set to
sts:AssumeRole.Service: Set to
hologres.aliyuncs.com.
{ "Statement": [ { "Action": "sts:AssumeRole", "Effect": "Allow", "Principal": { "RAM": [ "acs:ram::1866xxxx:root" ], "Service": [ "hologres.aliyuncs.com" ] } } ], "Version": "1" }
Prepare a Hologres instance
Ensure the Hologres instance version is V4.0 or later.
Procedure
Connect to HoloWeb and run queries. Create an Object Table to link to the PDF files in OSS.
CREATE OBJECT TABLE public.pdf_bs_challenge_financial_14b_dataset WITH ( path = 'oss://xxxx-hangzhou/bs_challenge_financial_14b_dataset/pdf', -- Required oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com', role_arn = 'acs:ram::xxx:role/xxxx' );Refresh and query the Object Table.
You must manually refresh the Object Table once after creation to query data. Run these commands:
-- Refresh REFRESH OBJECT TABLE public.pdf_bs_challenge_financial_14b_dataset; -- Query SELECT object_uri ,etag, file->>'size' AS size, file->>'last_modified_at' as last_modified_at FROM public.pdf_bs_challenge_financial_14b_dataset;The query returns:
object_uri | etag | size | last_modified_at --------------------------------------------------------------------------------- oss://hm-****-hangzhou/bs_challenge_financial_14b_dataset/pdf/5c2707b186a798118bba251cce8d1fc331b****.PDF | 836F504F63F4B77A2271CD3285DDECF7 | 4170245 | 2025-08-18 14:18:19+08 oss://hm-****-hangzhou/bs_challenge_financial_14b_dataset/pdf/72103a0ef6299585cdd09751682534d873f****.PDF | 39DA597D83BBCA571AC8B94CC464CDC8 | 4022782 | 2025-08-18 14:17:05+08Search unstructured data.
An Object Table maps PDF metadata fields one-to-one. You can combine it with Hologres AI Function to search and infer PDF content. Follow these steps:
This example uses AI resources with the specification large-96core-512G-384G and one node.
After purchasing AI resources, go to the Hologres console to deploy models. For PDF search, deploy the following models and resources:
Column | Type | Description |
| TEXT | The exact OSS file path. |
| TEXT | A unique identifier for the file content, assigned when the object is created. |
| FILE | A JSON-like structure containing detailed file metadata. Parse it using JSON and JSONB types. Fields include: |
| JSON | Custom metadata attached to the file. |
Create a table to store embedded PDF data.
Create an internal Hologres table to store embedded PDF data. Build a vector index on this table for vector search.
-- Create embedding table
CREATE TABLE public.embed_pdf_bs (
object_uri text NOT NULL,
etag text NOT NULL,
chunk text NOT NULL,
embedding_vector float4[] CHECK(array_ndims(embedding_vector) = 1 AND array_length(embedding_vector, 1) =1024)
) WITH (
bitmap_columns = 'object_uri,etag,chunk',
clustering_key = 'object_uri,etag',
distribution_key = 'object_uri',
vectors = '{
"embedding_vector": {
"algorithm": "HGraph",
"distance_method": "Cosine",
"builder_params": {
"base_quantization_type": "sq8_uniform",
"max_degree": 64,
"ef_construction": 400,
"precise_quantization_type": "fp32",
"use_reorder": true
}
}}'
);Call AI Function to embed and chunk PDFs, then write results to the table.
-- Convert PDFs to documents and embed
insert into public.embed_pdf_bs
WITH parsed_doc AS (
SELECT object_uri, etag, ai_parse_document(file, 'auto', 'markdown') AS doc
FROM pdf_bs_challenge_financial_14b_dataset limit 1),
chunked_doc AS (
SELECT object_uri, etag, unnest(ai_chunk(doc, 300, 50)) AS chunk
FROM parsed_doc )
SELECT object_uri, etag, chunk, ai_embed(chunk) AS embedding_vector
FROM chunked_doc
WHERE chunk IS NOT NULL AND length(chunk) > 0;Search and infer PDF content.
You can assess whether a company’s future outlook is optimistic or pessimistic based on financial trends in prospectuses. This helps guide investment decisions. Example SQL:
WITH
-- Embed question and retrieve
embedding_recall AS (
SELECT chunk FROM embed_pdf_bs
ORDER BY
approx_cosine_distance(
embedding_vector,
ai_embed('In the reporting period, what were the year-on-year growth rates for revenue and net profit of Hunan Goke Microelectronics Co., Ltd. in 2014, 2015, and 2016?')
) DESC
LIMIT 20
),
-- Concatenate top chunks
concat_top_chunks AS (
SELECT string_agg(chunk, E'\n\n----\n\n') AS merged_chunks FROM embedding_recall
)
-- Generate final answer
SELECT
ai_gen('Analyze whether the following financial trend is optimistic or pessimistic, and explain why: ' ||
ai_gen('In the reporting period, what were the year-on-year growth rates for revenue and net profit of Hunan Goke Microelectronics Co., Ltd. in 2014, 2015, and 2016? ' || merged_chunks))
FROM concat_top_chunks;
The query returns:
ai_gen
----
"From the provided financial data and growth analysis, Hunan Goke Microelectronics Co., Ltd. showed an overall **optimistic** performance trend from 2014 to 2016, for the following reasons:
### 1. **Revenue grew consistently**
- Revenue increased by 112.94% in 2014, 103.00% in 2015, and 33.22% in 2016. This indicates steady business expansion and rising market demand.
- Growth was driven by mass production and sales of new products (e.g., smart surveillance chips and GK-series chips) and partnerships with well-known distributors, highlighting strengths in product innovation and market outreach.
### 2. **Net profit rose significantly**
- Net profit surged by 467.53% in 2014, 8.21% in 2015, and 33.07% in 2016. Although growth slowed in 2015, it remained positive.
- In 2016, higher-margin products (e.g., solid-state storage chips) contributed more to revenue, while sales expenses declined. This reflects successful optimization of product mix and cost control.
### 3. **Product mix improved and innovation advanced**
- In 2014, the company benefited from the opening of the Live Satellite market. In 2015, smart surveillance chips entered mass production. In 2016, GK-series chips were adopted by major distributors. This shows the company’s ability to align with market needs and drive growth through innovation.
- Higher-margin products made up a larger share of sales, boosting overall profitability.
### 4. **Market reach expanded**
- Partnerships with leading distributors (e.g., Koton and Shijie) indicate breakthroughs in channel development, helping expand market share and brand influence.
### 5. **Overall trend is positive**
- Although revenue growth slowed in 2016 compared to 2015, net profit continued to rise at a healthy pace. This signals improved operational efficiency and profitability. It shows the company adapts well to market changes and optimizes effectively.
### Conclusion:
Overall, Hunan Goke Microelectronics Co., Ltd. showed an **optimistic** performance trend from 2014 to 2016. The company achieved consistent growth in both revenue and net profit. It strengthened its core competitiveness and profitability through product mix optimization, innovation, and market expansion. While revenue growth slowed in 2016, stable net profit growth reflects strong operational management and cost control."Function name | Model name | Model Function Description | CPU per replica | Memory per replica | GPU per replica | Number of replicas |
to_doc | ds4sd/docling-models | Convert PDFs to documents. | 20 core | 100 G | 1 GPU (48 G) | 1 |
chunk | recursive-character-text-splitter | Split documents. Split large PDFs. | 15 core | 30 G | 0 GPU (0 G) | 1 |
pdf_embed | BAAI/bge-base-zh-v1.5 | Document Embedding. | 7 core | 30 G | 1 GPU (96 G) | 1 |
llm | Qwen/Qwen3-32B | Use a Large Language Model (LLM) to infer retrieved document content based on prompts. | 7 core | 30 G | 1 GPU (96 G) | 1 |
Best practices
Object tables work with Dynamic Tables, AI Functions, vector search, and full-text search to automate processing, searching, and analysis of unstructured data. For details, see: