In-Database AI/ML overview
AnalyticDB PostgreSQL 7.0 provides In-Database AI/ML through the pgml extension. Train, fine-tune, deploy, and run inference on models with GPU/CPU acceleration — all without moving data out of the database. PostgresML-compatible, the extension integrates XGBoost, LightGBM, and Scikit-Learn.
Prerequisites
-
An AnalyticDB PostgreSQL 7.0 instance with kernel version V7.1.1.0 or later.
NoteYou can view the minor version on the Basic Information page of an instance in the AnalyticDB for PostgreSQL console. If your instance does not meet the required versions, update the minor version of the instance.
-
Your instance uses elastic storage mode.
-
The pgml extension is installed on the instance.
Note-
The pgml extension cannot be installed from the console. To install or uninstall it, submit a ticket.
-
The pgml extension is not supported on AnalyticDB for PostgreSQL 7.0 Basic Edition.
-
Metadata tables
Installing the pgml extension on an AnalyticDB PostgreSQL 7.0 instance creates a pgml schema with the following metadata tables.
|
Metadata table name |
Description |
|
projects |
Training task project metadata. |
|
models |
Model training parameters and metrics. |
|
files |
Stored model files. |
|
snapshots |
Training dataset snapshots. |
|
logs |
Training logs. |
|
deployments |
Model deployment records. |
Training tasks populate these metadata tables.
The pgml metadata tables use custom types (such as task, runtime, and sampling) defined in Machine learning.
projects
Stores project ID, name, task type, and timestamps for training tasks. Schema and indexes:
Table "pgml.projects"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+--------------------------------------
id | bigint | | not null | nextval('projects_id_seq'::regclass)
name | text | | not null |
task | task | | not null |
created_at | timestamp without time zone | | not null | clock_timestamp()
updated_at | timestamp without time zone | | not null | clock_timestamp()
Indexes:
"projects_pkey" PRIMARY KEY, btree (id)
"projects_name_idx" btree (name)
Triggers:
projects_auto_updated_at BEFORE UPDATE ON projects FOR EACH ROW EXECUTE FUNCTION set_updated_at()
trigger_before_insert_pgml_projects BEFORE INSERT ON projects FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_projects()
Distributed Replicated
models
Stores model training parameters and metadata, including project ID and snapshot ID. Schema and indexes:
Table "pgml.models"
Column | Type | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+------------------------------------
id | bigint | | not null | nextval('models_id_seq'::regclass)
project_id | bigint | | not null |
snapshot_id | bigint | | |
num_features | integer | | not null |
algorithm | text | | not null |
runtime | runtime | | | 'python'::runtime
hyperparams | jsonb | | not null |
status | text | | not null |
metrics | jsonb | | |
search | text | | |
search_params | jsonb | | not null |
search_args | jsonb | | not null |
created_at | timestamp without time zone | | not null | clock_timestamp()
updated_at | timestamp without time zone | | not null | clock_timestamp()
Indexes:
"models_pkey" PRIMARY KEY, btree (id)
"models_project_id_idx" btree (project_id)
"models_snapshot_id_idx" btree (snapshot_id)
Triggers:
models_auto_updated_at BEFORE UPDATE ON models FOR EACH ROW EXECUTE FUNCTION set_updated_at()
trigger_before_insert_pgml_models BEFORE INSERT ON models FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_models_fk()
Distributed Replicated
files
Stores trained model files in binary format (data column), split into 100 MB chunks. Schema and indexes:
Table "pgml.files"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+-----------------------------------
id | bigint | | not null | nextval('files_id_seq'::regclass)
model_id | bigint | | not null |
path | text | | not null |
part | integer | | not null |
created_at | timestamp without time zone | | not null | clock_timestamp()
updated_at | timestamp without time zone | | not null | clock_timestamp()
data | bytea | | not null |
Indexes:
"files_pkey" PRIMARY KEY, btree (id)
"files_model_id_path_part_idx" btree (model_id, path, part)
Triggers:
files_auto_updated_at BEFORE UPDATE ON files FOR EACH ROW EXECUTE FUNCTION set_updated_at()
trigger_before_insert_pgml_files BEFORE INSERT ON files FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_files()
Distributed Replicated
snapshots
Stores training dataset snapshots, including the source table name and test/train split configuration. Schema and indexes:
Table "pgml.snapshots"
Column | Type | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+---------------------------------------
id | bigint | | not null | nextval('snapshots_id_seq'::regclass)
relation_name | text | | not null |
y_column_name | text[] | | |
test_size | real | | not null |
test_sampling | sampling | | not null |
status | text | | not null |
columns | jsonb | | |
analysis | jsonb | | |
created_at | timestamp without time zone | | not null | clock_timestamp()
updated_at | timestamp without time zone | | not null | clock_timestamp()
materialized | boolean | | | false
Indexes:
"snapshots_pkey" PRIMARY KEY, btree (id)
Triggers:
snapshots_auto_updated_at BEFORE UPDATE ON snapshots FOR EACH ROW EXECUTE FUNCTION set_updated_at()
Distributed Replicated
logs
Stores training logs, with multiple entries per task ordered by created_at ascending. Schema and indexes:
Table "pgml.logs"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+----------------------------------
id | integer | | not null | nextval('logs_id_seq'::regclass)
model_id | bigint | | |
project_id | bigint | | |
created_at | timestamp without time zone | | | CURRENT_TIMESTAMP
logs | jsonb | | |
Indexes:
"logs_pkey" PRIMARY KEY, btree (id)
Distributed Replicated
deployments
Stores model deployment records, including project ID, deployment ID, and model ID. Schema and indexes:
Table "pgml.deployments"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+-----------------------------------------
id | bigint | | not null | nextval('deployments_id_seq'::regclass)
project_id | bigint | | not null |
model_id | bigint | | not null |
strategy | strategy | | not null |
created_at | timestamp without time zone | | not null | clock_timestamp()
Indexes:
"deployments_pkey" PRIMARY KEY, btree (id)
"deployments_model_id_created_at_idx" btree (model_id)
"deployments_project_id_created_at_idx" btree (project_id)
Triggers:
deployments_auto_updated_at BEFORE UPDATE ON deployments FOR EACH ROW EXECUTE FUNCTION set_updated_at()
trigger_before_insert_pgml_deployments BEFORE INSERT ON deployments FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_deployments_fk()
Distributed Replicated