In-Database AI/ML overview

更新时间:
复制 MD 格式

AnalyticDB PostgreSQL 7.0 provides In-Database AI/ML through the pgml extension. Train, fine-tune, deploy, and run inference on models with GPU/CPU acceleration — all without moving data out of the database. PostgresML-compatible, the extension integrates XGBoost, LightGBM, and Scikit-Learn.

Prerequisites

  • An AnalyticDB PostgreSQL 7.0 instance with kernel version V7.1.1.0 or later.

    Note

    You can view the minor version on the Basic Information page of an instance in the AnalyticDB for PostgreSQL console. If your instance does not meet the required versions, update the minor version of the instance.

  • Your instance uses elastic storage mode.

  • The pgml extension is installed on the instance.

    Note
    • The pgml extension cannot be installed from the console. To install or uninstall it, submit a ticket.

    • The pgml extension is not supported on AnalyticDB for PostgreSQL 7.0 Basic Edition.

Metadata tables

Installing the pgml extension on an AnalyticDB PostgreSQL 7.0 instance creates a pgml schema with the following metadata tables.

Metadata table name

Description

projects

Training task project metadata.

models

Model training parameters and metrics.

files

Stored model files.

snapshots

Training dataset snapshots.

logs

Training logs.

deployments

Model deployment records.

Training tasks populate these metadata tables.

Note

The pgml metadata tables use custom types (such as task, runtime, and sampling) defined in Machine learning.

projects

Stores project ID, name, task type, and timestamps for training tasks. Schema and indexes:

                                         Table "pgml.projects"
   Column   |            Type             | Collation | Nullable |               Default                
------------+-----------------------------+-----------+----------+--------------------------------------
 id         | bigint                      |           | not null | nextval('projects_id_seq'::regclass)
 name       | text                        |           | not null | 
 task       | task                        |           | not null | 
 created_at | timestamp without time zone |           | not null | clock_timestamp()
 updated_at | timestamp without time zone |           | not null | clock_timestamp()
Indexes:
    "projects_pkey" PRIMARY KEY, btree (id)
    "projects_name_idx" btree (name)
Triggers:
    projects_auto_updated_at BEFORE UPDATE ON projects FOR EACH ROW EXECUTE FUNCTION set_updated_at()
    trigger_before_insert_pgml_projects BEFORE INSERT ON projects FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_projects()
Distributed Replicated

models

Stores model training parameters and metadata, including project ID and snapshot ID. Schema and indexes:

                                           Table "pgml.models"
    Column     |            Type             | Collation | Nullable |              Default               
---------------+-----------------------------+-----------+----------+------------------------------------
 id            | bigint                      |           | not null | nextval('models_id_seq'::regclass)
 project_id    | bigint                      |           | not null | 
 snapshot_id   | bigint                      |           |          | 
 num_features  | integer                     |           | not null | 
 algorithm     | text                        |           | not null | 
 runtime       | runtime                     |           |          | 'python'::runtime
 hyperparams   | jsonb                       |           | not null | 
 status        | text                        |           | not null | 
 metrics       | jsonb                       |           |          | 
 search        | text                        |           |          | 
 search_params | jsonb                       |           | not null | 
 search_args   | jsonb                       |           | not null | 
 created_at    | timestamp without time zone |           | not null | clock_timestamp()
 updated_at    | timestamp without time zone |           | not null | clock_timestamp()
Indexes:
    "models_pkey" PRIMARY KEY, btree (id)
    "models_project_id_idx" btree (project_id)
    "models_snapshot_id_idx" btree (snapshot_id)
Triggers:
    models_auto_updated_at BEFORE UPDATE ON models FOR EACH ROW EXECUTE FUNCTION set_updated_at()
    trigger_before_insert_pgml_models BEFORE INSERT ON models FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_models_fk()
Distributed Replicated

files

Stores trained model files in binary format (data column), split into 100 MB chunks. Schema and indexes:

                                         Table "pgml.files"
   Column   |            Type             | Collation | Nullable |              Default              
------------+-----------------------------+-----------+----------+-----------------------------------
 id         | bigint                      |           | not null | nextval('files_id_seq'::regclass)
 model_id   | bigint                      |           | not null | 
 path       | text                        |           | not null | 
 part       | integer                     |           | not null | 
 created_at | timestamp without time zone |           | not null | clock_timestamp()
 updated_at | timestamp without time zone |           | not null | clock_timestamp()
 data       | bytea                       |           | not null | 
Indexes:
    "files_pkey" PRIMARY KEY, btree (id)
    "files_model_id_path_part_idx" btree (model_id, path, part)
Triggers:
    files_auto_updated_at BEFORE UPDATE ON files FOR EACH ROW EXECUTE FUNCTION set_updated_at()
    trigger_before_insert_pgml_files BEFORE INSERT ON files FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_files()
Distributed Replicated

snapshots

Stores training dataset snapshots, including the source table name and test/train split configuration. Schema and indexes:

                                           Table "pgml.snapshots"
    Column     |            Type             | Collation | Nullable |                Default                
---------------+-----------------------------+-----------+----------+---------------------------------------
 id            | bigint                      |           | not null | nextval('snapshots_id_seq'::regclass)
 relation_name | text                        |           | not null | 
 y_column_name | text[]                      |           |          | 
 test_size     | real                        |           | not null | 
 test_sampling | sampling                    |           | not null | 
 status        | text                        |           | not null | 
 columns       | jsonb                       |           |          | 
 analysis      | jsonb                       |           |          | 
 created_at    | timestamp without time zone |           | not null | clock_timestamp()
 updated_at    | timestamp without time zone |           | not null | clock_timestamp()
 materialized  | boolean                     |           |          | false
Indexes:
    "snapshots_pkey" PRIMARY KEY, btree (id)
Triggers:
    snapshots_auto_updated_at BEFORE UPDATE ON snapshots FOR EACH ROW EXECUTE FUNCTION set_updated_at()
Distributed Replicated

logs

Stores training logs, with multiple entries per task ordered by created_at ascending. Schema and indexes:

                                         Table "pgml.logs"
   Column   |            Type             | Collation | Nullable |             Default              
------------+-----------------------------+-----------+----------+----------------------------------
 id         | integer                     |           | not null | nextval('logs_id_seq'::regclass)
 model_id   | bigint                      |           |          | 
 project_id | bigint                      |           |          | 
 created_at | timestamp without time zone |           |          | CURRENT_TIMESTAMP
 logs       | jsonb                       |           |          | 
Indexes:
    "logs_pkey" PRIMARY KEY, btree (id)
Distributed Replicated

deployments

Stores model deployment records, including project ID, deployment ID, and model ID. Schema and indexes:

                                         Table "pgml.deployments"
   Column   |            Type             | Collation | Nullable |                 Default                 
------------+-----------------------------+-----------+----------+-----------------------------------------
 id         | bigint                      |           | not null | nextval('deployments_id_seq'::regclass)
 project_id | bigint                      |           | not null | 
 model_id   | bigint                      |           | not null | 
 strategy   | strategy                    |           | not null | 
 created_at | timestamp without time zone |           | not null | clock_timestamp()
Indexes:
    "deployments_pkey" PRIMARY KEY, btree (id)
    "deployments_model_id_created_at_idx" btree (model_id)
    "deployments_project_id_created_at_idx" btree (project_id)
Triggers:
    deployments_auto_updated_at BEFORE UPDATE ON deployments FOR EACH ROW EXECUTE FUNCTION set_updated_at()
    trigger_before_insert_pgml_deployments BEFORE INSERT ON deployments FOR EACH ROW EXECUTE FUNCTION trigger_check_pgml_deployments_fk()
Distributed Replicated