CreateCollection

Creates a vector collection (a vector dataset).

RAM authorization

The table below describes the authorization required to call this API. You can define it in a Resource Access Management (RAM) policy. The table's columns are detailed below:

  • Action: The action that you can specify in the Action element of a RAM policy statement to grant permission to perform the operation.

  • API: The API that you can call to perform the action.

  • Access level: The predefined level of access granted for each API. Valid values: create, list, get, update, and delete.

  • Resource type: The type of the resource that supports authorization to perform the action. It indicates if the action supports resource-level permission. The specified resource must be compatible with the action. Otherwise, the policy will be ineffective.

    • For APIs with resource-level permissions, required resource types are marked with an asterisk (*). Specify the corresponding Alibaba Cloud Resource Name (ARN) in the Resource element of the policy.

    • For APIs without resource-level permissions, it is shown as All Resources. Use an asterisk (*) in the Resource element of the policy.

  • Condition key: The condition keys defined by the service. The key allows for granular control, applying to either actions alone or actions associated with specific resources. In addition to service-specific condition keys, Alibaba Cloud provides a set of common condition keys applicable across all RAM-supported services.

  • Dependent action: The dependent actions required to run the action. To complete the action, the RAM user or the RAM role must have the permissions to perform all dependent actions.

Action

Access level

Resource type

Condition key

Dependent action

gpdb:CreateCollection

create

*Collection

acs:gpdb:{#regionId}:{#accountId}:collection/{#DBInstanceId}

None

None
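The authorization row above can be turned into a policy document. The following is a minimal sketch, not an official template: it assumes the general RAM policy layout (a Version field and a Statement list with Effect, Action, and Resource), and the function name and the account ID are hypothetical placeholders.

```python
import json


def build_create_collection_policy(region_id: str, account_id: str, db_instance_id: str) -> dict:
    """Build a policy dict that allows gpdb:CreateCollection on one instance.

    The resource string follows the pattern listed in the table:
    acs:gpdb:{#regionId}:{#accountId}:collection/{#DBInstanceId}
    """
    resource = f"acs:gpdb:{region_id}:{account_id}:collection/{db_instance_id}"
    return {
        "Version": "1",  # assumed RAM policy version field
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["gpdb:CreateCollection"],
                "Resource": [resource],
            }
        ],
    }


# Placeholder identifiers for illustration only.
policy = build_create_collection_policy("cn-hangzhou", "1234567890123456", "gp-bp152460513z****")
print(json.dumps(policy, indent=2))
```

To grant the permission on all resources instead, the Resource list would contain a single asterisk (*).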

Request parameters

Parameter

Type

Required

Description

Example

DBInstanceId

string

No

The instance ID.

Note

You can call the DescribeDBInstances operation to query the IDs of all AnalyticDB for PostgreSQL instances in a specific region.

gp-bp152460513z****

ManagerAccount

string

Yes

The name of the management account that has the rds_superuser privilege.

Note

You can call the CreateAccount operation to create an account.

testaccount

ManagerAccountPassword

string

Yes

The password of the management account.

testpassword

Namespace

string

No

The namespace.

Note

You can call the CreateNamespace operation to create a namespace or the ListNamespaces operation to list existing namespaces.

mynamespace

Collection

string

Yes

The name of the collection to create.

Note

The name must comply with PostgreSQL object naming conventions.

document

Dimension

integer

No

The vector dimension.

Note

If you specify this parameter, a vector index is created. In subsequent calls to the UpsertCollectionData operation, the length of Rows.Vector must match this dimension. If you do not specify this parameter, you must call the CreateVectorIndex operation to create an index later.

1024

FullTextRetrievalFields

string

No

The fields to use for full-text search. Use commas (,) to separate multiple field names. These fields must be keys defined in the Metadata parameter.

title,content

Metadata

string

Yes

A JSON string that defines the metadata schema as a map. The keys are field names, and the values are their corresponding data types.

Note

Supported data types

  • For a list of supported data types, see Data types.

  • The money data type is not supported.

Warning

The field names id, vector, to_tsvector, and source are reserved and cannot be used.

{"title":"text","content":"text","response":"int"}

Parser

string

No

The parser for full-text search. The default is zh_cn.

zh_cn

RegionId

string

Yes

The ID of the region where the instance is located.

cn-hangzhou

Metrics

string

No

The distance metric used to build the vector index. Valid values:

  • l2: Euclidean distance.

  • ip: dot product.

  • cosine: cosine similarity.

cosine

HnswM

integer

No

The maximum number of neighbors for the HNSW algorithm. You do not typically need to set this parameter, as the system automatically determines a value based on the vector dimension.

Note

Value range:

  • For AnalyticDB for PostgreSQL V6.0 instances: 1 to 1000.

  • For AnalyticDB for PostgreSQL V7.0 instances: 2 to 100. The default value is 16.

Note

We recommend that you set this parameter based on the vector dimension:

  • 16 for dimensions less than or equal to 384.

  • 32 for dimensions greater than 384 and less than or equal to 768.

  • 64 for dimensions greater than 768 and less than or equal to 1024.

  • 128 for dimensions greater than 1024.

64

HnswEfConstruction

string

No

The size of the candidate set for HNSW index construction. The value must be greater than or equal to 2 * HnswM.

Note

Value range:

  • For AnalyticDB for PostgreSQL V6.0 instances: 40 to 4000.

  • For AnalyticDB for PostgreSQL V7.0 instances: 4 to 1000. The default value is 64.

128

PqEnable

integer

No

Specifies whether to enable Product Quantization (PQ) for index acceleration. This is recommended for datasets with more than 500,000 entries. Valid values:

  • 0: Disabled.

  • 1: (Default) Enabled.

1

ExternalStorage

integer

No

Specifies whether to use mmap to build the HNSW index. The default value is 0. We recommend setting this to 1 if your data does not require deletion and you need high-performance data ingestion.

Valid values:

  • 0: (Default) Builds the index by using segmented page storage. This mode can use the shared_buffer in PostgreSQL for caching and supports DELETE and UPDATE operations.

  • 1: Builds the index by using mmap. This mode does not support DELETE or UPDATE operations.

Important

The ExternalStorage parameter is available only for AnalyticDB for PostgreSQL V6.0 instances and is not supported on V7.0 instances.

0

WorkspaceId

string

No

The ID of the workspace, which contains multiple database instances. You must specify either WorkspaceId or DBInstanceId. If both are specified, WorkspaceId takes precedence.

gp-ws-*****

MetadataIndices

string

No

The scalar index fields. Separate multiple fields with commas (,). The fields must be keys that are defined in Metadata.

title

SupportSparse

boolean

No

Specifies whether to enable support for sparse vectors. The default value is false.

true

SparseVectorIndexConfig

object

No

The configuration for the sparse vector index. If specified, a sparse vector index is created. The HnswM, HnswEfConstruction, and Algorithm entries that follow are fields of this object.

HnswM

integer

No

The maximum number of neighbors for the HNSW algorithm. You do not typically need to set this parameter, as the system automatically determines a value based on the vector dimension.

Note

Value range:

  • For AnalyticDB for PostgreSQL V6.0 instances: 1 to 1000.

  • For AnalyticDB for PostgreSQL V7.0 instances: 2 to 100. The default value is 16.

Note

We recommend that you set this parameter based on the vector dimension:

  • 16 for dimensions less than or equal to 384.

  • 32 for dimensions greater than 384 and less than or equal to 768.

  • 64 for dimensions greater than 768 and less than or equal to 1024.

  • 128 for dimensions greater than 1024.

64

HnswEfConstruction

integer

No

The size of the candidate set for HNSW index construction. The value must be an integer from 4 to 1,000. The default is 64.

Note

This parameter is required only for AnalyticDB for PostgreSQL V7.0 instances, and its value must be greater than or equal to 2 * HnswM.

128

Algorithm

string

No

The vector index algorithm.

Valid values:

  • hnswflat: (Default) An HNSW index without quantization compression.

  • novam: A graph index without quantization compression. This algorithm is suitable for high-performance scenarios, such as real-time recommendations.

hnswflat

VectorIndexConfig

object

No

The configuration for the dense vector index. The Nlist, RabitqBits, and Algorithm entries that follow are fields of this object.

Nlist

integer

No

The number of lists (partitions) for a novad index. The value must be an integer from 2 to 1,073,741,824. The default is 256.

256

RabitqBits

integer

No

The number of bits for rabitq compression. The value must be an integer from 1 to 8. The default is 3.

3

Algorithm

string

No

The vector index algorithm.

Valid values:

  • hnswflat: (Default) An HNSW index without quantization compression.

  • novam: A graph index without quantization compression. This algorithm is suitable for high-performance scenarios, such as real-time recommendations.

  • novad: A partitioned index with rabitq quantization. This algorithm is suitable for large-scale, low-cost retrieval scenarios.

hnswflat
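The Metadata, FullTextRetrievalFields, and MetadataIndices rules in the table above can be checked on the client before a request is sent. The following sketch is one possible way to do that. The helper names are hypothetical, the checks cover only the rules stated on this page (reserved field names, the unsupported money type, and field lists that must be keys of Metadata), and name matching is assumed to be case-sensitive.

```python
import json

# Field names listed as reserved in the Metadata warning.
RESERVED_FIELDS = {"id", "vector", "to_tsvector", "source"}
# Data types listed as unsupported in the Metadata note.
UNSUPPORTED_TYPES = {"money"}


def build_metadata(schema: dict) -> str:
    """Validate a field-name -> data-type map and serialize it as a JSON string."""
    for name, data_type in schema.items():
        if name in RESERVED_FIELDS:
            raise ValueError(f"field name {name!r} is reserved")
        if data_type.lower() in UNSUPPORTED_TYPES:
            raise ValueError(f"data type {data_type!r} is not supported")
    return json.dumps(schema, separators=(",", ":"))


def build_field_list(fields: list, schema: dict) -> str:
    """Join field names with commas after checking that each one is a Metadata key."""
    missing = [field for field in fields if field not in schema]
    if missing:
        raise ValueError(f"fields not defined in Metadata: {missing}")
    return ",".join(fields)


schema = {"title": "text", "content": "text", "response": "int"}
print(build_metadata(schema))                          # {"title":"text","content":"text","response":"int"}
print(build_field_list(["title", "content"], schema))  # title,content
print(build_field_list(["title"], schema))             # title
```

The three printed strings match the example values shown for Metadata, FullTextRetrievalFields, and MetadataIndices.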

Note

After creating a collection, use DescribeCollection to view it.
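As an illustration of how the request parameters fit together, the sketch below picks HnswM from the dimension-based recommendation, checks that HnswEfConstruction is at least 2 * HnswM, and assembles the parameters into a plain dict. The function names are hypothetical, the version-specific value ranges (V6.0 and V7.0) are not enforced, and how the dict is sent (an SDK or a signed HTTP request) is outside the scope of this example.

```python
VALID_METRICS = ("l2", "ip", "cosine")


def recommended_hnsw_m(dimension: int) -> int:
    """Return the HnswM value recommended for a given vector dimension."""
    if dimension <= 384:
        return 16
    if dimension <= 768:
        return 32
    if dimension <= 1024:
        return 64
    return 128


def build_create_collection_params(region_id, db_instance_id, manager_account,
                                   manager_account_password, collection, metadata,
                                   dimension, hnsw_ef_construction, metrics="cosine"):
    """Assemble CreateCollection request parameters as a dict.

    Only the cross-parameter rules stated on this page are checked;
    version-specific ranges are left to the service.
    """
    if metrics not in VALID_METRICS:
        raise ValueError(f"Metrics must be one of {VALID_METRICS}")
    hnsw_m = recommended_hnsw_m(dimension)
    if hnsw_ef_construction < 2 * hnsw_m:
        raise ValueError("HnswEfConstruction must be >= 2 * HnswM")
    return {
        "RegionId": region_id,
        "DBInstanceId": db_instance_id,
        "ManagerAccount": manager_account,
        "ManagerAccountPassword": manager_account_password,
        "Collection": collection,
        "Metadata": metadata,
        "Dimension": dimension,
        "Metrics": metrics,
        "HnswM": hnsw_m,
        # The parameter table lists this top-level parameter as a string.
        "HnswEfConstruction": str(hnsw_ef_construction),
    }


params = build_create_collection_params(
    region_id="cn-hangzhou",
    db_instance_id="gp-bp152460513z****",
    manager_account="testaccount",
    manager_account_password="testpassword",
    collection="document",
    metadata='{"title":"text","content":"text","response":"int"}',
    dimension=1024,
    hnsw_ef_construction=128,
)
print(params["HnswM"], params["HnswEfConstruction"])  # 64 128
```

With a dimension of 1024 the helper selects HnswM 64, so any HnswEfConstruction of 128 or more passes the check.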

Response elements

Element

Type

Description

Example

object

RequestId

string

The request ID.

ABB39CC3-4488-4857-905D-2E4A051D0521

Message

string

The response message.

create successfully

Status

string

The status of the operation. Valid values:

  • success: The operation succeeded.

  • fail: The operation failed.

success

Examples

Success response

JSON format

{
  "RequestId": "ABB39CC3-4488-4857-905D-2E4A051D0521",
  "Message": "create successfully",
  "Status": "success"
}
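A caller would typically inspect the Status element before it treats the collection as created. The following sketch parses a response body such as the one above. The exception class and function name are hypothetical, and the shape of a failure body (Status set to fail with a Message) is an assumption based on the valid values listed under Response elements.

```python
import json


class CreateCollectionError(RuntimeError):
    """Raised when the response Status is not "success"."""


def check_create_collection_response(body: str) -> str:
    """Parse a response body and return its RequestId if the call succeeded."""
    data = json.loads(body)
    if data.get("Status") != "success":
        raise CreateCollectionError(
            f"{data.get('Message')} (RequestId: {data.get('RequestId')})"
        )
    return data["RequestId"]


body = """
{
  "RequestId": "ABB39CC3-4488-4857-905D-2E4A051D0521",
  "Message": "create successfully",
  "Status": "success"
}
"""
print(check_create_collection_response(body))  # ABB39CC3-4488-4857-905D-2E4A051D0521
```

Including the RequestId in the error message makes a failed call easier to trace.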

Error codes

See Error Codes for a complete list.

Release notes

See Release Notes for a complete list.