Create a collection
This topic describes how to create a collection using the Python SDK.
Prerequisites
A cluster is created: Create a cluster.
An API key is obtained: API-KEY management.
The latest version of the SDK is installed: Install the DashVector SDK.
API definition
Client.create(
    name: str,
    dimension: int,
    dtype: Union[Type[int], Type[float]] = float,
    fields_schema: Optional[Dict[str, Union[Type[str], Type[int], Type[float], Type[bool], Type[long], Type[List[long]], Type[List[str]], Type[List[int]], Type[List[float]]]]] = None,
    metric: str = 'cosine',
    extra_params: Dict[str, Any] = None,
    timeout: Optional[int] = None,
    vectors: Union[None, VectorParam, Dict[str, VectorParam]] = None,
    sparse_vectors: Union[None, Dict[str, VectorParam]] = None,
) -> DashVectorResponse
Examples
To ensure the code runs correctly, replace YOUR_API_KEY with your API key and YOUR_CLUSTER_ENDPOINT with your cluster endpoint.
Create a single-vector collection
from typing import List  # needed for the List[...] field types below

import dashvector

client = dashvector.Client(
    api_key='YOUR_API_KEY',
    endpoint='YOUR_CLUSTER_ENDPOINT'
)
# Create a collection named quickstart with 4 vector dimensions,
# a vector data type of float (default),
# and a dotproduct (inner product) distance measure.
# Pre-define eight fields: name, weight, age, id, tags, numbers, bank_cards, and grades, with the data types str, float, int, dashvector.long, List[str], List[int], List[dashvector.long], and List[float] respectively.
# Set timeout to -1 to enable asynchronous mode for the create operation.
ret = client.create(
    name='quickstart',
    dimension=4,
    metric='dotproduct',
    dtype=float,
    # For semantic clarity in type annotations, DashVector defines the long type using the typing module. This lets you annotate large integers.
    fields_schema={
        'name': str, 'weight': float, 'age': int, 'id': dashvector.long,
        'tags': List[str],
        'numbers': List[int],
        'bank_cards': List[dashvector.long],
        'grades': List[float]
    },
    timeout=-1
)
# Check if the collection was created successfully.
if ret:
    print('create collection success!')
# This is equivalent to the following code:
# from dashvector import DashVectorCode
# if ret.code == DashVectorCode.Success:
#     print('create collection success!')
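The example above hard-codes the API key and endpoint, which is convenient for a quick test. As a minimal sketch of an alternative, the values could be read from environment variables. The variable names DASHVECTOR_API_KEY and DASHVECTOR_ENDPOINT and the helper load_client_settings are assumptions made for illustration; the SDK does not define them.

```python
import os
from typing import Dict, Mapping, Optional


def load_client_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build keyword arguments for dashvector.Client from environment variables.

    The variable names are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    settings = {
        "api_key": env.get("DASHVECTOR_API_KEY", ""),
        "endpoint": env.get("DASHVECTOR_ENDPOINT", ""),
    }
    # Fail early with a clear message instead of sending an empty credential.
    missing = [key for key, value in settings.items() if not value]
    if missing:
        raise ValueError("missing client settings: " + ", ".join(missing))
    return settings


# Usage (requires the SDK and a reachable cluster):
# import dashvector
# client = dashvector.Client(**load_client_settings())
```

Passing a plain dictionary as env makes the helper easy to exercise without touching the real environment.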
Create a multi-vector collection
from dashvector import VectorParam

# Reuses the client object created in the previous example.
ret = client.create(
    'multi_vector_demo',
    vectors={
        "title": VectorParam(4),
        "content": VectorParam(6, metric="euclidean"),
    },
    sparse_vectors={
        # Sparse vector indexes currently support only the dotproduct measure. You do not need to set dimension or dtype because they use default values.
        "abstract": VectorParam(metric="dotproduct"),
        "keywords": VectorParam(metric="dotproduct"),
    },
    fields_schema={
        'author': str,
    }
)
assert ret
A multi-vector collection cannot contain only one dense vector and one sparse vector. For this scenario, create a single-vector collection and set the distance measure to dotproduct.
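The restriction in the note above can be checked locally before calling the service. The following is a minimal sketch, assuming a hypothetical helper named is_supported_multi_vector_layout; it checks only this one documented restriction and does not reproduce any validation that the SDK or service performs. The None values stand in for VectorParam objects so that the sketch runs without the SDK.

```python
from typing import Mapping, Optional


def is_supported_multi_vector_layout(
    vectors: Optional[Mapping[str, object]],
    sparse_vectors: Optional[Mapping[str, object]],
) -> bool:
    """Return False for exactly one dense vector combined with exactly one sparse vector."""
    dense_count = len(vectors or {})
    sparse_count = len(sparse_vectors or {})
    return not (dense_count == 1 and sparse_count == 1)


# Two dense and two sparse vectors: not the restricted layout.
two_by_two = is_supported_multi_vector_layout(
    {"dense_a": None, "dense_b": None},
    {"sparse_a": None, "sparse_b": None},
)

# One dense plus one sparse vector: create a single-vector collection instead.
one_by_one = is_supported_multi_vector_layout({"dense_a": None}, {"sparse_a": None})

print(two_by_two, one_by_one)
```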
Request parameters
Parameter | Type | Default value | Description |
name | str | - | The name of the collection to create. |
dimension | int | - | The vector dimensions. |
dtype (optional) | Union[Type[int], Type[float]] | float | The vector data type. |
fields_schema (optional) | Optional[Dict[str, Union[Type[str], Type[int], Type[float], Type[bool], Type[long], Type[List[long]], Type[List[str]], Type[List[int]], Type[List[float]]]]] | None | The field definitions. |
metric (optional) | str | cosine | The distance measure. Valid values: If this parameter is set to |
extra_params (optional) | Dict[str, Any] | None | Optional parameters: |
timeout (optional) | Optional[int] | None | Set to -1 to run the create operation in asynchronous mode. |
vectors (optional) | Union[None, VectorParam, Dict[str, VectorParam]] | None | Optional parameters: |
sparse_vectors (optional) | Union[None, VectorParam, Dict[str, VectorParam]] | None | Optional parameters: |
To learn about the benefits of pre-defining fields when you create a collection, see Schema Free.
For more information about quantization policies, see Dynamic vector quantization.
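The fields_schema row above lists a fixed set of accepted field types. As a minimal sketch, a schema could be checked against that set before it is sent. The helper invalid_fields is hypothetical, and long is defined here as a local stand-in for dashvector.long (assumed to be a typing-based alias of int) so that the sketch runs without the SDK.

```python
from typing import Any, Dict, List, NewType

# Local stand-in for dashvector.long (assumption); replace with the SDK type in real code.
long = NewType("long", int)

# The field types listed for fields_schema in the table above.
ALLOWED_FIELD_TYPES = (
    str, int, float, bool, long,
    List[long], List[str], List[int], List[float],
)


def invalid_fields(fields_schema: Dict[str, Any]) -> List[str]:
    """Return the names of fields whose declared type is not in the listed set."""
    return [
        name
        for name, declared in fields_schema.items()
        if declared not in ALLOWED_FIELD_TYPES
    ]


schema = {"name": str, "tags": List[str], "id": long, "meta": dict}
print(invalid_fields(schema))
```

Returning the offending names, rather than a single boolean, makes it easier to report every problem in one pass.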
Response parameters
The result is a DashVectorResponse object. This object contains information about the operation, as described in the following table.
Field | Type | Description | Example |
code | int | The status code. For more information, see Status codes. | 0 |
message | str | The returned message. | success |
request_id | str | The unique ID of the request. | 19215409-ea66-4db9-8764-26ce2eb5bb99 |
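The three fields in the table above can be combined into a single log line, which is useful when reporting a failed request. The following is a minimal sketch: FakeResponse is a hypothetical stand-in for DashVectorResponse so that the code runs without the SDK, and it treats code 0 as success, matching the example value in the table. The value -1 in the usage lines is a placeholder, not a documented status code.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in that exposes the three fields from the table above."""
    code: int
    message: str
    request_id: str


def summarize(response) -> str:
    """Format a response for logging; code 0 is treated as success."""
    status = "ok" if response.code == 0 else "failed"
    return (
        f"{status}: code={response.code} "
        f"message={response.message} request_id={response.request_id}"
    )


print(summarize(FakeResponse(0, "success", "19215409-ea66-4db9-8764-26ce2eb5bb99")))
print(summarize(FakeResponse(-1, "placeholder error", "example-request-id")))
```

Because summarize only reads the code, message, and request_id attributes, the object returned by client.create could be passed to it in the same way.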