This topic describes the required and optional parameters for running Proxima CE.
Required parameters
Parameter Name | Description |
doc_table | The input base table, which is a MaxCompute table. Prepare this table to use as the candidate set for retrieval. Important: Do not use a period ( |
doc_table_partition | The MaxCompute partition of the base table. |
query_table | The input query table, which is a MaxCompute table. Prepare this table to use as the retrieval set. Important: Do not use a period ( |
query_table_partition | The MaxCompute partition of the query table. |
output_table | The output table. You do not need to create this table. Specify a table name to store the retrieval results. |
output_table_partition | The MaxCompute partition of the output table. |
data_type | The data type of the input data table. The supported types are |
dimension | The dimension of the feature vector. If |
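All eight parameters in the table above must be supplied on every run. The sketch below collects them into a flat argument list and fails early if one is missing. It assumes the `-name value` flag style that this topic uses when it refers to `-distance_method`. The helper `build_required_args`, the placeholder table and partition names, and the `data_type` value are hypothetical. Check the supported types list for the real `data_type` value, and note that the command that consumes this list is not shown here.

```python
from typing import Dict, List

# Order in which the required parameters are emitted (taken from the table above).
REQUIRED_PARAMS = (
    "doc_table",
    "doc_table_partition",
    "query_table",
    "query_table_partition",
    "output_table",
    "output_table_partition",
    "data_type",
    "dimension",
)


def build_required_args(params: Dict[str, object]) -> List[str]:
    """Hypothetical helper: turn the required parameters into "-name value" pairs.

    Raises ValueError if any required parameter is missing.
    """
    missing = [name for name in REQUIRED_PARAMS if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    args: List[str] = []
    for name in REQUIRED_PARAMS:
        args.extend(["-" + name, str(params[name])])
    return args


# Placeholder values; table names, partitions, and data_type are assumptions.
args = build_required_args({
    "doc_table": "doc_table_demo",
    "doc_table_partition": "pt=20240101",
    "query_table": "query_table_demo",
    "query_table_partition": "pt=20240101",
    "output_table": "output_table_demo",
    "output_table_partition": "pt=20240101",
    "data_type": "float",  # assumption: replace with one of the supported types
    "dimension": 128,
})
print(" ".join(args))
```

Checking for missing parameters before submission surfaces configuration mistakes locally instead of in a failed MaxCompute job.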
Optional parameters
Parameter Name | Description | Default value |
h (--help) | Displays help information. | None |
topk | The number of similar results to retrieve. You can specify multiple values, such as | 200 |
pk_type | Specifies the data type of the | string |
vector_separator | The separator for the vector. You can specify a separator other than the tilde (~). Spaces are supported. To use a space, specify | ~ |
binary_to_int | Specifies whether to use INT32 to represent BINARY data. This parameter is valid only for data of the BINARY type. The | false |
job_mode | The supported modes are combinations of the following: | train:build:seek |
clean_build_volume | Specifies whether to delete the index. The build job writes the index to a MaxCompute volume, and the seek job then loads the index from that volume. By default, the index is deleted after the seek job finishes. Note: If this parameter is set to true, the index is also deleted when the task fails. | true |
algo_model | The index building method. Based on the Proxima 2.x kernel, the following six index building methods are supported: | hnsw |
builder_params | The parameters for index building. The default value is empty. These parameters must correspond to the index type specified by | None |
searcher_params | The parameters for index searching. The default value is empty. These parameters must correspond to the index type specified by | None |
converter | The name of the converter for index building. Index Converter is a Proxima 2.x module that transforms feature vectors. For example, it can perform dimensionality reduction, half-float conversion, or INT8 quantization on features. It can be used independently or as part of the retrieval flow. For more information, see Index Converter. | None |
converter_params | The parameters for the converter. Provide the parameters as a single-line JSON string. Do not escape the double quotation marks or include spaces. For example, to specify the parameters for | None |
distance_method | The formula for calculating the distance between features. The following methods are supported: | squared_euclidean |
measure_params | The parameters for the distance method specified by -distance_method. Provide the parameters as a single-line JSON string. Do not escape the double quotation marks or include spaces. For example, to specify the parameters for | None |
column_num | The number of columns for index building. The default value is 0. Both | 0 |
row_num | The number of rows for retrieval queries. The default value is 0. Both | 0 |
category_threshold | In multi-category retrieval scenarios, this parameter specifies the threshold for large-category retrieval. When the number of documents in a category exceeds this threshold, the category is processed using large-category retrieval. Otherwise, it is processed using small-category retrieval. Small-category retrieval uses the linear retrieval method by default, and data from multiple small categories is merged for retrieval. | 1000000 |
category_col_num | When you query by category, this parameter specifies the number of columns for building indexes for small categories (fewer than 1 million documents). For more information, see the description of the | 0 |
category_row_num | When you query by category, this parameter specifies the number of rows for querying indexes for small categories (fewer than 1 million documents). For more information, see the description of the | 0 |
category_thread_num | When you query by category, this parameter sets the concurrency (thread pool size) for tasks that process large categories (more than 1 million documents). | 10 |
query_multi_label | Specifies whether a single query can have multiple categories. If this parameter is set to | false |
threshold_score | The score threshold for filtering retrieval results. For distance methods other than | None |
tunnel_endpoint | The tunnel endpoint for MaxCompute. The default value is empty. Specify this parameter to prevent download session creation failures when data tables are accessed across networks. For more information, see MaxCompute Tunnel Endpoint issues. | None |
memory_load | Specifies the index loading method for the seek phase. The default value is true, which indicates that the index is loaded entirely into memory. If cluster memory resources are limited, you can set this to false as needed. | true |
sharding_mode | The index sharding method. The | hash |
kmeans_resource_name | This parameter is used for the | kmeans_resource_name |
kmeans_sample_ratio | This parameter is used for the | 0.05 |
kmeans_seek_ratio | This parameter is used for the | 0.1 |
kmeans_iter_num | This parameter is used for the | 30 |
kmeans_cluster_num | This parameter is used for the | 1000 |
kmeans_init_center_method | This parameter is used for the | "" |
kmeans_worker_num | This parameter is used for the | 0 |
mapper_split_size | Exposes the | 256 |
odps_task_priority | The priority of the Proxima CE task. The priority is applied to all internal MaxCompute tasks that Proxima CE runs, such as SQL, MapReduce (MR), and Graph tasks. The value can be an integer from 0 to 9. A smaller value indicates a higher priority. The default value is -1, which means that the baseline priority of MaxCompute is used. | -1 |
oss_access_id | The AccessKey ID of an Alibaba Cloud account or a Resource Access Management (RAM) user. You can obtain the AccessKey ID on the AccessKey Management page. | None |
oss_access_key | The AccessKey secret that corresponds to the AccessKey ID. You can obtain the AccessKey secret on the AccessKey Management page. | None |
oss_endpoint | The endpoint of the MaxCompute service. You need to configure the endpoint based on the region and network connectivity type you selected when you created the MaxCompute project. For the endpoints of different regions and networks, see Endpoints. | None |
oss_bucket | The name of the Object Storage Service (OSS) bucket. For information about how to view bucket names, see List buckets. | None |
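The `vector_separator` parameter sets the character that splits a vector's components, with the tilde (~) as the default. The sketch below shows how a vector string such as `0.1~0.2~0.3` maps to numeric components and how the length can be checked against `dimension`. It assumes each vector is stored as a single separator-joined string. The function `parse_vector` is hypothetical and is not part of Proxima CE.

```python
from typing import List, Optional


def parse_vector(text: str, separator: str = "~", dimension: Optional[int] = None) -> List[float]:
    """Split a separator-joined vector string into floats.

    Assumption: one vector per string, components joined by `separator`.
    If `dimension` is given, the number of components must match it.
    """
    values = [float(part) for part in text.strip().split(separator)]
    if dimension is not None and len(values) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(values)}")
    return values


print(parse_vector("0.1~0.2~0.3", dimension=3))
print(parse_vector("1 2", separator=" "))
```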
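The default `distance_method` is `squared_euclidean`, and `topk` controls how many results are kept per query. The sketch below is a brute-force (linear) reference that scores every document with the squared Euclidean distance, the sum of squared per-dimension differences with no square root, and keeps the `topk` smallest distances. It is useful for spot-checking results on small samples. It is not the HNSW index that Proxima CE builds by default, and the function names are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared per-dimension differences (no square root)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def linear_topk(query: Sequence[float],
                docs: Dict[str, Sequence[float]],
                topk: int) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the topk smallest distances."""
    scored = ((pk, squared_euclidean(query, vec)) for pk, vec in docs.items())
    return heapq.nsmallest(topk, scored, key=lambda item: item[1])


docs = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [3.0, 4.0]}
print(linear_topk([0.0, 0.0], docs, topk=2))
```

Smaller squared Euclidean distances mean more similar vectors, so the nearest results are the ones with the lowest scores.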
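Both `converter_params` and `measure_params` expect a single-line JSON string with unescaped double quotation marks and no spaces. One way to produce such a string is to serialize a dictionary with compact separators, as in the sketch below. The helper `compact_params` and the keys in the usage line are hypothetical placeholders. Use the parameter names that the chosen converter or distance method actually accepts.

```python
import json
from typing import Any, Dict


def compact_params(params: Dict[str, Any]) -> str:
    """Serialize parameters as single-line JSON with no spaces.

    separators=(",", ":") removes the spaces that json.dumps inserts by default.
    Values that themselves contain spaces are rejected.
    """
    text = json.dumps(params, separators=(",", ":"))
    if " " in text:
        raise ValueError("parameter values must not contain spaces")
    return text


# Hypothetical keys, for illustration only.
print(compact_params({"example_key": 1, "example_flag": True}))
```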
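The `category_threshold` rule splits categories by document count. A category with more documents than the threshold goes to large-category retrieval. Every other category goes to small-category retrieval, where data from multiple small categories is merged. The sketch below applies that rule to a map of per-category document counts. The function name and input shape are hypothetical. It only illustrates the routing decision, not how Proxima CE executes either path.

```python
from typing import Dict, List, Tuple


def split_categories(doc_counts: Dict[str, int],
                     category_threshold: int = 1_000_000) -> Tuple[List[str], List[str]]:
    """Return (large, small) category lists.

    A category is "large" only when its count exceeds the threshold;
    a count equal to the threshold stays in the small group.
    """
    large = sorted(c for c, n in doc_counts.items() if n > category_threshold)
    small = sorted(c for c, n in doc_counts.items() if n <= category_threshold)
    return large, small


print(split_categories({"a": 2_000_000, "b": 1_000_000, "c": 5}))
```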
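The default `job_mode` value, `train:build:seek`, is a colon-separated combination of phases. The sketch below splits such a value into its phases and rejects names outside the three that appear in the default. `KNOWN_PHASES` holds only those three names, and the full list of supported combinations is not reproduced in this topic. Treat the validation as an assumption rather than the rule that Proxima CE enforces. The function `parse_job_mode` is hypothetical.

```python
from typing import List

# Assumption: only the phase names visible in the default value.
KNOWN_PHASES = ("train", "build", "seek")


def parse_job_mode(job_mode: str = "train:build:seek") -> List[str]:
    """Split a colon-separated job_mode into phases and reject unknown or repeated ones."""
    phases = job_mode.split(":")
    unknown = [p for p in phases if p not in KNOWN_PHASES]
    if unknown:
        raise ValueError("unknown phases: " + ", ".join(unknown))
    if len(set(phases)) != len(phases):
        raise ValueError("each phase may appear only once")
    return phases


print(parse_job_mode())
```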