GetIndexMonitor

You can call the GetIndexMonitor operation to query the monitoring data of a specified knowledge base within a specified time range. This data is useful for application performance analysis, capacity planning, and cost management. The monitoring data covers two dimensions: storage and retrieval. Storage monitoring returns the index storage limit and the current usage of the knowledge base. Retrieval monitoring returns performance metrics for the query period, such as peak queries per second (QPS), total requests, and average QPS. Retrieval metrics are provided as totals for the entire period and are also broken down by time window. Requests are categorized as successful, failed, or rate-limited.

Operation description

  • Before you call this operation, a RAM user must obtain the required API permissions for Alibaba Cloud Model Studio (which requires the AliyunBailianDataFullAccess permission) and join a workspace. Alibaba Cloud accounts can call this operation directly without authorization.

  • You can call this operation using the latest version of the Alibaba Cloud Model Studio software development kit (SDK).

  • Before you call this operation, make sure that the specified knowledge base has been created and has not been deleted. This means that the knowledge base ID (IndexId) must be valid.

  • This operation is idempotent.

  • The maximum query time range (EndTimestamp - StartTimestamp) is 30 days.

  • The granularity of the time window in the returned data is dynamically adjusted based on the query time range.
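Because a single call covers at most 30 days, a longer reporting period has to be queried in several calls. The following is a minimal sketch of one way to do that. The helper name split_time_range is hypothetical and is not part of the SDK, and the sketch assumes that adjacent query ranges may share a boundary timestamp, which this topic does not specify.

```python
from typing import Iterator, Tuple

# 30-day limit on (EndTimestamp - StartTimestamp), as stated in this topic.
MAX_RANGE_SECONDS = 30 * 24 * 60 * 60


def split_time_range(
    start_ts: int, end_ts: int, max_span: int = MAX_RANGE_SECONDS
) -> Iterator[Tuple[int, int]]:
    """Yield (StartTimestamp, EndTimestamp) pairs, each spanning at most max_span seconds.

    Hypothetical helper: timestamps are UNIX timestamps in seconds.
    """
    if end_ts <= start_ts:
        raise ValueError("end_ts must be later than start_ts")
    cursor = start_ts
    while cursor < end_ts:
        window_end = min(cursor + max_span, end_ts)
        yield cursor, window_end
        cursor = window_end
```

Each yielded pair can then be passed as StartTimestamp and EndTimestamp in a separate GetIndexMonitor call.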

Try it now

You can try this API in OpenAPI Explorer without signing requests manually. A successful call automatically generates SDK sample code that matches your parameters. You can download the code, which includes built-in credential security, for local use.

RAM authorization

The table below describes the authorization required to call this API. You can define it in a Resource Access Management (RAM) policy. The table's columns are detailed below:

  • Action: The actions can be used in the Action element of RAM permission policy statements to grant permissions to perform the operation.

  • API: The API that you can call to perform the action.

  • Access level: The predefined level of access granted for each API. Valid values: create, list, get, update, and delete.

  • Resource type: The type of the resource that supports authorization to perform the action. It indicates if the action supports resource-level permission. The specified resource must be compatible with the action. Otherwise, the policy will be ineffective.

    • For APIs with resource-level permissions, required resource types are marked with an asterisk (*). Specify the corresponding Alibaba Cloud Resource Name (ARN) in the Resource element of the policy.

    • For APIs without resource-level permissions, it is shown as All Resources. Use an asterisk (*) in the Resource element of the policy.

  • Condition key: The condition keys defined by the service. The key allows for granular control, applying to either actions alone or actions associated with specific resources. In addition to service-specific condition keys, Alibaba Cloud provides a set of common condition keys applicable across all RAM-supported services.

  • Dependent action: The dependent actions required to run the action. To complete the action, the RAM user or the RAM role must have the permissions to perform all dependent actions.

| Action | Access level | Resource type | Condition key | Dependent action |
| --- | --- | --- | --- | --- |
| sfm:GetIndexMonitor | get | *All Resources (*) | None | None |
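As an illustration, the authorization entry above could be expressed as a custom policy like the one built below. This is a minimal sketch, not an official policy template: it assumes the standard RAM policy layout (Version "1" with a Statement list) and uses an asterisk in the Resource element because this operation does not support resource-level permissions. The function name is hypothetical.

```python
import json


def build_get_index_monitor_policy() -> dict:
    """Return a RAM policy document that allows only sfm:GetIndexMonitor."""
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["sfm:GetIndexMonitor"],
                # No resource-level permissions for this action, so use "*".
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so that it can be pasted into a policy editor.
    print(json.dumps(build_get_index_monitor_policy(), indent=2))
```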

Request syntax

GET /{WorkspaceId}/rag/index/monitor HTTP/1.1

Path Parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| WorkspaceId | string | Yes | The ID of the workspace where the knowledge base is located. | llm-3shx2gu255oqxxxx |

Request parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| IndexId | string | Yes | The unique ID of the target knowledge base. | kb-123456xxxx |
| StartTimestamp | integer | Yes | The start of the time range to query. This is a UNIX timestamp in seconds. | 1767604500 |
| EndTimestamp | integer | Yes | The end of the time range to query. The end time can be a maximum of 30 days after the start time. This is a UNIX timestamp in seconds. | 1767604500 |
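To show how the path and request parameters fit together, the following minimal sketch assembles the request target and checks the 30-day limit before anything is sent. It assumes that the request parameters travel in the query string of the GET request. The function name build_request_target is hypothetical, and the endpoint, signing, and transport are left to the SDK.

```python
from urllib.parse import quote, urlencode

# 30-day limit on (EndTimestamp - StartTimestamp), as stated in this topic.
MAX_RANGE_SECONDS = 30 * 24 * 60 * 60


def build_request_target(
    workspace_id: str, index_id: str, start_ts: int, end_ts: int
) -> str:
    """Return the path and query string for a GetIndexMonitor request.

    Hypothetical helper: it validates the time range locally and does not
    send the request.
    """
    if end_ts < start_ts:
        raise ValueError("EndTimestamp must not be earlier than StartTimestamp")
    if end_ts - start_ts > MAX_RANGE_SECONDS:
        raise ValueError("The query time range must not exceed 30 days")
    path = f"/{quote(workspace_id, safe='')}/rag/index/monitor"
    query = urlencode(
        {
            "IndexId": index_id,
            "StartTimestamp": start_ts,
            "EndTimestamp": end_ts,
        }
    )
    return f"{path}?{query}"
```

Validating the range on the client side avoids a round trip for a request that is expected to be rejected.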

Response elements

| Element | Type | Description | Example |
| --- | --- | --- | --- |
| | object | Schema of Response | |
| RequestId | string | The request ID. | 778C0B3B-xxxx-5FC1-A947-36EDD13606AB |
| Code | string | The status code. | 200 |
| Data | any | The core data object of the response. Its fields and an example value are listed after this table. | |
| Message | string | The status message. | success |
| Success | boolean | Indicates whether the request was successful. | true |
| Status | integer | The status code returned by the operation. | SUCCESS |

Fields of Data:

pipelineCommercialType (String): The edition of the knowledge base.

  • standard: Standard Edition

  • enterprise: Ultimate Edition

storageMonitorData (Object): The storage monitoring data of the knowledge base.

  • indexStorageLimit (Number): The index storage limit of the knowledge base, in GB.

  • indexStorageUsage (Number): The current index storage usage of the knowledge base, in GB.

pipelineCommercialCu (Integer): The number of RCUs for the Ultimate Edition knowledge base. For example: 2.

qpsMonitorData (Object): The aggregated retrieval monitoring data for the knowledge base over the entire query period.

  • peakQps (Integer): The peak QPS over the entire time period.

  • totalRequests (Integer): The total number of requests over the entire time period.

  • avgQpsOfActiveSeconds (Number): The average QPS during active seconds over the entire time period. Active seconds are seconds in which at least one request was made.

  • monitorData (Array): An array of detailed monitoring data broken down by time window. Each object in the array represents the performance statistics for a single time window.

    Sub-properties

    • successData (Object): The statistics for successful requests within this window.

    • limitData (Object): The statistics for rate-limited requests within this window.

    • failData (Object): The statistics for failed requests within this window.

    • peakQpsInWindowRange (Integer): The total peak QPS within this window (successful + rate-limited + failed).

    • totalRequests (Integer): The total number of requests within this window (successful + rate-limited + failed).

    • windowRange (Integer): The start time of the time window (UNIX timestamp in seconds).

    • windowRangeEnd (Integer): The end time of the time window (UNIX timestamp in seconds).

    • avgQpsOfActiveSeconds (Number): The average QPS during active seconds within this window.

    The successData, limitData, and failData objects have the same internal structure, as described below:

    • peakQpsInWindowRange (Integer): The peak QPS for the corresponding status.

    • totalRequests (Integer): The total number of requests for the corresponding status.

    • avgQpsOfActiveSeconds (Number): The average QPS during active seconds for the corresponding status.

Example value of Data:

{ "code": "Success", "status_code": 200, "data": { "pipelineCommercialType": "standard", "storageMonitorData": Object{...}, "qpsMonitorData": Object{...} }, "success": true, "message": "success", "request_id": "65d34b79-b97e-478e-a0a3-xxx", "status": "SUCCESS" }
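The per-window statistics lend themselves to simple derived metrics, such as the share of requests that were rate-limited or failed in each window. The following minimal sketch assumes that qpsMonitorData has already been parsed into a Python dictionary that uses the field names described above. The function name summarize_windows and the output keys limitedShare and failedShare are hypothetical.

```python
from typing import Any, Dict, List


def summarize_windows(qps_monitor_data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one summary row per time window in qpsMonitorData.monitorData."""
    rows = []
    for window in qps_monitor_data.get("monitorData", []):
        total = window.get("totalRequests", 0)
        limited = window.get("limitData", {}).get("totalRequests", 0)
        failed = window.get("failData", {}).get("totalRequests", 0)
        rows.append(
            {
                "windowRange": window.get("windowRange"),
                "windowRangeEnd": window.get("windowRangeEnd"),
                "totalRequests": total,
                # Guard against empty windows to avoid division by zero.
                "limitedShare": limited / total if total else 0.0,
                "failedShare": failed / total if total else 0.0,
            }
        )
    return rows
```

A consistently high rate-limited share across windows may indicate that the knowledge base needs more capacity, which is the kind of capacity-planning signal this operation is intended to provide.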

Examples

Success response

JSON format

{
  "RequestId": "778C0B3B-xxxx-5FC1-A947-36EDD13606AB",
  "Code": "200",
  "Data": "{\n    \"code\": \"Success\",\n    \"status_code\": 200,\n    \"data\": {\n\"pipelineCommercialType\": \"standard\",       \"storageMonitorData\": Object{...},\n        \"qpsMonitorData\": Object{...}\n    },\n    \"success\": true,\n    \"message\": \"success\",\n    \"request_id\": \"65d34b79-b97e-478e-a0a3-xxx\",\n    \"status\": \"SUCCESS\"\n}",
  "Message": "success",
  "Success": true,
  "Status": 0
}
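In the sample above, Data is a JSON-encoded string that wraps the monitoring payload in an inner envelope. The following minimal sketch unwraps it and computes the storage usage ratio. It assumes that Data may arrive either as a string or as an already-parsed object, and that a real response contains valid JSON in place of the Object{...} placeholders. Both function names are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def extract_monitor_payload(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the object that holds storageMonitorData and qpsMonitorData."""
    data = response.get("Data")
    if isinstance(data, str):
        # The sample response carries Data as a JSON-encoded string.
        data = json.loads(data)
    if not isinstance(data, dict):
        raise ValueError("Data is missing or is not a JSON object")
    # The sample shows an inner envelope whose "data" key holds the payload.
    inner = data.get("data")
    return inner if isinstance(inner, dict) else data


def storage_usage_ratio(payload: Dict[str, Any]) -> Optional[float]:
    """Return indexStorageUsage / indexStorageLimit, or None if unavailable."""
    storage = payload.get("storageMonitorData") or {}
    limit = storage.get("indexStorageLimit")
    usage = storage.get("indexStorageUsage")
    if not limit or usage is None:
        return None
    return usage / limit
```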

Error codes

See Error Codes for a complete list.

Release notes

See Release Notes for a complete list.