KMS Agent is an HTTP proxy service that retrieves credential values from KMS and caches them in memory. Applications retrieve these credential values from KMS Agent through HTTP requests. Deploying KMS Agent simplifies identity authentication and cache management for applications, especially in large-scale scenarios. KMS Agent also reduces application refactoring costs and provides a unified integration standard. This topic provides an overview of KMS Agent.
KMS Agent credential retrieval flow
KMS Agent uses memory to cache credential values. It periodically refreshes the cached values based on the Time to Live (TTL) that you set. When an application requests a credential value from KMS Agent through an HTTP request, the agent verifies the request's validity using the Server-Side Request Forgery (SSRF) token file. If a valid credential value exists in the cache, the agent returns it directly. Otherwise, the agent forwards the request to the KMS service. After the KMS service verifies the agent's identity, it decrypts and returns the credential. KMS Agent then updates its cache and returns the credential value to the application in an HTTP message. This process is illustrated in the following figures:
Cache hit process.
Cache miss process.
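The two paths above can be sketched in a few lines of Python. This is an illustrative model only, not the agent's actual code: the class name `AgentFlowSketch` and the `fetch_from_kms` callable are hypothetical, and the periodic refresh is simplified to lazy expiry checked at request time.

```python
import hmac
import time
from typing import Callable, Dict, Optional, Tuple


class AgentFlowSketch:
    """Toy model of the cache-hit / cache-miss flow; not the real agent code."""

    def __init__(self, ssrf_token: str, fetch_from_kms: Callable[[str], str],
                 ttl_seconds: float = 300.0) -> None:
        self._ssrf_token = ssrf_token
        self._fetch_from_kms = fetch_from_kms  # hypothetical stand-in for the KMS call
        self._ttl_seconds = ttl_seconds
        self._cache: Dict[str, Tuple[str, float]] = {}  # name -> (value, expiry time)

    def get(self, name: str, request_token: str, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        # Step 1: reject requests whose token does not match the SSRF token.
        if not hmac.compare_digest(request_token.encode(), self._ssrf_token.encode()):
            raise PermissionError("invalid SSRF token")
        # Step 2: cache hit, return the unexpired value directly.
        entry = self._cache.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]
        # Step 3: cache miss, fetch from KMS, update the cache, then return.
        value = self._fetch_from_kms(name)
        self._cache[name] = (value, now + self._ttl_seconds)
        return value
```

With a TTL of 300 seconds, two requests for the same credential ten seconds apart trigger a single call to `fetch_from_kms`. A request after the TTL has elapsed triggers a second call.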
You can deploy KMS Agent together with your applications. It supports multiple environments, such as local physical servers, virtual machines (for example, ECS), and containers (for example, Kubernetes pods). For more information about the KMS Agent code, visit alibabacloud-kms-agent.
Scope
KMS Agent is applicable only to KMS 3.0 instances. If you are using an earlier version, you must first upgrade or purchase a 3.0 instance. For more information, see Purchase and enable a KMS instance.
Functional modules
The KMS Agent proxy consists of four modules: HTTP Server, Cache, KMS Configuration, and Log.
You can configure each functional module by setting the corresponding parameters in the configuration file. The following is a sample configuration file:
# All configuration items
[Server]
# Optional. The default value is 2025. The agent listens on 127.0.0.1:2025 by default.
HttpPort = 2025
# Optional. The default value is ["X-KMS-Token", "X-Vault-Token"].
# Requests to the agent must include an SSRF header. Otherwise, access is denied.
SSRFHeaders = ["X-KMS-Token"]
# Optional. The default value is ["KMS_TOKEN", "KMS_SESSION_TOKEN", "KMS_CONTAINER_AUTHORIZATION_TOKEN"]. The variable value can be a specific value or a file path, such as file:///var/run/awssmatoken.
# The agent retrieves the SSRF token from the environment variable and compares it with the token in the request header. Access is granted only if they match.
SSRFEnvVariables = ["KMS_TOKEN"]
# Optional. The default value is "/v1/".
# The URI prefix for path-based requests.
PathPrefix = "/v1/"
# Optional. The default value is 800.
# The maximum number of concurrent requests.
MaxConn = 800
# Optional. The default value is 0.
# 0: The credential is returned in the format of a KMS GetSecretValue API response. 1: The credential is returned in the format of an AWS Secrets Manager GetSecretValue API response. 2: The credential is returned in the HashiCorp KV structure.
ResponseType = 0
# Optional. The default value is true.
# If IgnoreTransientErrors is set to true, the agent returns the expired credential from the cache when the cache becomes invalid and a request to the remote KMS fails.
IgnoreTransientErrors = true
[Kms]
# Optional. The default value is cn-hangzhou.
# The region where KMS is located.
Region = "cn-hangzhou"
# Optional. The default value is kms.cn-hangzhou.aliyuncs.com.
# The endpoint can be a shared gateway endpoint or a dedicated gateway endpoint.
Endpoint = "kms.cn-hangzhou.aliyuncs.com"
[Cache]
# Optional. The default value is InMemory. Only memory cache is supported.
CacheType = "InMemory"
# Optional. The default cache size is 1000 credentials. If CacheSize is set to 0, caching is disabled, and each request accesses the remote KMS.
CacheSize = 1000
# Optional. The cache validity period. The default value is 300s.
TtlSeconds = 300
# Optional. The cache eviction policy. The default value is false.
# When the number of cached credentials reaches the CacheSize limit, a value of false indicates that the oldest credential is deleted. A value of true indicates that the least recently used credential is evicted.
EnableLRU = false
[Log]
# Optional. The default log level is Debug.
LogLevel = "Debug"
# Optional. The default log path is ./logs/ in the application startup directory.
LogPath = "./logs/"
# Optional. The default size of a single log file is 100 MB.
MaxSize = 100
# Optional. The default number of log files to retain is 2.
MaxBackups = 2
HTTP Server module.
This module responds to application requests to retrieve credentials. By default, the credential values returned by KMS Agent use the same response format as the GetSecretValue operation. You can also set the ResponseType parameter in the configuration file to specify other response formats.
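As a rough illustration of what an application-side call might look like, the following Python sketch builds a request against the default listener address, path prefix, and SSRF header from the sample configuration. The URL layout (`PathPrefix` followed directly by the credential name) is an assumption made for this example and is not confirmed by this topic. The helper name `build_agent_request` is also hypothetical. Check the alibabacloud-kms-agent repository for the actual request format.

```python
import urllib.parse
import urllib.request


def build_agent_request(secret_name: str, ssrf_token: str, port: int = 2025,
                        path_prefix: str = "/v1/",
                        ssrf_header: str = "X-KMS-Token") -> urllib.request.Request:
    """Build (but do not send) a GET request to a local KMS Agent.

    ASSUMPTION: the credential name is appended to the path prefix.
    """
    quoted_name = urllib.parse.quote(secret_name, safe="")
    url = f"http://127.0.0.1:{port}{path_prefix}{quoted_name}"
    # The agent denies requests that do not carry an SSRF header.
    return urllib.request.Request(url, headers={ssrf_header: ssrf_token})
```

An application would read the token from the SSRF token file, pass it to this helper, and send the result with `urllib.request.urlopen`. The shape of the response body depends on the `ResponseType` setting.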
Cache module.
KMS Agent has a built-in caching mechanism that stores credentials in memory. The credential values in the cache are not encrypted. Applications read credentials directly from the local cache. This reduces the frequency of requests to the KMS service. You can set the cache duration, cache size, and eviction policy to prevent business interruptions caused by expired credentials.
Important: To enhance the storage security of credential values in the cache, you can set a memory protection mechanism, configure appropriate access permissions for the KMS Agent process, and deploy memory leak detection tools.
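To make the difference between the two eviction policies concrete, here is a minimal Python sketch of a bounded cache that mirrors the documented meaning of `CacheSize` and `EnableLRU`. The class `BoundedCache` is hypothetical and leaves out TTL handling. It shows the policies only and is not how the agent is implemented.

```python
from collections import OrderedDict
from typing import Any, Optional


class BoundedCache:
    """Toy cache: oldest-first eviction by default, LRU when enable_lru is True."""

    def __init__(self, cache_size: int = 1000, enable_lru: bool = False) -> None:
        self._cache_size = cache_size
        self._enable_lru = enable_lru
        self._items: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._items:
            return None
        if self._enable_lru:
            # A read marks the entry as most recently used.
            self._items.move_to_end(key)
        return self._items[key]

    def put(self, key: str, value: Any) -> None:
        if self._cache_size == 0:
            return  # a size of 0 disables caching entirely
        if key in self._items:
            self._items[key] = value
            if self._enable_lru:
                self._items.move_to_end(key)
            return
        if len(self._items) >= self._cache_size:
            # The front of the dict is the oldest entry (insertion order)
            # or the least recently used entry (LRU order).
            self._items.popitem(last=False)
        self._items[key] = value
```

With a size of 2, storing `a` and `b`, reading `a`, and then storing `c` evicts `a` under the default policy and `b` under LRU. LRU tends to suit workloads where a few credentials are read far more often than the rest.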
KMS Configuration module.
This module supports setting the region and gateway address (endpoint). Both shared and dedicated gateway endpoints are supported.
Note: KMS Agent includes built-in CA certificates for dedicated gateways in all regions. You do not need to configure CA certificates when you use a dedicated gateway endpoint.
Log module.
This module is based on the popular Zap logging framework and provides logs in JSON format. It lets you configure the size limit for individual log files and the maximum number of log files to retain.
Security
Authentication and authorization
KMS Agent access to KMS.
KMS Agent uses the default credential provider chain to access KMS. The chain automatically checks for credentials in the following order of priority: environment variables, ECS instance RAM roles, and configuration files. Do not use plaintext AccessKey pairs in the configuration file. For more information, see Default credential provider chain. We recommend that you use an ECS instance RAM role for deployments in Linux environments, RRSA for deployments in Kubernetes sidecar containers, and environment variables for deployments in other environments.
When you use a RAM policy to restrict access to credentials, KMS Agent needs permission to retrieve the credential values. Because the credential values are encrypted, the agent also needs permission to decrypt them with the key. Follow the principle of least privilege when you set permissions for the agent.
Application access to KMS Agent.
When KMS Agent starts, it generates an SSRF token file, such as /var/run/kmstoken. When an application sends an HTTP request to KMS Agent, the request header must contain this SSRF token. The agent validates the token. If the token is valid, the request is processed. Otherwise, a failure response is returned.
Linux deployment scenario.
By default, the SSRF token file can be read only by KMS Agent and the system user that owns the application. Other processes are denied access to the SSRF token file.
Sidecar container deployment.
When deployed in sidecar mode in the same pod as the application container, the SSRF token file is accessible only from within the pod by default. Other pods or external services cannot access it directly.
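The token check described in this section can be sketched as follows. The sample configuration states that an SSRF environment variable may hold either the token itself or a `file://` path, and that the agent compares this token with the one in the request header. The function names `resolve_token` and `is_authorized` are hypothetical, and the constant-time comparison is a common hardening choice made for this example, not a confirmed detail of the agent.

```python
import hmac
import os
from typing import Mapping, Optional, Sequence


def resolve_token(env: Optional[Mapping[str, str]] = None,
                  names: Sequence[str] = ("KMS_TOKEN",)) -> Optional[str]:
    """Return the SSRF token from the first configured variable that is set."""
    env = os.environ if env is None else env
    for name in names:
        raw = env.get(name)
        if not raw:
            continue
        if raw.startswith("file://"):
            # The variable points at a token file rather than holding the token.
            with open(raw[len("file://"):], encoding="utf-8") as token_file:
                return token_file.read().strip()
        return raw
    return None


def is_authorized(headers: Mapping[str, str], expected: Optional[str],
                  header_names: Sequence[str] = ("X-KMS-Token",)) -> bool:
    """Accept the request only if an SSRF header carries the expected token."""
    if not expected:
        return False
    lowered = {key.lower(): value for key, value in headers.items()}
    for name in header_names:
        supplied = lowered.get(name.lower())
        if supplied is not None and hmac.compare_digest(
                supplied.encode(), expected.encode()):
            return True
    return False
```

Because the token lives in a file that only the agent and the application's system user (or pod) can read, a process that can reach 127.0.0.1 but cannot read the file is unable to produce a valid header.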
Communication security
Communication between KMS Agent and the KMS service uses the Transport Layer Security (TLS) protocol to ensure data security during transmission and prevent attacks or eavesdropping. We recommend that you use a dedicated gateway endpoint instead of a shared gateway endpoint to access the KMS service. This way, requests are transmitted only within your VPC and are not exposed to the public network, providing higher security.
KMS Agent listens only on 127.0.0.1. This means only applications or processes running on the same machine can communicate with KMS Agent. Devices from external networks cannot connect to KMS Agent.
Logging support
All operations to retrieve credentials through KMS Agent are recorded as logs in your specified log path. This helps you audit operation records.
Stability
KMS Agent ensures service continuity in complex network environments and during sudden failures through self-checks and retry mechanisms.
Startup self-check mechanism.
When KMS Agent starts, it verifies connectivity to KMS. If the verification fails, the startup is terminated.
Error retry mechanism.
KMS Agent relies on Alibaba Cloud SDK V2 to communicate with the KMS service. If a network exception occurs, the agent automatically retries the request using the SDK's built-in retry logic. When the agent encounters server-side throttling (HTTP 429) or an internal server error (HTTP 500), it retries the request up to three times with exponential backoff.
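For readers unfamiliar with exponential backoff, the following Python sketch shows the general pattern: retry only on HTTP 429 and 500, at most three times, doubling the wait each time. The `KmsError` class, the `call_with_retry` helper, and the 0.5-second base delay are assumptions for illustration. The actual delays and logic are determined by the SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")
RETRYABLE_STATUS = {429, 500}  # throttling and internal server errors


class KmsError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int) -> None:
        super().__init__(f"KMS returned HTTP {status}")
        self.status = status


def call_with_retry(call: Callable[[], T], max_retries: int = 3,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying retryable failures with exponential backoff."""
    attempt = 0
    while True:
        try:
            return call()
        except KmsError as err:
            if err.status not in RETRYABLE_STATUS or attempt >= max_retries:
                raise
            # Wait base_delay, then 2x, then 4x before successive retries.
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

Under these assumed values, a call that keeps failing with HTTP 500 is attempted four times in total (one initial attempt plus three retries), with waits of 0.5, 1, and 2 seconds in between.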
Use of expired cache during failures.
You can set the IgnoreTransientErrors parameter in the KMS Agent configuration file. When this parameter is set to true and a request to KMS fails because of a network or server-side failure, KMS Agent returns the expired value from the cache if one exists. This ensures that applications can still retrieve credentials during short-term failures. The IgnoreTransientErrors parameter is enabled by default.
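The fallback behavior can be sketched as below. This is an illustrative model: the function `get_with_stale_fallback`, the dictionary-based cache, and the use of `ConnectionError` to represent a transient failure are all assumptions for the example.

```python
from typing import Callable, Dict, Tuple

CacheEntry = Tuple[str, float]  # (credential value, expiry time)


def get_with_stale_fallback(name: str, cache: Dict[str, CacheEntry],
                            fetch_from_kms: Callable[[str], str], now: float,
                            ttl_seconds: float = 300.0,
                            ignore_transient_errors: bool = True) -> str:
    """Return a fresh value if possible, or an expired one during an outage."""
    entry = cache.get(name)
    if entry is not None and entry[1] > now:
        return entry[0]  # still valid, no remote call needed
    try:
        value = fetch_from_kms(name)
    except ConnectionError:
        # Transient failure: serve the expired value only if allowed and present.
        if ignore_transient_errors and entry is not None:
            return entry[0]
        raise
    cache[name] = (value, now + ttl_seconds)
    return value
```

The trade-off is that an application may briefly receive a credential that has since been rotated. Setting the parameter to false makes the failure visible to the caller instead.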
High availability based on systemd or sidecar containers.
Linux deployment: KMS Agent is managed by systemd. If the process is interrupted, it restarts automatically.
Sidecar container deployment: You can configure the sidecar as an init container. This ensures that KMS Agent completes initialization before the application container starts.
Benefits of KMS Agent
Performance and reliability.
KMS Agent caches credential values in memory. In high-frequency access scenarios, this reduces the frequency of requests to the KMS service and avoids potential throttling, which improves performance and business availability.
Compatibility.
KMS Agent provides services based on a standard HTTP interface, which supports direct calls from applications written in any programming language. If your business has multiple applications in different languages, using KMS Agent reduces integration difficulty.
Simplified integration.
KMS Agent decouples applications from KMS. It reduces the complexity of application interactions with the KMS service. Applications only need to communicate with KMS Agent and do not need to directly handle authentication or API calls to access the KMS service.
Centralized management and scalability.
For enterprise-level multi-application scenarios, you can use KMS Agent to centrally manage and control access permissions. This reduces the need to configure permissions on each client and ensures uniformity across application integrations. If your business needs to expand, KMS Agent makes it easier to integrate new applications and reduces the permission configuration and code refactoring that might be required when using an SDK.
KMS Agent vs. secrets client
KMS Agent acts as a middle layer, allowing applications to access the KMS service indirectly. In contrast, using a secrets client requires applications to call the KMS service API directly through an SDK. The differences between the two are shown in the following table.
Aspect | KMS Agent | Secrets client |
Deployment location | Deployed as an independent process (such as a sidecar container), decoupled from the application. | Integrated with application code as a dependency library. |
Access control | KMS Agent acts as a single access point for centralized permission policy enforcement. | Access identity and permission policies must be configured in each application, resulting in decentralized policy management. |
Language support | The agent provides a universal HTTP interface that supports applications in any language. | Provides support for Java (Java 8 and later), Python, and Go. |
Service performance | KMS Agent caches credential values in memory. For high-frequency scenarios, this reduces access latency and lowers the risk of throttling. | Each request must access the remote KMS service. High-frequency access may result in throttling. |
Handling of secret rotation | When the TTL that you set in KMS Agent expires, the agent retrieves the latest credential value from the KMS service. This reduces credential invalidation issues that can be caused by secret rotation. | With a credential caching mechanism and an error retry mechanism configured, the client automatically retrieves the credential from the KMS service when the cached credential becomes invalid. |
Integration complexity | Applications do not need to integrate an SDK. They only need to call the HTTP interface provided by KMS Agent. The application does not need to handle KMS interaction logic, which makes integration simpler. If your business involves applications in multiple languages, using KMS Agent also ensures uniform integration quality. | Requires calling APIs in the code and handling retries, errors, and caching mechanisms. Integration complexity is higher if the business involves multiple applications. |
Maintenance cost | In multi-application scenarios, when permission policies change, you can maintain the KMS Agent configuration centrally without affecting applications. | In multi-application scenarios, when permission policies change, each application's configuration must be modified separately. |
For enterprise users with multiple applications in different languages, we recommend using KMS Agent to centralize permission policy control, reduce integration complexity, and ensure uniform integration quality. For small or single applications that do not require complex access control policies, a secrets client is a suitable choice.