As the network infrastructure for cloud-native environments, ASM offers rich extensibility. Custom plugins allow you to enforce fine-grained controls at the mesh level over how applications, including gateways and regular business Pods, call Large Language Models (LLMs) to prevent sensitive information leakage. This tutorial demonstrates how to use a Wasm plugin to enforce global protection for LLM calls across the mesh.
Background
The rapid development of Large Language Models (LLMs) enables large-scale AI adoption across industries. Since the introduction of Model-as-a-Service (MaaS), vendors worldwide have launched their own model services, further accelerating the real-world application of LLMs. As a result, LLMs are becoming a fundamental service for many enterprises.
Adopting LLMs introduces significant security risks. For example, if an API key is leaked to a caller, it can lead to API abuse and increased costs. Another risk is the accidental transmission of sensitive corporate data to an LLM service. Because the service is controlled by an external vendor, this data is no longer secure. To prevent these potential losses, it is crucial to implement global security protections at the platform level.
ASM lets you extend mesh proxy functionality with Wasm. You can develop plugins in languages such as Go, Rust, or C++, compile them into Wasm binaries, and package them as an image. You then upload this image to an image repository, and it is dynamically distributed to mesh proxies (such as gateways and sidecars) to manipulate requests. Wasm plugins are fully hot-pluggable: they require no application redeployment and do not affect existing requests. They also run in a sandbox, which provides strong isolation and keeps a plugin from impacting the proxy itself. Developing a Wasm plugin has a lower barrier than developing a native Envoy HTTP filter, and ASM uses Go to develop the LLMProxy plugin.
The plugin code used in this tutorial is open source. You can download it for your own use or to create a custom LLM plugin. For more information, see asm-labs/wasm-llm-proxy at main · AliyunContainerService/asm-labs.
Prerequisites
- You have added a cluster to an ASM instance (version 1.18 or later).
- Sidecar injection is enabled. For more information, see Configure a sidecar injection policy.
- You have activated Model Studio and obtained a valid API key. For more information, see Obtain an API key.
Overview
This tutorial demonstrates the following capabilities:
- A sidecar or gateway dynamically injects an API key into LLM requests, so applications do not need to manage API keys themselves. This dynamic configuration prevents API key exposure.
- A sidecar or gateway is configured with custom detection rules to block LLM requests containing sensitive information from leaving the Pod and being sent to an external LLM service.
- The plugin calls a private model to analyze LLM requests, determines more accurately whether they contain sensitive information, and then allows or blocks each request. Because the private model is used only for this purpose, you can choose a smaller model that stays accurate while minimizing latency.
Without ASM, an application that calls an external HTTPS service must issue HTTPS requests itself and maintain a persistent TCP connection to the LLM service. If the application manages this connection poorly, it opens new connections frequently, which degrades performance.
After the application joins the service mesh, it sends plain HTTP requests, and the mesh proxy upgrades them to HTTPS. Because Envoy maintains the HTTPS connections, fewer TLS handshakes are needed, which improves performance.
The business container sends a request by using the HTTP protocol without including the LLM's API key. This request enters the sidecar, which injects the API key and performs a sensitive information check. Based on the check, the request is either allowed or denied. If allowed, the request is upgraded from HTTP to HTTPS and sent to the external LLM service.
This demo uses the Model Studio platform for the LLM service. We will use a standard HTTP interface to call the LLM. For related documentation, see Overview.
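From the application's side, the call is an ordinary HTTP POST. The following sketch shows what that could look like in Go once the plugin configured later in this tutorial is in place. The endpoint, model name, and body fields are taken from the curl commands in this tutorial; `NewChatRequest` and the struct names are hypothetical. The request is built but not sent, so the program also runs outside the mesh.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatURL uses plain HTTP on purpose: the sidecar upgrades the connection to HTTPS.
const chatURL = "http://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
	Stream   bool      `json:"stream"`
}

// NewChatRequest builds (but does not send) a chat request.
// It sets no Authorization header; the mesh proxy is expected to add it.
func NewChatRequest(model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []message{{Role: "user", Content: prompt}},
		Stream:   false,
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, chatURL, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := NewChatRequest("qwen-turbo", "Who are you?")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it from inside a meshed Pod: resp, err := http.DefaultClient.Do(req)
}
```

Because the application holds no key, rotating the key does not require rebuilding or redeploying the application.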
Procedure
Step 1: Deploy the client application
Use kubectl to connect to the Kubernetes cluster that you added to the ASM instance. Create a file named sleep.yaml with the following content.
Run the following command to deploy the sleep application.
kubectl apply -f sleep.yaml
Step 2: Create a ServiceEntry and a DestinationRule
Because the LLM service is external to the mesh, you must create a ServiceEntry to register the service. This allows the mesh to manage it. For more information, see Create a service entry.
In this step, you create a ServiceEntry to register the Model Studio service with ASM. Use the following YAML content:
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: dashscope
  namespace: default
spec:
  hosts:
  - dashscope.aliyuncs.com
  ports:
  - name: http-port
    number: 80
    protocol: HTTP
    targetPort: 443 # Used with a DestinationRule to upgrade HTTP to HTTPS
  - name: https-port
    number: 443
    protocol: HTTPS
  resolution: DNS
To enable the sidecar to upgrade HTTP requests on port 80 to HTTPS, create a corresponding DestinationRule with the following YAML content:
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: dashscope
  namespace: default
spec:
  host: dashscope.aliyuncs.com
  trafficPolicy:
    portLevelSettings:
    - port:
        number: 80
      tls:
        mode: SIMPLE
Run the following commands in sequence to verify that the sidecar has upgraded the protocol from HTTP to HTTPS.
export API_KEY=${your_dashscope_api_key} # Replace with your actual API key.
kubectl exec deploy/sleep -- curl -v 'http://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer ${API_KEY}" \
--header 'Content-Type: application/json' \
--header 'user: test' \
--data '{
    "model": "qwen-turbo",
    "messages": [
        {"role": "user", "content": "Who are you?"}
    ],
    "stream": false
}'
Expected output:
{"choices":[{"message":{"role":"assistant","content":"I am a large-scale language model from Alibaba Cloud. My name is Qwen."},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":10,"completion_tokens":16,"total_tokens":26},"created":xxxxxxxx,"system_fingerprint":null,"model":"qwen-turbo","id":"xxxxxxxxxxxxxxxxxx"}
Step 3: Configure the LLMProxy plugin
Create a file named WasmPlugin.yaml with the following content.
apiVersion: extensions.istio.io/v1alpha1
kind: WasmPlugin
metadata:
  name: asm-llm-proxy
  namespace: default
spec:
  imagePullPolicy: Always
  phase: AUTHN
  selector:
    matchLabels:
      app: sleep
  url: registry-cn-hangzhou.ack.aliyuncs.com/test/asm-llm-proxy:v0.2
  pluginConfig:
    api_key: ${your_dashscope_api_key}
    deny_patterns:
    - .*账号.* # Denies requests containing the Chinese word for "account" (账号).
    hosts:
    - dashscope.aliyuncs.com # This plugin applies only to requests sent to dashscope.aliyuncs.com.
    intelligent_guard: # Configure a private LLM service to check requests for sensitive information.
      # For demonstration purposes, this example still uses the Model Studio service for validation.
      api_key: ${your_dashscope_api_key}
      host: dashscope.aliyuncs.com
      model: qwen-turbo
      path: /compatible-mode/v1/chat/completions
      port: 80 # The HTTP port from the ServiceEntry.
For image repositories outside the Chinese mainland, use registry-cn-hongkong.ack.aliyuncs.com/test/asm-llm-proxy:v0.2.
The following table describes the parameters in the pluginConfig section.
| Parameter | Subparameter | Description |
| --- | --- | --- |
| api_key | N/A | The API key for Model Studio. When configured, applications can send HTTP requests without an API key. The key is dynamically injected based on this configuration, reducing the risk of key exposure. To rotate the API key, you only need to update the YAML file, not the application itself. |
| deny_patterns | N/A | A list of regular expressions used to match messages in LLM requests. Requests that match any of these patterns are denied. |
| hosts | N/A | A list of hosts. Only requests sent to these hosts are processed by the LLMProxy plugin. This prevents the plugin from incorrectly processing other requests. |
| intelligent_guard | api_key | The API key for Model Studio. |
| | host | The host of the Model Studio service. |
| | model | The type of LLM to call, such as qwen-turbo, qwen-max, or baichuan2-7b-chat-v1. You can customize this parameter to meet your needs. To ensure accuracy while minimizing latency, select an LLM with low latency. |
| | path | The path for the LLM request. |
| | port | The port of the private LLM service. This must match the HTTP port specified in the ServiceEntry. |
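To make the shape of these parameters concrete, the following Go sketch decodes a configuration with the same field names and compiles the deny patterns up front. It assumes the proxy hands pluginConfig to the plugin as a JSON document; the struct and function names (`PluginConfig`, `GuardConfig`, `ParseConfig`) are hypothetical and are not taken from the plugin's source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// GuardConfig mirrors the intelligent_guard subparameters from the table above.
type GuardConfig struct {
	APIKey string `json:"api_key"`
	Host   string `json:"host"`
	Model  string `json:"model"`
	Path   string `json:"path"`
	Port   int    `json:"port"`
}

// PluginConfig mirrors the top-level pluginConfig parameters.
type PluginConfig struct {
	APIKey           string       `json:"api_key"`
	DenyPatterns     []string     `json:"deny_patterns"`
	Hosts            []string     `json:"hosts"`
	IntelligentGuard *GuardConfig `json:"intelligent_guard"`
}

// ParseConfig decodes the configuration and compiles every deny pattern,
// so an invalid regular expression is reported at load time.
func ParseConfig(raw []byte) (*PluginConfig, []*regexp.Regexp, error) {
	var cfg PluginConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, nil, fmt.Errorf("decode pluginConfig: %w", err)
	}
	patterns := make([]*regexp.Regexp, 0, len(cfg.DenyPatterns))
	for _, p := range cfg.DenyPatterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, nil, fmt.Errorf("deny_patterns entry %q: %w", p, err)
		}
		patterns = append(patterns, re)
	}
	return &cfg, patterns, nil
}

func main() {
	raw := []byte(`{
		"api_key": "sk-example",
		"deny_patterns": [".*账号.*"],
		"hosts": ["dashscope.aliyuncs.com"],
		"intelligent_guard": {
			"api_key": "sk-example",
			"host": "dashscope.aliyuncs.com",
			"model": "qwen-turbo",
			"path": "/compatible-mode/v1/chat/completions",
			"port": 80
		}
	}`)
	cfg, patterns, err := ParseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.Hosts, len(patterns), cfg.IntelligentGuard.Port) // prints: [dashscope.aliyuncs.com] 1 80
}
```

Compiling the patterns once at load time, rather than on every request, also keeps the per-request check cheap.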
The intelligent_guard uses a standard OpenAI interface to determine whether a request sent to the LLM contains sensitive information. If the private LLM determines that the request contains sensitive information, the request is denied and a reason is returned. For demonstration purposes, this example still calls the Model Studio service for this check.
Run the following command to create the WasmPlugin.
kubectl apply -f WasmPlugin.yaml
Testing
- Run the following command to verify that a request without an API key successfully accesses the LLM service.
kubectl exec deploy/sleep -- curl 'http://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-turbo",
    "messages": [
        {"role": "user", "content": "Who are you?"}
    ],
    "stream": false
}'
Expected output:
{"choices":[{"message":{"role":"assistant","content":"I am a large-scale language model from Alibaba Cloud. My name is Qwen."},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":10,"completion_tokens":16,"total_tokens":26},"created":xxxxxxx,"system_fingerprint":null,"model":"qwen-turbo","id":"xxxxxxxxx"}
- Run the following command to verify that a request containing the sensitive word 账号 ("account"), which the deny_patterns rule matches, is rejected.
kubectl exec deploy/sleep -- curl 'http://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-turbo",
    "messages": [
        {"role": "user", "content": "I like red bean zongzi, and my QQ 账号 is 1111111"}
    ],
    "stream": false
}'
Expected output:
request was denied by asm llm proxy
- Run the following command to send a request that contains sensitive information but does not match any pattern in deny_patterns, and verify that the intelligent_guard identifies the sensitive information and blocks the request.
kubectl exec deploy/sleep -- curl -s 'http://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-turbo",
    "messages": [
        {"role": "user", "content": "Our company will hold a high-level internal meeting on September 10th. The topic is how to better serve customers. Please write an opening speech for me."}
    ],
    "stream": false
}'
Expected output:
external service returned deny: {"result": "deny", "reason": "The meeting content involves details and topics of a high-level internal company meeting, which is confidential information that cannot be disclosed."}
As shown in the output, the LLM identified that the request may contain sensitive information, and the LLMProxy plugin then blocked the request from being sent to the external LLM service. In a production environment, deploy the detection model privately to ensure that no sensitive data is exposed.
Summary
This tutorial answers two key questions about securing corporate data when using external LLM services:
- How can you secure the API key used to call the LLM?
- How can you prevent data leaks when calling the LLM?
With the ASM LLMProxy plugin, you can manage API key rotation more effectively and intelligently restrict the outflow of sensitive information. ASM's Wasm extension capabilities make this possible. We have open-sourced the code for this plugin at asm-labs/wasm-llm-proxy at main · AliyunContainerService/asm-labs. We encourage you to try it. If you have other use cases, please open an issue to help us improve the plugin.