Protect user data with the ASM LLMProxy plugin

Background

The rapid development of Large Language Models (LLMs) enables large-scale AI adoption across industries. Since the introduction of Model-as-a-Service (MaaS), vendors worldwide have launched their own model services, further accelerating the real-world application of LLMs. As a result, LLMs are becoming a fundamental service for many enterprises.

Adopting LLMs also introduces significant security risks. For example, if the API key is handed out to every caller, it can leak and be abused, which drives up costs. Another risk is that sensitive corporate data is accidentally sent to an LLM service. Because the service is operated by an external vendor, the data is no longer under your control once it has been sent. To prevent these losses, you need security protections that are enforced globally at the platform level.

Alibaba Cloud Service Mesh (ASM) lets you extend mesh proxy functionality with WebAssembly (Wasm). You can develop plugins in languages such as Go, Rust, or C++, compile them into Wasm binaries, package them as an image, and push the image to an image repository. ASM then dynamically distributes the plugin to mesh proxies, such as gateways and sidecars, where it can inspect and modify requests. Wasm plugins are hot-pluggable: loading or updating a plugin does not require redeploying applications and does not affect existing requests. Plugins also run in a sandbox, so they are isolated from the proxy and do not affect it. Because Wasm plugins are easier to develop than native Envoy HTTP filters, ASM built the LLMProxy plugin as a Wasm plugin written in Go.

Note

The plugin code used in this tutorial is open source. You can download it for your own use or as a starting point for a custom LLM plugin. For more information, see the wasm-llm-proxy directory in the AliyunContainerService/asm-labs repository on GitHub.

Prerequisites

You have added a cluster to an ASM instance (version 1.18 or later).
Sidecar injection is enabled. For more information, see Configure a sidecar injection policy.
You have activated Model Studio and obtained a valid API key. For more information, see Obtain an API key.

Overview

This tutorial demonstrates the following capabilities:

A sidecar or gateway dynamically injects the API key into LLM requests, so applications do not need to hold or manage the key. Because the key is never distributed to applications, the risk of exposing it is reduced.
A sidecar or gateway is configured with custom detection rules that block LLM requests containing sensitive information from leaving the Pod and reaching an external LLM service.
The plugin calls a private model to analyze LLM requests, determine more accurately whether they contain sensitive information, and then allow or block each request. Because the private model is used only for this check, you can choose a small model that keeps latency low while remaining accurate.

Note

Without ASM, an application that accesses an external HTTPS service must send HTTPS requests directly and maintain persistent TCP connections to the LLM service. If these connections are managed poorly, the application frequently opens new connections, each with its own TLS handshake, which degrades performance.

After the application joins the service mesh, it sends plain HTTP requests, and the mesh proxy upgrades them to HTTPS. Because Envoy maintains and reuses the HTTPS connections, fewer TLS handshakes are needed, which improves performance.

The application container sends an HTTP request that does not include the LLM API key. The sidecar intercepts the request, injects the API key, and checks the request for sensitive information. Based on the result, the sidecar either allows or denies the request. An allowed request is upgraded from HTTP to HTTPS and forwarded to the external LLM service.

This demo uses Model Studio as the LLM service and calls the model through its OpenAI-compatible HTTP interface. For related documentation, see Overview.

Procedure

Step 1: Deploy the client application

Use kubectl to connect to the Kubernetes cluster that you added to the ASM instance. Create a file named sleep.yaml with the following content.


apiVersion: v1
kind: ServiceAccount
metadata:
  name: sleep
---
apiVersion: v1
kind: Service
metadata:
  name: sleep
  labels:
    app: sleep
    service: sleep
spec:
  ports:
  - port: 80
    name: http
  selector:
    app: sleep
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep
    spec:
      terminationGracePeriodSeconds: 0
      serviceAccountName: sleep
      containers:
      - name: sleep
        image: registry.cn-hangzhou.aliyuncs.com/acs/curl:8.1.2
        command: ["/bin/sleep", "infinity"]
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - mountPath: /etc/sleep/tls
          name: secret-volume
      volumes:
      - name: secret-volume
        secret:
          secretName: sleep-secret
          optional: true
---

Run the following command to deploy the sleep application.

kubectl apply -f sleep.yaml

Step 2: Create a ServiceEntry and a DestinationRule

Because the LLM service is outside the mesh, you must register it with a ServiceEntry so that the mesh can manage traffic to it. For more information, see Create a service entry.

In this step, you create a ServiceEntry to register the Model Studio service with ASM. Use the following YAML content:

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: dashscope
  namespace: default
spec:
  hosts:
    - dashscope.aliyuncs.com
  ports:
    - name: http-port
      number: 80
      protocol: HTTP
      targetPort: 443    # Used with a DestinationRule to upgrade HTTP to HTTPS
    - name: https-port
      number: 443
      protocol: HTTPS
  resolution: DNS

To enable the sidecar to upgrade HTTP requests on port 80 to HTTPS, create a corresponding DestinationRule with the following YAML content:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: dashscope
  namespace: default
spec:
  host: dashscope.aliyuncs.com
  trafficPolicy:
    portLevelSettings:
    - port:
        number: 80
      tls:
        mode: SIMPLE

Run the following commands in sequence to verify that the sidecar has upgraded the protocol from HTTP to HTTPS.

export API_KEY=${your_dashscope_api_key}  # Replace with your actual API key.

kubectl exec deploy/sleep -- curl -v 'http://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer ${API_KEY}" \
--header 'Content-Type: application/json' \
--header 'user: test' \
--data '{
    "model": "qwen-turbo",
    "messages": [
        {"role": "user", "content": "Who are you?"}
    ],
    "stream": false
}'

Expected output:

{"choices":[{"message":{"role":"assistant","content":"I am a large-scale language model from Alibaba Cloud. My name is Qwen."},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":10,"completion_tokens":16,"total_tokens":26},"created":xxxxxxxx,"system_fingerprint":null,"model":"qwen-turbo","id":"xxxxxxxxxxxxxxxxxx"}
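To use the answer in code, decode only the fields you need from a response of the shape shown above. The following Go sketch does that; the chatResponse type and the firstAnswer helper are hypothetical, and the sample body in main is abbreviated because the real "created" and "id" values are elided in this document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatResponse keeps the commonly used fields of a chat.completion body;
// encoding/json ignores the rest.
type chatResponse struct {
	Choices []struct {
		Message struct {
			Role    string `json:"role"`
			Content string `json:"content"`
		} `json:"message"`
		FinishReason string `json:"finish_reason"`
	} `json:"choices"`
	Usage struct {
		PromptTokens     int `json:"prompt_tokens"`
		CompletionTokens int `json:"completion_tokens"`
		TotalTokens      int `json:"total_tokens"`
	} `json:"usage"`
	Model string `json:"model"`
}

// firstAnswer decodes the body and returns the text of the first choice.
func firstAnswer(body []byte) (string, error) {
	var resp chatResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	if len(resp.Choices) == 0 {
		return "", fmt.Errorf("response has no choices")
	}
	return resp.Choices[0].Message.Content, nil
}

func main() {
	sample := []byte(`{"choices":[{"message":{"role":"assistant","content":"I am Qwen."},"finish_reason":"stop","index":0}],"usage":{"prompt_tokens":10,"completion_tokens":16,"total_tokens":26},"model":"qwen-turbo"}`)
	answer, err := firstAnswer(sample)
	fmt.Println(answer, err)
}
```

After the plugin is enabled, denied requests return plain text instead of JSON, so callers should handle the decode error rather than assume a chat.completion body.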

Step 3: Configure the LLMProxy plugin

Create a file named WasmPlugin.yaml with the following content.

apiVersion: extensions.istio.io/v1alpha1
kind: WasmPlugin
metadata:
  name: asm-llm-proxy
  namespace: default
spec:
  imagePullPolicy: Always
  phase: AUTHN
  selector:
    matchLabels:
      app: sleep
  url: registry-cn-hangzhou.ack.aliyuncs.com/test/asm-llm-proxy:v0.2
  pluginConfig:
    api_key: ${your_dashscope_api_key}   # Replace with your Model Studio API key.
    deny_patterns:
    - .*账号.*     # Denies requests containing the Chinese word for "account" (账号).
    hosts:
    - dashscope.aliyuncs.com    # This plugin applies only to requests sent to dashscope.aliyuncs.com.
    intelligent_guard:   # Configure a private LLM service to check requests for sensitive information.
      # For demonstration purposes, this example still uses the Model Studio service for validation.
      api_key: ${your_dashscope_api_key}   # Replace with the API key of the guard model service.
      host: dashscope.aliyuncs.com
      model: qwen-turbo
      path: /compatible-mode/v1/chat/completions
      port: 80  # The HTTP port from the ServiceEntry.
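Inside the plugin, the pluginConfig block arrives as configuration data. The following Go sketch assumes, as is the case for Istio WasmPlugin resources, that the plugin receives pluginConfig as a JSON document. Only the JSON keys come from the YAML above; the Go type names and the parseConfig helper are illustrative. Compiling every pattern at load time surfaces an invalid regular expression when the plugin starts instead of on the first request.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// guardConfig mirrors pluginConfig.intelligent_guard.
type guardConfig struct {
	APIKey string `json:"api_key"`
	Host   string `json:"host"`
	Model  string `json:"model"`
	Path   string `json:"path"`
	Port   int    `json:"port"`
}

// pluginConfig mirrors the pluginConfig block of the WasmPlugin.
type pluginConfig struct {
	APIKey           string       `json:"api_key"`
	DenyPatterns     []string     `json:"deny_patterns"`
	AllowPatterns    []string     `json:"allow_patterns"`
	Hosts            []string     `json:"hosts"`
	IntelligentGuard *guardConfig `json:"intelligent_guard"`
}

// parseConfig decodes the JSON form of pluginConfig and validates every pattern.
func parseConfig(raw []byte) (*pluginConfig, error) {
	var cfg pluginConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	for _, list := range [][]string{cfg.DenyPatterns, cfg.AllowPatterns} {
		for _, p := range list {
			if _, err := regexp.Compile(p); err != nil {
				return nil, fmt.Errorf("invalid pattern %q: %w", p, err)
			}
		}
	}
	return &cfg, nil
}

func main() {
	raw := []byte(`{
		"api_key": "<your_dashscope_api_key>",
		"deny_patterns": [".*账号.*"],
		"hosts": ["dashscope.aliyuncs.com"],
		"intelligent_guard": {
			"api_key": "<your_dashscope_api_key>",
			"host": "dashscope.aliyuncs.com",
			"model": "qwen-turbo",
			"path": "/compatible-mode/v1/chat/completions",
			"port": 80
		}
	}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.Hosts, cfg.IntelligentGuard.Model, cfg.IntelligentGuard.Port)
}
```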

Note

If your cluster is outside the Chinese mainland, use the image registry-cn-hongkong.ack.aliyuncs.com/test/asm-llm-proxy:v0.2 instead.

The following table describes the parameters in the pluginConfig section.

Parameter	Subparameter	Description
api_key	N/A	The API key for Model Studio. When this parameter is set, applications can send HTTP requests without an API key; the plugin injects the key dynamically, which reduces the risk of key exposure. To rotate the key, you only need to update the WasmPlugin YAML file, not the application.
deny_patterns	N/A	A list of regular expressions that are matched against the messages in LLM requests. A request that matches any of these patterns is denied. An `allow_patterns` parameter is also supported; when it is set, only requests that match it are allowed.
hosts	N/A	A list of hosts. The LLMProxy plugin processes only requests sent to these hosts, so it does not interfere with other traffic.
intelligent_guard	api_key	The API key used to call the guard model (Model Studio in this example).
	host	The host of the guard model service (Model Studio in this example).
	model	The model to call, such as qwen-turbo, qwen-max, or baichuan2-7b-chat-v1. Choose a model that is fast enough for inline checks while remaining accurate.
	path	The request path of the guard model's chat completions API.
	port	The port of the private LLM service. It must match the HTTP port specified in the `ServiceEntry`.

The intelligent_guard calls the private LLM through a standard OpenAI-compatible chat completions interface to determine whether a request contains sensitive information. If the private LLM determines that it does, the request is denied and the reason is returned. For demonstration purposes, this example uses the Model Studio service for this check.

Run the following command to create the WasmPlugin.

kubectl apply -f WasmPlugin.yaml

Testing

Run the following command to verify that a request that does not include an API key can still access the LLM service.

kubectl exec deploy/sleep -- curl 'http://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-turbo",
    "messages": [
        {"role": "user", "content": "Who are you?"}
    ],
    "stream": false
}'

Expected output:

{"choices":[{"message":{"role":"assistant","content":"I am a large-scale language model from Alibaba Cloud. My name is Qwen."},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":10,"completion_tokens":16,"total_tokens":26},"created":xxxxxxx,"system_fingerprint":null,"model":"qwen-turbo","id":"xxxxxxxxx"}

Run the following command to verify that a request containing the sensitive word 账号 ("account") is rejected. The message must contain the Chinese characters 账号, because deny_patterns matches the literal text of the pattern; the English word "account" would not match.

kubectl exec deploy/sleep -- curl 'http://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-turbo",
    "messages": [
        {"role": "user", "content": "I like red bean zongzi, and my QQ 账号 is 1111111"}
    ],
    "stream": false
}'

Expected output:

request was denied by asm llm proxy

Test a request that contains sensitive information but does not match any pattern in deny_patterns, and verify that the intelligent_guard identifies the sensitive information and blocks the request.

kubectl exec deploy/sleep -- curl -s 'http://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-turbo",
    "messages": [
        {"role": "user", "content": "Our company will hold a high-level internal meeting on September 10th. The topic is how to better serve customers. Please write an opening speech for me."}
    ],
    "stream": false
}'

Expected output:

external service returned deny: {"result": "deny", "reason": "The meeting content involves details and topics of a high-level internal company meeting, which is confidential information that cannot be disclosed."}

The reason text is generated by the model, so your output may differ.

As the output shows, the guard model identified that the request may contain sensitive information, and the LLMProxy plugin blocked the request before it reached the external LLM service. In a production environment, deploy the detection model privately so that sensitive data is never sent outside your environment.

Summary

This tutorial answers two key questions about securing corporate data when using external LLM services:

How can you secure the API key used to call the LLM?
How can you prevent data leaks when calling the LLM?

With the ASM LLMProxy plugin, you can rotate API keys more easily and intelligently restrict the outflow of sensitive information, both made possible by the Wasm extension capabilities of ASM. The plugin code is open source in the wasm-llm-proxy directory of the AliyunContainerService/asm-labs repository on GitHub. We encourage you to try it. If you have other use cases, open an issue to help us improve the plugin.