Quick start: Build a generative AI chat app with ACS

Alibaba Cloud Container Compute Service (ACS) provides elastic, serverless computing resources through a Kubernetes-native interface for running containerized applications. This tutorial walks you through deploying a generative AI chat application in an ACS cluster using the ACS console and a cluster certificate, and monitoring its status.

Prerequisites

An Alibaba Cloud account that has completed real-name verification. For details, see Register an Alibaba Cloud account and Complete individual real-name verification.

Procedure

To use ACS for the first time, activate the service, grant permissions, create a cluster, and deploy the application.

Step 1: Activate and authorize ACS

Before you use ACS for the first time, you must activate the service and grant it the necessary permissions to access other cloud resources.

  1. Log on to the ACS console and click Activate.

  2. On the ACS activation page, follow the on-screen instructions to activate the service.

  3. Return to the ACS console, refresh the page, and click Go to Authorize.

  4. On the ACS authorization page, follow the on-screen instructions to grant the required permissions.

    After you grant the permissions, refresh the console to start using ACS.

Step 2: Create an ACS cluster

This section shows how to create an ACS cluster by configuring only its key parameters.

  1. Log on to the ACS console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click Create Kubernetes Cluster in the upper-left corner.

  3. On the Create Kubernetes Cluster page, configure the following parameters. You can use the default values for any parameters not listed here.

    | Parameter | Description | Example |
    | --- | --- | --- |
    | Cluster Name | Enter a name for the cluster. | ACS-Demo |
    | Region | Select the region where you want to create the cluster. | China (Beijing) |
    | Select VPC | Set the network for the cluster. ACS clusters support only VPCs. Choose Create VPC or Select Existing VPC. Create VPC: the system automatically creates a VPC and a NAT gateway and configures SNAT rules. Select Existing VPC: select an existing VPC and vSwitch. If the cluster needs internet access, for example to pull container images, you must configure a NAT gateway. We recommend that you upload container images to Container Registry (ACR) in the same region as your cluster and pull them over the internal VPC network. For more information, see Create and manage a VPC. | Select Create VPC. |
    | API Server Access Settings | Specify whether to expose the cluster's API server to the public internet. To manage the cluster remotely over the internet, you must configure an Elastic IP Address (EIP). | Select Expose API server with EIP. |
    | Service Discovery | Click Show Advanced Options and specify whether to enable service discovery for the cluster. If you need service discovery, select CoreDNS. | Select CoreDNS. |

  4. Click Confirm, review and accept the terms of service, and then click Create Kubernetes Cluster.

    Note

    Cluster creation takes about 10 minutes. After the cluster is created, it appears on the Clusters page.

Step 3: Deploy RWKV-Runner with the console

Deploy the RWKV-Runner stateless application (Deployment) on a general-purpose instance in your ACS cluster and expose its RESTful API within the cluster. For details, see Create a stateless workload Deployment.

  1. Log on to the ACS console. On the Clusters page, click the name of your target cluster (ACS-Demo).

  2. In the left-side navigation pane, choose Workloads > Deployments.

  3. On the Deployments page, click Create from Image.

  4. In the Basic Information step, set the Application Name to rwkv-runner, select General-purpose for Instance Type and default for QoS Type, and then click Next.

  5. In the Container step, configure the container and click Next.

    | Parameter | Description | Example value |
    | --- | --- | --- |
    | Image Name | Enter the image address without a tag, or click Select Image to choose an image. | registry.cn-beijing.aliyuncs.com/acs-demo-ns/rwkv-runner |
    | Image Version | Click Select Image Version to select an image version. | 1.0.0 |
    | CPU | Number of CPU cores for the application. | 1 Core |
    | Memory | Amount of memory for the application. | 2 GiB |
    | Port | Container ports. | Name: runner; Container Port: 8000; Protocol: TCP |

  6. In the Advanced step, click Create to the right of Services.

  7. In the Create Service dialog box, configure the following parameters and click Create. This exposes the RESTful API of rwkv-runner within the cluster.

    | Parameter | Description | Example value |
    | --- | --- | --- |
    | Name | Name of the service. | rwkv-runner-svc |
    | Type | Service type. Determines how the service is accessed. | ClusterIP |
    | Port Mapping | Set the Service Port and Container Port. The Container Port must match the port exposed by the backend pod. | Name: runner; Service Port: 80; Container Port: 8000; Protocol: TCP |

  8. In the Advanced step, click Create in the lower-right corner.

    After creation, the Complete step shows the application objects. Click View Details to review the application.
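
For readers who prefer a declarative workflow, the console settings in this step can be approximated with a manifest. The following is a minimal sketch, not the exact objects that the console generates: the app: rwkv-runner label, the container name, the replicas: 1 setting, and the helper name rwkv_runner_manifest are assumptions made for illustration. The compute-class and compute-qos labels follow the manifests used later in this tutorial. Treat it as an alternative to the console steps, not an additional step.

```shell
# Print a manifest that approximates the console settings from Step 3.
# Assumptions: the `app: rwkv-runner` label/selector, the container name,
# and `replicas: 1` are illustrative; the console may generate other values.
rwkv_runner_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: rwkv-runner-svc
spec:
  type: ClusterIP
  selector:
    app: rwkv-runner
  ports:
    - name: runner
      port: 80
      protocol: TCP
      targetPort: 8000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rwkv-runner
  labels:
    app: rwkv-runner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rwkv-runner
  template:
    metadata:
      labels:
        app: rwkv-runner
        alibabacloud.com/compute-class: general-purpose
        alibabacloud.com/compute-qos: default
    spec:
      containers:
        - name: rwkv-runner
          image: registry.cn-beijing.aliyuncs.com/acs-demo-ns/rwkv-runner:1.0.0
          ports:
            - name: runner
              containerPort: 8000
              protocol: TCP
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
EOF
}

# Usage (requires kubectl access to the cluster, which Step 4 sets up):
#   rwkv_runner_manifest | kubectl apply -f -
```

Keeping the Service name rwkv-runner-svc and Service port 80 matters because the chat frontend deployed in Step 4 reaches the backend at http://rwkv-runner-svc.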

Step 4: Deploy ChatGPT-Next-Web with a certificate

Use your cluster certificate (kubeconfig) to deploy the ChatGPT-Next-Web stateless application (Deployment) and expose it to the internet. For details, see Create a stateless workload Deployment.

  1. Log on to the ACS console. On the Clusters page, click the name of your target cluster (ACS-Demo).

  2. On the Cluster Information page, click the Connection Information tab. Obtain the public access certificate (kubeconfig) and follow the on-screen instructions to save it where kubectl can read it, which is $HOME/.kube/config by default.

  3. Create a file named chat-next-web.yaml and add the following content.

    apiVersion: v1
    kind: Service
    metadata:
      name: chat-frontend-svc
    spec:
      ports:
        - name: chat
          port: 80
          protocol: TCP
          targetPort: 3000
      selector:
        app: chat-frontend
      type: LoadBalancer
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: chat-frontend
      name: chat-frontend
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: chat-frontend
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          labels:
            alibabacloud.com/compute-class: general-purpose  # Specifies the general-purpose instance type.
            # To use the performance-enhanced type, use: alibabacloud.com/compute-class: performance
            app: chat-frontend
        spec:
          containers:
            - env:
                - name: BASE_URL
                  value: 'http://rwkv-runner-svc'
              image: registry.cn-beijing.aliyuncs.com/acs-demo-ns/chatgpt-next-web:amd64
              imagePullPolicy: IfNotPresent
              name: chat-frontend
              ports:
                - containerPort: 3000
                  protocol: TCP
              resources:
                requests:
                  cpu: "1"
                  memory: 2Gi
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30

  4. Run the following command to apply the resources to your ACS cluster.

    kubectl apply -f chat-next-web.yaml
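
After applying the manifest, it can help to confirm that the rollout finished before moving on. The following is a small sketch that wraps two standard kubectl commands. The function name check_chat_frontend and the 600-second timeout are illustrative choices, with the timeout mirroring progressDeadlineSeconds in the manifest.

```shell
# Wait for the chat-frontend Deployment to finish rolling out,
# then list its Service so the external IP can be checked.
# The function name and timeout are illustrative assumptions.
check_chat_frontend() {
  kubectl rollout status deployment/chat-frontend --timeout=600s &&
    kubectl get service chat-frontend-svc
}

# Usage:
#   check_chat_frontend
```

If the rollout does not complete within the timeout, the first command exits with a non-zero status and the Service is not listed.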

Step 5: Create an initialization job

Use your cluster certificate to create a Kubernetes job that initializes the RWKV-Runner model. This job runs on an instance with the BestEffort QoS class. For details, see Create a job workload.

  1. Create a file named rwkv-init-job.yaml and add the following content.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-demo
    spec:
      activeDeadlineSeconds: 600
      backoffLimit: 6
      completionMode: NonIndexed
      completions: 1
      parallelism: 1
      suspend: false
      template:
        metadata:
          labels:
            alibabacloud.com/compute-qos: best-effort # Specifies the BestEffort QoS class.
            # To use the default QoS class, use: alibabacloud.com/compute-qos: default
            app: job-demo
        spec:
          containers:
          - name: job
            image: registry.cn-beijing.aliyuncs.com/acs-demo-ns/rwkv-init-job:1.0.0
            imagePullPolicy: Always
            resources:
              requests:
                cpu: 500m
                memory: 1Gi
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
    
  2. Run the following command to submit the initialization job.

    kubectl apply -f rwkv-init-job.yaml

  3. Verify that the initialization job has completed.

    kubectl get pod

    In the output, the STATUS column of the job pod shows Completed.
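
Instead of polling kubectl get pod by hand, you can block until the job reports completion and then print its logs. The sketch below assumes the job name job-demo from the manifest above. The helper name wait_for_init_job is hypothetical, and the 600-second timeout mirrors activeDeadlineSeconds.

```shell
# Block until the job reaches the Complete condition, then show its logs.
# The helper name is an illustrative assumption; job-demo comes from
# rwkv-init-job.yaml.
wait_for_init_job() {
  job_name="${1:-job-demo}"
  kubectl wait --for=condition=complete "job/${job_name}" --timeout=600s &&
    kubectl logs "job/${job_name}"
}

# Usage:
#   wait_for_init_job            # waits for job-demo
#   wait_for_init_job other-job  # waits for a differently named job
```

kubectl wait here watches only for the Complete condition, so a job that fails is reported as a timeout and not as an immediate error.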

Step 6: Test the application

Access the deployed application through its service.

  1. Log on to the ACS console. On the Clusters page, click the name of your target cluster (ACS-Demo).

  2. In the left-side navigation pane, choose Network > Services.

  3. On the Services page, find the newly created service (chat-frontend-svc) and click the IP address in the External IP column to access your generative AI chat application.
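
The same external IP address can also be read from the command line. The following sketch assumes the load balancer publishes an IP address (not a hostname) in the Service status. The helper name chat_frontend_url is hypothetical.

```shell
# Print the URL of the chat frontend based on the Service's external IP.
# Assumption: the load balancer reports an IP in .status.loadBalancer.ingress.
chat_frontend_url() {
  ip=$(kubectl get service chat-frontend-svc \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  if [ -z "$ip" ]; then
    echo "External IP not assigned yet; retry shortly." >&2
    return 1
  fi
  echo "http://${ip}"
}

# Usage: print only the HTTP status code returned by the frontend.
#   curl -sS -o /dev/null -w '%{http_code}\n' "$(chat_frontend_url)"
```

The URL uses plain HTTP on the default port because chat-frontend-svc maps Service port 80 to container port 3000.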

Clean up resources

ACS cluster fees have two components:

  • Compute power used by workloads, charged by ACS.

  • Other Alibaba Cloud resources, charged by their respective services.

After completing this tutorial:

  • If you no longer need the cluster, delete it and its associated resources. For details, see Delete a cluster.

  • To keep using the cluster, maintain an account balance of at least CNY 100.00. For details, see Cloud resource billing.
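
If you keep the cluster but want to remove only the workloads created in this tutorial, a sketch along the following lines may help. It assumes that the two YAML files are still in the current directory and that the console created the rwkv-runner Deployment and rwkv-runner-svc Service in the current namespace. The helper name cleanup_tutorial is hypothetical.

```shell
# Remove the tutorial workloads while keeping the cluster itself.
# Assumptions: both manifest files are in the current directory, and the
# console-created objects use the names from Step 3.
cleanup_tutorial() {
  kubectl delete -f chat-next-web.yaml --ignore-not-found
  kubectl delete -f rwkv-init-job.yaml --ignore-not-found
  kubectl delete deployment rwkv-runner --ignore-not-found
  kubectl delete service rwkv-runner-svc --ignore-not-found
}

# Usage:
#   cleanup_tutorial
```

This does not delete the cluster or resources created alongside it, such as the VPC and NAT gateway, so the cluster deletion procedure above still applies if you no longer need them.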