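AI application development practices with GPU instances - Function Compute - Alibaba Cloud

You can try out the best practices for GPU instances through the Function Compute console, the SDK, or Serverless Devs. Using Python as the example language, this topic describes how to use the Serverless Devs developer tool or the console to process raw images with function code and implement style transfer and object detection.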

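Scenarios and benefits

Traditional GPU infrastructure for AI applications often suffers from long build cycles, complex operations and maintenance, low cluster utilization, and high cost. GPU instances in Function Compute shift these problems from the user side to the cloud provider, so that you do not need to manage the underlying GPU infrastructure and can focus on your business logic, which greatly simplifies the path to implementation.

Compared with CPU instances, GPU instances in Function Compute offer the following benefits in different scenarios.

Cost-first AI application scenarios
- An elastic provisioned mode keeps GPU worker instances reserved on demand, which offers a significant cost advantage over a self-built GPU cluster.
- GPU sharing through virtualization lets you use one half of a GPU or an entire GPU exclusively, so that you can size GPU instances at a finer granularity.
Efficiency-first AI application scenarios
- The heavy burden of operating a GPU cluster (driver and CUDA version management, machine operation management, and faulty GPU card handling) is hidden, so that developers can focus on writing code and meeting business goals.

For more information about GPU instances, see Instance types and usage modes.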

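Neural style transfer tutorial

Neural style transfer is a generative technique that blends two images: it extracts the content from one image and the style from the other, and composes a new image from them. This example uses a pre-trained model from TensorFlow Hub to apply style transfer to arbitrary images.

Result

Content image	Style image	Composite image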

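Prerequisites

General
- To keep your business running properly while you use GPU instances, join the DingTalk user group (group ID: 11721331) and provide the following information.
  - Your organization name, such as the name of your company.
  - Your Alibaba Cloud account ID.
  - The region where you want to use GPU instances, such as China (Shenzhen).
  - Your contact information, such as your mobile number, email address, or DingTalk account.
- Upload the images to be processed to an OSS bucket in the region where the GPU instance resides, and make sure that you have read and write permissions on the files in the bucket. For the upload procedure, see Upload files in the console. For permissions, see Modify the read and write permissions of a bucket.
Only for deploying the GPU application with Serverless Devs
- In the region where the GPU instance resides, complete the following operations:
  - Create a Container Registry Enterprise Edition instance or Personal Edition instance. An Enterprise Edition instance is recommended. For the procedure, see Create an Enterprise Edition instance.
  - Create a namespace and an image repository. For the procedure, see Step 2: Create a namespace and Step 3: Create an image repository.
- Install Serverless Devs and its dependencies
- Configure Serverless Devs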

Deploy the GPU application with Serverless Devs

Create a project.

s init devsapp/start-fc-custom-container-event-python3.9 -d fc-gpu-prj

The created project has the following directory structure.

fc-gpu-prj
├── code
│   ├── app.py        # Function code
│   └── Dockerfile    # Dockerfile that packages the code into an image
├── README.md
└── s.yaml            # Project configuration, including how the image is deployed to Function Compute

Go to the project directory.
```
cd fc-gpu-prj
```

Modify the parameters in the project files as needed.

Edit the s.yaml file.

For details about the YAML parameters, see YAML specification.

edition: 1.0.0
name: container-demo
access: default
vars:
  region: cn-shenzhen
services:
  customContainer-demo:
    component: devsapp/fc
    props:
      region: ${vars.region}
      service:
        name: tgpu_tf_service
        internetAccess: true
      function:
        name: tgpu_tf_func
        description: test gpu for tensorflow
        handler: not-used
        timeout: 600
        caPort: 9000
        instanceType: fc.gpu.tesla.1
        gpuMemorySize: 8192
        cpu: 4
        memorySize: 16384
        diskSize: 512
        runtime: custom-container
        customContainerConfig:
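          # 1. Make sure that the namespace (namespace:demo) and repository (repo:gpu-tf-style-transfer_s) already exist in Alibaba Cloud Container Registry (ACR).
          # 2. When you update the function later, change the tag here, for example from v0.1 to v0.2, and then run s build && s deploy again.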
          image: registry.cn-shenzhen.aliyuncs.com/demo/gpu-tf-style-transfer_s:v0.1
        codeUri: ./code
      triggers:
        - name: httpTrigger
          type: http
          config:
            authType: anonymous
            methods:
              - GET
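
The comment in the configuration above asks you to change the image tag (for example from v0.1 to v0.2) before each rebuild. As a minimal sketch of that step, the hypothetical helper below increments the minor part of a `vMAJOR.MINOR` tag. The name `bump_image_tag` and the assumption that tags always follow the `vMAJOR.MINOR` pattern are illustrative only and are not part of Serverless Devs.

```python
import re


def bump_image_tag(image: str) -> str:
    """Return `image` with the minor number of its `vMAJOR.MINOR` tag incremented.

    Hypothetical helper: it assumes the image reference ends with a tag such as
    `:v0.1`, as in the sample s.yaml.
    """
    repo, sep, tag = image.rpartition(":")
    match = re.fullmatch(r"v(\d+)\.(\d+)", tag)
    if not sep or match is None:
        raise ValueError(f"unsupported image tag: {image!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return f"{repo}:v{major}.{minor + 1}"


# The image reference from the sample s.yaml.
print(bump_image_tag("registry.cn-shenzhen.aliyuncs.com/demo/gpu-tf-style-transfer_s:v0.1"))
```

After you write the new value back to the image field of s.yaml, run s build and s deploy again as the comment describes.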

Edit the app.py file.

Example:

# -*- coding: utf-8 -*-
# Requires Python 3 (uses http.server and urllib.request).
from http.server import HTTPServer, BaseHTTPRequestHandler
import json
import os
import random
import urllib.request

import matplotlib as mpl
import numpy as np
import PIL.Image
import tensorflow as tf

class Resquest(BaseHTTPRequestHandler):
    def upload(self, url, path):
        print("enter upload:", url)
        headers = {
            'Content-Type': 'application/octet-stream',
            'Content-Length': os.stat(path).st_size,
        }
        req = urllib.request.Request(url, open(path, 'rb'), headers=headers, method='PUT')
        urllib.request.urlopen(req)

    def tensor_to_image(self, tensor):
        tensor = tensor*255
        tensor = np.array(tensor, dtype=np.uint8)
        if np.ndim(tensor)>3:
            assert tensor.shape[0] == 1
            tensor = tensor[0]
        return PIL.Image.fromarray(tensor)

    def load_img(self, path_to_img):
        max_dim = 512
        img = tf.io.read_file(path_to_img)
        img = tf.image.decode_image(img, channels=3)
        img = tf.image.convert_image_dtype(img, tf.float32)

        shape = tf.cast(tf.shape(img)[:-1], tf.float32)
        long_dim = max(shape)
        scale = max_dim / long_dim

        new_shape = tf.cast(shape * scale, tf.int32)

        img = tf.image.resize(img, new_shape)
        img = img[tf.newaxis, :]
        return img

    def do_style_transfer(self):
        mpl.rcParams['figure.figsize'] = (12,12)
        mpl.rcParams['axes.grid'] = False

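        # Replace the URLs with objects in an OSS bucket under your own Alibaba Cloud account on which you have read and write permissions.
        # This reads the content image and the style image stored in your OSS bucket.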
        content_path = tf.keras.utils.get_file(str(random.randint(0,100000000)) + ".jpg", 'https://your_public_oss/c1.png')
        style_path = tf.keras.utils.get_file(str(random.randint(0,100000000)) + ".jpg",'https://your_public_oss/c2.png')

        content_image = self.load_img(content_path)
        style_image = self.load_img(style_path)
        print("load image ok")

        import tensorflow_hub as hub
        hub_model = hub.load('https://hub.tensorflow.google.cn/google/magenta/arbitrary-image-stylization-v1-256/2')
        # Alternatively, package the hub model into the image and load it locally to speed up processing.
        #hub_model = hub.load('/usr/src/app/style_transfer_model')
        stylized_image = hub_model(tf.constant(content_image), tf.constant(style_image))[0]
        print("load model ok")

        path = "/tmp/" + str(random.randint(0,100000000)) + ".png"
        self.tensor_to_image(stylized_image).save(path)
        print("generate stylized image ok")

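        # Replace the URL with an object in an OSS bucket under your own Alibaba Cloud account on which you have read and write permissions.
        # This stores the final composite image in your OSS bucket.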
        self.upload("https://your_public_oss/stylized-image.png" ,path)
        return "transfer ok"

    def style_transfer(self):
        msg = self.do_style_transfer()
        data = {"result": msg}
        self.send_response(200)
        self.send_header("Content-type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(data).encode())

    def pong(self):
        data = {"function":"tf_style_transfer"}
        self.send_response(200)
        self.send_header('Content-type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps(data).encode())

    def dispatch(self):
        mode = self.headers.get('RUN-MODE')

        if mode == "ping":
            self.pong()
        elif mode == "normal":
            self.style_transfer()
        else:
            self.pong()

    def do_GET(self):
        self.dispatch()

    def do_POST(self):
        self.dispatch()

if __name__ == "__main__":
    host = ("0.0.0.0", 9000)
    server = HTTPServer(host, Resquest)
    print("Starting server, listen at: %s:%s" % host)
    server.serve_forever()
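
To make the image handling in app.py easier to follow, here is a minimal NumPy-only sketch of the two calculations that load_img and tensor_to_image perform: scaling the image so that its longer side becomes max_dim, and converting a float batch in the range [0, 1] to uint8 pixels. The function names `scaled_shape` and `to_uint8` are hypothetical and do not appear in the sample. The sketch uses Python floats, so for some sizes its rounding may differ slightly from the float32 arithmetic in TensorFlow.

```python
import numpy as np


def scaled_shape(height: int, width: int, max_dim: int = 512) -> tuple:
    """Mirror load_img: scale so that the longer side becomes max_dim.

    int() truncates toward zero, like tf.cast(..., tf.int32) in the sample.
    """
    scale = max_dim / max(height, width)
    return int(height * scale), int(width * scale)


def to_uint8(batch: np.ndarray) -> np.ndarray:
    """Mirror tensor_to_image up to the PIL step.

    Scales floats in [0, 1] to uint8 and drops a leading batch axis of size 1.
    """
    pixels = np.array(batch * 255, dtype=np.uint8)
    if pixels.ndim > 3:
        assert pixels.shape[0] == 1
        pixels = pixels[0]
    return pixels


# A 768 x 1024 image is halved so that its longer side is 512.
print(scaled_shape(768, 1024))
# A batch holding one 2 x 2 RGB image becomes a plain (2, 2, 3) array.
print(to_uint8(np.ones((1, 2, 2, 3), dtype=np.float32)).shape)
```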

Edit the Dockerfile.

Example:

FROM registry.cn-shanghai.aliyuncs.com/serverless_devs/tensorflow:2.7.0-gpu
WORKDIR /usr/src/app
RUN apt-get update
RUN apt-get install -y python3
RUN apt-get install -y python3-pip
RUN pip3 install matplotlib
RUN pip install tensorflow_hub
COPY . .
CMD [ "python3", "-u", "/usr/src/app/app.py" ]
EXPOSE 9000

Build the image.
```
s build --dockerfile ./code/Dockerfile
```
Deploy the code to Function Compute.
```
s deploy
```
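Note
If the service name and function name are unchanged when you run the preceding commands again, select the local configuration, that is, use local.

Configure provisioned instances.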

s provision put --target 1 --qualifier LATEST

Check whether the provisioned instance is ready.

s provision get --qualifier LATEST

If the value of current in the output is 1, the provisioned GPU instance is ready. Example:

[2022-06-21 11:53:19] [INFO] [FC] - Getting provision: tgpu_tf_service.LATEST/tgpu_tf_func
helloworld:
  serviceName:            tgpu_tf_service
  functionName:           tgpu_tf_func
  qualifier:              LATEST
  resource:               188077086902****#tgpu_tf_service#LATEST#tgpu_tf_func
  target:                 1
  current:                1
  scheduledActions:
    (empty array)
  targetTrackingPolicies:
    (empty array)
  currentError:
  alwaysAllocateCPU:      true
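
The readiness check described above (current equals target) can be sketched in Python as follows. The helper `provision_ready` is hypothetical and assumes that the text printed by s provision get keeps the `key: value` layout shown in the sample output. It is not an interface provided by Serverless Devs.

```python
def provision_ready(output: str) -> bool:
    """Return True when `current` equals `target` in the printed status.

    Hypothetical helper that parses the plain-text layout of the sample output.
    """
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in ("target", "current"):
            fields[key] = int(value.strip())
    return "target" in fields and fields.get("current") == fields["target"]


# Excerpt in the same layout as the sample output.
sample = """\
  qualifier:              LATEST
  target:                 1
  current:                1
"""
print(provision_ready(sample))
```

A script could call this helper in a polling loop and invoke the function only after it returns True.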

Invoke the function.

Check the online function version

s invoke
FC Invoke Result:
{"function": "tf_style_transfer"}

Run style transfer

s invoke -e '{"method":"GET","headers":{"RUN-MODE":"normal"}}'
generate stylized image ok
enter upload: https://your_public_oss/stylized-image.png  # Download this file to view the composite result
FC Invoke Result:
{"result": "transfer ok"}

Release the GPU instance.

s provision put --target 0 --qualifier LATEST

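Deploy the GPU application in the Function Compute console

Deploy the image.
1. Create a Container Registry Enterprise Edition instance or Personal Edition instance.
  An Enterprise Edition instance is recommended. For the procedure, see Create an Enterprise Edition instance.
2. Create a namespace and an image repository.
  For the procedure, see Step 2: Create a namespace and Step 3: Create an image repository.
3. In the Container Registry console, follow the on-screen instructions to complete the Docker-related steps. Then build an image from the sample app.py and Dockerfile and push it to the image repository of the instance. For the file contents, see app.py and Dockerfile in the /code directory described in Deploy the GPU application with Serverless Devs.
Create a service. For the procedure, see Create a service.
Create a function. For the procedure, see Create a Custom Container function.
Note
Set the instance type to GPU instance and the request handler type to Handle HTTP requests.
Modify the execution timeout of the function.
1. In the Actions column of the target function under the target service, click Configure.
2. In the Environment Information section, change the execution timeout, and then click Save.
Note
Style transfer can take longer than the default timeout of 60 seconds, so set the execution timeout to a larger value.
Configure provisioned GPU instances.
1. On the function details page, click the Auto Scaling tab, and then click Create Rule.
2. On the page for creating an auto scaling limit rule, configure the parameters as needed to reserve GPU instances, and then click Create.
  For details about configuring provisioned instances, see Configure auto scaling rules.
After the configuration is complete, check in the rule list whether the provisioned GPU instances are ready, that is, whether the current number of provisioned instances equals the number that you configured.
Use cURL to test the function.
1. On the function details page, click the Triggers tab, view the trigger configuration, and obtain the access address of the trigger.
2. Run the following commands on the command line to invoke the GPU function.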
  - Check the online function version
```
curl -v "https://tgpu-ff-console-tgpu-ff-console-ajezot****.cn-shenzhen.fcapp.run"
{"function": "tf_style_transfer"}
```
  - Run style transfer
```
curl "https://tgpu-fu-console-tgpu-se-console-zpjido****.cn-shenzhen.fcapp.run" -H 'RUN-MODE: normal'
{"result": "transfer ok"}
```
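
The cURL calls above can also be issued from Python. The following is a minimal sketch that uses only the standard library. The helper names `build_request` and `call_function` and the address `https://your-trigger-address.example.com` are placeholders introduced for illustration. The only detail taken from the sample is the RUN-MODE header that dispatch() in app.py reads.

```python
import json
import urllib.request


def build_request(url: str, run_mode: str = "ping") -> urllib.request.Request:
    """Build a GET request that carries the RUN-MODE header read by dispatch()."""
    return urllib.request.Request(url, headers={"RUN-MODE": run_mode}, method="GET")


def call_function(url: str, run_mode: str = "ping", timeout: float = 600.0) -> dict:
    """Send the request and decode the JSON body returned by the function."""
    with urllib.request.urlopen(build_request(url, run_mode), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Placeholder address: replace it with the access address of your HTTP trigger.
TRIGGER_URL = "https://your-trigger-address.example.com"
request = build_request(TRIGGER_URL, "normal")
# Inspect the request without sending it; call_function(TRIGGER_URL, "normal") would send it.
print(request.get_method(), request.full_url, request.header_items())
```

The 600-second client timeout matches the function timeout in the sample s.yaml. Adjust it if you configure a different execution timeout.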

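Verify the result

You can open the following URL in a browser to view the image produced by style transfer:

https://cri-zbtsehbrr8******-registry.oss-cn-shenzhen.aliyuncs.com/stylized-image.png

This URL is only an example. Use the actual address of your own bucket.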

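Object detection tutorial

When multiple objects appear in an image at the same time, object detection is used to draw rectangular bounding boxes around the objects of interest and to keep tracking them. Object detection applications are typically used to label and recognize large numbers of objects of different types. This example uses OpenCV DNN to detect multiple objects.

Detection result

In the following table, the left column shows the original image and the right column shows the detection result produced by OpenCV DNN. The result image shows the name and the confidence score of each detected object.

Original image	Detected objects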

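Prerequisites

- To keep your business running properly while you use GPU instances, join the DingTalk user group (group ID: 11721331) and provide the following information.
  - Your organization name, such as the name of your company.
  - Your Alibaba Cloud account ID.
  - The region where you want to use GPU instances, such as China (Shenzhen).
  - Your contact information, such as your mobile number, email address, or DingTalk account.
- In the region where the GPU instance resides, complete the following operations:
  - Create a Container Registry Enterprise Edition instance or Personal Edition instance. An Enterprise Edition instance is recommended. For the procedure, see Create an Enterprise Edition instance.
  - Create a namespace and an image repository. For the procedure, see Step 2: Create a namespace and Step 3: Create an image repository.
- Install Serverless Devs and its dependencies
- Configure Serverless Devs
- Compile OpenCV.
  OpenCV must be compiled with CUDA support to use GPU acceleration. You can obtain such a build in either of the following ways:
  - (Recommended) Use a prebuilt OpenCV through Docker. Download addresses: opencv-cuda-docker and cuda-opencv
  - Compile it yourself. For the procedure, see the compilation guide on the official website.
- Upload the images to be processed to an OSS bucket in the region where the GPU instance resides, and make sure that you have read and write permissions on the files in the bucket. For the upload procedure, see Upload files in the console. For permissions, see Modify the read and write permissions of a bucket.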

Procedure

Create a project.

s init devsapp/start-fc-custom-container-event-python3.9 -d fc-gpu-prj

The created project has the following directory structure.

fc-gpu-prj
├── code
│   ├── app.py        # Function code
│   └── Dockerfile    # Dockerfile that packages the code into an image
├── README.md
└── s.yaml            # Project configuration, including how the image is deployed to Function Compute

Go to the project directory.
```
cd fc-gpu-prj
```

Modify the parameters in the project files as needed.

Edit the s.yaml file.

For details about the YAML parameters, see YAML specification.

edition: 1.0.0
name: container-demo
access: default
vars:
  region: cn-shenzhen
services:
  customContainer-demo:
    component: devsapp/fc
    props:
      region: ${vars.region}
      service:
        name: tgpu_object_detect_service
        internetAccess: true
      function:
        name: tgpu_object_detect_func
        description: test gpu for opencv
        handler: not-used
        timeout: 600
        caPort: 9000
        memorySize: 16384
        gpuMemorySize: 8192
        instanceType: fc.gpu.tesla.1
        runtime: custom-container
        customContainerConfig:
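          # 1. Make sure that the namespace (namespace:demo) and repository (repo:gpu-object-detect_s) already exist in Alibaba Cloud Container Registry (ACR).
          # 2. When you update the function later, change the tag here, for example from v0.1 to v0.2, and then run s build && s deploy again.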
          image: registry.cn-shenzhen.aliyuncs.com/demo/gpu-object-detect_s:v0.1
        codeUri: ./code
      triggers:
        - name: httpTrigger
          type: http
          config:
            authType: anonymous
            methods:
              - GET

Edit the app.py file.

Example:

# -*- coding: utf-8 -*-
# Requires Python 3 (uses http.server and urllib.request).

from http.server import HTTPServer, BaseHTTPRequestHandler
import json
import os
import numpy as np
import cv2
import urllib.request

class Resquest(BaseHTTPRequestHandler):
    def download(self, url, path):
        print("enter download:", url)
        f = urllib.request.urlopen(url)
        with open(path, "wb") as local_file:
            local_file.write(f.read())

    def upload(self, url, path):
        print("enter upload:", url)
        headers = {
            'Content-Type': 'application/octet-stream',
            'Content-Length': os.stat(path).st_size,
        }
        req = urllib.request.Request(url, open(path, 'rb'), headers=headers, method='PUT')
        urllib.request.urlopen(req)

    def core(self):
        CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
               "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
               "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
               "sofa", "train", "tvmonitor"]
        COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

        print("[INFO] loading model...")
        prototxt = "/usr/src/app/m.prototxt.txt"
        model = "/usr/src/app/m.caffemodel"
        net = cv2.dnn.readNetFromCaffe(prototxt, model)

        msg = ""
        mode = ""
        if not cv2.cuda.getCudaEnabledDeviceCount():
            msg = "No CUDA-capable device is detected |"
        else:
            msg = "CUDA-capable device supported |"
            net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
            net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

        path = "/tmp/target.png"
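        # Replace the URL with an object in an OSS bucket under your own Alibaba Cloud account on which you have read and write permissions. This reads the image stored in your OSS bucket.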
        self.download("https://your_public_oss/a.png", path)
        image = cv2.imread(path)
        (h, w) = image.shape[:2]
        blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007843, (300, 300), 127.5)

        print("[INFO] computing object detections...")
        net.setInput(blob)
        detections = net.forward()

        # loop over the detections
        for i in np.arange(0, detections.shape[2]):
            confidence = detections[0, 0, i, 2]
            if confidence > 0.2:
                idx = int(detections[0, 0, i, 1])
                box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
                (startX, startY, endX, endY) = box.astype("int")
                cv2.rectangle(image, (startX, startY), (endX, endY), COLORS[idx], 2)
                x = startX + 10 if startY - 15 < 15 else startX
                y = startY - 15 if startY - 15 > 15 else startY + 20
                label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
                cv2.putText(image, label, (x, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)
                print("[INFO] {}".format(label))
        cv2.imwrite(path, image)

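        # Replace the URL with an object in an OSS bucket under your own Alibaba Cloud account on which you have read and write permissions. This stores the annotated result image in your OSS bucket.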
        self.upload("https://your_public_oss/target.jpg", path)
        msg = msg + " process image ok!"

        data = {'result': msg}
        self.send_response(200)
        self.send_header('Content-type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps(data).encode())

    def pong(self):
        data = {"function":"object-detection"}
        self.send_response(200)
        self.send_header('Content-type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps(data).encode())

    def dispatch(self):
        mode = self.headers.get('RUN-MODE')

        if mode == "ping":
            self.pong()
        elif mode == "normal":
            self.core()
        else:
            self.pong()

    def do_GET(self):
        self.dispatch()

    def do_POST(self):
        self.dispatch()

if __name__ == '__main__':
    host = ('0.0.0.0', 9000)
    server = HTTPServer(host, Resquest)
    print("Starting server, listen at: %s:%s" % host)
    server.serve_forever()
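
The detection loop in core() can be easier to reason about in isolation. Below is a minimal NumPy-only sketch of the same post-processing: it keeps rows whose confidence exceeds a threshold and scales the normalized box corners to pixel coordinates. The function name `decode_detections` and the synthetic `fake` array are illustrative assumptions. The row layout (class index at position 1, confidence at position 2, box at positions 3 to 6) is taken from how the sample indexes `detections`.

```python
import numpy as np


def decode_detections(detections: np.ndarray, width: int, height: int,
                      min_confidence: float = 0.2) -> list:
    """Convert a (1, 1, N, 7) detection array into pixel-space results.

    Each result is (class_id, confidence, (start_x, start_y, end_x, end_y)).
    """
    results = []
    for row in detections[0, 0]:
        confidence = float(row[2])
        if confidence > min_confidence:
            # Box corners are normalized to [0, 1]; scale them to the image size.
            box = (row[3:7] * np.array([width, height, width, height])).astype("int")
            results.append((int(row[1]), confidence, tuple(int(v) for v in box)))
    return results


# Synthetic output: one confident detection of class 12 ("dog" in CLASSES)
# and one detection of class 7 ("car") below the 0.2 threshold.
fake = np.array([[[[0, 12, 0.9, 0.1, 0.2, 0.5, 0.6],
                   [0, 7, 0.1, 0.0, 0.0, 1.0, 1.0]]]], dtype=np.float32)
print(decode_detections(fake, width=200, height=100))
```

Keeping this step free of OpenCV calls makes it possible to test the thresholding and box scaling without a GPU or a model file.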

Edit the Dockerfile.

Example:

FROM registry.cn-shanghai.aliyuncs.com/serverless_devs/opencv-cuda:cuda-10.2-opencv-4.2
WORKDIR /usr/src/app
RUN sed -i s@/archive.ubuntu.com/@/mirrors.aliyun.com/@g /etc/apt/sources.list
RUN apt-get clean
RUN apt-get update --fix-missing
RUN apt-get install -y build-essential
RUN apt-get install -y python3
COPY . .
CMD [ "python3", "-u", "/usr/src/app/app.py" ]
EXPOSE 9000

Download the following files and place them in the /code directory.
- m.caffemodel
- m.prototxt.txt
Build the image.
```
s build --dockerfile ./code/Dockerfile
```
Deploy the code to Function Compute.
```
s deploy
```
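Note
If the service name and function name are unchanged when you run the preceding commands again, select the local configuration, that is, use local.

Configure provisioned instances.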

s provision put --target 1 --qualifier LATEST

Check whether the provisioned instance is ready.

s provision get --qualifier LATEST

If the value of current in the output is 1, the provisioned GPU instance is ready. Example:

[2021-12-07 02:20:55] [INFO] [S-CLI] - Start ...
[2021-12-07 02:20:55] [INFO] [FC] - Getting provision: tgpu_object_detect_service.LATEST/tgpu_object_detect_func
customContainer-demo:
 serviceName:      tgpu_object_detect_service
 functionName:      tgpu_object_detect_func
 qualifier:       LATEST
 resource:        188077086902****#tgpu_object_detect_service#LATEST#tgpu_object_detect_func
 target:         1
 current:        1
 scheduledActions:    (empty array)
 targetTrackingPolicies: (empty array)

Invoke the function.

Check the online function version

s invoke
FC Invoke Result:
{"function": "object-detection"}

Run object detection

s invoke -e '{"method":"GET","headers":{"RUN-MODE":"normal"}}'
enter upload: https://your_public_oss/target.jpg # Download this file to view the inference result
FC Invoke Result:
{"result": "CUDA-capable device supported | process image ok!"}

Release the GPU instance.

s provision put --target 0 --qualifier LATEST

Verify the result

You can open the following URL in a browser to view the image produced by object detection:

https://cri-zbtsehbrr8******-registry.oss-cn-shenzhen.aliyuncs.com/target.jpg

This URL is only an example. Use the actual address of your own bucket.