Configure the MLflow Model Registry - Container Service for Kubernetes (ACK)

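MLflow is an open-source platform for managing the machine learning lifecycle. It can track model training information and manage and deploy machine learning models. This topic describes how to configure an MLflow Model Registry for the model management feature.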

About the MLflow Model Registry

For details about the MLflow Model Registry, see MLflow Model Registry in the MLflow documentation.

Prerequisites

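An ACK Pro cluster that runs Kubernetes 1.20 or later is created. For more information, see Create an ACK Pro cluster.
An RDS PostgreSQL instance is created. For more information, see Create an RDS PostgreSQL instance.
When you create the RDS PostgreSQL instance, we recommend that you select the VPC in which the ACK cluster resides and add the CIDR block of that VPC to the whitelist, so that the database can be accessed over its internal address. If the RDS instance and the ACK cluster are in different VPCs, make sure that public access is enabled for the RDS instance and add the CIDR block of the ACK cluster's VPC to the whitelist. For more information, see Configure a whitelist.
A standard account named mlflow is created on the RDS PostgreSQL instance. For more information, see Create an account.
A database named mlflow_store is created on the RDS PostgreSQL instance to store model metadata, and the mlflow account is set as its authorized account. For more information, see Create a database.
(Optional) A database named mlflow_basic_auth is created on the RDS PostgreSQL instance to store MLflow user authentication information, and the mlflow account is set as its authorized account. For more information, see Create a database.
The Arena client is configured for model management. Arena 0.9.14 or later is required. For more information, see Configure the Arena client.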

Step 1: Deploy MLflow in the ACK cluster

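Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of the target cluster. In the left-side navigation pane, choose Applications > Helm.
Click Create. On the Create page, set the application name to mlflow and the namespace to kube-ai. In the Chart section, search for and select mlflow, and then click Next. In the dialog box that appears, confirm whether to use mlflow as the default namespace of the chart.
- To manage models through the AI suite developer console, deploy MLflow in the kube-ai namespace and keep the release name as the default value, mlflow.
- To manage models through Arena, you can deploy MLflow in any namespace, but the release name must remain the default value, mlflow.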

On the Create page, configure the chart parameters.

Configure the defaultArtifactRoot and backendStore parameters, as shown in the following example.

trackingServer:
  # -- Specifies which mode mlflow tracking server run with, available options are `serve-artifacts`, `no-serve-artifacts` and `artifacts-only`
  mode: no-serve-artifacts
  # -- Specifies a default artifact location for logging, data will be logged to `mlflow-artifacts:/` if artifact serving is enabled, otherwise `./mlruns`
  defaultArtifactRoot: "./mlruns"
  
# For more information about how to configure backend store, please visit https://mlflow.org/docs/latest/tracking/backend-stores.html
backendStore:
  # -- Backend store uri e.g. `<dialect>+<driver>://<username>:<password>@<host>:<port>/<database>`
  backendStoreUri: postgresql+psycopg2://mlflow:<password>@pgm-xxxxxxxxxxxxxx.pg.rds.aliyuncs.com/mlflow_store

Replace backendStore.backendStoreUri with the connection URI of the mlflow_store database created in the prerequisites, for example postgresql+psycopg2://mlflow:<password>@pgm-xxxxxxxxxxxxxx.pg.rds.aliyuncs.com/mlflow_store.

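Important

If the RDS instance and the ACK cluster are in the same VPC, use the internal endpoint of the RDS instance. Otherwise, use the public endpoint of the RDS instance and make sure that the ACK cluster can reach it.

Log on to the RDS PostgreSQL console and choose Instance ID > Database Connection > Internal/Public Endpoint to obtain the database address, which has the form pgm-xxxxxxxxxxxxxx.pg.rds.aliyuncs.com.

For more information, see Connect to a database.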
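A password that contains characters such as @, : or / can break the URI unless those characters are percent-encoded. The following is a minimal Python sketch of assembling the URI safely. The helper name build_store_uri and the sample password are hypothetical and used only for illustration; the host is the placeholder endpoint from the example above.

```python
from urllib.parse import quote


def build_store_uri(user, password, host, database, port=None):
    """Assemble a SQLAlchemy-style PostgreSQL URI (hypothetical helper).

    The user name and password are percent-encoded so that reserved
    characters such as '@', ':' or '/' do not break URI parsing.
    """
    netloc = host if port is None else f"{host}:{port}"
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{netloc}/{database}"
    )


# Placeholder values: substitute the real account, password, and RDS endpoint.
uri = build_store_uri(
    user="mlflow",
    password="p@ss/word",  # illustrative password with reserved characters
    host="pgm-xxxxxxxxxxxxxx.pg.rds.aliyuncs.com",
    database="mlflow_store",
)
print(uri)
```

The resulting string can be pasted into backendStore.backendStoreUri. The same approach applies to any other database URI in the chart values.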

(Optional) To enable BasicAuth, configure the following parameters.

trackingServer:
  # -- Specifies which mode mlflow tracking server run with, available options are `serve-artifacts`, `no-serve-artifacts` and `artifacts-only`
  mode: no-serve-artifacts
  # -- Specifies a default artifact location for logging, data will be logged to `mlflow-artifacts:/` if artifact serving is enabled, otherwise `./mlruns`
  defaultArtifactRoot: "./mlruns"
  
  # Basic authentication configuration,
  # for more information, please visit https://mlflow.org/docs/latest/auth/index.html#configuration
  basicAuth:
    # -- Specifies whether to enable basic authentication
    enabled: true
    # -- Default permission on all resources, available options are `READ`, `EDIT`, `MANAGE` and `NO_PERMISSIONS`
    defaultPermission: NO_PERMISSIONS
    # -- Database location to store permissions and user data e.g. `<dialect>+<driver>://<username>:<password>@<host>:<port>/<database>`
    databaseUri: postgresql+psycopg2://<username>:<password>@pgm-xxxxxxxxxxxxxx.pg.rds.aliyuncs.com/mlflow_basic_auth
    # -- Default admin username if the admin is not already created
    adminUsername: admin
    # -- Default admin password if the admin is not already created
    adminPassword: password
    # -- Function to authenticate requests
    authorizationFunction: mlflow.server.auth:authenticate_request_basic_auth
    
# For more information about how to configure backend store, please visit https://mlflow.org/docs/latest/tracking/backend-stores.html
backendStore:
  # -- Backend store uri e.g. `<dialect>+<driver>://<username>:<password>@<host>:<port>/<database>`
  backendStoreUri: postgresql+psycopg2://mlflow:<password>@pgm-xxxxxxxxxxxxxx.pg.rds.aliyuncs.com/mlflow_store

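Replace trackingServer.basicAuth.databaseUri with the connection URI of the mlflow_basic_auth database created in the prerequisites, for example postgresql+psycopg2://<username>:<password>@pgm-xxxxxxxxxxxxxx.pg.rds.aliyuncs.com/mlflow_basic_auth.
Set trackingServer.basicAuth.adminUsername and trackingServer.basicAuth.adminPassword to configure the username and initial password of the MLflow administrator. The administrator is created only if it does not already exist.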
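Once BasicAuth is enabled, clients must send credentials with every request. The following Python sketch shows two common ways to supply them, under stated assumptions: the MLflow client reads the MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD environment variables, and plain HTTP clients send a standard Basic Authorization header. The helper name basic_auth_header is hypothetical, and the credentials are the sample values from the configuration above, which should be changed in a real deployment.

```python
import base64
import os


def basic_auth_header(username, password):
    """Return an HTTP Basic Authorization header (hypothetical helper)."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Option 1: environment variables that the MLflow client picks up.
# The values are the sample admin credentials from the chart values above.
os.environ["MLFLOW_TRACKING_USERNAME"] = "admin"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "password"

# Option 2: an explicit header for plain HTTP clients.
headers = basic_auth_header("admin", "password")
print(headers["Authorization"])
```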

For the complete list of MLflow parameters, see MLflow.

Step 2: Access the MLflow web UI deployed on Kubernetes

Run the following command to forward the MLflow web UI service to local port 5000.

kubectl port-forward -n kube-ai services/mlflow 5000

Expected output:

Forwarding from 127.0.0.1:5000 -> 5000
Forwarding from [::1]:5000 -> 5000
Handling connection for 5000
Handling connection for 5000
...

Open http://127.0.0.1:5000 in a browser to view the MLflow web UI.
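The forwarded port can also be checked from a script before opening the browser. The following Python sketch assumes that the MLflow tracking server exposes a /health endpoint that returns HTTP 200 when the server is up, and that the port-forward command above is still running. The function names health_url and is_healthy are hypothetical.

```python
import urllib.request


def health_url(base_url):
    """Build the health-check URL for a tracking server base URL."""
    return base_url.rstrip("/") + "/health"


def is_healthy(base_url, timeout=5.0):
    """Return True if the server answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    # Address of the local port-forward started in this step.
    print(is_healthy("http://127.0.0.1:5000"))
```

A result of False usually means that the port-forward is not running or that the MLflow pod is not ready yet.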

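What to do next: model management

The cloud-native AI suite supports managing models in the MLflow Model Registry. For information about how to use the cloud-native AI suite developer console and the Arena command-line tool to manage models, see Manage models in the MLflow Model Registry.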