Perform machine learning predictions with SQL

更新时间:
复制 MD 格式

This topic describes how to use SQL in AnalyticDB for MySQL to deploy a Behavior Sequence Transformer (BST) model for machine learning tasks. It covers the entire workflow, from data transformation and model creation to training, evaluation, and prediction.

Use cases

You can use SQL queries for model training and inference, allowing you to perform data preprocessing, training, and inference entirely within SQL.

AnalyticDB for MySQL supports BST models. These models are suitable for applications that require understanding user behavior patterns, analyzing user preferences, predicting future trends, and implementing personalized recommendations. For example, in e-commerce or gaming, BST models can capture long-term dependencies in user behavior. This allows for more accurate predictions of user patterns and preferences, enabling you to deliver personalized services and recommendations.

About BST models

The BST model is designed to process sequential behavior data. It takes a sequence of behavior event IDs as input and outputs a classification result of 0 or 1.

For example, in a game, a player's interactions can be recorded and converted into a series of behavior events.

A player might generate the following sequence of actions: Log in, Claim login reward, Accept quest, Battle, Battle, Battle, Complete quest, Top up, Battle, Log out.

This sequence of actions is converted into a sequence of event IDs, such as 0,1,2,3,3,3,4,5,3,6. This sequence is then passed to the BST model as a string. The model analyzes the sequence and returns a classification result of 0 or 1, indicating the predefined category of the behavior sequence.

Prerequisites

  • An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster is created.

  • Your cluster's kernel version must be 3.2.4.0 or later.

    Note

    To view and update the minor version, go to the Configuration Information section on the Cluster Information page in the AnalyticDB for MySQL console.

Step 1: Create and bind resource groups

Running machine learning tasks with SQL requires two types of resource groups:

  • AI resource group: Manages the GPU resources required for compute-intensive operations such as model training and prediction.

  • Regular resource group: Processes standard SQL queries, such as those that generate training data or call prediction functions.

When you submit a SQL query, it is first sent to a regular resource group. If the system detects that the query requires AI computation, it automatically forwards the task to the bound AI resource group for execution.

To do this, create an AI resource group and bind it to a regular resource group.

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.

  2. In the left-side navigation pane, choose Cluster Management > Resource Management, and then click the Resource Groups tab.

  3. In the upper-right corner of the resource group list, click Create Resource Group to create an AI resource group.

    Parameter

    Description

    Resource Group Name

    The custom name of the resource group. The name must be 2 to 30 characters in length, start with a letter, and contain only letters, digits, and underscores (_).

    Job Type

    From the drop-down list, select AI.

    Important

    If the AI option is not available, the AI resource group feature is not enabled for your cluster. Contact technical support to enable it.

    Specifications

    A combination of GPU, CPU, and memory resources. You can select ADB.MLLarge.24, ADB.MLLarge.2, or ADB.MLAdvanced.6.

    Minimum Resources

    The minimum number of resource units.

    Maximum Resources

    The maximum number of resource units.

  4. Click OK to create the resource group.

  5. Find the target regular resource group. In the Actions column, click Modify. Use the ML Job Resubmission Rules setting to bind the regular resource group to the newly created AI resource group.

Step 2: Data transformation

Data transformation is the process of converting raw data into a format suitable for model training. This process depends on three key factors: the raw data's format, its storage location, and the model's input data requirements.

Table schema requirements for training data

  • Input feature column: This column must be a string and contain a series of comma-separated integers. Each entry represents a sample's feature vector.

  • Target label column: This column must contain one of two values, such as 0 or 1, to represent the class label.

Example: ('1,2,3',0), ('3,2,1',1).

Data transformation methods

If your raw data is not in the required format for model training, use the following method to transform it:

  1. Prepare a JAR package: Package your Spark data transformation program into a JAR file and upload it to an OSS bucket.

  2. Configure Spark job parameters: When you submit the Spark job, configure the required parameters. For more information, see Spark application configuration parameters.

If your raw data already meets the format requirements for model training, you can skip this step.

Step 3: Create and train the model

Create a machine learning model, define its parameters, specify the training data, and then check the model status.

  1. In the left-side navigation pane, choose Job Development > SQL Development.

  2. In the SQL Console, use the CREATE MODEL statement to create and train the model.

-- Create the model.
-- Specify the resource group.
/*+resource_group=itrain*/
CREATE MODEL bstdemo.bst-- Model name
OPTIONS (
  model_type='bst_classification', -- Model type
  feature_cols=(event_list),  -- Input feature column
  target_cols=(target),  -- Target column to predict
  hyperparameters = (  -- Other model hyperparameters
    use_best_ckpt = 'False',
    early_stopping_patience='0'
  )
)
AS SELECT event_list, target FROM bstdemo.adb; -- Specifies the data source for model training.

-- Check the model training status. Training is complete when the status is READY.
SHOW MODEL bstdemo.bst;

Step 4: Evaluate the model

Evaluate the trained machine learning model to assess its performance.

-- The statement syntax is similar to that of CREATE MODEL.
/*resource_group=rg1*/
EVALUATE MODEL bstdemo.bst
OPTIONS (
    feature_cols=(event_list), 
    target_cols=(target),  
)
AS SELECT event_list, target FROM bstdemo.adb01;

Step 5: Make predictions

Select feature columns from a table, pass the data to the trained model, and obtain prediction results based on the input data.

-- Use the trained model to make predictions.
-- The first parameter of ML_PREDICT is the model name, and subsequent parameters are the input columns.
SELECT ML_PREDICT('bstdemo.bst', event_list) FROM bstdemo.adb02;