Perform machine learning predictions with SQL
This topic describes how to use SQL in AnalyticDB for MySQL to deploy a Behavior Sequence Transformer (BST) model for machine learning tasks. It covers the entire workflow, from data transformation and model creation to training, evaluation, and prediction.
Use cases
You can use SQL queries for model training and inference, allowing you to perform data preprocessing, training, and inference entirely within SQL.
AnalyticDB for MySQL supports BST models. These models are suitable for applications that require understanding user behavior patterns, analyzing user preferences, predicting future trends, and implementing personalized recommendations. For example, in e-commerce or gaming, BST models can capture long-term dependencies in user behavior. This allows for more accurate predictions of user patterns and preferences, enabling you to deliver personalized services and recommendations.
Prerequisites
An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster is created.
Your cluster's kernel version must be 3.2.4.0 or later.
NoteTo view and update the minor version, go to the Configuration Information section on the Cluster Information page in the AnalyticDB for MySQL console.
Step 1: Create and bind resource groups
Running machine learning tasks with SQL requires two types of resource groups:
AI resource group: Manages the GPU resources required for compute-intensive operations such as model training and prediction.
Regular resource group: Processes standard SQL queries, such as those that generate training data or call prediction functions.
When you submit a SQL query, it is first sent to a regular resource group. If the system detects that the query requires AI computation, it automatically forwards the task to the bound AI resource group for execution.
To do this, create an AI resource group and bind it to a regular resource group.
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose Cluster Management > Resource Management, and then click the Resource Groups tab.
In the upper-right corner of the resource group list, click Create Resource Group to create an AI resource group.
Parameter
Description
Resource Group Name
The custom name of the resource group. The name must be 2 to 30 characters in length, start with a letter, and contain only letters, digits, and underscores (_).
Job Type
From the drop-down list, select AI.
ImportantIf the AI option is not available, the AI resource group feature is not enabled for your cluster. Contact technical support to enable it.
Specifications
A combination of GPU, CPU, and memory resources. You can select ADB.MLLarge.24, ADB.MLLarge.2, or ADB.MLAdvanced.6.
Minimum Resources
The minimum number of resource units.
Maximum Resources
The maximum number of resource units.
Click OK to create the resource group.
Find the target regular resource group. In the Actions column, click Modify. Use the ML Job Resubmission Rules setting to bind the regular resource group to the newly created AI resource group.
Step 2: Data transformation
Data transformation is the process of converting raw data into a format suitable for model training. This process depends on three key factors: the raw data's format, its storage location, and the model's input data requirements.
Table schema requirements for training data
Input feature column: This column must be a string and contain a series of comma-separated integers. Each entry represents a sample's feature vector.
Target label column: This column must contain one of two values, such as 0 or 1, to represent the class label.
Example: ('1,2,3',0), ('3,2,1',1).
Data transformation methods
If your raw data is not in the required format for model training, use the following method to transform it:
Prepare a JAR package: Package your Spark data transformation program into a JAR file and upload it to an OSS bucket.
Configure Spark job parameters: When you submit the Spark job, configure the required parameters. For more information, see Spark application configuration parameters.
If your raw data already meets the format requirements for model training, you can skip this step.
Step 3: Create and train the model
Create a machine learning model, define its parameters, specify the training data, and then check the model status.
In the left-side navigation pane, choose .
In the SQL Console, use the
CREATE MODELstatement to create and train the model.
-- Create the model.
-- Specify the resource group.
/*+resource_group=itrain*/
CREATE MODEL bstdemo.bst-- Model name
OPTIONS (
model_type='bst_classification', -- Model type
feature_cols=(event_list), -- Input feature column
target_cols=(target), -- Target column to predict
hyperparameters = ( -- Other model hyperparameters
use_best_ckpt = 'False',
early_stopping_patience='0'
)
)
AS SELECT event_list, target FROM bstdemo.adb; -- Specifies the data source for model training.
-- Check the model training status. Training is complete when the status is READY.
SHOW MODEL bstdemo.bst;Step 4: Evaluate the model
Evaluate the trained machine learning model to assess its performance.
-- The statement syntax is similar to that of CREATE MODEL.
/*resource_group=rg1*/
EVALUATE MODEL bstdemo.bst
OPTIONS (
feature_cols=(event_list),
target_cols=(target),
)
AS SELECT event_list, target FROM bstdemo.adb01;Step 5: Make predictions
Select feature columns from a table, pass the data to the trained model, and obtain prediction results based on the input data.
-- Use the trained model to make predictions.
-- The first parameter of ML_PREDICT is the model name, and subsequent parameters are the input columns.
SELECT ML_PREDICT('bstdemo.bst', event_list) FROM bstdemo.adb02;