Machine Learning Designer provides a rich set of components for visually building and debugging models. The following example uses a heart disease prediction model to walk through the model building and debugging process in a Designer pipeline.
Prerequisites
You have created a pipeline.
Build a model
A model consists of multiple fine-grained tasks, represented as nodes (components), organized in a pipeline. Before you start building, plan your model by breaking it down into smaller tasks. Each node should perform a single, simple task. The general process is as follows:
-
In the component list on the left, find a component and drag it onto the canvas.
Components marked with a purple icon in the list are Alink components, such as the Read CSV File component. In addition to their standard functionality, Alink components can be grouped. Configuring resources for a group improves execution efficiency and resource utilization. For more information, see Alink components.
-
Click a target node to configure its parameters in the pane on the right.
-
Connect the nodes to organize them into a pipeline with upstream and downstream relationships.
Each node has one or more input or output ports. Hover over a port to view its data type, which helps you connect the nodes correctly.
When the model runs, upstream nodes execute first. A downstream node starts only after all of its upstream nodes have completed successfully.
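The execution rule above amounts to running the nodes in a topological order of the pipeline graph. The following minimal Python sketch illustrates the idea with the standard-library graphlib module (Python 3.9 or later). The node names and the dependency map are hypothetical, and this is not how Designer schedules nodes internally.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a node, and each value is the set of
# upstream nodes that must finish before the node can start.
pipeline = {
    "Read Table-1": set(),
    "SQL Script-1": {"Read Table-1"},
    "Type Conversion-1": {"SQL Script-1"},
    "Normalization-1": {"Type Conversion-1"},
    "Split-1": {"Normalization-1"},
    "Logistic Regression-1": {"Split-1"},
    "Prediction-1": {"Split-1", "Logistic Regression-1"},
}


def execution_order(graph):
    """Return one order in which every node follows all of its upstream nodes."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(pipeline)
for step, node in enumerate(order, start=1):
    print(step, node)
```

In this sketch, Prediction-1 has two upstream nodes, so it always appears after both Split-1 and Logistic Regression-1.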
Model building typically includes the following stages:
Read data
Add a Source/Destination component to your pipeline to read data from sources such as MaxCompute and Object Storage Service (OSS). For more information, refer to the documentation for specific components under Component Reference: Source/Destination. This example reads data from MaxCompute.
-
Create a table in MaxCompute and import your data. For more information, see Create and use MaxCompute tables.
This example creates a table named heartdisease in the test project and imports test data.
-- Create a table.
CREATE TABLE IF NOT EXISTS heartdisease(
    age STRING COMMENT 'The age of the patient.',
    sex STRING COMMENT 'The gender of the patient. Valid values: female or male.',
    cp STRING COMMENT 'The type of chest pain. Valid values (from most severe to least severe): typical, atypical, non-anginal, and asymptomatic.',
    trestbps STRING COMMENT 'Resting blood pressure.',
    chol STRING COMMENT 'Serum cholesterol.',
    fbs STRING COMMENT 'Fasting blood sugar. If the value is greater than 120 mg/dl, the value is true. Otherwise, the value is false.',
    restecg STRING COMMENT 'Resting electrocardiographic results. Valid values (from least severe to most severe): norm and hyp.',
    thalach STRING COMMENT 'The maximum heart rate achieved.',
    exang STRING COMMENT 'Indicates whether the patient has exercise-induced angina. true indicates yes, and false indicates no.',
    oldpeak STRING COMMENT 'ST depression induced by exercise relative to rest (the ST segment depression value).',
    slop STRING COMMENT 'The slope of the peak exercise ST segment. Valid values: down, flat, and up.',
    ca STRING COMMENT 'The number of major vessels found by fluoroscopy.',
    thal STRING COMMENT 'The type of defect. Valid values (from least severe to most severe): norm, fix, and rev.',
    `status` STRING COMMENT 'Indicates whether the patient has the disease. buff indicates healthy and sick indicates diseased.',
    style STRING);
-- For demonstration purposes, this example directly imports public test data from PAI.
INSERT INTO heartdisease select * from pai_online_project.heart_disease_prediction;
-
Drag the Read Table component onto the canvas to read data from the MaxCompute table.
A node named Read Table-1 is automatically created on the canvas. The number increments based on the order in which you add components of the same type, starting from 1.
-
In the node configuration pane, configure the source table name. For more information about the parameters, see Read Table.
On the canvas, select the Read Table-1 node. In the node configuration pane on the right, enter the corresponding MaxCompute table name in the Table Name field. For this example, enter heartdisease.
Note: When you read table data across MaxCompute projects, the table name must be in the project_name.table_name format (for example, test2.heartdisease), and you must have permissions for the project.
-
In the right-side configuration pane, switch to the Fields Information tab to view the field details of this public dataset.
Data preprocessing
After reading the data, preprocess it to meet the input requirements for model training or prediction. Machine Learning Designer provides a rich set of data preprocessing and large model data processing components.
In this example, the data preprocessing stage uses four components in sequence: SQL Script, Type Conversion, Normalization, and Split. These are placed between the Read Table and logistic regression for binary classification components.
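As a rough illustration of what the Normalization and Split steps do to the data, the sketch below assumes min-max scaling and a seeded random split. The function names, the 0.7 training fraction, and the sample values are hypothetical. The sketch does not describe the components' actual implementation or parameters.

```python
import random


def min_max_normalize(values):
    """Scale numbers to the range [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle rows with a fixed seed, then split them into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical cholesterol values, not taken from the dataset.
chol = [200.0, 250.0, 300.0, 400.0]
scaled = min_max_normalize(chol)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]

train, test = split_rows(range(10), train_fraction=0.7, seed=0)
print(len(train), len(test))  # 7 3
```

Fixing the seed keeps the split reproducible, which makes it easier to compare runs while debugging.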
You can also use the SQL Script component to run custom SQL on your features. For example, the following script maps the categorical string values of the input features to numeric codes:
select age,
(case sex when 'male' then 1 else 0 end) as sex,
(case cp when 'angina' then 0 when 'notang' then 1 else 2 end) as cp,
trestbps,
chol,
(case fbs when 'true' then 1 else 0 end) as fbs,
(case restecg when 'norm' then 0 when 'abn' then 1 else 2 end) as restecg,
thalach,
(case exang when 'true' then 1 else 0 end) as exang,
oldpeak,
(case slop when 'up' then 0 when 'flat' then 1 else 2 end) as slop,
ca,
(case thal when 'norm' then 0 when 'fix' then 1 else 2 end) as thal,
(case status when 'sick' then 1 else 0 end) as ifHealth
from ${t1};
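To make the effect of the CASE expressions concrete, the following Python sketch applies the same mappings to a single record. The sample record is hypothetical and not taken from the dataset, and the helper names are made up for illustration.

```python
# Mapping tables that mirror the CASE expressions in the SQL script above.
# Each entry is (explicit WHEN mappings, ELSE value).
ENCODINGS = {
    "sex": ({"male": 1}, 0),
    "cp": ({"angina": 0, "notang": 1}, 2),
    "fbs": ({"true": 1}, 0),
    "restecg": ({"norm": 0, "abn": 1}, 2),
    "exang": ({"true": 1}, 0),
    "slop": ({"up": 0, "flat": 1}, 2),
    "thal": ({"norm": 0, "fix": 1}, 2),
}

# Columns that the script selects without modification.
PASS_THROUGH = ("age", "trestbps", "chol", "thalach", "oldpeak", "ca")


def encode_row(row):
    """Encode one record the same way the SQL script does."""
    out = {col: row[col] for col in PASS_THROUGH}  # still strings at this point
    for col, (mapping, default) in ENCODINGS.items():
        out[col] = mapping.get(row[col], default)
    out["ifHealth"] = 1 if row["status"] == "sick" else 0
    return out


# Hypothetical sample record.
sample = {
    "age": "63", "sex": "male", "cp": "angina", "trestbps": "145",
    "chol": "233", "fbs": "true", "restecg": "hyp", "thalach": "150",
    "exang": "false", "oldpeak": "2.3", "slop": "down", "ca": "0",
    "thal": "fix", "status": "buff",
}
encoded = encode_row(sample)
print(encoded)
```

Any value that is not listed explicitly falls into the ELSE branch, so unexpected or misspelled categories are silently encoded with the default code. The pass-through columns remain strings after this step.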
Model training
A model component receives preprocessed data from upstream components and passes its output to downstream components such as prediction or inference components. Each component can have one or more input and output ports. Hover over a port to view its data type to ensure correct connections.
For example, the logistic regression for binary classification component has two output ports:
-
Logistic Regression Model: This output serves as the model input for components such as prediction.
-
PMML: Model deployment usually relies on Predictive Model Markup Language (PMML) models. For example, if you need to deploy a generated model by using a built-in Processor (such as the PMML Processor), you need to select Generate PMML in the parameters of a component that supports model generation, and then run the component.
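For readers who want to see what a binary logistic regression model learns, the following is a minimal sketch that fits weights with batch gradient descent in plain Python. It is an illustration only: the training data, learning rate, and epoch count are hypothetical, and the component's actual algorithm and parameters may differ.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(score) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the predicted probability of the positive class is at least 0.5."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical, already normalized training data with two features.
features = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.6], [0.9, 0.9]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(features, labels)
predictions = [predict(weights, bias, x) for x in features]
print(predictions)
```

The fitted weights and bias play the role of the model output that a downstream prediction node consumes.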
Model prediction or inference
After training a model, connect prediction or inference components to evaluate the model's performance.
For example, the prediction component has two input ports:
-
Model Input: Accepts the trained model from the model training section.
-
Prediction Data Input: Accepts the preprocessed test data.
Connect the split test data to the prediction data input port of the prediction node. Connect the logistic regression for binary classification model to the model input port. In the right-side configuration pane for the prediction node, set the Feature Columns by selecting the features to use for prediction, and configure the Pass-through Columns. Include the label column for later evaluation. Set Output Result Column Name to prediction_result, Output Score Column Name to prediction_score, and Output Detail Column Name to prediction_detail.
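The configuration above can be pictured as the following Python sketch, which produces the three output columns for one row. The column semantics are assumptions made for illustration: prediction_score is taken to be the probability of the predicted label, and prediction_detail a JSON string of per-class probabilities. The weights, bias, and sample row are hypothetical.

```python
import json
import math


def predict_row(row, feature_columns, passthrough_columns, weights, bias):
    """Score one row and return pass-through columns plus the prediction columns."""
    z = bias + sum(weights[c] * row[c] for c in feature_columns)
    p1 = 1.0 / (1.0 + math.exp(-z))  # probability of the positive class
    result = 1 if p1 >= 0.5 else 0
    out = {c: row[c] for c in passthrough_columns}
    out["prediction_result"] = result
    # Assumption: the score is the probability of the predicted label.
    out["prediction_score"] = p1 if result == 1 else 1.0 - p1
    # Assumption: the detail column holds per-class probabilities as JSON.
    out["prediction_detail"] = json.dumps({"0": 1.0 - p1, "1": p1})
    return out


# Hypothetical model parameters and test row (features already normalized).
weights = {"age": 2.0, "chol": 1.0}
bias = -1.5
row = {"age": 0.8, "chol": 0.5, "ifHealth": 1}

out = predict_row(row, ["age", "chol"], ["ifHealth"], weights, bias)
print(out)
```

Passing the label column (ifHealth here) through to the output is what lets a downstream evaluation node compare predictions with the true labels.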
Model evaluation
Evaluation components analyze model performance based on relevant metrics.
Machine Learning Designer provides the following evaluation components: Binary Classification Evaluation, Regression Evaluation, Clustering Evaluation, Confusion Matrix, and Multiclass Classification Evaluation. On the pipeline canvas, connect the required evaluation node downstream of the prediction node and run it. Then, click the visualization icon in the toolbar to view the evaluation results.
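As a reference for reading the evaluation results, the following sketch computes a confusion matrix and a few standard binary classification metrics in plain Python. The labels and predictions are hypothetical, and the set of metrics that each evaluation component reports may differ.

```python
def confusion_counts(labels, predictions, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification result."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, predictions):
        if p == positive and y == positive:
            tp += 1
        elif p == positive and y != positive:
            fp += 1
        elif p != positive and y != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def binary_metrics(labels, predictions, positive=1):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(labels, predictions, positive)
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical true labels (1 = sick) and predicted labels.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
predictions = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(labels, predictions)
metrics = binary_metrics(labels, predictions)
print(counts)   # (3, 1, 3, 1)
print(metrics)
```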
Debug a model
Debug and run
-
Entire pipeline: Click the Run icon in the upper-left corner of the canvas to run the entire pipeline. For complex pipelines, we recommend that you run a single node or a subset of nodes to facilitate debugging.
-
Single or multiple components: Right-click a component to run a single node or a part of the pipeline. Multiple run options are available.
The run options include Run This Node, Run from Here, Run to Here, Run from Root to Here, and Run Downstream Nodes.
A success icon appears on a component that runs successfully, and a failure icon appears on a component that fails. You can right-click the component to view logs and results.
View logs and results
-
After a component runs successfully, you can right-click it and select View Data to view the output data.
For some components, Machine Learning Designer can convert data into graphs and charts. This visualization of complex data and analysis results helps you quickly identify key information, trends, and patterns. You can also click Visual Analysis or the visualization icon at the top of the canvas to perform a visual analysis. For more information, see Visual Analysis.
-
View logs: If a component fails to run, you can right-click it and select View Log to identify the root cause of the failure.
View running tasks
Click View All Tasks in the upper-right corner of the canvas to view all historical task runs. Each run during the modeling process is recorded as a historical task, preserving the involved nodes, their configurations, and their output.
In the Historical Tasks panel that appears, the Actions column for each task provides links for Details, Model, version rollback, and Share. Click Details to view the node configurations and output results for the task.
On the task details page, click a target node, such as SQL Script-3, on the canvas on the left. The pane on the right shows the details of the node. Select the Task Log tab to view the execution log. You can also switch to the Run Information or Output Results tab to view their respective content.
If you need to perform a version rollback, review the details of the historical task to confirm that you are rolling back to the correct version. Before you roll back, save and run your latest task. This creates a record of the current state that you can revert to if needed.
Related topics
-
After you debug your model, you can register the trained model as a new model and manage it. For more information, see Register and manage models.
-
After you debug your model, you can deploy it for online prediction. For more information, see Model prediction and deployment.
-
Machine Learning Designer provides an Update EAS (Beta) component to update model services. For more information, see Periodically update online model services.
-
You can use DataWorks to schedule offline pipeline runs and periodically update models. For more information, see Use DataWorks to schedule Designer pipelines offline.
-
For more information about components, see Component Reference.