Conditional Random Field Prediction component: usage and examples

The Conditional Random Field Prediction component runs sequence labeling tasks, which assign a label to each token in a sequence. Common use cases include named entity recognition (NER) and text chunking. The component uses a trained Linear Conditional Random Field (LinearCRF) model in Model I/O format and is configured in Machine Learning Designer (formerly Machine Learning Studio).

Configure parameters

Set the following parameters in Designer.

Parameter	Description
Select the ID column	The column that holds the unique ID for each sample. Samples are stored as N-tuples.
Select a Feature Column	The word to annotate and its corresponding features. The column names you select must match the feature column names used during training.
Select the Target Column	The column that holds the label for each token, such as `label` in the example below.
Prediction Result Column	The name of the output column for predicted labels. Default: `prediction_result`.
Prediction Score Column	The name of the output column for prediction scores. Default: `prediction_score`.
Prediction Detail Column	The name of the output column for prediction details. Leave blank if you don't need this output.

Example

This example uses noun phrase (NP) chunking data to show the end-to-end flow from training data to online prediction.

Training data format

The training data table uses the following structure.

sentence_id	word	f1	f2	label
1	Rockwell	NNP	POS	B-NP
1	International	NNP	NP	I-NP
1	Corp	NNP	PO	I-NP
1	's	POS	NN	B-NP
...	...	...	...	...

Input format

Each feature column in the training data becomes a separate field in the request JSON, using the same column name as the key. Within each field, the values for the words of a sentence are listed in word order and separated by spaces.

{
  "inputs": [
    {
      "word": {
        "dataType": 50,
        "dataValue": "Rockwell International Corp 's ..."
      },
      "f1": {
        "dataType": 50,
        "dataValue": "NNP NNP NNP POS ..."
      },
      "f2": {
        "dataType": 50,
        "dataValue": "POS NP PO NN ..."
      }
    }
  ]
}

A `dataType` value of 50 denotes the string type.
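The mapping from per-token rows to the request body can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper name `build_request` and the list-of-dicts row layout are assumptions made for this example, and only the JSON shape and the `dataType` value come from the sample above.

```python
import json

STRING_TYPE = 50  # dataType value that the sample request uses for strings


def build_request(rows, feature_columns):
    """Build the request body for one sentence (hypothetical helper).

    rows: list of dicts, one per token, in sentence order.
    feature_columns: feature column names used during training.
    """
    sample = {}
    for column in feature_columns:
        # Join the per-token values of one column with single spaces.
        values = " ".join(str(row[column]) for row in rows)
        sample[column] = {"dataType": STRING_TYPE, "dataValue": values}
    return {"inputs": [sample]}


# The four tokens shown in the training data table.
rows = [
    {"word": "Rockwell", "f1": "NNP", "f2": "POS"},
    {"word": "International", "f1": "NNP", "f2": "NP"},
    {"word": "Corp", "f1": "NNP", "f2": "PO"},
    {"word": "'s", "f1": "POS", "f2": "NN"},
]

request_body = build_request(rows, ["word", "f1", "f2"])
print(json.dumps(request_body, indent=2))
```

The column order passed to `build_request` does not affect the result, because each column becomes its own keyed field.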

Output format

The response includes `prediction_result`, `prediction_score`, and `prediction_detail` for every word in the input, returned as JSON within `outputValue`. Each token's output key is formed by joining its word and feature values with spaces (for example, "Rockwell NNP POS").

{
  "outputs": [
    {
      "outputLabel": "CRFProcessor_Result",
      "outputValue": {
        "dataType": 50,
        "dataValue": {
          "Rockwell NNP POS": {
            "prediction_result": "B-NP",
            "prediction_score": 0.99,
            "prediction_detail": {"B-ADJP": 0.000145, "B-NP": 0.99, ...}
          },
          "International NNP NP": ...
        }
      }
    }
  ]
}

Field	Type	Description
`prediction_result`	string	The predicted label for the token.
`prediction_score`	float	The score of the predicted label.
`prediction_detail`	object	The score of each candidate label, keyed by label name.
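Reading the per-token results back out of a response might look like the sketch below. It assumes the response body has already been decoded into Python dicts so that `dataValue` is an object, as in the sample above. The helper names `token_key` and `extract_predictions` are hypothetical, and the sample response contains only the one token and the two detail entries that the sample shows in full.

```python
def token_key(word, *features):
    """Rebuild a token's output key: the word and its feature values, space-joined."""
    return " ".join((word, *features))


def extract_predictions(response):
    """Return {output key: (predicted label, score)} for a successful response."""
    # Assumption: dataValue is already a dict keyed by token output key.
    data = response["outputs"][0]["outputValue"]["dataValue"]
    return {
        key: (value["prediction_result"], value["prediction_score"])
        for key, value in data.items()
    }


# Trimmed response: one token, and only the detail entries visible in the sample.
response = {
    "outputs": [
        {
            "outputLabel": "CRFProcessor_Result",
            "outputValue": {
                "dataType": 50,
                "dataValue": {
                    "Rockwell NNP POS": {
                        "prediction_result": "B-NP",
                        "prediction_score": 0.99,
                        "prediction_detail": {"B-ADJP": 0.000145, "B-NP": 0.99},
                    }
                },
            },
        }
    ]
}

predictions = extract_predictions(response)
label, score = predictions[token_key("Rockwell", "NNP", "POS")]
print(label, score)
```

Looking results up by key requires the same word and feature values, in the same order, that were sent in the request.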

Error response

If the input format is incorrect, the response returns an error string instead of prediction data.

{
  "outputs": [
    {
      "outputLabel": "CRFProcessor_Result",
      "outputValue": {
        "dataType": 50,
        "dataValue": "Failed: The input format is incorrect"
      }
    }
  ]
}
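Because a failed request has the same envelope as a successful one, a caller has to inspect `dataValue` to tell them apart. The sketch below is one way to do that under stated assumptions: the `PredictionError` class and the `unwrap_data_value` helper are hypothetical names, and the JSON-decoding branch covers the possibility (not confirmed by the samples) that a successful payload arrives as a JSON-encoded string rather than an object.

```python
import json


class PredictionError(Exception):
    """Hypothetical exception raised when the service returns an error string."""


def unwrap_data_value(response):
    """Return the prediction dict, or raise PredictionError for an error string."""
    data = response["outputs"][0]["outputValue"]["dataValue"]
    if isinstance(data, dict):
        return data
    # Assumption: a string may be either an error message or JSON-encoded predictions.
    try:
        parsed = json.loads(data)
    except (TypeError, ValueError):
        parsed = None
    if isinstance(parsed, dict):
        return parsed
    raise PredictionError(str(data))


error_response = {
    "outputs": [
        {
            "outputLabel": "CRFProcessor_Result",
            "outputValue": {
                "dataType": 50,
                "dataValue": "Failed: The input format is incorrect",
            },
        }
    ]
}

try:
    unwrap_data_value(error_response)
except PredictionError as exc:
    print("prediction failed:", exc)
```

Raising an exception keeps the error string from being mistaken for prediction data further down the pipeline.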