Conditional random field prediction

The Conditional Random Field Prediction component performs sequence labeling, which assigns a label to each token in a sequence. Common use cases include named entity recognition (NER) and text chunking. The component uses a trained Linear Conditional Random Field (LinearCRF) model in Model I/O format and is configured in Machine Learning Designer (formerly Machine Learning Studio).

Configure parameters

Set the following parameters in Designer.

| Parameter | Description |
| --- | --- |
| Select the ID column | The column that holds the unique ID for each sample. Samples are stored as N-tuples. |
| Select a Feature Column | The columns that hold the word to annotate and its corresponding features. The column names you select must match the feature column names used during training. |
| Select the Target Column | Select the target column. |
| Prediction Result Column | The name of the output column for predicted labels. Default: prediction_result. |
| Prediction Score Column | The name of the prediction score column. Default: prediction_score. |
| Prediction Detail Column | The name of the prediction detail column. Leave blank if you don't need this output. |

Example

This example uses noun phrase (NP) chunking data to show the end-to-end flow from training data to online prediction.

Training data format

The training data table uses the following structure.

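| sentence_id | word | f1 | f2 | label |
| --- | --- | --- | --- | --- |
| 1 | Rockwell | NNP | POS | B-NP |
| 1 | International | NNP | NP | I-NP |
| 1 | Corp | NNP | PO | I-NP |
| 1 | 's | POS | NN | B-NP |
| ... | ... | ... | ... | ... |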

Input format

Each feature column in the training data becomes a separate field in the request JSON, keyed by the column name. Within a field, the values for the words of a sentence are joined with spaces in word order, so the value at a given position in every field refers to the same word.

{
  "inputs": [
    {
      "word": {
        "dataType": 50,
        "dataValue": "Rockwell International Corp 's ..."
      },
      "f1": {
        "dataType": 50,
        "dataValue": "NNP NNP NNP POS ..."
      },
      "f2": {
        "dataType": 50,
        "dataValue": "POS NP PO NN ..."
      }
    }
  ]
}

A dataType value of 50 denotes the string type.
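The mapping from table rows to the request body can be sketched as follows. This is a minimal illustration, not part of the component: the helper name build_request and the STRING_TYPE constant are hypothetical, and placing one sentence in one record of inputs is an assumption based on the sample above.

```python
import json
from typing import Dict, List, Sequence

STRING_TYPE = 50  # dataType code for the string type, as stated above


def build_request(rows: Sequence[Dict[str, str]], feature_columns: List[str]) -> dict:
    """Build a request body for one sentence (hypothetical helper).

    rows: one dict per word, in sentence order, keyed by column name.
    feature_columns: the feature column names used during training.
    """
    record = {}
    for column in feature_columns:
        values = [row[column] for row in rows]
        # Values are joined with spaces, so an empty value or one that
        # contains whitespace would shift the alignment between fields.
        for value in values:
            if not value or any(ch.isspace() for ch in value):
                raise ValueError(f"invalid value {value!r} in column {column!r}")
        record[column] = {"dataType": STRING_TYPE, "dataValue": " ".join(values)}
    # Assumption: one sentence per record in "inputs".
    return {"inputs": [record]}


rows = [
    {"word": "Rockwell", "f1": "NNP", "f2": "POS"},
    {"word": "International", "f1": "NNP", "f2": "NP"},
    {"word": "Corp", "f1": "NNP", "f2": "PO"},
    {"word": "'s", "f1": "POS", "f2": "NN"},
]
request = build_request(rows, ["word", "f1", "f2"])
print(json.dumps(request, indent=2))
```

Rejecting values that contain whitespace is a design choice of this sketch: because the fields are space-delimited, such a value would otherwise silently misalign the words and their features.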

Output format

The response includes prediction_result, prediction_score, and prediction_detail for every word in the input, returned as JSON in the dataValue field of outputValue. Each token's key is formed by joining its word and feature values with spaces (for example, "Rockwell NNP POS").

{
  "outputs": [
    {
      "outputLabel": "CRFProcessor_Result",
      "outputValue": {
        "dataType": 50,
        "dataValue": {
          "Rockwell NNP POS": {
            "prediction_result": "B-NP",
            "prediction_score": 0.99,
            "prediction_detail": {"B-ADJP": 0.000145, "B-NP": 0.99, ...}
          },
          "International NNP NP": ...
        }
      }
    }
  ]
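}

| Field | Type | Description |
| --- | --- | --- |
| prediction_result | string | The predicted label for the token. |
| prediction_score | float | The score of the predicted label for the token. |
| prediction_detail | object | The score of each candidate label for the token, keyed by label. |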
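Reading the per-token results back in input order can be sketched as below. The helper name extract_labels is hypothetical, and the sample response is truncated to the single token whose values appear in this document. Because dataType 50 is the string type, the sketch assumes that dataValue may arrive either as a JSON object or as a JSON-encoded string and accepts both.

```python
import json
from typing import Dict, List, Tuple


def extract_labels(response: dict, rows: List[Dict[str, str]],
                   feature_columns: List[str]) -> List[Tuple[str, str, float]]:
    """Return (word, predicted label, score) per input row (hypothetical helper)."""
    value = response["outputs"][0]["outputValue"]["dataValue"]
    # Assumption: the per-token map may be a JSON object or a JSON string.
    if isinstance(value, str):
        value = json.loads(value)
    results = []
    for row in rows:
        # The key is the word and its feature values joined with spaces.
        key = " ".join(row[column] for column in feature_columns)
        token = value[key]
        results.append((row["word"], token["prediction_result"],
                        token["prediction_score"]))
    return results


# Truncated sample: only the token documented above.
sample_response = {
    "outputs": [{
        "outputLabel": "CRFProcessor_Result",
        "outputValue": {
            "dataType": 50,
            "dataValue": {
                "Rockwell NNP POS": {
                    "prediction_result": "B-NP",
                    "prediction_score": 0.99,
                    "prediction_detail": {"B-ADJP": 0.000145, "B-NP": 0.99},
                }
            },
        },
    }]
}
sample_rows = [{"word": "Rockwell", "f1": "NNP", "f2": "POS"}]
labels = extract_labels(sample_response, sample_rows, ["word", "f1", "f2"])
print(labels)
```

Note that two tokens with identical word and feature values map to the same key in this sketch; how the service reports such tokens is not covered here.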

Error response

If the input format is incorrect, the response returns an error string instead of prediction data.

{
  "outputs": [
    {
      "outputLabel": "CRFProcessor_Result",
      "outputValue": {
        "dataType": 50,
        "dataValue": "Failed: The input format is incorrect"
      }
    }
  ]
}
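Since the error arrives in the same dataValue field as a successful result, a caller has to inspect the value before reading predictions. A minimal sketch of that check follows; the helper name error_message is hypothetical, and treating any value that is not a JSON object (or a JSON-encoded object) as an error is an assumption drawn from the two samples in this document.

```python
import json
from typing import Optional


def error_message(response: dict) -> Optional[str]:
    """Return the error string if the response carries one, else None."""
    value = response["outputs"][0]["outputValue"]["dataValue"]
    if isinstance(value, dict):
        return None  # per-token prediction map
    try:
        # Assumption: a successful payload might be a JSON-encoded object.
        if isinstance(json.loads(value), dict):
            return None
    except (TypeError, ValueError):
        pass  # not JSON, so treat the value as an error string
    return str(value)


failed = {
    "outputs": [{
        "outputLabel": "CRFProcessor_Result",
        "outputValue": {
            "dataType": 50,
            "dataValue": "Failed: The input format is incorrect",
        },
    }]
}
message = error_message(failed)
if message is not None:
    print("Prediction failed:", message)
```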