The Conditional Random Field Prediction component performs sequence labeling, which assigns a label to each token in a sequence. Common use cases include named entity recognition (NER) and text chunking. The component uses a trained Linear Conditional Random Field (LinearCRF) model in Model I/O format and is configured in Machine Learning Designer (formerly Machine Learning Studio).
Configure parameters
Set the following parameters in Designer.
| Parameter | Description |
|---|---|
| Select the ID column | The column that holds the unique ID for each sample. Samples are stored as N-tuples. |
| Select a Feature Column | The word to annotate and its corresponding features. The column names you select must match the feature column names used during training. |
| Select the Target Column | Select the target column. |
| Prediction Result Column | The name of the output column for predicted labels. Default: prediction_result. |
| Prediction Score Column | The name of the prediction score column. Default: prediction_score. |
| Prediction Detail Column | The name of the prediction detail column. Leave blank if you don't need this output. |
Example
This example uses noun phrase (NP) chunking data to show the end-to-end flow from training data to online prediction.
Training data format
The training data table uses the following structure.
| sentence_id | word | f1 | f2 | label |
|---|---|---|---|---|
| 1 | Rockwell | NNP | POS | B-NP |
| 1 | International | NNP | NP | I-NP |
| 1 | Corp | NNP | PO | I-NP |
| 1 | 's | POS | NN | B-NP |
| ... | ... | ... | ... | ... |
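The label column in the rows above uses B- and I- prefixes, which conventionally mark the beginning and the inside of a chunk. As a minimal sketch of how such labels group words into phrases, the helper below collects consecutive words into chunks. The function name `bio_to_chunks` is hypothetical, and treating any label without a B- or I- prefix as "outside a chunk" is an assumption, since the source does not list the full label set.

```python
def bio_to_chunks(words, labels):
    """Group words into (chunk_type, text) pairs from B-/I- prefixed labels."""
    chunks = []
    current_type, current_words = None, []
    for word, label in zip(words, labels):
        # "B-NP" -> ("B", "NP"); a label without a hyphen keeps an empty type.
        prefix, _, chunk_type = label.partition("-")
        # A chunk starts at every B- label, and at an I- label whose type
        # differs from the chunk currently being collected.
        starts_new = prefix == "B" or (prefix == "I" and chunk_type != current_type)
        if prefix not in ("B", "I") or starts_new:
            if current_words:
                chunks.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
        if prefix in ("B", "I"):
            current_type = chunk_type
            current_words.append(word)
    if current_words:
        chunks.append((current_type, " ".join(current_words)))
    return chunks


# The four rows shown in the training table.
words = ["Rockwell", "International", "Corp", "'s"]
labels = ["B-NP", "I-NP", "I-NP", "B-NP"]
chunks = bio_to_chunks(words, labels)
print(chunks)
```

For the four rows shown, this yields two noun phrases: one spanning the first three words and one that starts at 's.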
Input format
Each feature column in the training data becomes a separate field in the request JSON, using the same column name as the key. Features for different words within each field are separated by spaces.
{
"inputs": [
{
"word": {
"dataType": 50,
"dataValue": "Rockwell International Corp 's ..."
},
"f1": {
"dataType": 50,
"dataValue": "NNP NNP NNP POS ..."
},
"f2": {
"dataType": 50,
"dataValue": "POS NP PO NN ..."
}
}
]
}

dataType: 50 represents the string type.
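The request body described above can be assembled from per-token rows. The following is a minimal sketch, assuming the rows for one sentence are already available as dictionaries keyed by column name. The helper name `build_request` is hypothetical and not part of any SDK, and rejecting values that contain whitespace is a design choice made here so that the space-separated fields stay aligned.

```python
import json

STRING_TYPE = 50  # dataType value that represents the string type


def build_request(rows, feature_columns):
    """Build the request body for one sentence.

    rows: list of dicts, one per token, in sentence order.
    feature_columns: the feature column names used during training.
    """
    record = {}
    for column in feature_columns:
        values = [str(row[column]) for row in rows]
        for value in values:
            # An empty value or embedded whitespace would shift the
            # space-separated positions out of alignment across fields.
            if value.split() != [value]:
                raise ValueError(f"invalid value {value!r} in column {column!r}")
        record[column] = {"dataType": STRING_TYPE, "dataValue": " ".join(values)}
    return {"inputs": [record]}


# The four rows shown in the training table, without the label column.
rows = [
    {"word": "Rockwell", "f1": "NNP", "f2": "POS"},
    {"word": "International", "f1": "NNP", "f2": "NP"},
    {"word": "Corp", "f1": "NNP", "f2": "PO"},
    {"word": "'s", "f1": "POS", "f2": "NN"},
]
payload = build_request(rows, ["word", "f1", "f2"])
print(json.dumps(payload, indent=2))
```

Every field ends up with the same number of space-separated values, one per word, which is what keeps each word aligned with its features.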
Output format
The response includes prediction_result, prediction_score, and prediction_detail for every word in the input, returned as JSON within outputValue. Each token's output key is formed by concatenating its word and feature values (for example, "Rockwell NNP POS").
{
"outputs": [
{
"outputLabel": "CRFProcessor_Result",
"outputValue": {
"dataType": 50,
"dataValue": {
"Rockwell NNP POS": {
"prediction_result": "B-NP",
"prediction_score": 0.99,
"prediction_detail": {"B-ADJP": 0.000145, "B-NP": 0.99, ...}
},
"International NNP NP": ...
}
}
}
]
}

| Field | Type | Description |
|---|---|---|
| prediction_result | string | The predicted label for the token. |
| prediction_score | float | The score of the predicted label for the token. |
| prediction_detail | object | The prediction detail for the token: a map from candidate labels to their scores, as in the example above. |
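A client usually needs the per-token labels rather than the raw response. The sketch below walks the response structure shown above and returns one tuple per token. The function name `extract_predictions` is hypothetical. Whether dataValue arrives as a nested object or as a JSON-encoded string is not stated in the source, so the code accepts both as an assumption. The column names default to the documented defaults and should be changed if the component was configured with other names.

```python
import json


def extract_predictions(response,
                        result_col="prediction_result",
                        score_col="prediction_score"):
    """Return a list of (token_key, label, score) tuples from a response dict."""
    predictions = []
    for output in response.get("outputs", []):
        value = output["outputValue"]["dataValue"]
        if isinstance(value, str):
            # Assumption: a string payload holds JSON text. json.loads raises
            # ValueError if it does not (for example, an error message).
            value = json.loads(value)
        for token_key, fields in value.items():
            predictions.append((token_key, fields[result_col], fields[score_col]))
    return predictions


# Sample response modeled on the example above, limited to one token.
sample = {
    "outputs": [
        {
            "outputLabel": "CRFProcessor_Result",
            "outputValue": {
                "dataType": 50,
                "dataValue": {
                    "Rockwell NNP POS": {
                        "prediction_result": "B-NP",
                        "prediction_score": 0.99,
                        "prediction_detail": {"B-ADJP": 0.000145, "B-NP": 0.99},
                    }
                },
            },
        }
    ]
}
predictions = extract_predictions(sample)
print(predictions)
```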
Error response
If the input format is incorrect, the response returns an error string instead of prediction data.
{
"outputs": [
{
"outputLabel": "CRFProcessor_Result",
"outputValue": {
"dataType": 50,
"dataValue": "Failed: The input format is incorrect"
}
}
]
}
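Because a failed request keeps the same outer structure, a client has to inspect dataValue to tell an error from a prediction. The following is a minimal sketch of that check. The function name `get_error_message` is hypothetical, and matching on the "Failed:" prefix is an assumption based on the single example above; other error messages may be worded differently.

```python
def get_error_message(response):
    """Return the error string carried by a response dict, or None if absent."""
    for output in response.get("outputs", []):
        value = output.get("outputValue", {}).get("dataValue")
        # Assumption: error strings start with "Failed:", as in the example.
        if isinstance(value, str) and value.startswith("Failed:"):
            return value
    return None


error_response = {
    "outputs": [
        {
            "outputLabel": "CRFProcessor_Result",
            "outputValue": {
                "dataType": 50,
                "dataValue": "Failed: The input format is incorrect",
            },
        }
    ]
}
message = get_error_message(error_response)
print(message)
```

Running this check before reading any prediction fields avoids treating the error string as prediction data.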