Data preparation

Models learn patterns from labeled data, so the first step is to prepare a labeled dataset. In a product review parsing and classification task, for example, each data entry consists of a piece of text and one or more pairs of a property dimension and its sentiment. There are four sentiment categories: Positive, Neutral, Negative, and Not Mentioned. The "Not Mentioned" category is optional. Prepare at least 100 training data entries for each category before testing. Then convert the training data into the format defined by the NLP Self-Learning Platform. In a JSON file, for example, each data entry has the following format:

{
    "1": {
        "content": "It's great. I bought too much of it, so I trimmed it myself with a knife. It still looks good.",
        "records": {
            "Overall": [
                "Positive"
            ],
            "Appearance Design": [
                "Positive"
            ]
        }
    }
}
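
A file in this layout can also be generated from a script. The following is a minimal sketch, not an official tool of the platform. It assumes that the reviews are available as (text, labels) pairs, that entry IDs are sequential strings starting at "1" as in the example above, and that each property dimension carries one sentiment label. The helper name `build_dataset` is hypothetical.

```python
import json


def build_dataset(samples):
    """Convert (text, {dimension: sentiment}) pairs into the layout shown above.

    Hypothetical helper: entry IDs are assumed to be sequential strings.
    """
    dataset = {}
    for index, (text, labels) in enumerate(samples, start=1):
        dataset[str(index)] = {
            "content": text,
            # Each dimension maps to a list that holds its sentiment label,
            # mirroring the example entry.
            "records": {
                dimension: [sentiment]
                for dimension, sentiment in labels.items()
            },
        }
    return dataset


samples = [
    (
        "It's great. I bought too much of it, so I trimmed it myself "
        "with a knife. It still looks good.",
        {"Overall": "Positive", "Appearance Design": "Positive"},
    ),
]

dataset = build_dataset(samples)

# ensure_ascii=False keeps non-ASCII review text readable in the output.
serialized = json.dumps(dataset, ensure_ascii=False, indent=4)
```

The `serialized` string can then be written to a UTF-8 encoded `.json` file for upload.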

Place the text in the "content" field and the labels in the "records" field. The "content" field must be a string. The "records" field must be an object. In the "records" object, each key represents a property dimension and must be a string. Each value represents the sentiment for that dimension. As in the example above, the value is an array that contains the sentiment label as a string, such as "Positive", "Neutral", "Negative", or "Not Mentioned". The "Not Mentioned" label is optional.
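
Before uploading, it can help to check the data against the rules above. The following is a minimal sketch under stated assumptions, not a validator provided by the platform. The function names `validate_entry`, `count_labels`, and `underrepresented` are hypothetical. The sketch assumes that each value in "records" is an array of label strings, as in the example entry, and it counts label occurrences across all dimension-sentiment pairs, which is one possible reading of the 100-entries-per-category guideline.

```python
from collections import Counter

ALLOWED_SENTIMENTS = {"Positive", "Neutral", "Negative", "Not Mentioned"}
REQUIRED_SENTIMENTS = {"Positive", "Neutral", "Negative"}  # "Not Mentioned" is optional
MIN_PER_CATEGORY = 100


def validate_entry(entry):
    """Return a list of problems found in one data entry (empty if none)."""
    if not isinstance(entry, dict):
        return ["entry must be an object"]
    errors = []
    if not isinstance(entry.get("content"), str):
        errors.append('"content" must be a string')
    records = entry.get("records")
    if not isinstance(records, dict):
        errors.append('"records" must be an object')
        return errors
    for dimension, labels in records.items():
        # Assumption: the value is a non-empty array of label strings.
        if not isinstance(labels, list) or not labels:
            errors.append(f'"{dimension}" must map to a non-empty array')
            continue
        for label in labels:
            if label not in ALLOWED_SENTIMENTS:
                errors.append(f'"{dimension}" has unknown label {label!r}')
    return errors


def count_labels(dataset):
    """Count how often each sentiment label occurs across valid entries."""
    counts = Counter()
    for entry in dataset.values():
        if validate_entry(entry):
            continue  # skip malformed entries
        for labels in entry["records"].values():
            counts.update(labels)
    return counts


def underrepresented(counts, minimum=MIN_PER_CATEGORY):
    """List categories that are required or present but below the minimum."""
    categories = REQUIRED_SENTIMENTS | set(counts)
    return sorted(c for c in categories if counts[c] < minimum)
```

For a dataset loaded with `json.load`, calling `underrepresented(count_labels(dataset))` lists the categories that still need more training data.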