Data pre-processing

Text data often contains noise that is irrelevant to your task. You can pre-process the text to remove this noise.

The NLP Self-Learning Platform provides several built-in pre-processing rules. You can enable any combination of the following:

  1. Remove URL links.

  2. Remove emoji.

  3. Convert English uppercase letters to lowercase.

  4. Convert Traditional Chinese characters to Simplified Chinese.
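
To get a feel for what these rules do to your text, the first three can be approximated locally. The following is a minimal sketch, not the platform's actual implementation: the function name `preprocess`, its flags, and both regular expressions are assumptions made for illustration. The emoji pattern covers only the most common Unicode blocks. Rule 4 is omitted because it needs a character mapping table, which typically comes from a third-party conversion library such as OpenCC.

```python
import re
import string

# Assumption: a URL starts with http://, https:// or www. and continues
# through ASCII URL characters, so adjacent Chinese text is left intact.
URL_RE = re.compile(r"(?:https?://|www\.)[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")

# Assumption: an approximate emoji pattern covering common blocks only.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector and zero-width joiner
    "]+"
)

# Lowercase only the English letters A-Z; other scripts are untouched.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def preprocess(text, remove_urls=True, remove_emoji=True, lowercase=True):
    """Apply local approximations of rules 1-3, each switchable by a flag."""
    if remove_urls:
        text = URL_RE.sub("", text)
    if remove_emoji:
        text = EMOJI_RE.sub("", text)
    if lowercase:
        text = text.translate(ASCII_LOWER)
    return text


sample = "Check https://example.com NOW 😀"
print(repr(preprocess(sample)))
```

Each rule has its own flag so that you can compare the effect of different rule combinations on the same sample text.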

If the platform does not provide a pre-processing rule that you need, you can process the data before you upload it. We encourage you to send us feedback so we can add more rules.
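
As an illustration of processing data yourself before upload, the sketch below applies a custom rule that is not in the built-in list: stripping HTML markup and collapsing whitespace. The rule itself, the function name `strip_markup`, and the sample strings are hypothetical. Substitute whatever cleanup your own data needs.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")   # any HTML/XML-style tag
WS_RE = re.compile(r"\s+")        # runs of whitespace, including non-breaking spaces


def strip_markup(text):
    """Remove tags, decode HTML entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)      # replace tags with a space so words do not merge
    text = html.unescape(text)        # e.g. &amp; becomes &
    return WS_RE.sub(" ", text).strip()


# Hypothetical raw samples, cleaned before they are uploaded.
raw_samples = ["<p>Good&nbsp;product &amp; fast</p>", "<br/>Arrived   late"]
cleaned = [strip_markup(s) for s in raw_samples]
print(cleaned)
```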

The goal of pre-processing is to remove information that is not useful for classification. However, be careful not to remove information that the model needs. For example, emoji often indicate sentiment, so do not remove emoji for sentiment classification tasks.
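
The emoji case can be made concrete with a small sketch. The two sample sentences and the approximate emoji pattern are assumptions for illustration. The point is that once emoji are removed, two texts with opposite sentiment can become indistinguishable.

```python
import re

# Assumption: an approximate emoji pattern covering common blocks only.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]+")


def strip_emoji(text):
    """Remove emoji and trim leftover surrounding whitespace."""
    return EMOJI_RE.sub("", text).strip()


positive = "The delivery arrived today 😍"
negative = "The delivery arrived today 😡"

# Before removal the texts differ; after removal they are identical,
# so a classifier can no longer tell them apart.
print(positive == negative)                              # False
print(strip_emoji(positive) == strip_emoji(negative))    # True
```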