Product introduction - Natural Language Processing (NLP) - Alibaba Cloud Help Center

This document introduces the NLP Self-Learning Platform.

Product overview

The NLP Self-Learning Platform is designed for users without an algorithm background. It provides automated, industry-specific annotation, training, and services. The platform supports customizable Natural Language Processing (NLP) capabilities, such as entity extraction, text classification, key phrase extraction, sentiment analysis, relation extraction, short text matching, and product review analysis. You can obtain high-quality NLP models by annotating or uploading a sufficient amount of data.

Features

The NLP Self-Learning Platform includes the following features:

Basic self-learning models: You can train models for core NLP capabilities, such as entity extraction, text classification, key phrase extraction, relation extraction, short text matching, sentence pair classification, and conversational text classification.

Model name	Model description	Maximum text length
Text classification	Classifies text based on content type. For example, in a text message scenario, a carrier classifies messages as pornographic, terrorism-related, political, or advertising based on their content. The platform supports custom classification models based on user-defined classification systems.	/
Entity extraction	Extracts entities with specific meanings from text. For example, in a contract review scenario, you can extract entities such as contract name, party A, party B, and receiving account number to quickly structure many contracts. The platform supports custom entity extraction models based on user-defined entity types.	/
Key phrase extraction	Extracts keywords and key phrases as labels by using the TextRank algorithm. This project type does not require you to upload annotated data. The algorithm automatically analyzes text features to extract key phrases. You can also upload a custom dictionary to optimize key phrase extraction for specific domains.	500 characters
Relation extraction	Extracts entities and their corresponding relationships from text, such as a person's name and their birthday, or an organization and its founding date. The platform supports custom relation extraction models based on user-defined relationship systems.	/
Short text matching	Calculates the similarity between different texts and outputs a score between 0 and 1. A higher score indicates greater similarity. The platform supports custom short text matching models based on user-defined datasets.	/
Sentence pair classification	Classifies a pair of sentences by content type, supporting both single-choice and multiple-choice classification. Common scenarios include determining if two sentences are semantically equivalent, checking if a question and an answer match, and performing contextual single-sentence classification.	/
Conversational text classification	Classifies an entire conversation by content type, supporting both single-choice and multiple-choice classification. Common scenarios include conversation quality inspection, customer intent recognition, and telesales lead mining.	/
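
The key phrase extraction row above names the TextRank algorithm. The following is a minimal, simplified sketch of that idea in Python, not the platform's implementation: the function name `textrank_keywords`, the tokenizer, and the `window`, `damping`, and `iterations` parameters are all assumptions made for illustration. It builds a word co-occurrence graph and ranks words with a PageRank-style update.

```python
import re
from collections import defaultdict


def textrank_keywords(text, top_k=5, window=3, damping=0.85, iterations=50):
    """Rank words with a simplified TextRank over a co-occurrence graph."""
    # Hypothetical tokenizer: lowercase alphabetic words longer than 2 characters.
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 2]

    # `window` counts the word itself, so window=3 links each word
    # to the next two words that follow it.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    graph = dict(neighbors)

    # PageRank-style update on the undirected graph, all scores starting at 1.0.
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(graph[n]) for n in graph[word])
            for word in graph
        }

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

A call such as `textrank_keywords(document_text, top_k=10)` returns up to ten candidate keywords. This sketch ranks single words only; it does not merge adjacent words into phrases or apply a custom dictionary, both of which the platform description mentions.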

Industry-specific self-learning models: You can train models for industry-oriented NLP capabilities, such as sentiment analysis, product review analysis, resume extraction, bid information extraction, and contract element extraction.

Model name	Model description	Maximum text length
Sentiment analysis	Analyzes text to determine its positive or negative sentiment. The platform supports custom sentiment analysis models based on user-defined datasets.	/
Product review analysis	Builds custom models for various industries based on massive amounts of annotated data from Alibaba's e-commerce platforms to perform multi-dimensional analysis of product reviews. The platform supports custom product review analysis models based on user-defined review dimensions.	500 characters
Resume extraction	Uses models and a rules engine trained on massive amounts of internal annotated data from Alibaba to achieve high-accuracy extraction from Chinese and English resumes. It supports 27 common Chinese fields and 10 common English fields. For other custom fields, you can add annotated data for custom training.	/
Bid information extraction	Intelligently parses bidding documents and automatically extracts over 20 fields, such as bid amount, bidding entity, and subject matter. This helps in reviewing bid submission documents and increases the bid-win rate.	/
Contract element extraction	Extracts specific or key elements from contracts. Supported formats include text-based PDF and Word. The more data you annotate, the better the results.	/

Application algorithm self-learning models: You can train models for natural language algorithm capabilities, such as contract extraction and judicial document analysis (fact-finding).

Model name	Model description	Maximum text length
Contract extraction	Extracts entities from contract text. It has over 20 built-in entity labels that do not require annotation, reducing the data annotation cost for model training to less than 20% of the original cost.	/
Judicial document analysis (fact-finding)	Extracts fact-finding entities from judicial documents. It has over 10 built-in entity labels that do not require annotation, reducing the data annotation cost for model training to less than 50% of the original cost.	/

Pre-trained models (ready to call): The platform provides pre-trained models that are ready to call for capabilities such as product review analysis (e-commerce and local services), telesales conversation analysis (classification, threat, and fraud detection), news classification, news event extraction (English), multilingual sentiment analysis, and judicial document extraction.

Model name	Model description	Maximum text length
Product review analysis - E-commerce	Supports 55 e-commerce industries and 192 review attributes for multi-dimensional analysis of product reviews.	500 characters
Product review analysis - Local services	Supports 2 local service industries (beauty/hair/nails and food/dining) and 11 review attributes for multi-dimensional analysis of product reviews.	500 characters
Product review analysis - Automotive	Supports 68 review attributes in the automotive industry for multi-dimensional analysis of product reviews.	500 characters
Purchase decision analysis for product reviews - E-commerce	Analyzes purchase decision information, such as user motivations, usage scenarios, feature requirements, and questions. This helps improve products, enhance user experience, segment user profiles, and target marketing campaigns.	500 characters
Purchase decision analysis for product reviews - Automotive	Analyzes purchase decision information, such as user motivations, usage scenarios, feature requirements, and questions. This helps improve products, enhance user experience, segment user profiles, and target marketing campaigns.	500 characters
Bid announcement classification service	Classifies bid announcements. It currently supports two types: "invitation to bid" and "winning bid".	/
Bid information extraction - Basic Plan	Extracts 13 fields from bid information, including project name, project number, bidder name, and winning bid amount.	/
Bid information extraction - Pro	Supports separate parsing for invitations to bid and winning bids. Extracts 22 fields from invitation-to-bid information and 29 fields from winning-bid information.	/
Contract element extraction - General	Extracts common elements from contracts. It supports a total of 26 general element fields.	/
Online customer service scenario analysis	Analyzes online chat scenarios for customer service in industries such as e-commerce. It parses consumer messages to determine intent, sentiment, and emotion.	/
Document structuring - Key-value information extraction	Extracts information that follows a key-value pattern from documents such as resumes, contracts, and reports.	/
Telesales conversation - Industry classification	Classifies outbound telesales conversations by industry and scenario for applications such as voice quality inspection. It supports over 30 industries and over 170 scenarios.	/
Profanity detection service	Supports application scenarios such as customer service quality inspection for telesales conversations and streamer monitoring for live streaming.	/
Telesales conversation - User intent recognition	Recognizes user intent (reactions) in manual or intelligent outbound telesales scenarios.	/
Resume extraction - English	Extracts 10 fields from English resumes, including name, contact information, degree, company, and job title.	/
Resume extraction - Chinese	Extracts 33 fields from Chinese resumes, including name, gender, age, education, and employer.	/
Event extraction (English)	Extracts events from English news articles. It covers 33 event categories.	/
Product title category prediction	Predicts the category of a product based on its title in an e-commerce scenario. The category system is consistent with e-commerce platforms like Taobao.	/
Conversational knowledge extraction	Extracts agent scripts and user questions from online customer service chats. This can be used for hot-spot issue analysis or to build a script library for customer service agents.	/
Pornography detection service for novels	Detects pornographic or erotic content in Chinese novels for content moderation scenarios. It outputs a confidence level for pornography and the relevant text.	600 characters
Sentiment analysis (Russian)	Predicts the sentiment of Russian text from social media (short text) in e-commerce scenarios. Sentiments are classified as positive, neutral, or negative.	/
Sentiment analysis (English)	Predicts the sentiment of English text from social media (short text) in e-commerce scenarios. Sentiments are classified as positive, neutral, or negative.	/
Sentiment analysis (Spanish)	Predicts the sentiment of Spanish text from social media (short text) in e-commerce scenarios. Sentiments are classified as positive, neutral, or negative.	/
Emotion detection service	Recognizes customer or agent emotions in application scenarios such as telesales and online support. It supports 8 general emotions and 3 emotions that are common in business scenarios.	1000 characters
News text classification	Classifies one or more news articles.	/
Live streaming ASR garbled text detection	Detects poorly readable text in automatic speech recognition (ASR) transcripts of live streams, such as garbled output caused by multiple people speaking at the same time.	600 characters
Judicial document extraction	Parses documents for 10 causes of action and extracts 38 fields.	/
Keyword extraction and text summarization (extractive)	Extracts keywords or summaries from documents.	500 characters
Text summarization (generative)	Designed for common text generation needs in real-world scenarios. It is suitable for generating text summaries or article titles.	500 characters
Product description generation (Chinese)	Generates product descriptions related to specified selling points for a given product.	500 characters
Weather report welcome message generation (Chinese)	Generates an in-car startup welcome message based on given weather information fields.	500 characters
Text embedding generation	Takes Chinese text as input and outputs its corresponding vector representation.	/
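
Several models in the tables above list a maximum text length of 500, 600, or 1000 characters. The following Python sketch shows one way to split longer input into pieces that fit such a limit before sending each piece to a model. It is an illustration under assumptions, not part of the platform: the helper name `split_for_limit` is hypothetical, and it assumes that the limit is counted the same way as Python's `len()` on a string, which this document does not confirm.

```python
def split_for_limit(text, max_chars=500):
    """Split text into pieces of at most max_chars characters.

    Prefers to cut after a sentence-ending mark; falls back to a hard cut.
    Assumption: the service counts characters the way len() does.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    pieces = []
    rest = text.strip()
    while len(rest) > max_chars:
        window = rest[:max_chars]
        # Position just after the last sentence-ending mark in the window.
        cut = max(window.rfind(mark) for mark in ".!?。！？") + 1
        if cut <= 0:
            cut = max_chars  # No sentence end found: hard cut at the limit.
        pieces.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        pieces.append(rest)
    return pieces
```

For example, `split_for_limit(review_text, max_chars=500)` yields pieces suitable for a model with a 500-character limit. Splitting at sentence boundaries keeps each piece readable, but results for a split document may differ from results for the full text.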

Benefits

Easy to use: The platform features a simple workflow that requires no background in engineering or algorithms.
Fast: The optimized end-to-end process reduces the average model training time to less than 30 minutes.
Professional: The platform is built on expert technology. If you train with more than 500 annotated samples, the model accuracy is expected to exceed 85%.