Product introduction

This document introduces the NLP Self-Learning Platform.

Product overview

The NLP Self-Learning Platform is designed for users without an algorithm background. It provides automated, industry-specific annotation, training, and services. The platform supports customizable Natural Language Processing (NLP) capabilities, such as entity extraction, text classification, key phrase extraction, sentiment analysis, relation extraction, short text matching, and product review analysis. You can obtain high-quality NLP models by annotating or uploading a sufficient amount of data.

Features

The NLP Self-Learning Platform includes the following features:

Basic self-learning models: You can train models for natural language algorithm capabilities, such as entity extraction, text classification, key phrase extraction, relation extraction, short text matching, and conversational text classification.

| Model name | Model description | Maximum text length |
| --- | --- | --- |
| Text classification | Classifies text based on content type. For example, in a text message scenario, a carrier classifies messages as pornographic, terrorism-related, political, or advertising content. The platform supports custom classification models based on user-defined classification systems. | / |
| Entity extraction | Extracts entities with specific meanings from text. For example, in a contract review scenario, you can extract entities such as contract name, party A, party B, and receiving account number to quickly structure large numbers of contracts. The platform supports custom entity extraction models based on user-defined entity types. | / |
| Key phrase extraction | Extracts keyword and phrase labels based on the TextRank algorithm. This project type does not require you to upload annotated data. The algorithm automatically analyzes text features to extract key phrases. You can also upload a custom dictionary to optimize key phrase extraction for specific domains. | 500 characters |
| Relation extraction | Extracts entities and their corresponding relationships from text, such as a person's name and their birthday, or an organization and its founding date. The platform supports custom relation extraction models based on user-defined relationship systems. | / |
| Short text matching | Calculates the similarity between different texts and outputs a score between 0 and 1. A higher score indicates greater similarity. The platform supports custom short text matching models based on user-defined datasets. | / |
| Sentence pair classification | Classifies a pair of sentences by content type, supporting both single-choice and multiple-choice classification. Common scenarios include determining if two sentences are semantically equivalent, checking if a question and an answer match, and performing contextual single-sentence classification. | / |
| Conversational text classification | Classifies an entire conversation by content type, supporting both single-choice and multiple-choice classification. Common scenarios include conversation quality inspection, customer intent recognition, and telesales lead mining. | / |
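
Key phrase extraction is described above as being based on TextRank. As a rough illustration only, the following Python sketch ranks words by running a PageRank-style update over a word co-occurrence graph. The tokenizer, window size, damping factor, and the function name `textrank_keywords` are assumptions made for this example. The platform's actual implementation, including how it handles Chinese text and custom dictionaries, is not described in this document.

```python
import re
from collections import defaultdict


def textrank_keywords(text, top_k=5, window=2, damping=0.85, iterations=30):
    """Rank words with a simplified TextRank over a co-occurrence graph."""
    # Assumed tokenizer: lowercase alphabetic words of three or more letters.
    words = re.findall(r"[a-z]{3,}", text.lower())

    # Build an undirected graph that links words appearing within
    # `window` positions of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + 1 + window, len(words))):
            if words[j] != word:
                neighbors[word].add(words[j])
                neighbors[words[j]].add(word)

    # Repeatedly apply the PageRank-style score update.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda word: (-scores[word], word))
    return ranked[:top_k]
```

Words that co-occur with many different neighbors accumulate the highest scores, which is why this kind of ranking needs no annotated data. A production system would typically also filter stop words and merge adjacent keywords into phrases.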

Industry-specific self-learning models: You can train models for natural language algorithm capabilities, such as sentiment analysis, product review analysis, resume extraction, and bid information extraction.

| Model name | Model description | Maximum text length |
| --- | --- | --- |
| Sentiment analysis | Analyzes text to determine its positive or negative sentiment. The platform supports custom sentiment analysis models based on user-defined datasets. | / |
| Product review analysis | Builds custom models for various industries based on massive amounts of annotated data from Alibaba's e-commerce platforms to perform multi-dimensional analysis of product reviews. The platform supports custom product review analysis models based on user-defined review dimensions. | 500 characters |
| Resume extraction | Uses models and a rules engine trained on massive amounts of internal annotated data from Alibaba to achieve high-accuracy extraction from Chinese and English resumes. It supports 27 common Chinese fields and 10 common English fields. For other custom fields, you can add annotated data for custom training. | / |
| Bid information extraction | Intelligently parses bidding documents and automatically extracts over 20 fields, such as bid amount, bidding entity, and subject matter. This helps you review bid submission documents and improve the bid-win rate. | / |
| Contract element extraction | Extracts specific or key elements from contracts. Supported formats include text-based PDF and Word. The more data you annotate, the better the results. | / |

Application algorithm self-learning models: You can train models for natural language algorithm capabilities, such as contract extraction and judicial document analysis (fact-finding).

| Model name | Model description | Maximum text length |
| --- | --- | --- |
| Contract extraction | Extracts entities from contract text. It has over 20 built-in entity labels that do not require annotation, reducing the data annotation cost for model training to less than 20% of the original cost. | / |
| Judicial document analysis (fact-finding) | Extracts fact-finding entities from judicial documents. It has over 10 built-in entity labels that do not require annotation, reducing the data annotation cost for model training to less than 50% of the original cost. | / |

Pre-trained models (ready to call): The platform provides pre-trained models that are ready to call for capabilities such as product review analysis (e-commerce and local services), telesales conversation analysis (classification, threat, and fraud detection), news classification, news event extraction (English), multilingual sentiment analysis, and judicial document extraction.

| Model name | Model description | Maximum text length |
| --- | --- | --- |
| Product review analysis - E-commerce | Supports 55 e-commerce industries and 192 review attributes for multi-dimensional analysis of product reviews. | 500 characters |
| Product review analysis - Local services | Supports 2 local service industries (beauty/hair/nails and food/dining) and 11 review attributes for multi-dimensional analysis of product reviews. | 500 characters |
| Product review analysis - Automotive | Supports 68 review attributes in the automotive industry for multi-dimensional analysis of product reviews. | 500 characters |
| Purchase decision analysis for product reviews - E-commerce | Analyzes purchase decision information, such as user motivations, usage scenarios, feature requirements, and questions. This helps improve products, enhance user experience, segment user profiles, and target marketing campaigns. | 500 characters |
| Purchase decision analysis for product reviews - Automotive | Analyzes purchase decision information, such as user motivations, usage scenarios, feature requirements, and questions. This helps improve products, enhance user experience, segment user profiles, and target marketing campaigns. | 500 characters |
| Bid announcement classification service | Classifies bid announcements. It currently supports two types: "invitation to bid" and "winning bid". | / |
| Bid information extraction - Basic Plan | Extracts 13 fields from bid information, including project name, project number, bidder name, and winning bid amount. | / |
| Bid information extraction - Pro | Supports separate parsing for invitations to bid and winning bids. Extracts 22 fields from invitation-to-bid information. | / |
| Bid information extraction - Pro | Supports separate parsing for invitations to bid and winning bids. Extracts 29 fields from winning-bid information. | / |
| Contract element extraction - General | Extracts common elements from contracts. It supports a total of 26 general element fields. | / |
| Online customer service scenario analysis | Analyzes online chat scenarios for customer service in industries such as e-commerce. It parses consumer messages to determine intent, sentiment, and emotion. | / |
| Document structuring - Key-value information extraction | Extracts information that follows a key-value pattern from documents such as resumes, contracts, and reports. | / |
| Telesales conversation - Industry classification | Classifies outbound telesales conversations by industry and scenario for applications such as voice quality inspection. It supports over 30 industries and over 170 scenarios. | / |
| Profanity detection service | Detects profanity in application scenarios such as customer service quality inspection for telesales conversations and streamer monitoring for live streaming. | / |
| Telesales conversation - User intent recognition | Recognizes user intent (reactions) in manual or intelligent outbound telesales scenarios. | / |
| Resume extraction - English | Extracts 10 fields from English resumes, including name, contact information, degree, company, and job title. | / |
| Resume extraction - Chinese | Extracts 33 fields from Chinese resumes, including name, gender, age, education, and employer. | / |
| Event extraction (English) | Extracts events from English news articles. It covers 33 event categories. | / |
| Product title category prediction | Predicts the category of a product based on its title in an e-commerce scenario. The category system is consistent with e-commerce platforms like Taobao. | / |
| Conversational knowledge extraction | Extracts agent scripts and user questions from online customer service chats. This can be used for hot-spot issue analysis or to build a script library for customer service agents. | / |
| Pornography detection service for novels | Detects pornographic or erotic content in Chinese novels for content moderation scenarios. It outputs a confidence level for pornography and the relevant text. | 600 characters |
| Sentiment analysis (Russian) | Predicts the sentiment of Russian text from social media (short text) in e-commerce scenarios. Sentiments are classified as positive, neutral, or negative. | / |
| Sentiment analysis (English) | Predicts the sentiment of English text from social media (short text) in e-commerce scenarios. Sentiments are classified as positive, neutral, or negative. | / |
| Sentiment analysis (Spanish) | Predicts the sentiment of Spanish text from social media (short text) in e-commerce scenarios. Sentiments are classified as positive, neutral, or negative. | / |
| Emotion detection service | Recognizes customer or agent emotions in application scenarios such as telesales and online support. It supports 8 general emotions and 3 common emotions for business scenarios. | 1000 characters |
| News text classification | Classifies one or more news articles. | / |
| Live streaming ASR garbled text detection | Detects hard-to-read text in live streaming transcripts produced by automatic speech recognition (ASR), such as text garbled by multiple people speaking at the same time. | 600 characters |
| Judicial document extraction | Parses documents for 10 causes of action and extracts 38 fields. | / |
| Keyword extraction and text summarization (extractive) | Extracts keywords or summaries from documents. | 500 characters |
| Text summarization (generative) | Designed for common text generation needs in real-world scenarios. It is suitable for generating text summaries or article titles. | 500 characters |
| Product description generation (Chinese) | Generates product descriptions related to specified selling points for a given product. | 500 characters |
| Weather report welcome message generation (Chinese) | Generates an in-car startup welcome message based on given weather information fields. | 500 characters |
| Text embedding generation | Takes Chinese text as input and outputs its corresponding vector representation. | / |
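
Several pre-trained models in the table above list a maximum text length in characters. A minimal sketch of one way to keep inputs within those limits before calling a model is shown below. The helper `split_for_model` and the `MAX_CHARS` mapping are hypothetical and not part of the platform. How the service itself treats over-length text is not stated in this document.

```python
# Character limits copied from the "Maximum text length" column above.
# Only a few models are shown; models listed with "/" are left out.
MAX_CHARS = {
    "Product review analysis - E-commerce": 500,
    "Pornography detection service for novels": 600,
    "Emotion detection service": 1000,
}


def split_for_model(text, model_name, limits=MAX_CHARS):
    """Split text into chunks no longer than the model's character limit."""
    limit = limits.get(model_name)
    # No documented limit, or the text already fits: return it unchanged.
    if limit is None or len(text) <= limit:
        return [text]
    # Fixed-width slicing by character count (assumed: one character
    # corresponds to one Python string element).
    return [text[i:i + limit] for i in range(0, len(text), limit)]
```

Fixed-width slicing can cut a sentence in half, so in practice you may prefer to split on sentence boundaries first and fall back to this only for very long sentences.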

Benefits

  • Easy to use: The platform features a simple workflow that requires no background in engineering or algorithms.

  • Fast: The optimized end-to-end process reduces the average model training time to less than 30 minutes.

  • Professional: The platform is built on expert technology. With more than 500 annotated samples, model accuracy is expected to exceed 85%.