The content community industry


Community content typically includes user-generated content (UGC) and professionally generated content (PGC). Keywords and content are diverse, and writing quality varies widely. Search engines must therefore perform intelligent semantic analysis on keywords and content to understand query intent and retrieve the most comprehensive and relevant results. This topic describes how to use the OpenSearch Content Community Enhanced Edition in community forum scenarios to improve the user search experience.

The core of a community is its users, who join primarily to consume content. Content is broadly defined as text, images, audio, and video, and it also serves as a way for users to find solutions to problems. High-quality content makes the platform more popular with users, drives traffic and engagement, and promotes user growth and retention. This, in turn, leads to more business opportunities and revenue.

Search is the most effective way to directly access content in a community. Every community constantly works to solve the following search performance problems:

  • How can we accurately understand user search intent and return the most relevant results?

  • How can we use differentiated and personalized content distribution to improve the user search experience and enhance a sense of community and user loyalty?

  • How can we enable interaction and connection when retrieving content from different domains, vertical categories, and channels within the community?

  • How can we better integrate and manage non-commercial and commercial content?

This topic addresses these questions by analyzing the characteristics and challenges of search in the content community industry. It also introduces the solutions and best practices of the Alibaba Cloud OpenSearch Content Community Enhanced Edition.

Search business requirements for the content industry

  • More exposure opportunities: A low no-result rate.

  • Better search quality: High relevance and higher-quality sorting.

  • Richer business features: Adjust search results based on business attributes.

  • More comprehensive supporting features: Features such as spelling correction, top searches, hints, and drop-down suggestions.

  • Lower ownership cost: Lower development, resource, and O&M costs compared to self-managed search engines.

  • Easier development and use: A short time to market and reduced difficulty in search engine development and result tuning.

  • Strong user search intent: The main search aggregates content from multiple channels and requires more precise search relevance.

For example, a forum community has product lines that cover multiple platforms, such as web pages, apps, and miniapps, and has multiple business channels. As the business grows, the traffic to the main search on the home page increases. The integration of paid services and traffic acquisition services makes search traffic operations more important. Consequently, the business demands on the main search increase, requiring it to aggregate content from multiple channels and provide more precise search relevance. In addition to text relevance, other business factors must be considered. A mature search engine involves systems such as offline modules, online modules, a query understanding service, and an algorithm platform. These systems require extensive development, algorithm tuning, and continuous, complex O&M work. With limited human resources, a self-managed search system struggles to meet business needs.

Common search scenarios

  • Search for content such as blog posts, Q&A pairs, and experience sharing

  • Discovery of premium content and popular posts

  • Traffic acquisition for paid resources

  • Filtering by tag categorization

(Image from Alibaba Cloud Developer Community)

  • Popular events and topic interactions

  • PGC and UGC

  • Search guidance such as top searches, hints, and drop-down suggestions

  • Personalization and timeliness

(Image from Alibaba Cloud Developer Community)

OpenSearch Content Community Enhanced Edition

Solution architecture

(Image: solution architecture diagram)

Features

The Industry-specific Enhanced Edition for the content industry is based on Alibaba's latest algorithms. It addresses the pain points and requirements of search scenarios in different vertical content categories. It provides intelligent semantic understanding, vector recall, and sorting algorithms that are exclusive to the content industry. This ensures both search performance and accuracy. It also effectively solves major industry challenges, such as high search latency, large resource consumption, and high no-result rates caused by large vocabularies. For the content industry, OpenSearch also provides a vector model to implement vector recall and multi-path search. This improves query accuracy and provides a multimodal search solution.

1. Feature differences

| Features | General-purpose Edition | Industry-specific Enhanced Edition |
| --- | --- | --- |
| One-stop configuration | After creating an application, you must manually create and configure query analysis, sort policies, and drop-down suggestion models. | Combines common search scenarios in the content industry. You can select the required capabilities and features. It also provides application schema templates and index schema templates for one-click configuration. This lowers the barrier to entry for new users. |
| Query analysis | Provides capabilities for general industries, such as synonym configuration, stop word filtering, spelling correction, term weight analysis, and category prediction. | Provides an enhanced analyzer and query analysis features for the content industry. It more accurately builds indexes and identifies user query intent by combining content search scenarios and industry challenges, delivering results that are better than the General-purpose Edition. |
| Sort policy | After creating an application, you must manually configure and debug the corresponding sort policies based on your business scenario. | Provides common sort expressions for the content industry based on the application and index schema templates. This feature meets most sorting requirements for the content industry without requiring extra configuration. |
| Feature iteration | Regularly updates the system default dictionaries for the analyzer, query analysis, and other features. | Continuously iterates and updates based on changes in nouns, products, and other elements in the content industry. It optimizes the existing tokenization and query analysis capabilities to provide more timely service guarantees. |

2. Query analysis performance comparison

The Industry-specific Enhanced Edition offers more in-depth optimizations for query analysis compared to the General-purpose Edition. It not only addresses common problematic cases from the General-purpose Edition but also enriches its lexicon by incorporating data from various sources within the content industry.

  • Tokenization (tokens are separated by spaces):

| Query | General-purpose Edition | Industry-specific Enhanced Edition |
| --- | --- | --- |
| To decompress | Understanding compression | to decompress |
| actual parameter and formal parameter | Arguments and parameters | actual parameter and formal parameter |
| struct overload | structure body-weight load | struct overload |
| googlechromeframe | googlechromeframe | google chrome frame |
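The effect of an enriched lexicon on tokenization, as in the googlechromeframe example, can be illustrated with a minimal sketch. It uses greedy forward maximum matching, which is a generic dictionary-based technique and not necessarily what the OpenSearch analyzer does internally. The function name `max_match` and both lexicons are hypothetical and exist only for illustration.

```python
def max_match(text, lexicon):
    """Greedy forward maximum matching.

    At each position, take the longest lexicon entry that starts there.
    Characters that match no entry are merged into one unknown token.
    """
    tokens, unknown, i = [], "", 0
    while i < len(text):
        # Try the longest candidate first.
        match = next(
            (text[i:j] for j in range(len(text), i, -1) if text[i:j] in lexicon),
            None,
        )
        if match is None:
            unknown += text[i]
            i += 1
        else:
            if unknown:
                tokens.append(unknown)
                unknown = ""
            tokens.append(match)
            i += len(match)
    if unknown:
        tokens.append(unknown)
    return tokens


# Hypothetical lexicons: the second one adds IT-specific terms.
general_lexicon = {"code", "data"}
it_lexicon = general_lexicon | {"google", "chrome", "frame"}

print(" ".join(max_match("googlechromeframe", general_lexicon)))
print(" ".join(max_match("googlechromeframe", it_lexicon)))
```

With the general lexicon, the query stays a single unknown token. With the IT terms added, it splits into three tokens that can each match indexed content.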

  • Spelling correction:

| Query | General-purpose Edition | Industry-specific Enhanced Edition |
| --- | --- | --- |
| Taobao Intelligent Vision | Taobao can only vision | Taobao Intelligent Vision |
| mybatics code generation | mybatics code generation | mybatis code generation |
| Computer Network | Computer network | computer network |
| WeChat Mini Program | miniature miniapp | WeChat mini program |
| Deep learning | Deep learning | deep learning |
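As a rough illustration of lexicon-driven spelling correction, the sketch below maps each unknown query term to its closest entry in a domain lexicon by using `difflib.get_close_matches` from the Python standard library. This is a generic stand-in, not the algorithm used by OpenSearch. The function `correct_query`, the lexicon contents, and the `cutoff` value are assumptions made for this example.

```python
import difflib


def correct_query(query, lexicon, cutoff=0.8):
    """Replace each unknown term with its closest lexicon entry, if one is close enough."""
    corrected = []
    for term in query.lower().split():
        if term in lexicon:
            corrected.append(term)
            continue
        # n=1 keeps only the best candidate; cutoff filters out weak matches.
        candidates = difflib.get_close_matches(term, lexicon, n=1, cutoff=cutoff)
        corrected.append(candidates[0] if candidates else term)
    return " ".join(corrected)


# Hypothetical IT lexicon.
it_lexicon = ["mybatis", "code", "generation", "computer", "network"]

print(correct_query("mybatics code generation", it_lexicon))
```

A term such as "mybatics" can only be corrected if "mybatis" is in the lexicon, which is why enriching the lexicon with industry vocabulary matters.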

  • Vector recall: The Industry-specific Enhanced Edition provides high-quality vector recall models for the data distribution of vertical categories in the content industry. This ensures high retrieval performance for long-tail queries, queries with typos, and queries that rely on synonym rewriting.

| Query | Vector recall top 1 | Vector recall top 2 | Vector recall top 3 |
| --- | --- | --- | --- |
| US gmted2010 data download | gmt43 related code, data download address | gmt0054-2010.pdf | gmted2010 US download address |
| 3D game graphics processing | 3d game animation processing basics | basics of 3d game animation | animation game processing |
| disable NVIDIA card | disable and enable NIC | disable NIC | disable and enable NIC |
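The idea behind vector recall can be sketched as follows: the query and each document are mapped to vectors, and documents are ranked by cosine similarity to the query. The sketch uses character trigram counts as a toy embedding so that it runs without any model. A production system would use a learned text embedding model instead. The names `embed`, `cosine`, and `vector_recall` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy embedding: character n-gram counts (a stand-in for a learned embedding model)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_recall(query, docs, top_k=3):
    """Return the top_k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(docs, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:top_k]


# Hypothetical document titles.
docs = [
    "disable and enable NIC",
    "basics of 3d game animation",
    "gmted2010 US download address",
]

print(vector_recall("US gmted2010 data download", docs, top_k=1))
```

Because similarity is computed on vectors and not on exact terms, a document can be recalled even when the word order differs or the query contains extra words.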

Personalized search based on sequence-based behavior modeling

For example, the search results for a user who consecutively searches for "interview" and "Java" are different from the results for a user who only searches for "Java". This provides personalized retrieval, meets the specific search needs of different users, and improves the user search experience.
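The "interview" then "Java" example can be made concrete with a deliberately simple sketch. It scores documents by term overlap with the current query and adds a discounted overlap with earlier queries in the session. Real sequence-based behavior modeling would use a learned model over the behavior sequence, so this is only a toy stand-in. The function `personalized_rank`, the `history_weight` parameter, and the sample titles are assumptions.

```python
def personalized_rank(query, history, docs, history_weight=0.5):
    """Rank docs by overlap with the query plus a discounted overlap with earlier queries."""
    query_terms = set(query.lower().split())
    history_terms = {term for past in history for term in past.lower().split()}

    def score(doc):
        doc_terms = set(doc.lower().split())
        return len(doc_terms & query_terms) + history_weight * len(doc_terms & history_terms)

    # sorted() is stable, so ties keep their original order.
    return sorted(docs, key=score, reverse=True)


# Hypothetical post titles.
docs = [
    "Java tutorial for beginners",
    "Java interview questions",
    "Java release notes",
]

print(personalized_rank("Java", [], docs)[0])
print(personalized_rank("Java", ["interview"], docs)[0])
```

The user with no history sees the default order, while the user who searched for "interview" first sees the interview post ranked at the top.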

DeepRanking deep sorting model

The model can scale to the hundred-billion parameter level, ensuring better search results. The model training and usage costs are significantly lower than the costs of self-developed solutions, which include expenses for labor, machines, and R&D support.

The deep retrieval model integrates the NLP capabilities of Alibaba DAMO Academy to improve search results and reduce the no-result rate.


The model structure is deeply customized based on user and data characteristics and combined with Alibaba's extensive technical expertise to create a unique deep model structure that is made for you.

Integration process for the Enhanced Edition

Industry templates provide one-click integration, so you can get started quickly. You can select features based on your business needs. The edition also supports business intervention tuning and digital operations, both of which can be performed by non-technical personnel.

Schema design

For more information, see Application Schema: Create a multi-table join.

Data ingestion

OpenSearch supports ingesting data from various data sources. You can also import data by using an API or a software development kit (SDK), or by uploading a file in the console. The supported methods are as follows:

  1. RDS data source configuration

  2. MaxCompute (formerly ODPS) data source configuration

  3. PolarDB data source configuration

  4. Data import through API/SDK
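For the API/SDK path, the sketch below shows how a batch of documents might be assembled before it is pushed. It only builds the JSON body and sends nothing. The `cmd` and `fields` keys are assumed to follow the usual shape of an OpenSearch batch push request, and the field names (`id`, `title`, `tags`) are hypothetical. Check the OpenSearch API reference and your own application schema before relying on this layout.

```python
import json


def build_push_batch(rows, cmd="ADD"):
    """Wrap each row in a push command (assumed layout: {"cmd": ..., "fields": {...}})."""
    return [{"cmd": cmd, "fields": dict(row)} for row in rows]


# Hypothetical rows; field names must match your application schema.
rows = [
    {"id": "1001", "title": "mybatis code generation", "tags": ["java", "mybatis"]},
    {"id": "1002", "title": "basics of 3d game animation", "tags": ["3d", "game"]},
]

batch = build_push_batch(rows)
body = json.dumps(batch, ensure_ascii=False)
print(body)
```

Building the body separately from sending it makes it easy to validate and log a batch before it reaches the service.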

Content Community Industry Template configuration

Feature selection. Taking the IT industry as an example of a vertical category, you can select the template features as needed. By default, all features are selected.

The template features include query analysis (such as IT term weights, IT synonym packages, and text embedding), sort policies (such as multi-path search, text relevance, and vector relevance), and drop-down suggestions.
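Multi-path search combines results recalled through different paths, such as text relevance and vector relevance. The source does not describe how OpenSearch merges these paths. The sketch below uses reciprocal rank fusion, a common generic technique, purely as an illustration. The function name, the constant `k=60`, and the document IDs are assumptions.

```python
def reciprocal_rank_fusion(paths, k=60):
    """Merge several ranked ID lists; each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranked_ids in paths:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results from two recall paths.
text_path = ["post_1", "post_2", "post_3"]
vector_path = ["post_3", "post_1", "post_4"]

fused = reciprocal_rank_fusion([text_path, vector_path])
print(fused)
```

Documents found by both paths rise to the top, while documents found by only one path are still kept, which helps reduce the no-result rate.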

Search testing

  1. To use the "IT vector index", you must first configure the text embedding feature in query analysis and add the corresponding IT vector index.

  2. You can perform a test in Search Test.

Custom result tuning service

If you require deep retrieval, sort result tuning, or personalized search, the OpenSearch team of experts provides a custom result tuning service. For more information, you can contact technical support or your business representative.

Case study

A Chinese IT content community is dedicated to providing full lifecycle services for Chinese software developers, including knowledge sharing, online learning, and career development. It operates multiple products.

Since adopting Alibaba Cloud OpenSearch, the community has integrated multiple PC and mobile platforms over the course of one year. This integration covers channel searches for sub-businesses such as home page search, blogs, downloads, and Q&A. Using OpenSearch, the community provides high-quality search services for its users. These search optimizations have also led to more business conversions and an increase in overall business revenue.

  • Compared with a self-managed service based on open source software, the click-through rate (CTR) increased by over 80%.

  • Subsequently, algorithm experts continuously tuned the results for the customer using deeply customized models. This increased the clicks per user per impression by 16.7% and the Item-CTR by 11.8%. Performance continues to improve.
