Vector analysis


AnalyticDB for PostgreSQL extracts features from unstructured data as vectors, enabling fast retrieval and association analysis with structured data.

Introduction to vector databases

Most real-world data is unstructured: images, audio, video, and text. The volume of this data is growing rapidly, driven by applications such as smart cities, short videos, personalized product recommendations, and visual product search. AI techniques extract features from this data and convert them into feature vectors, which can then be stored, analyzed, and retrieved. A database that stores, analyzes, and retrieves feature vectors is called a vector database.

Vector databases use Approximate Nearest Neighbor Search (ANNS) to retrieve feature vectors quickly. Instead of guaranteeing the exact nearest neighbors of a query vector, ANNS returns vectors that are very likely to be among them, trading a small amount of accuracy for significantly higher retrieval efficiency.
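The accuracy-for-speed trade-off can be illustrated with a toy example. The sketch below is not FastANN and makes no claim about how AnalyticDB for PostgreSQL builds its indexes. It assumes a simple bucket-based scheme (similar in spirit to an inverted-file index) in which vectors are grouped around randomly chosen centroids and a query scans only the closest buckets. All function names, parameters, and data are hypothetical.

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_search(vectors, query, k):
    """Brute force: compare the query with every stored vector."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2(vectors[i], query))
    return ranked[:k]


def build_buckets(vectors, n_buckets, seed=0):
    """Pick random stored vectors as centroids and assign every vector
    to the bucket of its closest centroid."""
    rng = random.Random(seed)
    centroids = [vectors[i] for i in rng.sample(range(len(vectors)), n_buckets)]
    buckets = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(n_buckets), key=lambda c: l2(centroids[c], v))
        buckets[nearest].append(i)
    return centroids, buckets


def approx_search(vectors, centroids, buckets, query, k, n_probe):
    """Approximate: scan only the n_probe buckets whose centroids are
    closest to the query, so most vectors are never compared."""
    order = sorted(range(len(centroids)), key=lambda c: l2(centroids[c], query))
    candidates = [i for c in order[:n_probe] for i in buckets[c]]
    candidates.sort(key=lambda i: l2(vectors[i], query))
    return candidates[:k], len(candidates)


# Hypothetical data: 2000 random 8-dimensional vectors and one query.
rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(2000)]
query = [rng.random() for _ in range(8)]

centroids, buckets = build_buckets(data, n_buckets=20)
exact = exact_search(data, query, k=10)
approx, scanned = approx_search(data, centroids, buckets, query, k=10, n_probe=4)

# Recall: the fraction of the true top-10 that the approximate search found.
recall = len(set(exact) & set(approx)) / len(exact)
print(f"scanned {scanned} of {len(data)} vectors, recall@10 = {recall:.2f}")
```

Raising n_probe scans more buckets and moves the result closer to the exact answer at a higher cost. Setting it to the number of buckets degenerates into brute-force search.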

The industry uses two approaches to apply ANNS in production:

  • A standalone ANNS service that creates and searches vector indexes, that is, a dedicated vector database.

  • ANNS vector indexing integrated into a traditional structured database, that is, a DBMS with vector retrieval capabilities.

AnalyticDB for PostgreSQL takes the second approach, integrating the proprietary FastANN vector engine into a full-featured DBMS with transactions, high availability, and high scalability. You query vectors directly with SQL.

Features

AI algorithms extract features from unstructured data and represent them as vectors. The distance between two vectors reflects how similar the source data is: the closer the vectors, the more similar the data. AnalyticDB for PostgreSQL builds vector retrieval on a Massively Parallel Processing (MPP) architecture, so you can query unstructured data and run association analysis with structured data through SQL.
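As a minimal sketch of what "distance reflects similarity" and "association analysis with structured data" mean in practice, the Python snippet below filters rows on ordinary structured columns and then ranks the survivors by vector distance. It only illustrates the idea. It does not use AnalyticDB for PostgreSQL or its SQL syntax, and the table contents, column names, and feature values are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical rows: structured columns (category, price) plus a feature vector.
products = [
    {"id": 1, "category": "shoes", "price": 59.0, "feature": [0.9, 0.1, 0.0]},
    {"id": 2, "category": "shoes", "price": 120.0, "feature": [0.8, 0.2, 0.1]},
    {"id": 3, "category": "bags", "price": 45.0, "feature": [0.9, 0.1, 0.1]},
    {"id": 4, "category": "shoes", "price": 75.0, "feature": [0.1, 0.9, 0.3]},
]

# Hypothetical feature vector extracted from the query image.
query_feature = [1.0, 0.0, 0.0]

# Structured condition first, then rank the remaining rows by vector distance.
matches = [p for p in products if p["category"] == "shoes" and p["price"] < 100]
matches.sort(key=lambda p: l2_distance(p["feature"], query_feature))

for p in matches:
    dist = l2_distance(p["feature"], query_feature)
    sim = cosine_similarity(p["feature"], query_feature)
    print(p["id"], round(dist, 3), round(sim, 3))
```

In a DBMS with vector retrieval, the same combination would be expressed as one SQL statement, with the structured predicates in the filter and the distance in the ordering clause.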

Use cases

AnalyticDB for PostgreSQL vector analysis supports a range of intelligent applications:

  • Image-based search to retrieve visually similar images.

  • Video retrieval by matching specific frames.

  • Voiceprint matching across audio clips.

  • Personalized recommendations based on user feature matching.

  • Semantic text retrieval to find similar documents.

  • Q&A chatbots powered by Large Language Models (LLMs).

  • File deduplication using fingerprint vectors.

Advantages

The AnalyticDB for PostgreSQL vector database, powered by FastANN, is used across Alibaba's data mid-end, e-commerce, new retail, Urban Intelligence, and Qwen LLM Q&A services.

Key advantages over other vector databases:

  • Hybrid analysis of structured and unstructured data.

    AnalyticDB for PostgreSQL analyzes unstructured, structured, and semi-structured data together, with full indexing support for structured and semi-structured data.

  • Dual-channel retrieval with vector search and full-text search.

    AnalyticDB for PostgreSQL supports both vector indexes and full-text indexes, and queries can combine the two to improve recall.

  • Real-time data updates and queries.

    AnalyticDB for PostgreSQL ingests streaming data and builds vector indexes in real time.

  • Ease of use.

    AnalyticDB for PostgreSQL is ready to use on demand with standard SQL syntax.

  • Low cost.

    AnalyticDB for PostgreSQL compresses vectors from FP32 to FP16, which halves the storage that vectors require. The vector index uses segmented page storage that leverages PostgreSQL's shared buffer cache (shared_buffers), so the index can hold more vector data than fits in memory.
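The storage arithmetic behind the FP32-to-FP16 point under "Low cost" can be checked with Python's standard struct module, which supports IEEE 754 half precision through the "e" format character. This is only a sketch of the size and precision trade-off. It does not show how AnalyticDB for PostgreSQL performs the compression internally, and the sample vector is hypothetical.

```python
import struct

# Hypothetical 4-dimensional feature vector.
vector = [0.12345678, -0.5, 3.1415926, 100.25]

# "f" packs each value as a 4-byte FP32; "e" packs each value as a 2-byte FP16.
fp32_bytes = struct.pack(f"<{len(vector)}f", *vector)
fp16_bytes = struct.pack(f"<{len(vector)}e", *vector)

# Decode the FP16 bytes to see how much precision was lost.
restored = struct.unpack(f"<{len(vector)}e", fp16_bytes)
max_error = max(abs(a - b) for a, b in zip(vector, restored))

print("FP32 size in bytes:", len(fp32_bytes))
print("FP16 size in bytes:", len(fp16_bytes))
print("largest absolute rounding error:", max_error)
```

The encoded size drops by half regardless of dimension. The cost is a small rounding error per component, which is usually acceptable for approximate search, though whether it is acceptable for a given workload is something to verify on real data.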