Traditionally, running AI, machine learning, or large-scale data analytics tasks requires moving data to an external computing cluster. This process increases architectural complexity and adds data transfer costs and latency. PolarDB for PostgreSQL integrates a Ray application (known as PolarDB Ray) that lets you run these complex workloads directly within the database using the Python ecosystem and the Ray distributed computing framework. Keeping computation next to the data improves data processing and AI inference efficiency and reduces costs.
Overview
Ray is an open-source distributed computing framework for large-scale AI and Python applications. PolarDB features a built-in Ray application that tightly integrates the framework with the database. This lets you write and execute Python code directly in the database, using a rich ecosystem of AI and machine learning (ML) libraries such as Pandas, NumPy, Scikit-learn, and PyTorch to process data. You can perform tasks such as model training, data preprocessing, and complex vector computation entirely within the database, without moving data to an external computing environment.
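As an illustration of the kind of Python code this enables, the following is a minimal sketch of a data preprocessing step fanned out as Ray tasks. It uses only open-source Ray calls (`ray.init`, `ray.remote`, `ray.get`); the function names `standardize`, `run_with_ray`, and `run_locally` are hypothetical, and nothing here is specific to PolarDB Ray. Ray is imported lazily so the work unit can also be tested without a cluster.

```python
from statistics import mean, pstdev


def standardize(chunk):
    """Z-score one chunk of numeric values (a stand-in preprocessing step)."""
    mu = mean(chunk)
    sigma = pstdev(chunk) or 1.0  # avoid dividing by zero for constant chunks
    return [(x - mu) / sigma for x in chunk]


def run_locally(chunks):
    """Process every chunk in the current process, without Ray."""
    return [standardize(c) for c in chunks]


def run_with_ray(chunks):
    """Process each chunk as a separate Ray task.

    Assumes the `ray` package is installed and a cluster is reachable;
    when run as a Ray job, ray.init() attaches to the existing cluster.
    """
    import ray  # lazy import: the rest of the module works without Ray

    ray.init(ignore_reinit_error=True)
    remote_standardize = ray.remote(standardize)
    futures = [remote_standardize.remote(c) for c in chunks]
    return ray.get(futures)  # blocks until all tasks finish
```

Both runners return the same result for the same input, so `run_locally` can be used to check the logic before handing the work to a Ray cluster through `run_with_ray`.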
Advantages
Serverless Architecture: Combines Steady-state Nodes, which handle consistent workloads and do not scale in, with Agile Nodes that scale automatically based on real-time traffic. This design balances reliability and cost-efficiency.
High Availability: The Ray Head node supports high availability and disaster recovery, which keeps the cluster running stably.
Security and Reliability:
Provides runtime isolation through Secure Containers, ensuring a secure code execution environment.
Supports multi-tenant isolation to prevent data leakage between tenants.
Access to the Ray Dashboard and Ray Job submissions is secured with Username/Password and JWT authentication to prevent unauthorized access.
Integrated Metadata Management: The service runs in the same Virtual Private Cloud (VPC) as your PolarDB Cluster. Metadata processed by PolarDB Ray can be stored directly in the PolarDB Cluster, enabling seamless data exchange without any additional network configuration.
Unified Resource Pool: Provides a unified pool of CPU resources and supports GPU Workers, giving you resource flexibility for diverse computing tasks.
Storage Service: Supports mounting the high-performance edition of PolarDB File System 2.0, enabling direct data access through a file interface.
Open-Source Customization: As a downstream project of open-source Ray, PolarDB Ray is seamlessly compatible with the Ray ecosystem. It also offers customized optimizations and enhancements for developers.
Scope
This feature is available for centralized PolarDB for PostgreSQL clusters, but not for PolarDB for PostgreSQL Distributed Edition clusters.
Billing
Component Fee: A Ray Application is billed separately for its Head Node and each Worker Node. The cost depends on the Component Specification (CPU and Memory) and the selected purchase duration.
Storage Fee: Data and files generated by the Ray Application itself incur no additional charges. However, data that a Ray Job writes to a PolarDB for PostgreSQL cluster or to a PolarDB File System is subject to Storage Space Fees.
Traffic and Bandwidth: No fee.
Getting started
Create a Ray Application: Learn how to create a Ray Application (PolarDB Ray) on a new or existing cluster.
Configure a Ray Application: Learn how to set the Ray Application's Security Configuration, obtain connection information, and optionally enable Public Access.
Submit a Job to a Ray Application: Learn how to submit and execute a Ray job in your Ray Application using JupyterLab, the Python Software Development Kit (SDK), or the command-line interface (CLI).
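For the Python SDK path mentioned in the last step, a minimal sketch based on open-source Ray's `JobSubmissionClient` might look like the following. The helper names `build_job_spec` and `submit_job`, the script name `my_job.py`, and the use of the `headers` argument to carry credentials are assumptions for illustration; the actual endpoint and authentication details come from your Ray Application's connection information and Security Configuration.

```python
def build_job_spec(entrypoint, working_dir="./", pip_packages=None):
    """Assemble the arguments for a Ray job submission (hypothetical helper)."""
    runtime_env = {"working_dir": working_dir}  # directory uploaded with the job
    if pip_packages:
        runtime_env["pip"] = list(pip_packages)  # extra packages for the job
    return {"entrypoint": entrypoint, "runtime_env": runtime_env}


def submit_job(address, spec, headers=None):
    """Submit the job through open-source Ray's job submission client.

    `address` is the Ray Dashboard URL from your connection information.
    Passing credentials through `headers` is an assumption; check the
    Security Configuration of your Ray Application for the real mechanism.
    """
    from ray.job_submission import JobSubmissionClient  # requires `ray`

    client = JobSubmissionClient(address, headers=headers)
    return client.submit_job(
        entrypoint=spec["entrypoint"],
        runtime_env=spec["runtime_env"],
    )
```

A call such as `submit_job(address, build_job_spec("python my_job.py", pip_packages=["pandas"]))` returns a submission ID, which can then be passed to the client's `get_job_status` method to track the job.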