Read/write splitting


For read-heavy workloads, Tair (Redis OSS-compatible) allows you to dynamically enable or disable read/write splitting. This feature offers a highly available and performant solution for centralized hot data access and high-concurrency reads. In a read/write splitting instance, a proxy component developed by the Alibaba Cloud Tair team automatically identifies and routes read and write requests and handles failovers. This simplifies integration, as you do not need to modify your application code to manage request routing or failovers.

Read/write splitting in standard architecture

A read/write splitting instance that uses the standard architecture consists of a master node, one or more read replicas, proxy servers, and a high availability system. The following figures show the architectures.

Figure 1. Cloud-native


Figure 2. Classic (discontinued)


The following list describes each component in the cloud-native and classic (discontinued) editions.

Master node

Handles write requests and shares the read workload with read replicas.

Read replicas

Cloud-native: Handle read requests. Read replicas have the following features:

  • All read replicas provide disaster recovery and can serve as replica nodes for data backup.

  • Read replicas synchronize data from the master node by using a star replication topology. This topology has significantly lower data latency than the chained replication topology of the classic edition.

  • You can customize the number of read replicas. An instance can have 1 to 4 read replicas per shard in a cluster architecture, and 1 to 9 read replicas in a standard architecture.

Classic (discontinued): Handle read requests. Read replicas have the following features:

  • Read replicas use a chained replication topology. The more read replicas in the chain, the higher the data latency on the read replicas at the end of the chain.

  • You can configure 1, 3, or 5 read replicas.

Replica node

Cloud-native: Any read replica can serve as a replica node. If the master node fails, the high availability system promotes the read replica with the most complete data to be the new master node. After the switchover, a new read replica is immediately added to the instance.

Because a dedicated replica node is not required, cloud-native read/write splitting instances offer the same performance at a lower cost.

Classic (discontinued): A cold standby node for data backup. It does not serve traffic. If the master node fails, requests are failed over to this node.

Proxy server

When a client connects, the proxy server automatically identifies the request type and routes it accordingly: write requests are forwarded to the master node, and read requests are distributed across the master node and read replicas by weight. All nodes have equal weights, and the weights are not customizable.

Note
  • Clients can connect only to a proxy server. Direct connections to individual nodes are not supported.

  • The proxy server distributes read requests evenly among the master node and read replicas. You cannot customize this distribution. For example, for an instance with three read replicas, the master node and each of the three read replicas handle 25% of the read requests.

High availability system

  • Automatically monitors the health of each node. If a node becomes unavailable, the high availability system initiates a failover or rebuilds a read replica and updates the routing and weight information accordingly.

  • Failover logic for master node selection: Data integrity is the priority. The high availability system promotes the read replica with the most complete data to be the new master node.
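The even read distribution described in the proxy server note can be checked with a few lines of arithmetic. The following is a minimal sketch, not the proxy's implementation; the function name `read_share_per_node` is hypothetical and is used only for illustration.

```python
from fractions import Fraction


def read_share_per_node(read_replicas: int) -> Fraction:
    """Return the fraction of read requests each node receives.

    Assumes equal weights, as the note states. The master node also
    serves reads, so the pool holds read_replicas + 1 nodes.
    """
    if read_replicas < 1:
        raise ValueError("a read/write splitting instance needs at least one read replica")
    return Fraction(1, read_replicas + 1)


# Print the per-node share for a few replica counts.
for count in (1, 3, 9):
    share = read_share_per_node(count)
    print(f"{count} read replica(s): {float(share):.1%} of reads per node")
```

For an instance with three read replicas, the function returns 1/4, which matches the 25% figure in the note.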

Notes on dual-zone read/write splitting instances

Cloud-native (recommended)

Classic (discontinued)

Cloud-native (recommended): Both the primary and secondary zones provide services. The minimum configuration is as follows:

  • Primary zone: one master node and one read replica.

  • Secondary zone: one read replica.

The primary and secondary zones have separate endpoints. Both endpoints support read and write operations. Read requests from the primary zone are routed only to the master node or read replicas within that zone. Read requests from the secondary zone are routed only to the read replicas in that zone. This architecture enables proximity-based access. All write requests are routed to the master node in the primary zone. The following figure shows the architecture.

[Figure: dual-zone read/write splitting architecture, cloud-native]
Note

We recommend that you configure two or more nodes in both the primary and secondary zones:

  • Primary zone: one master node and one read replica.

  • Secondary zone: two read replicas.

Classic (discontinued): The secondary zone contains only a cold standby replica node for data backup, and it does not serve traffic. If the master node fails, requests are failed over to the replica node.

Features

  • Dynamic and easy to use

    You can enable read/write splitting for an instance that uses the standard architecture. The proxy server intelligently identifies and forwards read and write requests from clients. After you enable this feature, you can use any Redis-compatible client to connect to the read/write splitting instance to improve read performance without modifying your application. Instances with read/write splitting enabled are compatible with Redis protocol commands, but some command restrictions apply due to the proxy. For more information, see Command restrictions for read/write splitting instances.

  • High availability

    • A proprietary high availability system from Alibaba Cloud automatically monitors the health of all data nodes to ensure instance availability. If a master node becomes unavailable, the system automatically selects a new master node and rebuilds the replication topology. If a read replica fails, the high availability system automatically detects the failure, launches a new node to complete data synchronization, and takes the failed node offline.

    • The proxy servers monitor the service status of each read replica in real time. If a read replica becomes unavailable, the proxy automatically reduces its service weight. If a read replica fails more than a specified number of consecutive times, the proxy suspends service to the unavailable node. It continues to monitor the node and resumes its service after the node recovers.

  • High performance

    You can scale out read replicas to linearly increase the overall performance of a read/write splitting instance. Source-code-level optimizations to the Redis replication process maximize system stability during linear replication and fully utilize the physical resources of each read replica.
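The proxy behavior described under High availability (suspend a read replica after too many consecutive failures, then resume it once it recovers) can be modeled as a small state machine. The following is an illustrative sketch only. The class name `ReplicaHealth` and the threshold value are hypothetical, the real threshold is not documented here, and the gradual weight reduction is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ReplicaHealth:
    """Tracks consecutive failed health checks for one read replica."""

    max_consecutive_failures: int = 3  # hypothetical threshold, not a documented value
    consecutive_failures: int = 0
    serving: bool = True

    def record(self, check_ok: bool) -> bool:
        """Record one health check result and return whether the node serves reads."""
        if check_ok:
            # A successful check resets the counter and resumes service.
            self.consecutive_failures = 0
            self.serving = True
        else:
            self.consecutive_failures += 1
            # Suspend only after more than the allowed number of failures in a row.
            if self.consecutive_failures > self.max_consecutive_failures:
                self.serving = False
        return self.serving
```

Because the counter resets on any successful check, a node with occasional isolated failures keeps serving, and only a sustained outage suspends it.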

Use cases

This feature is ideal for scenarios with high read queries per second (QPS). If your application is read-heavy, an instance that uses the standard architecture may not meet your QPS requirements. In this case, you can deploy multiple read replicas to overcome the performance bottleneck of a single node. After you enable read/write splitting, the read QPS of an instance can increase by up to nine times.

Note

Due to the asynchronous replication mechanism of Redis, data replication latency can occur during periods of high write volumes. If you use this architecture, your application must be able to tolerate a certain degree of data staleness.

Read/write splitting in cluster architecture

In a cluster architecture, you can enable read/write splitting only for cloud-native instances that run in proxy mode. The following figure shows an example architecture.

[Figure: read/write splitting in a cluster architecture]

Component descriptions

Component

Description

Proxy server

After a client connects to a proxy server, the proxy automatically identifies client requests and forwards them to the appropriate data shards and their corresponding read/write nodes. For example, write requests are forwarded to the master node, and read requests are load-balanced across the master node and read replicas.

Data shard

Each data shard consists of one master node and up to four read replicas.

  • Master node: Handles write requests and shares the read workload with read replicas. The master node is always deployed in the primary zone.

  • Read replica: Handles read requests. Read replicas synchronize data from the master node by using a star replication topology. You can have 1 to 4 read replicas, and the number can be dynamically adjusted. You can also deploy read replicas in the secondary zone for disaster recovery.

High availability service

  • Automatically monitors the health of each node. If a node becomes unavailable, the high availability system initiates a failover or rebuilds a read replica and updates the routing and weight information accordingly.

  • Failover logic for master node selection: Data integrity is the priority. The high availability system promotes the read replica with the most complete data to be the new master node.

Note
  • If the instance is deployed in a single availability zone, all nodes are located in the primary zone, and the instance provides only an endpoint for the primary zone.

  • If the instance is deployed in a dual-zone configuration, separate endpoints are provided for the primary and secondary zones. Both endpoints support read and write operations. Read requests from the primary zone are routed to the master node or read replicas within that zone. Read requests from the secondary zone are routed only to the read replicas in that zone to ensure proximity-based access. All write requests are routed to the master node in the primary zone. If all read replicas in the secondary zone become unavailable, the system routes read requests from that zone to the master node to ensure business continuity.
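The dual-zone routing rules in the note above can be expressed as a small selection function. This is a sketch of the documented rules for a single shard, not the proxy's actual code. The `Node` type and the `eligible_nodes` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str
    role: str  # "master" or "replica"
    zone: str  # "primary" or "secondary"
    available: bool = True


def eligible_nodes(nodes: List[Node], endpoint_zone: str, is_write: bool) -> List[Node]:
    """Return the nodes that may serve a request arriving at the given zone's endpoint."""
    master = [n for n in nodes if n.role == "master"]
    if is_write:
        # All writes go to the master node in the primary zone.
        return master
    local_replicas = [
        n for n in nodes
        if n.role == "replica" and n.zone == endpoint_zone and n.available
    ]
    if endpoint_zone == "primary":
        # Primary-zone reads are shared by the master and the local read replicas.
        return master + local_replicas
    # Secondary-zone reads stay local, falling back to the master
    # only when no read replica in that zone is available.
    return local_replicas or master
```

The fallback on the last line mirrors the business-continuity rule: reads cross zones only when the secondary zone has no available read replica.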

Recommendations and usage notes

  • If a read replica fails, requests are forwarded to other nodes. If all read replicas become unavailable, all read requests are forwarded to the master node. If read replicas fail, the load on the master node increases, which can lengthen its response time. Therefore, we recommend that you use multiple read replicas for read-heavy workloads.

  • If a read replica fails, the high availability system suspends service to the failed node and launches a new read replica. This process involves resource allocation, instance creation, data synchronization, and service loading. The time required depends on the workload and data volume. Tair (Redis OSS-compatible) does not guarantee a recovery time objective for read replicas.

  • A dual-zone read/write splitting deployment requires the primary zone to have at least one master node and one read replica. Before you enable read/write splitting for a dual-zone standard architecture instance that has one master node in the primary zone and one replica node in the secondary zone, you must add a replica node to the primary zone to ensure it contains two nodes. Then, you can enable read/write splitting.

  • Some scenarios, such as a high-availability failover of the master node, trigger a full data synchronization on a read replica. During a full synchronization, the read replica is unavailable and returns the "-LOADING Redis is loading the dataset in memory\r\n" error message.

  • Some read commands have special forwarding rules in the read/write splitting architecture. For example, the SCAN command is forwarded to the master node for execution, while the proxy distributes the HSCAN, SSCAN, and ZSCAN commands evenly across the master node and read replicas based on a slot modulo calculation. For the complete set of forwarding rules, see Proxy routing rules.
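Because a read replica can return a LOADING error during a full synchronization, an application may want to retry reads briefly instead of failing at once. The following is a minimal, client-agnostic sketch under stated assumptions: the helper `read_with_retry`, its parameters, and the backoff values are hypothetical, and whether a retry lands on a different node depends on the proxy.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def read_with_retry(
    read: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 5,
    base_delay: float = 0.1,
) -> T:
    """Call read() and retry with exponential backoff on the given exception types.

    retry_on should hold the exception class that the Redis client in use
    raises for a LOADING reply. The last failure is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return read()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # Wait 0.1 s, 0.2 s, 0.4 s, ... with the default base_delay.
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

With the redis-py client, the exception class for a LOADING reply is believed to be `redis.exceptions.BusyLoadingError`; verify this against the client version in use before relying on it.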

Prerequisites

Before you begin, make sure that:

  • The instance is deployed in cloud-native mode

  • The instance is a Redis Open-Source Edition or Tair (Enterprise Edition) DRAM-optimized or persistent memory-optimized instance

  • The instance has at least 1 GB of memory

  • The instance is a high availability instance

Procedure

FAQ

  • Q: Does enabling read/write splitting for a standard architecture instance increase its overall bandwidth?

    A: Yes. After you enable read/write splitting, the instance's theoretical total bandwidth is the bandwidth of the instance specification multiplied by the total number of nodes (for example, 96 MB/s × 3 nodes = 288 MB/s). The added proxy nodes forward most read requests to the read replicas, which reduces the bandwidth pressure on the master node. However, the actual bandwidth depends on factors such as the request pattern and the client, so measure it with stress tests.

  • Q: After I enable read/write splitting for a standard architecture instance, can I change its architecture to the cluster architecture?

    A: Yes. You must first disable read/write splitting and then change the instance architecture.

  • Q: How do I check whether read/write splitting is enabled?

    A: You can go to the Node Management page of the instance to check whether the read/write splitting option is enabled.

  • Q: Why are read requests not routed to my read replicas?

    A: In a dual-zone read/write splitting architecture, the primary and secondary zones have separate endpoints. Read requests are routed only to the master node or read replicas within the same zone. If you use only the endpoint for the primary zone, no read requests are routed to the read replicas in the secondary zone. To enable proximity-based access and load balancing, you must explicitly differentiate the endpoints for the primary and secondary zones in your application code and route requests for the secondary zone to its specific endpoint.
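One way to differentiate the endpoints in application code is to select the endpoint from the zone in which the application itself runs. The following is a minimal sketch. The hostnames, the `APP_ZONE` environment variable, and the `endpoint_for_zone` helper are hypothetical placeholders; take the real endpoints from the instance's connection information.

```python
import os
from typing import Mapping

# Hypothetical placeholder hostnames, not real endpoints.
ENDPOINTS = {
    "primary": "primary-endpoint.example.com",
    "secondary": "secondary-endpoint.example.com",
}


def endpoint_for_zone(app_zone: str, endpoints: Mapping[str, str] = ENDPOINTS) -> str:
    """Return the instance endpoint that matches the application's own zone."""
    try:
        return endpoints[app_zone]
    except KeyError:
        raise ValueError(
            f"unknown zone {app_zone!r}; expected one of {sorted(endpoints)}"
        ) from None


# APP_ZONE is an assumed deployment-time setting; it defaults to the primary zone.
host = endpoint_for_zone(os.environ.get("APP_ZONE", "primary"))
print(f"connect to {host}")
```

The returned host can be passed to any Redis-compatible client. Application instances deployed in the secondary zone then read from the read replicas in that zone, while their writes still reach the master node in the primary zone.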