Notes on Jedis client Issue #2504 - Tair (Redis®-compatible) - Alibaba Cloud Help Center

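Jedis community Issue #2504 describes a problem in older versions of JedisCluster: the routing table that JedisCluster caches locally may fail to update, which can cause requests to be sent to the wrong target. For example, after node a (address 10.10.10.10:6379) is taken offline from cluster A, in some environments that IP address may be reassigned to a new cluster. JedisCluster may then send requests meant for cluster A to the new cluster behind that IP address, and the queries fail. The problem is fixed in newer versions of Jedis; upgrade your client to Jedis 3.10.0 or later.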

Upgrade recommendations

Jedis 4.x.x or 5.x.x: not affected, but upgrading to the latest Jedis version is recommended.
Jedis 2.x.x or 3.x.x: upgrade to Jedis 3.10.0 or later.
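
The version rule above can be sketched as a small check. This is only an illustration: the class name JedisVersionCheck and the method isAffected are hypothetical, the version string is supplied by the caller (for example, copied from your build file), and treating every major version of 4 or higher as unaffected is an assumption that extends the 4.x.x and 5.x.x statement above.

```java
final class JedisVersionCheck {
    /**
     * Returns true if the given Jedis version string (for example "2.9.0")
     * is older than 3.10.0, the first release this page names as fixed.
     * Assumption: any major version >= 4 is treated as unaffected.
     */
    static boolean isAffected(String version) {
        String[] parts = version.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        if (major >= 4) {
            return false;          // 4.x.x and 5.x.x are not affected
        }
        if (major == 3) {
            return minor < 10;     // 3.x.x is fixed starting with 3.10.0
        }
        return true;               // 2.x.x and older need the upgrade
    }

    public static void main(String[] args) {
        for (String v : new String[] {"2.9.0", "3.9.0", "3.10.0", "4.0.0", "5.0.0"}) {
            System.out.println(v + " affected: " + isAffected(v));
        }
    }
}
```

Only the major and minor components are compared, because the boundary named on this page (3.10.0) falls on a minor version.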

Note

The cluster proxy mode of Tair (Redis-compatible) is not affected by this problem.

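Details

Take Jedis 2.9.0, which does not contain the fix, as an example. JedisCluster's local routing-table management logic lives in JedisClusterInfoCache.java and relies mainly on the following two variables (JedisClusterInfoCache.java L22 ~ L23).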

private final Map<String, JedisPool> nodes = new HashMap<String, JedisPool>();
private final Map<Integer, JedisPool> slots = new HashMap<Integer, JedisPool>();

The nodes variable caches the mapping from host:port to JedisPool.
The slots variable caches the mapping from slot to JedisPool.

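Both variables are populated at initialization from the result of CLUSTER SLOTS. The client needs only one connection pool per node, but a shard node usually owns many slots, so the correspondence between nodes and slots changes as shard nodes are added or removed. Suppose the routing table of a cluster with 3 shard nodes is as follows: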

xxx 10.3.255.248:6379@13007 master,nofailover - 0 1738808475717 1 connected 0-5461
xxx 10.3.255.249:6379@13007 myself,master,nofailover - 0 0 1 connected 5462-10922
xxx 10.3.255.250:6379@13007 master,nofailover - 0 1738808474709 1 connected 10923-16383

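For this routing table, nodes holds three entries, one JedisPool per host:port (10.3.255.248:6379, 10.3.255.249:6379, and 10.3.255.250:6379), and slots maps each of the 16384 slots to the JedisPool of the node that owns it: slots 0-5461 to 10.3.255.248:6379, slots 5462-10922 to 10.3.255.249:6379, and slots 10923-16383 to 10.3.255.250:6379.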
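
To make the relationship between the two maps concrete, here is a minimal sketch that fills them from the 3-shard routing table above. It is a simplified model, not the Jedis implementation: the class RoutingTableSketch and its methods are hypothetical, and a plain String stands in for JedisPool so that the example runs without a Redis server.

```java
import java.util.HashMap;
import java.util.Map;

final class RoutingTableSketch {
    // Stand-in for Map<String, JedisPool>: host:port -> pool (a String here).
    final Map<String, String> nodes = new HashMap<>();
    // Stand-in for Map<Integer, JedisPool>: slot -> pool of the owning node.
    final Map<Integer, String> slots = new HashMap<>();

    /** Registers the node once, then points every slot in the range at its pool. */
    void assign(String hostPort, int firstSlot, int lastSlot) {
        String pool = nodes.computeIfAbsent(hostPort, k -> "pool(" + k + ")");
        for (int slot = firstSlot; slot <= lastSlot; slot++) {
            slots.put(slot, pool);
        }
    }

    /** Builds the maps for the 3-shard routing table shown in this section. */
    static RoutingTableSketch example() {
        RoutingTableSketch table = new RoutingTableSketch();
        table.assign("10.3.255.248:6379", 0, 5461);
        table.assign("10.3.255.249:6379", 5462, 10922);
        table.assign("10.3.255.250:6379", 10923, 16383);
        return table;
    }

    public static void main(String[] args) {
        RoutingTableSketch table = example();
        System.out.println("nodes: " + table.nodes.size());          // 3
        System.out.println("slots: " + table.slots.size());          // 16384
        System.out.println("slot 6000 -> " + table.slots.get(6000)); // pool(10.3.255.249:6379)
    }
}
```

Many slots share one pool, which is why nodes has 3 entries while slots has 16384.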

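The bug in older versions is that when the cluster's routing table changes, JedisCluster only picks up added nodes and never proactively releases nodes that are no longer valid. As a result, later routing-table refreshes may still be sent to a stale node. The code that JedisCluster uses to pull the routing table is shown below: getShuffledNodesPool reads the nodes variable, and because nodes itself is never pruned, the problem occurs.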

for (JedisPool jp : getShuffledNodesPool()) {
  try {
    jedis = jp.getResource();
    discoverClusterSlots(jedis);
    return;
  } catch (JedisConnectionException e) {
    // try next nodes
  } finally {
    if (jedis != null) {
      jedis.close();
    }
  }
}

public List<JedisPool> getShuffledNodesPool() {
  r.lock();
  try {
    List<JedisPool> pools = new ArrayList<JedisPool>(nodes.values());
    Collections.shuffle(pools);
    return pools;
  } finally {
    r.unlock();
  }
}
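
The effect of this add-only behavior can be modeled without a Redis cluster. The sketch below is a simplified model under stated assumptions, not Jedis code: AddOnlyNodeCache, refresh, shuffledNodeKeys, and size are hypothetical names, and a String stands in for JedisPool. It shows that a node removed from the cluster stays in the cache and can still be chosen as the target of a later refresh.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AddOnlyNodeCache {
    // host:port -> pool stand-in; entries are added but never removed.
    private final Map<String, String> nodes = new HashMap<>();

    /** Models the pre-fix refresh: unknown nodes are added, missing ones are kept. */
    void refresh(List<String> currentHostPorts) {
        for (String hostPort : currentHostPorts) {
            nodes.putIfAbsent(hostPort, "pool(" + hostPort + ")");
        }
    }

    /** Models getShuffledNodesPool: candidates for the next refresh, in random order. */
    List<String> shuffledNodeKeys() {
        List<String> keys = new ArrayList<>(nodes.keySet());
        Collections.shuffle(keys);
        return keys;
    }

    int size() {
        return nodes.size();
    }

    public static void main(String[] args) {
        AddOnlyNodeCache cache = new AddOnlyNodeCache();
        // Cluster with four shards.
        cache.refresh(Arrays.asList("10.3.255.248:6379", "10.3.255.249:6379",
                "10.3.255.250:6379", "10.3.0.3:6379"));
        // One shard is removed from the cluster, then the table is refreshed.
        cache.refresh(Arrays.asList("10.3.255.248:6379", "10.3.255.249:6379",
                "10.3.255.250:6379"));
        System.out.println("cached nodes: " + cache.size()); // still 4
        System.out.println("stale node is a refresh candidate: "
                + cache.shuffledNodeKeys().contains("10.3.0.3:6379")); // true
    }
}
```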

The problem is fixed in Jedis community PR #2462: JedisCluster now automatically removes stale entries from nodes.

The Jedis fix code is as follows.

      // Remove dead nodes according to the latest query
      Iterator<Entry<String, JedisPool>> entryIt = nodes.entrySet().iterator();
      while (entryIt.hasNext()) {
        Entry<String, JedisPool> entry = entryIt.next();
        if (!hostAndPortKeys.contains(entry.getKey())) {
          JedisPool pool = entry.getValue();
          try {
            if (pool != null) {
              pool.destroy();
            }
          } catch (Exception e) {
            // pass, may be this node dead
          }
          entryIt.remove();
        }
      }
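
The pruning idea in this fix can be tried in isolation with the following sketch. It is an illustration under stated assumptions rather than the Jedis code: PruningNodeCache and FakePool are hypothetical stand-ins (FakePool only records whether destroy() was called), and the set of live host:port keys is passed in directly instead of being parsed from a cluster query.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

final class PruningNodeCache {
    /** Hypothetical stand-in for JedisPool that only tracks destruction. */
    static final class FakePool {
        final String hostPort;
        boolean destroyed;

        FakePool(String hostPort) {
            this.hostPort = hostPort;
        }

        void destroy() {
            destroyed = true;
        }
    }

    final Map<String, FakePool> nodes = new HashMap<>();

    /** Adds unknown nodes, then destroys and removes every node not in the latest view. */
    void refresh(Collection<String> currentHostPorts) {
        Set<String> live = new HashSet<>(currentHostPorts);
        for (String hostPort : live) {
            nodes.computeIfAbsent(hostPort, FakePool::new);
        }
        Iterator<Map.Entry<String, FakePool>> it = nodes.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, FakePool> entry = it.next();
            if (!live.contains(entry.getKey())) {
                entry.getValue().destroy(); // release the stale pool
                it.remove();                // safe removal while iterating
            }
        }
    }
}
```

Removing through the iterator avoids a ConcurrentModificationException, which is also why the fix above calls entryIt.remove() rather than nodes.remove().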

How to reproduce

To reproduce the problem, you can run the following code with Jedis 2.9.0.

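Note

First create a cluster instance in direct connection mode with any initial number of nodes, then manually add and remove shards.

The program prints the nodes variable roughly every 60 seconds. When a shard is added, nodes grows; when a shard is removed, the change is not reflected in nodes.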

The reproduction code and expected output are as follows.

import java.util.Map;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class JedisClusterTest{
    /** Prints all node entries currently held by the JedisCluster. */
    private static void printNodes(JedisCluster jc) {
        Map<String, JedisPool> clusterNodes = jc.getClusterNodes();
        System.out.println("time: " + System.currentTimeMillis()/1000 + ", nodes map: " + clusterNodes.size());
        for (String key : clusterNodes.keySet()) {
            System.out.println(key);
        }
        System.out.println();
    }
    public static void main(String[] args) throws Exception {
        // Check that enough command-line arguments were given; otherwise print the usage.
        if (args.length < 3) {
            System.out.println("Usage: java -jar JedisClusterTest.jar <host> <port> <password>");
            return;
        }

        // Read the cluster instance's endpoint, port, and password from the command line.
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        String password = args[2];

        // Connect to the specified cluster instance.
        JedisCluster jc = new JedisCluster(new HostAndPort(host, port), 2000, 2000, 5, password,
            new JedisPoolConfig());

        try {
            for (int i = 0; i < Integer.MAX_VALUE; i++) {
                Thread.sleep(1000);
                jc.set("" + i, "" + i);
                if (i % 60 == 0) {
                    printNodes(jc);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Expected output:

time: 1738808668, nodes map: 3
10.3.255.248:6379
10.3.255.250:6379
10.3.255.249:6379

// Manually add one shard.

time: 1738808848, nodes map: 4
10.3.255.248:6379
10.3.255.250:6379
10.3.255.249:6379
10.3.0.3:6379

...

// Manually remove one shard; nodes does not shrink.
time: 1738811309, nodes map: 4
10.3.255.248:6379
10.3.255.250:6379
10.3.255.249:6379
10.3.0.3:6379

time: 1738811369, nodes map: 4
10.3.255.248:6379
10.3.255.250:6379
10.3.255.249:6379
10.3.0.3:6379