文档

Oracle2PolarDB:Polardb-O兼容性之fetchsize

更新时间:
一键部署

适用于

  • 本文适用平台为公有云、专有云、PolarDB Stack和纯软。
  • PolarDB高度兼容Oracle版本。

目的

如何通过JDBC驱动导出大量数据。

解决办法

Oracle迁移到Polardb-o后,通过JDBC驱动导出大量数据时出现OOM错误,报错如下。但是在Oracle中导出小表和大表都不会出现OOM的报错。

步骤一:Oracle JDBC驱动的行为

默认情况下Oracle JDBC执行完一个查询后,会从数据库一次返回10行结果集,10行是默认的Oracle row-prefetch值。可以通过修改row-prefetch值来修改一次从数据库返回的数据行数。详情请参见Oracle JDBC官方文档或以下内容。

Row Fetch Size
By default, when Oracle JDBC runs a query, it retrieves a result set of 10 rows at a time from the database cursor. This is the default Oracle row fetch size value. You can change the number of rows retrieved with each trip to the database cursor by changing the row fetch size value.
Standard JDBC also enables you to specify the number of rows fetched with each database round-trip for a query, and this number is referred to as the fetch size. In Oracle JDBC, the row-prefetch value is used as the default fetch size in a statement object. Setting the fetch size overrides the row-prefetch setting and affects subsequent queries run through that statement object.

步骤二:PolarDB JDBC驱动行为

PolarDB的JDBC驱动是基于PG的,PolarDB默认一次性获取所有的结果集,所以当结果集超出JVM内存时,就会出现OOM报错,详情请参见PG的JDBC文档或以下内容。

Getting results based on a cursor
By default the driver collects all the results for the query at once. This can be inconvenient for large data sets so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows.

步骤三:如何适配Oracle JDBC行为

可以通过PolarDB JDBC的cursor方式来适配Oracle的行为。将代码更改为游标模式,将语句的fetchsize设置为适当的大小。注意fetchsize设置为0,将会导致所有行被缓存(默认行为),如下所示。

// make sure autocommit is off
conn.setAutoCommit(false);
Statement st = conn.createStatement();

// Turn use of the cursor on.
st.setFetchSize(50);
ResultSet rs = st.executeQuery("SELECT * FROM mytable");
while (rs.next())
{
    System.out.print("a row was returned.");
}
rs.close();

// Turn the cursor off.
st.setFetchSize(0);
rs = st.executeQuery("SELECT * FROM mytable");
while (rs.next())
{
    System.out.print("many rows were returned.");
}
rs.close();

// Close the statement.
st.close();

注意:必须关闭autocommit才能保证JDBC cursor方式生效。

步骤四:测试性能影响

异构迁移时最好从默认设置开始,而不是尝试调整任何可能的参数,默认值通常适用于最常见的情况。以下步骤使用相同Java代码的示例,用于测试Oracle和PolarDB中的JDBC中的fetchsize的不同表现。

准备测试脚本

  1. 准备ojdbc.jar和postgresql.jar,并配置到CLASSPATH。
  2. 提供连接用户、密码和JDBC URL参数即可调用代码,代码执行简单查询,打印出表字段值和执行时的时间戳。创建的TEST.java文件内容如下所示。
    import java.io.*;
    import java.sql.*;
    import java.util.*;
    import java.time.*;
    import oracle.jdbc.*;
    public class TEST {
    public static void println(String text){
    System.out.println(LocalDateTime.now().toString() +":" + text);
    }
    public static void main(String[] args)
    throws SQLException,InterruptedException {
    try (Connection c = (Connection) DriverManager.getConnection(
      args[2],args[0],args[1])
    ) {
    c.setAutoCommit(false);
    try (PreparedStatement s = c.prepareStatement(
      "select ID,INFO,test_time(ID) t from test"
      )) {
      println(" PRS "+s);
      try ( ResultSet rs = s.executeQuery() ) {
      println(" EXE "+rs);
      while ( rs.next()) {
        println(" FCH "+rs.getLong("ID")+" "+rs.getString("INFO")+" "+rs.getTimestamp("T"));
      }
      System.out.println("fetch size: "+rs.getFetchSize());
      }
    }
    }
    }
    }

Oracle环境准备

依次执行以下SQL语句,准备Oracle的测试数据。

ALTER SESSION SET current_schema := wangt;
CREATE OR REPLACE FUNCTION test_time (
 id number
)
RETURN timestamp
BEGIN
 RETURN current_timestamp;
END;
/

CREATE TABLE wangt.test (
 id int,
 info varchar2(32)
);
INSERT INTO wangt.test SELECT rownum, lower(DBMS_RANDOM.STRING('x', 32)) FROM (  SELECT level  FROM dual  CONNECT BY level <= 1000 ), (  SELECT level  FROM dual  CONNECT BY level <= 1000 );
COMMIT;

PolarDB环境准备

依次执行以下SQL语句,准备PolarDB的测试数据。

create or replace function test_time(id int) returns timestamp as
$$
begin
return clock_timestamp();
end;
$$
language plpgsql;
create table test (id int,info varchar(32));
insert into test select generate_series(1,1000000),md5(random()::text);

运行脚本

为了效果测试我们对JVM进行会话级别设置,JVM内存的参数有四个:
  • -Xmx:Java Heap最大值,默认值为物理内存的1/4,最佳值应该以物理内存大小及计算机内其他内存开销为准。
  • -Xms:Java Heap初始值,Server端JVM最好将-Xms和-Xmx设为相同值,开发测试机JVM可以保留默认值。
  • -Xmn:Java Heap Young区大小,不熟悉最好保留默认值。
  • -Xss:每个线程的Stack大小,不熟悉最好保留默认值。

参考以下步骤,分别在Oracle和PolarDB环境中进行测试:

  • Oracle环境中测试
    1. 在Oracle环境中执行以下命令,将输出结果保存到a.txt文件。
      java -Xmx64m -Xms32m -Xmn32m -Xss16m TEST "wangt" "wangt" "jdbc:oracle:thin:@(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=orcl))(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=1521)))">a.txt
    2. 执行以下命令,查看执行结果的a.txt文件。
      head -10 a.txt && tail -10 a.txt
      系统显示类似如下,Oracle的默认fetchsize是10,查询返回耗时在15秒左右。
      2020-10-27T19:00:57.813: PRS oracle.jdbc.driver.OraclePreparedStatementWrapper@71c8becc
      2020-10-27T19:00:57.870: EXE oracle.jdbc.driver.ForwardOnlyResultSet@399f45b1
      2020-10-27T19:00:57.874: FCH 348 zlnhl04fu9k2se5xnrc01cuzic8x9kwr 2020-10-27 19:00:57.825025
      2020-10-27T19:00:57.874: FCH 349 ua4h1bbcc6yjdltc6kz996vhywfuk16o 2020-10-27 19:00:57.825047
      2020-10-27T19:00:57.874: FCH 350 koktqha0xakfxobpx86irs9s7jcmci1w 2020-10-27 19:00:57.82505
      2020-10-27T19:00:57.875: FCH 351 uid423c4vjo6v77w4ldctghr7wg3zbxs 2020-10-27 19:00:57.825053
      2020-10-27T19:00:57.875: FCH 352 txjzsousrvpmzgpc4qy6yjo2ft71faxo 2020-10-27 19:00:57.825055
      2020-10-27T19:00:57.875: FCH 353 g85nw1pwg2ztx73bme9xulu8qq08ysh8 2020-10-27 19:00:57.825056
      2020-10-27T19:00:57.875: FCH 354 3uawie8lsxzwqse27gzavsw7y6vvt65j 2020-10-27 19:00:57.825058
      2020-10-27T19:00:57.875: FCH 355 x9pflcxfc9tle8ay70iukdf216rcke5q 2020-10-27 19:00:57.82506
      2020-10-27T19:01:12.202: FCH 998820 kjc9qbehqs397fmsvgd7d0lnsdldyrs7 2020-10-27 19:01:12.202311
      2020-10-27T19:01:12.202: FCH 998821 eb1wrmlwrs28351xgldrg6ydxymo36p5 2020-10-27 19:01:12.202314
      2020-10-27T19:01:12.202: FCH 998822 356z27pm9gfg5ml9zy4exrkb9olutb1u 2020-10-27 19:01:12.202316
      2020-10-27T19:01:12.202: FCH 998823 y975cqjp2s7mtjmwwr30xagcurtvjqg7 2020-10-27 19:01:12.202317
      2020-10-27T19:01:12.202: FCH 998824 pnlid4z1rc1qa785gpvrli6t010vh0xr 2020-10-27 19:01:12.202319
      2020-10-27T19:01:12.202: FCH 998825 gt56olyn34r8hohagom49n0qoleu4wt7 2020-10-27 19:01:12.202321
      2020-10-27T19:01:12.202: FCH 998826 56xzs6tjtmz36o3hqm0qnyi4ek1tirv9 2020-10-27 19:01:12.202323
      2020-10-27T19:01:12.202: FCH 998827 bo8avbuabm8gb47e8flsubckqysayot5 2020-10-27 19:01:12.202324
      2020-10-27T19:01:12.202: FCH 998828 b8e2zys1896z2pjnzmlc37t1n1ltzofn 2020-10-27 19:01:12.202326
      fetch size: 10
  • PolarDB环境中测试
    1. 在Oracle环境中执行以下命令。
      java -Xmx64m -Xms32m -Xmn32m -Xss16m TEST "postgres" "wangt" "jdbc:postgresql://localhost:5432/wangt"
      查看执行结果显示类似如下。
      2020-10-27T19:06:08.935: PRS select ID,INFO,test_time(ID) t from test
      Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
            at java.lang.Class.newReflectionData(Class.java:2511)
            at java.lang.Class.reflectionData(Class.java:2503)
            at java.lang.Class.privateGetDeclaredFields(Class.java:2577)
            at java.lang.Class.getDeclaredField(Class.java:2068)
            at java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl$1.run(AtomicReferenceFieldUpdater.java:316)
            at java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl$1.run(AtomicReferenceFieldUpdater.java:314)
            at java.security.AccessController.doPrivileged(Native Method)
            at java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl.<init>(AtomicReferenceFieldUpdater.java:313)
            at java.util.concurrent.atomic.AtomicReferenceFieldUpdater.newUpdater(AtomicReferenceFieldUpdater.java:109)
            at java.sql.SQLException.<clinit>(SQLException.java:372)
            at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2255)
            at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:323)
            at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:473)
            at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:393)
            at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:164)
            at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:114)
              at TEST.main(TEST.java:20)
    2. 增大JVM内存执行以下命令,将输出结果保存到d.txt文件。
      java -Xmx256m -Xms256m -Xmn32m -Xss16m TEST "postgres" "wangt" "jdbc:postgresql://localhost:5432/wangt">d.txt
    3. 执行以下命令,查看执行结果的d.txt文件。
      head -10 d.txt && tail -10 d.txt
      系统显示类似如下,发现查询返回耗时1秒。
      2020-10-27T20:08:00.495: PRS select ID,INFO,test_time(ID) t from test
      2020-10-27T20:08:01.566: EXE org.postgresql.jdbc.PgResultSet@2b80d80f
      2020-10-27T20:08:01.566: FCH 599041 28696429f925919ed00293889f78eaa7 2020-10-27 20:08:00.502665
      2020-10-27T20:08:01.567: FCH 599042 4922cb2967e21b9f3fb07030208faa07 2020-10-27 20:08:00.502772
      2020-10-27T20:08:01.567: FCH 599043 e008cc6d02002d9961c8404ebec20325 2020-10-27 20:08:00.502774
      2020-10-27T20:08:01.567: FCH 599044 a376cae9d03f4ffda7290531784f3e66 2020-10-27 20:08:00.502774
      2020-10-27T20:08:01.567: FCH 599045 ee713dcf6f7b71fb4db1b43d6e117b5b 2020-10-27 20:08:00.502775
      2020-10-27T20:08:01.568: FCH 599046 52e831df5cc6c475a35bdb68a88b8a76 2020-10-27 20:08:00.502776
      2020-10-27T20:08:01.568: FCH 599047 db802d43205ce2533ca28e1f1f18c498 2020-10-27 20:08:00.502777
      2020-10-27T20:08:01.568: FCH 599048 eabf53bf0d7a553520e91a9ec4b788c3 2020-10-27 20:08:00.502778
      2020-10-27T20:08:06.799: FCH 599032 5036a4f80c0d91001e88fc72f4166cb1 2020-10-27 20:08:01.552726
      2020-10-27T20:08:06.799: FCH 599033 cb619178166e82eac39cad85b555d94d 2020-10-27 20:08:01.552727
      2020-10-27T20:08:06.799: FCH 599034 e987a0a1b5cc16cdd3d8ad0c27c2bdc7 2020-10-27 20:08:01.552728
      2020-10-27T20:08:06.799: FCH 599035 2774edc4ebd0336fcb6540a458e8e9a0 2020-10-27 20:08:01.552729
      2020-10-27T20:08:06.799: FCH 599036 00935f16045911396f08b9e8c9918c3e 2020-10-27 20:08:01.552729
      2020-10-27T20:08:06.799: FCH 599037 d90671309c1a88587911d43ba2e47f57 2020-10-27 20:08:01.55273
      2020-10-27T20:08:06.799: FCH 599038 fa2e74464b70cae4133b69f826d16da8 2020-10-27 20:08:01.552731
      2020-10-27T20:08:06.799: FCH 599039 f02e575dc05b43934066a6a0b7924be9 2020-10-27 20:08:01.552732
      2020-10-27T20:08:06.799: FCH 599040 c8f5ded8fa63f609997290c6dd3f3962 2020-10-27 20:08:01.552732
      fetch size: 0

修改fetchsize后运行脚本

  • 将Java的TEST脚本文件的fetchsize修改为10000,脚本内容如下所示。
    import java.io.*;
    import java.sql.*;
    import java.util.*;
    import java.time.*;
    import oracle.jdbc.*;
    public class TEST {
    public static void println(String text){
      System.out.println(LocalDateTime.now().toString() +":" + text);
    }
    public static void main(String[] args)
    throws SQLException,InterruptedException {
      try (Connection c = (Connection) DriverManager.getConnection(
        args[2],args[0],args[1]) 
      ) {
      c.setAutoCommit(false);
      try (PreparedStatement s = c.prepareStatement(
        "select ID,INFO,test_time(ID) t from test"
        )) {
        s.setFetchSize(10000);
        println(" PRS "+s);
        try ( ResultSet rs = s.executeQuery() ) {
        println(" EXE "+rs);
        while ( rs.next()) {
          println(" FCH "+rs.getLong("ID")+" "+rs.getString("INFO")+" "+rs.getTimestamp("T"));
        }
        System.out.println("fetch size: "+rs.getFetchSize());
        }
      }
      }
    }
    }
  • 执行脚本文件:
    • Oracle环境:
      1. 执行以下命令,将输出结果保存到c.txt文件。
        java -Xmx64m -Xms32m -Xmn32m -Xss16m TEST "wangt" "wangt" "jdbc:oracle:thin:@(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=orcl))(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=1521)))">c.txt
      2. 执行以下命令,查看执行结果。
        head -10 c.txt && tail -10 c.txt
        系统显示类似如下,发现fetchsize改为10000后,Oracle的查询耗时在7秒左右。
        2020-10-27T19:17:01.294: PRS oracle.jdbc.driver.OraclePreparedStatementWrapper@71c8becc
        2020-10-27T19:17:01.377: EXE oracle.jdbc.driver.ForwardOnlyResultSet@399f45b1
        2020-10-27T19:17:01.381: FCH 348 zlnhl04fu9k2se5xnrc01cuzic8x9kwr 2020-10-27 19:17:01.296989
        2020-10-27T19:17:01.381: FCH 349 ua4h1bbcc6yjdltc6kz996vhywfuk16o 2020-10-27 19:17:01.297038
        2020-10-27T19:17:01.381: FCH 350 koktqha0xakfxobpx86irs9s7jcmci1w 2020-10-27 19:17:01.297042
        2020-10-27T19:17:01.381: FCH 351 uid423c4vjo6v77w4ldctghr7wg3zbxs 2020-10-27 19:17:01.297044
        2020-10-27T19:17:01.381: FCH 352 txjzsousrvpmzgpc4qy6yjo2ft71faxo 2020-10-27 19:17:01.297046
        2020-10-27T19:17:01.381: FCH 353 g85nw1pwg2ztx73bme9xulu8qq08ysh8 2020-10-27 19:17:01.297048
        2020-10-27T19:17:01.381: FCH 354 3uawie8lsxzwqse27gzavsw7y6vvt65j 2020-10-27 19:17:01.29705
        2020-10-27T19:17:01.381: FCH 355 x9pflcxfc9tle8ay70iukdf216rcke5q 2020-10-27 19:17:01.297051
        2020-10-27T19:17:08.994: FCH 998820 kjc9qbehqs397fmsvgd7d0lnsdldyrs7 2020-10-27 19:17:08.943294
        2020-10-27T19:17:08.994: FCH 998821 eb1wrmlwrs28351xgldrg6ydxymo36p5 2020-10-27 19:17:08.943296
        2020-10-27T19:17:08.994: FCH 998822 356z27pm9gfg5ml9zy4exrkb9olutb1u 2020-10-27 19:17:08.943298
        2020-10-27T19:17:08.994: FCH 998823 y975cqjp2s7mtjmwwr30xagcurtvjqg7 2020-10-27 19:17:08.9433
        2020-10-27T19:17:08.994: FCH 998824 pnlid4z1rc1qa785gpvrli6t010vh0xr 2020-10-27 19:17:08.943301
        2020-10-27T19:17:08.994: FCH 998825 gt56olyn34r8hohagom49n0qoleu4wt7 2020-10-27 19:17:08.943303
        2020-10-27T19:17:08.994: FCH 998826 56xzs6tjtmz36o3hqm0qnyi4ek1tirv9 2020-10-27 19:17:08.943305
        2020-10-27T19:17:08.994: FCH 998827 bo8avbuabm8gb47e8flsubckqysayot5 2020-10-27 19:17:08.943307
        2020-10-27T19:17:08.994: FCH 998828 b8e2zys1896z2pjnzmlc37t1n1ltzofn 2020-10-27 19:17:08.94331
        fetch size: 10000
    • PolarDB环境:
      1. 执行以下命令,将输出结果保存到b.txt文件。
        java -Xmx64m -Xms32m -Xmn32m -Xss16m TEST "postgres" "wangt" "jdbc:postgresql://localhost:5432/wangt">b.txt
      2. 执行以下命令,查看执行结果。
        head -10 b.txt && tail -10 b.txt
        系统显示类似如下,发现fetchsize改为10000后,PolarDB的查询耗时在6秒左右。
        2020-10-27T19:11:29.156: PRS select ID,INFO,test_time(ID) t from test
        2020-10-27T19:11:29.176: EXE org.postgresql.jdbc.PgResultSet@3ab39c39
        2020-10-27T19:11:29.177: FCH 193921 36f65536abd922afac4c1e6e6bea58ba 2020-10-27 19:11:29.159064
        2020-10-27T19:11:29.177: FCH 193922 0c43c054d8037534a8eb21eb8288df91 2020-10-27 19:11:29.159108
        2020-10-27T19:11:29.177: FCH 193923 3a508526310438ee1eff4712a5b33d55 2020-10-27 19:11:29.159109
        2020-10-27T19:11:29.177: FCH 193924 a58c63770c8369b24fc9be8b0ec22d88 2020-10-27 19:11:29.15911
        2020-10-27T19:11:29.177: FCH 193925 2f904150185012521097af7a4e8043ad 2020-10-27 19:11:29.159111
        2020-10-27T19:11:29.177: FCH 193926 c4651a1652f4ee273ed7c4dbec67932f 2020-10-27 19:11:29.159112
        2020-10-27T19:11:29.177: FCH 193927 cdd9dd31ed70422bce32ab2d7796ae4e 2020-10-27 19:11:29.159113
        2020-10-27T19:11:29.177: FCH 193928 6a62f6d6364a0ed5886ee2efc41ca8ed 2020-10-27 19:11:29.159113
        2020-10-27T19:11:35.592: FCH 193912 82081ccea01df08169ad82dd65e59712 2020-10-27 19:11:35.542907
        2020-10-27T19:11:35.592: FCH 193913 e57c023d05555823c4f3c3cc05e25ef1 2020-10-27 19:11:35.542908
        2020-10-27T19:11:35.592: FCH 193914 868221b036df35c947829c3824405042 2020-10-27 19:11:35.542909
        2020-10-27T19:11:35.592: FCH 193915 ce5a77136d5179d61ea5e132807d74ef 2020-10-27 19:11:35.54291
        2020-10-27T19:11:35.592: FCH 193916 4a50939d4e0f72debe607435fcf65b08 2020-10-27 19:11:35.542911
        2020-10-27T19:11:35.592: FCH 193917 9fc2a7f55417617e86801b518360a7fd 2020-10-27 19:11:35.542911
        2020-10-27T19:11:35.592: FCH 193918 345439d6d22816b0666d31d12fe4dc6a 2020-10-27 19:11:35.542912
        2020-10-27T19:11:35.592: FCH 193919 c5f051e7feba37061e7498b264312bac 2020-10-27 19:11:35.542913
        2020-10-27T19:11:35.592: FCH 193920 08b6669f4da983913b941c0dd82072b0 2020-10-27 19:11:35.542914
        fetch size: 10000
JDBC解析SQL语句后,为每个Statement(包括PreparedStatement和CallableStatement)分配了两个Buffer缓存数据,即byte[]和char[]。字符类型的数据(CHAR、VARCHAR2、NCHAR和etc)缓存在char[]中,其他类型的数据缓存在byte[]中。在SQL语句解析后,语句所查询的列的数据类型就已经确定,JDBC会根据这些信息和Fetch Size一起计算出缓存的大小,并分配内存。所以如果不需要查询某张表的所有列时,使用SELECT * FROM XXX的方式是一种浪费内存的行为,特别是表的列数多且数据量大的时候,很容易造成OOM,代码也很少有select *的方式。
通过以上测试可知,fetch size的设置影响到JVM内存消耗和查询的响应时间,设置fetch size时要权衡两者。一般JDBC默认的fetch size是10,这个值比较小,可以防止查询时out of momory的问题,但是程序的响应时间会变长。字段多的表或者行数大的表需要一个小的fetch size值来降低查询结果集的JVM内存消耗;但是字段少或者行数据小的表需要一个大的fetch size值来降低查询结果集的响应时间,需要您根据实际情况进行设置。要设置fetch size,请在执行查询之前,对statement对象调用setFetchSize()。如果fetch size为N,那么每次往返数据库能返回N行数据。

参考文档

更多信息您可以参见以下文档:

  • 本页导读
文档反馈