Offline integration with OSS-based Hive foreign tables

Before you can use OSS-based Hive foreign tables for offline integration in Dataphin with an E-MapReduce 5.x Hadoop compute engine, you must configure the required parameters.

Configuration instructions

Configure the required parameters in the core-site.xml file of the Hive data source or Hadoop compute engine, and then upload the file.

  • If Dataphin and OSS are in the same region, configure the fs.oss.endpoint parameter in the core-site.xml file.

  • If Dataphin and OSS are in different regions, configure the fs.oss.accessKeyId and fs.oss.accessKeySecret parameters in addition to fs.oss.endpoint.

Note

If you use an internal endpoint, you do not need to configure fs.oss.accessKeyId and fs.oss.accessKeySecret.

Configuration examples

  • Dataphin and OSS are in the same region.

    <property>
        <name>fs.oss.endpoint</name>
        <value>oss-cn-hangzhou-internal.aliyuncs.com</value>
    </property>

  • Dataphin and OSS are in different regions.

    <property>
        <name>fs.oss.endpoint</name>
        <value>oss-cn-hangzhou-internal.aliyuncs.com</value>
    </property>
    <property>
        <name>fs.oss.accessKeyId</name>
        <value>ak</value>
    </property>
    <property>
        <name>fs.oss.accessKeySecret</name>
        <value>ks</value>
    </property>
    Note
    • Set the value of fs.oss.endpoint to the endpoint for your region. For more information, see Regions and endpoints.

    • The values ak and ks shown for fs.oss.accessKeyId and fs.oss.accessKeySecret are placeholders. Replace them with your AccessKey ID and AccessKey secret. For more information about how to obtain an AccessKey, see Create AccessKey.
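The fragments above show only the individual properties. In a standard Hadoop core-site.xml file, properties sit inside the root configuration element, so a complete file for the same-region case might look like the following minimal sketch. The endpoint value is the sample value used above, and the comment about existing properties is an assumption about your cluster's file, not something Dataphin requires.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- Properties already present in your cluster's core-site.xml stay here unchanged. -->

    <!-- OSS endpoint; replace the sample value with the endpoint for your region. -->
    <property>
        <name>fs.oss.endpoint</name>
        <value>oss-cn-hangzhou-internal.aliyuncs.com</value>
    </property>
</configuration>
```

For the cross-region case, the fs.oss.accessKeyId and fs.oss.accessKeySecret properties would be added as further property elements inside the same configuration element.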

FAQ

Error during offline integration: com.alibaba.dt.pipeline.plugin.center.exception.DataXException: Code:[HDFSConnection-06], Description:[An IO exception occurred while establishing a connection with HDFS.]. - java.io.IOException: No FileSystem for scheme: oss

Add the following configuration to your core-site.xml file to resolve this error:

<property>
    <name>fs.oss.impl</name>
    <value>com.aliyun.jindodata.oss.JindoOssFileSystem</value>
</property>
<property>
    <name>fs.AbstractFileSystem.oss.impl</name>
    <value>com.aliyun.jindodata.oss.OSS</value>
</property>
<property>
    <name>fs.jindofsx.data.cache.enable</name>
    <value>false</value>
</property>
<property>
    <name>fs.jindofsx.namespace.rpc.address</name>
    <value>emr-cluster:8101</value>
</property>
Important

The value emr-cluster:8101 is an example. Set the value of fs.jindofsx.namespace.rpc.address based on your cluster configuration. If you are unsure of the value, contact the EMR product helpdesk.

Error during offline integration: Description:[An IO exception occurred while establishing a connection with HDFS.]. - java.io.IOException: ERROR: not found login secrets, please configure the accessKeyId and accessKeySecret

Add the following configuration to your core-site.xml file to resolve this error:

<property>
    <name>fs.jindofsx.namespace.rpc.address</name>
    <value>emr-cluster:8101</value>
</property>