Install the file system SDK

This topic explains how to install and use the file system SDK for Apsara File Storage for HDFS.

Prerequisites

  • You have created an Apsara File Storage for HDFS file system and added a mount target. For more information, see Create a file system and Add a mount target.

  • JDK 1.8 or later is installed on your ECS instance.
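To check the JDK prerequisite before you continue, a small shell helper along the following lines may be useful. It is only a sketch: the function names jdk_major and jdk_is_supported are illustrative, and it assumes the version string follows either the legacy 1.x scheme (for example 1.8.0_292) or the newer scheme (for example 11.0.2).

```shell
#!/bin/sh
# Sketch: decide whether a Java version string satisfies the JDK 1.8+ prerequisite.
# jdk_major and jdk_is_supported are illustrative helper names, not part of any SDK.

# Print the major version for strings such as "1.8.0_292", "11.0.2", or "17".
jdk_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;                # legacy scheme: 1.8.x means Java 8
        *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;  # newer scheme: 11.x means Java 11
    esac
}

# Succeed (exit status 0) when the major version is 8 or higher.
jdk_is_supported() {
    major=$(jdk_major "$1")
    [ "$major" -ge 8 ] 2>/dev/null
}

# Usage on the ECS instance (java -version writes to stderr):
#   version=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
#   jdk_is_supported "$version" && echo "JDK $version is supported"
```

If the check fails, install or select a newer JDK before you configure Hadoop.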

Background

This topic demonstrates how to use the file system SDK with hadoop-mapreduce-examples. In this example, MapReduce runs in pseudo-distributed mode. For more information about the pseudo-distributed mode for MapReduce, see the Apache Hadoop documentation.

Configure Hadoop

This section explains how to configure Hadoop, using version 2.7.2 as an example.

  1. Download Hadoop. We recommend using version 2.7.2 or later.

  2. Run the following command to decompress the Hadoop package.

    tar -zxf hadoop-2.7.2.tar.gz
  3. Run the following command to set the HADOOP_HOME environment variable. Replace yourWorkingDir with the directory in which you decompressed the package.

    export HADOOP_HOME=yourWorkingDir/hadoop-2.7.2
  4. Run the cd hadoop-2.7.2 command to change to the Hadoop directory.

  5. Configure the hadoop-env.sh file.

    1. Open the hadoop-env.sh file.

      vim etc/hadoop/hadoop-env.sh
    2. Configure JAVA_HOME.

      # Replace ${JAVA_HOME} with the path to the Java JDK on your ECS instance.
      export JAVA_HOME=${JAVA_HOME}
  6. Configure the core-site.xml file.

    1. Run the following command to open the core-site.xml file.

      vim etc/hadoop/core-site.xml
    2. In the core-site.xml file, add the following properties inside the <configuration> element.

      <property>
           <name>fs.defaultFS</name>
           <value>dfs://f-xxxx.cn-xxxxx.dfs.aliyuncs.com:10290</value>
      </property>
      <property>
           <name>fs.dfs.impl</name>
           <value>com.alibaba.dfs.DistributedFileSystem</value>
      </property>
      <property>
           <name>fs.AbstractFileSystem.dfs.impl</name>
           <value>com.alibaba.dfs.DFS</value>
      </property>
      Note
      • f-xxxx.cn-xxxxx.dfs.aliyuncs.com is a placeholder for the mount address of your Apsara File Storage for HDFS file system. Replace it with your own mount address. To find the address, open the details page of the file system, click the Mount Targets tab, and check the Mount Address column. The address has a format similar to xxx.cn-hangzhou.dfs.aliyuncs.com.

      • You must sync the contents of core-site.xml to all nodes that depend on hadoop-common.
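Before you move on, it can help to confirm that core-site.xml really points at the dfs:// mount address. The following is a minimal sketch, not an official tool: default_fs_value and uses_dfs_scheme are illustrative names, and the parsing assumes that each <name> and <value> element sits on its own line, as in the properties shown above.

```shell
#!/bin/sh
# Sketch: read fs.defaultFS from a core-site.xml file and check its URI scheme.
# Assumes one <name> or <value> element per line; this is not a general XML parser.

# Print the <value> that follows <name>fs.defaultFS</name>.
default_fs_value() {
    awk '/<name>fs\.defaultFS<\/name>/ { found = 1; next }
         found && /<value>/ { sub(/.*<value>/, ""); sub(/<\/value>.*/, ""); print; exit }' "$1"
}

# Succeed when fs.defaultFS uses the dfs:// scheme.
uses_dfs_scheme() {
    case "$(default_fs_value "$1")" in
        dfs://*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Usage from the Hadoop directory:
#   uses_dfs_scheme etc/hadoop/core-site.xml && default_fs_value etc/hadoop/core-site.xml
```

Running the same check on each node is one way to confirm that the synced copies of core-site.xml match.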

Deploy dependencies

  1. Download the latest Java SDK for the Apsara File Storage for HDFS file system.

  2. Copy the downloaded SDK to the CLASSPATH of the Hadoop ecosystem components.

    Place the SDK JAR in the directory that contains hadoop-common-x.y.z.jar, and do this on every Hadoop node. For the MapReduce component, this directory is ${HADOOP_HOME}/share/hadoop/hdfs. Example command:

    cp aliyun-sdk-dfs-x.y.z.jar ${HADOOP_HOME}/share/hadoop/hdfs

    In this command, replace x.y.z with the SDK version number.
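For a cluster with more than one node, the copy step can be scripted. The sketch below is one possible approach under several assumptions: the node names, the DRY_RUN switch, and the deploy_sdk function are hypothetical, passwordless SSH is already set up, and HADOOP_HOME resolves to the same path on every node.

```shell
#!/bin/sh
# Sketch: copy the SDK JAR into ${HADOOP_HOME}/share/hadoop/hdfs on several nodes.
# deploy_sdk, DRY_RUN, and the node names are hypothetical. Assumes passwordless SSH
# and an identical HADOOP_HOME path on every node.

deploy_sdk() {
    jar=$1
    shift
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the command that would run.
            echo "scp $jar $node:${HADOOP_HOME}/share/hadoop/hdfs/"
        else
            scp "$jar" "$node:${HADOOP_HOME}/share/hadoop/hdfs/"
        fi
    done
}

# Usage (replace x.y.z and the node names with your own values):
#   DRY_RUN=1 deploy_sdk aliyun-sdk-dfs-x.y.z.jar node1 node2
```

Running with DRY_RUN=1 first prints the scp commands without executing them, which makes it easier to catch a wrong path or host name.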

Verify the installation

  1. Prepare the data.

    1. Create a directory.

      ${HADOOP_HOME}/bin/hadoop fs -mkdir -p inputDir
    2. Upload a file. For example, to upload a.txt:

      ${HADOOP_HOME}/bin/hadoop fs -put a.txt inputDir/
  2. Restart the YARN service.

    1. Stop the YARN service.

      ${HADOOP_HOME}/sbin/stop-yarn.sh
    2. Start the YARN service.

      ${HADOOP_HOME}/sbin/start-yarn.sh
  3. Run a sample test.

    • WordCount example

      ${HADOOP_HOME}/bin/hadoop jar \
      ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount \
      inputDir outputDir
    • Grep example

      ${HADOOP_HOME}/bin/hadoop jar \
      ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep \
      inputDir outputDirGrep "the"
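After a sample job finishes, you can inspect its output directory to confirm that the job wrote results to the file system. The following sketch shows one way to do this. The show_job_output function and the HADOOP_CMD override are illustrative, and the _SUCCESS marker and part-r-* file names assume the default MapReduce output committer and output format.

```shell
#!/bin/sh
# Sketch: confirm that a MapReduce job finished and print the first lines of its output.
# show_job_output and HADOOP_CMD are illustrative names. The _SUCCESS marker and the
# part-r-* file names assume default MapReduce output settings.

show_job_output() {
    dir=$1
    cmd=${HADOOP_CMD:-"${HADOOP_HOME}/bin/hadoop"}
    # "fs -test -e" returns 0 only if the path exists.
    if ! "$cmd" fs -test -e "$dir/_SUCCESS"; then
        echo "no _SUCCESS marker found in $dir" >&2
        return 1
    fi
    # The glob is quoted so that Hadoop, not the local shell, expands it.
    "$cmd" fs -cat "$dir/part-r-*" | head -n 10
}

# Usage:
#   show_job_output outputDir
#   show_job_output outputDirGrep
```

MapReduce refuses to write to an output directory that already exists, so remove the directory before you rerun an example, for instance with ${HADOOP_HOME}/bin/hadoop fs -rm -r outputDir.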

Next steps

For examples of using the Apsara File Storage for HDFS file system with the Hadoop FileSystem API, see SDK examples.