Install and use the file system SDK

This topic explains how to install and use the file system SDK for Apsara File Storage for HDFS.

Prerequisites

You have created an Apsara File Storage for HDFS file system and added a mount target. For more information, see Create a file system and Add a mount target.
Your ECS instance must have JDK 1.8 or later installed.
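
One way to check the JDK prerequisite is to parse the version string that `java -version` prints. The following is a minimal sketch: the function name `java_major_version` is illustrative and not part of the SDK, and it assumes the common version formats (`1.8.0_292` for Java 8, `11.0.2` for later releases).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the major Java version for a version string.
# "1.8.0_292" yields 8 (legacy 1.x scheme); "11.0.2" yields 11.
java_major_version() {
  local version="$1"
  local first second
  first="${version%%.*}"          # text before the first dot
  if [ "$first" = "1" ]; then
    second="${version#*.}"        # drop the leading "1."
    second="${second%%.*}"        # keep the next field
    echo "$second"
  else
    echo "${first%%[!0-9]*}"      # strip suffixes such as "-ea"
  fi
}

# Read the installed version, if java is on the PATH.
if command -v java >/dev/null 2>&1; then
  installed="$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')"
  major="$(java_major_version "$installed")"
  if [ "${major:-0}" -ge 8 ]; then
    echo "JDK $installed meets the 1.8 or later requirement"
  else
    echo "JDK $installed is older than 1.8" >&2
  fi
else
  echo "java not found on PATH" >&2
fi
```

If the script reports that `java` is missing or too old, install a suitable JDK before you continue.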

Background

This topic demonstrates how to use the file system SDK with hadoop-mapreduce-examples. In this example, MapReduce runs in pseudo-distributed mode. For more information about the pseudo-distributed mode for MapReduce, see the Apache Hadoop documentation.

Configure Hadoop

This section explains how to configure Hadoop, using version 2.7.2 as an example.

Download Hadoop. We recommend using version 2.7.2 or later.
Run the following command to decompress the Hadoop package.
```
tar -zxf hadoop-2.7.2.tar.gz
```
Run the following command to set the Hadoop environment variable.
```
export HADOOP_HOME=yourWorkingDir/hadoop-2.7.2
```
Run the cd hadoop-2.7.2 command to change to the Hadoop directory.

Configure the hadoop-env.sh file.

Open the hadoop-env.sh file.
```
vim etc/hadoop/hadoop-env.sh
```

Configure JAVA_HOME.

```
# Replace ${JAVA_HOME} with the path to the Java JDK on your ECS instance.
export JAVA_HOME=${JAVA_HOME}
```
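
If you are not sure where the JDK is installed, the path can often be derived from the resolved location of the `java` binary. The sketch below assumes a Linux instance with GNU `readlink` and a conventional `bin/java` layout. The helper name `java_home_from_binary` is illustrative, and the result should be checked by hand (for example, a JRE-only installation does not contain `javac`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a JDK home from the resolved path of the java binary.
java_home_from_binary() {
  local java_bin="$1"
  local home="${java_bin%/bin/java}"   # strip the trailing /bin/java
  home="${home%/jre}"                  # JDK 8 layouts may nest java under jre/
  echo "$home"
}

# Print a candidate export line for hadoop-env.sh; nothing is modified.
if command -v java >/dev/null 2>&1; then
  resolved="$(readlink -f "$(command -v java)")"
  echo "export JAVA_HOME=$(java_home_from_binary "$resolved")"
fi
```

The script only prints a suggestion. Copy the line into hadoop-env.sh after you confirm that the directory is the JDK you intend to use.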

Configure the core-site.xml file.
1. Run the following command to open the core-site.xml file.
```
vim etc/hadoop/core-site.xml
```
2. In the core-site.xml file, add the following configuration.
```
<property>
     <name>fs.defaultFS</name>
     <value>dfs://f-xxxx.cn-xxxxx.dfs.aliyuncs.com:10290</value>
</property>
<property>
     <name>fs.dfs.impl</name>
     <value>com.alibaba.dfs.DistributedFileSystem</value>
</property>
<property>
     <name>fs.AbstractFileSystem.dfs.impl</name>
     <value>com.alibaba.dfs.DFS</value>
</property>
```
  Note
  
  f-xxxx.cn-xxxxx.dfs.aliyuncs.com is a placeholder for the mount address of your Apsara File Storage for HDFS file system. Replace it with your own mount address. To find the address, open the details page of the file system, click the Mount Targets tab, and copy the value in the Mount Address column. The address has a format similar to xxx.cn-hangzhou.dfs.aliyuncs.com.
  
  You must sync the contents of core-site.xml to all nodes that depend on hadoop-common.
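
The three properties above belong inside the `<configuration>` root element of core-site.xml. As a rough illustration, the following sketch writes a complete file for a given mount address. The function name `write_core_site` and the output file name are assumptions made for this example. The generated file contains only these three properties, so merge it with any existing settings by hand rather than overwriting a core-site.xml that is already in use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a complete core-site.xml for a given mount address.
# Arguments: <mount-address> <output-file>
write_core_site() {
  local mount_address="$1"
  local output_file="$2"
  cat > "$output_file" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>dfs://${mount_address}:10290</value>
  </property>
  <property>
    <name>fs.dfs.impl</name>
    <value>com.alibaba.dfs.DistributedFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.dfs.impl</name>
    <value>com.alibaba.dfs.DFS</value>
  </property>
</configuration>
EOF
}

# Write to a scratch file so that an existing configuration is not overwritten.
# Replace the placeholder with your own mount address.
write_core_site "f-xxxx.cn-xxxxx.dfs.aliyuncs.com" "./core-site.generated.xml"
```

Review the generated file, then copy it to etc/hadoop/core-site.xml on each node.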

Deploy dependencies

Download the latest Java SDK for the Apsara File Storage for HDFS file system.
Copy the downloaded SDK to the CLASSPATH of the Hadoop ecosystem components.
Place the SDK in the directory that contains hadoop-common-x.y.z.jar, and copy it to the same directory on all Hadoop nodes. For the MapReduce component, this directory is ${HADOOP_HOME}/share/hadoop/hdfs. Example command:
```
cp aliyun-sdk-dfs-x.y.z.jar ${HADOOP_HOME}/share/hadoop/hdfs
```
In this command, replace x.y.z with the SDK version number.
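
To confirm that the copy succeeded, you can list the SDK JAR files in the target directory. This is a minimal sketch: `find_sdk_jar` is a hypothetical helper name, and the check only looks at file names, not at whether Hadoop loads the JAR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list SDK JAR files found in a directory.
# Prints matching paths; returns non-zero when none are found.
find_sdk_jar() {
  local dir="$1"
  local found=1
  local jar
  for jar in "$dir"/aliyun-sdk-dfs-*.jar; do
    if [ -f "$jar" ]; then     # skips the unexpanded pattern when nothing matches
      echo "$jar"
      found=0
    fi
  done
  return "$found"
}

target="${HADOOP_HOME:-.}/share/hadoop/hdfs"
if find_sdk_jar "$target"; then
  echo "SDK JAR found in $target"
else
  echo "No aliyun-sdk-dfs-*.jar in $target" >&2
fi
```

If more than one path is printed, several SDK versions are present in the directory, which is worth cleaning up before you run jobs.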

Verify the installation

Prepare the data.

Create a directory.

```
${HADOOP_HOME}/bin/hadoop fs -mkdir -p inputDir
```

Upload a file. For example, to upload a.txt:

```
${HADOOP_HOme_PLACEHOLDER}
```
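
If you do not have an a.txt file yet, a small sample can be created in the current directory before you run the upload command. The contents below are arbitrary and chosen only for illustration. They include the word "the" so that both the WordCount and Grep examples have something to report.

```shell
#!/usr/bin/env bash
# Create a small sample input file in the current directory.
# The quoted 'EOF' delimiter keeps the text literal (no variable expansion).
cat > a.txt <<'EOF'
the quick brown fox jumps over the lazy dog
the file system SDK reads and writes this file
EOF

# Show line, word, and byte counts as a quick sanity check.
wc a.txt
```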

Restart the YARN service.
1. Stop the YARN service.
```
${HADOOP_HOME}/sbin/stop-yarn.sh
```
2. Start the YARN service.
```
${HADOOP_HOME}/sbin/start-yarn.sh
```
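
After the restart, you may want to confirm that the YARN daemons are running. In pseudo-distributed mode, both ResourceManager and NodeManager run on the same host, so the `jps` tool from the JDK should list them. The sketch below assumes `jps` is on the PATH; `daemon_running` is a hypothetical helper that reads `jps` output from standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output on stdin and succeed only if the
# given daemon name appears as an exact match in the second column.
daemon_running() {
  local name="$1"
  awk -v n="$name" '$2 == n {found = 1} END {exit !found}'
}

if command -v jps >/dev/null 2>&1; then
  snapshot="$(jps)"
  for daemon in ResourceManager NodeManager; do
    if printf '%s\n' "$snapshot" | daemon_running "$daemon"; then
      echo "$daemon is running"
    else
      echo "$daemon is not running" >&2
    fi
  done
else
  echo "jps not found on PATH" >&2
fi
```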

Run a sample test.

WordCount example

```
${HADOOP_HOME}/bin/hadoop jar \
${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount \
inputDir outputDir
```

Grep example

```
${HADOOP_HOME}/bin/hadoop jar \
${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep \
inputDir outputDirGrep "the"
```
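
To inspect the results, you can print the reducer output files that each job writes to its output directory. The following sketch wraps two standard `hadoop fs` commands in helper functions. The function names and the `HADOOP_BIN` variable are assumptions made for this example, and the block only defines the helpers without running anything.

```shell
#!/usr/bin/env bash
# Path to the hadoop launcher. HADOOP_BIN is a hypothetical override variable.
HADOOP_BIN="${HADOOP_BIN:-${HADOOP_HOME:-}/bin/hadoop}"

# Print the contents of all reducer output files in a job output directory.
# The pattern is quoted so that Hadoop, not the local shell, expands it.
show_job_output() {
  "$HADOOP_BIN" fs -cat "$1/part-r-*"
}

# Remove a job output directory so that the same example can be run again.
reset_job_output() {
  "$HADOOP_BIN" fs -rm -r -f "$1"
}
```

After the WordCount job finishes, `show_job_output outputDir` prints each word with its count, and `show_job_output outputDirGrep` prints the match count from the Grep job. Because MapReduce does not write to an output directory that already exists, run `reset_job_output outputDir` before you repeat an example.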

Next steps

For examples of using the Apsara File Storage for HDFS file system with the Hadoop FileSystem API, see SDK examples.