Hadoop相关总结

问题


  1. 命令行执行Job任务时,出现如下问题,原因为MacOS系统的文件系统大小写不敏感,导致在unjar包时报错,有命名冲突。

    1
    2
    3
    4
    5
    6
    Exception in thread "main" java.io.IOException: Mkdirs failed to create /var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/hadoop-unjar3745345762036287746/license
    at org.apache.hadoop.util.RunJar.ensureDirectory(RunJar.java:128)
    at org.apache.hadoop.util.RunJar.unJar(RunJar.java:104)
    at org.apache.hadoop.util.RunJar.unJar(RunJar.java:81)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:209)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

    解决方法为执行如下命令,删除jar文件中的这两个文件。

    1
    2
    zip -d Documents/Test.jar LICENSE
    zip -d Documents/Test.jar META-INF/LICENSE
  2. 使用Eclipse导出可供Hadoop执行的jar包时,导出的类型应选择Runnable JAR file

  1. 在集群上搭建Hadoop时,一切正常,为了Debug在本机搭建,使用Pseudo-Distributed Mode模式时,node的状态变为unhealthy,可以采取比较trick的方法来解决,既在yarn-site.xml中添加属性来扩大健康范围,如下:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
        <?xml version="1.0"?>
    <configuration>
    <property>
    <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
    <value>0.0</value>
    </property>
    <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>100.0</value>
    </property>
    </configuration>

用法


  1. 使用Hadoop命令执行Job示例:
    hadoop jar Documents/Test.jar hdfs://localhost:9000/user/zhongwu/sample.txt hdfs://localhost:9000/user/zhongwu/output111

知识点


  1. MapReduce作业成为job,有两类节点控制着作业执行过程:一个jobtracker及一系列tasktrackerjobtracker通过调度tasktracker上运行的任务来协调所有运行在系统上的作业。tasktracker在运行任务的同时将运行报告发送给jobtrackerjobtracker由此记录每项作业任务的整体进度情况。如果其中一个任务失败,jobtracker可以在另外一个tasktracker节点上重新调度该任务。

  2. HDFS集群有两类节点以管理者-工作者模式运行,即一个namenode(管理者)和多个datanode(工作者)。namenode管理文件系统的命名空间。它维护着文件系统树及整颗树内所有的文件和目录。这些信息以两个文件形式永久保存在本地磁盘上:命名空间镜像文件和编辑日志文件。namenode也记录着每个文件中各个块所在的数据节点信息,但它并不永久保存块的位置信息,因为这些信息会在系统启动时由数据节点重建。namenode只需要响应块位置的请求,无需响应数据请求,否则随着客户端数量的增长,namenode会很快成为瓶颈。

配置参考


hdfs-site.xml

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>tag
<property>
<name>dfs.name.dir</name>
<value>/usr/local/Cellar/hadoop/name1</value></property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/Cellar/hadoop/data1</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

mapred-site.xml

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx400m</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>mapreduce.input.fileinputformat.input.dir.recursive</name>
<value>true</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>127.0.0.1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>127.0.0.1:19888</value>
</property>
</configuration>

core-site.xml

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/tmp/hadoop-${user.name}</value>
</property>
</configuration>

yarn-site.xml

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>127.0.0.1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8090</value>
</property>
<property>
<name>yarn.nodemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://127.0.0.1:19888/jobhistory/logs/</value>
</property>
<property>
<name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
<value>0.0</value>
</property>
<property>
<name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
<value>100.0</value>
</property>
</configuration>