Hadoop Practice Diary from Scratch (Tuning Memory Settings)

Today I tried running the following Hive SQL, which computes each user's average steps and calories over the past 30 days.

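#!/bin/bash

# Today's date names the output file.
cur_date=$(date +%Y%m%d)

# Build a comma-separated list of the past 30 days (YYYYMMDD), starting from yesterday.
pasts=""
for i in $(seq 30)
do
    iday=$(date -d "$i days ago" +%Y%m%d)
    if [ "$i" -eq 1 ]
    then
        pasts=$iday
    else
        pasts="$pasts,$iday"
    fi
done
# echo "$pasts"

# Run the query as the hdfs user and save the result under today's date.
sudo -su hdfs hive -e "select uid,avg(steps),avg(calories) from dailystats where day in ($pasts) group by uid" > "/ad/tongji/output/getAvgStats/$cur_date"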

  

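Watching the job in the Web Tracker (the YARN ResourceManager web UI, port 8088 by default), I found that Hive started 2 map tasks. They failed, were retried, and in the end all of them failed.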

The Web Tracker reported:

AttemptID:attempt_1395208369821_0011_m_000004_0 Timed out after 600 secs
cleanup failed for container container_1395208369821_0011_01_000006 : java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.stopContainer(ContainerManagerPBClientImpl.java:122)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:208)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:400)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From AY130105124528d0c2393/10.200.134.127 to AY130105124528d0c2393:59937 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:212)
at com.sun.proxy.$Proxy29.stopContainer(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.stopContainer(ContainerManagerPBClientImpl.java:119)
... 5 more
Caused by: java.net.ConnectException: Call From AY130105124528d0c2393/10.200.134.127 to AY130105124528d0c2393:59937 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:782)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:729)
at org.apache.hadoop.ipc.Client.call(Client.java:1242)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
... 7 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:528)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:492)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:510)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:604)
at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:252)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1291)
at org.apache.hadoop.ipc.Client.call(Client.java:1209)
... 8 more

The Hive shell returned:

Error during job, obtaining debugging information...
Job Tracking URL: http://AY130105124528d0c2393:8088/proxy/application_1395208369821_0011/
Examining task ID: task_1395208369821_0011_m_000003 (and more) from job job_1395208369821_0011

Task with the most failures(1):
-----
Task ID:
task_1395208369821_0011_m_000004

URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1395208369821_0011&tipid=task_1395208369821_0011_m_000004
-----
Diagnostic Messages for this Task:
AttemptID:attempt_1395208369821_0011_m_000004_0 Timed out after 600 secs
cleanup failed for container container_1395208369821_0011_01_000006 : java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.stopContainer(ContainerManagerPBClientImpl.java:122)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:208)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:400)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From AY130105124528d0c2393/10.200.134.127 to AY130105124528d0c2393:59937 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:212)
at com.sun.proxy.$Proxy29.stopContainer(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.stopContainer(ContainerManagerPBClientImpl.java:119)
... 5 more
Caused by: java.net.ConnectException: Call From AY130105124528d0c2393/10.200.134.127 to AY130105124528d0c2393:59937 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:782)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:729)
at org.apache.hadoop.ipc.Client.call(Client.java:1242)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
... 7 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:528)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:492)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:510)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:604)
at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:252)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1291)
at org.apache.hadoop.ipc.Client.call(Client.java:1209)
... 8 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

None of this gave me any clue. In the end I used the application ID printed by the Hive shell to find that application's log directory:

/var/log/hadoop-yarn/containers/application_1395208369821_0011/container_1395208369821_0011_01_000006 # ll
total 16
drwx--x--- 2 yarn yarn 4096 Mar 19 14:44 ./
drwx--x--- 8 yarn yarn 4096 Mar 19 14:44 ../
-rw-rw-r-- 1 yarn yarn 0 Mar 19 14:44 stderr
-rw-rw-r-- 1 yarn yarn 544 Mar 19 14:44 stdout
-rw-rw-r-- 1 yarn yarn 3852 Mar 19 14:44 syslog

Looking at stdout:

/var/log/hadoop-yarn/containers/application_1395208369821_0011/container_1395208369821_0011_01_000006 # more stdout
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000f5e80000, 99090432, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 99090432 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /ad/hadoop-yarn/cache/yarn/nm-local-dir/usercache/hdfs/appcache/application_1395208369821_0011/container_1395208369821_0011_01_000006/hs_err_pid2286.log
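Instead of opening each container directory by hand, the same check can be scripted. The following is only a minimal sketch, assuming the log layout shown above (one subdirectory per container, each holding a stdout file). The function name find_oom_containers is hypothetical and is not part of Hadoop.

```shell
#!/bin/bash

# Hypothetical helper: print the names of the containers whose stdout
# reports a failed memory allocation.
# $1 is an application log directory, for example
# /var/log/hadoop-yarn/containers/application_1395208369821_0011
find_oom_containers() {
    local app_log_dir
    local f
    app_log_dir=$1
    for f in "$app_log_dir"/*/stdout; do
        # Skip the unexpanded glob when no container directory exists.
        [ -f "$f" ] || continue
        if grep -q 'Cannot allocate memory' "$f"; then
            # The parent directory name is the container ID.
            basename "$(dirname "$f")"
        fi
    done
}
```

Run as a user that can read the yarn-owned log directories, a call such as find_oom_containers /var/log/hadoop-yarn/containers/application_1395208369821_0011 would list every container that hit the error.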

So the reason the map tasks got stuck is: Cannot allocate memory

The memory used by map tasks can be configured. In practice I changed only the mapred-site file, adding this property:

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>800</value>
</property>
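The same property can also be set for a single Hive session with Hive's set statement, which makes it easy to try different values without editing the config file. This is only a sketch and was not part of the run described here. The helper name map_memory_override is hypothetical; it only builds the statement text.

```shell
#!/bin/bash

# Hypothetical helper: build a Hive "set" statement that overrides the
# map container size (in MB) for one session.
map_memory_override() {
    local mb
    mb=$1
    case $mb in
        ''|*[!0-9]*)
            echo "memory must be a whole number of MB" >&2
            return 1
            ;;
    esac
    printf 'set mapreduce.map.memory.mb=%s;' "$mb"
}

# Usage (not executed here, it needs a working Hive installation):
#   hive -e "$(map_memory_override 800) select uid,avg(steps) from dailystats group by uid"
```

A value set this way applies only to that session, so the mapred-site file remains the place for a permanent default.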

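The machine has 4 GB of memory, so I set the value to 800 MB. I also tried 900 and other values, but this is the one that worked. The 30 days of data come to about 4.5 million records, and the job finished in 5 minutes. Hooray!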
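As a rough sanity check on such a value, one can estimate how many map containers of a given size fit into the machine's memory. The sketch below is plain integer arithmetic under loud assumptions: the function name max_map_containers is hypothetical, and the 1024 MB reserved for the OS and the Hadoop daemons is an illustrative figure, not one measured on this machine. What YARN really schedules also depends on yarn.nodemanager.resource.memory-mb and on how the scheduler rounds requests, neither of which is shown in this post.

```shell
#!/bin/bash

# Hypothetical helper: estimate how many containers fit on one node.
# $1 = total memory in MB, $2 = MB reserved for OS and daemons (assumed),
# $3 = MB per map container (mapreduce.map.memory.mb)
max_map_containers() {
    local total_mb reserved_mb container_mb
    total_mb=$1
    reserved_mb=$2
    container_mb=$3
    if [ "$container_mb" -le 0 ]; then
        echo "container size must be positive" >&2
        return 1
    fi
    # Integer division rounds down: a partial container cannot be scheduled.
    echo $(( (total_mb - reserved_mb) / container_mb ))
}
```

With 4096 MB in total, 1024 MB assumed reserved, and 800 MB per container, max_map_containers 4096 1024 800 prints 3.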

References:

http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/

http://woopisy.hatenablog.com/entry/2013/11/19/131033

Original article: https://www.cnblogs.com/aquastar/p/3611750.html