
Hadoop2 Development Environment Setup and Test

Hadoop | admin | 4588 views

Setting up a single-node Hadoop 2.2.0 development environment.

Environment:

OS: CentOS 6.3, 64-bit
JDK: Oracle JDK 1.7
Hadoop: 2.2.0
Linux user: hadoop

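Directory layout:

/home/hadoop                      user home directory
/app/cloud/hadoop/hadoop-2.2.0    Hadoop installation (software home)
/app/cloud/hadoop/dfs/name        NameNode metadata (fsimage and edit log)
/app/cloud/hadoop/dfs/data        DataNode block data
/app/cloud/hadoop/mapred/local    MapReduce local data
/app/cloud/hadoop/mapred/system   MapReduce system data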

1. Install the JDK

After installing the JDK, add the following to /etc/profile (sudo vim /etc/profile):

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_45
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH

Then reload it:

source /etc/profile
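To confirm the profile took effect, check that $JAVA_HOME points at a real JDK. The function below is a hypothetical helper (check_java_home is not part of the JDK or Hadoop). All it checks is that bin/java and bin/javac exist and are executable under the given directory, which defaults to $JAVA_HOME.

```shell
#!/usr/bin/env bash
# check_java_home [DIR]: hypothetical helper; succeeds only if DIR (default:
# $JAVA_HOME) looks like a JDK, i.e. has executable bin/java and bin/javac.
check_java_home() {
  local dir=${1:-${JAVA_HOME:-}} tool
  if [ -z "$dir" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  for tool in java javac; do
    if [ ! -x "$dir/bin/$tool" ]; then
      echo "missing $dir/bin/$tool" >&2
      return 1
    fi
  done
  echo "JDK found at $dir"
}

# Usage, after "source /etc/profile":
#   check_java_home && java -version
```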

2. Passwordless SSH login

As the hadoop user:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

As root:

chmod go-w /home/hadoop/.ssh
chmod 600 /home/hadoop/.ssh/authorized_keys

Test as the hadoop user. The login should succeed without a password prompt:

[hadoop@hadoop01 ~]$ ssh localhost
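The key setup can also be scripted. This is a sketch, not the author's exact procedure, and the function names are hypothetical. It generates an RSA key instead of DSA because OpenSSH 7.0 and later disable DSA keys by default. The DSA commands above work with the older OpenSSH on CentOS 6. The permission step exists because sshd's default StrictModes setting ignores authorized_keys when the file or ~/.ssh is writable by group or others.

```shell
#!/usr/bin/env bash
# fix_ssh_perms DIR: tighten permissions on an ~/.ssh directory so that
# sshd (with its default StrictModes) accepts the authorized_keys file.
fix_ssh_perms() {
  local sshdir=$1
  chmod 700 "$sshdir"
  touch "$sshdir/authorized_keys"
  chmod 600 "$sshdir/authorized_keys"
}

# setup_passwordless_ssh: hypothetical wrapper, run as the hadoop user.
# Creates an RSA key pair if none exists and authorizes it for this account.
setup_passwordless_ssh() {
  local sshdir=$HOME/.ssh
  mkdir -p "$sshdir"
  if [ ! -f "$sshdir/id_rsa" ]; then
    ssh-keygen -t rsa -P '' -f "$sshdir/id_rsa"
  fi
  touch "$sshdir/authorized_keys"
  # Append the public key only if it is not already authorized
  if ! grep -qxF "$(cat "$sshdir/id_rsa.pub")" "$sshdir/authorized_keys"; then
    cat "$sshdir/id_rsa.pub" >> "$sshdir/authorized_keys"
  fi
  fix_ssh_perms "$sshdir"
}

# Afterwards this should print "ok" without asking for a password:
#   ssh -o BatchMode=yes localhost echo ok
```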

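3. Install Hadoop

You can download the source package and compile native libraries that match your own platform.

For simplicity, I downloaded the prebuilt Hadoop package instead:

URL: http://apache.fayea.com/apache-mirror/hadoop/common/current2/

Extract the archive and move the extracted directory to the software directory:

/app/cloud/hadoop/hadoop-2.2.0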
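The extract-and-move step can be collapsed into one function that unpacks directly into the parent directory. This is a sketch. The function name and the tarball name hadoop-2.2.0.tar.gz are assumptions. The mirror above may no longer serve 2.2.0, and in that case the Apache archive (archive.apache.org/dist/hadoop/common/) should still carry it.

```shell
#!/usr/bin/env bash
# install_hadoop TARBALL PARENT_DIR: hypothetical helper that unpacks a Hadoop
# binary tarball under PARENT_DIR (e.g. /app/cloud/hadoop) and prints the
# resulting install directory.
install_hadoop() {
  local tarball=$1 parent=$2 top
  mkdir -p "$parent" || return 1
  # The first archive entry names the top-level directory, e.g. "hadoop-2.2.0/"
  top=$(tar -tzf "$tarball" | sed -n '1{s|/.*||;p;}')
  tar -xzf "$tarball" -C "$parent" || return 1
  echo "$parent/$top"
}

# Usage (as a user allowed to write to /app/cloud/hadoop):
#   install_hadoop hadoop-2.2.0.tar.gz /app/cloud/hadoop
```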

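4. Edit the Hadoop configuration files

The files are in $HADOOP_HOME/etc/hadoop (here /app/cloud/hadoop/hadoop-2.2.0/etc/hadoop). Every <property> below goes inside the file's <configuration> element. Replace hadoop-host with your own hostname, which must resolve on this machine (for example through /etc/hosts).

Edit core-site.xml:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-host:8020</value>
</property>

fs.defaultFS is the Hadoop 2 name for this setting. The older fs.default.name still works but is deprecated.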
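Each file needs the XML header and the <configuration> wrapper. Writing the file from a heredoc avoids copy-paste slips. This is a sketch that assumes the standard etc/hadoop layout. write_core_site is a hypothetical helper. It uses fs.defaultFS (the Hadoop 2 name of fs.default.name) and defaults to the port 8020 used above. Note that it overwrites any existing core-site.xml.

```shell
#!/usr/bin/env bash
# write_core_site CONF_DIR HOST [PORT]: hypothetical helper that writes a
# minimal core-site.xml pointing the default file system at HOST:PORT.
# Overwrites CONF_DIR/core-site.xml.
write_core_site() {
  local conf_dir=$1 host=$2 port=${3:-8020}
  mkdir -p "$conf_dir"
  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://$host:$port</value>
    </property>
</configuration>
EOF
}

# Usage:
#   write_core_site "$HADOOP_CONF_DIR" "$(hostname)"
```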

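Edit hdfs-site.xml:

<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/app/cloud/hadoop/dfs/name</value>
</property>

<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/app/cloud/hadoop/dfs/data</value>
</property>

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>

dfs.replication is 1 because a single node has only one DataNode. Setting dfs.permissions.enabled (the Hadoop 2 name of dfs.permissions) to false turns off HDFS permission checks, which is convenient on a development machine.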
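To double-check which directories HDFS will use, you can pull a value out of a config file with a rough extractor. This sketch uses sed and is not a real XML parser. It assumes each <value> follows its <name> directly, as in these snippets, and conf_get and local_path are hypothetical names. On a working install, hdfs getconf -confKey dfs.namenode.name.dir should give the authoritative answer.

```shell
#!/usr/bin/env bash
# conf_get FILE NAME: print the <value> that follows <name>NAME</name> in a
# Hadoop *-site.xml file. Rough sed-based sketch, not a real XML parser.
conf_get() {
  local file=$1 name=$2
  # Escape dots so that e.g. "dfs.replication" matches literally
  name=$(printf '%s' "$name" | sed 's/\./\\./g')
  # Join all lines so name/value pairs split across lines still match
  tr -d '\n' < "$file" |
    sed -n "s|.*<name>$name</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p"
}

# local_path URI: strip the file: scheme used in the dfs.*.dir values
local_path() {
  printf '%s\n' "${1#file:}"
}

# Usage:
#   local_path "$(conf_get "$HADOOP_CONF_DIR/hdfs-site.xml" dfs.namenode.name.dir)"
```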

Edit mapred-site.xml:

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

<property>
    <name>mapred.system.dir</name>
    <value>file:/app/cloud/hadoop/mapred/system</value>
</property>

<property>
    <name>mapred.local.dir</name>
    <value>file:/app/cloud/hadoop/mapred/local</value>
</property>

<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop-host:10020</value>
</property>

<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop-host:19888</value>
</property>

<property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/mr-history/tmp</value>
</property>

<property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/mr-history/done</value>
</property>
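The 2.2.0 binary distribution ships etc/hadoop/mapred-site.xml.template rather than mapred-site.xml, so you have to create the file before editing it. The sketch below (ensure_mapred_site is a hypothetical name) copies the template only when the real file is missing, so it never overwrites earlier edits.

```shell
#!/usr/bin/env bash
# ensure_mapred_site CONF_DIR: create mapred-site.xml from the shipped
# template if it does not exist yet; never overwrite an existing file.
ensure_mapred_site() {
  local conf_dir=$1
  if [ -f "$conf_dir/mapred-site.xml" ]; then
    echo "exists"
  elif [ -f "$conf_dir/mapred-site.xml.template" ]; then
    cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
    echo "created"
  else
    echo "no template in $conf_dir" >&2
    return 1
  fi
}

# Usage:
#   ensure_mapred_site "$HADOOP_CONF_DIR" && vim "$HADOOP_CONF_DIR/mapred-site.xml"
```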

Edit yarn-site.xml:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

For a cluster, yarn-site.xml looks like this instead:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

<property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop-host:8032</value>
</property>

<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop-host:8030</value>
</property>

<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop-host:8031</value>
</property>

<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop-host:8033</value>
</property>

<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop-host:8088</value>
</property>

The handler-class key must contain the service name exactly as given in yarn.nodemanager.aux-services (mapreduce_shuffle, with an underscore). A key spelled mapreduce.shuffle does not match it.
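The NodeManager looks up the shuffle handler under yarn.nodemanager.aux-services.<service>.class, so the two settings must agree. In 2.2.0 a service name may contain only letters, digits and underscores, which is why it is mapreduce_shuffle and not the older mapreduce.shuffle. Below is a rough grep/sed-based sanity check. It is not a real XML parser, and check_shuffle_conf is a hypothetical helper. It flags any ...aux-services.<name>.class key whose <name> differs from the configured service.

```shell
#!/usr/bin/env bash
# check_shuffle_conf FILE: rough sanity check for yarn-site.xml. Reads the
# yarn.nodemanager.aux-services value and reports every
# yarn.nodemanager.aux-services.<name>.class key whose <name> differs.
check_shuffle_conf() {
  local file=$1 joined service key status=0
  joined=$(tr -d '\n' < "$file")
  service=$(printf '%s' "$joined" |
    sed -n 's|.*<name>yarn\.nodemanager\.aux-services</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p')
  if [ -z "$service" ]; then
    echo "yarn.nodemanager.aux-services is not set" >&2
    return 1
  fi
  # Every ...aux-services.<x>.class key present in the file
  for key in $(printf '%s' "$joined" |
      grep -o 'yarn\.nodemanager\.aux-services\.[A-Za-z0-9_.]*\.class'); do
    if [ "$key" != "yarn.nodemanager.aux-services.$service.class" ]; then
      echo "mismatched key: $key (service is $service)" >&2
      status=1
    fi
  done
  return $status
}

# Usage:
#   check_shuffle_conf "$HADOOP_CONF_DIR/yarn-site.xml" && echo "shuffle config OK"
```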

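Edit the slaves file and list the hostnames of all your DataNodes, one per line (on a single node, the default entry localhost is enough):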

hadoop-host1
hadoop-host2
...
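Every hostname in slaves, and the one used in core-site.xml, has to resolve on every node. Otherwise the start scripts fail with "ssh: Could not resolve hostname ... Name or service not known". The sketch below uses getent, which asks the system resolver (normally /etc/hosts first, then DNS). check_hosts is a hypothetical name.

```shell
#!/usr/bin/env bash
# check_hosts FILE: report every hostname in a slaves-style file (one per
# line; blank lines and # comments ignored) that the resolver cannot find.
check_hosts() {
  local file=$1 host status=0
  while IFS= read -r host || [ -n "$host" ]; do
    host=${host%%#*}                               # drop comments
    host=$(printf '%s' "$host" | tr -d '[:space:]')
    [ -n "$host" ] || continue
    if ! getent hosts "$host" > /dev/null; then
      echo "cannot resolve: $host" >&2
      status=1
    fi
  done < "$file"
  return $status
}

# Usage:
#   check_hosts "$HADOOP_CONF_DIR/slaves" || echo "fix /etc/hosts first"
```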

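Set JAVA_HOME for the Hadoop scripts:
a) In hadoop-env.sh, find the JAVA_HOME line and set it to the actual JDK path.
b) In yarn-env.sh, likewise find the JAVA_HOME line and set it to the actual JDK path.
For example:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_45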
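This step matters because the start scripts launch the daemons over ssh, and those non-interactive shells typically do not read /etc/profile. Here is a sed-based sketch for the edit. set_java_home is hypothetical, and sed -i as used here is GNU sed, which CentOS ships. It rewrites every existing export JAVA_HOME= line, commented out or not, and appends one if there is none. In the stock 2.2.0 files that line should be export JAVA_HOME=${JAVA_HOME} in hadoop-env.sh and a commented-out example in yarn-env.sh, but check your copies.

```shell
#!/usr/bin/env bash
# set_java_home FILE JDK_DIR: point an env script (hadoop-env.sh,
# yarn-env.sh) at JDK_DIR. Rewrites every "export JAVA_HOME=" line,
# commented out or not; appends one if there is none. Uses GNU sed -i.
set_java_home() {
  local file=$1 jdk=$2
  if grep -qE '^[#[:space:]]*export JAVA_HOME=' "$file"; then
    sed -i -E "s|^[#[:space:]]*export JAVA_HOME=.*|export JAVA_HOME=$jdk|" "$file"
  else
    printf 'export JAVA_HOME=%s\n' "$jdk" >> "$file"
  fi
}

# Usage:
#   for f in hadoop-env.sh yarn-env.sh; do
#     set_java_home "$HADOOP_CONF_DIR/$f" /usr/lib/jvm/jdk1.7.0_45
#   done
```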

Add the Hadoop environment variables
Edit /etc/profile:
#hadoop variable settings
export HADOOP_HOME=/app/cloud/hadoop/hadoop-2.2.0
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
Save the file, then run source /etc/profile.
A note on the last two lines: some articles leave them out. Starting Hadoop 2.2 then fails with the errors described in "Hadoop2.x Installation and Startup Errors".
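After source /etc/profile, hadoop version should print the release. If it does not, a quick layout check helps separate a wrong HADOOP_HOME from other problems. check_hadoop_home is a hypothetical helper. The paths it tests are the ones this article relies on: the bin and sbin scripts, etc/hadoop and lib/native.

```shell
#!/usr/bin/env bash
# check_hadoop_home [DIR]: hypothetical check that DIR (default: $HADOOP_HOME)
# has the layout the profile variables assume. Prints each missing path.
check_hadoop_home() {
  local dir=${1:-${HADOOP_HOME:-}} rel status=0
  if [ -z "$dir" ]; then
    echo "HADOOP_HOME is not set"
    return 1
  fi
  for rel in bin/hadoop bin/hdfs sbin/start-dfs.sh sbin/start-yarn.sh etc/hadoop lib/native; do
    if [ ! -e "$dir/$rel" ]; then
      echo "missing: $dir/$rel"
      status=1
    fi
  done
  return $status
}

# Usage:
#   source /etc/profile
#   check_hadoop_home && hadoop version
```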

Create the local directories:

mkdir -p /app/cloud/hadoop/dfs/name
mkdir -p /app/cloud/hadoop/dfs/data
mkdir -p /app/cloud/hadoop/mapred/local
mkdir -p /app/cloud/hadoop/mapred/system
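The same directories can be created in one loop. The base path matches the directory layout at the top of the article. The optional owner argument is an assumption about how you run the script. When it runs as root, it chowns the tree so that daemons running as the hadoop user can write there. make_hadoop_dirs is a hypothetical name.

```shell
#!/usr/bin/env bash
# make_hadoop_dirs BASE [OWNER]: create the local storage directories under
# BASE (e.g. /app/cloud/hadoop); if OWNER is given, chown them to it.
make_hadoop_dirs() {
  local base=$1 owner=${2:-} sub
  for sub in dfs/name dfs/data mapred/local mapred/system; do
    mkdir -p "$base/$sub" || return 1
  done
  if [ -n "$owner" ]; then
    chown -R "$owner" "$base/dfs" "$base/mapred"
  fi
}

# Usage (as root):
#   make_hadoop_dirs /app/cloud/hadoop hadoop:hadoop
```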

Start Hadoop

Format the NameNode (only once, before the first start):

[hadoop@hadoop01 ~]$ hdfs namenode -format
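Formatting again later wipes the NameNode metadata and gives HDFS a new clusterID. After that, existing DataNodes refuse to register because their data directories still carry the old ID. The guard below is a sketch, and needs_format is hypothetical. It formats only when the name directory has no current/VERSION file, which a successful format writes.

```shell
#!/usr/bin/env bash
# needs_format NAME_DIR: succeed only if NAME_DIR (dfs.namenode.name.dir
# without the file: prefix) has not been formatted, i.e. has no
# current/VERSION file.
needs_format() {
  [ ! -f "$1/current/VERSION" ]
}

# Usage:
#   name_dir=/app/cloud/hadoop/dfs/name
#   if needs_format "$name_dir"; then
#     hdfs namenode -format
#   else
#     echo "$name_dir is already formatted; skipping"
#   fi
```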

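Start the HDFS daemons
Start: start-dfs.sh
Stop: stop-dfs.sh

Start the YARN daemons
Start: start-yarn.sh
Stop: stop-yarn.sh

(start-all.sh and stop-all.sh still exist but are deprecated in Hadoop 2. They call both the dfs and the yarn scripts, so running start-all.sh and then start-yarn.sh tries to start YARN twice.)

Start the JobHistory server
Start: mr-jobhistory-daemon.sh start historyserver
Stop: mr-jobhistory-daemon.sh stop historyserver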

Check the running processes with jps:

[hadoop@ttpod sbin]$ jps
7621 NameNode
11834 Jps
7734 DataNode
7881 SecondaryNameNode
10156 NodeManager
10053 ResourceManager

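If all of the processes above are listed, Hadoop is running. JobHistoryServer also appears once the history server has been started.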
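Checking the jps listing by eye works. For scripts, a small filter (missing_daemons, a hypothetical name) compares the listing against the expected daemon names. By default these are the five shown above. Pass JobHistoryServer explicitly if the history server should also be running.

```shell
#!/usr/bin/env bash
# missing_daemons [NAME...]: read `jps` output on stdin and print the
# expected daemon names that are not running (empty output = all up).
# Defaults to the five daemons of this single-node setup.
missing_daemons() {
  local out name missing=""
  out=$(cat)
  [ "$#" -gt 0 ] || set -- NameNode DataNode SecondaryNameNode ResourceManager NodeManager
  for name in "$@"; do
    # jps prints "<pid> <MainClass>"; compare the class-name field exactly
    if ! printf '%s\n' "$out" |
        awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      missing="$missing $name"
    fi
  done
  printf '%s\n' "${missing# }"
}

# Usage:
#   jps | missing_daemons
#   jps | missing_daemons JobHistoryServer   # after starting the history server
```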

ResourceManager web UI:

http://192.168.6.124:8088/


HDFS (NameNode) web UI: http://192.168.6.124:50070


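Test:

Run the pi example with 10 map tasks and 10 samples per map:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 10

or, from inside $HADOOP_HOME/share/hadoop/mapreduce:

hadoop jar hadoop-mapreduce-examples-2.2.0.jar pi 10 10

Output: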

Number of Maps  = 10
Samples per Map = 10
13/12/13 16:27:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
13/12/13 16:27:24 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
13/12/13 16:27:25 INFO input.FileInputFormat: Total input paths to process : 10
13/12/13 16:27:25 INFO mapreduce.JobSubmitter: number of splits:10
13/12/13 16:27:25 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
13/12/13 16:27:25 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/12/13 16:27:25 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
13/12/13 16:27:25 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
13/12/13 16:27:25 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/12/13 16:27:25 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
13/12/13 16:27:25 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/12/13 16:27:25 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/12/13 16:27:25 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/12/13 16:27:25 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
13/12/13 16:27:25 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/12/13 16:27:25 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/12/13 16:27:25 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
13/12/13 16:27:25 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/12/13 16:27:25 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/12/13 16:27:25 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/12/13 16:27:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1386923206015_0001
13/12/13 16:27:26 INFO impl.YarnClientImpl: Submitted application application_1386923206015_0001 to ResourceManager at /0.0.0.0:8032
13/12/13 16:27:26 INFO mapreduce.Job: The url to track the job: http://ttpod:8088/proxy/application_1386923206015_0001/
13/12/13 16:27:26 INFO mapreduce.Job: Running job: job_1386923206015_0001
13/12/13 16:27:34 INFO mapreduce.Job: Job job_1386923206015_0001 running in uber mode : false
13/12/13 16:27:34 INFO mapreduce.Job:  map 0% reduce 0%
13/12/13 16:27:56 INFO mapreduce.Job:  map 60% reduce 0%
13/12/13 16:28:13 INFO mapreduce.Job:  map 100% reduce 0%
13/12/13 16:28:14 INFO mapreduce.Job:  map 100% reduce 100%
13/12/13 16:28:15 INFO mapreduce.Job: Job job_1386923206015_0001 completed successfully
13/12/13 16:28:16 INFO mapreduce.Job: Counters: 43
	File System Counters
		FILE: Number of bytes read=226
		FILE: Number of bytes written=871752
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=2610
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=43
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Job Counters
		Launched map tasks=10
		Launched reduce tasks=1
		Data-local map tasks=10
		Total time spent by all maps in occupied slots (ms)=185391
		Total time spent by all reduces in occupied slots (ms)=15412
	Map-Reduce Framework
		Map input records=10
		Map output records=20
		Map output bytes=180
		Map output materialized bytes=280
		Input split bytes=1430
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=280
		Reduce input records=20
		Reduce output records=0
		Spilled Records=40
		Shuffled Maps =10
		Failed Shuffles=0
		Merged Map outputs=10
		GC time elapsed (ms)=1841
		CPU time spent (ms)=7600
		Physical memory (bytes) snapshot=2507419648
		Virtual memory (bytes) snapshot=9588948992
		Total committed heap usage (bytes)=1944584192
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=1180
	File Output Format Counters
		Bytes Written=97
Job Finished in 51.367 seconds
Estimated value of Pi is 3.20000000000000000000

The job ran normally.

If an exception occurs, check the logs (by default under $HADOOP_HOME/logs).
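Why is the estimate only 3.2? The pi example is a quasi-Monte Carlo estimator. It samples points in a unit square and returns 4 × (points inside the inscribed circle) / (total points). With 10 maps × 10 samples there are only 100 points, so 3.2 means 80 of them landed inside. More maps or samples generally give a closer value. For scripted smoke tests, the sketch below pulls the estimate out of the job output and checks it against a tolerance. The function names and the tolerance are arbitrary choices, not part of Hadoop.

```shell
#!/usr/bin/env bash
# pi_estimate: read the pi job's console output on stdin and print the
# number from the "Estimated value of Pi is ..." line.
pi_estimate() {
  sed -n 's/^Estimated value of Pi is \([0-9.]*\).*/\1/p'
}

# pi_close_enough ESTIMATE TOLERANCE: succeed if |ESTIMATE - pi| <= TOLERANCE.
pi_close_enough() {
  awk -v e="$1" -v t="$2" 'BEGIN {
    d = e - 3.14159265
    if (d < 0) d = -d
    exit !(d <= t)
  }'
}

# Usage:
#   est=$(hadoop jar "$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar" \
#         pi 10 10 | pi_estimate)
#   pi_close_enough "$est" 0.2 && echo "pi job OK: $est"
```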

Please credit when reposting: 极豆技术博客 » Hadoop2 Development Environment Setup and Test

4 comments
  1. Hi, I set up single-node Hadoop the way you describe, but the jps results are really frustrating:
       hadoop@helianthus-Lenovo-G460:/usr/hadoop/hadoop-2.2.0/sbin$ ./start-dfs.sh
       Starting namenodes on [hadoop-host]
       hadoop-host: ssh: Could not resolve hostname hadoop-host: Name or service not known
       hadoop@localhost's password:
       localhost: starting datanode, logging to /usr/hadoop/hadoop-2.2.0/logs/hadoop-hadoop-datanode-helianthus-Lenovo-G460.out
       Starting secondary namenodes [0.0.0.0]
       hadoop@0.0.0.0's password:
       0.0.0.0: starting secondarynamenode, logging to /usr/hadoop/hadoop-2.2.0/logs/hadoop-hadoop-secondarynamenode-helianthus-Lenovo-G460.out
       hadoop@helianthus-Lenovo-G460:/usr/hadoop/hadoop-2.2.0/sbin$ jps
       3066 ResourceManager
       4166 Jps
       3293 NodeManager
       hadoop@helianthus-Lenovo-G460:/usr/hadoop/hadoop-2.2.0/sbin$ ./start-yarn.sh
       starting yarn daemons
       resourcemanager running as process 3066. Stop it first.
       hadoop@localhost's password:
       localhost: nodemanager running as process 3293. Stop it first.
       hadoop@helianthus-Lenovo-G460:/usr/hadoop/hadoop-2.2.0/sbin$ jps
       3066 ResourceManager
       3293 NodeManager
       4395 Jps
     Could you please take a look? I'm really stuck...
    orchid 2014-03-05 21:23
    • "hadoop-host: ssh: Could not resolve hostname hadoop-host: Name or service not known": you need to change this to your own hostname. "hadoop@localhost's password:" means you have not set up passwordless SSH login.
      admin 2014-03-14 16:23
  2. Thanks! Much appreciated...
    orchid 2014-03-05 21:25
  3. Thanks...
    orchid 2014-03-05 21:28