*Hadoop Install [#k24a6bff]
RIGHT:Last updated: &lastmod();
**Preparation [#mf6c8ba4]
***Create a hadoop user for Hadoop [#s4b6b8f1]
# /usr/sbin/adduser hadoop
# passwd hadoop
***Installing the JDK [#u92f98ca]
# chmod u+x jdk-6uxx-linux-i586-rpm.bin
# ./jdk-6uxx-linux-i586-rpm.bin
Log in as hadoop.
***Configuring ssh [#u824fd52]
Set up passwordless ssh login to localhost.
$ mkdir .ssh
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat .ssh/id_dsa.pub >> .ssh/authorized_keys
$ chmod 700 .ssh
$ cd .ssh/
$ chmod 600 authorized_keys
$ chmod 600 id_dsa
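As a cross-check of the permission steps above, the sketch below applies the commonly recommended modes to a throwaway directory and reads them back. The temporary directory is only a stand-in for ~/.ssh, and the 700/600 values are general ssh conventions (sshd typically rejects key authentication when these files are group- or world-writable), not something specific to Hadoop:

```shell
# Demo: restrictive modes on a stand-in for ~/.ssh, then verify with stat.
DEMO=$(mktemp -d)
SSHDIR="$DEMO/.ssh"
mkdir -p "$SSHDIR"
touch "$SSHDIR/authorized_keys"
chmod 700 "$SSHDIR"                    # directory: owner-only
chmod 600 "$SSHDIR/authorized_keys"    # key file: owner read/write only
DIRMODE=$(stat -c '%a' "$SSHDIR")
KEYMODE=$(stat -c '%a' "$SSHDIR/authorized_keys")
echo "dir=$DIRMODE key=$KEYMODE"
```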
Set the Java path.
.bash_profile
export JAVA_HOME=/usr/java/default
**Installing Hadoop [#gfffab05]
Just download from [[here:http://ftp.riken.jp/net/apache/hadoop/core/hadoop-0.20.2/]], extract it, and change the owner to hadoop.
# cd /opt
# tar zxvf hadoop-0.20.2.tar.gz
# chown -R hadoop:hadoop hadoop-0.20.2
# cd hadoop-0.20.2/conf
***Set the Java path [#gada097a]
hadoop-env.sh
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
export JAVA_HOME=/usr/java/default
***Verifying operation [#u90510ca]
$ bin/hadoop jar hadoop-0.20.2-examples.jar pi 1 1000
Number of Maps = 1
Samples per Map = 1000
Wrote input for Map #0
Starting Job
10/07/15 11:09:23 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/07/15 11:09:23 INFO mapred.FileInputFormat: Total input paths to process : 1
(snip)
10/07/15 11:09:24 INFO mapred.JobClient: Combine input records=0
10/07/15 11:09:24 INFO mapred.JobClient: Map output records=2
10/07/15 11:09:24 INFO mapred.JobClient: Reduce input records=2
Job Finished in 1.213 seconds
Estimated value of Pi is 3.14800000000000000000
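The `pi` example's arguments are the number of map tasks and the number of samples per map; it estimates π by sampling points in the unit square and counting how many fall inside the inscribed circle (Hadoop's version uses a quasi-Monte Carlo point sequence). A standalone awk sketch of the plain Monte Carlo form of the same idea — the sample count and seed here are arbitrary demo choices:

```shell
# Monte Carlo estimate of pi: the fraction of random points in the unit
# square landing inside the quarter circle, multiplied by 4.
PI_EST=$(awk 'BEGIN { srand(42); n = 200000; hits = 0
  for (i = 0; i < n; i++) { x = rand(); y = rand(); if (x*x + y*y <= 1) hits++ }
  printf "%.2f", 4 * hits / n }')
echo "Estimated value of Pi is $PI_EST"
```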
**Pseudo-distributed mode [#a2669533]
A single server acts as both the namenode and a datanode.
conf/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
conf/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
conf/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
***Formatting the namenode [#ue212a47]
$ bin/hadoop namenode -format
10/07/15 11:15:12 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = d5270-47.mie-chukyo-u.ac.jp/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/07/15 11:15:12 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
10/07/15 11:15:12 INFO namenode.FSNamesystem: supergroup=supergroup
10/07/15 11:15:12 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/07/15 11:15:12 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/07/15 11:15:12 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
10/07/15 11:15:12 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at d5270-47.mie-chukyo-u.ac.jp/127.0.0.1
************************************************************/
***Starting Hadoop [#e43a8c30]
$ bin/start-all.sh
starting namenode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-namenode-d5270-47.mie-chukyo-u.ac.jp.out
localhost: starting datanode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-d5270-47.mie-chukyo-u.ac.jp.out
localhost: starting secondarynamenode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-secondarynamenode-d5270-47.mie-chukyo-u.ac.jp.out
starting jobtracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-jobtracker-d5270-47.mie-chukyo-u.ac.jp.out
localhost: starting tasktracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-d5270-47.mie-chukyo-u.ac.jp.out
***Verification [#sd9f6faa]
Access http://localhost:50070/dfshealth.jsp and confirm that Live Nodes is 1.
&ref("./hadoop1.png");
Access http://10.3.4.47:50030/jobtracker.jsp and confirm that Nodes is 1.
&ref("./hadoop1.png");
***Computing pi [#abbf9789]
$ bin/hadoop jar hadoop-0.20.2-examples.jar pi 10 1000
Number of Maps = 10
Samples per Map = 1000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
10/07/15 11:55:55 INFO mapred.FileInputFormat: Total input paths to process : 10
10/07/15 11:55:56 INFO mapred.JobClient: Running job: job_201007151141_0003
10/07/15 11:55:57 INFO mapred.JobClient: map 0% reduce 0%
10/07/15 11:56:05 INFO mapred.JobClient: map 20% reduce 0%
10/07/15 11:56:08 INFO mapred.JobClient: map 40% reduce 0%
10/07/15 11:56:11 INFO mapred.JobClient: map 60% reduce 0%
10/07/15 11:56:14 INFO mapred.JobClient: map 80% reduce 20%
10/07/15 11:56:17 INFO mapred.JobClient: map 100% reduce 20%
10/07/15 11:56:23 INFO mapred.JobClient: map 100% reduce 26%
10/07/15 11:56:29 INFO mapred.JobClient: map 100% reduce 100%
10/07/15 11:56:31 INFO mapred.JobClient: Job complete: job_201007151141_0003
10/07/15 11:56:31 INFO mapred.JobClient: Counters: 18
10/07/15 11:56:31 INFO mapred.JobClient: Job Counters
10/07/15 11:56:31 INFO mapred.JobClient: Launched reduce tasks=1
10/07/15 11:56:31 INFO mapred.JobClient: Launched map tasks=10
10/07/15 11:56:31 INFO mapred.JobClient: Data-local map tasks=10
10/07/15 11:56:31 INFO mapred.JobClient: FileSystemCounters
10/07/15 11:56:31 INFO mapred.JobClient: FILE_BYTES_READ=226
10/07/15 11:56:31 INFO mapred.JobClient: HDFS_BYTES_READ=1180
10/07/15 11:56:31 INFO mapred.JobClient: FILE_BYTES_WRITTEN=826
10/07/15 11:56:31 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=215
10/07/15 11:56:31 INFO mapred.JobClient: Map-Reduce Framework
10/07/15 11:56:31 INFO mapred.JobClient: Reduce input groups=20
10/07/15 11:56:31 INFO mapred.JobClient: Combine output records=0
10/07/15 11:56:31 INFO mapred.JobClient: Map input records=10
10/07/15 11:56:31 INFO mapred.JobClient: Reduce shuffle bytes=280
10/07/15 11:56:31 INFO mapred.JobClient: Reduce output records=0
10/07/15 11:56:31 INFO mapred.JobClient: Spilled Records=40
10/07/15 11:56:31 INFO mapred.JobClient: Map output bytes=180
10/07/15 11:56:31 INFO mapred.JobClient: Map input bytes=240
10/07/15 11:56:31 INFO mapred.JobClient: Combine input records=0
10/07/15 11:56:31 INFO mapred.JobClient: Map output records=20
10/07/15 11:56:31 INFO mapred.JobClient: Reduce input records=20
Job Finished in 35.405 seconds
Estimated value of Pi is 3.14080000000000000000
***Stopping Hadoop [#v25e36f0]
$ bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
**Fully distributed mode [#g9f7e4c6]
Run Hadoop with one namenode (d5270-53) and two datanodes (d5270-47, d5270-49).
&color(red){As shown below, the hosts file on every machine must let it resolve all namenode and datanode hosts.
In particular, when the 127.0.0.1 line looked like the following, Hadoop only worked with localhost specified, and pointing a datanode at another machine did not work:};
127.0.0.1 d5270-53.mie-chukyo-u.ac.jp localhost.localdomain localhost
/etc/hosts (machine: d5270-53)
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
10.3.4.47 d5270-47.mie-chukyo-u.ac.jp d5270-47
10.3.4.49 d5270-49.mie-chukyo-u.ac.jp d5270-49
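The 127.0.0.1 pitfall described above can be checked mechanically. This sketch scans a hosts file for non-localhost names mapped to 127.0.0.1; it operates on a temporary demo copy containing the bad line from this page, and on a real machine you would point HOSTS at /etc/hosts instead:

```shell
# Detect the machine's own hostname listed on the 127.0.0.1 line,
# which makes daemons reachable only via loopback.
HOSTS=$(mktemp)   # demo copy; use /etc/hosts on a real machine
cat > "$HOSTS" <<'EOF'
127.0.0.1 d5270-53.mie-chukyo-u.ac.jp localhost.localdomain localhost
10.3.4.47 d5270-47.mie-chukyo-u.ac.jp d5270-47
EOF
BAD=$(awk '$1 == "127.0.0.1" { for (i = 2; i <= NF; i++)
  if ($i !~ /^localhost/) print $i }' "$HOSTS")
echo "suspicious names on 127.0.0.1: $BAD"
```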
So that every machine can log in to the others via ssh without a password, append the passphrase-less public key generated on each machine to the following file on every machine.
.ssh/authorized_keys
ssh-dss AAAAB3NzaC1kc3MAAACBAPh+QR0K1tfrutT5PWQ6EqSYyzjJn5Yc30G+76sqV1tBvl33m9SEaI1mYM86Qc
enSvNb23zv/KQ05+hvxVsVP+wMOgQ6roag99JpXX3v9D/6pe8eXGeU7wpowxZ9dcXtAGU8HBZ8pB3qT5Tl2d4sGuxo
5yZm7YP1xAAyUT4sUcsLAAAAFQDIGD/XcyorEP0bjt+A2WxvD53HXwAAAIBRnu06GvAKhQUd890bja87MtrJrfxfpO
ezEagl593s+whTEncrzL5hNJmlxVg0c073jezKOCfK1akU1RU8ahqKC1Y7EDHOgN/W07p2uft2ZPA42+kB7b7XNPmZ
0QFXfjO06px0NT2fnUtLiLurshjtYhmWK5yW9FoGtj03F4tf/QAAAIEAjyHfNVpHXX9U1fwRjPIIBYvBbKdSeEFQMa
sx9DGN5D27otHiqF7EMPNGckMWWpiICw6lpM5BX8Lk+jAkb71brYk5A365nsjOf9ZQ5uJLunOXUmrO0WGRkaR2EynP
eD5gjTv9NacTRVcTmFjEGh2zlRVsZef0eZXx4Tw4r6aeDGE= hadoop@d5270-49.mie-chukyo-u.ac.jp
ssh-dss AAAAB3NzaC1kc3MAAACBAKCBtQBXC/N6lT3hxtvU8S+CovsgpwiFq767XBAM6gNVOwx7XhXKpiaOdIMC+G
ecFr574+V7LkkGqhtb7JmqjfXOa3AgvnWFYzEi7o+ICL8ET9nrVJm8ZJSG9FwXDGigIFyUFH8v9f62+jwoaRtJPVaR
MAq/Um1Y+IqLQhED7A4TAAAAFQDTNEwr9Ai+FJsjIAJ0jf9ydYZ7eQAAAIBXsL8DNtsIPaaAI0CeBu5dk3dxKRP6yK
oFhZpO1dyh+rCxH5V/G3veFyon2SOZGvaxT7i7NwvUoaM7UXQHpqJDTC5YLWoQM/4PtprtH2ACuBxe4f7wu+fosOi9
OtXdQxt+BY9USyI6W1nzMXf1freAQR+0GD0dUin/YTYpin8t0gAAAIEAnza3qTNyx4X4lnntVsgmczAdJhSJufvsJO
cR2ntamkH4PJ5GrgbF2IlSqKnUBJQ12HAfZY0Q7Zjf6KKyx1pt4EyVINVSRu0BSQoPu7amvh57Z0GQwRRKWyolWNUr
CFSqOmqJiUj/5Ee4dFBFkZPPw9lGfldpPq5JAgMQXD6/Ht8= hadoop@d5270-47.mie-chukyo-u.ac.jp
***Settings common to NameNode and DataNode [#hdef5683]
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://d5270-53:9000</value> <== specify the namenode machine
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>d5270-53:9001</value> <== specify the namenode machine
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
***NameNode configuration [#h7f25360]
Set the machines that will act as DataNodes.
slaves
d5270-47
d5270-49
masters holds the machine that will act as the secondary NameNode; since that is the same machine here, localhost is fine. On the DataNode machines, both slaves and masters can likewise stay localhost.
**Removing the data left over from pseudo-distributed mode [#xf70bc8e]
$ rm -rf /tmp/*
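Note that `rm -rf /tmp/*` also deletes files unrelated to Hadoop. A narrower alternative (a sketch, assuming the default hadoop.tmp.dir of /tmp/hadoop-${user.name} described later on this page) removes only Hadoop's own data:

```shell
# Remove only Hadoop's working data rather than everything under /tmp.
# hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}.
HDIR="/tmp/hadoop-$(whoami)"
mkdir -p "$HDIR/dfs/name"   # simulate leftover data for this demo
rm -rf "$HDIR"
[ ! -d "$HDIR" ] && echo "removed $HDIR"
```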
**Formatting [#x217959a]
$ ../bin/hadoop namenode -format
10/07/20 13:41:59 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = d5270-53.mie-chukyo-u.ac.jp/10.3.4.53 <== confirm this has changed from 127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/07/20 13:42:00 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
10/07/20 13:42:00 INFO namenode.FSNamesystem: supergroup=supergroup
10/07/20 13:42:00 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/07/20 13:42:00 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/07/20 13:42:00 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
10/07/20 13:42:00 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at d5270-53.mie-chukyo-u.ac.jp/10.3.4.53
************************************************************/
***Starting Hadoop [#nad7db8f]
$ bin/start-dfs.sh
starting namenode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-namenode-d5270-47.mie-chukyo-u.ac.jp.out
d5270-49.mie-chukyo-u.ac.jp: starting datanode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-d5270-49.mie-chukyo-u.ac.jp.out
d5270-53.mie-chukyo-u.ac.jp: starting datanode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-d5270-53.mie-chukyo-u.ac.jp.out
localhost: starting secondarynamenode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-secondarynamenode-d5270-47.mie-chukyo-u.ac.jp.out
***Checking startup [#paba86b9]
Checking on the ''NameNode''
$ /usr/java/jdk1.6.0_21/bin/jps
16889 NameNode
17035 SecondaryNameNode
17096 Jps
Alternatively, access http://10.3.4.57:50070/dfshealth.jsp (on the namenode) and confirm that Live Nodes is 2 (the two DataNodes). The figure shows that one node is not active.
&ref("./hadoop4.png");
Checking on the ''DataNode''
$ /usr/java/jdk1.6.0_21/bin/jps
3285 DataNode
3333 Jps
***When a datanode does not start [#kdf4b57f]
$ rm -rf /tmp/*
$ hadoop namenode -format
Run the above and try again. Also check the entries in /etc/hosts.
***Starting Hadoop Map/Reduce [#q3bd89a6]
On the NameNode:
$ bin/start-mapred.sh
starting jobtracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-jobtracker-d5270-47.mie-chukyo-u.ac.jp.out
d5270-53.mie-chukyo-u.ac.jp: starting tasktracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-d5270-53.mie-chukyo-u.ac.jp.out
d5270-49.mie-chukyo-u.ac.jp: starting tasktracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-d5270-49.mie-chukyo-u.ac.jp.out
***Verification [#d42748ea]
On the NameNode:
$ /usr/java/jdk1.6.0_21/bin/jps
16889 NameNode
17035 SecondaryNameNode
17221 Jps
17139 JobTracker
On the DataNode:
$ /usr/java/jdk1.6.0_21/bin/jps
3285 DataNode
3415 Jps
3362 TaskTracker
hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}.
On the namenode:
/tmp/hadoop-hadoop/dfs/name/current/VERSION
#Wed Jul 21 12:57:10 JST 2010
namespaceID=802168764
cTime=0
storageType=NAME_NODE
layoutVersion=-18
On the datanode:
/tmp/hadoop-hadoop/dfs/data/current/VERSION
#Wed Jul 21 12:52:14 JST 2010
namespaceID=802168764
storageID=DS-579881093-10.3.4.49-50010-1279684334128
cTime=0
storageType=DATA_NODE
layoutVersion=-18
It appears that the namespaceID must be identical on every node, while each storageID must be unique.
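That consistency rule can be checked directly from the VERSION files. The sketch below compares namespaceID values; it uses temporary copies containing the IDs shown above, whereas on a real cluster you would read the files under hadoop.tmp.dir on each node:

```shell
# Compare namespaceID between a namenode and a datanode VERSION file.
# A mismatch (e.g. after reformatting the namenode without clearing the
# datanodes' data directories) is a classic reason a datanode refuses to start.
NN_VER=$(mktemp); DN_VER=$(mktemp)
printf 'namespaceID=802168764\nstorageType=NAME_NODE\n' > "$NN_VER"
printf 'namespaceID=802168764\nstorageType=DATA_NODE\n' > "$DN_VER"
NN_ID=$(sed -n 's/^namespaceID=//p' "$NN_VER")
DN_ID=$(sed -n 's/^namespaceID=//p' "$DN_VER")
if [ "$NN_ID" = "$DN_ID" ]; then
  echo "namespaceID matches: $NN_ID"
else
  echo "namespaceID mismatch: $NN_ID vs $DN_ID" >&2
fi
```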
***References [#r04c5565]
Software Design, May 2010 issue
http://metasearch.sourceforge.jp/wiki/index.php?Hadoop