*Hadoop Install [#k24a6bff]

RIGHT:Last updated: &lastmod();

**Preparation [#mf6c8ba4]

***Create the hadoop user for Hadoop [#s4b6b8f1]

 # /usr/sbin/adduser hadoop
 # passwd hadoop

***Install the JDK [#u92f98ca]

 # chmod u+x jdk-6uxx-linux-i586-rpm.bin
 # ./jdk-6uxx-linux-i586-rpm.bin


Log in as hadoop.

***Configure ssh [#u824fd52]

Set up passwordless ssh login to localhost.

 $ mkdir .ssh
 $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
 $ cat .ssh/id_dsa.pub >> .ssh/authorized_keys
 $ chmod 700 .ssh
 $ cd .ssh/
 $ chmod 600 authorized_keys
 $ chmod 600 id_dsa
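
sshd silently ignores keys whose files are too permissive, so it is worth double-checking the resulting modes. A sketch that recreates the layout in a scratch directory (so it is safe to run anywhere; `stat -c` is GNU coreutils):

```shell
# Recreate the permission layout from the steps above and print the
# octal modes sshd expects to see (700 on .ssh, 600 on the key files).
dir=$(mktemp -d)
mkdir "$dir/.ssh"
touch "$dir/.ssh/id_dsa" "$dir/.ssh/authorized_keys"
chmod 700 "$dir/.ssh"
chmod 600 "$dir/.ssh/id_dsa" "$dir/.ssh/authorized_keys"
dirmode=$(stat -c '%a' "$dir/.ssh")                   # directory mode
keymode=$(stat -c '%a' "$dir/.ssh/authorized_keys")   # key file mode
echo "dir=$dirmode keys=$keymode"
```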

Set the Java path.

.bash_profile

 export JAVA_HOME=/usr/java/default

**Install Hadoop [#gfffab05]

Just download it from [[here:http://ftp.riken.jp/net/apache/hadoop/core/hadoop-0.20.2/]], extract it, and change the owner to hadoop.

 # cd /opt
 # tar zxvf hadoop-0.20.2.tar.gz
 # chown -R hadoop:hadoop hadoop-0.20.2
 # cd hadoop-0.20.2/conf

***Set the Java path [#gada097a]

hadoop-env.sh

 # The java implementation to use.  Required.
 # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
 export JAVA_HOME=/usr/java/default

***Verify the installation [#u90510ca]

 $ bin/hadoop jar hadoop-0.20.2-examples.jar pi 1 1000
 Number of Maps  = 1
 Samples per Map = 1000
 Wrote input for Map #0
 Starting Job
 10/07/15 11:09:23 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
 10/07/15 11:09:23 INFO mapred.FileInputFormat: Total input paths to process : 1 
 
(snip)
 
 10/07/15 11:09:24 INFO mapred.JobClient:     Combine input records=0
 10/07/15 11:09:24 INFO mapred.JobClient:     Map output records=2
 10/07/15 11:09:24 INFO mapred.JobClient:     Reduce input records=2
 Job Finished in 1.213 seconds
 Estimated value of Pi is 3.14800000000000000000
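
The example estimates π by Monte Carlo sampling: each map task draws random points in the unit square and counts how many fall inside the quarter circle, so π ≈ 4 × (hits / samples). A single-process awk stand-in for the same computation (not the Hadoop job itself; `n` and the seed are arbitrary):

```shell
# Quarter-circle Monte Carlo estimate of pi, as one local loop
# instead of distributed map tasks.
pi=$(awk 'BEGIN {
  srand(42)                      # arbitrary seed
  n = 100000; hits = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x * x + y * y <= 1.0) hits++
  }
  printf "%.4f", 4 * hits / n    # fraction inside quarter circle, times 4
}')
echo "Estimated value of Pi is $pi"
```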


**Pseudo-distributed mode [#a2669533]

Run the namenode and datanode together on a single machine.

conf/core-site.xml

 <configuration>
 
   <property>
     <name>fs.default.name</name>
     <value>hdfs://localhost:9000</value>
   </property>
 
 </configuration>


conf/hdfs-site.xml

 <configuration>
 
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
 
 </configuration>


conf/mapred-site.xml

 <configuration>
 
   <property>
     <name>mapred.job.tracker</name>
     <value>localhost:9001</value>
   </property>
 
 </configuration>

***Format the namenode [#ue212a47]

 $ bin/hadoop namenode -format
 10/07/15 11:15:12 INFO namenode.NameNode: STARTUP_MSG:
 /************************************************************
 STARTUP_MSG: Starting NameNode
 STARTUP_MSG:   host = d5270-47.mie-chukyo-u.ac.jp/127.0.0.1
 STARTUP_MSG:   args = [-format]
 STARTUP_MSG:   version = 0.20.2
 STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
 ************************************************************/
 10/07/15 11:15:12 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
 10/07/15 11:15:12 INFO namenode.FSNamesystem: supergroup=supergroup
 10/07/15 11:15:12 INFO namenode.FSNamesystem: isPermissionEnabled=true
 10/07/15 11:15:12 INFO common.Storage: Image file of size 96 saved in 0 seconds.
 10/07/15 11:15:12 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
 10/07/15 11:15:12 INFO namenode.NameNode: SHUTDOWN_MSG:
 /************************************************************
 SHUTDOWN_MSG: Shutting down NameNode at d5270-47.mie-chukyo-u.ac.jp/127.0.0.1
 ************************************************************/

***Start Hadoop [#e43a8c30]

 $ bin/start-all.sh
 starting namenode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-namenode-d5270-47.mie-chukyo-u.ac.jp.out
 localhost: starting datanode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-d5270-47.mie-chukyo-u.ac.jp.out
 localhost: starting secondarynamenode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-secondarynamenode-d5270-47.mie-chukyo-u.ac.jp.out
 starting jobtracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-jobtracker-d5270-47.mie-chukyo-u.ac.jp.out
 localhost: starting tasktracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-d5270-47.mie-chukyo-u.ac.jp.out

***Verify [#sd9f6faa]

Open http://localhost:50070/dfshealth.jsp and confirm that Live Nodes is 1.

&ref(./hadoop1.png);


Open http://10.3.4.47:50030/jobtracker.jsp and confirm that Nodes is 1.

&ref(./hadoop1.png);

***Compute π [#abbf9789]

 $ bin/hadoop jar hadoop-0.20.2-examples.jar pi 10 1000
 Number of Maps  = 10
 Samples per Map = 1000
 Wrote input for Map #0
 Wrote input for Map #1
 Wrote input for Map #2
 Wrote input for Map #3
 Wrote input for Map #4
 Wrote input for Map #5
 Wrote input for Map #6
 Wrote input for Map #7
 Wrote input for Map #8
 Wrote input for Map #9
 Starting Job
 10/07/15 11:55:55 INFO mapred.FileInputFormat: Total input paths to process : 10
 10/07/15 11:55:56 INFO mapred.JobClient: Running job: job_201007151141_0003
 10/07/15 11:55:57 INFO mapred.JobClient:  map 0% reduce 0%
 10/07/15 11:56:05 INFO mapred.JobClient:  map 20% reduce 0%
 10/07/15 11:56:08 INFO mapred.JobClient:  map 40% reduce 0%
 10/07/15 11:56:11 INFO mapred.JobClient:  map 60% reduce 0%
 10/07/15 11:56:14 INFO mapred.JobClient:  map 80% reduce 20%
 10/07/15 11:56:17 INFO mapred.JobClient:  map 100% reduce 20%
 10/07/15 11:56:23 INFO mapred.JobClient:  map 100% reduce 26%
 10/07/15 11:56:29 INFO mapred.JobClient:  map 100% reduce 100%
 10/07/15 11:56:31 INFO mapred.JobClient: Job complete: job_201007151141_0003
 10/07/15 11:56:31 INFO mapred.JobClient: Counters: 18
 10/07/15 11:56:31 INFO mapred.JobClient:   Job Counters
 10/07/15 11:56:31 INFO mapred.JobClient:     Launched reduce tasks=1
 10/07/15 11:56:31 INFO mapred.JobClient:     Launched map tasks=10
 10/07/15 11:56:31 INFO mapred.JobClient:     Data-local map tasks=10
 10/07/15 11:56:31 INFO mapred.JobClient:   FileSystemCounters
 10/07/15 11:56:31 INFO mapred.JobClient:     FILE_BYTES_READ=226
 10/07/15 11:56:31 INFO mapred.JobClient:     HDFS_BYTES_READ=1180
 10/07/15 11:56:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=826
 10/07/15 11:56:31 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
 10/07/15 11:56:31 INFO mapred.JobClient:   Map-Reduce Framework
 10/07/15 11:56:31 INFO mapred.JobClient:     Reduce input groups=20
 10/07/15 11:56:31 INFO mapred.JobClient:     Combine output records=0
 10/07/15 11:56:31 INFO mapred.JobClient:     Map input records=10
 10/07/15 11:56:31 INFO mapred.JobClient:     Reduce shuffle bytes=280
 10/07/15 11:56:31 INFO mapred.JobClient:     Reduce output records=0
 10/07/15 11:56:31 INFO mapred.JobClient:     Spilled Records=40
 10/07/15 11:56:31 INFO mapred.JobClient:     Map output bytes=180
 10/07/15 11:56:31 INFO mapred.JobClient:     Map input bytes=240
 10/07/15 11:56:31 INFO mapred.JobClient:     Combine input records=0
 10/07/15 11:56:31 INFO mapred.JobClient:     Map output records=20
 10/07/15 11:56:31 INFO mapred.JobClient:     Reduce input records=20
 Job Finished in 35.405 seconds
 Estimated value of Pi is 3.14080000000000000000


***Stop Hadoop [#v25e36f0]

 $ bin/stop-all.sh
 stopping jobtracker
 localhost: stopping tasktracker
 stopping namenode
 localhost: stopping datanode
 localhost: stopping secondarynamenode



**Fully distributed mode [#g9f7e4c6]

Run Hadoop with one namenode (d5270-53) and two datanodes (d5270-47, d5270-49).

&color(red){The hosts file on every machine must list the namenode and datanodes as shown below.
In particular, when the 127.0.0.1 line looked like the following, Hadoop only worked when localhost was specified, and pointing a datanode at another machine did not work.};

 127.0.0.1       d5270-53.mie-chukyo-u.ac.jp     localhost.localdomain   localhost

/etc/hosts (machine name: d5270-53)

 # Do not remove the following line, or various programs
 # that require network functionality will fail.
 127.0.0.1       localhost.localdomain   localhost
 10.3.4.47       d5270-47.mie-chukyo-u.ac.jp     d5270-47
 10.3.4.49       d5270-49.mie-chukyo-u.ac.jp     d5270-49
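
The failure mode above can be checked mechanically: no non-localhost host name should appear on the 127.0.0.1 line. A sketch against a sample file (the host names are the ones used on this page):

```shell
# Flag any non-localhost names bound to 127.0.0.1 in a hosts file.
cat > hosts.sample <<'EOF'
127.0.0.1       localhost.localdomain   localhost
10.3.4.47       d5270-47.mie-chukyo-u.ac.jp     d5270-47
10.3.4.49       d5270-49.mie-chukyo-u.ac.jp     d5270-49
EOF
bad=$(awk '$1 == "127.0.0.1" { for (i = 2; i <= NF; i++)
          if ($i !~ /^localhost/) print $i }' hosts.sample)
if [ -z "$bad" ]; then echo "hosts file OK"; else echo "move off 127.0.0.1: $bad"; fi
```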


So that every machine can log in to the others over ssh without a password, add the passphrase-less public key generated on each machine to the following file.

.ssh/authorized_keys

 ssh-dss AAAAB3NzaC1kc3MAAACBAPh+QR0K1tfrutT5PWQ6EqSYyzjJn5Yc30G+76sqV1tBvl33m9SEaI1mYM86Qc
 enSvNb23zv/KQ05+hvxVsVP+wMOgQ6roag99JpXX3v9D/6pe8eXGeU7wpowxZ9dcXtAGU8HBZ8pB3qT5Tl2d4sGuxo
 5yZm7YP1xAAyUT4sUcsLAAAAFQDIGD/XcyorEP0bjt+A2WxvD53HXwAAAIBRnu06GvAKhQUd890bja87MtrJrfxfpO
 ezEagl593s+whTEncrzL5hNJmlxVg0c073jezKOCfK1akU1RU8ahqKC1Y7EDHOgN/W07p2uft2ZPA42+kB7b7XNPmZ
 0QFXfjO06px0NT2fnUtLiLurshjtYhmWK5yW9FoGtj03F4tf/QAAAIEAjyHfNVpHXX9U1fwRjPIIBYvBbKdSeEFQMa
 sx9DGN5D27otHiqF7EMPNGckMWWpiICw6lpM5BX8Lk+jAkb71brYk5A365nsjOf9ZQ5uJLunOXUmrO0WGRkaR2EynP
 eD5gjTv9NacTRVcTmFjEGh2zlRVsZef0eZXx4Tw4r6aeDGE= hadoop@d5270-49.mie-chukyo-u.ac.jp
 ssh-dss AAAAB3NzaC1kc3MAAACBAKCBtQBXC/N6lT3hxtvU8S+CovsgpwiFq767XBAM6gNVOwx7XhXKpiaOdIMC+G
 ecFr574+V7LkkGqhtb7JmqjfXOa3AgvnWFYzEi7o+ICL8ET9nrVJm8ZJSG9FwXDGigIFyUFH8v9f62+jwoaRtJPVaR
 MAq/Um1Y+IqLQhED7A4TAAAAFQDTNEwr9Ai+FJsjIAJ0jf9ydYZ7eQAAAIBXsL8DNtsIPaaAI0CeBu5dk3dxKRP6yK
 oFhZpO1dyh+rCxH5V/G3veFyon2SOZGvaxT7i7NwvUoaM7UXQHpqJDTC5YLWoQM/4PtprtH2ACuBxe4f7wu+fosOi9
 OtXdQxt+BY9USyI6W1nzMXf1freAQR+0GD0dUin/YTYpin8t0gAAAIEAnza3qTNyx4X4lnntVsgmczAdJhSJufvsJO
 cR2ntamkH4PJ5GrgbF2IlSqKnUBJQ12HAfZY0Q7Zjf6KKyx1pt4EyVINVSRu0BSQoPu7amvh57Z0GQwRRKWyolWNUr
 CFSqOmqJiUj/5Ee4dFBFkZPPw9lGfldpPq5JAgMQXD6/Ht8= hadoop@d5270-47.mie-chukyo-u.ac.jp
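
Building authorized_keys amounts to collecting each machine's id_dsa.pub and appending them all. A sketch with placeholder key material (the real lines come from each machine's ~/.ssh/id_dsa.pub):

```shell
# Collect one .pub file per machine and append them all to authorized_keys.
dir=$(mktemp -d)
echo 'ssh-dss AAAA...placeholder... hadoop@d5270-47.mie-chukyo-u.ac.jp' > "$dir/d5270-47.pub"
echo 'ssh-dss BBBB...placeholder... hadoop@d5270-49.mie-chukyo-u.ac.jp' > "$dir/d5270-49.pub"
cat "$dir"/*.pub >> "$dir/authorized_keys"
chmod 600 "$dir/authorized_keys"
keys=$(wc -l < "$dir/authorized_keys")   # one line per installed key
echo "$keys keys installed"
```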


***Settings common to the NameNode and DataNodes [#hdef5683]

core-site.xml

 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
 <!-- Put site-specific property overrides in this file. -->
 
 <configuration>
 
   <property>
     <name>fs.default.name</name>
     <value>hdfs://d5270-53:9000</value>  <== specify the namenode machine
   </property>
 
 </configuration>


mapred-site.xml

 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
 <!-- Put site-specific property overrides in this file. -->
 
 <configuration>
 
   <property>
     <name>mapred.job.tracker</name>
     <value>d5270-53:9001</value>  <== specify the namenode machine
   </property>
 
 </configuration>


hdfs-site.xml

 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
 <!-- Put site-specific property overrides in this file. --> 
 
 <configuration>
 
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
 
 </configuration>
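
dfs.replication stays at 1 here, so each block is stored on only one datanode. With two datanodes it could be raised to 2 for redundancy; a sketch of the same property (the value must not exceed the number of datanodes):

```xml
<property>
  <name>dfs.replication</name>
  <value>2</value>  <!-- one copy on each of the two datanodes -->
</property>
```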


***NameNode settings [#h7f25360]

List the machines that will act as DataNodes.

slaves

 d5270-47
 d5270-49

The masters file lists the machine that will act as the secondary NameNode; here it runs on the same machine, so localhost is fine. On the DataNode machines, both slaves and masters can also stay localhost.


**Delete the environment used for pseudo-distributed mode [#xf70bc8e]

 $ rm -rf /tmp/*

**Format [#x217959a]

 $ ../bin/hadoop namenode -format
 10/07/20 13:41:59 INFO namenode.NameNode: STARTUP_MSG:
 /************************************************************
 STARTUP_MSG: Starting NameNode
 STARTUP_MSG:   host = d5270-53.mie-chukyo-u.ac.jp/10.3.4.53   <= confirm this changed from 127.0.0.1
 STARTUP_MSG:   args = [-format]
 STARTUP_MSG:   version = 0.20.2
 STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
 ************************************************************/
 10/07/20 13:42:00 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
 10/07/20 13:42:00 INFO namenode.FSNamesystem: supergroup=supergroup
 10/07/20 13:42:00 INFO namenode.FSNamesystem: isPermissionEnabled=true
 10/07/20 13:42:00 INFO common.Storage: Image file of size 96 saved in 0 seconds.
 10/07/20 13:42:00 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
 10/07/20 13:42:00 INFO namenode.NameNode: SHUTDOWN_MSG:
 /************************************************************
 SHUTDOWN_MSG: Shutting down NameNode at d5270-53.mie-chukyo-u.ac.jp/10.3.4.53
 ************************************************************/


***Start Hadoop [#nad7db8f]

 $ bin/start-dfs.sh
 starting namenode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-namenode-d5270-47.mie-chukyo-u.ac.jp.out
 d5270-49.mie-chukyo-u.ac.jp: starting datanode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-d5270-49.mie-chukyo-u.ac.jp.out
 d5270-53.mie-chukyo-u.ac.jp: starting datanode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-d5270-53.mie-chukyo-u.ac.jp.out
 localhost: starting secondarynamenode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-secondarynamenode-d5270-47.mie-chukyo-u.ac.jp.out

***Verify startup [#paba86b9]

Check on the ''NameNode''

 $ /usr/java/jdk1.6.0_21/bin/jps
 16889 NameNode
 17035 SecondaryNameNode
 17096 Jps

Alternatively, open http://10.3.4.57:50070/dfshealth.jsp (on the namenode) and confirm that Live Nodes is 2 (the two DataNodes). The figure shows one node not yet active.

&ref("./hadoop4.png");

Check on a ''DataNode''

 $ /usr/java/jdk1.6.0_21/bin/jps
 3285 DataNode
 3333 Jps

***If a datanode does not start [#kdf4b57f]

 $ rm -rf /tmp/*
 $ hadoop namenode -format

Run the above and try again. Also check the /etc/hosts entries.



***Start Hadoop Map/Reduce [#q3bd89a6]

On the NameNode:

 $ bin/start-mapred.sh
 starting jobtracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-jobtracker-d5270-47.mie-chukyo-u.ac.jp.out
 d5270-53.mie-chukyo-u.ac.jp: starting tasktracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-d5270-53.mie-chukyo-u.ac.jp.out
 d5270-49.mie-chukyo-u.ac.jp: starting tasktracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-d5270-49.mie-chukyo-u.ac.jp.out

***Verify [#d42748ea]

On the NameNode:

 $ /usr/java/jdk1.6.0_21/bin/jps
 16889 NameNode
 17035 SecondaryNameNode
 17221 Jps
 17139 JobTracker


On a DataNode:

 $ /usr/java/jdk1.6.0_21/bin/jps
 3285 DataNode
 3415 Jps
 3362 TaskTracker


hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}.
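
Since /tmp is typically cleared on reboot, relocating hadoop.tmp.dir avoids losing HDFS data (which is also why the format step above was needed after clearing /tmp). A hypothetical core-site.xml fragment; the path is only an example and must exist and be owned by hadoop:

```xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/hadoop/tmp</value>  <!-- example path, not from this page -->
</property>
```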

On the namenode:

/tmp/hadoop-hadoop/dfs/name/current/VERSION

 #Wed Jul 21 12:57:10 JST 2010
 namespaceID=802168764
 cTime=0
 storageType=NAME_NODE
 layoutVersion=-18

On the datanode:

/tmp/hadoop-hadoop/dfs/data/current/VERSION

 #Wed Jul 21 12:52:14 JST 2010
 namespaceID=802168764
 storageID=DS-579881093-10.3.4.49-50010-1279684334128
 cTime=0
 storageType=DATA_NODE
 layoutVersion=-18

It appears that the namespaceID must be identical on every node, while each storageID must be unique.
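
A mismatched namespaceID (for example after reformatting the namenode while datanodes keep old data under /tmp) is a common reason a datanode refuses to start. The IDs can be compared by pulling them out of each node's VERSION file; a sketch against the datanode VERSION shown above:

```shell
# Extract namespaceID from a VERSION file; compare the value across all nodes.
cat > VERSION.sample <<'EOF'
#Wed Jul 21 12:52:14 JST 2010
namespaceID=802168764
storageID=DS-579881093-10.3.4.49-50010-1279684334128
cTime=0
storageType=DATA_NODE
layoutVersion=-18
EOF
nsid=$(sed -n 's/^namespaceID=//p' VERSION.sample)
echo "namespaceID is $nsid"
```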

***References [#r04c5565]

SD, May 2010 issue

http://metasearch.sourceforge.jp/wiki/index.php?Hadoop


