*Hadoop Install [#k24a6bff]
RIGHT:Last updated &lastmod();

**Preparation [#mf6c8ba4]

***Create a hadoop user for Hadoop [#s4b6b8f1]
 # /usr/sbin/adduser hadoop
 # passwd hadoop

***Install the JDK [#u92f98ca]
 # chmod u+x jdk-6uxx-linux-i586-rpm.bin
 # ./jdk-6uxx-linux-i586-rpm.bin

Log in as the hadoop user.

***Configure ssh [#u824fd52]
Set up passwordless ssh login to localhost:
 $ mkdir .ssh
 $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
 $ cat .ssh/id_dsa.pub >> .ssh/authorized_keys
 $ chmod 700 .ssh
 $ cd .ssh/
 $ chmod 600 authorized_keys
 $ chmod 600 id_dsa

Set the Java path in .bash_profile:
 export JAVA_HOME=/usr/java/default

**Install Hadoop [#gfffab05]
Download the tarball from [[here:http://ftp.riken.jp/net/apache/hadoop/core/hadoop-0.20.2/]], extract it, and change its owner to hadoop:
 # cd /opt
 # tar zxvf hadoop-0.20.2.tar.gz
 # chown -R hadoop.hadoop hadoop-0.20.2
 # cd hadoop-0.20.2/conf

***Set the Java path [#gada097a]
In hadoop-env.sh:
 # The java implementation to use.  Required.
 # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
 export JAVA_HOME=/usr/java/default

***Verify the installation [#u90510ca]
 $ bin/hadoop jar hadoop-0.20.2-examples.jar pi 1 1000
 Number of Maps  = 1
 Samples per Map = 1000
 Wrote input for Map #0
 Starting Job
 10/07/15 11:09:23 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
 10/07/15 11:09:23 INFO mapred.FileInputFormat: Total input paths to process : 1
 (snip)
 10/07/15 11:09:24 INFO mapred.JobClient:     Combine input records=0
 10/07/15 11:09:24 INFO mapred.JobClient:     Map output records=2
 10/07/15 11:09:24 INFO mapred.JobClient:     Reduce input records=2
 Job Finished in 1.213 seconds
 Estimated value of Pi is 3.14800000000000000000

**Pseudo-distributed mode [#a2669533]
Run the namenode and datanode together on a single machine.

conf/core-site.xml
 <configuration>
   <property>
     <name>fs.default.name</name>
     <value>hdfs://localhost:9000</value>
   </property>
 </configuration>

conf/hdfs-site.xml
 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
 </configuration>

conf/mapred-site.xml
 <configuration>
   <property>
     <name>mapred.job.tracker</name>
     <value>localhost:9001</value>
   </property>
 </configuration>

***Format the namenode [#ue212a47]
 $ bin/hadoop namenode -format
 10/07/15 11:15:12 INFO namenode.NameNode: STARTUP_MSG:
 /************************************************************
 STARTUP_MSG: Starting NameNode
 STARTUP_MSG:   host = d5270-47.mie-chukyo-u.ac.jp/127.0.0.1
 STARTUP_MSG:   args = [-format]
 STARTUP_MSG:   version = 0.20.2
 STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
 ************************************************************/
 10/07/15 11:15:12 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
 10/07/15 11:15:12 INFO namenode.FSNamesystem: supergroup=supergroup
 10/07/15 11:15:12 INFO namenode.FSNamesystem: isPermissionEnabled=true
 10/07/15 11:15:12 INFO common.Storage: Image file of size 96 saved in 0 seconds.
 10/07/15 11:15:12 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
 10/07/15 11:15:12 INFO namenode.NameNode: SHUTDOWN_MSG:
 /************************************************************
 SHUTDOWN_MSG: Shutting down NameNode at d5270-47.mie-chukyo-u.ac.jp/127.0.0.1
 ************************************************************/

***Start Hadoop [#e43a8c30]
 $ bin/start-all.sh
 starting namenode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-namenode-d5270-47.mie-chukyo-u.ac.jp.out
 localhost: starting datanode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-d5270-47.mie-chukyo-u.ac.jp.out
 localhost: starting secondarynamenode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-secondarynamenode-d5270-47.mie-chukyo-u.ac.jp.out
 starting jobtracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-jobtracker-d5270-47.mie-chukyo-u.ac.jp.out
 localhost: starting tasktracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-d5270-47.mie-chukyo-u.ac.jp.out

***Verify [#sd9f6faa]
Access http://localhost:50070/dfshealth.jsp and confirm that Live Nodes is 1.
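As a command-line alternative to the web UI, `bin/hadoop dfsadmin -report` prints a cluster summary. Below is a minimal sketch for pulling the live-datanode count out of that report; it assumes the report contains a line of the form `Datanodes available: 1 (1 total, 0 dead)`, which is what this 0.20-era version prints.

```shell
# Sketch: extract the live datanode count from `dfsadmin -report` output.
# Assumes a line of the form "Datanodes available: 1 (1 total, 0 dead)".
live_datanodes() {
  grep 'Datanodes available' | sed 's/.*available: *\([0-9][0-9]*\).*/\1/'
}

# Usage on a running cluster:
#   bin/hadoop dfsadmin -report | live_datanodes
```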
&ref("./hadoop1.png");

Access http://10.3.4.47:50030/jobtracker.jsp and confirm that Nodes is 1.
&ref("./hadoop2.png");

***Computing pi [#abbf9789]
 $ bin/hadoop jar hadoop-0.20.2-examples.jar pi 10 1000
 Number of Maps  = 10
 Samples per Map = 1000
 Wrote input for Map #0
 Wrote input for Map #1
 Wrote input for Map #2
 Wrote input for Map #3
 Wrote input for Map #4
 Wrote input for Map #5
 Wrote input for Map #6
 Wrote input for Map #7
 Wrote input for Map #8
 Wrote input for Map #9
 Starting Job
 10/07/15 11:55:55 INFO mapred.FileInputFormat: Total input paths to process : 10
 10/07/15 11:55:56 INFO mapred.JobClient: Running job: job_201007151141_0003
 10/07/15 11:55:57 INFO mapred.JobClient:  map 0% reduce 0%
 10/07/15 11:56:05 INFO mapred.JobClient:  map 20% reduce 0%
 10/07/15 11:56:08 INFO mapred.JobClient:  map 40% reduce 0%
 10/07/15 11:56:11 INFO mapred.JobClient:  map 60% reduce 0%
 10/07/15 11:56:14 INFO mapred.JobClient:  map 80% reduce 20%
 10/07/15 11:56:17 INFO mapred.JobClient:  map 100% reduce 20%
 10/07/15 11:56:23 INFO mapred.JobClient:  map 100% reduce 26%
 10/07/15 11:56:29 INFO mapred.JobClient:  map 100% reduce 100%
 10/07/15 11:56:31 INFO mapred.JobClient: Job complete: job_201007151141_0003
 10/07/15 11:56:31 INFO mapred.JobClient: Counters: 18
 10/07/15 11:56:31 INFO mapred.JobClient:   Job Counters
 10/07/15 11:56:31 INFO mapred.JobClient:     Launched reduce tasks=1
 10/07/15 11:56:31 INFO mapred.JobClient:     Launched map tasks=10
 10/07/15 11:56:31 INFO mapred.JobClient:     Data-local map tasks=10
 10/07/15 11:56:31 INFO mapred.JobClient:   FileSystemCounters
 10/07/15 11:56:31 INFO mapred.JobClient:     FILE_BYTES_READ=226
 10/07/15 11:56:31 INFO mapred.JobClient:     HDFS_BYTES_READ=1180
 10/07/15 11:56:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=826
 10/07/15 11:56:31 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
 10/07/15 11:56:31 INFO mapred.JobClient:   Map-Reduce Framework
 10/07/15 11:56:31 INFO mapred.JobClient:     Reduce input groups=20
 10/07/15 11:56:31 INFO mapred.JobClient:     Combine output records=0
 10/07/15 11:56:31 INFO mapred.JobClient:     Map input records=10
 10/07/15 11:56:31 INFO mapred.JobClient:     Reduce shuffle bytes=280
 10/07/15 11:56:31 INFO mapred.JobClient:     Reduce output records=0
 10/07/15 11:56:31 INFO mapred.JobClient:     Spilled Records=40
 10/07/15 11:56:31 INFO mapred.JobClient:     Map output bytes=180
 10/07/15 11:56:31 INFO mapred.JobClient:     Map input bytes=240
 10/07/15 11:56:31 INFO mapred.JobClient:     Combine input records=0
 10/07/15 11:56:31 INFO mapred.JobClient:     Map output records=20
 10/07/15 11:56:31 INFO mapred.JobClient:     Reduce input records=20
 Job Finished in 35.405 seconds
 Estimated value of Pi is 3.14080000000000000000

***Stop Hadoop [#v25e36f0]
 $ bin/stop-all.sh
 stopping jobtracker
 localhost: stopping tasktracker
 stopping namenode
 localhost: stopping datanode
 localhost: stopping secondarynamenode

**Fully distributed mode [#g9f7e4c6]
Run Hadoop with one namenode (d5270-53) and two datanodes (d5270-47, d5270-49).

&color(red){The namenode and datanode hostnames must be entered in /etc/hosts on every machine as shown below. In particular, when the 127.0.0.1 line looked like the following, Hadoop only worked when localhost was specified, and datanodes on other machines did not work.};
 127.0.0.1  d5270-53.mie-chukyo-u.ac.jp localhost.localdomain localhost

/etc/hosts (machine: d5270-53)
 # Do not remove the following line, or various programs
 # that require network functionality will fail.
 127.0.0.1       localhost.localdomain localhost
 10.3.4.47       d5270-47.mie-chukyo-u.ac.jp d5270-47
 10.3.4.49       d5270-49.mie-chukyo-u.ac.jp d5270-49

So that each machine can log in to the others via ssh without a password, add the public key generated (with an empty passphrase) on each machine to the following file:

.ssh/authorized_keys
 ssh-dss AAAAB3NzaC1kc3MAAACBAPh+QR0K1tfrutT5PWQ6EqSYyzjJn5Yc30G+76sqV1tBvl33m9SEaI1mYM86QcenSvNb23zv/KQ05+hvxVsVP+wMOgQ6roag99JpXX3v9D/6pe8eXGeU7wpowxZ9dcXtAGU8HBZ8pB3qT5Tl2d4sGuxo5yZm7YP1xAAyUT4sUcsLAAAAFQDIGD/XcyorEP0bjt+A2WxvD53HXwAAAIBRnu06GvAKhQUd890bja87MtrJrfxfpOezEagl593s+whTEncrzL5hNJmlxVg0c073jezKOCfK1akU1RU8ahqKC1Y7EDHOgN/W07p2uft2ZPA42+kB7b7XNPmZ0QFXfjO06px0NT2fnUtLiLurshjtYhmWK5yW9FoGtj03F4tf/QAAAIEAjyHfNVpHXX9U1fwRjPIIBYvBbKdSeEFQMasx9DGN5D27otHiqF7EMPNGckMWWpiICw6lpM5BX8Lk+jAkb71brYk5A365nsjOf9ZQ5uJLunOXUmrO0WGRkaR2EynPeD5gjTv9NacTRVcTmFjEGh2zlRVsZef0eZXx4Tw4r6aeDGE= hadoop@d5270-49.mie-chukyo-u.ac.jp
 ssh-dss AAAAB3NzaC1kc3MAAACBAKCBtQBXC/N6lT3hxtvU8S+CovsgpwiFq767XBAM6gNVOwx7XhXKpiaOdIMC+GecFr574+V7LkkGqhtb7JmqjfXOa3AgvnWFYzEi7o+ICL8ET9nrVJm8ZJSG9FwXDGigIFyUFH8v9f62+jwoaRtJPVaRMAq/Um1Y+IqLQhED7A4TAAAAFQDTNEwr9Ai+FJsjIAJ0jf9ydYZ7eQAAAIBXsL8DNtsIPaaAI0CeBu5dk3dxKRP6yKoFhZpO1dyh+rCxH5V/G3veFyon2SOZGvaxT7i7NwvUoaM7UXQHpqJDTC5YLWoQM/4PtprtH2ACuBxe4f7wu+fosOi9OtXdQxt+BY9USyI6W1nzMXf1freAQR+0GD0dUin/YTYpin8t0gAAAIEAnza3qTNyx4X4lnntVsgmczAdJhSJufvsJOcR2ntamkH4PJ5GrgbF2IlSqKnUBJQ12HAfZY0Q7Zjf6KKyx1pt4EyVINVSRu0BSQoPu7amvh57Z0GQwRRKWyolWNUrCFSqOmqJiUj/5Ee4dFBFkZPPw9lGfldpPq5JAgMQXD6/Ht8= hadoop@d5270-47.mie-chukyo-u.ac.jp

***Common settings for NameNode and DataNode [#hdef5683]

core-site.xml
 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <!-- Put site-specific property overrides in this file. -->
 <configuration>
   <property>
     <name>fs.default.name</name>
     <value>hdfs://d5270-53:9000</value>   <== specify the namenode machine
   </property>
 </configuration>

mapred-site.xml
 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <!-- Put site-specific property overrides in this file.
 -->
 <configuration>
   <property>
     <name>mapred.job.tracker</name>
     <value>d5270-53:9001</value>   <== specify the namenode machine
   </property>
 </configuration>

hdfs-site.xml
 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <!-- Put site-specific property overrides in this file. -->
 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
 </configuration>

***NameNode settings [#h7f25360]
Specify the machines that will act as DataNodes.

slaves
 d5270-47
 d5270-49

The masters file specifies the machine that will act as the secondary NameNode; since that is the same machine here, localhost is fine. On the DataNode machines, both slaves and masters can also be left as localhost.

**Remove the files left over from pseudo-distributed mode [#xf70bc8e]
 $ rm -rf /tmp/*

**Format [#x217959a]
 $ ../bin/hadoop namenode -format
 10/07/20 13:41:59 INFO namenode.NameNode: STARTUP_MSG:
 /************************************************************
 STARTUP_MSG: Starting NameNode
 STARTUP_MSG:   host = d5270-53.mie-chukyo-u.ac.jp/10.3.4.53   <== confirm this has changed from 127.0.0.1
 STARTUP_MSG:   args = [-format]
 STARTUP_MSG:   version = 0.20.2
 STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
 ************************************************************/
 10/07/20 13:42:00 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
 10/07/20 13:42:00 INFO namenode.FSNamesystem: supergroup=supergroup
 10/07/20 13:42:00 INFO namenode.FSNamesystem: isPermissionEnabled=true
 10/07/20 13:42:00 INFO common.Storage: Image file of size 96 saved in 0 seconds.
 10/07/20 13:42:00 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
 10/07/20 13:42:00 INFO namenode.NameNode: SHUTDOWN_MSG:
 /************************************************************
 SHUTDOWN_MSG: Shutting down NameNode at d5270-53.mie-chukyo-u.ac.jp/10.3.4.53
 ************************************************************/

***Start Hadoop [#nad7db8f]
 $ bin/start-dfs.sh
 starting namenode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-namenode-d5270-47.mie-chukyo-u.ac.jp.out
 d5270-49.mie-chukyo-u.ac.jp: starting datanode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-d5270-49.mie-chukyo-u.ac.jp.out
 d5270-53.mie-chukyo-u.ac.jp: starting datanode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-d5270-53.mie-chukyo-u.ac.jp.out
 localhost: starting secondarynamenode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-secondarynamenode-d5270-47.mie-chukyo-u.ac.jp.out

***Verify startup [#paba86b9]
Check on the ''NameNode'':
 $ /usr/java/jdk1.6.0_21/bin/jps
 16889 NameNode
 17035 SecondaryNameNode
 17096 Jps

Or access http://10.3.4.57:50070/dfshealth.jsp (on the namenode) and confirm that Live Nodes is 2 (the two DataNodes). The figure shows one node not yet active.
&ref("./hadoop4.png");

Check on a ''DataNode'':
 $ /usr/java/jdk1.6.0_21/bin/jps
 3285 DataNode
 3333 Jps

***If a datanode does not start [#kdf4b57f]
Run:
 $ rm -rf /tmp/*
 $ hadoop namenode -format
and try again. Also check the entries in /etc/hosts.

***Start Hadoop Map/Reduce [#q3bd89a6]
On the NameNode:
 $ bin/start-mapred.sh
 starting jobtracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-jobtracker-d5270-47.mie-chukyo-u.ac.jp.out
 d5270-53.mie-chukyo-u.ac.jp: starting tasktracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-d5270-53.mie-chukyo-u.ac.jp.out
 d5270-49.mie-chukyo-u.ac.jp: starting tasktracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-d5270-49.mie-chukyo-u.ac.jp.out

***Verify [#d42748ea]
On the NameNode:
 $ /usr/java/jdk1.6.0_21/bin/jps
 16889 NameNode
 17035 SecondaryNameNode
 17221 Jps
 17139 JobTracker

On a DataNode:
 $ /usr/java/jdk1.6.0_21/bin/jps
 3285 DataNode
 3415 Jps
 3362 TaskTracker
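The jps checks above can be wrapped in a small helper that fails when an expected daemon is missing. This is only a sketch; the daemon names are the ones shown in the jps listings above, and `check_daemons` is a hypothetical name, not a Hadoop tool.

```shell
# Sketch: given captured jps output, verify each expected daemon appears.
# jps prints "<pid> <DaemonName>", so we anchor on " <name>" at end of line.
check_daemons() {
  jps_output=$1; shift
  for daemon in "$@"; do
    if ! echo "$jps_output" | grep -q " $daemon\$"; then
      echo "MISSING: $daemon"
      return 1
    fi
  done
  echo "OK"
}

# Usage on the NameNode:
#   check_daemons "$(/usr/java/jdk1.6.0_21/bin/jps)" NameNode SecondaryNameNode JobTracker
# Usage on a DataNode:
#   check_daemons "$(/usr/java/jdk1.6.0_21/bin/jps)" DataNode TaskTracker
```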
By default, hadoop.tmp.dir is /tmp/hadoop-${user.name}.

On the namenode, /tmp/hadoop-hadoop/dfs/name/current/VERSION:
 #Wed Jul 21 12:57:10 JST 2010
 namespaceID=802168764
 cTime=0
 storageType=NAME_NODE
 layoutVersion=-18

On a datanode, /tmp/hadoop-hadoop/dfs/data/current/VERSION:
 #Wed Jul 21 12:52:14 JST 2010
 namespaceID=802168764
 storageID=DS-579881093-10.3.4.49-50010-1279684334128
 cTime=0
 storageType=DATA_NODE
 layoutVersion=-18

It appears that the namespaceID must be identical on every node, while each storageID must be unique.

***Recovering a DataNode [#of1bbc7c]
On the datanode:
 $ bin/hadoop-daemon.sh start datanode
 $ bin/hadoop-daemon.sh start tasktracker

**References [#r04c5565]
Software Design, May 2010 issue
http://metasearch.sourceforge.jp/wiki/index.php?Hadoop
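The namespaceID consistency requirement described above can be checked with a small script. This is a sketch, and `check_namespace_ids` is a hypothetical helper; the example paths assume the default hadoop.tmp.dir layout shown earlier.

```shell
# Sketch: confirm the namenode and a datanode share the same namespaceID
# by comparing the "namespaceID=" line of their VERSION files.
check_namespace_ids() {
  name_id=$(grep '^namespaceID=' "$1" | cut -d= -f2)
  data_id=$(grep '^namespaceID=' "$2" | cut -d= -f2)
  if [ "$name_id" = "$data_id" ]; then
    echo "namespaceID matches: $name_id"
  else
    echo "MISMATCH: namenode=$name_id datanode=$data_id"
    return 1
  fi
}

# Usage on a cluster (default hadoop.tmp.dir):
#   check_namespace_ids /tmp/hadoop-hadoop/dfs/name/current/VERSION \
#       /tmp/hadoop-hadoop/dfs/data/current/VERSION
```

A mismatch typically means the namenode was reformatted while a datanode kept its old data directory, which is exactly the "datanode does not start" situation handled above by clearing /tmp and reformatting.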