Hadoop Install

Updated: 2010-07-21 (Wed) 14:42:18

Preliminaries

Create a hadoop user for Hadoop

# /usr/sbin/adduser hadoop

# passwd hadoop

Installing the JDK

# chmod u+x jdk-6uxx-linux-i586-rpm.bin
# ./jdk-6uxx-linux-i586-rpm.bin

Log in as hadoop

Configuring ssh

Settings for logging in to localhost via ssh without a password

$ mkdir .ssh
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat .ssh/id_dsa.pub >> .ssh/authorized_keys
$ chmod 700 .ssh
$ cd .ssh/
$ chmod 600 authorized_keys
$ chmod 600 id_dsa
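Once the key is in place, the setup can be confirmed. BatchMode makes ssh fail outright instead of prompting, so a password prompt here means something in the key setup is wrong:

```shell
# Should print the hostname with no password prompt.
# (Accept the host key once on the very first connection.)
ssh -o BatchMode=yes localhost hostname
```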

Set the Java path

.bash_profile

export JAVA_HOME=/usr/java/default
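It can also be convenient to put the JDK's bin directory on the PATH in the same file; a minimal sketch of the relevant ~/.bash_profile lines (/usr/java/default is the symlink the RPM install creates):

```shell
# ~/.bash_profile: point Hadoop and the shell at the installed JDK
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
```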

Installing Hadoop

Just download it from here, extract it, and change the owner to hadoop.

# cd /opt
# tar zxvf hadoop-0.20.2.tar.gz
# chown -R hadoop:hadoop hadoop-0.20.2
# cd hadoop-0.20.2/conf

Set the Java path

hadoop-env.sh

# The java implementation to use.  Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
export JAVA_HOME=/usr/java/default

Checking that it works

$ bin/hadoop jar hadoop-0.20.2-examples.jar pi 1 1000
Number of Maps  = 1
Samples per Map = 1000
Wrote input for Map #0
Starting Job
10/07/15 11:09:23 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/07/15 11:09:23 INFO mapred.FileInputFormat: Total input paths to process : 1 

(略)

10/07/15 11:09:24 INFO mapred.JobClient:     Combine input records=0
10/07/15 11:09:24 INFO mapred.JobClient:     Map output records=2
10/07/15 11:09:24 INFO mapred.JobClient:     Reduce input records=2
Job Finished in 1.213 seconds
Estimated value of Pi is 3.14800000000000000000

Pseudo-distributed mode

A single server acts as both the namenode and the datanode.

conf/core-site.xml
<configuration>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>

</configuration>
conf/hdfs-site.xml
<configuration>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

</configuration>

conf/mapred-site.xml

<configuration>

  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>

</configuration>
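Typos in these XML files tend to surface only as cryptic daemon startup failures, so a quick well-formedness check before starting anything can save time. A minimal sketch, assuming xmllint (from libxml2) is available; it writes a copy of the core-site.xml above to a scratch location and validates it:

```shell
# Write the pseudo-distributed core-site.xml shown above to a scratch
# location, then check that it is well-formed XML.
mkdir -p /tmp/conf-check
cat > /tmp/conf-check/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
xmllint --noout /tmp/conf-check/core-site.xml && echo "core-site.xml: OK"
```

In practice, run xmllint --noout against conf/core-site.xml, conf/hdfs-site.xml, and conf/mapred-site.xml directly from the Hadoop directory.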

Formatting the namenode

$ bin/hadoop namenode -format
10/07/15 11:15:12 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = d5270-47.mie-chukyo-u.ac.jp/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/07/15 11:15:12 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
10/07/15 11:15:12 INFO namenode.FSNamesystem: supergroup=supergroup
10/07/15 11:15:12 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/07/15 11:15:12 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/07/15 11:15:12 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
10/07/15 11:15:12 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at d5270-47.mie-chukyo-u.ac.jp/127.0.0.1
************************************************************/

Starting Hadoop

$ bin/start-all.sh
starting namenode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-namenode-d5270-47.mie-chukyo-u.ac.jp.out
localhost: starting datanode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-d5270-47.mie-chukyo-u.ac.jp.out
localhost: starting secondarynamenode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-secondarynamenode-d5270-47.mie-chukyo-u.ac.jp.out
starting jobtracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-jobtracker-d5270-47.mie-chukyo-u.ac.jp.out
localhost: starting tasktracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-d5270-47.mie-chukyo-u.ac.jp.out

Verification

Access http://localhost:50070/dfshealth.jsp and confirm that Live Nodes is 1.

Access http://10.3.4.47:50030/jobtracker.jsp and confirm that Nodes is 1.
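Beyond the web UIs, HDFS itself can be exercised from the command line. A short smoke test using the 0.20-era FsShell commands, run from the hadoop-0.20.2 directory (the file name and path are examples):

```shell
# Round-trip a small file through HDFS; relative paths resolve
# under /user/hadoop.
echo "hello hdfs" > /tmp/hello.txt
bin/hadoop fs -mkdir test
bin/hadoop fs -put /tmp/hello.txt test/
bin/hadoop fs -cat test/hello.txt
bin/hadoop fs -rmr test
```

The -cat step should echo the file's contents back; if it hangs or errors, check the datanode logs before going further.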

Try computing π

$ bin/hadoop jar hadoop-0.20.2-examples.jar pi 10 1000
Number of Maps  = 10
Samples per Map = 1000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
10/07/15 11:55:55 INFO mapred.FileInputFormat: Total input paths to process : 10
10/07/15 11:55:56 INFO mapred.JobClient: Running job: job_201007151141_0003
10/07/15 11:55:57 INFO mapred.JobClient:  map 0% reduce 0%
10/07/15 11:56:05 INFO mapred.JobClient:  map 20% reduce 0%
10/07/15 11:56:08 INFO mapred.JobClient:  map 40% reduce 0%
10/07/15 11:56:11 INFO mapred.JobClient:  map 60% reduce 0%
10/07/15 11:56:14 INFO mapred.JobClient:  map 80% reduce 20%
10/07/15 11:56:17 INFO mapred.JobClient:  map 100% reduce 20%
10/07/15 11:56:23 INFO mapred.JobClient:  map 100% reduce 26%
10/07/15 11:56:29 INFO mapred.JobClient:  map 100% reduce 100%
10/07/15 11:56:31 INFO mapred.JobClient: Job complete: job_201007151141_0003
10/07/15 11:56:31 INFO mapred.JobClient: Counters: 18
10/07/15 11:56:31 INFO mapred.JobClient:   Job Counters
10/07/15 11:56:31 INFO mapred.JobClient:     Launched reduce tasks=1
10/07/15 11:56:31 INFO mapred.JobClient:     Launched map tasks=10
10/07/15 11:56:31 INFO mapred.JobClient:     Data-local map tasks=10
10/07/15 11:56:31 INFO mapred.JobClient:   FileSystemCounters
10/07/15 11:56:31 INFO mapred.JobClient:     FILE_BYTES_READ=226
10/07/15 11:56:31 INFO mapred.JobClient:     HDFS_BYTES_READ=1180
10/07/15 11:56:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=826
10/07/15 11:56:31 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
10/07/15 11:56:31 INFO mapred.JobClient:   Map-Reduce Framework
10/07/15 11:56:31 INFO mapred.JobClient:     Reduce input groups=20
10/07/15 11:56:31 INFO mapred.JobClient:     Combine output records=0
10/07/15 11:56:31 INFO mapred.JobClient:     Map input records=10
10/07/15 11:56:31 INFO mapred.JobClient:     Reduce shuffle bytes=280
10/07/15 11:56:31 INFO mapred.JobClient:     Reduce output records=0
10/07/15 11:56:31 INFO mapred.JobClient:     Spilled Records=40
10/07/15 11:56:31 INFO mapred.JobClient:     Map output bytes=180
10/07/15 11:56:31 INFO mapred.JobClient:     Map input bytes=240
10/07/15 11:56:31 INFO mapred.JobClient:     Combine input records=0
10/07/15 11:56:31 INFO mapred.JobClient:     Map output records=20
10/07/15 11:56:31 INFO mapred.JobClient:     Reduce input records=20
Job Finished in 35.405 seconds
Estimated value of Pi is 3.14080000000000000000

Stopping Hadoop

$ bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode

Fully distributed mode

Run hadoop with one namenode machine (d5270-53) and two datanode machines (d5270-47, d5270-49).

&color(red){Every machine's /etc/hosts must list the namenode and datanodes as shown below. In particular, when the 127.0.0.1 line looked like the following, things only worked when nodes were specified as localhost, and a datanode pointing at another machine did not work};

127.0.0.1       d5270-53.mie-chukyo-u.ac.jp     localhost.localdomain   localhost

/etc/hosts (machine name: d5270-53)

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost.localdomain   localhost
10.3.4.47       d5270-47.mie-chukyo-u.ac.jp     d5270-47
10.3.4.49       d5270-49.mie-chukyo-u.ac.jp     d5270-49

So that every machine can log in to the others via ssh without a password, add each machine's passphrase-less public key to the following file.

.ssh/authorized_keys

ssh-dss AAAAB3NzaC1kc3MAAACBAPh+QR0K1tfrutT5PWQ6EqSYyzjJn5Yc30G+76sqV1tBvl33m9SEaI1mYM86Qc
enSvNb23zv/KQ05+hvxVsVP+wMOgQ6roag99JpXX3v9D/6pe8eXGeU7wpowxZ9dcXtAGU8HBZ8pB3qT5Tl2d4sGuxo
5yZm7YP1xAAyUT4sUcsLAAAAFQDIGD/XcyorEP0bjt+A2WxvD53HXwAAAIBRnu06GvAKhQUd890bja87MtrJrfxfpO
ezEagl593s+whTEncrzL5hNJmlxVg0c073jezKOCfK1akU1RU8ahqKC1Y7EDHOgN/W07p2uft2ZPA42+kB7b7XNPmZ
0QFXfjO06px0NT2fnUtLiLurshjtYhmWK5yW9FoGtj03F4tf/QAAAIEAjyHfNVpHXX9U1fwRjPIIBYvBbKdSeEFQMa
sx9DGN5D27otHiqF7EMPNGckMWWpiICw6lpM5BX8Lk+jAkb71brYk5A365nsjOf9ZQ5uJLunOXUmrO0WGRkaR2EynP
eD5gjTv9NacTRVcTmFjEGh2zlRVsZef0eZXx4Tw4r6aeDGE= hadoop@d5270-49.mie-chukyo-u.ac.jp
ssh-dss AAAAB3NzaC1kc3MAAACBAKCBtQBXC/N6lT3hxtvU8S+CovsgpwiFq767XBAM6gNVOwx7XhXKpiaOdIMC+G
ecFr574+V7LkkGqhtb7JmqjfXOa3AgvnWFYzEi7o+ICL8ET9nrVJm8ZJSG9FwXDGigIFyUFH8v9f62+jwoaRtJPVaR
MAq/Um1Y+IqLQhED7A4TAAAAFQDTNEwr9Ai+FJsjIAJ0jf9ydYZ7eQAAAIBXsL8DNtsIPaaAI0CeBu5dk3dxKRP6yK
oFhZpO1dyh+rCxH5V/G3veFyon2SOZGvaxT7i7NwvUoaM7UXQHpqJDTC5YLWoQM/4PtprtH2ACuBxe4f7wu+fosOi9
OtXdQxt+BY9USyI6W1nzMXf1freAQR+0GD0dUin/YTYpin8t0gAAAIEAnza3qTNyx4X4lnntVsgmczAdJhSJufvsJO
cR2ntamkH4PJ5GrgbF2IlSqKnUBJQ12HAfZY0Q7Zjf6KKyx1pt4EyVINVSRu0BSQoPu7amvh57Z0GQwRRKWyolWNUr
CFSqOmqJiUj/5Ee4dFBFkZPPw9lGfldpPq5JAgMQXD6/Ht8= hadoop@d5270-47.mie-chukyo-u.ac.jp
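Rather than pasting the keys by hand, the same result can be had with ssh-copy-id from each machine; a sketch, assuming the hostnames above resolve and each machine already has its own passphrase-less keypair:

```shell
# Run on each machine as the hadoop user, once per peer
# (each run prompts for that peer's password one last time).
ssh-copy-id hadoop@d5270-47
ssh-copy-id hadoop@d5270-49
ssh-copy-id hadoop@d5270-53
```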

Settings common to the NameNode and DataNodes

core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://d5270-53:9000</value>  <== specify the namenode machine
  </property>

</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>mapred.job.tracker</name>
    <value>d5270-53:9001</value>  <== specify the namenode machine
  </property>

</configuration>

hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. --> 

<configuration>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

</configuration>

Configuring the NameNode

Specify the machines that will act as DataNodes

slaves

d5270-47
d5270-49

In masters, specify the machine that will act as the secondary NameNode; since that is the same machine this time, localhost is fine. On the DataNode machines, both slaves and masters can likewise stay localhost.
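For reference, conf/masters on the NameNode then contains only:

```
localhost
```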

Remove the environment left over from pseudo-distributed mode

$ rm -rf /tmp/*

Format

$ ../bin/hadoop namenode -format
10/07/20 13:41:59 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = d5270-53.mie-chukyo-u.ac.jp/10.3.4.53   <= confirm this has changed from 127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/07/20 13:42:00 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
10/07/20 13:42:00 INFO namenode.FSNamesystem: supergroup=supergroup
10/07/20 13:42:00 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/07/20 13:42:00 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/07/20 13:42:00 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
10/07/20 13:42:00 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at d5270-53.mie-chukyo-u.ac.jp/10.3.4.53
************************************************************/

Starting Hadoop

$ bin/start-dfs.sh
starting namenode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-namenode-d5270-47.mie-chukyo-u.ac.jp.out
d5270-49.mie-chukyo-u.ac.jp: starting datanode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-d5270-49.mie-chukyo-u.ac.jp.out
d5270-53.mie-chukyo-u.ac.jp: starting datanode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-d5270-53.mie-chukyo-u.ac.jp.out
localhost: starting secondarynamenode, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-secondarynamenode-d5270-47.mie-chukyo-u.ac.jp.out

Verifying startup

On the NameNode

$ /usr/java/jdk1.6.0_21/bin/jps
16889 NameNode
17035 SecondaryNameNode
17096 Jps

Alternatively, access http://10.3.4.57:50070/dfshealth.jsp (the namenode) and confirm that Live Nodes is 2 (the two DataNodes). The figure shows one node not yet active.

hadoop4.png

On the DataNodes

$ /usr/java/jdk1.6.0_21/bin/jps
3285 DataNode
3333 Jps

When a datanode fails to start

$ rm -rf /tmp/*
$ hadoop namenode -format

then run it again. Also double-check the /etc/hosts entries.

Starting Hadoop Map/Reduce:

On the NameNode

$ bin/start-mapred.sh
starting jobtracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-jobtracker-d5270-47.mie-chukyo-u.ac.jp.out
d5270-53.mie-chukyo-u.ac.jp: starting tasktracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-d5270-53.mie-chukyo-u.ac.jp.out
d5270-49.mie-chukyo-u.ac.jp: starting tasktracker, logging to /opt/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-d5270-49.mie-chukyo-u.ac.jp.out

Verification

On the NameNode

$ /usr/java/jdk1.6.0_21/bin/jps
16889 NameNode
17035 SecondaryNameNode
17221 Jps
17139 JobTracker

On the DataNodes

$ /usr/java/jdk1.6.0_21/bin/jps
3285 DataNode
3415 Jps
3362 TaskTracker

By default, hadoop.tmp.dir is /tmp/hadoop-${user.name}.
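Because /tmp is typically cleared on reboot (and was wiped by hand above), it can be worth relocating hadoop.tmp.dir somewhere persistent; a sketch for core-site.xml, where /var/hadoop is an example path that must exist and be writable by the hadoop user:

```
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/tmp-${user.name}</value>
  </property>
```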

On the namenode

/tmp/hadoop-hadoop/dfs/name/current/VERSION

#Wed Jul 21 12:57:10 JST 2010
namespaceID=802168764
cTime=0
storageType=NAME_NODE
layoutVersion=-18

On the datanode

/tmp/hadoop-hadoop/dfs/data/current/VERSION

#Wed Jul 21 12:52:14 JST 2010
namespaceID=802168764
storageID=DS-579881093-10.3.4.49-50010-1279684334128
cTime=0
storageType=DATA_NODE
layoutVersion=-18

It appears that the namespaceID must be identical on every node, while each storageID must be unique.
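A quick way to compare the IDs after a startup problem, assuming the default hadoop.tmp.dir:

```shell
# On the namenode:
grep namespaceID /tmp/hadoop-hadoop/dfs/name/current/VERSION
# On each datanode; the number must match the namenode's:
grep namespaceID /tmp/hadoop-hadoop/dfs/data/current/VERSION
```

If they differ (typically after re-running namenode -format), clearing the datanode's data directory and restarting lets the datanode re-register with the new namespaceID.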

References

SD (Software Design) magazine, May 2010 issue

http://metasearch.sourceforge.jp/wiki/index.php?Hadoop

