Installing and Deploying HDFS
1. Preparation
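Prepare three machines and configure /etc/hosts on each of them:
one NameNode: cc-staging-session2, aliased as master
two DataNodes: cc-staging-front, aliased as slave1, and cc-staging-imcenter, aliased as slave2

# Create the hadoop user on all three machines
useradd hadoop
passwd hadoop

# Install the JDK and set JAVA_HOME and PATH
# Download and install JDK 1.7
tar zxvf jdk-7u21-linux-x64.gz -C /usr/local/

# Add the environment variables to /etc/profile
export JAVA_HOME=/usr/local/jdk1.7.0_21
export JRE_HOME=/usr/local/jdk1.7.0_21/jre
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export PATH=$JAVA_HOME/bin:$PATH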
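The hosts and profile edits above have to be repeated on all three machines, so it can help to script them in a way that is safe to rerun. A minimal sketch, assuming the JDK path /usr/local/jdk1.7.0_21 used in this guide; the function names `add_java_env` and `add_hosts` and the marker comments are hypothetical, and the real IP addresses of the machines must be supplied by you.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the JDK variables to a profile file once.
# A marker comment is used to detect a previous run, so rerunning is harmless.
add_java_env() {
  local profile="$1"
  local jdk_home="${2:-/usr/local/jdk1.7.0_21}"
  if grep -q '^# BEGIN hadoop-jdk' "$profile" 2>/dev/null; then
    return 0
  fi
  # Unquoted EOF: $jdk_home expands now, \$JAVA_HOME and \$PATH stay literal
  cat >> "$profile" <<EOF
# BEGIN hadoop-jdk
export JAVA_HOME=$jdk_home
export JRE_HOME=\$JAVA_HOME/jre
export CLASSPATH=.:\$JAVA_HOME/lib/tools.jar:\$JAVA_HOME/lib/dt.jar
export PATH=\$JAVA_HOME/bin:\$PATH
# END hadoop-jdk
EOF
}

# Hypothetical helper: map the three hostnames and their aliases to IPs.
# The IPs are arguments because the guide does not list them.
add_hosts() {
  local hosts_file="$1" master_ip="$2" slave1_ip="$3" slave2_ip="$4" line
  for line in "$master_ip cc-staging-session2 master" \
              "$slave1_ip cc-staging-front slave1" \
              "$slave2_ip cc-staging-imcenter slave2"; do
    # Append each mapping only if the exact line is not already present
    grep -qxF "$line" "$hosts_file" 2>/dev/null || printf '%s\n' "$line" >> "$hosts_file"
  done
}
```

Usage as root on each machine: `add_java_env /etc/profile`, then `source /etc/profile`; and `add_hosts /etc/hosts <master-ip> <slave1-ip> <slave2-ip>` with the real addresses.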
2. Download and install Hadoop
# Download Hadoop. Download page: https://ccp.cloudera.com/display/SUPPORT/CDH3+Downloadable+Tarballs
wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u6.tar.gz
wget http://archive.cloudera.com/cdh/3/hbase-0.90.6-cdh3u6.tar.gz
wget http://archive.cloudera.com/cdh/3/hive-0.7.1-cdh3u6.tar.gz

# Create the same directory paths on all 3 machines. The name directory exists only on master and must have permission 755, otherwise the format step later fails.

# Extract the package into /hadoop/install
tar zxvf hadoop-0.20.2-cdh3u6.tar.gz -C /hadoop/install/

# Change the owner to hadoop
chown -R hadoop:hadoop /hadoop
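The directory layout described above can be created with a small function, run once per machine before extracting the tarball. A sketch under assumptions: the directory list is taken from the paths this guide uses (/hadoop/install, /hadoop/name, /hadoop/data1, /hadoop/data2, /hadoop/tmp), and the function name `prepare_dirs` and its optional root prefix (handy for a dry run in a scratch directory) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the directories this guide relies on.
#   $1  role: "master" or "slave"
#   $2  optional root prefix (empty means the real filesystem root)
prepare_dirs() {
  local role="$1" root="${2:-}"
  # Same paths on every machine
  mkdir -p "$root/hadoop/install" "$root/hadoop/data1" \
           "$root/hadoop/data2" "$root/hadoop/tmp"
  # The NameNode metadata directory lives only on master, with mode 755
  if [ "$role" = "master" ]; then
    mkdir -p "$root/hadoop/name"
    chmod 755 "$root/hadoop/name"
  fi
}
```

On master run `prepare_dirs master`, on slave1 and slave2 run `prepare_dirs slave`, then extract the tarball and run the `chown -R` step as root.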
3. Set up SSH trust for the hadoop account
# Run on master
su - hadoop
ssh-keygen
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@cc-staging-front
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@cc-staging-imcenter
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@cc-staging-session2
# Test: it works if every login succeeds without a password
ssh hadoop@master
ssh hadoop@slave1
ssh hadoop@slave2
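The manual login test can also be run non-interactively, which makes a missing key show up as a failure instead of a password prompt. A minimal sketch: `check_ssh_trust` is a hypothetical name, and the `SSH_BIN` override exists only so the function can be exercised without real hosts. It relies on the standard OpenSSH options `BatchMode` and `ConnectTimeout`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm passwordless login to each host as hadoop.
# BatchMode=yes makes ssh fail rather than prompt for a password.
check_ssh_trust() {
  local ssh_bin="${SSH_BIN:-ssh}" host failed=0
  for host in "$@"; do
    # -n keeps ssh from reading stdin; the remote command "true" does nothing
    if "$ssh_bin" -n -o BatchMode=yes -o ConnectTimeout=5 "hadoop@$host" true; then
      echo "OK   $host"
    else
      echo "FAIL $host" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Run it on master as `check_ssh_trust master slave1 slave2`. Because batch mode cannot answer the first-connection host-key question, log in interactively to each host once (as in the manual test) before relying on it.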
4. Edit the HDFS configuration files; they must be identical on all nodes
# core-site.xml: core settings
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
  </property>
</configuration>

# hdfs-site.xml: HDFS site parameters
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/data1,/hadoop/data2</value>
  </property>
</configuration>

# Set JAVA_HOME in hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.7.0_21/
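Since every node needs the same files, one option is to generate them on master and copy them out. A sketch under assumptions: the function names `write_hdfs_conf` and `sync_conf` are hypothetical, the target directory is assumed to be the conf/ directory of the extracted tarball (for example /hadoop/install/hadoop-0.20.2-cdh3u6/conf), and the temporary directory is set through `hadoop.tmp.dir` in core-site.xml, which is the property name Hadoop reads for it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write core-site.xml and hdfs-site.xml into a conf dir.
write_hdfs_conf() {
  local conf_dir="$1"
  mkdir -p "$conf_dir"
  # Quoted 'EOF': the XML is written exactly as shown, no shell expansion
  cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
  </property>
</configuration>
EOF
  cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/data1,/hadoop/data2</value>
  </property>
</configuration>
EOF
}

# Hypothetical helper: copy both files to the same path on other hosts,
# using the SSH trust set up in step 3.
sync_conf() {
  local conf_dir="$1" host
  shift
  for host in "$@"; do
    scp "$conf_dir/core-site.xml" "$conf_dir/hdfs-site.xml" "hadoop@$host:$conf_dir/"
  done
}
```

Example: `write_hdfs_conf /hadoop/install/hadoop-0.20.2-cdh3u6/conf` on master, then `sync_conf /hadoop/install/hadoop-0.20.2-cdh3u6/conf slave1 slave2`.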
5. Initialize the NameNode
# Run on master. This formats the storage directory for the image files; the confirmation must be an uppercase Y
su - hadoop
[hadoop@cc-staging-session2 bin]$ ./hadoop namenode -format
13/04/27 01:46:40 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = cc-staging-session2/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2-cdh3u6
STARTUP_MSG: build = git://ubuntu-slave01/var/lib/jenkins/workspace/CDH3u6-Full-RC/build/cdh3/hadoop20/0.20.2-cdh3u6/source -r efb405d2aa54039bdf39e0733cd0bb9423a1eb0a; compiled by 'jenkins' on Wed Mar 20 11:45:36 PDT 2013
************************************************************/
Re-format filesystem in /hadoop/name ? (Y or N) Y
13/04/27 01:46:42 INFO util.GSet: VM type = 64-bit
13/04/27 01:46:42 INFO util.GSet: 2% max memory = 17.77875 MB
13/04/27 01:46:42 INFO util.GSet: capacity = 2^21 = 2097152 entries
13/04/27 01:46:42 INFO util.GSet: recommended=2097152, actual=2097152
13/04/27 01:46:42 INFO namenode.FSNamesystem: fsOwner=hadoop (auth:SIMPLE)
13/04/27 01:46:42 INFO namenode.FSNamesystem: supergroup=supergroup
13/04/27 01:46:42 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/04/27 01:46:42 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=1000
13/04/27 01:46:42 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/04/27 01:46:43 INFO common.Storage: Image file of size 112 saved in 0 seconds.
13/04/27 01:46:43 INFO common.Storage: Storage directory /hadoop/name has been successfully formatted.
13/04/27 01:46:43 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at cc-staging-session2/127.0.0.1
************************************************************/

# Start the namenode and datanodes
cd /hadoop/install/hadoop-0.20.2-cdh3u6/bin/
./hadoop-daemon.sh start namenode

# /hadoop/install/hadoop-0.20.2-cdh3u6/bin/ contains several control scripts:
* start-all.sh starts all Hadoop daemons: namenode, datanode, jobtracker, tasktracker and secondarynamenode.
* stop-all.sh stops all Hadoop daemons.
* start-mapred.sh starts the Map/Reduce daemons: jobtracker and tasktracker.
* stop-mapred.sh stops the Map/Reduce daemons.
* start-dfs.sh starts the HDFS daemons: namenode, datanode and secondarynamenode.
* stop-dfs.sh stops the HDFS daemons.

# Start the datanode on slave1 and slave2
cd /hadoop/install/hadoop-0.20.2-cdh3u6/bin/
./hadoop-daemon.sh start datanode

# Run jps on each node to check that the daemons started
[hadoop@cc-staging-session2 bin]$ jps
11926 NameNode
12566 Jps
12233 SecondaryNameNode
12066 DataNode
# The DataNode data directories must be on a hard disk, otherwise the DataNode reports errors
[hadoop@cc-staging-front bin]$ jps
14582 DataNode
14637 Jps
[hadoop@cc-staging-imcenter bin]$ jps
23355 DataNode
23419 Jps
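Reading jps output by eye works for three nodes, but the check can also be scripted so it fails loudly when a daemon is missing. A minimal sketch: `check_jps` is a hypothetical name; it reads jps output on stdin (jps prints one "pid MainClass" pair per line) and requires each named class to appear as the second field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every named daemon appears in jps output.
# Usage: jps | check_jps NameNode SecondaryNameNode DataNode
check_jps() {
  local jps_out daemon missing=0
  jps_out=$(cat)
  for daemon in "$@"; do
    # Exact match on the class-name column, so "NameNode" does not
    # match "SecondaryNameNode"
    if ! printf '%s\n' "$jps_out" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "missing: $daemon" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With the output shown above, `jps | check_jps NameNode SecondaryNameNode DataNode` would pass on master and `jps | check_jps DataNode` would pass on slave1 and slave2.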
6. A simple test
# Create a directory from any node:

# The directory is visible from every data node:
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2013-04-27 02:32 /user/hadoop/test

# Copy a file, i.e. store a local file in HDFS

# Delete the file
./hadoop dfs -rm test/services
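The create / copy / delete cycle above can be wrapped into one smoke test that also reads the file back and compares it with the local copy. A sketch under assumptions: `hdfs_smoke_test` is a hypothetical name, `HADOOP_BIN` defaults to `./hadoop` (run from the bin directory, as in this guide), the local file to upload is your choice, and only the standard `hadoop dfs` subcommands `-mkdir`, `-put`, `-ls`, `-cat` and `-rm` are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload a local file, list it, verify it, delete it.
#   $1  local file to upload
#   $2  HDFS directory, relative to /user/<user> (default: test)
hdfs_smoke_test() {
  local local_file="$1" dir="${2:-test}"
  local hadoop="${HADOOP_BIN:-./hadoop}"
  local name
  name=$(basename "$local_file")
  # The directory may already exist from an earlier run, so ignore failure here
  "$hadoop" dfs -mkdir "$dir" 2>/dev/null
  "$hadoop" dfs -put "$local_file" "$dir" || return 1
  "$hadoop" dfs -ls "$dir" || return 1
  # Read the file back from HDFS and compare it byte for byte with the original
  if ! "$hadoop" dfs -cat "$dir/$name" | cmp -s - "$local_file"; then
    echo "content mismatch for $dir/$name" >&2
    return 1
  fi
  "$hadoop" dfs -rm "$dir/$name"
}
```

For example, `hdfs_smoke_test /path/to/some/local/file` from the bin directory leaves only the empty test directory behind if everything works.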