Big Data: Easy Steps for Setting Up a Fully Distributed Hadoop Environment

1. Install the Linux virtual machine and add the centos user


Linux distribution: CentOS 7
New user: centos

Planned IP addresses for the cluster:

127.0.0.1 localhost
192.168.10.201 s201
192.168.10.202 s202
192.168.10.203 s203
192.168.10.204 s204
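
The address plan above follows a regular pattern, so it can be generated rather than typed by hand. The following is a minimal sketch; the function name `gen_hosts` and its three parameters are illustrative and not part of the original guide.

```shell
#!/usr/bin/env bash
# gen_hosts is a hypothetical helper: it prints one "IP hostname" line per node.
# Arguments (all optional): network prefix, first host number, last host number.
gen_hosts() {
  local net_prefix="${1:-192.168.10}" first="${2:-201}" last="${3:-204}"
  local n
  echo "127.0.0.1 localhost"
  for n in $(seq "$first" "$last"); do
    # The host name is "s" followed by the last octet, as in the plan above.
    echo "${net_prefix}.${n} s${n}"
  done
}

gen_hosts
```

The format of these entries suggests they are meant for /etc/hosts; if so, the output could be reviewed and then written to that file on every node.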

Configure the IP address:

$>sudo vim /etc/sysconfig/network-scripts/ifcfg-eno16777736

TYPE="Ethernet"
# Changed to a static address
BOOTPROTO="static"
DEFROUTE="yes"
PEERDNS="yes"
PEERROUTES="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_PEERDNS="yes"
IPV6_PEERROUTES="yes"
IPV6_FAILURE_FATAL="no"
NAME="eno16777736"
UUID="e6767070-2995-451c-bf70-293efa16d962"
DEVICE="eno16777736"
# Changed to yes
ONBOOT="yes"
# New IP address settings; everything else stays unchanged
IPADDR=192.168.10.201
PREFIX=24
GATEWAY=192.168.10.2
NETMASK=255.255.255.0
DNS1=8.8.8.8
DNS2=114.114.114.114

Configure the hostname:

$>sudo vim /etc/hostname

Enter: s201

s202, s203, and s204 will be cloned from s201.

2. Install Java


Download: jdk-8u65-linux-x64.tar.gz
Create a directory under the root directory: sudo mkdir /soft

$>sudo chown centos:centos /soft
$>cd /soft
$>tar -xzvf jdk-8u65-linux-x64.tar.gz
$>ln -s /soft/jdk1.8.0_65 /soft/jdk

3. Install Hadoop


Download: hadoop-2.7.3.tar.gz

$>tar -xzvf hadoop-2.7.3.tar.gz
$>ln -s /soft/hadoop-2.7.3 /soft/hadoop

4. Set the environment variables

$>sudo vim /etc/profile

Add the following configuration:

export JAVA_HOME=/soft/jdk
export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_HOME=/soft/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export PS1='[\u@\h `pwd`]\$'

Apply the configuration:

$>source /etc/profile

Verify that the configuration took effect:

$>java -version
$>hadoop version
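
Before running the two version commands, it can help to confirm that both binaries actually resolve through PATH. The following is a minimal sketch; `check_tools` is a hypothetical helper name, and it relies only on the shell built-in `command -v`.

```shell
#!/usr/bin/env bash
# check_tools is a hypothetical helper: it reports where each named command
# resolves on PATH and returns non-zero if any of them is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, after "source /etc/profile":
#   check_tools java hadoop
```

If either tool is reported missing, the export lines in /etc/profile and the /soft/jdk and /soft/hadoop symlinks are the first things to recheck.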

5. Configure Hadoop


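Configure Hadoop using symbolic links so that the three configuration modes (local, pseudo-distributed, and fully distributed) can coexist.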

$>cp -r /soft/hadoop/etc/hadoop /soft/hadoop/etc/local
$>cp -r /soft/hadoop/etc/hadoop /soft/hadoop/etc/pseudo
$>cp -r /soft/hadoop/etc/hadoop /soft/hadoop/etc/full
$>rm -rf /soft/hadoop/etc/hadoop

Create a symlink pointing to the configuration for whichever mode you need:

$>ln -s /soft/hadoop/etc/full /soft/hadoop/etc/hadoop
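
Switching modes later means replacing that symlink, which plain `ln -s` refuses to do once the link exists. The following is a minimal sketch of a switcher; `switch_mode` is a hypothetical helper name, and the `-n` option assumes GNU `ln` as shipped with CentOS.

```shell
#!/usr/bin/env bash
# switch_mode is a hypothetical helper: it points <conf_root>/hadoop at
# <conf_root>/<mode>, replacing any existing symlink.
switch_mode() {
  local conf_root="$1" mode="$2"
  if [ ! -d "$conf_root/$mode" ]; then
    echo "no such configuration directory: $conf_root/$mode" >&2
    return 1
  fi
  # Refuse to touch a real directory; only a symlink may be replaced.
  if [ -e "$conf_root/hadoop" ] && [ ! -L "$conf_root/hadoop" ]; then
    echo "$conf_root/hadoop exists and is not a symlink" >&2
    return 1
  fi
  # -f replaces the old link; -n stops ln from following it into the target.
  ln -sfn "$conf_root/$mode" "$conf_root/hadoop"
}

# Example:
#   switch_mode /soft/hadoop/etc full
```

Stop the Hadoop daemons before switching, since running processes have already read the previous configuration.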

a) Change to the ${HADOOP_HOME}/etc/hadoop directory

b) Edit core-site.xml

<?xml version="1.0"?>
<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://s201/</value>
	</property>
</configuration>

c) Edit hdfs-site.xml

<?xml version="1.0"?>
<configuration>
	<property>
		<name>dfs.replication</name>
		<value>3</value>
	</property>
</configuration>

d) Edit mapred-site.xml

$>cp mapred-site.xml.template mapred-site.xml

<?xml version="1.0"?>
<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>

e) Edit yarn-site.xml

<?xml version="1.0"?>
<configuration>
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>s201</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
</configuration>

f) Edit slaves

s202
s203
s204

g) Edit hadoop-env.sh

...
export JAVA_HOME=/soft/jdk
...

6. Clone the virtual machines


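s202, s203, and s204 are cloned from s201.

After cloning, change the IP address on each machine (and set its own hostname in /etc/hostname, since every clone starts out as s201):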

$>sudo vim /etc/sysconfig/network-scripts/ifcfg-eno16777736
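
Since only the IPADDR line differs between the clones, the edit can also be scripted instead of done in an editor. The following is a minimal sketch, assuming GNU `sed` (for `-i`); `set_ipaddr` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# set_ipaddr is a hypothetical helper: it rewrites the IPADDR= line of an
# ifcfg-style file in place and leaves every other line untouched.
set_ipaddr() {
  local file="$1" ip="$2"
  if ! grep -q '^IPADDR=' "$file"; then
    echo "no IPADDR= line in $file" >&2
    return 1
  fi
  sed -i "s/^IPADDR=.*/IPADDR=${ip}/" "$file"
}

# Example (as root, on the clone that becomes s202):
#   set_ipaddr /etc/sysconfig/network-scripts/ifcfg-eno16777736 192.168.10.202
```

The new address takes effect only after the network service is restarted or the machine is rebooted.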

Check that it worked: each IP address should respond to ping from a CMD client.

7. Configure passwordless SSH login


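Connect remotely to s202, s203, and s204 and clear /home/centos/.ssh, then create the /home/centos/.ssh directory as the centos user (mind the directory permissions).

On the client s201, generate the public and private keys: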

$>ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

$>scp ~/.ssh/id_rsa.pub centos@s201:/home/centos/.ssh/authorized_keys
$>scp ~/.ssh/id_rsa.pub centos@s202:/home/centos/.ssh/authorized_keys
$>scp ~/.ssh/id_rsa.pub centos@s203:/home/centos/.ssh/authorized_keys
$>scp ~/.ssh/id_rsa.pub centos@s204:/home/centos/.ssh/authorized_keys

Test:

$>ssh localhost

(Note: the /home/centos/.ssh directory must have permission 700.)

-rw-r--r--  1 centos centos  393 Nov 18 04:59 authorized_keys
-rw-------  1 centos centos 1675 Nov 18 04:58 id_rsa
-rw-r--r--  1 centos centos  393 Nov 18 04:58 id_rsa.pub

The authorized_keys file must have permission 644.

ssh s201 chmod 644 ~/.ssh/authorized_keys;
ssh s202 chmod 644 ~/.ssh/authorized_keys;
ssh s203 chmod 644 ~/.ssh/authorized_keys;
ssh s204 chmod 644 ~/.ssh/authorized_keys;
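
Wrong permissions are a common reason passwordless login silently falls back to a password prompt, so it is worth checking them on each node. The following is a minimal sketch that tests for the two modes this guide specifies (700 and 644); `check_ssh_perms` is a hypothetical helper name, and `stat -c` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# check_ssh_perms is a hypothetical helper: it verifies that the given .ssh
# directory is mode 700 and its authorized_keys file is mode 644.
check_ssh_perms() {
  local dir="${1:-$HOME/.ssh}" bad=0 mode
  mode=$(stat -c '%a' "$dir")
  if [ "$mode" != "700" ]; then
    echo "$dir has mode $mode, expected 700" >&2
    bad=1
  fi
  mode=$(stat -c '%a' "$dir/authorized_keys")
  if [ "$mode" != "644" ]; then
    echo "$dir/authorized_keys has mode $mode, expected 644" >&2
    bad=1
  fi
  return "$bad"
}

# Example, run on each node as the centos user:
#   check_ssh_perms /home/centos/.ssh
```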

Test passwordless login from s201 to s202, s203, and s204 (after each login, run exit before testing the next).

8. Clear any existing temporary files

$>ssh s202 rm -rf /tmp/hadoop-centos/;
ssh s203 rm -rf /tmp/hadoop-centos/;
ssh s204 rm -rf /tmp/hadoop-centos/;
ssh s201 rm -rf /tmp/hadoop-centos/;
ssh s201 'rm -rf /soft/hadoop/logs/*';
ssh s202 'rm -rf /soft/hadoop/logs/*';
ssh s203 'rm -rf /soft/hadoop/logs/*';
ssh s204 'rm -rf /soft/hadoop/logs/*';

After clearing, format the NameNode:

$>hadoop namenode -format

Note: the commands can be run on a single line, separated by ';'.
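
The per-host cleanup commands differ only in the host name, so they can be folded into a loop. The following is a minimal sketch; `clean_cluster` and the `HOSTS` variable are illustrative names, and passing `echo` as the runner gives a dry run that prints each command instead of executing it.

```shell
#!/usr/bin/env bash
# HOSTS and clean_cluster are hypothetical names, not part of the original guide.
HOSTS="s201 s202 s203 s204"

# Runs the cleanup on every host through the given runner (default: ssh).
clean_cluster() {
  local runner="${1:-ssh}" host
  for host in $HOSTS; do
    # The remote command is single-quoted so that * is expanded on the
    # remote machine rather than by the local shell.
    "$runner" "$host" 'rm -rf /tmp/hadoop-centos/ /soft/hadoop/logs/*'
  done
}

# Dry run: show what would be executed on each host.
clean_cluster echo
```

Once the dry-run output looks right, calling `clean_cluster` with no argument runs the same commands over ssh.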

9. Once everything is ready, start the cluster

$>start-all.sh

After startup completes, check each node with jps:
s201
[centos@s201 /home/centos]$jps
24032 SecondaryNameNode
24753 Jps
24211 ResourceManager
23868 NameNode

s202
[centos@s202 /home/centos]$jps
21217 DataNode
21322 NodeManager
21549 Jps

s203
[centos@s203 /home/centos]$jps
21204 Jps
20886 DataNode
20990 NodeManager

s204
[centos@s204 /home/centos]$jps
21028 NodeManager
21241 Jps
20924 DataNode
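
Reading four jps listings by eye is error-prone, so the comparison can be scripted. The following is a minimal sketch that checks a jps listing for the daemons expected on a node; `check_daemons` is a hypothetical helper name, and the sample text is the s201 output shown above.

```shell
#!/usr/bin/env bash
# check_daemons is a hypothetical helper: the first argument is jps output,
# the remaining arguments are the process names that must appear in it.
check_daemons() {
  local jps_output="$1" missing=0 daemon
  shift
  for daemon in "$@"; do
    # Column 2 of jps output is the process name; -x requires a whole-line
    # match so that NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$jps_output" | awk '{print $2}' | grep -qx "$daemon"; then
      echo "missing daemon: $daemon" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Sample input copied from the s201 listing above.
sample='24032 SecondaryNameNode
24753 Jps
24211 ResourceManager
23868 NameNode'

check_daemons "$sample" NameNode SecondaryNameNode ResourceManager && echo "s201 ok"
```

On a live node the first argument would be `"$(jps)"`, with `DataNode NodeManager` as the expected names on s202, s203, and s204.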

Finally, test the web UI:
http://192.168.10.201:50070/