Installing Hadoop

Overview

There are currently three servers, running CentOS Stream 9.0; CentOS 7 is the better choice.
Note: CentOS must be installed with the English language option, otherwise it will complain about the missing package langpack-en.

192.168.230.136/146/147

The first machine is the master; the latter two are slaves.

Configure /etc/hosts

cat <<EOF >>/etc/hosts
192.168.230.136 master
192.168.230.146 node1
192.168.230.147 node2
EOF

Disable the firewall

systemctl stop firewalld
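Note that stopping firewalld is not persistent across reboots; to keep it off permanently, also run:

systemctl disable firewalld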

On CentOS 7, the yum repository needs to be changed:

wget -O /etc/yum.repos.d/ali.repo http://mirrors.aliyun.com/repo/Centos-7.repo

Install Hadoop and Hive

Download and extract Hadoop

 wget https://archive.apache.org/dist/hadoop/common/hadoop-2.10.2/hadoop-2.10.2.tar.gz
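Extract it to /opt (assumed here, since HADOOP_HOME is set to /opt/hadoop-2.10.2 below):

tar zxf hadoop-2.10.2.tar.gz -C /opt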

Download JDK 1.8
Download the JDK from the file server on port 5080 (the pan.itshine.cn:5080 server used later).

Set the JDK 1.8 environment variables

echo "export JAVA_HOME=/opt/jdk1.8.0_202" >>/etc/profile
cho "export PATH=$PATH:/opt/jdk1.8.0_202/bin" >>/etc/profile
. /etc/profile
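Verify that the JDK is picked up; the version string should show 1.8.0_202:

java -version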


Configure the Hadoop environment variables

echo "export HADOOP_HOME=/opt/hadoop-2.10.2" >>/etc/profile
echo "export PATH=$PATH:/opt/hadoop-2.10.2/bin" >>/etc/profile
. /etc/profile

Edit the site file etc/hadoop/core-site.xml

The main contents of the file are as follows:

<configuration>
    <!-- Sets the Hadoop filesystem, specified by a URI -->
    <property>
        <name>fs.defaultFS</name>
        <!-- Points the namenode address at the master machine -->
        <value>hdfs://master:9000</value>
    </property>
    <!-- Hadoop's temporary directory; the default is /tmp/hadoop-${user.name} -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp</value>
    </property>
</configuration>

Edit the HDFS configuration, etc/hadoop/hdfs-site.xml

<configuration>
    <!-- Number of HDFS block replicas -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Host and port of the secondary namenode -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node1:50090</value>
    </property>
</configuration>

Edit the slaves file in the configuration directory

Add node1 and node2, and remove the original localhost entry.
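The resulting etc/hadoop/slaves file should contain only:

node1
node2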

Initialization

Format the filesystem on the master node:

hdfs namenode -format 

24/09/15 17:01:28 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
24/09/15 17:01:28 INFO util.GSet: capacity      = 2^15 = 32768 entries
24/09/15 17:01:28 INFO namenode.FSImage: Allocated new BlockPoolId: BP-571852059-192.168.230.136-1726390888193
24/09/15 17:01:28 INFO common.Storage: Storage directory /export/servers/hadoop-2.7.4/tmp/dfs/name has been successfully formatted.
24/09/15 17:01:28 INFO namenode.FSImageFormatProtobuf: Saving image file /export/servers/hadoop-2.7.4/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
24/09/15 17:01:28 INFO namenode.FSImageFormatProtobuf: Image file /export/servers/hadoop-2.7.4/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds .
24/09/15 17:01:28 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
24/09/15 17:01:28 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid = 0 when meet shutdown.
24/09/15 17:01:28 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.230.136

Start the cluster

Start all HDFS daemons from the master node:

start-dfs.sh

Running it produced a "JAVA_HOME is not set" error.
The fix: in $HADOOP_HOME/libexec/hdfs-config.sh, add the JAVA_HOME setting near the top of the file, before hadoop-config.sh is sourced:

export JAVA_HOME=/opt/jdk1.8.0_202

Note: adding it anywhere else in that file had no effect.

Then start all YARN daemons from the master node:

start-yarn.sh

Everything now starts up normally.
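As a quick sanity check, run jps on each machine: the master should list NameNode and ResourceManager, the slaves DataNode and NodeManager, and node1 additionally SecondaryNameNode (per hdfs-site.xml above):

jps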

Troubleshooting

  1. Copy all configuration files over from the master machine to avoid configuration drift (see the scp sketch below).
  2. Do not forget to configure the slaves file.
  3. The hosts must use static IP addresses.
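A minimal sketch for point 1, assuming the paths used throughout this guide:

scp -r /opt/hadoop-2.10.2/etc/hadoop/ root@node1:/opt/hadoop-2.10.2/etc/
scp -r /opt/hadoop-2.10.2/etc/hadoop/ root@node2:/opt/hadoop-2.10.2/etc/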

Installing Hive

Install MySQL

wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
rpm -ivh mysql-community-release-el7-5.noarch.rpm

Edit /etc/yum.repos.d/mysql-community.repo (vi /etc/yum.repos.d/mysql-community.repo) to enable the MySQL 5.7 repository. Note that the install command needs --nogpgcheck appended.
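The edit amounts to flipping the enabled flags, roughly like this (the exact section names may differ in your copy of mysql-community.repo):

[mysql56-community]
enabled=0

[mysql57-community]
enabled=1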

yum install mysql-server --nogpgcheck
systemctl start mysqld
grep password /var/log/mysqld.log
2024-09-15T12:19:08.484269Z 1 [Note] A temporary password is generated for root@localhost: .JHqnJLUa94#
2024-09-15T12:19:21.186969Z 2 [Note] Access denied for user 'root'@'localhost' (using password: NO)

Create the Hive database

Initialize the password first:

mysql> create database hive charset utf8;
ERROR 1820 (HY000): You must reset your password using ALTER USER statement before executing this statement.
mysql> set password=password('1qaz!QAZ');
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges;

Log back in and create the database and user:

mysql> create database hive charset utf8;
Query OK, 1 row affected (0.00 sec)

mysql> create user hive@localhost identified by '1qaz!QAZ';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

Do not forget to grant the database user its privileges:
grant all privileges on hive.* to hive@localhost;
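To confirm the grant worked, log in as the hive user and check that the database is visible:

mysql -uhive -p'1qaz!QAZ' -e 'show databases;'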

Install Hive

Download Hive 3.1.3:

wget https://mirrors.aliyun.com/apache/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz -P /opt

cd /opt && tar zxf apache-hive-3.1.3-bin.tar.gz

Update the environment variables

Append to (or merge into) the end of /etc/profile:

export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/opt/jdk1.8.0_202/bin
export HADOOP_HOME=/opt/hadoop-2.10.2
export PATH=/opt/apache-hive-3.1.3-bin/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/opt/jdk1.8.0_202/bin:/opt/hadoop-2.10.2/bin:$HADOOP_HOME/sbin
export HIVE_HOME=/opt/apache-hive-3.1.3-bin

Run . /etc/profile to apply the changes.

Edit the Hive configuration files

Edit Hive's hive-site.xml file (a fresh install does not ship one; create it under $HIVE_HOME/conf) and add:

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>MySQL driver</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>Database username</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>1qaz!QAZ</value>
  <description>Database password</description>
</property>

Edit Hadoop's core-site.xml configuration file, adding:

<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
    <description>Groups whose members the superuser root may impersonate</description>
</property>
<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
    <description>Hosts from which the superuser root may act as a proxy</description>
</property>
<property>
    <name>hadoop.proxyuser.root.users</name>
    <value>*</value>
</property>
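These proxyuser settings only take effect once core-site.xml has been copied to every node and HDFS restarted:

stop-dfs.sh && start-dfs.sh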

Download the MySQL driver into the $HIVE_HOME/lib directory

cd /opt/apache-hive-3.1.3-bin/lib
curl 'http://pan.itshine.cn:5080/?explorer/share/fileOut&shareID=64h6PiQQ&path=%7BshareItemLink%3A64h6PiQQ%7D%2F%E6%95%B0%E6%8D%AE%E5%BA%93%E5%AE%89%E8%A3%85%E5%8C%85%2Fmysql5%E9%A9%B1%E5%8A%A8%2Fmysql-connector-java-5.1.49-bin.jar' > './mysql-connector-java-5.1.49-bin.jar'

Initialize the Hive metastore


[root@master conf]# schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-3.1.3-bin/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.10.2/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:        jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       hive
Starting metastore schema initialization to 3.1.0
Initialization script hive-schema-3.1.0.mysql.sql


Error: Syntax error: Encountered "<EOF>" at line 1, column 64. (state=42X01,code=30000)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***

This failure was purely a configuration file problem: the "Metastore connection URL" in the output above is a jdbc:derby URL, which means javax.jdo.option.ConnectionURL was never picked up, so schematool fell back to the embedded Derby database. The final working hive-site.xml is as follows:


<configuration>

<!-- Host that HiveServer2 binds to -->
<property>
    <name>hive.server2.thrift.bind.host</name>
    <value>node1</value>
</property>

<property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
</property>

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?characterEncoding=UTF-8</value>
</property>

<property>
    <name>hive.metastore.db.type</name>
    <value>mysql</value>
    <description>
      Expects one of [derby, oracle, mysql, mssql, postgres].
      Type of database used by the metastore. Information schema &amp; JDBCStorageHandler depend on it.
    </description>
</property>

<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
</property>

<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>1qaz!QAZ</value>
</property>


</configuration>

Then run schematool -dbType mysql -initSchema again:

# schematool -dbType mysql -initSchema
....
jdbc:mysql://localhost:3306/hive> /*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */
No rows affected (0.001 seconds)
0: jdbc:mysql://localhost:3306/hive> /*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */
No rows affected (0.001 seconds)
0: jdbc:mysql://localhost:3306/hive> !closeall
Closing: 0: jdbc:mysql://localhost:3306/hive?characterEncoding=UTF-8
beeline>
beeline> Initialization script completed
Sun Sep 15 21:10:13 CST 2024 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
schemaTool complete
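Optionally confirm that the metastore tables were actually created in MySQL:

mysql -uhive -p'1qaz!QAZ' hive -e 'show tables;'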

Start Hive

Start the metastore and hiveserver2 services in the background (run from $HIVE_HOME):

nohup bin/hive --service metastore &
nohup bin/hive --service hiveserver2 &
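Once hiveserver2 is up, the connection can be tested with beeline, using the host and port from hive-site.xml; connecting as root works thanks to the proxyuser settings added to core-site.xml earlier:

beeline -u jdbc:hive2://node1:10000 -n root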

Installing Impala

Download CDH 5; this will be used to build a yum repository:

curl 'http://pan.itshine.cn:5080/?explorer/share/file&hash=259dTd3iBsOzHLUkqX2Sh1KOVfnsPvwROb5XzGdPZuM6CHHpHyAnklZ8ESxozwuMi989UQ' > './cdh5.14.0-centos6.tar.gz' 

## Do this on every server; consider using scp to copy the files around.

Installing Impala will complain about two missing .so files, so install the following two packages first:

curl 'http://pan.itshine.cn:5080/?explorer/share/file&hash=8e12BhKHTII9vSrr3G0Z7ir-ZkJW4YQ7d5yk-pqdq_wYbxqw2bhcvSAhA51ptYA6Pbc0VQ' > './cyrus-sasl-lib-2.1.23-15.el6_6.2.x86_64.rpm'

curl 'http://pan.itshine.cn:5080/?explorer/share/file&hash=898fjDM37_4UoZdz9rDnl5jpeKinnHEyFMCJNv_iMfctxNKzEtBfxQ-dVzGL5_gY9hYaqw' > './python-libs-2.6.6-66.el6_8.x86_64.rpm'

rpm -ivh --nodeps --force cyrus-sasl-lib-2.1.23-15.el6_6.2.x86_64.rpm

rpm -ivh --nodeps --force python-libs-2.6.6-66.el6_8.x86_64.rpm

Install nginx to serve the yum repository

cd /opt
tar xzf cdh5.14.0-centos6.tar.gz
yum -y install nginx
ln -s /opt/cdh/5.14.0 /usr/share/nginx/html/5.14.0
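Do not forget to start nginx, otherwise yum has nothing to talk to:

systemctl enable --now nginx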

Generate the yum repo file:

cat <<EOF >/etc/yum.repos.d/cdh.repo
[cdh]
name=Cloudera's Distribution for Hadoop, Version 5
baseurl=http://node1/5.14.0
gpgcheck=0
enabled=1
EOF
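A quick sanity check that the new repo is usable:

yum clean all && yum repolist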

Install Impala

yum -y install impala*

Configure Impala

Configure JAVA_HOME

cat > /etc/default/bigtop-utils <<EOF
export JAVA_HOME=/opt/jdk1.8.0_202
EOF

Set up /etc/impala/conf

mkdir -p /etc/impala/conf
cp /opt/apache-hive-3.1.3-bin/conf/hive-site.xml /etc/impala/conf
cp /opt/hadoop-2.10.2/etc/hadoop/hdfs-site.xml /etc/impala/conf
cp /opt/hadoop-2.10.2/etc/hadoop/core-site.xml /etc/impala/conf

Run & test

service impala-server start
service impala-catalog start
service impala-state-store start

[root@localhost yum.repos.d]# impala-shell
Starting Impala Shell without Kerberos authentication
Connected to localhost.localdomain:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan  6 13:27:16 PST 2018)

To see live updates on a query's progress, run 'set LIVE_SUMMARY=1;'.
***********************************************************************************
[localhost.localdomain:21000] >

This completes the installation.

Author: 严锋 · Created: 2023-12-21 10:11
Last edited by: 严锋 · Updated: 2025-05-09 15:48