
Installing and Using Hive

Published 2025-12-02 · by the 千家信息网 editorial staff

This article walks through how to install and use Hive. Many people run into questions about Hive installation and usage in day-to-day work, so this guide pulls together a simple, practical set of steps that will hopefully clear those questions up. Follow along and let's get started!

Preface

Hive is a programming interface on top of Hadoop. With its SQL-like syntax, Hive lets data analysts get productive quickly, makes the Java-centric Hadoop world simple and lightweight, and brings Hadoop to people beyond programmers.

Starting with Hive, analysts too can play with big data.

Contents

  1. Installing Hive

  2. Basic usage: CRUD

  3. Hive interactive mode

  4. Importing data

  5. Exporting data

  6. Querying with HiveQL

  7. Hive views

  8. Hive partitioned tables

1. Installing Hive

System environment
With a working Hadoop environment in place, we can install Hive on the NameNode machine (c1).
For setting up Hadoop, see the series 让Hadoop跑在云端 (Running Hadoop in the Cloud) and RHadoop实践系列之一:Hadoop环境搭建 (RHadoop in Practice, Part 1: Setting Up a Hadoop Environment).

Download: hive-0.9.0.tar.gz
Extract to: /home/cos/toolkit/hive-0.9.0
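A minimal sketch of the download-and-extract step, assuming the tarball is fetched from the Apache archive (the mirror URL is an assumption; any Hive 0.9.0 mirror works):

~ cd /home/cos/toolkit
#mirror URL is an assumption; substitute your preferred mirror
~ wget http://archive.apache.org/dist/hive/hive-0.9.0/hive-0.9.0.tar.gz
~ tar -xzf hive-0.9.0.tar.gz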

Hive configuration

~ cd /home/cos/toolkit/hive-0.9.0
~ cp conf/hive-default.xml.template conf/hive-site.xml
~ cp conf/hive-log4j.properties.template conf/hive-log4j.properties

Edit the hive-site.xml configuration file
so that Hive stores its metadata in MySQL.

~ vi conf/hive-site.xml

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://c1:3306/hive_metadata?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

Edit hive-log4j.properties

#log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter

Set environment variables

~ sudo vi /etc/environment
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/home/cos/toolkit/ant184/bin:/home/cos/toolkit/jdk16/bin:/home/cos/toolkit/maven3/bin:/home/cos/toolkit/hadoop-1.0.3/bin:/home/cos/toolkit/hive-0.9.0/bin"
JAVA_HOME=/home/cos/toolkit/jdk16
ANT_HOME=/home/cos/toolkit/ant184
MAVEN_HOME=/home/cos/toolkit/maven3
HADOOP_HOME=/home/cos/toolkit/hadoop-1.0.3
HIVE_HOME=/home/cos/toolkit/hive-0.9.0
CLASSPATH=/home/cos/toolkit/jdk16/lib/dt.jar:/home/cos/toolkit/jdk16/lib/tools.jar

Create directories on HDFS

$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse

Create the database in MySQL

create database hive_metadata;
grant all on hive_metadata.* to hive@'%' identified by 'hive';
grant all on hive_metadata.* to hive@localhost identified by 'hive';
ALTER DATABASE hive_metadata CHARACTER SET latin1;
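As a quick sanity check that the grants took effect, you can log in as the new hive user from the shell (a sketch, assuming the MySQL client is installed on this machine):

#log in with the hive user and run a trivial query against the metadata database
~ mysql -uhive -phive -e "SELECT 1;" hive_metadata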

Manually copy the MySQL JDBC driver into hive/lib

~ ls /home/cos/toolkit/hive-0.9.0/lib
mysql-connector-java-5.1.22-bin.jar

Start Hive

#Start the metastore service
~ bin/hive --service metastore &
Starting Hive Metastore Server
#Start the hiveserver service
~ bin/hive --service hiveserver &
Starting Hive Thrift Server
#Start the hive client
~ bin/hive shell
Logging initialized using configuration in file:/root/hive-0.9.0/conf/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201211141845_1864939641.txt
hive> show tables;
OK
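By default the metastore service listens on port 9083 and the Thrift HiveServer on port 10000 (these are the usual defaults for this Hive version; your configuration may differ). A quick check that both servers came up:

#9083 and 10000 are assumed default ports; verify against your own config
~ netstat -nlt | grep -E '9083|10000'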

Inspect the metadata in MySQL

~ mysql -uroot -p
mysql> use hive_metadata;
Database changed
mysql> show tables;
+-------------------------+
| Tables_in_hive_metadata |
+-------------------------+
| BUCKETING_COLS          |
| CDS                     |
| COLUMNS_V2              |
| DATABASE_PARAMS         |
| DBS                     |
| IDXS                    |
| INDEX_PARAMS            |
| PARTITIONS              |
| PARTITION_KEYS          |
| PARTITION_KEY_VALS      |
| PARTITION_PARAMS        |
| PART_COL_PRIVS          |
| PART_PRIVS              |
| SDS                     |
| SD_PARAMS               |
| SEQUENCE_TABLE          |
| SERDES                  |
| SERDE_PARAMS            |
| SORT_COLS               |
| TABLE_PARAMS            |
| TBLS                    |
| TBL_COL_PRIVS           |
| TBL_PRIVS               |
+-------------------------+
23 rows in set (0.00 sec)

Hive is now installed. The rest of this article is a usage guide.

2. Basic Usage of Hive

Enter the Hive console

~ cd /home/cos/toolkit/hive-0.9.0
~ bin/hive shell
Logging initialized using configuration in file:/home/cos/toolkit/hive-0.9.0/conf/hive-log4j.properties
Hive history file=/tmp/cos/hive_job_log_cos_201307160003_95040367.txt
hive>

Create a table

#Create the data file (tab-delimited)
~ vi /home/cos/demo/t_hive.txt
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
#Create a new table
hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.489 seconds
#Load t_hive.txt into the t_hive table
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hive.txt' OVERWRITE INTO TABLE t_hive;
Copying data from file:/home/cos/demo/t_hive.txt
Copying file: file:/home/cos/demo/t_hive.txt
Loading data to table default.t_hive
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive
OK
Time taken: 0.397 seconds

View tables and data

#List tables
hive> show tables;
OK
t_hive
Time taken: 0.099 seconds
#Match table names with a pattern
hive> show tables '*t*';
OK
t_hive
Time taken: 0.065 seconds
#View table data
hive> select * from t_hive;
OK
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Time taken: 0.264 seconds
#View the table schema
hive> desc t_hive;
OK
a       int
b       int
c       int
Time taken: 0.1 seconds

Alter a table

#Add a column
hive> ALTER TABLE t_hive ADD COLUMNS (new_col String);
OK
Time taken: 0.186 seconds
hive> desc t_hive;
OK
a       int
b       int
c       int
new_col string
Time taken: 0.086 seconds
#Rename the table
hive> ALTER TABLE t_hive RENAME TO t_hadoop;
OK
Time taken: 0.45 seconds
hive> show tables;
OK
t_hadoop
Time taken: 0.07 seconds
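Hive can also rename or retype an existing column with ALTER TABLE ... CHANGE; a minimal sketch against the just-renamed t_hadoop table (the new column name a1 is only an illustration):

#rename column a to a1, keeping its INT type; a1 is a hypothetical name
hive> ALTER TABLE t_hadoop CHANGE a a1 INT;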

Drop a table

hive> DROP TABLE t_hadoop;
OK
Time taken: 0.767 seconds
hive> show tables;
OK
Time taken: 0.064 seconds

3. Hive Interactive Mode

  • quit, exit: exit the interactive shell

  • reset: reset the configuration to the default values

  • set <key>=<value>: set the value of a particular configuration variable (note: if the variable name is misspelled, no error is reported)

  • set: print the configuration variables overridden by the user

  • set -v: print all Hadoop and Hive configuration variables

  • add FILE[S] <filepath>*, add JAR[S] <filepath>*, add ARCHIVE[S] <filepath>*: add one or more files, jars, or archives to the distributed cache (a short transcript after this list shows a few of these commands in action)

  • list FILE[S], list JAR[S], list ARCHIVE[S]: list the resources already added to the distributed cache

  • list FILE[S] <filepath>*, list JAR[S] <filepath>*, list ARCHIVE[S] <filepath>*: check whether the given resources have been added to the distributed cache

  • delete FILE[S] <filepath>*, delete JAR[S] <filepath>*, delete ARCHIVE[S] <filepath>*: delete the given resources from the distributed cache

  • ! <command>: execute a shell command from the Hive shell

  • dfs <dfs command>: execute a dfs command from the Hive shell

  • <query string>: execute a Hive query and print the results to standard output

  • source FILE <filepath>: execute a Hive script file inside the CLI
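A short transcript exercising a few of these commands (the property name and script path are just illustrations):

hive> set hive.cli.print.header=true;
hive> set hive.cli.print.header;
hive.cli.print.header=true
hive> ! date;
hive> dfs -ls /user/hive/warehouse;
hive> source /home/cos/demo/demo.hql;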

4. Importing Data

We'll keep using the t_hive table from above as the example.

#Create the table schema
hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

Load data from the local filesystem (LOCAL)

hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hive.txt' OVERWRITE INTO TABLE t_hive;
Copying data from file:/home/cos/demo/t_hive.txt
Copying file: file:/home/cos/demo/t_hive.txt
Loading data to table default.t_hive
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive
OK
Time taken: 0.612 seconds
#Find the freshly loaded data in HDFS
~ hadoop fs -cat /user/hive/warehouse/t_hive/t_hive.txt
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34

Load data from HDFS

#Create table t_hive2
hive> CREATE TABLE t_hive2 (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
#Load data from HDFS
hive> LOAD DATA INPATH '/user/hive/warehouse/t_hive/t_hive.txt' OVERWRITE INTO TABLE t_hive2;
Loading data to table default.t_hive2
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive2
OK
Time taken: 0.325 seconds
#View the data
hive> select * from t_hive2;
OK
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Time taken: 0.287 seconds

Load data from another table

hive> INSERT OVERWRITE TABLE t_hive2 SELECT * FROM t_hive;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0002, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0002
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job  -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-07-16 10:32:41,979 Stage-1 map = 0%,  reduce = 0%
2013-07-16 10:32:48,034 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:49,050 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:50,068 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:51,082 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:52,093 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:53,102 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:54,112 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.03 sec
MapReduce Total cumulative CPU time: 1 seconds 30 msec
Ended Job = job_201307131407_0002
Ended Job = -314818888, job is filtered out (removed at runtime).
Moving data to: hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-32-31_323_5732404975764014154/-ext-10000
Loading data to table default.t_hive2
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive2
Table default.t_hive2 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]
7 Rows loaded to t_hive2
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.03 sec   HDFS Read: 273 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 30 msec
OK
Time taken: 23.227 seconds
hive> select * from t_hive2;
OK
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Time taken: 0.134 seconds
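Note that INSERT OVERWRITE replaces the table's existing contents. Since Hive 0.8 there is also INSERT INTO, which appends rows instead of overwriting; a minimal sketch:

#append rows from t_hive rather than replacing t_hive2's contents
hive> INSERT INTO TABLE t_hive2 SELECT * FROM t_hive;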

Create a table and populate it from another table

#Drop the old table
hive> DROP TABLE t_hive;
#Create a table and populate it from another table
hive> CREATE TABLE t_hive AS SELECT * FROM t_hive2;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0003, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0003
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job  -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-07-16 10:36:48,612 Stage-1 map = 0%,  reduce = 0%
2013-07-16 10:36:54,648 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:55,657 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:56,666 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:57,673 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:58,683 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:59,691 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.13 sec
MapReduce Total cumulative CPU time: 1 seconds 130 msec
Ended Job = job_201307131407_0003
Ended Job = -670956236, job is filtered out (removed at runtime).
Moving data to: hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10001
Moving data to: hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive
Table default.t_hive stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]
7 Rows loaded to hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10000
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.13 sec   HDFS Read: 272 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 130 msec
OK
Time taken: 20.13 seconds
hive> select * from t_hive;
OK
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Time taken: 0.109 seconds

Copy only the table structure, without the data

hive> CREATE TABLE t_hive3 LIKE t_hive;
hive> select * from t_hive3;
OK
Time taken: 0.077 seconds

Importing data from a MySQL database
will be covered when we introduce Sqoop.

5. Exporting Data

Copy from HDFS to another HDFS location

~ hadoop fs -cp /user/hive/warehouse/t_hive /
~ hadoop fs -ls /t_hive
Found 1 items
-rw-r--r--   1 cos supergroup         56 2013-07-16 10:41 /t_hive/000000_0
#Each row's fields are joined by Hive's default ^A (\001) delimiter, which does not display
~ hadoop fs -cat /t_hive/000000_0
1623
611213
41231
17213
71231
11234
11234

Export to the local filesystem through Hive

hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/t_hive' SELECT * FROM t_hive;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0005, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0005
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job  -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-07-16 10:46:24,774 Stage-1 map = 0%,  reduce = 0%
2013-07-16 10:46:30,823 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:31,833 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:32,844 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:33,856 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:34,865 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:35,873 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:36,884 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 0.87 sec
MapReduce Total cumulative CPU time: 870 msec
Ended Job = job_201307131407_0005
Copying data to local directory /tmp/t_hive
Copying data to local directory /tmp/t_hive
7 Rows loaded to /tmp/t_hive
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 0.87 sec   HDFS Read: 271 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 870 msec
OK
Time taken: 23.369 seconds
#Check the file from the local OS (fields are again joined by the invisible ^A delimiter)
hive> ! cat /tmp/t_hive/000000_0;
1623
611213
41231
17213
71231
11234
11234
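If you want a human-readable, tab-separated dump instead of the ^A-delimited files, a simple alternative (a sketch; the output path is arbitrary) is to run the query non-interactively and redirect standard output, since the CLI prints columns separated by tabs:

#run the query in batch mode and capture the tab-separated result
~ bin/hive -e 'SELECT * FROM t_hive;' > /tmp/t_hive_dump.txt
~ cat /tmp/t_hive_dump.txt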

6. Querying with HiveQL

Note: the map/reduce log output has been trimmed from the listings below.

Basic queries: sorting, column aliases, nested subqueries

hive> FROM (
    >   SELECT b,c as c2 FROM t_hive
    > ) t
    > SELECT t.b, t.c2
    > WHERE b>2
    > LIMIT 2;
12      13
21      3
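The heading above also promises sorting, so here is a small ORDER BY example to round it out (a sketch over the same t_hive data; expected output derived from the rows loaded earlier). Note that ORDER BY in Hive funnels everything through a single reducer to get a total order, so pair it with LIMIT on large tables:

hive> SELECT a, b, c FROM t_hive ORDER BY a DESC LIMIT 3;
71      2       31
61      12      13
41      2       31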

Join queries: JOIN

hive> SELECT t1.a,t1.b,t2.a,t2.b
    > FROM t_hive t1 JOIN t_hive2 t2 on t1.a=t2.a
    > WHERE t1.c>10;
1       12      1       12
11      2       11      2
41      2       41      2
61      12      61      12
71      2       71      2

Aggregate query 1: count, avg

hive> SELECT count(*), avg(a) FROM t_hive;
7       31.142857142857142

Aggregate query 2: count, distinct

hive> SELECT count(DISTINCT b) FROM t_hive;
3

Aggregate query 3: GROUP BY, HAVING

#GROUP BY
hive> SELECT avg(a),b,sum(c) FROM t_hive GROUP BY b,c;
16.0    2       3
56.0    2       62
11.0    2       34
61.0    12      13
1.0     12      34
17.0    21      3
#HAVING
hive> SELECT avg(a),b,sum(c) FROM t_hive GROUP BY b,c HAVING sum(c)>30;
56.0    2       62
11.0    2       34
1.0     12      34

7. Hive Views

A Hive view is conceptually the same as a view in a relational database. We'll use t_hive as the example again.

hive> CREATE VIEW v_hive AS SELECT a,b FROM t_hive where c>30;
hive> select * from v_hive;
41      2
71      2
1       12
11      2

Drop a view

hive> DROP VIEW IF EXISTS v_hive;
OK
Time taken: 0.495 seconds

8. Hive Partitioned Tables

Partitioned tables are a basic database concept, but with modest data volumes we often never need them. Hive is OLAP data-warehouse software that deals with very large data volumes, so in that setting partitioning becomes very important.

Below we define a new table schema: t_hft.

Create the data

~ vi /home/cos/demo/t_hft_20130627.csv
000001,092023,9.76
000002,091947,8.99
000004,092002,9.79
000005,091514,2.2
000001,092008,9.70
000001,092059,9.45
~ vi /home/cos/demo/t_hft_20130628.csv
000001,092023,9.76
000002,091947,8.99
000004,092002,9.79
000005,091514,2.2
000001,092008,9.70
000001,092059,9.45

Create the table

DROP TABLE IF EXISTS t_hft;
CREATE TABLE t_hft(
SecurityID STRING,
tradeTime STRING,
PreClosePx DOUBLE
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

Create the partitioned table
Based on the business requirement, we partition by trading day (the DDL below uses tradeDate as the only partition column; a sketch after it shows how a second level by stock ID would look).

DROP TABLE IF EXISTS t_hft;
CREATE TABLE t_hft(
SecurityID STRING,
tradeTime STRING,
PreClosePx DOUBLE
) PARTITIONED BY (tradeDate INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
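If you really did want a two-level scheme (trading day plus stock ID), the partition spec takes a second column; a sketch (t_hft2 is a hypothetical table name, and SecurityID moves out of the regular column list because a partition column cannot also be a data column; beware that one partition per stock per day can mean very many small partitions):

DROP TABLE IF EXISTS t_hft2;
CREATE TABLE t_hft2(
tradeTime STRING,
PreClosePx DOUBLE
) PARTITIONED BY (tradeDate INT, SecurityID STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';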

Load the data

#20130627
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hft_20130627.csv' OVERWRITE INTO TABLE t_hft PARTITION (tradeDate=20130627);
Copying data from file:/home/cos/demo/t_hft_20130627.csv
Copying file: file:/home/cos/demo/t_hft_20130627.csv
Loading data to table default.t_hft partition (tradedate=20130627)
#20130628
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hft_20130628.csv' OVERWRITE INTO TABLE t_hft PARTITION (tradeDate=20130628);
Copying data from file:/home/cos/demo/t_hft_20130628.csv
Copying file: file:/home/cos/demo/t_hft_20130628.csv
Loading data to table default.t_hft partition (tradedate=20130628)

View the partitions

hive> SHOW PARTITIONS t_hft;
tradedate=20130627
tradedate=20130628
Time taken: 0.082 seconds

Query the data

hive> select * from t_hft where securityid='000001';
000001  092023  9.76    20130627
000001  092008  9.7     20130627
000001  092059  9.45    20130627
000001  092023  9.76    20130628
000001  092008  9.7     20130628
000001  092059  9.45    20130628
hive> select * from t_hft where tradedate=20130627 and PreClosePx<9;
000002  091947  8.99    20130627
000005  091514  2.2     20130627
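Because tradedate is a partition column, the second query reads only the 20130627 partition rather than scanning the whole table (partition pruning). You can verify this with EXPLAIN and look for the single partition path in the plan (output not reproduced here):

hive> EXPLAIN SELECT * FROM t_hft WHERE tradedate=20130627 AND PreClosePx<9;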

This concludes our walkthrough of installing and using Hive; hopefully it has cleared up your questions. Theory paired with practice is the best way to learn, so go try it out! To keep learning, stay tuned to the site for more practical articles.
