0010-Hive多分隔符支持示例
发表于:2025-12-02 作者:千家信息网编辑
千家信息网最后更新 2025年12月02日,温馨提示:要看高清无码套图,请使用手机打开并单击图片放大查看。1.问题描述如何将多个字符作为字段分割符的数据文件加载到Hive表中,事例数据如下:字段分隔符为"@#$"test1@#$test1nam
千家信息网最后更新 2025年12月02日0010-Hive多分隔符支持示例
温馨提示:要看高清无码套图,请使用手机打开并单击图片放大查看。
1.问题描述
如何将多个字符作为字段分割符的数据文件加载到Hive表中,事例数据如下:
字段分隔符为"@#$"
test1@#$test1name@#$test2valuetest2@#$test2name@#$test2valuetest3@#$test3name@#$test4value如何将上述事例数据加载到Hive表(multi_delimiter_test)中,表结构如下:
| 字段名 | 字段类型 |
|---|---|
| s1 | String |
| s2 | String |
| s3 | String |
2.Hive多分隔符支持
Hive在0.14及以后版本支持字段的多分隔符,参考https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe
3.实现方式
测试环境说明
测试环境为CDH5.11.1Hive版本为1.1.0操作系统为RedHat6.5
- 操作步骤
1.准备多分隔符文件并装载到HDFS对应目录
[ec2-user@ip-172-31-8-141 ~]$ cat multi_delimiter_test.dattest1@#$test1name@#$test2valuetest2@#$test2name@#$test2valuetest3@#$test3name@#$test4value [ec2-user@ip-172-31-8-141 ~]$ hadoop dfs -put multi_delimiter_test.dat /fayson/multi_delimiter_test[ec2-user@ip-172-31-8-141 ~]$ hadoop dfs -ls /fayson/multi_delimiter_testDEPRECATED: Use of this script to execute hdfs command is deprecated.Instead use the hdfs command for it.Found 1 items-rw-r--r-- 3 user_r supergroup 93 2017-08-23 03:24 /fayson/multi_delimiter_test/multi_delimiter_test.dat[ec2-user@ip-172-31-8-141 ~]$2.基于准备好的多分隔符文件建表
create external table multi_delimiter_test(s1 string,s2 string,s3 string)ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES ("field.delim"="@#$")stored as textfile location '/fayson/multi_delimiter_test';3.测试
2: jdbc:hive2://localhost:10000/default> select * from multi_delimiter_test;+--------------------------+--------------------------+--------------------------+--+| multi_delimiter_test.s1 | multi_delimiter_test.s2 | multi_delimiter_test.s3 |+--------------------------+--------------------------+--------------------------+--+| test1 | test1name | test2value || test2 | test2name | test2value || test3 | test3name | test4value |+--------------------------+--------------------------+--------------------------+--+2: jdbc:hive2://localhost:10000/default> select count(*) from multi_delimiter_test;INFO : Ended Job = job_1503469952834_0006INFO : MapReduce Jobs Launched:INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.25 sec HDFS Read: 6755 HDFS Write: 2 SUCCESSINFO : Total MapReduce CPU Time Spent: 3 seconds 250 msecINFO : Completed executing command(queryId=hive_20170823041818_ce58aae2-e6db-4eed-b6af-652235a6e66a); Time taken: 33.286 secondsINFO : OK+------+--+| _c0 |+------+--+| 3 |+------+--+1 row selected (33.679 seconds)2: jdbc:hive2://localhost:10000/def4.常见问题
1.执行count查询时报错
- 异常日志
通过beeline执行count查询时报错
2: jdbc:hive2://localhost:10000/default> select count(*) from multi_delimiter_test;INFO : Compiling command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97): select count(*) from multi_delimiter_testINFO : Semantic Analysis CompletedINFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)INFO : Completed compiling command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97); Time taken: 0.291 secondsINFO : Executing command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97): select count(*) from multi_delimiter_testINFO : Query ID = hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97INFO : Total jobs = 1INFO : Launching Job 1 out of 1INFO : Starting task [Stage-1:MAPRED] in serial modeINFO : Number of reduce tasks determined at compile time: 1INFO : In order to change the average load for a reducer (in bytes):INFO : set hive.exec.reducers.bytes.per.reducer=INFO : In order to limit the maximum number of reducers:INFO : set hive.exec.reducers.max=INFO : In order to set a constant number of reducers:INFO : set mapreduce.job.reduces=INFO : number of splits:1INFO : Submitting tokens for job: job_1503469952834_0002INFO : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/ip-172-31-8-141.ap-southeast-1.compute.internal@CLOUDERA.COM, renewer=yarn, realUser=, issueDate=1503475160778, maxDate=1504079960778, sequenceNumber=27, masterKeyId=9)INFO : The url to track the job: http://ip-172-31-9-186.ap-southeast-1.compute.internal:8088/proxy/application_1503469952834_0002/INFO : Starting Job = job_1503469952834_0002, Tracking URL = http://ip-172-31-9-186.ap-southeast-1.compute.internal:8088/proxy/application_1503469952834_0002/INFO : Kill Command = /opt/cloudera/parcels/CDH-5.10.2-1.cdh6.10.2.p0.5/lib/hadoop/bin/hadoop job -kill job_1503469952834_0002INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1INFO : 2017-08-23 03:59:32,039 Stage-1 map = 0%, reduce = 0%INFO : 2017-08-23 04:00:08,106 Stage-1 map = 100%, reduce = 100%ERROR : Ended Job = job_1503469952834_0002 with errorsERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTaskINFO : MapReduce Jobs Launched:INFO : Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAILINFO : Total MapReduce CPU Time Spent: 0 msecINFO : Completed executing command(queryId=hive_20170823035959_f1b11a9b-757d-4d9b-b8a7-6d4ab1c00a97); Time taken: 48.737 secondsError: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2) 使用Hive的shell操作报错如下
Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 9 moreCaused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) ... 14 moreCaused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 17 moreCaused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:147) ... 22 moreCaused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:323) at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:333) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:116) ... 22 moreCaused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105) at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:140) at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:297) ... 24 moreFAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTaskMapReduce Jobs Launched:Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAILTotal MapReduce CPU Time Spent: 0 ms- 问题原因分析
org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe类是hive-contrib.jar包里。
在执行非聚合类操作查询时,sql能正常执行,在进行聚合类函数操作时报错,说明在执行MapReduce任务时缺少jar依赖包;MapReduce属于yarn作业,所以yarn运行环境缺少hive-contrib.jar的依赖包。
- 解决方法
在CDH集群的所有节点一下操作,将hive-contrib-1.1.0-cdh6.10.2.jar包拷贝到yarn的lib目录下
sudo scp -r /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib-1.1.0-cdh6.10.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-yarn/lib/ 重新运行count语句,执行成功
醉酒鞭名马,少年多浮夸! 岭南浣溪沙,呕吐酒肆下!挚友不肯放,数据玩的花!
温馨提示:要看高清无码套图,请使用手机打开并单击图片放大查看。
分隔符
字段
数据
文件
时报
环境
问题
查询
测试
支持
温馨
事例
图片
手机
版本
目录
高清
无码
准备
提示
数据库的安全要保护哪些东西
数据库安全各自的含义是什么
生产安全数据库录入
数据库的安全性及管理
数据库安全策略包含哪些
海淀数据库安全审计系统
建立农村房屋安全信息数据库
易用的数据库客户端支持安全管理
连接数据库失败ssl安全错误
数据库的锁怎样保障安全
网络安全重要性英语作文80字
数据库安全初始化
网络安全信息主题手抄报
红桥区数据网络技术答疑解惑
天津云腾网络技术有限公司
网络安全自检自查工作总结
生存战争服务器一共有多少
软件开发流程数据结构
地图类软件开发公司
成才知心计算机网络技术
新媒体网络技术是啥
数据库主数据文件什么意思
超微服务器机箱
数据库技术在人工管理阶段
长时间搞软件开发
软件开发专业自我评价
徐州企业软件开发推荐咨询
2019年第十届数据库技术
郝永清网络安全
公安涉企网络安全等级保护
dnf数据库在哪里
南瑞继保软件开发在哪个部门
网络技术二级基础知识
沈阳软件开发制作
数据库跨域导入数据
网络技术服务中心实践
网络安全管理报考
谈谈网络安全
apex电妹网络安全
软件开发专业自我评价
