千家信息网


HBase: HFileOutputFormat

Published: 2025-11-07 Author: 千家信息网 editors


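When the output of a Hadoop MapReduce job needs to be loaded into HBase, it is best to write it out as HFiles first and then load those files into HBase. HFile is HBase's internal storage format, so this kind of import is very efficient. An example follows.
1. Create the HBase table t1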

hbase(main):157:0* create 't1','f1'
0 row(s) in 1.3280 seconds
hbase(main):158:0> scan 't1'
ROW COLUMN+CELL
0 row(s) in 1.2770 seconds

2. Write the MR job
HBaseHFileMapper.java

package com.test.hfile;

import java.io.IOException;

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HBaseHFileMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Text> {
    // Reused for every record so that no new key object is allocated per line.
    private final ImmutableBytesWritable immutableBytesWritable = new ImmutableBytesWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The input key is the line's byte offset; the line itself is passed through unchanged.
        immutableBytesWritable.set(Bytes.toBytes(key.get()));
        context.write(immutableBytesWritable, value);
    }
}
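
The mapper above turns each line's byte offset into the intermediate key with Bytes.toBytes(long). The sketch below uses only the JDK to illustrate what that key looks like, assuming the encoding is the 8-byte big-endian form of the long. The class OffsetKeyDemo and its methods are hypothetical names introduced for illustration and are not part of HBase or of the original job.

```java
import java.nio.ByteBuffer;

// Hypothetical JDK-only illustration; not part of HBase or the original job.
class OffsetKeyDemo {
    // Assumed equivalent of Bytes.toBytes(long): 8 bytes, big-endian.
    static byte[] encode(long offset) {
        return ByteBuffer.allocate(Long.BYTES).putLong(offset).array();
    }

    // Lexicographic comparison that treats every byte as an unsigned value.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] first = encode(16L);
        byte[] second = encode(300L);
        // A negative result means the key for offset 16 sorts before the key for offset 300.
        System.out.println(compareUnsigned(first, second) < 0);
    }
}
```

Under that assumption, non-negative offsets keep their numeric order when the keys are compared byte by byte, so the lines reach the reducer in file order.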

HBaseHFileReducer.java

package com.test.hfile;

import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class HBaseHFileReducer extends Reducer<ImmutableBytesWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Emit one KeyValue for every non-empty, well-formed input line.
        for (Text text : values) {
            String value = text.toString();
            if (!value.isEmpty()) {
                KeyValue kv = createKeyValue(value);
                if (kv != null) {
                    context.write(key, kv);
                }
            }
        }
    }

    // The input format is row:family:qualifier:value (a simple simulation).
    private KeyValue createKeyValue(String str) {
        String[] strs = str.split(":");
        if (strs.length < 4) {
            return null; // skip malformed lines
        }
        String row = strs[0];
        String family = strs[1];
        String qualifier = strs[2];
        String value = strs[3];
        return new KeyValue(Bytes.toBytes(row), Bytes.toBytes(family), Bytes.toBytes(qualifier),
                System.currentTimeMillis(), Bytes.toBytes(value));
    }
}
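
The createKeyValue helper above splits each line on every colon, so a value that itself contains a colon would be cut short at its first colon. One possible variation is sketched below in plain Java, assuming the same row:family:qualifier:value layout. The class CellLineParser and its parse method are hypothetical names, not part of the original job. Limiting the split to four parts keeps everything after the third colon as the value.

```java
import java.util.Optional;

// Hypothetical JDK-only sketch of the line parsing step; not part of HBase.
class CellLineParser {
    // Returns {row, family, qualifier, value}, or empty if the line is malformed.
    static Optional<String[]> parse(String line) {
        if (line == null || line.isEmpty()) {
            return Optional.empty();
        }
        // A limit of 4 stops splitting after the third colon.
        String[] parts = line.split(":", 4);
        if (parts.length < 4) {
            return Optional.empty();
        }
        return Optional.of(parts);
    }

    public static void main(String[] args) {
        String[] cell = parse("r1:f1:c1:value1").get();
        System.out.println(cell[0] + " " + cell[1] + " " + cell[2] + " " + cell[3]);
    }
}
```

Note one difference from the original helper: with a limit, a line such as r1:f1:c1: yields an empty value instead of being rejected, so add an explicit check if empty values should be skipped.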

HbaseHFileDriver.java

package com.test.hfile;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class HbaseHFileDriver {
    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        // Applies generic Hadoop options (such as -libjars) to conf; otherArgs is not used further here.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        Job job = new Job(conf, "testhbasehfile");
        job.setJarByClass(HbaseHFileDriver.class);
        job.setMapperClass(com.test.hfile.HBaseHFileMapper.class);
        job.setReducerClass(com.test.hfile.HBaseHFileReducer.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Text.class);
        // Shortcut: the paths are hard-coded here. A real application should not do this
        // and should read them from the command line instead.
        FileInputFormat.addInputPath(job, new Path("/home/yinjie/input"));
        FileOutputFormat.setOutputPath(job, new Path("/home/yinjie/output"));
        // HBaseConfiguration.create() replaces the deprecated HBaseConfiguration constructor.
        Configuration hbaseConf = HBaseConfiguration.create();
        hbaseConf.set("hbase.zookeeper.quorum", "localhost");
        hbaseConf.set("hbase.zookeeper.property.clientPort", "2181");
        String tableName = "t1";
        HTable htable = new HTable(hbaseConf, tableName);
        // Sets up the job to write HFiles for the target table.
        HFileOutputFormat.configureIncrementalLoad(job, htable);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
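
The comment in the driver says the input and output paths should really come from the command line. A minimal sketch of that step, in plain Java, is shown below. The class JobPaths and its fromArgs method are hypothetical names introduced here, and the sketch assumes the two paths are the only arguments left after GenericOptionsParser has consumed the generic options.

```java
// Hypothetical JDK-only helper; not part of Hadoop, HBase, or the original job.
class JobPaths {
    final String input;
    final String output;

    private JobPaths(String input, String output) {
        this.input = input;
        this.output = output;
    }

    // Expects exactly two remaining arguments: <input path> <output path>.
    static JobPaths fromArgs(String[] remainingArgs) {
        if (remainingArgs == null || remainingArgs.length != 2) {
            throw new IllegalArgumentException(
                    "Usage: HbaseHFileDriver [generic options] <input path> <output path>");
        }
        return new JobPaths(remainingArgs[0], remainingArgs[1]);
    }

    public static void main(String[] args) {
        JobPaths paths = fromArgs(new String[] {"/home/yinjie/input", "/home/yinjie/output"});
        System.out.println(paths.input + " -> " + paths.output);
    }
}
```

In the driver, this would mean passing otherArgs to fromArgs and building the two Path objects from paths.input and paths.output, with both paths then supplied on the hadoop jar command line.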

The /home/yinjie/input directory contains one file, hbasedata.txt, with the following content:

[root@localhost input]# cat hbasedata.txt
r1:f1:c1:value1
r2:f1:c2:value2
r3:f1:c3:value3

Package the job as a jar; I exported it to /home/yinjie/job/hbasetest.jar.
Submit the job to Hadoop:

[root@localhost job]# hadoop jar /home/yinjie/job/hbasetest.jar com.test.hfile.HbaseHFileDriver -libjars /home/yinjie/hbase-0.90.3/hbase-0.90.3.jar

After the job finishes, check the output directory:

[root@localhost input]# hadoop fs -ls /home/yinjie/output
Found 2 items
drwxr-xr-x - root supergroup 0 2011-08-28 21:02 /home/yinjie/output/_logs
drwxr-xr-x - root supergroup 0 2011-08-28 21:03 /home/yinjie/output/f1

OK, a directory named after the column family f1 has been generated.
Next, use Bulk Load to import the data into HBase:

[root@localhost job]# hadoop jar /home/yinjie/hbase-0.90.3/hbase-0.90.3.jar completebulkload /home/yinjie/output t1

Once the import completes, query the HBase table t1 to verify:

hbase(main):166:0> scan 't1'
ROW COLUMN+CELL
r1 column=f1:c1, timestamp=1314591150788, value=value1
r2 column=f1:c2, timestamp=1314591150814, value=value2
r3 column=f1:c3, timestamp=1314591150815, value=value3
3 row(s) in 0.0210 seconds

The data has been imported!
