千家信息网

Broken pipe when accessing HBase with happybase: two "earth-shattering" bugs

Published: 2025-12-02  Author: 千家信息网 editorial staff

Background
When reading from and writing to HBase with happybase over the thrift interface, a Broken pipe error appears. Troubleshooting steps:

1. Check the HBase logs:
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
17/05/12 18:08:41 INFO util.VersionInfo: HBase 1.2.0-cdh6.10.1
17/05/12 18:08:41 INFO util.VersionInfo: Source code repository file:///data/jenkins/workspace/generic-package-centos64-7-0/topdir/BUILD/hbase-1.2.0-cdh6.10.1 revision=Unknown
17/05/12 18:08:41 INFO util.VersionInfo: Compiled by jenkins on Mon Mar 20 02:46:09 PDT 2017
17/05/12 18:08:41 INFO util.VersionInfo: From source with checksum c6d9864e1358df7e7f39d39a40338b4e
17/05/12 18:08:41 INFO thrift.ThriftServerRunner: Using default thrift server type
17/05/12 18:08:41 INFO thrift.ThriftServerRunner: Using thrift server type threadpool
17/05/12 18:08:42 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
17/05/12 18:08:42 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
17/05/12 18:08:42 INFO impl.MetricsSystemImpl: HBase metrics system started
17/05/12 18:08:42 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
17/05/12 18:08:42 INFO http.HttpRequestLog: Http request log for http.requests.thrift is not defined
17/05/12 18:08:42 INFO http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter)
17/05/12 18:08:42 INFO http.HttpServer: Added global filter 'clickjackingprevention' (class=org.apache.hadoop.hbase.http.ClickjackingPreventionFilter)
17/05/12 18:08:42 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context thrift
17/05/12 18:08:42 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
17/05/12 18:08:42 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
17/05/12 18:08:42 INFO http.HttpServer: Jetty bound to port 9095
17/05/12 18:08:42 INFO mortbay.log: jetty-6.1.26.cloudera.4
17/05/12 18:08:42 WARN mortbay.log: Can't reuse /tmp/Jetty_0_0_0_0_9095_thrift____.vqpz9l, using /tmp/Jetty_0_0_0_0_9095_thrift____.vqpz9l_5120175032480185058
17/05/12 18:08:43 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:9095
17/05/12 18:08:43 INFO thrift.ThriftServerRunner: starting TBoundedThreadPoolServer on /0.0.0.0:9090 with readTimeout 300000ms; min worker threads=128, max worker threads=1000, max queued requests=1000
.../05/08 15:05:51 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x645132bf connecting to ZooKeeper ensemble=cdh-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181
17/05/08 15:05:51 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=cdh-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181 sessionTimeout=60000 watcher=hconnection-0x64513-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181, baseZNode=/hbase
17/05/08 15:05:51 INFO zookeeper.ClientCnxn: Opening socket connection to server cdh-slave3/192.168.10.219:2181. Will not attempt to authenticate using SASL (unknown error)
17/05/08 15:05:51 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.10.23:43170, server: cdh-slave3/192.168.10.219:2181
17/05/08 15:05:51 INFO zookeeper.ClientCnxn: Session establishment complete on server cdh-slave3/192.168.10.219:2181, sessionid = 0x35bd74a77802148, negotiated timeout = 60000
[caitinggui@cdh-master-slave1 example]$ 17/05/08 15:32:50 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x35bd74a77802148
17/05/08 15:32:51 INFO zookeeper.ZooKeeper: Session: 0x35bd74a77802148 closed
17/05/08 15:32:51 INFO zookeeper.ClientCnxn: EventThread shut down
17/05/08 15:38:53 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0xb876351 connecting to ZooKeeper ensemble=cdh-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181
17/05/08 15:38:53 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=cdh-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181 sessionTimeout=60000 watcher=hconnection-0xb8763510x0, quorum=cdh-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181, baseZNode=/hbase
17/05/08 15:38:53 INFO zookeeper.ClientCnxn: Opening socket connection to server cdh-master-slave1/192.168.10.23:2181. Will not attempt to authenticate using SASL (unknown error)
17/05/08 15:38:53 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.10.23:35526, server: cdh-master-slave1/192.168.10.23:2181
17/05/08 15:38:53 INFO zookeeper.ClientCnxn: Session establishment complete on server cdh-master-slave1/192.168.10.23:2181, sessionid = 0x15ba3ddc6cc90d4, negotiated timeout = 60000
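The Thrift server's startup line already reports the read timeout it is running with, so it is worth pulling that value out of the log before guessing. A minimal Python sketch, assuming the line has exactly the format seen in the log above; the name `parse_thrift_startup` is made up for illustration:

```python
import re

# Pattern for the ThriftServerRunner startup line seen in the log above.
STARTUP_RE = re.compile(
    r"starting TBoundedThreadPoolServer on /(?P<addr>[\d.]+):(?P<port>\d+) "
    r"with readTimeout (?P<read_timeout_ms>\d+)ms; "
    r"min worker threads=(?P<min_workers>\d+), "
    r"max worker threads=(?P<max_workers>\d+), "
    r"max queued requests=(?P<max_queued>\d+)"
)


def parse_thrift_startup(line):
    """Return the server settings found in a startup log line, or None."""
    match = STARTUP_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # Everything except the bind address is numeric.
    return {k: (v if k == "addr" else int(v)) for k, v in info.items()}


line = (
    "17/05/12 18:08:43 INFO thrift.ThriftServerRunner: starting "
    "TBoundedThreadPoolServer on /0.0.0.0:9090 with readTimeout 300000ms; "
    "min worker threads=128, max worker threads=1000, max queued requests=1000"
)
settings = parse_thrift_startup(line)
print(settings["read_timeout_ms"] / 1000, "seconds")  # 300.0 seconds
```

Comparing this number with how long the client stays idle between requests is a quick way to see whether the read timeout is a plausible culprit.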

Initial guess: HBase has some timeout configured, and the connection is dropped when it expires.
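This guess can be reproduced without HBase at all. The sketch below uses only the Python standard library: a throwaway server drops a client that stays silent past a read timeout, and the client then fails on a later write. The function names and the short timings are assumptions chosen for the demo, not anything taken from HBase.

```python
import socket
import threading
import time


def serve_once(listener, read_timeout):
    """Accept one client and drop it if nothing arrives within read_timeout seconds."""
    conn, _ = listener.accept()
    conn.settimeout(read_timeout)
    try:
        conn.recv(1024)          # blocks until data arrives or the timeout fires
    except socket.timeout:
        pass                     # idle client: fall through and close the socket
    finally:
        conn.close()


def demo(read_timeout=0.2, idle=0.6):
    """Return the exception a client sees after idling past the server timeout."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    server = threading.Thread(target=serve_once, args=(listener, read_timeout))
    server.start()

    client = socket.create_connection(listener.getsockname())
    try:
        time.sleep(idle)                 # stay idle longer than the server allows
        for _ in range(20):              # the first write may still appear to succeed
            client.sendall(b"ping")
            time.sleep(0.05)
    except ConnectionError as exc:       # BrokenPipeError or ConnectionResetError
        return exc
    finally:
        client.close()
        server.join()
        listener.close()
    return None


error = demo()
print(type(error).__name__)
```

The first write after the server has closed its side usually goes through, because the client only learns about the closed peer from the reply; the error surfaces on a following write. That matches the pattern of a client that works for a while and then suddenly reports Broken pipe.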

2. Read the official documentation. No obviously relevant timeout parameter turned up.
3. Google for similar problems

A similar report:

HBASE-14926: Hung ThriftServer; no timeout on read from client; if client crashes, worker thread gets stuck reading

Type: Bug
Status: RESOLVED
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0, 1.2.0, 1.1.2, 1.3.0, 1.0.3, 0.98.16
Fix Version/s: 2.0.0, 1.2.0, 1.3.0, 0.98.17
Component/s: Thrift
Labels: None
Hadoop Flags: Reviewed
Release Note: Adds a timeout to server read from clients. Adds new configs hbase.thrift.server.socket.read.timeout for setting read timeout on server socket in milliseconds. Default is 60000;

Description
Thrift server is hung. All worker threads are doing this:

"thrift-worker-0" daemon prio=10 tid=0x00007f0bb95c2800 nid=0xf6a7 runnable [0x00007f0b956e0000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:152)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked <0x000000066d859490> (a java.io.BufferedInputStream)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
        at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
        at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
        at org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)
        at org.apache.hadoop.hbase.thrift.CallQueue$Call.run(CallQueue.java:64)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

They never recover.
I don't have client side logs.
We've been here before: HBASE-4967 "connected client thrift sockets should have a server side read timeout" but this patch only got applied to fb branch (and thrift has changed since then)

PS: source https://issues.apache.org/jira/browse/HBASE-14926
4. Google "hbase.thrift.server.socket.read.timeout"

One of the results reads:

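Background
The test environment is a distributed Hadoop cluster built on three servers. Versions: hadoop-2.7.3; Hbase-1.2.4; zookeeper-3.4.9. Data is written to hbase through the thrift C++ interface. Every run starts out writing normally and begins to report errors after a while. The previously used hbase-0.94.27 never showed this problem with the same configuration and had always worked fine.
[Figure: thrift interface error]

Solution
A packet capture shows that the hbase server answers with an RST packet, which breaks the connection. Starting the server with bin/hbase thrift start -threadpool shows that readTimeout is set to 60s.
[Figure: thriftpool]
Testing confirmed that the problem is indeed tied to this setting. It had never been set in the configuration, and the code shows that 60s is the default, which applies whenever nothing is configured. So adding the following property to conf/hbase-site.xml is enough:
         name: hbase.thrift.server.socket.read.timeout
         value: 6000000
         description: eg:milisecond

PS: source http://blog.csdn.net/wwlhz/article/details/56012053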
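The XML tags of the quoted property were lost when the page was copied. As a hedged reconstruction, the Python sketch below renders the entry in the usual Hadoop/HBase `*-site.xml` layout (`property` containing `name` and `value`); that layout is my assumption about what the quoted post showed, and `render_property` is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def render_property(name, value):
    """Build one <property> element in the Hadoop/HBase *-site.xml layout."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(prop, encoding="unicode")


snippet = render_property("hbase.thrift.server.socket.read.timeout", 6000000)
print(snippet)
print(6000000 / 1000 / 60, "minutes")  # 6000000 ms = 100.0 minutes
```

The rendered element belongs inside the `<configuration>` root of conf/hbase-site.xml. The value is in milliseconds according to the release note quoted in step 3, so 6000000 amounts to 100 minutes.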

So I added the parameter and restarted hbase thrift, and the problem went away.
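One way to check the change from the client side is to write, stay idle for longer than the old 60 s default, and write again on the same connection. The sketch below is only an illustration: `write_after_idle` is a hypothetical helper, and the host, port, table name "demo" and column family "cf" in `example()` are assumptions about the environment. It relies on happybase's `Connection.table`, `Table.put` and `Table.row`.

```python
import time


def write_after_idle(connection, table_name, row_key, data, idle_seconds,
                     sleep=time.sleep):
    """Write a row, stay idle, then write and read again on the same connection."""
    table = connection.table(table_name)
    table.put(row_key, data)          # first write, connection is fresh
    sleep(idle_seconds)               # idle past the server-side read timeout
    table.put(row_key, data)          # raises (e.g. Broken pipe) if the server dropped us
    return table.row(row_key)


def example():
    # Assumptions: happybase is installed, the Thrift server listens on
    # localhost:9090, and a table "demo" with column family "cf" exists.
    import happybase

    connection = happybase.Connection("localhost", port=9090)
    try:
        return write_after_idle(connection, "demo", b"row-1",
                                {b"cf:col": b"value"}, idle_seconds=90)
    finally:
        connection.close()
```

If the second `put` raises before the configuration change and succeeds after it, the read timeout was the cause. The `sleep` parameter exists only so the helper can be exercised without actually waiting.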

5. Looking at the source code shows:
# https://github.com/apache/hbase/blob/master/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java
...
  public static final String THRIFT_SERVER_SOCKET_READ_TIMEOUT_KEY =
    "hbase.thrift.server.socket.read.timeout";
  public static final int THRIFT_SERVER_SOCKET_READ_TIMEOUT_DEFAULT = 60000;
...
      int readTimeout = conf.getInt(THRIFT_SERVER_SOCKET_READ_TIMEOUT_KEY,
          THRIFT_SERVER_SOCKET_READ_TIMEOUT_DEFAULT);
      TServerTransport serverTransport = new TServerSocket(
          new TServerSocket.ServerSocketTransportArgs().
              bindAddr(new InetSocketAddress(listenAddress, listenPort)).
              backlog(backlog).
              clientTimeout(readTimeout));
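The lookup-with-default in that excerpt can be mirrored in a few lines of Python to check which timeout a given hbase-site.xml would produce. This is a sketch under assumptions: `get_int` is a made-up stand-in for `conf.getInt`, and it ignores everything else a real Hadoop `Configuration` does (default resources, variable substitution, and so on).

```python
import xml.etree.ElementTree as ET

READ_TIMEOUT_KEY = "hbase.thrift.server.socket.read.timeout"
READ_TIMEOUT_DEFAULT_MS = 60000  # default taken from the Java excerpt above


def get_int(site_xml, key, default):
    """Look up an integer property in hbase-site.xml text, like conf.getInt."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == key:
            return int(prop.findtext("value", "").strip())
    return default  # key not present: fall back, as the Java code does


unset = "<configuration></configuration>"
configured = (
    "<configuration><property>"
    "<name>hbase.thrift.server.socket.read.timeout</name>"
    "<value>6000000</value>"
    "</property></configuration>"
)
print(get_int(unset, READ_TIMEOUT_KEY, READ_TIMEOUT_DEFAULT_MS))       # 60000
print(get_int(configured, READ_TIMEOUT_KEY, READ_TIMEOUT_DEFAULT_MS))  # 6000000
```

With no entry in the file, the server ends up with the 60000 ms default, which is why a client that idles for more than a minute gets disconnected.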

Problem solved~~~

6. But was the problem really solved?

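Not quite. After a while I found that when scanning continuously for a little over 20 minutes, the connection was dropped again. Another painful round of searching showed that this is a bug in this HBase version itself: it treats every connection as idle, whether or not it is actually in use, and then applies the hbase.thrift.connection.max-idletime setting to it. So I set that option to 31104000 (one year). On CDH this should be configured in the management UI, as shown in the figure: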
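The arithmetic behind 31104000 is worth checking, because the result depends on the unit. As far as I can tell, the HBase Thrift code reads this key with a default of 10 * 60 * 1000, which would suggest milliseconds rather than seconds; treat that as an assumption and verify it against the source of your version.

```python
SECONDS_PER_DAY = 24 * 60 * 60
value = 31104000  # the max-idletime value chosen above

# Interpretation 1: the value is in seconds.
days_if_seconds = value / SECONDS_PER_DAY
# Interpretation 2 (assumption to verify): the value is in milliseconds.
hours_if_millis = value / 1000 / 3600

print(days_if_seconds)   # 360.0
print(hours_if_millis)   # 8.64
```

Read as seconds, the value is 360 days, roughly the "one year" intended. Read as milliseconds, it is under nine hours, so a scan that runs longer than that could still be cut off.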

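General steps when you run into a problem:
If the goal is to improve technically:
1. Check the logs, find where the error is reported, and get a first idea of where the problem lies
2. Read the official documentation
3. Google similar problems, or read the source code to pin the problem down

If the goal is a quick fix:
1. Check the logs, find where the error is reported, and get a first idea of where the problem lies
2. Google similar problems
3. Read the official documentation, or read the source code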

