
Hadoop Cluster Troubleshooting Notes

Published: 2025-12-03  Author: 千家信息网 editor

Last updated by 千家信息网: 2025-12-03


1、bigdata is not allowed to impersonate xxx

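Cause: the proxy-user (impersonation) settings have not taken effect. Check that core-site.xml is configured correctly. The two properties that must be set are: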

    hadoop.proxyuser.bigdata.hosts  = *
    hadoop.proxyuser.bigdata.groups = *

Note: in hadoop.proxyuser.XXX.hosts and hadoop.proxyuser.XXX.groups, XXX is the user name that follows "User:" in the exception message.

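    <property>
      <name>hadoop.proxyuser.bigdata.hosts</name>
      <value>*</value>
      <description>With *, the superuser bigdata can connect from any host to impersonate a user</description>
    </property>
    <property>
      <name>hadoop.proxyuser.bigdata.groups</name>
      <value>*</value>
      <description>With *, the superuser bigdata can impersonate members of any group</description>
    </property>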

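After adding this configuration there is no need to restart the cluster. Reload the two properties directly on the NameNode host with an administrator account: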

    $ hdfs dfsadmin -refreshSuperUserGroupsConfiguration
    Refresh super user groups configuration successful
    $ yarn rmadmin -refreshSuperUserGroupsConfiguration
    19/01/16 15:02:29 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8033

If the cluster is configured for HA, run the following command so that every NameNode reloads the settings:

    # hadoop dfsadmin -fs hdfs://ns -refreshSuperUserGroupsConfiguration
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.
    Refresh super user groups configuration successful for master/192.168.99.219:9000
    Refresh super user groups configuration successful for node01/192.168.99.173:9000
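Before reloading, it can help to confirm that core-site.xml really contains both proxy-user keys for the user named in the exception. The following is a minimal sketch under stated assumptions, not part of Hadoop itself: the helper names `read_hadoop_properties` and `missing_proxyuser_keys` are hypothetical, and the check only inspects the file contents, not what the running NameNode has loaded.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document.

    Hypothetical helper for illustration; accepts str or bytes.
    """
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_proxyuser_keys(props, user):
    """List the hadoop.proxyuser.<user>.hosts/groups keys that are absent or empty."""
    wanted = [
        f"hadoop.proxyuser.{user}.hosts",
        f"hadoop.proxyuser.{user}.groups",
    ]
    return [key for key in wanted if not props.get(key)]
```

To use it, read core-site.xml in binary mode, pass the bytes to `read_hadoop_properties`, then call `missing_proxyuser_keys(props, "bigdata")`. An empty list means both keys are present in the file, after which the refresh commands above still have to be run.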

2、org.apache.hadoop.hbase.exceptions.ConnectionClosingException

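Symptom: when HiveServer2 is called through beeline, JDBC, or Python, HBase-backed tables cannot be queried or created. The configuration given for this problem sets hive.server2.enable.doAs to false in hive-site.xml: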

    <property>
      <name>hive.server2.enable.doAs</name>
      <value>false</value>
      <description>
        Setting this property to true will have HiveServer2 execute
        Hive operations as the user making the calls to it.
      </description>
    </property>

Creating an HBase-backed table in Hive:

    -- Table name in Hive: test_tb
    CREATE TABLE test_tb(key int, value string)
    -- Specify the storage handler
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    -- Declare the column family and column name
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    -- hbase.table.name declares the HBase table name; it is optional and defaults to the Hive table name
    -- hbase.mapred.output.outputtable names the table written to on insert; set it if data will be inserted into this table later
    TBLPROPERTIES ("hbase.table.name" = "test_tb", "hbase.mapred.output.outputtable" = "test_tb");

Scheduled cleanup of the Spark work directory

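When jobs run in Spark standalone mode, every job submission creates a folder named app-xxxxxxx-xxxx under the work directory on each node. The folder holds the resource files the program needs, which each node downloads from the master node when the job is submitted. A new directory is created on every run and none of them is cleaned up automatically, so running many jobs will eventually fill up the disk.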

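Each application directory contains the dependency packages that the Spark job needs at run time. Automatic cleanup can be enabled through SPARK_WORKER_OPTS: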

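    # spark.worker.cleanup.enabled:    whether to enable automatic cleanup
    # spark.worker.cleanup.interval:   how often cleanup runs, in seconds
    # spark.worker.cleanup.appDataTtl: how long application data is kept, in seconds
    export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
    -Dspark.worker.cleanup.interval=1800 \
    -Dspark.worker.cleanup.appDataTtl=3600"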
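To see how much has already piled up on a worker before the settings above take effect, a dry-run listing can be useful. The sketch below is an assumption-laden illustration, not Spark's own cleanup logic: the function name `stale_app_dirs` is hypothetical, it only reports directories and deletes nothing, and it uses the directory modification time as a rough proxy for age, which does not prove that the application has finished.

```python
import time
from pathlib import Path


def stale_app_dirs(work_dir, ttl_seconds, now=None):
    """Return app-* directories under work_dir not modified within ttl_seconds.

    Hypothetical helper: report only, nothing is removed.
    """
    now = time.time() if now is None else now
    stale = []
    for entry in sorted(Path(work_dir).glob("app-*")):
        # Only directories matching the app-xxxxxxx-xxxx naming pattern are considered.
        if entry.is_dir() and now - entry.stat().st_mtime > ttl_seconds:
            stale.append(entry)
    return stale
```

Calling `stale_app_dirs(work_dir, 3600)` with the worker's work directory mirrors the appDataTtl value used above. Check that none of the reported applications is still running before removing anything by hand.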

Too many ZooKeeper connections prevent HBase and Hive from connecting

2019-01-25 03:26:41,627 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /172.17.0.1 - max is 60
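The warning above names the client address that hit the per-IP limit. When the ZooKeeper log contains many such lines, counting them per address shows which client is responsible. This is a minimal sketch; the function name `count_rejections` is hypothetical, and the regular expression assumes the message format of the log line quoted above.

```python
import re
from collections import Counter

# Matches the NIOServerCnxnFactory warning format shown in the log line above.
TOO_MANY = re.compile(
    r"Too many connections from /(?P<ip>[0-9A-Fa-f.:]+) - max is (?P<limit>\d+)"
)


def count_rejections(log_lines):
    """Count 'Too many connections' warnings per client IP address."""
    counts = Counter()
    for line in log_lines:
        match = TOO_MANY.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts
```

Passing an open log file to `count_rejections` works because a file object iterates over its lines. The address with the highest count is the first candidate for a connection leak or an undersized maxClientCnxns.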

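Adjust the ZooKeeper connection settings of HBase and Hive to suit the production environment. The relevant files and properties are: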

hbase-site.xml

    hbase.zookeeper.property.maxClientCnxns

hive-site.xml

    hive.server2.thrift.min.worker.threads
    hive.server2.thrift.max.worker.threads
    hive.zookeeper.session.timeout

zoo.cfg

    # Limits the number of concurrent connections (at the socket level) that a single client, identified by IP address, may make to a single member of the ZooKeeper ensemble
    maxClientCnxns=200
    # The minimum session timeout in milliseconds that the server will allow the client to negotiate
    minSessionTimeout=1000
    # The maximum session timeout in milliseconds that the server will allow the client to negotiate
    maxSessionTimeout=60000

To be continued....
