千家信息网

由于OOM导致锁没有释放的定位问题是怎么就觉得

发表于:2025-12-02 作者:千家信息网编辑
千家信息网最后更新 2025年12月02日,由于OOM导致锁没有释放的定位问题是怎么就觉得,针对这个问题,这篇文章详细介绍了相对应的分析和解答,希望可以帮助更多想解决这个问题的小伙伴找到更简单易行的方法。最近发现业务上返回慢,并且feign的f
千家信息网最后更新 2025年12月02日由于OOM导致锁没有释放的定位问题是怎么就觉得

由于OOM导致锁没有释放的定位问题是怎么就觉得,针对这个问题,这篇文章详细介绍了相对应的分析和解答,希望可以帮助更多想解决这个问题的小伙伴找到更简单易行的方法。

最近发现业务上返回慢,并且feign的fallback被触发了。查看日志,发现触发了重试,是什么触发的重试呢,通过异常堆栈发现:

Caused by: feign.RetryableException: connect timed out executing GET http://test-service/test-api        at feign.FeignException.errorExecuting(FeignException.java:65) ~[feign-core-9.7.0.jar!/:?]        at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:105) ~[feign-core-9.7.0.jar!/:?]        at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:77) ~[feign-core-9.7.0.jar!/:?]        at feign.hystrix.HystrixInvocationHandler$1.run(HystrixInvocationHandler.java:107) ~[feign-hystrix-9.7.0.jar!/:?]        at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:302) ~[hystrix-core-1.5.18.jar!/:1.5.18]        at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:298) ~[hystrix-core-1.5.18.jar!/:1.5.18]        at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:46) ~[rxjava-1.3.8.jar!/:1.3.8]        ... 27 moreCaused by: java.net.SocketTimeoutException: connect timed out        at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:?]        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399) ~[?:?]        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242) ~[?:?]        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224) ~[?:?]        at java.net.Socket.connect(Socket.java:591) ~[?:?]        at sun.net.NetworkClient.doConnect(NetworkClient.java:177) ~[?:?]        at sun.net.www.http.HttpClient.openServer(HttpClient.java:474) ~[?:?]        at sun.net.www.http.HttpClient.openServer(HttpClient.java:569) ~[?:?]        at sun.net.www.http.HttpClient.(HttpClient.java:242) ~[?:?]        at sun.net.www.http.HttpClient.New(HttpClient.java:341) ~[?:?]        at sun.net.www.http.HttpClient.New(HttpClient.java:362) ~[?:?]        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1248) ~[?:?]        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187) ~[?:?]        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081) ~[?:?]        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1015) ~[?:?]        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587) ~[?:?]        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1515) ~[?:?]        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527) ~[?:?]        at feign.Client$Default.convertResponse(Client.java:150) ~[feign-core-9.7.0.jar!/:?]        at feign.Client$Default.execute(Client.java:72) ~[feign-core-9.7.0.jar!/:?]        at org.springframework.cloud.sleuth.instrument.web.client.feign.TracingFeignClient.execute(TracingFeignClient.java:91) ~[spring-cloud-sleuth-core-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]        at org.springframework.cloud.sleuth.instrument.web.client.feign.LazyTracingFeignClient.execute(LazyTracingFeignClient.java:55) ~[spring-cloud-sleuth-core-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]        at org.springframework.cloud.openfeign.ribbon.RetryableFeignLoadBalancer$1.doWithRetry(RetryableFeignLoadBalancer.java:103) ~[spring-cloud-openfeign-core-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]        at org.springframework.cloud.openfeign.ribbon.RetryableFeignLoadBalancer$1.doWithRetry(RetryableFeignLoadBalancer.java:88) ~[spring-cloud-openfeign-core-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]        at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:287) ~[spring-retry-1.2.4.RELEASE.jar!/:?]        at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:180) ~[spring-retry-1.2.4.RELEASE.jar!/:?]        at org.springframework.cloud.openfeign.ribbon.RetryableFeignLoadBalancer.execute(RetryableFeignLoadBalancer.java:88) ~[spring-cloud-openfeign-core-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]        at org.springframework.cloud.openfeign.ribbon.RetryableFeignLoadBalancer.execute(RetryableFeignLoadBalancer.java:54) ~[spring-cloud-openfeign-core-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]        at com.netflix.client.AbstractLoadBalancerAwareClient$1.call(AbstractLoadBalancerAwareClient.java:104) ~[ribbon-loadbalancer-2.2.5.jar!/:2.2.5]        at com.netflix.loadbalancer.reactive.LoadBalancerCommand$3$1.call(LoadBalancerCommand.java:303) ~[ribbon-loadbalancer-2.2.5.jar!/:2.2.5]        at com.netflix.loadbalancer.reactive.LoadBalancerCommand$3$1.call(LoadBalancerCommand.java:287) ~[ribbon-loadbalancer-2.2.5.jar!/:2.2.5]        at rx.internal.util.ScalarSynchronousObservable$3.call(ScalarSynchronousObservable.java:231) ~[rxjava-1.3.8.jar!/:1.3.8]        at rx.internal.util.ScalarSynchronousObservable$3.call(ScalarSynchronousObservable.java:228) ~[rxjava-1.3.8.jar!/:1.3.8]        at rx.Observable.unsafeSubscribe(Observable.java:10327) ~[rxjava-1.3.8.jar!/:1.3.8]

发现调用微服务test-service连接超时,注意是连接超时,而不是读取超时。接下来,首先查看ribbon连接超时配置,看是否配置的连接超时过短,发现是:

#ribbon默认连接超时ribbon.ConnectTimeout=500

500ms建立tcp连接已经不短了,所以配置没有问题。从当前实例手动调用微服务实例的接口,看是否是通的:

curl http://test-service的ip:test-service的端口

发现访问成功,没有阻塞。证明网络连接没有问题,我们来看下netstat网络连接状态:

Active Internet connections (w/o servers)Proto Recv-Q Send-Q Local Address           Foreign Address         State       tcp        0      0 test-5c665cc496-6jkx5:54644 ip-10-238-8-119.eu-central-1.compute.internal:6379 ESTABLISHED tcp        0      0 test-5c665cc496-6jkx5:54666 ip-10-238-8-119.eu-central-1.compute.internal:6379 ESTABLISHED tcp        0      0 test-5c665cc496-6jkx5:54680 ip-10-238-8-119.eu-central-1.compute.internal:6379 ESTABLISHED tcp        0      0 test-5c665cc496-6jkx5:54088 ip-10-238-8-68.eu-central-1.compute.internal:8211 ESTABLISHED tcp        0      0 test-5c665cc496-6jkx5:54674 ip-10-238-8-119.eu-central-1.compute.internal:6379 ESTABLISHED tcp        0      0 test-5c665cc496-6jkx5:54682 ip-10-238-8-119.eu-central-1.compute.internal:6379 ESTABLISHED tcp        0      0 test-5c665cc496-6jkx5:54652 ip-10-238-8-119.eu-central-1.compute.internal:6379 ESTABLISHED tcp       32      0 test-5c665cc496-6jkx5:47092 244-213-201-91.test.com:https CLOSE_WAIT  tcp        0      0 test-5c665cc496-6jkx5:54662 ip-10-238-8-119.eu-central-1.compute.internal:6379 ESTABLISHED tcp        0      0 test-5c665cc496-6jkx5:54624 ip-10-238-8-119.eu-central-1.compute.internal:6379 ESTABLISHED tcp       32      0 test-5c665cc496-6jkx5:47098 244-213-201-91.test.com:https CLOSE_WAIT  tcp        0      0 test-5c665cc496-6jkx5:54672 ip-10-238-8-119.eu-central-1.compute.internal:6379 ESTABLISHED tcp        0      0 test-5c665cc496-6jkx5:54630 ip-10-238-8-119.eu-central-1.compute.internal:6379 ESTABLISHED tcp        0      0 test-5c665cc496-6jkx5:58758 ip-10-238-9-71.eu-central-1.compute.internal:mysql ESTABLISHED tcp        0      0 test-5c665cc496-6jkx5:8251 10-238-8-117.api-gateway.test1.svc.cluster.local:43132 FIN_WAIT2   tcp        0      0 test-5c665cc496-6jkx5:54648 ip-10-238-8-119.eu-central-1.compute.internal:6379 ESTABLISHED tcp        0      0 test-5c665cc496-6jkx5:58778 ip-10-238-9-71.eu-central-1.compute.internal:mysql ESTABLISHED tcp        0      0 test-5c665cc496-6jkx5:54646 ip-10-238-8-119.eu-central-1.compute.internal:6379 ESTABLISHED tcp        0      0 test-5c665cc496-6jkx5:54628 ip-10-238-8-119.eu-central-1.compute.internal:6379 ESTABLISHED tcp        0      0 test-5c665cc496-6jkx5:54650 ip-10-238-8-119.eu-central-1.compute.internal:6379 ESTABLISHED tcp        0      0 test-5c665cc496-6jkx5:54632 ip-10-238-8-119.eu-central-1.compute.internal:6379 ESTABLISHED tcp        0      0 test-5c665cc496-6jkx5:54638 ip-10-238-8-119.eu-central-1.compute.internal:6379 ESTABLISHED tcp        0      0 test-5c665cc496-6jkx5:59434 ip-10-238-9-71.eu-central-1.compute.internal:mysql ESTABLISHED tcp       32      0 test-5c665cc496-6jkx5:47104 244-213-201-91.test.com:https CLOSE_WAIT  tcp        1      0 test-5c665cc496-6jkx5:54146 ip-10-238-8-68.eu-central-1.compute.internal:8211 CLOSE_WAIT  tcp       32      0 test-5c665cc496-6jkx5:47100 244-213-201-91.test.com:https CLOSE_WAIT  tcp       32      0 test-5c665cc496-6jkx5:47106 244-213-201-91.test.com:https CLOSE_WAIT

发现:

  1. 并没有建立与现在service-test这个微服务任何实例ip的链接

  2. 并没有很多网络连接,也并没有很多Timed_waiting还有close_wait

猜想,可能是调用的service-test的ip地址不是最新的。

现在该Athas登场了,启动Arthas,我们查看下feign调用的真实ip: 由于我们使用了sleuth,所以这里用sleuth监控的feigh client去看调用ip:

watch org.springframework.cloud.sleuth.instrument.web.client.feign.TracingFeignClient execute params[0].url()

对应的代码是:

package org.springframework.cloud.sleuth.instrument.web.client.feign;final class TracingFeignClient implements Client {        @Override public Response execute(Request request, Request.Options options)                        throws IOException {                            //.....                        }}

通过观察,我们发现,确实是以前的ip。 那么为什么没有更新呢,我们来查看下EurekaClient相关的代码,参考我另一篇文章:Spring Cloud Eureka 全解 (4) - 核心流程-服务与实例列表获取详解. 我们来看下PollingServerListUpdater这个类,负责更新的线程池是:

int coreSize = poolSizeProp.get();ThreadFactory factory = (new ThreadFactoryBuilder())        .setNameFormat("PollingServerListUpdater-%d")        .setDaemon(true)        .build();_serverListRefreshExecutor = new ScheduledThreadPoolExecutor(coreSize, factory);

谢天谢地,这个线程是有名字的。我们通过Arthas看下线程列表:

[arthas@24]$ threadthreadThreads Total: 736, NEW: 0, RUNNABLE: 81, BLOCKED: 0, WAITING: 634, TIMED_WAITING: 21, TERMINATED: 0                                                            ID     NAME                GROUP        PRIORI STATE %CPU   TIME   INTER DAEMON 11     Log4j2-TF-2-AsyncLo main         5      TIMED 39     138:46 false true   17     Log4j2-TF-5-AsyncLo main         5      TIMED 37     137:27 false true   1045   as-command-execute- system       10     RUNNA 19     0:0    false true   1027   AsyncAppender-Worke system       9      WAITI 0      0:0    false true   68     AsyncResolver-boots main         5      TIMED 0      0:0    false true   264    AsyncResolver-boots main         5      WAITI 0      0:0    false true   1025   Attach Listener     system       9      RUNNA 0      0:0    false true   9      Common-Cleaner      InnocuousThr 8      TIMED 0      0:0    false true   151    DataPublisher       main         5      TIMED 0      0:2    false true   107    DestroyJavaVM       main         5      RUNNA 0      0:37   false false  69     DiscoveryClient-0   main         5      TIMED 0      0:1    false true   70     DiscoveryClient-1   main         5      WAITI 0      0:1    false true   748    DiscoveryClient-Cac main         5      WAITI 0      0:9    false true   751    DiscoveryClient-Hea main         5      WAITI 0      0:7    false true   71     DiscoveryClient-Ins main         5      TIMED 0      0:0    false true   24     Druid-ConnectionPoo main         5      WAITI 0      0:0    false true   21     Druid-ConnectionPoo main         5      WAITI 0      0:0    false true   25     Druid-ConnectionPoo main         5      TIMED 0      0:0    false true   22     Druid-ConnectionPoo main         5      TIMED 0      0:0    false true   20     Druid-ConnectionPoo main         5      TIMED 0      0:0    false true   67     Eureka-JerseyClient main         5      WAITI 0      0:0    false true   3      Finalizer           system       8      WAITI 0      0:0    false true   119    ForkJoinPool.common main         5      WAITI 0      0:9    false true   30     I/O dispatcher 1    main         5      RUNNA 0      0:2    false false  155    NFLoadBalancer-Ping main         5      TIMED 0      0:0    false true   150    PollingServerListUp main         5      WAITI 0      0:0    false true   157    PollingServerListUp main         5      WAITI 0      0:0    false true   2      Reference Handler   system       10     RUNNA 0      0:0    false true   146    RibbonApacheHttpCli main         5      TIMED 0      0:0    false true   135    RxComputationSchedu main         5      TIMED 0      1:44   false true   132    RxIoScheduler-1 (Ev main         5      TIMED 0      0:0    false true   4      Signal Dispatcher   system       9      RUNNA 0      0:0    false true   114    SimplePauseDetector main         5      TIMED 0      2:10   false true   115    SimplePauseDetector main         5      TIMED 0      2:10   false true   116    SimplePauseDetector main         5      TIMED 0      2:12   false true

发现关键字PollingServerListUp有两个线程,150和157,分别看看他们在干什么:

[arthas@24]$ thread 150thread 150"PollingServerListUpdater-0" Id=150 WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@778af81b    at java.base@11.0.4/jdk.internal.misc.Unsafe.park(Native Method)    -  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@778af81b    at java.base@11.0.4/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)    at java.base@11.0.4/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2081)    at java.base@11.0.4/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1170)    at java.base@11.0.4/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)    at java.base@11.0.4/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)    at java.base@11.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)    at java.base@11.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)    at java.base@11.0.4/java.lang.Thread.run(Thread.java:834)Affect(row-cnt:0) cost in 322 ms.[arthas@24]$ thread 157thread 157"PollingServerListUpdater-1" Id=157 WAITING on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@3adcda2    at java.base@11.0.4/jdk.internal.misc.Unsafe.park(Native Method)    -  waiting on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@3adcda2    at java.base@11.0.4/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)    at java.base@11.0.4/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)    at java.base@11.0.4/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917)    at java.base@11.0.4/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240)    at java.base@11.0.4/java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:959)    at com.netflix.loadbalancer.BaseLoadBalancer.setServersList(BaseLoadBalancer.java:475)    at com.netflix.loadbalancer.DynamicServerListLoadBalancer.setServersList(DynamicServerListLoadBalancer.java:156)    at com.netflix.loadbalancer.DynamicServerListLoadBalancer.updateAllServerList(DynamicServerListLoadBalancer.java:267)    at com.netflix.loadbalancer.DynamicServerListLoadBalancer.updateListOfServers(DynamicServerListLoadBalancer.java:250)    at com.netflix.loadbalancer.DynamicServerListLoadBalancer$1.doUpdate(DynamicServerListLoadBalancer.java:62)    at com.netflix.loadbalancer.PollingServerListUpdater$1.run(PollingServerListUpdater.java:116)    at java.base@11.0.4/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)    at java.base@11.0.4/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)    at java.base@11.0.4/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)    at java.base@11.0.4/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)    at java.base@11.0.4/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)    at java.base@11.0.4/java.lang.Thread.run(Thread.java:834)    Number of locked synchronizers = 1    - java.util.concurrent.ThreadPoolExecutor$Worker@6ab1c5f9

发现PollingServerListUpdater-1一直等待获取锁,看来问题就在这里了,查看对应的Ribbon代码,发现:

  1. PollingServerListUpdater-1需要获取allServerLock的写锁

  2. allServerLock的读锁,只有runPinger(对于每个实例ping,需要读取实例列表)获取。

那么是不是Ping出了问题,我们看下对应的获取锁的代码BaseLoadBalancer

public void runPinger() throws Exception {    //省略无用代码    allLock = allServerLock.readLock();    allLock.lock();    allServers = allServerList.toArray(new Server[allServerList.size()]);    allLock.unlock();

我们看到,这里的代码,并没有用try{lock} finally {unlock}的套路,如果中间代码异常,则会锁不能释放。 并且这里是重入锁,lock多少次就要unlock多少次,少一次,其他线程都获取不到。

allServers = allServerList.toArray(new Server[allServerList.size()]);并没有涉及任何IO,只是数据变换,也没有空指针异常。猜想是发生了OOM异常,导致内存没有分配。检查日志,果然发现了OOM。

这件事告诉我们,对于锁,一定要try{lock} finally {unlock}。就算代码不会抛出任何异常,发生OOM时,也有可能导致锁不能释放

关于由于OOM导致锁没有释放的定位问题是怎么就觉得问题的解答就分享到这里了,希望以上内容可以对大家有一定的帮助,如果你还有很多疑惑没有解开,可以关注行业资讯频道了解更多相关知识。

问题 代码 实例 线程 网络 服务 配置 定位 日志 更多 篇文章 帮助 更新 猜想 解答 易行 谢天谢地 接下来 简单易行 两个 数据库的安全要保护哪些东西 数据库安全各自的含义是什么 生产安全数据库录入 数据库的安全性及管理 数据库安全策略包含哪些 海淀数据库安全审计系统 建立农村房屋安全信息数据库 易用的数据库客户端支持安全管理 连接数据库失败ssl安全错误 数据库的锁怎样保障安全 本溪会员报单软件开发 通过哪些数据库可以获取学位论文 软件开发公司你员配备 网络安全方案设计与实施论文 网络技术史 检索效率低的数据库 山西专业软件开发价格 应城学软件开发 社会如何保障公民网络安全 软件开发有什么电脑好 科技互联网生活英语作文 中兴软件开发综合面试会挂么 香港恒生互联网科技指数简介 盗墓笔记服务器进不去 数据库查询姓名与出生日期 怎么根据软件开发脚本 数据库判断成绩 数据库天数加一 考勤管理系统数据库设计 甘肃汇宇网络技术有限公司 加州大学专利用什么数据库 河南河马网络技术怎么样 上海软件开发大连 宝鸡软件开发培训学校 网络安全法开始湿湿的时间 网络安全综合防控体系建设方案 如何设置管家婆数据库密码 国网学堂网络安全法视频 网络安全法考试能做几次 医保网络安全周活动
0