Notes on a Memory-Leak Dump Analysis

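Since joining a startup I have been busier than ever, yet I have learned little. There is not much room to use my technical skills, and both work and life have been swallowed by business logic.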

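A few days ago one of our production services ran short of memory, so I asked the ops team to capture a dump and analyzed it, partly as a small consolation.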

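First, gather statistics on the large objects (85,000 bytes or more, the threshold for allocation on the Large Object Heap):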

0:000> !dumpheap -min 85000 -stat
Statistics:
              MT    Count    TotalSize Class Name
000007feec34c168        7     57734750 System.Char[]
000007feec34aee0       14    115469904 System.String
00000000013032d0      101    621925414      Free
Total 122 objects
Fragmented blocks larger than 0.5 MB:
            Addr     Size      Followed by
000000010d382018    2.8MB 000000010d645e90 System.String
000000010d971aa8    1.8MB 000000010db43530 System.Random
000000010db70bd0    1.1MB 000000010dc8e238 System.String
000000010dd2f6a8    0.7MB 000000010ddd9160 System.Random
000000010ddd92e8    1.1MB 000000010dee8d38 System.Security.Cryptography.SafeHashHandle
000000010e223090    3.0MB 000000010e51dcc8 System.Random
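
To put these numbers in proportion, here is a small Python sketch that parses the Statistics rows of the !dumpheap -stat output above and prints each class's share of the listed bytes. It assumes the four-column layout shown (MT, Count, TotalSize, Class Name); STAT_OUTPUT and parse_stat are illustrative names, not part of SOS.

```python
# Minimal sketch: parse the "Statistics:" rows of SOS !dumpheap -stat output.
# Assumes the four-column layout shown above: MT, Count, TotalSize, Class Name.
STAT_OUTPUT = """\
              MT    Count    TotalSize Class Name
000007feec34c168        7     57734750 System.Char[]
000007feec34aee0       14    115469904 System.String
00000000013032d0      101    621925414      Free
"""


def parse_stat(text):
    """Return a list of (class_name, count, total_size) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)  # MT, Count, TotalSize, remainder = class name
        if len(parts) < 4 or not parts[1].isdigit():
            continue  # skip the header line and anything that is not a data row
        rows.append((parts[3].strip(), int(parts[1]), int(parts[2])))
    return rows


rows = parse_stat(STAT_OUTPUT)
total = sum(size for _, _, size in rows)
for name, count, size in rows:
    share = 100 * size / total
    print(f"{name:<15} {count:>4} objects {size / 2**20:8.1f} MiB {share:5.1f}%")
```

On this dump the Free entries account for about 78% of the listed bytes (621,925,414 of 795,130,068), which suggests the large-object space is heavily fragmented rather than filled with live objects; the Fragmented blocks section points the same way.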

Next, look at the strings:

0:000> !dumpheap -type System.String -min 85000
         Address               MT     Size
00000004ffed5250 000007feec34aee0 12721650     
0000000501f4aec0 000007feec34aee0  1322018     
000000050208dae8 000007feec34aee0  1322022     
00000005021d0710 000007feec34aee0 12726120     
0000000502df3678 000007feec34aee0 12726124     
00000005121b3168 000007feec34aee0 12726120     
000000052001c2b0 000007feec34aee0 12721654     
0000000521053930 000007feec34aee0   732168     
00000005211b9120 000007feec34aee0   732168     
00000005216efa08 000007feec34aee0 12726124     
0000000522312978 000007feec34aee0 12726124     
00000005307564d8 000007feec34aee0  4780744     
0000000531074a50 000007feec34aee0  4780748     
0000000531503d20 000007feec34aee0 12726120     

Statistics:
              MT    Count    TotalSize Class Name
000007feec34aee0       14    115469904 System.String
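
The sizes in this listing can be turned into character counts, which also makes repeats easy to spot. The Python sketch below assumes Size = 26 + 2 × Length for strings in this 64-bit dump; the constant 26 is inferred from the dump itself (the dc output of 00000004ffed5250 further down has the length field 0x00610eec = 6,360,812, and 26 + 2 × 6,360,812 = 12,721,650), not taken from documentation. SIZES, STRING_OVERHEAD and char_count are illustrative names.

```python
from collections import Counter

# Sizes in bytes, copied from the !dumpheap -type System.String -min 85000 listing above.
SIZES = [12721650, 1322018, 1322022, 12726120, 12726124, 12726120, 12721654,
         732168, 732168, 12726124, 12726124, 4780744, 4780748, 12726120]

# Assumption inferred from this dump (64-bit .NET Framework): Size = 26 + 2 * Length.
STRING_OVERHEAD = 26


def char_count(size):
    """UTF-16 code units in a string whose SOS-reported size is `size` bytes."""
    return (size - STRING_OVERHEAD) // 2


print("total bytes:", sum(SIZES))
# Group identical sizes, largest first, so repeated payloads stand out.
for size, n in sorted(Counter(SIZES).items(), reverse=True):
    print(f"{n} x {size:>10,} bytes = {char_count(size):>9,} chars")
```

The 14 sizes add up to the 115,469,904 bytes reported under Statistics, and they come in groups (three strings of exactly 12,726,120 bytes, three of 12,726,124), which is consistent with the same cached payload being held in memory several times at once.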

Inspect one of the strings in detail:

0:000> !DumpObj /d 0000000501f4aec0
Name:        System.String
MethodTable: 000007feec34aee0
EEClass:     000007feebcb3720
Size:        1322018(0x142c22) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String:      {"SalePriceStrategyList":[{"SaleRuleID":5178,"StrategyTypeID":5,"StrategyTypeName":"标准售......

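The String preview above is truncated, so dump the raw memory of one of the largest strings with dc to read more of it:
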
0:000> dc 00000004ffed5250 L1000
00000004`ffed5250  ec34aee0 000007fe 00610eec 0022007b  ..4.......a.{.".
00000004`ffed5260  00720050 0064006f 00630075 00490074  P.r.o.d.u.c.t.I.
00000004`ffed5270  00220064 0034003a 00390037 002c0031  d.".:.4.7.9.1.,.
00000004`ffed5280  00500022 006f0072 00750064 00740063  ".P.r.o.d.u.c.t.
00000004`ffed5290  0061004e 0065006d 003a0022 4e3d0022  N.a.m.e.".:.".=N
00000004`ffed52a0  002d6c5f 683c9999 62c991cc 901a76f4  _l-...<h...b.v..
00000004`ffed52b0  00228f66 0022002c 00750053 004e0062  f.".,.".S.u.b.N.
00000004`ffed52c0  006d0061 00220065 0022003a 002c0022  a.m.e.".:.".".,.
00000004`ffed52d0  00440022 00700065 00610052 0067006e  ".D.e.p.R.a.n.g.
00000004`ffed52e0  004c0065 00730069 00220074 006e003a  e.L.i.s.t.".:.n.
00000004`ffed52f0  006c0075 002c006c 00440022 00730065  u.l.l.,.".D.e.s.
00000004`ffed5300  00520074 006e0061 00650067 0069004c  t.R.a.n.g.e.L.i.
00000004`ffed5310  00740073 003a0022 007b005b 00520022  s.t.".:.[.{.".R.
00000004`ffed5320  006e0061 00650067 00640049 003a0022  a.n.g.e.I.d.".:.
00000004`ffed5330  00330022 00220037 0022002c 00610052  ".3.7.".,.".R.a.
00000004`ffed5340  0067006e 004e0065 006d0061 00220065  n.g.e.N.a.m.e.".
00000004`ffed5350  0022003a 6c5f4e3d 002c0022 00450022  :.".=N_l".,.".E.
00000004`ffed5360  004e006e 006d0061 00220065 0022003a  n.N.a.m.e.".:.".
00000004`ffed5370  0069004c 0069006a 006e0061 00220067  L.i.j.i.a.n.g.".
00000004`ffed5380  0022002c 00610052 0067006e 00540065  ,.".R.a.n.g.e.T.
00000004`ffed5390  00700079 00220065 0031003a 002c0036  y.p.e.".:.1.6.,.
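
Because dc prints little-endian DWORDs next to an ASCII column, the UTF-16 text (notably the Chinese product name) is hard to read directly; WinDbg's du command displays memory as a Unicode string instead. Alternatively, the Python sketch below rebuilds the bytes from the first seven dc lines and decodes them, assuming the 64-bit .NET Framework layout seen here: an 8-byte MethodTable pointer, a 4-byte length, then the UTF-16 characters. DC_OUTPUT and dc_to_bytes are illustrative names.

```python
import struct

# First lines of the dc 00000004ffed5250 output above: address, four DWORDs, ASCII column.
DC_OUTPUT = """\
00000004`ffed5250  ec34aee0 000007fe 00610eec 0022007b  ..4.......a.{.".
00000004`ffed5260  00720050 0064006f 00630075 00490074  P.r.o.d.u.c.t.I.
00000004`ffed5270  00220064 0034003a 00390037 002c0031  d.".:.4.7.9.1.,.
00000004`ffed5280  00500022 006f0072 00750064 00740063  ".P.r.o.d.u.c.t.
00000004`ffed5290  0061004e 0065006d 003a0022 4e3d0022  N.a.m.e.".:.".=N
00000004`ffed52a0  002d6c5f 683c9999 62c991cc 901a76f4  _l-...<h...b.v..
00000004`ffed52b0  00228f66 0022002c 00750053 004e0062  f.".,.".S.u.b.N.
"""


def dc_to_bytes(text):
    """Rebuild the raw memory bytes from dc lines (each DWORD is little-endian)."""
    raw = bytearray()
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        for dword in fields[1:5]:  # skip the address; ignore the ASCII column
            raw += struct.pack("<I", int(dword, 16))
    return bytes(raw)


raw = dc_to_bytes(DC_OUTPUT)
# Assumed layout: 8-byte MethodTable pointer, 4-byte length, then UTF-16LE chars.
method_table, length = struct.unpack_from("<QI", raw, 0)
text = raw[12:].decode("utf-16-le")
print(hex(method_table), length)
print(text)
```

The header decodes to MethodTable 0x7feec34aee0 (matching the MT column of !dumpheap) and a length of 6,360,812 characters, and the text begins {"ProductId":4791,"ProductName":"丽江-香格里拉直通车","SubN, i.e. a serialized product record (a Lijiang to Shangri-La direct shuttle).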

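The content shows that these strings are produced when deserializing the fingerprint cache and the sales-control cache.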

Inspecting the Char[] arrays the same way gives similar results.

Moving on, analyze the threads:

0:000> !threads
ThreadCount:      1710
UnstartedThread:  1
BackgroundThread: 122
PendingThread:    0
DeadThread:       1587
Hosted Runtime:   no

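There are a great many dead threads: 1587 out of 1710 (SOS counts a thread as dead when its OS thread has exited but its managed Thread object has not yet been collected). Inspecting them the same way shows that their addresses all appear in the finalizer queue / GCHandles. My guess was that AMQ (ActiveMQ) fired a large batch of messages within a short time, the existing thread-pool threads could not be reused for them, and many extra threads were created. The call stacks confirm this: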

0:000> !clrstack
OS Thread Id: 0x5718 (42)
        Child SP               IP Call Site
0000000013e8e6c8 000000007706df6a [GCFrame: 0000000013e8e6c8] 
0000000013e8e798 000000007706df6a [HelperMethodFrame_1OBJ: 0000000013e8e798] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
0000000013e8e8b0 000007fe9302c6da Apache.NMS.ActiveMQ.Threads.DedicatedTaskRunner.Run()
0000000013e8e930 000007feec19f8a5 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
0000000013e8ea90 000007feec19f609 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
0000000013e8eac0 000007feec19f5c7 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
0000000013e8eb10 000007feec1b2d21 System.Threading.ThreadHelper.ThreadStart()
0000000013e8ee28 000007feed27f713 [GCFrame: 0000000013e8ee28] 
0000000013e8f158 000007feed27f713 [DebuggerU2MCatchHandlerFrame: 0000000013e8f158] 
0000000013e8f338 000007feed27f713 [ContextTransitionFrame: 0000000013e8f338] 
0000000013e8f528 000007feed27f713 [DebuggerU2MCatchHandlerFrame: 0000000013e8f528] 

So the likely cause is AMQ triggering a burst of fingerprint and sales-control cache updates within a short period.
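
The stack above shows the thread waiting in Monitor.ObjWait inside Apache.NMS.ActiveMQ.Threads.DedicatedTaskRunner.Run(), i.e. a thread owned by one task runner rather than borrowed from the thread pool. The Python sketch below is not the NMS code; DedicatedRunner and extra_threads are hypothetical stand-ins used to show why that pattern makes the thread count grow with the number of runners created in a burst, while a bounded ThreadPoolExecutor caps it. How many runners NMS actually creates per connection, session, or consumer is not visible in this dump and is left as an assumption.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class DedicatedRunner:
    """Hypothetical stand-in for a dedicated task runner: it owns one thread that
    waits on a condition variable (cf. Monitor.Wait) until woken or shut down."""

    def __init__(self, task):
        self._task = task
        self._cond = threading.Condition()
        self._pending = False
        self._stopped = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()  # one new thread per runner, never shared

    def wakeup(self):
        with self._cond:
            self._pending = True
            self._cond.notify()

    def shutdown(self):
        with self._cond:
            self._stopped = True
            self._cond.notify()
        self._thread.join()

    def _run(self):
        while True:
            with self._cond:
                while not self._pending and not self._stopped:
                    self._cond.wait()  # idle runner threads park here
                if self._stopped:
                    return
                self._pending = False
            self._task()


def extra_threads(burst_size, max_workers=4):
    """Extra live threads during a burst: dedicated runners vs. a bounded pool."""
    baseline = threading.active_count()
    runners = [DedicatedRunner(lambda: None) for _ in range(burst_size)]
    for r in runners:
        r.wakeup()
    dedicated = threading.active_count() - baseline  # one thread per runner
    for r in runners:
        r.shutdown()

    baseline = threading.active_count()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(lambda _: None, range(burst_size)))
        pooled = threading.active_count() - baseline  # capped by max_workers
    return dedicated, pooled


print(extra_threads(50))  # dedicated == 50; pooled is at most 4
```

With a burst of 50, the dedicated design keeps 50 extra threads alive at the same time, while the pool stays at or below max_workers. Routing work through a shared, bounded executor instead of one thread per runner is one common way to cap this kind of thread growth.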

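I had planned to start on the fixes: the large objects created by deserializing the caches, the excessive threads spawned by AMQ bursts, and checking for subscriptions that were never unsubscribed. But then the product manager came over and asked: is the business code done yet...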

Original post: https://www.cnblogs.com/LoveOfPrince/p/6032523.html