Investigating Memory Leaks, Part 1: Writing Leaky Code

This article was translated by ImportNew - 黄索远 from captaindebug.


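A few days ago I came across a small problem: a server would run for a while and then fall over. Restarting the script and the system did not help: the problem kept coming back. Because the code in question was not business critical, the problem was not serious, even though a significant amount of data was lost. Still, I decided to investigate further and find out where the problem really lay. The first thing to note is that the server passed all of its unit tests and a full set of integration tests. It ran perfectly well in the test environment with test data. So why did it fail as soon as it ran in production? The obvious guess is that the production load is heavier than the test load, perhaps heavier than the design allows for, so the server runs out of some resource. But which resource, and where does it run out? That is the puzzle this article sets out to explore.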

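To demonstrate how to investigate this kind of problem, the first thing to do is write some leaky code. I'm going to use the producer-consumer pattern because it illustrates the problem well.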

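As usual, to demonstrate leaky code I need a contrived scenario. In this scenario, imagine that you work for a stockbroker that records sales of stocks and shares in a database. One simple thread takes orders and places them on a queue. A second thread reads orders from that queue and writes them to the database. The Order POJO (plain old Java object) is very straightforward: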

import java.util.Arrays;

public class Order {
 
  private final int id;
 
  private final String code;
 
  private final int amount;
 
  private final double price;
 
  private final long time;
 
  private final long[] padding;
 
  /**
   * @param id
   *            The order id
   * @param code
   *            The stock code
   * @param amount
   *            the number of shares
   * @param price
   *            the price of the share
   * @param time
   *            the transaction time
   */
  public Order(int id, String code, int amount, double price, long time) {
    super();
    this.id = id;
    this.code = code;
    this.amount = amount;
    this.price = price;
    this.time = time;
    // This just makes the Order object bigger so that
    // the example runs out of heap more quickly.
    this.padding = new long[3000];
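    // Fill the whole array; the original range stopped one element short
    // because the end index of Arrays.fill is exclusive.
    Arrays.fill(padding, -2L);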
  }
 
  public int getId() {
    return id;
  }
 
  public String getCode() {
    return code;
  }
 
  public int getAmount() {
    return amount;
  }
 
  public double getPrice() {
    return price;
  }
 
  public long getTime() {
    return time;
  }
 
}

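The Order POJO is part of a Spring application. The application has three key abstractions, each of which creates a new thread when Spring calls its start() method.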

The first of these is OrderFeed. Its run() method creates a dummy order and places it on the queue. It then sleeps for a moment before creating the next order.

import java.util.Random;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class OrderFeed implements Runnable {

  private static Random rand = new Random();

  private static int id = 0;

  private final BlockingQueue<Order> orderQueue;

  public OrderFeed(BlockingQueue<Order> orderQueue) {
    this.orderQueue = orderQueue;
  }

  /**
   * Called by Spring after loading the context. Start producing orders
   */
  public void start() {
    Thread thread = new Thread(this, "Order producer");
    thread.start();
  }

  /** The main run loop */
  @Override
  public void run() {
    while (true) {
      Order order = createOrder();
      orderQueue.add(order);
      sleep();
    }
  }

  private Order createOrder() {
    final String[] stocks = { "BLND.L", "DGE.L", "MKS.L", "PSON.L", "RIO.L", "PRU.L",
        "LSE.L", "WMH.L" };
    int next = rand.nextInt(stocks.length);
    long now = System.currentTimeMillis();
    Order order = new Order(++id, stocks[next], next * 100, next * 10, now);
    return order;
  }

  private void sleep() {
    try {
      TimeUnit.MILLISECONDS.sleep(100);
    } catch (InterruptedException e) {
      e.printStackTrace();
    }
  }
}

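The second class is OrderRecord, which takes orders from the queue and writes them to the database. The problem is that writing an order to the database takes considerably longer than producing one. To demonstrate this, the recordOrder() method sleeps for 1 second.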

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class OrderRecord implements Runnable {

  private final BlockingQueue<Order> orderQueue;

  public OrderRecord(BlockingQueue<Order> orderQueue) {
    this.orderQueue = orderQueue;
  }

  public void start() {
    Thread thread = new Thread(this, "Order Recorder");
    thread.start();
  }

  @Override
  public void run() {
    while (true) {
      try {
        Order order = orderQueue.take();
        recordOrder(order);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
    }
  }

  /**
   * Record the order in the database
   *
   * This is a dummy method
   *
   * @param order
   *            The order
   * @throws InterruptedException
   */
  public void recordOrder(Order order) throws InterruptedException {
    TimeUnit.SECONDS.sleep(1);
  }
}

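The result is obvious: the OrderRecord thread cannot keep up with the rate at which orders are produced, so the queue grows longer and longer until the JVM runs out of heap space and crashes. This is the big problem with the producer-consumer pattern: the consumer has to keep up with the producer.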
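To put rough numbers on that growth, here is a back-of-the-envelope sketch using only the figures from the listings: one order every 100 ms going in, one order per second coming out, and a long[3000] padding array per Order. The class name QueueGrowthEstimate, its method names, and the 256 MB heap size are assumptions made for illustration, and the estimate ignores object headers and everything else living on the heap.

```java
class QueueGrowthEstimate {

  /** Orders added to the backlog each second: orders produced minus orders consumed. */
  static double netOrdersPerSecond(long producerSleepMillis, long consumerSleepMillis) {
    double produced = 1000.0 / producerSleepMillis;
    double consumed = 1000.0 / consumerSleepMillis;
    return produced - consumed;
  }

  /** Approximate size of one Order: the padding array dominates everything else. */
  static long approxOrderBytes(int paddingLength) {
    return (long) paddingLength * Long.BYTES;
  }

  /** Seconds until the backlog alone would fill a heap of the given size. */
  static double secondsToFill(long heapBytes, double netPerSecond, long bytesPerOrder) {
    return heapBytes / (netPerSecond * bytesPerOrder);
  }

  public static void main(String[] args) {
    double net = netOrdersPerSecond(100, 1000); // 10 in, 1 out
    long bytes = approxOrderBytes(3000);        // 3000 longs of 8 bytes each
    long heap = 256L * 1024 * 1024;             // hypothetical 256 MB heap
    System.out.printf("net growth: %.1f orders/s%n", net);
    System.out.printf("approx bytes per order: %d%n", bytes);
    System.out.printf("seconds to fill the heap: %.0f%n", secondsToFill(heap, net, bytes));
  }
}
```

With these inputs the backlog grows by nine orders a second at roughly 24,000 bytes each, so a heap of that size would last on the order of twenty minutes.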

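To prove the point, I've added a third class, OrderQueueMonitor. Every couple of seconds it prints the size of the queue so that you can watch the problem develop at run time.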

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class OrderQueueMonitor implements Runnable {

  private final BlockingQueue<Order> orderQueue;

  public OrderQueueMonitor(BlockingQueue<Order> orderQueue) {
    this.orderQueue = orderQueue;
  }

  public void start() {
    Thread thread = new Thread(this, "Order Queue Monitor");
    thread.start();
  }

  @Override
  public void run() {
    while (true) {
      try {
        TimeUnit.SECONDS.sleep(2);
        int size = orderQueue.size();
        System.out.println("Queue size is:" + size);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
    }
  }
}

To complete the Spring setup, I've added an application context, shown below:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
  xmlns:p="http://www.springframework.org/schema/p"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:context="http://www.springframework.org/schema/context"
  xsi:schemaLocation="http://www.springframework.org/schema/beans
    http://www.springframework.org/schema/beans/spring-beans.xsd
    http://www.springframework.org/schema/context
    http://www.springframework.org/schema/context/spring-context-3.1.xsd"
  default-init-method="start"
  default-destroy-method="destroy">

  <bean id="theQueue"/>

  <bean id="orderProducer">
    <constructor-arg ref="theQueue"/>
  </bean>

  <bean id="OrderRecorder">
    <constructor-arg ref="theQueue"/>
  </bean>

  <bean id="QueueMonitor">
    <constructor-arg ref="theQueue"/>
  </bean>

</beans>
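The context does little more than create one shared queue, pass it to each bean's constructor, and call each bean's start() method through default-init-method. As a rough, Spring-free sketch of the same wiring, the code below runs a fast producer and a slow consumer against one queue and reports the backlog. It assumes the shared queue is an unbounded LinkedBlockingQueue (the context does not name the implementation), uses a bare long[3000] as a stand-in for Order, and compresses the timings to 10 ms and 100 ms to keep the article's 10:1 ratio. BacklogDemo and runFor are hypothetical names.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BacklogDemo {

  /** Runs producer and consumer for runMillis and returns the backlog left on the queue. */
  static int runFor(long runMillis, long produceEveryMillis, long consumeEveryMillis) {
    // Assumption: an unbounded queue, so add() never fails and the backlog grows freely.
    BlockingQueue<long[]> queue = new LinkedBlockingQueue<>();

    Thread producer = new Thread(() -> {
      try {
        while (true) {
          queue.add(new long[3000]); // stand-in for an Order and its padding
          TimeUnit.MILLISECONDS.sleep(produceEveryMillis);
        }
      } catch (InterruptedException e) {
        // interrupted by runFor(): stop producing
      }
    }, "Order producer");

    Thread consumer = new Thread(() -> {
      try {
        while (true) {
          queue.take();
          TimeUnit.MILLISECONDS.sleep(consumeEveryMillis); // the slow "database write"
        }
      } catch (InterruptedException e) {
        // interrupted by runFor(): stop consuming
      }
    }, "Order Recorder");

    producer.setDaemon(true);
    consumer.setDaemon(true);
    producer.start();
    consumer.start();
    try {
      TimeUnit.MILLISECONDS.sleep(runMillis);
      producer.interrupt();
      consumer.interrupt();
      producer.join();
      consumer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("demo interrupted", e);
    }
    return queue.size();
  }

  public static void main(String[] args) {
    System.out.println("Backlog after 1 second: " + runFor(1000, 10, 100));
  }
}
```

Unlike the article's threads, which loop forever, the sketch interrupts both threads after a fixed time so that it terminates and the backlog can be read off.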

The next step is to run the leaky code. Change to the following directory:

<your-path>/git/captaindebug/producer-consumer/target/classes

Then type the following command:

java -cp /path-to/spring-beans-3.2.3.RELEASE.jar:/path-to/spring-context-3.2.3.RELEASE.jar:/path-to/spring-core-3.2.3.RELEASE.jar:/path-to/slf4j-api-1.6.1-javadoc.jar:/path-to/commons-logging-1.1.1.jar:/path-to/spring-expression-3.2.3.RELEASE.jar:. com.captaindebug.producerconsumer.problem.Main

Here "path-to" stands for the directory that contains your jar files.

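One of the more annoying things about Java is how hard it is to run a program from the command line: you have to work out the classpath, the options, the properties that need setting, and where the main class lives. Surely it should be possible to type just java followed by the program name and have the JVM work everything else out, especially with sensible defaults. How hard can it be?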

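You can also monitor the leaky application by attaching JConsole. If you are running the application remotely, you will need to add the following options to the command line above (choosing your own port number):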

-Dcom.sun.management.jmxremote

-Dcom.sun.management.jmxremote.port=9010

-Dcom.sun.management.jmxremote.local.only=false

-Dcom.sun.management.jmxremote.authenticate=false

-Dcom.sun.management.jmxremote.ssl=false

If you look at the amount of heap in use, you will see it gradually increase as the queue gets bigger.
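The heap figure that JConsole plots is also available inside the JVM through the standard java.lang.management API, so it could be logged next to the queue size. The sketch below is one way to do that and is not part of the original application: HeapSnapshot and its method names are hypothetical, and the line format simply mirrors the monitor's "Queue size is:" output.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

class HeapSnapshot {

  /** Heap currently in use, in bytes, as reported by the platform MemoryMXBean. */
  static long usedHeapBytes() {
    MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    MemoryUsage heap = memory.getHeapMemoryUsage();
    return heap.getUsed();
  }

  /** One log line combining the queue size with the heap in use. */
  static String describe(int queueSize) {
    long usedMb = usedHeapBytes() / (1024 * 1024);
    return "Queue size is:" + queueSize + ", heap used: " + usedMb + " MB";
  }

  public static void main(String[] args) {
    // In the real application the argument would be orderQueue.size().
    System.out.println(describe(0));
  }
}
```

Logged every couple of seconds, the two numbers should rise together, although the heap figure will dip whenever the garbage collector reclaims memory that is not held by the queue.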

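If a kilobyte of memory leaks away you will probably never notice; if a gigabyte leaks, the problem is obvious. So all that is left to do for now is sit back and wait for some memory to leak before moving on to the next stage of the investigation. More on that next time...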

Original article: captaindebug. Translation: ImportNew.com - 黄索远
Translation link: http://www.importnew.com/7807.html