Java 8 VM GC Tuning Guide Chapters 3-4

Chapter 3 Generations

One strength of the Java SE platform is that it shields the developer from the complexity of memory allocation and garbage collection. However, when garbage collection is the principal bottleneck, it is useful to understand some aspects of this hidden implementation. Garbage collectors make assumptions about the way applications use objects, and these are reflected in tunable parameters that can be adjusted for improved performance without sacrificing the power of the abstraction.

An object is considered garbage when it can no longer be reached from any pointer in the running program. The most straightforward garbage collection algorithms iterate over every reachable object. Any objects left over are considered garbage. The time this approach takes is proportional to the number of live objects, which is prohibitive for large applications maintaining lots of live data.
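
The reachability rule above can be illustrated with a minimal sketch. The `Node` class and the `mark` and `garbage` helpers are hypothetical names invented for this example; a real collector walks raw heap objects rather than Java lists. The point is only that the work done is proportional to the number of live objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical toy heap object holding references to other objects.
final class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node(String name) { this.name = name; }
}

final class ReachabilitySketch {
    // Visits every object reachable from the roots exactly once.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {          // first visit: follow its references
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    // Anything that was allocated but never marked is garbage.
    static List<Node> garbage(List<Node> allocated, List<Node> roots) {
        Set<Node> live = mark(roots);
        List<Node> dead = new ArrayList<>();
        for (Node n : allocated) {
            if (!live.contains(n)) dead.add(n);
        }
        return dead;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);                    // a -> b, reachable from the root a
        c.refs.add(d);
        d.refs.add(c);                    // c <-> d form a cycle nobody points to
        for (Node n : garbage(Arrays.asList(a, b, c, d), Arrays.asList(a))) {
            System.out.println("garbage: " + n.name);
        }
    }
}
```

Note that the unreachable cycle between `c` and `d` is still collected, because liveness is decided by reachability from the roots and not by reference counts.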

The virtual machine incorporates a number of different garbage collection algorithms that are combined using generational collection. While naive garbage collection examines every live object in the heap, generational collection exploits several empirically observed properties of most applications to minimize the work required to reclaim unused (garbage) objects. The most important of these observed properties is the weak generational hypothesis, which states that most objects survive for only a short period of time.

The blue area in Figure 3-1, "Typical Distribution for Lifetimes of Objects" is a typical distribution for the lifetimes of objects. The x-axis is object lifetimes measured in bytes allocated. The byte count on the y-axis is the total bytes in objects with the corresponding lifetime. The sharp peak at the left represents objects that can be reclaimed (in other words, have "died") shortly after being allocated. Iterator objects, for example, are often alive for the duration of a single loop.

Figure 3-1 Typical Distribution for Lifetimes of Objects

Some objects do live longer, and so the distribution stretches out to the right. For instance, there are typically some objects allocated at initialization that live until the process exits. Between these two extremes are objects that live for the duration of some intermediate computation, seen here as the lump to the right of the initial peak. Some applications have very different looking distributions, but a surprisingly large number possess this general shape. Efficient collection is made possible by focusing on the fact that a majority of objects "die young."

To optimize for this scenario, memory is managed in generations (memory pools holding objects of different ages). Garbage collection occurs in each generation when the generation fills up. The vast majority of objects are allocated in a pool dedicated to young objects (the young generation), and most objects die there. When the young generation fills up, it causes a minor collection in which only the young generation is collected; garbage in other generations is not reclaimed. Minor collections can be optimized, assuming that the weak generational hypothesis holds and most objects in the young generation are garbage and can be reclaimed. The costs of such collections are, to the first order, proportional to the number of live objects being collected; a young generation full of dead objects is collected very quickly. Typically, some fraction of the surviving objects from the young generation are moved to the tenured generation during each minor collection. Eventually, the tenured generation will fill up and must be collected, resulting in a major collection, in which the entire heap is collected. Major collections usually last much longer than minor collections because a significantly larger number of objects are involved.

As noted in the section Ergonomics, ergonomics selects the garbage collector dynamically to provide good performance on a variety of applications. The serial garbage collector is designed for applications with small data sets, and its default parameters were chosen to be effective for most small applications. The parallel or throughput garbage collector is meant to be used with applications that have medium to large data sets. The heap size parameters selected by ergonomics plus the features of the adaptive size policy are meant to provide good performance for server applications. These choices work well in most, but not all, cases, which leads to the central tenet of this document:

If garbage collection becomes a bottleneck, you will most likely have to customize the total heap size as well as the sizes of the individual generations. Check the verbose garbage collector output and then explore the sensitivity of your individual performance metric to the garbage collector parameters.

Figure 3-2, "Default Arrangement of Generations, Except for Parallel Collector and G1" shows the default arrangement of generations (for all collectors with the exception of the parallel collector and G1):

Figure 3-2 Default Arrangement of Generations, Except for Parallel Collector and G1

At initialization, a maximum address space is virtually reserved but not allocated to physical memory unless it is needed. The complete address space reserved for object memory can be divided into the young and tenured generations.

The young generation consists of eden and two survivor spaces. Most objects are initially allocated in eden. One survivor space is empty at any time, and serves as the destination of any live objects in eden; the other survivor space is the destination during the next copying collection. Objects are copied between survivor spaces in this way until they are old enough to be tenured (copied to the tenured generation).

Copying into a survivor space that is always empty keeps the surviving objects compact and avoids freeing objects one by one: a collection only has to copy the live objects contiguously into the empty space and then treat the whole source space as empty, which is cheap.
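
The copying scheme described above can be sketched as a toy model. Everything here is an assumption made for illustration: the class and field names are hypothetical, capacity is counted in objects rather than bytes, liveness is a plain flag, and an object is tenured once its age reaches a fixed threshold. It is not how HotSpot is implemented.

```java
import java.util.ArrayList;
import java.util.List;

final class MinorCollectionSketch {
    // Hypothetical toy object: an age (collections survived) and a liveness flag.
    static final class Obj {
        int age;
        boolean live;
        Obj(boolean live) { this.live = live; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();      // survivor space currently holding objects
    List<Obj> to = new ArrayList<>();        // survivor space kept empty between collections
    final List<Obj> tenured = new ArrayList<>();
    final int survivorCapacity;              // counted in objects to keep the model small
    final int tenuringThreshold;

    MinorCollectionSketch(int survivorCapacity, int tenuringThreshold) {
        this.survivorCapacity = survivorCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    void minorCollect() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!o.live) continue;           // dead objects are simply never copied
            o.age++;
            if (o.age >= tenuringThreshold || to.size() >= survivorCapacity) {
                tenured.add(o);              // old enough, or the survivor space overflowed
            } else {
                to.add(o);                   // copied compactly into the empty space
            }
        }
        eden.clear();                        // both source spaces are emptied wholesale
        from.clear();
        List<Obj> tmp = from;                // the two survivor spaces swap roles
        from = to;
        to = tmp;
    }
}
```

With a threshold of 2, an object that is still live after two minor collections moves to the tenured list, and a survivor space that is too small pushes objects there early.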

1. Performance Considerations

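There are two primary measures of garbage collection performance. Throughput is the percentage of total run time that is not spent in garbage collection. Pause time is how long the application is stopped while garbage collection runs.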
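
As a rough illustration, throughput can be estimated from inside a running JVM, taking throughput as the share of elapsed time not spent in garbage collection. The sketch below uses the standard `java.lang.management` beans; the class and method names are made up for this example, and the collection times reported by the beans are approximate, so the result is an estimate only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class ThroughputSketch {
    // Fraction of elapsed time that was NOT spent in garbage collection.
    static double throughput(long gcMillis, long totalMillis) {
        if (totalMillis <= 0) {
            throw new IllegalArgumentException("totalMillis must be positive");
        }
        return 1.0 - (double) gcMillis / totalMillis;
    }

    // Sums the accumulated collection time over all collectors in this JVM.
    static long totalGcMillis() {
        long sum = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();   // -1 when the collector does not report it
            if (t > 0) sum += t;
        }
        return sum;
    }

    public static void main(String[] args) {
        long uptime = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        System.out.printf("estimated GC throughput: %.4f%n",
                throughput(totalGcMillis(), uptime));
    }
}
```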

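Different users have different requirements of garbage collection. For a web server, throughput is usually the metric to prioritize, because an occasional long pause can be tolerated or is simply hidden by network latency. In a highly interactive application, however, even a short pause can hurt the user experience.

Some users also care about other measures. Footprint is the working set of a process, usually measured in pages and cache lines. On systems with limited physical memory or many processes, footprint has to be weighed carefully.

Promptness is the time between an object becoming unreachable (dead) and its memory being reclaimed. This measure matters for distributed systems, for example those using Java RMI.

In general, choosing the size of each generation is a trade-off among these measures. For example, a very large young generation can significantly improve throughput, but at the cost of a larger footprint, worse promptness, and longer pauses. Conversely, a small young generation shortens young generation pauses at the expense of throughput. The size of one generation does not affect the collection frequency or pause times of another generation.

There is no single right answer: size each generation according to how the application actually uses memory.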

2. Measurement

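The JVM option -verbose:gc prints a line of information at every collection: the heap occupancy before and after the collection, the total heap size in parentheses, and the time the collection took.

In this output, lines marked GC are minor collections of the young generation and lines marked Full GC are major collections of the whole heap.

The total in parentheses counts only one survivor space, because the other one is always empty.

The -XX:+PrintGCDetails option prints more detailed information about each collection, for example: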

[GC [PSYoungGen: 76256K->10745K(141824K)] 95363K->41563K(315392K), 0.0073975 secs] [Times: user=0.09 sys=0.02, real=0.01 secs]

Adding -XX:+PrintGCTimeStamps prefixes each collection with a time stamp, which shows how frequently collections occur. For example:

1.617: [GC [PSYoungGen: 76268K->10743K(76288K)] 78162K->19282K(249856K), 0.0110779 secs] [Times: user=0.11 sys=0.00, real=0.01 secs]

Here 1.617 is the time, in seconds, since the program started.
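
To make the fields of these log lines concrete, here is a minimal parsing sketch. The `YoungGcLine` class and its field names are hypothetical, and the regular expression is written only for the two PSYoungGen sample lines shown above; other collectors and JVM versions print different formats. The `promotedK` calculation rests on the assumption that whatever left the young generation without leaving the heap was moved to the tenured generation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that understands only the sample line format shown above.
final class YoungGcLine {
    private static final Pattern LINE = Pattern.compile(
        "^(?:(\\d+\\.\\d+): )?\\[GC \\[PSYoungGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\] "
        + "(\\d+)K->(\\d+)K\\((\\d+)K\\), (\\d+\\.\\d+) secs\\]");

    final double uptimeSeconds;   // NaN when the line carries no time stamp
    final long youngBeforeK, youngAfterK, youngTotalK;
    final long heapBeforeK, heapAfterK, heapTotalK;
    final double pauseSeconds;

    private YoungGcLine(Matcher m) {
        uptimeSeconds = m.group(1) == null ? Double.NaN : Double.parseDouble(m.group(1));
        youngBeforeK = Long.parseLong(m.group(2));
        youngAfterK = Long.parseLong(m.group(3));
        youngTotalK = Long.parseLong(m.group(4));
        heapBeforeK = Long.parseLong(m.group(5));
        heapAfterK = Long.parseLong(m.group(6));
        heapTotalK = Long.parseLong(m.group(7));
        pauseSeconds = Double.parseDouble(m.group(8));
    }

    static YoungGcLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("unrecognized GC line: " + line);
        }
        return new YoungGcLine(m);
    }

    // Space freed in the young generation minus space freed in the whole heap.
    long promotedK() {
        return (youngBeforeK - youngAfterK) - (heapBeforeK - heapAfterK);
    }

    public static void main(String[] args) {
        YoungGcLine gc = parse("1.617: [GC [PSYoungGen: 76268K->10743K(76288K)] "
            + "78162K->19282K(249856K), 0.0110779 secs]");
        System.out.println("promoted to tenured: " + gc.promotedK() + "K");
    }
}
```

For the time-stamped sample line, the young generation shrank by 65525K while the whole heap shrank by 58880K, so about 6645K was promoted during that collection.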

Chapter 4 Sizing the Generations

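In the figure above, committed space is the part of the heap that is actually in use by the VM, while virtual space is reserved but not yet used. Both belong to the memory the VM has already requested from the operating system, so in principle the VM already owns them.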

By default, the virtual machine grows or shrinks the heap at each collection to try to keep the proportion of free space to live objects at each collection within a specific range. This target range is set as a percentage by the parameters -XX:MinHeapFreeRatio=<minimum> and -XX:MaxHeapFreeRatio=<maximum>, and the total size is bounded below by -Xms<min> and above by -Xmx<max>.

-XX:MinHeapFreeRatio=<minimum>

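Sets the minimum percentage of free space in a generation (the default on 64-bit Solaris is 40). If free space falls below this percentage, the generation is expanded to restore it, up to the generation's maximum size.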

-XX:MaxHeapFreeRatio=<maximum>

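The counterpart of the option above: sets the maximum percentage of free space in a generation (default 70). If free space exceeds this percentage, the generation is shrunk so that not too much memory sits idle.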

-Xms<min>

Minimum heap size. The value accepts unit suffixes such as K and M.

-Xmx<max>

Maximum heap size. The value accepts unit suffixes such as K and M.
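
The effect of these four options can be observed from inside the JVM. The sketch below reads the heap bounds through the standard `Runtime` API and applies a deliberately simplified version of the free-ratio rule. The `HeapBoundsSketch` class and its `resizeDecision` method are hypothetical; the real policy works per generation after each collection and also respects size alignment and the -Xms and -Xmx bounds.

```java
final class HeapBoundsSketch {
    // Percentage of the committed heap that is currently free.
    static double freePercent(long committedBytes, long freeBytes) {
        return 100.0 * freeBytes / committedBytes;
    }

    // Simplified reading of MinHeapFreeRatio and MaxHeapFreeRatio.
    static String resizeDecision(double freePercent, int minFreeRatio, int maxFreeRatio) {
        if (freePercent < minFreeRatio) return "expand";
        if (freePercent > maxFreeRatio) return "shrink";
        return "keep";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();          // upper bound, roughly what -Xmx allows
        long committed = rt.totalMemory();  // heap currently committed by the VM
        long free = rt.freeMemory();        // unused part of the committed heap
        double pct = freePercent(committed, free);
        System.out.printf("max=%dK committed=%dK free=%dK (%.1f%% free) -> %s%n",
                max / 1024, committed / 1024, free / 1024, pct,
                resizeDecision(pct, 40, 70));
    }
}
```

Running it with different settings, for example `java -Xms64m -Xmx256m HeapBoundsSketch`, shows how the committed size starts near the -Xms value and stays below the -Xmx bound.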

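For example, when Tomcat is launched from Eclipse with a maximum heap size that is too small, the startup log shows the VM collecting garbage frantically to stay within the limit, and the run still ends with an out-of-memory error.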

The Young Generation

The bigger the young generation, the less often minor collections occur. However, for a bounded heap size, a larger young generation implies a smaller tenured generation, which will increase the frequency of major collections. The optimal choice depends on the lifetime distribution of the objects allocated by the application.

By default, the young generation size is controlled by the parameter NewRatio. For example, setting -XX:NewRatio=3 means that the ratio between the young and tenured generation is 1:3. In other words, the combined size of the eden and survivor spaces will be one-fourth of the total heap size.

The parameters NewSize and MaxNewSize bound the young generation size from below and above. Setting these to the same value fixes the young generation, just as setting -Xms and -Xmx to the same value fixes the total heap size. This is useful for tuning the young generation at a finer granularity than the integral multiples allowed by NewRatio.
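
The sizing rules above can be written out as a small calculation. This is a sketch under simplifying assumptions: `YoungSizeSketch` is a hypothetical name, sizes are plain byte counts, and the alignment and ergonomic adjustments that a real VM applies are ignored.

```java
final class YoungSizeSketch {
    // young : tenured = 1 : newRatio, so the young generation gets heap / (newRatio + 1),
    // then the result is clamped to the [newSize, maxNewSize] bounds.
    static long youngBytes(long heapBytes, int newRatio, long newSize, long maxNewSize) {
        long byRatio = heapBytes / (newRatio + 1);
        return Math.max(newSize, Math.min(byRatio, maxNewSize));
    }

    // Whatever the young generation does not take is left for the tenured generation.
    static long tenuredBytes(long heapBytes, long youngBytes) {
        return heapBytes - youngBytes;
    }

    public static void main(String[] args) {
        long heap = 1024L * 1024 * 1024;   // a 1 GB heap, chosen only for illustration
        long young = youngBytes(heap, 3, 0, Long.MAX_VALUE);
        System.out.println("young=" + young + " tenured=" + tenuredBytes(heap, young));
    }
}
```

With NewRatio=3 and no effective bounds, a 1 GB heap yields a 256 MB young generation, which matches the one-fourth figure in the text. Setting both bounds to the same value overrides the ratio entirely.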

Default values:

-XX:NewRatio=2

-XX:NewSize=1310M

-XX:MaxNewSize=(depends on system capacity)

Survivor Space Sizing

You can use the parameter SurvivorRatio to tune the size of the survivor spaces, but this is often not important for performance. For example, -XX:SurvivorRatio=6 sets the ratio between a survivor space and eden to 1:6. In other words, each survivor space will be one-sixth the size of eden, and thus one-eighth the size of the young generation (not one-seventh, because there are two survivor spaces).

Note that in these ratio options the value you set is always the denominator: NewRatio=3 means young:tenured is 1:3, and SurvivorRatio=6 means survivor:eden is 1:6.
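
The survivor arithmetic can be checked with a few lines of code. The class below is a hypothetical helper that ignores the alignment a real VM applies: with eden being SurvivorRatio times one survivor space and two survivor spaces in total, each survivor space is the young generation divided by (SurvivorRatio + 2).

```java
final class SurvivorSizeSketch {
    // young = eden + 2 * survivor and eden = survivorRatio * survivor,
    // hence survivor = young / (survivorRatio + 2).
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    // Eden is what remains after both survivor spaces are carved out.
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long young = 800;   // arbitrary units, chosen so the numbers divide evenly
        System.out.println("survivor=" + survivorBytes(young, 6)
                + " eden=" + edenBytes(young, 6));
    }
}
```

For a young generation of 800 units and SurvivorRatio=6, each survivor space is 100 units (one-eighth) and eden is 600 units.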

If survivor spaces are too small, copying collection overflows directly into the tenured generation. If survivor spaces are too large, they will be uselessly empty. At each garbage collection, the virtual machine chooses a threshold number, which is the number of times an object can be copied before it is tenured. This threshold is chosen to keep the survivors half full. The command line option -XX:+PrintTenuringDistribution (not available on all garbage collectors) can be used to show this threshold and the ages of objects in the new generation. It is also useful for observing the lifetime distribution of an application.
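
One way to picture how such a threshold could be chosen is the sketch below. It is an assumption-laden illustration, not the VM's actual code: the `chooseThreshold` method, the age table layout, and the `maxThreshold` cap are all hypothetical. It adds up the bytes of surviving objects from the youngest age upward and stops at the first age where the running total would exceed half of a survivor space.

```java
final class TenuringThresholdSketch {
    // bytesByAge[i] holds the bytes of surviving objects of age i; index 0 is unused.
    static int chooseThreshold(long[] bytesByAge, long survivorCapacity, int maxThreshold) {
        long desired = survivorCapacity / 2;   // the "half full" target
        long total = 0;
        int age = 1;
        while (age < bytesByAge.length) {
            total += bytesByAge[age];
            if (total > desired) break;        // keeping this age too would overshoot
            age++;
        }
        // If everything fits, age runs past the table and only the cap applies.
        return Math.min(age, maxThreshold);
    }

    public static void main(String[] args) {
        long[] bytesByAge = {0, 300, 200, 100};
        System.out.println("threshold=" + chooseThreshold(bytesByAge, 1000, 15));
    }
}
```

When many young objects survive, the threshold drops and objects are tenured sooner; when few survive, the threshold rises toward the cap and objects stay in the survivor spaces longer.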
