Effective C# Item 6: Distinguish Between Value Types and Reference Types

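Value types or reference types? Structs or classes? When should you use each? This is not C++, where every type is defined as a value type and you can then create references to it. Nor is it Java, where everything is a reference type. When you create a type, you must decide how every instance of it will behave. It is an important decision to get right from the start. Once made, you have to live with its consequences, because changing it later can break a lot of code in subtle ways. Picking the struct or class keyword when you design a type is simple; updating every client of the type after it changes is far more work.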

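This is not a simple either/or choice. The right choice depends on how you expect the new type to be used. Value types are not polymorphic, but they are well suited to storing the data your application manipulates. Reference types can be polymorphic and should be used to define your application's behavior. Consider the responsibilities you expect the type to have, and decide from those which kind of type to create. Structs store data; classes define behavior.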

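.NET and C# distinguish value types from reference types because of common problems in C++ and Java. In C++, all parameters and return values are passed by value. Passing by value is efficient, but it suffers from one problem: partial copying (sometimes called slicing the object). If you copy a derived object where a base object is expected, only the base portion is copied, and all knowledge that a derived object was ever there is lost. Even calls to virtual functions go to the base class version.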

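The Java language responded by more or less removing value types. In Java, all user-defined types are reference types, and all parameters and return values are passed by reference. This strategy has the advantage of consistency, but it costs performance. Let's face it: some types are not polymorphic, and were never meant to be. Java programmers pay for a heap allocation and an eventual garbage collection for every variable. They also pay extra time to dereference every variable, because all variables are references. In C#, you declare a value type with struct or a reference type with class. Value types should be small and lightweight. Reference types form your class hierarchy. This item walks through different uses of a type so that you can grasp the differences between value types and reference types.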

To start, here is a type used as the return value of a method:

private MyData _myData;
public MyData Foo()
{
return _myData;
}
// call it:
MyData v = Foo();
TotalSum += v.Value;

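If MyData is a value type, the return value is copied into the storage for v, and v lives on the stack. If MyData is a reference type, however, you have exported a reference to an internal variable and violated the principle of encapsulation (see Item 23).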
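
The difference can be seen in a minimal sketch. The names DataStruct, DataClass, and Holder are hypothetical stand-ins invented for this illustration (the text does not define MyData); the same getter pattern is applied once to a struct and once to a class.

```csharp
using System;

// Hypothetical stand-ins for MyData: one struct, one class.
public struct DataStruct { public int Value; }
public class DataClass { public int Value; }

public class Holder
{
    private DataStruct _s = new DataStruct { Value = 1 };
    private DataClass _c = new DataClass { Value = 1 };

    public DataStruct GetStruct() { return _s; } // returns a copy of the field
    public DataClass GetClass() { return _c; }   // returns a reference to the internal object

    public int StructValue { get { return _s.Value; } }
    public int ClassValue { get { return _c.Value; } }
}

public static class ReturnDemo
{
    public static void Main()
    {
        Holder h = new Holder();

        DataStruct s = h.GetStruct();
        s.Value = 42;                      // changes only the caller's copy

        DataClass c = h.GetClass();
        c.Value = 42;                      // changes the holder's internal state

        Console.WriteLine(h.StructValue);  // prints 1
        Console.WriteLine(h.ClassValue);   // prints 42
    }
}
```

The caller never asked to modify Holder, yet the class version lets it do so through the returned reference. That is the encapsulation leak described above.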

Or consider this variant:

private MyData _myData;
public MyData Foo()
{
return _myData.Clone( ) as MyData;
}

// call it:
MyData v = Foo();
TotalSum += v.Value;

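Now v is a copy of the original _myData. If MyData is a reference type, two objects now exist on the heap. You no longer have the problem of exposing internal data; instead, you have created an extra object on the heap. If v is a local variable, it quickly becomes garbage, and Clone forces a runtime type check. All in all, this is inefficient.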

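Types that export data through public methods and properties should be value types. That is not to say that every type returned from a public member must be a value type. The earlier snippet assumed that MyData stores values and that storing those values is its responsibility.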

But consider this alternative snippet:
private MyType _myType;
public IMyInterface Foo()
{
return _myType as IMyInterface;
}

// call it:
IMyInterface iMe = Foo();
iMe.DoWork( );

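The _myType variable is still returned from Foo. This time, though, instead of accessing the data inside the returned value, the caller invokes a method through a defined interface. You are using the MyType object for its behavior, not for its data. That behavior is expressed through IMyInterface, which many other types can implement. In this example, MyType should be a reference type, not a value type: its responsibilities revolve around its behavior, not its data members.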

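This simple snippet begins to show the distinction: value types store values, and reference types define behavior. Now look more closely at how these types are stored in memory and how the storage model affects performance. Consider this class: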

public class C
{
private MyType _a = new MyType( );
private MyType _b = new MyType( );

// Remaining implementation removed.
}

C var = new C();

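How many objects are created? How much memory do they occupy? It depends. If MyType is a value type, you have made one heap allocation, and its size is twice the size of MyType. If MyType is a reference type, you have made three heap allocations: one for the C object, which is 8 bytes (assuming 32-bit pointers), and two more for the MyType objects that the C object contains. (Translator's note: the 8 bytes are the two 4-byte references held in C; object header overhead is not counted.) The difference arises because value types are stored inline in the containing object and reference types are not. Each variable of a reference type holds only a reference, and the data itself needs separate storage.
To drive this point home, consider this allocation: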

MyType [] var = new MyType[ 100 ];

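If MyType is a value type, a single allocation of 100 times the size of MyType occurs. If MyType is a reference type, a single allocation also occurs, but every element of the array is null. By the time you have initialized each element, you will have performed 101 allocations, and 101 allocations take more time than 1. Allocating a large number of reference types fragments the heap and slows the program down. If you are creating a type meant to store data values, a value type is the way to go.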
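
A short sketch of the two array cases follows. PointStruct and PointClass are hypothetical types invented here to stand in for MyType; the allocation counts in the comments restate the reasoning above and are not measured.

```csharp
using System;

// Hypothetical stand-ins for MyType.
public struct PointStruct { public int X; public int Y; }
public class PointClass { public int X; public int Y; }

public static class ArrayDemo
{
    public static void Main()
    {
        // One allocation: 100 structs stored inline, each zero-initialized.
        PointStruct[] values = new PointStruct[100];
        Console.WriteLine(values[0].X);       // prints 0

        // One allocation for the array of references; every element is null.
        PointClass[] refs = new PointClass[100];
        Console.WriteLine(refs[0] == null);   // prints True

        // 100 further allocations, one per element, for 101 in total.
        for (int i = 0; i < refs.Length; i++)
        {
            refs[i] = new PointClass();
        }
        Console.WriteLine(refs[0].X);         // prints 0
    }
}
```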

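Choosing between a value type and a reference type is an important decision. Turning a value type into a class is a far-reaching change. Consider this type: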

public struct Employee
{
private string _name;
private int _ID;
private decimal _salary;

// Properties elided

public void Pay( BankAccount b )
{
b.Balance += _salary;
}
}

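This fairly simple type contains one method that lets you pay your employees. Time passes, and the system runs reasonably well. Then you decide there are different classes of employee: salespeople earn commissions, and managers earn bonuses. You decide to change the Employee type into a class: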

public class Employee
{
private string _name;
private int _ID;
private decimal _salary;

// Properties elided

public virtual void Pay( BankAccount b )
{
b.Balance += _salary;
}
}

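This breaks much of the existing code that uses your struct. Return by value becomes return by reference. Parameters that were passed by value are now passed by reference. The behavior of the following snippet changes drastically: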

Employee e1 = Employees.Find( "CEO" );
e1.Salary += Bonus; // Add one time bonus.
e1.Pay( CEOBankAccount );

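What was a one-time bonus added to a paycheck has become a permanent raise. Where a copy by value was used, a reference is now in place. The compiler happily makes the change for you, and the CEO is probably happy too. The CFO, on the other hand, will report the bug.
You cannot change your mind about value types and reference types after the fact: it changes behavior!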

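This problem occurred because the Employee type no longer follows the guidelines for a value type.
In addition to storing the data elements that define an employee, you have given it a responsibility: in this example, paying the employee. Responsibilities are the domain of classes. Classes can easily define polymorphic implementations of common responsibilities; structs cannot, and should be limited to storing data.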

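The .NET documentation recommends that you consider the size of a type as a deciding factor between value types and reference types. In practice, a better factor is how the type is used. Simple structures and pure data carriers are excellent candidates for value types. It is true that value types are more efficient in memory management: they cause less heap fragmentation, less garbage, and less indirection.
(Translator's note: "garbage" here and earlier means objects on the heap that are dead: the program can no longer reach them, and they are waiting to be reclaimed by the garbage collector. In .NET, "garbage" generally refers to these objects. It is worth reading about how the .NET garbage collector manages memory.)
More important, value types are copied when they are returned from a method or property, so there is no danger of exposing references to internal structures. But you pay in terms of features. Value types have very limited support for common object-oriented techniques. You cannot create object hierarchies of value types, so you should treat all value types as though they were sealed. You can create a value type that implements an interface, but using it through the interface requires boxing, which Item 17 shows degrades performance. Think of value types as storage containers, not as objects in the OO sense.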

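You will create more reference types than value types. If you answer yes to all of the following questions, you should create a value type. Compare them against the earlier Employee example: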

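1. Is this type's principal responsibility data storage?
2. Is its public interface defined entirely by properties that access or modify its data members?
3. Am I confident that this type will never have subclasses?
4. Am I confident that this type will never be treated polymorphically?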

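Build low-level data storage types as value types, and build the behavior of your application with reference types.
You get the safety of copying data that is exported from your class objects. You get the memory efficiency of inline value storage. And you can use standard object-oriented techniques to build your application logic. When in doubt about the expected use, use a reference type.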

=================================
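Summary: this item is rather long and took more time than I expected. I had planned to finish it in two or three hours after work, since I had already translated part of it yesterday, but I ended up working until 11 pm anyway.
One final remark: this item never actually spells out what a reference type is and what a value type is. Of course, a type declared with class is a reference type, and a type declared with struct is a value type. Other kinds of types deserve attention too: what is an enum? What is a delegate? What about an event?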

Item 6: Distinguish Between Value Types and Reference Types
Value types or reference types? Structs or classes? When should you use each? This isn't C++, in which you define all types as value types and can create references to them. This isn't Java, in which everything is a reference type. You must decide how all instances of your type will behave when you create it. It's an important decision to get right the first time. You must live with the consequences of your decision because changing later can cause quite a bit of code to break in subtle ways. It's a simple matter of choosing the struct or class keyword when you create the type, but it's much more work to update all the clients using your type if you change it later.

It's not as simple as preferring one over the other. The right choice depends on how you expect to use the new type. Value types are not polymorphic. They are better suited to storing the data that your application manipulates. Reference types can be polymorphic and should be used to define the behavior of your application. Consider the expected responsibilities of your new type, and from those responsibilities, decide which type to create. Structs store data. Classes define behavior.

The distinction between value types and reference types was added to .NET and C# because of common problems that occurred in C++ and Java. In C++, all parameters and return values were passed by value. Passing by value is very efficient, but it suffers from one problem: partial copying (sometimes called slicing the object). If you use a derived object where a base object is expected, only the base portion of the object gets copied. You have effectively lost all knowledge that a derived object was ever there. Even calls to virtual functions are sent to the base class version.

The Java language responded by more or less removing value types from the language. All user-defined types are reference types. In the Java language, all parameters and return values are passed by reference. This strategy has the advantage of being consistent, but it's a drain on performance. Let's face it, some types are not polymorphic: they were not intended to be. Java programmers pay a heap allocation and an eventual garbage collection for every variable. They also pay an extra time cost to dereference every variable. All variables are references. In C#, you declare whether a new type should be a value type or a reference type using the struct or class keywords. Value types should be small, lightweight types. Reference types form your class hierarchy. This section examines different uses for a type so that you understand all the distinctions between value types and reference types.

To start, this type is used as the return value from a method:

private MyData _myData;
public MyData Foo()
{
return _myData;
}

// call it:
MyData v = Foo();
TotalSum += v.Value;

If MyData is a value type, the return value gets copied into the storage for v. Furthermore, v is on the stack. However, if MyData is a reference type, you've exported a reference to an internal variable. You've violated the principle of encapsulation (see Item 23).

Or, consider this variant:

private MyData _myData;
public MyData Foo()
{
return _myData.Clone( ) as MyData;
}

// call it:
MyData v = Foo();
TotalSum += v.Value;

Now, v is a copy of the original _myData. If MyData is a reference type, two objects now exist on the heap. You don't have the problem of exposing internal data. Instead, you've created an extra object on the heap. If v is a local variable, it quickly becomes garbage, and Clone forces you to use runtime type checking. All in all, it's inefficient.

Types that are used to export data through public methods and properties should be value types. But that's not to say that every type returned from a public member should be a value type. There was an assumption in the earlier code snippet that MyData stores values. Its responsibility is to store those values.

But, consider this alternative code snippet:

private MyType _myType;
public IMyInterface Foo()
{
return _myType as IMyInterface;
}

// call it:
IMyInterface iMe = Foo();
iMe.DoWork( );

The _myType variable is still returned from the Foo method. But this time, instead of accessing the data inside the returned value, the object is accessed to invoke a method through a defined interface. You're accessing the MyType object not for its data contents, but for its behavior. That behavior is expressed through the IMyInterface, which can be implemented by multiple different types. For this example, MyType should be a reference type, not a value type. MyType's responsibilities revolve around its behavior, not its data members.

That simple code snippet starts to show you the distinction: Value types store values, and reference types define behavior. Now look a little deeper at how those types are stored in memory and the performance considerations related to the storage models. Consider this class:

public class C
{
private MyType _a = new MyType( );
private MyType _b = new MyType( );

// Remaining implementation removed.
}

C var = new C();

How many objects are created? How big are they? It depends. If MyType is a value type, you've made one allocation. The size of that allocation is twice the size of MyType. However, if MyType is a reference type, you've made three allocations: one for the C object, which is 8 bytes (assuming 32-bit pointers), and two more for each of the MyType objects that are contained in a C object. The difference results because value types are stored inline in an object, whereas reference types are not. Each variable of a reference type holds a reference, and the storage requires extra allocation.

To drive this point home, consider this allocation:

MyType [] var = new MyType[ 100 ];

If MyType is a value type, one allocation of 100 times the size of a MyType object occurs. However, if MyType is a reference type, only one allocation has occurred so far, and every element of the array is null. When you initialize each element in the array, you will have performed 101 allocations, and 101 allocations take more time than 1 allocation. Allocating a large number of reference types fragments the heap and slows you down. If you are creating types that are meant to store data values, value types are the way to go.

The decision to make a value type or a reference type is an important one. It is a far-reaching change to turn a value type into a class type. Consider this type:

public struct Employee
{
private string _name;
private int _ID;
private decimal _salary;

// Properties elided

public void Pay( BankAccount b )
{
b.Balance += _salary;
}
}

This fairly simple type contains one method to let you pay your employees. Time passes, and the system runs fairly well. Then you decide that there are different classes of Employees: Salespeople get commissions, and managers get bonuses. You decide to change the Employee type into a class:

public class Employee
{
private string _name;
private int _ID;
private decimal _salary;

// Properties elided

public virtual void Pay( BankAccount b )
{
b.Balance += _salary;
}
}

That breaks much of the existing code that uses your Employee struct. Return by value becomes return by reference. Parameters that were passed by value are now passed by reference. The behavior of this little snippet changed drastically:

Employee e1 = Employees.Find( "CEO" );
e1.Salary += Bonus; // Add one time bonus.
e1.Pay( CEOBankAccount );

What was a one-time bump in pay to add a bonus just became a permanent raise. Where a copy by value had been used, a reference is now in place. The compiler happily makes the changes for you. The CEO is probably happy, too. The CFO, on the other hand, will report the bug. You just can't change your mind about value and reference types after the fact: It changes behavior.
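
The change in behavior can be reproduced with a small sketch. EmployeeStruct, EmployeeClass, and Payroll are hypothetical names made up for this illustration (the text does not show Employees.Find or the Salary property), and public fields are used only to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical simplified versions of Employee, before and after the change.
public struct EmployeeStruct { public string Name; public decimal Salary; }
public class EmployeeClass { public string Name; public decimal Salary; }

// Hypothetical lookup service standing in for Employees.Find.
public class Payroll
{
    private readonly Dictionary<string, EmployeeStruct> _structs =
        new Dictionary<string, EmployeeStruct>();
    private readonly Dictionary<string, EmployeeClass> _classes =
        new Dictionary<string, EmployeeClass>();

    public Payroll()
    {
        _structs["CEO"] = new EmployeeStruct { Name = "CEO", Salary = 1000m };
        _classes["CEO"] = new EmployeeClass { Name = "CEO", Salary = 1000m };
    }

    public EmployeeStruct FindStruct(string title) { return _structs[title]; }
    public EmployeeClass FindClass(string title) { return _classes[title]; }
}

public static class BonusDemo
{
    public static void Main()
    {
        Payroll payroll = new Payroll();

        EmployeeStruct e1 = payroll.FindStruct("CEO");
        e1.Salary += 500m;   // bonus applies to the local copy only
        Console.WriteLine(payroll.FindStruct("CEO").Salary);  // prints 1000

        EmployeeClass e2 = payroll.FindClass("CEO");
        e2.Salary += 500m;   // bonus is written into the stored record
        Console.WriteLine(payroll.FindClass("CEO").Salary);   // prints 1500
    }
}
```

The calling code is identical in both halves; only the struct or class keyword on the type differs.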

This problem occurred because the Employee type no longer follows the guidelines for a value type. In addition to storing the data elements that define an employee, you've added responsibilities: in this example, paying the employee. Responsibilities are the domain of class types. Classes can define polymorphic implementations of common responsibilities easily; structs cannot and should be limited to storing values.

The documentation for .NET recommends that you consider the size of a type as a determining factor between value types and reference types. In reality, a much better factor is the use of the type. Types that are simple structures or data carriers are excellent candidates for value types. It's true that value types are more efficient in terms of memory management: There is less heap fragmentation, less garbage, and less indirection. More important, value types are copied when they are returned from methods or properties. There is no danger of exposing references to internal structures. But you pay in terms of features. Value types have very limited support for common object-oriented techniques. You cannot create object hierarchies of value types. You should consider all value types as though they were sealed. You can create value types that implement interfaces, but that requires boxing, which Item 17 shows causes performance degradation. Think of value types as storage containers, not objects in the OO sense.
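
The boxing cost mentioned above also has a visible semantic effect, which the following sketch tries to show. ICounter and Counter are hypothetical names invented for this illustration; the point is only that assigning a struct to an interface variable boxes a copy of it.

```csharp
using System;

// Hypothetical interface and struct, used only to illustrate boxing.
public interface ICounter
{
    void Increment();
    int Count { get; }
}

public struct Counter : ICounter
{
    private int _count;
    public void Increment() { _count++; }
    public int Count { get { return _count; } }
}

public static class BoxingDemo
{
    public static void Main()
    {
        Counter c = new Counter();

        ICounter boxed = c;   // boxing: c is copied into a new object on the heap
        boxed.Increment();    // modifies the boxed copy, not c

        Console.WriteLine(c.Count);      // prints 0
        Console.WriteLine(boxed.Count);  // prints 1
    }
}
```

Each such conversion allocates on the heap, which is one reason to treat value types as plain storage and not as polymorphic objects.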

You'll create more reference types than value types. If you answer yes to all these questions, you should create a value type. Compare these to the previous Employee example:

Is this type's principal responsibility data storage?

Is its public interface defined entirely by properties that access or modify its data members?

Am I confident that this type will never have subclasses?

Am I confident that this type will never be treated polymorphically?

Build low-level data storage types as value types. Build the behavior of your application using reference types. You get the safety of copying data that gets exported from your class objects. You get the memory usage benefits that come with stack-based and inline value storage, and you can utilize standard object-oriented techniques to create the logic of your application. When in doubt about the expected use, use a reference type.