A Pythonic Object

1. Object Representations

Every object-oriented language has at least one standard way of getting a string representation from any object. Python has two: repr() and str() . And the special methods __repr__ and __str__ support repr() and str().

There are two additional special methods to support alternative representations of objects: __bytes__ and __format__ . The __bytes__ is called by bytes() to get the object represented as a byte sequence . Regarding __format__ , both the built-in function format() and the str.format() method call it to get string displays of objects using special formatting codes.

# In Python3 , __repr__ , __str__ , and __format__ must always return Unicode strings (type str). Only __bytes__ is supposed to return a byte sequence (type bytes)

2. Vector Class Redux

Example 9-2. vector2d_v0.py: methods so far are all special methods

from array import array
import math


class Vector2d:
    typecode = 'd'  # typecode is a class attribute used when converting Vector2d instances to/from bytes.

    def __init__(self, x, y):
        self.x = float(x)  # Converting x and y to float in __init__ catches error EARLY.
        self.y = float(y)

    def __iter__(self):
        # __iter__ makes a Vector2d iterable; this is what makes unpacking work(e.g., x, y = my_vector). We implement it simply by using a generator expression to yield the components one after the other.(component 指代 x 或 y等)
        return (i for i in (self.x, self.y))  # This line could also be written as: yield self.x; yield self.y

    def __repr__(self):
        class_name = type(self).__name__  # self.__class__.__name__ 也可以
        # __repr__ builds a string; because Vector2d is iterable, *self feeds the x and y components to format
        return '{}({!r}, {!r})'.format(class_name, *self)

    def __str__(self):
        return str(tuple(self))  # From an iterable Vector2d, it's easy to build a tuple (for display).

    def __bytes__(self):
        return (bytes([ord(self.typecode)]) +  # To generate bytes, we convert the typecode to bytes and concatenate...
                bytes(array(self.typecode, self)))  # ...bytes converted from an array built by iterating over the instance.

    def __eq__(self, other):
        return tuple(self) == tuple(other)
        # To quickly compare all components, build tuples out of the operands(为了快速比较，利用了 tuple). This works for
        # operands that are instances of Vector2d, but has issues. See the following warning.

    def __abs__(self):
        return math.hypot(self.x, self.y)

    def __bool__(self):
        return bool(abs(self))
        # In this case, __bool__ uses abs(self) to compute the magnitude, then converts it to bool,
        # so 0.0 becomes False, nonzero is True.


# warning:
"""
Method __eq__ in this example works for Vector2d operands but also returns True when comparing Vector2d instances to 
other iterables holding the same numeric values(e.g., Vector(3, 4) == [3, 4]). This may be considered a feature or a 
bug. Further discussion needs to wait until Chapter 13, when operator overloading is covered.
"""

# 运行结果：
"""
>>> v1 = Vector2d(3, 4)
>>> print(v1.x, v1.y)
3.0 4.0
>>> x,y = v1    # A Vector2d can be unpacked to a tuple of variables.
>>> x, y
(3.0, 4.0)
>>> v1
Vector2d(3.0, 4.0)
>>> v1_clone = eval(repr(v1))
>>> v1 == v1_clone
True
>>> print(v1)   # print calls str
(3.0, 4.0)
>>> octets = bytes(v1)  # bytes uses the __bytes__ method to produce a binary representation
>>> octets
b'dx00x00x00x00x00x00x08@x00x00x00x00x00x00x10@'
>>> abs(v1)
5.0
>>> bool(v1), bool(Vector2d(0, 0))
(True, False)
"""

3. An Alternative Constructor

Example 9-3. Part of vector2d_v1.py: this snippet shows only the frombytes class method, added to the Vector2d definition in vector2d_v0.py (Example 9-2)

    @classmethod
    def frombytes(cls, octets):     # No self argument; instead, the class itself is passed as cls.
        typecode = chr(octets[0])   # Read the typecode from the first byte.
        memv = memoryview(octets[1:]).cast(typecode)    # Create a memoryview from the octets binary sequence and use the typecode to cast it.
        return cls(*memv)   # Unpack the memoryview resulting from the cast into the pair of arguments needed for the constructor.


# 运行结果：
"""
>>> v1 = Vector2d(3, 4)    # 准备数据
>>> octets = bytes(v1)
>>> Vector2d.frombytes(octets)
Vector2d(3.0, 4.0)
"""

The most common use of classmethod is for alternative constructors, like the frombytes in the example 9-3.

Example 9-4. Comparing behaviors of classmethod and staticmethod

>>> class Demo:
...     @classmethod
...     def classmeth(*args):
...         return args    # klassmeth just returns all positional arguments
...     @staticmethod
...     def statmeth(*args):
...         return args
... 
>>> Demo.classmeth()   # No matter how you invoke it, Demo.klassmeth receives the Demo class as the first argument.
(<class '__main__.Demo'>,)
>>> Demo.classmeth('spam')
(<class '__main__.Demo'>, 'spam')
>>> Demo.statmeth()
()
>>> Demo.statmeth('spam')
('spam',)

4. Formatted Display

The format() built-in function and the str.format() method delegate the actual formatting to each type by calling their .__format__(format_spec) method. The format_spec is a formatting specifier.

The Format Specification Mini-Language is extensible because each class gets to interpret the format_spec argument as it likes.

If a class has no __format__ , the method inherited from object returns str(my_object).

We will fix that by implementing our own format mini-language. The first step will be to assume the format specifier provided by the user is intended to format each float component of the vector.

Example 9-5. Vector2d.format method, take #1

    # inside the Vector2d class    
    def __format__(self, fmt_spec=''):
        components = (format(c, fmt_spec) for c in self)    # Use the format built-in to apply the fmt_spec to each vector component, building an iterable of formatted string.
        return '({}, {})'.format(*components)   # Plug the formatted strings in the formula '(x, y)'


# 格式化1：
"""
>>> v1 = Vector2d(3,4)
>>> format(v1)
'(3.0, 4.0)'
>>> format(v1, '.2f')
'(3.00, 4.00)'
>>> format(v1, '.3e')
'(3.000e+00, 4.000e+00)'
"""

Now let's add a custom formatting code to our mini-language: if the format specifier ends with a 'p', we'll display the vector in polar coordinates: <r, θ>, where r is the magnitude and θ (theta) is the angle in radians. The rest of the format specifier(whatever comes before the 'p') will be used as before.

Example 9-6. Vector2d.format method, take #w, now with polar coordinates.

    def angle(self):
        return math.atan2(self.y, self.x)

    def __format__(self, fmt_spec=''):
        if fmt_spec.endswith('p'):
            fmt_spec = fmt_spec[:-1]    # remove 'p' suffix from fmt_spec.
            coords = (abs(self), self.angle())      # Build tuple of polar coordinates:(magnitude, angle)
            outer_fmt = '<{}, {}>'  # Configure outer format with angle brackets.
        else:
            coords = self   # Otherwise, use x,y components of self for rectangular coordinates.
            outer_fmt = '({}, {})'  # Configure outer format with parentheses.
        components = (format(c, fmt_spec) for c in coords)  # Generate iterable with components as formatted strings.
        return outer_fmt.format(*components)    # Plug formatted strings into outer format.


# 格式化2：
"""
>>> format(Vector2d(1,1), 'p')
'<1.4142135623730951, 0.7853981633974483>'
>>> format(Vector2d(1,1), '.3ep')
'<1.414e+00, 7.854e-01>'
>>> format(Vector2d(1,1), '0.5fp')
'<1.41421, 0.78540>'
>>> 
"""

5. A Hashable Vector2d

We will make our Vector2d hashable, so we can build sets of vectors, or use them as dict keys. But before we can do that, we must make vectors immutable.

As defined, so far our Vector2d instance are unhashable, so we can't put them in a set:

>>> v1 = Vector2d(3, 4)
>>> hash(v1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'Vector2d'
>>> set([v1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'Vector2d'
>>>

To make a Vector2d hashable, we must implement __hash__ (__eq__ is also required, and we already have it). We also need to make vector instances immutable. We'll do that by making the x and y components read-only properties.

Example 9-7. vector2d_v3.py: only the changes needed to make Vector2d immutable are shown here:

class Vector2d:
    typecode = 'd'  

    def __init__(self, x, y):
        self.__x = float(x)     # Use exactly two leading underscores(with zero or one trailing underscore) to make an attribute private.
        self.__y = float(y)

    @property   # The @property decorator marks the getter method of a property.
    def x(self):    # The getter method is named after the public property it exposes:x.
        return self.__x

    @property
    def y(self):
        return self.__y

    def __iter__(self):
        # __iter__ makes a Vector2d iterable; this is what makes unpacking work(e.g., x, y = my_vector). We implement it simply by using a generator expression to yield the components one after the other.(component 指代 x 或 y等)
        return (i for i in (self.x, self.y))  # This line could also be written as: yield self.x; yield self.y

Now that our vector are reasonably immutable, we can implement the __hash__ method. It should return an int and ideally take into account the hashes of the object attibutes that are also used in the __eq__ method, because objects that compare equal should have the same hash. The __hash__ special method documentation suggests using the bitwise XOR operator (^) to mix the hashes of the components, so that's what we do.

Example 9-8. vector2d_v3.py: implementation of hash

    # inside class Vector2d    
    def __hash__(self):
        return hash(self.x) ^ hash(self.y)


# hash运行结果：
"""
>>> v1 = Vector2d(3,4)
>>> v2 = Vector2d(3.1, 4.2)
>>> 
>>> hash(v1), hash(v2)
(7, 384307168202284039)
>>> set([v1, v2])
{Vector2d(3.1, 4.2), Vector2d(3.0, 4.0)}
>>> 
"""

"""
It's not strictly necessaryn to implement properties or otherwise protect the instance attributes to create a hashable type. Implementing __hash__ and __eq__ correctly is all it takes. But The hash value of an instance is never supposed to change, so this provides an excellent opportunity to talk about read-only properties.
"""

6. Private and "Protected" Attributes in Python

The single unserscore prefix has no special meaning to the Python interpreter when used in attribute names, but it's a very strong convention among Python programmers that you should not access such attributes from outside the class. It's easy to respect the privacy of an object that marks its attributes with a single _ .

In module, a single _ in front of a top-level name does have an effect: if you write from mymod import * the names with a _ prefix are not imported from mymod. However, you can still write from mymod import _privatefunc .

7. Saving Space with the __slots__ Class Attribute

By default, Python stores instance attributes in a per-instance dict named __dict__ . But dictionaries have a significant memory overhead because of the underlying hash table used to provide fast access. If you are healing with millons of instances with few attributes, the __slots__ class attribute can save a lot of memory, by letting the interpreter store the instance attributes in a tuple instead of a dict.

# A __slots__ attribute inherited from a superclass has no effect. Python only takes into account __slots__ attributes defined in each class individually.

To define __slots__ , you create a class attribute with that name and assign it an iterable of str with identifiers for the instance attributes. I like to use a tuple for that, because it conveys the message that the __slots__ definition cannot change.

class Vector2d:
    __slots__ = ('__x', '__y')

By defining __slots__ in the class, you are telling the interpreter:"These are all the instance attributes in this class". Python then stores them in a tuple-like structure in each instance, avoiding the memory overhead of the per-instance __dict__ . This can make a huge difference in memory usage if you have millions of instances active at the same time.

# When __slots__ is specified in a class, its instances will not be allowed to have any other attributes apart from those named in __slots__ . This is really a side effect, and not the reason why __slots__ exists. It's considered bad from to use __slots__ just to prevent users of your class from creating new attributes in the instances if they want to. __slots__ should be used for optimization, not for programmer restraint.

It my be possible, however, to "save memory and eat it too": if you add the '__dict__' name to the __slots__ list, your instances will keep attributes named in __slots__ in the per-instance tuple, but will also support dynamically created attributes, which will be stored in the usual __dict__ . Of course, having '__dict__' in __slots__ may entirely defeat its purpose.

The attribute of __weakref__ is presented by default in instances of user-defined classes. However, if the class defines __slots__ , and you need the instances to be targets of weak references, then you need to include '__weakref__' among the attributes named in __slots__ .