[Repost] Inside the Objective-C Runtime

by Ezra Epstein
05/24/2002

Introduction

Once upon a time dynamism in languages (especially OO languages) was a point of debate. Essentially dynamism won: Java added "reflection" and C++ added Run-Time Type Information. And the out-and-out dynamic language, Objective-C or ObjC for short, was used to build Mac OS X's User Interface frameworks, dubbed "Cocoa."

With dynamism you can encode features that are configured as the program is running. This comes in extremely handy in a number of places, including writing generalized database access code (like in EOF), writing generalized HTTP request/response or Web Services systems (like with WebObjects), or building dynamic, custom user interfaces (e.g., with InterfaceBuilder).

One reason -- yes, there are others -- that scripting languages like Perl became so popular for building dynamic Web sites is that they are inherently dynamic. Because they are interpreted rather than compiled, their runtime is their "compile-time". Of course there's a performance penalty (and the lack of type checking -- which some might count as a feature -- among other issues), and that's where compiled dynamic languages like Objective-C come in.

Most compilers are designed with a single goal: translate the logic encoded in a format designed to be used by people (programming language) toward a format that tells a computer what to do (machine language). A small amount of "additional" information is retained to allow a linker to stitch together pieces of executable (machine) code (or to allow a programmer to debug the code).

This means that most compilers lose a lot of information. The Objective-C compiler is "smart". By that I mean that it doesn't "forget" all the intelligence you put into designing your code. Information about the structure of your code (things like class names and method names) is extracted by the compiler and stored in data structures (C structs). Those data structures are available when the program is executing and, together with functions for accessing and updating that information, comprise the Objective-C runtime.

Basic Features of Objective-C Runtime

Objective-C provides robust dynamism so, among other things, you don't have to "roll your own." Access to the ObjC runtime via ObjC (Cocoa) classes and functions comes with the Foundation framework. NSObjCRuntime.h defines several such functions and all classes that implement the NSObject Protocol (e.g., all subclasses of NSObject) have more built-in. The Foundation framework is on every Mac OS X machine. If you want to write code, however, you'll need to install the Developer package, which will add the header files for this and other frameworks.

Objective-C is, of course, object-oriented, so let's start with an object, or rather a class (of which the object is an instance). Since we're looking at dynamism, let's imagine we've got a text or XML-based config file (perhaps a property-list, a .plist). This config file will contain instructions, starting with the name of a class to use. With runtime access we can transform the name of a class (a string) into the Class itself:

#import <Foundation/NSObjCRuntime.h>

NSString *aClassName; // read from a "config" file

Class namedClass = NSClassFromString(aClassName);

If the named class doesn't exist in the runtime (e.g., hasn't been loaded), NSClassFromString() returns Nil. (Note: We've imported the precise .h file above for clarity. Usually you #import <Foundation/Foundation.h> to take advantage of the speed of pre-compiled headers.)

Once we've got a class we'll want to invoke its methods. To do so we use a "selector" of type SEL.

NSString *aMethodName; // Exists. E.g., from a "config" file

SEL aSelector = NSSelectorFromString(aMethodName);

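Note that NSSelectorFromString() does not check whether any class currently loaded in the runtime actually implements a method with that name. In current versions of Foundation it returns NULL only when the string is nil or cannot be converted; otherwise it registers the name with the runtime and returns a valid selector. To find out whether a particular object implements the method, ask it with respondsToSelector: before messaging it.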

(Deeper: Selectors are used for method invocation. ObjC distinguishes method invocation from method implementation. It's a lot like having function pointers in C with the added *pow* that all functions are named and can be referenced by name. Functions with the same name can resolve to (different) implementations depending on an object's class. It sometimes takes a little while to get it, but when you do, you realize this amounts to a lot of power for the developer.)

Internally, a selector (SEL) is a const char*. See for yourself:

printf("%s\n", aSelector);

(You may need to cast aSelector to const char* to avoid a warning from your compiler.) SELs have one additional feature: they are unique within the runtime, so two SELs for the same method name exhibit pointer equality.
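To make the uniqueness property concrete, here is a minimal sketch in plain C of how a table of uniqued names could behave. It is a toy model, not the runtime's actual code: toy_register_selector() and the fixed-size table are hypothetical names invented for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Toy selector table: NOT the real runtime, just an illustration of uniquing. */
#define TOY_MAX_SELECTORS 64

static const char *toy_selectors[TOY_MAX_SELECTORS];
static int toy_selector_count = 0;

/* Return the one canonical pointer for `name`, registering it on first use.
   Returns NULL if name is NULL, the table is full, or memory runs out. */
const char *toy_register_selector(const char *name) {
    int i;
    char *copy;

    if (name == NULL) return NULL;

    /* Already registered? Hand back the existing pointer. */
    for (i = 0; i < toy_selector_count; ++i) {
        if (strcmp(toy_selectors[i], name) == 0) return toy_selectors[i];
    }

    if (toy_selector_count == TOY_MAX_SELECTORS) return NULL;

    /* First sighting: keep a private copy so the pointer stays valid. */
    copy = malloc(strlen(name) + 1);
    if (copy == NULL) return NULL;
    strcpy(copy, name);
    toy_selectors[toy_selector_count++] = copy;
    return copy;
}
```

Registering "description" twice, even from two different buffers, yields the same pointer, so comparing two uniqued names takes a single pointer comparison rather than a strcmp().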

The NSObject Protocol defines a method that lets us invoke a selector on a class or an instance. On a class we just:

SEL desc = NSSelectorFromString(@"description");

[namedClass performSelector:desc];

In the invocation above, the runtime is being used to "dynamically bind" the selector to an underlying method implementation. In ObjC, methods are always bound dynamically: even when we invoke methods the "regular" way, the invocations are dynamically bound to their corresponding implementations. (See Dynamic Binding in the Cocoa documentation for more details on this subject.)
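If it helps to see dynamic binding stripped down to its bare bones, the sketch below models it in plain C: a method name is resolved to a function pointer at the moment of the call, based on the receiver's class. Everything here (toy_class, toy_lookup(), the Shape and Square classes) is hypothetical and far simpler than what the real runtime does.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these names exist in the real ObjC runtime. */
typedef const char *(*toy_imp)(void);

struct toy_method {
    const char *selector;   /* method name used for lookup */
    toy_imp imp;            /* function that implements it */
};

struct toy_class {
    const char *name;
    const struct toy_class *super_class;  /* NULL for a root class */
    const struct toy_method *methods;
    int method_count;
};

static const char *shape_describe(void)  { return "a shape"; }
static const char *shape_sides(void)     { return "unknown"; }
static const char *square_describe(void) { return "a square"; }

static const struct toy_method shape_methods[] = {
    { "describe", shape_describe },
    { "sides",    shape_sides },
};
static const struct toy_method square_methods[] = {
    { "describe", square_describe },   /* overrides Shape's version */
};

const struct toy_class toy_shape  = { "Shape",  NULL,       shape_methods,  2 };
const struct toy_class toy_square = { "Square", &toy_shape, square_methods, 1 };

/* Find the implementation for `selector`, searching the class first and
   then each superclass. Returns NULL if nobody implements it. */
toy_imp toy_lookup(const struct toy_class *cls, const char *selector) {
    int i;
    for (; cls != NULL; cls = cls->super_class) {
        for (i = 0; i < cls->method_count; ++i) {
            if (strcmp(cls->methods[i].selector, selector) == 0)
                return cls->methods[i].imp;
        }
    }
    return NULL;
}
```

The same name, "describe", resolves to different functions for toy_shape and toy_square, while "sides" falls through to the superclass. The caller never names a function directly; it only supplies a class and a method name.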

Let's get an instance of our named class:

id anInstance = [[[namedClass alloc] init] autorelease];

(This assumes that namedClass's designated initializer is -init. See Allocation and Initialization: The Designated Initializer for more details.) Now we can message our instance of namedClass (if it inherits from NSObject or otherwise implements the NSObject Protocol):

[anInstance performSelector:desc];

Things are pretty much the same when messaging (or invoking) a method that takes parameters. NSObject defines two convenience methods:

- (id)performSelector:(SEL)aSelector withObject:(id)object;

- (id)performSelector:(SEL)aSelector withObject:(id)object1 withObject:(id)object2;

The methods specified by aSelector should return an object. (If your method returns a C type, you can wrap the result in an NSValue before returning. Otherwise, to message methods that do not return an object, you'll need to use NSInvocation.)

Digging Deeper Into Runtime

The Objective-C runtime is written in C. In fact, the original version of Objective-C was implemented as a C compiler pre-processor. To get "inside" the Objective-C runtime we use C functions to access C data structures. The central C struct is struct objc_class* (a.k.a., a Class).

To get the struct and function definitions we'll include the objc header files. They should already be on your system in: /usr/include/objc (or System/Developer/Headers/objc). You don't need Mac OS X, however. The source code and the headers for the runtime are available as part of the Darwin open source project: objc4-217.tar.gz. (If you haven't already you'll need to register to download the source, but registration is free.)

In the previous article, we got a class from a name (NSString*). But what if we want to list out all the classes loaded in the runtime? Fire up your editor of choice (I used ProjectBuilder and started with a "Tool" project) and make sure the header files from objc are in your include path (they should be by default).

#import <stdio.h>
#import <objc/objc-runtime.h>
#import <objc/hashtable.h>

void showIvars(Class klass) {  /* code is below */ }

void showMethodGroups(Class klass, char mType) { /* code is below */ }

int main (int argc, const char *argv[]) {
    // objc_getClasses() is NOT safe to use in a multi-threaded app.
    NXHashTable * class_hash = objc_getClasses();
    NXHashState state = NXInitHashState(class_hash);
    struct objc_class * klass;

    // Visit every class currently registered with the runtime.
    while (NXNextHashState(class_hash, &state, (void **)&klass)) {
        printf ("%s\n", klass->name);
        showIvars(klass);
        showMethodGroups(klass, '-'); // instance methods
        showMethodGroups(klass->isa, '+'); // class methods
    }

    return 0;
}

Please note -- The runtime code changed somewhat since Mac OS X. It now uses locks to support multi-threaded access. The code above will work on the greatest number of deployed systems. For details on how to use the newer function, see the comment for objc_getClassList() in objc-runtime.h or see the code in AllClasses.m of the RuntimeBrowser.

Compile this and run it. Since showIvars() and showMethodGroups() are just stubs -- which we'll fill in later -- you'll get class names without other details. Which classes you see depends on which Frameworks (or other ObjC binaries) you've linked in. By default, using ProjectBuilder, you'll see all of the classes in the Foundation.framework. This includes hidden, undocumented classes and sub-classes of class clusters like NSString!

Take a look at objc/objc.h. Among other things, you'll see:

typedef struct objc_class *Class;

Yup, a Class is a pointer to struct objc_class as advertised. Once we've got a Class we access runtime information from this struct. While we're here let's look for a moment at the definition of type id (the universal Objective-C object pointer) also defined in objc/objc.h:

typedef struct objc_object {
    Class isa;
} *id;

An object of type id is a pointer to a struct that contains a single element: an isa pointer to its struct objc_class in the runtime. That's it.

struct objc_class is defined, surprisingly enough, in objc/objc-class.h. We're going to use it to extract info from the runtime. Let's look at its parts.

struct objc_class {
    struct objc_class *isa;
    struct objc_class *super_class;
    const char *name;
    long version;
    long info;
    long instance_size;
    struct objc_ivar_list *ivars;
    struct objc_method_list **methodLists;
    struct objc_cache *cache;
    struct objc_protocol_list *protocols;
};

The first and second elements are pointers to objc_class structs. The first is the isa pointer for the class. Class objects are full-fledged objects: they each have an isa pointer to their Class. The way it works is this: an object's isa points to the Class. That Class (struct objc_class) contains all the instance variables (objc_ivar_list) declared in the Class, but its objc_method_list contains only the instance methods defined with the class.

The Class pointed to by the Class' isa (the Class' Class) contains the Class' class methods in its objc_method_list. Got that? The terminology used in the runtime is that while an object's isa points to its Class, a Class' isa points to its "meta Class".

So what about a meta Class' isa pointer? Well, that points to the meta Class of the root class of the hierarchy (NSObject's meta Class in most cases). (In the Foundation framework each meta class of a subclass of NSObject "isa" instance of NSObject's meta Class.) And yes, by the way, NSObject's meta Class' isa points back to the same struct -- it's a circular reference so no Class' isa is ever NULL.

Whew! The relevant bit out of all that: the object's Class has the instance methods, the class's Class (a.k.a., meta Class) has the class methods.
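The isa relationships are easier to follow with something you can poke at. The sketch below is a toy model in plain C, not the runtime's own code: toy_class, toy_init_class_pair() and toy_hops_to_root_meta() are hypothetical names. It assumes that every meta class's isa points at the root class's meta class and that the root meta class's isa points at itself. The meta class's super_class wiring is likewise an assumption made for illustration.

```c
#include <stddef.h>

/* Toy model only; field names mirror struct objc_class but nothing here
   touches the real runtime. */
struct toy_class {
    struct toy_class *isa;          /* the class's own class (its meta class) */
    struct toy_class *super_class;  /* NULL for a root class */
    const char *name;
    int is_meta;
};

/* Wire up a class and its meta class. Pass super == NULL for a root class. */
void toy_init_class_pair(struct toy_class *cls, struct toy_class *meta,
                         const char *name, struct toy_class *super) {
    struct toy_class *root;

    cls->isa = meta;
    cls->super_class = super;
    cls->name = name;
    cls->is_meta = 0;

    /* Walk up to the root class; for a new root that is cls itself. */
    root = cls;
    while (root->super_class != NULL) root = root->super_class;

    meta->isa = root->isa;   /* the root's meta class (itself, for a root) */
    /* Assumption: meta classes inherit in parallel with their classes. */
    meta->super_class = (super != NULL) ? super->isa : cls;
    meta->name = name;
    meta->is_meta = 1;
}

/* Follow isa pointers from `cls` and count the hops needed to reach a
   class whose isa points at itself (the root meta class). */
int toy_hops_to_root_meta(const struct toy_class *cls) {
    int hops = 0;
    while (cls->isa != cls) {
        cls = cls->isa;
        ++hops;
    }
    return hops;
}
```

Build a root pair and then a subclass pair, and the subclass needs two hops to reach the self-referencing root meta class (class, then its meta class, then the root meta class), while the root class needs one. The loop always terminates because no isa in the model is ever NULL.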

The next part of the struct is the super_class, a pointer to a class' superclass: the class it inherits from. The super_class pointer is NULL for root classes in the hierarchy (e.g., NSObject).

The const char *name is, big surprise, the name of the class.

The long version records information about which version of the compiler a class was compiled with. It’s checked in the runtime to see if certain features are available for a given class. We can ignore this.

The long info contains information about the class structure: whether it's a meta class, whether it's posing as another class, etc. Again this is of use mostly to the internal functioning of the runtime and we'll ignore it. The objc/objc-class.h file contains a list of #defined values (just after the struct objc_class definition) if you're interested.

The long instance_size is the number of bytes occupied by an instance of this class.

Each of a class' instance variables (ivars) is represented by a struct objc_ivar which contains the name (ivar_name), encoded type information (ivar_type) and the ivar's offset from the instance's address in memory. You can access the structs for all of a Class' ivars by traversing its struct objc_ivar_list *ivars.
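What does "offset from the instance's address" buy you? With a name, a type and an offset you can read an instance variable without knowing the struct layout at compile time. Here is a minimal sketch in plain C; toy_point, toy_ivar and toy_get_int_ivar() are hypothetical stand-ins (the field names merely echo struct objc_ivar), and the table is filled in by hand with offsetof() where the real compiler would emit it for you.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-ins; the layout below is hypothetical, not a real Cocoa class. */
struct toy_point {
    void *isa;      /* first member, as in every ObjC object */
    int x;
    int y;
};

struct toy_ivar {
    const char *ivar_name;
    const char *ivar_type;   /* "i" is the type encoding for int */
    int ivar_offset;         /* bytes from the start of the instance */
};

static const struct toy_ivar toy_point_ivars[] = {
    { "x", "i", (int)offsetof(struct toy_point, x) },
    { "y", "i", (int)offsetof(struct toy_point, y) },
};

/* Look up an int ivar by name and copy its value out of `instance`.
   Returns 1 on success, 0 if there is no int ivar with that name. */
int toy_get_int_ivar(const void *instance, const char *name, int *out) {
    size_t i;
    size_t count = sizeof toy_point_ivars / sizeof toy_point_ivars[0];

    for (i = 0; i < count; ++i) {
        const struct toy_ivar *iv = &toy_point_ivars[i];
        if (strcmp(iv->ivar_name, name) == 0 &&
            strcmp(iv->ivar_type, "i") == 0) {
            /* Step ivar_offset bytes into the instance and copy the int. */
            memcpy(out, (const char *)instance + iv->ivar_offset, sizeof *out);
            return 1;
        }
    }
    return 0;
}
```

Given a toy_point with x = 3 and y = 4, asking for "y" copies out 4, and asking for an unknown name such as "z" returns 0 without touching the output.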

OK, enough abstract background, let's use this. Let's take a look at some of this rich trough of information. Here's how to show the ivars:

void showIvars(Class klass) {
    int i;
    Ivar rtIvar;
    struct objc_ivar_list* ivarList = klass->ivars;

    // A class with no ivars of its own has a NULL (or empty) list.
    if (ivarList != NULL && (ivarList->ivar_count > 0)) {
        printf ("  Instance Variables:\n");
        for ( i = 0; i < ivarList->ivar_count; ++i ) {
            rtIvar = (ivarList->ivar_list + i);
            printf ("    name: '%s'  encodedType: '%s'  offset: %d\n",
                    rtIvar->ivar_name, rtIvar->ivar_type, rtIvar->ivar_offset);
        }
    }
}

But be warned: you'll get a lot of information. Better to display this amount of detail for individual classes. Note this code displays type information as it is encoded in the runtime. The RuntimeBrowser (see sidebar) decodes this information back to the more familiar .h file form.

In a similar way you can get information about all the methods implemented by a class. Remember a class holds the instance methods; you have to use the "meta class" (klass' isa) to get information about class methods. Here's a function to display method information:

void showMethodGroups(Class klass, char mType) {
    void *iterator = 0;     // Method list (category) iterator
    struct objc_method_list* mlist;
    Method currMethod;
    int j;

    // Outer loop: one pass per method list (the class itself or a category).
    while ( (mlist = class_nextMethodList( klass, &iterator )) ) {
        printf ("  Methods:\n");
        // Inner loop: every method in this list.
        for ( j = 0; j < mlist->method_count; ++j ) {
            currMethod = (mlist->method_list + j);
            printf ("    method: '%c%s'  encodedReturnTypeAndArguments: '%s'\n", mType,
                    (const char *)currMethod->method_name, currMethod->method_types);
        }
    }
}

Again, this produces a lot of information and is best used to view details of a single class.

The remaining elements of struct objc_class include:

The cache, which is used to speed up message dispatch. The first time a method is invoked on a class the result of the lookup is stored in the cache -- subsequent lookups are FAST.
The list of protocols this class conforms to. If you're curious about displaying Protocol names, see the RuntimeBrowser.
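The idea behind the cache can be sketched in a few lines of plain C. This is a toy model under simplifying assumptions, not the layout of struct objc_cache: toy_lookup_cached(), the eight-slot table and the slow_lookups counter are all hypothetical, and names are compared with strcmp() for clarity.

```c
#include <stddef.h>
#include <string.h>

typedef int (*toy_imp)(void);

struct toy_method { const char *selector; toy_imp imp; };

#define TOY_CACHE_SLOTS 8

struct toy_class {
    const struct toy_method *methods;
    int method_count;
    struct toy_method cache[TOY_CACHE_SLOTS];  /* zeroed slot means empty */
    int slow_lookups;                          /* times we scanned the list */
};

/* Map a method name to a cache slot index. */
static unsigned toy_hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31u + (unsigned char)*s++;
    return h % TOY_CACHE_SLOTS;
}

/* Check the cache first; on a miss, scan the method list and remember
   the result. Returns NULL if the class has no such method. */
toy_imp toy_lookup_cached(struct toy_class *cls, const char *selector) {
    struct toy_method *slot = &cls->cache[toy_hash(selector)];
    int i;

    if (slot->selector != NULL && strcmp(slot->selector, selector) == 0)
        return slot->imp;                       /* cache hit */

    cls->slow_lookups++;
    for (i = 0; i < cls->method_count; ++i) {
        if (strcmp(cls->methods[i].selector, selector) == 0) {
            *slot = cls->methods[i];            /* fill the cache */
            return slot->imp;
        }
    }
    return NULL;
}

/* Sample methods for trying the lookup out. */
static int toy_count_imp(void)  { return 3; }
static int toy_length_imp(void) { return 7; }

static const struct toy_method toy_sample_methods[] = {
    { "count",  toy_count_imp },
    { "length", toy_length_imp },
};
```

Looking up "length" twice on a fresh class scans the method list only once; the second call is answered straight from the cache slot and slow_lookups stays at 1.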

Notice in showMethodGroups() there's a nested loop. What's going on here? Well, the inner loop is over methods, and the outer loop is over groups of methods or... categories. So now, with a little more knowledge of method lookup, we can answer the question from the first article: if you've got two methods with the same name defined on a class (at least one therefore in a category), which one gets invoked?

During method lookup, the runtime traverses the arrays of methods in the order they are stored. That order is the reverse of the order in which each group of methods is loaded. As the class must be loaded before categories are added to the class, the methods declared in a class itself are always loaded first, so end up last in the list. Since the runtime does method lookups from first-to-last, a category method will always override a method declared in the class itself.
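The ordering rule just described can be modeled in plain C. In this sketch the groups of methods are kept in a linked list for simplicity (the runtime's own storage differs), and toy_add_method_list() and toy_lookup() are hypothetical names. What it preserves is the rule itself: each newly loaded group goes in front, and lookup runs first-to-last.

```c
#include <stddef.h>
#include <string.h>

typedef const char *(*toy_imp)(void);

struct toy_method { const char *selector; toy_imp imp; };

struct toy_method_list {
    const struct toy_method *methods;
    int method_count;
    struct toy_method_list *next;   /* the list that was loaded before this one */
};

struct toy_class { struct toy_method_list *method_lists; };

/* Loading a group of methods puts it at the FRONT of the class's lists. */
void toy_add_method_list(struct toy_class *cls, struct toy_method_list *list) {
    list->next = cls->method_lists;
    cls->method_lists = list;
}

/* Search first-to-last, so the most recently loaded list wins. */
toy_imp toy_lookup(const struct toy_class *cls, const char *selector) {
    const struct toy_method_list *l;
    int i;

    for (l = cls->method_lists; l != NULL; l = l->next) {
        for (i = 0; i < l->method_count; ++i) {
            if (strcmp(l->methods[i].selector, selector) == 0)
                return l->methods[i].imp;
        }
    }
    return NULL;
}

/* Sample methods: the class and a category both define "greet". */
static const char *toy_from_class(void)    { return "class"; }
static const char *toy_from_category(void) { return "category"; }

static const struct toy_method toy_class_methods[] = {
    { "greet", toy_from_class },
    { "name",  toy_from_class },
};
static const struct toy_method toy_category_methods[] = {
    { "greet", toy_from_category },
};
```

Add the class's own list first and the category's list second, and a lookup of "greet" returns the category's function, while "name", which only the class defines, still resolves to the class's version.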

If two different Categories define the same method, however, then it all depends on the order in which the bundles that contain those categories are loaded when an application is launched. To a large extent you can control this order by forcing bundles to be loaded when a program first starts, and being wary of loading bundles dynamically. But generally the rule is: don't redefine category methods in another category as this can lead to inconsistent results...
