CCI and Metadata

CCI represents an assembly’s metadata through the CCI Metadata model. The associated CCI Metadata API is used by all CCI applications, even those that use CCI Code to work with code blocks. The following sections describe the basics of CCI Metadata.

Host Environment

Many aspects of applications are specific to the application environment. For example:
  • Applications can target any of several mscorlib.dll versions. Typically, applications specify unification, which directs .NET to use the most recent version on the system, but they can also target a specific version.
  • Applications can be stored in a variety of ways. Typically, applications are stored on the hard drive, but they could also be stored in a database, downloaded from a Web site, and so on. Applications can even be embedded in a Word or Excel document.
The CCI libraries are not associated with any particular application environment. For example, you can use CCI to work with assemblies from any kind of storage. However, the CCI libraries require application-specific information such as where files are located, where to find referenced assemblies, or the targeted .NET version.

From a CCI perspective, such issues are matters of application policy, and are handled by a separate host object that understands the application environment. CCI queries the host for application-specific information. The host handles the details and returns the results to CCI through a standard interface. For example, when CCI must locate an assembly, it calls the host’s FindAssembly method and passes it a string that defines the assembly. The string is typically a file path, but it could represent another type of storage, such as a SQL query to retrieve the assembly from a database. FindAssembly interprets the string appropriately and retrieves the assembly.
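The division of labor described above can be illustrated with a minimal Python sketch. It is not the CCI API: `Host`, `InMemoryHost`, `find_assembly`, `load`, and the `db://` locator are hypothetical names chosen for illustration. The point is only that the library-side code never touches storage directly; it asks the host to interpret the locator string.

```python
from abc import ABC, abstractmethod


class Host(ABC):
    """Hypothetical stand-in for a host: it owns all environment-specific policy."""

    @abstractmethod
    def find_assembly(self, locator: str) -> bytes:
        """Interpret the locator string and return the raw assembly bytes."""


class InMemoryHost(Host):
    """Resolves locators against a dictionary instead of the hard drive."""

    def __init__(self, store: dict[str, bytes]) -> None:
        self._store = store

    def find_assembly(self, locator: str) -> bytes:
        try:
            return self._store[locator]
        except KeyError:
            raise FileNotFoundError(locator) from None


def load(host: Host, locator: str) -> int:
    """Library-side code: it only talks to the host, never to storage."""
    data = host.find_assembly(locator)
    return len(data)


# The locator format is an assumption; only the host needs to understand it.
host = InMemoryHost({"db://assemblies/Example": b"MZ\x90\x00"})
size = load(host, "db://assemblies/Example")
```

Swapping in a host that reads from disk or from a database would not require any change to `load`.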

CCI provides a default host object that is sufficient for many applications. If your application requires non-standard support, such as handling files that are not stored on the hard drive or specifying a particular mscorlib.dll version, you must implement a custom host. For more details, see Hosts.

Applications typically use a single host object, but some cases require multiple hosts. For example, a host object can represent only one instance of mscorlib.dll. If you want to compare two mscorlib.dll versions, you need a separate host object for each DLL.

Comparing Strings and Types

CCI must often test strings or types for equality. CCI provides two objects, NameTable and InternFactory, that improve the performance of such tests.

Comparing Strings

The simplest way to test strings for equality, which is used by many .NET methods, is a character-by-character comparison. However, the cost of that comparison grows with the length of the strings, and it is paid again on every test. CCI Metadata improves the efficiency of string comparison by using a NameTable object.

NameTable is a container for a collection of key-value pairs. Each value is a string and the associated key is a unique integer. Once you have added the relevant strings to a NameTable, you can test for equality by comparing two integers, which is much faster than character-by-character comparison.
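The idea can be sketched in a few lines of Python. This is a conceptual illustration, not the CCI NameTable class: the class below and its `get_key` and `get_name` methods are hypothetical. Each distinct string receives one integer key, and later equality tests compare the keys.

```python
class NameTable:
    """Hypothetical interning table: each distinct string gets one integer key."""

    def __init__(self) -> None:
        self._key_by_name: dict[str, int] = {}
        self._name_by_key: list[str] = []

    def get_key(self, name: str) -> int:
        """Return the key for name, assigning a new one on first sight."""
        key = self._key_by_name.get(name)
        if key is None:
            key = len(self._name_by_key)
            self._key_by_name[name] = key
            self._name_by_key.append(name)
        return key

    def get_name(self, key: int) -> str:
        """Map a key back to the string it stands for."""
        return self._name_by_key[key]


names = NameTable()
a = names.get_key("System.Object")
b = names.get_key("System.String")
c = names.get_key("System.Object")

# Equal strings share a key, so equality becomes an integer comparison.
same = a == c
different = a != b
```

The string itself is examined only when its key is looked up; every later comparison works on the integers.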

DefaultHost automatically creates a NameTable object and stores it in its NameTable property, which is sufficient for most applications. If your application uses multiple hosts, create your own NameTable object and pass it to the DefaultHost constructor each time you create a host object. That ensures that every host uses the same key-value pairs, so keys obtained through different hosts can be compared with each other.

Comparing Types

Often, there is only one object per type, so you can compare types by comparing the object identities. However, applications can also have references to types, which are contained in reference objects. For example, you use a reference object to reference a type in another assembly.

Two references to the same type are not necessarily the same object, so you can’t always resolve the references and compare object identity. To determine whether two references denote the same type, you must compare their structure, which is relatively expensive.

CCI uses an InternFactory object to improve the performance of type comparisons. InternFactory works somewhat like NameTable. When CCI constructs a type, it uses an InternFactory to determine the type structure, and assigns a unique integer to the type. You can then use that integer for all subsequent comparisons, which is much faster than comparing structure.
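A minimal Python sketch of structural interning follows. It is not the CCI InternFactory: `TypeRef`, its fields, and the `intern` method are hypothetical, and the sketch assumes that a type's structure can be captured as an assembly name, a namespace, a type name, and a tuple of type arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeRef:
    """Hypothetical type reference described purely by its structure."""

    assembly: str
    namespace: str
    name: str
    type_args: tuple["TypeRef", ...] = ()


class InternFactory:
    """Hypothetical factory: structurally equal references get the same integer."""

    def __init__(self) -> None:
        self._ids: dict[TypeRef, int] = {}

    def intern(self, ref: TypeRef) -> int:
        # The structural work (hashing and equality) is paid here, at intern time.
        return self._ids.setdefault(ref, len(self._ids) + 1)


int32 = TypeRef("mscorlib", "System", "Int32")
string = TypeRef("mscorlib", "System", "String")

# Two distinct reference objects that describe the same generic instantiation.
list_a = TypeRef("mscorlib", "System.Collections.Generic", "List", (int32,))
list_b = TypeRef("mscorlib", "System.Collections.Generic", "List", (int32,))
list_c = TypeRef("mscorlib", "System.Collections.Generic", "List", (string,))

factory = InternFactory()
id_a = factory.intern(list_a)
id_b = factory.intern(list_b)
id_c = factory.intern(list_c)
```

Here `list_a` and `list_b` are different objects, so an identity test would fail, yet they receive the same integer; `list_c` differs in its type argument and receives a different one.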

Mutable and Immutable Representations

CCI Metadata provides two ways to represent a PE file: mutable and immutable.

The Mutable Representation

The mutable representation is an object model that represents the contents of a PE file. You can use the properties and methods exposed by mutable objects to modify the assembly. For example, a method’s metadata is represented by a MethodDefinition object, which applications can use to modify a method’s metadata. The objects that support the mutable object model are in the Microsoft.Cci.MutableCodeModel namespace.

Note: The Microsoft.Cci.MutableCodeModel namespace is a somewhat confusing historical artifact. It actually supports the CCI Metadata mutable representation. The objects that support CCI Code are in the Microsoft.Cci.Ast namespace.

The Immutable Representation

The immutable representation is a set of interfaces, each of which provides read-only access to a corresponding mutable object’s properties. An immutable interface uses the mutable class name, prefixed with “I”. For example, the immutable interface for MethodDefinition is IMethodDefinition. The interfaces that support the immutable representation are in the Microsoft.Cci namespace. The immutable representation includes everything in a PE file except:
  • Data that is the same for all PE files.
  • Data that can be derived from other information in the file.
The objects that make up the mutable representation all have immutable interfaces, so the immutable representation is essentially a passive data structure that provides read-only access to the corresponding mutable representation.
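The relationship between the two representations can be sketched in Python. The classes below borrow the names IMethodDefinition and MethodDefinition only to show the naming convention; they are simplified, hypothetical stand-ins with two invented properties, not the CCI types. Note that Python does not enforce read-only access at run time; the interface documents what a consumer may do, and a static type checker can flag writes made through it.

```python
from abc import ABC, abstractmethod


class IMethodDefinition(ABC):
    """Read-only view: getters only (hypothetical, simplified)."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def is_static(self) -> bool: ...


class MethodDefinition(IMethodDefinition):
    """Mutable object: implements the view and adds setters."""

    def __init__(self, name: str, is_static: bool = False) -> None:
        self._name = name
        self._is_static = is_static

    @property
    def name(self) -> str:
        return self._name

    @name.setter
    def name(self, value: str) -> None:
        self._name = value

    @property
    def is_static(self) -> bool:
        return self._is_static

    @is_static.setter
    def is_static(self, value: bool) -> None:
        self._is_static = value


def describe(method: IMethodDefinition) -> str:
    """A consumer that needs only read access takes the interface type."""
    prefix = "static " if method.is_static else ""
    return prefix + method.name


method = MethodDefinition("Main", is_static=True)
view: IMethodDefinition = method  # same object, seen through the read-only view
summary = describe(view)
```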

How to Create a Mutable Representation

Applications typically start with an immutable representation of a PE file. For example, CCI Metadata applications often use DefaultHost.LoadUnitFrom to load an assembly from the hard drive. LoadUnitFrom returns an immutable representation of the assembly, IAssembly.

The immutable representation is sufficient for most analysis applications, which just need information from the assembly. Any application that modifies the assembly, such as a rewriter, must work with a mutable representation. There are two approaches to obtaining a mutable representation:
  • Create a mutable copy of an immutable representation. CCI provides objects, called mutators, which traverse an immutable representation and produce a mutable copy.
  • Create the assembly from scratch. The HelloIL Sample Walkthrough describes a simple example of how to use CCI Metadata to create an assembly.
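The first approach, copying an immutable representation into a mutable one, can be sketched in Python as follows. This is a conceptual illustration, not a CCI mutator: the four classes and the `mutable_copy` function are hypothetical, and the model is reduced to a type that holds a list of method names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FrozenMethod:
    name: str


@dataclass(frozen=True)
class FrozenType:
    name: str
    methods: tuple[FrozenMethod, ...] = ()


@dataclass
class MutableMethod:
    name: str


@dataclass
class MutableType:
    name: str
    methods: list[MutableMethod] = field(default_factory=list)


def mutable_copy(source: FrozenType) -> MutableType:
    """Traverse the immutable model and build an independent mutable copy."""
    return MutableType(
        name=source.name,
        methods=[MutableMethod(m.name) for m in source.methods],
    )


original = FrozenType("Program", (FrozenMethod("Main"),))
copy = mutable_copy(original)

# Changes to the copy do not affect the model it was created from.
copy.methods.append(MutableMethod("Helper"))
copy.methods[0].name = "Start"
```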

The Immutable Contract

When you pass all or part of an object model to a method, you always pass the immutable representation, even if you are working with a mutable representation. For example, CCI applications typically store an assembly by passing its Microsoft.Cci.IAssembly interface to the Microsoft.Cci.MutableCodeModel.PeWriter method, which converts the assembly's object model representation to the PE format and stores it as a PE file.

When you pass an immutable interface to a method, you implicitly accept a contract to make no further changes to the underlying object model. This contract allows methods to safely cache property values, multiple threads to safely access the object model without obtaining locks, and so on. Follow these basic guidelines for using the mutable and immutable representations:
  • If an object model is in flux, don’t pass it to anyone; keep the object model private until it is stable.
  • Once you have passed an object model to someone, you’ve effectively given up ownership; do not make any further changes.
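The following Python sketch shows what can go wrong when the contract is broken. It is an illustration under simplifying assumptions: `Model` and `Summary` are hypothetical classes, and the consumer caches a single derived value in the way the paragraph above says methods are allowed to.

```python
class Model:
    """Hypothetical object model: just a list of method names."""

    def __init__(self, method_names: list[str]) -> None:
        self.method_names = list(method_names)


class Summary:
    """Consumer that caches a derived value, trusting the model not to change."""

    def __init__(self, model: Model) -> None:
        self._model = model
        self._count: int | None = None

    def method_count(self) -> int:
        if self._count is None:
            self._count = len(self._model.method_names)
        return self._count


model = Model(["Main"])
model.method_names.append("Helper")  # fine: the model is still private

summary = Summary(model)             # the model has now been handed off
first = summary.method_count()

model.method_names.append("Late")    # violates the contract
stale = summary.method_count()       # the cached value no longer matches
actual = len(model.method_names)
```

After the late change, the consumer still reports the cached count while the model holds one more method, which is the kind of inconsistency the guidelines are meant to prevent.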


Last edited Jan 21, 2010 at 5:20 PM by Guy_Smith, version 1

Comments

jpmartin Sep 27, 2010 at 11:26 PM 
perhaps worth saying: to get a mutable version of a model, call codeMutator.GetMutableCopy(model). Then, you can use e.g. the Dispatch methods: you get an immutable reference but you can cast it to the mutable version.