
PE Basics

Windows applications and dynamically linked libraries (DLLs) are packaged as Portable Executable (PE) files. CCI is primarily concerned with two types of PE file: .NET assemblies and modules.

In general, an assembly contains four related sets of data:
  • A manifest, which contains metadata that describes the assembly, including version requirements and security identity.
  • Type metadata, which describes the types in the assembly.
  • Microsoft intermediate language (MSIL) code for all the types in the assembly.
  • Resources, such as images.
Modules have almost the same structure and contents as assemblies. The difference is that modules do not include a manifest. This document focuses on assemblies, but most of the discussion also applies to modules.
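
The four sets of data listed above can be seen from running code through the standard System.Reflection API. The following is only a minimal sketch, not part of CCI: the AssemblyTour class and its DescribeManifest helper are hypothetical names introduced for illustration, and the program simply inspects its own assembly.

```csharp
using System;
using System.Reflection;

static class AssemblyTour
{
    // Hypothetical helper: summarizes the identity stored in the manifest.
    public static string DescribeManifest(Assembly asm)
    {
        AssemblyName name = asm.GetName();
        return name.Name + ", Version=" + name.Version;
    }

    static void Main()
    {
        // Inspect the assembly that contains this sketch.
        Assembly asm = typeof(AssemblyTour).Assembly;

        // Manifest: name and version of the assembly.
        Console.WriteLine("Manifest: " + DescribeManifest(asm));

        // Type metadata: one entry per type defined in the assembly.
        foreach (Type t in asm.GetTypes())
            Console.WriteLine("Type: " + t.FullName);

        // Resources: names of embedded resources (often none in a small program).
        foreach (string r in asm.GetManifestResourceNames())
            Console.WriteLine("Resource: " + r);

        // Modules: the module files that make up the assembly.
        foreach (Module m in asm.GetModules())
            Console.WriteLine("Module: " + m.Name);
    }
}
```

The MSIL itself is not printed here; reflection exposes it per method, through the method body.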

Assemblies are commonly packaged in a single file, but not necessarily. For example, the manifest, type metadata, and MSIL could be in one file, and the images in a separate resource file. Assemblies can also be dynamically generated and exist only in memory.

The contents of an assembly can be divided into two basic parts: metadata and code.

Metadata

An assembly is essentially a container for a collection of types. It includes metadata that completely describes the assembly’s contents, including each type and its members. For example, class metadata includes:
  • The class’s visibility.
  • The parameters, visibility, and calling conventions for each method.
  • The type and visibility of each data member.
  • How the class fits into the assembly’s namespace hierarchy.
Some applications focus entirely on metadata. For example, the Sandcastle application uses metadata to determine the overall structure and contents of an assembly, and uses the type metadata to populate an HTML page with syntax blocks, parameter lists, and so on for each API member. Even applications that focus primarily on an assembly’s code must still work with the assembly’s metadata.
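
As a rough illustration of the class metadata listed above, the following sketch reads a type's namespace, visibility, fields, and method signatures with System.Reflection. It is an assumption-laden example rather than how Sandcastle or CCI work: Sample.Shapes.Circle is a hypothetical type defined only so there is something to inspect, and MetadataDump and DescribeMethod are likewise illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical sample type, defined only so the sketch has something to inspect.
namespace Sample.Shapes
{
    public class Circle
    {
        public double Radius;
        public double Scale(double factor) { return Radius * factor; }
    }
}

static class MetadataDump
{
    // Builds a signature string from a method's metadata alone.
    public static string DescribeMethod(MethodInfo m)
    {
        var parts = new List<string>();
        foreach (ParameterInfo p in m.GetParameters())
            parts.Add(p.ParameterType.Name + " " + p.Name);
        string visibility = m.IsPublic ? "public" : "non-public";
        return visibility + " " + m.ReturnType.Name + " " + m.Name
            + "(" + string.Join(", ", parts.ToArray()) + ")";
    }

    static void Main()
    {
        Type t = typeof(Sample.Shapes.Circle);

        // Where the class sits in the namespace hierarchy, and its visibility.
        Console.WriteLine("Namespace: " + t.Namespace);
        Console.WriteLine("Visibility: " + (t.IsPublic ? "public" : "non-public"));

        // Type of each public data member.
        foreach (FieldInfo f in t.GetFields())
            Console.WriteLine("Field: " + f.FieldType.Name + " " + f.Name);

        // Parameters, visibility, and calling convention of each declared method.
        BindingFlags declared =
            BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly;
        foreach (MethodInfo m in t.GetMethods(declared))
            Console.WriteLine("Method: " + DescribeMethod(m)
                + " [" + m.CallingConvention + "]");
    }
}
```

None of this executes Circle's code; every value comes from metadata.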

Code

The compiler translates .NET source code into MSIL, a CPU-independent intermediate language, and stores it in the assembly. Type members that contain code, such as methods, have a body, which contains a code block of MSIL instructions and some related information. For example, the HelloIL sample’s Main method looks like this in C#:
static void Main()
{
  Console.WriteLine("hello");
}

The corresponding MSIL is as follows:
.method public static void Main() cil managed
{
  .entrypoint
  .maxstack 8
  L_0000: ldstr "hello"
  L_0005: call void [mscorlib]System.Console::WriteLine(string)
  L_000a: ret 
}

The code block is the set of lines that start with “L_”. Together with the remaining directives, such as .entrypoint and .maxstack, it makes up the method body. The code consists of three instructions, which:
  1. Load the “hello” string.
  2. Call Console.WriteLine and pass it the loaded string.
  3. Return.
Note: MSIL requires an explicit return instruction for every method, even when the high-level source does not need one, as in the C# version of Main shown earlier.

When you run the application, a just-in-time (JIT) compiler converts the MSIL instructions into machine code for the local platform. MSIL is designed to be easily converted to machine code, so a code block consists of a flat list of instructions that generally resembles assembly language. Even complex and highly structured source code is still represented in an assembly as a flat list of MSIL instructions.
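
To make the "flat list" idea concrete, the following sketch walks a byte array that encodes the three instructions of the Main method shown earlier and prints each offset and mnemonic. It is a toy decoder, not a general disassembler: it handles only the ldstr, call, and ret opcodes, the IlListing and Decode names are hypothetical, and the two metadata tokens in the byte array are made-up placeholder values.

```csharp
using System;
using System.Collections.Generic;

static class IlListing
{
    // Decodes only the three opcodes used by the Main method shown earlier.
    public static List<string> Decode(byte[] il)
    {
        var lines = new List<string>();
        int offset = 0;
        while (offset < il.Length)
        {
            byte opcode = il[offset];
            string name;
            int operandSize;
            switch (opcode)
            {
                case 0x72: name = "ldstr"; operandSize = 4; break; // 4-byte string token
                case 0x28: name = "call";  operandSize = 4; break; // 4-byte method token
                case 0x2A: name = "ret";   operandSize = 0; break; // no operand
                default:
                    throw new NotSupportedException(
                        "Opcode 0x" + opcode.ToString("X2") + " is outside this sketch.");
            }
            lines.Add("L_" + offset.ToString("x4") + ": " + name);
            offset += 1 + operandSize; // next instruction follows immediately
        }
        return lines;
    }

    static void Main()
    {
        byte[] il =
        {
            0x72, 0x01, 0x00, 0x00, 0x70, // ldstr, placeholder token 0x70000001
            0x28, 0x01, 0x00, 0x00, 0x0A, // call, placeholder token 0x0A000001
            0x2A                          // ret
        };
        foreach (string line in Decode(il))
            Console.WriteLine(line);
        // Prints:
        // L_0000: ldstr
        // L_0005: call
        // L_000a: ret
    }
}
```

The offsets match the L_ labels in the listing because ldstr and call each occupy five bytes (one opcode byte plus a four-byte token). For a real method, the bytes could be obtained with MethodBase.GetMethodBody().GetILAsByteArray(), although a compiled method may contain opcodes that this sketch does not handle.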

Type References

All assemblies reference types in the .NET core assembly, mscorlib.dll, and most assemblies also reference types in other DLLs. The same type can be exported by multiple DLLs, so a type reference must identify the specific DLL that supplies the type. This is a particular issue with the .NET DLLs, because most systems have several .NET versions installed, so the same type is often available from more than one DLL.
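
The pairing of a type with a specific DLL can be observed through System.Reflection. The following sketch is illustrative only: TypeReferences and Qualify are hypothetical names, and the assembly names it prints depend on the runtime (on newer .NET runtimes the core types may not live in mscorlib.dll).

```csharp
using System;
using System.Reflection;

static class TypeReferences
{
    // Pairs a type's full name with the simple name of the assembly that defines it.
    public static string Qualify(Type t)
    {
        return t.FullName + ", " + t.Assembly.GetName().Name;
    }

    static void Main()
    {
        // The short form built by the helper above.
        Console.WriteLine(Qualify(typeof(Console)));

        // The runtime's own fully qualified form, which also carries the
        // assembly version, culture, and public key token.
        Console.WriteLine(typeof(Console).AssemblyQualifiedName);

        // The assemblies that this program's own assembly references.
        Assembly self = typeof(TypeReferences).Assembly;
        foreach (AssemblyName reference in self.GetReferencedAssemblies())
            Console.WriteLine("References: " + reference.FullName);
    }
}
```

The version in the qualified name is what lets a reference distinguish between copies of the same type shipped with different .NET versions.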

Last edited Jan 21, 2010 at 6:52 PM by Guy_Smith, version 2
