Abstraction Research: Common Intermediate Language

Alexander Morou · May 4, 2008

Well, I'm not sure if there is a forum for this. So I figured it'd go under 'Application Development'. Since what I've been doing lately would be categorized under 'programming research', I figured this would be fine. If it's in the wrong area, let me know.

Lately I've been further familiarizing myself with Common Intermediate Language (CIL for short), which is an integral part of the Common Language Infrastructure as the core language that drives the infrastructure.

Today my research point is arrays, specifically fixed-length literal arrays that you define in code. For example, in C#:

Constant Arrays - Quick Initialization

Code:

byte[, ,] o0 = new byte[,,] { 
                                { 
                                    { 0x01, 0x02, 0x03, 0x1C }, 
                                    { 0x04, 0x05, 0x06, 0x1D }, 
                                    { 0x07, 0x08, 0x09, 0x1E } 
                                }, 
                                { 
                                    { 0x0A, 0x0B, 0x0C, 0x1F }, 
                                    { 0x0D, 0x0E, 0x0F, 0x20 }, 
                                    { 0x10, 0x11, 0x12, 0x21 } 
                                },
                                { 
                                    { 0x13, 0x14, 0x15, 0x22 }, 
                                    { 0x16, 0x17, 0x18, 0x23 }, 
                                    { 0x19, 0x1A, 0x1B, 0x24 } 
                                } 
                            };

The code above creates a multi-dimensional array of 3x3x4 elements, or 36 bytes in length. You might think that C# would simply re-create your array using an array-create call and assign each value individually, but it does one better.

It actually creates a class named <PrivateImplementationDetails>{Version-GUID} (let's call it 'PID' for short). That name is obviously invalid in C#, but valid in CIL provided you escape the name with single quotes. The 'Version-GUID' is a random GUID generated at compile time; an example of the generated class name is: "<PrivateImplementationDetails>{17D2CA44-BFFB-4117-B3DF-49EC5806703D}". As for why it uses a GUID, I can only surmise that it deters someone from reverse-engineering the project and then depending upon the PrivateImplementationDetails of a given build.

If you use a fixed-length blob of data, like above, it generates a nested structure named '__StaticArrayInitTypeSize=x', where 'x' is the number of bytes your data covers. The structure itself is just an empty shell, but here's the interesting part: it is packed per byte and given a fixed size equal to the length of your data. The compiler then creates a static field of that structure type and maps it to a value at a given .data location. For example:

Code:

.field assembly static valuetype  '<PrivateImplementationDetails>{17D2CA44-BFFB-4117-B3DF-49EC5806703D}'/'__StaticArrayInitTypeSize=36' '$$method0x6000001-1' at I_000020D0

The data would be defined separately in the assembly, the IL version of it is:

Code:

.data cil I_000020D0 = bytearray (01 02 03 1C 04 05 06 1D 07 08 09 1E 0A 0B 0C 1F 0D 0E 0F 20 10 11 12 21 13 14 15 22 16 17 18 23 19 1A 1B 24)
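For anyone more comfortable reading C# than CIL, the shell structure can be approximated as below. This is only a sketch: the names StaticArrayInitTypeSize36 and ShellSize are made up for illustration, and it assumes the compiler-generated type behaves like an empty explicit-layout struct with a pack of 1 and a size of 36.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for '__StaticArrayInitTypeSize=36': no fields,
// byte packing, and a fixed size equal to the length of the data.
[StructLayout(LayoutKind.Explicit, Pack = 1, Size = 36)]
struct StaticArrayInitTypeSize36
{
}

static class ShellSize
{
    // Returns the size the layout attribute gives the empty struct.
    public static int InBytes()
    {
        return Marshal.SizeOf(typeof(StaticArrayInitTypeSize36));
    }
}
```

The point is that the type carries no members at all; its only job is to reserve the right number of bytes for the field that sits on top of the .data blob.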

So, if you wanted to write CIL that would show as the C# equivalent in .NET Reflector, here's how you would do it, using the array shown above:

Code:

.method public static hidebysig void main() cil managed
{
	.entrypoint
	.maxstack 3
	.locals init (
		[0] uint8[0...,0...,0...] ja)

	//Load the three dimension lengths (3, 3 and 4) onto the stack.
	ldc.i4.3
	ldc.i4.3
	ldc.i4.4
	//Create a new 3x3x4 array (36 bytes) and push it onto the stack.
	newobj instance void uint8[0...,0...,0...]::.ctor(int32, int32, int32)
	//Duplicate the reference on the stack, since we'll be passing 
	//it to a method that doesn't return a value.
	dup
	//Load the RuntimeFieldHandle of the data blob onto the stack.
	ldtoken field valuetype '<PrivateImplementationDetails>{17D2CA44-BFFB-4117-B3DF-49EC5806703D}'/'__StaticArrayInitTypeSize=36' '<PrivateImplementationDetails>{17D2CA44-BFFB-4117-B3DF-49EC5806703D}'::'$$method0x6000001-1'
	//Initialize the array from the blob.
	call void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array, valuetype [mscorlib]System.RuntimeFieldHandle)
	//Store the array
	stloc.0
	//Return to the caller; a method body has to end with ret.
	ret
}

Naturally, doing this kind of work by hand would be silly, but it can be done. I found out the hard way that you can't just use .NET Reflector to view what's what. The main reason is that .NET Reflector does all the work itself, without using reflection, which means it's prone to 'miss' things. Thankfully ildasm does not. It took a while to find the specifics of how these things are managed, but it's fairly straightforward once you see how it's done.

I think it should be fairly easy to mimic this behavior when I make the CIL Translator. All I would need to do is a simple type-check of the array element type, and the rest is pretty easy. Arrays allow multi-dimensional step-through using an enumerator: just determine the element size, iterate and gather the bytes that way, translate them to hexadecimal, and you're done with the encode.
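A rough sketch of that encode step is below. It is not the translator's actual code: DataBlobEncoder and its label parameter are names I made up, and instead of walking the array with an enumerator it leans on Buffer.BlockCopy, which copies the elements of a primitive-typed array of any rank in the order the runtime stores them.

```csharp
using System;
using System.Text;

static class DataBlobEncoder
{
    // Renders a primitive-typed array (any rank) as a '.data' directive.
    // The label is supplied by the caller; choosing one is out of scope here.
    public static string Encode(Array source, string label)
    {
        // The simple type-check: only primitive element types qualify.
        Type elementType = source.GetType().GetElementType();
        if (!elementType.IsPrimitive)
            throw new ArgumentException("Only primitive element types are supported.");

        // ByteLength already accounts for the element size and every dimension.
        int byteCount = Buffer.ByteLength(source);
        byte[] raw = new byte[byteCount];
        Buffer.BlockCopy(source, 0, raw, 0, byteCount);

        // Write each byte as two uppercase hexadecimal digits.
        StringBuilder sb = new StringBuilder();
        sb.Append(".data cil ").Append(label).Append(" = bytearray (");
        for (int i = 0; i < raw.Length; i++)
        {
            if (i > 0) sb.Append(' ');
            sb.Append(raw[i].ToString("X2"));
        }
        return sb.Append(')').ToString();
    }
}
```

For example, DataBlobEncoder.Encode(new byte[,] { { 1, 2 }, { 3, 28 } }, "I_0000") gives ".data cil I_0000 = bytearray (01 02 03 1C)". For element types wider than a byte, the bytes come out in the machine's own byte order, so that would need a second look before relying on it.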

I'll be posting later on the specifics of Short-circuiting. I have to program the transitory code necessary to handle operator overloads, implicit conversions and so on. Should be 'fun'; however, I plan on streamlining the process so I can use it in more than one area. Namely there's the CIL Code translator and there'll also be the CILTranslator that builds types/methods et cetera using the Dynamic Type Building system introduced recently into .NET (I think version 2.0, but you have to know CIL to use it, so it's not common knowledge).
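To give an idea of why you have to know CIL for this, here is a minimal sketch. It assumes the dynamic building system in question is System.Reflection.Emit, and it uses DynamicMethod for brevity rather than building a whole type; EmitSketch, BuildAllocator and "MakeBuffer" are names invented for the example.

```csharp
using System;
using System.Reflection.Emit;

static class EmitSketch
{
    // Builds a method at run time whose body is: ldc.i4 36 / newarr uint8 / ret.
    public static Func<byte[]> BuildAllocator()
    {
        DynamicMethod method = new DynamicMethod("MakeBuffer", typeof(byte[]), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldc_I4, 36);            // push the element count
        il.Emit(OpCodes.Newarr, typeof(byte));  // allocate a one-dimensional byte array
        il.Emit(OpCodes.Ret);                   // return the array reference
        return (Func<byte[]>)method.CreateDelegate(typeof(Func<byte[]>));
    }
}
```

Calling EmitSketch.BuildAllocator()() hands back a 36-element byte array. Every instruction is emitted by hand, which is exactly the part a translator would automate.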

I'm posting this here, in case anyone is interested.

Alexander Morou · May 4, 2008

Multi-file Assemblies
Multi-file assemblies are probably not a well-known feature of the .NET Framework. The concept is simple: your project compiles not to one target file, but to multiple files, as you specify.

While the idea is simple, tracking something like this is not as easy as it sounds.
Take, for instance, two modules that both cross-link between one another. Were you working on two separate projects and wanted to use classes from one in the other, and vice versa, you would not be able to do this without a multi-stage build and complex use of conditional compilation arguments. Multi-file assemblies are different. You can use Class X, from Module Z, in Module Y. If Module Y has class U, it can be utilized without any problems in Module Z.

The trick to making this work might sound simple, but there are a few annoying quirks:
If you have ModuleA and ModuleB, you must not only use a .module extern ModuleName directive, but also a .file ModuleFileName directive.
ModuleName and ModuleFileName must be the same. So if you define a module as ModuleA, it has to exist in a file named ModuleA, with no extension. If instead you want your sub-modules to be named something like ModuleB.dll, the module itself has to be named ModuleB.dll, and every time you refer to a type in that module you'd use, you guessed it, ModuleB.dll.
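One way a translator could keep the names from drifting apart is to generate every reference from a single string. The sketch below is hypothetical (ModuleReference and its methods are not part of any existing code), and the "[.module ...]" form for a module-scoped type reference is the ILAsm syntax as I understand it, so treat that part as an assumption.

```csharp
using System;

static class ModuleReference
{
    // Returns the two manifest directives for a sub-module, built from
    // one string so the module name and file name are always identical.
    public static string[] Directives(string moduleFileName)
    {
        return new[]
        {
            ".module extern " + moduleFileName,
            ".file " + moduleFileName
        };
    }

    // Qualifies a type name with the module it lives in, using the same string.
    public static string QualifyType(string moduleFileName, string typeName)
    {
        return "[.module " + moduleFileName + "]" + typeName;
    }
}
```

With "ModuleB.dll" as the input, Directives yields ".module extern ModuleB.dll" and ".file ModuleB.dll", and QualifyType("ModuleB.dll", "U") yields "[.module ModuleB.dll]U".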

I spent two hours on this and had nearly given up before I tried matching the module name and the filename as a last-ditch effort. I'm not sure what causes it: .NET Reflector can view the file and understands the relationships properly, but the program won't run unless the .module extern and .file references are exactly the same.

This kind of awareness is necessary, because when I handle the actual translation stage of the OIL framework using the CIL translator, I'll need to have a general idea of what the file names are going to be.

The best guess I can give right now is to have a compiler context that designates the current module, plus a specially transformed version of its name that is used during translation to indicate the name as it should appear in CIL. Alternatively, I could make it so that the first module determines the output filename, and subsequent module names do the same; when they compile, they merely need to specify the output path. There are a few ways, but I'm not sure which would be best right now.
