Enumerating over Generics - rustc_codegen_clr v0.0.3


It has been quite a while since I last wrote about the progress of the project. Writing even a part of a compiler is not an easy feat, so it is still far from finished. But, Rome was not built in a day!

I have made some very substantial improvements, which should enable the compilation of a lot more Rust code that would not work previously.

Some of the work is very visible (support for generics, enums, for loops, dynamic allocation), other things not so much (better testing, optimization). There have also been a number of other small improvements that should allow the project to move forward at a much greater pace. So, let's jump right into what new things 0.0.3 brings.

Counting to 3

Something might seem off to you. If the last version I wrote about was 0.0.1, why on earth is this one called 0.0.3?

Shortly after finishing 0.0.2, as I was preparing to write the next article, something did not sit right with me.

While the new version did bring a lot of new features, this was also the time I realized I made a serious mistake. This was to be expected - this project was created as a silly way of learning more about compilers. Since I started working on it without fully understanding what I was doing, I was bound to make a lot of bad decisions.

The issue was not that I made a mistake. Oh no, I made a MISTAKE.

One that could not be fixed without a lot of effort, and made work going forward way harder than it should be. In theory, I could take the wonderful approach of pretending the issues did not exist, happily building a horrific pile of spaghetti held together only by duct tape and faith. The issue with this approach is that, at some point, I would run out of both duct tape and faith. So, I decided to rewrite the project, from scratch.

Rewriting it in Rust

This may seem like a reaction blown way out of proportion. And normally, I would agree that starting the project all over again is a silly way to refactor.

While this was the biggest roadblock, it was not the only issue (small, minor problems plagued the project).

Those were fixable with a smaller scale refactor, but this one was ingrained pretty deep inside the project.

So, forgive me if I bore you, but I want to take a minute to explain why it was so hard to fix.

Trouble with types

One of the most often used parts of this codebase is the representation of a type. It is used everywhere - from method arguments, through arithmetic, to initializing object fields.

Since it is so key, even the smallest issue with it will cause knock-on effects, requiring changes all over the place. A substantial enough problem will cause an avalanche of trouble.

At the beginning of the project, I had decided to use one representation of a type for everything. This seems entirely reasonable - a type is a type, right?

So, I would just convert the types I got from the frontend of rustc, and then use one universal representation everywhere. This was quite convenient, and seemed simple at the time.

Well, it turns out there is a bit more to types than meets the eye. There is no "One type representation to rule them all". While all the data I wanted to represent was indeed a "type", it does not mean all places I used it in had the same needs. Getting a field of a struct needs information, such as field names, that is not needed at all in something like a function signature.

I now use 3 different type representations, each with its own well-defined role.

  1. TypeDef - This is the representation of a non-primitive .NET type. Types, such as Rust's Option, have one definition, but are used in many places. Previously, I would copy type data for every place the type was used. So, an instruction storing a field would carry not only the information about the particular field it referenced, but also the information about every other field of that type. Separating type definitions from type uses both saves memory and improves performance. It also made things far clearer and easier to read.
  2. Type - This represents either a primitive .NET type (float, native int, UInt128) or a reference to a .NET type. For a reference, it stores only the data needed to look up the type. It does not store info about nested types, fields, all that other stuff - this is the job of a TypeDef.
  3. Rustc's Ty - I have decided to stop trying to reinvent the wheel and use the type representation the smart compiler people gave me. One of the disadvantages of this type is that it is not supposed to leave the codegen unit it lives in. A codegen unit does not exactly correspond to a module or a library; it is a small, self-contained part of the compiled code. There can be a lot of codegen units, and you need to somehow collect all of their results and create the final executable. This is why I initially avoided using Ty too much: I needed something that could cross codegen units - and this type can't. By separating the types I use into TypeDef, Type and the built-in Ty, I only needed to ensure my 2 custom representations could be saved to disk. This then allowed me to just use Ty in far more places - and convert to a serializable format just before saving the results to disk.

By doing things this way, I avoid handling a lot of the complexity myself - and since Ty is quite well optimized, it improved performance too.
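To make the split a bit more concrete, here is a heavily simplified sketch of the idea (the names and fields are hypothetical - the real definitions in the codebase carry far more information):

/// Simplified sketch: the full definition of a non-primitive .NET type.
/// There is exactly one of these per type, no matter how often it is used.
struct TypeDef {
    name: String,
    /// Field names and types - needed when accessing fields,
    /// but irrelevant in e.g. a function signature.
    fields: Vec<(String, Type)>,
    // ... explicit offsets, generic parameter count, nested types, etc.
}

/// Simplified sketch: either a primitive .NET type or a cheap reference
/// to a TypeDef - just enough data to look the definition up.
enum Type {
    F32,
    ISize,
    U128,
    Ref {
        name: String,
        generic_args: Vec<Type>,
    },
}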

To put things into perspective - the 3 representations are used around 400–500 times in the whole codebase. When refactoring, I would need to take each of them into account, and choose the right one depending on the circumstances. This would also force me to bin quite a bit of code, since it was built with the old representation in mind.

So, fixing the old code would require a substantial amount of effort, and I would still be left with all the other issues. My decision to rewrite everything may seem far more reasonable now. In the end, it turned out to not take all that long (2 days for a bare-bones skeleton and around 5 for roughly feature parity).

I am quite happy with the results. The codebase should be much cleaner now, although some things still are not fully supported (e.g. not all type casts work, since I discovered a flaw in the previous implementation).

Originally, I wanted to write a bit about the other issues, but they were a bit boring. I don't want to pretend like everything I write just works, so I thought mentioning the underlying issues behind the rewrite could be a nice thing to share.

I had also considered naming this release 0.1.0 (since it was a pretty big change), but I ultimately decided against it. I feel like using 0.1.0 would suggest the project is far further along than it is. I want 0.1.0 to be the version that can be used more widely, and can compile a lot of Rust code without any changes.

Generics

One other big advantage of this rewrite is that I now got to implement generics the proper way. One of my goals is keeping a high-quality translation when I can, without sacrificing performance. This is why I decided to implement Rust generic types as C# generics. NOTE: this does not mean you can instantiate new variants of generics from C#. None of the translated Rust functions are generic. Types within the translated Rust code have their generic parameters fully specified at compile time.

C# generics are only used to make the resulting code more readable and easier to use. A rust function with this signature:

fn distance(point: Vec3<f32>, destination: Vec3<f32>) -> f32

Will get translated to a function like this in C#:

float distance(Vec3<float> P_0,Vec3<float> P_1){
    //Code here
}

You can't, however, call this function with Vec3<int>. This may seem obvious to most, but I want to clarify what the project can, and can't, do.

Generics are also used in type definitions. This is what the translated version of the Rust core::ops::Range type looks like:

[StructLayout(3)]
public struct Range<G0>
{
	public G0 start;

	public G0 end;
}

Generic arguments are declared as Gn, where n is the index of the generic argument. There are a couple of advantages to this approach. For one, Rust MIR does not store info about the name of a generic argument - it stores them as indices. When substitution (turning a generic type into its variant) happens, this indexed generic is replaced with the type at position n within the substitution array. Storing them this way not only saves memory, but makes working with generics easier.
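To illustrate how index-based generics and substitution interact, here is a tiny, self-contained sketch (hypothetical types, not the codegen's actual ones):

// Hypothetical, minimal model of index-based generic parameters.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    F32,
    I64,
    /// `Generic(0)` corresponds to G0, `Generic(1)` to G1, and so on.
    Generic(u32),
}

/// Substitution: replace each `Generic(n)` with the type at index `n`
/// of the substitution array.
fn substitute(ty: &Ty, subst: &[Ty]) -> Ty {
    match ty {
        Ty::Generic(n) => subst[*n as usize].clone(),
        other => other.clone(),
    }
}

fn main() {
    // Range<G0> instantiated as Range<i64>: the substitution array is [I64],
    // so the field type `Generic(0)` becomes `I64`.
    let field_ty = Ty::Generic(0);
    assert_eq!(substitute(&field_ty, &[Ty::I64]), Ty::I64);
}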

Generics open a whole world of possibilities. They are everywhere in Rust. Even a "simple" for loop uses them (every iterator in Rust returns the generic type Option<T> from its next method).

Rust enums are the other piece needed to bring a large chunk of Rust code to .NET.

Anatomy of an enum

Rust enums may look deceptively hard to implement, but they turned out to be relatively straightforward. A Rust enum consists of 2 parts: a "tag" describing which variant of the enum this particular instance holds, and additional data specific to that variant. They are very similar to F#'s Discriminated Unions (the main difference is in implementation details). For people familiar only with C#, I can say that the closest thing resembling a Rust enum is a polymorphic class with only a fixed number of child classes.

This is roughly what the Rust Option<T> enum looks like:

enum Option<T> {
    None,
    Some(T),
}

And here is how the codegen translated it:

public struct Option<G0>
{
	[StructLayout(3)]
	public struct None
	{
	}
	[StructLayout(3)]
	public struct Some
	{
		public G0 m_0;
	}
	[FieldOffset(0)]
	public byte _tag;
	[FieldOffset(1)]
	public None v_None;
	[FieldOffset(1)]
	public Some v_Some;
}

As you can see, every variant has its own subtype. This makes a lot of work related to enums very easy. The layout of the enum type itself is explicit: it consists of a tag, starting at the very beginning of this data structure, and all the data specific to variants, which starts just after the tag ends. Getting the data of a particular variant is quite easy:

// If this is the `Some` variant
if (option._tag == 1)
{
    //Get the data stored in it.
    long value = option.v_Some.m_0;
}

Setting the enum variant is equally trivial:

//Set data
option.v_Some.m_0 = value;
//Set tag
option._tag = 1;

The v_ prefix before the variant field name is there to make the decompiler happy, and to simplify interop. The runtime had no problems with a field name and a type name being the same, but referring to those variants from C# would be impossible without it.
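For reference, here is a small, hypothetical Rust snippet showing roughly which source code lowers to the two patterns above (constructing a variant and checking the tag):

fn demo(value: i64) -> i64 {
    // Constructing a variant: set the payload (`v_Some.m_0`), then the tag.
    let option = Some(value);
    // Matching: check the tag, then read the payload of the matching variant.
    match option {
        Some(v) => v, // `_tag == 1`, read `v_Some.m_0`
        None => 0,    // `_tag == 0`
    }
}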

Overall, enums turned out to be very simple. Mixing enums with generics provided a bit more of a challenge, with the rules about field definitions being a bit confusing. Accessing a field like this:

stfld int32 valuetype core.option.Option/Some<int32>::m_0

turned out to be invalid, since accessing generic fields requires using generic parameters.

stfld !0 valuetype core.option.Option/Some<int32>::m_0

Those, and other kinds of nasty surprises, took a couple of days to fully iron out, but all enums should "just work" now.

Formidable for loops

For loops in Rust look deceptively simple. What if I told you this is a Rust for loop, turned into C#?

	long end = 90L;
	global::System.Runtime.CompilerServices.Unsafe.SkipInit(out core.ops.Range<long> range3);
	range3.start = 65L;
	range3.end = end;
	core.ops.Range<long> range4 = range3;
	global::System.Runtime.CompilerServices.Unsafe.SkipInit(out core.option.Option<long> option3);
	while (true)
	{
		long start3 = range4.start;
		long end4 = range4.end;
		if (start3 < end4)
		{
			long start4;
			long start5 = core_iter_range_Step_forward_unchecked(start4 = range4.start, (nuint)1);
			range4.start = start5;
			option3.v_Some.m_0 = start4;
			option3._tag = 1;
		}
		else
		{
			option3._tag = 0;
		}
		byte tag = option3._tag;
		nint num2 = (int)tag;
		if (tag != 0 && num2 == 1)
		{
			long v = option3.v_Some.m_0;
			long num3 = 0xA00 | v;
			puts((byte*)(&num3));
			continue;
		}
		break;
	}

This looks absolutely ridiculous! Admittedly, part of this madness comes from my imperfect attempts at optimization, and the decompiler getting very confused. Another thing to consider is that Rust for loops are more like C#'s foreach, which too can look a tad silly when you look at the IL. After a bit of cleanup, the loop should look far clearer. I will first show what the loop looked like in Rust, and then explain this bizarre sight.

for i in 65..black_box(90){
    let msg = (0x00_0a_00_i64)| i;
    unsafe{puts(core::ptr::addr_of!(msg).cast())};
}

This Rust loop is quite weird too. The black_box is here to prevent a certain compiler optimization which the codegen does not know how to deal with yet. The msg is just an ASCII C-style string. Written out as the integer literal, it looks like "\0\n" followed by the value of i as an ASCII character. Because almost all modern processors are little-endian, the order of this string hidden inside an integer is inverted. In memory, it is the string "{i}\n\0" - with \n marking a new line, and \0 marking the end of the string. This whole loop just prints the characters from A to Z, using as few Rust features as possible.
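To double-check the byte layout, here is a tiny sketch (a hypothetical snippet, assuming a little-endian target) of what actually ends up in memory:

fn main() {
    let i = b'A' as i64; // 65
    let msg = 0x00_0a_00_i64 | i;
    // On a little-endian target, the lowest byte comes first in memory.
    let bytes = msg.to_le_bytes();
    assert_eq!(bytes[0], b'A');  // the character itself
    assert_eq!(bytes[1], b'\n'); // the newline
    assert_eq!(bytes[2], 0);     // the NUL terminator `puts` stops at
    // A pointer to `msg`, passed to `puts`, is therefore seen as the C string "A\n".
}

With the byte layout out of the way, let's take a look at the cleaned-up C# translation.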

//Variable declaration
core.ops.Range<long> range = default(core.ops.Range<long>);
core.option.Option<long> option = default(core.option.Option<long>);
// Variable initialization
range.start = 65L;
range.end = 90L;

while (true)
{
	// Check if you are within range
	if (range.start < range.end)
	{
		long next = core_iter_range_Step_forward_unchecked(range.start, (nuint)1);
		range.start = next;
		// Set option to variant `Some(next)`
		option.v_Some.m_0 = next;
		option._tag = 1;
	}
	else
	{
		//Set option to variant `None`
		option._tag = 0;
	}
	//If option is Some, print the letter of the alphabet and continue; if not, end the loop.
	if (option._tag == 1)
	{
		long num3 = 0xA00 | option.v_Some.m_0;
		puts((byte*)(&num3));
		continue;
	}
	break;
}

I have removed all the noise coming from the decompiler. While this still is not fully clear, it is far easier to understand.

Optimizing for loops

The amount of effort necessary to explain this to a human makes you wonder how confused the poor JIT must be getting. I sadly can't provide the native instructions it generates (the tool I use for getting the JIT output seems to have some issues), or exact timings, but it is pretty safe to assume that a loop consisting of 100 IL instructions will be harder to optimize than one consisting of 10.

You may be wondering why the CIL generated by rustc_codegen_clr uses so many IL instructions. There is a good reason for that: I want the CIL to be as accurate as possible. At the first stage, I transform each Rust MIR statement into a sequence of instructions that match it exactly. This, of course, leads to CIL being a bit more complicated than it needs to be. If the Rust MIR sets a local variable, and then immediately reads its value, you will be able to see that in the initial CIL. Initial is the key word here. After that, the optimization module performs a set of small, incremental changes to the CIL. Each one of those micro-optimizations does very little on its own - but combined, they slash the instruction count in half.

This approach ensures accuracy - since each of those small transformations does not change the behavior of the compiled program. It also makes debugging easier - since I can look at the CIL generated for each Rust statement.
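As a rough illustration of what one of those micro-optimizations might look like, here is a toy peephole pass (hypothetical, heavily simplified instruction set - not the codegen's actual internal types) that rewrites a store immediately followed by a load of the same local:

// Toy, CIL-like instruction set - for illustration only.
#[derive(Clone, Debug, PartialEq)]
enum CILOp {
    LdcI64(i64),
    StLoc(u32),
    LdLoc(u32),
    Dup,
    Ret,
}

/// Peephole pass: `stloc N; ldloc N` stores a value and immediately reloads it.
/// `dup; stloc N` leaves the same value on the stack and in the local, so the
/// behavior is unchanged - and later passes can often remove the `stloc`
/// entirely if the local is never read again.
fn fuse_store_reload(ops: &[CILOp]) -> Vec<CILOp> {
    let mut out = Vec::with_capacity(ops.len());
    let mut i = 0;
    while i < ops.len() {
        match (&ops[i], ops.get(i + 1)) {
            (CILOp::StLoc(a), Some(CILOp::LdLoc(b))) if a == b => {
                out.push(CILOp::Dup);
                out.push(CILOp::StLoc(*a));
                i += 2;
            }
            (op, _) => {
                out.push(op.clone());
                i += 1;
            }
        }
    }
    out
}

fn main() {
    let before = vec![CILOp::LdcI64(65), CILOp::StLoc(0), CILOp::LdLoc(0), CILOp::Ret];
    let after = fuse_store_reload(&before);
    assert_eq!(after, vec![CILOp::LdcI64(65), CILOp::Dup, CILOp::StLoc(0), CILOp::Ret]);
}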

So, what more can we do about for loops being a bit inefficient?

Struct splitting

NOTE: I am describing a planned, WIP feature that does not work yet. I am writing about it because I find it interesting from a technical POV. You can skip this chapter if you want.

Let's imagine we are refactoring this piece of code. One thing that we may immediately do is get rid of the Range and Option structs. They are not passed anywhere, and we only read/set the values of their fields. So, let's change all fields to variables.

//Variable declaration
long v_Some = default(long);
byte _tag = default(byte);
// Variable initialization
long current = 65L;
long end = 90L;

while (true)
{
	// Check if you are within range
	if (current < end)
	{
		long next = core_iter_range_Step_forward_unchecked(current, (nuint)1);
		current = next;
		// Set option to variant `Some(next)`
		v_Some = next;
		_tag = 1;
	}
	else
	{
		//Set option to variant `None`
		_tag = 0;
	}
	//If option is Some, print the letter of the alphabet and continue; if not, end the loop.
	if (_tag == 1)
	{
		long num3 = 0xA00 | v_Some;
		puts((byte*)(&num3));
		continue;
	}
	break;
}

This not only makes the code easier to look at - it will also make reasoning about those variables easier. This should also make the way our code works easier for the JIT to understand, since it removes a lot of needless instructions. But how do we do something like this automatically?

Detecting if a struct can be split

Not all structs can be treated this way. Remember what I said? They are not passed anywhere - those structs live only as local variables. We can't (or at least we should not) do things like this with structs that either come from outside or leave the function. Doing this would require deconstructing/reconstructing the struct, which would balloon the size of the CIL, and likely lead to worse performance. We need to detect local structs which have only their fields accessed. How can we do that?

Local variables in CIL can be accessed using 3 instructions: stloc, ldloc, and ldloca. If a local is set by stloc or read by ldloc, we can immediately discard it as non-splittable: it is directly read from or written into as a whole. Accessing fields involves the combination of ldloca (LoaD LOCal Address) and ldfld, stfld or ldflda. If we see a local accessed only using ldloca, with each such access followed by one of the field-related instructions, we know we have a splittable struct. We can then turn all of its fields into local variables, replace combos of ldloca and field access instructions with reads/sets of the locals representing each field, and voilà! We have transformed a struct into variables, reducing the amount of CIL we emit, and enabled further optimizations.
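A sketch of how such a check might look (hypothetical and heavily simplified - real CIL has many more instructions, and the codegen's actual representation is richer):

// Toy instruction set, modelling only the opcodes discussed above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    StLoc(u32),
    LdLoc(u32),
    LdLocA(u32),
    LdFld,
    StFld,
    LdFldA,
}

fn is_field_access(op: Op) -> bool {
    matches!(op, Op::LdFld | Op::StFld | Op::LdFldA)
}

/// A local is splittable only if it is never read or written as a whole
/// (`ldloc`/`stloc`), and every `ldloca` of it is immediately followed
/// by a field access.
fn can_split(local: u32, ops: &[Op]) -> bool {
    for (i, op) in ops.iter().enumerate() {
        match *op {
            Op::LdLoc(l) | Op::StLoc(l) if l == local => return false,
            Op::LdLocA(l) if l == local => {
                if !ops.get(i + 1).copied().map_or(false, is_field_access) {
                    return false;
                }
            }
            _ => {}
        }
    }
    true
}

fn main() {
    // Only field accesses through `ldloca` - the local can be split.
    assert!(can_split(0, &[Op::LdLocA(0), Op::StFld, Op::LdLocA(0), Op::LdFld]));
    // A whole-struct read (`ldloc`) disqualifies it.
    assert!(!can_split(0, &[Op::LdLoc(0), Op::StLoc(1)]));
}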

Why is this not fully implemented already?

This would take some time to implement, and will not work efficiently without another small optimization.

Currently, "moving" of a Rust type is implemented as coping its value. This is accurate, and works as MIR specifies. There is an unfortunate side effect to this - a lot of those copies are needless. This is not a performance problem, but the codegen optimizer sees those as loading/storing a value into a local. Until situations in which a move can be removed are properly detected, this optimization won't be applied in a lot of cases. So, while I work on more features and finish figuring move optimizations out, automatic splitting structs is on the back burner.

Malloc, Realloc, and Freends.

This is probably the smallest feature - the C functions malloc, realloc and free are implemented, and you can use them to dynamically allocate memory. This is useful for tests - I used it to partially re-implement the Rust Vec type for testing purposes. You will not get the full benefits of the borrow checker yet - since drops are currently ignored.
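For illustration, here is a small, hypothetical snippet of the kind of code this enables (error handling omitted; not one of the project's actual tests):

use core::ffi::c_void;

extern "C" {
    fn malloc(size: usize) -> *mut c_void;
    fn realloc(ptr: *mut c_void, size: usize) -> *mut c_void;
    fn free(ptr: *mut c_void);
}

fn main() {
    unsafe {
        // Allocate room for 4 i32s, then grow the buffer to 8.
        let mut buf = malloc(4 * core::mem::size_of::<i32>()) as *mut i32;
        buf = realloc(buf as *mut c_void, 8 * core::mem::size_of::<i32>()) as *mut i32;
        for i in 0..8 {
            *buf.add(i) = i as i32;
        }
        assert_eq!(*buf.add(7), 7);
        free(buf as *mut c_void);
    }
}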

Testing

While testing may not be the most exciting thing in the world, it is still quite important. This is why cargo test will now also use the codegen to build a few simple programs, and then run them with both Mono and dotnet to ensure they work exactly as expected. The tests are assertion-based, so there is still a bit of room for a mistake slipping through, but it is unlikely. I would be shocked if a bug led to assertions not triggering and was not detected any other way. Still, I am working on tests sending the results of calculations over standard output, and the runner verifying those results. This is far more involved, but has the advantage of not relying on the tests "checking themselves", thus improving the test quality.
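For a rough idea of what those tests look like, here is a hypothetical example of such a self-checking program (not one of the actual test files):

fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    // Assertion-based: if the translated arithmetic is wrong, the assertion
    // fails, the program aborts, and the test run reports a failure -
    // both under Mono and under dotnet.
    assert_eq!(add(2, 3), 5);
    assert_eq!(add(-7, 7), 0);
}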

Thanking contributors

It is always nice to see when someone thinks your project is worth contributing their time and effort to. So, I would like to thank people who helped with the development of rustc_codegen_clr:

SasakiSaki - who contributed a small fix allowing the codegen to run on Windows

karashiiro - who reported a bug with the codegen not playing nicely with new versions of ilasm and helped in fixing it, also contributing a fix to dotnet version detection.

I would also like to thank the people over on the Rust Zulip for answering my questions with both incredible speed and a lot of care.

I would like to thank bjorn3 in particular, whose in-depth yet straightforward answers helped tremendously.

I hope you enjoyed this article! It is a bit shorter / less polished than I would like, but my time is seriously limited. If you liked this article, you can take a look at the GitHub project, follow me on my other social media (linked up top), or subscribe to the RSS feed.