Mixing C# and Rust - Interop


This is the 3rd part of my series about writing a Rust compiler backend targeting .NET.

In this article, I am going to first describe a rough draft version (0.0.0) of the interop framework - mycorrhiza. After that, I plan on getting into how interop works under the hood, showing some examples which skip the abstraction layer and use raw compiler intrinsics. This will let me explain how exactly the backend handles concepts not present in Rust - such as virtual methods and GC-managed objects.

mycorrhiza

mycorrhiza is a crate (Rust library) facilitating easy Rust/.NET interop. My goal is to build an abstraction layer which hides the fact that the code is running inside the .NET runtime. I want code using mycorrhiza to look as much like "standard" Rust code as possible. Ideally, I would like it to be possible to implement the exact same API using something like the Mono embedding API. In simpler terms, I don't want mycorrhiza to look like some kind of "magic". I don't want to create another language. I want to use fully standard Rust syntax to express .NET-specific concepts. There shouldn't be a Managed or Iron Rust - just Rust compiled for .NET.

This approach is not without its own difficulties. Not all .NET concepts map nicely to Rust, and vice versa. But I believe the benefits of this more moderate approach will outweigh its disadvantages in the end.

Hello world using mycorrhiza

So, what does interop look like?

    // Construct a new StringBuilder
    let sb = mycorrhiza::system::text::StringBuilder::empty();
    // Append the characters which make up the `Hello World!` string
    sb.append_char('H');
    sb.append_char('e');
    sb.append_char('l');
    sb.append_char('l');
    sb.append_char('o');
    sb.append_char(' ');
    sb.append_char('W');
    sb.append_char('o');
    sb.append_char('r');
    sb.append_char('l');
    sb.append_char('d');
    sb.append_char('!');
    // Get the .NET (managed) String from the StringBuilder
    let mstr = sb.to_mstring();
    // Call Console.WriteLine(mstr)
    mycorrhiza::system::console::Console::writeln_string(mstr);

I would say that it is pretty seamless, while still very much looking like Rust. It is not perfect (function overloading in Rust is a bit tricky), but it is already quite close to what my ideal API would look like.

Here is the equivalent C# code for comparison:

    var sb = new System.Text.StringBuilder();
    sb.Append('H');
    sb.Append('e');
    // And so on..
    sb.Append('d');
    sb.Append('!');
    var mstring = sb.ToString();
    Console.WriteLine(mstring);

The CIL generated by the codegen is a little bit different, since it also converts Rust's UTF-8 characters to .NET's UTF-16 ones, but this is done in Rust code within mycorrhiza, not on the compiler side. In the future, you will be able to use Rust crates to deal with UTF-16 directly, but we are still a bit off from that.
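
As an aside on what that conversion involves: a Rust char is a single Unicode scalar value, while .NET strings are made of UTF-16 code units, so a character outside the Basic Multilingual Plane has to be split into a surrogate pair. Here is a minimal sketch of that step - illustrative only, not mycorrhiza's actual code:

    fn to_utf16_units(c: char) -> Vec<u16> {
        // `encode_utf16` writes one or two UTF-16 code units into the buffer.
        let mut buf = [0u16; 2];
        c.encode_utf16(&mut buf).to_vec()
    }

    fn main() {
        assert_eq!(to_utf16_units('H'), vec![0x0048]); // one code unit
        assert_eq!(to_utf16_units('🦀'), vec![0xD83E, 0xDD80]); // surrogate pair
    }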

This is as good a place to mention it as any, so: the codegen now works with cargo! You can now use Rust crates, and the codegen will link them appropriately. Granted, most of them don't compile yet, but all the linking infrastructure is now there. I now plan on testing and getting more of them running, but this is a task for the future. Because of that, I'd appreciate it if you suggested some simple no-std, no-alloc, no-build.rs crates for testing purposes.

Pulling back the curtain a little bit

You are probably wondering how all of this works. How does Rust know about C# types? What you saw above is a full Rust wrapper around StringBuilder. What if you wanted to use some other type?

I will demonstrate how to write such a wrapper using System.Diagnostics.Stopwatch as an example:

    // Specify the name of the assembly a class resides in, and then its full path.
    pub type Stopwatch = mycorrhiza::intrinsics::RustcCLRInteropManagedClass<
        "System.Runtime",
        "System.Diagnostics.Stopwatch",
    >;
    impl Stopwatch {
        #[inline(always)]
        pub fn new() -> Self {
            // `new` just calls the constructor with 0 arguments
            Self::ctor0()
        }
        #[inline(always)]
        pub fn start(self) {
            // Call an instance method with 0 arguments using instance0
            Self::instance0::<"Start", ()>(self)
        }
        #[inline(always)]
        pub fn stop(self) {
            Self::instance0::<"Stop", ()>(self)
        }
        #[inline(always)]
        pub fn reset(self) {
            Self::instance0::<"Reset", ()>(self)
        }
        #[inline(always)]
        pub fn restart(self) {
            Self::instance0::<"Restart", ()>(self)
        }
        #[inline(always)]
        pub fn elapsed_milliseconds(self) -> i64 {
            // Call the virtual method `get_ElapsedMilliseconds` to read the `ElapsedMilliseconds` property through its accessor.
            Self::virt0::<"get_ElapsedMilliseconds", i64>(self)
        }
    }

As you can see, specifying a .NET type is almost trivial: we just need to specify which assembly it is in (System.Runtime) and its full path (class name and all namespaces). By creating a type alias, we can now refer to it simply as Stopwatch, preventing potential issues. It is important to get the name of the type right, since .NET can be a bit surprising in that department. Support for generic types is not quite there yet, but it is planned.
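
With the wrapper in place, using the managed Stopwatch from Rust might look roughly like this (a sketch based on the API above; I am assuming the managed class handle is Copy, since the methods take self by value):

    let timer = Stopwatch::new();
    timer.start();
    // ... the code being measured goes here ...
    timer.stop();
    // Read the `ElapsedMilliseconds` property through the accessor wrapper.
    let _elapsed: i64 = timer.elapsed_milliseconds();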

For methods, you call them using instance*n*, static*n* and virtual*n* respectively, where n is the number of arguments a given method takes. You then specify the method name and the types of the n arguments, followed by the return type - in this case (), Rust's unit type (the equivalent of void). This works similarly for constructors, with the difference of not specifying the return type, and the name being ctor.
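
For example, the Console::writeln_string wrapper used in the hello world snippet presumably boils down to a one-argument static call along these lines. This is a sketch, not mycorrhiza's actual source: the MString type name and the exact ordering of static1's generic parameters are my guesses based on the instance0 pattern above.

    // Hypothetical sketch - not mycorrhiza's actual implementation. The generic
    // parameter order <method name, argument type, return type> and the
    // `MString` managed-string type name are assumptions on my part.
    #[inline(always)]
    pub fn writeln_string(s: MString) {
        Self::static1::<"WriteLine", MString, ()>(s)
    }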

Writing wrappers such as this one is important! The codegen has no way to check whether a C# type or method exists at compile time, so if you mess this up, you might get a NotFoundException at runtime. This is not ideal, but without the ability to open assemblies, there is not much more we can do.

#[inline(always)] just slightly improves performance and cleans up the generated code.

This is the deepest layer of abstraction that almost anyone will have to deal with - but there is a bit more going on than what I have shown.

Scary depths

One question still remains - how on earth does the compiler know that instance0 should be changed to call a managed function? Or that ctor0 should be replaced with a newobj instruction? Let's look at the implementation of instance0:

    #[inline(always)]
    pub fn instance0<const METHOD: &'static str, Ret>(self) -> Ret {
        rustc_clr_interop_managed_call1_::<ASSEMBLY, CLASS_PATH, false, METHOD, false, Ret, Self>(
            self,
        )
    }

rustc_clr_interop_managed_call1_ and its friends are the key to all of this. This is where all the magic happens. When the codegen is handling calls and sees that a function's name contains the phrase rustc_clr_interop_managed_call, it recognizes that this function is special - and changes the way it is handled. The definition of rustc_clr_interop_managed_call1_ looks like this:

    pub fn rustc_clr_interop_managed_call1_<
        const ASSEMBLY: &'static str,
        const CLASS_PATH: &'static str,
        const IS_VALUETYPE: bool,
        const METHOD: &'static str,
        const IS_STATIC: bool,
        Ret,
        Arg1,
    >(arg1: Arg1) -> Ret

The codegen checks the function's generic arguments (string literals and booleans can be generic arguments in Rust), and uses them to produce a CallSite - an internal representation which tells the codegen what to call. And this is really all there is to it - the compiler sees a function with a special name, and replaces it with a call to a managed method.

Almost exactly the same thing happens to ctor0 - with the difference being that it is converted to a newobj instruction.
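
To make that a bit more concrete, here is a toy sketch of the kind of decision the codegen makes when lowering a call. None of the type or field names below come from the real codegen, and the constructor marker is my guess - this is purely an illustration of the idea:

    // Toy illustration only - the real codegen's names and data structures differ.
    // The idea: a function whose name contains the special marker is not emitted
    // as an ordinary call; its const generic arguments are decoded into a
    // description of the managed method (or constructor) to invoke.
    struct CallSite {
        assembly: String,
        class_path: String,
        is_valuetype: bool,
        method: String,
        is_static: bool,
    }

    enum LoweredCall {
        /// A plain Rust function - emit an ordinary call.
        Ordinary(String),
        /// A managed method - emit `call`/`callvirt` using the CallSite.
        Managed(CallSite),
        /// A managed constructor - emit a `newobj` instruction instead.
        Ctor(CallSite),
    }

    fn lower_call(fn_name: &str, strs: &[&str], bools: &[bool]) -> LoweredCall {
        if !fn_name.contains("rustc_clr_interop_managed") {
            return LoweredCall::Ordinary(fn_name.to_string());
        }
        let site = CallSite {
            assembly: strs[0].to_string(),
            class_path: strs[1].to_string(),
            is_valuetype: bools[0],
            method: strs[2].to_string(),
            is_static: bools[1],
        };
        // The "ctor" marker here is hypothetical; the post only names the
        // `rustc_clr_interop_managed_call*` family.
        if fn_name.contains("ctor") {
            LoweredCall::Ctor(site)
        } else {
            LoweredCall::Managed(site)
        }
    }

    fn main() {
        // Roughly what the Stopwatch::start call from earlier would decode to.
        let _lowered = lower_call(
            "rustc_clr_interop_managed_call1_",
            &["System.Runtime", "System.Diagnostics.Stopwatch", "Start"],
            &[false, false],
        );
    }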

This is the way I plan on implementing some other .NET features (field access, managed array indexing) - with magic functions hidden and abstracted away within mycorrhiza.

Some unresolved questions

There are still some iffy things and problems without good solutions. One of them is enforcing the constraints on managed references. Managed references can't be stored on the unmanaged heap - doing so is UB. This means that in Rust, they should not leave the stack. They can be copied and moved around the stack - but they should never leave it. This concept is not easy to model in Rust. Lifetimes only control how long a piece of data lives - not where exactly it is. Pinning enforces that something stays in one place - but it can't be mixed with Clone and Copy, since those enable you to move the data off the stack. My current solution would be for the codegen to enforce those constraints by performing checks while compiling - but that is not really its job. It can work, but it will be quite clunky. If you have any ideas - let me know.
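
To illustrate the problem, here is a minimal sketch, assuming a hypothetical ManagedRef type standing in for a GC-managed reference (it is not a real mycorrhiza type). Nothing in Rust's type system stops the Box::new below, even though it moves the reference onto the unmanaged heap:

    // `ManagedRef` is a hypothetical stand-in for a GC-managed reference,
    // not an actual mycorrhiza type.
    #[derive(Clone, Copy)]
    struct ManagedRef {
        handle: usize, // imagine this is a pointer the GC knows about
    }

    fn escape(r: ManagedRef) -> Box<ManagedRef> {
        // Perfectly legal Rust: copy the reference into a Box. But the Box lives
        // on the unmanaged heap, where the GC can no longer track or update the
        // reference - exactly the situation that has to be ruled out, and nothing
        // in the type system (lifetimes, Copy, even Pin) prevents it.
        Box::new(r)
    }

    fn main() {
        let r = ManagedRef { handle: 0 };
        let _escaped = escape(r);
    }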

Another option is to enforce the use of reference handles - but that has a fair bit of overhead.

First benchmarks!

By adding support for basic interop, I managed to get some very rough benchmarks. You should not take them at face value, since my methodology is far from perfect. They were quite consistent, but I did not account for the JIT optimizing things in longer tests - leading to some bizarre results.

The test was designed to give C# the best footing - I only wanted to see how much of a performance penalty I incur by not optimizing the generated Rust CIL as much as I should. Because of that, the tests don't involve allocating any memory at all, and focus only on the speed of raw computation.
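
For reference, the recursive Fibonacci used in a benchmark like this is presumably something along these lines (my reconstruction, not the exact test code):

    // Naive recursive Fibonacci - deliberately allocation-free, so the benchmark
    // measures raw call and arithmetic performance only.
    fn fib(n: u32) -> u32 {
        if n < 2 {
            n
        } else {
            fib(n - 1) + fib(n - 2)
        }
    }

    fn main() {
        assert_eq!(fib(10), 55);
    }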

For test setup details, check the GitHub repo.

Codegen Optimizations Disabled means that the Rust code was compiled in release mode, but post-MIR, codegen-internal optimizations were disabled.

Fibonacci of 10, recursive, Avg of 10K runs

Rust native (release): 100 ns

Rust native (debug): 360 ns

Rust .NET (default optimizations): 270 ns

Rust .NET (codegen optimizations disabled): 330 ns

C# release (pure IL): 250 ns

C# debug (pure IL): 370 ns

As you can see, the difference between optimized C# and optimized .NET Rust code is not all that big. It is noticeable (~10%), but I would say it is a pretty good result considering how few optimizations are done right now. The results also show that with optimizations disabled, the generated code is roughly in the same ballpark for all compilation methods. This also shows what the project is not - it does not bring Rust-level raw performance to .NET. I am looking into potentially using the LLVM backend to generate native instructions to then store in a mixed-mode assembly, but that is something I will only start seriously working on once the project is mature - a couple of years from now, at best.

So, let's now look at the "wrong" test results - the ones the JIT(?) decided to mess up:

Fibonacci of 10, recursive, Avg of 100M runs

Rust native (release) : 107 ns

Rust .NET (default optimizations) : 252.3 ns

C# release (pure IL) : 281.66 ns

C# somehow got slower as time went on (I have no idea how that could happen), while .NET Rust got a small boost. This is not my PC throttling - I repeated the tests, switching between C# and Rust so that they ran back to back, one after the other. The order did not matter - the results were nothing if not consistent. Which is an issue - because while .NET Rust getting faster due to the JIT optimizing it better seems reasonable, I would not expect C# to get slower. The simplest explanation is that I messed up the tests. I am using the simplest benchmark method possible (since interop is not yet good enough for anything better) - not the most accurate one.

Nonetheless, I believe those results are good enough to carefully guesstimate that the performance of the generated Rust code is good enough not to warrant further optimizations just yet.

Conclusion

While we are still a bit off from 0.0.3, I think the project is coming along rather nicely. While school and a gang of 17th-century poets seem to be trying to take up as much of my time as possible, I got a bit of help from a couple of contributors, which should hopefully even things out ;).

If you want to check it out, here is the link to the project. I started working on improving documentation, so feel free to take a look!

Thanking my contributors

For all my projects, I like to thank people who help me along the way:

karashiiro - Kara, who has contributed quite a bit of the more complicated stuff.

Their work included: slice indexing support, support for setting/getting a tuple from a reference, enhanced support for setting/loading some primitive types, and fixes to parsing dotnet versions (used for testing).

Ignacy Koper - who improved the readability of the README

Henrik Kjerringvåg and Tshepang Mbambo - who both fixed small typos within the README