simoneb's blog

about design, development, profession and avoidance of fluff

Past, Present and Future of .NET Assembly Merging

Compiling code written for the .NET framework usually produces assemblies in the form of either standalone executables or libraries. These assemblies mainly contain Microsoft Intermediate Language (MSIL) code, which is JIT-compiled at execution time.

It’s not uncommon for libraries which expose public APIs to rely internally on third-party assemblies that should not ship as separate files in the final package. I’ll take the mocking library NSubstitute as an example.

There are commonly a couple of reasons why they shouldn’t. First, client code does not need to reference them directly, as they are only used internally. Back to the mocking framework example: it wouldn’t make much sense to build a dynamic proxy from scratch when excellent implementations already exist, like Castle DynamicProxy, which NSubstitute uses, yet NSubstitute’s users never need to touch it.
The second reason is to avoid cluttering the library’s release package with dependencies that are of no interest to the end user, which simplifies distribution because the library then consists of fewer files.

The aim is thus to somehow embed the dependent assemblies into a single assembly. So far there has mainly been a single approach to this, and it’s called ILMerge.

ILMerge

Mike Barnett’s ILMerge is a free tool which statically links several assemblies into a single output assembly. It runs as a console application and rewrites the IL of the main assembly, embedding the contents of the other assemblies into it. It takes care of a few additional things, like strong naming and the target framework version, and accepts further configuration options for customization. A particularly useful one is type internalization, which changes the accessibility of the types contained in the dependent assemblies, effectively hiding them from everyone except the containing assembly.

One thing to be aware of is that embedding the types of one assembly into another makes the first lose its original identity. This can be either a pro or a con depending on the use case. Serialization and security may not work correctly, as they usually rely on metadata about the original assembly, which no longer exists. On the other hand, it also opens up interesting scenarios.

NSubstitute ILMerges the Castle.Core.dll assembly, which contains the DynamicProxy part. If you used NSubstitute in a project where you also needed DynamicProxy directly, you would be safe and could use whichever version of the library you liked, as the runtime would consider the types as belonging to different assemblies and thus not conflicting in any way. The only issue you could encounter is conflicting namespaces if the assembly had been merged without internalizing its types: public types would still be visible, and you would need to specify an alias to avoid ambiguous references.

If, on the other hand, NSubstitute didn’t merge Castle.Core.dll and distributed it alongside NSubstitute.dll, you would be more or less tied to the version distributed with NSubstitute. Loading different versions of the same assembly into one AppDomain is discouraged, as it can cause all sorts of weird behaviors. It also wouldn’t be easy to load two versions of the same assembly in the first place: unless the assemblies resided in the GAC, you would have to keep both on the file system, and since assemblies are usually loaded from the same folder, two files with the same name cannot live there side by side. In this case the version of the assembly ending up in the output folder would depend on the build process, namely on the order in which it resolved dependencies during compilation.

Another, more recent and blocking, shortcoming of ILMerge has to do with WPF assemblies. These assemblies contain embedded resources which in turn encode the identity of the original containing assembly. As I said, merged assemblies lose their identities, so these resources would no longer behave correctly. ILMerge would therefore need to extract the resources, replace the encoded assembly identities with the identity of the target assembly and re-encode them, something it is currently not capable of doing.

And it is from embedded resources that another solution comes.

Embedding assemblies as resources

Embedding other files into assemblies as resources has always been possible. Resources can then be read back and manipulated at runtime. They can be anything, although they are commonly used for storing media, icons and static data in general. Nothing prevents you from storing assemblies in there and loading them at runtime. This approach has been described thoroughly by Jeffrey Richter in his excellent book CLR via C#, whose relevant excerpt is available online.

The approach is simple, so simple that the creator of ILMerge commented:

As the author of ILMerge, I think this is fantastic! If I had known about this, I never would have written ILMerge.

The trick to make it work is to subscribe to the AppDomain’s AssemblyResolve event. This event is raised when the runtime is unable to locate a referenced assembly and, as a last resort, gives the application developer a chance to supply it according to custom logic. Here the logic is to read the assembly from the resources and load it into the application domain.
One caveat is that the AssemblyResolve event needs to be subscribed to before any code requiring the embedded assemblies ever runs, which is not always possible. One such case is a .dll, which has no explicit entry point.
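The resolution logic described above can be sketched in C# as follows. It is a minimal sketch, not the code from Richter’s excerpt: the class name EmbeddedAssemblyResolver, its Register and ResourceNameFor methods, and the convention that each dependency is embedded under the manifest resource name `<root namespace>.<assembly simple name>.dll` (e.g. `MyApp.Castle.Core.dll`) are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

// Hypothetical helper illustrating the "assemblies as embedded resources" approach.
// Assumption: every dependency was embedded with the manifest resource name
// "<rootNamespace>.<AssemblySimpleName>.dll", e.g. "MyApp.Castle.Core.dll".
public static class EmbeddedAssemblyResolver
{
    // Assemblies already loaded from resources, keyed by resource name.
    private static readonly Dictionary<string, Assembly> Loaded =
        new Dictionary<string, Assembly>(StringComparer.OrdinalIgnoreCase);
    private static readonly object Gate = new object();
    private static string _rootNamespace;

    // Must run before any code that needs the embedded assemblies.
    public static void Register(string rootNamespace)
    {
        _rootNamespace = rootNamespace;
        AppDomain.CurrentDomain.AssemblyResolve += OnAssemblyResolve;
    }

    // Maps a requested assembly (simple or full display name) to the
    // resource name it is assumed to have been embedded under.
    public static string ResourceNameFor(string rootNamespace, string requestedAssemblyName)
    {
        string simpleName = new AssemblyName(requestedAssemblyName).Name;
        return rootNamespace + "." + simpleName + ".dll";
    }

    // Raised only after normal probing has failed to locate the assembly.
    private static Assembly OnAssemblyResolve(object sender, ResolveEventArgs args)
    {
        string resourceName = ResourceNameFor(_rootNamespace, args.Name);
        lock (Gate)
        {
            Assembly cached;
            if (Loaded.TryGetValue(resourceName, out cached))
                return cached;

            Assembly host = typeof(EmbeddedAssemblyResolver).Assembly;
            using (Stream stream = host.GetManifestResourceStream(resourceName))
            {
                if (stream == null)
                    return null; // not embedded here: let resolution fail as usual

                using (var buffer = new MemoryStream())
                {
                    stream.CopyTo(buffer);
                    Assembly loaded = Assembly.Load(buffer.ToArray());
                    Loaded[resourceName] = loaded;
                    return loaded;
                }
            }
        }
    }
}
```

In an executable, Register would be called as the first statement of Main, ideally keeping the code that uses the embedded types in separate methods, since the JIT compiler may need those assemblies as soon as it compiles a method that references them. A .dll has no such place to call it from, which is the caveat just mentioned. The cache is a design choice: the event can be raised more than once for the same assembly, and each Assembly.Load(byte[]) call would otherwise load another copy.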

To work around that you have to rely on a feature of the .NET framework which is not directly exposed by the programming languages, called module initializers. Think of type initializers (aka static constructors), just for modules. In the same way as type initializers are guaranteed to run before any member of a type is accessed, module initializers extend this guarantee to all the types within a module (assemblies usually contain a single module).

Placing the AssemblyResolve subscription in a module initializer is therefore the safest, and sometimes the only, way to inject custom assembly resolution logic, but it requires IL manipulation.

Costura

Costura is a neat open source project developed by Simon Cropp which takes care of all the steps described above, and is therefore the recommended way to merge third-party assemblies using the embedded resource approach.

It does so by embedding all referenced assemblies marked as Copy Local (thus excluding, for instance, assemblies living in the GAC) into the main assembly, and then injecting module initializer code which subscribes to the AppDomain’s AssemblyResolve event and looks up unresolved assemblies among the embedded resources.

Costura only requires you to call its custom MSBuild task in the AfterBuild target of the main assembly project.

Costura task (costura.xml)
<UsingTask
    TaskName="Costura.EmbedTask"
    AssemblyFile="$(SolutionDir)[path to]\Costura.dll" />

<Target Name="AfterBuild">
    <Costura.EmbedTask />
</Target>

This approach, however, comes with drawbacks too. Most notably, it prevents multiple versions of the same assembly from being loaded, because in this case the assemblies keep their identity. This can be dangerous, as it makes it non-deterministic which version of an assembly gets loaded. For example, if you distributed a library using this approach and the library’s client directly referenced an assembly with the same identity as one of those embedded in your library, which assembly gets loaded into the application domain would depend on the order in which the client code uses your library. If your library is loaded first, its version of the third-party assembly would be loaded; otherwise the client’s version would be loaded instead, leading to a fully working application in the best case and to a runtime exception in the worst.

Both approaches have their pros and cons, but it’s useful to know that both exist and when to apply each.

Speaking’s Overrated

Today is the Italian Agile Day, happening in Rome.

I didn’t attend, mostly because past editions haven’t been very interesting, with a few exceptions, and I’ve been following some sessions from home on the live streaming site.

I rarely hear anything really interesting at these meetings, but I can live with that: there’s usually not enough time to delve into a topic deeply enough to provide any valuable understanding. More often there is barely time to spark curiosity, so that people can research further when they get back home. This is perfectly healthy, and it’s what I try to do when I speak myself.

Sadly, what I see most of the time is speakers talking for hours without really saying anything, just trying to sound convincing. I’ve experienced this so many times now that I’m starting to think it is an intentional attitude rather than a side effect of lack of skill or anything else.
When I was younger I would watch these talks in wonder, asking myself when I would be able to speak the same way about complex topics that at the time I just could not understand. Now that I’ve grown up somewhat professionally I can look at things a bit differently and judge with more awareness what is going on, and it pisses me off.
It does mostly because when you speak to people you have a responsibility towards your listeners: they don’t expect you to deceive them, and may not even be able to realize that that’s what you’re doing. There are other reasons, however.

Out of context - deliberate omission

You have probably heard someone say that context is king. It indeed is. Many people make absolute statements way too often, and to avoid the effort of figuring out whether there may be any truth in what they’re saying, my current strategy is to simply assume that they are wrong.

Frameworks are a smell

QA is useless

These are just some examples of things I’ve heard presenters say in the recent past. I consider myself on neither side of the spectrum when it comes to these topics, because, as usual, context is king. How can you say that using an ORM is always wrong and slows you down? There are compromises you need to be ready to accept, but aren’t there always?

Another variation of taking things out of context is deliberate omission. It is way too easy to support your thesis when you intentionally omit the arguments that go against it. Omissions are hard to spot unless you know a topic very well or are directly involved in what is being omitted, as happened to me just today, hearing criticism of the proposal (which I made) to have the development teams in the company watch a supposedly terrible TDD video course.

Knowledge flaunt

As I said, it’s usually hard to delve deeply enough into a topic during a time-constrained talk. Nonetheless, that should not be an excuse to pretend you know much more about the topic than you actually do.
Once again, it’s hard to judge whether someone is really an expert or is pretending to be one, and the most effective way is to know the person yourself. The most annoying attitude is when the speaker is presenting at the limit of his knowledge while pretending he’s only covering the easy things, implying there’s much more to it that you’re not entitled to know just yet.

This happened to me very recently, with someone I know presenting the easy things about a topic and laughing in annoyance at most people not getting them right. Just a week earlier, he and I had talked about the very same topic at a slightly more advanced level, and it had caught him off guard, leaving me wondering whether he really knew anything about it.

Fluff

Finally, lots of presenters just talk fluff. In Italian it translates nicely to fuffa. This is probably the easiest to spot when attending a presentation, because at the end you are left wondering whether anything you’ve heard really made any sense.
Some people have built their entire careers on this, and you usually see them offering training and courses. These people are dangerous mostly because they are attractive to companies. I personally believe they are not worth a penny.

Why speaking is overrated

There is much more to this than I’ve been able to explain in my first blog post after several years, but I hope I have given some valid arguments.

I am under the impression that pretty much everything you hear people talk about is overrated, as they have to make a living out of it.