simoneb's blog

about design, development, profession and avoidance of fluff

Past, Present and Future of .NET Assembly Merging

Compiling code written for the .NET framework usually produces assemblies in the form of either standalone executables or libraries. These assemblies mainly contain Microsoft Intermediate Language (MSIL) code, which is then JIT-compiled upon execution.

It’s not uncommon for libraries which expose public APIs to rely internally on assemblies provided by third parties, and these assemblies should not end up in the final package. I’ll take the mocking library NSubstitute as an example: it wouldn’t make a lot of sense for it to build a dynamic proxy from scratch when excellent implementations already exist, like Castle DynamicProxy, which NSubstitute uses.

There are commonly a couple of reasons why such dependencies shouldn’t be shipped as separate files. First, the client code does not need to reference them directly, as they are only used internally. The other reason is to avoid cluttering the release package of the library with dependencies that are of no interest to the end user, thus simplifying distribution, since the library then consists of a smaller number of files.

The aim is thus to somehow embed the dependencies into a single assembly. So far there has been mainly one approach to this, and it’s called ILMerge.

ILMerge

Mike Barnett’s ILMerge is a free tool which statically links several assemblies into a single output assembly. It runs as a console application and rewrites the IL of the main assembly by embedding the contents of the other assemblies into it. It takes care of a couple of additional things, like strong naming and target framework version, and accepts configuration options which allow further customization. A particularly useful one is internalization of types, which changes the accessibility level of the types contained in the dependent assemblies, effectively hiding them from everyone except the containing assembly.

One thing to be aware of is that merging the types of one assembly into another makes the former lose its original identity. This can be either a pro or a con depending on the use case. Serialization and security may not work correctly, as they usually rely on metadata of the original assembly, which no longer exists, but the loss of identity also opens up interesting scenarios.

NSubstitute ILMerges the Castle.Core.dll assembly, which contains the DynamicProxy part. If you were to use NSubstitute in a project where you needed to make use of DynamicProxy directly you would be safe, and could use whatever version of the library you liked, as the runtime would consider the types as belonging to different assemblies and thus not conflicting in any way. The only issue you could encounter is conflicting type names if the assembly wasn’t merged with its types internalized. Public types would in fact still be visible, and you would need to specify an alias to avoid ambiguous references.

If on the other hand NSubstitute didn’t merge Castle.Core.dll and distributed it alongside NSubstitute.dll, then you would be somewhat tied to the version distributed with NSubstitute. Loading different versions of the same assembly in one AppDomain is in fact discouraged, as it can cause all sorts of weird behaviors. It also wouldn’t be easy to load two versions of the same assembly: unless the assemblies resided in the GAC you would have to find a way to keep them both on the file system, and since assemblies are usually loaded from the same folder, two files with the same name could not coexist there. In this case the version of the assembly ending up in the output folder would depend on the build process, according to the order in which it resolved dependencies during compilation.

Another recent and blocking shortcoming of ILMerge has to do with WPF assemblies. These assemblies contain embedded resources which in turn encode the identity of the original containing assembly. As I said, merged assemblies lose their identities, so these resources would no longer behave correctly. ILMerge would therefore need to extract the resources, patch the encoded assembly identities by replacing them with the identity of the target assembly, and re-encode them, something which it is not capable of doing as of now.

And it is from embedded resources that another solution comes.

Embedding assemblies as resources

Serializing other files into assemblies as resources has always been possible. Resources can then be deserialized at runtime and manipulated. They can be anything, although they are commonly used for storing media, icons and static data in general. Nothing prevents you from storing assemblies in there and loading them at runtime. This approach has been described thoroughly by Jeffrey Richter in his excellent CLR via C# book, whose relevant excerpt is available here.

The approach is simple, so simple that ILMerge’s creator commented:

As the author of ILMerge, I think this is fantastic! If I had known about this, I never would have written ILMerge.

The trick to make it work is to subscribe to the AppDomain’s AssemblyResolve event. This event is fired when the runtime is unable to locate a referenced assembly, and gives the application developer a last chance to supply it according to custom logic. The logic here is to read the assembly from the resources and load it into the application domain.
One caveat is that the AssemblyResolve event needs to be subscribed to before any code requiring the embedded assemblies is ever run, which is not always possible. One such case is a .dll, which has no explicit entry point.
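
To make the idea concrete, here is a minimal sketch of such a handler for an executable, where an entry point is available. It is not Richter's or Costura's actual code: the EmbeddedAssemblyLoader class and its Attach method are hypothetical names, and the sketch assumes each dependency was embedded under the resource name "<simple assembly name>.dll" (the build may add a namespace prefix to resource names unless told otherwise).

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical helper, named for illustration only.
internal static class EmbeddedAssemblyLoader
{
    // Must be called before any type from an embedded assembly is touched.
    public static void Attach()
    {
        AppDomain.CurrentDomain.AssemblyResolve += ResolveFromResources;
    }

    private static Assembly ResolveFromResources(object sender, ResolveEventArgs args)
    {
        // args.Name is the full display name ("Castle.Core, Version=..., ...");
        // keep only the simple name.
        string simpleName = new AssemblyName(args.Name).Name;

        // Assumption: the dependency was embedded as "<simple name>.dll".
        string resourceName = simpleName + ".dll";

        Assembly host = typeof(EmbeddedAssemblyLoader).Assembly;
        using (Stream stream = host.GetManifestResourceStream(resourceName))
        {
            if (stream == null)
            {
                // Not one of our embedded assemblies: let resolution continue.
                return null;
            }

            using (var buffer = new MemoryStream())
            {
                stream.CopyTo(buffer);
                // Load the raw assembly bytes into the current application domain.
                return Assembly.Load(buffer.ToArray());
            }
        }
    }
}

internal static class Program
{
    private static void Main()
    {
        EmbeddedAssemblyLoader.Attach();
        Run();
    }

    // Code using embedded types lives in a separate method, so that the
    // runtime has no reason to resolve them before Attach() has run.
    private static void Run()
    {
        Console.WriteLine("Application code that uses the embedded assemblies goes here.");
    }
}
```

Main deliberately avoids referencing any embedded type itself, since the runtime may try to resolve the types a method uses when that method is compiled, which would be before the subscription takes place.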

To work around that you have to rely on a feature of the .NET framework which is not exposed to the programming languages directly, called module initializers. Think of type initializers (aka static constructors), just for modules. In the same way as type initializers are guaranteed to be called before any members of a type are accessed, module initializers extend this guarantee to all the types within a module (assemblies usually contain a single module).

Placing the AssemblyResolve subscription instructions in a module initializer is therefore the safest, and in some cases the only, option to inject custom assembly resolution logic, but it requires IL manipulation.

Costura

Costura is a neat open source project developed by Simon Cropp which takes care of all the steps described above, and is therefore the suggested way to merge third party assemblies using the embedded resource approach.

It does so by embedding all referenced assemblies marked as copy local (thus excluding assemblies living in the GAC, for instance) into the main assembly, and then injecting the module initializer code which subscribes to the AppDomain’s AssemblyResolve event and looks up unresolved assemblies from the embedded resources.

Costura only requires you to call its custom MSBuild task in the AfterBuild target of the main assembly project.

Costura task (costura.xml)
<UsingTask
    TaskName="Costura.EmbedTask"
    AssemblyFile="$(SolutionDir)[path to]\Costura.dll" />

<Target Name="AfterBuild">
    <Costura.EmbedTask />
</Target>

This approach however comes with its drawbacks too. Most notably it prevents multiple versions of the same assembly from being loaded, as in this case the assemblies maintain their identity. This can be dangerous because it makes it non-deterministic which version of an assembly is loaded. For example, if you distributed a library using this approach and the library client directly referenced an assembly with the same identity as one of those embedded in your library, which assembly gets loaded in the application domain would depend on the order in which the client code uses your library. If your library is loaded in memory first, then its version of the third party assembly would be loaded; otherwise the client code’s version would be loaded instead, leading to a fully working application in the best case and to a runtime exception in the worst.

Both approaches have their pros and cons, but it’s useful to realize that both exist and know when to apply each.
