How big is the performance overhead of MonoGame / Xamarin?
Feb. 12, 2019
Have you ever wondered what the performance overhead of using Xamarin and MonoGame is? Here are the results of testing them on various platforms and devices.
I don’t know about you, but from the moment I first heard about this new technology called Xamarin, I have been wondering: “But what about the performance?” It has always been this way, with Java and the others that came before: they would have slogans like “Write once, run anywhere”, but the real story was often quite a bit more nuanced, especially when it came to actual application performance. So, naturally, I was really skeptical when Xamarin came about, and I needed a proper real-world performance test to draw my own conclusions. And what better way to do it, I figured, than with the force behind computer technology advancement for the last few decades: real-time computer graphics.
Xamarin is really different from other multiplatform frameworks in the sense that it is not really a framework; it is an entirely different animal. Where other frameworks introduce, for example, a virtual machine on top of the OS (which in extreme cases can result in virtual-machine inception), Xamarin uses regular old .NET as the development base and then compiles it into native code for the target platform. This, hopefully, means much better performance and better native API support, because at the base level it means getting rid of abstraction layers.
Before we move on to the actual tests, a disclaimer is in order. This article should not be regarded as a race between devices or operating systems; that is not the purpose of this analysis in any sense. The selected devices are of different brands, run different operating systems, and represent vastly different price points and release dates. The purpose of the test is to see the impact of an “added black-box framework” and the viability of releasing applications from the same codebase using Xamarin. To be honest, even before starting the tests it was absolutely clear that a desktop PC would crush all the other devices if we looked only at the raw results, if only because it has no physical limitations and can unleash its full power, whereas the other devices are of limited size and their batteries simply could not be allowed to constantly draw 400 W.
The whole testing procedure has been separated into a few stages to probe the various areas that could have performance bottlenecks. So, from top to bottom:
Swarm particles. This test renders an increasing number of opaque 2D sprites in a limited 800x400 area. It primarily tests for CPU bottlenecks, as most of the heavy lifting happens while processing individual sprite positions. It is also a good test of garbage collector performance.
Swarm particles (with pooling). Same with added pooling for sprite particles.
Pooling is a technique where you avoid instantiating new objects: when a new object would be needed, you use an object from a “pool” of unused instances instead. This technique was extremely popular in the early days of game development (think the NES and SEGA era), when memory and processing power were very limited. You would often see it manifest as a limit on the number of bullets that could be fired (or even shown) on screen at the same time. More recently, there were reports that garbage collection on consoles (the Xbox 360, for example) was a really time-consuming operation, so it was generally a good idea to limit instantiation as much as possible. When testing the application, I did not see much of a difference between the pooled and un-pooled tests, which is another plus for Xamarin, as the garbage collection seems to be implemented rather well.
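To illustrate the idea (in Python rather than the C# the tests were actually written in, and with made-up class names), a minimal object pool might look like this:

```python
class Particle:
    """Toy particle; stands in for whatever object the game would normally allocate."""
    def __init__(self):
        self.x = self.y = 0.0
        self.alive = False

    def reset(self, x, y):
        self.x, self.y, self.alive = x, y, True


class ParticlePool:
    """Pre-allocates all particles up front and recycles them, so nothing is
    instantiated (and nothing becomes garbage) while the game is running."""
    def __init__(self, size):
        self._free = [Particle() for _ in range(size)]

    def spawn(self, x, y):
        if not self._free:
            return None  # pool exhausted: the classic "max bullets on screen" limit
        p = self._free.pop()
        p.reset(x, y)
        return p

    def despawn(self, p):
        p.alive = False
        self._free.append(p)  # hand the instance back instead of letting the GC collect it
```

Note how spawning from an exhausted pool simply fails; that is exactly where the old on-screen bullet limits came from.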
Alpha fill rate. This test renders an increasing number of transparent sprites which get combined on the alpha channel for the additive alpha effect. It tests the overall fill rate and alpha channel calculation performance.
Alpha fill rate (with pooling). Same, with added pooling.
3D cube swarm. This test renders an increasing amount of simple 3D geometry – cubes with all the basic 3D rendering logic you’d find in a 3D game: transformations, lighting, texturing. It tests 3D rendering performance and is still, in essence, CPU-bound, as with such simple geometry most of the work falls within spatial particle data manipulation.
3D cube swarm (with pooling). Same, with added pooling for cube particle data.
High poly model render. This test renders a model with a higher polygon count: 11523 polygons, as opposed to the 12 polygons of a cube. This test should be a middle ground in the tug-of-war between CPU and GPU, probably leaning more on the latter.
Über high poly model render. This test renders a model with an absurdly high polygon count (483055 polygons) which you would never use in a game if you were sensible. After the model is loaded, all the heavy lifting falls into the capable hands of the GPU; the CPU only supplies minimal transformation and lighting data every frame before the actual rendering proceeds.
Shader (GPU) particle swarm. This test is my favorite, as it showcases the true power of the GPU. It renders particles by employing the GPU alone: for transformations, positioning and rendering. You might notice a significant increase in the particle count compared to the other tests. Sadly, only a few of the devices got to run this stage, as it uses advanced shader code from Shader Model 5. MonoGame’s HLSL-to-GLSL translator, MojoShader, only supports parsing instructions from Shader Model 1 through 3. This means any platform that is not HLSL-based (read: not Windows-based) is off limits.
It was the most challenging stage to get running, but the most rewarding of the lot, mostly because it allowed me to feel the sheer power of a graphics processor.
If you want to see a full test in action, here is a video showing one full test run. Be warned though, it’s quite hypnotic!
Every test runs by doubling the particle count every few seconds until the frame rate reaches a certain point, around 28–30 FPS (frames per second), which could be considered the limit between a smooth-running application and unsatisfactory application performance. The test tries to stabilize the run at a certain measured particle count and then does a final straight where it just keeps increasing the particle count until the device tanks. Afterwards, the results get posted to a results server.
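The measurement loop can be sketched roughly like this; the threshold, starting count and function names here are illustrative assumptions, not the actual values or code from the test app:

```python
def find_particle_limit(measure_fps, fps_limit=29, start_count=64):
    """Keep doubling the particle count until the measured frame rate drops
    below the limit; return the last count that still ran smoothly."""
    count = start_count
    last_smooth = 0
    while True:
        fps = measure_fps(count)  # in the real test: render at this count for a few seconds
        if fps < fps_limit:
            return last_smooth
        last_smooth = count
        count *= 2
```

For example, a device that holds 60 FPS up to 512 particles and collapses beyond that would report 512 as its limiting particle count.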
After sifting through the data, the limiting particle count was picked for every device and test. Then the highest performance (most particles on screen) in each test was taken as 100% and all the others were ranked as percentages below that. This is the data we will be looking at.
First – the raw data.
Resolution adjusted values. For this we take the previous data and adjust it to account for the different native resolutions of the devices. The device with the highest combined pixel count (resolution width x height) keeps its score, while the others get their scores diminished in proportion to their share of that pixel area. In essence, here we are seeing the performance per pixel. As you can see, the device rankings immediately shift around.
CPU/GPU adjusted values. This is similar to the resolution adjustment, but in this case we try to mitigate the different hardware in the devices. For this, we have taken test data from notebookcheck, which lists and rates various GPUs: desktop, notebook, mobile, you name it. Even cooler, it lists the most prevalent chipset combos.
This adjustment, together with the final one, is actually the most unscientific, as there is still a myriad of unknowns, but it should nevertheless let the gist of the data ring true. The adjustment ratio is based on the 3DMark Ice Storm Unlimited Graphics Score 1280x720 offscreen test, because it had the widest device test base, which luckily included the chipsets in all of the tested devices.
After this adjustment, all the device results start to clump together.
CPU/GPU and resolution adjusted. This one combines the resolution and hardware adjustments into one final score. As you can see, there is no longer a clearly dominant device; the remaining discrepancies can largely be chalked up to test bias.
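Put together, the whole scoring pipeline amounts to three simple transformations. The sketch below is my reading of the adjustments described above, with made-up device names and numbers, not the actual test data:

```python
def normalize(scores):
    """The best raw result (most particles on screen) becomes 100%."""
    best = max(scores.values())
    return {d: 100.0 * s / best for d, s in scores.items()}

def resolution_adjust(scores, resolutions):
    """The device pushing the most pixels keeps its score; the rest are
    diminished in proportion to their share of that pixel area."""
    max_pixels = max(w * h for w, h in resolutions.values())
    return {d: s * (resolutions[d][0] * resolutions[d][1]) / max_pixels
            for d, s in scores.items()}

def hardware_adjust(scores, benchmarks):
    """Divide out the chipset's benchmark rating (e.g. its 3DMark Ice Storm
    score), so raw hardware muscle alone no longer decides the ranking."""
    max_bench = max(benchmarks.values())
    return {d: s * max_bench / benchmarks[d] for d, s in scores.items()}
```

For instance, a phone that scores a quarter of the desktop's raw result, renders a quarter of its pixels, and has an eighth of its GPU benchmark score would end up at half the desktop's final adjusted score.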
To be honest, I was not expecting results this close. I really thought the Windows-based devices would come out on top and all the others would land in the category of “well, it runs…”. To my surprise, the same code runs on all major platforms without any adjustments, and the performance is really stellar. You would, of course, get much better performance by writing strictly native code and using native low-level libraries, but that would mean a lot more work for an additional performance gain that in most cases would be negligible. I am really pleasantly surprised by MonoGame and Xamarin. Keep up the good work, guys!
If you are interested, you can see the raw results here: