Layout rework and benchmarks
Over the past three months, one of our ongoing projects has been a long-overdue revisit of the layout system, to bring it to a fully release-ready state.
Today, we'll look at the results of that work, run some benchmark comparisons against other popular layout systems, and share our analysis and thoughts on the numbers that came up.
The layout rework
The old layout implementation was something that we arrived at quite early on in the development process of PanGui. Over the years, as our requirements changed and we added new features like more sophisticated min/max constraints, property blending, and support for arbitrary node content like text, it grew dense, buggy, difficult to understand, and far more complicated than it needed to be.
The core principles remained sound, but we had to revisit the implementation and create a maintainable version that knew exactly what it had to be and do from the beginning. Ultimately, we ended up choosing to rewrite the entire implementation from scratch. This gave us the chance to choose the right abstractions for what the layout feature-set had become, rather than what it started as.
As a result, layouting has gone from around 11,000 lines to around 4,800 lines of C#. This line count includes all of the data structure and enum declarations, and the user API method definitions - not merely the algorithm implementation itself.
Most of this reduction came from four changes: replacing our previously mirrored width and height implementations with a single axis-agnostic implementation, integrating blending directly into the core property computations rather than bolting it on top, sharing more code between hypothetical computations and the committing of final answers, and greatly simplifying the expand solver algorithm while making it more robust.
Overall, the code is now far simpler, making it easier to understand, debug and modify than it was before. We also added a great many new tests to make sure we didn't break anything, improving test coverage significantly.
We even managed to add a few low-hanging-fruit features along the way, such as fit-largest-element, which is generally useful for various wrap cases. We also greatly simplified how layout property blending works: any property can now very easily be animated from any state to any other state on a per-property basis. We handle many situations where both Yoga and browsers fail to provide correct answers. The new implementation does more than the old one, despite being simpler and having far less code - which is always nice!
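To illustrate what "axis-agnostic" means in practice, here is a minimal Python sketch; it is not PanGui's actual C# code, and `Axis`, `Node` and `fit_content` are hypothetical names. Every per-axis quantity is stored in a two-element list indexed by the axis, so one function computes widths and heights alike, where a mirrored design would need a `fit_content_width` and a `fit_content_height` with the fields swapped.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Axis(IntEnum):
    """Used as an index into per-axis lists: X selects width, Y selects height."""
    X = 0
    Y = 1


@dataclass
class Node:
    # Hypothetical node type: instead of separate width/height fields,
    # every per-axis quantity is a two-element list indexed by Axis.
    size: List[float] = field(default_factory=lambda: [0.0, 0.0])
    padding: List[float] = field(default_factory=lambda: [0.0, 0.0])
    layout_axis: Axis = Axis.X  # direction in which children are stacked
    children: List["Node"] = field(default_factory=list)


def fit_content(node: Node, axis: Axis) -> float:
    """Fit-content size along `axis`; one function serves both width and height."""
    if not node.children:
        return node.size[axis]  # leaves report their own fixed size
    child_sizes = [fit_content(child, axis) for child in node.children]
    if axis == node.layout_axis:
        content = sum(child_sizes)  # main axis: children are laid end to end
    else:
        content = max(child_sizes)  # cross axis: the largest child determines the size
    return content + 2.0 * node.padding[axis]  # same padding on both sides
```

A horizontal row of a 10x5 and a 20x8 leaf with padding `[1, 2]` fits to 32 wide and 12 tall, and both numbers come from the same code path.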
Benchmarking
To find out whether performance had regressed or improved, we did some simple benchmarking to compare the old and new implementations. Performance improved in many cases: at worst it stayed about the same, and at best the new implementation was almost twice as fast. Because it wasn't far out of our way, we also added some other popular layout libraries to see how our new implementation stacked up against them.
For this, we chose Yoga, a very popular C++ implementation of the flexbox spec; Taffy, a Rust implementation of the flexbox spec; and Clay, a single-header high-performance library written in C which offers some simplified flexbox-like features.
These benchmarks were run with the following settings:
| Library | Repo | Commit SHA | Compiler |
|---|---|---|---|
| Clay | nicbarker/clay | 76ec363 | GCC 13.1.0 -O3 -ffast-math -fomit-frame-pointer -DNDEBUG |
| Yoga | facebook/yoga | 8ba025e | GCC 13.1.0 -O3 -fomit-frame-pointer -DNDEBUG |
| Taffy | DioxusLabs/taffy | 175442d | Rust 1.92.0 opt-level = 3, lto = true, codegen-units = 1 |
We only measure actual layout computation time, not node/tree creation, memory allocation, or tree destruction. We run 100 warmup iterations, then sample layout computation for at least five seconds and at least 100 measured iterations. The time shown is the average per iteration.
Finally, the benchmarks ran on an AMD Ryzen 7 7800X3D with 64 GB of RAM running at 5600 MHz. PanGui ran on .NET 9 in Release mode, with no further tweaking of settings.
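The measurement loop described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual benchmark code: `benchmark` and its parameters are hypothetical names, and we assume the five-second budget counts only time spent inside the layout call, so any tree setup done beforehand is never timed.

```python
import time
from typing import Callable


def benchmark(layout: Callable[[], None],
              warmup_iterations: int = 100,
              min_seconds: float = 5.0,
              min_iterations: int = 100) -> float:
    """Average time per call to `layout`, in milliseconds.

    Anything done before calling this (building the tree, allocating nodes)
    is not timed; only the calls to `layout` are.
    """
    # Warmup: let caches (and, in managed runtimes, the JIT) settle; not measured.
    for _ in range(warmup_iterations):
        layout()

    iterations = 0
    measured = 0.0
    # Keep sampling until BOTH the time budget and the iteration floor are met.
    while measured < min_seconds or iterations < min_iterations:
        start = time.perf_counter()
        layout()
        measured += time.perf_counter() - start
        iterations += 1
    return measured / max(iterations, 1) * 1000.0
```

A call would look something like `benchmark(lambda: compute_layout(root))`, where `compute_layout` and `root` are placeholders for a library's layout entry point and a tree built ahead of time.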
With that out of the way, these are the numbers we measured. Where PanGui uses Expand, the flexbox libraries (Yoga and Taffy) use flex-grow with a flex-basis of 0.
| Test | Nodes | PanGui (C#) | Clay (C) | Taffy (Rust) | Yoga (C++) |
|---|---|---|---|---|---|
| expand_with_max_constraint | 3001 | 0.2077 ms | Min/max size constraints not supported | 1.6222 ms | 1.6559 ms |
| expand_with_min_constraint | 3001 | 0.2060 ms | Min/max size constraints not supported | 1.6154 ms | Min constraint on growing node is calculated incorrectly |
| fit_nesting | 101111 | 6.6357 ms | 4.0009 ms | 110.9751 ms | 60.0391 ms |
| flex_expand_equal_weights | 15001 | 1.0659 ms | 0.6220 ms | 5.3728 ms | 4.8458 ms |
| flex_expand_weights | 15001 | 1.0404 ms | Only expand weights of 1 are supported | 7.5270 ms | 5.7768 ms |
| nested_vertical_stack | 10001 | 0.6461 ms | 0.3551 ms | 3.7867 ms | 2.9214 ms |
| padding_and_margin | 101 | 0.0086 ms | Margin not supported | 0.0149 ms | 0.0332 ms |
| percentage_and_ratio | 10001 | 0.4398 ms | Clay is supposed to support aspect ratios, but we couldn't figure out how to get that working - sorry! | 3.3133 ms | 2.9622 ms |
| perpendicular_expand_with_wrap | 12001 | 0.8500 ms | Wrapping of elements not supported | 4.5410 ms | 58.7574 ms |
| pixels_with_min_expand_constraint | 30001 | 2.4423 ms | Min/max size constraints not supported | Expand as min/max constraint not supported | Expand as min/max constraint not supported |
| wide_no_wrap_simple_few | 1001 | 0.0412 ms | 0.0343 ms | 0.2507 ms | 0.1921 ms |
| wide_no_wrap_simple_many | 100001 | 4.6859 ms | Mismatch (uint16-constrained) | 48.6879 ms | 28.5494 ms |
| wide_wrapping | 10001 | 0.5672 ms | Wrapping of elements not supported | 3.8239 ms | 16.5290 ms |
Analysis
There are a few things to note here.
First of all, GCC is doing an incredible job with the C/C++ libraries, Clay and Yoga. We first ran these benchmarks using MSVC with similar optimization settings; switching to GCC sped Yoga up by a factor of 2-3x across most benchmarks, and Clay saw average performance gains of 20% - 50%. We knew there would be a difference, but we were shocked by how stark it was.
Yoga fails the expand_with_min_constraint test, as it appears not to handle min-width combined with flex-grow correctly: it uses min-width as a flex-basis to grow from, rather than simply as a constraint. We confirmed this by comparing against various browser implementations.
Yoga also seems to have notably poor superlinear time scaling in the perpendicular_expand_with_wrap test, leaving it a rather extreme 69x slower than PanGui. If we increase the number of nodes in the test tenfold, the gap grows further: PanGui becomes 703x faster than Yoga, and Taffy 129x faster than Yoga, with both PanGui and Taffy scaling roughly linearly in this test. We should also note that compiling with -ffast-math seems to break Yoga, which is why we left that flag off.
Lastly, running on .NET 9, PanGui seems to be doing pretty well overall, and we're far closer to Clay than we ever expected to be, typically staying within a respectable shouting distance. It will be interesting to see the results later this year, when we run the C/C++ version of PanGui against the same benchmarks; that will show us just how good the machine code produced by the .NET JIT really is.
There is also some variance across hardware. X3D chips seem to particularly favour PanGui for reasons we're not quite sure of, sometimes giving up to 10% - 20% better performance than same-generation non-X3D chips. That said, the overall patterns in the benchmarks seem to hold across a range of our dev team's hardware, from Intel to AMD to Apple M4 chips.
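To make the difference concrete, take a 300px row with two children that both grow with weight 1 from a basis of 0, where the first has a min-width of 200px. Treating min-width purely as a constraint, which is how browsers resolve it, gives 200px and 100px; using min-width as the size to grow from gives 250px and 50px. The sketch below is a simplified Python model of those two interpretations, not code from PanGui or Yoga; the function names are hypothetical, and max constraints and shrinking are left out. The first function is a stripped-down version of the freeze-and-redistribute loop used by the CSS flexbox "resolve flexible lengths" step.

```python
from typing import List


def grow_min_as_constraint(available: float, weights: List[float],
                           mins: List[float]) -> List[float]:
    """Grow from zero; min only clamps, and clamped items are frozen."""
    n = len(weights)
    sizes = [0.0] * n
    frozen = [False] * n
    while True:
        weight_total = sum(weights[i] for i in range(n) if not frozen[i])
        if weight_total == 0:
            # Remaining items cannot grow; they only get lifted to their min.
            for i in range(n):
                if not frozen[i]:
                    sizes[i] = max(0.0, mins[i])
            return sizes
        free = available - sum(sizes[i] for i in range(n) if frozen[i])
        shares = {i: free * weights[i] / weight_total for i in range(n) if not frozen[i]}
        violators = [i for i, share in shares.items() if share < mins[i]]
        if not violators:
            for i, share in shares.items():
                sizes[i] = share
            return sizes
        # Freezing violators at their min leaves less space for the rest, so
        # shares can only shrink from here: freezing them all at once is safe.
        for i in violators:
            sizes[i] = mins[i]
            frozen[i] = True


def grow_min_as_basis(available: float, weights: List[float],
                      mins: List[float]) -> List[float]:
    """The interpretation described above for Yoga: min is the size to grow from."""
    weight_total = sum(weights)
    if weight_total == 0:
        return list(mins)
    free = max(available - sum(mins), 0.0)
    return [m + free * w / weight_total for m, w in zip(mins, weights)]
```

With `available=300.0`, `weights=[1.0, 1.0]` and `mins=[200.0, 0.0]`, the first function returns `[200.0, 100.0]` and the second `[250.0, 50.0]`.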
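The two ratios for perpendicular_expand_with_wrap also allow a back-of-the-envelope estimate of how Yoga's time grows in that test. Assuming, as stated, that PanGui scales linearly (10x the nodes means 10x the time), the growth in the PanGui-to-Yoga gap tells us how much Yoga's own time grew, and fitting that to time ~ n^k gives the exponent k. This is a rough Python calculation on the numbers quoted here, with a hypothetical helper name.

```python
import math


def scaling_exponent(growth: float, size_factor: float) -> float:
    """Exponent k such that time ~ n**k, from how much time grew when n grew."""
    return math.log(growth) / math.log(size_factor)


# perpendicular_expand_with_wrap at 12001 nodes, from the results table.
yoga_ms, pangui_ms = 58.7574, 0.8500
gap_before = yoga_ms / pangui_ms  # roughly 69x
gap_after = 703.0                 # reported gap with 10x the nodes

# Assumption: PanGui's time grows 10x when the node count grows 10x.
yoga_growth = 10.0 * gap_after / gap_before
k = scaling_exponent(yoga_growth, 10.0)
print(f"Yoga time grew ~{yoga_growth:.0f}x for 10x nodes, k ~ {k:.2f}")
```

The exponent comes out at about 2, which suggests roughly quadratic growth for Yoga in this test.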
A caveat
We should be clear that, to some degree, we are comparing apples to oranges. These benchmarks were thrown together relatively quickly, and it's certainly possible to be far more thorough than we've been here. That said, the numbers should be fairly indicative of the sorts of results we can expect. We do intend to release this benchmark suite (or something very like it) alongside PanGui, so other people can double-check our numbers and benchmark implementations.
These tests were initially written to compare two versions of PanGui's layouting, so they are written from "PanGui's perspective". PanGui does not implement the full flexbox spec, with all its quirks, warts and edge cases. Instead, it offers a different feature set that, for the most part, lets you solve all of the same constraints that flexbox does (and many that flexbox doesn't), just using a smaller set of composable primitives.
The one notable exception is that PanGui currently does not have a direct equivalent to flex-basis, a base size from which an element grows or shrinks. Typically, in PanGui, you would instead express that sort of thing by declaring a base desired size, and then constraining it using a min and max of expand. Adding a concept like flex-basis to PanGui's expander algorithm would not be that difficult, but we suspect this category of problem is solved sufficiently by the current features, and so we are holding off on adding a direct analogue until we see some users with cases that have a legitimate need for it.
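For reference, here is a minimal Python sketch of the flex-basis behaviour in question, contrasted with Expand-style weighted sharing. The function names are hypothetical, and min/max constraints and shrinking are deliberately left out. With every basis at 0, the two produce identical results, which is why the benchmarks map Expand to flex-grow with a flex-basis of 0; a non-zero basis is the part PanGui has no direct analogue for.

```python
from typing import List


def expand(available: float, weights: List[float]) -> List[float]:
    """Expand-style sizing: all available space is shared by weight."""
    total = sum(weights)
    if total == 0:
        return [0.0] * len(weights)
    return [available * w / total for w in weights]


def flex_grow(available: float, bases: List[float], grows: List[float]) -> List[float]:
    """flex-grow: start from each flex-basis, then share the *remaining* space."""
    free = available - sum(bases)
    total = sum(grows)
    if total == 0 or free <= 0:
        return list(bases)  # shrinking (flex-shrink) is intentionally not modelled
    return [b + free * g / total for b, g in zip(bases, grows)]
```

For example, `flex_grow(300.0, [100.0, 0.0], [1.0, 1.0])` gives `[200.0, 100.0]`: the first item keeps its 100px basis and both items share the remaining 200px equally.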
Some things PanGui offers have no analogues in other layout systems at all (that we know of), such as the layout property blending; all properties of a PanGui layout node can individually and simultaneously consist of any weighted combination of its potential types. For example, size, min and max can all be any combination of pixels, percent, ratio, expand, fit-content and fit-largest-element - something that is incredibly useful for animating between arbitrary layout states.
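As a rough illustration of what a weighted combination of unit types could mean, here is a heavily simplified Python sketch. It is not PanGui's API: the unit names and `blend`, `lerp_weights` and `resolve_size` are hypothetical, and it assumes we already know what each unit type would resolve to in pixels on its own (the `resolved` map). In a real solver, that is the hard part, since an expand share depends on the rest of the layout.

```python
from typing import Dict, Optional

Weights = Dict[str, float]  # unit type -> blend weight, e.g. {"pixels": 0.5, "expand": 0.5}


def blend(weights: Weights, resolved: Dict[str, float]) -> float:
    """Weighted average of what each unit type would resolve to on its own."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w * resolved[unit] for unit, w in weights.items()) / total


def lerp_weights(a: Weights, b: Weights, t: float) -> Weights:
    """Animate a property by interpolating its unit weights from a (t=0) to b (t=1)."""
    units = set(a) | set(b)
    return {u: (1.0 - t) * a.get(u, 0.0) + t * b.get(u, 0.0) for u in units}


def resolve_size(size: Weights, resolved: Dict[str, float],
                 min_: Optional[Weights] = None,
                 max_: Optional[Weights] = None) -> float:
    """Size, min and max are blended independently; min wins over max."""
    value = blend(size, resolved)
    if max_ is not None:
        value = min(value, blend(max_, resolved))
    if min_ is not None:
        value = max(value, blend(min_, resolved))
    return value
```

A quarter of the way through an animation from `{"pixels": 1.0}` to `{"expand": 1.0}`, with pixels resolving to 100 and expand to 300, the blended size is 150. Because size, min and max each carry their own weights, any of them can be animated independently in the same way.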
It's also important to note that it's easy to set up cases where Yoga and Taffy will outperform PanGui after the initial layout calculation, when only minor, low-dependency changes happen that do not require large parts of the layout to be recomputed. PanGui currently does not do partial or incremental recomputation of layout trees: we don't recompute the layout if nothing has changed, but if anything is dirty, we recompute the entire layout tree. We do have some ideas for how to add partial layout tree recomputation, but for now, we consider layouting fast enough (and we have enough other things to get done before launch!) to hold off on that. In general, we are very reluctant to add unnecessary complexity.
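The caching behaviour described above (skip when nothing changed, otherwise recompute the whole tree) amounts to a single dirty flag for the entire tree. Here is a minimal Python sketch with hypothetical names (`LayoutTree`, `mark_dirty`), where `compute` stands in for a full-tree layout pass:

```python
from typing import Callable, Optional


class LayoutTree:
    """Hypothetical wrapper: one dirty flag for the whole tree, no partial updates."""

    def __init__(self, compute: Callable[[], object]) -> None:
        self._compute = compute          # runs a full layout pass over every node
        self._dirty = True               # nothing has been laid out yet
        self._result: Optional[object] = None
        self.recomputations = 0

    def mark_dirty(self) -> None:
        # Any change to any node invalidates the layout of the entire tree.
        self._dirty = True

    def layout(self) -> Optional[object]:
        if self._dirty:
            self._result = self._compute()  # recompute everything, not just the changed part
            self.recomputations += 1
            self._dirty = False
        return self._result                 # clean: reuse the previous result
```

Incremental approaches instead track dirtiness per node and only redo the affected parts of the tree, which is where the extra complexity comes from.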
Overall, we're very pleased with these results. We do think we can get this even faster if we actually sit down and concentrate on optimization, but this is a very good place to be for launch in terms of layout features, stability and performance.
What's next?
We’re still in closed alpha for a little while longer, as we’re taking our time to establish a solid foundation to build on. Because of that, the alpha has remained relatively small so far.
As more of PanGui’s core systems, like layouting, continue to mature and reach a release-ready state, we’ll be shifting our focus next month toward higher-level, user-facing functionality. Alongside that, we’ll be inviting a new batch of alpha testers very soon to help us shape a strong alpha and onboarding experience, as well as continuing to gather early feedback on PanGui itself.
Once PanGui is in a more robust state and backed by improved onboarding material, we’ll begin scaling access more aggressively, gradually inviting more and more developers until PanGui transitions into a fully open beta.
We are excited about the months ahead and cannot wait to get PanGui into more people's hands.
As always, if you have any notes or questions, then please feel free to join the discussion on our Discord server.