Given that the next release of OctoberProject is all about core logic loop optimizations, I have been playing with an integration stress test in order to fix a key issue. It turns out that lighting updates are currently extremely expensive within the system.
At first, I thought it was going to be all the container juggling while performing the thousands of updates required when adding or removing a light source. After doing some profiling (visualvm is working well for this), it turns out that the big expense is in reading/writing the light values, themselves.
This shouldn't come as a surprise as I had a similar issue with basic block data, once upon a time. I ended up resolving that by inlining one level of the OctreeShort data structure, resulting in a reasonable performance improvement. Beyond that, there is the data walker interface where the entire octree is described as it is evaluated, which can be useful for bulk data access. However, it is unusual that an update changes thousands of block types within a cuboid, in one tick, so this wasn't as painful. Lighting updates work that way in their common case.
The main reason for all of these being slow is that I can't arbitrarily seek into the octree, but instead discover the structure by reading through it. While this allows for a very tight description of the data, being a versatile way of describing sparse data with no assumption of an "empty" value, it does mean reading/writing values into it is linear-time.
So, that is the problem here: The lighting updates cause lots of changes to the lighting values, making the octree very large, and they all keep changing so that they all now need to manipulate this more complex structure.
I came up with a way of resolving this by using an "inflated" representation of the light octree: Just a 3-dimensional byte array where null entries are for only dark blocks. This works well for this case, but wouldn't be as applicable for things like block types (since "air" isn't always the empty type and this data could get big - over 64 KiB to represent something the common-case octree represents in 3 bytes).
However, now is the question of how to integrate this. I don't want to change this representation at the data storage level (at least not yet), so I think I will make this something ephemeral just within the tick for lighting updates. Further, I think I will avoid the common block proxy usage within the lighting algorithm so that I don't have to figure out how this new representation slides underneath that. I know that the lighting algorithm reads light value and block type but only writes light value so it probably makes sense to change it to not think in terms of common block proxies.
In the future, it might make sense to represent non-uniform data aspects with an obvious "empty" value in this way, even at the storage and network level, but I am not sure that this is required yet. In terms of users of this, light could use it, potentially also logic and maybe damage, but those are less obvious.
I will see what happens once I integrate a real version of this solution. I hope that I can come up with some good stress integration tests which will make me confident that I can reduce the tick time from 100ms to at least 50ms with no outliers beyond start-up, in stress tests.
Tricky but interesting,
Jeff.