I have been experimenting with different approaches for the "block update" mechanism in OctoberProject. This means that, for all blocks which somehow changed in a given tick, an event needs to be run on the adjacent 6 blocks in the following tick. This should be done without duplication and it should also be run on blocks adjacent to newly-loaded blocks.
The "obvious" approach is just to look at all of this information while in the synchronous block where the tick results are combined for the snapshot. This is not ideal, though, since it is doing a lot of work in the single-threaded scalability bottleneck (the main point of OctoberProject is to experiment with scalable parallel game logic). It also means storing a lot of objects for the following tick in a way which can't ever be inlined (consider the cuboid load case - there are 1024 blocks which require updates along each face - so easily several thousand per-tick, all escaping the scope). This does make some unit testing easier, though, since you can always see the enqueued updates in the snapshot.
Next, I tried a "parallel pull" approach where these updates are synthesized when running the next tick by just looking at the locations of updates from the previous tick, on a per-cuboid basis (check the cuboid you are updating and the 6 adjacent ones and build the set of updates immediately before running them). While this results in redundant work (each of these cuboids could be scanned 7 times to do the same thing on different threads), it doe show a potential win in testing time and should scale well. It should use less memory at the cost of being invisible to testing, though (as these don't appear in the snapshot).
I suppose another possibility is a "parallel push" where some of the first solution could be done in the parallel area but it would add more long-ish memory usage and spread the logic of the first solution through more components. Hence, this doesn't sound worth doing.
Not sure which approach is best so I will need to think through this and potentially fully implement solutions 1 and 2 in order to compare them with some kind of micro-bench.
Too bad that there is nobody to talk to about this since I think it is an interesting technical design problem,
Jeff.