Volumetric World Technology: A Discussion
Hello hbgames!
This topic is to discuss some thoughts and opinions I have in relation to Volumetric Worlds and the future of gaming in general. Specifically, I enjoy tinkering, and I'd like to gather some thoughts and ideas that may help me tinker my way into something significantly useful.
This post may be a bit long-winded, just a heads up. If you have no interest in voxels or volumetric world data, turn around now because it only gets worse.
What is Volumetric World Data?
Why, I'm glad you asked. Volumetric implies that matter has substance. Polygon meshes, for example, are not volumetric. If you cut them open, without the aid of artist-supplied polygons and animations, there would be nothing inside. There is no knowledge of what's inside. The leather strap on a character's clothing is not leather at all - it's a bump that an artist painted brown. Obviously, this fashion of simulating reality has clear and inherent limits.
A volumetric simulation of reality, on the other hand, would somehow divide that leather strap into a number of quantized space units, and assign each one a material value of some sort - e.g. "leather". With this information, we can still render the same polygonal mesh, or any other unit of rendering. But if an interaction is made that an artist did not accommodate, assuming some fancy programming, at least the engine would know what belongs on those missing surfaces - leather, in the example above.
What are Voxels?
Well, the term is a bit subjective, but it generally describes arrays of data, like pixels, that apply to 3D volumetric representation. In short, Minecraft. This is one solution for volumetric world data. In Minecraft, if you dig away some dirt, rather than empty space being below it, we know that there is stone there, and it becomes visible. That said, voxels generally require massive amounts of data: even if you only store 1 byte per voxel, that racks up faster than you can imagine. Assuming you want more information than that, it gets obscene rather quickly. You could consider Minecraft to be a volumetric world of extremely low resolution, as each "voxel" represents a cubic meter of space. That's big. Imagine if a pixel on a screen had to represent an entire meter of space - aside from cool aerial shots of the earth, that resolution would be deemed nearly unusable. The challenge, therefore, is to use less than 1 byte per voxel, allowing higher resolutions to be rendered without running out of RAM on a household machine.
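To put a number on "racks up fast", here's a quick back-of-the-envelope in Java. The region size is purely illustrative:

```java
// Back-of-the-envelope memory cost of a naive dense voxel grid.
// A region 1,024 voxels on a side is an arbitrary illustrative size.
public class VoxelMemory {
    public static void main(String[] args) {
        long side = 1024;
        long voxels = side * side * side;   // 1,073,741,824 cells
        long bytes = voxels * 1;            // at just 1 byte per voxel
        System.out.println(bytes / (1024 * 1024) + " MiB"); // prints 1024 MiB, i.e. a full gibibyte
    }
}
```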
How do we reduce the overhead?
Well, there are dozens of potential methods. Flat arrays, instead of 3D arrays, compressed somehow (typically something cheap and efficient like Run-Length Encoding). This approach is probably the most common, but its efficiency is circumstantial and depends on the flattening order. Other algorithms can be combined with this, such as Lempel–Ziv–Welch dictionary compression on top of it. But at this point, especially in the case of cascading compressions, modification becomes expensive, as you typically have to convert the data back to a flat array and then re-encode it. This is suboptimal for two reasons: the conversion itself has overhead, and the data passes through the flat-array state - even if only temporarily, that's a lot of potential allocation and deallocation of RAM.
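As a point of reference, a basic flat-array RLE pass is only a few lines. This is a minimal sketch with names of my own choosing, storing each run as a (count, value) pair:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal run-length encoder for a flattened voxel array:
// each run of equal values becomes a single (count, value) pair.
public class FlatRle {
    public static List<int[]> encode(byte[] flat) {
        List<int[]> runs = new ArrayList<>();
        int count = 1;
        for (int i = 1; i <= flat.length; i++) {
            if (i < flat.length && flat[i] == flat[i - 1]) {
                count++;                                    // extend the current run
            } else {
                runs.add(new int[] { count, flat[i - 1] }); // close it out
                count = 1;
            }
        }
        return runs;
    }
}
```

Decoding is the reverse walk, and that round trip is exactly the expensive part described above whenever a single cell changes.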
As an experiment, I made a "realtime Run-Length Encoding". That is, one I can modify in place: it is aware of every possible operation (and never performs more than 2 changes per state change), and executes these operations inline without ever reverting the data to a flat array. It simply checks the values of the cells adjacent to the modified cell and updates them accordingly, performing merges and splits as needed.
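Here is a minimal 1D sketch of that idea - the structure and names are my illustration, not the exact experimental code. Runs are stored as (length, value) pairs, and a write either rewrites a whole run, shrinks an edge, or splits the interior, merging with equal-valued neighbours where possible:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a 1D run-length encoding that supports in-place writes without
// ever expanding back to a flat array.
public class RealtimeRle {
    private static class Run {
        int length; byte value;
        Run(int length, byte value) { this.length = length; this.value = value; }
    }

    private final List<Run> runs = new ArrayList<>();

    public RealtimeRle(int size, byte fill) { runs.add(new Run(size, fill)); }

    // Write one cell, splitting the containing run and merging with
    // equal-valued neighbours as needed.
    public void set(int index, byte value) {
        int i = 0, start = 0;
        while (start + runs.get(i).length <= index) { start += runs.get(i).length; i++; }
        Run run = runs.get(i);
        if (run.value == value) return;            // no change needed

        int offset = index - start;                // position inside the run
        if (run.length == 1) {                     // whole run replaced
            run.value = value;
            mergeAround(i);
        } else if (offset == 0) {                  // shrink the left edge
            run.length--;
            if (i > 0 && runs.get(i - 1).value == value) runs.get(i - 1).length++;
            else runs.add(i, new Run(1, value));
        } else if (offset == run.length - 1) {     // shrink the right edge
            run.length--;
            if (i + 1 < runs.size() && runs.get(i + 1).value == value) runs.get(i + 1).length++;
            else runs.add(i + 1, new Run(1, value));
        } else {                                   // split the interior: left | new | right
            runs.add(i + 1, new Run(1, value));
            runs.add(i + 2, new Run(run.length - offset - 1, run.value));
            run.length = offset;
        }
    }

    // Absorb equal-valued neighbours after a full run replacement.
    private void mergeAround(int i) {
        if (i + 1 < runs.size() && runs.get(i + 1).value == runs.get(i).value) {
            runs.get(i).length += runs.remove(i + 1).length;
        }
        if (i > 0 && runs.get(i - 1).value == runs.get(i).value) {
            runs.get(i - 1).length += runs.remove(i).length;
        }
    }
}
```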
Trying to expand this realtime implementation to 2D is where it becomes a headache. There are 7 possible scenarios for 1D, and that expands exponentially for 2D, to the point where a 3D realtime RLE is a bit nightmarish to even consider implementing.
What would a 3D RLE look like?
Well, seeing as I used Minecraft as an example above, let's roll with that for some visualization aids:
Here we have a simple two-value 4x4 two-dimensional configuration on the right, using stone and wood planks as our test values. On the left, we have the elements a 2D RLE would store this as, in realtime. Instead of a total of 16 "voxels", the engine would consider the entire configuration to consist of 7 voxels in this example - and these voxels have width and height parameters.
This can be encoded by starting at one corner, then iterating over every position towards the other corner. When you reach an unprocessed position that is occupied, you check each dimension, one at a time, expanding in that direction until a differing value is encountered. For example, starting in the top-left corner, we cannot expand the width, as stone comes in the next column. Attempting to expand down, we can. Then we proceed to the next unprocessed position and repeat the process. Every position that is consumed needs to be flagged as processed to avoid inclusion in multiple elements. Now, this works well for encoding, but encoding is not the goal; the goal is realtime maintenance, and this is just the format the world exists in.
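In code, that greedy expansion might look like the following 2D sketch, matching the 2D figure (a 3D version adds a depth pass). The names are mine, and the dense input grid exists only for this one initial encode:

```java
import java.util.ArrayList;
import java.util.List;

// Greedy 2D box encoding of a dense grid: walk in scan order, and when an
// unprocessed cell is found, grow right, then down, while every covered
// cell shares its value.
public class GreedyBoxes {
    public record Box(int x, int y, int w, int h, byte value) {}

    public static List<Box> encode(byte[][] grid) {
        int rows = grid.length, cols = grid[0].length;
        boolean[][] done = new boolean[rows][cols];
        List<Box> boxes = new ArrayList<>();
        for (int y = 0; y < rows; y++) {
            for (int x = 0; x < cols; x++) {
                if (done[y][x]) continue;
                byte v = grid[y][x];
                int w = 1, h = 1;
                // Expand width while the next column matches and is unprocessed.
                while (x + w < cols && !done[y][x + w] && grid[y][x + w] == v) w++;
                // Expand height while the entire next row segment matches.
                outer:
                while (y + h < rows) {
                    for (int i = 0; i < w; i++)
                        if (done[y + h][x + i] || grid[y + h][x + i] != v) break outer;
                    h++;
                }
                // Flag everything consumed so no cell lands in two elements.
                for (int dy = 0; dy < h; dy++)
                    for (int dx = 0; dx < w; dx++)
                        done[y + dy][x + dx] = true;
                boxes.add(new Box(x, y, w, h, v));
            }
        }
        return boxes;
    }
}
```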
Here is the same process demonstrated in 3D (sorry for the poor angle):
Here a 3x3x3 cube of 27 voxels is reduced to 10 voxels, each one having width, height, and depth. (Yes, that's 40 bytes instead of 27, but this system is targeting larger, sparser areas in which the gain would be far greater.)
The problem comes from maintaining this in realtime:
When even a single voxel is broken, there are a significant number of checks that need to be performed to avoid fragmentation of the remaining data. Perhaps, as a temporary solution, a defragmentation could be performed periodically, or when data is saved/unloaded. But defragmentation requires converting the data back to a flat array and recounting it - and the goal here is to never have the data in array form, i.e. to never, even temporarily, store at a 1:1 or greater byte-to-voxel ratio.
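To illustrate the fragmentation cost: carving a single voxel out of a W x H x D box can leave up to six remainder boxes. Here is a sketch of one possible carving order (slabs along z, then y, then x - the layout choice is mine):

```java
import java.util.ArrayList;
import java.util.List;

// Removing one voxel at (px, py, pz) from an axis-aligned box: carve slabs
// along z, then y within the voxel's slice, then x within its row.
public class BoxSplit {
    public record Box(int x, int y, int z, int w, int h, int d) {}

    public static List<Box> remove(Box b, int px, int py, int pz) {
        List<Box> out = new ArrayList<>();
        // Slabs in front of / behind the voxel along z.
        if (pz > b.z()) out.add(new Box(b.x(), b.y(), b.z(), b.w(), b.h(), pz - b.z()));
        if (pz < b.z() + b.d() - 1)
            out.add(new Box(b.x(), b.y(), pz + 1, b.w(), b.h(), b.z() + b.d() - pz - 1));
        // Within the voxel's z-slice: slabs above / below along y.
        if (py > b.y()) out.add(new Box(b.x(), b.y(), pz, b.w(), py - b.y(), 1));
        if (py < b.y() + b.h() - 1)
            out.add(new Box(b.x(), py + 1, pz, b.w(), b.y() + b.h() - py - 1, 1));
        // Within the voxel's row: segments left / right along x.
        if (px > b.x()) out.add(new Box(b.x(), py, pz, px - b.x(), 1, 1));
        if (px < b.x() + b.w() - 1)
            out.add(new Box(px + 1, py, pz, b.x() + b.w() - px - 1, 1, 1));
        return out; // worst case: six boxes where one used to be
    }
}
```

Every removal can therefore multiply the element count, which is why some merge or defragmentation strategy is unavoidable.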
By maintaining storage in this manner, it can be passed to the renderer as-is, as it is already in a fairly optimized form (though allowing blocks to clip would optimize it further, that would only be suitable for rendering - that's a different discussion for later).
Obviously, a 3D RLE would only really benefit sparse arrays where the rate of change is low. In densely packed, chaotic arrays, it has the potential to be less efficient than flat arrays. The only way I can imagine to compensate for this would be to track the density of a given region (think a "Chunk" in Minecraft), and switch methods once a certain threshold is crossed.
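A sketch of that switching idea, with an arbitrary placeholder threshold and a bit of hysteresis so a chunk doesn't flip back and forth right at the boundary:

```java
// Sketch of per-chunk representation switching: keep the box encoding while
// it stays compact, and fall back to a flat array once fragmentation makes
// it lose. The 0.25 ratio is an arbitrary placeholder, not a tuned value.
public class AdaptiveChunk {
    private static final double MAX_BOXES_PER_VOXEL = 0.25;

    private final int voxels;      // total cells in this chunk
    private int boxCount;          // current number of encoded boxes
    private boolean dense = false; // true once we've fallen back to a flat array

    public AdaptiveChunk(int voxels) { this.voxels = voxels; }

    // Called after every edit; decides which representation to use.
    public void onBoxCountChanged(int newCount) {
        boxCount = newCount;
        if (!dense && boxCount > voxels * MAX_BOXES_PER_VOXEL) {
            dense = true;   // a real engine would expand the boxes to a flat array here
        } else if (dense && boxCount < voxels * MAX_BOXES_PER_VOXEL * 0.5) {
            dense = false;  // hysteresis: re-encode only once well below the threshold
        }
    }

    public boolean isDense() { return dense; }
}
```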
Another major benefit of the 3D RLE, storing larger blocks of arbitrary size, is that you don't have to store "empty". Minecraft's "air", for example, still occupies a full integer in RAM simply to store a 0. (There are a couple of shortcuts that even Minecraft takes here, such as empty chunks not containing a data array, and using shorts instead of ints - but shorts can actually be less efficient than ints and are often widened to full integers in RAM anyway, at least on managed runtimes like the JVM.)
Closing Thoughts:
I have more to share, and will expand on this information in time, as I am aiming to construct a very-high-resolution voxel engine. I would love feedback from any developers here who understand my ramblings and have related ideas or thoughts.
Thanks for reading!