So, this post started as a reply to @kamui on this thread, but by the time I was done I decided it deserved its own thread
TLDR: This post gives a "mid-level" overview of how to render a planet procedurally. We'll go into issues with floating point precision and how to combat them, how to apply a quad-tree spatial partitioning system to a sphere, how to determine the desired level of detail, how to perform occlusion culling, and how to actually render the terrain patches.
Disclaimer/Background: I started with and still use OpenGL. I can't speak to DirectX, but the API won't matter for this summary. I've built a procedural planetary renderer, up to but not including the least-recently-used (LRU) cache and geometry occlusion (although I've prototyped both). So my prototype will generate and render a whole planet at once, to a reasonable level of detail, but I'm limited by polycount since everything displays every frame. At that point, life became a bit more interesting (namely, I was deployed to Afghanistan), and work on the project stopped.
One of the first things you'll encounter when discussing planetary rendering (procedural or not) is the concept of 32-bit floating point precision. At very large values, the precision of a 32-bit float becomes pretty poor. Close to zero, you may be able to uniquely represent micro-units (i.e. 1E-6), but once the magnitude climbs high enough (which happens quite quickly when rendering a planet), the smallest step you can uniquely represent (the "precision") becomes much bigger. In other words, instead of 1.0 -> 1.1 -> 1.2 -> 1.3, you'd be at 1,000,000 -> 1,000,100 -> 1,000,200 -> 1,000,300, etc. (these numbers are just examples; see the short snippet after the list below for real ones). The precision problem primarily manifests itself in two ways: depth buffer "Z-fighting" and vertex position imprecision (the "jitters"). In short:
- Depth buffer Z-fighting: The depth buffer is used to determine whether a pixel is "in front" of another pixel or not, by comparing its depth value, or z-value. Imprecision at large distances makes this comparison nondeterministic, so during one frame a pixel belonging to a building (for example) will be "in front of" the terrain (from the perspective of the camera), and during the next frame the same pixel will be "behind" the terrain. The result is "flickering" of objects, "fighting" over who's in front of whom.
- Positional jitters: Even with floating point numbers, you're dealing with a discrete "grid" at the lowest level. As described above, the resolution of that grid is much finer near the origin than at vast distances. At vast distances, the position of an object, and more specifically the positions of its individual vertices, become subject to this coarser grid. As a result, objects appear to deform and jitter as their individual vertices jump from one representable position to another. You can think of this as being caused by large rounding errors, which unfortunately don't occur uniformly across a single model.
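To see the gap concretely, here's a tiny C++ demo you can run (the exact figures are simply what IEEE-754 single precision gives you at those magnitudes):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Spacing between adjacent representable floats ("unit in the last place")
    // near 1.0 versus near 10,000,000 (roughly planetary-scale meters).
    printf("step near 1.0: %g\n", std::nextafterf(1.0f, 2.0f) - 1.0f);
    printf("step near 1e7: %g\n", std::nextafterf(1.0e7f, 2.0e7f) - 1.0e7f);

    // Consequence: small increments simply vanish at large magnitudes.
    float bigPos = 1.0e7f;
    printf("1e7 + 0.1 == 1e7 ? %s\n", (bigPos + 0.1f == bigPos) ? "yes" : "no");
    return 0;
}
```

Near 1.0 the step is about 1e-7; near ten million it's a full unit, which is exactly the coarse grid the two issues above stem from.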
Now, fixing depth-buffer Z-fighting is a fairly complex subject, and to my knowledge there isn't a generally accepted "best" solution, other than GPUs which natively support 64-bit depth buffer values (which are still very expensive and thus uncommon); in that case, the issue won't appear even at galactic scales. Many discussions on techniques to combat depth-buffer precision issues exist online. Check out this post (and its predecessors) for an example technique.
Fixing positional jitters is fairly easy: you just need to keep the camera at or near the origin of the grid system. That way, any "rounding errors" that do occur will happen very far away from the camera, so you simply won't see them.
To achieve this, you store all positions as 64-bit doubles on the CPU, and before submitting them to the GPU for rendering, you determine the relative positions between all objects and the camera by subtracting the camera's world-space (double-precision) position from each object's world-space (double-precision) position. The result is each object's position relative to the camera, with the camera inherently sitting at (0,0,0). These values are submitted (as 32-bit floats) to the GPU for each object, which then renders the object's vertices offset by that amount. Other techniques "re-center" the camera less often, which is basically a free optimization (i.e. only re-center once the camera moves more than 1,000 "units" from its last origin, etc.).
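Here's a minimal sketch of that camera-relative step (I'm using GLM's vector types purely for illustration; any double-precision vector type works):

```cpp
#include <glm/glm.hpp>   // glm::dvec3 / glm::vec3, used here just for illustration

// World-space state lives in doubles on the CPU side.
struct Object {
    glm::dvec3 worldPosition;   // full 64-bit precision
    // ... mesh handle, etc.
};

// Before issuing a draw call, compute the object's position relative to the
// camera (which is therefore implicitly at the origin), and only then narrow
// the result to 32-bit floats for the GPU.
glm::vec3 cameraRelativePosition(const Object& obj, const glm::dvec3& cameraWorldPos) {
    glm::dvec3 relative = obj.worldPosition - cameraWorldPos;  // subtraction done in double
    return glm::vec3(relative);                                // now safe to drop to float
}
```

The "re-center less often" variant is the same idea, except you subtract a stored origin that only snaps to the camera's position every so often.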
The next hurdle to jump is level of detail (LOD). You can't draw the whole planet at its highest level of detail on a modern GPU at realtime framerates, period. So, you draw parts of the planet that are further away at a lower level of detail (fewer vertices per unit area) to save VRAM and throughput. A good LOD algorithm makes the loss of detail imperceptible, and a better LOD algorithm makes the transition between different LODs seamless as well.
There are many approaches to this. My preference is to use a "quad-sphere", where each of the 6 faces is a quad-tree structure (look up quadtrees and other spatial partitioning structures if you're not familiar with the term). Typically each "node" of the quadtree will represent a single "terrain patch" (I like to use at least a 128x128 vertex grid), which we'll discuss in the next section. So, if you picture your "level-0" (i.e. lowest level of detail) quadsphere, you'll start off with a cube where each face is represented by a 128x128 grid of vertices. Next, each vertex's position is normalized (so its magnitude == 1) and then multiplied by R (the radius of the sphere); as a result, your cube turns into a sphere. (You'll have some slight "bunching" near the corners, and I've played around with smoothing functions to eliminate this altogether. Fun exercise for the reader, but not necessary for a prototype.)
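A sketch of the cube-to-sphere step for one face (GLM types again for illustration; buildFacePositiveZ is just a made-up helper name for the +Z face):

```cpp
#include <vector>
#include <glm/glm.hpp>

// Map a point on the unit cube (components in [-1, 1], one face at a time)
// onto a sphere of radius R: normalize, then scale.
glm::dvec3 cubeToSphere(const glm::dvec3& pointOnCube, double R) {
    return glm::normalize(pointOnCube) * R;
}

// Level-0 example: the +Z face of the cube, sampled as a 128x128 vertex grid.
std::vector<glm::dvec3> buildFacePositiveZ(double R, int gridSize = 128) {
    std::vector<glm::dvec3> verts;
    verts.reserve(gridSize * gridSize);
    for (int y = 0; y < gridSize; ++y)
        for (int x = 0; x < gridSize; ++x) {
            double u = -1.0 + 2.0 * x / (gridSize - 1);   // [-1, 1] across the face
            double v = -1.0 + 2.0 * y / (gridSize - 1);
            verts.push_back(cubeToSphere(glm::dvec3(u, v, 1.0), R));
        }
    return verts;
}
```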
To achieve higher and higher levels of detail, you need to subdivide nodes in each quad tree based upon the camera's proximity to the node (ideally each node will appear roughly the same size "on-screen", so that distant nodes, which are much larger, take up the same amount of screen real-estate). There are several ways to go about this, but here's a simple approach: treating each face individually, pretend that the face is actually a flat plane, and transform the camera's position with respect to that plane so that distance is preserved. In other words, if in real-world space the camera were to follow a perfect arc around the planet, maintaining, say, 1,000 feet above sea level, then in the projected space the camera would follow a straight horizontal path that maintains 1,000 feet above the plane. If, in real-world space, the camera were to move along a tangent to the surface of the sphere, in the projected space it would appear as if the camera were making a "U" shape (in reality a parabola, I believe), swooping down, just touching the surface of the plane at the tangent point, then returning back into space.
The above projection works even if the camera is not "above" the face in world-space. Now that you have translated the camera into each face's coordinate system, simply calculate the distance (in projection space) from the camera to each of the quad-tree nodes on the face in question, and recursively subdivide as necessary (see the sketch below). I'm not going into much more detail about quad trees, as plenty of documentation can be found online about the structure.
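To make the recursion concrete, here's a rough sketch of distance-driven subdivision. It assumes you've already transformed the camera into the face's coordinate system as described above, and the split threshold (2.5x the node's size) is an arbitrary knob you'd tune:

```cpp
#include <array>
#include <memory>
#include <glm/glm.hpp>

// One node of a face's quadtree. Coordinates are in the face's own 2D
// "unrolled plane" space described above (all names are illustrative).
struct QuadNode {
    glm::dvec2 center;      // node center in face space
    double     halfSize;    // half the node's edge length in face space
    int        depth;
    std::array<std::unique_ptr<QuadNode>, 4> children;  // empty => leaf
};

// Split while the camera is close relative to the node's size, so distant
// nodes stay large and nearby nodes keep subdividing up to maxDepth.
void updateLOD(QuadNode& node, const glm::dvec3& cameraInFaceSpace, int maxDepth) {
    double dist = glm::length(cameraInFaceSpace - glm::dvec3(node.center, 0.0));
    bool wantSplit = node.depth < maxDepth && dist < 2.5 * (2.0 * node.halfSize);

    if (wantSplit) {
        if (!node.children[0]) {                           // create the four children
            for (int i = 0; i < 4; ++i) {
                double ox = (i & 1) ? 0.5 : -0.5;
                double oy = (i & 2) ? 0.5 : -0.5;
                node.children[i] = std::make_unique<QuadNode>(QuadNode{
                    node.center + glm::dvec2(ox, oy) * node.halfSize,
                    node.halfSize * 0.5, node.depth + 1, {} });
            }
        }
        for (auto& child : node.children) updateLOD(*child, cameraInFaceSpace, maxDepth);
    } else {
        for (auto& child : node.children) child.reset();   // merge back into a leaf
    }
}
```

The leaves of the tree after updateLOD runs are exactly the patches you'll generate and draw.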
The level of subdivision for each node in the quadtree now determines the level of detail for a terrain patch occupying that node. Remember - one patch per node, so when a parent node splits, the space that was occupied by one patch now must be occupied by four.
As mentioned before, terrain patches are typically generated in square chunks of vertices (I use a 128x128 grid), which is done mainly for GPU optimization (i.e. better cache coherency). Whether you generate your terrain patch on the GPU or CPU, the basic principles are the same: use a pseudo-random noise function such as Perlin noise or simplex noise (look up simplex noise if you've never heard of either), which outputs a single value when you feed in a set of coordinates. The value this function spits out will be the height-above-sea-level for a particular vertex, and the inputs fed into the noise function are that same vertex's world position coordinates (post-normalization, so they lie on the surface of your quad sphere). Now that you've produced a height value for each vertex, you simply displace the vertex along its radial direction according to this height value. Note that this implies using a 3-dimensional noise function (as each vertex will have a unique X, Y, and Z coordinate), although more complex techniques exist to reduce this to two dimensions.
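A sketch of the height sampling, assuming some 3D simplex noise is available (simplexNoise3 below is a placeholder; any noise library will do). Summing a few octaves is optional but makes the terrain far more interesting:

```cpp
#include <glm/glm.hpp>

// Placeholder: any 3D noise function returning roughly [-1, 1] for a 3D coordinate.
double simplexNoise3(double x, double y, double z);

// Height above "sea level" for a vertex, sampled at its position on the
// unit sphere. Several octaves of noise are summed for detail.
double terrainHeight(const glm::dvec3& unitSpherePos, double maxHeight) {
    double height = 0.0, amplitude = 1.0, frequency = 1.0;
    for (int octave = 0; octave < 5; ++octave) {
        glm::dvec3 p = unitSpherePos * frequency;
        height += amplitude * simplexNoise3(p.x, p.y, p.z);
        amplitude *= 0.5;   // each octave contributes half as much...
        frequency *= 2.0;   // ...at twice the spatial frequency
    }
    return height * maxHeight;   // scale into world units
}
```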
I like to use instancing, so I'll create just a single 128x128 "patch" of vertices, whose origin is at (0,0,1) and whose vertices all have a z-value of 1. This instance will be rendered once for each patch on the sphere, and the position of each vertex will be displaced in the vertex shader. (Hm. You should read up on programmable graphics pipelines; that's way too much for this post and readily available elsewhere.)
You can figure out the pre-height-displacement world-space position of each vertex in a particular patch (these positions will be fed into the noise function as inputs), because you know the corresponding quad-tree node's center position in world space (which also lies on the surface of a sphere with radius R), and you know the desired level of detail from the quadtree (hence the total 2D area the patch should cover, i.e. its spatial resolution). Using these two pieces of data (plus, technically, R), determine the world-space position of each vertex in the patch. Next, feed these positions into your noise function, and out comes a 128x128 grid of height values (save this data). Now, draw the instance of your 128x128 mesh of vertices, displace each vertex along its world-space direction (i.e. just uniformly scale each component by its corresponding height value) from within the vertex shader, and voila, terrain! Again, you're using just one "patch" of 128x128 vertices to draw every terrain patch on your planet. Each of those patches does need a unique "displacement value" (i.e. height data) structure/object associated with it.
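Putting the last two paragraphs together, here's a rough sketch of per-patch height generation on the CPU. faceToCube and terrainHeight are the placeholder helpers from the earlier sketches, and the node's center/half-size are assumed to be in cube-face coordinates:

```cpp
#include <vector>
#include <glm/glm.hpp>

// Placeholder helpers from the earlier sketches: faceToCube() turns a 2D face
// coordinate into a 3D point on the unit cube (e.g. (u, v) -> (u, v, 1) for the
// +Z face), and terrainHeight() is the noise sampler from above.
glm::dvec3 faceToCube(const glm::dvec2& faceCoord);
double terrainHeight(const glm::dvec3& unitSpherePos, double maxHeight);

// Build the 128x128 height grid for one quadtree node. nodeCenter/nodeHalfSize
// describe the region of the cube face ([-1,1] x [-1,1]) that the patch covers.
std::vector<float> buildPatchHeights(const glm::dvec2& nodeCenter, double nodeHalfSize,
                                     double maxHeight, int gridSize = 128) {
    std::vector<float> heights(gridSize * gridSize);
    for (int y = 0; y < gridSize; ++y)
        for (int x = 0; x < gridSize; ++x) {
            // Face-space coordinate of this vertex within the node's extent.
            glm::dvec2 uv = nodeCenter + nodeHalfSize *
                glm::dvec2(-1.0 + 2.0 * x / (gridSize - 1), -1.0 + 2.0 * y / (gridSize - 1));
            glm::dvec3 unitDir = glm::normalize(faceToCube(uv));  // point on the unit sphere
            heights[y * gridSize + x] = static_cast<float>(terrainHeight(unitDir, maxHeight));
            // The vertex shader later reconstructs: worldPos = unitDir * (R + height).
        }
    return heights;  // upload per patch (e.g. as a texture or buffer) for the shader
}
```

Whether this loop runs on the CPU or in a compute shader, the output per patch is the same: a grid of heights that the instanced vertex shader can look up.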
(I've skimmed a lot here, but it's a very complex subject and I'm trying to keep this post readable. If you want more detail on a specific step, just ask!)
Based upon the limits of your GPU's VRAM, you'll only be able to hold onto a certain number of patch height-data objects for re-use before having to discard them. Many folks will use a least-recently-used (LRU) cache to determine which patch data object to drop once the cache is full. Since the actual height-data generation process described above is the most computationally expensive part of this whole thing, you'll want to be smart about discarding patches and re-use them as much as possible. (As an example, consider when a quad-tree node merges: instead of dumping the data from each of its 4 children and then calculating the parent's height data, you could copy every other vertex (in each dimension) from the children's patches into the parent's patch before tossing away the children. Better yet, using mip-map theory, it's pretty easy to see that keeping the parents around isn't that expensive in the first place. But start simple.)
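For reference, a bare-bones LRU cache for patch height data might look something like this (the 64-bit node id is a made-up keying scheme; use whatever uniquely identifies a node, e.g. face + depth + grid coordinates packed together):

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <vector>

// A minimal LRU cache for per-patch height data, keyed by a unique node id.
class PatchCache {
public:
    explicit PatchCache(size_t capacity) : capacity_(capacity) {}

    // Returns the cached heights for a node, or nullptr if they must be regenerated.
    const std::vector<float>* get(uint64_t nodeId) {
        auto it = index_.find(nodeId);
        if (it == index_.end()) return nullptr;
        // Move this entry to the front: it is now the most recently used.
        entries_.splice(entries_.begin(), entries_, it->second);
        return &it->second->second;
    }

    void put(uint64_t nodeId, std::vector<float> heights) {
        if (auto it = index_.find(nodeId); it != index_.end()) {
            it->second->second = std::move(heights);
            entries_.splice(entries_.begin(), entries_, it->second);
            return;
        }
        if (entries_.size() == capacity_) {            // evict the least recently used
            index_.erase(entries_.back().first);
            entries_.pop_back();
        }
        entries_.emplace_front(nodeId, std::move(heights));
        index_[nodeId] = entries_.begin();
    }

private:
    size_t capacity_;
    std::list<std::pair<uint64_t, std::vector<float>>> entries_;   // front = most recent
    std::unordered_map<uint64_t, decltype(entries_)::iterator> index_;
};
```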
Occlusion culling: Even if you're respecting your GPU's VRAM limits by not keeping too many terrain patches on hand, you will eventually run into the limit of your GPU's throughput, particularly when using more complex fragment shaders (DirectX's pixel shaders). One way to help with this is by limiting what you draw solely to things you know will be visible on-screen. Many papers can be found online which discuss occlusion culling, but a fairly naive approach actually works quite well in a planet renderer, as you're primarily just testing whether or not a terrain patch is beyond the visible horizon (which is a function of height-above-terrain).
In FPS games, occlusion culling/detection is typically performed on a per-frame basis, as objects can rapidly occlude or "de-occlude" (imagine an NPC running around a corner). Planetary renderers deal with much larger chunks of geometry, however, and so can usually get away with performing this check less often. There are two schools of thought:
- Camera orientation-based occlusion culling, where you don't attempt to draw anything outside of the camera's immediate view. This can do a great job of reducing the GPU's throughput requirement, but the calculation can be a bit complex and must be done every frame. A quick Google search for "quad tree occlusion culling" will bring up several options.
- Position-based (more appropriately, horizon-based) occlusion, which is often uniquely suited to planet renderers. In a nutshell, this technique determines every patch that could possibly be visible to the camera from its current position, regardless of orientation, and then only attempts to draw those. Determining which patches are "beyond the horizon" is a simple frustum-sphere intersection, although even faster techniques exist. (For example, I've developed and prototyped an algorithm that gives you an instant distance to the "conservative horizon" based solely on height from the center of the planet and the maximum possible terrain height, guaranteed not to miss anything, including patches beyond the horizon with very tall features.)
Because of the large geometries and distances involved, horizon occlusion is a great choice for planet rendering. To take it a step further, orientation-based culling could then be applied to the subset of patches derived from the horizon cull to narrow the set even further, but at some point you may be spending more time calculating what not to draw than it would take the GPU to fast-discard those vertices, so profiling becomes necessary.
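As a concrete example of a conservative horizon test (one common formulation, not necessarily identical to the algorithm I prototyped): a patch can only be visible if it is closer than the camera-to-horizon distance plus the farthest distance at which maximum-height terrain could still peek over that horizon:

```cpp
#include <cmath>
#include <glm/glm.hpp>

// All positions are relative to the planet's center. R is the sea-level
// radius; maxTerrainHeight is the tallest feature the noise can produce.
bool patchBeyondHorizon(const glm::dvec3& cameraPos, const glm::dvec3& patchCenter,
                        double patchBoundingRadius, double R, double maxTerrainHeight) {
    double camDist = glm::length(cameraPos);
    if (camDist <= R) return false;                  // at or below sea level: don't cull
    double Rmax = R + maxTerrainHeight;

    // Camera-to-horizon distance, plus how far beyond the horizon a point at
    // the maximum terrain radius could still be seen along the tangent line.
    double visibleLimit = std::sqrt(camDist * camDist - R * R)
                        + std::sqrt(Rmax * Rmax - R * R);

    return glm::length(patchCenter - cameraPos) - patchBoundingRadius > visibleLimit;
}
```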
If you can get this far, you should be able to figure out texturing and lighting. Using a vertex's pre-height-scaling position as an input to any additional noise functions (i.e. for procedural texturing) is generally a good idea, but beyond that it's pretty standard fare.
Phew! That was a lot of typing! I'm happy to answer any questions that may come out of this post.