TLDR: This post gives a ‘mid-level-overview’ on how to render a planet procedurally. We’ll go into issues with floating point precision and how to combat them, how to apply a quad-tree spatial partitioning system to a sphere, how to determine desired level of detail, perform occlusion culling, and how to actually render the terrain patches.
Disclaimer/Background: I started with and still use OpenGL. Can’t speak to DirectX, but API won’t matter for this summary. I’ve built a procedural planetary renderer, up to but not including the least-recently-used cache (LRU) and geometry occlusion (although I’ve prototyped both). So my prototype will generate and render a whole planet at once, to a reasonable level of detail - but I’m limited by polycount since everything displays every frame. At that point, life become a bit more interesting (namely I was deployed to Afghanistan), and work on the project stopped.
One of the first things you’ll encounter when discussing planetary rendering (procedural or not), is the concept of 32-bit floating point precision. At very large values, the precision of a 32-bit float starts to become pretty poor. closer to zero, you may be able to uniquely represent micro-units (i.e. 1E-6) - but once that number climbs high enough (which will occur quite quickly during planet rendering), the smallest value you can uniquely represent (‘precision’) becomes much bigger. i.e. instead of 1.0 -> 1.1 -> 1.2 -> 1.3, you’d be at 1,000,000 -> 1,000,100 -> 1,000,200, -> 1,000,300, etc (these numbers are just examples). The precision problem primarily manifests itself in two ways: Depth buffer ‘Z-Fighting’, and vertex position imprecision (or the ‘jitters’). In short:
Depth Buffer Z-Fighting: The depth buffer is used to determine whether a pixel is ‘in front’ of another pixel or not, by comparing it’s depth value, or z-value. Imprecision at large distances makes this process somewhat underterministic, so during one frame, a pixel belonging to a building (for example) will be ‘in front of’ the terrain (from the perspective of the camera), and during the next frame, the same pixel will be ‘behind’ the terrain. The result is ‘flickering’ of objects, ‘fighting’ over who’s in front of whom.
Positional jitters. Even with floating point numbers, you’re dealing with a discrete ‘grid’ at its lowest levels. As described above, the resolution of that grid is much higher near the origin than at vast distances. At vast distances, the position of an object - and more specifically, the positions of its individual verticies - become subject to this coarser grid. As a reult, objects appear to deform and jitter, as their individual verticies jump from one position to another position. You can think of this as being caused by large rounding errors, which unfrotunately don’t occur uniformly across a single model.
Now, fixing Depth-Buffer Z-fighting is a fairly complex subject, and to my knowledge there isn’t a generally accepted ‘best’ solution, other than GPUs which natively support 64-bit depth buffer values (which are still very expensive and thus uncommon) - in that case, the issue won’t appear even at galactic scales. Many discussions on techniques to combat depth-buffer precision issues exist online. Check out this post (and its predecessors) for an example technique.
Fixing positional jitters is fairly easy: you just need to keep the camera at or near the origin of the grid system. This way, any ‘rounding errors’ that may occur will do so very very far away from the camera, so you simply won’t see them.
To achieve this, you store all positions as 64-bit doubles on the host (cpu), and before submitting them to the client (gpu) for rendering, you determine the relative positions between all objects and the camera by subtracting the camera’s world-space (double-precision) position value from the object’s world-space (double-precision) position value. The results are each objects’ relative position from the camera, with the camera inherently positioned at (0,0,0). These values are submitted (as 32-bit floats) to the gpu for each object, which then renders individual vertices offset by that amount. Other techniques ‘re-center’ the camera less often, which is basically a free optimization (i.e. only re-center once the camera moves more than 1,000 ‘units’ from its last origin, etc).
The next hurdle to jump is level-of-detail. You can’t draw the whole planet at its highest level of detail (LOD) on a modern GPU at realtime framerates, period. So, you draw parts of the planet that are further away at a lower level of detail (less vertices per unit area) to save VRAM and throughput. A good LOD algorithm makes the loss of detail imperceptible, and a better LOD algorithm makes the transition between different LODs seamless as well.
There are many approaches to this. My preference is to use a ‘quad-sphere’, where each of the 6 faces is a quad-tree structure (look up quadtrees and other spatial partitioning structures if you’re not familiar with that term). Typically each ‘node’ of the quadtree will represent a single ‘terrain patch’ - I like to use at least a 128x128 vertex grid - which we’ll discuss in the next section. So, if you picture your “level-0” (i.e. lowest level of detail) quadsphere, you’ll start off with a cube where each face is represented by a 128x128 grid of vertices. Next, each vertex’s position is normalized (magnitude will == 1) then multiplied by R (radius of the sphere), and as a result, your cube turns into a sphere (you’ll have some slight ‘bunching’ near the corners, and I’ve played around with smoothing functions to eliminate this altogether. Fun exercise for the reader, but not neccessary for a prototype).
To achieve higher and higher levels of detail, you need to subdivide nodes in each quad tree based upon the camera’s proximity to the node (ideally each node will appear rougly the same size ‘on-screen’ so that distant nodes, which are much larger, take up the same amount of screen real-estate). There are several ways to go about this, but here’s a simple approach: treating each face individually, pretend that the face is actually a flat plane, an transform the camera’s position with respect to that plane accordingly to preserve distance. In other words, if in real-world space the camera were to follow a perfect arc around the planet, maintain say 1000 feet above sea level, then in the projected space, the camera would follow a straight horizontal path that maintains 1000 feet above the plane. If, in real-world space, the camera were to move along a tangent to the surface of the sphere, in the projected space it would appear as if the camera were making a “u” shape (in reality a parabola, i beleive), swooping down, just touching the surface of the plane at the tangent point, then returning back into space.
The above projection works even if the camera is not “above” the face in world-space. Now that you have translated the cmaera into each face’s coordinate system, simply calculate the distance (in projection space) from the camera to each of the quad-tree nodes on the face in question, and recursively subdivide as neccessary. I’m not going into much more detail about quad trees as much documentation can be found online about the structure.
The level of subdivision for each node in the quadtree now determines the level of detail for a terrain patch occupying that node. Remember - one patch per node, so when a parent node splits, the space that was occupied by one patch now must be occupied by four.
As mentioned before, terrain patches are typically generated in square chunks of vertices (I use a 128x128 grid) (this is done namely for GPU optimization, i.e. better cache coherency). Whether you generate your terrain patch on the GPU or CPU, the basic principles are the same: use a pseudo-random noise function such as perlin noise or simplex noise (look up simplex noise if you’ve never heard of either), which will output a single value when you feed-in a set of coordinates. The value this function spits out will be the height-above-sea-level for a particular vertex, and the inputs fed into that noise function are that same vertex’s world position coordinates (post normalization, so they lay on the surface of your quad sphere). Now that you’ve produced a height value for each vertex, you simply displace the vertex along its length according to this height value. Note that this implies using a 3-dimensional noise funciton (as each terrain will have a unique X, Y, and Z coordinate), although more complex techniques exist to reduce this to two dimensions.
I like to use instances, so I’ll create just a single 128x128 ‘patch’ of vertices, whose origin is at (0,0,1), and for which the z-values of each vertex is 1. This instance will be rendered once for each patch on the sphere, and the position of each vertex will be displaced in the vertex shader (Hm. You should read up on programmable graphics pipelines, that’s way too much for this post and readily available elsewhere).
You can figure out the pre-height-transformation world-space-positions of each vertex in a particular patch (which will be fed into the noise function as inputs), because you know the center the corresponding quad-tree node’s position in world space (which also lies along the surface of a sphere with radius R), and you know the desired level of detail from the quadtree (hence the total 2d-area the patch should cover / i.e. its spatial resolution). Using these two pieces of data (plus, technically, R), determine the world-space position for each vertex in the patch. Next, input these positions into your noise function, and out comes a 128x128 grid of height values (save this data). Now, draw the instance of your 128x128 mesh of vertices, and displace each vertex along its world-space legnth (i.e. just uniformly scale each component by its corresponding height value) from within the vertex shader, and Voila, terrain! Again, you’re using just one ‘patch’ of 128x128x vertices to draw every terrain patch on your planet. Each of those patches does need a unique ‘displacement value’ (i.e. height data) structure/object associated with it.
(I’ve skimmed a lot here, but it’s a very complex subject and I’m trying to keep this post readable. If you want more detail on a specific step, just ask!)
Based upon the limits of your GPU’s VRAM, you’ll only be able to hold onto a certain number of patch height-data objects for re-use before having to discard them. Many folks will use a Least-Recently-Used cache to determine which patch data object to drop once the cache is full. Since the actual height-data generation process described above is the most computationally expensive part of this whole thing, you’ll want to be smart about discarding patches, and re-use them as much as possible. (As an example, consider when a quad-tree node merges: Instead of dumping the data from each of its 4 children and then calculating the parent’s height data, you could copy every other vertex (in each dimension) from the children patches into the parent’s patch before tossing away the children. Better yet, using mip-map theory, it’s pretty easy to see that keeping the parents around isn’t that expensive in the first place. But start simple).
Occlusion culling: Even if you’re respecting your GPU’s VRAM limits by not keeping too many terrain patches on hand, you will eventually run into the limit of your GPU’s throughput - particularly when using more complex fragment shaders (DirectX’s pixel shaders). One way to help with this is by limiting what you draw solely to things you know will be visible on-screen. Many papers can be found online which discuss occlusion culling, but a fairly naive approach actually works fairly well on a planet renderer, as you’re primarily just testing to see wheter or not a terrain patch is beyond the visible horizon (which is based upon height-above-terrain).
In FPS games, occlusion culling / detection is typically performed on a per-frame basis, as objects can rapidly occlude or ‘de-occlude’ - imagine an NPC running around a corner. Planetary renderers deal with much larger chunks of geometry, however, and so can usually get away with performing this check less often. There are two schools of thought: Camera orientation-based occlusion culling, where you don’t attempt to draw anything outside of the camera’s immediate view. This can do a great job at reducing the GPU’s throughput requirement, but the calculation can be a bit complex and must be done every frame. A quick google search for “quad tree occlusion culling” will bring up several options. The other school of thought (often suitable uniquely to planet renderers) is position-based (more appropriately, horizon-based occlusion). In a nutshell, this technique determines every patch that could possibly be visible to the camera from its current position, regardless of orientation, and then only attempts to draw those. Determining which patches are ‘beyond the horizon’ is a simple frustrum-sphere intersection, although even faster techniques exist. (For example, I’ve developed and prototyped an algorithm that can give you an instant distance to the “conservative horizon” simply based upon height from center of the planet and maximum possible terrain height, garunteed to not miss anything (including patches beond the horizon with very tall features)). Because of the large geometries and distances involved, horizon-occlusion for planet rendering is a great choice. To take it to the next step, orientation-based occlusion culling could be then applied to subset of patches derived form the horizon cull to even futher cull the set - but at some point you may be spending more time calculating what not to draw than it will take for the GPU to fast-discard those vertices, so profiling becomes neccessary.
If you can get this far, you should be able to figure out texturing and lighting. Using a vertex’s pre-height-scaling position as an input to any additional noise functions (ie for procedural texturing) is generally a good idea, but beyond that it’s pretty standard fare.
Phew! That was a lot of typing! I’m happy to answer any questions that may come out of this post.