Procedural Terrain Rendering How-To

So, this post started as a reply to @kamui on this thread, but by the time I was done I decided it deserved its own thread :slight_smile:

TLDR: This post gives a ‘mid-level-overview’ on how to render a planet procedurally. We’ll go into issues with floating point precision and how to combat them, how to apply a quad-tree spatial partitioning system to a sphere, how to determine desired level of detail, perform occlusion culling, and how to actually render the terrain patches.

Disclaimer/Background: I started with and still use OpenGL. I can’t speak to DirectX, but the API won’t matter for this summary. I’ve built a procedural planetary renderer, up to but not including the least-recently-used (LRU) cache and geometry occlusion (although I’ve prototyped both). So my prototype will generate and render a whole planet at once, to a reasonable level of detail - but I’m limited by polycount since everything displays every frame. At that point, life became a bit more interesting (namely, I was deployed to Afghanistan), and work on the project stopped.

One of the first things you’ll encounter when discussing planetary rendering (procedural or not) is the limited precision of 32-bit floating point numbers. At very large values, the precision of a 32-bit float becomes pretty poor. Close to zero, you may be able to uniquely represent micro-units (i.e. 1E-6) - but once that number climbs high enough (which happens quite quickly when rendering a planet), the smallest value you can uniquely represent (the ‘precision’) becomes much bigger. I.e. instead of 1.0 -> 1.1 -> 1.2 -> 1.3, you’d be at 1,000,000 -> 1,000,100 -> 1,000,200 -> 1,000,300, etc. (these numbers are just examples). The precision problem primarily manifests itself in two ways: depth-buffer ‘Z-fighting’, and vertex position imprecision (or the ‘jitters’). In short:

  1. Depth Buffer Z-Fighting: The depth buffer is used to determine whether a pixel is ‘in front of’ another pixel or not, by comparing its depth value, or z-value. Imprecision at large distances makes this comparison nondeterministic, so during one frame a pixel belonging to a building (for example) will be ‘in front of’ the terrain (from the perspective of the camera), and during the next frame the same pixel will be ‘behind’ the terrain. The result is ‘flickering’ of objects, ‘fighting’ over who’s in front of whom.

  2. Positional jitters. Even with floating point numbers, you’re dealing with a discrete ‘grid’ at the lowest level. As described above, the resolution of that grid is much higher near the origin than at vast distances. At vast distances, the position of an object - and more specifically, the positions of its individual vertices - becomes subject to this coarser grid. As a result, objects appear to deform and jitter as their individual vertices jump from one position to another. You can think of this as being caused by large rounding errors, which unfortunately don’t occur uniformly across a single model.
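To make the spacing concrete, here’s a small Python sketch that round-trips a double through 32-bit precision (the offsets and magnitudes are arbitrary examples):

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python double to the nearest 32-bit float and back."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Near the origin, a one-millionth offset survives the round-trip:
assert to_f32(1.0 + 1e-6) != 1.0

# At planetary magnitudes (~10^6 units), a one-hundredth offset is lost
# entirely: adjacent float32 values here are 0.0625 units apart.
assert to_f32(1_000_000.0 + 0.01) == 1_000_000.0
```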

Now, fixing Depth-Buffer Z-fighting is a fairly complex subject, and to my knowledge there isn’t a generally accepted ‘best’ solution, other than GPUs which natively support 64-bit depth buffer values (which are still very expensive and thus uncommon) - in that case, the issue won’t appear even at galactic scales. Many discussions on techniques to combat depth-buffer precision issues exist online. Check out this post (and its predecessors) for an example technique.

Fixing positional jitters is fairly easy: you just need to keep the camera at or near the origin of the grid system. This way, any ‘rounding errors’ that may occur will do so very very far away from the camera, so you simply won’t see them.

To achieve this, you store all positions as 64-bit doubles on the host (CPU), and before submitting them to the device (GPU) for rendering, you determine the relative positions between all objects and the camera by subtracting the camera’s world-space (double-precision) position from each object’s world-space (double-precision) position. The result is each object’s position relative to the camera, with the camera inherently positioned at (0,0,0). These values are submitted (as 32-bit floats) to the GPU for each object, which then renders individual vertices offset by that amount. Other techniques ‘re-center’ the camera less often, which is basically a free optimization (i.e. only re-center once the camera moves more than 1,000 ‘units’ from its last origin, etc.).
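A minimal Python sketch of that subtraction order (here `f32` just emulates the eventual cast to a 32-bit GPU float; a real renderer would do this in its host language):

```python
import struct

def f32(x: float) -> float:
    """Emulate the precision loss of casting to a 32-bit GPU float."""
    return struct.unpack('f', struct.pack('f', x))[0]

def camera_relative(obj_pos, cam_pos):
    """Subtract in double precision FIRST, then truncate to float32."""
    return tuple(f32(o - c) for o, c in zip(obj_pos, cam_pos))

# An object 0.25 units from the camera, both ~1e7 units from the origin:
cam = (10_000_000.0, 0.0, 0.0)
obj = (10_000_000.25, 0.0, 0.0)

# Cast each absolute position to float32 and the quarter-unit offset
# vanishes (float32 spacing at 1e7 is a full unit)...
assert f32(obj[0]) == f32(cam[0])
# ...but the double-precision difference survives the cast intact:
assert camera_relative(obj, cam) == (0.25, 0.0, 0.0)
```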

The next hurdle to jump is level of detail. You can’t draw the whole planet at its highest level of detail (LOD) on a modern GPU at realtime framerates, period. So, you draw parts of the planet that are further away at a lower level of detail (fewer vertices per unit area) to save VRAM and throughput. A good LOD algorithm makes the loss of detail imperceptible, and a better LOD algorithm makes the transition between different LODs seamless as well.

There are many approaches to this. My preference is to use a ‘quad-sphere’, where each of the 6 faces is a quad-tree structure (look up quadtrees and other spatial partitioning structures if you’re not familiar with the term). Typically each ‘node’ of the quadtree will represent a single ‘terrain patch’ - I like to use at least a 128x128 vertex grid - which we’ll discuss in the next section. So, if you picture your “level-0” (i.e. lowest level of detail) quadsphere, you’ll start off with a cube where each face is represented by a 128x128 grid of vertices. Next, each vertex’s position is normalized (so its magnitude == 1) then multiplied by R (the radius of the sphere), and as a result your cube turns into a sphere (you’ll have some slight ‘bunching’ near the corners; I’ve played around with smoothing functions to eliminate this altogether - a fun exercise for the reader, but not necessary for a prototype).
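The normalize-and-scale step might look like this (a sketch; the radius value is just an Earth-ish example):

```python
import math

def cube_to_sphere(v, radius):
    """Project a cube-face vertex onto a sphere of the given radius by
    normalizing its position (magnitude == 1), then scaling by R."""
    mag = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return tuple(radius * c / mag for c in v)

# A corner of the unit cube lands exactly on the sphere's surface:
R = 6_371_000.0  # Earth-ish radius, in meters
p = cube_to_sphere((1.0, 1.0, 1.0), R)
assert abs(math.dist((0.0, 0.0, 0.0), p) - R) < 1e-6
```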

To achieve higher and higher levels of detail, you need to subdivide nodes in each quad tree based upon the camera’s proximity to the node (ideally each node will appear roughly the same size ‘on-screen’, so that distant nodes, which are much larger, take up the same amount of screen real-estate). There are several ways to go about this, but here’s a simple approach: treating each face individually, pretend that the face is actually a flat plane, and transform the camera’s position with respect to that plane accordingly, preserving distance. In other words, if in real-world space the camera were to follow a perfect arc around the planet, maintaining say 1,000 feet above sea level, then in the projected space the camera would follow a straight horizontal path 1,000 feet above the plane. If, in real-world space, the camera were to move along a tangent to the surface of the sphere, in the projected space it would appear as if the camera were making a “U” shape (in reality a parabola, I believe), swooping down, just touching the surface of the plane at the tangent point, then returning back into space.

The above projection works even if the camera is not “above” the face in world-space. Now that you have translated the camera into each face’s coordinate system, simply calculate the distance (in projection space) from the camera to each of the quad-tree nodes on the face in question, and recursively subdivide as necessary. I won’t go into much more detail about quad trees, as plenty of documentation on the structure can be found online.
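Here’s an illustrative sketch of distance-driven subdivision on a single face, treated as a flat 2D plane as described above (the class name, `MAX_DEPTH`, and `SPLIT_FACTOR` threshold are all made up for illustration):

```python
MAX_DEPTH = 6
SPLIT_FACTOR = 2.0   # subdivide while the camera is within 2x the node size

class Node:
    def __init__(self, cx, cy, size, depth=0):
        self.cx, self.cy, self.size, self.depth = cx, cy, size, depth
        self.children = []

    def update(self, cam_x, cam_y):
        """Recursively split nodes near the camera; merge nodes far away."""
        dist = ((cam_x - self.cx) ** 2 + (cam_y - self.cy) ** 2) ** 0.5
        if dist < self.size * SPLIT_FACTOR and self.depth < MAX_DEPTH:
            if not self.children:
                q = self.size / 4
                self.children = [
                    Node(self.cx + dx, self.cy + dy, self.size / 2, self.depth + 1)
                    for dx in (-q, q) for dy in (-q, q)]
            for child in self.children:
                child.update(cam_x, cam_y)
        else:
            self.children = []   # merge: this node becomes a leaf again

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

face = Node(0.0, 0.0, 1.0)       # one face, centered at the origin
face.update(0.45, 0.45)          # camera near one corner of the face
depths = [leaf.depth for leaf in face.leaves()]
assert max(depths) == MAX_DEPTH  # finest detail right under the camera
assert min(depths) < MAX_DEPTH   # coarser patches further away
```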

The level of subdivision for each node in the quadtree now determines the level of detail for a terrain patch occupying that node. Remember - one patch per node, so when a parent node splits, the space that was occupied by one patch now must be occupied by four.

As mentioned before, terrain patches are typically generated in square chunks of vertices (I use a 128x128 grid), namely for GPU optimization (i.e. better cache coherency). Whether you generate your terrain patch on the GPU or CPU, the basic principles are the same: use a pseudo-random noise function such as Perlin noise or simplex noise (look up simplex noise if you’ve never heard of either), which will output a single value when you feed in a set of coordinates. The value this function spits out will be the height-above-sea-level for a particular vertex, and the inputs fed into that noise function are that same vertex’s world-position coordinates (post-normalization, so they lie on the surface of your quad sphere). Now that you’ve produced a height value for each vertex, you simply displace the vertex along its length according to this height value. Note that this implies using a 3-dimensional noise function (as each vertex will have a unique X, Y, and Z coordinate), although more complex techniques exist to reduce this to two dimensions.
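The displacement plumbing can be sketched like this - note that `noise3` below is a stand-in (a few sines), not a real Perlin/simplex implementation:

```python
import math

def noise3(x, y, z):
    """Stand-in for a real 3D noise function (Perlin/simplex); a few
    sines illustrate the plumbing, not the terrain quality."""
    return (math.sin(12.9898 * x) + math.sin(78.233 * y) + math.sin(37.719 * z)) / 3.0

def displace(unit_pos, radius, amplitude):
    """unit_pos lies on the unit sphere (post-normalization). Sample noise
    at that position, then push the vertex out along its own direction."""
    h = noise3(*unit_pos)            # height above 'sea level', in [-1, 1]
    scale = radius + amplitude * h
    return tuple(c * scale for c in unit_pos)

v = (0.0, 0.0, 1.0)                  # a vertex on the unit sphere
p = displace(v, radius=6_371_000.0, amplitude=8_000.0)
# The displaced vertex still points in the same direction, just scaled:
assert abs(math.dist((0, 0, 0), p) - (6_371_000.0 + 8_000.0 * noise3(*v))) < 1e-6
```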

I like to use instancing, so I’ll create just a single 128x128 ‘patch’ of vertices, whose origin is at (0,0,1) and for which the z-value of each vertex is 1. This instance will be rendered once for each patch on the sphere, and the position of each vertex will be displaced in the vertex shader (if that’s unfamiliar, read up on programmable graphics pipelines - that’s way too much for this post and readily available elsewhere).

You can figure out the pre-height-transformation world-space position of each vertex in a particular patch (these will be fed into the noise function as inputs), because you know the world-space position of the corresponding quad-tree node’s center (which also lies along the surface of a sphere with radius R), and you know the desired level of detail from the quadtree (hence the total 2D area the patch should cover - i.e. its spatial resolution). Using these two pieces of data (plus, technically, R), determine the world-space position for each vertex in the patch. Next, input these positions into your noise function, and out comes a 128x128 grid of height values (save this data). Now, draw the instance of your 128x128 mesh of vertices, and displace each vertex along its world-space direction (i.e. just uniformly scale each component by its corresponding height value) from within the vertex shader, and voilà, terrain! Again, you’re using just one ‘patch’ of 128x128 vertices to draw every terrain patch on your planet. Each of those patches does need a unique ‘displacement value’ (i.e. height data) structure/object associated with it.
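For the +Z cube face, the per-vertex position reconstruction might look like this (a sketch; the tiny 4x4 grid and parameter names are illustrative - a real patch would be 128x128):

```python
import math

def patch_positions(face_u, face_v, half_extent, n=4):
    """Unit-sphere positions for an n x n patch on the cube face z = 1.
    (face_u, face_v) is the node's center in face coordinates and
    half_extent its half-width (smaller at deeper quadtree levels)."""
    out = []
    for j in range(n):
        for i in range(n):
            # lay the grid out flat on the cube face...
            u = face_u + half_extent * (2.0 * i / (n - 1) - 1.0)
            v = face_v + half_extent * (2.0 * j / (n - 1) - 1.0)
            # ...then normalize each vertex onto the unit sphere
            mag = math.sqrt(u * u + v * v + 1.0)
            out.append((u / mag, v / mag, 1.0 / mag))
    return out

pts = patch_positions(0.0, 0.0, 0.25)
# Every generated position lies on the unit sphere, ready for the noise
# function (and a final scale by R):
assert all(abs(math.dist((0, 0, 0), p) - 1.0) < 1e-9 for p in pts)
```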

(I’ve skimmed a lot here, but it’s a very complex subject and I’m trying to keep this post readable. If you want more detail on a specific step, just ask!)

Based upon the limits of your GPU’s VRAM, you’ll only be able to hold onto a certain number of patch height-data objects for re-use before having to discard them. Many folks will use a Least-Recently-Used cache to determine which patch data object to drop once the cache is full. Since the actual height-data generation process described above is the most computationally expensive part of this whole thing, you’ll want to be smart about discarding patches, and re-use them as much as possible. (As an example, consider when a quad-tree node merges: Instead of dumping the data from each of its 4 children and then calculating the parent’s height data, you could copy every other vertex (in each dimension) from the children patches into the parent’s patch before tossing away the children. Better yet, using mip-map theory, it’s pretty easy to see that keeping the parents around isn’t that expensive in the first place. But start simple).
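A minimal LRU cache for patch height data could look like this (a sketch; a real renderer would also free the associated GPU buffer on eviction):

```python
from collections import OrderedDict

class PatchCache:
    """Minimal LRU cache for patch height data, keyed by quadtree node ID."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, node_id):
        if node_id not in self._data:
            return None
        self._data.move_to_end(node_id)     # mark as most recently used
        return self._data[node_id]

    def put(self, node_id, heights):
        self._data[node_id] = heights
        self._data.move_to_end(node_id)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used

cache = PatchCache(capacity=2)
cache.put("a", [0.1])
cache.put("b", [0.2])
cache.get("a")                   # touch "a", so "b" becomes the LRU entry
cache.put("c", [0.3])            # over capacity: evicts "b", not "a"
assert cache.get("b") is None and cache.get("a") is not None
```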

Occlusion culling: Even if you’re respecting your GPU’s VRAM limits by not keeping too many terrain patches on hand, you will eventually run into the limit of your GPU’s throughput - particularly when using more complex fragment shaders (DirectX’s pixel shaders). One way to help with this is by limiting what you draw solely to things you know will be visible on-screen. Many papers can be found online which discuss occlusion culling, but a fairly naive approach actually works quite well for a planet renderer, as you’re primarily just testing whether or not a terrain patch is beyond the visible horizon (which is based upon height-above-terrain).

In FPS games, occlusion culling / detection is typically performed on a per-frame basis, as objects can rapidly occlude or ‘de-occlude’ - imagine an NPC running around a corner. Planetary renderers deal with much larger chunks of geometry, however, and so can usually get away with performing this check less often. There are two schools of thought:

  1. Camera-orientation-based occlusion culling, where you don’t attempt to draw anything outside of the camera’s immediate view. This can do a great job of reducing the GPU’s throughput requirement, but the calculation can be a bit complex and must be done every frame. A quick google search for “quad tree occlusion culling” will bring up several options.

  2. Position-based (more appropriately, horizon-based) occlusion, often uniquely suited to planet renderers. In a nutshell, this technique determines every patch that could possibly be visible to the camera from its current position, regardless of orientation, and then only attempts to draw those. Determining which patches are ‘beyond the horizon’ is a simple frustum-sphere intersection, although even faster techniques exist. (For example, I’ve developed and prototyped an algorithm that gives you an instant distance to the “conservative horizon” based solely upon height from the center of the planet and maximum possible terrain height, guaranteed not to miss anything - including patches beyond the horizon with very tall features.)

Because of the large geometries and distances involved, horizon occlusion is a great choice for planet rendering. To take it a step further, orientation-based culling could then be applied to the subset of patches that survives the horizon cull to trim the set even further - but at some point you may be spending more time calculating what not to draw than it would take the GPU to fast-discard those vertices, so profiling becomes necessary.
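One standard way to formulate a horizon test (not necessarily the exact algorithm described above): from height h above a sphere of radius R, the line of sight grazes the sphere at distance sqrt(h * (2R + h)), so a conservative visibility limit is the sum of the camera’s horizon distance and that of the tallest possible terrain feature:

```python
import math

def horizon_distance(radius, height_above):
    """Straight-line distance from a point `height_above` a sphere of the
    given radius to where its line of sight grazes the sphere."""
    return math.sqrt(height_above * (2.0 * radius + height_above))

def conservatively_visible(cam_height, patch_distance, radius, max_terrain):
    """A patch MIGHT be visible if it's closer than the camera's horizon
    plus the horizon of the tallest possible terrain feature; anything
    beyond that sum is provably hidden by the planet's bulk."""
    limit = horizon_distance(radius, cam_height) + horizon_distance(radius, max_terrain)
    return patch_distance <= limit

R = 6_371_000.0
# From 10 km up, the horizon on a smooth Earth-sized sphere is ~357 km out:
assert abs(horizon_distance(R, 10_000.0) - 357_000.0) < 1_000.0
# A patch 300 km away might be visible; one 1,000 km away cannot be:
assert conservatively_visible(10_000.0, 300_000.0, R, 9_000.0)
assert not conservatively_visible(10_000.0, 1_000_000.0, R, 9_000.0)
```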

If you can get this far, you should be able to figure out texturing and lighting. Using a vertex’s pre-height-scaling position as an input to any additional noise functions (i.e. for procedural texturing) is generally a good idea, but beyond that it’s pretty standard fare.
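As a toy example of that idea (the noise function and color thresholds below are made up for illustration):

```python
import math

def color_noise(x, y, z):
    """Stand-in for a texturing noise function; returns a value in [0, 1]."""
    return (math.sin(5.1 * x + 1.3) * math.sin(7.7 * y + 4.2)
            * math.sin(3.3 * z + 0.5) + 1.0) / 2.0

def terrain_color(unit_pos, height):
    """Toy procedural texturing: sample noise at the vertex's
    pre-displacement (unit-sphere) position so the pattern doesn't
    stretch with terrain height, then pick a palette by altitude."""
    moisture = color_noise(*unit_pos)
    if height < 0.0:
        return (0.1, 0.3, 0.7)               # water
    if height > 0.6:
        return (0.95, 0.95, 0.97)            # snow
    # blend rock (dry) toward green (wet) by the noise value
    return (0.45 - 0.2 * moisture, 0.38 + 0.3 * moisture, 0.30)

c = terrain_color((0.0, 0.0, 1.0), 0.2)
assert all(0.0 <= channel <= 1.0 for channel in c)
```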

Phew! That was a lot of typing! :slight_smile: I’m happy to answer any questions that may come out of this post.


Edit: Lots of them :slight_smile: Sorry if you had to read the first draft…

I don’t think I’ll be writing my own procedural planet engine anytime soon, but that was a fun read nonetheless.


Where is the demo, Navy, or some video? Or it doesn’t exist! :wink:

I hear that way too often :slight_smile:

I’ll see what I can do, the project is two years old at this point.


I hereby award @NavyFish the Biggest Damned Post of the Month Award.


@Arkenbrien Hah, thanks, I guess :blush:

I’ll post a video tomorrow, got the code back up and running tonight.


A very biiiig answer that I’ve been waiting on for a while now :wink:

Indeed, a video would be welcome… I will dissect every word. Probably the best answer I’ve seen for a question I asked. Thank you!

What a week! Very busy. I was going to post a video - I’ll start recording now, should have it up in a bit.


Video’s uploading now… I went overboard a bit and recorded 20 minutes’ worth (a lot of describing each technique, etc). It’s 2GB, so will take a bit to upload. Will post back with a link to YouTube as soon as that’s complete.


Thanks for the video! I’d ask questions but graphics programming is scary. I know, I’ve tried it before and didn’t like it. :stuck_out_tongue:

Every now and again I get an urge to fiddle with procedural planet generation. I think I’m near a 3rd revision of my planet generation adventure; I started with XNA and moved over to Unity. My last two-week spell did produce some results - this is what it looks like atm.


(accidental double post above)

@cybercritic Nice! That looks very similar to the idea I had in mind for what ended up becoming the 2nd prototype (snow/mountains)… Namely, having sliders to tweak various parameters and play with the look of the terrain. I ended up getting derailed from that ambition, and focused the snowy prototype on 3D level-of-detail algorithms instead.

Specifically, I tried scanning (parallel sum) the terrain height data generated on the GPU, then sending that information to the CPU, where it would create a spherical-coordinate-system-aligned bounding box. Next, test a number of camera-centered LOD spheres (one for each desired LOD) against those bounding boxes. The transformation was significantly easier when you first transform the camera’s position into the bounding box’s space - not the other way around (lesson learned the hard way) - particularly because this allows you to make a very good AABB approximation of the spherical-aligned box (SCABB?). I am still undecided whether or not the end result is worth the effort (programming-wise or computation-wise), though - it does help ‘edge’ cases where there’s a steep change in elevation within a patch, but testing against a patch’s average coordinate is a pretty close approximation. Either way, it was a good adventure in running compute algorithms on the GPU (something that has greatly benefited other projects since then…)

Anyhow, thanks for sharing! I really like your terrain coloring alg!


@Lomsor Now I’m curious what you redacted up there! :smile:

It was this, Indie Planet Generation, you can also click on the edit “pen” to see what was there.

The last revision on my prototype is using a compute shader for the height generation and the spherical mapping is working fine. I’m not sure what exactly you need those boxes for, haven’t even thought about needing them. Also I’m using the mid-point displacement algorithm for height generation, the theory being that you could generate a surface height map in a dedicated planet modelling program, at a specific LOD detail and then let the algorithm just continue generating lower level data from that point…

One thing that I can say about planet generation is that there are always problems to be solved and something new to be learned :slight_smile:

[quote=“cybercritic, post:18, topic:765”]
One thing that I can say about planet generation is that there are always problems to be solved and something new to be learned
[/quote] Amen!

So, I need the bounding boxes to determine how far away I am from a piece of terrain, in order to determine its LOD.

Initially, I tried to do this without requiring knowledge of the generated terrain’s height, other than inherently knowing the minimum and maximum height of terrain that could exist on the planet. So effectively, my LOD determination was 2D. Basically, here’s what’s going on (that link’s an interactive web app I made demonstrating the concept… I’d had the Processing sketch already, but converting it to JavaScript for this post was trivial. Processing.js rocks…)

The problem here is that whenever the camera is within the [terrainMinHeight, terrainMaxHeight] range, you’re basically having to assume that it’s right at the surface for your LOD determinations. In other words, imagine your terrain can range from -10,000 to +10,000 units, and that the camera is somewhere between those two heights. Even though directly under the camera the terrain may be at height -4,000, and the camera at height +2,000, a huge distance away, the CPU has no way of knowing the distance is that large (because it doesn’t have terrain height data, which is all on the GPU). Thus it must select the finest level of detail.

So, I needed to send some height data from the GPU to the CPU (which I’d probably need to do anyway, unless all terrain collision detection is performed on the GPU). By giving the CPU the exact min and max height values of a terrain patch, it can compute a bounding box and allow us to perform a true 3D level-of-detail test. The actual bounding-box test can be somewhat tricky, but the approximation I mentioned in the last post was to simply do a distance test to the patch’s bounding box’s 3D midpoint. Note that you still need the patch’s max and min height to determine this.
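That midpoint approximation might be sketched like so (names are illustrative; heights are measured from the planet’s center):

```python
import math

def patch_midpoint(unit_dir, min_h, max_h):
    """Midpoint of a patch's bounding volume: the patch's lateral center
    direction (a unit vector from the planet's center) scaled by the
    average of the patch's min and max heights-from-center."""
    mid_r = 0.5 * (min_h + max_h)
    return tuple(c * mid_r for c in unit_dir)

def lod_distance(cam_pos, unit_dir, min_h, max_h):
    """Distance used for the LOD decision: camera to patch midpoint."""
    return math.dist(cam_pos, patch_midpoint(unit_dir, min_h, max_h))

# Camera 1,000 units above a patch whose surface spans heights 5,996-6,004:
d = lod_distance((0.0, 0.0, 7000.0), (0.0, 0.0, 1.0), 5996.0, 6004.0)
assert d == 1000.0
```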

I hope that explanation is a bit better than the first. With the huge variety of techniques available for this kind of work (and no real “single best technique”), it’s easy to falsely assume everyone uses the same terminology when describing a concept.

How about just checking the distance of your camera position to some vertex, the middle one of your patch for instance and split according to that distance?

Seems a lot less intense than what you are doing with those bounding boxes…

Absolutely, and that’s the approximation I mention in the second-to-last paragraph. But note, you still need to send height data from the GPU to the CPU to do this (a patch’s min and max height will suffice). Why? Because even if I have the ‘lateral’ midpoint of a patch (i.e. its ‘latitude and longitude’), its ‘vertical’ midpoint could be anywhere from -10,000 to +10,000.

Your vertex contains the xyz position…