Review : This article's kinda crappy; just read the ROAM papers, you'll be fine. ------------------------------------------------------------------------- The Genesis Terrain Renderer Technology White Paper by Charles Bloom (cbloom@wildtangent.com / cbloom@cbloom.com) December 4, 1999 (see the addendum at the end for miscellaneous changes since this was written) ------------------------------------------------------------------------- INTRODUCTION The Terrain Renderer in the Genesis3D/Wild Tangent Engine was the first cutting edge terrain renderer to be integrated with a solid geometry engine. I'm going to describe the rendering pathway and some of the capabilities of the terrain engine. A summary of the technologies : 1. Real-time tesselation by view-dependent error. Tesselation to a minimum screen space error, or to a maximum number of subdivision steps. 2. Execution time is O(N) in the number of visible tris 3. View-independent simplification to lower memory use 4. Phong lighting in the texture with shadows 5. Polygon-accurate collision 6. Synthetic heightmaps and textures Recommended reading : Hough Hoppe's papers, the ROAM paper, papers of Lindstrom et. al. ------------------------------------------------------------------------- RENDERING Terrain is a built from a heightmap. This means that it can only describe elevations off of a plane (eg. no overhangs). The terrain object also includes an fBm (fractal Brownian motion) noise algorithm for generating synthetic terrain. When the heightmap is applied, a quadtree is created containing error information for real-time tesselation. Tesselation is done by a visual error metric, so the size of the heightmap has no effect what so ever on the number of polygons rendered, or the resulting framerate. That means that you can use arbitrarily large heightmaps, and the only penalty is in the startup time (and memory usage). The Terrain also does sub-pixel smoothing of the heightmaps; since the heightmap is applied in 8-bit format, it has much lower resolution that the floating point coordinates. A smoothing convolution filter is applied which makes sure that the less than 1/256 'th of the range is smooth. Heightmaps are converted into quadtrees. This starts at the leaves which are quads with the four corners on heightmap points (note that this means that for a balanced quadtree, you need a heightmap which is a power of two plus one (ex: 257)). Then parent quads are the average of their four children. Whenever the four children of a new quad lie within a small tolerance of the plane of the new quad, they are simply deleted. This initial view-independent simplification typically cuts out 20 - 70 % of the quads (depending on the level of detail in the heightmap). Notez : you should in general let the Terrain engine do this simplification, rather than shrinking the inital heightmap itself, because the terrain engine can do it better than photoshop. Real-time tesselation is done whenever the camera moves or rotates. Tesselation of a quadtree simply means choosing a subdivision level of the tree; imagine starting with the root quad (a giant quad over the whole terrain) and recursively selectively creating children of each quad. Tesselation is done with a small priority queue. First, the queue is cleared. Then the root quad is inserted at the top. After that, the topmost quad in the queue is removed. Then the view-dependent error for each child is computed, the children are marked as subdivided, and added to the queue in the priority determined by their error. This process is O(N) in the number of subdivided quads (never touches quads lower than the tesselated ones). Note that quads with very low errors are added to the back of the queue and never touched (there is no need to remove unused quads). Subdivision stops when the desired error is reached (that is, all remaining quads provide less improvement than desired), or a maximum number of subdivision steps have occured (which sets a minimum framerate). In practice, having both is an excellent combination. The view dependent error is the most important part of the process. It measure actual screen-space distortion. Let me first describe the variables that are cached in the quad for fast true-error computation. Each quad contains a normal (which is an average of the triangle normals at the four corners) and a "cone of normals". The cone of normals is the smallest cone which contains the quad normal at its center and contains all of its child quad normals and corner point normals. The cone of normals is actually stored as simply the sine squared of the angle of the cone. Each quad also contains an axially aligned bounding box; the XY extents of this box exactly match the box corner coordinates (this is an advantage of the quadtree), and the Z extents contain all the children. Finally, each quad contains an absolute measure of the heightmaps "non-planarity" over that quad. That is, in the XY extents of the quad, all the heightmap points are projected onto the plane of the quad, and the sum of the distances of all these points are added. Note that no quads with a nonplanarity of zero can ever have children, because the children would have been cut in the view independent simplification. Now, with all this data we can compute a screen-space error. First, if the camera is inside the bounding box of a quad, it is subdivided. Then the bounding box of the quad is clipped against the view frustum (bounding box - plane tests are very fast). The frustum clipping is percolated heirarchically down the tree; eg. the clip flags are initially set to all planes, and then whenever a quad is totally out of one plane or totally inside it, we know all its children will be also. Next, the dotproduct of the camera direction and the quad normal is taken. If this dot product is greater than zero, and the square of it is greate than the sine squared which represents the cone of normals, then the quad is set as backfaced, and not subdivided. If the cone of normals is not used, then parent quads may be backfaced even when their children are front-facing. Now that we have decided a quad has the potential for subdivision, we compute the error from two components: the screen-space nonplanarity, and the silhouette bias. The screen space nonplanarity is very simple : it's just the quad's nonplanarity divided by the distance from the camera to the quad squared. This is proportional to the screen space area time the average nonplanarity in screen space. The silhoette bias is a measure of the quad's normal being perpendicular to the camera; when you see a quad head on, its error is invisible; furthermore, a quad which is normal to the camera may be a true silhoette, which we must tesselate. The silhoette bias is measured by taking the sine squared of the angle between the quad and the camera direction (this is just the magnitude of the cross product, squared), adding the sine squared of the cone of normals, and adding a constant term. The final error is just the product of the screen space nonplanarity and the silhoette bias. Once we have tesselated, we may simply render. We already have the heirarchical camera frustum clip flags in all the quads, so we may heirarchically render. We start at the root and recursively descent to the leaves. Any quad which is marked as "all out" is simply not descended, and it's children are never touched. Any quad which is marked as "all in", and all its children, are simply rendered directly. Quads which are marked as partially in must be clipped, but only by the planes that they cross. The result is that even in many-thousand polygon scenes, only about a hundred plane clips are required! Polygons are also backfaced in screen space (by clockwise/counterclockwise detection) immediately before rendering. The only detail about rendering we haven't addressed is T-joint fixing. T-Joints happen when a quad is subdivided, but its neighbor isn't. To fix them, we simply keep a linked list of "next in x" and "next in y" points in each point. This is simply a one way list, since points are never removed, and the list only has to be touched for subdidivded quads, so maintaining it is again O(N) in the subdivided quads. The final rendering is then done by stepping around the points on the linked list to gather all the neighbor points, and rendering those triangles. Note that Binary Triangle Trees offer some advantage over the quadtree method. For one thing, you can render leaves immediately since they're triangles. For another, each node or leaf has a well-defined plane and normal, unlike quads which may be two triangles. Finally, because each leaf is an actually rendered triangle, there is an exact correspondence between node errors and actual visual error, while with the quadtree the triangulation actually adds polygons which will typically give you significantly more quality than estimated from the quads. ------------------------------------------------------------------------- OTHER TECHNOLOGIES Texturing is done by applying a power-of-two dimension grid of textures to the terrain. The terrain is always tesselated to at least 3 levels, which gaurantees that as much as a 16x16 grid of textures can be applied. Using a grid of textures is better than one big texture, because distance textures will mip down to smaller versions. Static lighting can be done on the vertices or cast into the textures. Vertex lighting is fast, but makes the tesselation visible through popping (actually, this can eliminated with lighting-dependent tesselation, but that pathway is not promising). Texture lighting is a slow one-time operation, and can be saved. Accurate shadowing casting is done, including shadows cast by other objects. If the texture is higher resolution than the terrain, Phong interpolation is done on the terrain normals. An ambient light parameter is also taken to help simulate realistic day-time lighting. The Terrain provides collision functions for seamless integration into the engine environment. Collision is done versus rays and bounding boxes extruded along rays. Collision works by sending the thick rays down the quadtree. At each level, the thick ray is checked against the bounding boxes of all the children. Any child which collides with the ray gets the ray sent to it. When the thick ray gets to a leaf, the ray is clipped into the extbox of the leaf, and then collision is done with the two triangles that make up the leaf. Note that collision is done against the full detail terrain, after view independent simplification, so that T-Joints may exist (and are handled, but may cause small anomalies). If a collision is found, and the exact collision point was not requested, then the stack of quads is instantly cleared and we return, for very fast lighting collision. As previously noted, fBm (aka Perlin) noise is used to generate synthetic heightmaps. This provides cumulous-cloud like terrain. The fBm noise is also used to synthesize textures. Several source textures are taken as input, and used as the primary texture in different height ranges. The textures are blended together between the height ranges, and the blending is perturbed by the noise. ------------------------------------------------------------------------- ADDENDUM May 17, 2000 Since this was written, the terrain renderer has been revised in various ways to be better suited to modern hardware. This means pushing more triangles using less CPU time (to make sure the 3d card is being heavily utilized). The first step is making use of d3d vertex buffers and triangle lists. To do this, instead of rendering polygons directly, the verts of a polygons are added to a vertex buffer (if they aren't in it already), and the polygon is added to the indexed triangle list. This allows for sharing of verts across quads in the transform pipe. All triangles are still clipped by the frustum with heirarchical clip flags, because that is so very cheap, so they are added to the triangle list post-clip, and d3d does no clipping. Hardware Transform cards then transform the vertex buffer and triangle list for me. Clipping is also done with the "guard band" on modern cards. That means that if a triangle goes out of the screen rect, but stays in the larger virtual screen (aka the guard band) it need not be clipped. To faccilitate this, two frusta are used : a guard band frustum and a screen frustum. Triangles are all-out-rejected with the screen frustum, they are all-in-included with the guard band frustum, and if neither of those pass, they are clipped to the screen frustum. Refinement priorities (the sorted queue) is now done with a heap data structure. See Sedgewick's "Algorithms", for example. The heap is a static-sized array; it only needs as many elements as the maximum number of allowed refinements, which is a user-specified variable. Silhouette refinement has been removed. This allows us to stop storing normals and "Gauss maps" in the quad nodes, which significantly reduces memory use, making the terrain engine much faster. It also lets us avoid computing the gauss map at startup time, which is quite expensive. -------------------------------------------------------------------------