View Independent Progressive Meshes (VIPM) Technology Doc by Charles Bloom (cbloom@cbloom.com) June 5, 2000 1. Introduction 2. Rendering with a VIPM structure 3. Creating the VIPM structure 4. Using VIPM with Smooth Skinned Characters 5. Using VIPM for Terrain -------------------------------------------------------------------------- 1. Introduction VIPM is a static single collapse-order continuous level of detail (LOD) technology. Why do we want LOD technology? First, to be able to target different machine Speeds. It's almost impossible to predict what kind of computer your game will be running on; hardware improves very fast, and a good game like Half Life may be played by fans for years after it's release. Furthermore, your players may have a wide range of different machines. Gamers these days run from Pentiums with Voodoos to Athlons with GeForces; both gamers deserve as rich a visual experience as possible at 60 fps. Second, eliminating unnecessary polygons. Rendering triangles that are smaller than a pixel is just silly, but can easily happen when rendering high detail models in the distance. If you have a high detail model and duplicate it around the world, the total polygon count will be astronomical without LOD, and very low with it. Why VIPM instead of the more traditional view-dependent LOD (the canonical references here are Hoppe, ROAM, and Lindstrom) ? Because very hardware friendly implementations can be found. For height fields, there is some debate here, since very hardware friendly implementations of ROAM exist; for arbitrary polygonal models, there is no question. Maintaining the active vertex front of Hoppe's Progressive Meshes with all the dependencies of collapses and refines is simply too much work for CPU's these days, and can't push polygons at the GPU (graphics subsystem) fast enough. It's important to fully utilize the GPU, because if it's idle while you're doing game logic, then you didn't push enough polygons at it. Unlike CLOD techniques, the CPU work of VIPM is independent of the number of polygons visible in the mesh, which allows it to extend to very high visible polygon counts indeed. Required reading before continuing : "Progressive Meshes", Hough Hoppe, Siggraph '96, and Hough's web page. "Quadric Error Metrics", Garland & Heckbert, Siggraph '97, and on the web. Hough Hoppe invented the VIPM technique, but didn't realize how powerful it was. Tom Forsyth and Jan Svarovsky were the first (I believe) to realize the importance of VIPM to modern hardware. -------------------------------------------------------------------------- 2. Rendering with a VIPM structure To begin describing the VIPM algorithm, I will detail how to render with a VIPM structure once it is already created. I assume something like a D3D GPU interface; that is, an opaque vertex buffer (VB) and a triangle list of triplets of indices into the VB. The VIPM reconstructs the mesh incremental through "vertex split" operations. In pseudo-code, the vsplit structure is : struct vsplit { uword vsplitVert; // the vert which I'm splitting off of; // (the vert I create is just the current end of the VB) ubyte numNewTriangles; ubyte numFixFaces; ulong fixFacesIndex; // the index here is an offset in the FixFaces list // which contains offsets in the triangle list float error; } So, we have a VIPM mesh in it's current state, whatever that may be. In that state, it has some number of active verts (CurNumVerts). Those active verts are at the front of the VB, filling slots 0 through (CurNumVerts-1). It also has some current triangle list length (CurNumTris); the current triangle list only refers to verts in the active range. We can render with this mesh simply by telling the GPU to render our triangle list of CurNumTris and CurNumVerts; this requires zero CPU work per frame. To increase the number of active verts, we apply a vsplit. If you've read Hoppe, you know what this is, but what we're doing is taking a vert in the active mesh (vsplitVert) and pulling it to make a new edge, connecting that old vert and a new one, in CurNumVerts in the VB (e.g. the next one linearly in the VB). This pulling also adds two new faces along the new edge (except when the new is on a boundary, where it only adds one, or when the new vert isn't connected to the old one at the new LOD, in which case it adds none). This number of new triangles is stored in numNewTriangles; all we do is add that to CurNumTris, and the new triangle's indexes are already stored in the triangle list! We knew exactly which indices to put in (two of the indices on the new triangles are always vsplitVert and the new vert) because the mesh is always in the same state when we do a vsplit. Finally, we must take all the old faces that were on vsplitVert and that are now on the new vert, and change that vertex reference. Hough Hoppe stores faces neighbors in his PM, so he can figure out which faces he needs to fix. I can't afford to do that much CPU work, so I precompute the "fix faces". I store them all in a linear array in the order they're needed. The vsplit refers to a sub-portion of the fix faces specified by the start (fixFacesIndex) and length (numFixFaces). We simply look at all the faces in the fix list and change vsplitVert to the new vert index. I'll go into pedantic detail here because this is hard to communicate. At full LOD, if I was doing no PM, I would have a triangle list TL which is a sequence of triplets of vertex indexes : {1,2,3}{3,2,4}{5,6,7}.. (or whatever). These indices are references into the VB, which is a linear vertex array (eg. 1 in the triangle list means use the vertex VB[1]). We fire these at D3D with DrawIndexed(TL,VB); Now, imagine I'm doing LOD and I have some current TL_Len and VB_Len. I do a refine like this : A) TL_Len += vsplit->numNewTriangles * 3; B) for(int f=0;fnumFixFaces;f++) { int triangleEntry = FixFacesList[ CurFixFace + f]; assert( TL[triangleEntry] == vsplit->vsplitVert ); TL[triangleEntry] = VB_Len; } C) VB_Len ++; The steps are : (A) exposes some number of new triangles along the edge we're splitting open. (B) changes the old triangles on the fan that used to be collapsed onto vsplitVert to point at the new vert in the VB (note that (triangleEntry/3) specifes which triangle to modify, and (triangleEntry%3) specifies which vert on that triangle).. (C) adds that new vert to the active set. Now, we can actually drop the fixFacesIndex from the vsplit structure. We do this simply by storing a CurFixFacesIndex, and adding numFixFaces to it as we do one vsplit after another. The result is that the whole vsplit structure takes only 8 bytes. The fix faces also cost 2 bytes each (assuming a maximum of 64k faces), and there are typically three or four fix faces per vsplit. The result is 14 bytes per vsplit. To reduce the active triangle count, you do the vsplit in reverse, which is an "edge collapse" (ecol). The ecol is done by taking (CurNumVerts-1) and vsplitVert and collapsing that edge together, eliminating CurNumVerts-1 from the mesh (which is simply done via CurNumVerts--). We must go through the fix faces and change CurNumVerts-1 to the old vsplitVert, and then we reduce the current number of triangles by numNewTriangles. When we render, we change the mesh to match some desired error. To do this, we simply keep doing vsplits while the error in the vsplit is larger than the desired error, or do ecols while the error in the vsplit is less than the desired error. The CPU work is thus proportional to the number of verts added or removed from the mesh per frame. Note also that the memory touches in the VIPM algorithm are highly linear and coherent, and the memory footprint is very low, which are perhaps the most important factors. A typical vertex is 36 bytes, so an additional 14 for vsplit information is insignificant. It is also important to note that the VIPM algorithm can never be worse than rendering the geometry without LOD. First of all, if VIPM does no LOD changes, it can fire the VB and triangle list at the GPU with no work at all (note that VIPM never touches the VB; it is opaqued completely). If VIPM does do LOD changes, the work to reduce the triangle count is less than the cost of rendering those triangles. If the GPU is so fast that rendering the triangles is faster than eliminating them, then the tolerated error should be lowered so that fewer collapses are done! While on the rendering issue, it should be noted how to make multiple instances of one VIPM object in different locations. The VB can (and should) be shared among all the VIPM's, since it is never written to by the VIPM, each different VIPM will simply use a different length off prefix of the VB. The triangle list should be copied for each instance to allow for frame to frame coherence. In extreme cases (e.g. if the whole triangle list fits in the CPU cache and nothing else is going on) it may be acceptable to use one shared triangle list and do all the necessary vsplits and ecols to adapt to the different triangle count of each instance. A better way to reduce the memory use of the triangle lists is to have one master triangle list; then each instance only needs to allocate space for as many triangles as it's using, and when it goes to higher LOD, it pages in the triangle list from the master. If you have a high detail mesh duplicated many times (such as a tree on terrain), then the vast majority of them will be at low polygons counts, requiring very little memory (and CPU and GPU time) indeed. How do we choose what error to refine to? First of all, I should note that the error stored in the vsplit is a squared fractional geometric error. We want to convert that error into a screen space error. That's easily done via : ScreenError = (vsplit->error) * (Screen Area) * (ObjectSize^2) / (ObjectDistance^2) Here ObjectSize is the geometric dimension of the entire VIPM object (that's variable, you can scale the object at will and the VIPM is not confused). The ObjectDistance is the geometric distance to the object; I use the distance to the bounding box of the object, plus ObjectSize * 0.1 to prevent the distance from ever going to zero when you're inside the bounding box of the object. So, our render function takes a desired error, in pixels; one is an excellent choice, but you can use a higher error on slower machines. We then compute the desired vsplit error by inverting the above equation for vsplit->error , and use it as described above to get the LOD corresponding to our error. Note that using an allowed error in pixels means that you automatically use higher polygon counts at higher view resolutions. This corresponds directly to users' intuition that they can run at higher resolutions if they have faster machines. You should never ask for an error lower than 1 pixel, because all that will give you is multiple triangles packed in the same pixel, which will gives you aliasing due to high triangle counts (!!). LOD in general is a nice heuristic way of anti- aliasing geometry. I should also emphasize that I've mentioned nothing about morphing the LOD changes. This is due to what is sometimes referred to as the "mantra of LOD" : any LOD geometry reduction should only be done when that change is so small as to be imperceptible. Of course this may not be true if you have severely reduced your polygon count so that you can run at 60 fps on a Pentium-100 , in which case morphing may be desirable. It's quite easy to morph in the changes in the VIPM process; whenever you add a vert, you morph it over roughly half a second from the original vsplitVert's position to the new vert position (similarly when you do a collapse). IMHO this is not worth implementing. -------------------------------------------------------------------------- 3. Creating the VIPM structure Now we'll describe the most difficult part of the VIPM algorithm : creating the VIPM structure described above. This is a complex problem, and I'll leave the bulk of the explaining to Hough Hoppe. We start with the original, full detail mesh. We find an edge collapse, and remember how to do it in the 'vsplit' structure. We then do the edge collapse, and we have a mesh with one fewer vertex. We keep doing this until we have a mesh that cannot take an edge collapse (for example, if all verts are on a boundary, or collapses will make the mesh degenerate). We shuffle the vertex order so that the verts in the minimal mesh lie first, and then verts are ordered by their vsplit. We differ from Hoppe and Heckbert in that when we do an edge collapse, we don't create a new vert for the edge. Instead we snap one vert onto the other (which means that we never use vertices that aren't in the original mesh). The result is that each edge has two different 'edge collapses' : if the edge has vertices a & b, then the collapses are a->b and b->a ; these will have different errors and must be considered separately. Now some notes on choosing the edge collapses. At each step, you should do the edge collapse which produces the smallest error. To do this, you take all the edges, and compute this error, and keep them in a sorted list (you should use a heap). Then you just pull the smallest error off the heap at each step. When you do an edge collapse, you must recompute the errors of all the verts which lie on a face which includes that edge. For the edge collapse (vsplit) error metric, I use the quadrics of Heckbert and Garland; these are a great geometric error, and are blazing fast to compute. The only time when the quadrics give bad answers are at a very severe corner, that is, for an edge whose faces are pointing in opposite directions, or very nearly opposite directions. For opposite directions, the quadrics return zero error (because they measure planar deviation) which obviously isn't right. For nearly opposite directions, the floating precision gets very bad, because the difference between the normals becomes very small. In either case, we can properly penalize these collapses by adding a term equal to the original area of the triangles in question minus the post-collapse areas. On a manifold surface in an interior edge, the areas will change very little (on a flat portion, the area doesn't change at all). Hence, this term in the error penalizes collapses around the boundaries and folds in the mesh, as desired. Note that the units are right, because all the errors are in units of length squared. This is not a big issue for Hoppe and Heckbert, because their high poly input meshes are always very rich, but in games I have found that many of my input meshes have quite low poly sections (modelers for games just can't be convinced that they don't have to use a minimal number of polygons). I also add a term to the error for the various parametric errors. I compute this just by taking the color and texture coordinates at the collapsing vert and the collapse-onto vert and subtracting them, then multiplying by the area of the two triangles being removed. Finally, once all the errors are computed and sorted, you'll find that they are not monotonic (since they are computed relative to the *current* mesh, not the original). In order for our error search to work, we force them to be monotonic by making each error equal to the largest of the earlier collapse errors. This creates plateaus in the error list, which will make many triangles pop in and out at once, so I apply a smoothing filter to the error list. I should note that if your mesh can be guaranteed to be "manifold" (that is, each edge has one or two faces on it) then you can use a very nice data structure like "winged edge" which will give you fast queries for face-edge-vert neighbor relationships. I have found that I get a lot of non-manifold meshes, mainly in the form of two sided faces (e.g. pairs of triangles on the same three verts, but with opposite winding) and edges with three or more faces on them (such as is formed by welding together disjoint primitives, like a pair of spheres). Rather than trying to make these meshes manifold, I find it easier to make the VIPM builder work with non-manifold meshes. This means that you can't use a really nice data structure, so instead I just keep a list of neighboring triangles that are on each vert. When I do a collapse along an edge, I typically remove one face (for a boundary edge), two faces (for a manifold edge), or even three or more faces (for a non-manifold edge!). -------------------------------------------------------------------------- 4. Using VIPM with Smooth Skinned Characters VIPM with SSC (smooth skinned characters) is a perfect match. You only need to do the work of animating & blending vertices for the visible vertices, so not only does the VIPM LOD process take some work off of the GPU, it also saves the CPU from animating a bunch of verts that aren't important. The result is that you can animate lots and lots of SSC's in the distance without having to do the smooth-skinning computations for all their many verts. How do we arrange the SSC computations to make this work? Well, the whole VIPM is given a palette of matrices which correspond to the bones of the character. Each vert has a SkinInfo structure which contains several matrix palette indices and a weight for that matrix (typically the weights on one vertex add up to one). So, after we have made all our VIPM decisions and have a VB with some number of verts, we simply walk through the SkinInfos for all those verts and do the SSC computation, to find the vertex position and normal to jam in the VB. As with the rest of the VIPM math, we only reference the prefix of the VB which will be needed for rendering. This also lets us page in the SkinInfos from disk as needed for higher detail. The animation of the matrices in the palette is left to the hierarchical bone system, IK, or whatever you use. Note that the SSC math runs through the VB linearly in order. This is great for memory coherency; all of your per-vertex operations (procedural coloring or fog, distance fade outs, whatever) should be done together in this one pass over all the (visible) verts. We compute the errors and collapse order only once, and when we animate the mesh, we don't update them. This is ok if the error for any edge is the maximum error that the edge ever sees in the animation (that will guarantee we never have a larger error than we think). Actually doing that is quite expensive computationally, so we seek some approximations. The cheapest way is just to make sure that the model starts out in the "most flexed" position, where every edge is at its maximum curvature (e.g. elbows and knees bent at right angles, etc.). This is not always possible, in which case we could randomly sample some animations and take the largest error seen. Another heuristic, advocated by Tom Forsyth, is to let the artists specify critical edges which must be maintained to all LOD, thereby making sure that the elbow joint never gets removed, for example. A final technique is to include to matrix SkinInfo as part of the parametric error counted in the edge collapse error; you could include the sum of the squares of the differences of blending weights, for each bone (multiplied by the triangle areas). Unfortunately, the VIPM technique does not work well for totally dynamic meshes; the meshes must have some specific structure for the VIPM to work with. The SSC provides some restrictions on the way the mesh can be animated, which lets us collapse out smooth portions of the mesh that don't get distorted by any phase of the SSC animation. -------------------------------------------------------------------------- 5. Using VIPM for Terrain Terrain renderers seem to be the most common applications of LOD in the year 2000, so I'll say a few extra words about using the VIPM technology with terrain. The basic idea is that you take a height map of NxN pixels, and divide it into MxM chunks, and you get L = (N/M)*(N/M) of them. Typically N will be on the order of 2k, and M will be on the order of 64, resulting in a mesh with 8 million triangles at full detail, and L = 1024 chunks, each chunk with 8k triangles and 4k verts. We create a VIPM for each chunk, and render them independently, doing the LOD adaptation for each chunk. Since each chunk is a VIPM, we can page in geometry as needed, so we never need to keep all 8 million triangles in memory. Using this technique, we can have terrains that are nearly infinite, and the streaming is linear and easy to implement. Also, we can cull each chunk against the view frustum. Finally, we could construct a PVS from chunk to chunk (or even to finer resolution) to knock out chunks that are occluded. The biggest issue is with seams between the VIPM chunks. Obviously, if you do LOD on neighboring chunks independently, you will get cracks. Maybe the neatest solution is the "flanges" technique due to Tom Forsyth. In this method, you simply make neighboring chunks overlap each other, so that when you do collapses, the connections may not be perfect, but there will be no cracks. The flanges are constructed by giving each chunk a border which lies under its neighboring chunks. The conceptual drawing looks like : -------\ /---------- X / \ / \ You simply let them intersect, and allow the Z buffer to do CSG for you. The easiest seam solution to implement is simply to leave the boundary edges of each chunk at full detail. The penalty here is that you have M*4 triangles in each chunk that are basically wasted keeping the edges tessellated at all times (this is 250 triangles for M=64, which is 1/32nd of the 8k triangles in the mesh). IMHO this is really not so bad, unless you want a truly huge terrain with a very far clip, in which case the distant chunks will have 250 triangles when they could have two. Another solution is the "boundary last" technique. In this case, we do the usual VIPM on all the chunks, keeping boundaries at full tessellation. Once we find a VIPM chunk with such a high error tolerance that it would be acceptable to use only two triangles, we look at our four neighbors and see if they also have such a high error tolerance. When that is the case, then we have two neighbors with only their edges tessellated. We then snap their connecting boundary to only two verts (getting rid of all the verts on the boundary). We can do this simply by storing 16 meshes for each chunk, with all the possible configurations of boundary edges have 2 verts or M verts; this requires only 32*M verts (2k for M=64) for all of these meshes. (Addendum 6-13-00:) These techniques will never reduce a chunk to fewer than four verts and two triangles (the full-LOD-edges method has a much higher minimum number of triangles). If you have really huge terrain on a low-curvature world (so lots of chunks are visible), then further reductions may be necessary. In this case you can use a quadtree structure to merge chunks in the distance. If you think of the grid of terrain chunks as the leaves of the quadtree, then whenever you find a quadtree node whose children are all at minimum LOD, you combine those children into one chunk. This is obviously not much of an advantage with the full-LOD-edges method, because the new chunk must still have all the verts of the four old chunks, except for the seams down the middle of the chunk. The quadtree method works with the "boundary last" method, but becomes quite a lot of bookkeeping to make sure that chunks border the right kind of neighbors. It definitely works best with the "flange" technique, because you can simply merge the chunks and let the flanges handle the seaming. Note that the minimum mesh in the flange technique has eight verts and ten triangles to make sure that it does indeed have flanges that will prevent cracking. With any of these techniques, it is worthwhile to collapse four children into one parent as soon as the size of the parent is much less than the distance from the camera to the parent. This follows from the general principle that the VIPM works better when you merge meshes together, except for the fact that the sub-portions become less responsive to relative camera positions. Comparison to ROAM is inevitable and instructive (when I refer to ROAM, I'm really referring to the entire class of height map-based CLOD terrain algorithms, including those of Hoppe and Lindstrom). The first thing we notice is that the VIPM algorithm provides a much more general set of triangulations (e.g. not constrained by the binary-triangle-tree structure of ROAM, or quad trees, or etc.), and is correspondingly slower to build. The second thing we notice is that the VIPM terrain can be non-height map, e.g. it can seamlessly represent caves, arches, etc. with one single skin. VIPM also has a clear advantage in that you only ever have to write one hunk of rendering code, so you can spend more time working in one place to get it all right. Aside from these differences, we should note that the VIPM and ROAM techniques are really apples and oranges in a way. The VIPM algorithm is designed to push machines with fast GPU's , and is optimal in that case (more triangles means better specular, better hardware lighting, etc. even if the error estimated is the same as that of ROAM). The ROAM algorithm was designed to push old hardware with CPU's much faster than their GPU's and is optimal in that case; ROAM can push a maximum of about 1 MegaTriHz on current fast machines, while VIPM can push about 4 MegaTriHz (most ROAM implementations are much slower than 1 MTriHz, and it's clearly optimal at much lower triangle rates like 0.2 MTHz). Ignoring these differences, we'll assume that your ROAM and VIPM implementations are both aiming at 1 pixel of screen space error (e.g. never a visible pop). I think that VIPM is a lot easier to stream in, either from disk to memory, or from a network server. This is because each chunk can be streamed in simply by adding to its list of vsplits and vertices, e.g. no touches of the already existing data are needed. In contrast, streaming most of the height map based terrain algorithms requires bringing in progressively higher resolution height maps (like a mipmap chain) and building a new bin tree or quad tree on that height map. You then have to worry about seams between the various patches of height maps at different resolutions; while this is all doable in practice, it's a lot more work. Furthermore, because of the sorting of the VIPM collapses, we know that we're streaming the data in the right order, e.g. the triangles arrive in order of importance. The biggest issue for VIPM is the inability to quickly modify the structure. Something like ROAM can be modified quickly because it has a natural tree structure, which allows modification of one height map pixel in log(N) time by percolating the changes up the tree. Unfortunately, VIPM has no simple hierarchical organization (see Hoppe for a multi-child multi-parent tree structure for progressive meshes that I find too cumbersome to be practical), so we can't percolate changes. Even if we give up on changing the collapse order in the VIPM (certainly reasonable for small changes, like blowing dents in the ground in Tread Marks) and just want to update the errors, we have to check the errors of all the edges in the VIPM. This is an unsolved problem, and for the moment, you simply can't use VIPM if dynamic landscapes is important to your game. If you want to let landscapes be dynamic only occasionally or in isolated places, then you could simply make the dynamic chunk full-LOD and do normal height map modifications to it. (addendum 6-14-00:) I should mention the issue of memory use. ROAM is quite lean in its memory use; good ROAM implementations need only 3 bytes per heightmap pixel (two for the 16-bit pixel and one for an error) to store the full-LOD terrain. For a 2k square heightmap, this is 12 megabytes, which is really the practical maximum. Good ROAM implementations need about 20 bytes, plus the size of the vertex (32 bytes, I'm assuming), for the on-screen verts, which should be around 10k verts, or 520k of "active" memory. Note that in almost all ROAM implementations, this entire 520k of active memory is touched each frame. Storing that entire 2k square heightmap as a VIPM would be prohibitive, requiring 128 megabytes (2k x 2k x 32). Instead, we split our heightmap into 64x64 chunks, each of which requires 128k of memory (for verts, and about 50k for the refinement info, as noted above). We then stream in only as many verts to each chunk as needed (with a few more to make sure we'll have as many as we want next in the next frame). Typically, we can run scenes with 50k verts, which requires about 2 megs. If we pad all of our chunks to reduce disk reads, we might use as much as 4 megs for all our VIPM chunks; note that this 4 megs of memory use is independent of the size of the full-LOD mesh (that only affects disk use, which can be greatly reduced since VIPM lends itself very well to compression (see Hoppe)); also note that these 4 megs are not actually touched per frame (except by D3D), only the small portion of memory where incremental changes are made needs to be touched. This is rarely more than 1k verts per frame, or a mere 50k bytes of memory written to by VIPM. -------------------------------------------------------------------------- EOF