View Independent Progressive Meshes (VIPM)
Technology Doc by Charles Bloom (cbloom@cbloom.com)
June 5, 2000
1. Introduction
2. Rendering with a VIPM structure
3. Creating the VIPM structure
4. Using VIPM with Smooth Skinned Characters
5. Using VIPM for Terrain
--------------------------------------------------------------------------
1. Introduction
VIPM is a static single collapse-order continuous level of detail (LOD)
technology.
Why do we want LOD technology? First, to be able to target different
machine speeds. It's almost impossible to predict what kind of computer
your game will be running on; hardware improves very fast, and a good
game like Half-Life may be played by fans for years after its release.
Furthermore, your players may have a wide range of different machines.
Gamers these days run from Pentiums with Voodoos to Athlons with
GeForces; all of them deserve as rich a visual experience as possible at
60 fps. Second, to eliminate unnecessary polygons. Rendering triangles
that are smaller than a pixel is just silly, but can easily happen when
rendering high detail models in the distance. If you have a high detail
model and duplicate it around the world, the total polygon count will be
astronomical without LOD, and very low with it.
Why VIPM instead of the more traditional view-dependent LOD (the
canonical references here are Hoppe, ROAM, and Lindstrom)? Because very
hardware-friendly implementations can be found. For height fields, there
is some debate here, since very hardware-friendly implementations of
ROAM exist; for arbitrary polygonal models, there is no question.
Maintaining the active vertex front of Hoppe's Progressive Meshes with
all the dependencies of collapses and refines is simply too much work
for CPU's these days, and can't push polygons at the GPU (graphics
subsystem) fast enough. It's important to fully utilize the GPU, because
if it's idle while you're doing game logic, then you didn't push enough
polygons at it. Unlike CLOD techniques, the CPU work of VIPM is
independent of the number of polygons visible in the mesh, which allows
it to extend to very high visible polygon counts indeed.
Required reading before continuing :
"Progressive Meshes", Hugues Hoppe, Siggraph '96, and Hugues Hoppe's web page.
"Quadric Error Metrics", Garland & Heckbert, Siggraph '97, and on the web.
Hugues Hoppe invented the VIPM technique, but didn't realize how powerful
it was. Tom Forsyth and Jan Svarovsky were the first (I believe) to
realize the importance of VIPM to modern hardware.
--------------------------------------------------------------------------
2. Rendering with a VIPM structure
To begin describing the VIPM algorithm, I will detail how to render with
a VIPM structure once it is already created. I assume something like a
D3D GPU interface; that is, an opaque vertex buffer (VB) and a triangle
list of triplets of indices into the VB.
The VIPM reconstructs the mesh incrementally through "vertex split"
operations. In pseudo-code, the vsplit structure is :
struct vsplit
{
    uword vsplitVert;       // the vert which I'm splitting off of
                            // (the vert I create is just the current end of the VB)
    ubyte numNewTriangles;
    ubyte numFixFaces;
    ulong fixFacesIndex;    // an offset into the FixFaces list,
                            // which contains offsets into the triangle list
    float error;
};
So, we have a VIPM mesh in its current state, whatever that may be. In
that state, it has some number of active verts (CurNumVerts). Those
active verts are at the front of the VB, filling slots 0 through
(CurNumVerts-1). It also has some current triangle list length
(CurNumTris); the current triangle list only refers to verts in the
active range. We can render with this mesh simply by telling the GPU to
render our triangle list of CurNumTris and CurNumVerts; this requires
zero CPU work per frame.
To increase the number of active verts, we apply a vsplit. If you've
read Hoppe, you know what this is, but what we're doing is taking a vert
in the active mesh (vsplitVert) and pulling it apart to make a new edge,
connecting that old vert and a new one, at index CurNumVerts in the VB
(i.e. the next one linearly in the VB). This pull also adds two new
faces along the new edge (except when the new edge is on a boundary,
where it only adds one, or when the new vert isn't connected to the old
one at the new LOD, in which case it adds none). This number of new
triangles is stored in numNewTriangles; all we do is add that to
CurNumTris, and the new triangles' indices are already stored in the
triangle list! We knew exactly which indices to put in (two of the
indices on each new triangle are always vsplitVert and the new vert)
because the mesh is always in the same state when we do a vsplit.
Finally, we must take all the old faces that were on vsplitVert and that
are now on the new vert, and change that vertex reference. Hugues Hoppe
stores face neighbors in his PM, so he can figure out which faces he
needs to fix. I can't afford to do that much CPU work, so I precompute
the "fix faces". I store them all in a linear array in the order they're
needed. The vsplit refers to a sub-portion of the fix faces specified by
the start (fixFacesIndex) and length (numFixFaces). We simply look at
all the faces in the fix list and change vsplitVert to the new vert
index.
I'll go into pedantic detail here because this is hard to communicate.
At full LOD, if I was doing no PM, I would have a triangle list TL which
is a sequence of triplets of vertex indexes : {1,2,3}{3,2,4}{5,6,7}..
(or whatever). These indices are references into the VB, which is a
linear vertex array (e.g. a 1 in the triangle list means use the vertex
VB[1]). We fire these at D3D with DrawIndexed(TL,VB). Now, imagine I'm
doing LOD and I have some current TL_Len and VB_Len. I do a refine like
this :
A) TL_Len += vsplit->numNewTriangles * 3;
B) for(int f=0; f < vsplit->numFixFaces; f++)
   {
       int triangleEntry = FixFacesList[ CurFixFace + f ];
       assert( TL[triangleEntry] == vsplit->vsplitVert );
       TL[triangleEntry] = VB_Len;
   }
C) VB_Len++;
The steps are : (A) exposes some number of new triangles along the edge
we're splitting open. (B) changes the old triangles on the fan that used
to be collapsed onto vsplitVert to point at the new vert in the VB (note
that (triangleEntry/3) specifies which triangle to modify, and
(triangleEntry%3) specifies which vert on that triangle). (C) adds that
new vert to the active set.
Now, we can actually drop the fixFacesIndex from the vsplit structure.
We do this simply by storing a running index, CurFixFace, and adding
numFixFaces to it as we do one vsplit after another. The result is that
the whole
vsplit structure takes only 8 bytes. The fix faces also cost 2 bytes
each (assuming a maximum of 64k faces), and there are typically three or
four fix faces per vsplit. The result is 14 bytes per vsplit.
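The 8-byte figure is easy to sanity check; here is a sketch of the record, with uword/ubyte mapped to fixed-width C++ types (an assumption of mine about the original typedefs):

```cpp
#include <cstdint>

// The vsplit record with fixFacesIndex dropped, as described above.
// Field names follow the text; the type mapping is my assumption.
struct Vsplit
{
    uint16_t vsplitVert;      // active vert being split        (2 bytes)
    uint8_t  numNewTriangles; // triangles exposed by the split (1 byte)
    uint8_t  numFixFaces;     // fix-face entries consumed      (1 byte)
    float    error;           // squared fractional error       (4 bytes)
};

static_assert(sizeof(Vsplit) == 8, "vsplit packs to 8 bytes, as in the text");
```

The float lands on a natural 4-byte boundary after the three integer fields, so no padding is inserted.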
To reduce the active triangle count, you do the vsplit in reverse, which
is an "edge collapse" (ecol). The ecol is done by taking (CurNumVerts-1)
and vsplitVert and collapsing that edge together, eliminating
CurNumVerts-1 from the mesh (which is simply done via CurNumVerts--). We
must go through the fix faces and change CurNumVerts-1 to the old
vsplitVert, and then we reduce the current number of triangles by
numNewTriangles.
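A minimal sketch of the ecol (names like applyEcol are mine, not from the text): it simply undoes the three refine steps (A), (B), (C) in the opposite order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hedged sketch of the edge collapse: the exact reverse of the refine.
void applyEcol(std::vector<uint16_t> &triList,
               int &curNumTris, int &curNumVerts, int &curFixFace,
               uint16_t vsplitVert, int numNewTriangles, int numFixFaces,
               const std::vector<uint32_t> &fixFaces)
{
    --curNumVerts;                           // deactivate vert CurNumVerts-1
    curFixFace -= numFixFaces;               // rewind the running cursor
    for (int f = 0; f < numFixFaces; ++f)    // re-point the fan back
    {
        uint32_t entry = fixFaces[curFixFace + f];
        assert(triList[entry] == curNumVerts);  // currently the split-off vert
        triList[entry] = vsplitVert;
    }
    curNumTris -= numNewTriangles;           // hide the edge triangles again
}
```

Applying the matching vsplit immediately afterwards restores the exact previous state, which is what makes per-frame adaptation cheap.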
When we render, we change the mesh to match some desired error. To do
this, we simply keep doing vsplits while the error in the vsplit is
larger than the desired error, or do ecols while the error in the vsplit
is less than the desired error. The CPU work is thus proportional to the
number of verts added or removed from the mesh per frame. Note also that
the memory touches in the VIPM algorithm are highly linear and coherent,
and the memory footprint is very low, which are perhaps the most
important factors. A typical vertex is 36 bytes, so an additional 14 for
vsplit information is insignificant.
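The adapt-to-error walk can be sketched as follows (names are mine). splitErrors holds each vsplit's stored error, non-increasing along the list because the build step forces the collapse errors monotonic; doVsplit/doEcol stand in for the incremental operations described above.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hedged sketch of per-frame LOD adaptation. curVsplit counts how many
// vsplits are currently applied; the work done is proportional to the
// number of splits/collapses actually performed this frame.
void adaptToError(const std::vector<float> &splitErrors,
                  int &curVsplit, float targetError,
                  const std::function<void(int)> &doVsplit,
                  const std::function<void(int)> &doEcol)
{
    // Refine while the next vsplit still fixes more error than we tolerate.
    while (curVsplit < (int)splitErrors.size() &&
           splitErrors[curVsplit] > targetError)
        doVsplit(curVsplit++);

    // Coarsen while the last applied vsplit was within the tolerated error.
    while (curVsplit > 0 && splitErrors[curVsplit - 1] <= targetError)
        doEcol(--curVsplit);
}
```

When the target error doesn't change and the mesh is already adapted, both loops fall through immediately and zero CPU work is done.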
It is also important to note that the VIPM algorithm can never be worse
than rendering the geometry without LOD. First of all, if VIPM does no
LOD changes, it can fire the VB and triangle list at the GPU with no
work at all (note that VIPM never touches the VB; it is treated as
completely opaque). If VIPM does do LOD changes, the work to reduce the
triangle count is less than the cost of rendering those triangles. If
the GPU is so fast that rendering the triangles is faster than
eliminating them, then the tolerated error should be lowered so that
fewer collapses are done!
While we're on the subject of rendering, it should be noted how to make
multiple instances of one VIPM object in different locations. The VB can
(and should) be shared among all the VIPMs, since it is never written to
by the VIPM; each instance will simply use a different-length prefix of
the VB. The triangle list should be copied for each instance
to allow for frame to frame coherence. In extreme cases (e.g. if the
whole triangle list fits in the CPU cache and nothing else is going on)
it may be acceptable to use one shared triangle list and do all the
necessary vsplits and ecols to adapt to the different triangle count of
each instance. A better way to reduce the memory use of the triangle
lists is to have one master triangle list; then each instance only needs
to allocate space for as many triangles as it's using, and when it goes
to higher LOD, it pages in the triangle list from the master. If you
have a high detail mesh duplicated many times (such as a tree on
terrain), then the vast majority of them will be at low polygon counts,
requiring very little memory (and CPU and GPU time) indeed.
How do we choose what error to refine to? First of all, I should note
that the error stored in the vsplit is a squared fractional geometric
error. We want to convert that error into a screen space error. That's
easily done via :
ScreenError = (vsplit->error) * (Screen Area) * (ObjectSize^2) / (ObjectDistance^2)
Here ObjectSize is the geometric dimension of the entire VIPM object
(that's variable, you can scale the object at will and the VIPM is not
confused). The ObjectDistance is the geometric distance to the object; I
use the distance to the bounding box of the object, plus ObjectSize *
0.1 to prevent the distance from ever going to zero when you're inside
the bounding box of the object.
So, our render function takes a desired error, in pixels; one is an
excellent choice, but you can use a higher error on slower machines. We
then compute the desired vsplit error by inverting the above equation
for vsplit->error , and use it as described above to get the LOD
corresponding to our error.
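Inverting the formula above for vsplit->error gives the refine threshold. A sketch (the function and argument names are mine; screenError is in the same units as the formula's ScreenError, and the 0.1 * ObjectSize term is the distance guard described in the text):

```cpp
#include <cassert>
#include <cmath>

// Hedged sketch: solve ScreenError = error * screenArea * objectSize^2
// / objectDistance^2 for the stored vsplit error.
float desiredVsplitError(float screenError, float screenArea,
                         float objectSize, float distToBBox)
{
    // Guard against zero distance when the camera is inside the bbox.
    float objectDistance = distToBBox + 0.1f * objectSize;
    return screenError * (objectDistance * objectDistance)
         / (screenArea * objectSize * objectSize);
}
```

The mesh is then refined, as described above, until the next vsplit's stored error drops below this threshold.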
Note that using an allowed error in pixels means that you automatically
use higher polygon counts at higher view resolutions. This corresponds
directly to users' intuition that they can run at higher resolutions if
they have faster machines. You should never ask for an error lower than
1 pixel, because all that will give you is multiple triangles packed
into the same pixel, which will give you aliasing due to high triangle
counts (!!). LOD in general is a nice heuristic way of anti-aliasing
geometry.
I should also emphasize that I've mentioned nothing about morphing the
LOD changes. This is due to what is sometimes referred to as the "mantra
of LOD" : any LOD geometry reduction should only be done when that
change is so small as to be imperceptible. Of course this may not be
true if you have severely reduced your polygon count so that you can run
at 60 fps on a Pentium-100 , in which case morphing may be desirable.
It's quite easy to morph in the changes in the VIPM process; whenever
you add a vert, you morph it over roughly half a second from the
original vsplitVert's position to the new vert position (similarly when
you do a collapse). IMHO this is not worth implementing.
--------------------------------------------------------------------------
3. Creating the VIPM structure
Now we'll describe the most difficult part of the VIPM algorithm :
creating the VIPM structure described above. This is a complex problem,
and I'll leave the bulk of the explaining to Hugues Hoppe.
We start with the original, full detail mesh. We find an edge collapse,
and remember how to do it in the 'vsplit' structure. We then do the edge
collapse, and we have a mesh with one fewer vertex. We keep doing this
until we have a mesh that cannot take an edge collapse (for example, if
all verts are on a boundary, or collapses will make the mesh
degenerate). We shuffle the vertex order so that the verts in the
minimal mesh lie first, and then verts are ordered by their vsplit.
We differ from Hoppe and from Garland & Heckbert in that when we do an edge collapse,
we don't create a new vert for the edge. Instead we snap one vert onto
the other (which means that we never use vertices that aren't in the
original mesh). The result is that each edge has two different 'edge
collapses' : if the edge has vertices a & b, then the collapses are a->b
and b->a ; these will have different errors and must be considered
separately.
Now some notes on choosing the edge collapses. At each step, you should
do the edge collapse which produces the smallest error. To do this, you
take all the edges, and compute this error, and keep them in a sorted
list (you should use a heap). Then you just pull the smallest error off
the heap at each step. When you do an edge collapse, you must recompute
the errors of the edge collapses involving verts that lie on a face
which included the collapsed edge.
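This loop is easy to sketch with a min-heap that uses lazy deletion (version stamps instead of a decrease-key operation); the callback shape here is my invention, not from the text. errorOf reports an edge's current error (negative if the edge has become invalid), and collapse performs the collapse and returns the neighboring edges whose errors must be recomputed.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hedged sketch of the greedy build order: always perform the cheapest
// remaining collapse, re-pushing recomputed neighbors with a bumped
// version so stale heap entries can be skipped on pop.
std::vector<int> buildCollapseOrder(
    int numEdges,
    const std::function<float(int)> &errorOf,
    const std::function<std::vector<int>(int)> &collapse)
{
    struct Cand { float error; int edge, version; };
    auto cmp = [](const Cand &a, const Cand &b) { return a.error > b.error; };
    std::priority_queue<Cand, std::vector<Cand>, decltype(cmp)> heap(cmp);
    std::vector<int> version(numEdges, 0);
    std::vector<int> order;

    for (int e = 0; e < numEdges; ++e)
        if (errorOf(e) >= 0.0f) heap.push({errorOf(e), e, 0});

    while (!heap.empty())
    {
        Cand c = heap.top(); heap.pop();
        if (c.version != version[c.edge]) continue;   // stale entry, skip
        version[c.edge] = -1;                          // consumed
        order.push_back(c.edge);
        for (int n : collapse(c.edge))                 // recompute neighbors
        {
            float err = errorOf(n);
            if (err < 0.0f || version[n] < 0) continue;
            heap.push({err, n, ++version[n]});
        }
    }
    return order;
}
```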
For the edge collapse (vsplit) error metric, I use the quadrics of
Garland & Heckbert; these make a great geometric error metric, and are
blazingly fast to compute. The only time the quadrics give bad answers
is at a very severe corner, that is, for an edge whose faces are
pointing in opposite directions, or very nearly opposite directions.
For opposite directions, the quadrics return zero error (because they
measure planar deviation), which obviously isn't right. For nearly
opposite directions, the floating-point precision gets very bad, because
the difference between the normals becomes very small. In either case,
we can properly penalize
these collapses by adding a term equal to the original area of the
triangles in question minus the post-collapse areas. On a manifold
surface in an interior edge, the areas will change very little (on a
flat portion, the area doesn't change at all). Hence, this term in the
error penalizes collapses around the boundaries and folds in the mesh,
as desired. Note that the units are right, because all the errors are in
units of length squared. This is not a big issue for Hoppe or for Garland & Heckbert,
because their high poly input meshes are always very rich, but in games
I have found that many of my input meshes have quite low poly sections
(modelers for games just can't be convinced that they don't have to use
a minimal number of polygons).
I also add a term to the error for the various parametric errors. I
compute this just by taking the color and texture coordinates at the
collapsing vert and the collapse-onto vert and subtracting them, then
multiplying by the area of the two triangles being removed.
Finally, once all the errors are computed and sorted, you'll find that
they are not monotonic (since they are computed relative to the
*current* mesh, not the original). In order for our error search to
work, we force them to be monotonic by making each error equal to the
largest of the earlier collapse errors. This creates plateaus in the
error list, which will make many triangles pop in and out at once, so I
apply a smoothing filter to the error list.
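The monotonic fixup plus smoothing might look like this; the 3-tap box filter is just one plausible choice, since the text doesn't specify the filter.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hedged sketch of the error post-pass: force the collapse errors
// monotonic with a running max, then smooth to break up the flat
// plateaus that would otherwise make many triangles pop at once.
void fixupErrors(std::vector<float> &errors)
{
    for (size_t i = 1; i < errors.size(); ++i)     // running max
        if (errors[i] < errors[i - 1]) errors[i] = errors[i - 1];

    std::vector<float> smoothed(errors);           // endpoints kept as-is
    for (size_t i = 1; i + 1 < errors.size(); ++i) // 3-tap box filter
        smoothed[i] = (errors[i - 1] + errors[i] + errors[i + 1]) / 3.0f;
    errors = smoothed;
}
```

Note that box-filtering a non-decreasing sequence keeps it non-decreasing, so the smoothing cannot break the monotonicity the error search relies on.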
I should note that if your mesh can be guaranteed to be "manifold" (that
is, each edge has one or two faces on it) then you can use a very nice
data structure like "winged edge" which will give you fast queries for
face-edge-vert neighbor relationships. I have found that I get a lot of
non-manifold meshes, mainly in the form of two sided faces (e.g. pairs
of triangles on the same three verts, but with opposite winding) and
edges with three or more faces on them (such as is formed by welding
together disjoint primitives, like a pair of spheres). Rather than
trying to make these meshes manifold, I find it easier to make the VIPM
builder work with non-manifold meshes. This means that you can't use a
really nice data structure, so instead I just keep a list of neighboring
triangles that are on each vert. When I do a collapse along an edge, I
typically remove one face (for a boundary edge), two faces (for a
manifold edge), or even three or more faces (for a non-manifold edge!).
--------------------------------------------------------------------------
4. Using VIPM with Smooth Skinned Characters
VIPM with SSC (smooth skinned characters) is a perfect match. You only
need to do the work of animating & blending vertices for the visible
vertices, so not only does the VIPM LOD process take some work off of
the GPU, it also saves the CPU from animating a bunch of verts that
aren't important. The result is that you can animate lots and lots of
SSC's in the distance without having to do the smooth-skinning
computations for all their many verts.
How do we arrange the SSC computations to make this work? Well, the
whole VIPM is given a palette of matrices which correspond to the bones
of the character. Each vert has a SkinInfo structure which contains
several matrix palette indices and a weight for that matrix (typically
the weights on one vertex add up to one). So, after we have made all our
VIPM decisions and have a VB with some number of verts, we simply walk
through the SkinInfos for all those verts and do the SSC computation, to
find the vertex position and normal to jam in the VB. As with the rest
of the VIPM math, we only reference the prefix of the VB which will be
needed for rendering. This also lets us page in the SkinInfos from disk
as needed for higher detail.
The animation of the matrices in the palette is left to the hierarchical
bone system, IK, or whatever you use.
Note that the SSC math runs through the VB linearly in order. This is
great for memory coherency; all of your per-vertex operations
(procedural coloring or fog, distance fade outs, whatever) should be
done together in this one pass over all the (visible) verts.
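A sketch of that skinning pass over the active prefix; the Mat34 and SkinInfo layouts here are assumptions of mine, but the key point is the single linear walk over only the first CurNumVerts verts.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };
struct Mat34 { float m[12]; };   // row-major 3x4: rotation plus translation

struct SkinInfo
{
    static const int kMaxBones = 4;
    uint8_t bone[kMaxBones];     // matrix palette indices
    float   weight[kMaxBones];   // weights on one vertex sum to one
    int     numBones;
};

Vec3 transform(const Mat34 &t, const Vec3 &p)
{
    return { t.m[0]*p.x + t.m[1]*p.y + t.m[2] *p.z + t.m[3],
             t.m[4]*p.x + t.m[5]*p.y + t.m[6] *p.z + t.m[7],
             t.m[8]*p.x + t.m[9]*p.y + t.m[10]*p.z + t.m[11] };
}

// Hedged sketch: skin only the active VB prefix. Everything past
// curNumVerts is never animated, which is the CPU saving in the text.
void skinActivePrefix(const std::vector<Vec3> &restPos,
                      const std::vector<SkinInfo> &skin,
                      const std::vector<Mat34> &palette,
                      int curNumVerts, std::vector<Vec3> &vbOut)
{
    for (int v = 0; v < curNumVerts; ++v)
    {
        Vec3 out = {0, 0, 0};
        for (int b = 0; b < skin[v].numBones; ++b)
        {
            Vec3 t = transform(palette[skin[v].bone[b]], restPos[v]);
            float w = skin[v].weight[b];
            out.x += w * t.x; out.y += w * t.y; out.z += w * t.z;
        }
        vbOut[v] = out;
    }
}
```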
We compute the errors and collapse order only once, and when we animate
the mesh, we don't update them. This is ok if the error for any edge is
the maximum error that the edge ever sees in the animation (that will
guarantee we never have a larger error than we think). Actually doing
that is quite expensive computationally, so we seek some approximations.
The cheapest way is just to make sure that the model starts out in the
"most flexed" position, where every edge is at its maximum curvature
(e.g. elbows and knees bent at right angles, etc.). This is not always
possible, in which case we could randomly sample some animations and
take the largest error seen. Another heuristic, advocated by Tom
Forsyth, is to let the artists specify critical edges which must be
maintained at all LODs, thereby making sure that the elbow joint never
gets removed, for example. A final technique is to include the matrix
SkinInfo as part of the parametric error counted in the edge collapse
error; you could include the sum of the squares of the differences of
blending weights, for each bone (multiplied by the triangle areas).
Unfortunately, the VIPM technique does not work well for totally dynamic
meshes; the meshes must have some specific structure for the VIPM to
work with. The SSC provides some restrictions on the way the mesh can be
animated, which lets us collapse out smooth portions of the mesh that
don't get distorted by any phase of the SSC animation.
--------------------------------------------------------------------------
5. Using VIPM for Terrain
Terrain renderers seem to be the most common applications of LOD in the
year 2000, so I'll say a few extra words about using the VIPM technology
with terrain.
The basic idea is that you take a height map of NxN pixels, and divide
it into MxM chunks, and you get L = (N/M)*(N/M) of them. Typically N
will be on the order of 2k, and M will be on the order of 64, resulting
in a mesh with 8 million triangles at full detail, and L = 1024 chunks,
each chunk with 8k triangles and 4k verts. We create a VIPM for each
chunk, and render them independently, doing the LOD adaptation for each
chunk.
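The chunk arithmetic above is easy to sanity check (the struct and function names here are mine):

```cpp
#include <cassert>

// An NxN heightmap split into MxM-quad chunks gives (N/M)^2 chunks,
// 2*M*M triangles per chunk, and roughly M*M verts per chunk (the
// text's rounded figure).
struct ChunkCounts
{
    int  numChunks;
    int  trisPerChunk;
    int  vertsPerChunk;
    long totalTris;
};

ChunkCounts chunkTerrain(int n, int m)
{
    int chunksPerSide = n / m;
    int l = chunksPerSide * chunksPerSide;
    int tris = 2 * m * m;        // two triangles per heightmap quad
    int verts = m * m;
    return { l, tris, verts, (long)l * tris };
}
```

For N = 2048 and M = 64 this reproduces the numbers in the text: 1024 chunks of 8k triangles and 4k verts each, about 8 million triangles total.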
Since each chunk is a VIPM, we can page in geometry as needed, so we
never need to keep all 8 million triangles in memory. Using this
technique, we can have terrains that are nearly infinite, and the
streaming is linear and easy to implement. Also, we can cull each chunk
against the view frustum. Finally, we could construct a PVS from chunk
to chunk (or even to finer resolution) to knock out chunks that are
occluded.
The biggest issue is with seams between the VIPM chunks. Obviously, if
you do LOD on neighboring chunks independently, you will get cracks.
Maybe the neatest solution is the "flanges" technique due to Tom
Forsyth. In this method, you simply make neighboring chunks overlap each
other, so that when you do collapses, the connections may not be
perfect, but there will be no cracks. The flanges are constructed by
giving each chunk a border which lies under its neighboring chunks. The
conceptual drawing looks like :
-------\ /----------
        X
       / \
      /   \
You simply let them intersect, and allow the Z buffer to do CSG for you.
The easiest seam solution to implement is simply to leave the boundary
edges of each chunk at full detail. The penalty here is that you have
M*4 triangles in each chunk that are basically wasted keeping the edges
tessellated at all times (this is 250 triangles for M=64, which is
1/32nd of the 8k triangles in the mesh). IMHO this is really not so bad,
unless you want a truly huge terrain with a very far clip, in which case
the distant chunks will have 250 triangles when they could have two.
Another solution is the "boundary last" technique. In this case, we do
the usual VIPM on all the chunks, keeping boundaries at full
tessellation. Once we find a VIPM chunk with such a high error tolerance
that it would be acceptable to use only two triangles, we look at our
four neighbors and see if they also have such a high error tolerance.
When that is the case, then we have two neighbors with only their edges
tessellated. We then snap their connecting boundary to only two verts
(getting rid of all the verts on the boundary). We can do this simply by
storing 16 meshes for each chunk, covering all the possible
configurations of boundary edges having 2 verts or M verts; this
requires only 32*M verts (2k for M=64) for all of these meshes.
(Addendum 6-13-00:) These techniques will never reduce a chunk to fewer
than four verts and two triangles (the full-LOD-edges method has a much
higher minimum number of triangles). If you have really huge terrain on
a low-curvature world (so lots of chunks are visible), then further
reductions may be necessary. In this case you can use a quadtree
structure to merge chunks in the distance. If you think of the grid of
terrain chunks as the leaves of the quadtree, then whenever you find a
quadtree node whose children are all at minimum LOD, you combine those
children into one chunk. This is obviously not much of an advantage with
the full-LOD-edges method, because the new chunk must still have all the
verts of the four old chunks, except for the seams down the middle of
the chunk. The quadtree method works with the "boundary last" method,
but becomes quite a lot of bookkeeping to make sure that chunks border
the right kind of neighbors. It definitely works best with the "flange"
technique, because you can simply merge the chunks and let the flanges
handle the seaming. Note that the minimum mesh in the flange technique
has eight verts and ten triangles to make sure that it does indeed have
flanges that will prevent cracking. With any of these techniques, it is
worthwhile to collapse four children into one parent as soon as the size
of the parent is much less than the distance from the camera to the
parent. This follows from the general principle that the VIPM works
better when you merge meshes together, except for the fact that the
sub-portions become less responsive to relative camera positions.
Comparison to ROAM is inevitable and instructive (when I refer to ROAM,
I'm really referring to the entire class of height map-based CLOD
terrain algorithms, including those of Hoppe and Lindstrom). The first
thing we notice is that the VIPM algorithm provides a much more general
set of triangulations (e.g. not constrained by the binary-triangle-tree
structure of ROAM, or by quadtrees, etc.), and is correspondingly
slower to build. The second thing we notice is that the VIPM terrain can
be non-height map, e.g. it can seamlessly represent caves, arches, etc.
with one single skin. VIPM also has a clear advantage in that you only
ever have to write one hunk of rendering code, so you can spend more time
working in one place to get it all right.
Aside from these differences, we should note that the VIPM and ROAM
techniques are really apples and oranges in a way. The VIPM algorithm is
designed to push machines with fast GPU's , and is optimal in that case
(more triangles means better specular, better hardware lighting, etc.
even if the estimated error is the same as that of ROAM). The ROAM
algorithm was designed to push old hardware with CPU's much faster than
their GPU's and is optimal in that case; ROAM can push a maximum of
about 1 MegaTriHz on current fast machines, while VIPM can push about 4
MegaTriHz (most ROAM implementations are much slower than 1 MTriHz, and
it's clearly optimal at much lower triangle rates like 0.2 MTHz).
Ignoring these differences, we'll assume that your ROAM and VIPM
implementations are both aiming at 1 pixel of screen space error (e.g.
never a visible pop).
I think that VIPM is a lot easier to stream in, either from disk to
memory, or from a network server. This is because each chunk can be
streamed in simply by adding to its list of vsplits and vertices, e.g.
no touches of the already existing data are needed. In contrast,
streaming most of the height map based terrain algorithms requires
bringing in progressively higher resolution height maps (like a mipmap
chain) and building a new bin tree or quad tree on that height map. You
then have to worry about seams between the various patches of height
maps at different resolutions; while this is all doable in practice,
it's a lot more work. Furthermore, because of the sorting of the VIPM
collapses, we know that we're streaming the data in the right order,
e.g. the triangles arrive in order of importance.
The biggest issue for VIPM is the inability to quickly modify the
structure. Something like ROAM can be modified quickly because it has a
natural tree structure, which allows modification of one height map
pixel in log(N) time by percolating the changes up the tree.
Unfortunately, VIPM has no simple hierarchical organization (see Hoppe
for a multi-child multi-parent tree structure for progressive meshes
that I find too cumbersome to be practical), so we can't percolate
changes. Even if we give up on changing the collapse order in the VIPM
(certainly reasonable for small changes, like blowing dents in the
ground in Tread Marks) and just want to update the errors, we have to
check the errors of all the edges in the VIPM. This is an unsolved
problem, and for the moment, you simply can't use VIPM if dynamic
landscapes are important to your game. If you want to let landscapes be
dynamic only occasionally or in isolated places, then you could simply
make the dynamic chunk full-LOD and do normal height map modifications
to it.
(addendum 6-14-00:) I should mention the issue of memory use. ROAM is
quite lean in its memory use; good ROAM implementations need only 3
bytes per heightmap pixel (two for the 16-bit pixel and one for an
error) to store the full-LOD terrain. For a 2k square heightmap, this is
12 megabytes, which is really the practical maximum. Good ROAM
implementations need about 20 bytes, plus the size of the vertex (32
bytes, I'm assuming), for the on-screen verts, which should be around
10k verts, or 520k of "active" memory. Note that in almost all ROAM
implementations, this entire 520k of active memory is touched each
frame. Storing that entire 2k square heightmap as a VIPM would be
prohibitive, requiring 128 megabytes (2k x 2k x 32). Instead, we split
our heightmap into 64x64 chunks, each of which requires 128k of memory
(for verts, and about 50k for the refinement info, as noted above). We
then stream in only as many verts to each chunk as needed (with a few
more to make sure we'll have as many as we want in the next frame).
Typically, we can run scenes with 50k verts, which requires about 2
megs. If we pad all of our chunks to reduce disk reads, we might use as
much as 4 megs for all our VIPM chunks; note that this 4 megs of memory
use is independent of the size of the full-LOD mesh (that only affects
disk use, which can be greatly reduced since VIPM lends itself very well
to compression (see Hoppe)); also note that these 4 megs are not
actually touched per frame (except by D3D), only the small portion of
memory where incremental changes are made needs to be touched. This is
rarely more than 1k verts per frame, or a mere 50k bytes of memory
written to by VIPM.
--------------------------------------------------------------------------
EOF