VIPM Topics October 30, 2000 by Charles Bloom (cbloom@cbloom.com) This is a collection of musing on advanced topics in VIPM. Read my introduction to VIPM document first, and some of the literature on PM. Most of this is culled from mailings on the 3d algorithms list. //}{---------------------------------------------------------------------- // Overview : First, I'll summarize some hot topics in VIPM. None of these have been actaully implemented, or written up in any formal way, but the "experts" in the field generally agree they are "good". The major topics we're trying to solve are : 1. Vertex-cache behaviour for T&L hardware. 2. Data sharing between instances of the same object 3. Semi-view-dependence for large meshes. The attempts at solving them are : 1. Vertex-cache Optimization. This is an attempt to address the fact that traditional VIPM, which sorts triangles by collapse-order, is very bad on the vertex cache of hardware T&L cards. There are two primary solutions to this, due to Tom Forsyth and myself. The first was Tom's "optimized prefixes approach". In this variant of VIPM, you take your mesh of N triangles. You do normal PM and generate a triangle list in collapse order. Then you take the r*N first triangles and copy those into "VIPM Index List 2" (0 < r < 1). Then, you can optimize the r*N triangles in the prefix of list 1. You do the same to make List 3, now with r*r*N triangles, and you can optimize that prefix of List 2. Optimizing here means moving triangles around in the index list to improve vertex cache performance. This is quite simple to implement, it can be tacked on to an existing PM implementation pretty easily. The second way is my "leave degenerate triangles". In this approach you simply take your triangle list and optimize it for vertex caching. Then, when you collapse a triangle for PM, you simply change the vertex index and don't remove the triangle; collapsed triangles become degenerate (that is, two indices are the same) and you just leave them in the list. This can be combined with Tom's technique. There are mail messages on advanced variants of these ideas below. Vertex-cache optimization can mean the difference between 3 Mtris and 18 Mtris on a GeForce2. 2. Semi-view-dependent meshing for large models. This is basically the same topic that has been addressed at length for VIPM of terrain, which I talked about in my other article. I now see this is a more general issue, for example to do VIPM of a corridor shooter, or a whole castle. The rough solution to this is to break the model up in a binary-tree-like way. At a distance, you have just one model, for the castle, say. As you get closer, VIPM adds triangles up to some point. At some number of triangles, that triggers a swap. The original castle model is replaced with two separate ones (precompouted), both of which do VIPM independently. As you get closer, one or both of those models may be swapped out again for two more models. This continues up until you right on the castle, at which point the section you are on will be a rather small model (one room, for example) and there will be something like an oct-tree of larger models all around you. This should be used in conjunction with compression of the vertex data to get a handle on the massive amount of geometry that can be rendered this way. Holding this tree of models in a scene graph is somewhat of a challenge. 3. Static triangle-list VIPM, aka fully-shared index data. This is a very new development in VIPM, and it's unclear whether it's really a good thing. There's a full mail on this below, I'll just briefly describe it. The idea here is to order and duplicate your index data in such a way that you get a giant chunk of index data which is fully shared between all instances. While the data is larger, that's quickly compensated before if you have many instances of the same original mesh. Vertex-cache behaviour is somewhat dubious. On to the details ... //}{---------------------------------------------------------------------- // good links from Brian Marshall : Efficiently Computing and Updating Triangle Strips for Real-Time Rendering Jihad El-Sana, Francine Evans, Amitabh Varshney, Steven Skiena, Elvir Azanli (Get it from this link, under publications.) http://www.cs.bgu.ac.il/~el-sana/ This is more work like strip, including 'Issues in Integrating Triangle Strips with Multiresolution Hierarchies' which introduces Skip-Strips for maintaining triangle strips as you do edge collapses. Not specific to VIPM as it deals with VDPM and Merge Trees, but well worth a look. Coarse View-Dependant Levels of Detail for Hierarchical and Deformable Models Dieter Schmalsteig, Anton Fuhrmann (Get it from this link, under publications some way down...) http://www.cg.tuwien.ac.at/staff/DieterSchmalstieg.html This includes a neat way of producing chunked VIPM like Charles Bloom demonstrates on his web site (and the discussion here.) The interesting bit is that the authors assign vertices to VIPM clumps and not triangles which allows for some simplification on the boundary. I won't go into details here as the authors do a much better job of explaining it than I can. //}{---------------------------------------------------------------------- // my mail on "stripped VIPM", aka "just leave the degenerate triangles" Date: Wed, 16 Aug 2000 23:20:05 -0700 I just read the paper "Skip Strips" by El-Sana et.al. which seems to be a much-overlooked work on VDPM. I found it as part of the course notes in the simplification course on the Siggraph 2k CD. It's an interesting read, though quite archaic. They make some clever data structures to speed up the inherently slow algorithm of VDPM. Anyway, it gave me ideas about how to strip VIPM. In fact, it's so easy, it's ridiculous. Generate your vertices in collapse order. Create an optimal Indexed Triangle Strip over those vertices, using duplicated vertex indices to make it one single strip over the entire mesh. When you do a collapse, do it just like normal VIPM, except that rather than changing indices in a list of indexed triangles, you're changing them in a list of indexed strips; it's just the same operation: IndexList[ vsplit->changeIndex ] = newVertIndex; You also can't reduce the number of triangles by two, because the two collapsed triangles may not be at the end of the list, and their verts are needed to continue the strip on either side. After you collapse some number of verts, you switch down to a new strip which is optimized on that number of verts. This takes a cue from Tom Forsyth, and something like power-of-2 sets of triangle strips are recommended. For example, a mesh with 5000 triangles at full LOD might have strip lists for {5000,3000,2000,1000,500,250,125} triangles. Now note that the memory usage of this scheme is almost identical to that of normal VIPM (!!). If you use power- of-2 triangle strip list changes, then you only duplicate the number of indices stored, and by using strips instead of lists, you roughly half the storage needed. The disadvantage of this scheme is obvious : your triangle count stays the same until you swap down to a shorter strip. In that sence, it's a lot like discrete LOD. The difference is that you're still doing collapses one by one, and reducing the vertex count one by one, and virtually eliminating popping. Finally, what's the advantage? I'm not sure. This method is surely faster than conventional VIPM. On the other hand, I'm not sure it compares terribly well to Tom's idea of stripped prefixes and tri-list suffixes. //---------------------------------------------------------------------- // Tom's response : Hmmmm... interesting. And got me thinking. I have a bit of a knee-jerk response to strips, because of their worse-than-list vertex caching (up to twice as bad). However, since normal VIPM pretty much destroys the vertex cache anyway, this argument doesn't hold much water! I think the fundamental difference this approach takes is abandoning the attempt to ditch collapsed tris as early as possible, which is what forces the re-ordering of the tris that destroys the vertex cache performance of "standard" VIPM. And thinking about it, that's fair enough. This still bins tris, but by making them degenerate, rather than have them fall off the end of the list and not feeding them to the chip at all. So you still incur the cost of sending the indices, but not the clipping, setup costs, etc associated with a triangle. But of course the bandwidth cost of the indices is tiny compared to the bandwidth of the vertices, and if by sending (say) double the number of indices, you can reduce the number of vertex resends (i.e. how well the vertex cache is working) by even a small amount, you win! The rejection of degenerate tris is lightning-fast (done right at the front-end I would assume - three 16-bit compares), and in parallel with all the "expensive" stuff, so doubling the number of times it does that is not going to affect performance in any measurable way I would guess. So having stated that I dislike strips, I see no reason not to switch the algorithm back to using indexed lists. Arrange your lists optimally for cache use, and do as the algorithm below says - make the binned ones degenerate, and move the others, just like a normal VIPM rout. The only difference from the routine that Charles has in his tutorial is that you don't reduce the number of tris, and when generating the change list, you need to include an index change that makes the binned triangle degenerate. Wow - that's pretty easy, and vertex cache efficiency is nicely maintained all the way through the collapses. A few points that need mulling over: (1) Using indexed lists, you need to make more index changes per collapse than for strips. Three times as many, I think - not 100% sure. This means slightly larger collapse information, slightly more of a cache hit. However, the collapse info is linear, and so very cache-friendly, and since the aim is to be vertex-cache friendly with our tris, the indices that need changing are likely to be close together as well, or at least in just two parts of the index list, so again fairly cache-friendly (remember - it's stepping outside the cache _once_ that hurts - subsequent accesses to that area are very cheap). So actually not much more expensive than strips I think. And better vertex cache performance (using the "fractal gasket" traversal method that I still can't figure out for arbitrary meshes). (2) Once a tri has been made degenerate, does it need to be involved in further collapses? For example, once a tri 1,2,3 has been made into 1,1,3, do we need to update it when a subsequent collapse bins vertex 3? On some hardware, the degenerate rejection may be done right up-front on the indices, and that triangle never makes it any further, so even though it references vertex 3, because no non-degenerate tri ever references it, vertex 3 is never loaded. However, some hardware may not do the rejection until after the vertex cache fetch, which means that vertex 3 gets fetched, and possible T&L'd, even though it is then never used. If the latter, then you may be better off doing slightly more index edits to "fix" already-degenerate tris so that they only reference used vertices, and having more collapse info. Tricky - I can't decide which option has the worse worst-case performance. Maybe having both versions and runtime testing at load time would be the best option. Compared to "my" two-part method, I think it's a lot simpler. The more I think about the two-part scheme, the more I spot gotchas, and tris that need including in the "slow" pool, the more fragmented the remaining "static" tris get (more fragmentation = less vertex-cache coherency). The advantage that my two-part method retains is the ability to share lots of the index data between instances of objects, which will be a definate advantage in some cases. But it is more complex, and my feeling is that it will achieve poorer vertex caching than this new method. //}{---------------------------------------------------------------------- // good message from Tom Forsyth on idea for optimizing VIPM // for data sharing between instances and optimal vertex cache behaviour. // Date: Thu, 5 Oct 2000 17:41:58 +0100 Both these questions have been looked at. A way to get better vertex cache coherency was suggested by Charles, after a paper on "skipstrips" (I seem to have lost the reference to it, carelessly). You take your high-res mesh and "stripify" it. In fact, there's no reason you can't use indexed lists rather than indexed strips, but the idea is the same - you order your tris to get the best vertex cache coherency. Then, when you do a collapse, instead of trying to get rid of the thrown away triangles, you just make them degenerate. This is easy enough with both lists and strips - in fact the actual runtime VIPM algorithm doesn't need to change at all from the original VIPM rout to do either of those, except that it never reduces the number of tris being drawn. The degenerate tris are rejected very very quickly by hardware (both T&L and not). Note that the vertices are still in collapse order, and get reduced as normal, it's just the number of tris that get sent doesn't change. Obviously, at some point the number of degenerate tris gets too big, and the bandwidth consumed by them grows unacceptable. At this point, switch another version of the index list that has the same number of non-degenerate tris, but has been re-ordered to the optimal vertex cache order, etc. So it's sort of like static-LoD levels, in that you have multiple versions, but you can gradually collapse the higher-tri version to be geometrically identical to the lower-tri version, at which point you switch invisibly. However, this doesn't help with multiple-instancing overheads. This has a partial solution that I thought up, but it seems quite complex to implement. First you make multiple copies of the index lists at different LoDs, as above. Say for argument's sake, each level has twice the number of tris of the previous one, but there's no reason to have any particular granularity - use what works best. Anyway, at preprocess time, each of these levels is collapsed to the level below using standard VIPM methods. Then the original tris are classified into four groups: (1) Those that are removed and modified by edge collapses (i.e. first they get modified by one collapse, then they get removed by another). (2) Those that are removed, but not modified before that removal, by edge collapses (for example, the tris removed by the very first collapse falls into this category). (3) Those that are modified, but not removed by edge collapses. (4) Those that are unchanged by edge collapses. Then reorder sets (3) and (4) according to optimal vertex cache behaviour, and (1) and (2) according to collapse order. Then you assemble two separate lists of indices: List 1 can be shared between instances, and is composed of set (4), then set (2). The length of this changes as edges collapse, but the contents are not actually changed, which is why they can be shared. List 2 cannot be shared, as it is composed of set (3), then set (1). The length of this changes as well, and the contents are also modified by collapses. So this cannot be shared between instances. Now you might think that the size of List 2 is quite large if at each stage you bin half the tris, and you'd be right. But doing fewer collapses between "layers" seems bad - you generate more layers, and make the memory problem worse, surely? But it's not true, because the size of list 2 shrinks much faster, and since this is the one that gets replicated between each instance, that is what really matters. So finer layers are better (up to a point of course). Unfortunately, this does get worse vertex cache coherency than the skipstrips, since they are split into four separate lists, only two of which are ordered vertex-cache-coherently. The other caveat is that the above requires two render calls per object - one for list 1, and one for list 2. But they share vertex data. Onmost software T&L pipelines, this is inefficient, so you need to solve that (using ProcessVertices on D3D - I don't know if OGL has an equivalent). But I can't see a way around that and still share some of the index data between instances. On hardware T&L, it's not a problem - vertices are T&L'd on-demand, so aside from the poorer vertex cache coherency, there is no extra penalty. You could unify sets (1) and (3) (the non-shared index list) into a skipstrip-like-thing to get better vertex caching. Actually, that would work quite well. But you can't do that with set (2) because it relies on _not_ needing modification to bin tris - you just change the number of indices you send down, whereas a skipstrip bins tris by making them degenerate. You could try merging set (2) into list 1 and allowing skipstripping (because list 1 is not shared), but that removes a whole bunch of tris that could have been shared between instances, and now aren't. However, it probably does get better vertex cache coherency. The other factor to consider is that some architectures (Dreamcast, PS2, GeForce1) don't deal with indexed lists all that well - they would far prefer to be fed indexed strips. In their case, making as much as possible into skipstrips by merging sets 1, 2 and 3 into a skipstrip, and set 4 into an indexed strip (that doesn't need to actually "skip", since it doesn't change) will be a benefit. However, two of these are consoles, and therefore probably need the memory even more badly than the PC+GeForce solution. Life is harsh... //}{---------------------------------------------------------------------- // efficient VIPM for duplicated models // this is my idea to have lots of static index lists // that you just slide a window around on. As the window goes // to the right, old tris get dropped off the left and new tris get // added on the right. In the middle is a static chunk of triangles // with swiss cheese structure. To build the PM data : Start with a triangle and vertex list. Choose a vert to remove (V_i) and a vert to collapse it onto (V_j). Find the 2 triangles which have V_i and V_j on them. Move them to a new triangle list, the "Prefix List". Find the N triangles which have V_i but not V_j. Remove them from the main triangle list, and add them to the back of the "Prefix List" and also copy them to another list, the "Suffix List" but replace the V_i with V_j in the Suffix list. Now add the collapse info : it's just an error and the number N. Now, you can find the next collapse and do this again. Note that when you find the next collapse, you consider only edges which are in the main triangle list, that is not in the Suffix List or the Prefix List, or bordering those lists. The result at the end is three triangle lists : a Prefix List, an Uncollapsed List, and a Suffix List. These are concatenated together to make a master list, and you remember where the sub-lists are. When you start out the PM (no collapses) the Head of the active triangle list is at the start of the Prefix, and the Tail is at the end of uncollapsed list. Like this : |Prefix+++++++++|Uncollapsed+++++++++++|Suffix+++++++++++++++| ^ Head ^ Tail which means you are rendering the original mesh. Note that if there are M collapses, the Prefix is 2*M triangles larger than the Suffix. To do a collapse, you look in the next collapse record and get the number N of triangles to move. Then you move the Head up by N+2 and the Tail up by N. This removes your two triangles that you collapsed, and changes the triangles which referred to the collapsed vert. Vertices are stored in collapse order, just like with normal VIPM, and you remove the vertices simply by decrementing the vertex count after each collapse. The cool thing about this is that you never modify the index data, so it can be const (ergo, locked in a DX8 Index Buffer!). That also means that you can shared index data between object instances, they just have different Head and Tail positions. Note that the memory use of one list is about the same as normal VIPM, because you've got a much simpler collapse structure, but you've duplicated lots of triangle indices to get it; in fact, I think it's probably quite a bit more. Normal VIPM can do a collapse in around 12 bytes; this requires about 4*3*2 = 24 bytes (4 triangles in the suffix, 3 verts per tri, and 2 bytes per index), so the memory use is roughly doubled. The crappy thing about this method is that you cannot do a collapse which is even in the neighborhood of another collapse, much less on the same vert again. The result is that you can only do a fraction of the possible collapses. After you do the full set of collapses possible, you are left with a new mesh, built from (Uncollapsed+Suffix). You can then repeat the build process on that new mesh to get you down to even fewer triangles. You probably have to do this several times to get all the collapse you want. I estimate that in each pass, you can do about 1/8 of the triangles. That means after two passes, you're down to (7/8)^2 of the original mesh size. To get down to 1/50th of the original mesh triangle count, you would need 30 iterations. The memory use is thus multiplied by 1 + (7/8) + (7/8)^2 + ... up to (1/50) , this sum is about 8. So, at run time you've got 30 of these lists. You have a VIPM_Definition : { Prefix/Uncollapsed/Suffix Triangle List * 30 Collapse Info * 30 Vertex Buffer * 1 } and a VIPM_Instance : { int (which of the 30 lists am I using?) int Head and Tail int (which collapse to do next) int (number of active verts) } Certainly, the instance memory use is pretty sweet. The Definition uses about 16 times the memory of a "traditional" VIPM structure (not counting the vertex buffer). The advantage is that everything in the Definition is read-only, so can be shared. If you have more than 16 instances of a given object, this is probably "a win". The restriction on collapses imposed by this method (that you can't collapse a neighbor of a triangle which has been recently collapsed) is really not a terrible thing, except in the cases where your mesh starts out in a really silly condition (eg. lots of nearly duplicated verts in ony small area). It's not a nice thing, though, because a good use of VIPM in games is to make really high-poly faces for characters, since that's only important when you zoom in on the character (during conversations or cut-scenes, for example). This method is also somewhat related to the method of "optimized prefixes" that Tom and I have been talking about. We end up duplicating and sharing index data in a similar way, though this method is even more extreme about the duplication and sharing. Also note that the index lists in this method can be somewhat optimized. Within each (N+2) chunk in the prefix and each (N) chunk in the suffix, you can reorder triangles at will to improve vertex cache performance. Also, within the (large) uncollapsed chunk, you have complete freedom to re-order the triangles. This method can also be done with strips, as with any VIPM, simply by using strip indices instead of triangles indices, and storing a number of indices to skip instead of a triangle count. Intereseting idea. Perhaps I'm wrong about how restricted the collapses have to be, but I can't see how you can do a collapse involving a triangle once it's already in the suffix or prefix. //---------------------------------------------------------------------- // Tom's reply : You're right. The original author did mention it, but I don't think he realised how many triangles were affected by a single collapse. I partially warm to this method - it is quite cool in that there is no dynamic data at all, which may be superb for some architectures. Certainly the most irritating thing about current DX8 implementations on DX8 is that every time the index buffer changes, they need to revalidate the data to ensure there are no out-of-range indices. I'm all for secure OSi, but it really stomps on most VIPM routs. Anyway, since the IB is static, the validation can be done at start of day and never again, which is nice. This still doesn't give terribly good vertex cache efficiency though, since tris are stored in collapse order. Actually, it's still better than conventional VIPM, since the patch of tris that is affected by a collapse is stored together, and since the whole of this patch is removed or added as a unit, the internal order of that patch (usually 5-6 tris) can be made optimal. So it's lower efficiency than optimal, since the next patch is likely to be somewhere completely different, but it's not terrible. Charles hoped that the central bit - the bit that is not affected by the collapses - can be in any ordering you like, and so can be made optimal. But there are two gotchas here. First, because of the restrictions on patches - that they can't overlap - I would expect this region to be very small. For a single "LoD layer" (i.e. a single index list, before you switch down to the next version of the mesh), I would expect the algorithm to do almost all the collapses it can. This means that there will be very little left of the surface that is not affected (because if there were, a collapse would have been done there). So the number of tris in this region is not very big. The diagram is less like this: |Prefix+++++++++|Uncollapsed+++++++++++|Suffix+++++++++++++++| ^ Head ^ Tail and more like this: |Prefix++++++++++++++++|Uncollapsed|Suffix+++++++++++++++++++| ^ Head ^ Tail The second gotcha is that this unchanged region is spread out all over the mesh. They are the bits left over after the roughly circular "bites" of affected tris are taken out of the mesh. So they're not going to be very well connected - it's going to look like a slice of Emmental (http://fc-net.fr/CPPR/2femmental.html), and so your vertex cache efficiency is not going to be very good. Using strips is not going to help much, since the "patches" are basically fans, and doing fans in indexed strips is pretty inefficient - you double your triangle count. And then (because successive patches are almost inevitably not connected) you have to link them together with four degenerate tris. So for a fairly typical 6-tri patch (that collapses to a 4-tri patch), it looks like you need around 13 indices for the 6-patch (including the 4 to lead to the next patch), and 11 for the 4-patch. A list would take 18 indices for the first, and 12 for the second, so the difference is pretty small, but it does look like it's slightly smaller. Messy though :-) Anyway, so the vertex cache efficiency is not too bad, but not great. But that's about as good as you can expect of VIPM, sadly. It strips sort-of-OK. The biggest problem is the restrictions on which edges can collapse when, but maybe for real-world meshes this may not matter all that much - I guess we'd need to test. The data sharing is superb, the static memory use is higher than normal VIPM, but if you have high instancing, this will probably be more than made up for by the amount of sharing. When generating these collapse orders, I suspect a good thing to do is to do as a standard VIPM, but mark edges as "I can't collapse this without making a new LoD level". Normally you would collapse the edge with the lowest error associated with it each time, but if one of these special edges has the lowest score, in theory you need to start a new LOD level. However, you could keep collapsing other edges until you exceed the lowest error by a certain (small) amount. So you allow the collapses to get slightly out of true VIPM order to reduce the number of LoD levels. Since VIPM is an approximation anyway, this might be quite tolerable. Hmmm... started writing this reply not to liking this method (probably a case of NIH syndrome). But assuming the restrictions on collapses don't actually restrict it too much, it might actually be a cunning thing to do. Now I just need to find the time to compare it against skipstrips and ... er ... whatever we decide to call the other sort that I talked about in the previous email (as (apparently) the inventor, I guess I have to think up a snappy name with a good acronym). //}{---------------------------------------------------------------------- // musing on chunked backfacing inspired by Nicolas Gauvin // this could be useful for PM of very large models Hi Nicolas. I think that technique is a win if you can also do your lighting in model space (which you can do iff your xforms are made only of orthogonal transforms and uniform scalings; eg. no shears or unequal scalings). If that's the case, then you can do all lighting and backfacing in model space before you transform. The disadvantage is that you have to make a pass over the triangles before rendering. My view-space pipeline is like : transform and prep all verts for all triangles if not backfacing, light all verts and render I guess the model-space algorithm would just walk all the triangles and do : for all triangles if backfacing, continue for three verts if not processed, light, transform, and clip-prep render triangle This does kill the nice linear walk along the verts, so it hurts cache, but it's still quite nice and tight. Definitely an alternative worth testing. It would probably win if you could sort your verts into "almost strip" order (eg. very good on the cache so that those three verts of your triangle are nice and local). For example, indices like {0+1+2,1+2+3,2+3+4} would be great, while indices like {931+0+271,100+930+1,...} would be awful. Another thing you lose is xforming all your verts in one pass without any if's , which is really nice on modern CPU's (it lets you write a tight SSE/3dnow loop). Even better would be if you grouped your triangles into chunks with similar normals, then you could use a Gauss Map (which is basically a sphere in 4d) to do a gross cull of many triangles that are back-facing. Of course, this is only an advantage for very high poly models where you'll be able to cull many polys in a group, but on X-Box you'll have 10k poly characters, right? At 09:51 PM 8/25/2000 GMT, Nicolas wrote: >What is your opinion on doing backface culling in model space? As opposed to >screen space. > >Basically you have a precomputed normal for each of your polygons in model >space. You just have to transform your camera position to model space and do >a dot product with the plane normal to find out if your polygon is back >facing. This avoids having to do any transforms,lighting or clipping on >those back faced vertices that were rejected very early on at very little >cost. The drawbacks that I see are higher memory requirements and possible >back facing artifacts due to round off errors once things get transformed to >screen space. //}{---------------------------------------------------------------------- // random idea on how to do VIPM for procedural terrains // (or very low memory use VIPM for static terrains) Hmm.. this gives me an idea on how to do CLOD for infinite terrains. First, it is very hard to apply traditional CLOD methods (like ROAM or the VIPM-terrain ideas) to near-infinite terrains when you're moving fast, because there's so much data, you have to page in large amounts. Now, I know some people think this is feasible, but I don't think it can be done in a guaranteed time of less than 5 milli- seconds per frame, which means that you can't run at a solid 60 fps, which is a must for me. So, how do we do it? You use the VIPM implementation I described a week or so ago, in which you pre-compute a bunch of triangle lists with a prefix and suffix set, and you just slide your active set around in that list. This collapse data can be shared among multiple instances of the object. So, take your terrain, and cut it into 32x32 heightmap chunks. You've pre-built a VIPM for a flat 32x32 grid, using the technique aforementioned. For each chunk of the terrain, you have a unique VB containing the actual vertex data for that chunk, and you have the current state of reference into the shared VIPM data. When you page in a new chunk, all you have to do is load in the 33*33 (= 1089) heights of the heightmap elements (2.1k bytes), and you make it refer to the lowest detail state of the VIPM data. Of course, if your terrain is procedural, then this works great... Of course, you have the usual problems in VIPM terrain at chunk boundaries; either leave the edges at full detail, use quad-trees of collapse data, or just use the "flanges" technique (probably best). Also, the collapses for the chunks are all the same, which isn't perfect. You would just build the triangle collapse order to evenly distribute the collapses over the chunk semi-randomly. This can be partially improved by having a per-chunk bias. Each frame, you determine how many polys you want in a given chunk by using a distance metric; you could bias this with the precomputed per-chunk bias factor to give you more polys on the detailed chunks and fewer on the flat ones. //}{---------------------------------------------------------------------- // EOF