Advanced Techniques in Shadow Mapping
Charles Bloom & Phil Teschner
5-26-01, 6-03-01
--------------------------------------------------------------

This is a description of some of the advanced shadow mapping technology used in Munch's Oddysee, and some thoughts on shadows in general. Many of these ideas and most of the hard implementation work came from Phil Teschner. The original idea and implementation of the shared render target was by Steve Lacey.

I won't describe basic shadow mapping in detail. See my earlier article, or many other articles on the internet.

--------------------------------------------------------------
1. ADVANCED SHADOW MAPPING

Shadow mapping is the most practical current shadowing technique. It's naively very fast, and has a very low cost of application (e.g. it extends well to large worlds with few shadowing objects, unlike many shadow algorithms which have a large cost per object applied-to).

So, for Munch we wanted to cast shadow maps from every dynamic character in the game. If you just take a standard shadow map algorithm and do this, it's very slow (we decided that 16 was a reasonable maximum number of shadows, and we would frequently be making that full number). So, I'll go through all the little things we did to make it faster.

A. SINGLE, SHARED RENDER TARGET

The first thing you'll see if you make 16 shadow maps and try to update them every frame is that the cost of switching render-targets is astronomical. Presumably this is because you're doing lots of mesh rendering and killing your parallelism between the CPU and GPU (because the render-target switch is a hard stall). On NVidia hardware there's some additional complication related to linear vs. swizzled render-targets, so you may find an additional cost because of that.

You can do an experiment and just render your shadows to the back buffer before your scene - you'll see the cost go very low, as in our case the CPU is the bottleneck as long as we maintain parallelism.
So, the solution to this is to not do so many render-target switches. That's accomplished by rendering all the shadows to the same shadow map. This shadow map is large, with little portions allocated to each shadow. In our case, it's 128x2048, holding 16 shadow maps of 128x128 each (it's non-square due to bizarre stride restrictions). As usual, you render your shadows to a smaller viewport (126x126) to guarantee a one pixel white boundary.

The problem now is how to use this big texture. Well, on some nice hardware like the PS2 you can use a sub-rect of a texture as your texturing source with "region clamp" mode. Unfortunately, D3D doesn't expose this. The only thing you can do is use textures allocated in place on the larger one; you can do this on the XBox, and I believe you could do this in Glide, but you can't do it in D3D, alas. Assuming you've got some API that lets you do this, you just create your shadow-map textures as textures that refer into this single larger one.

The result is that you only ever use two render targets per frame : the shared shadow target, and the back buffer.

B. STAGGERED UPDATE OF SOME SHADOWS

The next obvious optimization is to not generate all the shadows all the time. We chose to generate the player character shadow every frame; other characters get their shadows updated every second frame, or even less often if they're in the distance. The error is quite unnoticeable as long as your frame rate is greater than 30 fps. The result is that instead of generating 16 shadows every frame, you make an average of 4.

C. GENERATING ONLY THE VISIBLE SHADOWS

You obviously don't need to generate shadows that you won't see. You can't just do this by not generating shadows for characters that aren't visible, since their shadow might come onto screen. You can, however, cull against the bounding volume of the shadow's area of effect. This is just the frustum of the camera that was used to generate the shadow map (the virtual camera placed at the light source).
This frustum has its near clip plane at the shadow map plane, and can have a far plane based on the distance fade-out of the shadow (see later). In Munch we actually approximated this frustum with an OBB, and a sphere around it for acceleration. General note : whenever you do complicated bounding tests, it's always good to start with a sphere around your bound to give you a first level of quick rejection.

So, the result is that you test this shadow bound against the real camera frustum, and you only need to generate shadows whose bound is visible.

D. APPLYING ONLY TO NECESSARY SURFACES

The shadow bound can then be used to apply the shadow map only to surfaces it touches. Whenever rendering an object, you just test its bound against the shadow bound, and only apply the shadow map if they intersect. This lets your normal rendering of the objects be a single pass, picking up effect maps as necessary. The point of these optimizations is to reduce the number of shadows applied to a given surface; additional shadows can result in extra render passes, which is a big expense.

E. DISTANCE FADE-OUT

It's useful to make your shadows fade out in the distance. It makes them look better, because it does a sort of fake simulation of the fact that in the distance lights other than the shadow-casting one are contributing illumination. It also allows you to put a far clip plane on your shadow frustum, which limits the number of things you project onto. It also reduces the anomalies due to projecting through walls and such.

You can implement distance fade-out in a few ways. You could do it with a per-vertex computation in a vertex shader. You could also do it using the trilinear mip-mapping hardware of the GPU (simply by giving your entire shadow map a white mipmap in the second level); this technique is pretty cool except for the memory waste of making a big all-white texture. You could also do it using the clipper (see below).

F. CLIPPING BACK-SHADOWING ONLY WHEN NECESSARY

One of the big anomalies of shadow mapping is that the projection is bi-directional (i.e. shadows are also projected behind the light source). Now, applying using the shadow bound actually gets rid of most of these problems, but a few remain. You still get problems when you stand on a single large convex object which is both in the shadow area of effect and in the backwards mirror image of the shadow frustum (for example, if you're inside a tube and the tube is all one object, you'll get a shadow on the floor and the ceiling of the tube).

The traditional solution to the "back projection" problem is to use a "clipper" texture (the same way NVidia implements user clip planes in D3D). The clipper texture is a 1x2 texture with one black pixel and one white; you implement texture coordinate generation to create a "u" index into the texture such that the z coordinate in the space of the shadow-generating camera gets mapped to u, with u = 0.5 at the shadow map plane. You then add the texel from the clipper to the shadow map before applying it to the scene (with saturation). Since white saturates anything to white, the area behind the shadow map plane always ends up with a white pixel from the shadow map, which is a no-op when you multiply that into the scene.

Now, you don't want to do this all the time because it burns a texture stage. What you can do is only use this "clipper" when you're projecting onto one of these objects which lies both in the area of front-projection of the shadow and also in the area of back-projection. If your game objects are generally close to being convex (as they should be), this case is rare and results in only a small performance hit.

You can also do the distance fade-out using the clipper. Instead of making a two pixel texture, you make a gray-scale ramp which is black just in front of the plane and ramps up to white in the distance. By scaling the z->u texgen you can control the fade distance.
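
As a rough illustration of the clipper arithmetic described in this section, here is a minimal Python sketch for a single pixel. It is not the Munch implementation: the function names, the texel ordering (white on the back side, black on the front), and the point sampling with clamp addressing are all assumptions made for the example, and real hardware would do this in texgen and the texture stages rather than on the CPU.

```python
import math

# Assumed 1x2 clipper: texel 0 (u < 0.5) is white, texel 1 (u >= 0.5) is black.
CLIPPER = [1.0, 0.0]

def clipper_u(z, z_plane, scale):
    """Texgen stand-in: map light-camera depth z to u, with u = 0.5 at the shadow map plane."""
    return 0.5 + (z - z_plane) * scale

def sample_1d(texels, u):
    """Point-sample a 1D texture with clamp addressing (an assumption of this sketch)."""
    i = math.floor(u * len(texels))
    return texels[max(0, min(len(texels) - 1, i))]

def apply_shadow(scene, shadow_texel, clipper_texel):
    """Add the clipper to the shadow map with saturation, then multiply into the scene."""
    return scene * min(1.0, shadow_texel + clipper_texel)

def make_fade_ramp(n):
    """Fade variant: white behind the plane, then black ramping to white with distance.

    n is assumed to be even and at least 4.
    """
    half = n // 2
    return [1.0] * half + [i / (half - 1) for i in range(half)]
```

With the plane at z = 1 and scale = 0.1, a surface at z = 5 gets u = 0.9 and samples black, so the shadow applies unchanged; a surface at z = -5 gets u = -0.1, clamps to the white texel, and the multiply becomes a no-op. Shrinking `scale` stretches the ramp from `make_fade_ramp` over a longer distance.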
--------------------------------------------------------------
2. MORE POSSIBILITIES

A. LOW-LOD SHADOW MESHES

It's a common suggestion to use a lower-LOD mesh to generate the shadow map. This is an OK idea, but I think it's not really worthwhile for a few reasons. First, the rendering to the shadow map is nearly free; you're doing a very simple rasterization operation (no textures) which makes it very fast, and you won't hit the vertex-rate limit on any modern hardware. Second, if you use the same mesh that you'll render, you can do skinning only once (if you do it on the CPU). In the future, good character animation will consist of lots of weighted bone blending and vertex tweening, which you'll want to do only once, and use for both your shadow maps and your normal rendering.

If you're rendering high poly meshes and not using the CPU for skinning, then low LOD versions are certainly the way to go (which is true in Munch).

B. NOT APPLYING TO BACK-FACING POLYS

The "clipper" discussed above does not solve all anomalies. Shadows still project onto back-facing polygons in the forward shadow frustum. I believe that this is a problem which cannot be easily solved. The fundamental reason is that "back facing" is a per-polygon concept, while all the shadow mapping work is done in texture coordinate generation *per vertex*.

For example, consider projecting onto a tetrahedron. You'll typically have 3 faces which are toward the light and should receive the shadow, and one pointing away. The problem is that once you've made texture coordinates for the 3 front-facing faces, you've touched all your vertices and you have no degrees of freedom left! Obviously, you can get around this problem by not sharing any vertices between triangles, but that's an unreasonable solution.

C. FILTERING

You could filter the shadow maps to make them "softer".
One technique is to use the hardware to blur them, by rendering them back to a texture, offset by 0.5 texels; this makes the bilinear filtering hardware produce four-pixel averages.

Another method is to do it by hand. This could be done either with a CPU pass over the texture, filtering in place, or simply by writing your own rasterizer to generate the shadow map and building filtering into the rasterizer. This is actually not as insane as it sounds, since on PCs the CPU is typically very fast, and this lets you avoid all of the render-to-texture stalls. Of course this only works if your render target is reasonably small (fits in cache), and because you're rasterizing in grayscale and not doing any texture reads.

--------------------------------------------------------------
3. PROBLEMS AND OTHER TECHNIQUES

Shadow mapping with these enhancements is pretty great, and I think it'll be the technology that most games use for several years; it's the technique at the "sweet spot" of current hardware. There are some big problems with it, however.

A. NO SELF SHADOWING

Self shadowing is hugely important to realistic visuals. Have a look at any tree, and you'll see an incredible complexity of lighting due to partial transparency of leaves and self-shadowing.

There are other shadowing techniques which facilitate self-shadowing, namely stencil shadows and shadow maps with depth. Both of these have severe problems of their own. Stencil shadows are hard edged and cannot be distance-limited. They require massive CPU effort to find the mesh silhouettes, and also massive GPU fill rate. Despite these, stencil shadows are interesting if you choose to have a stylized shadow look and to spend much of your time and clock rate on shadowing (see Malice and Doom 3). Shadow depth buffers suffer from severe precision problems.
Both of these techniques may be used in conjunction with shadow maps in a hybrid technique : use shadow maps for application onto other meshes, and use one of these techniques for shadowing yourself. The hybrid technique probably makes the most sense with shadow depth maps, since once you generate one of them you've got your shadow map already.

B. PIXELATION

Shadow maps are pixelated. Ideally, your shadow map would be a "virtual texture" that remembered the polygons used to generate it, and rasterized them in screen space (not texel space) as fragments were needed. This isn't possible, so you get texel stretching. One thing you could do to improve this is to blur the shadow map by render-to-texture convolution.

C. FUNDAMENTALLY NOT PHYSICALLY CORRECT

Any shadow technique which is applied as a darkening pass after all lighting is just wrong. If a building is casting a shadow and I walk into it with a torch, the shadow should be brightened up. This can only be done correctly by accumulating the lighting and the shadows one by one. That is, you must do your rendering like this :

    Set FrameBuffer = 0
    For each light, add to framebuffer :
        Diffuse Lighting per-pixel, modulated by shadows from that light
            Modulated by Diffuse texture
        Specular Lighting per-pixel, modulated by shadows from that light

It's possible that this is practical on modern high-fillrate parts if you are careful to make virtual lights which average the contribution of several, such that you never have more than about two lights affecting an object. The challenge with virtual lights is making adjacent objects match lighting at shared vertices.

This lack of physical correctness results in a few very noticeable anomalies : shadow maps which fall onto each other produce extra-dark areas where they overlap, and shadow maps project through walls onto back-facing surfaces (this would never be a problem if you did accurate lighting).
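
To make the difference concrete, here is a small numeric sketch of the two approaches for a single pixel. It is only an illustration under stated assumptions: intensities are single floats in [0, 1], the dictionary layout and function names are invented for the example, and taking the minimum over a light's shadow texels (so that overlapping casters block that light only once) is an assumption of this sketch rather than something the accumulation outline above specifies.

```python
def darkening_pass(lit, shadow_factors):
    """Shadows applied as multiplicative darkening after all lighting is summed."""
    for s in shadow_factors:
        lit *= s
    return lit

def accumulate_lights(lights, diffuse_texture):
    """Start the framebuffer at 0 and add each light, modulated by its own shadows."""
    framebuffer = 0.0
    for light in lights:
        # Assumption: several casters in front of the same light block it once.
        visibility = min(light["shadow_texels"], default=1.0)
        framebuffer += light["diffuse"] * visibility * diffuse_texture
        framebuffer += light["specular"] * visibility
    return min(framebuffer, 1.0)

# Hypothetical pixel: fully shadowed from the sun by two overlapping casters,
# and also lit by an unshadowed torch.
sun = {"diffuse": 0.8, "specular": 0.1, "shadow_texels": [0.0, 0.0]}
torch = {"diffuse": 0.5, "specular": 0.0, "shadow_texels": []}
```

Here `accumulate_lights([sun, torch], 1.0)` leaves the torch's 0.5 contribution untouched, whereas `darkening_pass(1.0, [0.5, 0.5])` darkens a fully lit pixel to 0.25, half of what a single 0.5 shadow would give.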
Note that this double-darkening can actually be "fixed" by using the stencil buffer to record where you've drawn shadows, and simply not applying another shadow where you've already put one; this doesn't work well with shadows that are fading out.

--------------------------------------------------------------
EOF