CBLOOM.COM real time 3d : tech rambles


On Valve's lighting - (see the paper from GDC on the ATI web site) -

Hey all, you should think about this more generally, and then it becomes obvious that there are countless possibilities.

Basically, your "lightmapper" is computing the light that can be seen from various texels. The lightmap texel tells you the location of the light sample, but you may look in different directions if you are using a normal map. So, ideally you want to store some sort of spherical sampling of the directional lighting environment that's visible at that location. You put this sampling into your lightmap "texels". Now when you render per-pixel, you do :

pixel's lighting environment = bilerp lookup of "lightmaps"
sample lighting environment in direction N

where N may come from a normal map.

Now you have tons of options.

1) The lightmap could just be a scalar. Then it's just the sampling of the lighting environment in the base surface normal direction.

2) The "lightmap" could contain some number of SH coefficients. Then you sample the SH in the direction N (the normal map direction may itself be represented with SH coefficients).

3) The "lightmap" could be a sampling in 3 basis directions (this is what HL2 does). Basically this is an N*L basis function in 3 orthogonal directions. You then sample using N*basis for the 3 bases.

4) The "lightmap" could be a vector for the 3 colors. Then the output color is { vec1*N, vec2*N, vec3*N }

5) The "lightmap" could be like a cone of visibility. Then the output color is A * (L*N) + B , where A and B are colors and L is a direction

6) The "lightmap" could be a little cubemap at each texel and then you use the normal to look up that cubemap (not sure if this is actually possible in any current or proposed hardware).


Basically what you're getting is a nice per-pixel sampling of a precomputed lighting environment, which is direction-dependent. The spatial variation is handled nicely by the bilerping of samples. Note that this means you want to choose a basis that is amenable to linear interpolation!! The angular variation is encoded in the basis functions and looked up per pixel.
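
To make option 3 concrete, here is a minimal sketch of the per-pixel lookup for one color channel. The basis vectors are the commonly cited HL2 tangent-space directions (an assumption on my part, check the paper), and the names `BasisDir` and `ShadeThreeBasis` are made up for illustration. The three lightmap values are assumed to have been bilerped already.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Three orthonormal tangent-space directions, all tilted equally toward +z
// (the un-bumped surface normal). Assumed values, not taken from this page.
inline Vec3 BasisDir(int i)
{
    assert(i >= 0 && i < 3);
    const float a = std::sqrt(2.0f / 3.0f);
    const float b = 1.0f / std::sqrt(6.0f);
    const float c = 1.0f / std::sqrt(2.0f);
    const float d = 1.0f / std::sqrt(3.0f);
    const Vec3 dirs[3] = { { a, 0.0f, d }, { -b, c, d }, { -b, -c, d } };
    return dirs[i];
}

// lightmap[i] is the (already bilerped) light sampled along BasisDir(i).
// One channel shown; repeat per color channel.
inline float ShadeThreeBasis(const float lightmap[3], const Vec3& tangentNormal)
{
    float result = 0.0f;
    for (int i = 0; i < 3; ++i)
        result += std::max(0.0f, Dot(tangentNormal, BasisDir(i))) * lightmap[i];
    return result;
}
```

Because the basis is orthonormal, a normal pointing exactly along one basis direction picks out exactly that lightmap sample, and the stored values interpolate linearly across texels.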

BTW I'm talking about the surface-based shading for background static objects here, not the volume sampling for dynamic objects. Sampling on the surfaces of static objects is really a way of compressing the entire volumetric lighting environment by sampling it only on the surfaces where you actually have geometry. Also you are using the base (un-bumped) normal of the object to know which direction is most important to you. You could only sample in the hemisphere in the base normal direction (the HL2 three directions are like this).

The HL2 volume stuff is a similar concept, but you have volume data everywhere in space, and you don't have the base surface to simplify your lighting environment.


Miles Macklin has guilted me into finally actually writing about how you might generate lighting cube maps (see the Galaxy page or the GD-algs archives for more) -

So, for rendering into a cube-map there are 3 techniques you can use :

1) do it on the CPU. This is actually the fastest way to do it on the XBox because there's no texture uploads and no lock overhead, so there are no stalls. You just have a tight SSE loop which just does :

for array index i, get from a table normal[i]
evaluate lighting using normal[i]

and we have the table pre-swizzled and we can walk over all 6 faces at once.

2) use render-to-texture; make sure your textures are set up as linear so they won't get swizzled. Pre-construct a VB which is a grid over the texture with a vertex at each target texel. The vertex xy specifies the texel, and then you have a normal which is the proper cubemap normal. Now just render-to-texture using this VB and evaluate the lighting; you can do something like 4 or 8 lights in a pass, and do more passes if you have more lights than that.

3) same as #2, but use spherical harmonics; first compute the spherical harmonic coefficients for the object using the lighting environment, then render the spherical harmonics to the textures. This can actually be done even faster if you like; you can precompute the images for the spherical harmonic basis functions multiplied by the normals of the cubemap, so to generate the result you just have to evaluate a linear combo in your pixel shader :

	out = c1 * T1 + c2 * T2 + ... 

For static objects, I separate out the static lighting & dynamic lighting parts. So, for each static object, there are 2 cube maps, one with all the static lights (which never changes) and one with any dynamic lights affecting that object (which is not used at all if there are zero). This generally means you have to update a lot fewer lights and cubemaps. You can also freeze lights in the distance so that only a few lights near the viewer are actually dynamic, and the result is you have to update very few cubemaps. Another optimization is that nearby objects can share the same cubemap, though that's hard with dynamic objects because you have to handle them coming together and apart; it's quite easy with static objects. So, for example, small decorator objects usually just use the cubemap of the more significant object near them.


How to render in the future :

First render to a 3-channel floating point buffer (call this WP). For each object, output the world-space position as the output value.

Now render to another 3-float buffer (call this WN). For each object, output the world-space normal as the output value.

Now render to another 3-float buffer (call this D). Output the diffuse color.

Now render to another 3-float buffer (call this S). Output the specular and gloss.

Make some more buffers if you want more data (such as spherical harmonic coefficients or tangent spaces). Ideally in future hardware you'll be able to output to all these buffers at once, or it could be one big N-component buffer.

Start doing your final compositing passes. This is just a bunch of passes with full-screen quads that take in a bunch of full-screen buffers and layer things onto the output buffer.

{WP,WN,D,S,etc.} -> [pixel shaders] -> output

You do lots of passes here, many (4+) per light. You can do things like soft-shadows with penumbra wedges, because in your pixel-op you've got the world-space position and normal for each pixel so you can do geometric math.

Alpha ruins everything.

Oh yeah, and you can do DOF like this : Once you're done, treat your WP buffer as a *vertex stream*. Render point sprites to the final screen buffer. You make one point sprite for each pixel in the original image. Compute the focus of each point by using world-space position and camera position. Turn that into a size of a point sprite : perfect focus = 1/2 texel radius, out of focus = larger radius. Gives you nice Gaussian blur DOF with no stepping artifacts.

> Mark Lee replies :
>You'd need to send down a vert for each pixel on the screen.
>It would be incredibly slow (and that 64*64 point sprite bug
>would kill you).

Well, yeah, point sprites are fucked on the Xbox, but assuming they weren't, 640x480x30 = 9 M / second should be plenty fast. On future hardware at higher resolution and frame rate 1280 x 1024 x 100 = 131 M / second should also be possible.

If you prefer you can also do semi-Gaussian blends in the pixel shader.

For each pixel : write to A of the color buffer the radius of the blend you want (per-pixel) :
0 for perfect focus
larger for blurry

Do N passes (N = maximum radius) :
Take the current image.
Do a simple box-filter to make a blurred version.
Blend the two thusly :

radius = image.A
setge lerper, radius  (lerper = 1 if radius > 0, else lerper = 0)
output lerp(lerper,image,blurred)   (selects blurred if radius > 0, else original)
output max(radius-1,0) to A

(something like that; I think my filter theory isn't quite right). Note that repeated use of a simple box filter is a polynomial approximation of a Gaussian.


The way most people think of the Shannon-Nyquist sampling theorems is wrong. Everyone leaves out a key point. Here's a reference. Now, imagine I have some set of samples Sn at some fixed sampling interval. This is all I've got from the "original" signal. Note that this is what actually happens these days : someone makes an audio track or an image in digital samples, they don't make some analog signal with infinite precision. Now, many people assume that these samples represent a signal specified by the lowest-frequency Fourier reconstruction that goes through them - here. That's not necessarily true. What the Shannon-Nyquist theorem says is that these samples *can* perfectly capture that information, but they may well have captured something else! In particular, if our "original" signal was not low-passed before "sampling", who knows what it was? In fact in most cases there is no "original" and this is all a thought experiment - the sampled version is the "original".

Now, it is important to have this concept of what the "original" should be like. Consider you want to resample the signal of discrete samples Sn, to some new sampling interval and some new offset. You want to know the ideal values for the new samples. Well, the correct answer is *not* the naive Shannon-Nyquist resampling. That assumes that the "original" is a correctly low-passed signal. In reality, you should be using the "original" signal that makes sense for your data.

What is the correct "original" signal? Well, let's consider the case of bitmap images. First, consider photographs captured with a digital camera. The cells of the CCD have done discrete sampling; if you knew the details of their workings you could reconstruct basis functions to reproduce the "original" as well as possible. In practice, they're pretty close to being ideal low-passed signals. So, standard Shannon-Nyquist rules apply. Now, how about hand-painted bitmap images? This case is quite different. Here the artist is looking at the image on a CRT as they work. In this case the ideal "original" is the continuous signal that the artist is seeing on the CRT. So, each sample has something like a 2d Gaussian basis function for how it contributes to the continuous signal. In particular, this "original" is NOT a properly low-passed signal. If you used the naive Shannon-Nyquist reconstruction of the original, you'd get something much more blurry (and with ringing) than you should get. In this case, if you want to process the image or resample or whatever, you should use an "original" continuous signal which is appropriate.

Intuitively this just makes sense, because when the artist hand-paints a bitmap, he is making use of the quantization grid of pixels. He puts strong hard lines vertical and horizontal so that they line up with the sampling grid. He can, in a sense, paint higher frequencies in the vertical and horizontal direction than he can in the diagonal directions. If you assume that the samples correspond to a properly low-passed source data set, this information is lost.

To be specific, imagine I have a bitmap. I re-sample this bitmap to a new one which is rotated 45 degrees and twice as big in each direction. Now, Shannon says that this resampled version is just as good a representation of the underlying continuous signal as my original bitmap, *BUT* that is only true if my original bitmap was a sampling from an underlying signal that was properly low-passed. In the real world, it is not, and in fact I've lost information by doing this resampling.


When it comes to controls and responsiveness and such, simpler code is better. If you write a big fancy player-control system, the result will be a krufty, glitchy interface. I believe that I've finally realized the sweetest way to do a 3rd person platformer-type control scheme.

First, read the analog stick and transform it into an XY vector in world-space, using the camera matrix. Casey Muratori has the sweetest idea here - we use a non-orthogonal transform, so that sideways on the stick makes you go straight sideways on screen and up on the stick makes you go straight up on screen. These vectors are perpendicular at the center of the screen, but everywhere else they're slightly skew.

Ok, now I have a world-space XY vector with a length. I store a little history of them. I think 3 frames is probably the optimal amount, just use a little cyclic history thingy. Now take the average of those 3. This gives you a low-pass of the stick.

In the common case, all you do is take your player and set his facing along the stick and give him velocity equal to the length of the stick. Super-responsive, super simple, and good. The low-passing has nice properties like causing your velocity to ramp up a bit from a stop, and smoothing your turns, etc. If you want your guy to feel more sluggish you just use a larger history.
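
Here's a rough sketch of the history-and-average part, just to show how little code it is. The names (`StickHistory`, `MotionFromStick`, `maxSpeed`) are made up for illustration, and scaling the stick length by a `maxSpeed` constant is my assumption about how "velocity equal to the length of the stick" would be turned into game units.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Cyclic history of the last N world-space stick vectors.
// The plain average of the window is the low-passed stick.
template <int N>
class StickHistory
{
public:
    StickHistory() : m_next(0)
    {
        for (int i = 0; i < N; ++i) m_samples[i] = Vec2{ 0.0f, 0.0f };
    }

    // Call once per frame with the camera-transformed stick vector.
    void Push(Vec2 stick) { m_samples[m_next] = stick; m_next = (m_next + 1) % N; }

    Vec2 Average() const
    {
        Vec2 sum = { 0.0f, 0.0f };
        for (int i = 0; i < N; ++i) { sum.x += m_samples[i].x; sum.y += m_samples[i].y; }
        return Vec2{ sum.x / N, sum.y / N };
    }

private:
    Vec2 m_samples[N];
    int  m_next;
};

struct Motion { Vec2 facing; float speed; };

// Facing points along the filtered stick; speed is its length times maxSpeed.
// With a (near) zero stick there is no direction, so keep the previous facing.
inline Motion MotionFromStick(Vec2 filtered, float maxSpeed, Vec2 previousFacing)
{
    assert(maxSpeed >= 0.0f);
    const float len = std::sqrt(filtered.x * filtered.x + filtered.y * filtered.y);
    Motion m;
    m.speed  = len * maxSpeed;
    m.facing = (len > 1e-6f) ? Vec2{ filtered.x / len, filtered.y / len } : previousFacing;
    return m;
}
```

With `StickHistory<3>`, slamming the stick from rest gives 1/3, 2/3, then full speed over three frames, which is the ramp-up described above; a larger `N` makes the guy feel more sluggish.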

You can also do some simple pattern-detection in the history window if you want to make things a bit fancier. Just looking at the lengths of the sticks you can do things like

[1,0,0] -> sudden stop
[0,0,1] -> sudden acceleration
[1,0,1] , [0,1,0] -> erratic
You can also see when the stick flips a full 180 to a special skid-turn-around or whatever.

The next big part is how the animations reflect all this. I'm now a big fan of "animations are late". That is, first you set up the guy's facing and velocity based on this stuff, in the way that feels good. Then you decide what to do with the animations to make it look right. So, like in the normal cases of moving you'd be blending some Stand,Walk,Run,Turn anims. Then you can also detect the sudden stop, sudden accelerate, turn around, etc. cases and play custom anims for those.

The way to get fancy with the anims is two things : 1) pick a blend based on your actual velocity and how much the anims move the guy; you should be able to find a blend of anims that will reproduce your velocity. 2) pick the pose for each animation based on your current pose, so when you start the "sudden stop" you start it from whatever your current skeleton state is (left foot up, right foot up, feet together, etc.)

Unfortunately, I don't get to do this sweetness on our current game, because our main character is kind of unusual, and this type of very-immediate controls wouldn't work.


Simple time dilation for improved animation blending.

First a bit of review. If you have a "Walk" and a "Run", they are made to be played with a duration D_walk and D_run, and over that time they will move the guy at a velocity V_walk and V_run, so the overall translation for each is D_walk * V_walk and D_run * V_run.

So, if I want to move the guy at some velocity V which is between V_walk and V_run, I have several choices : 1) just play Walk faster, 2) just play Run slower, 3) adjust both speeds and blend, 4) adjust neither speed and blend. We'll look at #4. We do actually need to adjust the speeds a bit to normalize the durations.

So, let's make a lerper ,

t = (V - V_walk)/(V_run - V_walk).

This tells us where we are between Walk and Run; eg. t=0 is Walk, t=1 is Run. Now, we choose our duration just by lerping :

D = lerp(D_walk,D_run,t);

Next we adjust the speed of both Walk and Run so that they will play in the desired duration :

Speed_Walk = D_walk/D
Speed_Run = D_run/D

Note that typically D_run < D_walk, because it's a faster gait, so D_run < D < D_walk, so Speed_Walk > 1 and Speed_Run < 1.

Now that we've matched durations on these guys, we just blend them. Talk to Casey Muratori about how you blend two animations, it's pretty well known.

Now, this is all well and good, but the problem remains that even though we have scaled the durations to match up, the gaits of Walk and Run may not match up well. Most importantly, the moments at which the feet are on the ground may not match up. If you just blend anims when they don't match nicely like that, you get funny stuff. So, we can fix this. First, we require the artists to give us a bunch of foot-event info for the animations. They put in "text keys" for "LFoot_down" "LFoot_up", and the same for RFoot. Between the "down" and "up" time the foot is assumed to be locked to the ground. Now, we interpolate the time of these events just like we did the duration :

Time of blended LFoot_down = lerp(LFoot_down_Walk,LFoot_down_Run, t);

and etc. for all the events. Note that this method does NOT support the events being in different orders in the two animations!! So, for example, you cannot use this for a Horse or some animal whose feet change patterns in different gaits.

Finally, we do time-distortion on each animation to find out which frame to use for the blend. We have our blended duration, and the play pointer is somewhere in there. We look and find which events the play pointer is between. We find the fraction of where we are between those, and go use that same location in the source animations.

Specifically, imagine we are between LFootU and RFootD in the blended times.

Time of blended LFootU = lerp(LFootU_Walk,LFootU_Run, t);
Time of blended RFootD = lerp(RFootD_Walk,RFootD_Run, t);
Current time T is between them, and 0 <= T <= D
Interpolator = (T - blended_LFootU)/(blended_RFootD - blended_LFootU);
T_in_Walk = lerp( LFootU_Walk, RFootD_Walk, Interpolator );
T_in_Run = lerp( LFootU_Run, RFootD_Run, Interpolator );
Sample Walk anim at T_in_Walk
Sample Run anim at T_in_Run
Blend the two samples
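
The same steps as a small C++ sketch, for one pair of consecutive events. The struct and function names (`EventSpan`, `WarpPlayTime`, `BlendParam`) are invented here; finding which pair of events the play pointer sits between, and the actual pose blend, are left out.

```cpp
#include <cassert>

inline float Lerp(float a, float b, float t) { return a + (b - a) * t; }

// Times of two consecutive foot events (e.g. LFootU then RFootD) in one source anim.
struct EventSpan { float start; float end; };

// Where to sample each source animation.
struct WarpedTimes { float walkTime; float runTime; };

// t = 0 is Walk, t = 1 is Run.
inline float BlendParam(float v, float vWalk, float vRun)
{
    assert(vRun != vWalk);
    return (v - vWalk) / (vRun - vWalk);
}

// T is the play pointer in the blended timeline; it must lie between the
// blended times of the two events.
inline WarpedTimes WarpPlayTime(float T, float t, EventSpan walk, EventSpan run)
{
    const float blendedStart = Lerp(walk.start, run.start, t);
    const float blendedEnd   = Lerp(walk.end,   run.end,   t);
    assert(blendedEnd > blendedStart && T >= blendedStart && T <= blendedEnd);

    // Fraction of the way from the first event to the second.
    const float interpolator = (T - blendedStart) / (blendedEnd - blendedStart);

    // Use the same fraction between the same two events in each source anim.
    return WarpedTimes{ Lerp(walk.start, walk.end, interpolator),
                        Lerp(run.start,  run.end,  interpolator) };
}
```

For example, with Walk events at 0.2 and 0.6, Run events at 0.1 and 0.3, and t = 0.5, the blended events land at 0.15 and 0.45; a play pointer at 0.30 is halfway between them, so you sample Walk at 0.4 and Run at 0.2.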

So, that's it. The CPU cost is very low, not much more than ordinary blending. You get perfectly locked feet, no sliding, and you get gaits that mix right. The big disadvantage is that it really only works for bipeds, because of the fact that gait pattern changing is not supported. I hope to cook up a simple demo for this soon so you can see the difference between doing this and not doing it.


BRDF for leaves. Here's the idea : cook up a simple fake BRDF which simulates some of the properties of leaves. The idea is to get a tree which actually looks decent. You want your lighting equation to have certain properties :

1. If the viewer and the light are on the same side of the leaf, you have basically a plastic-like surface; it gets more specular at grazing angles (fresnel effect).

2. If the viewer and the light are on opposite sides of the leaf, it is still being lit; it's transmissive, so it's brightest when the light and the viewer are directly opposing each other.

To make a real tree you would also need to encode some occlusion info, so that the interior leaves were somewhat shadowed by the outer ones. Real shadow techniques like stencils are horrible here. Something like spherical harmonics that encodes a bit of directional occlusion information would be appropriate.


I heard of a super-brilliant technique. It's really obvious and I think I even talked about it here before, but what the hell, here it is again.

Build your lightmaps with radiosity. Subtract out the first bounce (direct illumination). Now, you're putting the first bounce back in with runtime direct illumination and shadowing.

The result is that with no dynamic objects around, it looks just like a full radiosity solve. If there are dynamic objects, they affect the direct illumination, just not the higher orders, and it pretty much looks fine.

This is actually really similar to my note below :


Continuing your L*N past 90 degrees is a good thing, like this :

ModifiedLambert = Max(0, L*N * (1-softness) + softness );

The problem is that this doesn't work with shadowing. The reason is that past 90 degrees, the object will always be self-shadowing, so those points will go to black (from that light, I'm not counting the ambient or any other lights here).

The answer is to break a light's contribution into two parts : the "direct" part which is shadowed, which is just like L*N , and the "indirect" part which is a sort of hacky radiosity. The indirect part is not shadowed. The values are just :

Direct = Max(0, L*N);
Indirect = Max(0, ModifiedLambert - Direct );

The result is that in normal situations you still get the nice soft continue-around-the-end feature, and it works with shadowing.
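
Put together as code, a minimal sketch looks like this. `SplitLambert` and `ShadeLight` are names I made up, and the `shadow` factor (1 = fully lit, 0 = fully shadowed) stands in for whatever shadowing technique you use.

```cpp
#include <algorithm>
#include <cassert>

struct LightSplit { float direct; float indirect; };

// Splits the softened Lambert term into a shadowable "direct" part and an
// unshadowed "indirect" part, using the formulas above.
inline LightSplit SplitLambert(float LdotN, float softness)
{
    assert(softness >= 0.0f && softness <= 1.0f);
    const float modified = std::max(0.0f, LdotN * (1.0f - softness) + softness);
    const float direct   = std::max(0.0f, LdotN);
    const float indirect = std::max(0.0f, modified - direct);
    return LightSplit{ direct, indirect };
}

// Only the direct part is attenuated by the shadow term.
inline float ShadeLight(float LdotN, float softness, float shadow)
{
    const LightSplit s = SplitLambert(LdotN, softness);
    return s.direct * shadow + s.indirect;
}
```

With softness = 0.5, a point at exactly 90 degrees (L*N = 0) gets direct = 0 and indirect = 0.5, so it keeps its soft wrap-around light even when the shadow term is 0.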


Idea for depth buffering in a space game :

As a brief reminder, there's a problem with traditional depth buffers for space games because you can have some huge extremes of Znear/Zfar , and your precision can get boofood. So, here's the proposal : First, sort all your objects back-to-front. So, you're doing a sort of "Painter's algorithm" for object-object depth compares. This means that you can't handle intersections of planets or anything like that, but that shouldn't be happening. Count your objects. Now, take your Z buffer depth range, say 2^32 (for a 32 bit z buffer). Allocate this evenly to all your objects, so each object gets a range of (2^32/Count). Clear the Z-buffer now. Render the farthest object with ZNear/ZFar set to be just covering the extents of the object, and scaling to write Z's into the hardware buffer in the first (2^32/Count) range of ints. Render the next object in depth, but use the next bit of integer range. This way you can draw all your objects and they each Z-buffer independently, without needing to clear the Z buffer between each object. The final result is that objects use "painter's algorithm" to depth-compare against each other, and use a Z buffer to depth-compare against themselves. Enhancement : allocate the Z-range non-linearly, so larger, closer, more complex objects get more of the Z range. Thanks to Marc Hernandez for the base idea.
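
A sketch of the even allocation, expressed as a normalized [0,1] depth window per object rather than raw integers. I'm assuming the usual "less" depth test with 0 = near, so the farthest object takes the slice nearest 1 and each closer object wins against everything drawn before it. The result is what you would hand to whatever your API uses to remap depth (a viewport min/max depth or similar); `DepthSliceForObject` is an illustrative name.

```cpp
#include <cassert>

// Sub-range of the normalized depth buffer assigned to one object.
struct DepthRange { double zNear; double zFar; };

// drawIndex 0 is the farthest object (drawn first), objectCount - 1 the closest.
// Assumes a "less" depth test with 0 = near, 1 = far.
inline DepthRange DepthSliceForObject(int drawIndex, int objectCount)
{
    assert(objectCount > 0 && drawIndex >= 0 && drawIndex < objectCount);
    const double slice = 1.0 / objectCount;
    const int slotFromFront = objectCount - 1 - drawIndex;
    return DepthRange{ slotFromFront * slice, (slotFromFront + 1) * slice };
}
```

The non-linear enhancement would replace the fixed `slice` with per-object widths that sum to 1.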


We've found Journaling to be reasonably easy.  I find it really useful
to use little helper functions that wrap anything that you're not sure
will be identical in the playback session.  For example :

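uint64 Clock::GetTicks()
{
	uint64 ticks;
	if ( Journal::IsReading() )
	{
		// playback : use the recorded value (Journal::Read is an assumed name)
		Journal::Read(&ticks);
		return ticks;
	}
	ticks = RawGetTicks();
	if ( Journal::IsWriting() )
		Journal::Write(ticks); // recording (Journal::Write is an assumed name)
	return ticks;
}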

These kinds of functions can be called anywhere at any time from the client
code, and it just magically works with journals.  Very nice.  You can do
similar things for stuff like "Is this sound playing", or "Was this render
test occluded".  Of course you have to realize where you need to put these


Interpolators in 3d graphics are pretty interesting.  I did a
bunch of math for Mikes Sartain & Abrash about the hyperbolic
interpolators in 3d.  This is all pretty simple stuff, but it's
useful to have it clear.

The goal of most 3d interpolators is to linearly interpolate
things in *world-space*.  The point is that if I have some
triangle with colors on the vertices, then the color at any
point on that triangle is a function only of the world-space
position (eg. not view-space position!).  We want that because
it makes it look like the color is locked to a certain piece
of the triangle, eg. it doesn't "swim" when the camera moves.

Now, in order to get linear interpolation in *world-space* you
need hyperbolic interpolation in homogeneous projected space.
In practical terms, that means if you linearly interpolate
the world-space values divided by "w" with an interpolator that
is linear in screen space, then you get the right results.

It's pretty easy to do the math and derive all this.  We then
worked out bounds for the difference between this correct
interpolation and interpolation in screen space.
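
As a tiny sketch of what that means for one attribute "u" along a span (function names are mine, and I'm treating "w" as positive view depth):

```cpp
#include <cassert>

inline float Lerp(float a, float b, float t) { return a + (b - a) * t; }

// u0, u1 : attribute at the two endpoints; w0, w1 : their w (view depth);
// t : the screen-space interpolation parameter in [0,1].
// Interpolate u/w and 1/w linearly in screen space, then divide.
inline float InterpPerspective(float u0, float w0, float u1, float w1, float t)
{
    assert(w0 > 0.0f && w1 > 0.0f);
    const float uOverW   = Lerp(u0 / w0, u1 / w1, t);
    const float oneOverW = Lerp(1.0f / w0, 1.0f / w1, t);
    return uOverW / oneOverW;
}

// The cheap (wrong) version: linear in screen space.
inline float InterpAffine(float u0, float u1, float t) { return Lerp(u0, u1, t); }
```

For u going 0 to 1 with w going 1 to 3, the screen-space midpoint gives 0.25 with the correct interpolator and 0.5 with the affine one; that gap is the error being bounded.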

The thing that's been interesting to me lately is that in order
for something to be "pinned" to the triangle, and *tessellation
independent*, you can only use linear functions of world-space
position.  That is, if you have some field of attributes F(x)
this function assigns an attribute F to each point in the universe
x.  If I put down a triangle and assign values of F to the vertices,
then I can interpolate to produce an F_interp anywhere in the
plane of the triangle.  F_interp is equal to F if and only if
F is only a linear function of x; that is F = M * x for some linear
operator M (a matrix or scalar or vector).

This came up recently with specular.  We were looking at putting
a specular effect on the water surface.  The water surface is
nearly planar, but the tessellation is very strange.  This means
that if you put anything but a linear function on the water, you'll
get odd tessellation-dependent results.  A linear function will look
perfect.  The best thing we can do (I think) is to use a 3x2 matrix
to transform world-space position to texture coordinates per vertex.
Then we can look up that texture per-pixel to find the specular
value.  This matrix can have rotations and scales, the point is
that the per-vertex operation is linear in world space position.

As a counter-example, actually making a "reflection vector" is NOT
linear in world-space, because it involves length and normalized

Here's some of the old text :

Integrating's not bad; I didn't even need Gradshteyn ;^)

Note that I'm computing the error across a single line in
screen-space here (such as a span or a triangle edge),
not the error for a whole triangle.

Some quick background to establish notation :

You can interpolate across a span in "screen space" with "t".
So, if S0 = screen space coord at time 0, similar for S1 ,

S(t) = S0 + t * (S1 - S0) = S0 + t * dS

And "S" is proportional to something like (X / Z)

Any world-space geometric value can be interpolated linearly with "t"
in its (1/Z) form , so (U/Z) can be linearly interpolated with "t".

World-space geometric values *without* the 1/z should be interpolated
with "T" , where

T = t * z0 / ( z1 - t * dz )

(and dz = (z1 - z0) )


X(t) = x0 + T(t) * dx

for example; same for u(t) and z(t), etc.

You can confirm that

X(t) = Z(t) * S(t) = (z0 + T * dz) * (S0 + t * dS) = x0 + T * dx

Ok, so the error in doing affine interpolation instead of perspective
is the error of using "t" to interpolate something like X linearly
instead of using "T" which is what you should use.  So, the error at
any t is :

E(t) = t - T

Note that E is always positive (taking z1 > z0), because T is always less than t.
Also note that

T(0) = 0
T(1) = 1

so T and t match at the endpoints, as they should.

So, you can integrate the error to find the total error over the
span; I did the integral of just E(t) , but actually doing E(t)^2
might be better; and in fact max{ E(t) } (aka the L_infinity norm)
may be even more interesting.

Anyway :

E = Integral{0 to 1} E(t) dt

Note that I integrate on dt, not dT, because I care about the sum of
the errors over the pixels, hence the use of the screen-space interpolant.

The answer is :

E = (z0 + z1) / (2 * dz) - (z1 * z0 / (dz * dz)) * ln( z1 / z0 )

This is kind of funny, but it makes sense if you analyze it a bit.
If you plot it as a function of (z1/z0) for any fixed z0, you see
something that very rapidly increases and then levels out.

In fact, for z0 fixed, as z1 gets large, this E tends to :

E = 0.5 - (z0/z1) * ln(z1)

Which tends to 0.5 ; the max error integral is 0.5 (that's very big,
since t only goes from 0 to 1).  The minimum is of course 0 .

As dz goes to zero, you would expect E to go to zero.  In fact, it
does, but it's not trivial.  The two terms in E have to cancel each
other, because both appear to have (1/dz) forms which make them
singular.  If you expand the natural log with a Taylor series, you'll
find that the actual behavior of E around small dz is :

E = dz / (6 * z0)

Which has a very nice (1/z0) form; in general E goes down as z0 and z1
get bigger.  In fact, you can write E entirely as a function of (dz/z0)
or (dz/z1) if you so desire.

Now, while this is all correct, it doesn't actually give you a simple
formula to compute; in fact, this E is a rather nasty formula, but it
does have a pretty nice shape.  So, you could probably fit an approximation
to E which conservatively over-estimated the error.  You want to maintain
E(dz = 0) = 0  exactly, of course.  

Also, note that this is the error for a general interpolant.  To get
the error of something like "u" , you would use :

E[u] = du * E

(it's just this error times du).
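
If you want to sanity-check the closed form, here is a small sketch that evaluates it and compares it against a brute-force midpoint-rule integral of E(t) = t - T. The function names are mine; nothing here is tuned for speed.

```cpp
#include <cassert>
#include <cmath>

// Pointwise error E(t) = t - T(t), with T = t * z0 / (z1 - t * dz).
inline double SpanError(double t, double z0, double z1)
{
    assert(z0 > 0.0 && z1 > 0.0);
    const double dz = z1 - z0;
    return t - t * z0 / (z1 - t * dz);
}

// Closed form of the integral of E(t) over t in [0,1].
inline double IntegratedError(double z0, double z1)
{
    assert(z0 > 0.0 && z1 > 0.0 && z0 != z1);
    const double dz = z1 - z0;
    return (z0 + z1) / (2.0 * dz) - (z1 * z0 / (dz * dz)) * std::log(z1 / z0);
}

// Brute-force check: midpoint rule over the same interval.
inline double IntegratedErrorNumeric(double z0, double z1, int steps)
{
    assert(steps > 0);
    double sum = 0.0;
    for (int i = 0; i < steps; ++i)
        sum += SpanError((i + 0.5) / steps, z0, z1);
    return sum / steps;
}

// Error for a specific attribute u is the general error times du.
inline double AttributeError(double du, double z0, double z1)
{
    return du * IntegratedError(z0, z1);
}
```

The closed form blows up numerically as dz gets tiny (the two terms cancel), so for nearly flat spans the brute-force version or a series expansion is the safer thing to evaluate.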

Perhaps more to come...


Find the maximum error instead of the integral of the error :

E_max = E(t_max)

This is the L_infinity norm.

The right answer is :

t_max(a) = a * ( 1 - sqrt( 1 - 1/a) ) ,  where a = Z1 / dz

(note the sign changes).

As a->1, t_max -> 1 (good)

As a->inf , t_max -> 0.5

Some more useful notes :

(a - 1/2) = Z_avg / dz

(1 - 1/a) = Z0 / Z1

Z_avg = 0.5 (Z0 + Z1)

Which means :

E(0.5) = dz / (4 * Zavg)

(exact) which is pretty cool.


E_max(a) = E(t_max(a)) = 2 * t_max(a) - 1

Is also pretty cute, but not very useful.

We can rearrange this to use the original variables :

E_max = 2 * [ Z_avg - sqrt( Z0 * Z1 ) ] / dZ

Which is actually a super-cute formula.  That's
the difference of the arithmetic and geometric
means of Z0 and Z1, scaled by 2/dZ.
Michael Sartain wrote:
 > So please excuse my ignorance yet again, but this is what I know of
 > arithmetic vs. geometric averages:
 > arithmetic = (p0 + p1 + p2 + ... + pN) / N
 > geometric = (p0 * p1 * p2 * ... * pN) ^ (1/N)
 > Can you point me to something or give me an idea of what the difference of
 > these two dudes tells us?

Yeah, I have no idea :^)  I imagine there is some literature on this;
here are some simple quotes :

arithmetic >= geometric
the difference of the two is related to the standard deviation of the

And there are some nice geometric interpretations :


Also, if Z0 = 1 and Z1 = (1+2e) , for e small, then the difference is
e^2 / 2 ; that is, geometric and arithmetic means are the same to 1st order.


There's a good general optimization technique that I've never
seen written about.  It applies to any "closest point" type
problem, an example of which is collision detection.

So, you are making some query, like "give me the closest
object within X units".

The optimization is basically this : once you find a possible
solution, you do not need to consider anything farther away.
This seems pretty obvious, but it's actually profound. 

The key is that "give me the closest object within X units"
should be much cheaper for smaller X.  The reason is you can
make a bounding sphere (or some acceleration structure) of size
X, and then you only need to consider objects in that sphere.
For example, if all your objects are in a grid, then the number
of cells you need to consider is of order X squared (X^2).

For example, if you can cheaply bias your search to see closer
objects first (such as in a quad-tree descent, pick the closer
of the four children first), then you can pretty quickly get
a good guess, at which point you can shorten your query to
only look for objects that are closer, which should then be a
very cheap query.

You can use coherence to make this even faster.  If you remember
the previous object you were closest to, then you're probably
pretty close to that object again.  The result is that the distance
to that object is probably a very good guess, so it allows you
to very quickly reduce the size of your query.
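
Here is a minimal sketch of the shrinking-query idea. Instead of a grid or quad-tree it uses a flat list of buckets with bounding circles (my simplification), but the pruning is the same: once you have a candidate at distance d, any bucket that can't contain something closer than d is skipped. All the names are made up.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Point { float x, y; };

// A group of points with a bounding circle, standing in for a grid cell or tree node.
struct Bucket { Point center; float radius; std::vector<Point> points; };

struct QueryResult { bool found; Point closest; float distance; int bucketsSearched; };

inline float Dist(Point a, Point b) { return std::hypot(a.x - b.x, a.y - b.y); }

// "Give me the closest point within maxDist of q."
// The query radius shrinks every time a closer candidate is found.
inline QueryResult FindClosest(const std::vector<Bucket>& buckets, Point q, float maxDist)
{
    assert(maxDist > 0.0f);
    QueryResult r = { false, { 0.0f, 0.0f }, maxDist, 0 };
    for (const Bucket& b : buckets)
    {
        // Nothing inside this bucket can be closer than this lower bound.
        const float lowerBound = Dist(q, b.center) - b.radius;
        if (lowerBound >= r.distance)
            continue; // pruned by the current (shrunken) query radius

        ++r.bucketsSearched;
        for (Point p : b.points)
        {
            const float d = Dist(q, p);
            if (d < r.distance) { r.found = true; r.closest = p; r.distance = d; }
        }
    }
    return r;
}
```

To use coherence, measure the distance to last frame's closest object first and pass a value just above that as `maxDist`; visiting buckets roughly near-to-far gets the radius small early, so most of the rest get pruned.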


Draw the sky, with color, to the frame buffer.

Render the whole world to the Z-buffer, no color writes.

Now render the whole world with ZTest = Equals,
doing color writes.

The result of this is that each color pixel is written
only once (not counting the sky).

When drawing to the color buffer, alpha-blend, and
set alpha based on distance from the eye, just like you
would set the fog value (transparent at z_far, opaque
at fog_near).

The result is that the world blends into the sky color
in the distance.  You also only run your expensive shaders
once per pixel.

After all this, draw alpha polygons back to front,
Z testing, with their alpha value scaled in the same
way.
In some cases, it may be nice to paint an approximation
of your far geometry into the sky dome; for example,
mountains in geometry would have paintings of mountains
behind them, and everything would fade together nicely.

This eliminates the anomaly which occurs when the fog
color doesn't match the color of the sky behind it.
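
A sketch of the alpha computation, assuming a simple linear ramp between fog_near and z_far (the text only fixes the two endpoints, so the ramp shape is my assumption). Function names are illustrative, and one color channel is shown.

```cpp
#include <algorithm>
#include <cassert>

// 1 = opaque at fogNear or closer, 0 = fully transparent at zFar or beyond.
// Linear ramp in between (an assumed shape).
inline float DistanceAlpha(float eyeDistance, float fogNear, float zFar)
{
    assert(zFar > fogNear);
    const float a = (zFar - eyeDistance) / (zFar - fogNear);
    return std::min(1.0f, std::max(0.0f, a));
}

// Standard alpha blend of the world pixel over the sky already in the frame buffer.
inline float BlendOverSky(float worldColor, float skyColor, float alpha)
{
    return worldColor * alpha + skyColor * (1.0f - alpha);
}
```

Since the sky is already in the frame buffer and each world pixel is written exactly once, this blend does the job of fog with the sky itself as the fog color.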


We've been thinking a lot about object-space bump-mapping here
at OddWorld, and it seems to me that it's much more attractive
than what people usually do, which is surface-local space
bump-mapping.  Let me review for clarity, and I'll assume for
the moment static objects (eg. not skinned) :

"surface local bump mapping"

normal map is surface-local; eg. flat surfaces have normals
 that are just unit z

per vertex, you must store a local frame (eg. as two vectors)

in a v-shader, transform the light vector into surface-local
 space at each vertex.

this surface-local-per-vertex L vector is interpolated and
 handed to the pshader

in the pshader :
L may no longer be normalized, so renormalize if you desire

dot L with N from the normal map


"object space bump mapping"

normal map is in object space; eg. very colorful

provide L in object space as a pixel-shader constant

zero per-vertex work needed, no local frame needed

in the pshader :
dot L with N from the normal map
no renormalization needed
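
Written out on the CPU, the object-space pixel work is roughly the following. This is only a sketch: the 8-bit encoding (n = c/127.5 - 1) and the names `Texel`, `decodeNormal` and `objectSpaceDiffuse` are assumptions, and real hardware would do the decode and dot in one DP3.

```cpp
#include <algorithm>
#include <cstdint>

struct Vec3 { float x, y, z; };
struct Texel { std::uint8_t r, g, b; };

// Assumed encoding: each channel maps 0..255 onto -1..+1.
Vec3 decodeNormal(Texel t) {
    return { t.r / 127.5f - 1.0f, t.g / 127.5f - 1.0f, t.b / 127.5f - 1.0f };
}

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// lightDirObject is the unit light direction in object space; it is the
// same for every pixel of the object, so it is a constant, not a varying.
float objectSpaceDiffuse(Texel normalTexel, Vec3 lightDirObject) {
    return std::max(0.0f, dot(decodeNormal(normalTexel), lightDirObject));
}
```

Nothing here touches a per-vertex frame, which is the whole point: the only per-object work is transforming L into object space once.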


I actually did object-space bump mapping in Galaxy1 because
it can be done with the fixed-function pipe; all you do is
put the light vector (encoded as a color) in the tfactor, and
you just have a single DP3 operation!

Anyway, here are the disadvantages of object-space bump
mapping :

1. cannot tile or reuse the normal map; that is, the geometry
and the normal map are tied explicitly

2. the normal map does not palettize as well.  surface-local
bump maps take pretty well to palettizing.

3. behaves badly under mip-mapping; eg. will change overall
brightness as it goes into the distance

And the advantages are :

1. faster

2. no need to store per-vertex frame, so less memory used

3. no problems with finding a smooth local frame coverage of
your object

(Caveat about #3 : you actually still have this problem, since
you must uv-map your object-space normal map onto the object.
However, this operation is actually much more forgiving than
finding a smooth coverage with local frames.  For example, you
can replicate chunks of pixels in the uv map to patch up seams.
It's mathematically impossible to smoothly put a local frame on
a sphere, but it is completely possible to C0-smoothly cover it
with textures, using overlapping and seam-matching).


Mail to Casey & Blow :

Hey guys, I've been playing around with quats lately,
sparked by the old talks with Casey about how Granny
does quat splines, and Jon's new article.

Anyway, I wanted to see for myself what various quat
interpolators actually looked like, so I coded up
a little test app.  I'm attaching the source for
reference, though you have no hope of compiling it
since it's heavily tangled with our engine.

So, what I found was kind of surprising to me.  It
seems to me that your choice of Quat interpolator
doesn't matter one damn bit.  That is, between any
two quats, you have some interpolator :

Q(t) = Interp( Q(0), Q(1) , t )

Obviously this Interp needs to have certain properties,

Interp(a,b,0) = a
Interp(a,b,1) = b

and ideally it would be C(infinity) (continuous in
all derivatives), monotonic, and various other things.

What surprised me a bit is that the actual shape of
Interp() doesn't matter very much.  There are lots of
choices for Interp : Slerp, Lerp, exp-map lerp, euler
angle lerp, axis/angle lerp, rational map lerp, etc. etc.

You can then build a spline with any of these Interp's.
The key to a good Quat spline is where you put your
control points, not how you interpolate between them!
That is, any funny property of Interp can be balanced by
inserting control points in the right places.
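
To make that concrete, here is a sketch of two of these Interps side by side, Slerp and a normalized Lerp. This is not the test app mentioned above, just an illustration with made-up helper names. Both satisfy the endpoint conditions, and they also agree exactly at t = 0.5, so every control point you insert at a midpoint shrinks the gap between them.

```cpp
#include <algorithm>
#include <cmath>

struct Quat { float x, y, z, w; };

float dot(const Quat& a, const Quat& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

Quat normalize(const Quat& q) {
    float inv = 1.0f / std::sqrt(dot(q, q));
    return { q.x * inv, q.y * inv, q.z * inv, q.w * inv };
}

// Weighted sum wa*a + wb*b (not normalized).
Quat blend(const Quat& a, float wa, const Quat& b, float wb) {
    return { wa * a.x + wb * b.x, wa * a.y + wb * b.y,
             wa * a.z + wb * b.z, wa * a.w + wb * b.w };
}

// Flip b if needed so the interpolation takes the shorter arc.
Quat nearest(const Quat& a, const Quat& b) {
    return dot(a, b) < 0.0f ? Quat{ -b.x, -b.y, -b.z, -b.w } : b;
}

// Normalized lerp: cheap, constant-speed only approximately.
Quat nlerp(const Quat& a, const Quat& b0, float t) {
    Quat b = nearest(a, b0);
    return normalize(blend(a, 1.0f - t, b, t));
}

// Spherical lerp: constant angular speed along the arc.
Quat slerp(const Quat& a, const Quat& b0, float t) {
    Quat b = nearest(a, b0);
    float c = std::min(1.0f, dot(a, b));
    if (c > 0.9995f) return nlerp(a, b, t);  // nearly parallel: avoid 0/0
    float theta = std::acos(c);
    float s = std::sin(theta);
    return blend(a, std::sin((1.0f - t) * theta) / s,
                 b, std::sin(t * theta) / s);
}

// Largest per-component difference, for comparing two results.
float maxAbsDiff(const Quat& a, const Quat& b) {
    return std::max(std::max(std::fabs(a.x - b.x), std::fabs(a.y - b.y)),
                    std::max(std::fabs(a.z - b.z), std::fabs(a.w - b.w)));
}
```

For two keys 90 degrees apart, the two interpolators differ by less than 0.01 per component at t = 0.25; halve the key spacing and the difference drops much further.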

One thing that really got me started on this was Jon's
remapping of time.  The core of the idea is to use a
simple Interp function, but apply a 0->1 remapping of
the time parameter, like :

Interp_Blow(a,b,t) = Interp(a,b, remapper(t) )

Now the thing I immediately thought was : if you're doing
a cubic spline of quats, and remapper is a cubic function,
then you can always get the same remapping with a normal
time variable just by moving your control points around
on the spline.  That is, any set of control points and
a remapper can always be expressed with different control
points and the identity remapper (aka no remapper) (as long
as the order of the spline is >= the order of the remapper polynomial).
Of course, Jon's remapper isn't polynomial, but with a
Rational spline, you actually can get close.

This is true of any difference between various interps.
Any polynomial difference can be exactly compensated by
moving the spline control points.  Any non-polynomial
difference can be approximated with a cubic polynomial.

Addendum :

I just tried the "rational map" stuff by Johnstone and Williams.
This stuff is actually super-nice; the Quats have really nice
properties under the rational map, and the conversion from rational
map to Quat is actually cheaper than a normalize!


Here are some reasons to not use (or wrap) the STL :

If I type this:

ActorRoom * LayoutEntry::MapRoom(const NiRoom * pRoom)
{
	ActorRoomMap::const_iterator it;
	it = mActorRoomMap.find(pRoom);
	if ( it != mActorRoomMap.end() )
		return const_cast<ActorRoom *>(&((*it).second));
	return NULL;
}

Instead of this :

ActorRoom * LayoutEntry::MapRoom(const NiRoom * pRoom) const
{
	ActorRoomMap::const_iterator it;
	it = mActorRoomMap.find(pRoom);
	if ( it != mActorRoomMap.end() )
		return const_cast<ActorRoom *>(&((*it).second));
	return NULL;
}

I get this error :
C:\MUNCH\Code\Actor\ActorManager.cpp(1922) : error C2782: 'bool __cdecl operator !=(const _Tp &,const _Tp &)' : template parameter '_Tp' is ambiguous
        could be 'struct _Rb_tree_iterator,struct pair &,struct pair *>'
        or       'struct _Rb_tree_iterator,struct pair const &,struct pair const *>'
C:\MUNCH\Code\Actor\ActorManager.cpp(1922) : error C2676: binary '!=' : 'struct _Rb_tree_iterator,struct pair const &,struct pair const *>' does not define this operator or a conversion to a type acceptable to the predefined operator

if I type this :

        MapType::const_iterator it;
        it = mMapAllocations.find(key);
        if ( it != mMapAllocations.end() )
instead of this :
        MapType::iterator it;
        it = mMapAllocations.find(key);
        if ( it != mMapAllocations.end() )
Then I get this error :
c:\munch\code\core\owmemory.h(160) : error C2664: 'unsigned int __thiscall hash_map,struct equal_to,class __default_alloc_template<1,0> >::erase(const unsigned long
 &)' : cannot convert parameter 1 from 'struct _Hashtable_const_iterator,unsigned long,struct hash,struct _Select1st >,struct equal_to,class __default_alloc_template<1,0> >' to 'const unsigned long &'
        Reason: cannot convert from 'struct _Hashtable_const_iterator,unsigned long,struct hash,struct _Select1st >,struct equal_to,class __default_alloc_template<1,0> >' to 'const unsigned long'
        No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
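
For what it's worth, here is a sketch of how both cases can be written so the iterator types always match. All names (`Layout`, `Room`, `eraseAllocation`) are hypothetical, and `std::unordered_map` stands in for the pre-standard `hash_map`. In the first case, `find()` and `end()` are both called through a const reference, so both sides of the comparison are const_iterators even inside a non-const method; in the second, a plain `iterator` is used because the container is about to be modified.

```cpp
#include <cstddef>
#include <map>
#include <unordered_map>

struct Room { int id; };

class Layout {
public:
    void add(int key, int id) { rooms_[key] = Room{id}; }

    // Non-const method: go through a const reference so that find() and
    // end() return the same iterator type.
    Room* mapRoom(int key) {
        const std::map<int, Room>& rooms = rooms_;
        std::map<int, Room>::const_iterator it = rooms.find(key);
        if (it != rooms.end())
            return const_cast<Room*>(&it->second);
        return nullptr;
    }

private:
    std::map<int, Room> rooms_;
};

// Erasing modifies the container, so look the key up with a mutable iterator.
bool eraseAllocation(std::unordered_map<unsigned long, std::size_t>& allocs,
                     unsigned long key) {
    std::unordered_map<unsigned long, std::size_t>::iterator it = allocs.find(key);
    if (it == allocs.end())
        return false;
    allocs.erase(it);
    return true;
}
```

Since C++11 the standard containers accept a const_iterator in `erase`, so the second error should not show up with a current library; the sketch just shows the form that works either way.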


[from 3d algs, for my reference] :

I can't find any shadowing technique which actually works.

1. Stencils : forget about the hard edges and problems with transparency, that would be fine if there were no other problems. The big problem here is the amount of CPU work and fill rate burned. It's fine if this is the primary feature of your game, *and* your environments are quite compact and isolated (eg. Blade of Darkness, Malice), but this is not ok for a general purpose engine nor for a game with large outdoor environments. Stencils are O(L*N) where L = # of lights, N = # of objects.

2. Shadow maps : no self shadowing is unfortunate. More worrisome are the problems with the onset of the shadow map, eg. the "front clip plane" of the shadow projection. There's also the problem that if all objects shadow all other objects, then shadow maps can be O(L*N*N) (since you must make a map for each light-caster pair and then cast that onto all castees); in practice it isn't that bad because there are generally few important casters and they affect only a few objects. Aliasing is a problem. Large memory use for all the shadow maps is a problem. One nice thing is that shadow maps can be cached when they're not changing, but that relies on semi-static lighting and environments. Another nice thing is that LOD is easy : just generate lower resolution shadow maps in the distance, but making that transition smooth is another problem.

3. Self-shadowing shadow maps (such as index buffers; shadow-z-buffers) : way too expensive and full of artifacts to apply to entire worlds. They can be great on single objects. A hybrid technique is possible : use a shadow+depth map to self-shadow an object, and then use a normal shadow map to cast that object's shadow onto other objects. This assumes mostly non-intersecting objects. The shadow+depth map must fade out in the distance.

Finally, with all of these, if you want to do it right you must actually accumulate the shadowing in the framebuffer for multiple lights on a given object. That is, the rendering of each object must look like :

  render object with ambient to FB
  for each light :
    add onto FB :
      object lit by the light; multiplied by
      all shadows cast by that light

Where there may be several terms due to diffuse and specular lighting. This actually "fades out" pretty easily; as objects get farther from the camera, you increase their "drop a light" tolerance so more lights are averaged into one; thus in the distance objects only have one virtual light on them and you can use the old fast single-light method with no accumulation in the frame buffer.
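
The accumulation above, written out for a single pixel. This is a sketch with made-up types (`LightTerm`, `accumulate`); shadow factors are assumed to be 0 for fully shadowed and 1 for unshadowed, and the lit color is assumed to already include whatever diffuse and specular terms you use.

```cpp
#include <vector>

struct Color { float r, g, b; };

// One light's contribution at a pixel: the object lit by that light, plus
// one attenuation factor per shadow that light casts onto this pixel.
struct LightTerm {
    Color lit;
    std::vector<float> shadows;  // 0 = fully shadowed, 1 = unshadowed
};

Color accumulate(Color ambient, const std::vector<LightTerm>& lights) {
    Color fb = ambient;  // "render object with ambient to FB"
    for (const LightTerm& light : lights) {
        float shadow = 1.0f;
        for (float s : light.shadows)
            shadow *= s;  // all shadows cast by that light
        fb.r += light.lit.r * shadow;  // "add onto FB"
        fb.g += light.lit.g * shadow;
        fb.b += light.lit.b * shadow;
    }
    return fb;
}
```

The key property is that a shadow only darkens the light that cast it; the ambient term and the other lights are untouched.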


I've been thinking a lot about lighting (for computer graphics) recently. I'd like to do a little brain dump, though not with the rigor or formality of a "techdoc", since I haven't actually tried any of these thoughts yet.

Current lighting models (eg. the Phong model, or the Lambertian diffuse model) which we use in computer graphics are quite poor. They tend to make shading much too harsh (this is largely due to not simulating radiosity, the many bounces of light) or too plastic or too mirror-like. Real materials and lights are somewhere in between. I see two basic failings (there are others, but they aren't really significant IMHO) :
1. Poor modeling of the interaction of light with the surface; this is the "local" surface response to diffuse, global, and reflected light. Real surfaces have complex relations which determine the amount (and color) of re-emitted and reflected light given a light source in a given input direction.
2. Simulation of lights as point sources. In reality, lights are extended objects; when you see a large "specular spot" in real life, you're actually seeing an image of the extended light source; furthermore, real objects are actually lit from virtual light sources in *all* directions due to the secondary bounces of radiosity.

So, first let's think of some cases where these failings really show up. I'm going to try to point out real-world scenarios that look dramatically different in games and typical real-time 3d applications. Of course, there are lots of examples from funny objects, like those cloth Christmas tree balls, or CD's, or oil slicks on roads, or coated sun-glasses, etc. but those aren't really that important (though they are cool as an occasional "gee-whiz" in games). I want to talk about normal stuff that's all around us. Let's start with a piece of paper. If you look at it straight on, such that the vector from your eye to the paper is aligned with the normal of the paper, then the paper will look very flat-lit; Lambertian diffuse models this illumination quite well (with modifications, see later). Now hold the paper so you're looking at it from a glancing angle. Hold it so that a light source is on the other side of the paper from you; if you rotate the paper around a bit, you'll see an image of the light source in the paper, that is, a specular reflection! This exact same phenomenon happens with asphalt roads. Standing above the road, the road looks quite diffuse; now look at the road in the distance, towards the sun; you'll see a bright spot on the asphalt (this is independent of the mirage effect on hot roads which is unrelated but also interesting). Both of these are due to the fact that most surfaces are actually reflecting at glancing angles (skin is another). On surfaces that are only slightly reflecting, like these, it's a reasonable approximation to only consider the few bright light sources which contribute to the reflection; on highly reflective surfaces like metals, you'll actually see the images of *all* the diffuse light arriving at the surface (eg. all objects). This glancing reflection tends to produce a glow around objects which are between the viewer and a light. You can hold a finger up towards the sun to see this. 
This effect is modeled by a "Fresnel factor" which is actually one of the few parts of computer graphics lighting models which comes rigorously from physical wave theory. Many surfaces have color-dependent Fresnel factors, so that light is more or less reflective at glancing angles at different wavelengths (due to the dependence of the index of refraction on wavelength).

Some more cases are informative. First of all, you may notice that micro-structure is rarely visible under diffuse illumination. What I mean by micro-structure is things like the tiny bumps on most machined plastics, the grooves and brush marks on metal, the clumps in the concrete of buildings, the fingerprints and wrinkles on your skin, etc. Under weak indirect light, these details tend to wash out. Furthermore, under bright sunlight, they also tend to wash out (except when they have enough relief to cast shadows on the surface). Thus, I propose to ignore these structures for diffuse illumination. For specular illumination (eg. when you see a reflection of the light source), they're quite important. Again, hold your finger towards the sun, such that you can see the under-side of your finger. You'll actually see your finger-print highlighted by the sun; the grooves and peaks will be alternately bright and dark. The same thing happens with grain and ridges on wood, etc. (all the examples above). Thus I propose that micro-structure is actually quite important to specular reflection. Almost everything is slightly reflecting (skin is an important one), but nothing but metals and mirrors will produce a big smooth specular highlight - everything else has this microstructure which will make the specular "spotty" or "grainy", and is quite important to the look.

The thing I'll point out is the failure of simple lighting models. In the room you're in right now, the primary light is probably from the sun. Pick any object and hold a screen (such as your body) between it and the sun. Hmm.. the object is still very brightly lit. Direct illumination is a terrible approximation. We generally correct for this failure with an ambient term, but if you look at your object you'll see that it's not flat lit; objects which are flat-lit by ambient look horribly wrong, they have no depth cues, so they literally look flat. Take a look outside on a cloudy day. There's no direct illumination from the sun; everything is lit from indirect light coming off the clouds and other objects. Some of the "depth" and contrast of the lighting is gone; in particular the microstructure I mentioned before is not pronounced, and objects don't have a strong "N dot L" Lambertian diffuse look. Instead, objects are lit by a sort of uniform isotropic radiosity. What you end up seeing is that surfaces are shaded based on how exposed the surface is to its environment. For example, the flat wall of a building will be well lit, but around a window sill the curves and edges will be darker. Basically, you can imagine any point on a surface; make a little hemisphere there centered on the normal; shoot rays from that hemisphere and see how far they go until you hit something : the longer all these rays, the more light you'll have. Thus, convex corners are lit brightly, and concave corners are darker. This is like a "local self occlusion" for objects. Another good example are trees - the interior branches and leaves, and the trunk, will be darker than the outer leaves due to self-shadowing of the general diffuse radiosity. There's nothing like N dot L diffuse lighting because there's no particular diffuse light direction! (actually, that's not quite right, but more on that later). 
I propose to improve the look of "ambient" light based on these observations of cloudy days and objects in rooms where no direct light source is visible.
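
Here is a rough sketch of that hemisphere test, to pin down what I mean. Assumptions for brevity: the surface normal is +z, the scene is just a list of spheres, and "how far the rays go" is capped at maxDist and averaged. `exposure` and `rayDistance` are illustrative names, not an existing API.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
Vec3 sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Sphere { Vec3 center; float radius; };

// Distance along a ray (unit direction) to the first sphere hit, capped at maxDist.
float rayDistance(const std::vector<Sphere>& scene, Vec3 origin, Vec3 dir, float maxDist) {
    float best = maxDist;
    for (const Sphere& s : scene) {
        Vec3 oc = sub(origin, s.center);
        float b = dot(oc, dir);
        float c = dot(oc, oc) - s.radius * s.radius;
        float disc = b * b - c;
        if (disc < 0.0f) continue;  // ray misses this sphere
        float t = -b - std::sqrt(disc);
        if (t > 1e-4f && t < best) best = t;
    }
    return best;
}

// Average normalized ray length over the hemisphere around +z at point p.
// 1 means fully exposed; smaller values mean nearby occluders.
float exposure(const std::vector<Sphere>& scene, Vec3 p, float maxDist,
               int rings = 8, int segments = 16) {
    const float kTwoPi = 6.2831853f;
    float sum = 0.0f;
    for (int i = 0; i < rings; ++i) {
        // Uniform steps in cos(theta) give equal solid angle per sample.
        float cosT = (i + 0.5f) / rings;
        float sinT = std::sqrt(1.0f - cosT * cosT);
        for (int j = 0; j < segments; ++j) {
            float phi = kTwoPi * (j + 0.5f) / segments;
            Vec3 d = { sinT * std::cos(phi), sinT * std::sin(phi), cosT };
            sum += rayDistance(scene, p, d, maxDist) / maxDist;
        }
    }
    return sum / (rings * segments);
}
```

Baked into a texture, this value is what I call the surface-occlusion or "self-radiosity" map later on.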

It's interesting to note that the basic Lambertian Diffuse lighting is actually very hard to ever see in real life. It's visible primarily under extreme lighting conditions, such as with a flash-light at night. During the day under sunlight, the most important terms are the ambient/radiosity I described above and the specular. Leaves are another cool thing that's typically done very badly in games. There are several things about them that are wrong. First, the majority of leaves are highly specular, and the micro-structure (eg. the veins and ridges) becomes quite evident in the specular. Second, leaves are rarely actually diffusely lit in a Lambertian sense, since they all scatter light off each other, so that the light arriving at any particular leaf is generally non-directional; the difference in light between the front and back side of a leaf is usually pretty subtle, and could even be encoded in a texture. Finally, leaves are actually semi-transparent, so that they appear lit from the back side when the sun is on the opposite side; this can be quite pronounced when a tree is between the viewer and the sun - the leaves around the edges actually "glow" with the light of the sun shining through them.

So, let's get on to my proposals for improvements. First, let's talk about the diffuse component. There are a few basic terms here; we want to simulate all the interaction with light other than the direct reflection of light. There are two texture maps which I'll need. One is just the standard color texture map. This map should *not* contain any darkening shading which wouldn't disappear under severe illumination; for example, the trunk of a tree should not be darkened in the texture map, because a spot-light would fully illuminate it, whereas cracks in wood can be darkened, because the gathered lacquer and dirt there will look dark under any reasonable illumination. The other map I'll need is the surface-occlusion map which I described above (shooting rays from hemispheres). This may either be painted by hand or computed offline. It may be per-object independent, or computed with contributions from the object's environment included. This map is gray-scale and may be placed in the alpha channel of the diffuse texture if that alpha channel isn't needed for transparency. In this map you might have slight darkness at the corners of rooms, darkness on tree-trunks, in macroscopic grooves (as in monitors and grilles). So, away we go computing the diffuse lighting. The first step I propose is to use an ambient light from an environment map (cube map, parabolic map, as you choose). This gives the ambient a bit of shading which is very important; this ambient environment map encodes the illumination from all the static not-too-bright diffuse lights, so you might even put your sun and scene lights in it depending on how bright and/or dynamic they are. 
One way to generate this map is to start with an environment map which just encodes the incoming light as a function of angle, let's call this M1[L] (map 1 is a function of L, which is implicitly a light vector, converted into a 2d-texture coordinate via some sort of uv mapping); you make a new map which convolves a filter against this map, let's call it M2[N] (map 2 is indexed by the normal of the object); you might just use N dot L as your convolution filter, or you might get fancier; this can be done offline, or even on-line using render-to-texture pretty quickly. To look up the "ambient" light you simply look up M2[N] for the surface. This could be done per-pixel. You may also choose to paint M2 by hand; in outdoor scenarios it works well to simply paint the upper hemisphere lightly and to smoothly shade into a darker lower hemisphere. We then multiply this lighting by the "self-radiosity map" (SR map henceforth) which I described above. Now, for each dynamic or bright light in the scene, we add in its diffuse lighting. We could do this with simple per-vertex standard diffuse lighting; then multiply by the SR map; you should probably do this with a 2x scale so that bright lights can over-brighten the SR map (note that there's no need to darken the SR map; instead darken your lights and then scale up by 2x after multiplying the lights vs. the SR map). If you prefer, you can do your diffuse lighting per pixel, using one pass per light (only lights not in the ambient map are needed). Attenuation by distance may be done on a per-object basis except when multiple objects are linked together, in which case the seams are visible unless attenuation is done per-pixel or per-vertex (accurate attenuation is not at all important to visual quality; in fact 1/r^2 attenuation for point lights is a poor model for area light sources). I like to use a "softened" Lambertian lighting. Instead of using N dot L, I use (N dot L + softness)/(1 + softness), and then clamp to 0. 
This basically lets the light reach around the corner where the normal is perpendicular to the light. It makes the shading look less harsh, and is a weak simulation of radiosity and the micro-variation of normals on the model (eg. where the normal of the surface is perpendicular to the light source, there are actually many micro-structures still facing the light!). Finally, you multiply the result by the texture. In all places where I use a normal (N) you could either use a vertex normal or a bump-mapped normal. If you do use a vertex normal, you can do phong-shading by using a pixel operation which interpolates the normal (as a color) and then uses that to do a lookup to a normalizer cube map, and then does the dot-product with the light using that looked-up normal. This is the same as using a bump-map technique with just a flat height map! Using bump-maps is a good choice sometimes, but they should be used with very subtle height variations only, because the aliasing for large height variations is too severe, and the lighting differences are too strong; they are especially useful as a way of separating the lighting and the geometry, which lets you do more seamless LOD changes of the geometry. So ends the diffuse section.
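
The softened Lambertian term and the 2x SR-map multiply, as a small sketch. `softLambert` follows the formula in the text directly; `dynamicDiffuse` is my reading of the "multiply by the SR map with a 2x scale" step, with the final clamp to 1 standing in for frame-buffer saturation (an assumption).

```cpp
#include <algorithm>

// (N dot L + softness) / (1 + softness), clamped to 0.
// softness = 0 gives plain Lambert; larger values let light wrap past
// the point where the normal is perpendicular to the light.
float softLambert(float nDotL, float softness) {
    return std::max(0.0f, (nDotL + softness) / (1.0f + softness));
}

// One dynamic light's diffuse term: lights are authored at half brightness,
// multiplied by the SR map, then scaled by 2 so a bright light can
// over-brighten a dark SR texel.
float dynamicDiffuse(float nDotL, float softness, float lightIntensity, float srMap) {
    float lit = softLambert(nDotL, softness) * lightIntensity * srMap;
    return std::min(1.0f, 2.0f * lit);
}
```

With softness = 0.25, a surface edge-on to the light (N dot L = 0) still gets 20% of the light, and the term only reaches zero at N dot L = -0.25.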

Now we want to add on the specular/reflecting term. This needs to be done for almost all surfaces in the world, but typically you'll only need to consider one light (the brightest). Outdoors, and/or during the day, this will always be the sun. With multiple lights you can use Tom Forsyth's neat math for using the brightest light and making gradual transitions as the identity of the brightest changes. So, we've already got our diffuse lit object in the frame buffer. Basically, we're going to add on our specular contribution by rendering the object again in additive mode. No matter what we do, we'll multiply the specular by a Fresnel term. For a mirror, this is basically 1 everywhere; for other surfaces, it makes the reflectance weaker in the normal direction, while keeping it strong off-normal; it may be colored (since it's based on the index of refraction, which is wavelength-dependent). You can do a great Fresnel term simply by taking V * N (view dot normal) and using it to look up a one dimensional texture. You'll multiply this by the specular that we generate in the rest of the pipe.
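
A sketch of the one-dimensional Fresnel lookup. To fill the table I'm using Schlick's approximation, F = F0 + (1 - F0)(1 - V.N)^5, which is a substitution of mine for illustration; you could just as well paint the table by hand or compute it from the full Fresnel equations. `buildFresnelTable`, `lookupFresnel` and the parameter `f0` (reflectance at normal incidence) are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Fill a 1d "texture" indexed by V dot N in [0, 1] using Schlick's approximation.
std::vector<float> buildFresnelTable(float f0, std::size_t size) {
    assert(size >= 2);
    std::vector<float> table(size);
    for (std::size_t i = 0; i < size; ++i) {
        float vDotN = static_cast<float>(i) / static_cast<float>(size - 1);
        table[i] = f0 + (1.0f - f0) * std::pow(1.0f - vDotN, 5.0f);
    }
    return table;
}

// Linearly filtered lookup, like a clamped 1d texture fetch.
float lookupFresnel(const std::vector<float>& table, float vDotN) {
    assert(table.size() >= 2);
    float c = std::min(1.0f, std::max(0.0f, vDotN));
    float pos = c * static_cast<float>(table.size() - 1);
    std::size_t i0 = static_cast<std::size_t>(pos);
    std::size_t i1 = std::min(i0 + 1, table.size() - 1);
    float frac = pos - static_cast<float>(i0);
    return table[i0] + (table[i1] - table[i0]) * frac;
}
```

For a colored Fresnel term you would build one table per channel with a different f0 each; the lookup stays the same.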

There are a couple of ways to do the remaining specular bit (eg. to find what light should be reflected and how much). One of the best is to simply use an environment map. You may use a micro-structure bump-map if you can do an EMBM operation (using the bump-map normal to generate a reflection vector to look up the environment map); the bump-map should be hi res and tiled, to simulate micro structures like I described earlier, NOT to simulate large scale bumps (that looks like ass because of the low resolution). To model imperfect reflectors, you could filter your environment map with a reflection convolution kernel. If your environment is really dynamic, a nicely painted chrome map works awfully well, especially with imperfect and/or rough reflectors (not so good for near-mirror objects, but you shouldn't have any of those anyway!!).

For objects that are nothing like real reflectors, your other option is to use a BRDF at this phase; this lets you model anisotropic reflectors (eg. ones that have a preferred direction on the local surface, like the grooves on a CD, or the fibers on cloth, or flat-lying fur). The simplest BRDF is just the phong specular dot-product to a power model; this can again be done with a micro-structure bump map, and works quite well (along with the Fresnel term) for things like skin and paper that are poor mirrors and also not anisotropic. In fact, for these simple cases, the Fresnel term accounts for most of the error in typical game rendering. The nicest thing about the BRDF is that it can encode different color responses in different directions, which is cool for membrane surfaces like plants and skin which have fancy internal reflections. I'll leave off the details of this, you can find it elsewhere. To use the BRDF, you've got to pick one or a few point light sources to compute your specular from. You'll also want to multiply your specular by the "SR map" from the diffuse days; generally you need not use 2X here, because areas that are dark in the "SR" map really should not have much specular contribution, even from very bright lights.
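
The simplest case, Phong specular times the Fresnel term times the SR map, looks something like this. It is a sketch under assumptions: all vectors are unit length and point away from the surface, the `fresnel` value comes from whatever V dot N lookup you use, and the function names are made up.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Mirror the light direction about the normal: R = 2 (N.L) N - L.
Vec3 reflectAboutNormal(Vec3 l, Vec3 n) {
    float k = 2.0f * dot(n, l);
    return { k * n.x - l.x, k * n.y - l.y, k * n.z - l.z };
}

// Phong "dot-product to a power" specular, scaled by the Fresnel term and
// the SR map (no 2x here, so dark SR texels stay dark).
float phongSpecular(Vec3 n, Vec3 l, Vec3 v, float power, float fresnel, float srMap) {
    Vec3 r = reflectAboutNormal(l, n);
    float rDotV = std::max(0.0f, dot(r, v));
    return std::pow(rDotV, power) * fresnel * srMap;
}
```

With a micro-structure bump map, n would come from the tiled hi-res normal map instead of the vertex normal, which is what breaks the highlight up into the "grainy" look.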

There seems to be a bit of a problem in the specular arena with making a transition between the two choices here, the environment map and the BRDF; it would be nice if we had a parameter we could tweak which would continuously take us between the two.

Charles Bloom / cb at my domain