
01-25-09 | Low Level Threading Junk Part 2.01

I'm going to update Part 2 with some fixes and I'd like to write a Part 3 about some actual lock-free madness, but those may have to wait a tiny bit. In the meantime I thought I would post some links.

These are some very good summaries, mainly by the MSDN guys :

MSDN - Lockless Programming Considerations for Xbox 360 and Microsoft Windows
MSDN - Concurrency What Every Dev Must Know About Multithreaded Apps
MSDN - Understanding Low-Lock Techniques in Multithreaded Apps
MSDN - Synchronization and Multiprocessor Issues (Windows)
Intel - Multiple Approaches to Multithreaded Applications

These articles have a good discussion of the memory models of various processors ; in particular Part 1 has a nice table that shows what each processor does (warning : there may be some slight wrongness in here about x86).

Memory Ordering in Modern Microprocessors, Part I
Memory Ordering in Modern Microprocessors, Part II

Here comes a mess of links about memory models and barriers and what's going on. There's a lot of contradiction (!!) so be careful : For example - does LFENCE do anything at all on x86 for normal aligned integral reads? good question... btw SFENCE definitely does nothing there (I think?).

Bartosz Milewski - Publication Safety on Multicore
Hans Boehm - Why atomics have integrated ordering constraints
Bartosz Milewski - C Atomics and Memory Ordering
YALFAm (Yet Another Lock Free Approach, maybe) - comp.programming.threads Google Groups
Who ordered memory fences on an x86 «   Bartosz Milewski’s Programming Cafe
The Double-Checked Locking is Broken Declaration
Synchronization and the Java Memory Model
Re Memory fence instructions on x86
Re LFENCE instruction (was [rfc][patch 33] x86 optimise barriers)
Re Intel x86 memory model question 2
Re Intel x86 memory model question 1
Memory Barriers, Compiler Optimizations, etc. - comp.programming.threads Google Groups
LFENCE instruction (was [rfc][patch 33] x86 optimise barriers)
cbrumme's WebLog Memory Model

And now some links about some real mad shit. Do not try any of this at home. I've found two mad geniuses : Chris Thomasson ("the AppCore guy") who has got some nutty x86 lock-free synchronization stuff that relies on details of x86 and does safe communication with zero fencing, and Dmitriy V'jukov ("the Relacy guy") who has written a ton of nutty awesome stuff for Intel.

Wait free queue - comp.programming.threads Google Groups
Thin Lock Implementation «   Bartosz Milewski’s Programming Cafe
Paul E. McKenney Read-Copy Update (RCU)
Multithreaded Producer-Consumer The Easy Way
Larry Osterman's WebLog So you need a worker thread pool...
julian m bucknall Hazard Pointers (careful - wrong stuff in here)
Dr. Dobb's Lock-Free Queues July 1, 2008
Computer Laboratory - Cambridge Lock Free Library
CodeProject A simple Win32 readerswriters lock with reentrance. Free source code and programming help
Atomic Ptr Plus Project
AppCore A Portable High-Performance Thread Synchronization Library
Re Non-blocking queue for the 2 threads (AppCore)
refcount - mostly lock-free word-based atomic reference counting (AppCore)
Asymmetric rw_mutex with atomic-free fast-path for readers - Scalable Synchronization Algorithms Google Groups
Asymmetric Dekker Synchronization

There's also Intel's TBB :

Threading Building Blocks Home

It's a horrible name, it's not "Threading Building Blocks" at all. It's fucking OpenMP. It's not really a lib of little threading helpers, which would be handy, it's like fucking CUDA for CPUs.

Also just to repeat - I am in no way recommending that anybody go down this path. As I have said repeatedly - CRITICAL_SECTION is really fucking fast and smart and does exactly what you want. Just use it. If critical section locks are giving you problems it's because your architecture is wrong, not because the micro-routine is not fast enough.

01-25-09 | Low Level Threading Junk Part 2

Okay, we're starting to learn some primitive threading junk. So let's try to use it to do something useful.

Non-trivial statics in C++ are hideously non-thread-safe. That should be extremely obvious but if you don't see it, read here : The Old New Thing C++ scoped static initialization is not thread-safe, on purpose! . Obviously plain C static initialization of integral types is fine (BTW floats are NOT fine - beware using const floats in C++). Beware all things like this :

Object & Instance() { static Object s; return s; }

void func(int i)
{
    static Object once;
}

Not thread safe. For the most part, we should just not use them. It's very nice to do minimal work in CINIT. Much brouhaha has been made about the "thread safe singleton" (see all these) :

opbarnes.com Code Snippet Released - Thread-safe Singleton (for Windows platforms)
Dr. Dobb's Singleton Creation the Thread-safe Way October 1, 1999 - BROKEN
Dr. Dobb's C++ and The Perils of Double-Checked Locking Part I July 1, 2004
Double-checked locking - Wikipedia, the free encyclopedia

For my money this is a little silly because there's a really damn trivial solution to this. The main point of the old C++ initialize-on-demand Singleton pattern (like Instance() above) is to make it work during CINIT with complex initialization order dependencies. Now, CINIT we know is run entirely on one thread before any threads are made. (you are fucking nuts if you're making threads during CINIT). That means we know any Singletons that get made during CINIT don't have to worry about threading. That means you can use a trivial singleton pattern, and just make sure it gets used in CINIT :

static Object * s_instance = NULL; // okay cuz its a basic type

Object * GetSingleton()
{
    if ( ! s_instance )
        s_instance = new Object;
    return s_instance;
}

static volatile Object * s_forceUse = GetSingleton();

The forceUse thing just makes sure our GetSingleton is called during cinit, so that we can be sure that we're all made by the time threads come around. The GetSingleton() that we have here is NOT thread safe, but it doesn't need to be ! (btw I assume that "CreateThread" acts as a MemoryBarrier (??))

Okay, that's nice and easy. What if you want a Singleton that doesn't usually get made at all, so you don't want to just make it during CINIT ? (you actually want it to get made on use). Okay, now we have to worry about thread-safing the construction.

The easiest case is when you know your Singleton won't get used during CINIT. In that case you can just use a Critical Section :

static CCriticalSection s_crit;     // CRITICAL_SECTION wrapper class will construct during CINIT
static Object * s_instance = NULL;

// broken version :

Object * GetSingleton()
{
    if ( ! s_instance )
    {
        s_crit.Lock();      // Lock/Unlock : assumed names for the wrapper's methods
        s_instance = new Object;
        s_crit.Unlock();
    }
    return s_instance;
}

Okay, that was easy, but it's horribly broken. First of all, there's no guarantee that Object is only made once. The thread can switch while the instruction pointer is between the first if() and the critsec lock. If that happens, some other thread can get in and make s_instance, and then when we come back to execute we run through and make it again. (If you like, you could say we put the critsec in the wrong place - we could fix it by moving the critsec out of the if). Even aside from that, the line that assigns s_instance is all wrong because the pointer s_instance is not necessarily being written atomically, and it might be written before the stuff inside Object is written. What did we learn in Part 1?

// good version :

static CCriticalSection s_crit;     // CRITICAL_SECTION wrapper class will construct during CINIT
static Object * volatile s_instance = NULL;

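Object * GetSingleton()
{
    if ( ! s_instance ) // must be memory_order_consume
    {
        s_crit.Lock();      // Lock/Unlock : assumed names for the wrapper's methods
        if ( ! s_instance )
        {
            Object * pNew = new Object;
            InterlockedExchangeRelease( &s_instance , pNew );
        }
        s_crit.Unlock();
    }
    return s_instance;
}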

This is a "double-checked lock" that works. The purpose of the double-check is to avoid taking the critsec when we can avoid doing so. If instance is NULL we take the critsec, then we have to check again to make sure it's still null, then we rock on and make sure that the Object's memory is flushed before s_instance is set, by using a "Release" memory barrier. Also using Interlocked ensures the pointer is written atomically.
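For comparison, here's a minimal sketch of the same double-checked lock written with C++11 std::atomic and std::mutex instead of the Win32 calls. That's a swap of primitives, not what the code above uses : std::mutex stands in for the critsec wrapper, load(acquire) plays the role of the first unlocked check, and store(release) replaces InterlockedExchangeRelease. The Object type and its "constructed" counter are made up so there's something to check.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical payload type; the counter only exists so we can check
// that the constructor ran exactly once.
struct Object
{
    static int constructed;
    int value;
    Object() : value(42) { ++constructed; }
};
int Object::constructed = 0;

static std::mutex s_mutex;                        // stands in for s_crit
static std::atomic<Object *> s_instance(nullptr); // stands in for the volatile pointer

Object * GetSingleton()
{
    // first check, no lock : acquire pairs with the release store below
    Object * p = s_instance.load(std::memory_order_acquire);
    if ( ! p )
    {
        std::lock_guard<std::mutex> lock(s_mutex);
        // second check, under the lock : someone may have beaten us here
        p = s_instance.load(std::memory_order_relaxed);
        if ( ! p )
        {
            p = new Object;
            // release : Object's fields are written before the pointer is visible
            s_instance.store(p, std::memory_order_release);
        }
    }
    return p;
}
```

Every caller gets the same pointer back, and only the first caller to get through the lock pays for the construction.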

ADDENDUM : I need to add some more notes about this. See comments for now.

Okay, that's all good, but it doesn't work if you now ever try to use this Singleton from CINIT. The problem is that s_crit might not be constructed yet. There's a pretty easy solution to that - just check if s_crit has been initialized, and if it hasn't, then don't use it. That works because we know CINIT is single threaded, so you can do something like :

class Object { } ;
static CRITICAL_SECTION s_crit = { 0 };
AT_STARTUP( InitializeCriticalSection(&s_crit); );
static Object * s_instance = NULL;

Object * GetSingleton()
{
    if ( ! s_instance )
    {
        if ( s_crit.DebugInfo == 0 ) // DebugInfo will always be nonzero for an initialized critsec
        {
            // must be in CINIT !
            s_instance = new Object;
        }
        else
        {
            EnterCriticalSection(&s_crit);
            if ( ! s_instance )
            {
                Object * pNew = new Object;
                InterlockedExchangeRelease( &s_instance , pNew );
            }
            LeaveCriticalSection(&s_crit);
        }
    }
    return s_instance;
}

This actually works pretty dandy. Note that in CINIT you might take the upper or lower paths - you have no way of knowing if GetSingleton() will be called before or after the critsec is initialized. But that's fine, it works either way, by design. Note that we are crucially relying here on the fact that all non-trivial CINIT work is done after all the simple-type zeroing.

Okay, so that's all fine, but this last thing was pretty ugly. Wouldn't it be nice to have a critical section type of mutex object that can be statically initialized so that we don't have to worry about CINIT mumbo jumbo ?

Well, yes it would, and it's pretty trivial to make a standard simple Spin Lock thing :

typedef LONG SpinMutex; // initialize me with = 0

void SpinLock(SpinMutex volatile * mut)
{
    // 0 is unlocked, 1 is locked
    COMPILER_ASSERT( sizeof(SpinMutex) == sizeof(LONG) );
    int spins = 0;
    while( InterlockedCompareExchange((LONG *)mut,1,0) == 1 ) // Acquire
    {
        ++spins;
        if ( spins > CB_SPIN_COUNT )
        {
            Sleep(0);
        }
    }
    // *mut should now be 1
}

void SpinUnlock(SpinMutex volatile * mut)
{
    // *mut should now be 1
    InterlockedDecrement((LONG *)mut); // Release
    // *mut should now be 0
    // if there was a higher priority thread stalled on us, wake it up !
}

Note that SpinLock will deadlock you if you try to recurse. Okay, now because this thing is static initializable we can use it to make a Singleton and be safe for CINIT :

static SpinMutex s_mutex = 0;
static Object * volatile s_instance = NULL;

Object * GetSingleton()
{
    if ( ! s_instance )
    {
        SpinLock(&s_mutex);
        if ( ! s_instance )
        {
            Object * pNew = new Object;
            InterlockedExchangeRelease( &s_instance , pNew );
        }
        SpinUnlock(&s_mutex);
    }
    return s_instance;
}

And that works too.
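If you're not on Windows, the same kind of statically initializable spin lock can be sketched with C++11 std::atomic. This is only a sketch under assumptions : kSpinCount is a made-up stand-in for CB_SPIN_COUNT, SpinTryLock is a helper I added so the lock can be poked at without blocking, and std::this_thread::yield is used where the Windows version would sleep.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// 0 is unlocked, 1 is locked; a std::atomic<int> with a constant initializer
// is set up before any dynamic (CINIT-time) initialization runs.
typedef std::atomic<int> SpinMutex;

const int kSpinCount = 1000; // hypothetical stand-in for CB_SPIN_COUNT

bool SpinTryLock(SpinMutex & mut)
{
    int expected = 0;
    // acquire on success : nothing after the lock moves before it
    return mut.compare_exchange_strong(expected, 1,
                std::memory_order_acquire, std::memory_order_relaxed);
}

void SpinLock(SpinMutex & mut)
{
    int spins = 0;
    while ( ! SpinTryLock(mut) )
    {
        if ( ++spins > kSpinCount )
            std::this_thread::yield(); // give up the time slice instead of burning it
    }
}

void SpinUnlock(SpinMutex & mut)
{
    // release : everything done under the lock is visible before it opens
    mut.store(0, std::memory_order_release);
}
```

Same caveat as the Win32 version : it is not recursive, so a thread that calls SpinLock twice on the same mutex never comes back.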

Now, people get tempted to use these simple SpinLocks all over. *DON'T DO IT*. It looks like it should be faster than CRITICAL_SECTION but in fact it's way way slower in bad cases.

First of all, what is a CRITICAL_SECTION ? See : Matt Pietrek - Critical Sections Under Windows .

It starts out as a simple spin lock like the above. It does the same kind of thing to busy-spin the processor first. But then it doesn't just Sleep. This kind of spin-and-sleep is a really really bad thing to do in heavy threading scenarios. If you have lots of threads contending over one resource, especially with very different execution patterns the spin-and-sleep can essentially serialize them (or worse). They can get stuck in loops and fail to make any progress.

Once CRITICAL_SECTION sees contention, it creates a kernel Event to wait on, and puts the thread into an alertable wait. The Windows scheduler has lots of good mojo for dealing with events and threads in alertable waits - it's the best way to do thread sleeping and wake ups generally. For example, it has mojo to deal with the bad cases of heavy contention by doing things like randomly choosing one of the threads that is waiting on a critsec to wake up, rather than just waking up the next one (this prevents a few runaway threads from killing the system).

One important note : you should almost always use "InitializeCriticalSectionAndSpinCount" these days to give your crit secs a spinner. More about why in a moment.

About performance : (from one of the MSDN articles) :

    * MemoryBarrier was measured as taking 20-90 cycles.
    * InterlockedIncrement was measured as taking 36-90 cycles.
    * Acquiring or releasing a critical section was measured as taking 40-100 cycles.

which clearly shows that the idea that a critical section is "too slow" is nonsense. I've said this already but let me emphasize :

    Most of the time (no contention) a Crit Sec *is* just an InterlockedIncrement
        (so how could you beat it by using InterlockedIncrement instead?)

    When there is contention and the Crit Sec does something more serious (use an Event) it's slow
        but that is exactly what you want !!

In fact, the big win from "lock free" is not that InterlockedIncrement or whatever is so much faster - it's that you can sometimes do just one interlocked op, unlike crit sec which requires two (one for Enter and one for Leave).
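To make the "one op versus two" point concrete, here's a small sketch of the same counter done both ways. It uses std::mutex and std::atomic as portable stand-ins for a critsec and InterlockedIncrement; the function and variable names are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

static std::mutex g_lock;               // stands in for a critsec
static long g_lockedCount = 0;          // only touched while holding g_lock
static std::atomic<long> g_atomicCount(0);

// Two synchronizing ops per call : one to take the lock, one to drop it.
long IncrementLocked()
{
    std::lock_guard<std::mutex> hold(g_lock);
    return ++g_lockedCount;
}

// One synchronizing op per call : a single atomic read-modify-write,
// the portable cousin of InterlockedIncrement.
long IncrementLockFree()
{
    return g_atomicCount.fetch_add(1) + 1;
}
```

Both return the new count. The lock-free one only wins because the whole job fits in a single atomic op; as soon as you need to touch two things together you're back to a lock.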

An interesting alternative is this QLock thing by Vladimir Kliatchko : Dr. Dobb's Developing Lightweight, Statically Initializable C++ Mutexes May 1, 2007 (QLock) ; it's quite a bit like the MS critsec, in that he uses a simple interlocked op to check for contention, and if it's not free then he makes an Event and goes into a wait state and all that. The code is there on Dr. Dobbs but you have to do the thing where you hit the "Download" in the top bar and then navigate around to try to find it. it's here in MUTEX.LST .

Ok, now a bit more about InitializeCriticalSectionAndSpinCount. The reason you want a Spin is because basically every new machine these days is "multiprocessor" (multicore). That's a big difference from threading on a single core.

If you're threading on a single core - memory will never change underneath you while you are executing. You can get swapped out and memory can change and you can get swapped in, so it looks like memory changed underneath you, but it doesn't happen in real time. With multiple cores memory can be touched by some other core that is running along at full speed.

Often this doesn't affect the way you write threading code, but it does affect performance issues. It means you can have contention on a way finer scale. In a normal OS single proc environment, you get large time slices so other threads aren't frequently randomly poking at you. With real multicore the other guy can be poking at you all the time. A good range is to spin something like 1000-5000 times; 1000 spins is on the order of a microsecond.

What the spin count does is let you stay in the same thread and avoid an OS thread switch if some other processor is holding the lock. Note that if it's a thread on the same processor - spinning is totally pointless (in fact it's harmful).
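The spin count idea can be sketched in a few lines : poll the lock a bounded number of times, and only block in the OS if it never frees up. This is a rough illustration with std::mutex, not how Windows implements it; LockWithSpin is a made-up helper.

```cpp
#include <cassert>
#include <mutex>

// Try the lock up to spinCount times without blocking; if it never frees up,
// fall back to a blocking lock() and let the OS put the thread to sleep.
// Returns true if the lock was taken during the spin phase.
bool LockWithSpin(std::mutex & m, int spinCount)
{
    for ( int i = 0; i < spinCount; ++i )
    {
        if ( m.try_lock() )
            return true;
    }
    m.lock();
    return false;
}
```

Either way the caller owns the lock on return and has to unlock() it. The return value is only there so you can see which path was taken.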

01-25-09 | Low Level Threading Junk Part 1

Okay, this is definitely one of those posts where I'm no expert by a long shot so I'll probably write some things that are wrong and you should correct me. By "low level" I mean directly accessing shared variables and worrying about what's going on with the memory, as opposed to just using the language/OS constructs for safe sharing.

Let me say up front that writing lots of low-level thread-sharing code is a very very bad idea and should not be done in 99% of cases. Just use CriticalSection and once in a while use Interlocked and don't worry about the tiny inefficiency; if you do things right they won't be a speed hit. Trying to get things right in lots of low level threading code is a recipe for huge bugs.

I'm going to assume you know about race conditions and basic threading primitives and such. If you're hazy, this is a pretty good introduction : Concurrency What Every Dev Must Know About Multithreaded Apps . I'm also going to assume that you know how to do simple safe thread interaction stuff using Interlocked, but maybe you don't know exactly what's going on with that.

First of all, let me try to list the various issues we're dealing with :

1. Multiple threads running on one core (shared cache). This means you can get swapped in and out; this is actually the *easy* part but it's what most people think of as threading.

2. Multiple cores/processors running multiple threads, possibly with incoherent caches. We have #1 to deal with, but also now the memory views of various threads can be different. The memories sync up eventually, but it's almost like they're talking through a delayed communication channel.

3. CPU OOO (out-of-order) instruction reorder buffers and cache gather/reorder buffers. Instructions (even in ASM) may not execute in the order you wrote, and even if they do execute in that order, memory reads & writes may not happen in the order of the instructions (because of cache line straddle issues, write gather buffers, etc.)

4. Single CISC CPU instructions being broken into pieces (such as unaligned accesses). Single ASM instructions may not be single operations; things like "inc" become "load, add, store" and there's an opportunity for a thread to interleave in there. Even apparently atomic ops like just a "load" can become multiple instructions if the load is unaligned, and that creates a chance for another thread to poke in the gap.

5. The compiler/optimizer reordering operations. Obviously things don't necessarily happen in the order that you write them in your C program.

6. The compiler/optimizer caching values or eliminating ops

I think that's the meat of it. One thing that sort of mucks this all up is that x86 and MSVC >= 2005 are sort of special cases which are much simpler than most other compilers & platforms. Unfortunately most devs and authors are working with x86 and MSVC 2005+ which means they do lazy/incorrect things that happen to work in that case. Also I'm going to be talking about C++ but there are actually much better memory model controls now in Java and C#. I'm going to try to describe things that are safe in general, not just safe on x86/Windows, but I will use the x86/Windows functions as an example.
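Going back to item 4 in the list, the "inc becomes load, add, store" problem is easy to show without any real threads. The sketch below replays one bad interleaving by hand so the result is deterministic, then does the same two increments with an atomic read-modify-write. The function names are made up, and std::atomic is used as a portable stand-in for InterlockedIncrement.

```cpp
#include <atomic>
#include <cassert>

// "inc" on a plain int is really load, add, store. Replay one bad
// interleaving of two threads by hand, so the result is deterministic.
int LostUpdateDemo()
{
    int shared = 0;
    int regA = shared;   // thread A : load
    int regB = shared;   // thread B : load (sneaks in before A stores)
    regA += 1;           // thread A : add
    regB += 1;           // thread B : add
    shared = regA;       // thread A : store
    shared = regB;       // thread B : store, overwrites A's increment
    return shared;       // 1, not 2
}

// With an atomic read-modify-write the load/add/store cannot be split,
// so two increments always add up to 2 whichever thread goes first.
int AtomicDemo()
{
    std::atomic<int> shared(0);
    shared.fetch_add(1); // thread A
    shared.fetch_add(1); // thread B
    return shared.load();
}
```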

Almost every single page I've read on this stuff gets it wrong. Even by the experts. I always see stuff like this , where they implement a lock-free fancy doohickey, and then come back later and admit that oh, it doesn't actually work. For example this Julian Bucknall guy has a lot of long articles about lock free stuff, and then every 3rd article he comes back and goes "oh BTW the last article was wrong, oops". BTW never try to use any of the lock free stuff from a place like "codeproject.com" or anybody's random blog.

I've read a lot of stuff like :

Unfortunately, Matt's answer features what's called double-checked locking which isn't supported by the C/C++ memory model.

To my mind, that's a really silly thing to say. Basically C/C++ doesn't *have* a memory model. Modern C# and Java *do* which means that the language natively supports the ideas of memory access ordering and such. With C/C++ you basically have zero guarantees of anything. That means you have to do everything manually. But when you do things manually you can of course do whatever you want. For example "double checked locking" is just fine in C, but you have to manually control your memory model in the right way. (I really don't even like the term "memory model" that much; it's really an "execution model" because it includes things like the concept of what's atomic and what can be reordered).

Some things I'm going to talk about : how lock free is really like spin locking, how critical sections work, why you should spincount critical sections now, what kind of threading stuff you can do without critsecs, etc.

Something I am NOT going to talk about is the exact details of the memory model on x86/windows, because I don't think you should be writing code for a specific processor memory model. Try to write code that will always work. x86/windows has strong constraints (stores are not reordered past stores, etc. etc. but forget you know that and don't rely on it).

Let's look at a simple example and see if we can get it right.

Thread A is trying to pass some work to thread B. Thread B sits and waits for the work then does it. Our stupid example looks like :

// global variables :
MyStruct * g_work = NULL;

Thread A :

int main(int argc, char ** argv)
{
    g_work = new MyStruct( argc, argv );
    // ...
}

Thread B :

void ThreadB()
{
    while( g_work == NULL )
    {
        // wait for the work
    }
    // ... then do the work in g_work
}

Okay, now let's go through it and fix it.

First of all, this line should give you the heebie-jeebies :

    g_work = new MyStruct( argc, argv );
It's doing a bunch of work and assigning to a shared variable. There are no guarantees about what order that gets written to memory, so g_work could be assigned before the struct is set up, then ThreadB could start poking into it while I'm still constructing it. We want to release a full object to g_work that we're all done with. We can start trying to fix it by doing :
1.  MyStruct * temp = new MyStruct( argc, argv );
2.  g_work = temp;
that's good, but again you cannot assume anything about the memory model in C or the order of operations. In particular, we need to make sure that the writes to memory done by line 1 are actually finished before line 2 executes.

In Windows we do that by calling MemoryBarrier() :

    MyStruct * temp = new MyStruct( argc, argv );
    MemoryBarrier();
    g_work = temp;
MemoryBarrier is an intrinsic in MSVC ; it actually does two things. 1. It emits an instruction that causes the processor to force a sync point (this also actually does two things : 1A : flushes caches and write gather buffers and 1B. puts a fence in the reorder buffer so the processor can't speculate ahead). 2. the MemoryBarrier intrinsic also acts as a compiler optimizer barrier - so that the MSVC compiler won't move memory operations from one side of the MemoryBarrier to the other.

MemoryBarrier is a full memory fence, it creates an ordering point. In general if you just write memory operations :

    A
    B
    C

you can't say what order they actually happen in. If another thread is watching that spot it might see C,A,B or whatever. With MemoryBarrier :

    A
    B
    MemoryBarrier
    C

You get an order constraint : C is always after {AB} , so it might be ABC or BAC.
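Here's roughly what that "C is always after {AB}" constraint looks like in portable C++11, using std::atomic_thread_fence as a stand-in for MemoryBarrier. That substitution is my assumption, not something from the Windows docs, and the variable and function names are made up. Note the reader needs a fence too, otherwise it can still read A and B early.

```cpp
#include <atomic>
#include <cassert>

static int g_a = 0;                 // "A" : plain store
static int g_b = 0;                 // "B" : plain store
static std::atomic<int> g_c(0);     // "C" : the store other threads watch

void Writer()
{
    g_a = 1;                                             // A
    g_b = 2;                                             // B
    std::atomic_thread_fence(std::memory_order_seq_cst); // full fence
    g_c.store(1, std::memory_order_relaxed);             // C
}

// Returns a+b if C has been seen, -1 otherwise.
int Reader()
{
    if ( g_c.load(std::memory_order_relaxed) == 0 )
        return -1;
    // matching fence on the reading side, so A and B are read after C
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return g_a + g_b;
}
```

A reader that sees C set is guaranteed to get 3 back, whichever order A and B actually landed in.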

Another digression about the compiler optimizer fence : in windows you can also control just the compiler optimization with _ReadWriteBarrier (and _ReadBarrier and _WriteBarrier). This doesn't generate a memory fence to the processor, it's just a command to the optimizer to not move memory reads or writes across a specific line. I haven't seen a case where I would actually want to use this without also generating a memory barrier (??). Another thing I'm not sure about - it seems that if you manually output a fence instruction with __asm, the compiler automatically treats that as a ReadWriteBarrier (??).

Alright, so we're getting close, we've made a work struct and forced it to flush out before becoming visible to the other thread :

    MyStruct * temp = new MyStruct( argc, argv );
    MemoryBarrier();
    g_work = temp; <-
What about this last line? It looks innocuous, but it holds many traps. Assignment is atomic - but only if g_work is a 32 bit pointer on 32-bit x86, or a 64 bit pointer on 64-bit x86. Also since g_work is just a variable it could get optimized out or deferred or just stored in local cache and not flushed out to the bus, etc.

One thing we can do is use "volatile". I hesitate to even talk about volatile because it's not well defined by C and it means different things depending on platform and compiler. (In particular, MS has added lots of threading usefulness to volatile, but nobody else does what they do, so don't use it!). What we want "volatile" for here is to force the compiler to actually generate a memory store for g_work. To my mind I like to think that volatile means "don't put me in registers - always read or write memory". (again on x86 volatile means extra things, for example volatile memory accesses won't get reordered, but don't rely on that!). Note you might also have to make sure g_work is aligned unless you are sure the compiler is doing that.

One thing to be careful with about volatile is how you put it on a pointer. Remember to read pointer adjectives from right to left in C:

    volatile char * var;

    // this is a non-volatile pointer to a volatile char !! probably not what you meant

    char * volatile var;

    // this is a volatile pointer to chars - probably what you meant
    //  (though of course you have to make sure the char memory accesses are synchronized right)
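If you don't trust yourself to read the declaration right, you can get the compiler to tell you which thing ended up volatile. This is a small sketch using C++11 type traits; the variable names are made up, and it only checks the types, it says nothing about whether your accesses are actually synchronized.

```cpp
#include <cassert>
#include <type_traits>

volatile char * ptrToVolatile = nullptr;   // pointer itself is NOT volatile
char * volatile volatilePtr = nullptr;     // pointer itself IS volatile

// the pointer object
static_assert( ! std::is_volatile<decltype(ptrToVolatile)>::value, "plain pointer" );
static_assert(   std::is_volatile<decltype(volatilePtr)>::value,   "volatile pointer" );

// the thing pointed at
static_assert(   std::is_volatile<std::remove_pointer<decltype(ptrToVolatile)>::type>::value, "volatile char" );
static_assert( ! std::is_volatile<std::remove_pointer<decltype(volatilePtr)>::type>::value,   "plain char" );
```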

Note that volatile is a pretty big performance hit. I actually think most of the time you should just not use "volatile" at all, because it's too variable in its meaning, and instead you should manually specify the operations that need to be sync'ed :

On Windows there's a clean way to do this :

    MyStruct * temp = new MyStruct( argc, argv );
    InterlockedExchangePointer( (PVOID volatile *) &g_work, temp );

The Interlocked functions are guaranteed to be atomic. Now we don't have to just hope that the code we wrote actually translated into an atomic op. The Interlocked functions also automatically generate memory barriers and optimizer barriers (!! NOTE : only true on Windows, NOT true on Xenon !!). Thus the InterlockedExchangePointer forces MyStruct to get written to temp first.
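The portable C++11 analogue of this publish step is an exchange on a std::atomic pointer. This is a sketch under assumptions : MyStruct's contents are never shown here, so the one below is hypothetical, and PublishWork / PollWork are made-up names for the thread A and thread B halves.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical work item; the real MyStruct's contents are unknown.
struct MyStruct
{
    int argc;
    explicit MyStruct(int c) : argc(c) { }
};

static std::atomic<MyStruct *> g_work(nullptr);

// Thread A : build the object completely, then publish it in one atomic op.
void PublishWork(int argc)
{
    MyStruct * temp = new MyStruct(argc);
    // exchange() defaults to memory_order_seq_cst, i.e. a full barrier;
    // the previous value is ignored here
    g_work.exchange(temp);
}

// Thread B : returns null until work shows up.
MyStruct * PollWork()
{
    // acquire : reads of the struct's fields cannot move above this load
    return g_work.load(std::memory_order_acquire);
}
```

Thread B would call PollWork() in its wait loop and only touch the struct once it gets a non-null pointer back.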

Let me just briefly mention that this full MemoryBarrier is *very* expensive and you can get away with less. In particular, something you will see is "Acquire" and "Release". The heuristic rule of thumb is that you use "Acquire" to read shared conditions and "Release" to write them. More formally, "Acquire" is a starting memory barrier - it means any memory op done after the Acquire will not move before it (but ones done before can move after). "Release" is a finishing memory barrier - memory ops done before it will not move after (but ones done after can be done before).

So if you have :

    A
    B - Acquire
    C - Release
    D

The actual order could be {A B C D} or {B A C D} or {A B D C} but never {A C B D}. Obviously Acquire and Release are a slightly weaker constraint than a full barrier so they give the processor more room to wiggle, which is good for it.

So let's do another example of this simple thread passing (stolen from comp.programming.threads) :

int a;
int b;
LONG valid = 0;

void function1 () { // called from many threads

  if (! valid) {
    a = something;
    b = something;
    InterlockedExchangeAddRelease(&valid, 1);
    // writes to memory 'a' and 'b' will be done before valid is set to 1
  }
}

int function2 () { // called from many threads

  if (InterlockedExchangeAddAcquire(&valid, 0)) {
    return a + b;
  } else {
    return 0;
  }
}

int function3 () { // broken

  if ( valid ) {
    return a + b;
  } else {
    return 0;
  }
}

Now it's easy to fall into a trap of thinking that because we did the "Release" that function3 is okay. I mean, by the time function3 sees valid get set to 1, 'a' and 'b' will already be set, so function 3 is right, okay? Well, sort of. That would be true *if* function 3 was in assembly so the compiler couldn't reorder anything, and if the chip couldn't reorder memory ops (or if we rely on the x86 constraint of read ordering). You should know the actual execution of function 3 could go something like :

fetch a
fetch b
add b to a
test valid
set conditional a to 0
return a

which is now reading a and b before 'valid'. Acquire stops this.

Some good links on this basic memory barrier stuff : (read Kang Su in particular)

Kang Su's Blog volatile, acquirerelease, memory fences, and VC2005
Memory barrier - Wikipedia, the free encyclopedia
Acquire and Release Semantics
The Old New Thing Acquire and release sound like bass fishing terms, but they also apply to memory models
Memory ordering, barriers, acquire and release semantics niallryan.com
Memory barrier + volatile (C++) - comp.programming.threads Google Groups

BTW note that volatile does some magic goodness in VC >= 2005 , but *not* on Xenon even with that compiler version. In summary :

ReadBarrier/WriteBarrier - just prevents compiler reordering

MemoryBarrier() - CPU memory load/store barrier

Interlocked & volatile
    Interlocked functions on Windows automatically generate a compiler & a CPU barrier
    Interlocked on Xbox 360 does not generate a CPU barrier - you need to do it

special volatile thing in MSVC ?
    volatile automatically does the right thing on VC >= 2005
    not for XBox though

ADDENDUM : lol the new Dr. Dobbs has a good article by Herb called volatile vs volatile . It covers a lot of this same territory.

And some more links that are basically on the same topic : (the thing with the valid flag we've done here is called the "publication safety pattern")

Bartosz Milewski - Publication Safety on Multicore
Hans Boehm - Why atomics have integrated ordering constraints
Bartosz Milewski - C Atomics and Memory Ordering

01-24-09 | linkies

Complexification Gallery - wow really gorgeous images. All made algorithmically with Processing, and most driven by semi-physical-mathematical models. Lots of applets to play with. This is seriously fucking amazing and inspiring.

Killzone 2 making of video is pretty awesome.

Mischief and Ivo Beltchev have some crazy debugger database plugin thing for the string-CRC model. IMO this is fucking nuts. But I do love me some autoexp.dat

We were talking about this the other day and I realized I've forgotten what "Koenig lookup" is and why you need it. Basically it just means that functions are looked up in the namespace of their arguments. So :

namespace my_ns
{
    class Vec3 { ... };

    void func( Vec3 & x ) { ... }
}

int main()
{
    my_ns::Vec3 v;
    func(v);
}

works, even though it's calling a func() that's not in the global namespace.

If you think about it a second this is nice, because it means that non-member functions on a class act like they are in the same namespace as that class. This is pretty crucial for non-member operators; it would be syntactically horrific to have to call the operators in the right namespace. But it's nice for other stuff too, it means you don't need to jam everything into member functions in order to make it possible to hand a class out to another namespace.

namespace my_ns
{
    class Vec3 { ... };

    bool operator == (const Vec3 &a , const Vec3 & b) { ... }
}

int main()
{
    my_ns::Vec3 a,b;

    if ( a == b )  // ***
    {
        ...
    }
}
At the line marked *** we're calling my_ns::operator == (Vec3,Vec3) - but how did we get to call a function in my_ns when we're not in that namespace? Koenig lookup.

Now, this really becomes crucial when you start doing generic programming and using templates and namespaces. The reason is your containers in the STL are in std:: namespace. You are passing in objects that are in your namespace. Obviously the STL containers and algorithms need to get access to the operations in your namespace. The only way they can do that is Koenig lookup - they use the namespace of the type they are operating on. For example to use std::sort and make use of your " operator < " it needs to get to your namespace.
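Here's a small compilable sketch of that std::sort case. The Vec3 layout and the sort-by-x rule are made up for illustration; the point is only that std::sort, which lives in std::, finds an operator < that lives in my_ns.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

namespace my_ns
{
    struct Vec3
    {
        float x, y, z;
    };

    // non-member operator living next to its type, not in std:: or global
    bool operator < (const Vec3 & a, const Vec3 & b)
    {
        return a.x < b.x;
    }
}

// sorts by x; nothing here names my_ns::operator < explicitly
void SortByX(std::vector<my_ns::Vec3> & v)
{
    std::sort(v.begin(), v.end());
}
```

Inside std::sort the comparison is just written as "a < b" on two Vec3s, and argument dependent lookup drags in my_ns because that's where Vec3 lives.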

See Herb's good articles :

What's In a Class - The Interface Principle
Namespaces and the Interface Principle
Argument dependent name lookup - Wikipedia, the free encyclopedia

Not exactly related but other good C++ stuff : Koenig has a Blog .

And if you want to get really befuddled about C++ name lookup you can mix this in : Why Not Specialize Function Templates .

Finally, I've had "dos here" for a long time, but of course it really should be "tcc here". Just follow these registry instructions but for the "command" put :

"C:\Program Files\JPSoft\TCCLE9\tcc.exe" /IS cdd %1

01-22-09 | endir

One of the most useful things I've done recently is "endir.bat" :

endir.bat :

if not exist "%1" endbat
md zz
call mov %1 zz\
call zren zz %1

It puts an object into a dir with the same name as that object. This sounds silly but it's a super useful starting point for lots of things. For example, you have a bunch of music folders named :

Stupid Band NAme - Ironic Album Title

You want to make that

Stupid Band Name / Ironic Album Title

a dir inside a dir. You just go "endir" on the original single dir, it makes a dir inside a dir. Now to do the rename you just truncate each name where you want it. (it's impossible to write a general renaming rule for this because people use all kinds of different separators; I've literally seen names like "And_You_Will_Know_Us_By_The_Trail_Of_Dead_Source_Tags_And_Codes").

01-20-09 | Laptops Part 3

A little more reportage from the laptop front. Holy christ looking for laptops sucks so bad. They're basically all the same inside, but of course people can't just name them clearly based on what they're like. You have to look into every single fucking model in great detail to see what's actually going on with it. Some general thoughts :

802.11n (aka "Wireless N") seems cool; I wasn't really aware of that. The Intel Wifi 5100 and 5300 seem like the way to go. A lot of newer laptops don't have 802.11n which seems pretty dumb.

As much as I hate it, Vista-64 seems like the way to go for the future. All the rotten programmers out there are gonna bloat things up to all hell in the next few years, and I imagine you'll need 8 GB of RAM to run anything in 2010. Hell photoshop already wants 2GB+ for itself. A 32 bit OS locks you into 3 GB which is gonna suck in 5 years.

XP can definitely be found for laptops, but it does greatly reduce your choices. The Dell business line offers "XP downgrades" on most laptops. Lenovo Thinkpads and Toshiba Satellites also have some XP options. Oddly it seems that all of the Mini (10" or less) lappies are on XP.

LED backlighting is brighter, lighter weight, thinner, and uses less power (than the older CCFL backlighting). That all sounds good, *however* I have seen a lot of reports of problems with LED screens. Unfortunately it seems to be one of those things where they vary by screen manufacturer (LG or Samsung or whoever), and every laptop brand sources their screens from various manufacturers, so you never know what you're going to get. The problems I've seen reported are edge bleeding, flickering, brightness variation, and "chunky" appearance. I think it's worth the gamble going with LED now, but you might have to fight your manufacturer for RMA if you get a bad one.

SSD's are clearly awesome for laptops but don't seem quite ready for prime time. Again as with the LED screens, not all SSD's are created equal, and it's hard to tell what brand you're going to get when you buy a laptop (mainly some of the off brands have very very poor write performance). Also, there appear to still be major problems with the bios/driver interaction with SSD's - I've widely seen reports that they completely stall the CPU during heavy write activity, indicating a failure to do asynchronous writes. Oddly the SSD upgrade options for laptops seem uniformly overpriced (just buy your own and stick it in yourself) - but there are some laptops with SSD's included by default that are moderately priced.

As for graphics - I've seen a lot of reports of heat problems with NV 8000 series of mobile chips (See previous laptop post for links). Apparently the 9000 series is better. The ATI chips seem to have less of a problem with heat, however there are many reports of problems with ATI mobile drivers under Vista. Yay. The Intel integrated chips are the best choice if all you care about is battery life.

The NV9400 is an integrated part (integrated with the south bridge / memory controller all that mForce whatever). It's supposed to be pretty good for power and is a good DX10.1 part. It's what's in the MacBook Pro for example. BTW it's a real LOL to see the tech writers speculate about Apple shunting OS work off to CUDA. Buy the marketing nonsense much? BTW also lots of laptop specs lie about the "graphics memory" with NV 9 chips. NV 9 is a shared-memory model chip because it's built into the memory controller, kind of like the XBoxes, so the "1 Gig" of graphics memory they're claiming just means that the NV 9 is using 1 G of your main memory.

Another interesting new thing in graphics is the Thinkpad T400 Switchable Graphics . The T400 has an ATI HD3470 for games and an Intel integrated X4500 for battery life. Under Vista apparently you can switch with just a mouse click - no reboot. Under XP you have to reboot. This is kind of cool, but it's also scary as hell because apparently it uses a hacked-together FrankenDriver. Personally I think just going with the NV9400 is a better choice. (also, the MacBook Pro has a 9400 -> 9600 switch option; that's weird because the 9400 is good enough IMO. anyway apparently you have to log out and log back in to switch which is kind of lame; it's not quite a reboot, but it's not a mouse click either).

There are a lot of shitty screens out there. I ranted before about the prevalence of 1280x800 15" screens. Oddly, there are also 1920x1200 15" screens now, which is equally insane in the opposite direction. (I assume those laptops come with a complimentary magnifying glass). Many laptop manufacturers now seem determined to fuck up the screens for no reason by taking glossy screens, and then PUTTING AN EXTRA GLOSSY FILM ON ALREADY GLOSSY SCREENS. Apple for example does this with the new Macbooks. The manufacturers usually call them "edgeless" or "infinity" screens or something like that, but they should call them "nerfed" or "bricked" screens because they make your laptop useless outside of pitch dark. BTW it's super LOL that the MacBook website shows the pictures of the screen with giant reflections across it; the web site design people must have thought it was cool looking and intentional so they're trying to show off the sweet glare. (the MBP is available with matte I gather, but the MB is not). BTW also this is not a question of taste - yes, matte vs. glossy is a reasonable choice that could go either way, but "glossy with extra gloss layer" is not a reasonable choice.

On the plus side, matte screens definitely can be found (or just plain glossy without the extra stupid layer if you like glossy). With quite a lot of searching I found a bunch of 14" and 15" laptops with matte WXGA+ (1440x900) screens. Something I could not find was non-widescreen (1400x1050). Everything is widescreen now. (you can find 1680x1050 aka WSXGA+ in 15" if you want a bit more res)

Intel Core 2 Duo is the way to go, but there's a little trap. Some of them are 25W and some are 35W. Obviously if you care about heat and battery you want the 25W. It seems the secret code here is a "P" in the name and ignore the number, so P8400 or P9500 is good. Like everything else Intel has decided to jump into fucking retarded obfuscated naming land. Oh yes, of course I want a Penryn on Montevina that's a 9500 (beware, some Penryns on Montevina are 35W). Pretty good guide to Intel nomenclature .

The really thin & light laptops are crazy overpriced right now for no reason. It's the same fricking parts, in fact often they use *cheaper* parts like older CPU's and lesser GPU's, but then they charge almost a $1000 markup for being small. For example the Dell Latitude E4300 is around $1700 but a more capable and otherwise nearly identical (but larger) Latitude E6400 is around $900. This applies obviously to stuff like the Air and the Envy, but also to pretty much every good Sony Vaio. If you want the super light laptop you have to either shell out, or go off-brand, or wait.

There are countless things to verify that many lappies fail on : heat, noise, touchpad works decently, hinges okay, screen latch (!), solid case, gigabit ethernet, eSATA.

Finally, www.notebookcheck.net was my savior in finding laptop information. They actually have all the models gathered and a pretty clear simple list of specs with each one, so you can decode the laptop model jungle. Notebookcheck reviews actually have noise volume measurements and exact temperature measurements. Booya. They do it in Celsius. 30 C is okay. 35 C or more is uncomfortable.

Now the specific models :

Dell Latitude E6400 is one of my favorites. Available in 14.1" with WXGA+ (1440x900) matte LED. It's a metal case. It's reasonably light and small and has decent battery life. The specs are not top of the line but they're fine. It also has no huge "style" markup price. Price range $800-1100. The E5400 is a good budget choice, it's just a bit heavier and bigger and slightly downgraded components, but can be had for $600. (these have no top-end graphics options, they're for non-gamers)

Thinkpad T400 is probably what I'd buy for myself right now if I was buying something. 14.1" with WXGA+ (1440x900) matte LED. Very good battery life. Good build quality and keyboard, though I have seen reports that they are flimsier and the keyboard is worse than the older Thinkpads like the T61 line. T400 is around $1200 well equipped. BTW the Thinkpad R400 and Ideapad Y430 are both very similar to the T400, but not really much cheaper, so just go with the T400.

Dell Studio 15 is a pretty solid middle of the road choice for a 15" lappy. 1440x900 LED , but sadly only glossy. It's rather heavy, battery not great. I rather like the sloping keyboard design of these things, much more ergonomic than the big brick style of the Thinkpads.

HP dv5t is a 15" with 1680x1050. Sadly only glossy and offers the retarded "infinity display" which is awesome if you like seeing reflections of the sky . The HP dv4 and dv5 are the cheapest (major brand) laptops with high end GPUs and decent specs. Battery life is notoriously poor for HP's. They use the dedicated NV chips not the integrated.

The Toshiba Satellites seem okay but have nothing in particular to recommend them over one of these.

The Acer Aspire line is pretty amazing if you want a sub-$500 laptop. The components are basically fine, the only thing sucky about them is shitty plastic build quality. If you're not a style snob, they could be a great choice.

The Sony Vaio Z is a very good ultraportable 13", but expensive ($1699). The cheaper Vaio's like the NR,CS,NZ are not competitive in build quality or specs with the laptops above.

The new "unibody" MacBook (not pro) is actually a decent value; I mean, it's not a good value, but it's not awful. Unfortunately it's only available in awful-gloss.

Lastly, laptop pricing is fucking awful. Because you're locked into the manufacturers, they do weird shit with sales and coupons. The prices can fluctuate wildly from one day to the next. Chances are one of the laptops mentioned here is available with a close to 50% discount right now. So if you are not locked into a particular model and can wait a bit, do so.

01-21-09 | Towards Lock-Free Threading

Currently Oodle threads communicate through plain old mutex locking, but I'd like to go towards lock-free communication eventually. Now, lock-free coding doesn't have the normal deadlock problems, but it has a whole new host of much scarier and harder to debug thread timing problems, because you don't have the simplicity of mutex blocking out concurrency during shared data access.

It occurs to me there's a pretty simple way to make the transition and sort of have a middle ground.

Start by writing your threading using plain old mutexes and a locked "communication region". The communication area can only be accessed while the mutex is held. This is just the standard easy old way :

{Thread A} <---> {Communication Region} <---> {Thread B}

Thread A - lock mutex on CR
    change values ; 
--- Thread B lock mutex
    read values

Now find yourself a lockfree stack (aka singly linked list LIFO). The good old "SList" in Win32 is one fine choice. Now basically pretend that Thread A and Thread B are like over a network, and send messages to each other via the lock-free stacks.

To keep the code the same, they both get copies of the Communication Region :

{Thread A | Communication Region} <---> {Communication Region | Thread B}

and they send messages via two stacks :

{Thread A | Communication Region} --- stack AtoB --> {Communication Region | Thread B}
{Thread A | Communication Region} <-- stack BtoA --- {Communication Region | Thread B}

The messages you send across the stacks apply the deltas to the communication regions to keep them in sync, just like a networked game.

So for example, if Thread A is the "main thread" and Thread B is a "job doer thread" , a common case might be that the communication region is a list of pending jobs and a list of completed jobs.

Main Thread
    {Pending Jobs}
    {Completed Jobs}

Request Job :
    add to my {Pending Jobs}
    push message on stackAtoB

Worker Thread
    {Pending Jobs}
    {Completed Jobs}

Sit and pop jobs off stackAtoB
put in pending jobs list
work on pending jobs
put result on completed jobs list
push completion message on stackBtoA

The nice thing is that Main Thread can at any time poke around in his own Pending and Completed list to see if various jobs are still pending or done yet awaiting examination.
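Here is a rough sketch of the two-stack job scheme in portable C++11, not the actual Oodle code. It assumes a hand-rolled stack in place of the Win32 SList: push is a compare-and-swap loop, and the receiver grabs the whole list at once with a single exchange (the same idea as InterlockedFlushSList), which sidesteps the ABA trouble of popping one node at a time. The names Message, MessageStack and runJobs are invented, and the "job" is just doubling a number:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical message: one delta to apply to the receiver's copy of the region.
struct Message
{
    int jobId;
    int result;
    Message * next;
};

// Minimal lock-free LIFO. push() is a CAS loop; popAll() detaches the
// entire list in one atomic exchange.
class MessageStack
{
public:
    void push(Message * m)
    {
        m->next = m_head.load(std::memory_order_relaxed);
        // On failure compare_exchange_weak reloads the current head into m->next.
        while ( ! m_head.compare_exchange_weak(m->next, m,
                    std::memory_order_release, std::memory_order_relaxed) )
        {
        }
    }

    Message * popAll()
    {
        return m_head.exchange(nullptr, std::memory_order_acquire);
    }

private:
    std::atomic<Message *> m_head { nullptr };
};

// Main thread requests `count` jobs and waits for every completion message.
// Returns the sum of the results.
int runJobs(int count)
{
    MessageStack stackAtoB, stackBtoA;
    std::vector<Message> storage(count); // owned by main, outlives the worker

    std::thread worker([&stackAtoB, &stackBtoA, count]()
    {
        int done = 0;
        while ( done < count )
        {
            Message * m = stackAtoB.popAll();
            if ( ! m ) { std::this_thread::yield(); continue; }
            while ( m )
            {
                Message * next = m->next;  // read before push() overwrites it
                m->result = m->jobId * 2;  // "work on pending job"
                stackBtoA.push(m);         // completion message
                m = next;
                ++done;
            }
        }
    });

    for (int i = 0; i < count; ++i)
    {
        storage[i].jobId = i;
        stackAtoB.push(&storage[i]);       // "Request Job"
    }

    int completed = 0, sum = 0;
    while ( completed < count )
    {
        Message * m = stackBtoA.popAll();
        if ( ! m ) { std::this_thread::yield(); continue; }
        for ( ; m ; m = m->next ) { sum += m->result; ++completed; }
    }

    worker.join();
    return sum;
}
```

For example runJobs(10) doubles the ids 0..9 on the worker and returns 90. A real version would sleep on an event or semaphore instead of spinning with yield, and each side would apply the messages to its own Pending and Completed lists rather than just summing.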

Obviously if you were architecting for lock-free from the start you wouldn't do things exactly like this, but I like the ability to start with a simple old mutex-based system and debug it and make sure everything is solid before I start fucking around with lock-free. This way 99% of the code is identical, but it still just talks to a "Communication Region".


I should note that this is really neither new nor interesting. This is basically what every SPU programmer does. SPU "threads" get a copy of a little piece of data to work on, they do work on their own copy, and then they send it back to the main thread. They don't page pieces back to main memory as they go.

While the SPU thread is working, the main thread can either not look at the "communication region", or it can look at it but know that it might be getting old data. For many applications that's fine. For example, if the SPU is running the animation system and you want the main thread to query some bone positions to detect if your punch hit somebody - you can just go ahead and grab the bone positions without locking, and you might get new ones or you might get last frame's and who cares. (a better example is purely visual things like particle systems)

Now I should also note that "lock free" is a bit of a false grail. The performance difference of locks vs. no locks is very small. That is, whether you use "CriticalSection" or "InterlockedExchange" is not a big difference. The big difference comes from the communication model. Having lots of threads contending over one resource is slow, whether that resource is "lock free" or locked. Obviously holding locks for a long time is bad, but you can implement a "lock free" model using locks and it's plenty fast.

That is, this kind of model :

[ xxxxxxxxxx giant shared data xxxxxxxxxx ]

    |          |         |         |
    |          |         |         |
    |          |         |         |

[thread A] [thread B] [thread C] [thread D]

is slow regardless of whether you use locks or not. And this kind of model :

[data A  ] [data B  ] [data C  ] [data D  ] 
    |     /    |     /   |       / |
    |    /     |    /    |      /  |
    |   /      |   /     |     /   |

[thread A] [thread B] [thread C] [thread D]

is fast regardless of whether you use locks or not. Okay I've probably made this way more wrong and confusing now.
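A small sketch of the second picture, using a plain mutex to make the point that the lock is not the problem. Each thread works on its own slice and its own local accumulator, and only touches the shared total once at the end. The function names and the summing workload are invented for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Each thread gets its own slice of the data and its own accumulator.
// The shared total is locked exactly once per thread.
long long sumPartitioned(const std::vector<int> & data, unsigned numThreads)
{
    assert( numThreads > 0 );

    long long total = 0;
    std::mutex totalMutex;
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + numThreads - 1) / numThreads;

    for (unsigned t = 0; t < numThreads; ++t)
    {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end   = std::min(data.size(), begin + chunk);

        threads.emplace_back([&data, &total, &totalMutex, begin, end]()
        {
            long long local = 0;                  // private to this thread
            for (std::size_t i = begin; i < end; ++i)
                local += data[i];

            std::lock_guard<std::mutex> lock(totalMutex);
            total += local;                       // one short lock per thread
        });
    }

    for (std::thread & th : threads) th.join();
    return total;
}

// Convenience wrapper for trying it out: sums 1..n across `numThreads` threads.
long long sumOneToN(int n, unsigned numThreads)
{
    std::vector<int> data;
    for (int i = 1; i <= n; ++i) data.push_back(i);
    return sumPartitioned(data, numThreads);
}
```

The slow version of the same thing would take the lock (or do an interlocked add) on the shared total for every single element, and it would be slow either way because all the threads are hammering one location.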


Let me try to express it another way. The "message passing" model that I described is basically a way of doing a large atomic memory write. The message that you pass can contain various fields, and it is processed synchronously by the receiver. That makes common unsafe lock-free methods safe. Let me try to make this clear with an example :

You want Thread B to do some work and set a flag when it's done. Thread A is waiting to see that flag get set and then will process the work. So you have a communication region like :

// globals :
    bool isDone;
    int workParams[32];
    int workResults[32];

Now a lot of people try to do lock-free work passing trivially by going :

Thread A :

    isDone = false;
    // set up workParams
    // tell Thread B to start

    ... my stuff ...

    if ( isDone )
        // use workResults

Thread B :

    // read workParams

    // fill out workResults

    isDone = true;

Now it is possible to make code like this work, but it's processor & compiler dependent, can be very tricky, and causes bugs (the problem is that the reads & writes of isDone and the params and results don't all happen together and in order). I think I'll write some more about this in a new post. Instead we can just pass the object :

struct ThreadMessage
{
    bool isDone;
    int workParams[32];
    int workResults[32];
};

Thread A :
    ThreadMessage myLocalCopy; // in TLS or stack
    // fill out myLocalCopy
    // push myLocalCopy to thread message queue to thread B

    ... my stuff ...

    if ( pop my message queue )
        myLocalCopy = copy from queue
        // use myLocalCopy.workResults

Thread B :
    ThreadMessage myLocalCopy; // in TLS or stack
    myLocalCopy = pop message queue

    // read myLocalCopy.workParams

    // fill out myLocalCopy.workResults
    push myLocalCopy to queue for thread A

Okay. Basically we have taken the separate variables and linked them together, so that as far as our thread is concerned they get written and read in one big chunk. That is, we move the shared data from one large consistent state to another.
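The pseudo-code above can be sketched in real C++ roughly like this. This version deliberately uses a boring mutex-protected queue (MessageQueue is a made-up class, not a library type) to show that the safety comes from passing the whole struct in one piece, not from being lock free. The doubling "work" and the name doubleOnWorker are just for the demo:

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <thread>

struct ThreadMessage
{
    bool isDone;
    int workParams[32];
    int workResults[32];
};

// Hypothetical queue: a mutex around a deque. Messages are copied in and
// out whole, so the receiver never sees a half-written message.
class MessageQueue
{
public:
    void push(const ThreadMessage & msg)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_messages.push_back(msg);
    }

    bool tryPop(ThreadMessage * out)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if ( m_messages.empty() ) return false;
        *out = m_messages.front();
        m_messages.pop_front();
        return true;
    }

private:
    std::mutex m_mutex;
    std::deque<ThreadMessage> m_messages;
};

// Thread A's side of the exchange. Returns workResults[index] once the
// worker has sent the message back.
int doubleOnWorker(int firstParam, int index)
{
    assert( index >= 0 && index < 32 );
    MessageQueue toB, toA;

    std::thread threadB([&toB, &toA]()
    {
        ThreadMessage myLocalCopy;
        while ( ! toB.tryPop(&myLocalCopy) ) std::this_thread::yield();

        // read workParams, fill out workResults, all on the local copy
        for (int i = 0; i < 32; ++i)
            myLocalCopy.workResults[i] = 2 * myLocalCopy.workParams[i];
        myLocalCopy.isDone = true;

        toA.push(myLocalCopy);
    });

    ThreadMessage myLocalCopy = ThreadMessage();
    for (int i = 0; i < 32; ++i) myLocalCopy.workParams[i] = firstParam + i;
    toB.push(myLocalCopy);

    // ... my stuff ...

    while ( ! toA.tryPop(&myLocalCopy) ) std::this_thread::yield();
    threadB.join();
    return myLocalCopy.workResults[index];
}
```

Note that isDone is now almost redundant: the message arriving in the queue is the "done" signal, and by the time Thread A can see it, workResults is guaranteed to be complete.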

01-20-09 | Spam

The gmail spam filter is pretty good, it's eliminated like 99% of my spam. However, at the same time, it's kind of disappointing. Spam still gets through, and the spam that gets through is just so obvious, how can you not tell it's spam? I know Google doesn't like have any smart employees or anything like that, so let me help you guys out :

1. If it's an email selling ED drugs, it's probably spam.

2. If I get an email and mark "report spam" on it - you probably shouldn't send me the *exact same email* the next day.

3. If the email is full of crazy non-text characters or weird ANSI color codes, it's probably spam.

I mean, Google is letting through emails whose entire body is :

Buy Viagra&Cialis&Tramadol....

How can you possibly think that's right?

BTW I think it's very productive to have an atmosphere at your company where anybody can challenge anybody else about the behavior of their functionality. So for example in this case I'd like to see every Google employee forwarding their spam to the spam dev team. But in game dev I'd like to see the artists go up to the tools guy and go "hey dude, our level export takes 30 minutes, WTF." This is one of those things that's very hard to get right. If you have a culture where everyone is friendly and everyone is trying to be the best, it can be welcomed. But if your culture is that everyone is just trying to do the minimum of work and fly under the radar and not talk to each other, then this will be very unwelcome.

01-20-09 | Tuesday Random

You shouldn't put flags in the top bits of pointers any more, because there are things like /3GB and others that make your address space be top-down. *However* you can usually still put flags in the bottom bits of your pointers. If all allocations are 8 or 16 byte aligned you've got 3 or 4 bits of flags in the bottom of your pointers. I like to make a little class called PointerAndFlags that lets you get at the pointer part and the flag parts cleanly.
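Something like this is what I mean by PointerAndFlags. This is a quick sketch and not the class from my codebase: the member names, the default of 3 flag bits (which assumes 8 byte aligned allocations), and the demo function are all just for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Stores a pointer and a few flag bits in one word. Assumes every pointer
// stored is aligned to (1 << NumFlagBits) bytes, so the bottom bits are free.
template <typename T, unsigned NumFlagBits = 3>
class PointerAndFlags
{
public:
    PointerAndFlags() : m_bits(0) { }

    explicit PointerAndFlags(T * ptr, std::uintptr_t flags = 0) : m_bits(0)
    {
        set(ptr, flags);
    }

    void set(T * ptr, std::uintptr_t flags)
    {
        const std::uintptr_t p = reinterpret_cast<std::uintptr_t>(ptr);
        assert( (p & kFlagMask) == 0 );      // pointer must be aligned
        assert( (flags & ~kFlagMask) == 0 ); // flags must fit in the low bits
        m_bits = p | flags;
    }

    void setFlags(std::uintptr_t flags)
    {
        assert( (flags & ~kFlagMask) == 0 );
        m_bits = (m_bits & ~kFlagMask) | flags;
    }

    T * getPointer() const { return reinterpret_cast<T *>(m_bits & ~kFlagMask); }
    std::uintptr_t getFlags() const { return m_bits & kFlagMask; }

private:
    static constexpr std::uintptr_t kFlagMask = (std::uintptr_t(1) << NumFlagBits) - 1;
    std::uintptr_t m_bits;
};

// Small self-check used as a usage example.
bool pointerAndFlagsDemo()
{
    alignas(8) static int value = 42;

    PointerAndFlags<int> pf(&value, 5);
    if ( pf.getPointer() != &value || pf.getFlags() != 5 ) return false;

    pf.setFlags(2);
    return pf.getPointer() == &value && *pf.getPointer() == 42 && pf.getFlags() == 2;
}
```

The asserts are the important part; if somebody ever hands you a pointer from an allocator with weaker alignment you want to find out right away rather than silently corrupt the address.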

It occurred to me this morning that we almost have a "Game STL" at RAD. Jeff has a really good sort, and I wrapped it up in templates. I have my old vector (pretty trivial, really), and I recently screwed around with hash tables for Oodle which I could easily wrap in templates. Sort, vector and hash_map are 90% of the good stuff that I use in the STL. I could easily build my hash table on top of my vector, which means that there is only one type of object that does allocations, which is pretty cute.

Unfortunately I don't have time to really do that right, and there's not much motivation for RAD to do it. We couldn't sell it; nobody would buy it (even though they should - it's one of those things that actually has a ton of value, but everybody thinks they can write their own, and producers who do the funding don't understand the value even if the programmers do). It's also one of those things where people are foolishly unwilling to settle for compromises. If there's a very good available solution which doesn't make exactly the trade-offs that you would have chosen, many programmers go off and roll their own. And anyway, even if we could sell it, we shouldn't ; library code is the kind of thing that really needs to be open source, so that a consensus can be formed and everyone's fixes can be integrated, etc.

There's something I've written about previously in regards to poker, but I've been thinking about it a lot recently in other contexts. It's that "tough decisions" don't matter. Basically if a decision is really hard and complex and interesting - then it's probably irrelevant. If the EV of both ways is roughly equal, you don't need to sweat over which choice to take, just flip a coin and take one.

I thought about this the other day watching some interview on the Daily Show where someone was saying how Bill Clinton would sit down with you for hours and talk about policy with you, and how he was a policy nerd and just loved to get into the details. Okay, that's awesome, it's way better than someone who doesn't want to learn or study or think at all, but it's not actually the crucial thing in a president. That is, you don't need a president who gets the really tough issues right. All that really matters is having a president that doesn't *grossly* botch the *easy* decisions.

I think I've written about this in game development and production meetings and such too. I've been in lots of design meetings where we'll get sucked into some long argument about some irrelevant detail where the designers and artists will go over and over for hours about whether the health indicator should be orange or red. The fact is, getting those details right is super irrelevant. There are so many other issues that you're completely screwing up that you don't need to get those little things right.

I was thinking about it just now in terms of evaluating algorithms and STL's and such like. For example with hash tables there are a lot of different implementation choices, you can trade off code complexity vs. speed vs. memory use vs. flexibility. The truth is, almost any reasonable point on that curve would be fine. The only thing that actually matters is that you avoid implementations that are *grossly bad*. That is, they're way off the curve of where the good choices are. Whether you choose something that's slightly faster or not is pretty irrelevant in the scheme of things. (like all of these points, obviously this is dependent on context - decisions that are usually irrelevant can become crucial in some conditions).

01-19-09 | swap, templates, and junk

I keep running into annoying problems.

One is having macro conflicts. I like to just have stuff like

#define MIN(a,b)    ( (a) < (b) ? (a) : (b) )

for myself, which is fine and all, until you try to mix multiple codebases that all have their own idea of what that macro should be.

(obviously MIN is not actually a big deal because everybody agrees on what that should be; a bigger issue is like if people are doing a #define of malloc or SWAP or Log).

The cleanest thing I've come up with is :

1. All #defines in headers must be "namespaced" , like CB_MIN instead of just MIN.

2. Never #define anything that looks like it's in global namespace (eg. no #defining malloc)

3. Make a header called "use_cb.h" which does
    #define MIN     CB_MIN
    #define malloc  cb::malloc


Basically this is a hacky way of "namespacing" the macros and then "using the namespace" in the CPP files. Not awesome.
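Laid out as it might look on disk, with the three "files" squashed into one listing so it compiles as a unit. The file name cb_macros.h, the CB_MAX macro, the #undef guards and clampToRange are all invented for this sketch; only CB_MIN and use_cb.h come from the scheme above:

```cpp
#include <cassert>

// ---- cb_macros.h (hypothetical) : headers only ever define prefixed names ----
#define CB_MIN(a,b)    ( (a) < (b) ? (a) : (b) )
#define CB_MAX(a,b)    ( (a) > (b) ? (a) : (b) )

// ---- use_cb.h : included only from CPP files, after all other headers ----
// The #undefs are an assumption of this sketch, so that somebody else's
// MIN/MAX from an earlier header gets replaced instead of causing a
// redefinition warning.
#ifdef MIN
#undef MIN
#endif
#ifdef MAX
#undef MAX
#endif
#define MIN     CB_MIN
#define MAX     CB_MAX

// ---- some .cpp file : uses the short names ----
int clampToRange(int x, int lo, int hi)
{
    assert( lo <= hi );
    return MAX( lo, MIN( x, hi ) );
}
```

MIN here is an object-like macro that expands to the name CB_MIN, which is then expanded as a function-like macro because a "(" follows it, so MIN(x,hi) ends up as the CB_MIN expression.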

The other issue I'm running into is generic functions and template specialization.

Say you're working with some codebase, let's call it "stupidlib". They have their own ::swap template :

template < typename Type > inline void stupidSwap(Type &a,Type &b)
{
    Type c = a; a = b; b = c;
}

And then they also override it for some special cases :

template < >
inline void stupidSwap<stupidObject>(stupidObject &a,stupidObject &b)
{
    ...
}

Now, you are trying to munge stupidlib into a codebase that just uses the normal STL. It's putting objects in containers, and you want it to use std::swap correctly, but when you call std::swap on a stupidObject - it doesn't see that a nice swap override has been made.

So far as I can tell you cannot fix this. It is impossible to make std::swap redirect to stupidSwap for the cases where stupidSwap has been overridden. You have to do it manually for each object type, and any time somebody changes stupidlib your code can silently break.
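For reference, the manual per-type bridge looks something like the sketch below. It assumes the generic code does its swapping with the "using std::swap; swap(a,b);" idiom, so that a non-template swap overload sitting in the object's own namespace gets picked up by Koenig lookup. The body of the stupidObject specialization, the call counter, swapGeneric and swapBridgeDemo are invented so the thing can be checked:

```cpp
#include <cassert>
#include <utility>

// ---- stand-in for stupidlib ----
struct stupidObject { int value; };

static int g_specialSwapCalls = 0; // only here so the demo can observe the call

template < typename Type > inline void stupidSwap(Type &a,Type &b)
{
    Type c = a; a = b; b = c;
}

template < >
inline void stupidSwap<stupidObject>(stupidObject &a,stupidObject &b)
{
    ++g_specialSwapCalls; // made-up body for the sketch
    int t = a.value; a.value = b.value; b.value = t;
}

// ---- the manual bridge, needed once per specialized type ----
// A plain overload in the same namespace as stupidObject (global here).
inline void swap(stupidObject &a, stupidObject &b)
{
    stupidSwap(a,b);
}

// ---- generic code written with the usual idiom ----
template <typename T>
void swapGeneric(T &a, T &b)
{
    using std::swap; // fallback for types with no swap of their own
    swap(a,b);       // unqualified, so Koenig lookup can find the bridge
}

bool swapBridgeDemo()
{
    const int callsBefore = g_specialSwapCalls;

    stupidObject x = { 1 }, y = { 2 };
    swapGeneric(x, y);   // goes through the bridge to stupidSwap<stupidObject>

    int i = 3, j = 4;
    swapGeneric(i, j);   // plain std::swap

    return x.value == 2 && y.value == 1 && i == 4 && j == 3
        && g_specialSwapCalls == callsBefore + 1;
}
```

This is exactly the fragile part: nothing tells you when stupidlib adds a new specialization, and any code that calls std::swap fully qualified skips the bridge entirely.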

Now there is a pretty easy solution to this : never define a generic operation which already exists in the STL - just override the STL.

The problem is that people like to replace bits of the STL all the time. Hell I do it, because the STL headers are so retardedly bloated they kill compile times, so I try to use them minimally.

But that "solution" isn't really a solution anyway - it's just a convention to use the STL defs as bases for overloading when they exist. When you're trying to overload a generic operation that isn't defined in the STL you go back to having the same problem - two codebases can define that operation and various specializations, and you can't mix them.

Generic/template programming simply does not work well when mixing codebases.

01-19-09 | Laptops Part 2

So I was just about to buy a Dell 1330 (you can get LED and SSD for $1200) but I kept seeing reports of problems. Urg. Turns out the stupid NV GPU overheats and fries the board. There have been many recalls and Dell has a hacky attempt to crank up the fans with the BIOS :

many people having problems , notebookreview forum , Dell response , more problems . Yay.

ADDENDUM : there are a lot of funny youtube videos related to this. Start with Dell XPS 1330 Overheat and then look in the Related Videos tab. I like "Dell XPS 1330 Un Fraude Total".

I was kind of thinking of going with the integrated graphics anyway cuz it's cooler and draws less power, so maybe it's not a problem, but URG, seriously !?

Also - why is there not a fucking decent simple mobile graphics option? I would be totally happy (for Alissa's machine) with just a Mobility-9600 level of functionality, and it would presumably be pretty cool and small now with better process. Instead we get the fucking NVidia Heat Generator XL337 or the Intel 1CantDoAnything.

If you want a super-portable, I've found some decent things in the 12" space (you'll recall I think 10" or less is shit, and thin is shit) :

The Vaio G11 has a 1024x768 12" display, but it's a very high quality *matte* LED display (good matte is hard to find these days). Similar is the Zepto Notus 12" which has a 1280x800 display but is worse in most other ways. Both are around 2.5 pounds and have about 9 hours of battery life - and have DVD drives so you can actually watch movies on them.

Also there seem to be a lot of 15.4 " screens at 1280x800 ( !!?? ) WTF. Plus I have to read through pages of reviews to find out that the fucking 15" laptop I'm looking at is worthless and crippled. If I wanted 1280x800 I would be looking at a 13.3" lappy you fuck-heads.

Notebookreview.com is generally okay, but this review of the HP dv5t has some real gems.

With the introduction of the HP Pavilion dv5 series notebooks, HP is finally offering high-resolution displays. The dv5t is currently offered with a WXGA or WSXGA+ resolution. The WXGA screen (1280 x 800 resolution) is what most 15.4-inch notebooks in stores have, and the most common resolution on 15.4-inch notebooks. The WSXGA+ display (1680 x 1050 resolution) is what my notebook has. It has 42% more viewable space than the WXGA display, which is the reason I chose it. Higher-resolution screens allow you to see more and scroll less. For example, if I view a large web page, I could see 42% more content on the WSXGA+ display than on the WXGA display. Another example- while viewing a high-resolution picture, I can see 42% more detail on the WSXGA+ display than on the WXGA. WSXGA+ makes it possible to use larger windows side by side; you would be hard-pressed to practically view two spreadsheets side-by-side with a WXGA display, but with the high-resolution WSXGA+, it is more than possible (you could do it without shrinking the windows too much).

Wait, wait, I'm not sure I follow, are you saying that with more resolution I can see better pictures?

HP offers two display finishes in addition to the resolutions - the standard BrightView or the BrightView Infinity. The Brightview display has the standard glossy finish that nearly all new consumer notebooks come with. The Infinity display is a new option introduced on the dv5 series notebooks. The Infinity display is basically a large piece of clear plastic over the entire display. It makes the display look like it has no borders. I have the Infinity finish on my notebook. While it makes the notebook look sleeker and more modern, it does increase the amount of reflections over a standard glossy finish. I personally do not mind the reflections. If you are used to a regular glossy display, the Infinity display is not that different in glossiness. I would choose the Infinity display again, since it makes the notebook look sleeker.

Oh, sweet, it looks more modern. I definitely want that shizzle. Who cares if it's more expensive and retardedly glossy and reflects everything and makes your screen unusable. It looks SLEEK bitches!

Ok, back to le sigh...

01-18-09 | Laptops

So I'm trying to learn about laptops. I need to buy a new one for Alissa ASAP, and some day soon I need to replace mine, though the more I read about them the more I like my fucking 5 year old laptop. My AOpen laptop is running a Radeon 9600-Mobility , which is a DX8 part, but it has a 7200 RPM drive, 15" display at 1400x1050 with a thin bezel, runs cool and quiet (with no 3d use), and the battery lasts 4 hours (with no 3d use). Plus my current lappy is Windows XP which is a huge bonus. Unfortunately it's my main dev machine, and once I start doing DX10+ it just won't work.

So my first dilemma looking for laptops for Alissa is the fucking operating system problem. It's hard to find XP laptops any more (though they do exist, it severely limits the choices); I really don't want to ever deal with Vista. The other choice is Mac OS X. I hear good things about OS X for casual users, but it greatly reduces my ability to do tech support on it; I'd have to learn a whole bunch of quirks of new operating systems, and I'm worried that simple things like sharing files with our Windows media PC might be a pain.

The biggest improvements in laptops in the last 5 years seem to be : 1. SSD's , and 2. LED's. The SSD disk is mainly awesome because of reduced noise. The SSD has other nice little perks - slightly faster boot (though that's comparing to a 7200 RPM drive, if you get stuck with the standard 5400 or 4200 in notebooks then an SSD is a big improvement), and slightly more durable and less power draw. LED backlit screens are a pretty big win in every way. Other than those two things I'm pretty disappointed. Weight and battery life do not seem to be significantly improved.

The Atom-based mini laptops are kind of cool. For one thing they actually have battery life (8 hours on good ones like the Samsung NC10 ). However, the ubiquitous 10" screens just don't make sense to me. It's too small for doing anything serious, it's like an eye-strain nightmare and you can't even fit web pages in that resolution. I think 13" and 1280x900 or so is the minimum for a "netbook" type of usage, and 15" and 1400x1050 is the minimum for a "productivity" laptop (yes I know 1280x800 seems to be the standard thing now but fuck that's just not enough vertical pixels; I'd like 1280x1024 really, I fucking hate widescreen; text pages are vertical!!).

I just don't get this "mini" product niche at all. If you can carry a 10" thing then you can carry a 13" thing. If you want something tiny just for email, use your iPhone. You can get some 12" screens at 1280x800 which would be okay ; for example this MSI . MSI seems to be the new cool off-brand notebook (it used to be AOpen but they seem to have dropped out). MSI also has one of the few decent websites of any laptop maker, you can actually filter the different models by sensible criteria like screen size, weight, and operating system.

The "Air" is just a marketing piece of shit. First of all if you want a thin laptop that's actually functional, you can get a Samsung X460 . But thin is just such a worthless retarded fucking way to measure laptops. It's like measuring cars by the volume of their engine. It's just not what you actually care about. Oo la la look how fucking thin my laptop is; who fucking cares? The important things for a laptop to actually be lappable are : 1. weight, 2. heat, and 3. battery. The "Air" does okay on #1 but fails miserably on #2 and #3. What the fuck is the point of a thin light laptop if you can't actually use it on your lap because it gets scalding hot and only runs on battery for 2 minutes. Also the lack of ethernet port is pretty ass-tacular.

The MacBook Pros seem good, but they're just about double the price of comparable windows laptops such as the Asus N80 - they're $2500 instead of about $1100 for the same spec PC. Also, it's pretty fucking annoying the way they keep revving the hardware and not changing the name at all. When someone says "Macbook Pro" it doesn't specify the type at all. This is a decent page about the revisions. I guess the new ones (Rev F - 2.4 GHz Penryn) have nice improvements. They offer LED and SSD. The resolution of the 15" is pretty decent (1440 by 900) though I'd rather have non-wide 1400x1050.

I have to admit the Adamo looks pretty sexy but I'm sure it will be way overpriced. Though with everything I'm reading about heat, I'm not convinced that making your laptop a solid brick of conductor is such a great idea. There's some merit to the old plastic cases with vents and fans.

The Dell XPS M1330 actually looks pretty sweet, it seems like a decent compromise (it has SSD and LED now), and they're old enough that the price is now reasonable. The main drawback is that it's Dell (and Vista), and Dell has been making nothing but ass for the last 10 years. Apparently it does get too hot to lap, though.

Anyway .. off to do more research ..

ADDENDUM : fucking hell, searching for hardware info is really annoying. For one thing, default searching gives me all kinds of old results that are just completely worthless (old hardware news should just get deleted from search or something, WTF, I don't want to know about the 2006 revisions of the Macbook). The other thing is the results are dominated by sites that are just selling things with no added info - and even when you do get good info from a place like Anand or Tom's they tend to be "first look" type reviews. The good reviewers are focused on the things that aren't out yet, and they often fail to go back and write a good review with hindsight. What I'd actually like is reviews from people who have actually owned these things for several months, and not just random shlubs but cognoscenti.

Also, I don't need to see pictures of you fucking unboxing it ! No, I don't care about the packaging or what kind of pamphlets and logos it came with. I don't want a review from someone who's spent a few days or less with the damn thing. Urg.

BTW another trend that seems to have arisen is the "boutique" laptop, and it's infiltrating all the price points. For example there's the Best Buy "Blue Label" laptops, that are basic laptops with a 50% price hike for no reason. And all the manufacturers now are making "special editions" variants where you can spend an extra $200+ to get stupid designs on your case, like the HP Pavilion with "* Ceramic white coloring with sophisticated leaf design symbolizing rejuvenation and growth". Who the fuck pays for this shit?

01-17-09 | Float to Int

A while ago the Some Assembly Required blog wrote some good notes about float-to-int. I posted some notes there but I thought I'd try to summarize my thoughts coherently.

What I'd like is a pretty fast float-to-int (ftoi) conversion. The most useful variants are "truncate" (like C, fractions go towards zero), and "round" , that is, fractions go towards the nearest int. We'd like both to be available all the time, and both to be fast. So I want ftoi_trunc and ftoi_round.

First let me say that I hate the FPU control word with a passion. I've had so many bugs because of that fucker over the years. I write some code and test it and everything is fine, and then we actually put it in some game and all hell breaks loose. WTF happened? Oh well, I tested it with the default word setup, and now it's running with the FPU set to single precision. The other classic bug is people changing the rounding mode. D3D used to be really bad about this (you could use FPU_PRESERVE but it was a pretty big hit back in the old days with software T&L, not a big deal any more). Or even worse is people who write code intentionally designed to work with the FPU in a non-standard rounding mode (like truncation instead of the default round to nearest). Then if you call other code that's meant for the normal rounding mode, it fails.

Ok, rant over. Don't mess with the FPU control word.

That means the classic /QIfist really doesn't do that much for us. Yes, it makes ftoi_trunc faster :

int ftoi_trunc(float f) { return (int) f; }

that's fast enough with /QIfist, but you still need round :

int ftoi_round(float f) { return ( f >= 0.f ) ? (int)(f + 0.5f) : (int)(f - 0.5f); }

note that a lot of people just do + 0.5 to round - that's wrong, for negatives you need to go the other way, because the C truncation is *toward zero* not *down*.
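To make the negative case concrete, here's a small sketch comparing the naive + 0.5 round against the sign-aware one. The names round_naive and round_half_away are made up for this example; they are not from any library.

```cpp
#include <cassert>

// Naive rounding: add 0.5 and let the C cast truncate toward zero.
// (hypothetical name, for illustration only)
inline int round_naive(float f) { return (int)(f + 0.5f); }

// Sign-aware rounding, same logic as the ftoi_round above:
// push away from zero by 0.5, then truncate toward zero.
inline int round_half_away(float f)
{
    return ( f >= 0.f ) ? (int)(f + 0.5f) : (int)(f - 0.5f);
}

inline void rounding_demo()
{
    assert( round_naive(1.4f) == 1 );       // fine for positives
    assert( round_naive(-1.4f) == 0 );      // wrong : -0.9 truncates to 0
    assert( round_half_away(-1.4f) == -1 ); // right : -1.9 truncates to -1
}
```

The naive version is off by one for every negative input whose fraction would round away from zero.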

Even if you could speed up the round case, I really don't like using compiler options for crucial functionality. I like to make little code snippets that work the way I want regardless of the compiler settings. In particular if I make some code that relies on ftoi being fast I don't want to use C casts and hope they set the compiler right. I want the code to enforce its correctness.

Fortunately the xs routines at stereopsis by Sree Kotay are really good. The key piece is a fast ftoi_round (which I have slightly rejiggered to use the union method of aliasing) :

union DoubleAnd64
{
  uint64    i;
  double    d;
};

static const double floatutil_xs_doublemagic = (6755399441055744.0); // 2^52 * 1.5

inline int ftoi_round(const float val)
{
  DoubleAnd64 dunion;
  dunion.d = val + floatutil_xs_doublemagic;
  return (int) dunion.i; // just cast to grab the bottom bits
}

in my tests this runs at almost exactly the same speed as FISTp (both around 7 clocks), and it always works regardless of the FPU control word setting or the compiler options.

Note that this is a "banker's round", not a normal arithmetic rounding where 0.5 always goes up or down - 0.5 goes to the nearest *even* value. So 2.5 goes to 2.0 and 3.5 goes to 4.0 ; ie. 0.5's go up half the time and down half the time. To be more precise, ftoi_round rounds the same way that bits dropping out of the bottom of the FPU registers during addition are rounded. That's why making a banker_round routine was so easy : it's what the FPU addition already does.
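Here's a minimal sketch of the same magic number trick that you can poke at to see the banker's rounding. It uses memcpy instead of the union to read the bits, which is my substitution, not Sree's code; the names kDoubleMagic and ftoi_round_memcpy are made up. It assumes IEEE-754 doubles evaluated in double precision (eg. SSE2 math) with the default round-to-nearest-even mode.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// 2^52 * 1.5 : adding this pushes the integer part of val into the low mantissa bits
static const double kDoubleMagic = 6755399441055744.0;

// Assumption : IEEE-754 double, default round-to-nearest-even.
inline int ftoi_round_memcpy(double val)
{
    const double shifted = val + kDoubleMagic; // the add itself does the rounding
    std::uint64_t bits;
    std::memcpy(&bits, &shifted, sizeof(bits));
    // keep the low 32 bits, reinterpreted as a signed int
    return (int) (std::int32_t) (std::uint32_t) bits;
}

inline void banker_round_demo()
{
    assert( ftoi_round_memcpy(2.5) == 2 );   // tie goes to even
    assert( ftoi_round_memcpy(3.5) == 4 );   // tie goes to even
    assert( ftoi_round_memcpy(-2.5) == -2 ); // same for negatives
}
```

Non-tie values round to nearest as you'd expect; only the exact .5 cases show the even preference.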

But, we have a problem. We need a truncate (ftoi_trunc). Sree provides one, but it uses a conditional, so it's slow (around 11 clocks in my tests). A better way to get the truncate is to use the SSE intrinsic :

inline int ftoi_trunc(const float f)
{
  return _mm_cvtt_ss2si( _mm_set_ss( f ) );
}

Note that the similar _mm_cvt_ss2si (one t) conversion does banker rounding, but the "magic number" xs method is faster because it pipelines better, and because I'm building for x86 so the cvt stuff has to move the value from FPU to SSE. If you were building with arch:sse and all that, then obviously you should just use the cvt's. (but then you dramatically change the behavior of your floating point code by making it run through float temporaries all the time, instead of implicit long doubles like in x86).

So, that's the system I have now and I'm pretty happy with it. SSE for trunc, magic number for round, and no reliance on the FPU rounding mode or compiler settings, and they're both fast.

For completeness I'll include my versions of the alternatives :

#include <xmmintrin.h>

typedef unsigned __int64 uint64;

union DoubleAnd64
{
  uint64  i;
  double  d;
};

static const double floatutil_xs_doublemagic = (6755399441055744.0); // 2^52 * 1.5
static const double floatutil_xs_doublemagicdelta = (1.5e-8);                         //almost .5f = .5f + 1e^(number of exp bit)
static const double floatutil_xs_doublemagicroundeps = (0.5f - floatutil_xs_doublemagicdelta);       //almost .5f = .5f - 1e^(number of exp bit)

// ftoi_round : *banker* rounding!
inline int ftoi_round(const double val)
{
  DoubleAnd64 dunion;
  dunion.d = val + floatutil_xs_doublemagic;
  return (int) dunion.i; // just cast to grab the bottom bits
}

inline int ftoi_trunc(const float f)
{
  return _mm_cvtt_ss2si( _mm_set_ss( f ) );
}

inline int ftoi_round_sse(const float f)
{
  return _mm_cvt_ss2si( _mm_set_ss( f ) );
}

inline int ftoi_floor(const double val)
{
    return ftoi_round(val - floatutil_xs_doublemagicroundeps);
}

inline int ftoi_ceil(const double val)
{
    return ftoi_round(val + floatutil_xs_doublemagicroundeps);
}

// ftoi_trunc_xs = Sree's truncate
inline int ftoi_trunc_xs(const double val)
{
  return (val<0) ? ftoi_round(val+floatutil_xs_doublemagicroundeps) : 
                   ftoi_round(val-floatutil_xs_doublemagicroundeps);
}

BTW note the ceil and floor from Sree's XS stuff which are both quite handy and hard to do any other way. Note you might think that you can easily make ceil and floor yourself from the C-style trunc, but that's not true, remember floor is *down* even on negatives. In fact Sree's truncate is literally saying "is it negative ? then ceil, else floor".
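To see why the C-style trunc alone doesn't give you floor and ceil, here's a sketch of the branchy fix-up you need if you build them from a cast. This is not Sree's method, it's the plain conditional alternative, and the names trunc_c, floor_from_trunc and ceil_from_trunc are made up for illustration.

```cpp
#include <cassert>

// C-style truncation : fractions go toward zero.
inline int trunc_c(double v) { return (int) v; }

// floor built from truncation needs a fix-up for negative non-integers.
inline int floor_from_trunc(double v)
{
    const int t = (int) v;
    return ( v < (double) t ) ? t - 1 : t;
}

// ceil built from truncation needs the mirror-image fix-up for positive non-integers.
inline int ceil_from_trunc(double v)
{
    const int t = (int) v;
    return ( v > (double) t ) ? t + 1 : t;
}

inline void floor_ceil_demo()
{
    assert( trunc_c(-1.5) == -1 );          // toward zero, not down
    assert( floor_from_trunc(-1.5) == -2 ); // down
    assert( ceil_from_trunc(1.5) == 2 );    // up
}
```

Either way you pay for a compare per call, which is the cost the magic number floor and ceil avoid.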

Finally : if you're on a console where you have read-modify-write aliasing stall problems the union magic number trick is probably not good for you. But really on a console you're locked into a specific CPU that you completely control, so you should just directly use the right intrinsic for you.

AND : regardless of what you do, please make an ftoi() function of some kind and call that for your conversions, don't just cast. That way it's clear where you're converting, it's easy to see and search for, it's easy to change the method, and if you use ftoi_trunc and ftoi_round like me it makes it clear what you wanted.

ASIDE : in fact I'm starting to think *all* casts should go through little helper functions to make them very obvious and clear. Two widgets I'm using to do casts are :

// same_size_bit_cast casts the bits in memory
//  eg. it's not a value cast
template < typename t_to, typename t_fm >
t_to same_size_bit_cast( const t_fm & from )
{
    COMPILER_ASSERT( sizeof(t_to) == sizeof(t_fm) );
    return *( (const t_to *) &from );
}

// check_value_cast just does a static_cast and makes sure you didn't wreck the value
//  eg. for numeric casting to make sure the value fits in the new type
template < typename t_to, typename t_fm >
t_to check_value_cast( const t_fm & from )
{
    t_to to = static_cast< t_to >(from);
    ASSERT( static_cast< t_fm >(to) == from );
    return to;
}

intptr_t pointer_to_int(void * ptr)
{
    return (intptr_t) ptr;
}

void * int_to_pointer(intptr_t i)
{
    return (void *) i;
}

ADDENDUM : I'm told this version of same_size_bit_cast is better on various platforms. That's why we want it gathered up in one place so it can be easily changed ;)

// same_size_bit_cast casts the bits in memory
//  eg. it's not a value cast
template < typename t_to, typename t_fm >
t_to same_size_bit_cast( const t_fm & from )
{
    COMPILER_ASSERT( sizeof(t_to) == sizeof(t_fm) );
    union
    {
        t_fm    fm;
        t_to    to;
    } temp;
    temp.fm = from;
    return temp.to;
}
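Since the whole point is that the implementation can be swapped in one place, here's one more variant as a sketch : a memcpy version, which avoids both the pointer cast and the union read. The name same_size_bit_cast_memcpy is made up, and it uses static_assert in place of the COMPILER_ASSERT macro, so it assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// copies the bits from one type to another type of the same size
//  (hypothetical variant, not from the original code)
template < typename t_to, typename t_fm >
t_to same_size_bit_cast_memcpy( const t_fm & from )
{
    static_assert( sizeof(t_to) == sizeof(t_fm), "sizes must match" );
    t_to to;
    std::memcpy( &to, &from, sizeof(to) );
    return to;
}

inline void bit_cast_demo()
{
    // 1.0f in IEEE-754 single precision is 0x3F800000
    assert( same_size_bit_cast_memcpy<std::uint32_t>( 1.0f ) == 0x3F800000u );
}
```

Compilers generally turn a fixed-size memcpy like this into a plain register move, but that's worth checking in the disassembly on whatever platform you care about.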

01-16-09 | push_macro & pop_macro

WTF MSVC has macro push & pop !? How did I not know this? It's so superior. It actually makes #defining new & delete actually possibly an okay option. (normally I get sucked into a hell of having to #undef them and redef them back to the right thing)

#define test 1


#pragma push_macro("test")

#undef test
#define test 2


#pragma pop_macro("test")


outputs : 1 , 2 , 1


BTW this demo used these tricks :

#define _Stringize( L )            #L
#define _DoMacro1( M, X )        M(X)

#define STRINGIZE(M)            _DoMacro1( _Stringize, M )
#define LINE_STRING                STRINGIZE( __LINE__ )

#define PRAGMA_MESSAGE(str)        message( __FILE__ "(" LINE_STRING ") : message: " str)

Of course I can't use it at work where multi-platform support is important, but I can use it at home where I don't give a flying fuck about things that don't work in MSVC and it makes life much easier.
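If you'd rather check the push & pop behavior at runtime than through build messages, a sketch like this captures the macro value at each stage. TEST_VALUE and the value_* names are made up for the example. GCC and Clang also accept these two pragmas, so the snippet is not MSVC-only.

```cpp
#include <cassert>

#define TEST_VALUE 1
static const int value_before = TEST_VALUE;   // sees the original definition

#pragma push_macro("TEST_VALUE")              // save the current definition

#undef TEST_VALUE
#define TEST_VALUE 2
static const int value_during = TEST_VALUE;   // sees the temporary definition

#pragma pop_macro("TEST_VALUE")               // restore the saved definition
static const int value_after = TEST_VALUE;    // sees the original again

inline void push_pop_demo()
{
    assert( value_before == 1 );
    assert( value_during == 2 );
    assert( value_after  == 1 );
}
```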

01-16-09 | Automated NFL Bettor

I was thinking about some NFL betting this weekend and realized I never wrote about my automated NFL bettor and thought I should remedy that.

I was working on the Netflix prize stuff (see my post here then here ) and it occurred to me that it's a very similar problem.

If you think about it in terms of the Netflix sparse matrix way of thinking, you have 32 x 32 teams , and each time they play fills out a slot in the matrix with the score delta of that game. The matrix is very sparse - there are lots of blank spots where teams haven't played each other (it's less than half full), and the challenge for an automatic bettor is to fill in the blanks - aka making a prediction for teams that haven't played yet.

Now, it should be obvious that you could use one of the Netflix methods right off the bat. For example "Collaborative Filtering" can be directly applied - it just means using the point delta of teams I've already played to predict the delta vs. a new team. So eg. if team A beat team B by 3 (delta +3), and team B lost to team C by 7 (delta -7), then we predict team A will lose to team C by 4 (delta -4). You can also obviously use an SVD approach. Take the matrix of scores with blanks. Use SVD to find a low-rank generator that approximates the existing values, and makes predictions for the blanks.

Both of these techniques "work" but are terrible. The problem is the NFL scores have lots of randomness. That is, we fundamentally assume that if team A and team B play over and over the average score delta will converge to some number, but in the short term the actual score is drawn from some random source centered on the average. We generally assume that it's Gaussian, but in fact in the NFL it's very non-Gaussian, there are weird peaks at +3/-3 and +7/-7 and a hole at 0 (BTW this is where the "Wong Teaser" comes from - it exploits the fact that oddsmakers use Gaussian models but the distribution is not actually Gaussian).

So, we really need to use some kind of Bayesian or Hidden Model approach. That is, the score that we actually observed cannot be taken as the average expected score of those teams. Instead we must find a model for each team which maximizes the likelihood that we would observe the actually seen scores given that model ; this is an ML approach , and it's crucial that you also include a prior for regularization. Let me say that again in a more Bayesian way -

We saw some set of scores {S} ; there is some team model {M} and the model tells us the expected average score between teams; the model is our predictor. There's some probability that the Model M creates scores S : P(S|M) ; but that is not the right thing to maximize ! We want to find the model that maximizes P(M|S) which is :

P(M|S) = P(S|M) * P(M) / P(S)

Now P(S|M) is known, it's given by our model, but the inclusion of P(M) here is really crucial - Not all models are equally likely!

In particular, you need to make more extreme models less likely or you will get bad overfitting. The data is way too sparse and too random. The prior model should know that all the teams are really very close in strength and extreme score differences are not very reflective of future performance, especially when they're anomalies.
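To get a feel for what the prior does, here's a toy sketch. It makes the simplifying assumption (which, as noted above, isn't really true for NFL scores) that a game result is Gaussian around the true delta, and that the prior on the true delta is Gaussian around zero. In that case maximizing P(S|M) * P(M) just shrinks the observed average toward zero. The function name and the variances used are made up for illustration, not fitted to any data.

```cpp
#include <cassert>
#include <cmath>

// MAP estimate of the true score delta between two teams.
//  observed_mean : average observed score delta over num_games
//  noise_var     : variance of one game result around the true delta (assumed)
//  prior_var     : variance of the zero-mean prior on the true delta (assumed)
inline double map_score_delta(double observed_mean, int num_games, double noise_var, double prior_var)
{
    assert( num_games > 0 && noise_var > 0.0 && prior_var > 0.0 );
    const double shrink = (num_games * prior_var) / (num_games * prior_var + noise_var);
    return observed_mean * shrink;
}
```

With noise_var = 100 and prior_var = 25, a single 10 point blowout only moves the estimate to 2 points; it takes many games of evidence before the estimate approaches the raw average.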

Most of the existing automatic NFL predictors use some kind of "power ranking". This just means assigning a single value to each team and predicting the score of A vs. B to be power(A) - power(B). You can fit these trivially using least squares and you have plenty of data, but that still sucks because of the variance issue. The good people who do this automatically use various forms of prior conditioning and regularization in the least squares fit. For example they use things like fitting to the square roots, which reduces the importance of large differences.

Dr K's Automatic Betting Talk (PPT) has a good summary of the classic ways of doing regularized power rankings.

Anyway I got into that and started doing some reading and realized I wasn't going to beat the stuff that people are already doing. There's also just really not enough data to make good fits in the NFL. I mean the "Power Ranking" is a really pathetic model if you think about it, it gives each team a single "strength" number so it has absolutely no way to model the style that the team plays and what kinds of strengths and weaknesses it might have. An obvious idea would be to try to fit something like 5 numbers - {rushing offence, rushing defence, passing offence, passing defence, home field advantage} - but you don't have nearly enough information to fit that.

If you look at The Prediction Tracker he has aggregated lots of predictions from around the net including some that are formulated with good mathematical models. However, not one of them can reliably beat the spread.

For example : look at Dr K's preds and history - he has a 134-125 record this year. That's actually remarkable, most people do worse. Assume a very good 105 vig (often called a 105 bet in the NFL because you're putting up 105 to win 100), so a bet pays +1 on a win and -1.05 on a loss; then his total profit is 134 - 125*1.05 = 2.75 units ; yay he actually made money. If he bet $105 on every game all season he made $275 on the year. (BTW a lot of suckers make bets at 110 vigs which is much worse and pretty much impossible to beat).

But that is assuming that you make every bet. If you look at the automatic predictions, often they predict a score delta that's very close to the spread. Clearly you shouldn't make that bet, because the vig will mean you lose money on average. You should only place a bet when your confidence is greater than the vig. To break even you have to win 105/205 of the time = 51.2 %
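The vig arithmetic is easy to get wrong, so here it is as a few tiny helpers. The function names are made up; "risk" is what you put up and "win" is what you get paid on a win.

```cpp
#include <cassert>
#include <cmath>

// Win fraction needed to break even when you risk `risk` to win `win`.
inline double break_even_rate(double risk, double win)
{
    assert( risk > 0.0 && win > 0.0 );
    return risk / (risk + win);
}

// Total profit for a record of wins and losses at a fixed stake.
inline double season_profit(int wins, int losses, double risk, double win)
{
    return wins * win - losses * risk;
}

// Expected profit per bet if you win with probability p.
inline double expected_profit(double p, double risk, double win)
{
    return p * win - (1.0 - p) * risk;
}
```

For a 105 bet, break_even_rate(105, 100) is 105/205, about 51.2%, and season_profit(134, 125, 105, 100) gives the $275 from the 134-125 record above. At 110 the break-even point moves up to 110/210, about 52.4%.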

So, I thought - is there any way to formulate a *strategy* for betting that guarantees a profit? My idea was to go to The Prediction Tracker and grab all the predictions from various sources and create a decision tree that tells me whether to bet or not.

You can think about a decision tree as a sort of BSP tree on the space of the parameters. Each parameter (in this case each of the predictors) is a coordinate axis, and together they define a high-dimensional space. In this case the axes are {LINE, SAG, PFZ, ARGH, KAM ... } so it's about a 30 dimensional space. The value for each of those is the coordinate in that dimension, so we have specified a point in this big space. The decision tree cuts up the space into regions of {BET} and {NO BET} , basically a 1 or a 0. These spatial regions can capture a lot of rules. For example, it can capture rules like :

If abs( (average pred) - (line) ) < 1.0 
  then don't bet

If ( (average pred) - (pred sdev) ) <= line - 1.0 && ( (average pred) + (pred sdev) ) >= line + 1.0
  then don't bet


These are some rules that seem intuitive to me - if the pred is too close to the line, don't bet, or if the sdev of the various preds is too wide and near the line then don't bet. But the point is not to hard code rules like that, the decision tree finds them automatically and finds the right dividing values. It could also come up with rules that you may not expect, like it might find things like :

if SAG > 10 and L2 > 8 and LINE < 7 then
  do bet

We can train the decision tree based on past performance if we have past scores and past lines. We can make an optimal in-hindsight decision tree. Basically what you do is for every game in the past you make a point in space from all the parameters on that game; you give this point either a category of {bet} if the prediction was right, or {no bet} if the prediction was wrong. Then you try to put down splitting planes to chop up the space to separate the {bet} points from the {no bet} points. You hope that they are clustered together in pockets. BTW it's important to use the right cost function for tree building here because of the vig - it's not just the number of points miscategorized, there's a penalty of 105 for putting a {no bet} into {bet}, but only a penalty of 100 for putting a {bet} into {no bet}. That is, it's better to abstain from possible bets once in a while than it is to make a bet you shouldn't have. In fact I found better trees if I pretended I was betting at 110, eg. make the tree artificially more conservative.
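Here's a sketch of what that asymmetric cost looks like when you label a single leaf of the tree. This is just my reading of the cost function described above, not a full tree builder, and the names LeafLabel, leaf_cost and best_leaf_label are made up.

```cpp
#include <cassert>

enum LeafLabel { NO_BET = 0, BET = 1 };

// Cost of giving a leaf a label, given how many past games in that leaf
// the prediction got right (num_right) and wrong (num_wrong).
inline double leaf_cost(LeafLabel label, int num_right, int num_wrong, double risk, double win)
{
    // labeling BET loses `risk` on every game the prediction got wrong ;
    // labeling NO_BET gives up `win` on every game the prediction got right
    return ( label == BET ) ? num_wrong * risk : num_right * win;
}

// Pick the cheaper label ; ties go to NO_BET to stay conservative.
inline LeafLabel best_leaf_label(int num_right, int num_wrong, double risk, double win)
{
    const double bet_cost    = leaf_cost(BET,    num_right, num_wrong, risk, win);
    const double no_bet_cost = leaf_cost(NO_BET, num_right, num_wrong, risk, win);
    return ( bet_cost < no_bet_cost ) ? BET : NO_BET;
}
```

A leaf with 53 right and 50 wrong is a BET at 105 (cost 5250 vs 5300) but flips to NO_BET if you pretend the vig is 110 (cost 5500 vs 5300), which is the "artificially more conservative" effect.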

In practice again you can't actually do this directly because you don't have enough history and the space is too big and sparse, but you can fix that in this case by collapsing dimensions. For example you can eliminate most of the predictors because only a few of them actually help, so instead of 30 dimensions you have 4. You can also make optimized linear combinations of predictors, and then make the decision tree on the combined predictors. Another option would be to use PCA to collapse the dimensions rigorously (note that we don't have the sparseness problem because we're doing PCA on all the prediction values, and we have a pred for each axis for every game each week).

Of course in testing your DT you need to do holdouts. eg. if you have 4 seasons of past data, hold out 100 games randomly chosen. Train the DT on the other games, and then test it on the 100 that you held out. That way you're not testing on the data you trained on.
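The holdout split itself is just a shuffle and a cut. A minimal sketch, with made-up names (HoldoutSplit, make_holdout) and games referred to by index :

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

struct HoldoutSplit
{
    std::vector<int> train; // game indices to train the tree on
    std::vector<int> test;  // game indices held out for testing
};

// Randomly hold out num_holdout of num_games games.
inline HoldoutSplit make_holdout(int num_games, int num_holdout, unsigned seed)
{
    assert( num_holdout >= 0 && num_holdout <= num_games );
    std::vector<int> order(num_games);
    std::iota(order.begin(), order.end(), 0); // 0,1,2,...
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);

    HoldoutSplit split;
    split.test.assign(order.begin(), order.begin() + num_holdout);
    split.train.assign(order.begin() + num_holdout, order.end());
    return split;
}
```

Running this with several different seeds and averaging the test results gives a steadier estimate than a single split, since 100 games is a small sample.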

With this approach I was able to find a meta-betting strategy that had a 55% success rate on the random holdouts tested. That's well over the 51.22% needed to break even. If you bet $105 with a 55% success rate you expect $7.75 of profit from each bet.

That's pretty good, but man sports betting sucks as a way to actually make money. You're putting up huge sums at near coinflips to make very small profits.

01-16-09 | Virtual Memory

So I just had kind of a weird issue that took me a while to figure out and I thought I'd write up what I learned so I have it somewhere. (BTW I wrote some stuff last year about VirtualAlloc and the zeroer.)

The problem was this Oodle bundler app I'm working on was running out of memory at around 1.4 GB of memory use. I've got 3 GB in my machine, I'm not dumb, etc. I looked into some things - possible virtual address space fragmentation? No. Eventually by trying various allocation patterns I figured it out :


On Windows XP all calls to VirtualAlloc get rounded up to the next multiple of 64k. Pages are 4k - and pages will actually be allocated to your process on 4k granularity - but the virtual address space is reserved in 64k chunks. I don't know if there's any fundamental good reason for this or if it's just a simplification for them to write a faster/smaller allocator because it only deals with big aligned chunks.

Anyway, my app happened to be allocating a ton of memory that was (64k + 4k) bytes (there was a texture that was exactly 64k bytes, and then a bit of header puts you into the next page, so the whole chunk was 68k). With VirtualAlloc that actually reserves two 64k chunks of address space (128k), so you are wasting almost 50% of your virtual address space.

NOTE : that blank space you didn't get in the next page is just *gone*. If you do a VirtualQuery it tells you that your region is 68k bytes - not 128k. If you try to do a VirtualAlloc and specify an address in that range, it will fail. If you do all the 68k allocs you can until VirtualAlloc returns NULL, and then try some more 4k allocs - they will all fail. VirtualAlloc will never give you back the 60k bytes wasted on granularity.
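The arithmetic of the waste can be sketched without calling Windows at all. The 64k constant is the granularity described above, hard-coded here as an assumption (on a real system you'd read dwAllocationGranularity from GetSystemInfo); the function names are made up.

```cpp
#include <cassert>
#include <cstdint>

// reservation granularity described above (assumed fixed at 64k for this sketch)
static const std::uint64_t kAllocGranularity = 64 * 1024;

// Address space actually consumed by reserving `size` bytes.
inline std::uint64_t reserved_span(std::uint64_t size)
{
    assert( size > 0 );
    return ( (size + kAllocGranularity - 1) / kAllocGranularity ) * kAllocGranularity;
}

// How many allocations of `size` bytes fit in `address_space` bytes.
inline std::uint64_t max_allocs(std::uint64_t address_space, std::uint64_t size)
{
    return address_space / reserved_span(size);
}
```

A 68k alloc has a reserved_span of 128k, so at most 16384 of them fit in 2 GB, and those hold only 16384 * 68k = 1,114,112 K of usable memory. That's an upper bound that ignores everything else living in the address space, but it's roughly in line with the 1.1 GB measured below.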

The weird thing is there doesn't seem to be any counter for this. Here are the TaskMgr & Procexp reading meanings :

TaskMgr "Mem Usage" = Procexp "Working Set"

This is the amount of memory whose pages are actually allocated to your app. That means the pages have actually been touched! Note that pages from an allocated range may not all be assigned.

For example, if you VirtualAlloc a 128 MB range , but then only go and touch 64k of it - your "Mem Usage" will show 64k. Those pointer touches are essentially page faults which pull pages for you from the global zero'ed pool. The key thing that you may not be aware of is that even when you COMMIT the memory you have not actually got those pages yet - they are given to you on demand in a kind of "COW" pattern.

TaskMgr "VM Size" = Procexp "Private Bytes"

This is pretty simple - it's just the amount of virtual address space that's COMMITed for your app. This should be equal to the total "Commit Charge" in the TaskMgr Performance view.

ProcExp "Virtual Size" =

This one had me confused a bit and seems to be undocumented anywhere. I tested and figured out that this is the amount of virtual address space RESERVED by your app, which is always >= COMMIT. BTW I'm not really sure why you would ever reserve mem and not commit it, or who exactly is doing that, maybe someone can fill in that gap.

Thus :

2GB >= "Virtual Size" >= "Private Bytes" >= "Working Set".

Okay, that's all cool. But none of those counters shows that you have actually taken all 2 GB of your address space through the VirtualAlloc granularity.

ADDENDUM : while I'm explaining mysteriously named counters, the "Page File Usage History" in the Performance tab of task manager has absolutely nothing to do with the page file. It's just your total "Commit Charge" (which, recall, is the same as the "VM Size" or "Private Bytes"). Total Commit Charge is technically limited by the size of physical ram + the size of the paging file. (which BTW, should be zero - Windows runs much better with no paging file).

To be super clear I'll show you some code and what the numbers are at each step :

int main(int argc,char *argv[])
{
    vector<void *>  mems;
    #define MALLOC_SIZE     ((1<<16) + 4096)
    uint32 total = 0;
    for(;;)
    {
        // reserve address space only ; stop when the address space runs out
        void * ptr = VirtualAlloc( NULL, MALLOC_SIZE , MEM_RESERVE, PAGE_READWRITE );
        if ( ! ptr )
            break;
        total += MALLOC_SIZE;
        mems.push_back(ptr);
    }

    lprintf("press a key :\n");

This does a bunch of VirtualAlloc reserves with a stupid size. It prints :

press a key :

The ProcExp Performance tab shows :

Private Bytes : 2,372 K
Virtual Size : 1,116,736 K
Working Set : 916 K

Note we only got around 1.1 GB. If you change MALLOC_SIZE to be a clean power of two you should get all 2 GB.

Okay, so let's do the next part :

    for(int i=0;i < mems.size();i++)
        VirtualAlloc( mems[i], MALLOC_SIZE, MEM_COMMIT, PAGE_READWRITE );
    lprintf("press a key :\n");

We committed it so we now see :

Private Bytes : 1,112,200 K
Virtual Size : 1,116,736 K
Working Set : 2,948 K

(Our working set also grew - not sure why that happened, did Windows just alloc a whole bunch? It would appear so. It looks like roughly 128 bytes are needed for each commit).

Now let's actually make that memory get assigned to us. Note that it is implicitly zero'ed, so you can read from it any time and pull a zero.

    for(int i=0;i < mems.size();i++)
        *( (char *) mems[i] ) = 1;

    lprintf("press a key :\n");

We now see :

Private Bytes : 1,112,200 K
Virtual Size : 1,116,736 K
Working Set : 68,296 K

Note that the Working Set is still way smaller than the Private Bytes because we have only actually been given one 4k page from each of the chunks that we allocated.

And wrap up :

    while( ! mems.empty() )
    {
        VirtualFree( mems.back(), 0, MEM_RELEASE );
        mems.pop_back();
    }

    lprintf("UseAllMemory done.\n");

    return 0;
}

For background now you can go read some good links about Windows Virtual memory :

Page table - Wikipedia - good intro/background
RAM, Virtual Memory, Pagefile and all that stuff
PAE and 3GB and AWE oh my...
Mark's Blog : Pushing the Limits of Windows Virtual Memory
Managing Virtual Memory in Win32
Chuck Walbourn Gamasutra 64 bit gaming
Brian Dessent - Re question high virtual memory usage
Tom's Hardware - My graphics card stole my memory !

I'm assuming you all basically know about virtual memory and so on. It kind of just hit me for the first time, however, that our problem now (in 32 bit apps) is the amount of virtual address space. Most of us have 3 or 4 GB of physical RAM for the first time in history, so you actually cannot use all your physical RAM - and in fact you'd be lucky to even use 2 GB of virtual address space.

Some issues you may not be aware of :

By default Windows apps get 2 GB of address space for user data and 2 GB is reserved for mapping to the kernel's memory. You can change that by putting /3GB in your boot.ini , and you must also set the LARGEADDRESSAWARE option in your linker. I tried this and it in fact worked just fine. On my 3 GB work system I was able to allocate 2.6 GB to my app. HOWEVER I was also able to easily crash my app by making the kernel run out of memory. /3GB means the kernel only gets 1 GB of address space and apparently something that I do requires a lot of kernel address space.

If you're running graphics, the AGP window is mirrored into your app's virtual address space. My card has 256MB and it's all mirrored, so as soon as I init D3D my memory use goes down by 256MB (well, actually more because of course D3D and the driver take memory too). There are 1GB cards out there now, but mapping that whole video mem seems insane, so they must not do that. Somebody who knows more about this should fill me in.

This is not even addressing the issue of the "memory hole" that device mapping to 32 bits may give you. Note that PAE could be used to map your devices above 4G so that you can get to the full 4G of memory, if you also turn that on in the BIOS, and your device drivers support it; apparently it's not recommended.

There's also the Address Windowing Extensions (AWE) stuff. I can't imagine a reason why any normal person would want to use that. If you're running on a 64-bit OS, just build 64-bit apps.

VirtualQuery tells me something about what's going on with granularity. It may not be obvious from the docs, but you can call VirtualQuery with *ANY* pointer. You can call VirtualQuery( rand() ) if you want to. It doesn't have to be a pointer to the base of an allocation range. From that pointer it gives you back the base of the allocation. My guess is that they do this by stepping back through buckets of size 64k. To make 2G of ram you need 32k chunks of 64k bytes. Each chunk has something like MEMORY_BASIC_INFORMATION, which is about 32 bytes. To hold 32k of those would take 1 MB. This is just pure guessing.

SetSystemFileCacheSize is interesting to me but I haven't explored it.

Oh, some people apparently have problems with DLL's that load to fixed addresses fragmenting virtual memory. It's an option in the DLL loader to specify a fixed virtual address. This is naughty but some people do it. This could make it impossible for you to get a nice big 1.5 GB virtual alloc or something. Apparently you can see the fixed address in the DLL using "dumpbin.exe" and you can modify it using "rebase.exe"

ADDENDUM : I found a bunch of links about /3GB and problems with Exchange Server fragmenting virtual address space. Most interestingly to me these links also have a lot of hints about the way the kernel manages the PTE's (Page Table Entries). The crashes I was getting with /3GB were most surely running out of PTE's ; apparently you can tell the OS to make more room for PTE's with the /USERVA flag. Read here :

The number of free page table entries is low, which can cause system instability
How to Configure the Paged Address Pool and System Page Table Entry Memory Areas
Exchange Server memory management with 3GB, USERVA and PAE
Clint Huffman's Windows Performance Blog Free System Page Table Entries (PTEs)

I found this GameFest talk by Chuck Walbourn : Why Your Windows Game Won’t Run In 2,147,352,576 Bytes that covers some of these same issues. In particular he goes into detail about the AGP and memory mirroring and all that. Also in Vista with the new WDDM apparently you can make video-memory only resources that don't take any app virtual address space, so that's a pretty huge win.

BTW to be clear - the real virtual address pressure is in the tools. For Oodle, my problem is that to set up the paging for a region, I want to load the whole region, and it can easily be > 2 GB of content. Once I build the bundles and make paging units, then you page them in and out and you have nice low memory use. It just makes the tools much simpler if they can load the whole world and not worry about it. Obviously that will require 64 bit for big levels.

I'm starting to think of the PC platform as just a "big console". For a console you have maybe 10 GB of data, and you are paging that through 256 MB or 512 MB of memory. You have to be careful about memory use and paging units and so on. In the past we thought of the PC as "so much bigger" where you can be looser and not worry about hitting limits, but really the 2 GB Virtual Address Space limit is not much bigger (and in practice it's more like 1.5 GB). So you should think of the PC as having a "small" 1 GB of memory, and you're paging 20 GB of data through it.

01-15-09 | Urg

The Netflix Plug-In for MediaPortal has been torpedoed. WTF. Apparently Netflix is switching all Watch Now users to Silverlight. I guess a Silverlight install gives you a guaranteed VC-1 streamer which is pretty good and avoids the codec disasters of normal Windows media playing, which leads us to : Silverlight works - Netflix lays off employees.

Also, state liquor stores suck. Obviously the state likes it because they make money on it, but they could make more money with just a liquor tax and an open market, and it would be much better for the consumer. The state should not be involved in running retail stores. In fact, it occurs to me that there must be some corrupt distributor who's really the only person profiting from this arrangement; the prices for scotch in the Washington Liquor store are outrageous - they're basically full retail price ; wtf nobody buys anything at retail price in the real world.

Anyway, I picked up a bottle of the Macallan Cask Strength 15. Everything was >= $50 so I figured with the cask strength I can drink less and get a better deal. Freaking Famous Grouse was $50; I used to get it at TJ's in CA for $25; it's a great blended scotch for $25, but for $50 the only person it's pleasing is the corrupt distributor.

01-15-09 | Gay Sex

I came home early today to do some errands and work from home and miss the traffic. It was actually somewhat sunny today for the first time in weeks, so I thought I should take a walk and get some fresh air while the opportunity exists. So I went for a stroll around Cap Hill and decided to cut through Volunteer Park as I often do.

Whoah. Volunteer Park is an anonymous gay sex hookup spot. I've often seen cars parked in this one spot and guys sitting in them but I never really put it together until today. I'll give explicit directions so that all you closeted Seattle gays who want an anonymous hookup can go check it out :

In the middle of the park is a reservoir. A one way road runs through the park ringing the reservoir. If you drive this road around to the west side of the reservoir, so you're down the hill in a sort of neglected spot with no view and lots of trees, there are a bunch of cars just randomly parked. In these cars are single men waiting for the hookup. They sit there and wait, and young men approach the cars and they make some signal if they're interested. There's a path on the hillside to the east (near the reservoir) that goes up hill to the big open field with the stage. The stage has bathrooms, and the men go in there. After the deed is done they walk back to their cars.

I've seen little bits of this before, but it was this last walk that I saw today which finally made me put it all together. I was walking up the path by the reservoir and saw two guys come out of the bathroom; they didn't say anything to each other though they glanced at each other. One walked ahead down the path, the other just stood there awkwardly a second to let some space between them, and then walked down himself. They both had that awkward look like "I didn't just do anything".

01-14-09 | Allocator Alignment

I like it when allocations of size N are aligned to the largest power of 2 that is <= N.

So eg. an allocation of 4000 bytes is aligned to 2048. It means you can do things like just malloc a 128-bit vector and it's aligned and you never have to worry about it. You never have to manually ask for alignment as long as the alignment you want is <= the size of your object (which it almost always is).

eg. if you want to malloc some MAT4x4 objects, you just do it and you know that they are aligned to sizeof(MAT4x4).
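Here is a minimal sketch of that rule, assuming C++17 so that the aligned forms of operator new and delete are available. NaturalAlignment, AllocNatural and FreeNatural are hypothetical names for illustration, not anything from a real allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Largest power of two that is <= n (n must be nonzero).
inline std::size_t NaturalAlignment(std::size_t n)
{
    assert(n != 0);
    std::size_t align = 1;
    while (align <= n / 2) // comparing against n/2 avoids overflow on huge n
        align *= 2;
    return align;
}

// Allocate 'size' bytes aligned to NaturalAlignment(size).
inline void * AllocNatural(std::size_t size)
{
    return ::operator new(size, std::align_val_t(NaturalAlignment(size)));
}

// Free a block from AllocNatural; the caller passes the same size back
// so the matching aligned operator delete can be selected.
inline void FreeNatural(void * ptr, std::size_t size)
{
    ::operator delete(ptr, std::align_val_t(NaturalAlignment(size)));
}
```

So AllocNatural(4000) comes back 2048-aligned, and a 64-byte matrix comes back 64-aligned without the caller ever asking for alignment. A real allocator would probably cap the alignment (say at the page size) so that huge blocks don't demand huge alignment.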

Is there any disadvantage to this? (eg. does it waste a lot of memory compared to more conservative alignment schemes?)

Also, I used to always do my malloc debug tracking "intrusively", that is by sticking an info header at the front of the block and allocating a bigger piece for each alloc, then linking them together. The advantage of this is that it's very fast - when you free you just go to (ptr - sizeof(info_header)).

I think I am now convinced that that is the wrong way to do it. It's better to have a separate tracker which hashes from the pointer address to an info struct. The big advantage of the "non-intrusive" way like this is that it doesn't change the behavior of the allocator at all. So things like alignment aren't affected, and neither is cache usage or optimization issues (for example if you're using a GC-type arena allocator and adjacency of items is important to performance).
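A rough sketch of the non-intrusive version, assuming a plain std::unordered_map keyed on the pointer. AllocTracker and AllocInfo are invented names, and this is not thread-safe; it only shows the shape of the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Debug info kept outside the block, so the block itself is untouched.
struct AllocInfo
{
    std::size_t  size;
    const char * tag; // eg. a file name or system name; assumed to be a literal
};

class AllocTracker
{
public:
    void * Alloc(std::size_t size, const char * tag)
    {
        void * ptr = std::malloc(size); // the real allocator runs unmodified
        if (ptr)
            m_live[ptr] = AllocInfo{ size, tag };
        return ptr;
    }

    void Free(void * ptr)
    {
        if (!ptr)
            return;
        std::size_t erased = m_live.erase(ptr);
        assert(erased == 1); // freeing something we never tracked
        (void)erased;
        std::free(ptr);
    }

    // Look up the info for a live pointer, or nullptr if it isn't tracked.
    const AllocInfo * Find(void * ptr) const
    {
        auto it = m_live.find(ptr);
        return it == m_live.end() ? nullptr : &it->second;
    }

    std::size_t LiveCount() const { return m_live.size(); }

private:
    std::unordered_map<void *, AllocInfo> m_live;
};
```

In this sketch the map still allocates from the normal heap, so it isn't truly zero-impact; to get that you would give the map its own allocator living in separate memory.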

In general now I'm more eager for debugging and instrumentation schemes like this which have *zero* effect on the behavior of the core functionality, but basically just watch it from the outside.

(For example on consoles where you have 256M of memory in the real consoles and an extra 256M in the dev kits, it's ideal to make two separate allocators, one in the lower 256 where all your real data goes and one for the upper 256 where all your non-intrusive debug extra data goes; in practice this is a pain in the butt, but it is the ideal way to do things, so that you have the real final version of the game running all the time in the lower 256).

01-14-09 | Random Thoughts

If you go through life with the attitude that other people are fuckers and retards, you will be inevitably driven to the conclusion that you are also a fucker and retard, and be unhappy. If you go through life believing that other people are basically good and have things to offer, then you will be happy. Nonetheless, I cannot give up the conclusion that most people are fuckers and retards.

When I see a left turn signal on the car in front of me, that is a signal for me to abruptly change lanes into the right lane without looking, lest I be trapped behind him.

As I mentioned with TV, the ease of getting specific programs on demand spoils a lot of the fun. The same has happened with video games, they're so much easier now and give more instant gratification, it really reduces your level of attachment and the feeling of satisfaction. Still, given the choice, no one would ever choose the harder way.

It's depressing to me how many hot young girls are in porn these days. You think I would be happy about it, but I'm not. Let's do some numbers : I estimate around 100,000 new girls do porn every year. I know that seems high but I think it might actually be an underestimate if you think about the massive amount of obscure B-grade specialized porn that normal people like me aren't even really aware of. Most of those girls are in the LA area. The greater LA area has around 10 M people in it. Half are girls, so 5 M. Many of those are obviously too old or too young, so let's guess 1 M girls around the right age. Only about 1 in 10 girls is hot enough to do porn. That means 100% of girls of the right age and hotness in the LA area are in porn. 100%.

I work myself too hard and get quite stressed. People ask, "why do you get all stressed? you're working alone at your own pace, you should be chilling!?" Stress comes from the annoying managers and producers and all those jerks, right? Hello, I have to work with *me* ! I am incredibly stressful to work with, if you had to work with me you would be stressed.

If Stefan doesn't win this Top Chef I'm going to be very upset. It especially upsets me that I care.

Men who don't masturbate are dangerous psychopaths.

01-14-09 | Go Away

Go away Apple Software Update Mother Fuckers .

The next time I have to fill in my name and email to post on some fucking blog I'm just not gonna post. Curse you Wordpress, you don't deserve my thoughts. Remember my freaking info!

I've been buying lots of junk for the house to try to make myself happy. I hate when people whine about things and don't take action so I've been trying to take action about everything. I've bought - a couch, blackout drapes, massagey seat, light box, air filters, new blanket, shower head, toilet seat, butcher block, window insulation, computer desk, winter/water clothes, and probably some other things I'm not remembering. I can't say that any of it has made much of a difference. I mean all of this slightly higher comfort level does nothing for happiness really. I think I'd rather have my $5000 back.

01-13-09 | Stupid Linking

Why does Wikipedia put links on the most uninteresting and irrelevant words ?

01-13-09 | Strings

I just had another idea for strings that I think is rather appealing. I've ranted here before about refcounted strings and the suckitude of std::string and bstring and so on. Anyway, here's my new idea :

Mutable strings are basically a vector< char > like std::string or whatever. They go through a custom allocator which *never frees*. What that means is you can always just take a c_str char * off the string and hold onto it forever.

Thus the readable string is just char *, and you can store those in your hashes or whatever. Mutable string is a String thingy that supports operator += and whatnot, but you just hold those temporarily to do edits and then grab the char * out.

So the usage is that you always just pass around char *'s , your objects all store char *'s, nobody ever worries about who owns it and whether to free it, you can pass it across threads and not worry. To make strings you put a String on the stack and munge it all you want, then pull the char * out and rock with that.

Obviously this wastes memory, BUT in typical gamedev usage I think the waste is usually microscopic. I almost always just read const strings out of config files and then never edit them.

One exception that I'd like to handle better is frequently mutated strings. For example, you might have something in a spawner that does something like this :

for(int variant=0;variant < numVariants;variant++)
{
    // char name[80];
    // sprintf(name,"spawnmobj_%d",variant);
    String name("spawnmobj_");
    name += variant;
}


I don't love making names programmatically like this, but lots of people do it and it's quite convenient, so it should be supported. With the model that I have proposed here, this would do allocs every time you spawn and memory use would increase forever. One way to fix this is to use a global string pool and merge duplicates at the time they are converted to char *. That way you don't ever increase your memory use when you make strings you made before - only when you make new strings.

With the string pool model, the basic op becomes :

    const char * StringPool::GetPermanentPointer( String & str );

in which the 'str' is added to the pool (or an existing one is found), and the char * you get back will never go away.
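A minimal sketch of such a pool, using std::string as a stand-in for the String class (which isn't shown here) and std::unordered_set for the storage. It relies on the fact that unordered_set never moves its elements on rehash, so the char * stays valid as long as the pool lives and nothing is erased. MakeSpawnName is a made-up helper mirroring the spawner loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

class StringPool
{
public:
    // Adds 'str' to the pool, or finds the existing copy.
    // The returned pointer is valid for the lifetime of the pool.
    const char * GetPermanentPointer(const std::string & str)
    {
        // insert() leaves the set unchanged if an equal string is already there
        return m_strings.insert(str).first->c_str();
    }

    std::size_t Count() const { return m_strings.size(); }

private:
    std::unordered_set<std::string> m_strings; // entries are never erased
};

// Hypothetical helper: build the name in a temporary, keep only the char *.
inline const char * MakeSpawnName(StringPool & pool, int variant)
{
    assert(variant >= 0);
    std::string name("spawnmobj_");
    name += std::to_string(variant);
    return pool.GetPermanentPointer(name);
}
```

Calling MakeSpawnName with the same variant twice hands back the same pointer, so repeated spawns stop growing memory. A nice side effect is that pooled strings can be compared by pointer.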

ADDENDUM : to be clear, this is not intended as an optimization at all, it's simply a way to make the programming easy without being too retarded about crazy memory use. (eg. not just making String a char [512])

01-13-09 | Sad

One idea I had was to search for sports news about shoulder injuries and see what doctor they mention in the articles. While doing that search I found this lovely bit of cheer :

But if pitchers with torn labrums were horses, they'd be destroyed

We can't rebuild them. Dr. Anthony Tropiano, a top baseball arm doc, says the best available treatment option today is to do nothing. "We call it conservative treatment," he says, "but that's just a euphemism for a little rehab and a lot of prayer."

I'm like Garfield Minus Garfield .

I wish I was more like Dick Proenneke .

Peep Show is painful and amazing. Alissa made some comment once about how it's not very realistic, the guys are just too big of losers - real guys aren't really like that, are they? Um, yeah they are, they pretty much hit the two male extremes directly on the head. (I guess they're missing the retarded cocky swinger douchebag type, but if you count Johnson that's covered too). I don't think girls can ever appreciate just how hilarious and perfect Peep Show is; the little sad offhand comments they make to themselves hit that "oh dear god it's just like me" comedy value. I especially love the subtle way that Mark (Mitchell) will say something really pathetic to himself but with that slight inflection in the voice that indicates it's really what he wants but isn't willing to fully admit it even to himself. Anyhoo, Season 5 is now available via torrents so go get it.

New BBC I've just got into :

Black Books - mmm it's a little bit broad for my taste (definitely reminds me a bit of the "Are You Being Served" style of British comedy), but not bad, definitely watchable, I've seen 3 episodes so far and will stick with it. The bit where the one guy absorbs the Book of Calm and becomes like jesus was pretty damn good (though rather similar to the Simpsons where Homer becomes like god). ADDENDUM : meh, I take back all the negatives, it's quite charming; I especially appreciate it while guzzling cheap wine.

The Mighty Boosh - pretty hammy and sometimes misses the mark (the whole Zoo owner character does nothing for me), but the two main guys are pretty good (particularly Julian Barratt as Howard Moon). The first episode with the Kangaroo Boxing was fucking gold, it literally had me screaming with laughter, the whole training sequence is money. The next two episodes have been kind of meh. I like the whole surrealism and muppet-show-esque style, but the song and dance numbers are pretty weak and Fringe-Festival-esque in a bad way.

Nathan Barley - mmm, I get that they're spoofing someone that's annoying, but it's still just annoying. It sort of reminds me of SNL-style humor, like "god look at how loud and annoying and awful these characters are!" ; umm, yes, but that's just loud and awful and annoying.

BTW I have noticed that downloading seasons of TV and watching them at my leisure does sort of ruin the magic of TV. In the old days, you had this crap-box that spewed out nonsense and filth, and once in a rare while you could turn to something that was surprising and actually really great. It was exciting. You would wait for your favorite show to be on at a specific time, or you might have the joy of flipping around and stumbling on something great. Now you just queue up what you want and play it. It kills the joy of the hunt, the surprise, and in the end the actual content isn't strong enough to carry itself. Also when you have the full season it's tempting to sit and watch one after another, but most TV shows just lose their impact when you get used to them, because they're very repetitive - they're much better if you only watch one episode and then wait a week to watch the next.

This is my TV "todo" list :

The Storyteller
Getaway in Stockholm
Boston Legal
West Wing
Larry Sanders
Band of Brothers
Foyle's War
Berlin Alexanderplatz

01-12-09 | LCD Monitors and Such

I've got a Samsung SyncMaster 2443BW at work. It's much brighter and higher contrast than the Dell 2405 I have at home. More importantly for me - it autoadjusts *way* faster and stays in adjustment much better (this is autoadjustment to an analog VGA source which I know is not that interesting to anyone but me any more). The Dell takes a long time to autoadjust, and it falls out of sync pretty quickly, even after a few hours of use the edges will get shimmery and I have to do it again. This actually is more significant than you might think, because when doing D3D and going fullscreen to different modes, it causes an autoadjust each time, which is just excruciating on the Dell but not bad on the Samsung.

I'm a bit disappointed in the Dell 30" at work ; I thought it was the creme de la creme of monitors, but the brightness & contrast are not very good and it also has a horrible squeal when you turn it up to full brightness. I did some searching on the internet and apparently this is a semi-common problem with LCD's in the power converter. Various models of LCD seem to have squeals at different brightness levels; supposedly some will squeal in mid ranges of brightness. I do tend to drive my monitors at higher brightness than usual, because I like my office to be super brightly lit.

In other consumer review news, I bought one of those $200 "sun frequency" fluorescent lights that are supposed to be good for "SAD" (the winter weather sucks syndrome). Holy crap is it a piece of shit. It's basically a $10 standard fluorescent bulb stuck in a cheap plastic case. Those guys must be rolling in profits.

It's retarded that windows doesn't cache network dir listings. I'm using Hamachi to get to my home music collection from work, which is mostly fine, but the root music dir has like 5000 folders in it. It takes a good 60 seconds or so to get the dir listing. That's fine as long as I remember to never double-click on a folder to browse into it, because if I do I lose the whole fucking listing of the big parent dir. They could easily store some kind of recent communication cache and do a simple ACK thing where my side says "hey you sent me this at time X can I just use what you sent before?" and the music server would go "yep, it's the same" and my side goes "cool bra".

In other music retardedness, WinAmp (and most music players) only have two options : 1. Read all song info at the time the songs are added to playlist, or 2. Read song info when it's played. BOTH of these are fucking retarded. If you choose #1 it makes song additions stall out horribly over a slow network. If you choose #2 it's even worse - song adding is now fast, but when you play through a playlist it ruins continuity and does a hard chunk when it switches songs. Of course the right answer is obvious - start reading song info when you add to playlist, but do it on a low priority background thread so it doesn't cause any hitches.

01-10-09 | Seattle Living

The weird couple across from us finally took out their Christmas tree a few days ago after it was thoroughly dead. Of course they just dragged it down the stairs and through the hall, so that it literally dropped every one of its needles. And of course they didn't clean up. Though they DID actually load the tree on their car and take it to the dump or something rather than just leave it on the street. They're a weird uptight couple who has some cracked-up ideas about responsibility. What they should do is just fucking leave the tree on the street and clean up the damn hall. Anyway, that's a minor annoyance, I don't mind the sound of crunching pine needles as I walk.

Upstairs neighbor is playing furniture moving today. I think he might be practicing for the World's Strongest Man competition, you know lifting furniture over his head and then dropping it on the floor. Sometimes I think he's bowling up there. He amazingly seems to both wake up at 6 AM and stay up until 2 AM. Most of the 2 AM and 3 AM furniture dragging has stopped since I talked to him, but last week he woke me up at 2 AM with some dragging and dropping. The dude is like a hundred pounds but he stomps like Godzilla.

The house nextdoor to us has this outside light that comes on at night ; it shines on my bedroom window right next to my head and it's literally brighter than the sun here (not hard to do since the sun is basically never visible). I go from bleak dark gray day to a night time of slams and bangs and drags and a bright light in my face in bed. My body's clock is running upside down and backward.

The traffic last week was some of the worst I've seen anywhere ever, largely because of lots of flooding and road closures related to the weather. It took me about an hour to get home several times - and that was leaving work at 3:00 to try to avoid the worst of it. Commuting in the rain is just excruciating. It's hard to see, people are driving like fuck-tards as usual, they do ass-licking moves like slam on the brakes for no reason or drive right beside me for no reason so I'm constantly stressed out trying to get away from people. Some of the days I would be 10 minutes into the trip and it would be all dark and rainy and trafficy out and knowing that was another 30+ minutes of this would just make me want to cry. It's the knowing you have to just sit and get through it that kills me. And then every single night of every single day I know I have to get up in the morning and do it again. Anyway, traffic and commuting in the rain can blow me.

I made a huge mistake deciding to live on the west side, apparently, I should've rented a house right near work. I knew if I did that I would never go out and I would be bored, but I never go out anyway and I'm just miserable commuting.

I hate the way I wake up with all kinds of ideas of stuff to do in the code, feeling excited, and by the time I get to work my brain is full of nothing but "fuck you, fucking retard, god damn it, fuck fuck, stupid dick get off the fucking road, jesus christ you fucking retard". It's really not a healthy state to be in and it kills all my enthusiasm for the day. One of the main times I find myself screaming here is any time there's a merge. My god I have never seen a population which is so collectively bad at merging. People will literally come to a complete stop on freeway onramps right next to the traffic. Not only does this completely fuck up the flow of the right lane and make the whole merge traffic scenario much worse, god help you if you're stuck behind one of these retards - you find yourself at a complete stop on the onramp. I'm now leaving tons and tons of space in front of me when I get on onramps - like I will actually just come to a stop at the beginning of the onramp and wait for the fuck-head in front of me to get on the freeway before I even start my approach, so that I can be sure I can do it smoothly. Still I often find myself yelling "fucking merge! don't fucking brake you fucking cock, don't you dare fucking stop on the onramp!".

Some days I feel like it's all I can do to shower, eat breakfast, get dressed, commute, eat lunch, try to stretch a little or at least go for a walk, shave, commute home, eat dinner. Ugh. That was an exhausting day. Oh wait, I need to actually get some work done too !? And of course there's a million other fucking things on my todo list that just all seem like too much like exercise, fix shoulder, eye doc, need dentist, shave, clean house, practice guitar, be nice to girlfriend, etc. Fuck that shit.

I literally don't understand how all the stupid suckers out there do it. Most people live in disgusting housing out in the fucking sticks and commute for hours in shitty cars, do horrible jobs like wash dishes or telemarket, commute again, they have to work sucker hours that puts them right in rush hour, they eat some shitty frozen dinner, they're fat and ugly and have no talents, what the fuck are you living for? how do you do it? My god.

I have a very thin connection to the ordinary way of life. Like if I had to be a factory worker in the industrial revolution or something like that I would either kill myself or become an outlaw or a gambler or move to the South Seas and become a pirate or something. I've never believed in the "honor" of slogging through a miserable life. It's just stupid and weak, it's not "respectable" to muddle through a miserable job for 40 years.

(I think I've picked up saying "literally" way too much from David Mitchell. He can literally just add the word "literally" to anything and it becomes funny).

ADDENDUM : ARG! WTF ARE YOU DOING! Quit dragging shit around on the floor above me and dropping shit and stomping around. My fucking god I'm going to stab you with a spoon!

Sometimes late at night I like to check the traffic flow map page just so I can see it be all green. MMmmm green.

01-10-09 | Simple Image

libPNG is such overcomplicated shizzle. I could easily make a little image format that was just BMP with a LZ that was like maybe 1000 lines of code and just put it in a header, STB style. Hell I could toss in a lossy wavelet image coder for another 1000 lines of code. It wouldn't be the best in the world, but it would be good enough and super simple. Unfortunately I guess there's no point cuz it's not a standard and whatnot. (Using arithcoding instead of Huffman is part of what could make both of those so easy to write).

WIC is obviously a good thing - it's retarded that every app has its own image readers & writers (I mean fucking Amiga OS did this perfectly back in 1992 so get with the fucking last century bitches). On the other hand, the fact that it's closed source and MS-only and in fact requires .NET 3.5 makes it pretty much ass.

A simple open-source multi-platform generic image IO library that's based on pluggable components and supports every format is such an obvious thing that we should have. It should be super simple C. You shouldn't have to recompile apps to get new plug-ins, so it should be DLL's in a dir in Windows and whatever similar thing on other OS'es. (but you should also easily be able to just compile all the plug-ins into your app as a static lib if you want to do that).

One thing that mucks it up is that many of the image formats allow all kinds of complicated nonsense in them, and if you want to support all that then your API starts getting really complicated and ugly. Personally I'm inclined to only support rectangular bit-plane images (eg. N components, each component is B bits, and they are interleaved in one of a handful of simple ways, like ABCABC or AAABBBCCC ).
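To show how little API that restricted model needs, here is a sketch of the two layouts as a single indexing function. ImageDesc, Layout and SampleIndex are hypothetical names, and it assumes every sample is stored in one array element (B bits rounded up to whatever element type you pick).

```cpp
#include <cassert>
#include <cstddef>

enum class Layout
{
    Interleaved, // ABCABC : all components of a pixel are adjacent
    Planar       // AAABBBCCC : each component is its own full plane
};

struct ImageDesc
{
    std::size_t width;
    std::size_t height;
    std::size_t numComponents; // N
    Layout      layout;
};

// Index into the flat sample array for component c of pixel (x,y).
inline std::size_t SampleIndex(const ImageDesc & img,
                               std::size_t x, std::size_t y, std::size_t c)
{
    assert(x < img.width && y < img.height && c < img.numComponents);
    const std::size_t pixel = y * img.width + x;
    if (img.layout == Layout::Interleaved)
        return pixel * img.numComponents + c;
    return c * (img.width * img.height) + pixel;
}
```

A plug-in only has to fill in an ImageDesc and hand back a flat buffer; everything format-specific stays behind that, which is what keeps the API from getting ugly.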

All compressors unfortunately have this problem that they start off super simple but become huge messes when you add lots of features.

ADDENDUM : Oh fucking cock hole. I put a PNG loader in my image thing and now all my apps depend on libPNG.dll and zlib.dll , so I have to distribute those with every damn exe, and worry about dll's being in path and so on. They also cause failures at startup if they're not found, when really they should just fail if I actually try to load a PNG. (of course I could do that by doing LoadLibrary by hand and querying for the functions, but I have to call like 100 functions to load a PNG so doing that would be a pain). Urg bother.

01-09-09 | Image Stuff

Most of this is related to RAW. The GUILLERMO LUIJK stuff in particular is very good. dcraw seems to be the best freeware raw importer, but god help you working with that. UFRaw and LibRaw are conversions of dcraw into more usable forms, though they tend to lag his updates. I've given up on WIC because I can't get it (the new Windows SDK) to install on all my machines.

The commercial RAW processors that I've looked at are so freaking slow, this is definitely a problem that could use some bad ass optimizer programmer love. Hell even just the viewers are slow as balls.

ImageMagick is pretty cool BTW ; it's really similar to the old DOS program "Image Alchemy" which I used to use lots in the 386 days. It's all command line so you can set up batch files to do the processing you want on various images.

DCRAW and mods :

Dave Coffin's Home Page
About LibRaw LibRaw
UFRaw - Home
RAWHide Image Converter
RAW decoder comparison dcraw vs. dcraw_ahd vs. Bibble

Good articles on photos :

GUILLERMO LUIJK  -  b i t s & p h o t o g r a p h y
perfectRAW 0.65 Buscando la perfección
Photographic Tone Reproduction

Windows Imaging Components (WIC) :

How to Write a WIC-Enabled CODEC and Get Full Platform Support for Your Image Format
Windows with C++ Decoding Windows Vista Icons with WIC
Windows Imaging Component Overview
WIC-Enabled Codecs C++ Tutorial Loading an Image File
Canon download WIC codec

Other MS junk :

Microsoft Research Image Composite Editor (ICE)
Microsoft Professional Photography SyncToy v2.0
Microsoft Professional Photography Downloads RAW SyncToy ProPhoto Shoot
Microsoft Professional Photography Codecs

Other misc image stuff :

LibTIFF - TIFF Library and Utilities
ImageMagick Convert, Edit, and Compose Images
Framewave Project Homepage
FastStone Image Viewer - Powerful and Intuitive Photo Viewer, Editor and Batch Converter

DNG (Adobe Digital Negative) :

DNG specification
DNG ProfilesEditor - Adobe Labs
Adobe - Digital Negative (DNG)

01-09-09 | Dump

My left shoulder is fucked up; I'm not sure what re-aggravated it, but it's very painful inside, and I can't put any pressure on it. (I can't lie on my left side any more when I sleep, which means I pretty much have to lie on my right side all night, which is pretty rough). I'm pretty sure I tore my labrum in my left shoulder last year, and of course the right side has never healed right from my old bike crash. SIGH. I really need to work out. I need sun and running around and sweating and muscles and sex and dancing and booze and cooking and nature.

So I'm thinking about going back to a doctor, but I dunno I have almost zero hope that they'll actually do anything good. I think I could maybe use some arthroscopic surgery to shave off scar tissue in the shoulder sockets or something like that, but I have no confidence in any doctor and no way to look up their performance records. If I could spend some amount of money and be reasonably sure something would be done, I would gladly do it, but I hate the idea of spending a fortune on more MRI's and getting nothing out of it.

I'm really bothered by the whole pay model of contractors, doctors, mechanics, etc. Basically the *worse* job they do, the more you pay, because you have to keep going back, or they have to take a long time. Somebody who just gets the job done quickly the right way gets paid the least. That's fucked. The pay should be based on the *results*. They should have to give you an estimate for the results you want, and then if there are complications that's too bad for them, they have to deal with it and do whatever is necessary to get it done.

I started looking at the Orthopedic Surgeons around Seattle, but FUCK they just give you zero information. As a consumer trying to find a good doctor I have no resources. I don't want somebody who works with old people to just give them back basic function, I want someone who's a world expert on shoulders, and who works with athletes to restore them to 100% function. I don't want someone who will manage the problem, but someone who will fix it.

Anagram Hall of Fame ; some gold in there. Anagrams for me :

Balms Cool Her  (meh)

Lab Color Mesh  (a decent summary of my life's work)

Blah Colors Me  (my personality)

Local Hombres   (meh)

Balls Moocher   (yep)

Call Bro Homes  (advice for New Yorkers moving to East LA)

lots of Google features they keep secret ; most of these are just Calculator and not interesting, but stuff like flight status is a custom feature that's almost like an "easter egg" this shit is so secret.

Glenn Marshall has some okay semi-algorithmically generated videos . SubBlue has some cool generated images.

Cool animations of sorting algorithms also : H.W. Lang 's sort pages have much better algorithm descriptions and also cool visualization applets at the bottom of each page. Though you should ignore the text in both of these and just look at the pictures. And just use "Introsort" because it's so fucking bad ass.

Neave Strobe - "...like dropping acid, but not".

01-09-09 | LRB Video

I'm kind of excited about the possibilities for video compression with Larrabee. The chip is very powerful and flexible and obviously well suited to video. Having that much power in the decoder would let you do things that are impossible today - mainly doing motion-comp on the decoder side. That lets you achieve the old dream of basically not sending motion vectors at all (or just sending corrections from what's predicted).

In fact, it would let you send video just more like a normal predictive context coder. For each pixel, you predict a probability for each value. That probability is done with context matching, curve fitting, motion compensation etc. It has to be reproduced in the decoder. This is a basic context coder. These kind of coders take a lot of CPU power, but are actually much simpler conceptually and architecturally than something like H264. Basically you are just doing a kind of Model-Coder paradigm thing which is very well understood. You use an arithmetic coder, so your goal is just to make more accurate probabilities in your model.
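To make the "your goal is just to make more accurate probabilities" point concrete, here is a minimal sketch. It does not implement an arithmetic coder; it just adds up the ideal cost of -log2(p) bits per symbol that an arithmetic coder approaches. The `BitModel` and `IdealCodedBits` names are made up for illustration, and the model is a trivial adaptive bit counter standing in for the real context matching / motion compensation predictor.

```cpp
#include <cmath>
#include <vector>

// Hypothetical adaptive binary model: counts of 0s and 1s seen so far.
// The decoder can run the exact same model in lockstep, so nothing
// about the model needs to be transmitted.
struct BitModel {
    double n0 = 1.0, n1 = 1.0;  // start with one of each so p is never 0
    double P1() const { return n1 / (n0 + n1); }
    void Update(int bit) { if (bit) n1 += 1.0; else n0 += 1.0; }
};

// Ideal coded size in bits: sum of -log2(probability given to the
// symbol that actually occurred). Better predictions = fewer bits.
double IdealCodedBits(const std::vector<int>& bits) {
    BitModel m;
    double total = 0.0;
    for (int b : bits) {
        double p = b ? m.P1() : 1.0 - m.P1();
        total += -std::log2(p);
        m.Update(b);
    }
    return total;
}
```

With this model a run of 16 zero bits costs log2(17) bits (about 4.1) instead of 16, because the model learns the skew as it goes. A video coder of this kind is the same loop with a much smarter predictor.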

Other slightly less ambitious possibilities are just using things like 3d directional wavelets for the transform. Again you're eliminating the traditional "mocomp" step but building it into your transform instead.

Another possibility is to do true per-pixel optical flow, and frame to frame step the pixels forward along the flow lines like an incompressible fluid (eg. not only do the colors follow the flow, but so do their velocities). Then of course you also send deltas.

Unfortunately this is all a little bit pointless because no other architecture is anywhere close to as flexible and powerful, so you would be making a video format that can only be played back on LRB. There's also the issue that we're getting to the point where H264 is "good enough" in the sense that you can do HD video at near-lossless quality, and the files may be bigger than you'd like, but disks keep getting bigger and cheaper so who cares.

01-07-09 | Direct X WTF

The DX9 / DX10 issue is a pain in the butt. I guess it's kind of nice that they wiped the slate clean, but it means that you have to support both for the foreseeable future, and they're very different. For a sweet little while now you've been able to get away with just supporting Dx8 and then just supporting Dx9 , which sure is nice for a small indie game developer.

Now every app has to try both and support both and fuck all.

I *really* can't believe they changed the half pixel convention. I know a lot of people think it was "wrong" and now it's "right" , but IMO there is no right or wrong, it's just a fucking convention (it was a small pain that GL and D3D did it differently). The only thing that matters in a convention is that you pick one way to do it, document it clearly, and then *keep it the same* WTF hyachacha.

Rasterization rules DX9

Rasterization rules DX10

Anyway, I guess it's about time I look into the improved DX10 resource management capabilities. In particular, the ability to do resource updates on a thread, so that hopefully all that horrible driver resource swizzling will be done off the main thread, and secondly better support for subresource updates (which theoretically lets you page mips of textures and such).

Mmm update : apparently the good multithreaded resource management isn't until DX11 :(

And I'm a little unclear whether DX10 for XP even really works? The real features seem to require Vista (?). Mmm.. seems like it's not worth the trouble, between XP and not having hardware that will run DX10 at home, I give up.

I also had the joy of discovering that to even compile stuff without the new Platform SDK you have to do this :

// CB : VS2003 patches :
// (stub out the SAL annotation macros that the old headers don't define)
#if 1
#define __in
#define __out
#define __inout
#define __out_bcount_opt(val)
#define __in_bcount_opt(val)
#define __in_opt
#define __out_opt
#define __inout_opt
#define __in_ecount(val)
#define __in_ecount_opt(val)
#define __out_ecount(val)
#define __out_ecount_opt(val)
#endif

Yay. Well, that was a waste of time. It sucks cuz texture updates take for freaking ever in Dx9 and lower.

Maybe we can just ignore Vista and DX10 and just wait for Windows 7 and DX11. At that point to just build a "hello world" app you'll have to install the "Managed Hello World Extensions" which is a 2 GB package.

I've had a very disappointing night of not finding disappeared video games.

First off I found that Paul Steed's masterwork Mojo Master was "discontinued" by Wild Tangent. Waah.

Now I just learned that Shizmoo and Kung Fu Chess are GONE !? WTF !? I was thinking about dusting off my KFC skills, but I guess I won't.

I had an idea for "Stratego Chess" - meh it's pretty self explanatory.

"DownLoadThemAll" has the option to set the file's modtime to the server's time for the file. This option is ON by default. That's so fucking retarded. I download papers, and then go to the direct and sort by "last modified" and expect it to be at the bottom ... NO! It's got a date from fucking 2002 or something. Modtime should always be the last modtime *on my machine*. One thing that I've always been unsure about is whether "copy" should copy the modtime data or whether it should make the dest's modtime now - by default it moves the modtime, which is another reason why just using modtime is not strong enough for Oodle - when somebody copies new data on top of an existing file, the modtime can actually go *backward*. Oodle with file hashes now sees that the modtime has changed and updates the file hash.

01-07-09 | Oodle Rambling

One of the hard things with turning resources into bundles is that it's hard to define an exact metric to optimize. Generally with computers if you can come up with a simple fast measure of whether a certain configuration is best, then you can throw various techniques at it and do well under that metric (see for example all the work on the Surface Area Heuristic for geometry).

Anyway, the problem with bundling is the optimal setup depends very much on how they're used.

If you just do a standard "load everything and immediately stall" kind of whole level loading, then it's pretty easy. In fact the optimal is just to jam everything in the level into one bundle.

On the other hand, if you actually do paging and drop bundles in and out to reduce memory load, it's harder. Finest grain paging minimizes your memory use, because it means you never hold a single object in memory that isn't needed. In that case, again it's easy to make optimal bundles - you just merge together resources which are always either loaded or not loaded at the same time (eg. the list of "sets" which want them is identical).
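The finest-grain merge rule can be sketched in a few lines: key each resource by the exact list of sets that want it, and everything with an identical key goes in one bundle. `MakeFinestBundles` and the input layout are assumptions for illustration, not how Oodle actually stores this.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using SetList = std::set<std::string>;

// Input: for each resource name, the list of "sets" that want it.
// Output: one bundle (list of resource names) per distinct set list.
std::vector<std::vector<std::string>> MakeFinestBundles(
    const std::map<std::string, SetList>& wantedBy)
{
    // Resources with an identical set list are always loaded or
    // unloaded together, so they can share a bundle at no memory cost.
    std::map<SetList, std::vector<std::string>> bySignature;
    for (const auto& kv : wantedBy)
        bySignature[kv.second].push_back(kv.first);

    std::vector<std::vector<std::string>> bundles;
    for (const auto& kv : bySignature)
        bundles.push_back(kv.second);
    return bundles;
}
```

For example, if "rock" and "tree" are both wanted by {level1, level2} and "boss" only by {level2}, you get two bundles: {rock, tree} and {boss}.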

More generally, you might load a big chunk for your level, then page pieces. To optimize for that you want the level load to be a big fast chunk, then the rest to be in pages. Also, you might not want the pages to be all finest possible grain, if you have a little memory to waste you probably want to merge some of them up to reduce seeks, at least to merge up many very small resources together.

One of the main ways to make bundles for Oodle is just to let Oodle watch your data loads and let it make the bundles for you. (if you don't like that, you can always completely manually specify what resource goes in which bundle). In order to make it possible for Oodle to figure out a good bundling you can also push & pop markers on a stack to define "sets" and maybe also do some tagging.

One thing that's occurred to me is that even the basic idea of making bundles to just load the data fast is dependent on how you use it. In particular, when you start trying to use the resources, and whether they always work as a group, etc.

For the straight loading path, I believe the key feature to optimize is the amount of time that the main thread spends waiting for resources; eg. make that time as small as possible. That's not the same as maximizing throughput or minimizing latency - it depends on the usage pattern.

For example :

Case 1.

Request A,B,C
... do stuff ...
Block on {A,B,C}
Process {A,B,C}

Case 2.

Request A,B,C
... do stuff ...
Block on A
Process A
Block on B
Process B
Block on C
Process C

These are similar load paths, but they're actually very different. In Case 1 the bundle optimizer should try to make {ABC} all available as soon as possible. That means they should be blocked together as one unit to ensure there's no seek between them, and there's no need to make them available one by one. In Case 2 you should again try to make ABC linear to avoid seeks, but really the most important thing is to make A available as quickly as possible, because if there is some time needed to get B it will be somewhat hidden by processing A.
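A toy model of the two cases shows why the main-thread stall is a different metric from throughput. Everything here is hypothetical (the `Resource` struct, the function names, the idea that each resource has a single known arrival time); it just counts how long the main thread sits blocked.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical description of one request: when its data finishes
// arriving, and how long the main thread spends processing it.
struct Resource {
    double arrival;
    double processTime;
};

// Case 1: block until everything has arrived, then process.
// Stall is just the wait for the last arrival.
double StallBlockOnAll(double now, const std::vector<Resource>& rs) {
    double last = now;
    for (const Resource& r : rs) last = std::max(last, r.arrival);
    return last - now;
}

// Case 2: block on each resource in order, process it, then block on
// the next. Processing time hides the arrival of later resources.
double StallBlockOneByOne(double now, const std::vector<Resource>& rs) {
    double stall = 0.0;
    for (const Resource& r : rs) {
        if (r.arrival > now) {
            stall += r.arrival - now;
            now = r.arrival;
        }
        now += r.processTime;
    }
    return stall;
}
```

Say the main thread is done with "do stuff" at t=10, and A, B, C arrive at t=12, 14, 16 with 3 units of processing each. Case 1 stalls for 6; Case 2 stalls for only 2, because B and C land while A and B are being processed. Same disk layout, same throughput, very different stall.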

Anyway, I'm kind of rambling and I'm not happy with all of this.

I've got a lot of complication currently because I'm trying to support a lot of different usages. One of those usages that I've been trying to support is the "flat load".

"Float Load" = a game resource compiler knows how to write the bits to disk exactly the way that the game wants them in memory. This lets you load up the resource and just point directly at it (maybe fix some pointers internal to the resource, but ideally not). This was a big deal back in the old days; it allows "zero CPU use" streaming - you just fire an async disk op, then when it's done you point at it. We did this on Xbox 1 for Stranger and it was crucial to being able to seamlessly page so much of the game.

This is obviously the super fast awesome way to load resources, but I don't think that many people actually do this any more. And it's less and less important all the time. For one thing, almost everybody is compressing data now to have better bandwidth, so the "flat load" is a bit of a myth - you're actually streaming the data through a decompressor. Almost every system now is multi-core, both on PC's and consoles, and people can afford to devote maybe 25% of one core to loading work, so the necessity of having "zero CPU use" streaming which the Flat Load offers is going away.

Anyway, the reason the Flat Load is so complicated is because it means I have to support the case that the application is pointing into the bundle memory. That means a lot of things. For systems with "cpu" and "gpu" separate memory regions, it means I have to load the pieces of the bundle into the right regions. It means I need to communicate with the app to know when it's okay for me to free that memory.

It also creates a huge issue for Bundle unloading and paging because you have to deal with the "resource transfer" issue. I won't even get into this, but it's the source of much ridicule from Casey and the cause of some of our biggest pain at Oddworld.

Anyway, I think I might get rid of the "Flat Load" completely and just assume that my job is just to page in the resource bits, and the game will spin on these bits, and then I can throw them away. That lets me make things a lot simpler at the low level, which would let me make the high level easier to use and neater.

I think my selling point will really be in all the neat friendly high level tools, like the load profiler, memory use visualizer, disk-watcher for demand loading, console file transfers, all that kind of jazz.


"Data Streaming" = Bringing in bits of data incrementally and processing or showing the bits as they come (eg. not just waiting until all the data is in before it is shown). eg. Videos "stream".

"Data Paging" = Bringing in and out bits of data based on what's needed. When you run around a big seamless world game like GTA it is "paging" in objects - NOT STREAMING.

01-06-09 | Poker Blue Balls

ZOMG ZOMG I'm so excited for this :

Ivey accepts Durrr Challenge (article)
Tuesdays with Ivey (podcast)

Durrr is the biggest online winner ever, a young kid, 2+2er, phenom. He plays nosebleed hold'em and PLO, mainly short-handed; he plays Ivey and Antonius and such all the time, and they weren't giving him as much action as he wanted, so he challenged them to play 50k hands, with a sidebet of $1.5 million : $0.5 million to encourage them to play. That's so insanely bad ass. BTW Durrr is a freak and a very weird dude.

ADDENDUM : Antonius accepts too .

From what I can tell, Antonius plays a pretty standard TAG style, mixes up his ranges, is very good, and is very aggressive and not afraid.

Ivey plays tighter than many of the high stakes regs; when he gets in big pots he seems to be either bluffing wisely or have the goods. He does do some slightly weird things that almost no one else does, like he almost doesn't seem to play by "ranges" at all, at times it seems like he's playing no-look poker.

Durrr is one of the weirdest good poker players I've ever seen. He plays super super LAG , his image is just nutty. He plays more big pots than anyone (except people who are just clearly nuts like Ziig). He can bluff huge in almost any spot, he can bet huge for value in any spot - even with super weak hands like a pair of 3's or ace high , and he can call down huge light with almost nothing.

Durrr's style freaks out a lot of people and they don't know how to play against it (god knows I wouldn't). A lot of people try to just "trap him" because he's "so nutty" , but the whole thing about Durrr is he hand reads well, so he knows when you have a hand that's trying to trap (so he doesn't bluff) and he knows when you are weak. He plays super weird in the early rounds - he can do things like limp the button preflop which normal people never do, he will often check three streets out of position, which is again very odd, but he plays really well on the river when the pot is big.

Durrr is like the epitome of something I wrote about long long ago re: poker -

If you are a better player, then you want to play lots of big pots all the time.  Say you have a 1% edge in each pot, your profit is
maximized by playing huge pots all the time.

(assuming you're betting roughly pot size all the time) - the early decisions are in small pots so not as important, the really
key decisions are in later streets when the pot is huge.

On early streets, you want to be jamming the pot as often as possible, because that builds the pot for later streets, and lets you
make a big decision in a larger pot, which means a bigger win for you.

In other news, I kind of want a Snuggie .

Blankets are OK but they can slip and slide, plus your hands are trapped inside.

I can't believe fucking MSI's still can't be run from substed drives, I have to copy them to c:. WTF

To build UFRaw you just need to install Cygwin and these packages :

    * gimp-dev (at least version 2.2)
    * gtk+-dev (at least version 2.6)
    * glib-dev
    * pango-dev
    * atk
    * gettext
    * libiconvi
    * liblcms (at least version 1.13)
    * libjpeg
    * libtiff
    * libpng
    * zlib
    * libbz2
    * exiv2 (at list version 0.11)
    * gtkimageview (at list version 1.3)

No problem!

I'm trying to get some basic RAW image support for my image tools. All I want is to be able to read & write some simple standard RAW format, and then I could use other tools to convert my camera's raws to the standard format. Ummm.... yeah...

So, there seems to be no actual standard RAW format. And the ones that do exist (like DNG, HD Photo, etc.) are all incredibly complicated. Many of the camera RAWs are based on TIFF which was a fucking retarded thing to do. TIFF is based on LZW which is like the worst compression algorithm ever. It sucks at compressing, it's complex to implement, and it runs very slow. Something like 16-bit PNG would have been totally fine.

ADDENDUM : BTW one of the retarded things about this is that if everyone just used a bitplane based embedded image format (like JPEG2000 or any other bitplane based wavelet) then all this HD bit depth nonsense would be a non-issue. Boom you just send more bits. The compression is also invariant under just scaling up all your values to more bits which is a good kind of invariance to have in an image format.

Also, I tried using Adobe's DNG Converter to convert the raw CR2's that Aaron sent to me. It seemed to work - except that on two of the images it just mysteriously produced zero byte output files. Yay.

There's no DNG WIC codec for Windows XP or Vista 64 !? WTF. DNG is supposed to be the most standard RAW format, and WIC is supposed to be the good new way to work with image formats, and yet the two do not meet (there is a free Adobe DNG WIC codec for Vista 32, and a commercial one from Ardfry for the other platforms).

Actually the whole WIC situation is a disaster; I'm finding it really hard to get a setup that can even use WIC's in Windows XP, much less develop code for them. (part of the problem is I'm stuck on Visual Studio .NET and the freaking new "Windows SDK" crashes if you try to install it without Visual Studio 2008).

Most of the free raw code out there is based on dcraw which is a horrible UNIXy plain C monstrosity that has usage similar to the old "cjpeg" type of stuff.

dcraw also contains this gem :

Why don't you implement dcraw as a library?

I have decided that dcraw shall be a command-line program written in C,
and that any further abstraction layers must be added around this core,
not inside it.

Library code is ugly because it cannot use global variables. Libraries
are more difficult to modify, build, install, and test than standalone
programs, and so are inappropriate for file formats that change every day.

There's a simpler way to make dcraw modular and thread-safe: Run it as a
separate process. Eric Raymond explains this technique here. 

Aka "screw you guys, I gave you awesome free code, take it the way it is and quit your whining".

01-05-09 | Seattle Food

DAMN ! I've been wanting to go La Spiga for a while, and in fact we wanted to go last night but they're closed on Sunday (like so many places; it's annoying how so much is closed Sunday here, it's one of my favorite nights to go out to eat). Today I pop on the Iron Chef on the TiVo, and lo... it's fucking Chef Tinsley from La Spiga. Now they're going to be all popular. Sigh.

Anyway, both her performance and Flay's on the Bean Battle were very poor. @@@ ...

Mitchell & Webb is probably the best show currently not available in the US.

01-05-09 | Paging

Somebody sent me this paper :

"I3D09 A Novel Page-Based Data Structure for Interactive Walkthroughs.pdf"

a little while ago. The basic ideas in it are totally fine, but they page on *4K* granularity. That's so completely retarded that it invalidates the whole paper.

There's a natural "critical size" that you can figure for paging which is the only possible combination of your fundamental variables. (this is sort of a physics style argument - there's only one way you can combine your variables and get the right units, therefore it must be the right answer, up to a constant multiplier).

Your two variables inherent to the system are seek time and throughput speed of your disk. You combine them thusly :

typical hard disk :

seek time 10 ms
load speed 50 MB/sec

10 ms = X / (50 MB/sec)

(10/1000) sec = X / (50 MB/sec)

X = (10/1000) sec * 50 MB/sec

X = 0.5 MB

typical DVD :

seek time 100 ms
load speed 5 MB/sec

X = 100*5/1000 MB = 0.5 MB

Pleasantly, for both DVD and HD paging the critical paging unit size is close to the same !! Note that I'm not saying 0.5 MB is actually the ideal size, there may be some constant on there, so maybe it's 0.1 MB or 1.0 MB , but it's certainly somewhere close to that range (the exact optimum depends on a lot of things like how important latency vs. throughput is to you, etc).
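The same arithmetic as a tiny sketch, plus one extra number that makes the 4K complaint concrete: the fraction of time the disk spends actually transferring if every page costs one seek. The function names are made up, and "one seek per page" is a simplifying assumption.

```cpp
// Critical page size in MB: the amount of data you could have read
// in the time one seek takes. seekMs in milliseconds, rate in MB/sec.
double CriticalPageMB(double seekMs, double mbPerSec) {
    return (seekMs / 1000.0) * mbPerSec;
}

// Fraction of total time spent transferring (vs. seeking) when each
// page of pageMB costs exactly one seek. Assumes no caching or
// contiguous reads, which is the worst case for small pages.
double TransferEfficiency(double pageMB, double seekMs, double mbPerSec) {
    double transferSec = pageMB / mbPerSec;
    double seekSec = seekMs / 1000.0;
    return transferSec / (seekSec + transferSec);
}
```

At the critical size the efficiency is 50% by construction (seek time equals transfer time). With the hard disk numbers above and a 4K page (4/1024 MB), the efficiency comes out under 1%, which is the whole problem with paging at that granularity on a seeking device.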

Of course, Solid State Disks change that in the future. They are really just like an extra level of slower RAM, and they make fine-scale paging practical. The possibilities for infinite geometry & infinite texture are very exciting with SSD's. Unfortunately you won't be able to target that mainstream for a long time.

BTW : the Giga-Voxels paper is sort of interesting. The basic method of using a tree with indirections to voxel bricks is an old idea (see earlier papers) - the main addition in the GigaVoxels paper is the bit to have the GPU pass back the information about what new pages need to be loaded, so that the render gives you the update information and it's just one pass. That bit is really ugly IMO with the current GPU architecture, but it's progress anyway. Personally I still kinda think voxel rendering is a red herring and surface reps are the way to go for the foreseeable future.

Jeff had a good idea for streaming loads a while ago. A lot of people play a Bink movie when they do level loads. During that time you're hitting the IO system hard for your level data and also for your movie load. Right now that causes a lot of seeking and requires big buffering for the movie data and so on. With Oodle doing the IO for Bink and for your level load, we could create an interleaved load stream that has like [200 ms of Bink Data][some level load data][200 ms of Bink data][ ... ] ; that eliminates all seeking and lets the DVD just jam fast.
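A rough sketch of how you might size the pieces of that interleaved stream: in each time slot the disc can deliver a fixed budget, the movie needs its share to keep playing, and whatever is left over is level data. `Slot`, `PlanSlot`, and the bitrates used in the example are all assumptions for illustration, not anything Bink or Oodle actually does.

```cpp
// One slot of the interleaved stream: how many MB of movie data and
// how many MB of level data get laid out back to back.
struct Slot {
    double binkMB;
    double levelMB;
};

// slotSec: playback time covered by each slot (eg. 0.2 for 200 ms).
// discMBPerSec: sustained linear read rate with no seeks.
// movieMBPerSec: movie bitrate (hypothetical; depends on the video).
Slot PlanSlot(double slotSec, double discMBPerSec, double movieMBPerSec) {
    Slot s;
    s.binkMB = movieMBPerSec * slotSec;       // what the movie consumes
    double budget = discMBPerSec * slotSec;   // what the disc can deliver
    s.levelMB = budget > s.binkMB ? budget - s.binkMB : 0.0;
    return s;
}
```

With a 5 MB/sec DVD and a made-up 1 MB/sec movie, each 200 ms slot holds about 0.2 MB of Bink data followed by about 0.8 MB of level data, and the drive never has to seek between the two.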

In fact, in the ideal Oodle future that might not even be a special case. The Bink data could just be split into packets, and then it could all just be handed to the generic Oodle packet profiler & layout system, and Oodle would automatically lay out the packets to minimize total load time. This all probably won't be in Oodle V1 , but it's good to keep a picture in my head of where we want to be some day.

01-03-09 | Blah blah blah

We were chatting in the hall the other day about realtime radiosity and I got my facts mixed up a bit.

Geomerics are the UK guys who ranted oddly about differential forms a while ago. They make "Enlighten" which seems to be a precomputed light-transport system that creates a static 5d transport field of some kind on *static* geometry ; you can then move lights around in that field in real time and relight the surfaces. It's hard to tell what's actually going on. If you look at the demo video on Youtube the best looking part is the *direct* lighting shadows from the sun through that courtyard. They insist on constantly showing this character with a lantern and stuff like that which is just lighting through a texture, I don't get it.

Fantasy Lab is the company of Mike Bunnell who did the GPU disk surfel light transport thing. They were selling it as the "Radium SDK" but that appears to be completely off the market and they are now just making a game.

The other big one I wasn't really aware of before is Lightsprint . Their demo video is much better designed to show actual radiosity effects. It seems like Lightsprint is using the technique of Stepan Hrbek who runs the Realtime Radiosity News page. I can't find any information about the technique they're using.

It seems obvious that some kind of semi-brute-force single-bounce realtime radiosity is quite practical on current/future hardware.

I made Anne Burrell's Tarte Tatin after watching her do it on TV this morning. It's a shame she has such awful screen presence, since she's actually a decent cook. The tart was not great. It's got way too much caramel for my taste, it was all sticky and way too sweet. I think if I do it again I'd use maybe half the sugar in the filling that she calls for. Dave Lieberman's Tarte Tatin looks better - 1/4 cup of sugar instead of 1 cup, big difference.

I only get the New York Times maybe 33% of Sundays. The other times I wake up all excited and then am just so disappointed that it's not there. I'm not sure if it's being stolen or just not delivered. Anyway, when you don't get it, you call in, and they refund you. Bullshit. A refund is not fair compensation for failure to deliver a service (plus the time waste and aggravation of using their fucking customer support phone number). This reminds me of when our flight was cancelled and Alaska so kindly offered us a refund. Oh, you completely fucked me and failed to offer the service you promised, and you are being kind enough to give me back my money !? WTF that is not even the *beginning* of making up for it. Of *course* you give me back my money, NOW start talking about what you're going to give me to make up for your failure. It's a huge advantage for a business to be able to just fail to deliver when they want and give you a refund.

"Noi the Albino" was quite beautiful. It's sadder than I expected from the liner, but that's fine. The mise en scene is just gorgeous, the warm interiors, all the profile shooting with wallpaper backgrounds, and the constant metaphor of the desolate icelandic cold.

"Lonesome Dove" was good. I mean, it's sort of awful in that it becomes a juvenile Heinlein-esque romp at times where the main characters are basically superheros and the Indians are supervillians, everything is a caricature, but still it's charming and hokey and a fun read. It also brought back the romance of the west for me. I've driven through the great dead plains of Texas many times, I used to joy ride in my car out east of Austin (and crashed it on a country road out there once); I drove the I-10 from Houston to LA many times, through the deserts of New Mexico and Arizona, I drove old Route 66 for fun and came in through the 40 to Amarillo and Big Tex. I've driven the 80 through the plains of Nebraska and Wymoning where the wind blows fiercely and there are great wood-frame wind breaks near the freeway to catch refuse and tumbleweeds so they don't blow on the road. But I've never been to Yellowstone or Glacier or into Montana much. And besides, blowing past on the freeway is not the same. Lonesome Dove made me long to get a horse and a bedroll and just ride from Austin up to Montana. Most of that land is still barren empty plain. Of course it's all private now and there's fences and all, but it's still empty land and I bet you could get away with it. my Texas picture .

Stemless wine glasses are so retarded. Fucking trend-following design-style wannabees buying this shit.

I no longer Yelp since they closed my account because I wrote negative reviews (it's not in Yelp's interest to have honest factual reviews; they only allow sluts who write humorous junk (no, not you baby)). Anyway, I still find that I need to use it because it's the only site on the whole damn interweb that actually has semi-accurate information about retailers and restaurants locations and hours and such (though UrbanSpoon is getting better all the time so I might make the switch at some point). Now we all know that the reviews and ratings are largely nonsense, but you can get good information out of it. I wrote before about how to make productive use of Yelp in San Francisco. Sadly, Seattle has not really got on the internet bandwagon yet (lol, I know this sounds like a joke, but it's true; in SF literally every single person has their own web site, is starting a Web 2.0 company, uses Yelp, twitter, and a hundred other fucking web apps I've never heard of). That means there is no Uber-Reviewer yet in Seattle like Toro E in SF that you can just follow.

01-01-09 | Running out of junk titles

For people who don't get Ignacio's blog, you must read : Watertight Texture Sampling . He's got a great survey of all the modern UV mapping literature. Texture map - 3d mesh surface correspondence is a very interesting topic and something I've always wanted to come back to. Maybe we'll do a RAD lightmapper one day and I'll get to do the UV mapping part.

The average person produces between one and four pints (about 0.5 to 1.9 liters) of gas per day, and passes the gas 14 to 23 times a day.

Whoah !? I thought I made more gas than average, but maybe I'm actually under average !? It's still a mystery to me when girls manage to squeeze it out.

The Netflix recommender is okay, but they use it in fucking retarded ways. Why do you keep showing me recommendations of movies I've already seen !? Fucking show me movies I *haven't* already seen in the "related" list. Also why the fuck are you putting movies that you don't even predict that I would like on my main page !? There are movies that they predict I will give 1-2 stars on the main page, why are you showing me that nonsense!? When I browse TV shows it keeps showing me fucking "That's So Raven" and "Hannah Montana" and whatnot. Also, the insane slowness of their fucking web based GUI for the Queue tilts me so bad it puts me close to smashing my monitor every time I look at it. It tilts me that the problems with their system are basically all stupid practical issues, not the actual difficult technology. Of course that's pretty much always the way of things...

Cheap Shit Condos is a whole page devoted to the revolting cheap tacky condos going up all over Seattle. I'm not sure who I hate more - the retards who buy this shit, the retards who build them, the retard home owners that let them go up in their neighborhood, or the whole fucking retarded city for not having building codes. Oh, I give up, I'll just hate everyone. These boxes are like an architectural "fuck you" to anyone who appreciates beauty or sanity or history or value.

The Vertigo is one of the many condos on that site that are right near our house. It's part of a trend here that really blows my mind, which is condo developers buying up some super ratty 70's apartment building from the really low bad times of Seattle, basically just putting a new coat of paint on it (usually with some "hip" "youthy" styling) and selling them for $300,000 as "luxury condominiums". Seriously !? Part of what makes this all so disgusting to me is that the Seattle neighborhoods have so much character in the old architecture and there seems to be zero respect for that.

To put a little cherry on this pile of shit - our neighborhood is also home to perhaps the best new construction condo complex I've ever seen. Over on Harvard Ave near Aloha is this big new multi-building complex that perfectly fits in with the brick / pseudo-Tudor style of the neighborhood.

01-01-09 | Console vs GUI

Raymond Chen's blog is often informative and interesting, but he's very obstinate and often has blinders on. The latest one about console vs GUI mode apps in Windows is a classic example of Microsoftian "our way is right" thinking which is just completely off base and also ignores what users and developers want. It doesn't matter how "right" you are (and BTW, you're not right), if lots of users (developers) want something, such as the ability to make apps that are either console apps or GUI apps, then you should give it to them.

In particular, I would say that 90% of the apps I write I would like to be GUI/console hybrid apps. That is, depending on the command line switches, they either run attached to the console or not. There's also another major usage pattern that I like : if the app is launched by running from an icon or something like that, run as a GUI app, but if it's run from a command line, then run as a console app. This can be accomplished with the devenv ".com" trick (works because from cmd a .com of the same name is run before the .exe), but that means you have to build two targets which is annoying.

In practice what I'm doing now is that I make my apps console apps, and then depending on how they are run (command line switches and such) the app can respawn itself with DETACHED_PROCESS. (this also makes sure you detach from the console - I hate fucking GUI interactive apps that I run from the console that don't detach themselves and thus lock up my console). BTW this last bit is another important hybrid usability thing - I want my app to tell if it's going into interactive mode or not. If it's a non-interactive thing (like just showing help or doing some batch processing) then it should run as a non-detached console app. If it pops a GUI and runs interactive, it should detach.

This is all reasonably easy with these calls :

IsDebuggerPresent() (needed because you don't want to do any funny respawning if you're debugging)




freopen("CONIN$","rb",stdin); // etc...

I've written in this blog in the past about some of the very bizarre weirdness about how Windows manages the console system; in fact just go read my post from last year if you care.

BTW if you do want to do the .com/.exe thing to be cleaner, you should use the "subsystem not set" trick and just make two mains like this .

Also, for my GUI apps in "final" builds, I still have the code in there to write log info out to a normal console with printf. I just make the app start up detached - hence no console window. Then I give the user a button to toggle the console visible. The nice thing is that you can just GetConsoleWindow() and tell it to HIDE or be visible. Oh, make sure you create the console right at start up and then hide it even if you don't think you want a console, that way if the user chooses to show it, all the logs will be there in the history.
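Here's a rough sketch of the respawn-and-hide pattern described in this post. The Win32 calls (CreateProcessA with DETACHED_PROCESS, AllocConsole, GetConsoleWindow, ShowWindow) are real; everything else is an assumption for illustration: the ChooseLaunchMode helper, its three inputs, and the "--respawned" marker switch used to stop the child from respawning again are all made up here and are not from the original code.

```cpp
#include <cstdio>
#include <string>

// Pure decision logic, kept apart from the Win32 calls.
// The enum and all three inputs are hypothetical names for this sketch.
enum class LaunchMode { StayOnConsole, RespawnDetached };

LaunchMode ChooseLaunchMode(bool wantsInteractiveGui, bool alreadyRespawned, bool debuggerPresent)
{
    // No funny respawning under a debugger, and never respawn twice.
    if (debuggerPresent || alreadyRespawned) return LaunchMode::StayOnConsole;
    return wantsInteractiveGui ? LaunchMode::RespawnDetached : LaunchMode::StayOnConsole;
}

#ifdef _WIN32
#include <windows.h>

// Relaunch this process with the same command line plus a marker switch,
// detached from the parent's console. "--respawned" is a made-up switch.
bool RespawnDetached()
{
    std::string cmd = GetCommandLineA();
    cmd += " --respawned";

    STARTUPINFOA si = {};
    si.cb = sizeof(si);
    PROCESS_INFORMATION pi = {};

    // CreateProcess may write to the command line buffer, so pass a mutable copy.
    if (!CreateProcessA(NULL, &cmd[0], NULL, NULL, FALSE, DETACHED_PROCESS,
                        NULL, NULL, &si, &pi))
        return false;

    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return true; // the caller should exit now so the console prompt comes back
}

// Show or hide the console window that printf logging goes to.
void SetConsoleVisible(bool visible)
{
    HWND console = GetConsoleWindow(); // NULL if the process has no console
    if (console) ShowWindow(console, visible ? SW_SHOW : SW_HIDE);
}

// Create the log console right at startup and hide it, so the history is
// already there if the user later chooses to show it.
void CreateHiddenLogConsole()
{
    if (AllocConsole()) {
        freopen("CONOUT$", "w", stdout);
        freopen("CONOUT$", "w", stderr);
        SetConsoleVisible(false);
    }
}
#endif
```

At startup you would call ChooseLaunchMode with IsDebuggerPresent() and whatever your command line parsing says; if it returns RespawnDetached, call RespawnDetached() and exit, otherwise keep running attached to the console.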

12-31-08 | Nonsense

Things that are nonsense :

Hybrids and plug-in hybrids, etc. We can already easily make regular gasoline cars that get 40 mpg. Let me be very generous and assume that you could get a hybrid that gets 60 mpg (you would need to account for the extra waste making & disposing the battery, the oil used to generate the electricity for a plug-in, etc.). Consider a person who commutes 100 miles a day.

Efficient car uses 100/40 = 2.5 gallons
Hybrid uses 100/60 = 1.67 gallons

Savings = 0.833 gallons

That's okay, but it's an expensive, difficult savings. In fact, we already have the mechanism to save much more. If people just stopped driving ridiculous gas guzzlers it would make a much bigger difference. Say for example someone drives a truck that gets 15 mpg , and they switch to something only slightly better at 20 mpg :

Typical truck uses 100/15 = 6.67 gallons
Better truck uses 100/20 = 5 gallons

Savings = 1.67 gallons

The point is not that hybrids are bad, but that the general political push for "a green mission to the Moon" or Hydrogen cars or the "brightest minds working on transportation" is all a lot of nonsense. What's needed is a *behavior* change for people to give up ridiculously inefficient cars.

Let's look at another example that's more ridiculous but drives the point home more :

Efficient car at 50 mpg uses 100/50 = 2 gallons
Future magic hybrid at 100 mpg = 100/100 = 1 gallon

Savings = 1 gallon

Big SUV at 10 mpg uses 100/10 = 10 gallons
Moderate SUV at 20 mpg = 100/20 = 5 gallons

Savings = 5 gallons
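The comparisons above all come down to the fact that fuel burned goes like 1/mpg, so the same mpg improvement is worth far more at the low end. A tiny sketch (the function names are just for illustration):

```cpp
// Gallons of fuel burned to cover `miles` at a given fuel economy.
double GallonsUsed(double miles, double mpg)
{
    return miles / mpg;
}

// Gallons saved over `miles` by switching from mpgOld to mpgNew.
double GallonsSaved(double miles, double mpgOld, double mpgNew)
{
    return GallonsUsed(miles, mpgOld) - GallonsUsed(miles, mpgNew);
}
```

GallonsSaved(100, 40, 60) is about 0.83, GallonsSaved(100, 15, 20) is about 1.67, and GallonsSaved(100, 10, 20) is 5, matching the numbers above.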

Currently SUVs and light trucks are still around 33% of all vehicles. Getting that down to 5-10% or so is far more important than any advance in technology that makes efficient cars better. Politicians who contend otherwise are simply trying to avoid the unpopular reality that making big changes requires some sacrifice.

A gas tax would change behavior and make people buy more efficient cars. In the long term it might even encourage people to live closer to work, use public transit and simply drive less. It would do far far more to reduce gas consumption than new technology.

Something that actually does make sense is new technology for transportation of goods in trains and large trucks. More efficient diesel engines are a start, but natural gas or hydrogen might be sensible for them. It's a sector that uses far more gas per vehicle, so each vehicle changed makes a bigger difference, they get low MPG - and the low MPG is where the win is. They also can deal with a limited number of refueling stations as they already are accustomed to seeking truck stops, etc.

I'm not even going to really talk about the fact that the excessive fixation on transportation is illogical. Big gains could easily be made in residential power use by simply kicking people out of the fucking desert in Phoenix and the frozen norths where people waste massive amounts of energy on AC in one and heat in the other. (you don't have to literally kick them out, you just raise the cost of fuel and electricity and they move naturally).

Also note that a gas tax is not really a "tax" in the sense of skimming an extra fee - rather it is trying to balance out the massive distortion of the free market by the subsidies that the US government gives to drivers and the auto industry through so many sources (direct tax deductions for trucks, cheap govt leases for petroleum extraction, foreign policy expenditures, bailouts for the oil and auto industries, and of course the huge federal highway spending).

Wikipedia Energy Use Numbers
US Govt Energy use chart
Anti-SUV rant

BTW I do think there's some merit in hybrids, but it's mainly in raising awareness and in making small cars fashionable. It's funny to me that a little Civic is totally uncool, but a Prius is "hip". Whatever, as long as the rubes do the right thing I can't complain about their foolish reasons. (I've written before about how residential recycling is mainly beneficial for the same kind of reason - not because it actually reclaims resources usefully, but because it raises awareness and causes people to reduce consumption and so on).

Next week's episode of Nonsense will debunk the environmental benefit of "locavorism" or "sustainable" food production.

12-30-08 | More Technical Junk

SIMD Aware Raycasting is a really retarded name for this idea of using a polygon hull to accelerate volume ray casting. It's a good idea though. Basically you put a simple conservative outer hull around your octree or whatever volume data. Then to do your GPU raycast, you render the outer hull front & back and you rasterize out the ray start and end positions, then you read those to cast. This means casting many fewer pixels that just hit nothing, and means you don't have to march across empty space so much. It also is a good way to do the object-space transform. A lot of the volume rendering / GPU raytracing research is targeted to having one big giant rigid world that you just fly around in, which is silly. You at least need objects that can be rigid body transformed. This polygon hull method provides a good way to do that - when you generate the ray start/end points you can transform them into object space so that they are ready to go for the ray cast (you can also scale them to be inside the axial box of the object in object space so that they are ready to go as 3d texture coordinates).
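A CPU-side sketch of that last step, taking a ray start or end point from world space into object space and then into 3d texture coordinates. On a real GPU path this would live in a shader; the Vec3 and RigidTransform types here are made up for illustration, and it assumes the object transform is a pure rotation plus translation.

```cpp
struct Vec3 { double x, y, z; };

// Rigid transform from object space to world space: world = R * obj + t.
// R is a rotation matrix stored as three rows.
struct RigidTransform { Vec3 row0, row1, row2; Vec3 t; };

static double Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Inverse rigid transform of a point: obj = R^T * (world - t).
Vec3 WorldToObject(const RigidTransform& xf, Vec3 world)
{
    Vec3 d = { world.x - xf.t.x, world.y - xf.t.y, world.z - xf.t.z };
    // Rows of R^T are the columns of R.
    Vec3 col0 = { xf.row0.x, xf.row1.x, xf.row2.x };
    Vec3 col1 = { xf.row0.y, xf.row1.y, xf.row2.y };
    Vec3 col2 = { xf.row0.z, xf.row1.z, xf.row2.z };
    return { Dot(col0, d), Dot(col1, d), Dot(col2, d) };
}

// Scale an object-space point inside the axial box [boxMin, boxMax]
// to [0,1]^3 so it can be used directly as a 3d texture coordinate.
Vec3 ObjectToTexCoord(Vec3 p, Vec3 boxMin, Vec3 boxMax)
{
    return { (p.x - boxMin.x) / (boxMax.x - boxMin.x),
             (p.y - boxMin.y) / (boxMax.y - boxMin.y),
             (p.z - boxMin.z) / (boxMax.z - boxMin.z) };
}
```

You would run both the ray start and the ray end through WorldToObject and ObjectToTexCoord, then march between the two results in texture space.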

I enjoyed the little "game" Reset by Roburky ; I use "game" in quotes because it's really an "experience". I don't really play many games any more, but I would enjoy playing little music-sync'ed experiences that have some interesting vision to them. (I'm also enjoying listening to the music of Trash80 as I work today). Sort of like the old Orisinal stuff in that it's really just an "interactive artistic moment". There's some interesting techniques in the Linger in Shadows demo; I'd love to see more creative and unusual render styles in video games.

I finally played a little Braid over the holidays. I didn't play enough to really feel like I have any opinion of it, but two things struck me : 1. I'm really going blind and need new glasses, because on a non-HD TV I was having trouble seeing WTF was going on (though my brother could see it just fine), and 2. it's really got a coherent artistic vision, and everything in the game, from the music to the render shaders to the art style, and the story and gameplay, all work together very well. It's something you almost never see any more. So many games these days are just a random hodgepodge of art and play elements and controls and GUIs that aren't coherent, don't match the universe, and don't contribute to an overall feeling.

I almost always write threaded code in a master/slave paradigm. That is, there is some master thread which is in charge of owning object lifetimes and creating slave threads and so on. I've never written anything significant which really does massively parallel cooperative multithreading, and I hope I never have to! Anyway, with almost all the Oodle stuff my threading paradigm is to make the library interfaces non thread safe, and then if you want to do things across threads you need to protect them for thread safety yourself; for example Files assume that only one thread is talking to them at a time. This is almost always the way I write threaded code - I don't like library calls that automatically do mutexes and such for me on every single call, I want to do it myself. I just realized today that I could actually use the single thread CRT in my multi threaded app. The MT CRT is really really really slow. For example, the MT CRT version of fgetc is 170 clocks instead of 10 clocks in the single threaded version. I can just use the single threaded CRT and protect it from bad threading manually. (I probably won't actually do this, instead I'll just avoid using the CRT altogether, but if for some reason you wanted to write a really high performance threaded app and still use the CRT, then I would recommend not using the MT CRT).
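To make the "lock it yourself" idea concrete, here's a minimal sketch. ByteReader and CountBytesLocked are made-up names standing in for any non-thread-safe library object and its caller; this is not Oodle code. The point is that the caller takes one lock around a whole batch of calls instead of the library taking a lock inside every call.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>

// A deliberately non-thread-safe reader: no locking inside, so each call is cheap.
class ByteReader
{
public:
    explicit ByteReader(std::string data) : m_data(std::move(data)), m_pos(0) {}

    // Returns the next byte, or -1 at end of data (similar to fgetc returning EOF).
    int GetC()
    {
        return m_pos < m_data.size() ? static_cast<unsigned char>(m_data[m_pos++]) : -1;
    }

private:
    std::string m_data;
    std::size_t m_pos;
};

// The caller owns the lock and takes it once for the whole loop.
// Any other thread touching `reader` must take the same lock.
std::size_t CountBytesLocked(ByteReader& reader, std::mutex& readerLock)
{
    std::lock_guard<std::mutex> guard(readerLock);
    std::size_t count = 0;
    while (reader.GetC() != -1) ++count;
    return count;
}
```

With a per-call locking library the loop above would pay for a lock on every byte; here it pays once.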

12-29-08 | Junk

Bill Clinton was perhaps the best Republican president this country has ever had.

Israel's overreactions to nuisance attacks from Hamas have made them a liability to US global interests. It is no longer a tenable political position to stand by them. This is not a question of who's right or wrong or morality, it's simply realpolitik.

Any reasonable investment these days returns far less than inflation (eg. a CD is around 3% which is not enough). Any money you have sitting around is losing value in real dollars. We may well be in a period similar to 1964 to 1984 where we could go 10-20 years with investments either showing no gain or a loss in real dollars. The only good option is to spend immediately on hookers and blow before the money loses more value.

Capitol Hill Snow Slideshow at Seattle Weekly ; most of these pictures are from right outside our house. I love the way people go nuts here when the weather's wacky. When it dumps snow nobody drives, everything is cancelled, and people just play. (people go nuts here in the summer too when the days are so long and it's too hot to sleep, everyone stays up late and wanders outside). We were flying that day, but I just wanted to cancel it and stay home and have snowball fights and drink cocoa. It's funny walking around and seeing the cars abandoned on the steep hills where people realized they couldn't make it (why wouldn't you just go back down to the bottom of the hill and park properly?).

I hate video games that change my resolution to go full screen. My laptop runs on a 1920x1200 external LCD. Any res except native looks like pure ass (the exception is 1600x1200 or 800x600 in center mode, not stretched). Furthermore, because I run over analog VGA, changing res causes the monitor to go through its autoadjust which takes a solid 30 seconds or so. That means absolutely any resolution change pisses me off (and it has to adjust when the game starts, then again when I quit, and when it crashes, which is like inevitable, it always leaves me in fucking 800x600 desktop mode and screws up all my icons; URG).

For my own 3d work I've always used the method of just grabbing the desktop res and going fullscreen with that res (then let the user down-res if they want, but never change the res unless they ask for it). But my GPU is too slow to run most games at full 1920x1200. The way that I'd like to see more games handle this is to let me stay in 1920x1200 but render only a portion of the screen. eg. 1600x1200 centered would be okay, but hell you could also just let me do a 1600x1000 wide aspect centered window and that would be fine too. Actually while I'm at it, I just wish most games and 3d apps didn't go fullscreen AT ALL without me telling them to. The default initial startup should always be windowed mode (since I'm going to have to watch it load for half an hour, I'd like to keep the use of my web browser thank you very much).

One of the weird things about Vegas is the way it's just a complete shambles as soon as you step off the strip. In fact, they don't even try to hide it much on the strip, like even at the Bellagio if you look behind a bush you'll see the walls are some kind of paper-mache plaster kind of thing and there are just gaps and dented spots all over. If you look out the back windows there will be the heating/cooling plants, piles of construction equipment, etc. Off the strip it really reminds me of Cancun or Mexico in general - piles of dirt, construction equipment everywhere, tons of stuff half built that nobody seems to be working on actively.

I hate that I'm writing all this nonsense. I hate sitting at the computer all the time, it gives me no happiness. I just don't know what else to do with myself. It's miserable outside. My shoulders are wrecked, probably forever, I can't work out and I'm fat. I feel like I'm not doing good work right now and I'm racking my brain trying to fix my coding process. When my work suffers I get depressed since it's about the only thing that gives me any sense of self worth.

I'm trying to avoid doing allocations in CINIT ; we've made our own allocator now just fail if you call it at cinit, but that's not catching everything for me because of the difficulty of making sure you've replaced every bit of code that tries to use "new" or "malloc" or whatever. One annoying thing I found : almost every STL container can be declared static in a file and it doesn't do any initialization, the constructor just zeros out some members - but hash_map actually does an alloc :

static std::hash_map<int,int> s_hash;

will allocate a 193 member table. That sucks. Empty containers should not do any allocations IMO since there are many usage patterns where you might declare a container but never actually put anything in it. (note that of course in real Oodle I don't use any STL, this was just in some test code).
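One way to hunt for this kind of thing is to replace global operator new and count anything that comes through before main starts. This is only a sketch under assumptions: g_mainStarted and g_allocsBeforeMain are made-up names, it only sees allocations that go through operator new (not direct malloc calls), and std::unordered_map is used as a stand-in for hash_map. Whether an empty container allocates in its constructor depends on the standard library implementation.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <unordered_map>
#include <vector>

// Both are constant-initialized, so they are valid before any static constructor runs.
static bool g_mainStarted = false;    // set this first thing in main()
static int  g_allocsBeforeMain = 0;   // number of operator new calls seen during cinit

// Replacement global operator new that counts calls made before main.
void* operator new(std::size_t size)
{
    if (!g_mainStarted) ++g_allocsBeforeMain;
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// File-scope containers to check. Any allocation their constructors make
// shows up in g_allocsBeforeMain.
static std::vector<int> s_vec;
static std::unordered_map<int, int> s_hash;

int AllocationsBeforeMain() { return g_allocsBeforeMain; }
```

Set g_mainStarted = true at the top of main and then check AllocationsBeforeMain(); a non-zero value means something allocated during static initialization, and a breakpoint on the increment will show you who.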

Does the gmail "Report Spam" actually do anything? In any case I'm pretty happy with my gmail email pass-through.

When I see a "baby on board" sticker, it makes me really want to ram my car into the person. Is that wrong? Also, 1980 called, they want their sticker back.

I mentioned before how I think the bilateral filter is a pretty mediocre way of denoising or super-res'ing images, because it basically gives you a piecewise constant model. On the other hand, it's a great way to deal with things that actually ARE piecewise constant - and lighting in graphics is pretty close to piecewise constant. (a symmetric bilateral filter can do piecewise linear ; really what you want is piecewise quadratic). There are some good new papers around about new realtime GI techniques.

The new Fog Creek Office is pretty nice, and as usual Joel is right on about spending money to make programmers more productive. I disagree with some of the details, they don't really need to spend so much on fancy design and the common areas, but stuff like the mechanical height adjustable desks is super awesome. Of course you've got to offer a lot of perks to get people to work on FogBugz. ZOMG that's got to be some of the more boring programming work in the universe. Also, WTF how many people does it take to make fucking FogBugz !? We evaluated a bunch of bug trackers at OddWorld, and FogBugz was okay, but it's very simple and very restrictive, it's not overloaded with crazy features the way the competitors are; it literally looks like something written by 1-3 people (1 Casey or 3 normal people). I guess it's because if you're a programmer working on FogBugz you only spend 1 hour a day actually working and you spend the rest of the day dreaming about quitting or killing yourself.

12-28-08 | Capitalism Reform

A lot of the problems of the corporate system can be traced back to the fact that the individual actors cannot be held responsible. Executives basically have carte blanche to commit crimes behind the corporate veil, and the liability is subsumed by the corporation. Corporations can be spun off with debts and legal liabilities, corporations can be dissolved and reformed, moved to other countries or states where the laws treat them better, etc.

The basic idea of capitalism is that if individuals make decisions in their own best interest, the total value of the system is maximized. That does not hold true when you have a government structure which allows individuals to take the profit and pass on the risk (such as all the mortgage brokers who got commissions on loans but passed on the actual loan risk to others ; if they actually had to offer the loan capital themselves they would have been more careful and this never would have happened).

I've been thinking about this a while, and something that occurs to me is that there's really no need for the whole idea of a "corporation", nor is there a need for "stock". The corporation is an artificial construct which acts like an individual but actually has many more rights than an individual.

Who needs corporations? Instead the companies should just be owned by individuals. In the end, that individual owner is fully responsible for the actions of the company. The company can be sold to another, but the legal liability for actions during that time sticks with the single owner. Corporations can no longer do things like pay out dividends when they have massive debt. Sure, they can still pay their owner, but there is no "corporate veil" - the owner is liable for all the corporation's debts, so the money is not being hidden away in a place it can't be retrieved.

Who needs stocks? Stockholders (and brokers) don't have the ability to really analyze companies and pick stocks well, they're just gambling. What do companies need stocks for? To raise cash for expansion. Well they can do that just by issuing bonds. Individuals can buy the bonds if they have faith in the company. The company gets cash, and the individuals get a return on their investment and capital flows to where it's needed just fine with no need for stock. The good thing about this is the company has a straightforward debt to the bondholders, and the bondholders are offered a clear risk vs return in buying the bond.

Furthermore, bond issues should go through a rating company, and that rating company should be required to insure those bonds! We got rid of stocks so there's no longer a problem of all the bogus stock hawkers, but you still might have a problem of bogus risk ratings on bonds. That's easy to fix - make the bond rater take the penalty for mis-rating the bonds. Boom rating qualities will be far more accurate and more conservative. The insurance goes like this : when you buy a bond you can choose to just buy it normally without insurance, but if you choose, the bond rater is required to give you insurance at a fixed fee based on the rating that they gave, eg. AAA= 1% fee, Baa = 3% fee or whatever; the point is that by assigning a rating they are basically picking how much vig they would need to profitably insure that bond.

You can still have things like "hedge funds" - the hedge fund manager just personally owns the funds, various people give him big loans, he runs the money and pays back the profit.

Now I'd also like to get rid of the Fed entirely and get rid of the FDIC but there may be more complaints about that. Getting rid of the FDIC is relatively easy, you just require banks to purchase insurance at market rate from various insurers instead of giving it away for basically free. Also getting rid of corporations means the owners are personally responsible for debts if the bank defaults so they would be less likely to default.

ADDENDUM : yes I know this is not practical or realistic, but I do think it's interesting as a thought experiment. I believe most of the reasons that people cite for why "we need stocks" are bogus.

1. "We need stocks to create enough aggregate wealth to fund big projects" ; not true, in fact even today those kinds of big capital-intensive projects don't raise money through stocks, they raise money from hedge funds and other large private investors; it's always been that way.

2. "We need stocks to reward entrepreneurs and early employees" ; if you mean that you need stocks to randomly and massively over-reward some people and not others, then perhaps, but really this is just a big randomizer. Without stocks entrepreneurs can still get a huge payday by selling their company, or if it's actually a profitable company, they can just keep ownership of it and make a profit with it! NO WAI. In fact, the real benefit of stocks here is for people who create bogus non-profitable companies and somehow manage to convince others that it has good prospects and the stock should be worth more than it is. Early employees could easily be given profit-sharing agreements or conditional bonds which would reward them handsomely. In fact, making all these rewards more accurately tied to the real value of the company would improve the efficiency of capital markets.

3. "Stocks provide a way for the average individual to invest and grow their wealth" ; Individual selective stock ownership has been pretty widely shown to be a bad thing. People don't have the time or skill to invest wisely, and so are best off just investing in large funds. You could just as easily invest in large funds without stock ownership.

In fact, so far as I can tell, the biggest benefit of stocks and public corporations is in fact the exact thing I'm trying to eliminate - that is, responsibility for actions. If the individual running the company really had to take responsibility, they would be far less willing to take risks. Lots of exciting companies might never have happened because they were too risky. The ability to create a diffuse corporate veil is in fact very beneficial in some ways, because some amount of illogical risk is actually good for the economy. (it's an old saying that if people really understood how hard and risky it was to start a company, nobody would ever do it).

Let me illustrate foolish financial risk taking through an analogy to gambling. (Gambling with an edge is just "investing").

Consider a simple thought experiment. You are given the opportunity to bet on a coin flip where you have an edge (in this example you win 53% of the time). You start with 1.0 monies. You get to bet a fixed fraction of your bankroll on each flip. eg. you could bet 1/4 of your bankroll on every flip, or 1/5 on each flip, but you aren't allowed to change the fraction. Note that as long as you pick a fraction < 100% you can never go broke. You must always bet all 100 flips.

What is the best fraction to bet ? Well, if you only care about the *average* return, the best fraction is 100%. That should be obvious if you think about it a second. Every extra dollar that you can put in this bet means more profit on average, so you should put as much as possible. But at the same time, if you bet 100% you almost always go broke. In fact you only make any profit at all if you win all 100 flips, which at a 53% win rate happens with probability 0.53^100, roughly once in 2^92.

It's a little unclear exactly what metric to use to decide which strategy is best, so let's go ahead and look at some numbers :

winPercent : 53% , num bets = 100, num trials = 262144

fraction   average      sdev   profit%   median
 1/ 2        19.22   1051.56     1.69%     0.00
 1/ 3         7.24    422.89    13.48%     0.02
 1/ 4         4.43     58.10    24.17%     0.18
 1/ 5         3.30     20.66    30.92%     0.44
 1/ 6         2.70     10.43    38.30%     0.67
 1/ 7         2.35      5.79    46.10%     0.85
 1/ 8         2.11      3.94    46.03%     0.97
 1/10         1.82      2.30    54.18%     1.10
 1/12         1.64      1.61    53.96%     1.17
 1/14         1.53      1.24    61.69%     1.19
 1/16         1.45      0.99    61.75%     1.20
 1/19         1.37      0.77    61.94%     1.19
 1/22         1.31      0.62    61.76%     1.18
 1/25         1.27      0.53    62.10%     1.17
 1/29         1.23      0.43    69.36%     1.16
 1/33         1.20      0.37    69.22%     1.15
 1/38         1.17      0.31    69.21%     1.13
 1/43         1.15      0.27    69.40%     1.12
 1/49         1.13      0.23    69.25%     1.11

fraction = fraction of bankroll placed on each bet
average = average final bankroll
sdev = standard deviation of final bankroll
profit% = % of people in the trial who had any profit at all
median = median final bankroll

(note the sdev for the 1/2 and 1/3 bet cases is very inaccurate ; I foolishly did a monte carlo simulation rather than
just directly counting it which actually is easy to do with the binomial theorem; the average return is exact)

It's easy to see in the table that the average profit is maximized for very large (risky) bets. But if you look at the profit% to see what portion of the population is benefiting, you see that in the risky bet cases only a tiny part of the population is benefiting and the high average is just because a very few people got very lucky.

It seems to me just intuitively that maximizing the median seems to produce a pretty solid strategy. For a win percent of 53% that corresponds to a bet fraction of 1/16. Another sort of reasonable approach might be to choose the point where > 50% of the population shows a profit, which would be at 1/10. (these both also occur right around the area where the sdev becomes smaller than the mean, which might be another good heuristic for picking a strategy). Using much smaller fractions doesn't really increase the profit% very much, while using larger fractions very rapidly increases the variance and decreases the profit%.
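The columns in that table can also be counted directly from the binomial distribution instead of simulated. A sketch of how that might look, assuming even-money bets and independent flips (the function names are made up here; this is not the code that produced the table):

```cpp
#include <cmath>

// Final bankroll after `wins` wins out of `numBets` even-money bets, staking a
// fixed fraction f of the current bankroll each time, starting from 1.0.
double FinalBankroll(double f, int wins, int numBets)
{
    return std::pow(1.0 + f, wins) * std::pow(1.0 - f, numBets - wins);
}

// Probability of exactly k wins in n bets with win probability p (0 < p < 1).
double BinomialPmf(int n, int k, double p)
{
    double logChoose = std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0);
    return std::exp(logChoose + k * std::log(p) + (n - k) * std::log(1.0 - p));
}

// Average final bankroll: each bet multiplies the expected bankroll by 1 + f*(2p-1).
double AverageBankroll(double f, double p, int numBets)
{
    return std::pow(1.0 + f * (2.0 * p - 1.0), numBets);
}

// Median final bankroll. The bankroll goes up with the win count, so the
// median bankroll is just the bankroll at the median win count.
double MedianBankroll(double f, double p, int numBets)
{
    double cdf = 0.0;
    for (int k = 0; k <= numBets; ++k) {
        cdf += BinomialPmf(numBets, k, p);
        if (cdf >= 0.5) return FinalBankroll(f, k, numBets);
    }
    return FinalBankroll(f, numBets, numBets);
}

// Fraction of outcomes that finish above the starting bankroll of 1.0.
double ProfitFraction(double f, double p, int numBets)
{
    double total = 0.0;
    for (int k = 0; k <= numBets; ++k)
        if (FinalBankroll(f, k, numBets) > 1.0) total += BinomialPmf(numBets, k, p);
    return total;
}
```

AverageBankroll(1.0/2, 0.53, 100) comes out at about 19.22 and MedianBankroll(1.0/16, 0.53, 100) at about 1.20, in line with the table. Counting it this way also shows why the profit% column moves in steps: the break-even number of wins is an integer, so neighboring fractions that share the same break-even count have the same profit%.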

In any case I believe the analogy to finance is amusing. In a simple experiment like this it's trivial to see that when everyone is taking too much risk, it might be very good for the *whole*, but it's bad for almost every individual. It can be very profitable for a tiny portion of the population. It's also trivial to see here that any measure of the sum of the population (such as the GDP) is pretty useless when really measuring how well your strategy for the economy is working.

ASIDE: I used the fast median code from here . There are basically two sane fast ways to find a median. In most cases the fastest is the histogram method, which is basically just a radix sort, assuming you have 32 bit values or something reasonable like that. Basically you do a radix sort, but rather than read the values back out to a sorted array, you just step through the histogram until you've seen half the values and that's your median. If you can't do a radix for some reason, the other sane way is basically like quicksort. You pick a pivot and shuffle values, and then rather than descending to both sides you only need to descend to one side. This is actually still O(N) , it's not N log N, because N + N/2 + N/4 ... = 2*N which is of course O(N). Both of these methods can find the Kth largest element, the median is just a special case.
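A sketch of both approaches for 32 bit values. This is not the code linked above, just an illustration with made-up function names. The histogram version does two 16-bit passes (high half, then low half within the chosen bucket) rather than a full radix sort; the quickselect version uses a three-way partition and only descends into the side that contains k. Both take k as a zero-based rank and assume k < v.size().

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Histogram method: find the kth smallest value without sorting.
uint32_t KthSmallestHistogram(const std::vector<uint32_t>& v, std::size_t k)
{
    std::vector<std::size_t> hist(65536, 0);

    // Pass 1: histogram of the high 16 bits, walk until we reach the bucket holding rank k.
    for (uint32_t x : v) ++hist[x >> 16];
    uint32_t hi = 0;
    std::size_t below = 0;
    while (below + hist[hi] <= k) { below += hist[hi]; ++hi; }

    // Pass 2: histogram of the low 16 bits, only for values in that bucket.
    std::fill(hist.begin(), hist.end(), 0);
    for (uint32_t x : v) if ((x >> 16) == hi) ++hist[x & 0xFFFFu];
    std::size_t rank = k - below;
    uint32_t lo = 0;
    while (hist[lo] <= rank) { rank -= hist[lo]; ++lo; }

    return (hi << 16) | lo;
}

// Quicksort-like method: partition around a pivot, then descend into one side only.
// Takes the vector by value because it shuffles the elements.
uint32_t KthSmallestQuickselect(std::vector<uint32_t> v, std::size_t k)
{
    std::size_t lo = 0, hi = v.size(); // k is always inside [lo, hi)
    while (hi - lo > 1) {
        uint32_t pivot = v[lo + (hi - lo) / 2];
        auto first = v.begin() + static_cast<std::ptrdiff_t>(lo);
        auto last  = v.begin() + static_cast<std::ptrdiff_t>(hi);
        // Three-way partition: [less than pivot | equal to pivot | greater than pivot]
        auto eqBegin = std::partition(first, last, [pivot](uint32_t x) { return x < pivot; });
        auto eqEnd   = std::partition(eqBegin, last, [pivot](uint32_t x) { return x == pivot; });
        std::size_t a = static_cast<std::size_t>(eqBegin - v.begin());
        std::size_t b = static_cast<std::size_t>(eqEnd - v.begin());
        if (k < a)       hi = a;      // answer is in the "less" side
        else if (k >= b) lo = b;      // answer is in the "greater" side
        else             return pivot; // k lands in the run of values equal to the pivot
    }
    return v[lo];
}
```

For the median, pass k = v.size() / 2. If you don't need to roll your own, std::nth_element in the standard library does the same selection job.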

12-27-08 | Financial Quackery

I know very little about finance. It's time to play Financial Quackery. Put on your tin foil hat.

I believe the US economy is still in far worse shape than the mainstream media or politicians are saying. The reality is that we have been sustaining a period of artificial growth by taking out massive loans against our future. We have not been spending on education or research or infrastructure, and we now have a huge population that thinks it deserves to own a house and drive a nice car with absolutely no skills or hard work. There is hardly any sector of the US economy that can be said to be a real healthy engine of profit and economic growth. The biggest parts of our economy are phantoms : 1. Finance has been propping up the S&P for the last ten years and it's all an illusion. 2. Real Estate was just a bubble funded by inflationary dollars printed by the Fed. 3. Health Care is a huge part of GDP but is really just a drain. 4. Consumer Spending is just sending borrowed dollars back to China with interest, and consumer spending can collapse. 5. Service jobs in general rely on big spending from a small part of the population and high-paid service industry is not sustainable.

To get out of a recession you need real productive industry that makes something the world wants, and we don't have it (or only < 1% of us do). As I've said before, the long-term low interest rates which the Fed has used for the past ten years have put us in a trap where Stagflation is just about the only possible outcome. Our economy is already pumped full of free cash which has nothing productive to do, we've been running on empty and borrowing to keep up standard of living, and now we have tons of worthless paper.

Basically every financial number that's reported in the mainstream media these days is cooked to make it look better than it really is. Everybody knows the unemployment number has been bogus for some time; it's not 6%, it's really more like 12%. All the investment return numbers look way better than they really are because they don't include inflation, fees, & taxes. Furthermore, the widely reported inflation number is bogus. The other big thing they do is fail to adjust for population changes; so you'll see things like the number of jobs increasing, but it actually decreased as a percentage of population.

I've written about how the standard inflation measure is bogus here before. Anybody who has been alive for the past 10 years and has a brain should know that it's nonsense. The nominal 3% inflation would be 34% over the last 10 years. I think it's more like 100% over the last 10 years if you actually compare apples to apples. The normal CPI inflation measure assumes that you gave up the nice house you had in Travis Heights 10 years ago for which you paid $500 a month and you moved to Detroit in order to keep the inflation measure low.

(BTW 100% over 10 years = 7% annual, which is pretty close to the measure under the "pre-Clinton CPI")
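The compounding arithmetic behind those two numbers, as a small sketch (function names are just for illustration):

```cpp
#include <cmath>

// Total price increase after `years` of compounding at `annualRate`
// (0.03 means 3% a year). A result of 0.34 means prices rose 34% in total.
double TotalInflation(double annualRate, int years)
{
    return std::pow(1.0 + annualRate, years) - 1.0;
}

// Annual rate that compounds to `totalIncrease` over `years`
// (a totalIncrease of 1.0 means prices doubled).
double AnnualizedRate(double totalIncrease, int years)
{
    return std::pow(1.0 + totalIncrease, 1.0 / years) - 1.0;
}
```

TotalInflation(0.03, 10) is about 0.34, and AnnualizedRate(1.0, 10) is about 0.072, i.e. roughly 7% a year.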

Some reference :
SafeHaven article on the CPI by John Mauldin - see also his other newsletters ,
ShadowStats article on the CPI by WJ Williams - also other good articles by the same guy on other misleading stats

Even the people who do show inflation adjusted numbers tend to do so against the CPI for lack of a better accepted measure. Some graphs of "real returns" (sort of - some of these don't count dividends, and they all use the CPI which is bogus, and most don't count taxes or fees) :

Simple chart of DJIA
Intelligent Bear detailed chart
More optimistic log scale chart - where he accounts for reinvestment of dividends.

There have basically been two primary sources of the false growth in the last 10 years. One is just lies. Distorted figures and misreporting make the growth look good even though it's not really there. The Fed basically prints a bunch of money, the financial industry books it as profit, and then they lie about the inflation and we all think we're getting richer. The other is risky leveraged bets.

Basically the corporate system is completely broken. I've written a bit about this before and will probably again, but if you think about it in a game theory sense from the standpoint of an executive - why in the world would I ever do what's good for the company or the country? I can get huge bonuses based on short term profit, and then I can leave the company or sell it or just let it go bankrupt, I get to keep my profit, what do I care if I wrecked the company? My only motivation is to maximize short term returns, and of course that is exactly what they do. We've always known this, and executives have screwed up many companies by not having long term vision, cutting R&D, laying off too much staff to cut costs, etc. What happened in the last 10 years is they just got much more clever about it. When your core business can't make money because the entire economy is in the shitter, what do you do? You take out loans and make massive leveraged bets in risky markets to possibly get a big profit. The actual EV of these moves is maybe zero or negative, but the thing is for the executive they get a big payout if they win the bet, and they get almost zero penalty if they lose the bet. It's gambling for free, or rather gambling where they get the profit and the shareholders and taxpayers pick up the loss. I can't blame them for taking that bet, obviously it's a winning proposition for them.

12-27-08 | Temper

Temper is one of the stranger words in English, since it can mean both something and its opposite. This struck me while reading Lonesome Dove, which uses the expression "out of temper" to mean that someone's patience was exhausted, as in "dern, this bronc been trying to throw me all day and I'm plum out of temper".

In this usage "temper" means something like "composure". But when someone is said to "have a temper" it means they are often "out of temper" or often "lose their temper". (this obviously suggests a riddle, something like - "what can a person have even when they lose it?").

There seem to be a variety of unrelated usages : (only giving examples of usages still common today)

Temperate climate
Temperance movement
The Well Tempered Clavier
Tempering Steel or Chocolate
Hold your temper / lose your temper / out of temper / have a temper / be in good temper / temper tantrum

Interestingly, temper the verb used to be more common than temper the noun, and the primary meaning was to "mix" or "moderate" or even "compromise", such as in tempering the sweet with a little sour. This usage is now rare.

Here are some modern dictionary definitions of the archaic usage :

9.  to moderate or mitigate: to temper justice with mercy.
10. to soften or tone down.
11. to bring to a proper, suitable, or desirable state by or as by blending or admixture.

And some from the 1828 Webster's :

    1. To mix so that one part qualifies the other; to bring to a moderate state; as, to temper justice with mercy.

    2. To compound; to form by mixture; to qualify, as by an ingredient; or in general, to mix, unite or combine two or more things so as to reduce the excess of the qualities of either, and bring the whole to the desired consistence or state.

    Thou shalt make it a perfume, a confection after the art of the apothecary, tempered together, pure and holy. Ex.30.

    3. To unite in due proportion; to render symmetrical; to adjust, as parts to each other.

    God hath tempered the body together. 1 Cor.12.

    4. To accommodate; to modify.

The "Tempered Clavier" of Bach seems to stem from this usage; it's a way of mixing the ideal tuning for the different keys to create a single tuning that's not quite right for any of them, but is okay enough for all of them; it's a "well mixed tuning" if you will.

Tempering Chocolate obviously comes from analogy to tempering metal, and in fact they are sort of similar, in both cases you are controlling the crystal formation by raising and lowering the temperature carefully through a small range. Understanding the origin of tempering metal is not obvious, but maybe it comes from the "compromise" origin like Bach's usage. Tempering metal is a way to achieve a good mix of hardness and softness that keeps it from being too brittle. (BTW many of the standard dictionary definitions for temper in the metallurgy usage are just wrong; you'll see definitions like "the degree of hardness" ; The Barbarian Keep has a nice little thing on tempering and a rant about misuse).

We're still left with the problem of the meaning of "temper" in reference to moods. The Webster's 1828 definition of temper has the two opposing meanings for the noun :

3. Calmness of mind; moderation.
4. Heat of mind or passion; irritation.

For laughs Wordia has the same two meanings but in opposite order :

3) noun,  a tendency to exhibit uncontrolled anger; irritability
4) noun,  a mental condition of moderation and calm

It seems to me that the meaning with respect to moods was originally "moderation", and perhaps just misunderstanding of the expression made it flip.

Another possibility involves another meaning of "temper". Temper can also just mean "mood or state of mind", thus you could have an ill temper, a good temper, a magnanimous temper, a generous temper, etc. Over time temper may have been mainly used in the form "bad temper" and "ill temper" and thus simply become "temper" meaning "bad mood". (some people still use temper with other adjectives, but this is no longer standard English; presumably phrases like The Heroic Temper would now mainly be rendered as The Heroic Temperament, at least in America; I find a lot of usage in Australia of "Temper" in the more general sense).

Anyway, this leads to funny possibilities, such as : To temper his image of having a temper, Giuliani shows good temper. When confronted with intemperance, to show you don't have a temper you must keep your temper.

12-26-08 | In Defense of Less Typing

In the C++ rant we got a bit into the discussion of "that just reduces typing". It's become common wisdom these days that "anything that just reduces typing is not of significant importance in programming". This is sort of a reaction to the bad old days where developers put too much emphasis on reducing the time it took to make a first draft of the code. Now people like to repeat the platitudes that "first draft is only 10% of dev time, debuggability and readability and maintainability are what really matter".

Yes, yes, that is all true, but I think people miss the forest for the trees here in some cases.

Every character you type is a chance for a bug or a mistake. For one thing there are typos, but there are also just brain farts. The less you type when writing a piece of code, the more likely it is to be correct.

(I should emphasize that reducing code duplication is very good for many reasons that I don't go into much in this rant, and those are the main reasons to merge duplicate code; I'm talking about cases where the code is not exactly duplicated, but is similar, or where you have a choice between making a very simple API which requires a lot of typing by the client, or a more complex API which has very simple usage for the client).

A lot of good programmers now are adopting the idea of exposing simple minimal C-style APIs that leave usage up to the client. There are a lot of things to like about that (for example, Sean's stb.h style thing for simple functionality is in fact wonderfully easy to integrate and steal snippets from), but there are also bad things about it. I think good programmers overestimate their ability to write simple usage code without mistakes. For example, you might think that you don't need a class to encapsulate a 32-bit Color, you can easily just use a 4-byte array or pass around a dword and do the shifts by hand - but I have seen bug after bug from small mistakes in that kind of code, because if you write the same thing over and over, or copy-paste and try to change a bunch of code by hand, there is some small chance of mistake each time you do it.
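To make the Color example concrete, here is a minimal sketch of the kind of tiny wrapper that takes the hand-written shifts away from the client. The class name Color32 and the ARGB byte order are assumptions made up for illustration; nothing here is from a real library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit color wrapper; the ARGB channel order is an assumption.
class Color32
{
public:
    Color32(int r, int g, int b, int a = 255)
    {
        // Catch out-of-range channels once, here, instead of at every call site.
        assert(InRange(r) && InRange(g) && InRange(b) && InRange(a));
        m_argb = (uint32_t(a) << 24) | (uint32_t(r) << 16) | (uint32_t(g) << 8) | uint32_t(b);
    }

    int A() const { return int((m_argb >> 24) & 0xFF); }
    int R() const { return int((m_argb >> 16) & 0xFF); }
    int G() const { return int((m_argb >> 8) & 0xFF); }
    int B() const { return int(m_argb & 0xFF); }

    // The raw dword, for code that really needs it.
    uint32_t Packed() const { return m_argb; }

private:
    static bool InRange(int c) { return c >= 0 && c <= 255; }
    uint32_t m_argb;
};
```

The shifts and masks exist in exactly one place, so a wrong shift count is one bug to fix rather than a small chance of a bug at every use.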

It's funny to me that good programmers in game dev are going in two directions at the same time. One direction is to make code very easy to follow by simple visual inspection. Many major people in game dev are pretty high on this these days. The basic idea is to use C-style imperative programming, make the action of each line obvious to simple visual inspection, reduce segmentation of code into function calls and prefer simple long linear code blocks (rather than factored out functions, conditionals, objects, etc). There are certainly merits to that. The other direction people are going is custom metaprogramming and local language redefinition. Many of these people want coding languages where you can locally redefine the rules of the language and do custom markup for things like network mirroring, thread fiber spawning, local-global state memory variable differentiation, etc. This kind of stuff would make code completely impossible to understand by simple visual inspection without intimate understanding of how all the extra meta-language rules work. These ideas also have a lot of merit, because writing micro-threaded massively parallel code in plain C-style C++ is really difficult and ugly, and having custom features would make it much better - but these two directions are totally conflicting.

While I'm ranting about opposite directions, let me also rant about the idea that something is a good idea for "library design" but not for within your app (or vice versa). IMO Coding = Library Design. Most of the complex code that I write is in "libraries" for myself. Libraries are just chunks of functionality that you want to expose in a compact interface. Well, that's what you should be doing all the time. Coding is just writing a bunch of libraries, then the "app" is just tying together the "libraries".

So, for example, the ideas in Casey's excellent talk about good library design (things like exposing multiple levels of interface from very simple to nitty gritty, and not forcing a usage pattern on the client) are just good ways to write code *always*.

I don't trust the me of one year ago, nor do I trust the me of one year in the future. I need to write API's for myself that make me write the right code. Part of that is all the things I've often written about before (self-validating API's, API's that are impossible to use wrong), but part of it is just plain less typing. If the API makes me (the client) write a whole bunch of code to do the simple things I often want to do - that makes it far more likely I will do it wrong.

Also I believe the illusion of choice is a horrible thing. If there's really only one or two reasonable ways to use a system, then just expose that to me. Don't give me what looks like a bunch of flexible components, but they only really work right if you do one specific thing.

Addendum : okay I'm bored of this topic and I'm sure you are too, but I feel like I started it so I should wrap it up a bit more.

Paul Graham has this thing "Succinctness is Power" that's sort of similar to this rant. As usual he writes it well, but I think he's a little bit wrong. The issue that I believe is important, which is what I'm trying to talk about here is :

Reducing the number of choices that the programmer has to make in order to write correct code.

Part of that is reducing typing - but the crucial thing is reducing typing when there is actual choice in the alternatives. That is, if it's something you *have* to type, that's not bad. For example a very verbose language like COBOL is not inherently bad due to its verbosity (cobol is horrible for other reasons).

Making code that works correctly with minimal typing (and makes compile errors if you use it wrong) is the goal. So part of what I'm getting at here is using short common words that it's easy for the programmer to get right, using highly constrained classes instead of very general ones, etc.

Part of the credo goes like this :

remove the option to do things that you never want to do

make it hard to do things that you rarely want to do

make it easy to do the right thing

As an example - iterators are cool even when they save almost no work. Say for example you have something like a doubly linked list class. Many of the simple C guys would say "hey you can just write code to traverse the linked list", and you write client code like :

for(Node * n = list.head; n != list.head; n = n->next)

That's easy right, you're a good programmer, you can just write that loop. No. You can't, that's what I'm trying to say with this rant. I mean, yes, you can, but you had a 1% chance of introducing a bug because you had to write code.

Writing code is bad.

Instead you could make your Node look like an iterator. You could give it standard names like begin() and end(). Now instead of writing code you can just use for_each , or even just copy-paste or spit out a loop that you've done a million times :

for(Node::iterator it = list.begin(); it != list.end(); ++it)

That is safer because it's standard. On a similar note, using a constrained object like an iterator is safer than using an int, because every time you use an int people get tempted to do evil things. How many bugs have I seen because people try to get clever with their loop iteration? Maybe they count backwards for efficiency and use an unsigned type by mistake. Or they pull the ++i out of the for() and then forget to do it due to a continue. Or they use the "i" outside of the for loop and bugs get introduced.
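For reference, here is a minimal sketch of what "give it begin() and end()" can look like. To keep it short this is a singly linked list with the iterator nested in a hypothetical List class (so it's List::iterator rather than Node::iterator); all the names are illustrative.

```cpp
#include <cassert>

// Hypothetical minimal singly linked list, just enough to show begin()/end().
struct Node
{
    int value;
    Node * next;
};

class List
{
public:
    class iterator
    {
    public:
        explicit iterator(Node * n) : m_node(n) { }
        int & operator * () const { assert(m_node != nullptr); return m_node->value; }
        iterator & operator ++ () { assert(m_node != nullptr); m_node = m_node->next; return *this; }
        bool operator == (const iterator & rhs) const { return m_node == rhs.m_node; }
        bool operator != (const iterator & rhs) const { return m_node != rhs.m_node; }
    private:
        Node * m_node;
    };

    List() : m_head(nullptr) { }

    // The list does not own its nodes in this sketch.
    void push_front(Node * n) { assert(n != nullptr); n->next = m_head; m_head = n; }

    iterator begin() const { return iterator(m_head); }
    iterator end() const { return iterator(nullptr); }

private:
    Node * m_head;
};

// Client code is the same loop you've written a million times.
int Sum(const List & list)
{
    int total = 0;
    for (List::iterator it = list.begin(); it != list.end(); ++it)
        total += *it;
    return total;
}
```

The client never touches the head pointer or the termination test, so there's nothing list-specific left to get wrong in the loop.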

Lots of people are anti-OOP these days. I love OOP ; no, not deep inheritance trees and lots of data inheritance, and whatnot, but the basic idea of coding in terms of objects that impose constraints and conditions on their use. The best way for me to program is to build components and helpers which make expressing the program easy.

12-18-08 | Christmas

It dumped snow last night; it's very pretty. Everyone is staying home and playing in the snow.

We're flying to see Alissa's parents tonight. Hopefully the snow doesn't screw up the airports. I'll have no computer. We can't get a cab because of the weather so I guess I'm going to have to try to drive in this. We're definitely going to die on the way.

Bleck I want to just make snowmen and throw snowballs! God I hate Christmas travel.

In calendar year 2008 my portfolio has done about -50%. That sounds bad, but it's actually worse than that. To get back its value it needs to go +100%. The loss amount looks smaller because it's measured relative to the larger initial value, not the small ending value. In fact, the percents are a bad way to show finance changes. A better way is the log (base two) of the multiplier (and then x100 to make it look nice).

So 0% = 1.0 multiplier, log2( 1.0 ) = 0 ; no change

-50% = 0.5 multiplier , log2( 0.5 )*100 = -100

+100% = 2.0 multiplier , log2( 2.0 )*100 = +100

Under this counting, if you have a -100 year then a +100 year you're back where you started, which is more intuitive. (obviously, appreciation amounts combine by multiplication, so taking the log changes it to addition).

Addendum : in case you weren't sold on the log scale, it also gives you the easy intuitive way to do compounding (obviously). Say you have something that makes 10% one year. What does it do over two years? Well, even though we know it's wrong, our brain intuitively jumps to "20%". The real answer is of course 1.10*1.10 = 1.21 = 21%. In log scale, 10% is 100*log2(1.10) = +13.75. What does it do over two years? 2*13.75 = 27.5

That may not seem like a help, but it's more significant if you look at monthly compounding or something. Say you have something that does 10% annually, what's the monthly compound? It's 1.10^(1/12) ; in log scale it's just 13.75/12. The daily compound is just 13.75/365. Nice. The problem is that getting between log scale and normal scale is annoying and non-intuitive, so as long as most reporting is in normal scale, log scale is hard to use.

BTW also one of the things finance people do which messes everyone up is they don't usually report inflation-adjusted numbers. Inflation adjusting over time is annoying in normal percents, but of course in log scale you just do -5 (3.5% inflation is about a 5 in log-scale).
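A minimal sketch of the log-scale bookkeeping described above. The function names LogReturn and FromLogReturn are made up for illustration; the only real calls are std::log2 and std::exp2 from the standard library.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers, not from any library.
// Convert a percent change (e.g. -50 for -50%) to log2 scale, times 100.
double LogReturn(double percent)
{
    double multiplier = 1.0 + percent / 100.0;
    assert(multiplier > 0.0); // a loss of 100% or more has no finite log
    return 100.0 * std::log2(multiplier);
}

// Convert a log2-scale return back to a percent change.
double FromLogReturn(double logReturn)
{
    return (std::exp2(logReturn / 100.0) - 1.0) * 100.0;
}
```

Compounding is then just addition: two years at 10% is 2 * LogReturn(10), about 27.5, and FromLogReturn of that gives back 21%. Subtracting inflation works the same way, by subtracting LogReturn of the inflation rate.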

So the boiler room in our apartment is usually locked, but the landlord (being the irresponsible screwup that he is) accidentally left it unlocked a few days ago. Alissa discovered it and checked out the settings on the boiler. It's programmable and all that where it comes on with different time schedules. It was set to 63 degrees.

12-18-08 | Open Source Just Doesn't Work

There's a Netflix Plugin for Media Portal . I dare you to even try to figure out how to install it and get it to run. And then there are some big problems with control and integration once you get it running. I'm trying to get the developer to fix it; fingers crossed, it would be nice to have that. I keep hearing the Xbox Netflix integration is really awesome. I might have to get an Xbox just for that. Yay.

12-17-08 | Manifesto

1. The State only listens to violence and destruction of property. Peaceful protest accomplishes nothing without the threat of action. (Supposed successful peaceful protesters like Gandhi or MLK were in fact empowered by the state's fear of violent civil unrest).

2. The State is primarily concerned with enriching and maintaining the wealth and power of the Elite. The State only changes when something threatens the Status Quo which benefits The Few.

3. The Rich are in collusion with The State. They have power over The State, they benefit from its actions. They are complicit in all acts of evil performed by The State.

4. It is the right of those being trampled upon to fight back with the means available to them. They may fight or protest to secure their ability to survive, or their freedom.

5. Theft and destruction of the property of The Rich is one of the valid ways for The Trampled to take action against The State.

6. The Rich earn the right to keep their possessions and power by making the system function well enough for the poor. If the poor become desperate enough to risk jail or injury, the rich have broken their obligation and the poor are morally justified in taking action.

12-17-08 | Junk

tmpnam() and tmpfile() are making files in the root of my machine. According to the docs they're supposed to put files in your temp dir as set by "TMP" , but all I ever see is stuff like "\s38c.1" . The windows functions GetTempPath / GetTempFileName seem to work fine.

Is this legal ?

static int x = func(&x);

I did the Casey-style tweakable var thing where you watch your C files, and I want to be able to initialize a variable and also grab its address in just one statement. It works perfectly fine in MSVC, I just wonder if it will break on some platform; I can't seem to find anything in the C standard about whether the address of a variable is defined before the variable is finished initializing. (I know for example that you aren't "supposed" to use "this" during a constructor, but everybody does it and it works fine in every compiler I've ever seen).
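Here's a minimal sketch of the pattern in C++. As far as I know it is fine there: the variable's name is already in scope inside its own initializer, and a static's storage exists (zero-initialized) before the initializer runs, so taking the address is safe as long as the function doesn't rely on the value. In plain C the initializer of a static must be a constant expression, so this exact line won't compile as C. TweakRegistry and RegisterTweak are hypothetical names made up for this sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical tweak registry; the names are made up for this sketch.
std::vector<int *> & TweakRegistry()
{
    static std::vector<int *> registry; // constructed on first use
    return registry;
}

// Records the variable's address and returns its initial value.
// It only stores the pointer; it does not read through it.
int RegisterTweak(int * address, int initialValue)
{
    assert(address != nullptr);
    TweakRegistry().push_back(address);
    return initialValue;
}

// Initialize the variable and grab its address in one statement.
static int s_speed = RegisterTweak(&s_speed, 42);
```

A file watcher could later write new values through the stored pointers; the registry is instantiate-on-use so it doesn't matter which file's statics initialize first.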

It sucks that MSVC doesn't have fmemopen(). It would let me page data into a memory buffer and then fmemopen on that and give it back to clients who want to just read bits with stdio because they have functions already written that use stdio.

More generally, I wish stdio FILE had actually been defined to use function pointers the way it does in some GCC POSIX implementations . Then you could plug in anything and FILE would be a true virtual file. That way people could just write code to talk to stdio, and I could secretly pass them Oodle file handles and it would all just work.

Instead I have to make my own virtual file layer, and then maybe some #defines or something if you want to just stomp your stdio calls to Oodle.

The function pointer indirection is really not a performance cost, because getc will still be a macro that goes straight to a buffer, and the function pointer only needs to get called to fill the buffer. The stdio buffer these days is not actually a file system buffer - it's way too small, that's just not its job. The file system is buffering the disk in 256k chunks, stdio has a little 4k buffer whose purpose is to reduce the number of times you need to talk to the OS, just to cut down on function call overhead. I'm planning on using the same method for the Oodle virtual file, but with just a 256 byte or maybe 1k buffer, which is just there so you can do fast macro getc() and only rarely jump through a function pointer to do big reads or refill the little buffer.
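A minimal sketch of that layout: a small buffer read by an inline getc, with a function pointer that is only called to refill it. VFile, VGetc, MemSource and friends are hypothetical names, not real Oodle or stdio API, and the memory backend is just one example of something you could plug in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical virtual file; none of these names are real Oodle or stdio API.
struct VFile
{
    // Fills 'buffer' with up to 'capacity' bytes, returns the count (0 at end of file).
    size_t (*refill)(VFile * file, unsigned char * buffer, size_t capacity);
    void * user;               // backing state for the refill function
    unsigned char buffer[256]; // small buffer, only there to make VGetc cheap
    size_t pos;                // next byte to hand out
    size_t count;              // number of valid bytes in buffer
};

// Fast path reads straight from the buffer; the function pointer is only
// called when the buffer runs dry.
inline int VGetc(VFile * f)
{
    if (f->pos == f->count)
    {
        f->count = f->refill(f, f->buffer, sizeof(f->buffer));
        f->pos = 0;
        if (f->count == 0) return -1; // end of file
    }
    return f->buffer[f->pos++];
}

// Example backend: read from a block of memory.
struct MemSource
{
    const unsigned char * data;
    size_t size;
    size_t offset;
};

size_t MemRefill(VFile * f, unsigned char * buffer, size_t capacity)
{
    MemSource * src = static_cast<MemSource *>(f->user);
    size_t n = src->size - src->offset;
    if (n > capacity) n = capacity;
    std::memcpy(buffer, src->data + src->offset, n);
    src->offset += n;
    return n;
}

void VOpenMem(VFile * f, MemSource * src)
{
    assert(f != nullptr && src != nullptr);
    f->refill = MemRefill;
    f->user = src;
    f->pos = 0;
    f->count = 0;
}
```

With a 256 byte buffer the indirect call happens once per 256 bytes read, so its cost is spread thin; any other backend only has to supply a different refill function.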

I'm working on my laptop, which is way way slower than my awesome work machine. My Oodle test game runs at 10 fps on the laptop and 200 fps at work. It's letting me find lots of little paging bugs. Yikes. For real release testing I'm probably going to have to put in a mechanism to run at artificial slow framerates, and maybe also some randomized frame durations to really try to stress all possible orders of things occurring.

I'm annoyed by having "Oodle" and "rad" in front of all my function names. It's nice to expose it to clients that way because it makes it so you never have possible conflicts with their definitions, but it's annoying for me during dev. It ruins typing autocomplete and browse info. I think "oh, I want to open this file" and start typing "ood.." and it gets me nothing, whereas I could just be typing "open.." and it would complete for me.

I might have to do the thing Casey did where I use nice short names internally for myself and then run it through a bunch of wrappers to expose long names to the outside world.

Back on Galaxy and all my old libraries, and back when I did C-style OOP at Eclipse I would always make the names of things show what system they're in. So all the Galaxy stuff is gFile, gVec, etc. I now realize that's bad. For one thing it's bad for the autocomplete and so on. It's bad for file name browsing. It's bad for pronounceability. But more importantly it doesn't help. In C++ you can just wrap all your junk in a namespace to prevent conflicts, and then within your namespace you can rock on with short names.

Furthermore, short generic names are better for metaprogramming and templates. If your functions are named Quat_Length() and Vec_Length() and so on, I can't write generic functions that work on both. But if it's just Length() and Distance() and such, I can write lovely generics. Even without templates this is a win because it lets you copy-paste code, or change your data types. Like say you have a chunk of code that works on Vec3. You decide it needs to work on Vec4 instead. If your functions are Vec3_Add and Vec4_Subtract then you have to change a ton of code. If it's just Add() and Subtract() you change the data types and it just works.

Algorithms are separate from the data they work on. That's the big epiphany of Stepanov. In fact, Stepanov is a weirdo and thinks the most important thing is to find the absolutely minimal constraint that the algorithm places on the data it works on.
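A minimal sketch of the point about generic names. The Vec3 and Vec4 structs and the Add, Scale and Lerp functions are illustrative stand-ins, not taken from any real math library.

```cpp
#include <cassert>

// Illustrative vector types; both expose the same short function names.
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

Vec3 Add(const Vec3 & a, const Vec3 & b) { return Vec3{ a.x + b.x, a.y + b.y, a.z + b.z }; }
Vec4 Add(const Vec4 & a, const Vec4 & b) { return Vec4{ a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w }; }

Vec3 Scale(const Vec3 & v, float s) { return Vec3{ v.x * s, v.y * s, v.z * s }; }
Vec4 Scale(const Vec4 & v, float s) { return Vec4{ v.x * s, v.y * s, v.z * s, v.w * s }; }

// Generic code: works for any type V that has Add() and Scale() overloads.
// With Vec3_Add / Vec4_Add style names this could not be written once.
template <typename V>
V Lerp(const V & a, const V & b, float t)
{
    assert(t >= 0.0f && t <= 1.0f);
    return Add(Scale(a, 1.0f - t), Scale(b, t));
}
```

Switching a chunk of code from Vec3 to Vec4 then means changing the declared types and nothing else; overload resolution picks the right Add and Scale.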

12-16-08 | Libraries and Cinit

I need a kind of mini class-factory for Oodle. This is for when I load a paging bundle that's full of various resources, I need to make each one. (BTW there will be a "low level" usage pattern for Oodle where you just load a paging bundle and the game gets a handle to the bundle and does whatever it wants with it. This is for the "high level" layer that automates a lot of things for you but is optional).

I guess what I'm going to have to do is require the game to give me a creator function pointer that knows how to make all the objects. eg. it would give me something like

void * (*fp_factory) (int objectType);

and fp_factory would point to something like

void * GameObjectFactory(int type)
    case OBJECT_PLAYER : return (void *) new Player;
    case OBJECT_ENEMY: ...

Or something. As a game developer I hate that kind of global registry, because when you add a new object type you have to remember to go update this global registry file, which becomes a frequent merge disaster for source control, etc. I really like having everything you need to make a game object in a single CPP file. That means objects should be self-registering.

The way I usually do self-registering objects is with a little class that runs at cinit. Something like :

#define IMPLEMENT_FACTORY_PRODUCT(classname)    namespace { ClassFactory::Registrar classname##registrar( classname::Factory , typeid(classname) ); }

then in Player.cpp you have :

IMPLEMENT_FACTORY_PRODUCT(Player);

That's all fine and dandy, but it's not so cool for me as a library maker.

For one thing, doing work during CINIT is kind of naughty as a library. I've been trying to make it a rule for myself that I don't use object constructors to do cinit work the way I sometimes like to do. It's a perfectly normal thing to do in C++, and if you are writing C++ code you should make all your systems instantiate-on-use like proper singletons so that CINIT objects work - BUT as a library maker I can't require the clients to have done all that right. In particular if I do something like allocations during CINIT, they might run before the client does its startup code to install custom allocators or whatever.

For another thing, there are problems with the linker and cinit in libraries. The linker can drop objects even though they are doing cinit calls that register them in global factory databases. There are various tricks to prevent this, but they are platform dependent and it is a little evil bit of spaghetti to get the client involved in.

I guess I probably also shouldn't rely on "typeid" or "dynamic_cast" or any other runtime type information existing either since people like to turn that off in the compiler options for no good reason (it has basically zero cost if you don't use it). So without that stuff I pretty much just have to rely on the game to give me type info manually anyway.
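For reference, the self-registering pattern described above could be sketched like this, using a manually supplied integer type id instead of typeid. This is an assumption-heavy sketch: the two-argument macro, the Register/Create/Table functions and the OBJECT_PLAYER value are all made up here, and it is the application-side pattern, with the cinit and linker caveats above still applying.

```cpp
#include <cassert>
#include <map>

// Hypothetical sketch; the interface below is illustrative, not a real library.
class ClassFactory
{
public:
    typedef void * (*CreateFunc)();

    // Constructing a Registrar at cinit registers one product.
    struct Registrar
    {
        Registrar(int typeId, CreateFunc func) { ClassFactory::Register(typeId, func); }
    };

    static void Register(int typeId, CreateFunc func)
    {
        assert(func != nullptr);
        Table()[typeId] = func;
    }

    // Returns nullptr for unknown type ids.
    static void * Create(int typeId)
    {
        std::map<int, CreateFunc>::const_iterator it = Table().find(typeId);
        return (it == Table().end()) ? nullptr : it->second();
    }

private:
    // Instantiate-on-use, so registrars in other files can run in any order.
    static std::map<int, CreateFunc> & Table()
    {
        static std::map<int, CreateFunc> table;
        return table;
    }
};

#define IMPLEMENT_FACTORY_PRODUCT(classname, typeId) \
    namespace { ClassFactory::Registrar classname##registrar( typeId, classname::Factory ); }

// What Player.cpp might contain:
enum { OBJECT_PLAYER = 1 };

struct Player
{
    int health = 100;
    static void * Factory() { return new Player; }
};

IMPLEMENT_FACTORY_PRODUCT(Player, OBJECT_PLAYER)
```

Everything needed to make a Player lives in one file, and the paging code only ever calls ClassFactory::Create with the type id stored in the bundle. Note the table itself allocates during cinit, which is exactly the kind of thing a library can't safely do on behalf of its client.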

Bleh, I'm just rambling now...

12-15-08 | Denoising

I've been playing around with denoising images a tiny bit. There's a ton of papers on this and I've barely scratched the surface, but it strikes me that the majority of the researchers seem to be doing silly things that are ill-conceived.

Almost all of them work in the same basic way. They create a prediction of what the current pixel should be with a local smooth predictor, let's call that 'S'. Then they take the difference from the actual pixel value 'A'. If the difference is greater than a certain threshold, |S - A| > T , they replace the pixel value with 'S'.

That's just very wrong. It assumes that images can be modeled with a single-lobe Gaussian probability distribution. In fact, images are better modeled as a blend of several lobes with different means. That is, there is not one single smooth generator, but many, which are switched or blended between based on some state.

Any single-lobe predictor will incorrectly identify some valid image detail as noise.

I like to make it clear that the problem has two parts : deciding if a pixel is noise or not noise, and then filling in a replacement value if you decide that the pixel is noise.

My feeling is that the second part is actually not the important or difficult part. Something like a median filter or a bilateral filter is probably an okay way to fill in a value once you decide that a pixel is noise. But the first part is tricky and as I've said any simple weighted linear predictor is no good.

Now, ideally we would have a rich model for the statistical generation of images. But I wrote about that before when talking about Image Doubling (aka Super-Resolution), and we're still very far from that.

In the mean time, the best thing we have at the moment, I believe, is the heuristic modes of something like CALIC, or the Augural Image Zooming paper, or Pyramid Workshop or TMW. Basically these methods have 6 - 100 simple models of local image neighborhoods. For example the basic CALIC models are : {locally smooth}, {vertical edge}, {horizontal edge}, {45 degree edge}, {135 degree edge}, {local digital}, {pattern/texture}. The neighborhood is first classified to one of the heuristic models, and then a prediction is made using that model.

We can thus propose a simple heuristic noise detection algorithm :

Bayesian Noise Detection :

N = current local neighborhood
A = actual pixel value

P(M|N) = probability of model M given neighborhood N
P(A|M) = probability that pixel A was generated by model M


P(A|N) = max{M} P(A|M) * P(M|N)

then classify A as noise if

P(A|N) < T

for some threshold T

(I don't specify how the P's are normalized because it just changes the scaling of T,
but they should be normalized in the same way for the whole image)

Note that a very crucial thing is that we are using the max over models, NOT the average over models. What we're saying is that if *any* of our heuristic local models had a high likelihood of generating the value A, then we do not consider it noise. The only values that are noise are ones which are unlikely under *all* models.

In a totally hand-wavey heuristic way, this is just saying that if a pixel is within threshold of being a locally smooth value, or an edge value, or a texture, etc. then it's not noise. If it fits none of those models within threshold, it's noise.
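The decision rule above can be sketched as follows. This only covers the final classification step: it assumes the per-model probabilities P(M|N) and P(A|M) have already been computed somewhere else, and the struct and function names are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-model terms for one pixel, assumed to be computed elsewhere.
struct ModelScore
{
    double probModelGivenNeighborhood; // P(M|N)
    double probPixelGivenModel;        // P(A|M)
};

// Best explanation of the pixel over all models: max over M of P(A|M) * P(M|N).
double BestModelLikelihood(const std::vector<ModelScore> & models)
{
    assert(!models.empty());
    double best = 0.0;
    for (size_t i = 0; i < models.size(); ++i)
    {
        double p = models[i].probPixelGivenModel * models[i].probModelGivenNeighborhood;
        if (p > best) best = p;
    }
    return best;
}

// A pixel is noise only if no model explains it above the threshold.
bool IsNoise(const std::vector<ModelScore> & models, double threshold)
{
    return BestModelLikelihood(models) < threshold;
}
```

For example, with two models scoring 0.7 * 0.01 and 0.2 * 0.9 and a threshold of 0.1, the maximum is 0.18 so the pixel is kept, whereas averaging the two products would give about 0.09 and wrongly flag it.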

I started by looking at the Median Filter and the Bilateral Filter. There have been some cool recent papers on fast Median Filtering :
Constant Time Median Filter
Weiss Log(r) Median Filter
Fast Bilateral Filter ; Sylvain Paris and Frédo Durand + good references
Siggraph Course on Bilateral Filtering

Those are all very worth reading even though I don't think they're actually super useful. The fast median filter approaches use cool ways of turning an operation over a sliding window into incremental operations that only process values getting added in and removed as you step the window. Median filter is a weird thing that works surprisingly well actually, but it does create a weird sort of Nagel-drawing type of look, with nothing but smooth gradients and hard edges. In fact it's a pretty good toon-shading process.

BTW the fast median filters seem useless for denoising, since they really only matter for large r (radius of the filter), and for denoising you really only want something like a 5x5 window, at which size a plain brute force median is faster.
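A minimal sketch of the plain brute force median for one pixel, assuming a row-major 8-bit grayscale image and clamping at the borders. The function name and image layout are assumptions for illustration; std::nth_element is the only library call doing real work.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Median of the (2r+1)x(2r+1) window centered at (cx,cy), clamping at the image border.
// 'image' is a hypothetical row-major 8-bit grayscale buffer.
unsigned char WindowMedian(const std::vector<unsigned char> & image,
                           int width, int height, int cx, int cy, int r = 2)
{
    assert(width > 0 && height > 0 && (int)image.size() == width * height);
    assert(cx >= 0 && cx < width && cy >= 0 && cy < height && r >= 0);

    std::vector<unsigned char> window;
    window.reserve((2 * r + 1) * (2 * r + 1));
    for (int y = cy - r; y <= cy + r; ++y)
    {
        int sy = std::min(std::max(y, 0), height - 1);
        for (int x = cx - r; x <= cx + r; ++x)
        {
            int sx = std::min(std::max(x, 0), width - 1);
            window.push_back(image[sy * width + sx]);
        }
    }

    // The window has an odd number of taps, so the middle element is the median.
    std::nth_element(window.begin(), window.begin() + window.size() / 2, window.end());
    return window[window.size() / 2];
}
```

With r = 2 this gathers 25 taps per pixel and does a partial sort, with no incremental histogram to maintain.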

Bilateral filter actually sort of magically does some of the heuristic cases that I've talked about. Basically it makes a prediction using a filter weighted by distance and also value difference. So similar values contribute and disparate values don't. That actually does a sort of "feature selection". That is, if your pixel value is close to other pixel values in a vertical edge, then the bilateral filter will weight strongly on those other similar pixel values and ignore the other off-edge pixels. That's pretty good, and the results are in fact decent, but if you think about it more you see the bilateral filter is just a ghetto approximation of what you really want. Weighting based on pixel value difference is not the right way to blend models, it makes no distinction about the context of that value difference - eg. it doesn't care if the value difference comes from a smooth gradient or a sudden step. As others have noted, the Bilateral Filter makes the image converge towards piecewise-constant, which is obviously wrong. Getting towards piecewise linear would be better, piecewise bicubic would be better still - but even that is just the very easiest beginning of what the heuristic estimator can do.
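For concreteness, here is a minimal sketch of the bilateral prediction for one pixel, using the usual Gaussian weights on both distance and value difference. The function name, the parameter names and the float grayscale layout are assumptions made for this sketch, and it is the plain brute force form, not one of the fast variants linked above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Bilateral estimate for the pixel at (cx,cy) of a row-major grayscale image.
// sigmaSpace and sigmaRange are Gaussian widths; the parameter names are made up here.
float BilateralAt(const std::vector<float> & image, int width, int height,
                  int cx, int cy, int r, float sigmaSpace, float sigmaRange)
{
    assert(width > 0 && height > 0 && (int)image.size() == width * height);
    assert(cx >= 0 && cx < width && cy >= 0 && cy < height && r >= 0);
    assert(sigmaSpace > 0.0f && sigmaRange > 0.0f);

    const float center = image[cy * width + cx];
    float sum = 0.0f;
    float weightSum = 0.0f;
    for (int y = cy - r; y <= cy + r; ++y)
    {
        if (y < 0 || y >= height) continue; // skip taps outside the image
        for (int x = cx - r; x <= cx + r; ++x)
        {
            if (x < 0 || x >= width) continue;
            const float value = image[y * width + x];
            const float d2 = float((x - cx) * (x - cx) + (y - cy) * (y - cy));
            const float v2 = (value - center) * (value - center);
            // Weight falls off with distance AND with value difference.
            const float w = std::exp(-d2 / (2.0f * sigmaSpace * sigmaSpace))
                          * std::exp(-v2 / (2.0f * sigmaRange * sigmaRange));
            sum += w * value;
            weightSum += w;
        }
    }
    return sum / weightSum; // the center tap has weight 1, so weightSum is never zero
}
```

Note the value-difference weight only looks at how far each tap is from the center pixel, which is exactly the context blindness complained about above: a tap on a smooth gradient and a tap across a step get treated the same if their differences are equal.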

NL Means is a denoising algorithm which is a bit closer to the right idea; he's definitely aware of the issues. However, the actual NL means method is poor. It relies on closely matching neighborhoods to form good predictions, which anyone who's worked in image compression or super-resolution knows is not a good approach. The problem is there are simply too many possible values in reasonably sized neighborhoods. That is, even for a moderately sized neighborhood like 5x5, you have (2^8)^25 possible values = 2^200. No matter how much you train, the space is too sparse. It may seem from the NL Means formulation that you're weighting in various neighbors, but in practice you only find a few neighbors that are reasonably close and they get almost all of the weight, and they might not even be very close. It's like doing K-means with 2^200 possible values - not good.

There's a lot of work on Wavelet Denoising which I haven't really read. There are some obvious appealing aspects of that. With wavelets you can almost turn an image into a sum of {smooth}+{edges}+{texture}+{noise} and then just kill the noise. But I don't really like the idea of working in wavelet domain, because you wind up affecting a lot of pixels. Most of the noise I care about comes from cameras, which means the noise is in fact isolated to 1 or 2 adjacent pixels. I also don't like all the research papers that want to use 9x9 or larger local windows. Real images are very local, their statistics change very fast, and pulling in shit from a 9x9 window is just going to mess them up. IMO a 5x5 window is the reasonable size for typical image resolutions of today.

BTW one thing I've noticed with my camera noise images is that the fucking camera JPEG makes the noise so much harder to remove. The noise looks like it's originally just true single pixel noise, but when it goes through the camera JPEG, that single-pixel peak is really unfriendly to the DCT, so it gets smeared out, and you wind up having noise lumps that look like little DCT shapes. To specifically denoise photos that have gone through JPEG you probably have to work on 8x8 blocks and work directly on the DCT coefficients. (also the Bayer pattern demosaic obviously spreads noise as well; ideally you'd get to work on the raw taps before jpeg, before the camera denoise, and before the Bayer demosaic).

ADDENDUM : a lot of the denoise people seem to be trying to perfect the Playboy Centerfold algorithm, that makes photos look extremely smoothed and airbrushed. Often if you're not sure a pixel is noise it's best to leave it alone. Also, all the methods that use a pixel-magnitude threshold value for noise are wrong. The threshold for noise needs to be context sensitive. That is, in smooth parts of the image, you might be able to say that a pixel is probably noise when it's off from expectation by only 1 or 2 pixel values. In chaotic textured parts of the image, a pixel might be off by 20 values or more and you still might not be able to say it's noise. The correct parameter to expose to the user is a *confidence*. That is, I want to do something like replace all pixels which the algorithm is >= 90% confident it can improve.

Another problem I've seen with the majority of the denoisers is that they create structures from noise. If you run them on just staticy junk, they will form local flat chunks, or linear bits or weird feathery patterns. This is because even in random noise there will be little bits that have similar values so they become seeds to create structures. This is very bad; the weird structures that are formed by this "denoising" are much worse than the original static, which is pretty inoffensive to the eye.

Marc sent me the link to GREYCstoration, a free denoiser based on the image manifold PDE research. I don't like that this technique is becoming known as "PDE" - PDE just means partial differential equation; in this case it's a diffusion equation, in particular a variant of anisotropic diffusion with different diffusion rates based on the local curvature - eg. it diffuses across smooth areas and not across edges. (that's actually an old basic technique, the new thing he does is the diffusion follows contour lines (but is actually 2d, just weighted) and works on all components). It looks pretty decent. It's actually more exciting to me for super-resolution, it looks like it does a pretty good job of image super-resolution.

12-15-08 | Monday Randoms

I want Windows to always automatically change its resolution to the native display resolution of the LCD that I plug in. I just never ever want to run an LCD at anything but its natural resolution. Nobody should. It seems like this is an obvious standard feature everyone would want.

Insurance is by design minus EV. That doesn't mean it's by definition -EV. In fact, you can "win" by being reckless and driving like a lunatic or just crashing on purpose.

C is the fucking devil. I had this in my code :

static s_frameTimeAccumulator = 0.f;
s_frameTimeAccumulator += dt;

No warning. WTF. (You may notice there's a word missing after "static").
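For anyone wondering why that compiles at all: my reading (an assumption, the snippet above is all there is to go on) is that old-style C falls back to "implicit int" when a declaration has a storage class but no type, so the accumulator quietly becomes an int and every += truncates. A minimal sketch of the effect, with the type written out explicitly so it builds on modern compilers; the function names are made up for illustration :

```c
/* Accumulate 'dt' over 'frames' frames into an int, which is what the
   missing type in the snippet above would silently give you. Each +=
   converts the float sum back to int, so small dt values vanish. */
static int accumulate_as_int(int frames, float dt)
{
    int acc = 0;
    for (int i = 0; i < frames; i++)
        acc += dt;   /* acc = (int)((float)acc + dt), truncates toward zero */
    return acc;
}

/* The same loop with the intended float accumulator. */
static float accumulate_as_float(int frames, float dt)
{
    float acc = 0.f;
    for (int i = 0; i < frames; i++)
        acc += dt;
    return acc;
}
```

With dt around 0.016 the int version never gets off zero, while the float version reaches about 1.6 after 100 frames.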

12-14-08 | Sunday Randoms

My iPods have been locked into certain songs sets for a while now because I just don't want to run iTunes. I moved the location of my mp3 dir a while ago when I got the HTPC, and iTunes is pointing at all the old stuff. There are some hacks around the net about ways to move the dir without reimporting everything, but I tried some and it didn't work, so I just gave up and deleted the iTunes database and am reimporting everything. I'm currently about 30 hours in to the import; it seems near done now...

Having iTunes running and slowing the machine down to a crawl lets you see weird bugs in Explorer. For example, when Explorer redraws the desktop, it actually redraws the whole thing top-left justified with no task bar. Then if your task bar happens to be in the top left, it clears the screen, draws the taskbar, and redraws the icons shifted over by the task bar amount. That is some crazy shitty programming and the kind of thing I always try to be careful about in games. It's a good trick in game development to test your game at 5 fps so that you can see all the one-frame glitches. People often have one-frame bugs in the whole input->action->rendering chain.

It snowed a lot last night. It's bitter cold inside my apartment (around 62 degrees, but it feels colder cuz of the drafts). Apparently neither my girlfriend nor the other tenants think this is a big deal, which has killed my momentum in trying to get the landlord to do anything. On the plus side, I'm surprised at how good the electricity in this old building is; I'm running space heaters and have yet to blow a fuse. My place in San Francisco would blow a fuse if the refrigerator happened to start a cooling cycle while I was vacuuming.

Maybe someone can help me find a new digital camera. What I want is :

The Panasonic FX 150 looks okay, but it's 15 MP which is worrisome. The LX3 looks semi-ideal.

Also apparently there's this CHDK thing (see also ) which lets you hack any Canon Point and Shoot. Apparently the Canon P&S and the DSLR actually have the same chip, but for the consumer level stuff Canon just disables those features (like 10-bit RAW). Apparently the best for this was the Canon S70 which is not made any more.

I've read at many of the fancy photo places that you should get a physically large CCD (2.3" not 1.8" or 1.4") and that lower megapixels = less noise = better quality photos. That's sort of interesting to me theoretically.

Obviously larger CCD = better because more photons are coming in. Subdividing the CCD into more pixels gives you more resolution, but fewer real photons per pixel. I'm not sure what the right model for noise is, but I think it's pretty close to just constant intensity random noise that's independent from pixel size. That is, the signal measured in each pixel is :

Signal per pixel = # Photons in pixel + Random Noise of intensity N
S = P + N

Your signal-to-noise-ratio per pixel obviously gets better and better when you have fewer pixels, all the way down to the case of just one giant pixel which would be very accurate. However, obviously that needlessly decreases your spatial resolution. On the other hand, if you divide the CCD into too many megapixels you have too little signal per pixel and it becomes hard to tell real signal from the noise.

If we try to maximize the amount of real total information captured, there must be some sweet spot. The total information is like (# pixels) * (information per pixel). Information per pixel is maximum when # of pixels = 1, and it goes down to near zero when N >= P. So as # of pixels increases this must be some curve that goes up then comes back down and has a maximum at some intermediate # of pixels.

Anyhoo, supposedly the 10+ MP cameras make ugly noise. I think a lot of that perception is due to crappy denoising software.

I've read that the "RAW" coming out of cameras isn't really raw; they still run denoise and stuff on it. What you really want is a floating point file that just gives you the voltage at each pixel. I'm sure the camera companies don't want to do that because places like "dpreview" would put up sample screens of what the raws coming out without processing would look like and they would look awful. Then some jackass would build in some denoising and their raw would look much better and retarded consumers would buy it because of that.

12-12-08 | Friday Randoms

It's really fucking annoying how macros are not namespaced. I've got the same macros in cblib, Galaxy, Oodle, etc. etc. and sometimes they conflict. Even worse is macros that have the same name but aren't identical; it's standard practice to either just undef the old one or not define the new one and god knows what the side effects of that are. The macro processor is one of the things that would actually be useful to fix in programming languages, instead we get a lot of shitty languages where they just get rid of the preprocessor completely, which is a huge step backward IMO. *Yes* you should make it so that I don't have much reason to use the preprocessor (by making inline functions actually work, by improving metaprogramming, by improving switches on constants and such to not even generate unreachable code) but that doesn't mean you should kill something the user likes.

Using "#ifdef" for toggles is really bad. I've always known that, but I keep doing it. The big problem with it is if you put in the wrong name to check for ifdef, it just returns false with no error. That hits you in two ways, one is just if you put in a typo in your ifdef it will never be true. Perhaps worse is when you change the name of the thing that you are toggling, all the old checks just fail. Of course what you should do is just "#if " for the toggle.
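A minimal sketch of the "#if" style, assuming a made-up toggle name (EXAMPLE_USE_FAST_PATH is hypothetical, not something from any real codebase). The idea is that the toggle is *always* defined, to 0 or 1, and checked by value. One caveat : a plain "#if" on a name that was never defined also quietly evaluates to 0, so the real win comes from pairing it with a guard like the #error below, or with a warning such as GCC/Clang's -Wundef.

```c
/* Hypothetical toggle; always define it, to 0 or 1. */
#define EXAMPLE_USE_FAST_PATH 1

/* A typo or rename now fails loudly instead of silently disabling code. */
#ifndef EXAMPLE_USE_FAST_PATH
#error "EXAMPLE_USE_FAST_PATH must be defined to 0 or 1"
#endif

/* Returns 1 when the fast path was compiled in, 0 otherwise. */
static int fast_path_enabled(void)
{
#if EXAMPLE_USE_FAST_PATH
    return 1;
#else
    return 0;
#endif
}
```

Turning the feature off is then a matter of changing the 1 to a 0 rather than deleting or commenting out the define, which is exactly what makes a missing define detectable.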

Musicians cover the wrong thing. Listen up musicians, let me straighten you out. They tend to do covers of songs that are good songs to begin with, or that are popular. This is the wrong criterion; you should cover songs that you can make *better*. In particular, songs where doing a different version would show the original song in amusing new light (the semi-comedy cover is okay, like Macha's "Believe" or Luna's "Sweet Child o' Mine") , but mainly songs where the original execution was flawed and the true beauty of the song was not expressed through the first performance. Some good examples would be pretty much any Bob Dylan song or any Stephin Merritt song because god those guys are awful singers, but they write great songs, so there's a lot of potential for covers. In any case, the new MGMT covers album is the epitome of what not to do. The original versions of the MGMT songs are the perfect expression of those songs. The best a cover can do is come close to the original, but mainly they just leave you unsatisfied and frustrated that you're not just listening to the better original version. (yes, I know that this is mainly due to stupid record companies; the cover albums are just created because the original sold well and the producer is trying to make some money off that)

I can easily be guilted or pressured into doing things. Not because I actually feel guilty or want to please the person (in fact, quite the opposite, when somebody tries to make me feel like I "should" do something it makes me want to *displease* them or just ignore them). Rather it's because I just find it so unpleasant I want them to stop and go away, and usually the easiest way is just to give them what they want.

I don't believe that people who claim to like super-snobby things actually like them. Examples - noise techno like DJ Spooky or Matmos. In movies, for example Ingmar Bergman or Akira Kurosawa. With all of those I can recognize the high level of execution of what they're doing, but it's just so unpleasant, I can't imagine actually enjoying it. Anyway, I just assume that when someone says something like "oh, I adore Kurosawa" they're just lying to try to seem cultured.

I just came up with the perfect analogy for Popcap games today. They're like Television. Mildly amusing, somewhat engaging, not at all challenging or surprising, cute with lots of stimulus, but they let you turn off your brain and zone out. Most people love television. Most people love Popcap games.

One of the big problems with the Netflix ratings in practice is still selection bias. For example, the more obscure a movie is, the higher it gets rated, because the only people who rent it are people who probably know it or love it. Really old movies pretty much all have 4-5 star ratings, but that's only because the people who wouldn't like them don't rent them! In fact this isn't exactly a flaw in the ratings, because the ratings do not actually predict what you would think if you saw a movie - they predict how you would rate a movie *if you rented it*. It's a Bayesian kind of problem - their prediction is for the rating that you would make, given that you chose to rent it and rate it.

12-11-08 | Thursday Randoms

It's supposed to snow over this weekend in a rare freak cold snap. I'm fully expecting total chaos, like this . ADDENDUM : ZOMG WTF I just noticed that video is actually from Seattle (addendum : or not...) ; it's an internet classic, I had no idea it was actually here. LOL Seattle bad-driveaments.

I really want to boycott fucking asshole websites that make you register just to view their content (such as Beeradvocate or Gamasutra or etc etc.) but I usually cave in. To get back at them I usually give a fake name and a junk email address.

Which reminds me, I've started just randomly giving fake info to people. When some fucking bank or stock trader or whatever wants my date of birth for me to open an account, I just tell them a random wrong year close to the right one. You don't need my info, and it's a lot faster to just give them fake bogus info than to refuse. It's satisfying. Even just random times, like when some fucking retailer like Bed Bath or Radio Crap asks you for your zip code, don't bother refusing, just make something up.

Beeradvocate could be really sweet if it let you do a Collaborative Filter, but as is with just the grades, I find it near useless, because I very much disagree with the average rankings. It's biased towards what the beer snobs love, which is shit like the Russian Imperial Stouts and Strong Ales and other such weird foul nonsense that's generally way too high in alcohol and way too fruity and too carbonated. They all secretly love Barleywine and cider and wish they were making that. Just make beer. Deschutes knows what I'm talking about. BTW "Dogfish Head" has never made a single good beer.

I hate the beers that sneak-attack drunk your ass (mainly aforementioned snob beers). Like I have two or maybe three and then I feel trashed and I'm like WTF ? And then I look at the bottle and it's like 12% ABV and I'm like "oh snap" I basically just drank a whole bottle of wine unintentionally. Curse you Mr. Sneak Attack Beer.

One of the big differences between me now and me when I was doing "research" is just the patience and ability to stick on one subject for a long time. Back then I would stick on something like arithmetic coding for *months* ; I would gather and read every single paper on the topic. Back then "Teh Interweb" did not really exist so you actually had to go to libraries to photocopy papers; often I would have to request books to be transferred through the interlibrary system so I could get rare stuff like Frank Rubin's arithmetic coder paper (which presaged the modern "range coder"). I might spend weeks just waiting for a paper to come in. I would sit in the garden and write notes about ideas on different algorithms.

Now I just grab all the papers I can easily grab on the web. I read them all in a few hours, mostly just skimming and thinking "junk" or "too obscure" or "ldo, obvious" or "bad writing, bleck I can't bother with this". I work something up really quick that's close enough to state of the art, and then I have to move on, because it's just irresponsible to keep working on some little aspect of my work that isn't driving me towards shipping.

I'm super out of shape right now; I went jogging the other day and coughed up some blood. I went to the gym today and got nauseous and felt ill for an hour afterward. Yeah yeah, I know I'm pushing it too hard too fast, but it's hard for me to comprehend how far I have to dial it back; I was like an ultimate-fighting ex-navy-seal lumberjack, and now I'm like a 70 year old with polio. Working out absolutely sucks when you're out of shape. I can see why all you weak fatties hate it so much. You have to get over a certain hump and then it starts actually feeling good, but until then it's just miserable. When I run right now I picture an elephant running in slow motion. Each step is like a ton of bricks slamming into the earth, and then all my weight sinks down onto that leg and my joints and tendons howl in protest.

Guy Laliberté dropped around $3M online in the last few days (around $20M total in 2008). It makes me want to boycott Cirque du Soleil. $150 for a ticket? Outrageous! But I do love me some circus.

Burlesque shows would be pretty damn great if it wasn't always the fat Rosie-O'Donnell-esque women that wanted to be in them. No, I'm sorry, your "sass" does not make up for the fact that you are gross. I'm not asking for stripper type girls; in fact, the *lack* of stripper type girls is what makes burlesque appealing in the first place. I want to see girls who are actually having fun and enjoying the tease show, not soul-crushed strippers. I'm just asking for quirky, real girls that aren't disgusting.

"Why aren't there transvestite girls?"
"You know, like how transvestite males dress up girls; there are no tranny girls that dress up as men."
"Sure there are, they're called lesbians."

God the winter here is really depressing. It's cold and grey and wet and always dark. I could easily slip into really bad patterns, not sleeping, eating badly, not going out, just staying home and fucking around on the computer. It takes a real force of will to stay positive and sane through the gloom. I mean, my sanity at any time is like a pendulum standing straight up; the slightest breeze can send it into a sudden fall and wild oscillations. Normally the slightest thing can send me into a funk for days (like doing a bad job of parallel parking for example). The winter here is like a gale trying to blow my pendulum over.

I hate the style of writing that has become standard for blogs all around the net. It's sort of smarmy, it usually involves an anecdote in which you mock the people or the event that you are telling about. It's very superior and often contains a "lesson". Even many of my favorite blog writers that I often read are very guilty of this, it is *not* funny or clever or amusing. Stop it. (I guess this is like, ironic, or something).

12-08-08 | File Hashes

So I'm sticking a file hash in Oodle and thought I'd test some of the stuff out there. Test candidates :

SHA1 (Sean's stb.h implementation)

MD5 (OpenSSL implementation)

BurtleBurtle Lookup3 (hashlittle2)

Cessu's SSE2 hash (+ extra tail code I added)



In all cases I create a 64 bit hash. Hey, it's plenty of bits, it's easier to pass around cuz I have a 64 bit type, and it makes it a fair competition. SHA1 makes 160 bits (= 5 dwords), MD5 makes 128 bits (= 4 dwords), so I use Bob's Mix method to get that down to 64 bits.

A lot of people think SHA1 or MD5 or something is the "right thing" to do for file hashes. That's not really true. Those hashes were designed for cryptography which is not the same purpose. In particular, they do a lot of work per byte *on purpose* because they want to be hard to invert. They also make tons of bits, not because you need tons of bits to tell files apart, but again to make them hard to invert by brute force attack. I don't care about my file hashes being vulnerable to attack, I just want the chance of accidental collisions to be microscopic.

CRC32+32 means doing CRC32 on alternating bytes and jamming them together to make 64 bits. This is not a true "CRC64" but I might refer to it as CRC64 sometimes. (suggestion by "Joe Smith" ; Joe Smith? Is that a pseudonym?)
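Here's roughly what I mean by CRC32+32, as a sketch rather than the code that was actually timed : it uses a plain bit-at-a-time CRC32 (standard reflected polynomial 0xEDB88320) instead of an unrolled table version, and the function names are made up for this example. One CRC runs over the even-indexed bytes, one over the odd-indexed bytes, and the two 32 bit results are jammed together.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32 (reflected polynomial 0xEDB88320) over every 'stride'-th
   byte of buf, starting at index 'start'. Slow but easy to check. */
static uint32_t crc32_strided(const uint8_t *buf, size_t len,
                              size_t start, size_t stride)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = start; i < len; i += stride) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Ordinary CRC32 over the whole buffer. */
static uint32_t crc32_plain(const uint8_t *buf, size_t len)
{
    return crc32_strided(buf, len, 0, 1);
}

/* "CRC32+32" : CRC of the even bytes in the high dword,
   CRC of the odd bytes in the low dword. */
static uint64_t crc32_plus_32(const uint8_t *buf, size_t len)
{
    uint64_t even = crc32_strided(buf, len, 0, 2);
    uint64_t odd  = crc32_strided(buf, len, 1, 2);
    return (even << 32) | odd;
}
```

Which half goes in the high dword is an arbitrary choice here. A nice side effect of the split is that the two CRCs have no data dependency on each other, so they can run interleaved.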

Just for background, if the 64 bit hashes are "perfect" - that is the 64 bit words coming out of them are random in every bit, even for input that is very non-random - then the chance of collision is indeed microscopic. (in fact you only need maybe 40 bits). The number of items you can hash in B bits is around 2^(B/2) , so B = 32 is not enough bits since 2^16 = 64k and you may in fact run on 64k files. But even at B = 40, 2^20 = 1 Million is a lot, and certainly B = 64, means 2^32 = 4 Billion items before you expect a collision. So, anyway, the point is to test whether these hashes are actually close enough to perfect on real data that they don't generate an unexpected high number of collisions.

I ran these hashes on every file on my hard drive. I threw out files that were exactly equal to some other file so there would be no false collisions due to the files actually being identical. I have 24963 files. I made 2 variants of every file, one by taking a random byte and changing it to a random value, and another variant by flipping 4 random bits. So in total 74853 arrays were hashed.

First the speed numbers :

sha1                 : 48.218018
md5                  : 19.837351
burtle               : 7.122040
Cessu                : 6.370848
crc32+32             : 15.055287
crc32                : 21.550138

These are in clocks per byte. The CRC numbers are a little funny because the CRC32+32 loop is a bit unrolled, but the CRC32 loop goes byte by byte. In any case, even though CRC is very simple, it is not fast, because even unrolled it still works byte by byte and there's a hard data dependency - you have to completely process each byte before you can work on the next byte.

Cessu's hash is only barely faster than Bob's lookup3 even though it uses SSE2 and works on 16 bytes at a time. Bob's hash is really damn good. When I tested it on strings it did not perform well for me because I'm so close to the bone on strings that the rollup & rolldown overhead killed me, but on larger arrays or even long strings, lookup3 kicks butt. ( Bob's page )

So... how many total collisions in the hashes do you think I had? (only testing collisions versus hashes of the same type of course). Remember I tested on 74853 different arrays, made from 24963 files and 24963+24963 more tiny modifications.


One collision. Of course it was in CRC32. None of the 64 bit hashes had any collisions.

I then tried making 8 variants of each file by 8 different random byte jams, so I was running 199704 arrays. Again zero collisions for any 64 bit hash.

So, in an attempt to actually get a collision, I made 10,000,000 test arrays by sprintf'ing the digits from 1 to 10,000,000 , and then tacked on 2 random bytes. (note : you should not test hashes by making random arrays, because any decent hash will return random output bits from random input bits; the test I am interested in is how close the hash output is to random on highly *nonrandom* input). I ran the hashes on all those strings and got :

collisions on 10,000,000 tests :

sha1                 : 0
md5                  : 0
burtle               : 0
Cessu                : 0
crc64                : 0
rand32               : 11,530
crc32                : 11,576

Again none of the 64 bit hashes has any collisions. CRC32 had quite a few of course - but only as many as a 32 bit random number generator !! That means the CRC is in fact performing as a perfect hash. It is perfectly randomizing the nonrandom input.
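The rand32 number is easy to sanity check with the usual birthday estimate : n items thrown into 2^B buckets give about n*(n-1)/2 / 2^B colliding pairs. A small sketch (the function name is made up; it counts expected colliding *pairs*, which is only approximately the same thing as the collision count reported above) :

```c
/* Expected number of colliding pairs when 'items' uniformly random
   values are drawn from 2^bits possibilities : n*(n-1)/2 / 2^bits. */
static double expected_colliding_pairs(double items, int bits)
{
    double buckets = 1.0;
    for (int i = 0; i < bits; i++)
        buckets *= 2.0;              /* 2^bits, without needing libm */
    return items * (items - 1.0) / (2.0 * buckets);
}
```

For 10,000,000 items and 32 bits this comes out to roughly 11,600, right in line with the rand32 and crc32 rows. For 64 bits it's a few millionths of one collision, which is why the other rows are all zero.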

So, I have no idea which of the 64 bit hashes is "best" in terms of randomizing bits and detecting errors. Obviously if they are actually perfectly making 64 bits, the chance of me ever seeing a collision is tiny. I thought maybe the "crc32+32" might not have 64 real bits of quality and might fail sooner, but it's not bad enough to fail in any kind of real world scenario apparently.

So, anyway, I'm gonna use "lookup3" because it's super fast, plenty good, and it has the Bob Jenkins seal of approval which means it should actually be "robust".

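HOWEVER : SSE4.2 has a CRC32 instruction (note that it computes CRC-32C, the Castagnoli polynomial, not the usual zip/PNG CRC32, so the values won't match a standard CRC32 table). If you were in some application where you could rely on having that, then that would definitely be the fastest way to go, and a CRC32+32 appears to be plenty high quality for file identification.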

BTW I keep hearing that CRC32 has degeneracy failures on real world data, but I have yet to see it.

12-08-08 | DXTC Summary

I thought I should fix some things that were wrong or badly said in my original DXTC posts :

DXTC Part 1
DXTC Part 2
DXTC Part 3
DXTC Part 4

On the "ryg" coder : there was a bug/typo in the implementation I was using which gave bogus results, so you should just ignore the numbers in those tables. See for correction : Molly Rocket Forum on Ryg DXTC coder . Also I should note he does have some neat code in there. The color finder is indeed very fast; it is an approximation (not 100% right) but the quality is very close to perfect. Also his "RefineBlock" which does the Simon Brown style optimize-end-from-indices is a very clever fast implementation that collapses a lot of work. I like the way he uses the portions of one 32 bit word to accumulate three counters at a time.

I also mentioned in those posts that the optimized version of the Id "real time DXTC" bit math was "wrong". Well, yes, it is wrong in the sense that it doesn't give you exactly the same indices, but apparently that was an intentional approximation by JMP, and in fact it's a very good one. While it does pick different indices than the exact method, it only does so in cases where the error is zero or close to zero. On most real images the actual measured error of this approximation is exactly zero, and it is faster.

So, here are some numbers on a hard test set for different index finders :

    exact : err:  31.466375 , clocks: 1422.256522

    id    : err:  31.466377 , clocks: 1290.232239
            diff:  0.000002

    ryg   : err:  31.466939 , clocks:  723.051241
            diff:  0.000564

    ryg-t : err:  31.466939 , clocks:  645.445860
            diff:  0.000564

You can see the errors for all of them are very small. "ryg-t" is a new one I made which uses a table to turn the dot product checks into indexes, so that I can eliminate the branches. Start with the "ryg" dot product code and change the inner loop to :

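    // unsigned table so the shifts into the top two bits are well defined (assumes mask is a U32)
    const U32 remap[8] = { 0u << 30,2u << 30,0u << 30,2u << 30,3u << 30,3u << 30,1u << 30,1u << 30 };

    for(int i=0;i < 16;i++)
    {
        int dot = block.colors[i].r*dirr + block.colors[i].g*dirg + block.colors[i].b*dirb;

        // three compares make a 3 bit code, the table turns that into the 2 bit index
        int bits =( (dot < halfPoint) ? 4 : 0 )
                | ( (dot < c0Point) ? 2 : 0 )
                | ( (dot < c3Point) ? 1 : 0 );

        mask >>= 2;
        mask |= remap[bits];
    }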

I should note that these speed numbers are for pure C obvious implementations and if you really cared about speed you should use SSE and who knows what would win there.

Now this last bit is a little half baked but I thought I'd toss it up. It's a little bit magical to me that Ryg's Mul8Bit (which is actually Jim Blinn's) also happens to produce perfect quantization to 565 *including the MSB shifting into LSB reproduction*.

I mentioned before that the MSB shifting into LSB thing is actually "wrong" in that it would hurt RMSE on purely random data, because it is making poor use of the quantization bins. That is, for random data, to quantize [0,255] -> 32 values (5 bits) you should have quantization bins that each hold 8 values, and whose reconstruction is at the middle of each bin. That is, you should reconstruct to {4,12,20,...} Instead we reconstruct to {0,8,...247,255} - the two buckets at the edges only get 4 values, and there are some other ones that get 9 values. Now in practice this is a good thing because your original data is *not* random - it's far more likely to have exact 0 and 255 values in the input, so you want to reproduce those exactly. So anyway, it's not a uniform quantizer on the range [0,255]. In fact, it's closer to a uniform quantizer on the range [-4,259].

I think it might actually just be a numerical coincidence in the range [0,255].
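To make the "MSB shifting into LSB" claim concrete, here's a small sketch that expands every 5 bit (or 6 bit) value to 8 bits by bit replication, quantizes it back with Mul8Bit, and counts how many values fail to round trip. The helper names are made up for this example; Mul8Bit is written in the usual Blinn form (t = a*b + 128, then (t + (t>>8)) >> 8).

```c
/* Blinn-style rounded a*b/255 for a, b in [0,255]. */
static int mul8bit(int a, int b)
{
    int t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}

/* Expand a 'bits'-wide value (bits = 5 or 6) to 8 bits by shifting up
   and replicating the top bits into the vacated low bits. */
static int expand_to_8(int q, int bits)
{
    return (q << (8 - bits)) | (q >> (2 * bits - 8));
}

/* Quantize an 8 bit value down to 'bits' bits with Mul8Bit. */
static int quantize_from_8(int val, int bits)
{
    return mul8bit(val, (1 << bits) - 1);
}

/* Count values q whose expand -> quantize round trip does not return q. */
static int count_roundtrip_failures(int bits)
{
    int failures = 0;
    for (int q = 0; q < (1 << bits); q++)
        if (quantize_from_8(expand_to_8(q, bits), bits) != q)
            failures++;
    return failures;
}
```

For 5 bits the expanded values are 0, 8, ... 247, 255, the same reconstruction points listed above. This only checks that the reconstruction points map back to themselves; it says nothing about where the bin edges in between fall.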

The correct straight-forward quantizer for the DXTC style colors is

    return (32*(val+4))/(256+8);

for R.  Each quantization bin gets 8 values except the top and bottom which only get 4.  That's equivalent to quantizing the range [-4,256+4) with a uniform 8-bin quantizer.


1/(256 + 8) = 1/256 * 1/(1 + 8/256)

We can do the Taylor series expansion of 1/(1+x) for small x on the second term and we get ( 1 - 8/256 + 64/256/256) up to powers of x^2

So we have

    ( (32*val+128) * ( 1 - 8/256 + 64/256/256) ) >> 8

And if we do a bunch of reduction we get 

    return ( 31*val+124 + ((8*val+32)>>8) ) >> 8

If we compare this to Mul8bit :

        return ( 31*val+128 + ((val*31 + 128)>>8)) >> 8;

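it's not exactly the same math, and while they agree for most val in [0,255] they are not identical : as written here they differ by one at some values (eg. val = 251 gives 30 from the Taylor version and 31 from Mul8bit)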
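The easy way to settle how close the two expressions are is brute force over all 256 inputs. A sketch, with made-up function names, that transcribes the formulas exactly as they appear above and counts disagreements :

```c
/* Taylor-reduced form of (32*(val+4))/(256+8), as written above. */
static int quant5_taylor(int val)
{
    return (31 * val + 124 + ((8 * val + 32) >> 8)) >> 8;
}

/* Mul8Bit(val, 31), as written above. */
static int quant5_mul8bit(int val)
{
    return (31 * val + 128 + ((val * 31 + 128) >> 8)) >> 8;
}

/* Direct integer form of the straight-forward quantizer. */
static int quant5_direct(int val)
{
    return (32 * (val + 4)) / (256 + 8);
}

/* Count inputs in [0,255] where the two quantizers disagree. */
static int count_mismatches(int (*a)(int), int (*b)(int))
{
    int count = 0;
    for (int val = 0; val <= 255; val++)
        if (a(val) != b(val))
            count++;
    return count;
}
```

With the formulas exactly as transcribed here, val = 251 comes out as 30 from both the direct and Taylor versions and 31 from Mul8Bit, so it's worth running the count rather than assuming it's zero.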

But I dunno. BTW another way to think about all this is to imagine your pixels are an 8.inf fixed point rep of an original floating point pixel, and you should replicate the 8 bit value continuously. So 0 = 0, 255 = FF.FFFFFF.... = 1.0 exactly , 127 = 7F.7F7F7F..

BTW this reminds me : When I do Bmp -> FloatImage conversions I used to do (int + 0.5)/256 , that is 0 -> 0.5/256 , 255 -> 255.5/256 . I no longer do that, I do 0->0, and 255 -> 1.0 , with a 1/255 quantizer.

12-05-08 | lrotl

Well I found one x86 ASM widget. I've always known you could do nice fast barrel rotations on x86 but thought they were inaccessible from C. Huzzah! Stdlib has a function "_lrotl()" which is exactly what you want, and happily it is one of the magic functions that MSVC recognizes and turns into assembly with all goodness. (They also have custom handling for strcmp, memcpy, etc.)

Oh, I noticed lrotl in OpenSSL which seems to have a ton of good code for different hashes/checksums/digests/whatever-the-fuck-you-call-them's.

As a test I tried it on Sean's hash, which is quite good and fast for C strings :

RADINLINE U32 stb_hash(const char *str)
{
   U32 hash = 0;
   while (*str)
      hash = (hash << 7) + (hash >> 25) + *str++;
   return hash;
}

RADINLINE U32 stb_hash_rot(const char *str)
{
   U32 hash = 0;
   while (*str)
      hash = _lrotl(hash,7) + *str++;
   return hash;
}

stb_hash : 6.43 clocks per char
stb_hash_rot : 3.24 clocks per char
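If you're not on MSVC, the same thing can be written portably; most current compilers recognize the shift-or idiom and emit a single rotate, though that's worth checking in the disassembly rather than taking on faith. A sketch with made-up names (rotl32, hash_shift_add, hash_rotl), not Sean's actual code. Note that (hash << 7) + (hash >> 25) is already a rotate in disguise : the two halves never overlap, so the add is the same as an or.

```c
#include <stdint.h>

/* Portable 32 bit rotate left; the masking keeps every shift count in
   [0,31] so there is no undefined behavior, even for r == 0. */
static uint32_t rotl32(uint32_t x, unsigned r)
{
    r &= 31u;
    return (x << r) | (x >> ((32u - r) & 31u));
}

/* Shift-and-add form of the string hash. Chars are read as unsigned
   here, which matches the original for plain ASCII input. */
static uint32_t hash_shift_add(const char *str)
{
    uint32_t hash = 0;
    while (*str)
        hash = (hash << 7) + (hash >> 25) + (uint32_t)(unsigned char)*str++;
    return hash;
}

/* Rotate form; computes exactly the same value as hash_shift_add. */
static uint32_t hash_rotl(const char *str)
{
    uint32_t hash = 0;
    while (*str)
        hash = rotl32(hash, 7) + (uint32_t)(unsigned char)*str++;
    return hash;
}
```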

Woot! Then I also remembered something neat I saw today at Paul Hsieh's Assembly Lab . A quick check for whether a 32 bit word has any null byte in it :

#define has_nullbyte(x) (((x) - 0x01010101) & ( ~(x) & 0x80808080 ) )

Which can of course be used to make an unrolled stb_hash :

RADINLINE U32 stb_hash_rot_dw(const char *str)
{
   U32 hash = 0;
   // note : this reads a whole dword at a time, so up to 3 bytes past the terminator
   while ( ! has_nullbyte( *((U32 *)str) ) )
   {
      hash = _lrotl(hash,7) + str[0];
      hash = _lrotl(hash,7) + str[1];
      hash = _lrotl(hash,7) + str[2];
      hash = _lrotl(hash,7) + str[3];
      str += 4;
   }
   while (*str)
      hash = _lrotl(hash,7) + *str++;
   return hash;
}

stb_hash_rot_dw : 2.50 clocks
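The has_nullbyte trick is easy to convince yourself of with a few hand-picked words. A sketch, using made-up function names and a memcpy load so there are no alignment assumptions (the caller still has to guarantee that 4 bytes are readable) :

```c
#include <stdint.h>
#include <string.h>

/* Nonzero result iff some byte of x is zero. Subtracting 1 from each
   byte borrows into the high bit only for a zero byte (or for a byte
   whose high bit was already set, which the ~x term filters out). */
static int has_nullbyte32(uint32_t x)
{
    return ((x - 0x01010101u) & (~x & 0x80808080u)) != 0;
}

/* Load 4 bytes from p without assuming alignment. */
static uint32_t load_u32(const char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

For example "abcd" loads as four nonzero bytes and tests false, while "abc" pulls in its terminator as the fourth byte and tests true.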

So anyway, I'm getting distracted by pointless nonsense, but it's nice to know lrotl works. (and yes, yes, you could be faster still by switching the hash algorithm to something that works directly on dwords)

12-05-08 | 64 Bit Multiply

Something that I've found myself wanting to do a lot recently is multiply two 32 bit numbers, and then take the top 32 bit dword from the 64 bit result. In C this looks like :

U32 mul32by32returnHi(U32 a,U32 b)
{
    U64 product = (U64) a * b;
    U32 ret = (U32)(product >> 32);
    return ret;
}

That works fine and all, but the C compiler doesn't understand that you're doing something very simple. It generates absolutely disgusting code. In particular, it actually promotes a & b to 64 bit, and then calls a function called "_allmul" in the Windows NT RTL. This allmul function does a bunch of carries and such to make 64-by-64 bit multiply work via the 32+32->64 multiply instruction in x86. You wind up with a function that takes 60 clocks when it could take 6 clocks :

U32 mul32by32returnHi(U32 a,U32 b)
{
    __asm
    {
        mov eax, a
        mul b
        mov eax,edx
    }
}

Now, that's nice and all, the problem is that tiny inline asm functions just don't work in MSVC these days. Calling a big chunk of ASM is fine, but using a tiny bit of ASM causes the optimizer to disable all its really fancy optimizations, and you wind up with a function that's slower overall.

What I'd like is a way to get at some of the x86 capabilities that you can't easily get from C. The other thing I often find myself wanting is a way to get at the carry flag, or something like "sbb" or "adc". The main uses of those are Terje's cool tricks to get the sign of an int or add with saturation
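For reference, here are plain C versions of those two operations. To be clear, these are not Terje's carry-flag tricks, just the branch-free C idioms I'd reach for; whether the compiler turns them into adc/sbb style code is up to the optimizer. Function names are made up for this sketch.

```c
#include <stdint.h>

/* Unsigned add that clamps to 0xFFFFFFFF instead of wrapping.
   If the sum wrapped it is smaller than 'a'; that comparison plays
   the role of the carry flag and is turned into an all-ones mask. */
static uint32_t add_saturate_u32(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;
    uint32_t carry_mask = 0u - (uint32_t)(sum < a);
    return sum | carry_mask;
}

/* Sign of an int as -1, 0 or +1, with no branches in the source. */
static int sign_of(int32_t x)
{
    return (x > 0) - (x < 0);
}
```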

It would be awesome if you could define your own little mini assembly functions that acted just like the built-in "intrinsics". It would be totally fine if you had to add a bunch of extra data to tell the compiler about dependencies or side effects or whatever, if you could just make little assembly widgets that worked as neatly as their own intrinsics it would be totally awesome for little ops.

ADDENDUM : urg ; so I tested Won's suggestion of Int32x32To64 and found it was doing the right thing (actually UInt32x32To64 - and BTW that's just a macro that's identical to my first code snippet - it doesn't seem to be a magic intrinsic, though it does have platform switches so you should probably use that macro). That confused me and didn't match what I was seeing, so I did some more tests...

It turns out the first code snippet of mul32by32returnHi actually *will* compile to the right thing - but only if you call it from simple functions. If you call it from a complex function the compiler gets confused, but if you call it from a little tight test loop function it does the right thing. URG.

Here's what it does if I try to use it in my StringHash interpolation search test code :

            int start = mul32by32returnHi( query.hash, indexrange );
004087A5  xor         eax,eax 
004087A7  push        eax  
004087A8  mov         eax,dword ptr [esp+38h] 
004087AC  push        eax  
004087AD  push        0    
004087AF  push        ebx  
004087B0  mov         dword ptr [esi],ebx 
004087B2  call        _allmul (40EC00h) 
004087B7  mov         ecx,dword ptr [esp+14h] 

You can see it's extending the dwords to 64 bits, pushing them all, calling the function, then grabbing just one dword from the results. Ugh.

And here's what it does in a tight little test loop :

      U32 dw = *dwp++;
      hash = mul32by32returnHi(hash,dw) ^ dw;
00401120  mov         eax,ecx 
00401122  mul         eax,edx 
00401124  add         esi,4 
00401127  xor         edx,ecx 
00401129  mov         ecx,dword ptr [esi] 

Which not only is correct use of 'mul' but also has got crazy reordering of the loop which makes it a lot faster than my inline ASM version.
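The call in the interpolation search snippet looks like a range reduction: treat the 32 bit hash as a fraction of 2^32 and scale it by the index range. A portable sketch of that, using <stdint.h> types in place of the U32/U64 typedefs (hash_to_index is a made-up name, and reading the original call this way is an assumption):

```c
#include <stdint.h>

/* High 32 bits of the 64-bit product of two 32-bit values. */
static uint32_t mul32by32returnHi(uint32_t a, uint32_t b)
{
    uint64_t product = (uint64_t)a * b;
    return (uint32_t)(product >> 32);
}

/* Map a full-range 32-bit hash onto [0, range) as floor(hash * range / 2^32).
   hash_to_index is a hypothetical name used only for illustration. */
static uint32_t hash_to_index(uint32_t hash, uint32_t range)
{
    return mul32by32returnHi(hash, range);
}
```

Because the hash is never quite 2^32, the result is always strictly less than range, so no clamp or modulo is needed.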

12-03-08 | Stuff

One of the advantages of a real newspaper over reading on the computer is that if you get jam on your newspaper you just throw it away.

God dammit, if I forget and open a project in VC that's in the RAD perforce before I do the VPN, it fucking kills DevStudio. It stalls out for like a minute and then wants you to "Work Disconnected" and all this nonsense. It should just pop up a messagebox that says "can't connect to server - retry | abort " and let me go start the VPN. I've already disabled the "open last project" option because it was fucking me so much (and I like that option a lot). Curse you.

A lot of people here seem to have custom headlights that are specifically designed TO MAKE ME MAD. They're like white or blueish, super bright, and they have like laser robotic targeting to shine directly into my eyes at all times. When I drive home at night, I wind up placing my head in the one spot in my car where I can see neither the rearview mirror nor my side mirror. Which of course is very dangerous because I'm now not seeing the mirrors at all.

Why does anyone ever buy a Malcolm Gladwell book? They contain literally zero information. If you're smart enough to understand it, then you already know it. If you didn't know it, you won't ever get it. And no, reading "Blink" at a cafe does not make you look smart.

12-02-08 | Awesome

Greetings In the Light of Our Radiant One, this is Lord Ashtar, of the Ashtar Command, of the Intergalactic Confederation of Worlds. I want to give more insight into the soul you all know now in this life as Barack Obama. ; wow, this is one amazing ball of Crazy.

Watching TV recently I was forced to abruptly stop my regular TiVo skipping of the ads and go back to watch them. Once for this really great ad and once for this stonkeringly bad advertisement. Guess which one's for a video game.

12-02-08 | Oodle Happy

This is from a few weeks ago, but I just pulled it up again and realized it pleases me :

[image : thread profile viewer timeline]

This is an image of the little thread profile viewer that I made which shows timelines of multiple threads. This particular portion is showing an LZ decompressor thread in the top bar and the IO thread in the bottom bar. The LZ decompressor is double buffered, you can see it ping-ponging between working on the two buffers. As it finishes each buffer it fires an IO to get more compressed data. You can see those IO's get fired and run in the bottom. This portion is CPU bound, you can see the IO thread is stalling out a lot (that's partly because there's no seeking and the packed data is coming through the Windows file cache).

Anyway, the thing that pleases me is that it's working perfectly. The CPU is 100% busy with decompression work and never wastes any time waiting for IO. The IO thread wakes and sleeps to fire the IOs. Yum. In the future complication and mess I will never again see such nice perfect threading behavior.

12-02-08 | Random

A 15 year old tranny asked me for gas money. I said no because they were obviously in the fancy car their parents bought for them. Fucking spoiled ass teenage trannies these days. God damn suburban kids come in to the city and make it crazy.

I know Alissa already said this, but I felt like the first 15 minutes of Wall-E were sort of something special, or at least different. It was reminiscent of 2001 in the lack of dialog - oh and of course they knew that and used the 2001 music and copied HAL (I guess they would say it was a "reference" or an "homage" which are fancy words for copying). And the Wall-E character design is basically a 100% rip off of the Short Circuit robot ( see for example ). And the Eve robot is a really terrible design that's not at all expressive or interesting. Anyway, after all that great promise, it devolves into typical Pixar madcap junk with lots of "zany" chase sequences and characters screaming "eeee" as they cling onto some object which carries them away quickly.

I really don't get the love for Pixar. Yeah their animators are good, but their shading is just awful, only recently have they started to make their movies look like anything but Gouraud / Matte Plastic. Most of their designs still look like conic sections jammed together or really soft subdivision-looking stuff (which makes things look like a skeleton inside a rubber balloon). Until "Ratatouille" I don't think they'd made a decent movie, and that was only semi-decent (often devolving into chase sequences and plodding moralizing).

Even as kids movies, I don't think they've made anything *near* the heart and character of something like "The Jungle Book", "The Dark Crystal", or "Spirited Away".

12-02-08 | H264

I hate H264. It's very good. If you were making a video codec from scratch right now you would be hard pressed to beat H264. And that's part of why I hate it. Because you have to compete with it, and it's a ridiculously over-complex bucket of bolts. There are so many unnecessary modes and different kinds of blocks, different entropy coders, different kinds of motion compensation, even making a fully compliant *decoder* is a huge pain in the ass.

And the encoder is where the real pain lies. H264, like many of the standards that I hate, is not a one-to-one transform from decompressed streams to code streams. There is no explicit algorithm to find the optimal stream for a given bit rate. With all the different choices that the encoder has of different block types, different bit allocation, different motion vectors, etc. there's a massive amount of search space, and getting good compression quality hinges entirely on having a good encoder that searches that space well.

All of this stifles innovation, and also means that there are very few decent implementations available because it's so damn hard to make a good implementation. It's such a big arcane standard that's tweaked to the Nth degree, there are literally thousands of papers about it (and the Chinese seem to have really latched on to working on H264 improvements which means there are thousands of papers written by non-English speakers, yay).

I really don't like overcomplex standards, especially this style that specifies the decoder but not the encoder. Hey, it's a nice idea in theory, it sounds good - you specify the decoder, and then over the years people can innovate and come up with better encoders that are still compatible with the same decoder. Sounds nice, but it doesn't work. What happens in the real world is that a shitty encoder gains acceptance in the mass market and that's what everyone uses. Or NO encoder ever takes hold, such as with the so-called "MPEG 4" layered audio spec, for which there exist zero mainstream encoders because it's just too damn complex.

Even aside from all that annoyance, it also just bugs me because it's not optimal. There are lots of ways to encode the exact same decoded video, and that means you're wasting code space. Any time the encoder has choices that let it produce the same output with different code streams, it means you're wasting code space. I talked about this a bit in the past in the LZ optimal parser article, but it should be intuitively obvious - you could take some of those redundant code streams and make them decode to something different, which would give you more output possibilities and thus reduce error at the same bit rate. Obviously H264 still performs well so it's not a very significant waste of code space, but you could make the coder simpler and more efficient by eliminating those choices.

Furthermore, while the motion compensation and all that is very fancy, it's still "ghetto". It's still a gross approximation of what we really want to do, which is *predict* the new pixel from its neighbors and from the past sequence of frames. That is, don't just create motion vectors and subtract the value and encode the difference - doing subtraction is a very primitive form of prediction.

Making a single predicted value and subtracting is okay *if* the predicted probability spectrum is unimodal laplacian, and you also use what information you can to predict the width of the laplacian. But it often isn't. Often there are pixels where you can make a good prediction that the pixel is very likely either A or B, and each is laplacian, but making a single prediction you'd have to guess (A+B)/2 which is no good. (an example of this is along the edges of moving objects, where you can very strongly predict any given pixel to be either from the still background or from the edge of the moving object - a bimodal distribution).

12-01-08 | VM with Virtual Disk

I had this idea for a VM to completely sandbox individual programs. It goes like this :

Do a fresh install of Windows or whatever OS. Take a full snapshot of the disk and store it in a big file. This is now *const* and will be shared by all sandboxes.

Every program that you want to run in isolation gets its own sandbox. Initially a sandbox just points at the const OS snapshot which is shared. File reads fall through to that. When you run the installer on the sandbox, it will do a bunch of file writes - those go in a journal which is unique to this sandbox that stores all the file renames, writes, deletes, etc. That can be saved or simply thrown away after the program is done.

You can optionally browse to sandbox journals. They look just like a regular disk with files. What you're seeing is the const OS snapshot with the changes that the individual program made on top of it. You can then copy files in and out of the sandbox drive to get them to your real disk.
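The read-falls-through, write-goes-to-journal idea can be sketched in a few lines. This is a toy in-memory model, not a real filesystem filter: the struct and function names are all hypothetical, paths and data are stored by pointer rather than copied, and a NULL data pointer in the journal stands for a delete.

```c
#include <stddef.h>
#include <string.h>

enum { MAX_FILES = 32 };

/* One file entry. In a journal, data == NULL is a tombstone (file deleted). */
typedef struct { const char *path; const char *data; } FileEntry;

typedef struct { FileEntry entries[MAX_FILES]; int count; } FileTable;

/* Index of path in the table, or -1 if it is not there. */
static int table_find(const FileTable *t, const char *path)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].path, path) == 0)
            return i;
    return -1;
}

/* A sandbox is the shared const OS snapshot plus its own private journal. */
typedef struct { const FileTable *snapshot; FileTable journal; } Sandbox;

/* Reads check the journal first, then fall through to the snapshot. */
static const char *sandbox_read(const Sandbox *s, const char *path)
{
    int i = table_find(&s->journal, path);
    if (i >= 0)
        return s->journal.entries[i].data;   /* NULL means deleted here */
    i = table_find(s->snapshot, path);
    return i >= 0 ? s->snapshot->entries[i].data : NULL;
}

/* Writes only ever touch the journal. Returns 0 on success, -1 if full. */
static int sandbox_write(Sandbox *s, const char *path, const char *data)
{
    int i = table_find(&s->journal, path);
    if (i < 0) {
        if (s->journal.count == MAX_FILES)
            return -1;
        i = s->journal.count++;
        s->journal.entries[i].path = path;
    }
    s->journal.entries[i].data = data;
    return 0;
}

/* Deletes are just a tombstone write; the snapshot is never modified. */
static int sandbox_delete(Sandbox *s, const char *path)
{
    return sandbox_write(s, path, NULL);
}
```

Throwing a sandbox away is just discarding its journal, and two sandboxes over the same snapshot never see each other's changes.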

So, for example, when you download some program from the internet that you don't trust, you can just pop up a new sandbox and run it there. This is *instant* and the program is 100% isolated from being able to do file IO to your real system. But if it makes some files you want, you can easily grab them out.

You could also mount "portals" across the sandboxes if you want to. For example, say you don't trust shitty iTunes and you want to run it in a sandbox so it can't mess with your registry or anything. But you want your music files to be on your main drive and have those be accessible to iTunes. You can mount a portal like a net drive for the sandbox to be able to "see out" to just that directory. That way you don't have to like duplicate your music files into the iTunes sandbox or whatever.

Aside from isolating rogue programs, this fixes a lot of problems with Windows. It lets you do 100% clean uninstalls - boom you just delete the whole sandbox and the program has no fingers left over. Every program gets its own registry and set of DLLs and such so there can never be conflicts. You don't have that damn problem of Windows always mysteriously going to shit after 5 years.

If you put your OS on c: and all your data on d:, you could easily just let all the sandboxes of trusted programs have portals to d: so that you can just run Photoshop in a sandbox and browse to d: and work on images, and it feels just like running normal programs on a normal computer.

11-30-08 | Whacked out Seattle Streets

First of all, all the mini traffic circles in the residential neighborhoods are fucking retarded. Big full-size traffic circles are *great* when they work right. These things are just there to slow you down, and they actually make the flow of traffic much worse. They make you have to swerve into the path of pedestrians to get around the circle, which annoys me both when I'm in the car and when I'm a pedestrian. People frequently left turn around them the wrong way (clockwise instead of counterclockwise).

There are lots of weird ambiguous traffic situations. The entire Queen Anne Ave going up the hill is unclear. Is it one lane? Is it two? I frequently see people take it as two lanes, but then sometimes jackasses will drive right down the middle of it. It seems to change between one and two just based on peer pressure - if some people are driving down the middle then everyone does, if people are driving it two laned then it's two.

Lanes disappear randomly and without warning. On the aforementioned Queen Anne halfway up the hill it definitely becomes one lane, with no merge warning of course. 1st Ave N where it hits Mercer suddenly goes from 3 lanes to 2 with no warning.

All over Cap Hill here they have a brilliant system where major roads are either 1 or 2 lanes depending on the time of day. They're mostly painted to look like 2 lanes, so you can be driving along in the right lane and suddenly someone is parked in your lane and you have to do a sudden merge or get stuck. It's especially mad because it's on the busiest roads, like Madison, Olive, etc. If you get off the 5 North onto Olive you're immediately in a clusterfuck of lanes mysteriously disappearing and people parked in the right lane (sometimes).

Of course we haven't even got into the lack of pedestrian safety traffic devices.

11-30-08 | Alt Key !!

So a lot of the problems with the HTPC go back to this fucking Alt Key thing. I don't have a keyboard plugged in to it, and randomly an Alt key comes on and gets stuck at some point. If I plug a keyboard in, it fixes itself. When the alt key is down, it fucks up everything, because the remote sends commands by sending keypresses, and they become alt-modified. I just put it together that that's why the volume control randomly quits working sometimes.

I've been searching madly in vain, I can't find anything about this problem.

It's tough to debug, because when it occurs, of course the fucking Alt key is stuck on at a system global level, so I can't run apps and try to get information (you can't do an Alt-R to run something, and even clicking icons on the desktop can't run things because alt-click is something else).

If I knew where in the stack the Alt key was getting injected I could detect it and fire an Alt-down alt-Up which I think would clear it. But I assume the alt is down at a very low level and I'm not sure how to inject keys there. ( GetAsyncKeyState says alt is down ).

The easy solution is just to leave a keyboard plugged in to the mother fucker, which I might just do because I'm sick of dealing with this bitch.

In other random news : Chrome (the browser) is okay, but (A) fuck you for using a name that's generic and already been used by other software products, and (B) until they have a decent plugin architecture it's not usable IMO. I need Adblock and Flashblock and DownloadThemAll and PDFDownload, etc. and (C) I hate your custom bubbly buttons and menus; just use standard UI widgets you goofballs.

In other browser complainer news : I wish Firefox downloader replacements could actually *replace* the default downloader, rather than being a separate button. There are lots of websites that auto-start downloads or have weird buttons that fire the download rather than letting you just right click the file, and those still go to the standard garbage downloader.

11-28-08 | Historic Capitol Hill

One of my favorite things to do here is just walk around the neighborhood. The streets are lovely; not exactly tree-lined because Seattlites have a great loathing of trees that block the sun, but still verdant, and with great old houses, lovely brick apartments, and lots of variety.

I really dig the "Tudor style" (or "mock tudor" or "Tudorbethan") medieval-esque apartments, many with round towers. One of the men primarily responsible is Anhalt ( and more ). Most of the stuff was built in the late 20's just before the Great Depression, in our penultimate building boom. Architecturally most of the buildings are absolutely monstrous; they combine country cottage style elements with pretensions of grandiosity. They mix graceful decorative elements with heavy blocky imposing forms, and it all seems just randomly slapped together with little logic. Still they are like jewels compared to modern construction.

As the Mike Whybark guy says the Loveless building is also quite lovely; sometimes the gate is open and you can wander into the interior courtyard. Bacchus is gone and in its place is Olivar, a very mediocre restaurant, but the Russian-themed murals on the walls are quite remarkable, and apparently original.

Random other page with historic capitol hill photos.

It's annoying that "Capitol Hill" also exists in Washington DC.

Other local highlights : mansion row on 14th street is okay of course. The mansions over at Harvard and Prospect are good too; here's an example. Hmm. Weird. Google Maps shows that spot as "Lakeview Place" which is supposedly a park, but I haven't noticed a public park there. It has a bare bones entry at seattle.gov.

Also I just heard the stairway over at E Blaine and Broadway is nice.

Oh, and the St. Mark's Greenbelt is mostly lame; it should have great views but there are too many trees. I was very amused to find this story by a guy who made a fort in the Greenbelt though.

11-28-08 | Chance of CRC being bad

Say I run the CRC32 on data chunks. I've got them identified by name so I only care about false equality (collisions) on objects of the same name. I'm going to be conservative and assume I only get 31 good bits out of the CRC so there are 2^31 values, not a full 2^32. I think around 20 versions of the data per name max. So the chance of collision is a birthday "paradox" thing (it's not a freaking paradox, it's just probability!) :

P = "1 - e^( - 20*19/(2*(2^31)) )" = 8.847564e-008

Now if you have N names, what's the chance that *any* of them has a collision?

C = 1 - (1 - P)^N

So, for what N of objects is C = 50% ?

log(0.5) = N * log(1 - P)

N = log(0.5)/log(1-P)

N = 7,834,327

You can have 7 Million files before you get a 50% chance of any one of them having a collision.

For a more reasonable number like N = 100k , what's the chance of collision?

C = "1 - (1 - 8.847564*(10^-8))^100000" = 0.00881 = 0.881 %

These probabilities are very low, and I have been pessimistic so they're even lower, but perhaps they are too high. On any given project, an 0.881% chance of a problem is probably okay, but for me I have to care about what's the chance that *any customer* has a problem, which puts me closer to the 7 M number of files and means it is likely I would see one problem. Of course a collision is not a disaster. You just tell Oodle to refresh everything manually, and the chance is that only happens once to anybody ever in the next few years.

BTW we used CRC32 to hash all the strings in Oddworld : Stranger's Wrath. We had around 100k unique strings which is pretty close to the breaking point for CRC32 (unlike the above case, they all had to be unique against each other). Everything was fine until near the end of the project when we got a few name collisions. Oh noes! Disaster! Not really. I just changed one of the names from "blue_rock.tga" to "blue_rock_1.tga" and it no longer collided. Yep, that was a crazy tough problem. (of course I can't use solutions like that as a library designer).

11-28-08 | Google = SF and MS = Seattle

Just like Google has amazing maps data only in SF, MS has really amazing maps data in Seattle.

I think those companies are pretty good representatives for their respective cities.

11-28-08 | Optimization

Yes yes, I'm generally in the "measure carefully and only optimize where needed" camp which the pretentious and condescending love. Weblog pedants love to point out pointless optimizations and say "did you really measure it?". Fie.

There's a great advantage to just blindly optimizing (as long as it doesn't severely cost you in terms of clarity or maintainability or simplicity). It lets you stop worrying. It frees up head space. Say you do some really inefficient string class and then you use it all over your code. Every time you use it you have to wonder "is this okay? am I getting killed by this inefficient class?" and then you measure it, and no, it's fine. Then a month later you worry again. Then a month later you worry again. It gives you great peace of mind to know that all your base components are great. It lets you worry about the real issue, which is the higher level algorithm. (note that micro-optimizing the higher level algorithm prematurely is just always bad).

11-28-08 | Internet Crap

The explosion of garbage on the internet is effectively now making the internet smaller. As long as you confine yourself to a small arcane field like I dunno hash table algorithms the "internet" seems to work fine - you can do searches and find good pages and such.

But try doing something more mainstream today. Even if you search for something like DXTC you will be deluged with garbage. God help you if you try to find something like product reviews for consumer goods like microwaves or something. What you will find is : automatic aggregator sites that are worthless, sites camping on the keywords that literally have zero real content and just ads, sponsored sites, web forums with lots of retards writing reviews they know nothing about, etc..

The net effect is that smart internet users are disconnecting from the global web and retreating into "neighborhoods". More and more now I go directly to sites I know rather than doing global searches. If I want camera reviews I go to dpreview, if I want information about Windows I go straight to MSDN, and for general information I go to my friends' blogs.

This is really a negative thing. It means I'm not getting information from new sources, I'm not exploring the web and finding new connections. Metaphorically I imagine a vast cityscape of internet information, and there's goods out there I'd like to find, but 90% of the houses are now literally spewing feces and vomit out their doors, so I have built a wall of sandbags around my internet home. I look out on the city of shit and long for a time when I could aimlessly wander its streets and stumble upon real treasures.

11-27-08 | WTF Seattle

It's freezing in our apartment and as usual the landlord has ignored my call. I figured I'd look up the law. WTF Seattle. The law here only requires heat to get you to 58 degrees at night. 58 !?!!? 58 is like hobos huddling together level of warmth. Our apartment is somewhere in the low 60's and apparently that's perfectly legal. In SF the requirement is for 68 degrees, which is actually a bit retardedly high IMO; plus heat is a lot more crucial here than in SF. A requirement around 65 would be best.

In other WTF Seattle news - there's disgusting rotting leaf slop everywhere around the streets and sidewalks here. Early on when I got here I noticed there was no city street sweeping. No street sweeping at all! I noticed it mainly because you don't have to ever move your car when it's parked on the street, which does save a lot of stress worrying about the street sweeping days. It was fine through the summer, but then all the leaves fell, and it rained and rained, and nobody swept them up. And no I'm not talking about in front of my house, I'm talking about 25% of the homes around the city. Now the fallen wet leaves have rotted and turned into a brown slop that squishes and sticks in your shoes.

My first thought was "they don't bother street sweeping because it rains so much", but the real answer, like most answers to "WTF Seattle" is "no income tax".

ADDENDUM : neighbors are having another backyard fire and sing-along. You can't really possibly imagine how outrageous this is. It's not like we have big properties. The buildings on capitol hill are literally less than 10 feet apart, so their fire pit is about 20 feet from our bedroom and the smoke comes right in the window. So I went out to talk to them tonight after midnight when they were having another guitar sing-along around the fire. It seems like a fun hippy crowd, they drink and sing and hang out by the fire, I wish I was doing that. But not right by my freaking bedroom, WTF. I hate the experience of confronting people. I hate being the fucking nosey asshole neighbor. But I hate sitting in here fuming about doing nothing even worse. And once I make a stink I expect results. If they don't cooperate voluntarily I have various mechanisms to extract action, though I hate to get into a whole neighbor war situation. I've never complained to neighbors about noise before in my life and now I've done it three times with two different neighbors here.

Bleck. One thing that bugs me is that when we looked at this apartment we *knew* these things would be issues. We looked at the squeaky old wood floors, and the thin drafty windows, and the hippy house right next door, and we explicitly called out every one of those problems. But we took the place anyway because it was the best thing we'd seen in 8 days of intensive looking and the hotel bill was getting out of hand and the effort of searching for apartments every day had worn us down.

We might have to just move.

Buying a house is so scary to me because you never know if your neighbors are going to be inconsiderate nutjobs. Sure if you get away from the apartment/youngster area you can probably avoid big late night parties, but as you get into the yuppie suburban areas you start getting the Mr. Fix It 7 AM home improvement guy who's constantly redoing everything in his house and doing it himself so he works at all hours and it just never ends.

Buying a condo just seems like madness to me because your quality of life still hinges so strongly on your neighbors and you can't be sure what you're getting into (or that they will stay the same). Plus you have to go through the condo maintenance guy still and just hope he's reasonable. You have almost all the problems of apartments, but you're locked into it so you can't easily flee if it turns out to be a disaster.

I need to buy like 20 acres of undesirable swamp land and just park a trailer in the middle of it. Ah, sweet sleep.

11-26-08 | Oodle

OMG I finally got hot loading sort of working with bundles and override bundles. So you can build a package of a bunch of files. The game runs and grabs the package that pages in. You touch some file, the game sees the change, that causes it to make a single-file-bundle for the changed file, it loads in the new bundle, and then patches the resource table so that the new version is used instead of the old one in the package. Then the Game Dictionary is updated so that future runs will also load the new version.

Phew. It works but it's fragile and I'm not happy with how careful you have to be. There are lots of ways it can fail. For example, the packages all load one by one asynchronously. If you fire the big package load first, it may come in and get hooked up before the override package. Then if you hook up your game resources, you have hooked up to the old (wrong) data. You can prevent this by doing a call on each resource that you get to say "is this the right version of this resource" before hooking up to it. But currently there's just a ton of stuff like that you have to be aware of and make sure to do the right call, etc. etc.

I keep coming back to the problem that I need to know whether I have the right version of the file in a package. There are three options I can see :

1. Fuck it. Do nothing. This requires the game to load the right versions all the time. Clients could, for example, always just work unpackaged while editing, then when they make paging packages they would simply not be editable or incremental-linkable in a nice way. This is not a happy solution but it "works" in the sense that it is *reliable* about not working.

2. Use mod times. This works fine and is easy and fast *as long as you only touch and make files on your local machine*. Which would fail for major game devs since they tend to compile files on some kind of build server, or check in and share compiled files. But if you think about the way programmers work - we don't ever check in our .OBJ files, we just check in sources and everybody has to rebuild on their local machine. If you make your artists do the same thing - only share source files and always local rebuild - OR sync to a full compiled set from the server and only run compiled, but never try to mix local changes with server compiles - then it works fine.

3. Use CRC's. This is the most robust and reliable. The packages store the CRC of the files they were made from. If the source file on disk has a different CRC, we assume that the source file is better. (we don't try to use mod times to tell who's newer because of the server-client mod time problem). If the CRC is different we repackage using the local source file. Then when we load we always prefer packages that have content that matches the local CRC. This works and is stable and good and all. The only problem is all the time you spend doing CRC's on files, which may or may not be a big deal. Obviously running over your whole tree and doing CRC's all the time would be ridiculous, but you can use mod time to tell when to redo the CRC, so you only incrementally redo it. Even getting all the mod times is too slow, so you can use a persistent Watcher app to cache the drive dir listing and file infos.

The "CRC" method is not necessarily "right" in the sense that it doesn't load the newest version of the content, but it is reliable, it always does the same thing - it loads content that corresponds to the source version on the user's disk.

BTW this last thing is something the OS should really do but totally fails on. With the amount of memory we have now, the OS should just keep a listing of every dir you've ever visited in memory at all times. Even if you visit every dir on the disk it would only be 100 Megs. You could cap it at 64 MB or something and LRU, and obviously having 64 MB of dir listings is going to make your dir access instant all the time. I might just write an app to do this myself.

I'm kind of tempted to offer #1 and #3 to clients and not offer #2. I don't want to offer any features that sort of work or will have weird behavior that looks like "bugs".

Also BTW yes of course there will also just be a mode to load what's in the packages and not even look at what's newest or anything. This is what the real shipping game will use of course, once you have your final packages set up it just loads them and assumes you made them right.
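Option 3 above, with the mod time used only to decide when to redo the CRC, might be sketched like this. None of this is Oodle's actual API: the struct layout and function names are hypothetical, the file content is passed in as a buffer rather than read from disk, and the CRC is the plain bitwise CRC-32.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_bytes(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Cached info about one source file (hypothetical layout). */
typedef struct {
    int64_t  mod_time;   /* mod time when the CRC was last computed */
    uint32_t crc;        /* CRC of the content at that time */
    int      crc_count;  /* how many times the CRC was actually run */
} SourceInfo;

/* Return the CRC of the source, re-running it only when the mod time moved.
   The mod time is never compared against anything from another machine. */
static uint32_t source_crc(SourceInfo *info, int64_t mod_time_now,
                           const void *content, size_t len)
{
    if (info->crc_count == 0 || info->mod_time != mod_time_now) {
        info->crc = crc32_bytes(content, len);
        info->mod_time = mod_time_now;
        info->crc_count++;
    }
    return info->crc;
}

/* A package is usable if it was built from content matching the local source. */
static int package_matches_source(uint32_t package_crc, uint32_t local_crc)
{
    return package_crc == local_crc;
}
```

The mod time here only answers "did this file change since I last looked on this machine", which is the one question it can answer reliably; the decision about which package to load rests on the CRC alone.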

11-26-08 | Offices

God damn I hate how you can't open windows in any fucking office buildings any more. That shit should be illegal. We need laws for everything because if it's not a fucking law the fuckers will fuck you over. This office we're in is really very nice so far as offices go, but the air gets all stale, and god forbid if somebody makes something in the microwave it smells the place up for the next 24 hours because THERE's NO FUCKING WINDOWS. Urg.

I despise the construction of new offices; I think the reason I hate new tract homes and condos so much is that they feel like offices to me. Drywall, aluminum or latex windows, berber carpet, bleck.

On the plus side, new places actually block sound and air and rain pretty well. Our apartment is practically like camping. It's super cold and drafty, sound comes right through the windows like there's nothing there. And of course to complete the camping parallelism, our neighbors light fires in the middle of the night and stand around and drink and smoke and are just generally redneck white trash. Ah, camping, getting away from the crowded city so you can be in an assigned spot right next to a bunch of rednecks that run their car engines all night long, with no walls to protect you from the sound. What a treat.

I'm a really sensitive baby when it comes to work spaces. I need wood and plants and light and fresh air. Ideally 50's modern design, some long horizontal lines ala Richard Neutra or FLW. Little things really bug me, like having people behind my back, or really any kind of distraction when I'm really trying to focus. I understand the value of having everybody in a game company all together in a big open space, and that's cool to some extent, but I can only stand that for maybe an hour out of the day and then I want to retreat into my womb-like office that opens out onto a courtyard with limestone floor, ferns around the edges, a big oak in the middle, and a trickling fountain.

Oh, this is only tangentially related, but this place : The House in Magnolia (Seattle) is pretty good for 50's modern. Unfortunately it's become very popular of late, so it's all very expensive now. People run businesses going to the midwest where they have no idea what's good or bad and buying up the great 50's designer stuff and bringing it to the coasts.

Continuing the tangent - I'm sitting on Tom's Swopper right now. I've been sitting on physioballs for years, so it's not that different than that really. It's good for you because you sit "actively" and you can't slouch and all that. But I'm starting to think the biggest benefit is that it's just so damn uncomfortable that you move around and get up a lot and don't ever stay in the same position for a long time.

11-25-08 | Funnies

Things to cheer me up :

SFW Porn ZOMG this is so disturbing.

Tuff Fish plays Mario ; okay, the audience for this is pretty small; you have to be a video game geek and also be familiar with Tuff Fish and his amazing HYACHACHA moment. If you are, then this is the funniest thing ever.

FAIL is always good.

Charlie sings Night Man ; best moment ever on TV.

11-25-08 | Drear

It's so dark and drizzly here all the time. Even on a clear sunny day, the sun doesn't climb more than 20 degrees above the horizon. Today it's like gray pea soup outside. It makes me just want to stay in bed. Wake me in March.

ADDENDUM : one day later it's gorgeous out, so I'm being taught that when I whine in my blog things magically get better. Being the whiney child that I am this is a bad lesson for me.

11-24-08 | Lua Coroutines

.. are cool. Coroutines are the awesome way to do game logic that takes time. The script looks like :

if GetState(Friends) == dancing then
    LookAt(Friends)
    speech = StartSpeech("Screw you guys")
    Go(Home)
end

These things take various amounts of time and actually yield back execution to the game while that happens. Each NPC or game object has its own coroutine which the game basically just "pings". The Lua code really only ever runs a few lines, all it does is pick a new task and make the object do that task, and then it yields again. (obviously if it was just a static list like the example above you could do that easily other ways, the good thing about script is it can do conditionals and make decisions as it goes).

Coco lets you do a real Lua yield & resume from C calls, which lets you build your "threading" into your game's helper functions rather than relying on your designers to write scripts that yield correctly. LuaJIT (at the Coco link) is cool too.

I don't really like the Lua syntax, and I also personally hate untyped languages. (I think the right thing for scripting style languages is "weakly typed", that is all numeric types convert silently and things do their ToString and such, but I like to just get a compile error when I try to pass an alligator to the factorial function. Also having explicit types of "location" and "NPC" and such would be nice.) But having an off the shelf scripting language that does the coroutine thing is a pretty big win.

I generally think of simple game AI as being in 2 (or more) parts. The 2 main parts are the "brain" (or reactions) and the "job" (or task). The job/task is what the guy is currently doing and involves a plan over time and a sequence of actions. This is best done with a coroutine of some kind. The brain/reaction part is just a tick every frame that looks around and maybe changes the current task. eg. it's stuff like "saw the player, go into attack job". That can be well implemented as a regular Lua function call that the game just calls every frame, it should just do stuff that doesn't take any time and return immediately.
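To make the brain/job split concrete, here's a minimal sketch using Python generators as a stand-in for Lua coroutines. Everything in it (patrol_job, attack_job, the npc and world dicts) is invented for illustration; a real game would have actual objects and the jobs would call into game helpers that yield.

```python
def patrol_job(npc):
    """A 'job': a plan over time. Each yield hands control back to the game."""
    while True:
        for waypoint in ("gate", "well", "barn"):
            npc["target"] = waypoint
            yield "walking to " + waypoint

def attack_job(npc):
    """Another job; it keeps attacking until the brain swaps it out."""
    npc["target"] = "player"
    while True:
        yield "attacking player"

def brain(npc, world):
    """The 'brain': runs every frame, returns immediately, may change the job."""
    if world["player_visible"] and npc["job_name"] != "attack":
        npc["job_name"] = "attack"
        npc["job"] = attack_job(npc)

def tick(npc, world):
    """One game frame for this NPC: run the brain, then ping the job coroutine."""
    brain(npc, world)
    return next(npc["job"])

def make_npc():
    npc = {"target": None, "job_name": "patrol"}
    npc["job"] = patrol_job(npc)
    return npc
```

The job never needs to know it was interrupted; the brain just drops the old coroutine and starts a new one.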

11-24-08 | Unix Utils

I've always hated "cygwin". I don't want a crappy semi-emulation of a Unix shell. I just want to be able to run some of the Unix utils (like grep and patch) natively as Win32 console apps in my 4DOS (TCC). Most of the POSIX API is available natively in Win32, so just use that! Well, here you go : Unix Utils in native Win32 . Yay.

11-24-08 | The Economy and the Wrong Argument

The pro-corporate anti-regulation junta has already won because they've succeeded in making the debate about whether you support the "free market" or not. Umm, as opposed to what? Who exactly is on the other side? There's zero debate here, it's a common tactic, they just keep saying "I believe in the free market" even though nobody is saying that they don't. And that really has absolutely nothing to do with what we should be talking about.

The reality is that we don't have a completely free market and we *shouldn't* and no sane person would say that we should. We have regulations and laws. Those laws protect consumers, they stabilize the economy (or at least they're supposed to), and they help companies.

In fact, at the most fundamental, the entire existence of the idea of a "Corporation" is a fabrication which is enforced by our government and in fact is a very powerful distortion of a true anarchistic/libertarian free market. The corporation allows individuals to take very great risks without personal responsibility. That's okay if it is balanced by requirements that make sure they are supportive of the greater good of the nation. Furthermore, the whole Federal Reserve system is a big distortion of "free market" ; it's sort of ironic for these Federal Reserve chiefs to be lobbying for the free market; umm.. so you're giving cheap money to a select few institutions, what exactly is free about that? But again, that's fine, the Fed is a reasonable thing to have to stabilize the economy for the greater good of the nation - but only if the people getting fed money are held responsible.

The debate is not about "free market" vs. "not free market". The debate should be about what kind of regulatory structure will be best for the people of America as a whole. Often what's good for certain large businesses is also what's good for America, but not always. The real debate here is not about "free markets" but about corporatism - the idea that if the government acts in the best interest of large corporations that it is also helping the majority of Americans.

I just randomly found this blog by another programmer with liberal leanings who actually does some research and doesn't just write nonsense like me.

11-23-08 | Hashing & Cache Line Size

A while ago I wrote about hash tables and reprobing and cache sizes. I mentioned Cliff Click's nonblocking hash table which is sort of interesting to me in theory but also scary and not something I really need at the moment. Anyhoo I was looking at : Some dude's C version of the lock free hash table and I noticed a little thing that's relevant to my old discussion.

He explicitly uses the cache line size and does +1 linear stepping along the cache line, then does a complex reprobe by rehashing, then does +1 again while in the cache line, then rehashes, etc. That seems like a pretty reasonable thing to do. I guess I considered that at the time, but didn't want to do it because it requires you to explicitly know the cache line size of the machine you're on. It literally requires a hard coded :

NumLinearSteps = CACHE_LINE_SIZE / sizeof(table_entry);
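A sketch of what that probe order looks like, written in Python for brevity. The 64 byte line, the 16 byte entry, the rehash mixer and the wrap-inside-the-line behavior are all my assumptions for illustration, not taken from the C code in question. It also assumes the table is cache-line aligned and its size is a multiple of the number of entries per line.

```python
CACHE_LINE_SIZE = 64   # bytes; assumed, must match the target machine
ENTRY_SIZE = 16        # bytes per table entry; assumed for illustration
NUM_LINEAR_STEPS = CACHE_LINE_SIZE // ENTRY_SIZE

def rehash(h):
    """Hypothetical 32-bit integer mixer used for the 'complex reprobe' step."""
    h = (h ^ (h >> 15)) * 0x2C1B3C6D & 0xFFFFFFFF
    h = (h ^ (h >> 12)) * 0x297A2D39 & 0xFFFFFFFF
    return h ^ (h >> 15)

def probe_sequence(key_hash, table_size, max_rehashes=8):
    """Yield slot indices: walk one cache line, rehash, walk the next line, etc."""
    h = key_hash & 0xFFFFFFFF
    for _ in range(max_rehashes):
        slot = h % table_size
        line_start = slot - (slot % NUM_LINEAR_STEPS)
        for step in range(NUM_LINEAR_STEPS):
            # +1 stepping that wraps inside the line, so every probe stays in it
            yield line_start + (slot - line_start + step) % NUM_LINEAR_STEPS
        h = rehash(h)
```

Each group of NUM_LINEAR_STEPS probes costs at most one cache miss; only the rehash jumps to a new line.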

Some other blogs about the lock free hash have extensive link lists.

11-22-08 | Stupid Google Searches

I've done some really bone-headed foolish google searches recently.

The other day Casey was talking about buying a "CT Butt" which I guess is some type of pork roast, so of course I go straight to google and search Images for "CT Butt". Yeah, dumb.

Now today I just read that Hung from Top Chef is gay, which surprised me, so I figured I'd google and see if it was true. So of course I just go to google and type in "Hung Gay". Umm.. yeah. Whoah, rookie mistake. Of course all those people named "Hung" or "Wang" are really just setting themselves up for google problems.

11-22-08 | Rasterization

I just found this pretty sweet introduction article on barycentric rasterization from 2004. It's not super advanced, but it starts at the beginning and works through things and is very readable. There are some dumb things in the block checking, so if you care go to the last page and see the posts by "rarefluid".

BTW the "edge equations" are really 2d plane equations (edge cross products). Checking just the edge planes against 8x8 blocks is only a rough quick reject. You can have blocks that are right outside of one vertex at an acute corner, and those blocks are "inside" all the planes but outside of the triangle. The code they have posted also checks against the bounding box of the whole triangle which largely fixes this case. At most they will consider one extra 8x8 block which doesn't actually contain any pixels.
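Here's a small sketch of that acute-corner case. The triangle and block coordinates are made up, and the functions are my own illustration, not the code from the article. The block sits just past the sharp vertex, survives all three edge tests, and only the bounding box test throws it out.

```python
def edge(a, b, p):
    """2D cross product (b - a) x (p - a); >= 0 means p is inside for a CCW triangle."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def block_corners(x0, y0, size=8):
    return [(x0, y0), (x0 + size, y0), (x0, y0 + size), (x0 + size, y0 + size)]

def passes_edge_tests(tri, x0, y0, size=8):
    """Quick reject: a block fails only if all four corners are outside one edge."""
    corners = block_corners(x0, y0, size)
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        if all(edge(a, b, c) < 0 for c in corners):
            return False
    return True

def overlaps_bounding_box(tri, x0, y0, size=8):
    """True when the block's rectangle touches the triangle's bounding box."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    return (x0 <= max(xs) and x0 + size >= min(xs) and
            y0 <= max(ys) and y0 + size >= min(ys))

# Long thin CCW triangle with a very acute corner at (0, 0).
tri = [(0, 0), (100, 0), (100, 10)]

# Block spanning x in [-9, -1], y in [-4, 4]: entirely left of the triangle.
false_positive = passes_edge_tests(tri, -9, -4) and not overlaps_bounding_box(tri, -9, -4)
```

With these numbers false_positive comes out True: the edge planes alone would have rasterized an empty block.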

(it's also really not yet a full barycentric rasterizer, he's just doing the edge tests that way; from his other posts I figure he's doing interpolation using the normal homogeneous way, but if you're doing the edge-tests like that then you should just go ahead and do your interpolation barycentric too).

This kind of block-based barycentric rasterizer is very similar to what hardware does. One of the nice things about it is the blocks can easily be dispatched to microthreads to rasterize in parallel, and the blocks are natural quanta to check against a coarse Z buffer.

The old Olano-Greer paper about homogeneous coordinate rasterization is now online in HTML. Some other random junk I found that I think is just junk : Reducing divisions for polygon mappers & Triangle Setup .

This blog about Software occlusion culling is literally a blast from the past. As I've often suggested, if you care about that stuff, the SurRender Umbra technical manual is still the godly bible on all things occlusion. (or you can read my ancient article on it ). But I also still think that object-based occlusion like that is just a bad idea.

Coarse Z that's updated by the rasterizer is however a good thing. Doing your own on top of what the card does is pretty lame though. This is yet another awesome win from Larrabee. If we/they do a coarse Z buffer, it can get used by the CPUs to do whole-object rejections, or whole-triangle rejections, or macroblock rejections.

Apparently the guy who wrote that top article is Nicolas Capens ; he wrote "swShader" which was an open source DX9 software rasterizer, which got taken down and is now a commercial product (which was a silly thing to do, of course any customer would rather buy Pixomatic !!). I learned this from a random flame war he got in. Don't you love the internet?

11-22-08 | WTF Seattle

It's midnight and my neighbors just lit a fire in their back yard. WTF is wrong with this city?

I'm a little torn because hey, people can have parties once in a while, we don't need to forbid all after midnight socializing. There's a certain social contract to being neighbors in the city, mostly it means being reasonably quiet, but it also means sometimes being tolerant of others' noise.

Ugh, I was about to go to bed and now I'm so full of rage and self loathing that I'll be up for hours.

Double WTF some drunk driver just ran into the telephone pole in front of my apartment.

All of this would not really be exceptional in a big city like NY or SF or whatever, but Seattle is really quite small and feels completely dead at night, the streets are empty and yet there's a high concentration of craziness right next to me.

11-21-08 | DXTC Part 4

So I finally implemented the end point lsqr fit from indices thing that Simon does for myself. One thing immediately fell out - doing just 4 means and then end point fit pretty much equals the best mode of Squish. That's very cool because it's quite fast. Note that this is not doing all the searches of all possible clusterings - I just pick one clustering from 4 means and then optimize the end points for those indices. (when I do 4 means I actually try a few different ways as I mentioned previously, I try all the permutations of putting the 4 means on the 4 palette entries, which is 4!/2 ways = 12 ways, but then I only optimize the best one of those, so it's still very fast).

One thing I noticed is that the lsqr fit really doesn't do much other than shift an end point by one. That is, the end points are in 565 already, you can do this nice optimization in floats, but when you quantize back to 565 you pretty much always hit the point you started with or at most a step of 1.
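For reference, a minimal sketch of the least-squares end point fit for fixed indices, solving the 2x2 normal equations per channel. This is my own illustration of the general technique, not Squish's code or the "CB" code. It assumes DXT1 4-color mode (index 0 = A, 1 = B, 2 = 2/3 A + 1/3 B, 3 = 1/3 A + 2/3 B), and quantize_565 is a simple round-to-nearest helper I made up.

```python
# Weight of end point B for each DXT1 index in 4-color mode.
INDEX_WEIGHT = (0.0, 1.0, 1.0 / 3.0, 2.0 / 3.0)

def fit_endpoints(colors, indices):
    """Least-squares A, B so that (1-w)*A + w*B best matches each pixel color.

    colors  : list of (r, g, b) floats, one per pixel
    indices : list of DXT1 palette indices (0..3), one per pixel
    Returns (A, B), or None if degenerate (e.g. every pixel uses the same index).
    """
    saa = sab = sbb = 0.0
    rhs_a = [0.0, 0.0, 0.0]
    rhs_b = [0.0, 0.0, 0.0]
    for color, idx in zip(colors, indices):
        wb = INDEX_WEIGHT[idx]
        wa = 1.0 - wb
        saa += wa * wa
        sab += wa * wb
        sbb += wb * wb
        for ch in range(3):
            rhs_a[ch] += wa * color[ch]
            rhs_b[ch] += wb * color[ch]
    det = saa * sbb - sab * sab
    if abs(det) < 1e-9:
        return None
    a = tuple((sbb * rhs_a[ch] - sab * rhs_b[ch]) / det for ch in range(3))
    b = tuple((saa * rhs_b[ch] - sab * rhs_a[ch]) / det for ch in range(3))
    return a, b

def quantize_565(color):
    """Round a float (r, g, b) in 0..255 to integer 5:6:5 levels."""
    def level(v, hi):
        return max(0, min(hi, int(round(v * hi / 255))))
    return (level(color[0], 31), level(color[1], 63), level(color[2], 31))
```

The float solution is exact for the given indices; the quantize step is where most of the gain gets thrown away, which is the "shift an end point by one" effect.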

So the new "CB" modes are :

CB 1 = just 4 means then lsqr fit , faster than Squish, a bit slower than ryg. Quality is roughly competitive with Squish, but they have different strengths and weakness, so taking the best of the two might be reasonable. Squish never beats "CB 1" by a lot, but "CB 1" kills it on the weird "frymire" image.

CB 2 = CB 1 followed by simulated annealing.

CB 3 = Very slow heavy optimization. This is an attempt to see what "optimal" would get you. It tries "CB 2" and then it also tries using all 16 colors in the block as end points, so 16*16 trials, does the lsqr optimization on each trial, and then anneals the best of those. There are still various other things to try that might find a better result, but this is already pretty crazy slow. This is too slow to use in real production, the idea is simply to get an idea of how close "CB 2" is to optimal. Of course "CB 3" could still be far off optimal, I'm only conjecturing that it's close.

One of the interesting things to look at is the curve of diminishing returns from CB1 to 2 to 3. In most cases there's only a small improvement from 2 to 3, but there are exceptions, mainly in the weird degenerate images. kodim02 is one case (this is a photo but it's almost all red), and frymire of course. That meets expectations. On noisy natural images, the cluster of colors is pretty close to Gaussian noise, which works well with the PCA single line fit and the least-squares continuous distribution approximation. On weird images with degenerate cases there can be stranger optimal solutions (for example : putting one of the color end points outside of the volume of original colors, so that one of the interpolated 1/3 colors can hit a certain value more precisely).

ADDENDUM : you might validly object and ask why the annealing is not getting closer to the global optimum. There are some approximations in the annealing that are hurting. For one thing I only try wiggling the ends by 1 step in 565. Then I don't really run it for very long, so it doesn't have a chance to make big walks and get to really different solutions. All it can really do is local optimizations with small steps to tunnel past local barriers to find better minima - it's not trying huge steps to other parts of the solution space. Theoretically if I ran a much longer annealing schedule with more time spent at higher temperatures it would do a better job of finding the global minimum. But I'm happy with this approach - the annealing is just an improved local optimization that steps past small barriers, and to find drastically different global solutions I have to seed the trial differently.
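A toy version of that kind of annealing, to show the shape of it: wiggle one end point channel by one 565 step, accept uphill moves with a temperature-dependent probability, keep the best seen. The linear cooling schedule, step count, starting temperature and error metric are all assumptions of mine, and it ignores DXT1 details like end point ordering and the 3-color mode.

```python
import math
import random

def expand_565(c):
    """565 integer levels back to 0..255 floats."""
    return (c[0] * 255 / 31, c[1] * 255 / 63, c[2] * 255 / 31)

def block_error(pixels, a565, b565):
    """Sum of squared error with each pixel using its best of the 4 palette colors."""
    a, b = expand_565(a565), expand_565(b565)
    palette = [tuple((1 - w) * a[ch] + w * b[ch] for ch in range(3))
               for w in (0.0, 1 / 3, 2 / 3, 1.0)]
    return sum(min(sum((p[ch] - q[ch]) ** 2 for ch in range(3)) for q in palette)
               for p in pixels)

def wiggle(c, rng):
    """Move one randomly chosen channel by +-1 step in 565, clamped to range."""
    maxes = (31, 63, 31)
    ch = rng.randrange(3)
    out = list(c)
    out[ch] = max(0, min(maxes[ch], out[ch] + rng.choice((-1, 1))))
    return tuple(out)

def anneal(pixels, a, b, steps=200, start_temp=64.0, seed=0):
    """Short annealing run; returns the best (end points, error) seen."""
    rng = random.Random(seed)
    cur, cur_err = (a, b), block_error(pixels, a, b)
    best, best_err = cur, cur_err
    for i in range(steps):
        temp = start_temp * (1.0 - i / steps) + 1e-6  # linear cooling (assumed)
        if rng.random() < 0.5:
            cand = (wiggle(cur[0], rng), cur[1])
        else:
            cand = (cur[0], wiggle(cur[1], rng))
        err = block_error(pixels, *cand)
        if err < cur_err or rng.random() < math.exp((cur_err - err) / temp):
            cur, cur_err = cand, err
            if err < best_err:
                best, best_err = cand, err
    return best, best_err
```

Because every move is a single 565 step, a short run like this can only hop small barriers near the seed, which is exactly the limitation described above.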

The new numbers :

file                CB 1        CB 2        CB 3        Squish opt  Squish      ryg         D3DX8       FastDXT
kodim01.bmp         8.447       8.3145      8.2678      8.2829      8.3553      8.9185      9.8466      9.9565
kodim02.bmp         5.6492      5.4759      5.2864      6.1079      6.2876      6.8011      7.4308      8.456
kodim03.bmp         4.7533      4.6776      4.6591      4.7869      4.9181      5.398       6.094       6.4839
kodim04.bmp         5.5234      5.4286      5.3967      5.6978      5.8116      6.3424      7.1032      7.3189
kodim05.bmp         9.7619      9.6171      9.5654      9.6493      9.7223      10.2522     11.273      12.0156
kodim06.bmp         7.2524      7.1395      7.1086      7.15        7.2171      7.6423      8.5195      8.6202
kodim07.bmp         5.7557      5.6602      5.634       5.784       5.8834      6.3181      7.2182      7.372
kodim08.bmp         10.3879     10.2587     10.2056     10.2401     10.3212     10.8534     11.8703     12.2668
kodim09.bmp         5.3242      5.2477      5.2247      5.2935      5.3659      5.7315      6.5332      6.6716
kodim10.bmp         5.2564      5.1818      5.1657      5.2478      5.3366      5.7089      6.4601      6.4592
kodim11.bmp         6.7614      6.6503      6.6139      6.731       6.8206      7.3099      8.1056      8.2492
kodim12.bmp         4.8159      4.747       4.7308      4.7968      4.8718      5.342       6.005       6.0748
kodim13.bmp         11.0183     10.8489     10.7894     10.8684     10.9428     11.6049     12.7139     12.9978
kodim14.bmp         8.4325      8.3105      8.2723      8.3062      8.3883      8.8656      9.896       10.8481
kodim15.bmp         5.6871      5.5891      5.548       5.8304      5.9525      6.3297      7.3085      7.4932
kodim16.bmp         5.1351      5.0439      5.0274      5.065       5.1629      5.5526      6.3361      6.1592
kodim17.bmp         5.5999      5.5146      5.4976      5.509       5.6127      6.0357      6.7395      6.8989
kodim18.bmp         8.1345      8.0103      7.9791      7.9924      8.0897      8.6925      9.5357      9.7857
kodim19.bmp         6.6903      6.5979      6.5645      6.5762      6.652       7.2684      7.9229      8.0096
kodim20.bmp         5.4532      5.3825      5.3582      5.4568      5.5303      5.9087      6.4878      6.8629
kodim21.bmp         7.2207      7.1046      7.0718      7.1351      7.2045      7.6764      8.4703      8.6508
kodim22.bmp         6.52        6.4191      6.3933      6.4348      6.5127      7.0705      8.0046      7.9488
kodim23.bmp         4.9599      4.8899      4.8722      4.9063      5.0098      5.3789      6.3057      6.888
kodim24.bmp         8.5761      8.4633      8.4226      8.4299      8.5274      8.9206      9.9389      10.5156
clegg.bmp           14.8934     14.8017     14.6102     14.9736     15.2566     15.7163     21.471      32.7192
FRYMIRE.bmp         7.4898      7.3461      6.0851      10.7105     12.541      12.681      16.7308     28.9283
LENA.bmp            7.1286      7.0286      6.9928      7.1432      7.2346      7.6053      8.742       9.5143
MONARCH.bmp         6.6617      6.5616      6.5281      6.5567      6.6292      7.0313      8.1053      8.6993
PEPPERS.bmp         6.0418      5.9523      5.8026      6.4036      6.5208      6.9006      8.1855      8.8893
SAIL.bmp            8.4339      8.3077      8.2665      8.3254      8.3903      8.9823      9.7838      10.5673
SERRANO.bmp         6.7347      6.2445      5.9454      6.3524      6.757       7.0722      9.0549      18.3631
TULIPS.bmp          7.6491      7.5406      7.5065      7.5805      7.656       8.0101      9.3817      10.5873
lena512ggg.bmp      4.8961      4.8395      4.8241      4.8426      4.915       5.1986      6.0059      5.5247
lena512pink.bmp     4.6736      4.5922      4.5767      4.5878      4.6726      5.0987      5.8064      5.838
lena512pink0g.bmp   3.7992      3.7523      3.7455      3.7572      3.8058      4.2756      5.0732      4.8933
linear_ramp1.BMP    1.5607      1.348       1.3513      1.4035      2.1243      2.0939      2.6317      3.981
linear_ramp2.BMP    1.5097      1.2772      1.2772      1.3427      2.3049      1.9306      2.5396      4.0756
orange_purple.BMP   2.9842      2.9048      2.9074      2.9125      3.0685      3.2684      4.4123      7.937
pink_green.BMP      3.2837      3.2041      3.2031      3.2121      3.3679      3.7949      4.5127      7.3481
sum :               250.8565    246.2747    243.2776    252.3828    259.7416    275.5826    318.5562    370.8691

11-21-08 | More Texture Compression Nonsense

I guess the DX11 Technical Preview was just publicly released a few weeks ago. Unfortunately it's semi-crippled and still doesn't have information about BC7. From what I gather though BC7 does seem to be pretty high quality.

There're multiple different issues here. There's providing data to the card in a way that's good for texture cache usage (DXT1 is a big winner here, especially on the RSX apparently). Another is keeping data small in memory so you can hold more. Obviously that's a bigger issue on the consoles than the PC, but you always want things smaller in memory if you can. Another issue is paging off disk quickly. Smaller data loads faster, though that's not a huge issue off hard disk, and even off DVD if you're doing something like SVT you are so dominated by seek times that file sizes may not be that big. Another issue is the size of your content for transmission, either on DVD or over the net; it's nice to make your game downloadable and reasonably small.

I guess that the Xenon guys sort of encourage you to use PCT to pack your textures tiny on disk or for transmission. PCT seems to be a variant of HD-photo. MCT is I guess a DXT-recompressor, but I can't find details on it. I'm not spilling any beans that you can't find at MS Research or that end users aren't figuring out or PS3 developers are hinting at.

The "Id way" I guess is storing data on disk in JPEG, paging that, decompressing, then recompressing to DXT5-YCoCg. That has the advantage of being reasonably small on disk, which is important if you have a huge unique textured world so you have N gigs of textures. But I wonder what kind of quality they really get from that. They're using two different lossy schemes, and when you compress through two different lossy schemes the errors usually add up. I would guess that the total error from running through both compressors puts them in the ballpark of DXT1. They're using 8 bits per pixel in memory, and presumably something like 1-2 bits per pixel on disk.

Instead you could just use DXT1 , at 4 bits per pixel in memory, and do a DXT1-recompressor, which I would guess could get around 2 bits per pixel on disk. DXT1-recompressed is lower quality than JPEG, but I wonder how it compares to JPEG-then-DXT5 ?

If I ignore the specifics of the Id method or the XBox360 for the moment, the general options are :

1. Compress textures to the desired hardware-ready format (DXT1 or DXT5 or whatever) with a high quality offline compressor. Store them in memory in this format. Recompress with a shuffler/delta algorithm to make them smaller for transmission on disk, but don't take them out of hardware-ready format. One disadvantage of this is that if you have to support multiple hardware-ready texture formats on the PC you have to transmit them all or convert.

2. Compress texture on disk with a lossy scheme that's very tight and has good RMSE/size performance. Decompress on load and then recompress to hardware-ready format (or not, if you have enough video memory). One advantage is you can look at the user's machine and decide what hardware-ready format to use. A disadvantage is the realtime DXT compressors are much lower quality and even though they are very fast the decompress-recompress is still a pretty big CPU load for paging.

3. Hybrid of the two : use some very small format for internet transmission or DVD storage, but when you install to HD do the decompress and recompress to hardware-ready format then. Still has the problem that you're running two different lossy compressors which is evil, but reduces CPU load during game runs.

I don't think that having smaller data on disk is much of a win for paging performance. If you're paging any kind of reasonable fine-grained unit you're just so dominated by seek time, and hard disk throughput is really very fast (and it's all async anyway so the time doesn't hurt). For example a 256 x 256 texture at 4 bits per pixel is 32k bytes which loads in 0.32 millis at 100 MB/sec. (BTW that's a handy rule of thumb - 100k takes 1 milli). So the real interesting goals are : A) be small in memory so that you can have a lot of goodies on screen, and B) be small for transmission to reduce install time, download time, number of DVDs, etc.
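The arithmetic, spelled out as a quick sketch. The 100 MB/sec throughput is the figure used above; the 10 ms seek time is my own ballpark assumption for a hard disk, not a number from this post.

```python
def texture_bytes(width, height, bits_per_pixel):
    """Size in bytes of one mip level at a fixed bit rate."""
    return width * height * bits_per_pixel // 8

def transfer_ms(num_bytes, throughput_bytes_per_sec=100e6):
    """Pure transfer time in milliseconds, ignoring seeks."""
    return num_bytes / throughput_bytes_per_sec * 1000.0

SEEK_MS = 10.0  # assumed ballpark hard disk seek time

dxt1_size = texture_bytes(256, 256, 4)   # 4 bits per pixel, like DXT1
dxt1_ms = transfer_ms(dxt1_size)         # about a third of a millisecond
seek_to_transfer_ratio = SEEK_MS / dxt1_ms
```

Under that seek assumption, halving the file size saves a small fraction of a millisecond per texture while each seek costs around thirty times the whole transfer.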

One idea I've been tossing around is the idea of using lossy compressed textures in memory as a texture cache to avoid hitting disk. For example, a 256 X 256 texture at 1 bit per pixel is only 8k bytes. If you wanted to page textures in that unit size you would be ridiculously dominated by seek time. Instead you could just hold a bunch of them in memory and decompress to "page" them into usable formats. The disk throughput is comparable to the decompressor throughput, but not having seek times means you can decompress them on demand. I'm not convinced that this is actually a useful thing though.

BTW I also found this Old Epic page about the NVidia DXT1 problem which happily is fixed but I guess there are still loads of those old chips around; this might still make using DXT1 exclusively on the PC impossible. The page also has some good sample textures to kill your compressor with.

11-20-08 | Pointless

I keep thinking about pointless things and I have to remind myself to stop.

One is texture compression, as I just mentioned. As long as the hardware needs DXTC, there's really not much I can or *should* do. I'd have to decompress then recompress to DXTC which is just pointless. Stop thinking about it.

Another is antialiased rasterization. Particularly in LRB I think you could do a pretty sweet exact antialiased rasterizer. But it's pretty pointless for 3d. It might be nice for 2d, for fonts and vector graphics and such, but in 3d the bad aliasing isn't even because of edges. The bad aliasing comes from various things - mainly failing to filter textures well when they aren't plain color textures (eg. mip-mapping normal maps), or aliasing due to lighting, and especially reflections (BTW specular is a type of reflection). Fresnel specular reflections are the worst because they're strong right at edges where you have very little precision, so they sparkle like crazy. To really handle these cases you need something like adaptive super-sampling (putting more samples where you need more detail). And to really do that right you need your shaders to return frequency information or at least derivatives.

... but I'm not thinking about that any more because it's pointless.

Oh and I should mention some other things about texture compression while they're in my head.

1. While the fixed block size is really bad for RMSE, it's not quite so bad for perceptual error. What it means is that you put "too many" bits in flat areas and send them almost perfectly. In noisy areas you don't have enough bits and send them very badly. But that's sort of what you want perceptually anyway. It's not actually ideal, but it's also not horrible.

2. One thing that none of these formats do is take advantage of mips. Assuming you always have a full mip chain, and you're doing trilinear filtering - then any time you need mip M the hardware already has mip M-1. Obviously you could do very good delta compression using the previous mip. For something like paging the situation is a little different, but you could certainly require that you have the 16x16 mip always in memory and at least use that for conditioning. I haven't really thought about this in detail, but obviously sending a whole mip chain without using the information between them is very wasteful.
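A toy sketch of the mip delta idea on a grayscale image: predict a mip from its coarser neighbor and only store the residual. The box-filter downsample and nearest-neighbor prediction are the simplest things I could write down, chosen for illustration; a real scheme would use a better predictor and then actually entropy code the residuals.

```python
def downsample(mip):
    """Box-filter a square grayscale image (list of rows, even size) to half size."""
    n = len(mip) // 2
    return [[(mip[2 * y][2 * x] + mip[2 * y][2 * x + 1] +
              mip[2 * y + 1][2 * x] + mip[2 * y + 1][2 * x + 1] + 2) // 4
             for x in range(n)] for y in range(n)]

def predict_from_coarser(coarse):
    """Nearest-neighbor 2x upsample of the coarser mip, used as the prediction."""
    size = 2 * len(coarse)
    return [[coarse[y // 2][x // 2] for x in range(size)] for y in range(size)]

def encode_delta(fine, coarse):
    """Residual of the finer mip against the prediction from the coarser one."""
    pred = predict_from_coarser(coarse)
    return [[f - p for f, p in zip(frow, prow)] for frow, prow in zip(fine, pred)]

def decode_delta(delta, coarse):
    """Rebuild the finer mip exactly from the residual and the coarser mip."""
    pred = predict_from_coarser(coarse)
    return [[d + p for d, p in zip(drow, prow)] for drow, prow in zip(delta, pred)]
```

On smooth content the residuals are tiny compared to the pixel values, which is the information that's being wasted when each mip is coded independently.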

11-20-08 | DXTC Part 3

So we know DXT1 is bad compared to a variable bitrate coder, but it is a small block fixed rate coder, which is pretty hard to deal with.

Small block inherently gives up a lot of coding capability, because you aren't allowed to use any cross-block information. H264 and HDPhoto are both small block (4x4) but both make very heavy use of cross-block information for either context coding, DC prediction, or lapping. Even JPEG is not a true small block coder because it has side information in the Huffman table that captures whole-image statistics.

Fixed bitrate blocks inherently give up even more. It kills your ability to do any rate-distortion type of optimization. You can't allocate bits where they're needed. You might have images with big flat sections where you are actually wasting bits (you don't need all 64 bits for a 4x4 block), and then you have other areas that desperately need a few more bits, but you can't give them any.

So, what if we keep ourselves constrained to the idea of a fixed size block and try to use a better coder? What is the limit on how well you can do with those constraints? I thought I'd see if I could answer that reasonably quickly.

What I made is an 8x8 pixel fixed rate coder. It has zero side information, eg. no per-image tables. (it does have about 16 constants that are used for all images). Each block is coded to a fixed bit rate. Here I'm coding to 4 bits per pixel (the same as DXT1) so that I can compare RMSE directly, which is a 32 byte block for 8x8 pixels. It also works pretty well at 24 byte blocks (which is 3 bits per pixel, or 1 bit per byte of 24-bit source), or 64 for high quality, etc.

This 8x8 coder does a lossless YCoCg transform and a lossy DCT. Unlike JPEG, there is no quantization, no subsampling of chroma, no huffman table, etc. Coding is via an embedded bitplane coder with zerotree-style context prediction. I haven't spent much time on this, so the coding schemes are very rough. CodeTree and CodeLinear are two different coding techniques, and neither one is ideal.
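For the curious, one well-known way to get a lossless YCoCg is the lifting form usually called YCoCg-R. I'm assuming something along these lines here; the sketch below is the textbook lifting steps, not the coder's actual source.

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R lifting transform on integers (exactly reversible)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    """Inverse lifting: undoes each forward step in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

The chroma channels need one more bit than the input (Co and Cg range over roughly -255..255 for 8-bit RGB), which is the price of being exactly invertible in integers.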

Obviously going to 8x8 instead of 4x4 is a big advantage, but it seems like a more reasonable size for future hardware. To really improve the coding significantly on 4x4 blocks you would have to start using something like VQ with a codebook which hardware people don't like.

In the table below you'll see that CodeTree and CodeLinear generally provide a nice improvement on the natural images, about 20%. In general they're pretty close to half way between DXTC and the full image coder "cbwave". They have a different kind of perceptual artifact when they have errors - unlike DXTC which just makes things really blocky, these get the halo ringing artifacts like JPEG (it's inherent in truncating DCT's).

The new coders do really badly on the weird synthetic images from bragzone, like clegg, frymire and serrano. I'd have to fix that if I really cared about these things.

One thing that is encouraging is that this coder does *very* well on the simple synthetic images, like the "linear_ramp" and the "pink_green" and "orange_purple". I think these synthetic images are a lot like what game lightmaps are like, and the new schemes are near lossless on them.

BTW image compression for paging is sort of a whole other issue. For one thing, a per-image table is perfectly reasonable to have, and you could work on something like 32x32 blocks. But more important is that in the short term you still need to provide the image in DXT1 to the graphics hardware. So you either just have to page the data in DXT1 already, or you have to recompress it, and as we've seen here the "real-time" DXT1 recompressors are not high enough quality for ubiquitous use.

ADDENDUM I forgot but you may have noticed the "ryg" in this table is also not the same as previous "ryg" - I fixed a few of the little bugs and you can see the improvement here. It's still not competitive, I think there may be more errors in the best fit optimization portion of the code, but he's got that code so optimized and obfuscated I can't see what's going on.

Oh, BTW the "CB" in this table is different than the previous table; the one here uses 4-means instead of 2-means, seeded from the pca direction, and then I try using each of the 4 means as endpoints. It's still not quite as good as Squish, but it's closer. It does beat Squish on some of the more degenerate images at the bottom, such as "linear_ramp". It also beats Squish on artificial tests like images that contain only 2 colors. For example on linear_ramp2 without optimization, 4-means gets 1.5617 while Squish gets 2.3049 ; most of that difference goes away after annealing though.

file Squish opt Squish CB opt CB ryg D3DX8 FastDXT cbwave KLTY CodeTree CodeLinear
kodim01.bmp 8.2808 8.3553 8.3035 8.516 8.9185 9.8466 9.9565 2.6068 5.757835 5.659023
kodim02.bmp 6.1086 6.2876 6.1159 6.25 6.8011 7.4308 8.456 1.6973 4.131007 4.144241
kodim03.bmp 4.7804 4.9181 4.7953 4.9309 5.398 6.094 6.4839 1.3405 3.369018 3.50115
kodim04.bmp 5.6913 5.8116 5.7201 5.8837 6.3424 7.1032 7.3189 1.8076 4.254454 4.174228
kodim05.bmp 9.6472 9.7223 9.6766 9.947 10.2522 11.273 12.0156 2.9739 6.556041 6.637885
kodim06.bmp 7.1472 7.2171 7.1596 7.3224 7.6423 8.5195 8.6202 2.0132 5.013081 4.858232
kodim07.bmp 5.7804 5.8834 5.7925 5.9546 6.3181 7.2182 7.372 1.4645 3.76087 3.79437
kodim08.bmp 10.2391 10.3212 10.2865 10.5499 10.8534 11.8703 12.2668 3.2936 6.861067 6.927792
kodim09.bmp 5.2871 5.3659 5.3026 5.4236 5.7315 6.5332 6.6716 1.6269 3.473094 3.479715
kodim10.bmp 5.2415 5.3366 5.2538 5.3737 5.7089 6.4601 6.4592 1.7459 3.545115 3.593297
kodim11.bmp 6.7261 6.8206 6.7409 6.9128 7.3099 8.1056 8.2492 1.8411 4.906141 4.744971
kodim12.bmp 4.7911 4.8718 4.799 4.9013 5.342 6.005 6.0748 1.5161 3.210518 3.231271
kodim13.bmp 10.8676 10.9428 10.9023 11.2169 11.6049 12.7139 12.9978 4.1355 9.044009 8.513297
kodim14.bmp 8.3034 8.3883 8.3199 8.5754 8.8656 9.896 10.8481 2.4191 6.212482 6.222196
kodim15.bmp 5.8233 5.9525 5.8432 6.0189 6.3297 7.3085 7.4932 1.6236 4.3074 4.441998
kodim16.bmp 5.0593 5.1629 5.0595 5.1637 5.5526 6.3361 6.1592 1.546 3.476671 3.333637
kodim17.bmp 5.5019 5.6127 5.51 5.6362 6.0357 6.7395 6.8989 1.7166 4.125859 4.007367
kodim18.bmp 7.9879 8.0897 8.0034 8.225 8.6925 9.5357 9.7857 2.9802 6.743892 6.376692
kodim19.bmp 6.5715 6.652 6.5961 6.7445 7.2684 7.9229 8.0096 2.0518 4.45822 4.353687
kodim20.bmp 5.4533 5.5303 5.47 5.5998 5.9087 6.4878 6.8629 1.5359 4.190565 4.154571
kodim21.bmp 7.1318 7.2045 7.1493 7.3203 7.6764 8.4703 8.6508 2.0659 5.269787 5.05321
kodim22.bmp 6.43 6.5127 6.4444 6.6185 7.0705 8.0046 7.9488 2.2574 5.217884 5.142252
kodim23.bmp 4.8995 5.0098 4.906 5.0156 5.3789 6.3057 6.888 1.3954 3.20464 3.378545
kodim24.bmp 8.4271 8.5274 8.442 8.7224 8.9206 9.9389 10.5156 2.4977 7.618436 7.389021
clegg.bmp 14.9733 15.2566 15.1516 16.0477 15.7163 21.471 32.7192 10.5426 21.797655 25.199576
FRYMIRE.bmp 10.7184 12.541 11.9631 12.9719 12.681 16.7308 28.9283 6.2394 21.543401 24.225852
LENA.bmp 7.138 7.2346 7.1691 7.3897 7.6053 8.742 9.5143 4.288 7.936599 8.465576
MONARCH.bmp 6.5526 6.6292 6.5809 6.7556 7.0313 8.1053 8.6993 1.6911 5.880189 5.915117
PEPPERS.bmp 6.3966 6.5208 6.436 6.6482 6.9006 8.1855 8.8893 2.3022 6.15367 6.228315
SAIL.bmp 8.3233 8.3903 8.3417 8.5561 8.9823 9.7838 10.5673 2.9003 6.642762 6.564393
SERRANO.bmp 6.3508 6.757 6.5572 6.991 7.0722 9.0549 18.3631 4.6489 13.516339 16.036401
TULIPS.bmp 7.5768 7.656 7.5959 7.8172 8.0101 9.3817 10.5873 2.2228 5.963537 6.384049
lena512ggg.bmp 4.8352 4.915 4.8261 4.877 5.1986 6.0059 5.5247 2.054319 2.276361
lena512pink.bmp 4.5786 4.6726 4.581 4.6863 5.0987 5.8064 5.838 3.653436 3.815336
lena512pink0g.bmp 3.7476 3.8058 3.7489 3.8034 4.2756 5.0732 4.8933 4.091045 5.587278
linear_ramp1.BMP 1.4045 2.1243 1.3741 1.6169 2.0939 2.6317 3.981 0.985808 0.984156
linear_ramp2.BMP 1.3377 2.3049 1.3021 1.5617 1.9306 2.5396 4.0756 0.628664 0.629358
orange_purple.BMP 2.9032 3.0685 2.9026 2.9653 3.2684 4.4123 7.937 1.471407 2.585087
pink_green.BMP 3.2058 3.3679 3.2 3.2569 3.7949 4.5127 7.3481 1.247967 1.726312

11-18-08 | DXTC Part 2

First of all, let's dispel the illusion that some people have that DXT1 is "pretty good". DXT1 is fucking awful. It compresses at 1.33333 bits per byte (4 bits per pixel). That's very large as far as image compressors are concerned. For typical images, around 4.0 bpb is true lossless, around 1.5 is perceptually lossless, and around 0.5 is "very good". In fact wavelet compressors can get as low as 0.1 bpb and achieve about the same quality as DXT1. Despite this I've heard smart people saying that "DXT1 is pretty good". Yes, it is a convenient fixed size block, and yes the decoder is extremely fast and simple, but as far as quality is concerned it is not even close. At 1.33333 it should be near lossless.

Here are some numbers on various DXT1 compressors. The numbers in the table are the RMSE (sqrt of the L2 error). The far right column is a wavelet compressor for comparison; it's not the best wavelet compressor in the world by a long shot, it's "cbwave" which is very old and which I designed for speed, not maximum quality. In any case it gives you an idea how far off DXT1 is. (BTW I always try to show results in RMSE because it is linear in pixel magnitude, unlike MSE or PSNR which is a very weird nonlinear scale). More discussion after the table...

file Squish opt Squish CB opt CB ryg D3DX8 FastDXT cbwave
kodim01.bmp 8.2808 8.3553 8.352 8.6924 9.374 9.8466 9.9565 2.6068
kodim02.bmp 6.1086 6.2876 6.1287 6.3025 7.52 7.4308 8.456 1.6973
kodim03.bmp 4.7804 4.9181 4.8312 5.0225 5.855 6.094 6.4839 1.3405
kodim04.bmp 5.6913 5.8116 5.7285 5.9394 6.9408 7.1032 7.3189 1.8076
kodim05.bmp 9.6472 9.7223 9.707 10.112 10.8934 11.273 12.0156 2.9739
kodim06.bmp 7.1472 7.2171 7.1777 7.44 8.1005 8.5195 8.6202 2.0132
kodim07.bmp 5.7804 5.8834 5.8379 6.0583 6.8153 7.2182 7.372 1.4645
kodim08.bmp 10.2391 10.3212 10.346 10.747 11.3992 11.8703 12.2668 3.2936
kodim09.bmp 5.2871 5.3659 5.3306 5.5234 5.9884 6.5332 6.6716 1.6269
kodim10.bmp 5.2415 5.3366 5.2777 5.4633 5.9377 6.4601 6.4592 1.7459
kodim11.bmp 6.7261 6.8206 6.7643 7.0216 7.8221 8.1056 8.2492 1.8411
kodim12.bmp 4.7911 4.8718 4.8204 4.9863 5.6651 6.005 6.0748 1.5161
kodim13.bmp 10.8676 10.9428 10.925 11.4237 12.402 12.7139 12.9978 4.1355
kodim14.bmp 8.3034 8.3883 8.3398 8.6722 9.4258 9.896 10.8481 2.4191
kodim15.bmp 5.8233 5.9525 5.8568 6.0862 6.6749 7.3085 7.4932 1.6236
kodim16.bmp 5.0593 5.1629 5.0863 5.2851 5.8093 6.3361 6.1592 1.546
kodim17.bmp 5.5019 5.6127 5.5313 5.7358 6.4975 6.7395 6.8989 1.7166
kodim18.bmp 7.9879 8.0897 8.0192 8.3716 9.7744 9.5357 9.7857 2.9802
kodim19.bmp 6.5715 6.652 6.6692 6.91 8.0128 7.9229 8.0096 2.0518
kodim20.bmp 5.4533 5.5303 5.4895 5.6864 6.3457 6.4878 6.8629 1.5359
kodim21.bmp 7.1318 7.2045 7.1724 7.4582 8.1637 8.4703 8.6508 2.0659
kodim22.bmp 6.43 6.5127 6.4644 6.7137 7.8264 8.0046 7.9488 2.2574
kodim23.bmp 4.8995 5.0098 4.9244 5.0906 5.6989 6.3057 6.888 1.3954
kodim24.bmp 8.4271 8.5274 8.4699 8.8564 9.3906 9.9389 10.5156 2.4977
clegg.bmp 14.9733 15.2566 15.1755 15.7168 16.3563 21.471 32.7192 10.5426
FRYMIRE.bmp 10.7184 12.541 12.132 12.8278 12.989 16.7308 28.9283 6.2394
LENA.bmp 7.138 7.2346 7.1763 7.4264 8.1203 8.742 9.5143 4.288
MONARCH.bmp 6.5526 6.6292 6.5949 6.846 7.5162 8.1053 8.6993 1.6911
PEPPERS.bmp 6.3966 6.5208 6.4557 6.677 7.3618 8.1855 8.8893 2.3022
SAIL.bmp 8.3233 8.3903 8.3598 8.6627 9.8685 9.7838 10.5673 2.9003
SERRANO.bmp 6.3508 6.757 6.8385 7.9064 7.5303 9.0549 18.3631 4.6489
TULIPS.bmp 7.5768 7.656 7.6146 7.8786 8.4084 9.3817 10.5873 2.2228

Back to comparing the DXT1 encoders. BTW the test set here is the Kodak image set plus the Waterloo Bragzone image set. The Kodak set is all photographs that are pretty noisy, and there's not a huge difference in the coders. The Bragzone image set has some synthetic images with things like gradients which are harder to compress well, and there you can really dramatically see the bad encoders fall apart. In particular if you look at the results on "clegg" and "frymire" and "serrano" you can see how bad the "FastDXT" coder is.

The "Squish" in the table is the iterative cluster fit with uniform weighting. All coders work on linear RGB error; and the MSE is mean per pixel not per component.

The "CB" encoder is a simple 2-means fit. I seed the means by doing the PCA to find the best fit line. I put a plane perpendicular to that line through the average and take all the points on each half to be the two clusters, average the cluster to seed the means, and then iteratively refine the means by reclustering. Once I have the 2-means I do a simple search to find the best 565 DXT end points to fit the two means. There are 3 cases to try :

1. put c0 and c1 = the two means

2. put c2 and c3 = the two means (so c0 and c1 are pushed out)

3. make (c0,c2) straddle mean 1 and (c3,c1) straddle mean 2 - this is best for gaussian clusters around the mean

A 2-means like this is slightly better than doing a pca "range fit" like Simon's fast method or the "ryg" method. If the data was all Gaussian noise, they would be equivalent, but of course it's not. You often get blocks that have a bunch of pixels at the low end which are all exactly the same color (for example, all perfect black), and then a spread of a bunch of junk at the high end (like some orange, some yellow, etc.). You want to put one end point exactly on perfectly black and put the other endpoint at the center of the cluster on the high end.

"CB opt" and "Squish opt" take the results of the CB and Squish compressors and then improve them by iterative search. Simon Brown on his page mentions something about doing greedy endpoint refinement but claims it "doesn't converge because the indices change". That's silly, of course it converges.

To do a greedy search : try wiggling one or both end points in 565 space. Find new best index fit for the new end points. Measure new error. If the error is lower, take the step.

Of course that works and it does converge and it's pretty simple and improves the encoding after Squish.
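Here's a rough sketch of that greedy loop, just to make the convergence argument concrete. It is not the code I actually ran. Assumptions : endpoints are stored as six ints {r0,g0,b0,r1,g1,b1} in 565 range, 565 is expanded to 8 bits by bit replication, the 1/3 points use truncating integer division, only the 4-color mode is considered, and only one channel of one endpoint is wiggled at a time (the real search should also try moving both). The names make_palette, block_error and greedy_refine are made up for this example. The error is a non-negative integer that strictly decreases on every accepted step, so the loop has to terminate.

```c
#include <limits.h>

typedef struct { int r, g, b; } Color;

/* Build the 4-entry palette from 565 endpoints ep = {r0,g0,b0,r1,g1,b1}.
   Bit-replication expansion and truncating thirds are assumptions. */
static void make_palette(const int ep[6], Color pal[4])
{
    for (int e = 0; e < 2; e++) {
        int r = ep[3 * e], g = ep[3 * e + 1], b = ep[3 * e + 2];
        pal[e].r = (r << 3) | (r >> 2);
        pal[e].g = (g << 2) | (g >> 4);
        pal[e].b = (b << 3) | (b >> 2);
    }
    pal[2].r = (2 * pal[0].r + pal[1].r) / 3;
    pal[2].g = (2 * pal[0].g + pal[1].g) / 3;
    pal[2].b = (2 * pal[0].b + pal[1].b) / 3;
    pal[3].r = (pal[0].r + 2 * pal[1].r) / 3;
    pal[3].g = (pal[0].g + 2 * pal[1].g) / 3;
    pal[3].b = (pal[0].b + 2 * pal[1].b) / 3;
}

/* Sum of squared RGB error, with every pixel assigned (by brute force)
   to its closest palette entry. This is the "new best index fit". */
static long block_error(const Color px[16], const int ep[6])
{
    Color pal[4];
    long total = 0;
    make_palette(ep, pal);
    for (int i = 0; i < 16; i++) {
        long best = LONG_MAX;
        for (int k = 0; k < 4; k++) {
            long dr = px[i].r - pal[k].r;
            long dg = px[i].g - pal[k].g;
            long db = px[i].b - pal[k].b;
            long d = dr * dr + dg * dg + db * db;
            if (d < best) best = d;
        }
        total += best;
    }
    return total;
}

/* Wiggle one channel of one endpoint by +-1 and keep the step only if the
   error strictly drops. Returns the final error; ep is updated in place. */
static long greedy_refine(const Color px[16], int ep[6])
{
    static const int limit[6] = { 31, 63, 31, 31, 63, 31 };
    long best = block_error(px, ep);
    int improved = 1;
    while (improved) {
        improved = 0;
        for (int k = 0; k < 6; k++) {
            for (int d = -1; d <= 1; d += 2) {
                int old = ep[k];
                int v = old + d;
                if (v < 0 || v > limit[k]) continue;
                ep[k] = v;
                long err = block_error(px, ep);
                if (err < best) { best = err; improved = 1; }
                else ep[k] = old;   /* reject the step */
            }
        }
    }
    return best;
}
```

For an all-black block starting from endpoints {1,1,1} and {31,63,31}, this walks the first endpoint down to {0,0,0} and the error goes to zero.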

In fact you can do even better by doing simulated annealing instead of a plain greedy search. We should know by now that any time we have a greedy search like this where we can measure the error and it can get stuck in local minima, we can improve it with something like simulated annealing. I use 256 steps of annealing with a sinusoidal decreasing temperature pattern. I randomly pick a way to wiggle the two end points (you need to consider wiggling both, not just single wiggles). I try the wiggle and measure the error delta. Negative deltas (improvements) are always taken, positive deltas are taken probabilistically based on the temperature. This is what was done in the "opt" coders above.

Most of the time the annealing search just makes a small improvement, but once in a while (such as on "frymire" and "serrano") it really finds something remarkable and makes a big difference.

Simon's cluster fit alg is very sweet. I didn't really understand it at first from the description of the code, so I'm going to redescribe it for myself here.

The basic idea is you find an assignment of indices, and from that solve a best fit to give you the end points for those indices. If you think about all the colors in the block living in 3D color space, assigning indices is like giving labels to each point. All the points with the same label are a "cluster". Then each cluster is fit with either an end point or one of the interpolated points.

Once you know the clusters, putting the best two end points is a simple linear optimization. You can solve the least squares analytically, it's not like a matrix iteration least squares problem or anything.
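To spell out what "solve analytically" means, here's a minimal sketch for a single color channel. It is my own illustration, not Squish's code. Each pixel i has a known interpolation weight t[i] (0, 1/3, 2/3 or 1 depending on its index), and we want end points A and B that minimize the sum of ((1-t)*A + t*B - x)^2. That's a 2x2 linear system solved in closed form. The function name fit_endpoints is hypothetical, and real arithmetic is assumed (no 565 rounding).

```c
/* Least-squares fit of end points A (t = 0) and B (t = 1) for one channel.
   x[i] is the pixel value, t[i] its interpolation weight.
   Returns 0 if the system is degenerate (e.g. all pixels share one weight). */
static int fit_endpoints(const double *x, const double *t, int n,
                         double *A, double *B)
{
    double saa = 0, sab = 0, sbb = 0, sax = 0, sbx = 0;
    for (int i = 0; i < n; i++) {
        double a = 1.0 - t[i], b = t[i];
        saa += a * a;
        sab += a * b;
        sbb += b * b;
        sax += a * x[i];
        sbx += b * x[i];
    }
    /* Normal equations: [saa sab; sab sbb] [A; B] = [sax; sbx] */
    double det = saa * sbb - sab * sab;
    if (det < 1e-12) return 0;
    *A = (sbb * sax - sab * sbx) / det;
    *B = (saa * sbx - sab * sax) / det;
    return 1;
}
```

In a real coder you'd run this per channel (the 2x2 matrix is shared across channels, only the right hand side changes) and then quantize A and B to 565.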

So the trick is how to decide the indices. If you tried them all, it would be 4^16 ways, which is way too many. So what Simon does is create a linear ordering of the points using the PCA axis, then try all clusterings that can be created by planes perpendicular to that linear axis. That's equivalent to just trying all groupings of the indices sorted by their dot along the PCA axis. That is, the clusters are

[0,i) , [i,j) [j,k) [k,16)

for all i,j,k
which is a manageable amount to search, and gives you most of the interesting clusterings that you care about. Something that might improve Squish is trying a few different axes and picking the best.
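To put a number on "manageable", here's a tiny sketch that just counts the (i,j,k) triples. I'm assuming 0 <= i <= j <= k <= 16, so clusters are allowed to be empty; whether Squish enumerates exactly this set is not something I've checked. The function name count_clusterings is made up.

```c
/* Count the ways to cut n sorted points into 4 contiguous clusters
   [0,i) [i,j) [j,k) [k,n), where any cluster may be empty. */
static int count_clusterings(int n)
{
    int count = 0;
    for (int i = 0; i <= n; i++)
        for (int j = i; j <= n; j++)
            for (int k = j; k <= n; k++)
                count++;
    return count;
}
```

For n = 16 that's 969 candidate clusterings, versus 4^16 = 4294967296 for trying every index assignment.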

BTW this end point optimization is very approximate. One issue is that it assumes the best index for each point doesn't change. It also of course just uses real number arithmetic to make the 1/3 points, not the actual integer rounding that's done. Those factors are actually pretty big.

11-18-08 | DXTC

I've been doing a little work on DXT1 compression. The main reason I'm looking at it is because I want to do something *better* than DXT1, and before I do that, I want to make sure I'm doing the DXTC right.

For now let's gather prior art :

The only real decision the coder gets to make is what the two 565 endpoints are. The other details you just have to get right - reconstruct the palette interpolants right, and assign the 16 indices to the closest palette entries right. For the moment I don't care about speed, I just want to make sure the indices are actually the best choices.

All the papers by Ignacio and J.M.P. van Waveren are modern classics. One thing of note : the "simplified" index finder that JMP has in his original paper is wrong. On page 12 he says "Evaluating the above expression reveals that the sub expression ( !b3 & b4 ) can be omitted because it does not significantly contribute to the final result" - apparently that is not true, because when I use the "rewritten" index finder it produces bad indexes. His original version of the index finder bit twiddle is fine.

One thing I've found with the index finding is that the integer and nonlinear effects are pretty strong and any ideas you have about planar geometry fail if you assume that your 4 points are colinear (they are not actually). For example, an obvious idea is to do a dot product test to pick between the 4 points. First dot product with the middle separating plane, then the two planes between the next pairs of points. This doesn't work. Part of the problem is because of the bad rounding in the generation of the 1/3 point. The JMP method is actually comparing vs the true distances, so that part is good.

In more detail, an obvious idea is to do something like this :

int dot = (color - c0) * (c1 - c0);

int bmid = (dot > mid01);
int b02 = (dot > mid02);
int b31 = (dot > mid31);

// using the DXT1 palette order label 0,2,3,1

int threebit = (bmid<<2) + (b02<<1) + b31;

int index = remap[threebit];

// remap is an 8-entry table that remaps from 3 bits to 2 bits to give an index

That "works" just as much as the "ryg" code works or the "Extreme DXT" method of finding indexes works - which is to say that it doesn't work.

FastDXT appears to be an implementation of the id "Real Time DXT" paper.

Ericsson Texture Compression (ETC2) is similar to DXTC but different; this is a pretty good paper, there are some interesting ideas here like the T and H blocks. It gets slightly better quality in some cases. It's obvious that you can beat DXTC by having a base color and a log-scaled delta color, rather than two end points. The two 565 end points is wasteful; you could for example do a 777 base color and a 444 log scaled delta.

Tight Frame Normal Map Compression is similar to DXT5 (really to 3Dc) with some small improvement on normal maps.

Extreme DXT Compression by Peter Uliciansky is indeed super fast. However, it has some errors. The division method that he uses to assign the indices is not correct. You can improve it by offsetting the end points correctly to put the division quantization boundaries in the right place, but even then it's still not exactly correct unless the diagonal of the color bbox is along {1,1,1}. The error from this is pretty small and he's going for max speed, so it's semi reasonable what he does. Note that using the full range of the bbox for the division but insetting it to make the dxt1 colors (as he does) improves the indices slightly, but it's not actually the correct range scale.

In more detail :

To find indices for the DXTC colors c0 and c1

You should use

base = c0 + (c0 - c1)/6

range = (c1 - c0) * 4/3

index = (color - base) * 4 / range;

or equivalently (since 4 / range = 3 / (c1 - c0)) :

index = (color - base) * 3 / (c1 - c0);

not :

index = (color - c0) * 4 / (c1 - c0);

But even if you do that it's still wrong because the planes through color space do not actually separate the colors from their closest palette entry.
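Here's the difference in one dimension, as a sketch with scalar "colors" so the numbers are easy to check by hand. Assumptions : c1 > c0, the returned index is the linear position 0..3 along the c0 -> c1 line (not the DXT1 bit order 0,2,3,1), and the function names are made up. The offset version is the formula above with base = c0 - (c1 - c0)/6, rearranged to stay in integers.

```c
static int clamp03(int v) { return v < 0 ? 0 : (v > 3 ? 3 : v); }

/* Division with the bin boundaries shifted to the midpoints between
   palette entries: (color - base) * 3 / d with base = c0 - d/6,
   multiplied through by 6 to avoid fractional intermediates. */
static int index_offset(int color, int c0, int c1)
{
    int d = c1 - c0;
    return clamp03((6 * (color - c0) + d) / (2 * d));
}

/* The uncorrected division: boundaries land at 1/4, 1/2, 3/4 instead. */
static int index_naive(int color, int c0, int c1)
{
    int d = c1 - c0;
    return clamp03((color - c0) * 4 / d);
}
```

With c0 = 0 and c1 = 60 the palette is 0, 20, 40, 60. The value 14 is closer to 20 than to 0, and index_offset gives 1, but index_naive gives 0. The naive version also produces 4 at color = c1 before clamping.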

MSDN compressed format documentation now has the right rounding for the 1/3 points, but the old DirectX8 docs have the wrong rounding. The old docs said

color_2 = (2*color_0 + 1*color_1 + 1)/3
color_3 = (1*color_0 + 2*color_1 + 1)/3

Which is actually better, but is not what the standard or the hardware does.

Simon Brown's Squish is very good. His Cluster Fit idea is elegant and clever and produces quite good quality.

There's a thread on MollyRocket about DXTC with some convenient code from "ryg". It's nice simple code, but it has some errors. The way he finds the indices is the dot product method which is wrong. Also the way he computes the 1/3 and 2/3 colors for the palette using LerpRGB is wrong, it rounds differently than the hardware really does.

One thing that some simple code gets wrong is the way to quantize from 24 bit to 565. You can't just take the top bits, because the 565 values are not reconstructed to the middle of their bins. You need to correct for the offsetting. You can either use the true range, which is something like -4 to 259, or you can skew the bits in the quantizer. The "ryg" code has a perfect quantizer that's very neat in the "As16Bit" function. An example is that the value 250 should map to 30 in 5 bits, not 31.
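A brute-force sketch of what "correct" means here : pick the 5-bit code whose reconstructed 8-bit value is closest to the input. This is not the "As16Bit" trick, just the slow reference you'd check a fast quantizer against. I'm assuming 5-bit values are expanded to 8 bits by bit replication; the names expand5 and quantize5 are made up.

```c
#include <stdlib.h>

/* Expand a 5-bit value to 8 bits by bit replication (assumed rule). */
static int expand5(int q)
{
    return (q << 3) | (q >> 2);
}

/* Return the 5-bit code whose reconstruction is nearest to v (0..255). */
static int quantize5(int v)
{
    int best = 0, best_err = 256;
    for (int q = 0; q < 32; q++) {
        int err = abs(expand5(q) - v);
        if (err < best_err) { best_err = err; best = q; }
    }
    return best;
}
```

Under that expansion 30 reconstructs to 247 and 31 to 255, so 250 is nearer to 30, while just taking the top bits (250 >> 3) gives 31.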

BTW the YCoCg-DXT5 method is pretty good on current hardware, but I'm not really looking at it. It's obviously inherently wasteful, it's just a way of making use of the current hardware, but it's not really what you'd want to be doing. It's also very large, at around 2.6 bits per byte (8 bits per pixel).

More advanced topics to come.

11-13-08 | Links

Top Chef the show is pretty meh as a show; I'm a sucker for food shows, but god it's such a cheesy standard reality show with all the fake drama and manipulation. I have to watch it though so that I can understand the blogs. The blogs are reliably hilarious; Tom's is generally good, and then there will be other tasty ones by the likes of Tony Bourdain, Richard's is often good, Harold is good, and whoever the guest celebrity chef is sometimes writes a good one.

Simon Brown has a nice DXTC library, and some other good stuff on his page; good page on spherical harmonics for example.

Shawn Hargreaves has a good blog that's mainly on introduction-to-game-development kind of stuff, mostly with XNA. I don't know how many commercial people are actually using the whole XNA studio thing, but I guess I might have to make Oodle work with their "Content Pipeline" thing.

Wolfgang Engel's blog is pretty good. Apparently he made an iPhone game SDK.

11-11-08 | Poker Taxes

There's a story going around that the winner of the WSOP will be hit by a 73% tax rate in his home country of Denmark. Poker players roundly call it "ridiculous" and "awful" and say what's the point of winning blah blah.

I'm not sure exactly where I stand on the issue. I do think it's a bit retarded that gambling is taxed even higher than income in some cases. It should at most be taxed as high as stocks. I can see a certain argument that gambling shouldn't be taxed - I mean if you imagine two guys each put in $50 and flip a coin to see who gets the whole $100 - if you take 50% taxes off that, it means the winner only gets $50 ? (well, not exactly, but you get the point, kind of). Why should it be taxed at all, it's just a transfer of money? But by that logic you could argue for not taxing anything.

Anyway, that's not the point. The interesting thing is that all the people who are so upset about this "travesty of excessive taxation" are horrible poor low level gamblers who will never win anything and never be subject to these taxes, but they're still really pissed on behalf of the winner. It finally made me realize where that sentiment comes from. It's the same thing that makes all these morons play the lottery. They are not thinking in their own best interest. They are far better off when the rich pay high taxes, because these dumb fucks will never be rich. But they want to be able to dream. They've basically given up on actually improving their lives in a real way. They have given up logic and action, and they now only live in fantasy. Fantasies where they might win the WSOP Main Event and get rich, fantasies where they might win the lottery, fantasies where they can buy a house on a subprime mortgage and it will appreciate fantastically and they'll get rich. And they want the taxes to be low so that when they get all this money in their fantasy, they get to keep it all.

11-11-08 | Blah Blah

Dan's Data Review of the Dell 30 ; Dan's got a nice site, lots of good value.

iHologram anamorphic perspective demo is totally fucking rad. Found from Kokoromi

There's tons of videos on Youtube of realtime raytracers, voxel engines, and cool CUDA stuff. Galaxies collide with CUDA , Voxelstein , and a cool compilation .

11-11-08 | REYES

Gritz' course on Renderman : To Infinity and Beyond is the best reference I've found on REYES. I'm still curious about the details of how to efficiently do the part from micropolygons to pixel colors.

The brute force way takes a micropolygon and puts a bound on it that covers its entire temporal sweep across the current frame. That is then binned into all the pixels it touches. In each pixel, 16 (or so) supersampling locations are checked with precomputed jittered sample locations and times. At each spot where a sample is found, the color & Z are added to a list for later blending. Note with fast moving objects and depth of field, it can get really inefficient, because micropolygons have to get tested in many pixels.

The Pixar Library is a nice collection of papers. There's a new paper I hadn't seen on point-based GI that's based on the Bunnell disk method for realtime GI but enhanced for better quality. It seems extremely simple and the results look good. It's not super accurate in terms of solving the rendering equation with minimal error, but that's not really what we want anyway. Though actually now that I think about it it's just a variant on the ancient "render the world from each surfel's point of view" method for GI and doesn't really add much.

Anyway, back to REYES. Ignoring temporal sampling, the spatial sampling stuff can all be done with a normal rasterizer I'm pretty sure. The micropolygon subdivision and binning is just your fragment rasterizer (a rasterizer is just a device that creates micropolygons that are at most pixel size and makes a grid that's aligned with the screen grid). When you decide a fragment goes in a pel, you don't just stuff the color in the frame buffer, instead you stick it in the list to sample against the stochastic supersampled grid test just like REYES.

But it occurs to me that if you're building up a list of fragments in each pel like this, you may as well do analytic coverage rather than stochastic sampling. When you rasterize your triangles, compute the area of each pel that's covered. This can be done efficiently and incrementally, just like a Wu antialiased line drawer, for the case of only one triangle edge intersecting a pel; if 2 or 3 edges intersect one pel it's more work.

Add the fragment to the pel list, consisting of {color, alpha, Z, area}. Once you have all these fragments sort them back-to-front and accumulate into the framebuffer. Fragments that have the same Z are first gathered together before being alpha-blended onto the current value in the frame buffer. That means that two triangles that share an edge act like a continuous surface and will correctly give you 100% area covers and will correctly not alpha blend with each other. This gives you exact anti-aliasing with alpha blending.
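A minimal sketch of that per-pel resolve, single channel to keep it short. This is just my reading of the idea written down, not tested in a renderer. Assumptions : larger Z means farther away, "same Z" is an exact float compare, area is the covered fraction of the pel in [0,1], and the combined coverage of a same-Z group is clamped to 1. The Fragment layout and resolve_pel are made up for the example.

```c
#include <stdlib.h>

typedef struct { float color, alpha, z, area; } Fragment;

/* Sort so that the farthest fragment (largest z) comes first. */
static int cmp_back_to_front(const void *a, const void *b)
{
    float za = ((const Fragment *)a)->z;
    float zb = ((const Fragment *)b)->z;
    return (za < zb) - (za > zb);
}

/* Resolve one pel: gather fragments with equal z into one coverage-weighted
   fragment, then blend the groups over the background back-to-front. */
static float resolve_pel(Fragment *frags, int n, float background)
{
    float out = background;
    qsort(frags, (size_t)n, sizeof(Fragment), cmp_back_to_front);
    for (int i = 0; i < n; ) {
        float z = frags[i].z;
        float cov = 0.0f;   /* summed area * alpha for this depth */
        float col = 0.0f;   /* coverage-weighted color sum */
        for (; i < n && frags[i].z == z; i++) {
            float w = frags[i].area * frags[i].alpha;
            cov += w;
            col += w * frags[i].color;
        }
        if (cov > 1.0f) { col /= cov; cov = 1.0f; }
        out = out * (1.0f - cov) + col;
    }
    return out;
}
```

Two opaque fragments at the same Z that each cover half the pel add up to 100% coverage and completely replace the background, instead of one blending over the other and leaving a seam.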

11-11-08 | Mapped Drives

Is there a way to make "Disconnected Network Drives" try to connect from DOS or from code? I'm talking about when you "net use" to make a network path to a drive letter, but you reboot or something so it's disconnected. It does a sort of lazy connect, where if you go click on it in Explorer it will hook right up, but if you go try to dir it from a CLI before you explore to it, you can't get to it.

I'd like to make my CLI file copy utils do something like "if target is a disconnected network drive, wake it up first".

The answer seems to lie somewhere in the functions RealDriveType, IsNetDrive, and WNetRestoreConnection.

These are functions that the DOJ forced MS to expose and the documentation is total balls. This page has better info than MSDN :


I'm seeing some weird things with them though. For one thing, IsNetDrive is not a light query. It actually stalls and tries to wake up the drive right there, so most of the time all you have to do is call IsNetDrive and that actually will restore the connection. IsNetDrive can stall out for a very long time.

IsNetDrive also lies, as that page I linked details.

I haven't gotten WNetRestoreConnection to do anything useful. It always tells me "The local device name is already in use".

WNetRestoreConnection also pops up an interactive dialog box when it has errors which is fucking evil.

So ... in every case I've tried so far IsNetDrive actually refreshes the connection. I haven't yet found a need to actually call WNetRestoreConnection, which is good because it doesn't seem to work.

11-10-08 | Junk

Guy La Liberte is basically single handedly subsidizing the top online poker players. He's dumping 20-30 mil a year to something like 10-20 people. Sometimes I see how bad he is and think I should take a shot at the $500/$1000 NLHE games, but of course the issue is not beating Guy, it's beating the other guys who are feeding off Guy, like durrr or PA.

Hamburger has to be at 15% or it's just gross. The "lean" stuff at 10% is dry and bland. The budget stuff is usually around 20% and that's too greasy. I could buy both and mix but that means I have to get an awful lot. I guess I might have to get into grinding my own.

One of the fun things about game development is that you only have to finish things 99% of the way. The last little bit of programming is really not fun. The initial getting it working to 90% is really fun, and then for games you just fix some bugs and polish a tiny bit - but you only have to make it work in the way it's used in the game, not really robustly. If it has some quirks under certain uses, that's fine, just don't use it that way.

Programmers all talk about completion percentages in a really distorted scale. I just did it in the last paragraph. I sort of have to do it, because everyone does. It's common parlance to say "oh yeah, I implemented self-balancing binary trees, I got it like 90% done in an hour". Of course that's nonsense. Any good experienced programmer knows that you've actually done maybe 10% of the total work needed to make it a finished functioning efficient robust piece of code. But it appears to be 90% done because hey you made a bunch of characters in a file that ends in "cpp" so you say 90%. Then when something is pretty solid we say 95% or 99% when in reality it's highly likely there will be a few more nasty issues which will take almost as much time as you've put in already, so you're only about 50% done.

When somebody tells you they're "x%" done, you should raise that to the 10th power. So "90% done" = "0.9^10" = 35% done. "99% done" = 90% done, etc.

I stumbled on this suggestion to compare floats using their int difference, which counts the number of representable float values apart they are. I wonder if this is actually a robust way to do float math with tolerances. Christer? ... actually I'm sure it's not. It does seem like an okay way to do the relative epsilon compare, but it is not a replacement for also having an absolute epsilon check.
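Here's a sketch of what I mean by doing both checks. Assumptions : 32-bit IEEE floats, and the function names and the choice of tolerances are mine, not from the linked suggestion. The bits are remapped so that the integer ordering matches the float ordering across zero, then the difference of those integers is the number of float steps between the two values.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Map a float's bits to an integer that increases with the float value;
   +0 and -0 both map to 0. Assumes 32-bit IEEE floats. */
static int64_t float_ordered_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return (u & 0x80000000u) ? -(int64_t)(u & 0x7fffffffu) : (int64_t)u;
}

/* Close if within abs_eps (absolute check, needed near zero) or within
   max_ulps representable floats (relative check). NaN is never close. */
static int floats_close(float a, float b, int64_t max_ulps, float abs_eps)
{
    float d = a - b;
    if (a != a || b != b) return 0;
    if (d < 0) d = -d;
    if (d <= abs_eps) return 1;
    return llabs(float_ordered_bits(a) - float_ordered_bits(b)) <= max_ulps;
}
```

The case that shows why the absolute check is still needed : 1e-30f and -1e-30f are hundreds of millions of float steps apart even though both are essentially zero, so the int-difference test alone calls them very different.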

And What Every Computer Scientist Should Know About Floating-Point Arithmetic , by David Goldberg. It's really long and old, but quite excellent. And actually the oldness is good because it means he talks about floating point very generally, since the old mainframes had lots of different weird implementations.

"The Life and Times of Tim" is horrifically not funny. It's like the love child of "Dr. Katz" and "Southpark" , which is like interpolating between boring and retarded.

11-10-08 | Bleck

I'm still really sick. I know for sure a doctor is not going to do a damn thing for me, but I'm starting to feel like I have to go just so that I'm trying to do something about this. I hate having problems and not doing anything about it.

I have all the symptoms of a bacterial infection, except that I can't find a specific site of infection, and my fever's not super high. The feeling really reminds me of when I had those infected spider bites though. Maybe I could prescribe myself some antibiotics. I'm super out of it, faint, weak, have a head ache, really achy muscles everywhere, like my eye muscles really hurt. My fever's like 99, but I normally run low, like 97.5 (btw oral temperature is so massively inaccurate, it's silly that doctors have these thresholds like 100.4 for antibiotics. your temperature can vary by 1 degree throughout the day; it varies massively based on whether you've been wrapped up in blankets, if you exercised, if you just ate hot food or drank cold water, etc. etc.).

I'm trying to make sure I know a hospital to go to in case I collapse. There are literally 5 hospitals on Capitol / First hill, but not a single one of them is public & subsidized. WTF. Seattle public services really suck complete balls. SF's got a subsidized public hospital that's completely free if you're poor and uninsured. Sure you have to sit next to the homeless crack heads in the waiting room, but the wait's not even that bad unless you go on a Friday night.

I realized yesterday that I've passed out an awful lot in my life. I passed out in Mammoth once because I drank too much beer and sat in the hot tub too long. That night I got up to go pee and just collapsed. No biggie. Then twice in SLO, once when I had a really bad cold and a high fever and we thought I might have pneumonia; I went to the hospital and they told me I was a baby wasting their time and to get the fuck out. Then another time with my infected spider bites. When I went to the hospital then they couldn't believe I'd let it go so long. Then again in San Francisco just from some kind of cold. I walked into the kitchen and felt really dizzy and was trying to steady myself and just suddenly passed out cold, I fell completely limp like a ton of fleshy bricks and hit my head on the stove. That was scary. Oh yeah - and one time in a Rugby game when I caught a kick and was running in the open field and took a tackle really badly. I was out cold and people gathered around to hold up fingers for me to count. And I got right back in the game because our team never had enough reserves and I stupidly wanted to show I was tough enough.

I'm a little confused about what to do about fevers now. Modern understanding is that most of the bad problems from infection are actually caused by the body's overreaction and excessive fever, which is what causes organ failure and tissue destruction and so on. On the other hand, modern understanding also says that you should not take fever reducers for normal illnesses, because the fever does help kill things off faster. And in fact some people say you should bundle up and try to get extra warm to help your body be warm and kill the infection. I guess as long as the fever is under 100 you should let it be.

BTW I just put together that sepsis = bacterial infection, and "septic tank" is not called "septic" referring to the nasty poo inside, but rather the fact that there's bacteria inside.

11-07-08 | Online Games

Being sick and stuck at home I've tried a few MMO's. I know, I know, slippery slope and throwing away life and all that. Meh, I'm sick and bored. Doing anything real gives me a head ache, I just want to zone out.

Guild Wars : initial download is super quick. Make an account, start up the game... oh another little download; I thought I downloaded the game already? no, okay... make a character... here we go... THONK. D'oh. Downloading 1 GB. Okay, I'll go eat and watch TV for a while. Oh, here we are. Doo dee doo. Grind grind. The game is pretty bland. The character classes are very interchangeable and they all start in the same area and play the same quests. Okay, zone transition... THONK.. downloading! WTF it downloads the zone every time I go to a new zone. That's kind of cool in a way, but I let it sit for a long time to download the game, I wanted it to just get everything! Now when I'm in the middle of playing and want to go to a different zone it freezes up and downloads. This would absolutely blow if I was actually in a group and had to sit for the download. I got a necro to level 10 but man the PvE is just really boring in this game. Supposedly the PvP is good but I don't think I'm going to make it. On the plus side, the loads between zones are really fast, and the networking seems to be really good. Even in zones with tons of people I haven't experienced any lag.

World of Warcraft : bleck I hate the art so much but WTF I'll try it. Download 4 GB. Okay, let's go... downloading patch.. okay, install it... downloading patch.. okay, aw fuck, this is lame. I can't even get up and walk away because there are fucking a million prompt boxes for each of these patches. Downloading patch.. Ugh. Okay, I give up. WTF, how can you not make a single download that gives me the current state of the full game + all patches. It's downloaded probably 8 GB so far. Uninstall.

Everquest 2 : I tried it briefly long ago with Ryan and didn't like it, they fucked up the continent layout and the racial starting places and such that they got so perfect in EQ 1. Anyway, let's try it again. Downloading... approx time left : 12:37:47 ; yarg. This will be okay if it actually gets everything and I can just let it run overnight, but if it pops up confirmations all the time, I will kill someone.

Do game developers actually ever play their games the way a new consumer would? I don't believe that reasonable people have tried this and think that it's okay. The vast majority of game devs never run their game at all in a full clean install the way a consumer does (you make some junior guy do it to make sure the installer is working right, but everybody else just runs developer builds).

I dunno about WoW but EQ2 and GW both suffer from letting you nerf yourself right from the start without knowing it. You can create a character and play for months and then discover that you made a foolish race/class choice or assigned your skills badly. The game devs try to create the illusion that all choices are valid, but the truth is that at higher levels you really need to have maxed certain abilities that are crucial. Giving the player choices which are semi-permanent and not obvious and can fuck them badly down the road is so bad. (GW lets you change everything at level 20 which is also broken in its own way).

All of these games have made the low levels ridiculously easy. You just talk to the first NPC and you're level 2. Literally. I know this is a response to what players say they want "waah waah it's too hard to level up, the low levels are too boring". Yeah, yeah, stupid players. They want all the items to be super powerful. They want the wizards to have more health and the fighters to have spells. Shut up silly game player, you don't know what you want. Level gain is only rewarding if it's difficult.

ADDENDUM : Well I played some EQ2 and got a Dirge to level 12. Meh. For one thing this game is just dead, the zones are totally empty, nobody else is on, which ruins it. Also they totally nerfed the difficulty, made it too easy, like WoW. That makes it feel like "Progress Quest" . Yay, I did nothing special and gained twenty levels. Plus the zones are way smaller than EQ1 and don't have that big real world open space feel. You progress so fast you never get to know an area. It's just not interesting to play these games unless you feel like you're going to stick with it and really build a character, and I know I'm not, which kills it.

Part of the magic of EQ1 was that it was pretty unforgiving; it's one of the only games in which I actually really *cared* if I died, which added a huge new level of tension and excitement and adrenaline and feeling of accomplishment. Of course I'm sure that I'm romanticizing the past as gamers tend to do. I'm sure I wouldn't put up with the ridiculous grind of that game today. We expect kinder games now.

Almost all old game devs like me have dreams of reproducing their favorite games from the past, but it's almost always a bad idea. There are reasons why games are different, a lot of those old games were just awful. Part of why we feel so fond of them is that it took a lot of investment to get past all their problems, so we became very attached.

On a semi-related tangent : if your taste doesn't match well with the mass market, you cannot make mass market products. You might think "I'm intelligent, I can see what the mass market likes and make something for them", but it just doesn't work. You have to really believe in what you're making, you have to think that boobies are great, and shooting stuff and big explosions is really exciting, that vampires and zombies are fascinating, and that everyone should wear space-marine body armor. If you don't really believe that, you can't make mass market games. If your taste is actually interesting and unique, you need to go ahead and make what your taste tells you to do.

To be a really successful mass market auteur, you have to be very smart and driven, and also actually have terrible taste and retarded ideas of what's cool, like the common man. This is where people like J.J. Abrams, Michael Crichton, Michael Bay, etc. shine.

11-07-08 | Reviews

The Bird People in China - quite magical; a sort of allegory, in the style of the magical realism books; somewhere between an amazing adventure and a daydream. It does come off the rails a bit (the repeating of that song over and over is very tiresome), but overall I found it delightful and mesmerizing. Partly because the setting is just so ridiculously beautiful and other-worldly. It's by Miike who is now one of my absolute favorite directors (Last Life in the Universe, Gozu, Audition). Most of Miike's work seems to be shitty mass-market Yakuza flicks for the Japanese market, but he's also got some of this really weird surreal stuff that I love.

Spain : On the Road Again. Hello!? Gwyneth Paltrow? WTF is this bitch doing here? She's a boring drag, she has zero charisma, she knows nothing about food or Spain, she's really not very attractive either, they're fucking driving around in Mercedes and having retarded conversations about nothing. They're showing me Gwyneth Paltrow getting a fucking spa treatment at a fancy hotel, WTF!? Ugh. I like Batali and Bittman, and I am in love with Spain right now, and I love food travel shows, but please please just focus on the men and the food.

Fleet Foxes. I know I reviewed this before, but I want to write more. Fleet Foxes are often now smugly described as what would've happened if the Beach Boys hung out in the woods. Yeah, that's sort of true, but the songs are far more bare and raw and emotional than the Beach Boys. Yes, the four part harmony is similar and just as magical, but Fleet Foxes uses it better, because Robin's voice on its own is so beautiful and earnest, then when the four part harmony kicks in it's like an angel's chorus, it's literally a dopamine explosion inside your head as it hits some primal happy nerves. I find the Fleet Foxes album shares a lot with the Arcade Fire actually in terms of the raw exuberance, the ecstatic joy of music. The songs tell of all the moods of the country. You feel like you're out in the middle of nowhere living in an ancient pastoral life. The hope of the sunrise on a country morn. The lazy freedom of strolling through the fields on a hot summer day. The insular closed in togetherness of long country nights. The stark loneliness of snowed in winter. The darkness and mystery of the deep woods. The occasional visit to far away friends, the connection to family through the generations, and all the quiet time for contemplation in the rain.

A Streetcar Named Desire - Finally saw the movie. I've never read the play or anything, but I've seen the Simpsons musical spoof of Streetcar and so many other references that I knew all the lines and pretty much exactly knew the story already. It's actually quite amazing. The thing is, the direction is horrible, it's so cheesy and over-wrought, as horrible plays tend to be, and all the acting is just atrocious - *except* for Brando. Everyone is stuck in this ridiculous theatrical style (and early movie style) of over-acting, and Brando is just there, totally casual. His first scene he just barely mumbles out the words as if he couldn't be bothered to enunciate. He's so natural, and the contrast of how bad everything else is just makes it even more powerful. Brando's like a quivering ball of masculine energy, it's a perfect role for him, but the movie is only mediocre. Fortunately he did make one great movie in his virile youth before he turned into a fat old weirdo - "On the Waterfront" is the rare classic that holds up to modern viewing.

11-07-08 | Randoms

I'm really sick and kind of out of it, so this might not be totally coherent, but here goes :

I randomly found this : Nick Chapman's renderer blog ; he's just some dude who wrote a ray tracer and blogged about it; and now apparently it's a commercial ray tracer called Indigo . Anyway the blog is amusing and has lots of links to good papers and such. I've always wanted to write a physically accurate raytracer with all-frequency light propagation. It seems pretty easy and fun.

Drew asked me about random numbers so I was searching around and found Pixelero . Pixelero has lots of really awesome Flash code. In particular he's got some nice graphs of the distributions you can make by doing some simple transforms on random numbers : here . More on this in a second.

GoodPracticeRNG by D.T. Jones is a good short simple paper on random number generation and talks about the basic issues for uniform random generation.

Drew asked about generating gaussian or other probability distributions of random numbers. I'm mainly interested in approximate distributions that are tweakable and useful for random events in games. We don't really want true Gaussians that have infinite domain, we want something that's kind of like a Gaussian over a finite range, with a hump in the middle and lower at the edges.

There are lots of methods for this in the literature, but it's mainly pretty hard core and exact stuff. I'm interested in clever little hacky tricks. If you want some background in the serious stuff, the most awesome reference is : Non-Uniform Random Variate Generation book by Luc Devroye . This handbook seems to be a short version of some of the same material. The beginning is a good read for anyone. ( new link )

I'll briefly mention that if you want a true Gaussian, everyone likes the polar form of the Box-Muller transform. You can look on Wikipedia for more info about this stuff.
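For reference, here's a minimal sketch of that polar form in Python. The function name gaussian_pair and the rng parameter are just my own illustration, not from any particular library; it returns two independent unit Gaussians per call.

```python
import math
import random

def gaussian_pair(rng=random):
    """Return two independent standard normal samples (polar Box-Muller)."""
    while True:
        # Pick a point uniformly in the square [-1,1] x [-1,1] ...
        u = 2.0 * rng.random() - 1.0
        v = 2.0 * rng.random() - 1.0
        s = u * u + v * v
        # ... and keep it only if it lies strictly inside the unit disc
        # (and is not the origin, which would break the log below).
        if 0.0 < s < 1.0:
            break
    scale = math.sqrt(-2.0 * math.log(s) / s)
    return u * scale, v * scale
```

The rejection loop throws away about 21% of the points (the corners of the square), which is the price for avoiding the sin/cos calls of the basic form.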

Transforms are a handy way to distort probability distributions. Typically you'll be generating a uniform random with a standard random number generator, then you want to transform it to create a nonuniform distribution.

Say you want to generate a random variable X with a probability distribution P(X). You want to do a random draw where the chance of each X is P(X). To do this you sum the P's to make a cumulative probability function, C(X) = P(X) + C(X - step). (or C(X) = integral P(X)). I assume we're on the interval [0,1] so C(0) = 0 and C(1) = 1.

Now you just generate a random R = rand() in [0,1] , and you find what "bucket" it goes in, that is find X such that C(X - step) < R < C(X).

That search could be slow, but notice that if we could invert C, then it would just be a function call : X = C^-1(R).
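The bucket search version can be sketched like this, assuming the distribution is given as a list of (possibly unnormalized) weights, one per bucket. make_cdf and draw_bucket are hypothetical names; the search is a binary search via the standard bisect module rather than a linear walk.

```python
import bisect
import random

def make_cdf(p):
    """Running sum of the weights in p, scaled so the last entry is 1."""
    total = float(sum(p))
    cdf, running = [], 0.0
    for w in p:
        running += w
        cdf.append(running / total)
    cdf[-1] = 1.0  # guard against rounding in the running sum
    return cdf

def draw_bucket(cdf, r):
    """Index of the first bucket i with r < cdf[i], for r in [0,1]."""
    # bisect_right returns the first index whose CDF value is greater than r;
    # the min() only matters for r == 1.0 exactly.
    return min(bisect.bisect_right(cdf, r), len(cdf) - 1)

# Usage: bucket = draw_bucket(cdf, random.random())
```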

Now, some cumulative probability functions are explicitly invertible. Luc Devroye lists a few common useful ones in his book. Others are not invertible directly, but by adding an extra variable or munging them somehow they are invertible, such as the Gaussian with the Box-Muller transform.

Low-order polynomials are invertible, but they're a pain. If your probability distribution is cubic, like a bezier curve, the CDF is quartic, and to invert it you have to solve a quartic equation analytically. That's possible but ugly. For example, a simple hump probability would be P(x) = x*(1-x) (unnormalized). The CDF is 3x^2 - 2x^3 (normalized). You can invert that directly by solving the cubic, but it's ugly.
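For that particular hump the cubic solve collapses into a compact trig form. This is a sketch of that one special case only (the function names are mine); it uses the identity sin(3a) = 3 sin(a) - 4 sin^3(a) to solve 3x^2 - 2x^3 = R on [0,1].

```python
import math

def hump_cdf(x):
    """Normalized CDF of P(x) = 6*x*(1-x) on [0,1]."""
    return 3.0 * x * x - 2.0 * x ** 3

def hump_cdf_inverse(r):
    """Solve 3x^2 - 2x^3 = r for the root x in [0,1]."""
    # Substituting x = 0.5 - sin(a) turns the cubic into
    # 0.5 - 0.5*sin(3a) = r, so a = asin(1 - 2r) / 3.
    return 0.5 - math.sin(math.asin(1.0 - 2.0 * r) / 3.0)

# Usage: x = hump_cdf_inverse(random_unit_float) draws from the hump.
```

A general cubic or quartic CDF won't simplify this nicely, which is the "ugly" part.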

A handy simple case is to approximate the CDF as piecewise linear. Then you can trivially invert each piecewise linear portion. So you just look at R and select the linear region you're in and then do a simple slope linear invert thing to make X.
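A rough sketch of the piecewise linear invert, assuming the CDF is stored as knot positions xs and strictly increasing CDF values cs at those knots, with cs[0] = 0 and cs[-1] = 1 (that layout and the function name are my assumptions):

```python
import bisect

def invert_piecewise_linear(xs, cs, r):
    """Return X such that the piecewise linear CDF through (xs, cs) equals r."""
    # Find the segment [cs[i], cs[i+1]] that contains r.
    i = bisect.bisect_right(cs, r) - 1
    i = max(0, min(i, len(cs) - 2))
    # Linear invert inside that segment.
    t = (r - cs[i]) / (cs[i + 1] - cs[i])
    return xs[i] + t * (xs[i + 1] - xs[i])
```

With xs = [0, 0.25, 0.75, 1] and cs = [0, 0.1, 0.9, 1], 80% of the draws land in the middle half of the range, which is a crude three-segment hump.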

Another approach is to just to play directly with C^-1. C^-1 is a kind of unit interval remapper. It has the properties C^-1(0) = 0 and C^-1(1) = 1. (for a symmetric probability distribution it also has other constraints). We should be quite familiar with unit interval remappers and we can just play around with things here. You can tweak directly on C^-1 and see what kind of shape of probability distribution that gives you.

For example, C^-1(t) = t , the identity function, is just uniform constant probability. Anywhere that C^-1 is flatter corresponds to higher probability, anywhere it's steep corresponds to low probability. To make something like a Gaussian that has low probability at the edges and high probability in the middle, what you want is a symmetric function that's steep at first, then flattens out, then is steep again. Like a "smooth step" but reflected across the 45 degree axis.

One such function is "sinh". You can look at graphs of sinh to see what I mean. It's not ideal, I'm sure there are better choices to play with, but sinh works. (btw you might think x^3 would work; no it does not, because it becomes completely flat at X = 0 which corresponds to infinite probability).

You can create various shapes with sinh by using different parts of it : curve = sinh( x * K ) / sinh( K ) , where K is a parameter that affects the curve. As K -> 0 it becomes linear, as K gets larger it becomes more severe.
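Here's a sketch of using that curve as C^-1. I'm assuming x runs over [-1,1] so the curve is symmetric about the middle, and I remap to and from the unit interval around it; that recentering and the name sinh_remap are my additions.

```python
import math
import random

def sinh_remap(r, k):
    """Map a uniform r in [0,1] to [0,1], bunching results near 0.5 as k grows (k > 0)."""
    u = 2.0 * r - 1.0                        # recentre to [-1, 1]
    curve = math.sinh(u * k) / math.sinh(k)  # steep at the ends, flat in the middle
    return 0.5 * (curve + 1.0)               # back to [0, 1]

# Usage: x = sinh_remap(random.random(), 4.0)
```

With K = 4 roughly 83% of the draws land in [0.25, 0.75]; with K near zero it's back to uniform.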

Another option is to use a cubic curve. For a symmetric probability distribution you also have that C^-1(0.5) = 0.5 and you only need to design half the curve, the other half is given by reflection. You thus can design with f(0) = 0, f(0.5) = 0.5, and then set the slope you want at 0 and 0.5 and that gives you a cubic. The slope inversely corresponds to setting the probability P(X) at those points.
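One way to build that cubic is a standard Hermite segment on [0, 0.5], reflected for the other half. This is a sketch with names of my own choosing; note that not every pair of slopes gives a monotonic curve (keeping both slopes between 0 and 3 is a safe range here, since the average slope of the segment is 1).

```python
def half_cubic(x, slope0, slope_mid):
    """Cubic Hermite on [0, 0.5] with f(0)=0, f(0.5)=0.5 and the given end slopes."""
    h = 0.5
    t = x / h
    h10 = t ** 3 - 2.0 * t ** 2 + t    # weights the slope at x = 0
    h01 = -2.0 * t ** 3 + 3.0 * t ** 2  # weights the value at x = 0.5
    h11 = t ** 3 - t ** 2               # weights the slope at x = 0.5
    return h10 * h * slope0 + h01 * 0.5 + h11 * h * slope_mid

def symmetric_cubic_remap(r, slope0, slope_mid):
    """Full C^-1 on [0,1]: the half curve plus its point reflection about (0.5, 0.5)."""
    if r <= 0.5:
        return half_cubic(r, slope0, slope_mid)
    return 1.0 - half_cubic(1.0 - r, slope0, slope_mid)
```

Something like slope0 = 2.5, slope_mid = 0.4 gives low probability at the edges and a hump in the middle.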

The mathematics of combining random numbers is quite interesting. Maybe if I wasn't sick this would be more obvious and less interesting, but right now it seems really cute.

A random unit float is a box filter. That is, if you plot the probability distribution it's a square step on [0,1].

Adding random numbers is equivalent to convolution of the probability distributions.

Add two random unit floats and you get a linear step filter, that's a convolution of a box with a box. If you add another random unit, you convolve with the box again and get a quadratic step filter. Add another random you get a cubic. Note that these are also the polynomial approximations of the Gaussian. I believe I've talked about building up the Gaussian from polynomials like this before. In any case it's in my Filter.h in cblib.
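A tiny sketch of the adding trick. I divide the sum by n so the result stays in [0,1]; that rescale is my choice for convenience, the shape is the same box-convolved-with-itself curve.

```python
import random

def sum_of_uniforms(n, rng=random):
    """Average of n uniform [0,1) draws: n=1 is the box, n=2 the tent, n=3 quadratic, ..."""
    return sum(rng.random() for _ in range(n)) / n
```

The mean stays at 0.5 and the variance shrinks as 1/(12n), so more terms gives a tighter hump.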

To lerp probability distributions you randomly select between them, e.g. lerp(P1,P2,t) is if ( rand() < t ) return rand2 else return rand1, so that t = 0 gives P1 and t = 1 gives P2.
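As a sketch, assuming the usual lerp convention (t = 0 is all P1, t = 1 is all P2) and that each distribution is handed in as a zero-argument draw function; lerp_draw is a made-up name:

```python
import random

def lerp_draw(draw1, draw2, t, rng=random):
    """Sample from the mixture (1-t)*P1 + t*P2 by picking which generator to call."""
    if rng.random() < t:
        return draw2()
    return draw1()

# Usage: mostly uniform, with a quarter of the draws from a tighter hump.
# x = lerp_draw(random.random, lambda: (random.random() + random.random()) / 2, 0.25)
```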

Pixelero showed some interesting stuff with multiplying randoms. Apparently addition is convolution, I'm not sure what multiplication does exactly.

You can also create distributions by doing random selection vs. other randoms, stuff like x = rand1(); if x < rand2() return x; By using functions on the conditional you can create lots of shapes, but I'm not sure how to understand what kind of shapes you get. Luc Devroye's book has got some complex conditional tests like this that generate all kinds of shapes of probability distributions.
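For the simplest case, where both randoms are plain unit floats, the shape can be worked out: x is kept with probability 1 - x, so the result has density 2*(1 - x), a ramp with mean 1/3. A sketch, assuming a rejected draw just tries again (the one-liner above doesn't say what happens on failure):

```python
import random

def draw_conditional(rng=random):
    """Draw x uniformly and keep it only if it beats a second uniform draw."""
    while True:
        x = rng.random()
        if x < rng.random():
            return x
```

Swapping either rng.random() for some other generator or function changes the acceptance probability and therefore the shape.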

Some other good links I found : basic Gaussian rand , stocc source code with lots of random generators , anuran automatic code gen for random generators

11-05-08 | Stuff

WTF : Butt lifting undergarment . (don't ask how I found this)

I've now got my email forwarding through gmail, so if you have trouble mailing me, you can post a blog comment here. I had something wrong in my SPF record and found this nice page on IP addresses . In particular I didn't realize the part after the slash was a bit count.
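The bit count thing is easy to poke at with Python's standard ipaddress module. This is just an illustration using a documentation address range, not my actual SPF record; describe_cidr is a made-up helper name.

```python
import ipaddress

def describe_cidr(cidr):
    """Return (prefix bit count, equivalent netmask, number of addresses covered)."""
    net = ipaddress.ip_network(cidr, strict=False)
    return net.prefixlen, str(net.netmask), net.num_addresses
```

So describe_cidr("192.0.2.0/24") says the first 24 bits are fixed, which is netmask 255.255.255.0 and covers 256 addresses.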

The mail forward is adding a lot of header records that show all the intermediate email addresses; I'd like those to not show up, and for it to just look like you're getting normal mail from my primary address.

11-04-08 | System Restore

System Restore seems like a nice idea, but so far as I can tell it's completely worthless. I don't even mind that it's rather a large performance hit because it watches your drives all the time and copies files into the restore dir. The big problem is that it stomps on your data files when you try to do a restore.

My lappy's gone weird for some reason, it no longer automatically changes resolution when I plug in an external monitor. I have no idea how that was working in the first place, but it was working magically perfectly; when I plugged in my external it would res up to 1920x1200, then when I unplugged it, it would switch back to the native lappy LCD res of 1450x1050. Now when I plug in the external it's staying at the same res and scaling, and I have to manually tell it to change res. Urg! What happened?

Anyway, I tried jumping back with system restore. I tried running TextBlog and nothing was working. WTF all the batch files I wrote are gone!? The exes I compiled are gone?! And of course the C# VS I installed doesn't work (that one's no surprise). I want you to just restore the registry and the Windows dir, not all the stuff I wrote by hand all over my drive!

11-03-08 | Curves

I finally read Ignacio's talk on displaced subdivision with the NV geometry shader pipeline. You can find the whole talk with audio and such online but I like just the PDF . There's a lot of good stuff in there. A lot of the issues are general issues for low-high poly work, it's just that most people so far have been just ignoring the problems and creating meshes with bad discontinuities.

Ignacio's stuff is based on the Charles Loop paper "Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches". It's funny that it's really just a fancy version of the PN Triangles hack; the actual geometry surface made is not continuous, but there's a separate tangent field over it which makes the shading look smooth.

While there I noticed a bunch of Loop papers I hadn't read. The Loop-Blinn thing from 05 was particularly interesting because I've talked to a few people in the last N years about rendering fonts better. Thatcher worked on it a bit at Oddworld, and recently Sean's been doing it at RAD for our new GUI product. The Loop-Blinn paper describes how to use a pixel shader to shade quadratic or cubic curves. The cubic stuff seems like rather a pain in the butt, but piecewise quadratic is certainly good enough for glyphs. Also they sort of gloss over the details of anti-aliasing, which is of course crucial; there's not much point in being this careful with curves unless you're going to get the antialiasing right.
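The quadratic case is simple enough to sketch on the CPU. As I understand the technique, the three control points of a quadratic Bezier get texture coordinates (0,0), (0.5,0), (1,1), the rasterizer interpolates them, and the pixel shader just checks the sign of u*u - v. The Python below fakes the interpolation with barycentric weights; the function names are mine and this ignores antialiasing entirely.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb

def quadratic_implicit(p, p0, p1, p2):
    """u*u - v at p: zero on the curve, negative on the chord side, positive on the control point side."""
    w0, w1, w2 = barycentric(p, p0, p1, p2)
    u = 0.5 * w1 + w2   # interpolated from (0, 0.5, 1)
    v = w2              # interpolated from (0, 0, 1)
    return u * u - v
```

Which sign counts as "inside the glyph" depends on whether the curve segment is convex or concave relative to the shape, so a real implementation flips the test per triangle.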

The MDK Curvy Blues blog covers the same material and claims there are some mistakes in the Loop-Blinn paper. There's also a nice GLSL conics paper and demo app of the Loop-Blinn technique.

Zheng Qin, Michael McCool and Craig Kaplan seem to be doing a lot of work on this stuff at Waterloo. I sort of assume you're familiar with the simple Valve Alpha-test Distance Field thing ; the Qin work is the basis of the Valve technique and is more robust and general, but much more complicated. The Valve technique is probably okay in game development scenarios where artists are monitoring how their work looks in game and are used to tweaking with weird algorithms, but it definitely has parameters that you have to tweak, and the use of single (or even two) distances can cause bad degeneracies if you're not careful, whenever you have details that are close to the size of a pixel in the distance map.

11-02-08 | Fucking HTPC

I'm afraid I can't recommend a HTPC to anyone.

The fact that it's a computer and I know I *could* fix it if I put a ton more time into it makes it even more frustrating, because it's a giant fucking todo list sitting in my living room. When I'm sick of coding and stress and todo lists and I just want to zone out on my TV, the thing is a never ending list of quirks and bugs and missing features and bad settings.

11-02-08 | The Most Important Election of Our Generation

The Most Important Election of Our Generation was in 2000, and we completely fucking blew it. We had a chance to elect Al Gore. We could be spending money on research on alternative energy, we could be providing health care and cleaning up the environment, we could have worked with international institutions to deal with foreign problems. Instead we elected the fucking W.

2004 was the next most important election and we fucked that up too.

This one is less important for two reasons - 1) the Republican choice is not as pure evil as the last two, and 2) they have very little chance of winning.

The really important things to fight for are things that 1) make a big difference and 2) have a high chance of going the wrong way.

So let's stop patting ourselves on the back. Getting things right after your stupid mistakes have blown up in your face is not really anything to be proud of.

I also get the feeling that some liberals are very pleased to be electing a black man. Oh, how very enlightened we are. How magnanimous of us whites sharing our stranglehold on power with other races. Think of how impressed the Europeans will be.

11-01-08 | ZOMG PAD S5

Poker After Dark Season 5 looks fucking awesome. Tournaments are such balls, cash deep stack no limit is the one true poker. The show production is still not as good as High Stakes Poker but this lineup is sick, and the fact that it's 6-handed should mean better play than the donkey shit they do on HSP :

Tom Dwan = Durrr = huge online winner, 22 years old, 2+2er, expert at NLHE and PLO

Antonius = luigi66369, Tryharderfish, AllTheWomen, I_Knockout_U = perhaps the best in the world

Ivey you know, maybe could be the best but gets bored of poker

Eli is a businessman and a huge donkey donator, one of the guys that drives the big game (like Sammy and Guy)

Howard is a nit and mediocre player but a nice guy, presumably he just inserted himself because he owns The Show

Ilari Sahamies = Ziigmund = online PLO specialist, huge nutter, crazy table talker, crazy tilt shover

This is such an awesome action table and it's full of actually good players unlike the donks they usually show on TV. Durrr and Antonius are probably the best two players in the world right now. Sometimes Antonius and Ivey get bored at these TV tables and don't really pay attention, so hopefully they'll actually be interested in this game. Also Ivey has always refused to play cash games on TV with hole cam cards because he didn't want to give up any information; glad to see he's finally given up that misconception.

A very standard Ziig-Durrr hand ; they both have monster hands here so it's not very interesting, with that flop they're guaranteed to get all in (with how aggressive they are).

Here's a taste of Ziig going nuts (he's famous for playing the top games drunk) :

Gus Hansen: you cant get lucky every time
Ziigmund: fcuk u r retard
Ziigmund: how u can say so?
Ziigmund: u had again 6 out
Ziigmund: f tilt retard
Gus Hansen: wauw
Gus Hansen: ups
Ziigmund: really i mean
Ziigmund: everybody says how bad u r
Ziigmund: its f omg
Gus Hansen: that was easy
Gus Hansen: I figured he would be back
Ziigmund: what do u think....all big pots what that retard has won......he has been always so much underdog
Ziigmund: 18k pre
Ziigmund: then all in
Gus Hansen: ups
Ziigmund: look at this f bull s hit
Ziigmund: hey gus
Ziigmund: how u think
Gus Hansen: yes
Ziigmund: u r f retard
Ziigmund: FU
Ziigmund: FU
Ziigmund: FU
Gus Hansen: im starting to like this game
Ziigmund: everybody see how f retard that guy is
Ziigmund: gus
Gus Hansen: yes
Ziigmund: everybody see how f retard that guy is

Ziigmund: come to play 200 400
Ziigmund: i ll give u 20 prosent back
Ziigmund: your losses
Mike Matusow: n ty
Mike Matusow: i jsut got 5ich again dont watn to go broke to fast

Ziigmund: gus is fish
Ziigmund: so f bad
Ziigmund: and looks like chec republic man
Gus Hansen: you might lose a coin flip one day Ziggy

Ziigmund: gus is from czech rebubplik
Ziigmund: from ROMANIA HE IS
Ziigmund: gus likes eurovision
Ziigmund: gus should sing in eurovision..team chez romanisa
Ziigmund: gus is singer from romania
Ziigmund: czech rebuplic singer from eurovision gay bar GUSTAV HANSEN
Gus Hansen: no i dont think so

10-31-08 | Mythbusters

Mythbusters is so fucking retarded, the guys have absolutely no idea WTF they are talking about. I think Adam and Jamie are really amusing characters, but they should be buddy cops fighting crime or something, not pretending to do experiments that are completely bogus and drawing nonsense conclusions from incorrect assumptions and faulty logic.

Just about every episode is wrong, but the Motorcycle Flip has made me mad enough to write. They claim that it's physical fact that you can't flip a motorcycle over the front wheel by braking the front wheel. YouTube says O RLY !? . And then to just drive their wrongness into the ground they pull out the condescending attitude and tell us that it's "scientific fact" that the forward momentum of the bike cannot be turned into upward motion. Oh thanks for the complete nonsense, guys. That might be true if the braking force was applied at THE CENTER OF MASS, and not on the bottom of the front wheel, which creates a torque around that point, giving the bike a huge angular momentum, which can in fact flip it into the air.
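The torque argument is a one-liner if you treat the bike and rider as a rigid body. Take torques about the front contact patch: the deceleration a acts through the center of mass at height h and tips the bike forward, while gravity acts at horizontal distance d behind the contact patch and holds the rear down. The rear wheel lifts once a*h > g*d. A sketch, where the function name and any numbers you feed it are purely hypothetical:

```python
G = 9.81  # gravitational acceleration, m/s^2

def tipping_deceleration(com_height_m, com_to_front_contact_m):
    """Braking deceleration (m/s^2) above which the rear wheel starts to lift."""
    # Tip-over begins when a * h > g * d, i.e. a > g * d / h.
    return G * com_to_front_contact_m / com_height_m
```

So a bike whose center of mass is as high as it is far behind the front contact patch starts to rotate at 1 g of braking, provided the front tire has enough grip to deliver it.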

It's just such typical Mythbusters.

Step 1. Concoct an experiment that's totally uncontrolled and doesn't actually recreate the Myth and wouldn't prove anything

Step 2. Draw incorrect conclusions from the experiment that don't really follow from what was seen anyway.

Step 3. Jabber about some cockamamie misunderstanding of science that somehow "explains" or "proves" the nonsense that they told us.

Ugh. Just make cool stuff and shut up about it please.

10-31-08 | The 520 is my nemesis

Holy shit :





One accident per hour. Good job people. I believe people should be charged for the cost to their peers when they block a bridge. You cost 10,000 people 30 minutes at $50/hour, you get a bill for $250,000. That money should go into a fund for building new bridges.

In general I'm a big believer that people should pay a fair price for the consequences of their actions. You want to live in a high fire risk zone? Okay, you pay for the firemen to do all the firebreak cutting, you pay for them when they have to come out, and you pay the families $10 million if a fireman dies.

The government should not be in the business of subsidizing retards who make poor life choices, and we the rest of the populace shouldn't have to foot their bills.

I also believe that when somebody does something retarded like runs out of gas on the 520, you should be allowed to pull over and kick them in the nuts.

My other idea is that we should put all our fancy military hardware to an actual productive use. If you break down on the 520, you better jump in the lake really fast, because a predator is about to disintegrate your car. Blammo! And the rest of us can drive through the debris. None of this fucking around with sending cops and tow trucks out to the scene.

10-31-08 | Pirates

Yarrrr. Shiver me timbers. I just went pee and there was a pirate in the bathroom. Then I came back to work and was doing some searching for things and found this forum with an amusing discussion of UT3 Todos - the most awesome thing about it is that all the posters write like they're pirates! WTF!?

10-31-08 | Junk

The new blog makes me feel like I can't just write drivel and nonsense like I normally do. I feel more aware that I'm being watched and the personal nonsense seems more inappropriate than ever.

My lappy's been acting kind of weird so I popped open the Event Viewer. WTF there's a ton of bizarre errors. Something about NetBIOS browser masters having elections !? Some stuff about NAT and internet connection sharing; WTF I have ICS disabled. Ugh. God damn I hate computers. The fucking HTPC keeps having retarded problems that I find so damn frustrating. Every time I look at it, just sitting there all smug lording over my living room I want to just beat it up. Fuck you for not showing media info on the LCD. Fuck you for randomly deciding to un-maximize the video player window. Fuck you for skipping ahead 4 fucking minutes when I hit the jump ahead button on video.

I'm so fucking sick of software and all its problems. My email is full of spam.

My favorite things have been shit recently.

The last Good Eats about meat pies : the good thing about Good Eats is when Alton drops some interesting science or techniques, the horrible part is the cheezy fucking skits and his cringe-worthy idea of humor. I know, why don't we make a whole episode that's 100% skit! Urg!!

Anthony Bourdain : No Reservations. The great part about the show is seeing the real cuisine of different countries, the travel, and in fact it's 90% about his guides, some of the episodes suck when he has a lame guide (like that Russian guy) but some of them rock when he has a really good guide (Greece, Korea). The worst part is Tony's pretentious pseudo-intellectual grandiose voice over. You just have to roll your eyes and laugh and try to ignore his philosophizing. I know, let's make a whole episode called "At the Table" where we remove all the good parts (the travel) and just have Tony sit and spew retarded cocktail party conversation for an hour.

And then the fucking Buzzing Fly show is fucking talk radio with Justin Martin. I like Justin's music quite well and I understand Ben wants to pimp his label but this is disgusting. The questions are like a Playboy Playmate interview. What are your turn ons and turn offs? How did you get into making music? What do you like to do other than make music? And his answers are even sadder, apparently he's a complete moron. I almost always hate seeing interviews with musicians because their total retardation ruins my illusion that they're these interesting cool smart people locked away in a dungeon creating great works. I don't like reading about movie directors or actors or writers. I want to enjoy the art that they create without my head being filled with their own thoughts on it, or the hollywood gossip about them, or anything except the art product itself.

10-30-08 | Oodle Rev Tracking

I'd like Oodle to work really nicely with running the "compiled" version all the time, and server compiles and local changes, but I'm starting to think it's impossible.

The way I would like it to work :

A server somewhere builds nice optimized bundles of the levels from checked in content
The server puts those on a share somewhere

Clients (artists, etc.) sync to the server bundles

When a client runs the game, by default it loads from the server bundles

That has many advantages - they get fast loads, and the game runs like it will on ship, so the artists can
see paging problems, memory use, all that good stuff.  I'm a big believer that having your developers run as
similar to the real shipping product as possible is a huge win.

When clients have newer local copies of content, the game loads those individual files as patches
on top of the server bundles.

Thus clients can make changes and see them instantly in the game without rebaking the whole level,
but they still get most of the advantage of using the compiler server bundling.

That's all great in theory, but it falls apart when you start looking at this question of which files go in the patch. That is, which files should I load from the compiled bundles and which should I load singly as local overrides.

The first idea is to use modtimes. That falls down quickly even in very simple cases. Just using the modtimes of the bundles works fine on your local machine if you never get stuff from a server, but once you share files across machines the modtimes are worthless (particularly if you use something like perforce and have server modtime turned off, which you probably do; the perforce modtime option is strongly discouraged by perforce and by me).

One option is just to punt and not automatically load the client versions. Make the client take an extra step to get the local overrides to load. One way to do that is to just go ahead and use modtime, which will fail sometimes, but tell them to "touch" a file to get it in the game. You could provide helpers to make that easier, like for example looking at the p4 open for edit list and making sure those all come in as overrides.
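
The "modtime plus helpers" punt could be sketched like this. It's Python for illustration only, not Oodle code; `pick_overrides` and its arguments are hypothetical names, and the open-for-edit list is assumed to have been fetched from the version control client by some other means.

```python
def pick_overrides(local_files, bundle_mtime, opened_for_edit=()):
    """Return the local files that should load as overrides on top of a bundle.

    local_files: dict mapping path -> local modification time (seconds).
    bundle_mtime: modification time of the server bundle.
    opened_for_edit: paths that are always treated as overrides
        (hypothetical stand-in for an open-for-edit list).
    """
    forced = set(opened_for_edit)
    overrides = []
    for path, mtime in sorted(local_files.items()):
        # The modtime test can give wrong answers across machines,
        # which is why the forced list exists as an escape hatch.
        if path in forced or mtime > bundle_mtime:
            overrides.append(path)
    return overrides
```

A file whose modtime is wrong can still be pulled in by touching it or by putting it on the forced list, which is the extra manual step described above.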

10-30-08 C#

C# is obviously a copy of Java; MS wanted complete control and didn't like the open standard (and Sun is a bunch of bitches so I don't really blame MS on this one). Anyway, there is a certain simplicity and cleanliness to both C# and Java, and I think it almost all comes from the fact that they force you to stop doing silly micro-optimizations and worry more about your higher level code.

Everything that C# does that makes it clean you could do in C++. Initialize every value. Use garbage collection. Make everything an allocated GC'ed object so that you can put it in containers. Make everything derive from Object so you can put them in super generic containers. Dynamic cast between objects & throw exceptions. I think all of that is quite sensible. It makes code clean and safe, easy to write and easy to use.

But the vast majority of C programmers resist those things. They insist on using things like objects on the stack or char [] buffers for strings for the sake of efficiency. They don't initialize values when they "know" they don't need to, for efficiency. Once in a while that's actually important, but most of the time it's a mistake.

The nice thing about C# is you no longer have those options - and best of all, you stop thinking about it ! Even when I'm writing C++ I find myself thinking "is this ok? is this fast enough? am I using too much overhead?". In something like C# you have no option so you just stop worrying about it and that frees up your mind to focus on getting things done.

10-29-08 Blog Ho!

Sweet! Somebody at Google flipped my switch and my blog is now getting up. (URG; update : it looks like my limit is now 1000 posts a day and I just ran out). I did some fixes in TextBlog and it's running pretty nicely now. One fix was to try-catch every network op and retry them a few times, because the server can just randomly drop connection once in a while. The other fix was to only query the list of entries once instead of doing a stupid N^2 loop.
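
The retry fix is roughly the pattern below. This is a Python sketch, not TextBlog's actual code; `with_retries`, its parameters, and the choice of `OSError` as the "retryable" error type are all assumptions made for the example.

```python
import time

def with_retries(op, attempts=3, delay_seconds=0.0, retry_on=(OSError,)):
    """Call op() and retry it a few times if it raises a retryable error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt + 1 == attempts:
                raise  # out of retries: let the last error propagate
            time.sleep(delay_seconds)
```

Every network call then goes through the wrapper, for example `with_retries(lambda: upload(post))`, where `upload` is whatever hypothetical function does the real network op.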

It's still a bit slow to just upload one post. The problem is that I have to get the list of all the blog entries to see if you're doing an update or a new post. Getting the list is slow because I can only get 25 at a time and I have to get 25, follow the next page link, wash rinse repeat. Just getting the list of all the posts takes something like 10 seconds. I guess if I was confident that my local disk cache of the posts was accurate, then I could just use the local cache and skip that completely, but I'm trying to be extra safe for now.
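
Walking the pages once and then building a lookup looks something like this sketch. `fetch_page` is a hypothetical callback standing in for the real blog API request (it returns one page of entries plus a token for the next page, or None at the end); the real API's shape is not shown here.

```python
def fetch_all_entries(fetch_page, first_token=0):
    """Follow the 'next page' chain once and collect every entry."""
    entries = []
    token = first_token
    while token is not None:
        page, token = fetch_page(token)
        entries.extend(page)
    return entries

def index_by_title(entries):
    """Build a title -> entry map so update-vs-new is a single lookup."""
    return {entry["title"]: entry for entry in entries}
```

With the index built once, deciding whether a post is an update or a new post is a dictionary lookup per post instead of a fresh scan of the list each time, which is where the N^2 went.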

Speaking of extra safety, it's something I'm a little worried about in Oodle. The issue arises with "baked" content and original file syncing, and different versions of the game, different versions of the baker, and different versions of the original file. There are a huge host of possible problems.

The basic bad situation goes like this :

Artist updates a file on their machine

Oodle does an automatic local Bake of that file to get it in the game

They see the file in the game and are happy

At the same time, a server is running an auto-bake of the whole game using an older rev of the source file

The server checks in the bundles from the older file, but the bundle itself from the server is newer

Artist syncs to the server, the newer bundle comes down

The game prefers the newer bundle, and the artist no longer sees their edit

The fix is for the bundles to track a lot of information about how they were made. The mod time of the files they were made from, maybe the Perforce revision #, who made them, what version of the code made them, etc. The disadvantage is that I can't be minimal about loading bundles then, I have to just go ahead and load all of them to look inside and see who's got the newest version of everything.

I think during development it's better to be safe and have a slightly slower load that does scan everything and give you the best content.
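
A minimal sketch of that "scan everything and take the best" load, in Python. The record fields and the rule for "best" (highest source revision, then newest source modtime, from a matching baker version) are assumptions for illustration, not how Oodle actually does it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BakedEntry:
    source_path: str     # original file this baked data came from
    source_rev: int      # version control revision of the source (assumed field)
    source_mtime: float  # mod time of the source when it was baked
    baker_version: int   # version of the code that baked it
    bundle_name: str     # which bundle holds the baked data

def pick_best_entries(bundles, current_baker_version):
    """Scan every bundle and keep the entry made from the newest source."""
    best = {}
    for bundle in bundles:
        for entry in bundle:
            if entry.baker_version != current_baker_version:
                continue  # baked by a different baker: not trusted here
            key = (entry.source_rev, entry.source_mtime)
            cur = best.get(entry.source_path)
            if cur is None or key > (cur.source_rev, cur.source_mtime):
                best[entry.source_path] = entry
    return best
```

The point is that the comparison uses what the bundle was made from, not the mod time of the bundle file itself, so a freshly built server bundle made from an older source rev loses to the artist's local bake.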
