

12-26-10 | Fix for my stuttering HTPC

So my MediaPortal HTPC has been randomly stuttering for a few months now. Mostly it's fine, and then randomly video or audio playback will start stuttering, which is super fucking annoying (in my youth I would've smashed something, but now I just get revenge on my HTPC by buying a record player; take that fucking annoying modern electronics!).

So I started reading about stutters and of course there are a million pages on it where people randomly suggest things that don't actually help (change this or that windows setting, install new codecs, blah blah blah). I tried some of the random things that are supposed to maybe help, such as the Windows dual-core TSC hot fix, the AMD dual core optimizer, turning off cool & quiet, turning DXVA on and off, etc. None of that helped.

Of course the real answer is to diagnose the problem and fix the actual issue instead of just randomly changing things. The difficult thing about diagnosing a stutter is you can't just look at what is taking too much CPU in Resource Mon or whatever, because in the steady state it's not happening.

Every video game should have some kind of "long frame trap" mode. What this would do is trace every frame, and if the frame was faster than some threshold (33 millis for example) it would just discard the trace; if it was longer it would parse the trace and pause the game and let you see a hiprof. The idea is that what you care about is the rare spike frame, not the average. (this is a little harder on modern games with many threads and an unclear concept of "one frame takes X millis", due to the fact that many operations are deferred or spread across several frames)
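
In its simplest single-threaded form it would look something like this (a minimal sketch; BeginFrameTrace / DiscardFrameTrace / DumpFrameTraceAndPause are hypothetical hooks into whatever tracing system you have) :

#include <chrono>

// hypothetical hooks into your tracing / profiling system :
void BeginFrameTrace();
void DiscardFrameTrace();
void DumpFrameTraceAndPause();   // parse the trace, pause the game, show a hiprof

void RunFrameWithTrap(void (*DoFrame)())
{
    typedef std::chrono::steady_clock Clock;

    BeginFrameTrace();
    Clock::time_point start = Clock::now();

    DoFrame();

    double millis = std::chrono::duration<double, std::milli>(Clock::now() - start).count();

    if ( millis <= 33.0 )
        DiscardFrameTrace();        // normal frame : throw the trace away
    else
        DumpFrameTraceAndPause();   // spike frame : this is the one you care about
}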

Anyway, there's a new Windows tool I didn't know about which does something like this : xperf , which is part of the Windows Performance Analysis Tools in the Windows SDK. The tools are supposed to be Vista+ but you can actually copy xperf.exe onto an XP system and use it in limited ways.

So xperf is cool, but I didn't actually end up needing it because some digging revealed the spikes were due to DPCs (Deferred Procedure Calls). The driver DPC just seems like one of the design flaws in Windows. It means that badly written drivers can ruin your computer by causing this stuttering.

The old DPC Latency Checker for XP+ is cool, but to figure out what driver exactly is causing the stutters, you have to go to Device Man and turn on and off devices one by one. There's a new awesome one for Vista+ called LatencyMon that tells you exactly what driver is taking the time. Radical. My newer Dell Precision lappy has no DPC problems, the longest time taken is by the MS network layer (ndis.sys).

So I found the problem on the HTPC was the Asus/Ralink 802.11n Wifi card in the HTPC. The card is branded Asus, but apparently the key bits are actually Ralink. I tried various versions of the drivers for it (from Asus and from Ralink) and all had the problem more or less. I tried switching my network to 802.11g , still the problem. I tried the "WLan Optimizer" which turns off the periodic Wifi scan for networks, no help. So I yanked the card out of the machine and stomped on it. Stutters gone!

Now I have a 50 foot net cable running across the house and I love it.

Some related links :

Measuring DPC time with XPerf - Pointless Blathering - Site Home - MSDN Blogs
Watching H.264 videos using DirectX Video Acceleration (DXVA) My collection of short anime reviews
Stuttering Playback - MediaPortal Wiki
SAF v4.00 ''stable'' (StandAlone Filters) - DXVA ready (H.264 and VC-1). - MediaPortal Forum
Resplendence Software - LatencyMon DPC, ISR and pagefault execution monitor
Resplendence Software - Free Downloads - LatencyMon
ReClock 1.8.7.3 - VideoHelp.com Downloads
Ralink corp. Windows
My Solution for Dell XPS M1530 DPC Latency
What video stuff can Reclock do! [Archive] - SlySoft Forum
How to config ffdshow and reclock for HD audio, - Doom9's Forum
FAQ-WLAN Optimizer - Optimize wireless gaming, audio and video streaming...
Codec FAQ - Playback issues
DXVA - Free with the new DivX Plus H.264 Decoder DivX Labs
M-Audio Delta 1010 Troubleshooting - CPU Throttling
ATI HD Hardware Accelerated DXVA for H.264 AVC L5.0 L5.1 Zach Saw's Blog
ASUS Driver Download ASUS Driver Upgrade (Wireless LAN Security & Wardriving - 802.11)


12-22-10 | Obscure issues with modern car electronics

It would be nice if there was a car review site that actually gave you the real dirty information about cars, things like will the brakes overheat in one lap of a track, is it dialed for understeer, can you turn traction control completely off, etc. Instead we get the same shite all the time, stories about how great it felt on their favorite back road and useless 0-60 times and so on. The annoying thing is there's just nowhere to look to actually get good information on a car; what cars really can handle track abuse? what cars really give you full manual control with no limitations? what cars really have properly sorted chassis that don't have inherent understeer or snap oversteer or high speed instability?

Some things you may not know that you may want to watch out for :

Just about every car made now is on e-gas (electronic throttle). Mostly this is an improvement, because it ensures that you get a good idle throttle level, and because it means you actually get a fully open throttle for the range of your pedal movements, which often was not the case with old cable-actuated cars. However, there are problems with e-gas that the enthusiast should be aware of :

1 : One is brake throttle override. This means that pressing the brake sends a signal that cuts the throttle. The idea is that it's a fail-safe : if something goes wrong with the electronic throttle, you can still brake. Many e-gas cars already have this (all VWs, Audis, and Porsches do, for example), and it will be even more common because of the Toyota bullshit. (in fact I think it is required for 2012+ cars). In normal driving of course this is no big deal, but it is a big problem if you are trying to left-foot brake, or keep on throttle during braking to spool your turbos, or brake and throttle at the same time to control a spin, etc.

2 : Another problem is that the egas signal is often smoothed by the ECU. Basically they run a low-pass filter on the signal. This is usually done to improve emissions because sudden throttle changes lead to lots of inefficient ignition cycles which are highly polluting. In some cases manufacturers have put smoothing in the throttle to reduce driveline noise and lash. In a non-filtered car, if you are coasting along and then you suddenly slap on the throttle, you will get some clunks and grinding sounds as the driveline goes from unloaded to loaded and lots of little bits of slack gears and such knock together. In order to give cars a false feeling of "solidity" the ECU smooths out the throttle so that it very gently engages pressure on the driveline before ramping up.

Smoothing the throttle is not the worst thing in the world, because for track or daily driving you actually should be smooth on the throttle anyway. But if you're trying to kick the throttle to start a power-on oversteer drift, it's annoying.

3 : Electronic Stability Control is already in most cars, and will be mandatory in all cars in the future. Mostly it's a good thing, but when you want to play around with your car (eg. whipping the tail to do hairpin 180's), it becomes a big negative. Performance cars generally have an "off" for ESC, but it rarely actually turns it all the way off, it just puts it into "minimal" mode. For example, a lot of cars now (G37, 135, etc) use the ESC as a type of electronic LSD. That is, they have an open diff, and the ESC brakes the spinning wheel, which is the only way power is transferred to the wheel with traction. This stays on even when you turn ESC "off". Furthermore, most cars will still kick in ESC when you are braking and the car is sliding, for example Porsches (PSM) and Nissans (VDC) will both kick in the ESC even when it is "off" when you touch the brakes. Because of the "electronic LSD" and other issues, it's probably not even desirable to turn ESC completely off on most cars - they are designed to work only with ESC on.

4 : Most cars are set up to understeer. This is done for safety/liability, and it actually affects the famous oversteering brands the most, such as BMW and Porsche. Because they were so renowned for dangerous oversteer, they are now sold in just the opposite way - understeering plowing beasts. This is done through alignment settings (too much toe), suspension (too soft in the rear usually), and tire staggers. Non-cognoscenti see big tire staggers (wider rear tires than front) and think "muscle", but in reality the excess of rear grip and lack of front grip also mean understeer.

5 : Your "track ready car" is probably not actually safe to drive on the track. It's sort of pathetic that manufacturers get away with this. They sell cars with "competition package" and talk about how great they are on the track, but the vast majority of "track ready" cars are not track ready, and quite a few are downright unsafe. You need to do research on your exact car to determine what the problems are, but some common ones are : insufficient oil cooling (sends car into limp mode; affects many cars with HPFP's), insufficient brake cooling (very dangerous! affects some M3's and Nissans), cheapo brake pads, cheap lug nuts (for example Dodge SRT's are known to lose wheels), oil sloshing / oil pan starvation ; lots of cars have this problem if you put on R-comps or slicks (because of the increased cornering forces), but most are okay if you are on road tires ; there are exceptions though, for example I know the Pontiac/Holden G8 will slosh all its oil out and you'll be bathed in a cloud of blue smoke.

And of course, there's also the problem that manufacturers will deny warranty claims if you track the car, even with cars like the Dodge Viper ACR or a Porsche GT3 RS which are clearly intended as track weapons, and even when the problem is clearly manufacturing defects and not abuse. But this is totally off the electronics topic, so back to that -

There are some solutions :

1 : On most cars this can be defeated by snipping a wire that goes from the brakes to the ECU. You have to be careful about how you do this on your exact car, because you presumably still want ABS and brake lights and such. The smoothest way to do this is to find the right wire and splice in a switch, so you can turn it on and off. ( some info on Porsche EGas throttle cut )

2 : On most cars you can defeat this with an ECU flash (aka a "tune"). Most of the claims of "tunes" are nonsense (on non-turbo cars; on turbo cars they can of course up the boost and help you blow up your WRX engine) but getting rid of throttle low-pass filtering is something they can do.

3 : Similar to 1, on most cars you can defeat this by finding the right wire and splicing in a switch. On Porsches the most elegant way is to disable the yaw sensor. Put it on a switch and you now have a true "PSM off". Looks like there's a similar trick on Nissans. On Mercs there's a secret code.

4 : It's reasonably easy to undo most of this, but you do have to do some research. The obvious answer is to remove the tire stagger. The details for your car are important; you need to find an expert who knows your suspension and how it all works together. I know that for Boxster/Cayman, and for M3's (E46 in particular), going to a "square" (non-staggered) setup works very well. I wrote before about alignment a bit; and you can get stiffer rear / softer front sways. But depending on the car, you may not be able to dial out the understeer in a nice way and actually get a good handling result - if you just do it by decreasing rear grip that's not very cool.


12-23-10 | Shitty modern cars and the Frankencar solution

Last post was about modern car electronics and how they foul up your fun even when you think they're all off. I really don't like this trend, but it goes with other trends that are much worse :

Fat, big, heavy cars :
How cars are getting fatter (infographic)

Low visibility and rising waist lines :
Datsun Z old vs new
Camaro old vs new

Every time I see an old-vs-new comparison like that, it's shocking how much better the old one looks, and how much more glass there is. Let there be light!

ADDENDUM : some more :
Honda S600 vs S2000

I also think the old cars are special in a way that new cars aren't. When you see a beautiful old car, like even something as commonplace as a Porsche 356 or a Jag E-type, you actually stop and admire it. They have hand-hammered steel curves; they're real works of sculpture, not works of CAD. Old cars are just infinitely cooler too; when a new supercar goes by, you can see the crowds think "dick" and nobody really cares, but when something old goes by, women drop their babies and everyone stares.

In other "new cars suck trend", I hate the way most car makers are trying to improve fuel efficiency. The Honda CR-Z is a great example of the modern fuckup. It's heavy and weak and gets only 30-35 mpg. You can get that mileage with an old CRX with a plain old gas engine, and the car would be an absolute joy to drive. A modern Honda engine in a lightweight car could easily be tuned to be both very fun and very efficient.

The only automaker I know of that's breaking the mold is Mazda, who have announced a 2200 pound Miata and high compression small engines. Because they are too small of a company to develop a hybrid, their goal is light weight and efficient normal engines. Way to go. Unfortunately -

Government subsidies for hybrids are fucking horrible. It's completely typical of the corrupt government way of doing things - you take a perfectly reasonable basic idea, that government should encourage development of new clean technology, and help consumers buy it, okay that's fine, but then you hand out tons of cash in semi-arbitrary ways which allow you to favor certain companies over others, which completely distorts free market innovation. The most offensive subsidies to me are for Fisker and Tesla, which are self-indulgent rip-off operations by the rich for the rich. There are subsidies on both ends, and all over in other places; the car makers get cheap loans (several billion for Nissan and GM), some get direct free cash from the DOE ($500m for Fisker), then car buyers get tax credits, and car buyers get privileges like HOV access and cheap registration, etc. So sensible Mazda gets no subsidy; somebody who buys a 100 mpg Vespa instead of a car gets no subsidy; but some rich jackass who buys a Tesla even though his daily driver is an SUV gets a subsidy. (and worst of all, people who don't drive at all get no subsidy).

(though I should note that the ridiculousness of this subsidy pales in comparison to the vast stupid ethanol subsidy or the ridiculous corrupt CAFE exceptions for large trucks or large wheelbases or "flex fuel" and tax subsidies of large trucks - which of course should have tax *penalties* under a sane government).

Anyway, I'm getting off topic.

Another thing that's got me thinking about Frankencars is the price of modern car engines. I get on this train of thought where I think - maybe I should tweak my car a bit, nah that would be stupid, maybe I should trade up for a GT3, nah if I want a track car I'd rather have a Cayman, what about a Cayman-GT3?, what if I blow the engine in my car? it's $15k, what if I buy a GT3 engine, it's $30k. Holy fuck, that's too expensive for a track car, which will inevitably need a new engine at some point.

All of this has got me thinking that the idea of an old car with modern internals is more and more appealing. You can easily find old cars that are under 2500 pounds, which is a delightful nimble weight to be. Drop in a modern engine and suspension and you have a crazy power to weight ratio with none of the new car electronic horribleness.

The thing that's kind of amazing is that a custom build like that could be relatively cheap if you choose a donor car that's not very popular. Great engines can be had for peanuts. For example the wonderful Honda K20 engine can be had for around $4k with transmission and ECU and everything - just drop it in! I adore the way those Hondas rev, and in a 2000 pound car, 220-250 hp would be more than enough. And if you run it on the track and blow up your engine - no biggie, just buy another (but you wouldn't because Honda engines are made of indestribonium).

Another cool swap appears to be the Chevy LS1, which is a Corvette engine, but it's also found detuned in their SUVs and whatnot; apparently you can get them from junkyards for $500 and then pay a few K to undo the detuning, and blammo you have a vette engine you can stick in a Datsun 240Z. (you can also get the LS1 in aluminum so it's not crazy heavy). Apparently the LS series is one of the great engines of modern times; very cheap, robust, reliable, light weight, high power.

As for what donor body to put this engine in, I love the AE86 (hachiroku) and the old MR2, super light and nimble RWD twitchy cars, maybe because I watch too much Best Motoring; I really like older Japanese cars, back when they were really Japanesey, now they're just so bland and generic. The Porsche 944 is the easiest option for transplants because the chassis and suspension and brakes are actually superb - the problem with them is the engines, which you are going to replace - and they are worth almost zero dollars; unfortunately they are a bit heavy. Actually an early Boxster would be a great transplant recipient as well; they are super super cheap now because that 2.5L M96 engine is made of mud and twigs, but the chassis/layout is superb - drop in an LS1 and you have a monster. Actually maybe the 914 is the best choice.

The Opel GT is super cheap and just gorgeous, but apparently the chassis is no good so you'd have to tube-frame it; but an Opel GT - LS1 would be shweet.

Some random links related to swaps :

Welcome to Nofearmotorsports
Renegade Hybrids
YouTube - Sound EG K20
YouTube - ls1 powered datsun 240z
YouTube - 944 Porsche LS1
The Porsche® 944 V8 Conversion Manual
HMotorsonline : Specializing in JDM-USDM engines and parts...
Smog-Legal LS1 Power 1987 Porsche 944 Turbo - $14k all done
LS1 conversion - Rennlist Discussion Forums
Classic Mini Cooper - VTEC K SERIES HONDA ENGINE CONVERSION KIT FOR MINI - www.MiniMania.com
opel_gt_3.jpg (JPEG Image, 797x600 pixels) - purty and cheap
Opel GT 1968
The New Honda Integra Type R Video - just a bit about how ridiculously good Honda engines are
Toyota 2000GT - almost the ideal looking car IMO
Factory Five Type 65 - kit car eewww gross , but it's got the look
FactoryFive Type 65 Coupe
Boxster LS1
Porsche boxster ls1 swap - LS1TECH
Boxster LS1 swap Ideas needed plz - Page 2 - LS1TECH
Boxster engine swap Grassroots Motorsports forum Grassroots Motorsports Magazine

ADDENDUM : some more on 240Z swaps :
Z, Z car, 510, 240SX, Altima, Suspension, Brakes
Ultimate Z Car Link Page!
LS1 Speed Inc - Pro Touring Super Cars
Introductory Discussion of V8 conversions
Cars for Sale - HybridZ
YouTube - 240z rb26dett drift test
Early Z Cars For Sale - The Datsun Classifieds

Also, not directly related, but if you ever start thinking about buying a car and tweaking it and turning it into more of a track car - don't. You can get track cars for ridiculously cheap, and they cost a *lot* to build. For example, somebody might buy an old 911 for $30k, spend $30k in track mods, and the result is a car that resells for $25k - super tweaked out track cars with awesome suspension (Moton) and full cages and all that often sell for less than the equivalent street car. Now granted the engine and tranny may need rebuilds at some point, but even with that cost it's probably cheaper than doing it yourself.

Just as a tiny example :
Race Cars For Sale

The best values are in cars where the base car is not worth much, but they put a ton of expensive mods on it. eg. something like an old Boxster with Motons.

If you're interested in having a track car, it's a way better value to just buy something like a Miata or a 944 that's fully set up for the track already than it is to take something like your Porsche 997 or your BMW M3 and convert it into more of a track car. It's kind of an amazing fact that mods to cars do not help resale value at all - in fact they often hurt it. So if you actually want some mods, you get amazing +value by buying a car that already has them.

ADDENDUM : Another very nice base car is the Lotus Elan +2 which is very pretty and very light. One problem with these older beautiful light cars is that I don't fit very well inside them. The 914, the Opel GT, the Elan, all are problems for people over 6'.


12-21-10 | Rambles

C++0x will make lots of new styles of template code practical. It occurred to me the other day that complex template patterns that don't work in current C++ (C++98 ?) just don't even enter my mind any more. When I first started writing C++ I had all these ideas, you know, you start playing with the medium, oh I can do this, I can express this, but then you start running into limitations of the language and the compilers and practical issues, so you just stop doing those things.
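
For example (just one hypothetical illustration of the kind of thing I mean), variadic templates with perfect forwarding turn what used to be a pile of N-argument overloads into one little function :

#include <utility>

// C++0x : one function covers any number and kind of constructor arguments.
// In C++98 you'd write (or macro-generate) a separate overload for each arity.
template <typename T, typename... Args>
T * make_new(Args&&... args)
{
    return new T( std::forward<Args>(args)... );
}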

The dangerous thing is that you stop even thinking about those possibilities. Your mind becomes so locked into what is practical with your current tools that the ideal doesn't even enter your mind.

It struck me that I'm now in this position with C++, whereas when I was new to it, my mind was fresh and unbiased by this stale old knowledge. Someone who just got into C++ with C++0x would not have that irrelevant historical bias, and would jump right into doing things in new, better ways.

It also struck me that I ran into this a lot as I was coming up in software. I would talk to older practitioners that I respected about ways of doing things, and they would at times tell me that such and such was a bad idea, or that what I was suggesting wasn't possible or practical, when I was pretty sure my idea was good. In hindsight I was almost always right on the technical issue. Obviously the more experienced person has a lot to teach you about restraint and the pitfalls of coding and so on, but they can also hold you back with outdated ideas.

Winter here is so bloody fucking depressing. It starts affecting me in all sorts of weird ways that I don't realize are just winter depression. Like, I start sleeping in a lot and just generally wanting to lay about a lot; okay, that's easy to recognize as depression and I catch that one. But then I also start thinking about buying random things. I already bought a damn PS3, and just moments ago I bought new floor mats for my car for unknown reasons, and I was starting to think about getting a supercharger when I suddenly realized - my god this is just winter depressed shopping. Obviously a lot of consumerism comes from depression. And just like binging on desserts or booze or whatever, it feels good very briefly and then just feels worse than ever.

Another weird side effect of winter depression is that I start thinking about politics. When I'm running around in the summer time and I see people saying "we need government off our backs" I might briefly think "hmm, that's odd, I don't see government on our backs hurting us anywhere, in fact I see quite the opposite, a severe lack of government interference in our lives causing problems right up and down through every level of society". But in the summer time I say "oh well, whatever" and go off bike riding. In the winter I stew on it until I get angry.

I was doing some research for a political blog post that I deleted, and I realized something. When I do scientific research I try to be open to anything, unbiased by preconceptions about what the answer will be. Often I have a hypothesis about what the best approach will be, but if I find conflicting evidence I am ready to change my mind. But when I do political research, I already know the point I'm trying to prove and I'm just looking for data to back it up. I already have the idea that I want to write about in mind and I just want to find "experts" or data to make it seem more "legitimate", which of course is not really research at all (and is a pretty sleazy way to add impact to your argument, though it's beyond standard practice). Of course some people do scientific research that way - they already know the outcome they want to prove and they just keep trying until they find a study that proves it (yes I'm looking at you, medicine).

Doing things "the best" is egotistical self indulgence. Anybody can do it if they waste enough time on it. (Charles' quick recipe to doing anything "the best in the world" : find out how the current best in the world does it, first make an implementation that matches them; then read some other papers and steal some other ideas; put them into your current implementation and tweak until it provides benefit; tada!). One of the things I was really proud of at Oddworld was that we didn't try to do each thing "the best" ; obviously sometimes we self-indulged a bit once in a while, but in general I think we were pretty good at staying away from that childish competitiveness that plagues so many game developers who want to have "the best" graphics engine or whatever. The productive way is to do it well enough for your use case, and then move on to the next thing. That's the only way you can knock out tons of code quickly.

When I'm procrastinating, I start making up all these strange things that I decide are a good idea to do. I feel like I need to be busy doing something productive, I won't just let myself sit on the couch or be nice to my girlfriend, but I wind up doing things that are completely pointless; recently I started writing my own email client, and making air scoop screens for my car, and replacing all the air filters on all the appliances in the house. When I'm doing it I have no concept that I'm procrastinating, in my mind this is a "todo" item and I'm taking care of things, but then I have a moment of clarity and I'm like "whoah wtf am I doing this for?" and realize that I'm just avoiding the work I should be doing.


12-11-10 | Perceptual Notes of the Day

You have to constantly be aware of how you're pooling data. There's no a-priori reason to prefer one way or the other; when a human looks at an image do they see the single worst spot, or do they see the average? If you show someone two images, one with a single very bad error, and another with many small errors, which one do they say is worse?

There are also various places to remap scales. Say you have measured a quantity Q in two images; one way to make an error metric out of that is :


err = Lp-norm {  remap1(  remap2( Q1 ) - remap2( Q2 ) )  }

Note that there are two different remapping spots - remap1 converts the delta into perceptually linear space, while remap2 converts Q into space where it can be linearly delta'd. And then of course you have to tweak the "p" in Lp-norm.
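
To make that concrete, a minimal sketch of the err formula above (the remaps are passed in as placeholders; this isn't any particular metric) :

#include <vector>
#include <cmath>

// err = Lp-norm { remap1( remap2(Q1) - remap2(Q2) ) }
double perceptual_err(const std::vector<double> & Q1,
                      const std::vector<double> & Q2,
                      double (*remap1)(double),   // makes the delta perceptually linear
                      double (*remap2)(double),   // puts Q in a space where deltas make sense
                      double p)
{
    double sum = 0;
    for (size_t i = 0; i < Q1.size(); i++)
    {
        double d = remap1( remap2(Q1[i]) - remap2(Q2[i]) );
        sum += pow( fabs(d), p );
    }
    return pow( sum, 1.0 / p );   // or divide by count first if you want it size-independent
}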

You're faced with more problems as soon as you have more quantities that you wish to combine together to make your metric. Combine them arithmetically or geometrically? Combine before or after spatial pooling? Weight each one in the combination? Combine before or after remapping? etc.

This is something general that I think about all the time, but it also leads into these notes :

For pixel values you have to think about how to map them (remap2). Do you want light-linear (gamma decorrected), or perhaps "perceptually uniform" ala CIE L* ?

The funny thing is that perception is a mapping curve that's opposing to gamma correction. That is, the monitor does something like signal ^ 2.2 to make light. But then the eye does something like light ^ (1/3) (or perhaps more like log(light)) to make mental perception. So the net mapping of pixel -> perception is actually pretty close to unity.
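
In exponents, using those rough numbers :

pixel -> light      : ~ pixel ^ 2.2
light -> perception : ~ light ^ (1/3)
net                 : ~ pixel ^ (2.2/3) = pixel ^ 0.73

which is much closer to linear than either step alone.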

On the subject of Lp norms, I wrote this in email (background : a while ago I found that L1 error was much better for my video coder than L2) :


I think L1 was a win mainly because it resulted in better R/D bit distribution.

In my perceptual metric right now I'm using L2 (square error) + masking , which appears to be better still.

I think that basically L1 was good because if you use L2 without masking, then you care about errors in noisy parts of the image too much.

That is, say your image is made of two blocks, one is very smooth, one has very high variation (edges and speckle).

If you use equal quantizers, the noisy block will have a much larger error.

With L2 norm, that means R/D will try very hard to give more bits to the noisy block, because changing 10 -> 9 helps more than changing 2 -> 1.

With L1 norm it balances out the bit distribution a bit more.

But L2-masked norm does the same thing and appears to be better than L1 norm.


Lastly a new metric :

PSNR-HVS-T is just "psnr-hvs" but with thresholds. (*3) In particular, if PSNR-HVS is this very simple pseudo-code :


on all 8x8 blocks
dct1,dct2 = dct of the blocks, scaled by 1/ jpeg quantizers

for i from 0 -> 63 :
 err += square( dct1[i] - dct2[i] )

then PSNR-HVS-T is :

on all 8x8 blocks
dct1,dct2 = dct of the blocks, scaled by 1/ jpeg quantizers

err += w0 * square( dct1[0] - dct2[0] )

for i from 1 -> 63 :
  err += square( threshold( dct1[i] - dct2[i] , T ) )

the big difference between this and "PSNR-HVS-M" is that T is just a constant, not the complex adaptive mask that they compute (*1).


threshold is :

threshold(x,T) = MAX( |x| - T , 0 )
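
In C-ish terms the per-block part is just this (a sketch; dct8x8_csf() is a stand-in for your 8x8 DCT with the 1/JPEG-quantizer scaling baked in, not a real function from anywhere) :

#include <cmath>

// stand-in : 8x8 DCT of a block, each coefficient scaled by 1 / its JPEG quantizer
void dct8x8_csf(const float block[64], float out[64]);

double block_err_psnr_hvs_t(const float block1[64], const float block2[64],
                            double w0, double T)
{
    float dct1[64], dct2[64];
    dct8x8_csf(block1, dct1);
    dct8x8_csf(block2, dct2);

    double d0  = dct1[0] - dct2[0];
    double err = w0 * d0 * d0;              // DC term : weighted, not thresholded

    for (int i = 1; i < 64; i++)            // AC terms : thresholded
    {
        double d = fabs( (double)dct1[i] - dct2[i] ) - T;
        if ( d > 0 ) err += d * d;
    }
    return err;
}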

The surprising thing is that PSNR-HVS-T is almost as good as PSNR-HVS-M , which is in turn basically the best perceptual metric (*2) that I've found yet. Results :


PSNR-HVS-T Spearman : 0.935155

fit rmse :
PSNR-HVS   : 0.583396  [1,2]
VIF        : 0.576115  [1]
PSNR-HVS-M : 0.564279  [1]
IW-MS-SSIM : 0.563879  [2]
PSNR-HVS-T : 0.557841
PSNR-HVS-M : 0.552151  [2]

In particular, it beats the very complex and highly tweaked out IW-MS-SSIM. While there are other reasons to think that IW-MS-SSIM is a strong metric (in particular, the maximum difference competition work), the claims that they made in the papers about superiority of fitting to MOS data appear to be bogus. A much simpler metric can fit better.

footnotes :

*1 = note on the masking in PSNR-HVS-M : The masked threshold they compute is proportional to the L2 norm of all the DCT AC coefficients. That's very simple and unsurprising, it's a simple form of variance masking which is well known. The funny thing they do is multiply this threshold by a scale :


scale = ( V(0-4,0-4) + V(0-4,4-8) + V(4-8,0-4) + V(4-8,4-8) ) / ( 4*V(0-8,0-8) )

V(rows,cols) = Variance of the pixels in that range of rows and columns of the block

threshold *= sqrt( scale );

that is, divide the block into four quadrants; take the variance within each quadrant and divide by the variance of the whole block. What this "scale" does is reduce the threshold in areas where the AC activity is caused by large edges. Our basic threshold was set by the AC activity, but we don't want the important edge shapes to get crushed, so we're undoing that threshold when it corresponds to major edges.

Consider various extremes; if the block has no large-scale structure, only random variation all over, then the variance within each quad will equal the variance of the whole, which will make scale ~= 1. On the other hand, if there's a large-scale edge, then the variance on either side of the edge may be quite small, so at least one of the quads has a small variance, and the result is that the threshold is reduced.
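
A sketch of that scale computation on an 8x8 block of pixels (just the formula above, nothing clever) :

// variance of a w x h sub-rectangle of an 8x8 pixel block
static double var_rect(const float p[64], int x0, int y0, int w, int h)
{
    double sum = 0, sumsq = 0;
    for (int y = y0; y < y0 + h; y++)
    for (int x = x0; x < x0 + w; x++)
    {
        double v = p[y*8 + x];
        sum += v; sumsq += v*v;
    }
    double n = (double)(w * h);
    double mean = sum / n;
    return sumsq / n - mean * mean;
}

// quadrant variances over whole-block variance ; then threshold *= sqrt(scale)
double masking_scale(const float p[64])
{
    double quads = var_rect(p,0,0,4,4) + var_rect(p,4,0,4,4)
                 + var_rect(p,0,4,4,4) + var_rect(p,4,4,4,4);
    double whole = var_rect(p,0,0,8,8);
    if ( whole <= 0 ) return 1.0;       // flat block : no adjustment
    return quads / (4.0 * whole);
}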

Two more subtle follow-ups to this :

1A : Now, the L2 sum of DCT AC coefficients should actually be proportional to the variance of the block ! That means instead of doing


 threshold^2 ~= (L2 DCT AC) * (Variance of Quads) / (Variance of Blocks)

we could just use (Variance of Quads) directly for the threshold !! That is true, however when I was saying "DCT" earlier I meant "DCT scaled by one over JPEG quantizer" - that is, the DCT has CSF coefficients built into it, which makes it better than just using pixel variances.

1B : Another thought you might have is that we are basically using the DCT L2 AC but then reducing it for big edges. But the DCT itself is actually a decent edge detector, the values are right there in DCT[0,1] and DCT[1,0] !! So the easiest thing would be to just not include those terms in the sum, and then don't do the scale adjustment at all!

That is appealing, but I haven't found a formulation of that which performs as well as the original complex form.

(*2) = "best" defined as able to match MOS data, which is a pretty questionable measure of best, however it is the same one that other authors use.

(*3) = as usual, when I say "PSNR-HVS" I actually mean "MSE-HVS", and actually now I'm using RMSE because it fits a little better. Very clear, I know.


12-09-10 | Rank Lookup Error

Trying some other methods of testing to make sure the function fit isn't screwing me up too much.

Spearman rank correlations (it's just the Pearson correlation on sort ranks) :


mse.txt                 0.66782
square_ms_ssim_y.txt    0.747733
scielabL1.txt           0.855355
scielabL2.txt           0.868207
vif.txt                 0.87631
ms_ssim_bad_down.txt    0.880403
ms_ssim.txt             0.89391
iw_ms_ssim.txt          0.901085
wsnr.txt                0.90905
mydctdelta.txt          0.932998
my_psnr_hvs_m.txt       0.94082
mydctdeltanew.txt       0.944086
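
(Aside : since Spearman is just Pearson on sort ranks, here's a minimal sketch of computing it; ties just get whatever order the sort gives them, which is not the textbook tie handling.)

#include <vector>
#include <algorithm>
#include <cmath>

// rank[i] = position of element i when v is sorted ascending (ties broken arbitrarily)
static std::vector<double> ranks(const std::vector<double> & v)
{
    std::vector<size_t> idx(v.size());
    for (size_t i = 0; i < idx.size(); i++) idx[i] = i;
    std::sort(idx.begin(), idx.end(),
              [&](size_t a, size_t b) { return v[a] < v[b]; });
    std::vector<double> r(v.size());
    for (size_t pos = 0; pos < idx.size(); pos++) r[idx[pos]] = (double)pos;
    return r;
}

// Spearman = Pearson correlation of the two rank vectors
double spearman(const std::vector<double> & a, const std::vector<double> & b)
{
    std::vector<double> ra = ranks(a), rb = ranks(b);
    double n = (double)a.size();
    double sa = 0, sb = 0, saa = 0, sbb = 0, sab = 0;
    for (size_t i = 0; i < a.size(); i++)
    {
        sa += ra[i]; sb += rb[i];
        saa += ra[i]*ra[i]; sbb += rb[i]*rb[i]; sab += ra[i]*rb[i];
    }
    double cov = sab/n - (sa/n)*(sb/n);
    double va  = saa/n - (sa/n)*(sa/n);
    double vb  = sbb/n - (sb/n)*(sb/n);
    return cov / sqrt( va * vb );
}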

I just thought of a new way to check the fit for this kind of scenario which I think is pretty cool.

You have two vectors of values. One vector is the original which you wish to match; the other is output by your program to try to match the original vector. The problem is that your program outputs in some other scale, different units which may involve some warping of the curve, and you don't know what it is. You wish to find how close your program is to the target without worrying about matching that curve.

Well, one way is "rank lookup error" :


given vectors "orig" and "test"

find "test_rank" such that
  r = test_rank[i] means item i is the r'th in sorted order in the vector test

find "sorted_orig" = orig, sorted

Sum{i} :
  err += square( orig[i] - sorted_orig[ test_rank[ i ] ] )

that is, the fit value for mapping from test's scale to orig is to find the sort index within test, and lookup the value in the sorted list of originals.

Obviously this isn't quite ideal; it does handle ties and near-ties pretty well though (better than Spearman/Kendall, because you get less error contribution when you get the rank wrong for two items with very similar values). Most importantly it avoids all the fidgety function fitting stuff.
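
In C++ terms, a minimal sketch of the above (just the idea, not the actual test code) :

#include <vector>
#include <algorithm>
#include <cmath>

// "rank lookup error" : map each test value to the orig value of the same rank,
// then take the squared error against the actual orig value (RMSE at the end).
double rank_lookup_rmse(const std::vector<double> & orig, const std::vector<double> & test)
{
    size_t n = orig.size();

    // test_rank[i] = rank of item i within test
    std::vector<size_t> idx(n);
    for (size_t i = 0; i < n; i++) idx[i] = i;
    std::sort(idx.begin(), idx.end(),
              [&](size_t a, size_t b) { return test[a] < test[b]; });
    std::vector<size_t> test_rank(n);
    for (size_t r = 0; r < n; r++) test_rank[idx[r]] = r;

    std::vector<double> sorted_orig = orig;
    std::sort(sorted_orig.begin(), sorted_orig.end());

    double err = 0;
    for (size_t i = 0; i < n; i++)
    {
        double d = orig[i] - sorted_orig[ test_rank[i] ];
        err += d * d;
    }
    return sqrt( err / n );
}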

Here are the scores with "rank lookup rmse" :


mse.txt                 1.310499
square_ms_ssim_y.txt    1.090392
scielabL1.txt           0.853738
scielabL2.txt           0.835508
ms_ssim_bad_down.txt    0.716507
ms_ssim.txt             0.667152
wsnr.txt                0.643821
vif.txt                 0.591508
iw_ms_ssim.txt          0.575474
mydctdelta.txt          0.548946
my_psnr_hvs_m.txt       0.543057
mydctdeltanew.txt       0.494918

if nothing else it's a good sanity check for the fitting stuff.

Also with rank-lookup-error you don't have to worry about transforms like acos for ssim or undo-psnr or anything else that's monotonic.

For comparison, these were the fit scores :


mse.txt                 1.169794
square_ms_ssim_y.txt    1.005849
scielabL1.txt           0.819501
scielabL2.txt           0.808595
ms_ssim_bad_down.txt    0.690689
ms_ssim.txt             0.639193
wsnr.txt                0.632670
vif.txt                 0.576114
iw_ms_ssim.txt          0.563879
my_psnr_hvs_m.txt       0.552151
mydctdelta.txt          0.548938
mydctdeltanew.txt       0.489756

mydctdeltanew is a new one; it just uses mydctdelta for only the AC part of the DCT (excludes the DC which is gross luma differences), and then it adds back on the gross luma difference using a form similar to IW-MS-SSIM (that is, using large filters and then using both variance masking and saliency boosting).


12-09-10 | Perceptual vs TID

My TID2008 score is the RMSE of fitting to match MOS values (0-9) ; fit from perceptual metric to MOS score is of the form Ax + Bx^C , so 0 -> 0 , and for fitting purposes I invert and normalize the scale so that 0 = no error and 1 = maximum error. For fitting I try {value},{acos(value)}, and {log(value)} and use whichever is best.

I also exclude the "exotic" distortions from TID because they are weird and not something well handled by most of the metrics (they favor SSIM which seems to be one of the few that handles them okay). I believe that including them is a mistake because they are very much unlike any distortion you would ever get from a lossy compressor.

I also weight the MOS errors by 1/Variance of each MOS ; basically if the humans couldn't agree on a MOS for a certain image, there's no reason to expect the synthetic metric to agree. (this is also like using an entropy measure for modeling the human data; see here for example)


MSE        : 1.169795  [1,2]
MSE-SCIELAB: 0.808595  [2]
MS-SSIM    : 0.691881  [1,2]
SSIM       : 0.680292  [1]
MS-SSIM-fix: 0.639193  [2] (*)
WSNR       : 0.635510  [1]
PSNR-HVS   : 0.583396  [1,2] (**)
VIF        : 0.576115  [1]
PSNR-HVS-M : 0.564279  [1] (**)
IW-MS-SSIM : 0.563879  [2]
PSNR-HVS-M : 0.552151  [2] (**)
MyDctDelta : 0.548938  [2]

[1] = scores from TID reference "metrics_values"
[2] = scores from me
[1,2] = confirmed scores from me and TID are the same

the majority of these metrics are Y only, using rec601 Y (no gamma correction) (like metric mux).

WARNING : the fitting is a nasty evil business, so these values should be considered plus/minus a few percent confidence. I use online gradient descent (aka single layer neural net) to find the fit, which is notoriously tweaky and sensitive to annoying shit like the learning rate and the order of values and so on.

(*) = MS-SSIM is the reference implementation and I confirmed that I match it exactly and got the same score. As I noted previously , the reference implementation actually uses a point subsample to make the multiscale pyramid. That's obviously a bit goofy, so I tried box subsample instead, and the result is "MS-SSIM-fix" - much better.

(**) = the PSNR-HVS scores have received a PSNR-to-MSE conversion (ye gods I hate PSNR) ; so my "PSNR-HVS-M" is actually "MSE-HVS-M" , I'm just sticking to their name for consistency. Beyond that, for some reason my implementation of PSNR-HVS-M ([2]) is better than theirs ([1]) and I haven't bothered to track down why exactly (0.564 vs 0.552).

WSNR does quite well and is very simple. It makes the error image [orig - distorted] and then does a full image FFT, then multiplies each tap by the CSF (contrast sensitivity function) for that frequency, and returns the L2 norm.

PSNR-HVS is similar to WSNR but uses 8x8 DCT's instead of full-image FFT. And of course the CSF for an 8x8 DCT is just the classic JPEG quantization matrix. That means this is equivalent to doing the DCT and scaling by one over the JPEG quantizers, and then taking the L2 delta (MSE). Note that you don't actually quantize to ints, you stay in float, and the DCT should be done at every pixel location, not just 8-aligned ones.

VIF is the best of the metrics that comes with TID ; it's extremely complex, I haven't bothered to implement it or study it that much.

PSNR-HVS-M is just like PSNR-HVS, but adds masked thresholds. That is, for each tap of the 8x8 DCT, a just noticeable threshold is computed, and errors are only computed above the threshold. The threshold is just proportional to the L2 AC sum of the DCT - this is just variance masking. They do some fiddly shit to compensate for gross variance vs. fine variance. Notably the masking is only used for the threshold, not for scaling above threshold (though the CSF still applies to scaling above threshold).

IW-MS-SSIM is "Information Weighted" MS-SSIM. It's a simple weight term for the spatial pooling which makes high variance areas get counted more (eg. edges matter). As I've noted before, SSIM actually has very heavy variance masking put in (it's like MSE divided by Variance), which causes it to very severely discount errors in areas of high variance. While that might actually be pretty accurate in terms of "visibility", as I noted previously - saliency fights masking effects - that is, the edge areas are more important to human perception of quality. So SSIM sort of over-crushes variance and IW-SSIM puts some weight back on them. The results are quite good, but not as good as the simple DCT-based metrics.

MyDctDelta is some of the ideas I wrote about here ; it uses per-pixel DCT in "JPEG space" (that is, scaled by 1/JPEG Q's), and does some simple contrast-band masking, as well as multi-scale sum deltas.

There's a lot of stuff in MyDctDelta that could be tweaked that I haven't , I have no doubt the score on TID could easily be made much better, but I'm a bit afeared of overtraining. The TID database is not big enough or varied enough for me to be sure that I'm stressing the metric enough.


Another aside about SSIM. The usual presentation says compute the two terms :


V = (2*sigma1*sigma2 + C2)/(sigma1_sq + sigma2_sq + C2);
C = (sigma12 + C3)/(sigma1*sigma2 + C3);

for variance and correlation dot products. We're going to wind up combining these multiplicatively. But then notice that (with the right choice of C3=C2/2)

V*C = (2*sigma12 + C2)/(sigma1_sq + sigma2_sq + C2);

so we can avoid computing some terms.

But this is a trick. They've actually changed the computation, because they've changed the pooling. The original SSIM has three terms : mean, variance, and correlation dot products. They are each computed on a local window, so you have the issue of how you combine them into a single score. Do you pool each one spatially and then cross-multiply? Or do you cross-multiply and then pool spatially ?


SSIM = Mean{M} * Mean{V} * Mean{C}

or

SSIM = Mean{M*V*C}

which give quite different values. The "efficient" V*C SSIM winds up using :

SSIM = Mean{M} * Mean{V*C}

which is slightly worse than doing separate means and multiplying at the end (which is what MS-SSIM does).
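
In code terms, given the per-window terms, the two poolings are just (sketch) :

#include <vector>

// given the mean / variance / correlation term for each local window :

double ssim_pool_separate(const std::vector<double> & M,
                          const std::vector<double> & V,
                          const std::vector<double> & C)
{
    double m = 0, v = 0, c = 0;
    for (size_t i = 0; i < M.size(); i++) { m += M[i]; v += V[i]; c += C[i]; }
    double n = (double)M.size();
    return (m/n) * (v/n) * (c/n);      // SSIM = Mean{M} * Mean{V} * Mean{C}
}

double ssim_pool_joint(const std::vector<double> & M,
                       const std::vector<double> & V,
                       const std::vector<double> & C)
{
    double s = 0;
    for (size_t i = 0; i < M.size(); i++) s += M[i] * V[i] * C[i];
    return s / (double)M.size();       // SSIM = Mean{M*V*C}
}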


12-07-10 | Patents

Ignacio wrote about software patents a while ago and it's got me thinking.

First of all I applaud the general idea that every individual is responsible for acting morally, regardless of the environment they live in. I see far too many people who use the broken system as an excuse to behave badly themselves. eg. "lots of people are polluting, so it doesn't matter how much I hurt the environment" ; "the tax system is broken, so it doesn't matter how much I cheat" ; "the patent system sucks, so I have to be a part of it" ; etc. I believe this attitude is a facile rationalization for selfish behavior.

In any case, let's get back to the specific topic of patents.

In my youth I used to be the lone pro-patent voice in a sea of anti-patent peers. Obviously I was against the reality of the patent system, which consists of absurd over-general patents on things that aren't actually innovations, and the fact that expert review of technical issues before a court is an absurd farce in which money usually wins. But I was pro the basic idea of patents, partly for purely selfish reasons, because I had these technical inventions I had made and was hoping I could somehow get rich from them, as I saw others do. As a young individual inventor, I believed that getting a patent was my only way to get a fair price for my work.

Now I have a different view for a few reasons :

1. Policy should be made based on the reality of an issue, not your theoretical ideal.

The reality is that patents (and particularly software patents) are ridiculously broken. The court system does not have the means to tell when patents are reasonable or not, and it is unrealistic to think that that can be fixed.

2. The purpose of laws should be to ensure the greatest good for society.

Even if you think software patents are good for the lonely independent developer, that in itself is not reason enough to have them. You have to consider the net benefit to society. I believe that the world would be better off without software patents, but this is a little tricky.

What are the pro-patent arguments ?

One is that they encourage research funding, that companies wouldn't spend money on major research if they didn't have the patent system to ensure a monopoly for their invention.

I find this argument generally absurd. Do you think that IBM or Microsoft really wouldn't fund research that they believe will improve their business if they couldn't patent the result? Companies will fund research any time it is likely profitable; a long-term monopoly from a patent doesn't really change the research equation from "not profitable" to "profitable", it changes it to "ridiculously profitable".

Furthermore, in reality, most of the major tech companies don't actually use their patents to keep monopolies on technology, rather they engage in massive cross-license agreements to get open access to technologies. Patents wind up being a huge friction and cost for these companies, and you have to maintain a big war chest of your own patents to ensure that you can participate in the cross-licensing. The end result of this is yet another oligarchy. The big tech companies form cross-license agreements, and smaller players are frozen out. This is a huge friction to free market innovation.

Now, I believe one legitimate point is that in a world without patents, companies might be more motivated to keep their innovations secret. One pro-patent argument is that it allows companies to patent their work and then publish without fear of losing it. Of course, that is a bit of a false promise, because the value to the public of getting a publication which describes a patented algorithm is dubious. (yes obviously there is some value because you get to read about it, but you don't actually get to use it!).

Finally it is absolutely offensive to me that researchers who receive public funding are patenting their works, or that university professors patent their works. If you receive any public funding, your work should be in the public domain (and it should not be published in the pay-for-access journals of the ACM or IEEE).

But this is an issue that goes beyond software. Are any patents good for society? Certainly medical patents have encouraged lots of expensive research in recent years, and this is often used as a pro-patent argument. But lots of those new expensive drugs have been shown to be no better than their cheap predecessors. Certainly patents provide a massive incentive for drug companies to push the new expensive monopolized product on you, which is a bad effect for society. Would you actually have significantly less useful research if there were no patents? Well, 30-40% of medical research is publicly funded, so that wouldn't go away, and without patents that publicly funded research would be much more efficient, because they could be open and not worry about infringement, and furthermore it would be more focused on research that provides tangible benefits as opposed to research that leads to profits. It's a complicated issue, but it's definitely not obvious that the existence of patents actually improves the net quality of medical treatment.

In summary, I believe that patents do accomplish some good, but you have to weigh that against the gains you would get if you had no patents. I believe the good from no patents is greater than the good from patents.


In any case, hoping for patents to go away is probably a pipe dream.

Smaller goals are these :

1. I find it absolutely sick that public universities are patenting things. That needs to stop. Professors/researchers need to take the lead by refusing to patent their inventions.

Any corporation that receives the bullshit "R&D" tax break should be required to make all their patents public domain. Anyone that gets a DoD or NSF research grant should be required to make the results of their work public domain. How can you justify taking public money for R&D and then locking out the public from using the results?

2. Some rich charity dude should create the "public patent foundation" whose goal is to support the freedom of ideas, and has the big money to fight bullshit patents in court. The PPF could also actively work to publish prior art and even in some cases to apply for patents, which would then be officially released into the public domain.

A more extreme idea would be to make the PPF "viral" like the GPL - build up a big war chest of patents, and then release them all for free use - but only to other people who release their own patents under the same license. All the PPF has to do is get a few important patents and it can force the opening of them all.

(deja vu , I just realized I wrote this before )


12-06-10 | More Perceptual Notes

To fit an objective synthetic measure to surveyed human MOS scores, the VQEG uses the form :

A1 + A2 * x + A3 * logistic( A4 * x + A5 )

I find much better fits from the simpler form :

B1 + B2 * x ^ B3

Also, as I've mentioned previously, I think the acos of SSIM is a much more linear measure of quality (SSIM is like a Pearson correlation, which is like a dot product, so acos makes it like an angle). This is borne out by fitting to TID2008 - the acos of SSIM fits better to the human scores (though fancy fit functions can hide this - this is most obvious in a linear fit).

But what's more, the VQEG form does not preserve the value of "no distortion", which I think is important. One way to do that would be to inject a bunch of "zero distortion" data points into your training set, but a better way is to use a fit form that ensures it explicitly. In particular, if you remap the measured MOS and your objective score such that 0 = perfect and larger = more distorted, then you can use a fit form like :


C1 * x + C2 * x ^ C3

(with C3 > 0) , so that 0 maps to 0 absolutely, and you still get very good fits (in fact, better than the "B" form with arbitrary intercept). (note that polynomial fit (ax+bx^2+cx^3) is very bad, this is way better).
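
A minimal sketch of fitting that form with plain gradient descent (this is not the actual fitter I use; the learning rate and iteration count are arbitrary, and it assumes the x's are remapped to [0,1] as described above) :

#include <vector>
#include <cmath>

struct CFit { double c1, c2, c3; };

// fit  pred(x) = c1*x + c2*pow(x,c3)  to (x,y) pairs by batch gradient descent
CFit fit_c_form(const std::vector<double> & x, const std::vector<double> & y)
{
    CFit f = { 1.0, 1.0, 1.0 };
    const double rate = 0.001;

    for (int iter = 0; iter < 100000; iter++)
    {
        double g1 = 0, g2 = 0, g3 = 0;
        for (size_t i = 0; i < x.size(); i++)
        {
            double xc3 = pow(x[i], f.c3);
            double err = f.c1 * x[i] + f.c2 * xc3 - y[i];
            g1 += err * x[i];
            g2 += err * xc3;
            if ( x[i] > 0 ) g3 += err * f.c2 * xc3 * log(x[i]);
        }
        f.c1 -= rate * g1;
        f.c2 -= rate * g2;
        f.c3 -= rate * g3;
        if ( f.c3 < 1e-6 ) f.c3 = 1e-6;   // keep the exponent positive so 0 still maps to 0
    }
    return f;
}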

A lot of people are using Kendall or Spearman rank correlation coefficients to measure perceptual metrics (per VQEG recommendation), but I think that's a mistake. The reason is that ranking is not what we really care about. What we want is a synthetic measure which correctly identifies "this is a big difference" vs. "this is a small difference". If it gets the rank of similar differences a bit wrong, that doesn't really matter, but it does show up as a big difference in Kendall/Spearman. In particular, the *value* difference of scores is what we care about. eg. if it should have been a 6 and you guess 6.1 that's no big deal, but if you guess it's a 9 that's very bad. eg. if your set of MOS scores to match is { 3, 5.9, 6, 6.1, 9 } , then getting the middle 3 in the wrong order is near irrelevant, but the Kendall & Spearman have no accounting for that, they are just about rank order.

The advantage of rank scores of course is that you don't have to do the functional fitting stuff above, which does add an extra bias, because some objective scores might work very well with that particular functional fit, while others might not.


12-05-10 | Health Care and Deficits

I think Republicans have reached a new peak of hypocrisy and inconsistency.

"Deficits are bad" , "Deficits caused the crisis"

First of all this is just ridiculously not true. Deficits may be a problem 10+ years down the road when the interest becomes excessive, but deficit spending in no way hurts an economy (in fact it helps) and the deficits have absolutely nothing to do with the current weakness in our economy.

And furthermore, where were you during GW Bush when taxes were cut and spending raised? Oh, that's right you were writing policy papers that supported Cheney's famous "deficits don't matter" diatribe.

"We need to control health care spending"

Umm , where were you when Obama was trying to control health care spending last term? Oh, that's right, you were doing photo-ops with seniors in Florida guaranteeing them that their benefits would not be cut. WTF.

Oh, and thanks for the prescription drug benefit in which you very specifically ensured pharma companies would get to keep making obscene profits and the government would not be allowed to require cheaper drugs.

"We need to do something drastic to control the deficit" ; "medicare and social security will bankrupt America"

This is not just a Republican lie, but also a "centrist Democrat" lie. It's simply not true. First of all, the main thing we need to worry about is the short term; most of the big costs that have contributed to the recent deficit are temporary, but the big one we can control is to rescind the GWB tax cuts.

One issue is that a lot of the liars like to compare the pre-recession deficit to the post-recession deficit (which is of course absurd, because tax receipts change massively). For example the WSJ did this to claim that the "bush tax cut is not the problem" because before the recession the deficit wasn't so bad. The NYT also did this to point out that "rescinding the bush tax cut is not enough" to fix the deficit, by using post-recession deficit numbers. Umm, are you guys really that fucking stupid or are you intentionally lying? Of course the dominant factor is whether the economy is in boom or bust; it is foolish to try to completely balance your budget during a recession. Also clearly you can get away with cutting taxes during a boom, but that doesn't mean you should.

But they don't want to do that, so they cast the Medicare/Social Security problem as "intractable". That's nonsense. The way they usually distort it is to say that the "SS fund will be bankrupt at some point". That's absurd, the SS fund's balance has nothing to do with SS's viability, you simply have to look at what the total cost of it would be. One easy way to fix SS funding permanently is to eliminate the cap on SS tax. Medicare involves an even bigger lie, they claim that an aging population makes it an inherently rising cost that will crush the youth. This is also not true (and particularly not true as long as we allow immigration). In fact, the thing that will crush us is super-inflationary growth in health care costs.

See for example :

Why Health Reform Must Counter the Rising Costs of Health Insurance Premiums - The Commonwealth Fund
packet_HealthCareWydenBennettused062607.pdf - Powered by Google Docs
Insurance Premiums Still Rising Faster Than Inflation and Wages - NYTimes.com
Health Care Costs - Baby Boomers Not to Blame for Rising Health Care Costs
Fox's Cavuto so wedded to tax cut mythology that he wrongly corrects his guest Media Matters for America
Employer Healthcare Costs Rising Faster Than Inflation - Health News - redOrbit
Critics Still Wrong on What’s Driving Deficits in Coming Years — Center on Budget and Policy Priorities
CBO Extending Tax Cuts Could Have Huge Debt Impact - TheFiscalTimes.com

"states' rights"

I'm pretty sure nobody actually believes this, it's just a way to hide what they really want. Basically any time the federal government does something they don't like they claim to like "states' rights", eg. national health care, environmental regulations, etc. But then when individual states do things they don't want, suddenly the federal government needs to step in, eg. euthanasia, gay marriage, California's higher environmental standards, etc. It's just ridiculously transparently bullshit.

"constitutional originalism"

See above (states' rights). For one thing the whole idea of this is pretty absurd (that we should be bound by the exact intended meaning of the founders), but moreover it's just bullshit that they abuse to push certain policies. Reading the original meaning is so open to interpretation that it's like reading tarot cards - you can find whatever you want in it.


12-05-10 | Spending and Debt

There are a lot of weird things being said in the news and around the net about spending and debt. Let's have a look at some.

One particularly absurd one is that

"Money saved is helping the economy just as much or more than money spent"

The basic argument here is that money that you put in the bank doesn't sit in the bank, it gets lent out to businesses or home buyers or whatever. The point of their argument is that stimuli which cause people or banks to hold less cash in savings aren't actually stimuli, because they just take money from some useful purpose (being loaned out). This argument is massively flawed on two points. One is that the choice is not really an either/or ; it's not "lend money to businesses" OR "consumers buy things with it" ; when you buy things as a consumer, the money moves from one bank account to another - it's always in a bank, and therefore can also be used for loans (BTW this does suggest that money held in cash is a big disadvantage to the economy - electronic currency can be used twice while cash money sits and does nothing; and all the people "investing" in gold these days must be very bad for the economy, because the gold is just sitting there). Furthermore, the contention that banks loan out money proportional to their reserves is not quite right. In fact, when there are good prospects for investment, they can leverage up many fold. The idea that businesses wouldn't be able to get loans because banks don't have enough cash is not quite right, because the fed basically gives them free cash to invest whenever they want. In our economy, banks basically leverage up to invest in all the possible money-returning investments - their balance sheet is proportional to demand of cash, not supply. The book value of money in circulation can be many times the actual supply of money, and that ratio goes up and down based on the amount of good investments available.

"Deficit spending doesn't actually help the economy"

This one is being spread by the Ron Paul anti-deficit types. Their basic argument is that government deficit spending is not a stimulus, because it is just taking money from somewhere else, either by raising taxes or issuing treasuries. They contend that if the money was left where it was it would be doing just as much good, because they claim money in savings accounts is "working" (they love to use silly quotes like "money never sleeps"). From the above argument we can see the hole in this - what government spending actually does is create more demand for money, which may be a useful thing in times of low demand. There is the issue of how much utility you get from the government's choice of spending, but it's certainly greater than zero, so government spending is better than no spending. Also, the people who argue this point seem to be intentionally obtuse about the benefit of foreign capital. Foreign capital is clearly a stimulus in the short term; it's more subtle what the long term effect is, but they argue that it's not even beneficial in the short term. If your economy was actually healthy and provided good investment opportunities, foreign capital would be coming in that way.

"Government dollars from tax-and-spend do not help the economy"

Now that we've gone over the first two, we can address this slightly more subtle point. What actually happens when the government taxes some money and then spends it? The money is not disappearing from the private economy, it is being spent on something which has some utility, and then it goes back into private circulation. The only possible disadvantage is that during that cycle, the money can't be used for other things (because transactions are not instant but rather take time), and if it was left in private hands, during that time it could be spent on something with higher utility. I believe that the measure of "best" we should use is maximizing the total utility to the citizens. So the question comes down to - would it be spent if it was left in private hands? If it would have just been put in savings, then we should see that the government spending is better. Even if private people would have spent it, I believe the issue is whether their utility would be higher than the government's. Now, in theory if people are rational actors, then being able to make their own choice about how their money is spent should result in higher utility (this is the basic theory behind free market capitalism), but that isn't always the case. For one thing it can happen simply because individual spenders don't have the same choices as a collective force like government; eg. government can buy health care with negotiated prices and force prescription drug makers to void patents (not that the US does this), things that individuals could not.

I started thinking about all this because I was thinking about Christmas spending. It's often said that Christmas spending is a big stimulus to the economy. But is it really? Let's say there was no Christmas: people would still buy lots of things they need, but they would make better choices about it. Christmas forces you to spend a lot more than you might otherwise, and also to buy lots of things that the recipient doesn't really want - these are near zero utility purchases (not quite zero, because satisfying the social obligation of giving a gift is a form of utility).

It seems obvious to me that instead of spending a thousand dollars on Christmas presents that nobody really wants (low utility), the world would be better off if we took that Christmas money and gave it to the government to spend on roads, transit, schools, etc. - things that will actually help us all.


12-03-10 | Politics Misc Rambling

One of the tackiest, most retarded things you can do in an argument is to use pejorative descriptors of the other person's argument, such as "naive" or "indefensible" or "uninformed" or whatever. If you disagree with something, you should argue based on the merits of the issue. Perhaps not surprisingly, this sort of lazy nastiness is rampant, even in such respected places as the WSJ, NYT, and me.

One of the great cons is that many reasonable people have fallen into this trap that anyone who doesn't go along with mainstream thinking is a crackpot. They are simply laughed at or made fun of instead of having their actual issue addressed. It's a way of shutting out any viewpoints outside of the mainstream norm, it happens all the time in the news media, and also in casual conversation.

I think the debate about "left" vs "right" is artificial and obscures the real issue. The real issue is corrupt and retarded government (and populace). Turning it into a theoretical ideological question about whether laissez faire or intervention is a better policy is a false elevation of the issue. It elevates it to a difficult to solve conceptual issue, in reality the problem is just massive corruption.

Let me clarify that because I believe it's an important point. Yes there are a lot of difficult theoretical issues in governance, questions of how much social net a government should provide, when and how much we should have protectionism, etc.. These are difficult and interesting questions, but they are *irrelevant*. Our actual government is a sewer of mismanagement, intentional lies and manipulation. We could massively improve things without ever touching those difficult theoretical issues.

It's sort of like optimization issues in video game code. Often I would find myself in meetings where people were debating whether they should expose a SIMD vector type throughout the code, or whether to use a kdtree or a bsp tree, or whatever. You know what? It's irrelevant. You're calling SuperSlowFunction() a thousand times per frame that you don't need to be. Go fix your huge obvious easy fuckup. By the time you've fixed all the easy obvious fuckups, the game works fine and it's time to ship. The difficult theoretical issue that we disagreed on is like an improvement that's way off in tweaky epsilon space that we will never reach in the real world. It's like all the retarded fat people who want to talk about whether fructose or sucrose is more harmful to their glycogen/lipid pathway - umm, hello, put down the fucking beer and the bacon-stuffed twinkie, the scientific details are not why you are fat.

Probably the biggest example of this that's in my mind these days is the issue of capitalism and regulation and free markets in America. This is irrelevant; we don't have anything like free markets, and we never will. What we have are lots of government-enforced monopolies, illegal collusion between corporations, sweetheart deals between governments and certain corporations. See for example : cable, health care, defense, mining/oil, finance, etc.

Any reasonable liberal and any reasonable free-marketer should be in agreement that these corrupt alliances between corporations and governments are bad for society. The intellectual debate is irrelevant, and it's only used as an excuse to convince us that these issues are "hard" or that "people don't agree on the solution".

There's an awful lot of insanity being bandied about these days. It's important to stop every once in a while and question basic things.

Why are taxes supposed to be bad for an economy? They don't actually take money out of circulation. (If you're using the taxes to pay off debts, then they do, so in fact all the "free market conservatives" who think belt-tightening and paying off our deficit will somehow help the economy are seriously smoking crack; paying off debt is certainly bad in the short term, and should generally be done only when necessary, but that's a complex issue so let's ignore it). Anyway, ignore the paying off debt issue and assume that taxes will just be used for government spending. So the money is getting circulated and put back into the economy. We know that circulating money in general is good for economies. So taking money that a rich person would've just put into the bank and instead taxing and spending it should in fact help growth. The only argument I can see against this logic is that government spending may not be as efficient as private spending; that is, the utility gain per dollar spent may be higher when people are allowed to keep their money and spend it themselves instead of having it taxed and spent by the government. (Please don't tell me anything about government "bureaucracy" or "waste" ; the money doesn't disappear somehow, and just because a tiny fraction of it winds up in federal employee salaries doesn't prevent it from getting back into circulation). (more on this in a future post)

Why didn't places like Ireland just let their banks fail? They had masses of toxic debt, much more than the US as a percentage of GDP. The only convincing reason is that the international community would punish them by not investing in them in the future. Which seems to argue that capital depends on the massive implicit government underwriting of its risk!

During the internet boom, investment banks were taking completely bullshit dot-coms public, while their own analysts were offering "buy" recommendations on those stocks, and their own portfolio managers were putting those stocks in funds. The bubble and bust was not caused by a natural "market cycle", it was caused by massive corruption, which was possible due to deregulation, which let various aspects of financial services work together in collusion.

The US government is not allowed to negotiate fair prices for prescription drugs, unlike most other countries in the world. We're also one of very few countries where pharmaceutical companies are allowed to charge any amount they want for patented drugs (many don't allow patents on necessary drugs, or require some reasonable low price). And the medical research supporting drugs is funded by the pharma companies making those drugs.

The idea that we can have a successful "service" or "knowledge" based economy. That's never been done in the history of the world. We don't have it now, and the contention that we somehow have an edge on this is an egotistical delusion; we might have an edge right now, but it is rapidly declining.

The idea that inventions can save the economy. This is a particular canard which the NYT loves, the idea that we need some "breakthrough technology" to save our economy. It's sort of like planning to win the lottery to save your personal finances. Sure it's nice when you invent the steam engine, but it's not something you can plan your governance around.

The idea that the "internet boom" was a great time for the economy. It was pretty good overall for tech companies (I've seen figures suggest that from pre-boom to post-crash the average tech stock went up around 10%, which is not bad), but during that time median income didn't budge, indicating that the net result was an increase in GDP accompanied with an increase in income inequality - eg. the rich got richer.

You might be impressed with our FBI's ability to infiltrate and stop domestic terrorists before they can strike. But in almost all (maybe all?) of those cases, those cells were actually created by the FBI, by planting instigators and providing the weapons and ideas for the strike.

Our government doesn't allow journalists independent access to Iraq or Afghanistan. We receive almost zero unfettered coverage of what's really going on in our war zones. Our government is right now spying on peaceful domestic protest groups. Our government is opaque in ways it never has been before.

The greatest American period of prosperity was from 1940-1970, during which we had a huge debt, high taxes, lots of government spending, and high personal savings rates. None of the things that are supposed to correlate with prosperity actually do.


12-03-10 | WikiLeaks

I greatly admire what WikiLeaks is doing. Yes, some of what they've done is not perfect; maybe they endangered some undercover operatives (a favored activity of our own government, BTW).

It's foolish and reductive to dismiss a major act of good just because it's not perfect. It's ridiculous to hold them to a standard of moral perfection.

I believe that our government is corrupt and immoral, and furthermore that the system is broken such that attempts to change it by legal means are hopeless. Any brave, moral, righteous individual should work to improve things by any means necessary.

The Obama administration had its chance to back out many of the evils of the last administration, and they have chosen not to. Let's do a quick review of what's wrong :

Independent reporters are not allowed to cover our wars.

Our government creates propaganda/policy reports and injects them into the press as if they were independent, using hired "experts" and fake reporters.

Our government blacks out documents that they are required by law to make public, even though they have no real national security risk.

Our executive branch refuses to comply with congressional subpoenas, in violation of the law.

We hold non-combatants prisoner and refuse to give them civilian trials.

We continue to extradite and torture prisoners.

The NSA continues to snoop on our communications.

The FBI continues to investigate peaceful antiwar activist groups.

Our government doesn't allow investigation into the several cases where we know that we tortured innocent civilians, or into the fabrication of the cause for war in the original Iraq invasion.

.. etc, etc. Our government has demonstrated over and over a lack of respect for transparency, the need to disclose their function to the scrutiny of the public, and international law.

You can no longer trust our government to tell us whether information really is a national security risk. They've cried that wolf too many times. You simply cannot trust them on that any more; they use it to hide their own illegal activity or simply politically embarrassing activity, therefore any information is fair game.

Large companies (like Amazon, PayPal, Swiss Bank, Visa, etc) are cooperating with the government's attempts to shut down WL. WL is not charged with any crime, so these companies are under no obligation to cooperate with the government campaign for silence.

The mainstream media (like the NYT) is cooperating with the attempt to smear Mr. Assange as a weirdo, sexual criminal, subversive, whatever. He may be those things, but his personal character is not the issue. We should be talking about the content of the leaks and how our government continues to fail to stand up to the truth and account for its actions.

I strongly believe that the world needs a new internet. A free, open, shadow internet, that is entirely peer-to-peer and encrypted, so it cannot be blocked in Iran or China, so that governments cannot control what's posted on it.

I think that people using their computer skills to try to change the world for the better (whether you agree with them or not) is admirable, and to do so at serious risk to themselves is heroic.

I assume that WikiLeaks will be shut down at some point, but I hope that an even larger free-information movement rises in its wake.


12-02-10 | Orphaned Tech Ideas

If you have a really good idea, you have to just go for it right away. Sometimes I have an idea, but I'm busy with something else, so I just write it down, and then try to come back to it later, but when you come back the excitement is gone. You have to pounce on that moment of inspiration.

That said, there are many things I'd like to do right now but probably have to just set aside and never come back to :

DXTC for lightmaps (and other non-photographic data). In particular there should be a way to preserve smooth data with much higher precision.

DXTC optimized for perceptual metrics. Should be easy, I'll probably do this.

cblib::hash_table entry currently always has {hash, key, data} ; should change that to a functor with calls { hash(), key(), data() } so that the user can choose to store them or not (in many cases you don't need separate key and data because the key is the data). (A rough sketch of what I mean is at the end of this list.)

csv util suite. csv -> google chart, csv lsqr fit, csv eval, etc. basically matlab in command line utils that work on csv's. Would let me do processing from batch files and make html logs of compression runs and such.

Texture synthesis from examples. Fast enough for realtime so it can be used with SVT. Should be progressive. I believe this is the right way to do large world texturing. Texture synthesis from procedural/perlin noise stuff is too hard for artists to use, and doing fancy synthesis offline and then storing it ala Id just strikes me as profoundly silly.

Really good JPEG decoder. Maximum-likelihood decompression. Nosratinia and POCS. You can basically remove all blocking/ringing artifacts. This seems to me like a very useful and important piece of software and I'm surprised it doesn't exist.

Luma aided chroma upsample. Could be part of "Really good JPEG decoder" ; also useful for improving video decompression quality.

Finish lossy image comparison test and release the exe which autogens nice HTML reports.
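
Sketch for the hash_table entry-functor idea above - this is not cblib's actual interface, just an illustration of letting the entry decide what to store (the names here are made up) :

#include <stdint.h>

// hypothetical entry policy where the key *is* the data (a C string) and the hash is stored :
struct CharPtrEntry
{
    uint32_t    m_hash;
    const char *m_str;

    uint32_t    hash() const { return m_hash; }   // stored
    const char *key()  const { return m_str; }    // same storage as data()
    const char *data() const { return m_str; }    // no separate data member

    void set( uint32_t h, const char *s ) { m_hash = h; m_str = s; }
};

// the table would then be templated on the entry type, something like :
//   hash_table< CharPtrEntry > ht;
// and only ever call entry.hash()/key()/data(), so a different policy could
// recompute the hash on the fly instead of storing it.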


12-02-10 | Perceptual Metric Rambles of the Day

There's an interesting way in which saliency (visual attention) fights masking (reduction of sensitivity due to local variance). People who weight image metrics by saliency give *more* importance to high contrast areas like edges. People who weight image metrics by masking give *less* importance to high contrast areas. Naively it seems like those would just cancel out.

It's subtle. For one thing, a smooth area is only not "salient" if it's smooth in both the original and the distorted image. If you do something like creating block artifacts, then those block edges are salient (high visual attention factor). Also, they work at sort of different scales. Saliency works on intermediate frequencies - the brain specifically cares about detail at a certain scale that detects edges and shapes, but not speckle/texture and not gross offsets. Contrast or variance masking applies mainly to the additional frequencies in an area that has a strong medium-scale signal.

MS-SSIM actually has a lot of tweaked out perceptual hackiness. There's a lot of vision modelling implicit in it. They use a Gaussian window with a tweaked out size - this is a model of the spatial filter of the eye. The different scales have different importance contributions - this is a model of the CSF, variable sensitivity and different scales. The whole method of doing a Pearson-style correlation means that they are doing variance masking.

The issue of feeding back a perceptual delta into a non-perceptual compressor is an interesting one.

Say you have a compressor which does R-D optimization, but its D is just MSE. You can pretty easily extend that to do weighted R-D, with a weight either per-pixel or per-block. Initially all the weights are the same (1.0). How do you adjust the weights to optimize an external perceptual metric?

Assume your perceptual metric returns not just a number but an error map, either per pixel or per block.

First of all, what you should *not* do is to simply find the parts with large perceptual error and increase their weight. That would indeed cause the perceptual error at those spots to reduce - but we don't know if that is actually a good thing in an R-D sense. What that would do is even out the perceptual error, and evening it out is not necessarily the best R-D result. It is, however, an easy way to get the minimax perceptual error!


Algorithm Minimax Perceptual Error :

1. Initial W_i = 1.0
2. Run weighted R-D non-perceptual compressor (with D = weighted MSE).
3. Find perceptual error map P_i
4. W_i += lambda * P_i
    (or maybe W_i *= e^(lambda * P_i))
5. goto 2
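
In code, the minimax loop is something like this (just a sketch; the compressor and the perceptual metric are hypothetical callbacks, not any particular codec's API) :

#include <vector>
#include <math.h>

void MinimaxPerceptualWeights(
    int numBlocks, double lambda, int numIters,
    // steps 2-3 : runs the weighted R-D compress and fills the per-block perceptual error map
    void (*CompressAndMeasure)( const std::vector<double> & W, std::vector<double> & P ) )
{
    std::vector<double> W(numBlocks, 1.0);   // step 1 : initial weights
    std::vector<double> P(numBlocks, 0.0);

    for(int iter=0; iter < numIters; iter++)
    {
        CompressAndMeasure(W, P);

        for(int i=0; i < numBlocks; i++)
            W[i] *= exp( lambda * P[i] );    // step 4, the multiplicative variant
    }
}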

what about minimizing the total perceptual error?

What you need to do is transform the R/D(MSE) that the compressor uses into R/D(percep). To do that, you need to make W_i ~= D(percep)/D(mse).

If the MSE per block or pixel is M_i, the first idea is just to set W_i = ( P_i / M_i ) . This works well if P and M have a near-linear relationship *locally* (note they don't need to be near-linear *globally* , and the local linear relationship can vary from one location to another).

However, the issue is that the slope of P_i(R) may not be near linear to M_i(R) (as functions of rate) over its entire scale. Obviously for small enough steps, they are always linear.

So, you could run your compress *twice* to create two data points, so you have M1,M2 and P1,P2. Then what you want is :


P ~= P1 + ((P2 - P1)/(M2 - M1)) * (M - M1)

slope := (P2 - P1)/(M2 - M1)

P ~= slope * M + (P1 - slope * M1)

C := (P1 - slope * M1)

P ~= slope * M + C

I want J = R + lambda * P

use

J = R + lambda * (slope * M + C)
J = R + lambda * slope * M + lambda * C

so

W_i = slope_i

J = R + lambda * W * M

with the extra lambda * C term ; C is just a constant per block, so it can be ignored for R-D optimization on that block. If you care about the overall J of the image, then you need the sum of all C's for that, which is just :

C_tot = Sum_i (P1_i - slope_i * M1_i)

it's obviously just a bias to turn our fudged MSE-multiplied-by-slope into the proper scale of the perceptual error.

I think this is pretty strong, however it's obviously iterative, and you must keep the steps of the iteration small - this is only valid if you don't make very big changes in each step of the iteration.

There's also a practical difficulty that it can be hard to actually generate different M1 and M2 on some blocks. You want to generate both M1 and M2 very near the settings that you will be using for the final compress, because large changes could take you to vastly different perceptual areas. But, when you make small changes to R, some parts of the image may change, but other parts might not actually change at all. This gives you M1=M2 in some spots which is a divide by zero, which is annoying.
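
A rough sketch of turning the two runs into weights, with a guard for that degenerate case (the names are made up) :

#include <math.h>

double SlopeWeight( double M1, double P1, double M2, double P2, double prevW )
{
    const double eps = 1e-12;
    if ( fabs(M2 - M1) < eps )
        return prevW;                  // block didn't change between the two runs
    return (P2 - P1) / (M2 - M1);      // W_i = slope of perceptual error vs MSE
}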

In any case I think something like this is an interesting generic way to adapt any old compressor to be perceptual.

ADDENDUM :

Note that this kind of "perceptual importance map" generation cannot help your coder make internal decisions based on perceptual optimization - it can only affect bit allocation. That is, it can help a coder with variable bit allocation (either through truncation or variable quantization) to optimize the bit allocation.

In particular, a coder like H264 has lots of flexibility; you can choose a macroblock mode, an inter block can choose a movec, an intra block can choose a prediction mode, you can do variable quantizers, and you can truncate coefficients. This kind of iterative approach will not help the mode decisions to pick modes which have less perceptual error, but it will help to change the bit allocation to give more bits to blocks where it will help the perceptual error the most.


12-01-10 | Pratt

Pratt Arts Center in Seattle provides classes and equipment in various "fabrication" or "industrial" arts. They do ceramic, letterpress, glass blowing, cold glass working, forging, casting, stone carving, etc.

We did a weekend glass blowing class a while ago. It was a very powerful experience. You walk in and there's a whooshing sound like a jet engine from the air rushing through the furnaces. The heat and glow is immense. I'm normally just dying of boredom when teachers go over safety rules and shit like that, but the obvious seriousness of the hot glass made me rapt.

The really cool moment is when you've got a blob of molten glass on your rod and you're trying to blow it and work it and stretch it - I found myself going into this like trance. It's a beautiful kind of trance that I used to get from writing code, where you are concentrating so fully on one little thing that the rest of the world completely disappears, you get tunnel vision, and you don't know how much time is passing. We just made some goofy little hollow spheres, but after five minutes I was exhausted from the concentration.

It's a pretty unique place to be able to go and play with that stuff. There's at least an annual open house where anyone can stop by and see demos of all the stuff going on, it's worth doing.

Anyway, while I was there I was thinking about the weirdness of economies.

Many of the arts/crafts done at Pratt are fabrication skills that 50 years ago were done in factories in America to make commercial goods. Now that is all gone. You can still get most of those things, like hand-forged wrought iron or hand-blown glass, but it's now done by "artisans" instead of laborers, and it's some frufru thing that's only for the rich. Essentially it's now become much *more* expensive than it used to be, because we no longer have cheap skilled labor to do these things.

There's another weird thing. I can go to Cost Plus and buy a hand blown wine glass for $5-$10. Wine glasses are considered extremely difficult by glass "artists" ; it would cost $100-200 or something to get one made here; of course they don't even make them, because it's too difficult and you can't charge enough to make up for it (except for the morons who custom order wine glasses flecked with color or with unicorn bases or something). I imagine that the third world country laborers that are making the cheap blown shit that we buy are actually much more skilled at the craft of blowing glass - I mean, they must be able to just crank out perfect orbs over and over. For some reason in America we can't support local skilled craftsmen just making good stuff, but we can support people doing the crafts for "art", which basically involves intentionally doing it badly. Like you're trying to blow a nice big clear sphere, and you fuck up and get some random bits of color in it, and then you let it sag and deflate so it gets wrinkly and droops to one side, blammo now we can sell it. I dunno, I guess it makes sense, but the whole phenomenon of resurrecting dead trade skills as art is weird to me.


11-30-10 | Tchebychev

A little utility, and an example of how I use templates : (this is not meant to be efficient; it's for offline apps, not realtime use)

template < int n >
struct Tchebychev
{
    double operator () ( double x )
    {
        return 2.0 * x * Tchebychev < n-1>()(x) - Tchebychev < n-2>()(x);
    }
};

template < >
struct Tchebychev < 0 >
{
    double operator () ( double x )
    {
        return 1.0;
    }
};

template < >
struct Tchebychev < 1 >
{
    double operator () ( double x )
    {
        return x;
    }
};

double TchebychevN(int n,double x)
{
    switch(n) {
    case 0 : return Tchebychev<0>()(x); 
    case 1 : return Tchebychev<1>()(x); 
    case 2 : return Tchebychev<2>()(x); 
    case 3 : return Tchebychev<3>()(x); 
    case 4 : return Tchebychev<4>()(x); 
    case 5 : return Tchebychev<5>()(x); 
    case 6 : return Tchebychev<6>()(x); 
    case 7 : return Tchebychev<7>()(x); 
    case 8 : return Tchebychev<8>()(x); 
    case 9 : return Tchebychev<9>()(x); 
    NO_DEFAULT_CASE
    }
    return 0;
}

template < typename functor >
void FindTchebychevCoefficients(functor f,double lo = 0.0, double hi = 1.0, int N = (1<<20))
{
    double PI_over_N = PI/N;

    #define NUM_T   6

    double t[NUM_T] = { 0 };
    
    for(int k=0;k < N;k++)
    {
        double pk = PI_over_N * (k + 0.5);
        double x_k = cos(pk);
        double farg = lo + (hi - lo) * 0.5 * (x_k+1.0);
        double fval = f( farg );
        for(int j=0;j < NUM_T;j++)
        {
            t[j] += fval * cos( pk * j );
        }
    }
    for(int j=0;j < NUM_T;j++)
    {
        t[j] *= 1.0 / N;
        if ( j != 0 )
            t[j] *= 2.0;

        //lprintfvar(t[j]);
        lprintf("t %d: %16.9f , %16.8g\n",j,t[j],t[j]);
    }
    
    double errSqr[NUM_T] = { 0 };
    double errMax[NUM_T] = { 0 };
    for(int i=0;i < N;i++)
    {
        double xunit = (i+0.5)/N;
        double farg = lo + (hi - lo) * xunit;
        double xt = xunit*2.0 - 1.0;
        double y = f(farg);
        double p = 0.0;
        for(int j=0;j < NUM_T;j++)
        {
            p += t[j] * TchebychevN(j,xt);
            errSqr[j] += fsquare(y-p);
            errMax[j] = MAX( errMax[j] , fabs(y-p) );
        }
    }
    for(int j=0;j < NUM_T;j++)
    {
        lprintf("%d : errSqr = %g errMax = %g\n",j,errSqr[j]/N,errMax[j]);
    }
}

You can also very easily find inner products by using the Integrate<> template I posted earlier plus this simple product adaptor :


template < typename f1, typename f2 >
struct FunctorProduct
{
    // store the functors so that stateful ones (eg. RemapUnitTo) work,
    // and so MakeFunctorProduct (below) can construct from instances :
    f1 m_f1; f2 m_f2;

    FunctorProduct( const f1 & _f1, const f2 & _f2 ) : m_f1(_f1), m_f2(_f2) { }

    double operator () ( double x )
    {
        return m_f1(x) * m_f2(x);
    }
};

(eg. you can trivially find Hermite or Legendre series this way by doing inner products).

Another handy helper is a range remapper :


template < typename functor >
struct RemapFmTo
{
    double m_fmLo; double m_fmHi;
    double m_toLo; double m_toHi;
            
    RemapFmTo( 
            double fmLo, double fmHi,
            double toLo, double toHi )
    : m_fmLo(fmLo), m_fmHi(fmHi), m_toLo(toLo), m_toHi(toHi)
    {
    }
    
    double operator () ( double x )
    {
        double t = (x - m_fmLo) / (m_fmHi - m_fmLo);
        double y = t * (m_toHi - m_toLo) + m_toLo;
        return functor()(y);
    }
};

template < typename functor >
struct RemapUnitTo
{
    double m_toLo; double m_toHi;
            
    RemapUnitTo( 
            double toLo, double toHi )
    :  m_toLo(toLo), m_toHi(toHi)
    {
    }
    
    double operator () ( double x )
    {
        double y = x * (m_toHi - m_toLo) + m_toLo;
        return functor()(y);
    }
};

Now we can trivially do what we did before to find the optimal approximation in a known range :


    struct SqrtOnePlusX
    {
        double operator () ( double x )
        {
            return sqrt( 1 + x );
        }
    };


    RemapUnitTo< SqrtOnePlusX > tf(-0.075f,0.075f);
    FindTchebychevCoefficients( tf );

and the output is :

t 0:      0.999647973 ,       0.99964797
t 1:      0.037519818 ,      0.037519818
t 2:     -0.000352182 ,   -0.00035218223
t 3:      0.000006612 ,   6.6121459e-006
t 4:     -0.000000155 ,  -1.5518238e-007
t 5:      0.000000004 ,   4.0790168e-009
0 : errSqr = 0.000469204 errMax = 0.0378787
1 : errSqr = 5.78832e-008 errMax = 0.000358952
2 : errSqr = 2.12381e-011 errMax = 6.77147e-006
3 : errSqr = 1.18518e-014 errMax = 1.59378e-007
4 : errSqr = 8.23726e-018 errMax = 4.19794e-009
5 : errSqr = 6.54835e-021 errMax = 1.19024e-010

I guess Remez minimax polynomials are better, but this is so easy and it gives you a good starting point, then you can just numerically optimize after this anyway.

ADDENDUM :

obviously the TchebychevN dispatcher sucks and is silly, but you don't actually use it in practice; you know which polynomials you want and you use something like :


double approx = 
    c0 * Tchebychev<0>()(x) + 
    c1 * Tchebychev<1>()(x) + 
    c2 * Tchebychev<2>()(x) + 
    c3 * Tchebychev<3>()(x);

which the compiler handles quite well.

I also did Legendre polynomials :


template < int n >
struct Legendre
{
    double operator () ( double x )
    {
        double L_n1 = Legendre < n-1 >()(x);
        double L_n2 = Legendre < n-2 >()(x);
        const double B = (n-1)/(double)n; // n-1/n
        return (1+B)*x*L_n1 - B*L_n2;
    }
};

template< > struct Legendre<0>
{
    double operator () ( double x ) { return 1.0; }
};

template< > struct Legendre<1>
{
    double operator () ( double x ) { return x; }
};


// but ryg's loop variants are mostly just better :

__forceinline double legendre_ryg(int n, double x)
{
  if (n == 0) return 1.0;
  if (n == 1) return x;

  double t = x, t1 = 1, t2;
  for (int i=1; i < n; i++) {
    t2 = t1; 
    t1 = t;
    const double B = (i)/(double)(i+1);
    t = (1+B)*x*t1 - B*t2;
  }
  return t;
}

__forceinline double chebyshev_ryg(int n, double x)
{
  if (n == 0) return 1.0;
  if (n == 1) return x;

  double t = x, t1 = 1, t2;
  for (int i=1; i < n; i++) {
    t2 = t1;
    t1 = t;
    t = 2.0 * x * t1 - t2;
  }
  return t;
}

// you can find Legendre coefficients like so :

    RemapUnitTo< Legendre<0> > ShiftedLegendre0(-1,1);

    double c0 = Integrate(0.0,1.0,steps,MakeFunctorProduct(functor(),ShiftedLegendre0));
    // (don't forget normalization)

// using new helper :

template < typename f1, typename f2 >
FunctorProduct< f1,f2 > MakeFunctorProduct( f1 _f1 , f2 _f2 ) { return FunctorProduct< f1,f2 >(_f1,_f2); }


11-30-10 | Reference

You can do weighted least squares using a normal least squares solver. To solve :

A x = b 

in the least squares sense (that is, minimize |Ax - b|^2) , with weights W , you are just minimizing the error :

E = Sum_i { W_i * ( Sum_j {A_ij * x_j} - b_i )^2 }

E = Sum_i { ( Sum_j { sqrt(W_i) * A_ij * x_j} - sqrt(W_i) * b_i )^2 }

so set

A'_ij = A_ij * sqrt(W_i)
b'_i = b_i * sqrt(W_i)

then

E = Sum_i { ( Sum_j { A'_ij * x_j } - b'_i )^2 }

so we can just do a normal lsqr solve for x using A' and b'.
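
In code it's just a row scaling before calling whatever solver you already have; something like this (SolveLeastSquares is a stand-in name, not a real library call) :

#include <vector>
#include <math.h>

// stand-in for your ordinary (unweighted) least squares solver :
void SolveLeastSquares( const std::vector< std::vector<double> > & A,
                        const std::vector<double> & b,
                        std::vector<double> & x_out );

void WeightedLsqr( std::vector< std::vector<double> > A,   // copied so we can scale rows
                   std::vector<double> b,
                   const std::vector<double> & W,          // per-row weights W_i
                   std::vector<double> & x_out )
{
    for(size_t i=0; i < A.size(); i++)
    {
        double s = sqrt( W[i] );
        for(size_t j=0; j < A[i].size(); j++)
            A[i][j] *= s;                    // A'_ij = A_ij * sqrt(W_i)
        b[i] *= s;                           // b'_i  = b_i  * sqrt(W_i)
    }
    SolveLeastSquares( A, b, x_out );        // normal lsqr on A' and b'
}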


If you have a Gaussian random event, probability of x is P(x), the (natural) bits to code an event is -ln(P(x)) =


P(x) = 1/(sigma * sqrt(2pi)) * e^( - 0.5 * ((x-c)/sigma)^2 )

H = -ln(P(x)) = ln(sigma * sqrt(2pi)) + 0.5 * ((x-c)/sigma)^2

H = const + ln(sigma) + 0.5 * ((x-c)/sigma)^2

And bits to code events is a good measure of how well events fit a model, which is useful for training a model to observed events. In particular if we ignore the constants and the ln term,

total H = Sum_ij { (1/sigma_i^2) * (x_j - c_i)^2 }

the bit cost measure is just L2 norm weighted by one over sigma^2 - which is intuitive, matching the prediction of very narrow Gaussians is much more important than matching the prediction of very wide ones.
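
For reference, the full cost (constants included) as a tiny function, just a sketch :

#include <math.h>

double GaussianCodeCostNats( double x, double c, double sigma )
{
    const double kSqrt2Pi = 2.506628274631;          // sqrt(2*pi)
    double z = (x - c) / sigma;
    return log( sigma * kSqrt2Pi ) + 0.5 * z * z;    // H = -ln P(x)
}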


Now that I have autoprintf, I find this very handy :


    #define lprintfvar(var) lprintf(STRINGIZE(var) " : ",var,"\n");


11-29-10 | Useless hash test

Clocks to do the full hash computation and lookup on char * strings of english words :


hash functions :

 0 : FNV1A_Hash_Jesteress
 1 : FNV1A_HashStr_WHIZ
 2 : FNVHashStr
 3 : HashFNV1
 4 : MurmurHash2_Str
 5 : HashKernighanRitchie
 6 : HashBernstein
 7 : HashLarson
 8 : Hash17
 9 : Hash65599
10 : stb_hash
11 : HashWeinberger
12 : HashRamakrishna
13 : HashOneAtTime
14 : HashCRC

The hash functions are not given the string length, they must find it. For hashes that work on dwords the fast way to do that is using


#define DWORD_HAS_ZERO_BYTE(V)       (((V) - 0x01010101UL) & ~(V) & 0x80808080UL)

Hash functions are all inlined of course. The easy way to do that is to use macros to make functors for your hash functions and then pass them into a templated hash test function, something like this :


#define TEST_HT(hashfunc) \
  struct STRING_JOIN(htops_,hashfunc) : public hash_table_ops_charptr \
   { inline hash_type hash_key(const char_ptr & t) { return hashfunc( t ); } }; \
  typedef hash_table < char_ptr,Null,STRING_JOIN(htops_,hashfunc) > STRING_JOIN(ht_,hashfunc); \
  test_a_hash < STRING_JOIN(ht_,hashfunc)>(STRINGIZE(hashfunc),words,queries);

TEST_HT(FNV1A_HashStr_WHIZ);
TEST_HT(FNVHashStr);

This is using cblib::hash_table which is a reprobing hash table, with the hash values stored in the table, quadratic reprobing, and with special key values for empty & deleted (not a separate flag array). cblib::hash_table is significantly faster than khash or std::hash_map (though it takes more memory than khash and other similar hash tables due to storing the hash value). (BTW this was studied before extensively; see earlier blog posts; the four major deviations from khash are all small wins : 1. don't use separate flags because they cause an extra cache miss, 2. use cache-friendly quadratic reprobe, not rehashing, 3. use pow2 table size not prime, 4. store the hash value).

The four tests in the chart above are :


blue:   StringHash_Test("M:\\CCC\\canterbury\\LCET10.txt","M:\\CCC\\canterbury\\PLRABN12.txt");
yellow: StringHash_Test("m:\\ccc\\book1","m:\\ccc\\book1");
red:    StringHash_Test("M:\\CCC\\words\\Sentence-list_00,032,359_English_The_Holy_Bible.txt","M:\\CCC\\words\\Word-list_00,038,936_English_The Oxford Thesaurus, An A-Z Dictionary of Synonyms.wrd");
green:  StringHash_Test("m:\\ccc\\large\\enwik8","m:\\ccc\\book2");

make hash from words in first arg
lookup words in second arg

(generally second file is much smaller than first, so we get some hits and some misses)

Anyway, the result is that the Whiz and Jesteress variants of FNV1 by Georgi 'Sanmayce' are in fact quite good in this kind of usage. They rely on having fast unaligned dword reads, so they are pretty much Intel-only, which makes them a bit pointless. But here they are :


U32 HashFNV1(const char *key) 
{
    U32 hash = 2166136261;
    while(*key)
        hash = (16777619 * hash) ^ (*key++);
    return hash;
}

inline uint32 FNV1A_HashStr_WHIZ(const char *str)
{
    const uint32 PRIME = 709607;//15607;
    uint32 hash32 = 2166136261;
    const char *p = str;

    for(;;)
    {
        uint32 dw = *(uint32 *)p;
        if ( DWORD_HAS_ZERO_BYTE(dw) )
            break;

        hash32 = (hash32 ^ dw) * PRIME;
        p += sizeof(uint32);
    }
    while( *p )
    {
        hash32 = (hash32 ^ *p) * PRIME;
        p++;
    }
    
    return hash32;
}

U32 FNV1A_Hash_Jesteress(const char *str)
{
    const U32 PRIME = 709607;
    U32 hash32 = 2166136261;
    const char *p = str;

    for(;;)
    {
        uint32 dw1 = *(uint32 *)p;
        if ( DWORD_HAS_ZERO_BYTE(dw1) )
            break;
        
        p += 4;
        hash32 = hash32 ^ _lrotl(dw1,5);
        
        uint32 dw2 = *(uint32 *)p;
        if ( DWORD_HAS_ZERO_BYTE(dw2) )
        {
            // finish dw1 without dw2
            hash32 *= PRIME;
            break;
        }
        
        p += 4;
            
        hash32 = (hash32 ^ dw2) * PRIME;        
    }
    
    while( *p )
    {
        hash32 = (hash32 ^ *p) * PRIME;
        p++;
    }
    
    return hash32;
}

I have no doubt these could be massaged to be a bit faster through careful study of the asm (in particular the handling of the 0-3 byte tail doesn't look awesome). An even better thing would be to ensure all your strings are 4-byte aligned and that after the null they have null bytes to fill the final dword.
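
For example, something like this at load time gives you aligned storage and guarantees the dword reads can never run past the end of the allocation (just a sketch, not from cblib) :

#include <stdlib.h>
#include <string.h>

char * MakeAlignedPaddedCopy( const char * str )
{
    size_t len = strlen(str);
    size_t padded = (len + 4) & ~(size_t)3;   // room for the null plus zero fill out to a dword boundary
    char * p = (char *) malloc( padded );     // malloc is at least 4-byte aligned on typical platforms
    memcpy( p, str, len );
    memset( p + len, 0, padded - len );       // null terminator + zero padding
    return p;
}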

Anyway, you can pretty much ignore all this, because hash functions have to be tested in-situ (eg. on your exact data, on your hash table implementation, on your platform), but it is what it is.

BTW FNV1 is a lot faster than FNV1a. Also the mixing of Whiz and Jesteress is quite poor and they do have a lot more collisions on some values, but that appears not to matter much on this particular kind of test.


11-20-10 | Function approximation by iteration , Part 3 : Reciprocal Square Root

See part 1 or part 2 .

Now that we have our toolbox, we can attack a popular issue in 3d engines - the reciprocal square root (used primarily in normalizing vectors). We want to find y = 1 / sqrt(x) .

Let me first state that I'm not doing all this because it's actually important for optimization. If you care about speed all you have to do is set :


 /arch:SSE2 /fp:fast

and get on with your life. Worrying about this issue is a clear case of pointless micro-optimization. I'm doing this for the mathematical curiosity. Nevertheless, lots of people continue to worry about this :

Some Assembly Required - Timing Square Root
ompf.org • View topic - SSE Patterns
Ken Turkowski's Computer Graphics Papers
Inverse Square Root ECE 250 University of Waterloo
HLNUM.ORG Documentation RSqrt
hlnum frsqrt
High order algorithms to find square roots and inverses
computation of square root & reciprocal calculation

To do reciprocal square root we're going to start with y0 from the IEEE floating point trick. In games this is commonly known as "Carmack's trick from Quake 3" , but it goes back to Greg Walsh apparently :

Beyond3d part 1
Beyond3d part 2

Also on the Slashdot page about that article Paul Hsieh mentions that apparently Kahan knew of this idea, but doesn't know the time frame exactly.

BTW Chris Lomont has the best paper about this IEEE trick for recipsqrt, but the Wikipedia page is a good start

So the initial approximation is :


const int magic = 0x5f375a86;

inline float rsqrtf_walsh( float x )
{
    union { int i; float f; } u;
    u.f = x;
    u.i = magic - (u.i>>1);
    return u.f;
}

(more on the magic number later, but see Lomont for an explanation of where it comes from). (BTW I don't mean to imply that it's "magic" by using that name; it's a standard IEEE bit trick).

This has rather large error so we are going to use this as y0 and find an iteration for y1.

We can now use our dimensional analysis method. Our unitless number must be


u = x * y0*y0

and our iteration must be :

y1 = y0 * f(u)

and we'll look at a few possible forms for f :

f1 = A + B * u

f2 = A + B / u

f3 = A + B * (1-u) + C * (1-u)^2

we can confirm that these forms are sane by finding them in other ways.

If we set f(y) = 1/y^2 - x , then the root f(y) = 0 occurs where y = 1/sqrt(x) , so we can use Newton's method to find the roots of f(y). The result is :


y1 = y0 * ( (3/2) - 1/2 x y0^2 )

or :

f1 , with A = 3/2 , B = - 1/2

If we set f(y) = y^2 - 1/x , the root is in the same place and we can use Newton's method and we get :

y1 = 0.5 * ( y0 + 1 / (x*y0) )

or :

f2 , with A = B = 1/2

If we use the first f(y) form and do a "Halley" or "Householder" iteration instead of a Newton iteration, that's form f3 with A = 1 , B = 1/2, C = 3/8.


(maximum % relative error) :

y0 : (no iteration) : 3.436526

y0 + one Newton     : 0.175126  (f1 with A = 3/2 , B = - 1/2)

y0 + two Newtons    : 0.000470

y0 + one Babylonian : 0.060595  (f2 with A = B = 1/2)

y0 + one Halley     : 0.010067  (f3 with A = 1 , B = 1/2, C = 3/8)

obviously these methods do not have equal computational complexity. In particular, the Babylonian method uses a divide, so it's much slower than the other methods. A serious study of function approximation should compare the quality to the CPU cost to make sure the method is at a good place on the speed/quality curve.

But of course we know the punch line is coming : we don't need to use those values for ABC. Because the initial estimate y0 has a known error range, we can optimize ABC over that range. We won't bother with doing it analytically, we'll just numerically search. The result is :


maximum percent relative error :

y0 : (no iteration) : 3.436526

y0 + one Newton     : 0.175126  (f1 with A = 3/2 , B = - 1/2)
method f1 optimal   : 0.087665  ( A = 1.501338 , B = -0.500461 )

y0 + one Babylonian : 0.060595  (f2 with A = B = 1/2)
method f2 optimal   : 0.030292  ( A = B = 0.499848 )

y0 + one Halley     : 0.010067  (f3 with A = 1 , B = 1/2, C = 3/8)
method f3 optimal   : 0.002523  ( A = 1.0 , B = 0.5011, C = 0.375608 )

by optimizing the coefficients we get roughly a 2X reduction of maximum error *for free* ! (4X for Halley - because it's quadratic; two newton steps would also get a 4X improvement). It's no more CPU work because a float constant multiply is a float constant multiply. We're just using better constants. As noted in the last post - these constants are no longer an iteration that you can repeat over and over, they are optimized for the first step only.
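
For the curious, the numerical search can be as dumb as a brute-force grid scan; something like this (illustrative only - not the exact optimizer used for the numbers above, and the scan range and steps here are arbitrary) :

#include <math.h>

static float rsqrt_y0( float x )    // the bit-trick seed from above
{
    union { int i; float f; } u; u.f = x;
    u.i = 0x5F37592F - (u.i>>1);
    return u.f;
}

void SearchAB( double & bestA, double & bestB )
{
    bestA = 1.5; bestB = -0.5;      // start from the Newton step
    double bestErr = 1e30;

    for(double A = 1.49; A <= 1.51; A += 0.0002)
    for(double B = -0.51; B <= -0.49; B += 0.0002)
    {
        double maxErr = 0;
        // the error pattern of the bit trick repeats every factor of 4 in x,
        // so scanning one period is enough :
        for(double x = 1.0; x < 4.0; x += 0.001)
        {
            double y0 = rsqrt_y0( (float)x );
            double y1 = y0 * ( A + B * x * y0 * y0 );
            double err = fabs( y1 * sqrt(x) - 1.0 );    // relative error vs 1/sqrt(x)
            if ( err > maxErr ) maxErr = err;
        }
        if ( maxErr < bestErr ) { bestErr = maxErr; bestA = A; bestB = B; }
    }
}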

Lastly, let's go back to that magic number 0x5f375a86 that was used in the y0 approximation. Can we do better if we optimize that? The answer is basically no. (* see addendum)


In Quake3 Carmack used 0x5f3759df ; Carmack + 1 Newton has a max relative error of 0.175214 %

In his paper Lomont finds 0x5f375a86 is slightly better ; Lomont + 1 Newton gives 0.175129 %

but that level of tweakage is pretty irrelevant because they missed the big tweak, which is using an optimal iteration instead of a Newton iteration :

inline float rsqrtf_cb_method1( float x )
{
    const int magic = 0x5F37592F;
    const float A = 1.501338f;
    const float B = -0.500461f;

    union { int i; float f; } u;
    u.f = x;
    u.i = magic - (u.i>>1);
    float y0 = u.f;

    float Bx = B * x;
    
    return y0 * ( A + Bx * y0 * y0 );
}

rsqrtf_cb_method1 has a relative error of 0.087714 %

the magic number I happened to use is different than Lomont or Carmack, but it's actually irrelevant - you can use anything in 0x5f375000 - 0x5f376000 and you will get the same final error (but you will get slightly different values for A and B). The point is that the bias from the magic number can be balanced by optimizing A or B very effectively. So you don't have to worry about it - just get the magic number vaguely close. (* see addendum)

To demonstrate that this iteration should not be repeated, let's change :


    return y0 * ( A + Bx * y0 * y0 );

to

    float y1 = y0 * ( A + Bx * y0 * y0 );
    return y1 * ( A2 + B2 * x * y1 * y1 );

now the errors for two steps are :

first step is cb_method1             : 0.087714 %

repeat same step ( A2 = A , B2 = B ) : 0.087716 %  (!!)

second step Newton (A2= 3/2, B2=-1/2): 0.000127 %

second step optimal                  : 0.000072 %  (A2 = 1.500000596f , B2 = -0.500000060f)

You should see obviously that the closer you get to the answer, the less optimizing helps, because the Newton iteration (which is based on Taylor series) becomes optimal as epsilon goes to zero.

Now before you get excited and go copy-pasting rsqrtf_cb_method1 around, let me stop you. As I said at the beginning, it's pointless because you have :


inline float rsqrtf_sse( float x )
{
    return _mm_cvtss_f32( _mm_rsqrt_ss( _mm_set_ss( x ) ) );
}

with max relative error : 0.032585%

One Newton step after this takes you to 0.000029% error ; optimizing the iteration only gets you to 0.000026% so it's very pointless.

ADDENDUM :

Jan Kadlec (rrrola) posted a comment pointing to his prior work on the topic . He finds a better initial magic number and then follow-up float constants by doing a more rigorous search. What I wrote at (*) about the magic number not mattering wasn't quite right. I was doing greedy searches in the optimization, and got stuck in a local minimum. You have to have quite a good non-greedy searcher to find his solution. The exact value of it doesn't matter much, but there is this much better solution if you make a big jump to another neighborhood :

Copy-pasting rrrola's version for reference , with 0.0652 % max relative error.


float inv_sqrt_rrrola(float x)
{
  union { float f; uint32 u; } y = {x};
  y.u = 0x5F1FFFF9ul - (y.u >> 1);
  return 0.703952253f * y.f * (2.38924456f - x * y.f * y.f);
}

Also, something you can perhaps help me with : Pizer had done even earlier work on this topic and roughly found the right (right = Kadlec) constant by finding the magic number that minimizes (1 + e_max)/(1 + e_min). Doing that he gets 0x5F1F1412.

If we say ratio = approx/correct , then what Pizer is minimizing is ratio_max/ratio_min , which is the same as minimizing log(ratio_max) - log(ratio_min) . I can't reproduce Pizer's result and I'm not sure why that would be the thing you want to minimize. Intuitively it does make sense that what you care about is really *log* relative error, not relative error. That is, going +100% is clearly better than going -100% in a relative sense (this is sort of like the fact that doing -100% in the stock market is much worse than doing +100% is good; as I wrote long ago, we should really talk about log-return).

I guess one way to see it is if you think about e as a floating point number; what we really want is for the error of the approximation to show up as far to the right of the decimal as possible. The number of binary digits to the right of the decimal is the log of the error. But that would just mean minimizing log( MAX(e_max,-e_min) ) so something is missing in my understanding of this bit.


11-20-10 | Function approximation by iteration , Part 2

See part 1 .

Let's get to an example. We want y = sqrt(x) , but without actually doing a square root.

You can get an iteration for this in various ad-hoc ways. Let's see one real quick :


y = sqrt(x)

square it

y^2 = x

divide by y

y = x/y

add y

2y = y + x/y

y = 0.5*(y + x/y)

y_1 = 0.5*(y_0 + x/y_0)

and that's an iteration for sqrt. BTW in all these notes I'll be writing y_0 and y_1 as two steps, but I really mean y_n and y_n+1.

You can get this in another funny way :


y = sqrt(x)

y = sqrt( y * (x/y) )

y = geometric_mean( y , (x/y) )

y_1 = arithmetic_mean( y_0, (x/y_0) )

and you can see that it converges because if y is bigger than sqrt(x), then x/y is smaller than sqrt(x), so the mean must be closer to the answer.

But these are sort of ad-hoc weird ways of deriving the iteration. Let's do it in a more formal way which will be more useful in the general case. (BTW this is called the "Babylonian method for sqrt" and you can also get it trivially from Newton's method).

Say we have y_0 which is an approximation of sqrt(x) that we got some other way. It's close to the real answer y = sqrt(x) , so we write :


y = sqrt(x)

y^2 = y_0^2 * (1 + e)

or

y_0 = y / sqrt(1 + e)

e = "epsilon" is small

y_0^2 * (1 + e) = y^2 = x

1+e = x/y_0^2

e = x/y_0^2 - 1

the choice of how I've factored e out from (y/y_0) is pretty arbitrary, there are lots of choices about how you do that and all are valid. In this case the important thing is that e multiplied on y_0, not added (eg. I did not choose y^2 = y_0^2 + e).

So far this has all been exact. Now we can write :


y = sqrt(x) = sqrt( y_0^2 * (1 + e) ) = y_0 * sqrt( 1 + e )

exactly :

y = y_0 * sqrt( 1 + e)

approximately :

y ~= y_0 * approx( sqrt( 1 + e) )

so this is our iteration :

y_1 =  y_0 * approx( sqrt( 1 + e) )

The principal step in these kinds of iterations is to take a twiddle-equals for y (an approximation of y) and make that an exact equals (assignment) for an iteration of y_n.

Now the only issue is how we approximate sqrt( 1 + e). First let's do it with a Taylor expansion :


approx( sqrt( 1 + e) ) = 1 + 0.5 * e + O(e^2)

y_1 = y_0 * ( 1 + 0.5 * e )

y_1 = y_0 * ( 1 + 0.5 * (x/y_0^2 - 1) )

y_1 = y_0 * ( 0.5 + 0.5 * x/y_0^2 )

y_1 = 0.5 * ( y_0 + x / y_0 )

which is the Babylonian method.

But as we discussed last time, we can do better. In particular, if we know the error bound of y_0, then we know the range of e, and we can use a better approximation of sqrt( 1 + e). Specifically if we use the IEEE float-int kind of method for finding y_0 :


float y_0(float x)
{
    union { int i; float f; } u;
    u.f = x;
    const int ieee_sqrt_bias = (1<<29) - (1<<22) - 301120;
    u.i = ieee_sqrt_bias + (u.i >> 1);
    return u.f;
}

Then the maximum relative error of y_0 is 3.53%. That means (1 - 0.0353) < |y_0/y| < (1 + 0.0353)

(1 - 0.0353) < 1/sqrt( 1 + e) < (1 + 0.0353)
(1 - 0.0353)^2 < 1/( 1 + e) < (1 + 0.0353)^2
(1 - 0.0353)^-2 > ( 1 + e) > (1 + 0.0353)^-2
(1 - 0.0353)^-2 -1 > e > (1 + 0.0353)^-2 -1
0.07452232 > e > -0.06703023

we'll expand the range and just say |e| <= 0.075 ; so instead of just using a Taylor expansion that assumes e is small, we can minimize the maximum error over that range.

We do that by expanding in the basis {1} and {x} (if you like this is the orthogonal Legendre or Tchebychev basis, but that's kind of silly to say when it's two functions). To find the optimal coefficient you just do the integral dot product with sqrt(1+e) over the appropriate range. The result is :


sqrt(1+e) ~= 0.999765  + 0.037516 * ( e/0.075 )

sqrt(1+e) ~= 0.999765  + 0.5002133 * e

y_1 = y_0 * ( 0.999765  + 0.5002133 * e )

y_1 = y_0 * ( 0.999765  + 0.5002133 * (x/y_0^2 - 1) )

y_1 = y_0 * ( 0.4995517  + 0.5002133 * x/y_0^2 )

y_1 = 0.4995517 * y_0 + 0.5002133 * x/y_0

We'll call this the "legendre" step for sqrt. Note that this is no longer an iteration that can be repeated, like a Newton iteration - it is specific to a known range of y_0. This should only be used for the first step after getting y_0.
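
The "integral dot product" step is easy to do numerically; something like this (a sketch that projects sqrt(1+e) onto the basis {1,e} over |e| <= 0.075) :

#include <math.h>
#include <stdio.h>

void FitSqrtOnePlusE()
{
    const double r = 0.075;
    const int N = 1000000;
    double sum_f = 0, sum_ef = 0, sum_ee = 0;

    for(int i=0;i < N;i++)
    {
        double e = -r + 2*r * (i + 0.5) / N;
        double f = sqrt( 1.0 + e );
        sum_f  += f;
        sum_ef += e * f;
        sum_ee += e * e;
    }

    double A = sum_f / N;        // projection onto {1} , comes out ~ 0.999765
    double B = sum_ef / sum_ee;  // projection onto {e} , comes out ~ 0.500213

    printf("sqrt(1+e) ~= %f + %f * e\n",A,B);
}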

And what is the result ?


max relative error :

y_0 ieee trick : 3.5276%

y_1 babylonian : 0.0601%

y_1 legendre   : 0.0389%

a nice improvement!

Now that we understand the principles behind how this works, we can just get a bit more hacky and find the answers more directly.

We want to find an iteration for sqrt(x) , and we have some y_0. Let's use a physics "dimensional analysis" kind of argument. We pretend x has units of "Widgets". Then sqrt(x) has units of Widgets^1/2 , as does y_1. We only have x and y_0 to work with, so the only unitless thing we can write is


u = y_0^2/x

(or any power of u), and our iteration must be of the form :

y_1 = y_0 * f(u)

for some function f

now , f can be any function of u, but to minimize the difficulty of the math we specifically choose

f(u) = A * (1/u) + B + C * u

now the issue is just finding the A,B,C which minimize the error.

The Babylonian/Newton iteration is just A = B = 0.5 (C = 0)

Numerical optimization tells me the optimal C is near 0, and optimal AB are near A = B = 0.499850


max relative error :

y_0 ieee trick : 3.5276%

y_1 babylonian : 0.0601%

y_1 legendre   : 0.0389%

y_1 numerical  : 0.0301%

the result is half the error of the babylonian method, for no more work - just a better choice of constant (0.499850 instead of 0.5).
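
Putting the pieces together, the whole approximation looks something like this (the function name is mine; no handling of zero or denormals) :

inline float sqrt_approx_opt( float x )
{
    union { int i; float f; } u;
    u.f = x;
    const int ieee_sqrt_bias = (1<<29) - (1<<22) - 301120;
    u.i = ieee_sqrt_bias + (u.i >> 1);
    float y0 = u.f;                                // ~3.53% max relative error

    return 0.499850f * y0 + 0.499850f * x / y0;    // ~0.0301% max relative error
}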


11-19-10 | Games without Strings

Quite a number of major video games have shipped with embarrassingly large amounts of time spent in "strcmp" and embarrassingly large amounts of memory spent on strings.

I'm of the opinion that games should not ship with strings. However, strings are very useful for development. How should we reconcile this? The basic strategy in all cases will be to replace the string with just an integer GUID or index. Then instead of strcmp you just do == test on the ints, and instead of storing lots of strings in memory, you just have ints to refer to other objects.

First of all, let's say that the premature optimization of making all your references into compact indices from the beginning is a mistake IMO. The main place that strings come up and where they are super useful is as the names of content. That is, textures, models, prefs, NPC types, weapon types, levels, attributes, etc. If I want to make an NPC it's nice to just be able to say, okay use the "ToughMarine" model and give him "AggressiveEnemy" behavior and "BigShotgun" weapon. Often those strings correspond to file names, either directly or indirectly, which is very handy because it lets you add new elements by just making new files, and it lets you find the content that a string corresponds to.

The bad premature optimization that I'm talking about is to work with "databases" of content right from the start rather than loose files. Then your references can be indices. So instead my NPC would use model 7, behavior 3, and weapon 41. This makes your references fast and compact right from the start, but the penalties to development process are just too severe. (some of the old game engines did this, but I believe it is now understood in the industry that this is very bad for source control and large teams and is being phased out).

So, we don't want that method, we want to work with strings during development, but ship without strings.

( some examples of unacceptable workflow in my opinion : when artists have to yell out "who has the texture name database checked out? I have to add a new string to it" , or when there's an error in the content hookup and the game prints out "error : npc 13 couldn't find model 9" (though, kudos for at least printing an error) )

One option is to have a "bake" step that converts all the loose content and string references into these packed databases and indexes. Basically it has to load up every piece of content, see what all the strings in the game are and convert them to an index, then order all the data by its index so you can find it. While this does work, it's pretty painful. Whole-game bake operations like this are pretty disastrous for development, because it means you can't test the real shipping game without doing some big time consuming process. I'm a big believer in being able to run at least portions of the game in their real shipping state all the time during development, not just at the very end. It makes it hard to just change one piece of content or add something and have.

Another option is to have a sort of "incremental baking" from a GUID server. I've seen this work in practice, so it is viable, but it's a little scary. Basically the idea is you keep a central string table that maps unique strings to indices (and vice versa). Each time you make a new piece of content or make a new string, you have to send a packet to the GUID server and get the index for that string. Now you can refer to your piece of content by that index, so you have compact references. While this does certainly work, relying on being able to communicate with the GUID server for development is a bit scary. Also if you ever accidentally get a bug in the GUID system you could corrupt a bunch of content. To deal with both of those issues, it would be nice to keep the original strings around in the content files during development as backup for the central repository.

The option we used for Oddworld Stranger was string hashing. In the shipping game, every 32 bit char * was replaced with a 32 bit integer hash of the char *. Using a hash makes the string -> index mapping deterministic and local, you don't have to talk to your neighbor to make sure you are getting a unique int. This method has lots of advantages and only one (rather large) disadvantage : the possibility of hash collisions. When you get a hash collision you wind up with a perplexing but amusing bug, like you try to put "SinisterHelmet" on a guy and instead you get "RubberChicken" attached to his head because those string hashes collided. During development you need to keep both the hash and the string around.

To handle collisions, we had a debug mode that would load up a level with the strings and the hashes and check if any strings had the same hashes. Note that you don't need to check collisions across the whole game, but only for content that can possibly be loaded at the same time. On Stranger I think we wound up with 2 collisions over the whole course of development. The solution to those collisions was simply to rename some content. eg. "FloppyHat" was colliding with "PurpleSparkle" , so we just renamed it to "PurpleSparkle2" and we were hash collision free. I definitely do not advocate the 32-bit hash method for large scale game development. It's okay on a small team where you can check things like that once in a while and handle it, but with big distributed teams and huge amounts of content it's not viable.

The simplest fix is to go to a 64-bit hash. Then the probability of collision becomes so small that you could just deal with it in the unlikely event that it happens. Note that in the shipping game you never actually generate these hashes, so the speed of the hash function is irrelevant; in the final game they are opaque GUIDs that are used to link pieces of content together.
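
For concreteness, a minimal sketch of what that could look like, assuming a 64-bit FNV-1a as the hash (any decent 64-bit hash works; as noted, hash speed doesn't matter since the shipping game never computes it) and a hypothetical SHIPPING define :

// sketch : 64-bit content GUID from a name string (FNV-1a chosen arbitrarily)
#include <stdint.h>

static uint64_t ContentHash(const char * str)
{
    uint64_t h = 14695981039346656037ull;   // FNV-1a offset basis
    while ( *str )
    {
        h ^= (uint64_t)(unsigned char)(*str++);
        h *= 1099511628211ull;               // FNV-1a prime
    }
    return h;
}

// during development keep both; in the shipping game only the hash survives
struct ContentRef
{
    uint64_t     hash;
#ifndef SHIPPING
    const char * debugName;  // dev-only : error messages, collision checking
#endif
};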

With the hash+string method there's also an issue of how you store your content and what you do with the string in the shipping version of the game. What we did is just to store both inline in the data files (eg. 4 bytes of hash then the string bytes immediately following), and in the shipping game you just read the string in then immediately discard it. But if you want to be really super optimal and flat-load your data, then you would have to factor out the strings at some point. One option is to make all your data files in chunks, and make one of the chunks be "nonfinal" data, and put the hash -> string mapping in the nonfinal data chunk. Then you can "flat load" the rest of the data.
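
A rough sketch of the read side under that layout ; this assumes a null-terminated string right after the 4-byte hash, and Reader/ReadU32/ReadChar are just stand-ins for whatever stream API you actually have :

// sketch : read a reference stored as [u32 hash][name string][0]
// dev builds keep the name for error reporting; ship builds read it and throw it away
uint32_t ReadContentRef(Reader & r, char * devNameOut, int devNameSize)
{
    uint32_t hash = r.ReadU32();
    int i = 0;
    for (char c = r.ReadChar(); c != 0; c = r.ReadChar())  // consume the string either way
    {
        if ( devNameOut && i < devNameSize-1 ) devNameOut[i++] = c;
    }
    if ( devNameOut && devNameSize > 0 ) devNameOut[i] = 0;
    return hash;
}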

But none of these solutions is definitively awesome, so I'm curious what other people do or if there is a better solution.


ADDENDUM : I knew I'd seen this written about before, but forgot about it.

Mick West wrote about it for Game Developer but doesn't really address the difficult issues.

Ivo Beltchev describes CRCView - a VC extension to do GetStr() for CRC's. I wonder how well that works; I've always found the VC expression evaluator to stop working right when I need it most.

Mischief.mayhem puts it together. He suggests using an external SQL database for GUID->String mapping and collision resolution.

BTW I'm not sure why you bother with hashing if you are using a global database. Just make the first string be "1", the second string be "2", etc. But I'm not super convinced that the global database is an okay method.


11-19-10 | Hashes and Cache Tables

A thread at encode.ru has poked me into writing some things I've been perking on for a while.

That thread points to a hash function test at strchr.com :

Hash functions An empirical comparison - strchr.com

At first this test looks pretty cool, but on further examination it has a lot of flaws. For one thing, they're basically testing only english words. If what you care about is short english text, then maybe the results are good, but if you care about other things they are not. I do like the fact that they are testing the total time to compute the hash + the time to lookup. However, when you do that then the details of the hash table become crucial, and it seems they are using an externally-chained hash table which is a severe bias (I'd like to see tests using stlport std::hash_map, rdestl, khash, google's "dense hash", or the cblib hash_table, etc. - some standard reference hash implementations). You also need to talk about your hash fill ratio, whether you store the hash value, whether the data is interned, etc. These will affect which hash function is "best".

There are also major architectural issues. For example, the large multiplies of Murmur are not fast on all current platforms. Furthermore, on some platforms you can't do unaligned DWORD reads fast, and you may have to do endian swaps. These issues do not affect all hashes in the same way, so to compare hashes without considering these issues is a mistake.

Also, to do a fair comparison you really need to understand the details of each hash. Some are designed for use only in prime-size tables. Some should be collapsed by xor folding, others should not. You cannot simply plug a bunch of different hash functions into the same code and have a fair comparison.

But perhaps most crucially, the hashes are not optimized to the same amount. Some are unrolled, others are not. Some are reading dwords, some are reading bytes. This is not a fair comparison. Really the only way to do a perfect comparison is to optimize each hash function as much as possible, because hashes have a different degree of capacity for optimization. For example, Adler and Fletcher (which are checksums not hashes, BTW) can easily be made fully 16-byte parallel (see for example asmcommunity Adler32 ; there are also some very good Altivec versions floating around), most hashes cannot be directly parallelized the same way (they can only be "trivially" parallelized in the sense of running multiple hashes at once in parallel on interleaved input streams).

(ADDENDUM : actually the biggest problem with the test is probably that the hash functions are called through function pointer, not inlined; this is a severe penalty for the cheaper hash functions; in particular with properly inlined hash functions I ran a similar test and found FNV to just destroy Murmur, which is the opposite of what they find)

Anyway, enough being a grouch, let's get to what I've been thinking about.

One piece of news : there's a (beta) update to Murmur2 called (unsurprisingly) Murmur3 :

MurmurHash3 information and brief performance results

Murmur3 improves the basic mixing a bit to fix some flaws in Murmur2. It also works on larger data elements, which is a mixed blessing. That will make throughput on large keys faster, but hurts performance on small keys. One of the nice things about Murmur2 was how simple it was.

And some links from the past for reference :

What I wrote previously about hash table lookup :

cbloom rants 10-17-08 - 1
cbloom rants 10-19-08 - 1
cbloom rants 10-21-08 - 4
cbloom rants 10-21-08 - 5

My final answer for hashing strings was a variant of FNV, hash value stored in the table, max 70% full, quadratic reprobing.
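
As a concrete illustration of that recipe (not the actual cblib code, just a sketch along the same lines ; HashString() is whatever FNV variant you like, the table size is a power of two, and the insert path is assumed to keep the fill under ~70%) :

// sketch : open-addressed string hash table, hash stored per slot, quadratic reprobe
#include <stdint.h>
#include <string.h>
#include <stddef.h>

uint32_t HashString(const char * key);   // assumed : some 32-bit FNV variant

struct Slot { uint32_t hash; const char * key; void * data; };  // hash==0 means empty

void * Lookup(const Slot * table, uint32_t tableSize, const char * key)
{
    uint32_t h = HashString(key);
    if ( h == 0 ) h = 1;                          // reserve 0 for "empty"
    uint32_t mask = tableSize - 1;
    uint32_t i = h & mask;
    for (uint32_t probe = 1; probe <= tableSize; ++probe)
    {
        if ( table[i].hash == 0 ) return NULL;    // hit an empty slot -> not present
        if ( table[i].hash == h && strcmp(table[i].key,key) == 0 )
            return table[i].data;                 // stored hash rejects most mismatches cheaply
        i = (i + probe) & mask;                   // triangular steps = quadratic reprobe
    }
    return NULL;
}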

File hashing :

cbloom rants 08-21-10 - Adler32

My final answer for file hashing was Bob Jenkins' Lookup3, modified to be 4-way SIMD. Since Lookup3 inherently works on 12 bytes at a time, the 4-way SIMD makes it work on 48 bytes at a time, and I then unroll that 8 times so that I can work on 384 bytes at a time (which is crucially 3 whole 128 byte cache lines).

So anyway. When you're talking about hashes it's important to distinguish what you're using them for. There are lots of variations, but I think there are three primary categories :

1. Hash for hash table lookup (with full resolution, either by in-table probing or external chains).

In this case, collisions are only bad in that they cause you to take more CPU time to probe. You will never fail to find what you are looking up due to collisions. There's a very subtle trade off here between hash quality and the time it takes to compute the hash - the winner will depend intricately on the exact data you are hashing and what kind of hash table implementation you are using. I'm sure this is one of those scenarios where different authors will all claim a different hash function is "best" because they were testing in different scenarios.

2. File checksum/integrity/corruption/hacking testing.

In this case you want the hash function to change any time the file has been changed. There are then two subsets of this : catching accidental corruption and catching intentional corruption. To check the ability to catch accidental corruption, you want to make sure your hash is robust to common types of damage, such as file truncation, injection of zeros, single bit flips, etc. Catching intentional corruption protects you from attacks. We're not talking about cryptographic security, but a good hash should make it very difficult to construct alternative data which is both *valid* and has the same hash.

3. Hashes for "cache tables". This is the main thing I want to talk about.

A "cache table" is a hash table in which collisions are not resolved (or they are resolved only some finite number of times). So far as I know there's not a standard term for this, but we found this Google Code project :

cache-table - Project Hosting on Google Code

And I think "cache table" is a pretty good name for it. It's similar to CPU caches, but without a backing store. That is, a traditional "cache" is basically a "cache table", but then if the data is not found in the cache, you go and get it out of the slow location. We say "cache table" to make it clear that in this case there is no slow location - if it's not in the cache you don't find it.

Cache tables are very popular in data compression. To my knowledge they were first used popularly in LZRW. I then made heavy use of them for LZP (and PPMCB, which is very similar to the way modern compressors use them). They are now the basis of most modern compressors such as PAQ and related context mixers. Cache tables in traditional data compression usage never resize, you simply set your memory usage and that's how big the cache table is.

Usage of a cache table is something like this :


h = hash(key)

look up table[h]

(optionally) check if table[h] is right for key
  if so return its data

optionally :
reprobe, either by stepping h and/or by searching "ways"

# of reprobe steps is finite
if failed to find, return not in table

There are a few optional places. A crucial one is whether you do exact resolution or simply accept collisions. (in the old language of LZP, this is "context confirmation"). That is, for minimum memory use, table[h] might not store anything about key. That means when hashes collide, you can't tell, and you might get back an entry which does not correspond to your key.

Sometimes this is just fine. For example, in LZRW style string matchers, if you get back the wrong data you will simply not get a string match. In other cases you might want some amount of collision resolution, but only probabilistically. For example, PAQ stores only a few bits of key in the table, so when hash collisions occur it can detect them most of the time, but accepts a certain amount of false positives. This issue should always be considered as a trade off of memory usage - is it better to use the memory to resolve collisions more accurately, or just to have a larger table so that fewer collisions occur?

The other issue is how you reprobe, and that is closely related to your insertion strategy. When you try to insert and find a collision, you have a variety of choices. You can try all the possible reprobes for that hash value and pick the one that is "best" to replace under some criteria (perhaps such as the last used, or simply the slot that's empty). If you're using "ways" you might just do round-robin insertion in the ways, which is a probabilistic replacement strategy.
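
To make that concrete, here's a tiny sketch of a 2-way cache table with a 16-bit confirmation check and round-robin replacement (all the sizes here are arbitrary choices, and with only 16 check bits false positives are possible by design) :

// sketch : fixed-size cache table ; no resizing, no backing store, collisions just drop data
#include <stdint.h>

struct CacheEntry { uint16_t check; uint32_t value; };

struct CacheTable
{
    enum { kBits = 16, kSize = 1<<kBits, kWays = 2 };
    CacheEntry ways[kWays][kSize];
    uint32_t   rr;   // round-robin insertion counter

    // false positives are possible : only 16 bits of the key are confirmed
    bool Find(uint64_t keyHash, uint32_t * out) const
    {
        uint32_t idx   = (uint32_t)keyHash & (kSize-1);
        uint16_t check = (uint16_t)(keyHash >> 48);
        for (int w = 0; w < kWays; ++w)
            if ( ways[w][idx].check == check ) { *out = ways[w][idx].value; return true; }
        return false;
    }

    void Insert(uint64_t keyHash, uint32_t value)
    {
        uint32_t idx   = (uint32_t)keyHash & (kSize-1);
        uint16_t check = (uint16_t)(keyHash >> 48);
        int w = (int)((rr++) % kWays);   // probabilistic replacement ; could instead pick empty/worst
        ways[w][idx].check = check;
        ways[w][idx].value = value;
    }
};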

Anyway, I think "cache tables" is an interesting topic which I haven't seen explored much in the literature. It's particularly interesting in compression, because when you get a false collision (eg. you retreive data from the table and think it applies to you when it doesn't), that doesn't ruin your compressor, it just makes you code that event with more bits than ideal. So you can measure the cost of collisions as bit excess, and you ideally want to tune collisions to occur for less important data. Hash function collisions in cache tables affect both speed and the quality of results - eg. more collisions means more dropped data - but you also usually care exactly *which* data is dropped.

The question of reprobes within a linear table vs. ways (multiple independent linear tables) is interesting, and sometimes the optimal solution is to use both. For example, my "medium speed" string matcher for the RAD LZ77 uses two independent hash values (in the same table) and 2 ways. That is, there are two linear tables, and you look at two spots within each table.

I think that Cuckoo hashing for cache tables is a particularly interesting possibility. "Cuckoo cache tables" could do the normal Cuckoo thing (where you have two hashes for each key and you swap data around to make space for an insertion), but you set a maximum number of swaps on insertion (something like 4 or 8), and if you have to do the cuckoo swap step and hit the maximum number, you just stop and let some item fall out of the cache. You can also easily extend this by keeping track of age or quality or something as you do the Cuckoo, and choose to drop the worst item instead of simply dropping the last one in the chain. You don't have to worry about the pathological Cuckoo case of having a cycle you can't resolve - you simply drop an item in that case (in fact you don't even have to detect that case, you'll just go around the cycle up to your maximum finite number of steps).
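
A rough sketch of that bounded insert (h1/h2 are two independent hashes of the key, both assumed; the item left over when the step limit is hit simply falls out of the cache) :

// sketch : cuckoo-style cache table insert with a bounded number of swaps
#include <stdint.h>

uint32_t h1(uint64_t key);   // assumed : two independent hashes of the key
uint32_t h2(uint64_t key);

struct Entry { bool used; uint32_t slot1, slot2; uint64_t key; uint32_t value; };

void CuckooCacheInsert(Entry * table, uint32_t tableSize, uint64_t key, uint32_t value)
{
    const int kMaxSteps = 8;
    Entry cur;
    cur.used = true; cur.key = key; cur.value = value;
    cur.slot1 = h1(key) % tableSize;
    cur.slot2 = h2(key) % tableSize;

    uint32_t at = cur.slot1;
    for (int step = 0; step < kMaxSteps; ++step)
    {
        Entry evicted = table[at];
        table[at] = cur;
        if ( ! evicted.used ) return;          // found an empty slot, done
        // push the evicted item toward its *other* slot and keep going
        at  = (at == evicted.slot1) ? evicted.slot2 : evicted.slot1;
        cur = evicted;
    }
    // hit the step limit : "cur" simply falls out of the cache (no cycle detection needed)
}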


11-18-10 | Bleh and TID2008

Bleh, I just spent the whole day trying to track down a bug in my MS-SSIM implementation, that turned out to be a total red herring. I was trying to test my MS-SSIM on the TID2008 database to confirm their results (they find that MS-SSIM is the best by far). (note that perceptual image papers currently have the nasty property that every author demonstrates that their method is the "best" , because they all use different testing methods and on different databases).

Anyway, I was getting totally wrong results, so I went over my MS-SSIM implementation with a fine toothed comb, checked everything against the original Matlab; I found a few deviations, but they were all irrelevant. The problem was I was running the files in the TID database in a different order than the TID test program wanted, so it was matching the wrong file to the wrong file.

As part of that, I downloaded all the implementations of MS-SSIM I could find. The best one is in Metrix Mux . Most of the others have some deviation from the original. For example, many people get the Gaussian window wrong (A Gaussian with sdev 1.5 is e^(- 0.5 * (x/1.5)^2 ) - people leave off the 0.5), others incorrectly apply the window at every pixel position (you should only apply it where the whole window is inside the image, not off any edge), another common flaw is to get the downsample wrong; the way authors do it is with a Daub9 lowpass tap and then *point* downsampling (they use 1:2:end Matlab notation, which is a point downsample). Anyway, a lot of these details are pretty random. Also of note : Metrix Mux uses rec601 luma and does not gamma correct.
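
For reference, the window the original Matlab intends is an 11-tap Gaussian with sdev 1.5, normalized to sum to 1 ; a one-dimensional sketch (the 2d window is just the outer product) :

// sketch : the 11-tap Gaussian window (sdev 1.5) used by SSIM/MS-SSIM, normalized to sum to 1
#include <math.h>

void MakeSSIMWindow(double w[11])
{
    double sum = 0;
    for (int i = 0; i < 11; ++i)
    {
        double x = i - 5;                          // taps at x = -5 .. +5
        w[i] = exp( -0.5 * (x/1.5)*(x/1.5) );      // note the 0.5 that people leave off
        sum += w[i];
    }
    for (int i = 0; i < 11; ++i) w[i] /= sum;
}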

The TID2008 perceptual distortion database is a lot better than the Live database, but it's still not great.

The problem with both of them is that the set of distortion types applied is just too small. Both of them mainly just apply some random noise, and then they both apply JPEG and JPEG2000 distortions.

That's okay if you want a metric that is good at specifically judging the human evaluation of those types of distortions. But it's a big problem in general. It means that metrics which consider other factors are not given credit for their considerations.

For example, TID2008 contains no hue rotations, or images that have constant luma channels but visible detail in chroma. That means that metrics which only evaluate luma fidelity do quite well on TID2008. It has no images where just R or just G or just OpponentBY is modified, so you can't tell anything about the importance of different color channels to perception of error.

TID2008 has 25 images, which is too many really; you only need about 8. Too many of the TID images are the same, in the sense that they are photographs of natural scenes. You only need 2 or 3 of those to be a reasonable representative sample, since natural scene photos have amazingly consistent characteristics. What is needed are more distortion types.

Furthermore, TID has a bunch of distortion types that I believe are bogus; in particular all the "exotic" distortions, such as injecting huge solid color rectangles into the image, or changing the mean luminance. The vast majority of metrics do not handle this kind of distortion well, and TID scores unfairly penalize those metrics. The reason it's bogus is that I believe these types of distortions are irrelevant to what we are doing most of the time, which is measuring compressor artifacts. No compressor will ever make distortions like that.

And also on that thread, too many of the distortions are very large. Again many metrics only work well near the threshold of detection (that is, the distorted image almost looks the same as the original). That limited function is actually okay, because that's the area that we really care about. The most interesting area of work is near threshold, because that is where we want to be making our lossy compressed data - you want it to be as small as possible, but still pretty close to visually unchanged. By having very huge distortions in your database, you give too many points to metrics that handle those huge distortions, and you penalize metrics that focus on the near-threshold errors we actually care about.

Lastly, because the databases are all too small, any effort to tune to the databases is highly dubious. For example you could easily do something like the Netflix prize winners where you create an ensemble of experts - various image metrics that you combine by estimating how good each metric will be for the current query. But training that on these databases would surely just give you overfitting and not a good general purpose metric. (as a simpler example, MS-SSIM has tons of tweaky hacky bits, and I could easily optimize those to improve the scores on the databases, but that would be highly questionable).

Anyway, I think rather than checking scores on databases, there are two other ways to test metrics that are better :

1. Metric competition as in "Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual quantities". Unfortunately for any kind of serious metrics, the gradient is hard to solve analytically, so it requires numerical optimization and this can be very time consuming. The basic approach I would take is : take an image, try a few random steps, measure how they affect metrics M1 and M2, then form a linear combo of steps such that the effect on M2 is zero, while the effect on M1 is maximized. Obviously the metrics are not actually linear, but in the limit of tiny steps they are.

2. Metric RD optimization. Using a very flexible hypothetical image coder (*), do R/D optimization using the metric you wish to test. A very general brute force RD optimizer will dig out any oddities in the metric. You can test metrics against each other by making the image that this compressor will generate at 1.0 bits per pixel (or whatever).

* = you shouldn't just use JPEG or something here, because then you are only exploring the space of how the metric rates DCT truncation errors. You want an image compressor that can make lots of weird error shapes, so you can see how the metric rates them. It does not have to be an actual competitive image coder, in fact I conjecture that for this use it would be best to have something that is *too* general. For example one idea is to use an over-complete basis transform coder, let it send coefficients for 64 DCT shapes, and also 64 Haar shapes and maybe some straight edges and linear ramps, etc. H264-Intra at least has the intra predictors, but maybe also in-frame motion vectors would add some more useful freedom.


11-16-10 | A review of some perceptual metrics

In general this appears to be an active area of research, with tons of good work coming out in just the last few years. It's finally getting the attention it deserves, thank god.

In general there are two major approaches to perceptual metrics : HVS masking/filtering methods , and structural/statistical methods.

The HVS methods are mainly still using something like a per pixel delta, but they are weighting or masking or remapping it to account for various perceptual effects. Largely they are considering the capabilities of the eye, as opposed to what the brain processes and takes from the image. This includes stuff like rod/cone spatial resolution, perceptually linear response to intensity differences, contrast masking, luma masking, relative contrast sensitivity, etc.

The original major HVS work was the JPEG standard itself with its quantization matrices that are based on CSF (contrast sensitivity function) measurements. After that Watson's DCTune and related work added masking and other effects. This has been extended over the years. Probably the strongest modern work is Lin's JND (just noticeable differences) stuff.

JND - Associate Professor Lin Weisi

The JND papers work out complicated models of visual masking to figure out what kind of deltas between images are imperceptible.

Obviously color spaces like CIELAB etc are part of the HVS image difference metric, as is the SCIELAB filter. (I guess CIEDE2000 is the newer version)

A lot of the HVS stuff doesn't really deal well with color - most of the old data is for luma only. The Fairchild/iCAM group is doing a lot of work on perception of color and mainly HVS based metrics :

iCAM An Image Appearance Model
Fairchild Color Publications & Presentations
garrett m. johnson

Garrett Johnson's Thesis is the best reference I have found for lots of data on the HVS with actual useful formulas and numbers.

Many of the newer works use a whole-image FFT in opponent color space so that they can apply a CSF (ala JPEG quantization matrices) and a rod/cone spatial resolution filter (ala SCIELAB).

There are lots of metrics made over the years that incorporate some amount of these HVS factors, for example PSNR-HVS-M :

Nikolay Ponomarenko homepage - PSNR-HVS-M download page

For video there are more HVS factors to consider such as temporal contrast sensitivity and motion masking.

There are also a lot of papers on contrast visibility and color appearance models from the HDR Tone Mapping research guys; in fact there are some good Siggraph courses such as Image Quality Metrics from Siggraph 2000 and Survey of Color for Computer Graphics . There were also a bunch of papers on selective raytracing that gave good details on perceptual metrics.

However, in general the HVS-centric methods are a bit out of favor and "structural methods" appear to be better. Again most of the structural methods work on luma only and ignore color.

The most well-known structural method is SSIM , and there have been many extensions of SSIM that purport to improve it, such as gradient-SSIM and region-pooled-SSIM or feature-type (edge,smooth,texture) pooled SSIM.

The LIVE team has a ton of good papers, with the best ones being on "structural" type metrics :

LIVE
LIVE-Publications Search Results

In particular, they have VIF (Visual Information Fidelity) which is a measure of human-perceptual mutual information. There are a number of modern works using metrics like this for image quality which appears to be a very promising line.

There are various other structural-type metrics, such as doing SVD to find singular vectors and values and so on. So far this work feels too early to be useful.

One interesting work on color I found is SHAME : (Spatial Hue Angle Metric) :

SHAME - Computational Color Imaging Second ... - Google Books

It's basically SCIELAB with the addition of a hue histogram. The hue histogram gives more weight to colors that occur more often. eg. if your image has a big patch of red and you get the color wrong on it, that's visible, but if it only has a few scattered red pixels, then their exact color doesn't matter because relative to their surroundings they will seem "red" even if they are pretty far off.

Some other random links I thought were interesting :

JPEG Post Processing by reapplication of JPEG
NASA ADS Predicting image quality using a modular image difference model
Color FAQ - Frequently Asked Questions Color
[x264-devel] commit New AQ algorithm option (Jason Garrett-Glaser )
MetriX MuX Home Page


11-12-10 | Some notes on function approximation by iteration

People often blindly use the Newton-Raphson method to iteratively improve function approximations. It's worth taking a moment to think about where that comes from and how we can do better.

First of all, iterative function approximation almost always comes from some kind of epsilon expansion (often Taylor series). You assume you are near the desired answer, you set the discrepancy to epsilon, then expand and drop higher powers of epsilon.

For example, Newton-Raphson iteration is this :


You wish to find a root r of f , eg. f(r) = 0

Assume I have x_n which is close to r, eg. r = x_n + e

0 = f(r) = f(x_n + e) = f(x_n) + e * f'(x_n) + O(e^2)

so

e = -f(x_n)/f'(x_n) + O(e^2)

r = x_n - f(x_n)/f'(x_n) + O(e^2)

therefore

x_n+1 = x_n - f(x_n)/f'(x_n)

is closer to r than x_n

The way we use this to do function approximation is we set up a function f() such that the root of f is at the function we wish to approximate.

For example to find sqrt(A) you would set f(x) = x^2 - A ; this f has roots where x^2 = A , eg. x = sqrt(A).

Note that this method is only really useful when you have a function like sqrt where the function is expensive but its inverse is cheap. This is where the "art" comes in - there are any number of functions you can write which have the root at sqrt(A), but you need to write one that doesn't involve a sqrt; the whole point is to find an iteration that is much cheaper than the full quality function. For example to approximate reciprocal sqrt you could also use f(x) = x^2 - 1/A or f(x) = 1/x^2 - A , both are valid but produce different iterations of different complexities.
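
For example, plugging f(x) = 1/x^2 - A into the Newton step above gives the familiar reciprocal sqrt refinement (note there's no divide in the final iteration, which is the whole point) :

f(x)  = 1/x^2 - A  ,  f'(x) = -2/x^3

x_n+1 = x_n - f(x_n)/f'(x_n)
      = x_n + (x_n^3 / 2) * (1/x_n^2 - A)
      = x_n * (1.5 - 0.5 * A * x_n^2)

// one refinement step in code, for y ~= 1/sqrt(A) :
y = y * ( 1.5f - 0.5f * A * y * y );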

So, now that we sort of see where Newton-Raphson iteration comes from, we can throw it away. What we really want is a more general idea - start with a point close to the solution, do an epsilon expansion, solve for epsilon, that gives us an iteration. In particular, Newton-Raphson is doing an expansion of the form r = x_n + e , but we could also do r = x_n * (1+e) or r = x_n/(1+e) or whatever. Sometimes these give better iterations (in terms of complexity vs. accuracy).

The next major general point to remember is that all of these methods are based on something like Taylor expansion, and while Taylor expansion is great for infinite series, we should all know that it is *not* optimal for finite series. That is, the first N terms of the Taylor expansion of a function are *not* the best N-term polynomial expansion of that function (except in the limit as epsilon goes to zero).

Over a finite interval, you could use Legendre Polynomials to find the actual best N-term approximation (or any other orthogonal polynomial basis). Since we are on computers here it's often easier to just do a brute force solve for the best coefficients.

This is well known to people who do function approximation (eg. any sane person doing a polynomial approximation of sin(x) over the range [0,pi] is not using a Taylor expansion) (see previous discussion of this 06-13-09 and 06-21-09 ). However, when people do iterative approximation this idea goes out the window for some reason.

In particular, after your initial approximation, you usually know something about the error, whether it is positive, what the bound is. Then you can expand in the error and find the optimal coefficients over that range. This gives you an iterative refinement step which is superior to Newton.

I'll do a followup post and work a specific example.


11-09-10 | New Technology Fucking Blows

I recently "upgraded" to an HDTV and a PS3 and have been watching some Blu Rays.

WTF. Why didn't you all tell me this is a fucking disaster ?

First of all, the fucking Blu Rays want to connect to the internet. Umm, no you don't. Second, they want to run fucking Java to make their menus even more fucking annoying than the damn DVD menus.

Okay, friggle frack, I hate it, but I can live with that.

Then I find YOU CAN'T FUCKING RESUME on BD-J discs !? WTF "Resume" is like the most basic feature that you need in a movie player, like I'm not allowed to fucking stop my movie and then start it again !? I love the "recommended fixes" from even the official Sony page, to just "skip ahead to where you stopped watching". Are you fucking kidding me? Somebody should be shot for this, it's like the engineers making a video format without audio. You add a bunch of menu features and shit and oh, by the way it doesn't play audio anymore. You can't break the basic fucking functionality when you add features!

Resume Play feature on Blu-ray titles - Blu-ray Forum
Recommend me a blu-ray player with resume after power off [Archive] - DVD Talk Forum
Hugely disappointed by my Blu-ray player. [Archive] - AnandTech Forums
Blu-ray Resume Playback Guide Useful Information Sony Asia Pacific

Okay, so all Blu Ray players have that problem, but the PS3 is actually even worse than most and won't even Resume from normal blu rays unless you fucking say a hail mary and cover it with garlic and press just the right buttons. And god forbid it do something like remember the location you stopped watching even after you power it down or remove the disc, like you know every fucking DVD player from 10 years ago does. ( PS3 missing resume function ).

(aside : that PS3 link is a pretty funny example of people defending their enemies :
"damn just remember the chapter before shutting down."
"quit being lazy and find the chapter where you left off and then press play on that"
"I just leave mine on 24/7 and if I stop a movie, it resumes where I left off until I eject it."
WTF R u serious? )

And of course every time you do watch a Blu Ray, you get the wonderful fucking joy of sitting through a bunch of mandatory fucking previews and other nonsense because they "forbid that operation". Yeah yeah you can usually just skip chapters to get to the menu, but you can't go direct to top menu on most discs, and you can't jump directly to playing.

How to Skip Movie Trailers [Archive] - Audioholics Home Theater Forums
Any way to skip previews on blu-ray - AVForums.com
Any tips to skip DVDBD trailers, warnings, and ads on PS3 - Home4Film
annoying mandatory previews - Blu-ray Forum

As for HDTV, it's mostly good. The fucking Bezel on my LG is glossier than greased shit, which is pretty damn awful. Also HDMI connectors are the fucking devil, they're like bad old USB connectors, they don't go in far enough and are wobbly. I have problems with the HDMI connector making good stable contact at the TV and at the PS3 and the computer.

Locking HDMI Cables and Connectors — Reviews and News from Audioholics
HDMI Connector - High Def Forum - Your High Definition Community & High Definition Resource
anyone else have problems with loose hdmi connection on tv - AVS Forum

Oh, and of course neither the TV nor the PS3 came with cables even though they're like fifty cents OEM price now. That's just fucking cheap ass annoying shit.

So far the only really huge win is that the Netflix app for the PS3 is way better than the one for the HTPC, and of course Netflix movies fucking start playing directly without a bunch of shit, and you can resume them. I would just use Netflix 100% of the time except for the fact that Brownstripe is not reliable, and there's nothing more annoying than being in the middle of a movie and your net connection decides to flake out (I wish I could fucking locally cache the whole movie before watching it to ensure full playback).


11-09-10 | Recent Links

English Russia - ridiculous great photos
sahelsounds - nice African music blog, weird/anthropological
www.middleforkgiants.com - Middle Fork Snoqualmie Valley used to be all logging land but is now an untapped wilderness, it's kind of fascinating to me
Why Japan finally got its foot off the brake The Japan Times Online - interesting, I only recently learned about the JDM 280 gentleman's agreement from watching Best Motoring
Syntopia » Blog Archive » Written Images - I adore Syntopia and everything that's coming out of "Generative Art" these days
Seattle RainWatch at UW , in particular I'm struck by the incredible rain-shadow at Sequim/Port Townsend (and to a lesser extent, the San Juans)
MIMIC-TPW - Pacific temperature map - you can often see the "Pineapple Express" shooting right at us on this map
Moonlight on the Avenue of Faith Book by Gina B. Nahai - Simon & Schuster - I really enjoy the Persian "Magical Realism" stuff, I'd like to find more.
Lighthouse Reef, Belize Albums Leisegang Liaisons Off Exploring - and the Blue Hole, wow. I spent a summer on Caye Caulker near there and didn't know this existed.
Jacek Yerka painter of the fantasy worlds - preview of the paintings - Video Game art needs to be more like this, or you know, pretty much anything interesting, unique, stimulating, whimsical
Incomprehensible Matlab Animations - self explanatory and awesome
Abandoned & Little-Known Airfields Eastern Washington State - interesting possibilities for exploring

SVM is one of those things you definitely don't want to write yourself. SVM-Light appears to be the winner at the moment for ease of use and good performance :

SVM-Light Support Vector Machine

Interesting story about the beginning of Facebook :
Mark Zuckerberg at Harvard The Truth Behind ‘The Social Network’ - The Daily Beast
How Mark Zuckerberg Hacked ConnectU

Various good technical links :

Xin Li of WVU - lots of goodies
VideoProcessing UCSD Index of publications
To Imaging, and Beyond! Small survey of Objective Image Quality metrics
The home of PackJPG
The FreeImage Project
The Blog of Ben Rockwood ZFS Prefetch Cache
Sylvain Durand papers index - denoising etc good stuff
Suffix Trees - white page - great
Peter's Functions for Computer Vision - the Phase guy
Tianyi Zhou's Research Blog - compressed sensing, SVM, etc
Near-regular Texture Synthesis
Jesper Larsson's research - Suffix Trees
Gaussian Derivatives


11-08-10 | 709 vs 601

The old CIE rec601 standard for Y was :

Y = 0.299 R + 0.587 G + 0.114 B

(as used in JPEG, MPEG, etc.)

The new CIE rec709 standard for Y is :


Y = 0.2126 R + 0.7152 G + 0.0722 B

Modern HDTV's and such are supposed to be built to the 709 standard, which means that the perceived brightness (ADD: I'm playing a little loose with terminology here, I should say the "output luminance" or something like that) should be that linear combo of RGB. eg. new TV's should have brighter greens and darker blues than old TV's. I have no idea if that's actually true in practice, when I go in to Best Buy all the TV's look different, so clearly they are not well standardized.

Those formulas are for *linear* light combination. In codec software we usually use them on *gamma-corrected* values, which makes them semi-bogus. eg. there's not much reason to prefer the 709 coefficients for your video encode even if you know you will be displaying on a 709 spec monitor (if you are sending it signal in RGB), because the monitor should be 709-spec on linear RGB but you are using that matrix on gamma-corrected RGB. I suppose if you hope to decode to a hardware-supported YUV format directly (eg. such as for Overlays on Windows), you would want to be using the same color space as the overlay surface wants, but that's a flaky hopeless hell not worth dealing with.
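(for concreteness, the two luma computations as code ; in a codec these get fed gamma-corrected R'G'B' values, which is exactly the semi-bogus usage described above) :

// sketch : luma from RGB in [0,1] (whether linear or gamma-corrected is up to the caller)
double Y_rec601(double R, double G, double B) { return 0.299 *R + 0.587 *G + 0.114 *B; }
double Y_rec709(double R, double G, double B) { return 0.2126*R + 0.7152*G + 0.0722*B; }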

Two points :

1. This is *NOT* a formula for how important R, G and B are. For example if you have an error metric which you compute on each color channel, do not combine them like


Err_tot = 0.2126 Err_R + 0.7152 Err_G + 0.0722 Err_B

that linear combo is just the contribution to *brightness* , but the human eye sees more than just brightness. It's like you're modeling the eye as only rods and no cones.

(BTW while Blue has a very small contribution to brightness and furthermore only 2% of the cones perceive blue, they are however the most sensitive cones, and their signal is amplified in the brain, so our perception of blue intensity levels (eg quantizer step size) is in fact very good; the way that you can get away with killing blue is by giving it much lower spatial resolution )

So, for example, if you are using rec709 Y and then only measuring Y-PSNR , you are severely undercounting the contribution of non-green to error.

2. Now, for what we actually do (data compressors) there's a question of how the difference between 601 and 709 affects us. I'll assume that you are using the color conversion in something like JPEG or H264 which will subsample the chroma and probably has some fixed way to quantize the channels differently.

Choosing 709 or 601 for the conversion matrix is just a special case of doing arbitrary color matrices (such as a KLT); obviously it will depend on the content which one is best. It will also depend on the error metric that you use to define "best" - in particular if you measure error in Y-PSNR using the 709 definition of Y, then it will appear to you that the 709 spec color conversion is best. If you measure error in RGB, then the 601 spec will appear best (usually).

My random guess is that 601 will be better for coding most of the time because it doesn't have such a severe preference for green in the high fidelity non-chroma channel. Basically it's better hedged - when you are given a synthetic image that's all red-blue with no green at all, the 709 matrix is very unfortunate for coding, the 601 is not quite so bad.

Note that using the matrix which matches your reproduction device has basically nothing to do with which matrix is best for coding. The definition of "luminance" for your viewing device refers to *linear* RGB anyway, and in coding we are working on gamma-corrected RGB, so it's best just to think of the color matrix as part of the compressor.


11-08-10 | Name your techniques!

When you create an algorithm, give it a god damn name. I want to be able to refer to "PPMC" or "ARC" or "lookup2" or something and have a well defined name for a specific algorithm.

It's really fucking annoying when people write a paper and call it "A novel DCT compressor based on pattern segmentation" and then within the paper they refer to the new algorithm as "the method in this work". Then if you are referencing them you have to call it something and it's a fucking disaster. (to get back at them you should refer to it as "The work by the morons who didn't name their algorithm").

Furthermore, when you publish revisions, fucking use a new name or a version number or something. It sucks when people publish a series of papers where they introduce changes and the whole time keep referring to it as "VDP" . No no, it's "VDP" the first time and then after that it's "VDP-2" or "VDP-M" or whatever. Don't make me have to specify "the PPMD from the first paper in 2004".

I know a lot of authors are timid and think it's not cool to just name a technique after themselves, but that's what they really want, so they intentionally don't give their algorithm a name in the hope that the community will call it "The Wang Method" or whatever. Don't be a pussy, just name it after yourself if that's what you want, it's fine.

Of course another sin that I have been guilty of myself is to release source code for an algorithm ("LZP2" for example in my case) that's different from the paper, so you refer to "LZP2" and describe it in detail in the paper and then you release an exe called "LZP2" that's actually different. This is just constantly done and is very sloppy ; at minimum you should make it clear and call the source code "LZP2a" or something; almost every time I see source code released with a paper, it differs from the description in the paper. (of course I am still happy that there was at least source code released).


11-05-10 | Brief note on Perceptual Metric Mistakes

A lot of perceptual metrics have some weird ideas about things that are important. Some things that are not actually important :

1. Symmetry. A lot of researchers ask for the property M(X,Y) = M(Y,X). There's absolutely no reason to think this is a good property of a perceptual image metric. You may well want to treat the reference (original) differently than the distorted image in your metric.

Obviously for image database similarity metrics or something, you want symmetry, but that's not what we're dealing with here. Also you can of course trivially make a symmetric metric from an asymmetric one by doing S(X,Y) = A(X,Y) + A(Y,X)

2. Offset/Scale Invariance. Again people demand M(X,Y) = M(S*X + O,S*Y + O) , that is invariance to constant scale and offset. There is no reason to think this is good (in fact there is reason to think this is bad). Pixel values have an absolute meaning and 2->4 is not the same as 4->6.

3. Rotation Invariance. A lot of people ask for rotation invariance, and are careful to normalize their wavelet basis or DCT or whatever so that they have rotation invariance. Again, bad idea. The human eye has different perception of detail in different directions - strongest horizontally, slightly weaker vertically, and much weaker diagonally. Also real images have by far the most correlation horizontally - people with cameras tend to hold them aligned to the horizon, not at completely random angles. Images are not rotationally invariant in the real world, so why should your metric be?

For example one simple thing they do in the PSNR-HVS paper is to do a one step wavelet transform first to make LL,LH,HL,HH bands, then you run whatever metric you want on the four bands, and then weight their contribution thusly :


UQI-HVS = 0.5779 * U_LL + 0.1582 * U_HL + 0.1707 * U_LH + 0.0932 * U_HH

This kind of issue comes up in non-obvious ways as well. For example if you do a unitary DCT (or any other basis) transform on a block, the L2 norm is preserved, so you might think that L2 norm is theoretically more logical, however we have no reason to believe that human vision is invariant under basis-space rotations, therefore there is no reason to prefer the rotation-invariant norm.

4. Not quite in the same category, but another thing that I have my doubts about is use of things like visual attention focus. The basic idea is you can predict what part of the image/video people actually care about, and give more bits to those objects. This sounds nice in theory, but in practice you can't know where people will be looking, so if you look at an ensemble of observers who look at a variety of places, you might improve your average score, but you hurt the worst case (people who were looking at the area you took bits away from). In practice there's also the issue that when you take bits away from an unimportant area, you can create artifacts there, and then those artifacts become visually annoying and thus make that area more of a focus. eg. in old crappy MPEG when you take bits away from the background and give it to the human actors in the foreground, you can make huge blocky ringy awfulness in the background.


A major problem I see in paper after paper is that people don't define their terms clearly. One of the biggest issues is "luma" , where people will just start writing equations with an "L" or a "Y" in it without defining exactly what that means. Is it light-linear (eg. CIE Y) ? Perceptually uniform "lightness" (CIE L)? What's the scale? eg. what does L=1 mean?


11-05-10 | Papers

I still use the old PSView once in a while for viewing postscripts (it's the only free standalone PS reader for Windows that I know of; standalone = not requiring Ghostscript nightmare). But it's janky and pretty much obsolete because there are now good fast online PS to PDF converters :

www.ps2pdf.com
neevia.com

ps2pdf is a bit faster but sometimes fails to handle files that neevia does handle.

Also randomly I just found this page of amazing docs by Gernot Hoffmann !!! It's mostly pedantic stuff, but the docs are unbelievably well made, with nice simple writing, tons of details, and FIGURES, omg omg, it's so nice to have actual figures drawing out the plots and geometries of things.

Some of the great ones I've looked at so far :

cielab - tons of details, figures, gamut plots, etc.

planar projections - also see Euler angles, Gimbal lock, etc. lots of great intro-to-computer-graphics topics.

Another random interesting paper : A Chronology of Interpolation: From Ancient Astronomy to Modern Signal and Image Processing

And some good authors :

Fairchild - Color Publications & Presentations
Boulos - Graphics Publications


11-02-10 | Book Review : Deep Down Things

"Deep Down Things" is an attempt to explain the Standard Model of Physics for the lay person (eg. without mathematics). I think it's a disappointing and useless book. The proper way to explain physics without mathematics is by developing intuition and talking about geometry and making good analogies. Too often DDT simply takes the mathematical approach, but then leaves out the actual mathematical details that would make it clear.

There's constant name dropping and random historical information. Stories about the discovery of various mesons, or all the notes about nobel prizes that were won, are just pointless. And descriptions of the confusion before the Standard Model do not aid in understanding the modern theory at all.

There are some major diversions explaining old theories that aren't really necessary. For example he goes through a big section on the "Eightfold Way" which is a historical artifact that doesn't need to be taught in modern explanations of the Standard Model, as it introduces an apparent SU(3) symmetry of u,d,s which is not the real SU(3) symmetry of quarks and is thus unnecessarily confusing. One of the big mistakes is that he spends a lot of time talking about "isospin" as nuclear isospin (proton-neutron identity rotation invariance), but then changes and finally admits that is not the isospin of the standard model and introduces us to Weak isospin.

One of the big mistakes in the book is that he is constantly introducing not-quite-right simplified explanations of things which are really not any simpler, and wind up taking more text to explain the same thing.

He also randomly uses non-standard notation, such as calling the group of rotations in 3 dimensions R3 instead of SO(3) , and he weirdly refuses to explain things, such as using the term SU(2) but noting "the S stands for something technical that we don't need to bother with here". What? Just say it means length-preserving.

I think the explanation of group theory in the book is disappointing. I think lay people can easily understand a lot about groups, and more time should have been spent on this. Even concepts like building up macroscopic rotations by applying infinitesimal ones over and over could be explained.

Worst of all I think a great opportunity is missed. Feynman's QED is a brilliant shining star of explaining the quantum field theory of U(1) in a non-mathematical way, which actually builds up a physical intuition for the reader in a very non-intuitive topic. The author could have focused on the geometry of gauge fields and fiber bundles, and what an SU(2) gauge field is like intuitively. We have an intuition for what electromagnetic forces are like because we can see them at macroscopic scales, but what would an SU(2) gauge force field be like at macroscopic scales?

If you want an intro to particle physics without mathematics, I can still only recommend "QED". If you want an intro to gauge fields, I recommend Baez's "Gauge Fields, Knots and Gravity".


11-02-10 | Video Games need better directors

1. No that kid you hired from Full Sail cannot write good dialogue. There's an absurd arrogance in video game makers that think the story and dialogue are sort of trivial secondary things, and some random game designer can take care of them. Lots of game developers/designers fancy themselves as screenwriters and think they have great ideas. No you don't. You need actual writers to write stories, you need to buy scripts, and you need to hire dialogue specialists to fill in the speech.

There's absolutely no reason why video game stories should be so much worse than movies or other media, but they are, almost uniformly. I mean really, it's like embarrassing, cringe-worthy, it's literally like watching student films, because that's what it is - amateur.

2. Games are just full of ridiculous boring badly designed moments that I can't believe would ever pass an executive review. I can only conclude that pretty much all games ever have not been seriously reviewed. By that I mean :

When the game is near done, assemble the producers and executives and creative directors and so on, put them in a room together with somebody to play the game. The person playing should not be one of the game developers, ideally it's someone who's never touched the game before (maybe someone from the publisher's test unit). Now just sit there and watch the game play through from start to finish and review it.

I guarantee you when the game goes through "okay now take this apple and walk to the other side of the universe to deliver it to the apple pie maker" any sane reviewer would be like "oh my god this is ridiculous, cut this".

It's important that the playthrough is not done by a developer or someone familiar with the game, because they will be able to get through the tedious parts quickly. It's important to see someone unfamiliar with the controls struggle with the first jumping puzzle for 30 minutes. Then the reviewer can go "okay this is fucking ridiculous, you need better hints here, this needs to be easier at first, etc."

There are way too many moments in games where if you stop and think "why am I doing this?" , it's not because it's fun, it's not because it advances the story, it's just sheer busy work. I think way too much game design is done simply to fill levels and not with specific thinking about why you are doing it. eg. some junior designer gets a directive to fill out a level, and he puts in some random side quest, drops down people to fight. So then the player is fighting the same type of enemy for the umpteenth time and asks "why the fuck am I doing this over and over?".

I think the comparison to movies, while not exactly analogous, is useful. A 2 hour movie is a bunch of little scenes, and a good director has gone over every single scene and where it fits in that time and how long it is, and has a reason for every scene - it develops mood or a character or advances the plot or whatever. If it's boring and pointless, it just gets cut. A game is the same thing. In a 10-20 hour game, there's no reason it should be full of boring pointless play moments, a good director should be able to go over a whole playthrough experience and ask "why is this battle here?" if it's for no reason, you should cut it (or fix it). It's very rare to see anyone in video game production who thinks of the player's time as a valuable resource that they shouldn't just be wasting on moments that aren't that great.

One of the big problems with games is inconsistency. Even in great games, there is usually some portion that's just awful, tedious or frustrating. The directors need to be more aggressive about just cutting these stinker parts. Of course this problem is much worse in big RPG's or "open world" games, where some of the side quests may be quite fun and well developed and I'd like to do them, but others are really terrible.

Bleh, I really shouldn't play games, they just make me really angry about how shitty they are.


10-30-10 | Detail Preservation in Images

I've been thinking about how to do a simple measure of "detail" in images that could be used as a metric for lossy compressors.

First of all, one simple thing to ask is what is the "sharpness" of the image. One way to measure this is to compare the image with a blurred (lowpassed if you prefer) version of itself. Basically if an image is already blurry (eg. smooth ramps) and you run a blur filter on it, it doesn't change much, but if it was very sharp (eg. black and white pixel checkerboard), it changes a lot. See for example this page on MTF with nice pictures.

One nice way to measure this is the ratio of the highpass energy over the lowpass energy. To motivate that what we are basically doing is :


I = original image
L = lowpass of I

Sharpness = Energy[I] / Energy[L]

H = highpass = I - L
I = L + H

Energy[I] = Energy[L] + Energy[H]

Sharpness = 1 + Energy[H] / Energy[L]

where "Energy" might be something like transform to Fourier domain and sum the power spectrum. Now, this Sharpness has an implicit cutoff frequency f that is the parameter of the lowpass. So it's really S(f) and we could scan f around and make a chart of the sharpness at various frequencies. To measure preservation of detail, you want to compare S(f) at all f's.

Now we'd like to have something like this that is more discrete and also localized. We want to ask if the detail at a specific spot is preserved.

A natural idea is the (abs or square) sum of laplacian filters. Something like :


In a local neighborhood of I :
Energy = L1 or L2 sum of Laplacian filters on I
L = blur of I

Sharpness = Energy[I] / Energy[L]

Instead of scanning the lowpass cutoff around, we just picked some single blur amount, but then we can do this in a multi-scale way. Let I0 = original image, I1 = blur I0, I2 = blur I1, etc. , then S0 = E[I0]/E[I1], S1 = E[I1]/E[I2], etc.. To measure preservation of detail at various scales, we compare S0,S1,S2.. from one image to S0,S1,S2.. of the other image (on each local neighborhood). That is, we require that the detail level is preserved in that area in the same frequency band.

That is, we make a Gaussian pyramid of images that are blurred more and more, and then we take the energy in each level vs the energy in the parent.

But the laplacian is just the delta of each level from its parent (roughly), something like I0 - I1. So we can just make these delta images, D0 = I0 - I1, D1 = I1 - I2, and then S0 = |D0|/|D1| (just magnitudes, not "energy" measures).

By now the similarity to wavelets should be obvious. The fine detail images are just the high pass parts of the wavelet. So really all we're doing is looking at the L1 or L2 sum of coefficients in each band pass of a wavelet and comparing to the sum in the parent.
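
A rough sketch of that comparison, done on whole images for simplicity (Image, Blur, Sub and AbsSum are assumed helpers, not a real API ; to localize it you'd do the same thing per neighborhood) :

// sketch : compare per-band "sharpness" S_n = |D_n| / |D_n+1| between original A and distorted B
#include <vector>
#include <math.h>

double DetailPreservationError(Image A, Image B, int numLevels)
{
    std::vector<double> dA, dB;                   // band magnitudes |I_n - I_n+1|
    for (int n = 0; n < numLevels; ++n)
    {
        Image An = Blur(A), Bn = Blur(B);
        dA.push_back( AbsSum( Sub(A,An) ) );
        dB.push_back( AbsSum( Sub(B,Bn) ) );
        A = An; B = Bn;
    }
    double err = 0;
    for (int n = 0; n+1 < numLevels; ++n)
    {
        double sA = dA[n] / (dA[n+1] + 1e-12);    // sharpness of band n vs its parent
        double sB = dB[n] / (dB[n+1] + 1e-12);
        err += fabs( sA - sB );                   // penalize detail that moved or vanished
    }
    return err;
}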

But wavelets also suggest something more that we could have done from the beginning - instead of a symmetric lowpass/highpass we can do separate ones for horizontal, vertical, and diagonal. This tells us not just the amount of energy but a bit more about its shape. So instead of just a sharpness Sn we could measure Sn_H,Sn_V and Sn_D using a wavelet. This would be like using a horizontal laplacian [-1, 2, -1], a vertical, and an "X" shaped diagonal one.

And wavelets suggest something more - we could just use block transforms to do the same thing. An 8x8 Haar is a wavelet on a local chunk (and an 8x8 DCT has "wavelet structure" too). In particular you can arrange it into frequency bands like so :


01223333
11223333
22223333
22223333
33333333
33333333
33333333
33333333

and then take the L1 or L2 sum in each region and ask for preservation.

The similarity to x264's SATD energy metric is obvious. They use Haar and take the L1 sum of the energy in all the frequency bands to measure the total energy in the block. But we can be a lot more specific. In fact it suggests a whole "multi-accuracy" kind of delta.


Do 8x8 Haar or DCT.
Compare 8x8 blocks A and B.

Add terms :

1. each of the 64 coefficients should be the same :

 += |A_ij - B_ij|

2. the sum of each wavelet band should be the same, that is
if you use the diagram above for groups 0-3, then within each group
there is H,V, and D, add :

    S(g,H/V/D) = Sum{in group g, H,V or D} 
    
 += | S(A,g,H) - S(B,g,H) |
 += | S(A,g,V) - S(B,g,V) |
 += | S(A,g,D) - S(B,g,D) |

  for g in 0-3

3. ignore the H,V,D and do the same thing just for the frequency
 subband sums :

 += | S(A,g) - S(B,g) |

4. ignore the frequency subbands and do the sum of all the coefficients :

 += | S(A) - S(B) |

These are error terms that go from fine to coarse. This last one (#4) is the most coarse and is the "SATD". Adding the multiple terms together means that if we have errors that screw up the highest precision test (#1) but preserve the other measures, we prefer that kind of error. eg. we prefer the energy to move somewhere nearby in frequency space rather than just disappear.
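
Here's a rough sketch of how terms 1-4 might be glued together, assuming the two blocks have already been pushed through something like the haar8x8() above ; the H/V/D labels on the detail subbands and the equal weighting of the four terms are just placeholder choices (and group 0 is the single DC coefficient, so it gets no H/V/D split) :

#include <algorithm>
#include <cmath>

// Multi-resolution error between two already-transformed 8x8 blocks A and B.
static float multiband_delta(const float A[64], const float B[64])
{
    float err = 0;

    // term 1 : per-coefficient differences
    for (int i = 0; i < 64; i++) err += std::fabs(A[i] - B[i]);

    // L1 sums per (group, orientation), per group, and in total
    float sgo[2][4][3] = {}, sg[2][4] = {}, st[2] = {};
    const float * blk[2] = { A, B };
    for (int b = 0; b < 2; b++)
    for (int y = 0; y < 8; y++)
    for (int x = 0; x < 8; x++)
    {
        float m  = std::fabs(blk[b][y*8 + x]);
        int   mx = std::max(x, y);
        int   g  = (mx == 0) ? 0 : (mx < 2) ? 1 : (mx < 4) ? 2 : 3;  // same grouping as the diagram
        sg[b][g] += m;
        st[b]    += m;
        if (g > 0)
        {
            int half = 1 << (g - 1);                    // group g spans max(x,y) in [half, 2*half)
            int o = (x >= half && y <  half) ? 0        // highpass in x only
                  : (x <  half && y >= half) ? 1        // highpass in y only
                  : 2;                                  // diagonal
            sgo[b][g][o] += m;
        }
    }

    // term 2 : per (group, orientation) sums
    for (int g = 1; g < 4; g++)
        for (int o = 0; o < 3; o++)
            err += std::fabs(sgo[0][g][o] - sgo[1][g][o]);

    // term 3 : per group sums
    for (int g = 0; g < 4; g++)
        err += std::fabs(sg[0][g] - sg[1][g]);

    // term 4 : total sum, the coarse "SATD"-like term
    err += std::fabs(st[0] - st[1]);

    return err;
}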

Now obviously if your coder works on something like 8x8 blocks then you don't want to run a test metric that is also 8x8 block based, certainly not if it's aligned with the coding blocks. You could run 8x8 test metric blocks that are offset by 4 so they straddle neighbors, or you could do 16x16 test blocks centered on each 8x8 code block, or you could do 8x8 test blocks but do one for every pixel instead of just at aligned locations.
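
For what it's worth, the offset-by-4 variant as a sketch (the other two layouts are just different loop bounds and block sizes), reusing the pieces above :

// Accumulate the block metric over 8x8 test windows stepped by 4 pixels, so the
// test windows straddle 8x8 coding block boundaries instead of aligning with them.
// Reuses Image, haar8x8() and multiband_delta() from the sketches above.
static double image_delta(const Image & A, const Image & B)
{
    double total = 0;
    for (int y = 0; y + 8 <= A.h; y += 4)
    for (int x = 0; x + 8 <= A.w; x += 4)
    {
        float a[64], b[64];
        for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++)
        {
            a[i*8 + j] = A.p[(y+i)*A.w + (x+j)];
            b[i*8 + j] = B.p[(y+i)*B.w + (x+j)];
        }
        haar8x8(a);  haar8x8(b);
        total += multiband_delta(a, b);
    }
    return total;
}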


10-28-10 | Game Review : Fable 2

I beat Fable 2 a while ago and thought it was pretty rubbish (but amusing enough to play through to the end of the main story line). The whole time I was playing I was thinking "man this is rubbish" but I also didn't stop playing, because it was so ridiculously easy and mostly non-frustrating that it never gave me a point where I threw down the controller in disgust and stopped playing.

(the closest it got to making me quit was when I was in the spire and had basically a 30 minute non-interactive sequence where they make you press A every 5 minutes and pretend you're playing, but it's all forced and canned. I hate this trend in games; I think it comes from Valve and their "cinematics while the player still has control" which everyone seems to like so much; I fucking hate it, there's no point in giving me control and saying "walk over to the reactor" when that's my only choice, just make it a fucking non-interactive sequence and walk me over to the reactor yourself, thank you. That way I can at least set down my controller and go to the bathroom or get a drink, this fucking semi-interactive canned sequence thing is real shit. If it's a fucking cinematic, make it a fucking cinematic, and you know what, fucking pre-render it too, I don't want god damn low poly in game character models with shitty lip sync, if you're showing me a movie, show me something good)

Games like Fable 2 are really tragic to me, because I thought the art was pretty cool and some of the play mechanics were okay (or could have been okay with a bit of better guidance), but the design and direction was just so completely retarded that it spoils the game.

I like the game world, sort of semi-realistic recent past but with magic, and very British, but I don't find all the stealing from pop culture very amusing (Star Wars, Pirates of the Caribbean, etc.). Make up your own damn fantasy world, or use historical references. I like the fact that it's colorful and playful and cheery and whimsical and all that, it's a nice break from all the horrible gray and "gritty" and grim and gruesome.

There's all this work on the expression system and the town interactions, but it actually does almost nothing for the game play, it's not actually integrated into the story or anything like that, so it's just pointless. Obviously the way your character changes over time is a complete gimmick and completely irrelevant. Like it affects the way townspeople interact with you, but you always have such a huge excess of money that it doesn't matter if they give you good prices or not. It's classic pot-smoking game "director" style design where they have a bunch of "cool ideas" but don't actually think about how they will be used in game balance and play dynamics.

It's ridiculously badly balanced. They use the classic "I'm a bad designer" RPG trick of making each step up the experience ladder like 10X more expensive than the last. This is a standard crutch and it just ruins RPGs, but it's the only way for designers to fix broken games at the last minute. Let me do an aside on this for a moment because it's such a general problem.

The classic problem with RPGs which makes them difficult to develop is that players have so much freedom in how they develop their character, what side quests they do, etc. that when they reach story point X in the game, two different players could have just massively different development. This makes it almost impossible to tweak for difficulty statically. There are a few good solutions : 1. dynamic difficulty based on player development, 2. just a massive amount of side quests so that if a player is insufficiently developed they are sent off to buff up in side quests. But most shitty developers use the "I'm a bad designer" solution - make each level of advancement so big that it makes all previous advancement irrelevant, and then give out huge gold/exp rewards along the main quest. This means that player A who did a bunch of side quests and highly developed his character is only slightly more powerful than player B who was a moron at choosing upgrades. You can tell that a designer is doing this to you when the next step up of a weapon is like 2X as good as the previous weapon (eg. you go from 10 damage to 20 damage, as opposed to like 10 to 12), also when the exp for a level is just massive, it makes it irrelevant whether you got the exp from side quests - eg. in the beginning of the game you're fighting mobs that give you 50 exp, suddenly you start meeting guys who give you 500. It makes it pretty irrelevant whether you killed a bunch of extra early mobs or not. It's a way of wiping out the past, and as soon as you detect this in an RPG you know that you can just stop doing any side quests or worrying about your purchases.

The spells are terribly designed; the vast majority of them are completely identical (all the damage spells), but then Raise Dead is absurdly out of balance. All you need is level 1 raise dead and you can beat the game easily with no other magic.

The main quest is really short, I guess there are a lot of side quests but they just feel so pointless because the rewards are unnecessary. In a proper RPG the main quest should feel really difficult if you don't do any side quests, and the side quests should feel like they give you something useful that makes the main quest easier.

Generally it's just way way too easy. You can beat the game by just mashing X for all the combat, so there's no need to ever bother to learn any of the combo moves or strategy.

The controls are also just horribly broken; obviously the spell selector is just super rubbish, but the worst sin is that they massively miss button presses. This was the game that pushed me over the edge and made me write the game controls rant . I was constantly holding B to cast spells or holding A to run, and my guy's not actually doing it, and I'm like "friggling fracking WTF" and have to let go of the button and press it again to make it register.


10-28-10 | Game Review : Uncharted 1

After about two hours of jumping around on rocks and running through jungles and shooting brown people in the face, running around picking up their ammo, it all grows rather repetitive. Climb around, shoot brown people, climb around, bleh, this again?

The one sentence Hollywood style pitch would be "it's Tomb Raider, but instead of playing as a scantily clad hot chick, you play as a smug douchebag". Brilliant! Green light that concept!

There's some mildly bad game design in that the world is not at all a "simulation", but they also don't make clear the gameyness - like there are tons of things in the world that you should be able to interact with that you can't, but then they also don't make it clear at all what things you *can* interact with, and they add new ones all the time to make the "puzzles" (which are not so much puzzles as stand around mystified until the narrator gives you hints, or randomly walk around until you trigger the cinematic, or randomly press triangle until you find the interactive piece that looks just like all the non-interactive pieces).

The combat is not very well designed, you feel like you don't really have choices or much cleverness you can use in dealing with enemies, and it's not very varied. It makes me feel like the difficulty comes from me fighting the controls, not the enemies. On the plus side, the basic platforming/climbing controls are really good, it's the first game I've ever played where I can do cliff climbing and it just does what I want it to, I don't feel like I'm fighting the controls all the time (as in most ledge-hangers like Ico, PoP, etc).

There are some nice kind of cinematic moments where you transition in and out of cut scenes nicely, though it all feels a bit too "Dragon's Lair" , like I'm just pressing A to watch a movie, and the opportunity for beauty is ruined by the monotony of the environments.

But the real problem is that Nathan Drake is a huge douchebag. He's obviously modeled after J Rubin (real life douchebag extraordinaire), and I just don't want to play as a douchebag. It's not my escapist character roleplaying fantasy to be a self-satisfied smug wisecracker. Any time it went into dialog sequences I'd just be like "groan, eye roll, I hate you".


10-27-10 | Image Comparison : JPEG-XR

I'm testing JPEG-XR (aka HD Photo, aka PTC, aka WPF or WIC codec) using the maximum-size chunk width option and with the optional arithmetic coder (instead of the faster RLE/Rice type coder), both of which are supposed to give me maximum quality/compression. I let it do its color conversion (I believe they use the lossless YCoCg).

On the PDI 1200 image :

On the plus side, the RMSE chart reveals that their color conversion & such code is not just plain broken the way it is in so many coders.

On the minus side, the SSIM SCIELAB score is just *abysmal* at the crucial area of logbpp = 0 , even worse than Hipix, and in fact much worse than good old JPEG-Huff. The bad perceptual score is confirmed by personal evaluation - JPEG-XR seems to be just the worst of both worlds, it is *both* ringy like JPEG and blurry like JPEG-2000, plus it has a new type of artifact, a kind of blobby ringing (if you are familiar with lapped transforms you have seen these before).

It's just by far the worst thing I've seen yet and it's not even close. Here's the image : PDI JPEG-XR 116,818 bytes

The only thing going for JPEG-XR is that they standardized adding alpha channels, color spaces and HDR. Of course as others have pointed out you could have just added that to JPEG.

BTW the JPEG XR Wikipedia Page reads like an advertisement and should be edited. In particular "JPEG XR file format supports higher compression ratios in comparison to JPEG for encoding an image with equivalent quality." is manifestly not true.


10-26-10 | Image Comparison : Hipix vs PDI

If you actually want to make a fair test vs. JPEG :

1. Don't test JPEG at quality levels below 40, it doesn't work.

2. Don't compare to "JPEG" when you use a bad encoder and use basic huff-only. At least use IJG and use -optimize and -progressive.

3. Don't test on a 2400 line image when you will be viewing it on a 1200 line monitor. Always view your tests at native pixel res, that's what the compressors are designed for.

So anyway, I found a "PDI-Target" image that's 2297 x 3600 in a jpeg ( here among other places ). I scaled it to 766x1200 so it would fit on my monitor. I ran Hipix "Good" to set the target file size - it used 117050 bytes, which is 1.019 bpp , a reasonable target for compression. Then I ran JPEG and Kakadu to try to get the same file sizes.
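
(For reference, that bpp figure is just bytes times 8 over the pixel count : 117050 * 8 / (766 * 1200) = 936400 / 919200 ~= 1.019 )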

Here are the images and you can look at them with your own eyes :

PDI 766 x 1200 original
JPEG Arith , 116818 bytes
JPEG PackJPG , 116457 bytes
Kakadu JPEG 2000 , 117120 bytes
Hipix , 117050 bytes

Note : do NOT zoom in when comparing them! Also, it's easier to see differences in A/B toggle tests, so before you post a comment, download the above and do A/B toggle testing with something like ACDSee in full screen mode please.

(BTW I'm using PackJPG now instead of PAQ for the "modern entropy backend of JPEG" ; PackJPG is very good, it's fast, and it also doesn't have the bug that PAQ has for some small size files; it usually compresses slightly larger than PAQ, but pretty close; also I've switched JPEG-Huff to progressive as it helps slightly (doesn't help JPEG-ari or JPEG-pack))

My subjective conclusion :

Overall I think Hipix is the worst quality. It's the only one that badly screws up parts of the faces, messes up some large-scale DC, and just generally loses tons of detail. JPEG preserves detail and sharpness way better than any other, and is only really bad in one way - ringing artifacts, it has lots of ringing artifacts. Kakadu is amazingly free of ringing (some early JPEG2000 coders suffered from bad ringing), but it also just blurs the hell out of the image. If you look at the Kakadu output without comparing to the original, it looks pretty nice and artifact free, but when compared to the original it looks just like a gaussian blur has been run on the whole image.

Basically Kakadu has the least visually annoying "artifacts" , but at the cost of pretty severe blurring and a general chunky blobby look everywhere. JPEG is great except for ringing artifacts. Hipix is somewhere between the two (it both blurs and rings) but is just not a good middle ground, it's worse than an average of the two.

Some portions. The order in these images is :

[ original , hipix, kakadu, jpeg pack, jpeg ari ]

Fruit :

Bad chunkiness in the hipix image. Kakadu also has nasty edges on the apple. JPEG looks like the winner, despite some ringing around the lemon.

Leather satchel thingy :

Hipix and Kakadu both completely destroy the detail of the leather texture, lots of blurring.

Black baby's hair :

This one might be the clearest win for JPEG. Excellent preservation of the detail in both JPEGs. Kakadu and Hipix both blur the bang wisps to hell. Hipix also creates a bad overall change of color and brightness; this is easiest to see by toggling the original vs the hipix version.

Sunflower :

Note how blurry Kakadu is, especially the nasty chunky blurs on the lower stem area and the curve of the leaf on the right. Some bad ringing in JPEG around the stem and leaves.

Gear and circuit board :

Hipix and Kakadu again just toss out detail on the gear like crazy. Kakadu blurs the circuit board to all hell. The JPEGs actually add detail to the circuit board that shouldn't be there by ringing ;)

Hand and corn :

Hipix stands out here by completely screwing up the back of the hand, throwing away all detail and changing overall luma and adding weird chunkies. The JPEGs as usual do great with detail, the back of the hand is best on the JPEGS, but the lower edge and the fingers show some bad ringing.

CDs :

Again Hipix stands out as the only one that makes the rainbow patterns all chunky. Kakadu does okay on the interior rainbows but ruins the edges of the left CD with blurry chunks. The JPEG does well except for some ringing on the inside circular edge of the CD.

Robots :

JPEG ringing is really bad on these, notice the black disease all over the robot's body, and chroma distortion on the robot's left hand. Hipix makes the diagonal edge in the lower left all chunky and has a little ringing. Kakadu is probably best here.

Color Boxes :

JPEG is the only one that does really badly on these, creating ringing ghosts in the colors from the black bars. Hipix does very well on this type of "graphic arts" material (just as WebP does BTW), so if you are doing graphic-design type images it might be a win there (though I'm guessing x264 probably does that better, or you know, you could just use PNG). ( Color boxes shows [ original , hipix, kakadu, jpeg-xr, jpeg pack, jpeg ari ] )

Some charts for good measure :

Kakadu is by far the best numeric performer. Its one big fault is making everything blurry. Since our perceptual metric so far does not have any measure of detail preservation, Kakadu gets away with it (SSIM doesn't do much for us here).

You can really see the way JPEG works from these test sets. If you take any of them and zoom up a lot, the JPEG just looks horrible. But at correct pixel size, they look great. This is because JPEG is intentionally allowing errors that are just under the threshold of visibility.

In normal viewing conditions, JPEG is just great. One usage in which it is not great is for video game textures, because those often get sheared, zoomed, colored, etc. which ruins the JPEG perceptual model, which means they may have much larger visible artifacts than other compressors.

What are some valid complaints about JPEG ?

1. Yes there are a lot of bad encoders out there and the average JPEG that's out on the net is probably pretty far from optimal. In the WebP recompression project, you could easily replace that with just re-jpegging the JPEGs. (this includes people using grossly wrong quality settings, or not down-scaling images that will be shown very small on the web page).

2. It falls apart at very low quality. If for some reason you really need super low bit rates, JPEG is not for you. (However, the common test that people do of very large images at very low bit rates is not a valid test, nor is cranking down the quality to "see the difference").

3. JPEG needs to be viewed at native res with the original pixel intensities. The whole way it works is based on the human-optical model, so if your image will be stretched or shown in some weird way, JPEG is not for you.

4. It does create a lot of ringing. This is sort of an inherent trade off in signal processing - when you represent a signal with a truncated basis set, you can either get smoothing or ringing. JPEG is way towards the ringing side, not smoothing; it might be slightly more ideal to be able to get somewhere in between.
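
A toy way to see that trade off (just the generic signal processing point, nothing JPEG specific) - transform a step edge, zero the high frequency coefficients, and reconstruct ; the over/undershoot around the edge is the ringing :

#include <cmath>
#include <cstdio>

// Keep only the first 'keep' coefficients of a 16-point DCT of a step edge and
// reconstruct. Hard truncation gives ringing ; tapering the highs toward zero
// instead gives smoothing. Uses an orthonormal 1D DCT-II / DCT-III pair.
static void truncation_ringing_demo(int keep)
{
    const int N = 16;
    const double PI = 3.14159265358979323846;
    double x[N], C[N];

    for (int n = 0; n < N; n++) x[n] = (n < N/2) ? 0.0 : 1.0;   // step edge

    for (int k = 0; k < N; k++)   // forward DCT-II
    {
        double s = 0;
        for (int n = 0; n < N; n++) s += x[n] * cos(PI * (n + 0.5) * k / N);
        C[k] = s * sqrt((k == 0 ? 1.0 : 2.0) / N);
    }

    for (int k = keep; k < N; k++) C[k] = 0;   // truncate the basis set

    for (int n = 0; n < N; n++)   // inverse (DCT-III)
    {
        double y = 0;
        for (int k = 0; k < N; k++)
            y += C[k] * sqrt((k == 0 ? 1.0 : 2.0) / N) * cos(PI * (n + 0.5) * k / N);
        printf("%2d : %+.3f -> %+.3f\n", n, x[n], y);   // note the wiggles near n = N/2
    }
}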


More :

01/2010 to 10/2010
01/2009 to 12/2009
10/2008 to 01/2009
08/2008 to 10/2008
03/2008 to 08/2008
11/2007 to 03/2008
07/2006 to 11/2007
12/2005 to 07/2006
06/2005 to 12/2005
01/1999 to 06/2005
