Friday, May 11, 2007

How Would Google Render?

Here's another interesting post from Tim O'Reilly: "What Would Google Do?" Render farm management isn't a "Live" application (yet), but there are still plenty of lessons to be taken from Google, or Amazon. (Amazoogle!)

In my last post, I held Amazon up as the only real contender for an online compute/render service; Google would be the other obvious choice, though it doesn't offer a general computing service (yet).

But it's the history-tracking possibilities that interest me, more than the "Live" aspect of a web service. We already use a MySQL database behind Qube! - which lets you track the history of everything that goes through the farm. This kind of history is extremely useful for figuring out future bids and purchases: what was our peak/average utilization? How many iterations of each shot are we doing? What's our average render time? In short - where are we spending our time and money, and where are the bottlenecks?
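To make that concrete, here's a rough sketch of the kind of question a job-history table can answer. Everything in it is made up for illustration: the `job_history` table, its columns, and the `summarize_history` function are assumptions, not Qube!'s actual schema or API, and it uses an in-memory SQLite database as a stand-in for MySQL so it runs anywhere.

```python
import sqlite3


def summarize_history(rows):
    """Return (shot, iterations, avg_render_seconds) per shot.

    rows is an iterable of (shot, iteration, render_seconds) tuples.
    The table layout is hypothetical, not a real farm's schema.
    """
    conn = sqlite3.connect(":memory:")  # stand-in for the farm's MySQL database
    conn.execute(
        "CREATE TABLE job_history ("
        "shot TEXT, iteration INTEGER, render_seconds REAL)"
    )
    conn.executemany("INSERT INTO job_history VALUES (?, ?, ?)", rows)

    # How many iterations of each shot, and what is the average render time?
    cursor = conn.execute(
        "SELECT shot, COUNT(*) AS iterations, AVG(render_seconds) AS avg_seconds "
        "FROM job_history GROUP BY shot ORDER BY shot"
    )
    summary = cursor.fetchall()
    conn.close()
    return summary


if __name__ == "__main__":
    sample = [("sh010", 1, 100.0), ("sh010", 2, 200.0), ("sh020", 1, 50.0)]
    for shot, iterations, avg_seconds in summarize_history(sample):
        print(shot, iterations, avg_seconds)
```

The same GROUP BY pattern would extend to per-day or per-node rollups for utilization, assuming the history records start and end times.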

And there's still more room for improvement. Optimal scheduling is a difficult problem that can be made easier when you know more about the items in the queue: in particular, how long will they take to run? (Shortest job first is an easy/obvious optimization, but only if you know which jobs are the shortest.) But digital media is extremely iterative - even though the parameters of the shot will change between iterations, there's a lot of predictive information that can be passed from job to job. Texture references can be cached on the local disks of the render nodes - and their presence can be used as a weighting factor to favor jobs that can use them. "Hotspots" on central storage can be "cooled" by favoring jobs that don't draw from those areas - lots of possibilities.
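Here's a minimal sketch of how those two ideas might combine: estimate each job's runtime from earlier iterations of the same shot, then discount the estimate when the node already has the job's textures cached. The `Job` class, the one-hour default, and the `cache_weight` factor are all assumptions picked for illustration, not how any real queuing system does it.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Job:
    name: str
    shot: str
    textures: set = field(default_factory=set)  # texture paths the job reads


def predicted_seconds(job, history, default=3600.0):
    """Estimate runtime as the mean of earlier iterations of the same shot.

    history maps shot name -> list of past render times in seconds.
    The default for unseen shots is an arbitrary assumption.
    """
    past = history.get(job.shot)
    return mean(past) if past else default


def pick_next(jobs, history, node_cache, cache_weight=0.5):
    """Shortest-predicted-job-first, favoring jobs whose textures are cached."""

    def score(job):
        estimate = predicted_seconds(job, history)
        if job.textures:
            cached = len(job.textures & node_cache) / len(job.textures)
        else:
            cached = 0.0
        # A fully cached job has its estimate cut by cache_weight (here, half).
        return estimate * (1.0 - cache_weight * cached)

    return min(jobs, key=score)


if __name__ == "__main__":
    history = {"sh010": [100, 200], "sh020": [120]}
    queue = [
        Job("sh010_v3", "sh010", {"rock.tx", "moss.tx"}),
        Job("sh020_v2", "sh020", {"sky.tx"}),
    ]
    print(pick_next(queue, history, node_cache=set()).name)
    print(pick_next(queue, history, node_cache={"rock.tx", "moss.tx"}).name)
```

With an empty cache the shorter sh020 job goes first; once the node holds the sh010 textures, the longer job wins because its discounted score is lower. A storage "hotspot" penalty could be added to the score in the same way.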

The data is relatively easy to collect, and not overly difficult to incorporate into the decision-making process; the real trick is being able to build an interface that's simple enough to be generally useful. Queuing systems need to be easier to use, and easier to set up - and advanced functionality can't come at the expense of that usability...