Skip to content

Reinventing the Wheel: How to Average Numbers

Other the past week I’ve been writing some code for work that has to happen periodically and then re-schedule itself so it can happen again. We never want two of this process running, and we want the process to automagically expand its time window if it takes longer than we think it should, because arbitrary magic numbers are a Bad Thing™.

So I keep track of each time we run and keep an average of how long the job takes every time. But I don’t want to keep around every run time, so all I do is keep around the average, and how many times the job has actually run. From those two numbers plus our current run time I can approximate a new average and use that to schedule the next run. I came up with a nice little algorithm that pushes the average around whenever a new number comes in. It works great:

def find_new_avg(old_avg, old_count, new_number)
  weight = (new_number.to_f - old_avg.to_f)/(old_count+1)
  old_avg + weight

But in my tests, I noticed something; this doesn’t just approximate the average. It actually calculates the average. It’s right all the time.

I did few quick Google searches and checked Wikipedia couldn’t find anything. I looked at it again. I pared down the code to the above (it used to be a bit more complicated). I tried more tests. It still worked.

I asked my friend Nick if he had ever heard of this algorithm. He hadn’t. He pushed some numbers at it and decided it did work. We did a video conference over Skype and worked out the algebra, and we proved that it did work. I said: “It just seems insane that I could have invented–by accident– a new way to average numbers that uses less space than any other way I’ve ever seen.”

I searched online again, and finally found it. It’s from The Art of Computer Science, Volume 2, by Donald Knuth.

Knowing that I’m not the first one to discover this algorithm restores my faith in how much research has been done in the past (although my faith in Google’s capabilities was somewhat shaken), but the fact remains that, not knowing about it beforehand, I discovered a reliable algorithm that does what I need in a fraction of the space it could take. And that makes me happy, even if I did reinvent the wheel.