‘Datewise’ AND’ing
For an application I’m working on, I needed to figure out the average time of day something occurred.[1]
“Simple”, you say. “Famous last words”, I reply.
To understand the issue, you need to understand that most operating systems store time as the number of seconds since a reference date.[2] More specifically, on the platforms where I spend most of my time (the Mac and iPhone OSes), time is stored as an offset from “the first instant of 1 January 2001, GMT”.
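As a rough illustration of "time is just an offset from a reference date", here is a minimal sketch in plain C (which also compiles inside an Objective-C file). It converts a Unix timestamp, counted from 1 January 1970 GMT, into seconds since the Mac/iPhone reference date. The function name and the macro are made up for this example; the offset itself is the 11,323 days between the two reference dates.

```c
/* Seconds between the Unix epoch (1 January 1970, GMT) and the
   Mac/iPhone reference date (1 January 2001, GMT): 11,323 days * 86,400. */
#define UNIX_TO_REFERENCE_OFFSET 978307200LL

/* Hypothetical helper: re-express a Unix timestamp as seconds since
   the 2001 reference date. Dates before 2001 come out negative. */
long long reference_seconds_from_unix(long long unix_seconds) {
    return unix_seconds - UNIX_TO_REFERENCE_OFFSET;
}
```

Either way, the stored value is one big number; the month, day, year and time of day are all derived from it.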
So, since a time is just a big number, the most naive approach to averaging time is to add all of those times together and divide by the number of times you added together. Simple math. It’s how you do averages. But this wouldn’t be a blog post if that was the end of the story, now would it? Just like the hero (for the most part) can’t die at the beginning of the movie, this blog post can’t be over now, can it?
No, it can’t.
So, from a technical standpoint, the “naive approach” is absolutely valid. It will give you the average time something happened, mathematically. So let’s say my data set contained 4/27/10 8:00AM and 4/28/10 8:00AM. For our simple data set, the expected average is 8:00AM, right? Well, actually, the average of those two values is (drumroll) 8:00PM on 4/27/10, the midpoint between them. Wha wha wha!?
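To see where the 8:00PM comes from, here is a small sketch of the naive approach in plain C. The function name is hypothetical, and the timestamps are treated as plain seconds since the reference date (a double, like Cocoa's NSTimeInterval).

```c
#include <stddef.h>

/* Naive average: sum the raw timestamps and divide by how many there are.
   Returns 0.0 for an empty data set. */
double naive_average(const double *timestamps, size_t count) {
    double total = 0.0;
    for (size_t i = 0; i < count; i++) {
        total += timestamps[i];
    }
    return count > 0 ? total / (double)count : 0.0;
}
```

Two timestamps exactly 24 hours apart, t and t + 86400, average to t + 43200. That is twelve hours after the first 8:00AM, which lands on 8:00PM of the first day.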
Yeah, that’s what I thought, too.
What’s actually happening is that your mind is playing a trick on you. This is a simple problem for you to figure out because you inherently filter out the “noise”, and in this case there’s a lot of noise in the data: for instance, the month, day and year that the event occurred. I’m leaving out the time zone, etc. because that’s merely a projection placed ON the data; it’s not a part of the data, per se.[3] Or perhaps it’s an easier problem for us to solve because we think of time differently than a computer does. I haven’t really figured out which is the cause of the distortion; perhaps it’s a combination of both.
The base issue here is that the data contains information that’s irrelevant to the solution we want to attain: specifically, the month, day and year the event occurred. In this instance, we only care at what time of the day the event occurred. And that’s when I came up with the concept of ‘Datewise’ AND’ing. The concept is very similar to bitwise AND’ing. For those non-CS majors reading this, bitwise AND’ing against a mask is a way to cover up the parts of the data you aren’t interested in. Wikipedia likens it to applying masking tape to data, so you only see what you’re interested in.
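For anyone who hasn't seen bitwise AND'ing in action, here is a minimal sketch in plain C. The function name and the choice of mask are just for illustration.

```c
#include <stdint.h>

/* Keep only the low 8 bits of value. Every bit that is 0 in the mask
   is "taped over" and comes out as 0 in the result. */
uint32_t keep_low_byte(uint32_t value) {
    const uint32_t mask = 0x000000FFu;
    return value & mask;
}
```

For example, keep_low_byte(0xABCD1234) gives 0x34: the upper three bytes are masked away, the same way the month, day and year get masked away from a date.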
To figure out the average time of day an event occurred over a series of days, we have to mask the insignificant or irrelevant data. In this case, we need to mask the month, day, and year. The way to do that depends on the platform you are working in, but the concept is the same. First, establish a baseline day. In my project, I chose 1/1/2001. This date can be ANY date; I chose this one because it’s the reference date the Mac and iPhone platforms use. The key is that it’s the same for every event. Then, project your data so that the interesting bits are “baselined” to this baseline day. In Cocoa, you would do something along the lines of this:
// Keep only the hour and minute of the event; everything else is masked out.
NSDateComponents *hourMinuteComponents = [[NSCalendar currentCalendar]
    components:(NSHourCalendarUnit | NSMinuteCalendarUnit)
      fromDate:[event time]];
// Pin the date portion to the baseline day, 1/1/2001.
[hourMinuteComponents setMonth:1];
[hourMinuteComponents setYear:2001];
[hourMinuteComponents setDay:1];
NSDate *normalizedTime = [[NSCalendar currentCalendar] dateFromComponents:hourMinuteComponents];
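In the above code, if my event’s time was 4/27/10 8:00AM, the resulting normalized time would be 1/1/2001 8:00AM, or 28800, the number of seconds between 1/1/2001 12:00:00AM and 1/1/2001 8:00:00AM. Go ahead, do the math: 8 hours × 3600 seconds per hour. (If you read the value back as an interval since the reference date, it will also include your time zone’s offset from GMT, because the reference date is defined in GMT. That offset is the same for every normalized time, so it doesn’t skew the average.)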
Once I ‘AND’ out the insignificant data, I’m free to perform my calculations exactly as you’d expect: summing up the seconds since my reference date and dividing by the number of items in my data set.
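Here is the whole idea as a rough sketch in plain C, with no Cocoa involved. Note the swapped-in technique: instead of NSDateComponents, the "mask" is a modulo by the number of seconds in a day. That assumes the timestamps are whole seconds since the reference date and that GMT days are what you care about, so it ignores time zones and daylight saving time. Both function names are hypothetical.

```c
#include <stddef.h>

#define SECONDS_PER_DAY 86400LL

/* "Datewise AND": keep only the seconds elapsed since midnight GMT.
   The double modulo keeps the result non-negative for dates before 2001. */
static long long seconds_into_day(long long seconds_since_reference) {
    return ((seconds_since_reference % SECONDS_PER_DAY) + SECONDS_PER_DAY)
           % SECONDS_PER_DAY;
}

/* Average time of day, in whole seconds after midnight, for `count`
   timestamps. Returns -1 when there is nothing to average. */
long long average_time_of_day(const long long *timestamps, size_t count) {
    if (count == 0) {
        return -1;
    }
    long long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += seconds_into_day(timestamps[i]);
    }
    return total / (long long)count; /* integer division drops fractions */
}
```

Fed two 8:00AM timestamps from different days, this returns 28800, which is 8:00AM again, instead of the 8:00PM the naive average produces.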
[1] As a disclaimer, this method is specifically meant to solve the problem of finding the average time of day at which an event occurs.
[2] I believe all OSs do; I’m just too lazy to verify it.
[3] In fact, pretty much everything is a “projection of the data” except for the seconds; everything else is calculated from the seconds and is therefore relative to them.
-
jamiepinkham posted this