PezBlog: August 2012

This blog post could also be called "be careful what you group for".

The original timestamp type in kdb, which they call a datetime, is based on a floating point number: the date sits on the left hand side of the decimal point and the time of day on the right. This causes a lot of issues when comparing timestamps for equality, and especially when grouping by timestamps. A colleague who adjusts the risk of the portfolio got bitten by this, so it is worth blogging about.
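To make the layout concrete, here is a minimal sketch in Python (not q) of how an instant can be packed into a single float of fractional days. It assumes the kdb epoch of 2000.01.01; the function name to_fractional_days is my own and is not a kdb API.

```python
from datetime import datetime

# Assumption: day counts start at the kdb epoch, 2000.01.01.
EPOCH = datetime(2000, 1, 1)

def to_fractional_days(dt):
    """Whole days since EPOCH plus the time of day as a fraction of a day."""
    delta = dt - EPOCH
    return delta.days + delta.seconds / 86_400 + delta.microseconds / 86_400e6

noon = to_fractional_days(datetime(2012, 8, 3, 12, 0, 0))
half_nine = to_fractional_days(datetime(2012, 8, 3, 9, 30, 0))

print(noon)       # integer part is the day count, fractional part is .5 for noon
print(half_nine)  # same day count, but 09:30 has no exact binary fraction
```

Noon happens to be exactly representable, but most times of day are not, which is where the trouble starts.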

The issues are not isolated to kdb; they are inherent to floating point numbers, which have to make discrete approximations to the real number line. The simplest example is trying to represent 0.1 in base 2: its binary expansion never terminates, so the stored value can only be an approximation that is close to 0.1. Two variables that are both meant to represent 0.1 can hold two different approximations, and this can cause a lot of issues if you are analysing data based on datetimes, because it is the approximations that get compared and not 0.1 itself.
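The same effect can be checked in any language that uses IEEE 754 doubles. The sketch below uses Python rather than q, purely as an illustration; the variable names are mine.

```python
from decimal import Decimal

# Decimal(float) exposes the exact value the double actually stores.
stored = Decimal(0.1)

# Two routes to "0.1" that land on different doubles.
x = 1.0 - 0.9
y = 0.1

print(stored)            # a long decimal that is close to, but not equal to, 0.1
print(repr(x), repr(y))  # the two approximations side by side
print(x == y)            # exact comparison of the approximations
```

The two values differ only in the last bits, which is why a rounded display hides the problem.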

The proof is in the pudding, so I will demonstrate with an example.

    \P 0
    q) x:1.0-0.9
    0.1
    q) y:1.0-0.9
    0.1

    \P 1
    q) x:1.0-0.9
    0.99999987
    q) y:1.0-0.9
    0.99999789

As you can see, once the display precision is changed, x and y are no longer shown as the same value. Now think about what happens if we want to group by datetimes when the same logical timestamp can have two different approximate values. The grouping is not guaranteed to be correct, because rows that belong to one logical datetime can end up under different keys.
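Here is a hedged sketch of that grouping failure, again in Python rather than q. The trades list and its quantities are made up for illustration; the keys are a time of day expressed as a fraction of a day, all meant to be the same instant.

```python
from collections import defaultdict

# Hypothetical data: (time as a fraction of a day, quantity).
# All three rows are meant to share the logical key 0.1.
trades = [(0.1, 100), (1.0 - 0.9, 250), (0.1, 50)]

groups = defaultdict(int)
for key, qty in trades:
    groups[key] += qty  # dictionary lookup uses exact equality on the float key

print(len(groups))   # 2 groups, not the 1 we intended
print(dict(groups))
```

The middle row lands in its own group, so any total per timestamp is silently wrong.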

The lesson to take away from this post is not to use datetimes in equality logic, whether implicitly or explicitly.
The new timestamps that kdb introduced are based on 64-bit signed integers (computer scientists call them longs), so they are reliable to use for grouping or logical equality.
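For contrast, here is the same grouping with integer keys, sketched in Python. It assumes the timestamp is held as a whole number of nanoseconds, as the kdb timestamp type does; the data is again hypothetical.

```python
from collections import defaultdict

NANOS_PER_DAY = 86_400 * 10**9

# Two routes to the same instant, one tenth of a day, in integer arithmetic.
a = NANOS_PER_DAY // 10
b = NANOS_PER_DAY - 9 * (NANOS_PER_DAY // 10)

# Hypothetical data: (nanoseconds since midnight, quantity).
trades = [(a, 100), (b, 250), (a, 50)]

groups = defaultdict(int)
for key, qty in trades:
    groups[key] += qty  # integer keys compare exactly, with no approximation

print(a == b)        # True
print(dict(groups))  # a single group holding all three quantities
```

Integer arithmetic is exact, so every route to the same instant produces the same key.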

Friday, 3 August 2012

Grouping in kdb using datetime