Wednesday, 31 October 2012

Computing logarithms to arbitrary bases in Java

I noticed that Java's math API does not provide a method to compute logarithms to an arbitrary base, so I decided to write a function that does. An API should not expect developers to know how to derive a logarithm to an arbitrary base from the natural logarithm, and logarithms to specific bases such as 2, 8 and 16 are quite common.

// Logarithm of x to base b, via the change-of-base identity log_b(x) = ln(x) / ln(b).
public static double log(double x, double b) {
    return Math.log(x) / Math.log(b);
}

All I am doing is dividing the natural log of x by the natural log of the base. This is not the optimal solution, as a native method would be better, but it is close enough.
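To see what "close enough" means in practice, here is a minimal sketch that wraps the function in a class and compares results against a tolerance instead of testing for exact equality. The class name LogBase and the helper closeTo are made up for this illustration, and the tolerance of 1e-12 is an arbitrary choice.

```java
class LogBase {
    /** Logarithm of x to base b via the change-of-base identity ln(x) / ln(b). */
    static double log(double x, double b) {
        return Math.log(x) / Math.log(b);
    }

    /** Hypothetical helper: true when actual is within eps of expected. */
    static boolean closeTo(double actual, double expected, double eps) {
        return Math.abs(actual - expected) <= eps;
    }

    public static void main(String[] args) {
        // The quotient of two rounded doubles may differ from the exact
        // integer in the last few bits, so print the raw values to inspect them.
        System.out.println(log(8, 2));
        System.out.println(log(4096, 8));
        System.out.println(log(256, 16));

        // Compare with a tolerance rather than with ==.
        System.out.println(closeTo(log(8, 2), 3.0, 1e-12));
    }
}
```

If you need an exact integer result, for example to count bits, rounding the quotient or using integer arithmetic is probably safer than trusting the division.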

Simples!

Friday, 3 August 2012

Grouping in kdb using datetime

This blog post could also be called "be careful what you group for".

The original timestamp type in kdb, which they call a datetime, is a floating point number: the date sits to the left of the decimal point and the time to the right. This causes a lot of issues when comparing timestamps for equality, and especially when grouping by timestamps. A colleague who adjusts the risk of the portfolio got bitten by this, so it is worth blogging about.

The issue is not isolated to kdb; it is inherent to floating point numbers, which have to make discrete approximations to the real number line. The simplest example is trying to represent 0.1 in base 2: it has no finite binary expansion, so the stored value is an approximation that is close to 0.1. Two variables that logically represent 0.1 can hold two different approximations, depending on how each was computed, and this causes a lot of issues if you are analysing data keyed on datetimes, because the comparison is made on the approximations and not on 0.1.
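The same behaviour is easy to reproduce outside kdb. The sketch below uses Java doubles (the class and method names are made up for illustration) to show that 1.0 - 0.9 and the literal 0.1 are stored as different approximations, and uses BigDecimal to print the exact binary value behind each one.

```java
import java.math.BigDecimal;

class BinaryFractionDemo {
    /** Hypothetical helper: the value obtained by computing 1.0 - 0.9 in doubles. */
    static double oneMinusPointNine() {
        return 1.0 - 0.9;
    }

    public static void main(String[] args) {
        double x = oneMinusPointNine();
        double y = 0.1;

        // Logically both are "0.1", but the stored approximations differ.
        System.out.println(x == y); // prints false

        // new BigDecimal(double) exposes the exact value the double holds.
        System.out.println(new BigDecimal(x));
        System.out.println(new BigDecimal(y));
    }
}
```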


The proof is in the pudding, so I will demonstrate with an example.


\P 0
q) x:1.0-0.9
0.1
q)y:1.0-0.9
0.1
\P 1
q) x:1.0-0.9
0.99999987
q) y:1.0-0.9
0.99999789

As you can see, once the display precision is changed, x and y are not the same value. Think about what this means if we want to group by datetimes: the same logical timestamp could have two (approximate) values, so the grouping is not guaranteed to be correct.


The lesson to take away from this post is not to use datetimes in equality logic, whether implicitly or explicitly.
The newer timestamp type that kdb introduced is based on 64-bit integers (computer scientists call them longs), so it is reliable for grouping and logical equality.
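To make the grouping problem concrete, here is a rough Java analogue rather than kdb code. It assumes a time of day stored either as a fraction of a day in a double (in the spirit of a datetime) or as a count of nanoseconds in a long (in the spirit of the newer timestamps). The class name, helper names and the choice of 0.3 of a day are invented for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

class GroupingKeysDemo {
    static final long NANOS_PER_DAY = 86_400_000_000_000L;

    /** Number of distinct groups produced when grouping on double keys. */
    static int groupsByDouble(double... fractionsOfDay) {
        Map<Double, Integer> counts = new HashMap<>();
        for (double d : fractionsOfDay) {
            counts.merge(d, 1, Integer::sum);
        }
        return counts.size();
    }

    /** Number of distinct groups produced when grouping on long keys. */
    static int groupsByLong(long... nanosOfDay) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long n : nanosOfDay) {
            counts.merge(n, 1, Integer::sum);
        }
        return counts.size();
    }

    public static void main(String[] args) {
        // The same logical instant reached two ways: 0.3 of a day directly,
        // and 0.1 of a day plus 0.2 of a day.
        System.out.println(groupsByDouble(0.3, 0.1 + 0.2)); // prints 2

        long direct = 3 * NANOS_PER_DAY / 10;
        long summed = NANOS_PER_DAY / 10 + 2 * NANOS_PER_DAY / 10;
        System.out.println(groupsByLong(direct, summed)); // prints 1
    }
}
```

The double keys split one logical instant into two groups, while the integer keys keep it together, which is the behaviour you want from a grouping column.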

Monday, 21 May 2012

Measuring latency in Java







I noticed a common pattern in Java code that attempts to measure latency within a Java process, and it can give false measurements of elapsed time.

long start = System.currentTimeMillis();
doSomething();
long end = System.currentTimeMillis();
long time = end - start;


The problem with this snippet is that it measures latency using adjusted wall clock time. The wall clock continuously drifts from the correct time, and the operating system periodically applies a delta to keep it synchronized with real time; that adjustment can skew your measurement.

The code can be fixed by replacing System.currentTimeMillis() with System.nanoTime(): the former includes the drift adjustments, while the latter measures elapsed time without the skew. For example, an adjustment could move the wall clock by +20 milliseconds, adding an illusory 20 milliseconds of latency to the measurement.

The lesson from this is that if you need to measure elapsed time in Java, ALWAYS use System.nanoTime().

long start = System.nanoTime();
doSomething();
long time = System.nanoTime() - start;
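If you measure in several places, the pattern can be wrapped in a small helper. This is only a sketch: the class ElapsedTimer and the method elapsedNanos are names I made up, and the loop is a stand-in for doSomething().

```java
import java.util.concurrent.TimeUnit;

class ElapsedTimer {
    /** Runs the task and returns the elapsed time in nanoseconds. */
    static long elapsedNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        // Only the difference between two nanoTime() readings is meaningful;
        // the absolute values are not related to wall clock time.
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long nanos = elapsedNanos(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            // Use the result so the loop is not trivially dead code.
            if (sum < 0) {
                throw new IllegalStateException("unexpected overflow");
            }
        });
        System.out.println(nanos + " ns (" + TimeUnit.NANOSECONDS.toMicros(nanos) + " us)");
    }
}
```

A single measurement like this is still noisy because of JIT warm-up and scheduling, so treat it as a rough reading rather than a benchmark.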

Wednesday, 18 April 2012

Sampling moments in Q

I was checking over a colleague's Q code that computes moments of a sample distribution taken from a statistical population. I realized that population moments were being computed against the sample - very naughty! The lesson of this blog is to be careful using out-of-the-box Q functions like var against a sample distribution.

Let's take a look at an example:

q)sample:3 4 6 7 10
q)var sample
6f


This is the population variance being computed against the sample, which gives a biased estimate of the moment! Instead, create your own function to compute sample moments.

q)svar:{(sum d*d:x-avg x)%-1+count x}
q)svar sample
7.5
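For readers who do not speak Q, the same two calculations can be sketched in Java. The class and method names are made up for illustration; the only difference between the two variances is the divisor, n for the population and n - 1 for the sample.

```java
class VarianceDemo {
    /** Sum of squared deviations from the mean. */
    static double sumSquaredDeviations(double[] xs) {
        double total = 0;
        for (double x : xs) {
            total += x;
        }
        double mean = total / xs.length;

        double ss = 0;
        for (double x : xs) {
            double d = x - mean;
            ss += d * d;
        }
        return ss;
    }

    /** Population variance: divide by n. */
    static double populationVariance(double[] xs) {
        return sumSquaredDeviations(xs) / xs.length;
    }

    /** Sample variance: divide by n - 1. */
    static double sampleVariance(double[] xs) {
        return sumSquaredDeviations(xs) / (xs.length - 1);
    }

    public static void main(String[] args) {
        double[] sample = {3, 4, 6, 7, 10};
        System.out.println(populationVariance(sample)); // prints 6.0
        System.out.println(sampleVariance(sample));     // prints 7.5
    }
}
```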


Simples!