Friday, March 23, 2012

d3 for analytics dashboard

This week, I worked on developing a proof-of-concept for the analytics dashboard for Casebook. We are using a star schema in PostgreSQL for the data warehouse. For the front-end rendering, where we need histograms, box-and-whisker plots and line charts (maybe more... we are advocates of agile, after all), I am using the d3 JavaScript library. I found it pretty cool for a number of reasons. First, all the rendering is done with SVG, so if you want to draw a histogram, you draw a set of rectangles; if you want to draw a box plot, you draw all the lines yourself, and so on. You learn a set of basic tools, and you use them repeatedly to get your job done. I have used Google Charts before for implementing analytics dashboards, but since SVG lets you draw with geometric primitives, it gives you finer control over what ends up on the page. Second, there is the concept of "joining" your data with the SVG components of the page. A histogram can be drawn with the following piece of nifty code:

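// Bind one histogram bucket to each rect; enter() covers buckets with no rect yet.
chart.selectAll("rect")
    .data(histogram)
  .enter().append("rect")
    .attr("width", x.rangeBand())
    .attr("x", function(d) { return x(d.x); })
    // SVG's y axis points down, so each bar starts at (height - bar height).
    .attr("y", function(d) { return height - y(d.y); })
    .attr("height", function(d) { return y(d.y); });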

where histogram is an array of JavaScript objects, each describing the start-point, width and frequency of a bucket. There are no SVG "rect" elements to begin with, so we can think of this as an "outer join" between the SVG rectangle elements and the histogram buckets, with the rectangle attributes set on the elements of the resulting set. The enter() operator comes in handy here (and in most situations) when the nodes we are trying to select do not exist yet in the DOM tree: it yields placeholders for the data that has no matching node, and the append() call that follows takes the name of the node to add to the document, which in this case is the SVG rect element.
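To make the join idea a bit more concrete, here is a minimal sketch in plain JavaScript. It is not how d3 implements it; joinByIndex is a hypothetical helper I made up, the bucket values are invented, and the strings standing in for DOM nodes are placeholders. It pairs nodes with data by array index, which is what data() does when no key function is given.

```javascript
// Minimal sketch of the "data join" idea, NOT d3's actual implementation.
// joinByIndex is a hypothetical helper that pairs nodes with data by index.
function joinByIndex(nodes, data) {
  var update = [], enter = [], exit = [];
  var n = Math.max(nodes.length, data.length);
  for (var i = 0; i < n; i++) {
    if (i < nodes.length && i < data.length) {
      update.push({ node: nodes[i], datum: data[i] }); // node and datum both exist
    } else if (i < data.length) {
      enter.push({ datum: data[i] });                  // datum with no node yet
    } else {
      exit.push({ node: nodes[i] });                   // node with no datum left
    }
  }
  return { update: update, enter: enter, exit: exit };
}

// Made-up buckets: x = start-point, dx = width, y = frequency.
var buckets = [
  { x: 0, dx: 10, y: 3 },
  { x: 10, dx: 10, y: 7 },
  { x: 20, dx: 10, y: 2 }
];

// First render: the page has no rect elements, so every bucket "enters".
var first = joinByIndex([], buckets);

// Later render with fewer buckets: two nodes are updated, one exits.
var rects = ["rect0", "rect1", "rect2"]; // stand-ins for DOM nodes
var second = joinByIndex(rects, buckets.slice(0, 2));
```

On the first render everything lands in the enter set, which is exactly the situation in the histogram code above; the update and exit sets only matter once the page already has rectangles.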

One other thing I found interesting here is the object named "histogram". It's obtained by the following method call:

var histogram = d3.layout.histogram()(data);

d3.layout.histogram() returns a layout, which is both an object and a function. Since it is a function, it can be invoked with the parameter "data", and the result, an array of buckets carrying x, dx and y, is what we pass to the method data() above. If the data changes (which can happen if you change the filtering criteria of your query), the layout can be invoked again and the new buckets rebound to the nodes of the document. data() also accepts an optional second argument, a "key function", which controls how buckets are matched to existing nodes.
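The "both an object and a function" pattern is easy to try out in plain JavaScript. The sketch below is only an illustration under my own assumptions: makeHistogramLayout is a hypothetical stand-in rather than d3.layout.histogram itself, the default of 5 bins is arbitrary, and the buckets are plain objects with x, dx and y.

```javascript
// Sketch of a "layout" that is both a function and an object.
// makeHistogramLayout is a hypothetical stand-in, not d3.layout.histogram itself.
function makeHistogramLayout() {
  var binCount = 5; // assumed default, chosen only for this example

  // Calling the layout as a function turns raw values into buckets.
  function layout(values) {
    var min = Math.min.apply(null, values);
    var max = Math.max.apply(null, values);
    var dx = (max - min) / binCount || 1; // fall back to 1 to avoid zero-width bins
    var bins = [];
    for (var i = 0; i < binCount; i++) {
      bins.push({ x: min + i * dx, dx: dx, y: 0 });
    }
    values.forEach(function (v) {
      // The maximum value is counted in the last bin.
      var i = Math.min(binCount - 1, Math.floor((v - min) / dx));
      bins[i].y += 1;
    });
    return bins;
  }

  // Using the layout as an object: a chainable getter/setter.
  layout.bins = function (n) {
    if (!arguments.length) return binCount;
    binCount = n;
    return layout;
  };

  return layout;
}

// Configure it like an object, then call it like a function.
var layout = makeHistogramLayout().bins(3);
var histogram = layout([1, 2, 2, 3, 4, 6, 7]);

// When the query filter changes, call the layout again to get fresh buckets.
var refreshed = layout([1, 2, 7]);
```

Keeping the layout around, instead of only its output, is what makes the refresh cheap: the configuration lives in the closure and each call just produces a new array of buckets to bind.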

I attended a talk on d3 by Mike Dewar of bit.ly at a meetup recently, where I first got introduced to it. For more details, check the paper by the original authors.
