outliers – is it the data or the theory?

July 19, 2007

During my brief(?) stint at the Treasury, my coworkers and I frequently excluded outliers from data because their position on the scatterplot didn’t mesh with our understanding of community finance.  We simply assumed the data was invalid in some way.  It certainly made for a *slightly* more coherent understanding of an exceptionally chaotic set of data.

what you really shouldn’t do—especially when the cases are in other respects quite similar, such as all being functioning, rich capitalist democracies—is label entire countries as “outliers” in order to remove them from your analysis, and then pretend that this has made them disappear from the face of the earth, too.

The problem, to me, occurs where the statistical trend between a limited number of variables is completely insufficient to examine a data set whose causal picture is, in fact, incredibly complex. In the case recently discussed, asking whether more taxes (counterintuitively) produce less revenue is much like predicting weather changes based on the activity of the butterflies in my backyard. No doubt there is an effect; good luck creating the regression.
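
To make the worry about dropping outliers concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the data points are invented (they are not Treasury or tax data), and `fit_line` is a hypothetical helper that does a plain least-squares fit. It only shows how much a single excluded point can move the fitted trend.

```python
# Toy illustration only: every number below is invented, not real data.

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Five made-up points that sit exactly on y = 2x + 1 ...
trend = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11)]
# ... plus one made-up point that does not fit the story.
outlier = (6, 1)

slope_trimmed, intercept_trimmed = fit_line(trend)
slope_all, intercept_all = fit_line(trend + [outlier])

print(f"outlier dropped: slope = {slope_trimmed:.2f}")
print(f"outlier kept:    slope = {slope_all:.2f}")
```

The trimmed fit looks tidy, but only because the inconvenient point was removed. Whether that point is bad data or a sign that the theory is missing something is exactly the question the regression cannot answer on its own.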

we’re supposed to be political science

October 5, 2006

so much for the discipline.

poincare conjecture

August 22, 2006

the math community has been abuzz, since they are getting closer to understanding grigori perelman’s (potential) solution to the poincare conjecture.  for those just tuning in, the poincare conjecture is generally understood as:

Every simply connected closed (i.e. compact and without boundary) 3-manifold is homeomorphic to a 3-sphere.
that isn’t quite fair, though. perelman appears to have proved the more general conjecture (thurston’s geometrization conjecture) that every closed 3-manifold can be cut into pieces that each carry one of eight possible geometries. from this, the poincare conjecture is a direct consequence.

i’m not going to re-hash all that has been written, since this is well outside basically everything i know about math.  but the MSM’s coverage feels like a bit of a tease.  since it seems that getting to the deeper stuff is something of a mess, here are the relevant links:

keep in mind, kleiner & lott’s “notes on perelman’s papers” (arxiv) is 192 pages long.


June 29, 2006

via crooked timber and jim gibbon, i stumbled into gapminder, a neat data-visualization package available online (alternate link). so so cool, and not just for us geeks.

