Outliers – is it the data or the theory?

During my brief(?) stint at the Treasury, my coworkers and I frequently excluded outliers from our data because their position on the scatterplot didn't mesh with our understanding of community finance. We simply assumed the data were invalid in some way. It certainly made for a *slightly* more coherent understanding of an exceptionally chaotic data set.

what you really shouldn’t do—especially when the cases are in other respects quite similar, such as all being functioning, rich capitalist democracies—is label entire countries as “outliers” in order to remove them from your analysis, and then pretend that this has made them disappear from the face of the earth, too.

[Outliers, at Crooked Timber]
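To show how cheaply that "more coherent understanding" can be bought, here is a toy sketch in Python with invented numbers (not Treasury data). The helper names (`fit_line`, `r_squared`, `trim_by_residual`) and the "drop anything more than `k` times the RMS residual" rule are hypothetical choices for illustration, not the procedure we actually used. Note the circularity: the fit that decides what counts as an outlier is the same model the outlier is supposedly being tested against.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(xs, ys, a, b):
    """Share of the variance in y explained by the line a + b*x."""
    my = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst


def trim_by_residual(xs, ys, k=2.0):
    """Hypothetical rule: drop points whose residual from the current fit
    exceeds k times the root-mean-square residual of that same fit."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    kept = [(x, y) for x, y, r in zip(xs, ys, residuals) if abs(r) <= k * rms]
    return [x for x, _ in kept], [y for _, y in kept]


# Invented data: five points on y = x, plus one point that "doesn't mesh".
xs = [1, 2, 3, 4, 5, 3]
ys = [1, 2, 3, 4, 5, 9]

a, b = fit_line(xs, ys)
print("all points: R^2 =", round(r_squared(xs, ys, a, b), 3))

tx, ty = trim_by_residual(xs, ys)
ta, tb = fit_line(tx, ty)
print("trimmed:    R^2 =", round(r_squared(tx, ty, ta, tb), 3))
```

With all six points the line explains a quarter of the variance (R² = 0.25); drop the one awkward point and the fit becomes perfect (R² = 1.0). Nothing in the procedure ever showed that the dropped point was wrong, only that it disagreed with the line.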

The problem, to me, arises when a statistical trend among a limited number of variables is completely insufficient to examine a data set whose causal picture is, in fact, incredibly complex. In the case recently discussed, asking whether more taxes (counterintuitively) produce less revenue… is much like predicting changes in the weather from the activity of the butterflies in my backyard. No doubt there is an effect; good luck creating the regression.
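Even the smallest version of that problem bites. Here is a toy sketch in Python with invented numbers (nothing here is tax or revenue data; `x`, `z`, `y` and the helper functions are hypothetical). The outcome `y` truly rises with `x`, but a single omitted factor `z` that happens to fall as `x` rises is enough to flip the sign of a regression on `x` alone.

```python
def centered(v):
    """Subtract the mean so the regressions below include an intercept."""
    m = sum(v) / len(v)
    return [t - m for t in v]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def slope_one(x, y):
    """OLS slope of y on x alone."""
    cx, cy = centered(x), centered(y)
    return dot(cx, cy) / dot(cx, cx)


def slopes_two(x, z, y):
    """OLS slopes of y on both x and z, solving the 2x2 normal equations."""
    cx, cz, cy = centered(x), centered(z), centered(y)
    sxx, szz, sxz = dot(cx, cx), dot(cz, cz), dot(cx, cz)
    sxy, szy = dot(cx, cy), dot(cz, cy)
    det = sxx * szz - sxz ** 2
    bx = (szz * sxy - sxz * szy) / det
    bz = (sxx * szy - sxz * sxy) / det
    return bx, bz


# Invented numbers: y = 1*x + 3*z exactly, and z tends to fall as x rises.
x = [1, 2, 3, 4, 5]
z = [4, 5, 2, 3, 1]
y = [xi + 3 * zi for xi, zi in zip(x, z)]

print("y on x only:  slope  =", round(slope_one(x, y), 3))
bx, bz = slopes_two(x, z, y)
print("y on x and z: slopes =", round(bx, 3), round(bz, 3))
```

Regressing `y` on `x` alone gives a slope of -1.4, the wrong sign; adding `z` recovers the true coefficients, 1 and 3. That rescue only works because this toy has exactly one hidden factor and we know what it is. A real economy has many, most of them unmeasured.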
