textMap, so so cool – but how does it work?

July 22, 2006

i am the absolute worst when it comes to methodologies and titles (titles, you probably guessed from these posts).  but it is becoming increasingly apparent that these are at the core of great statistics / information display / research.  take textmap, an engine to analyze the geographic and temporal distribution of news.  it is really quite cool, and something i’ve wanted to do for a while, but it always seemed like there were too many problems to overcome before the idea became workable. so i was psyched to find the site.

-but a problem-

playing with the ‘function of location’ charts has me worried.  montana has relatively few news sources, and therefore never shows a strong reading.  the east coast, however – particularly the metropolitan corridor – is a sea of red (more news sources in the area).  so, there is variation in both areas, but it isn’t entirely clear what the map is measuring, because comparing across regions is no longer intuitive.  i couldn’t find the methodology on the site (boo!) – and so it isn’t clear what the intensity of red indicates.
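to make the worry concrete, here is a minimal sketch of one fix: divide each region's mentions by how much news that region produces at all. this is my assumption about what a fair comparison might look like, not textmap's actual method (which i couldn't find). the function name and every number below are made up for illustration.

```python
def mention_rate(mentions, articles):
    """Share of each region's articles that mention the term.

    mentions: {region: number of articles mentioning the term}
    articles: {region: total number of articles from that region}
    Regions with no articles are skipped rather than divided by zero.
    """
    return {
        region: mentions[region] / articles[region]
        for region in mentions
        if articles.get(region)
    }


# made-up counts: new york has far more raw mentions than montana...
mentions = {"montana": 5, "new york": 200}
articles = {"montana": 50, "new york": 4000}

# ...but per article, montana is the one talking about the term more
rates = mention_rate(mentions, articles)
```

with raw counts new york is a sea of red; with rates montana wins. if the map shows raw counts, the red mostly tells you where the newspapers are.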

this isn’t to say the site isn’t worthwhile – the mexico map shows an intuitive trend

Mexico TextMap

but i have to wonder if this is an artifact of paper coverage – why the band between n. florida and s. georgia? – and wonder about coverage in relation to associated thoughts.  (what is the unit of analysis, btw – census tract?)

there is also the old baseline problem: what is the ‘noise’ associated with a given concept (background usage not associated with events)? – and what is the median frequency of ‘related’ terms? it’s cool if mexico usage went up, but whether that was a function of world cup news or of immigration news makes a big difference.
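a rough sketch of the baseline idea, assuming you had weekly counts for a term: compare this week's count to the median of the earlier weeks. the function name and the counts are hypothetical, and the median is just one reasonable choice of 'background' level.

```python
from statistics import median


def lift_over_baseline(past_counts, current):
    """Ratio of the current count to the median of past counts.

    Returns None when the baseline is zero, since the ratio is undefined.
    """
    baseline = median(past_counts)
    if baseline == 0:
        return None
    return current / baseline


# made-up weekly counts for 'mexico', then a spike week
past = [10, 12, 9, 11, 10]
spike = lift_over_baseline(past, 30)
```

this only says the term is running at three times its usual level. it says nothing about why, so you would still want the same ratio for 'world cup' and 'immigration' in the same articles.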

hm… actually, with this data set, you could probably look at news conglomeration::variety of media sources, if the answers to above were clear… ooh, shiny.

data makes me do the happy dance

July 21, 2006
datamining has an interactive map of the blogosphere. the map layout is a “variant of the force layout approach to graph layout. There certainly is meaning to the location of nodes in the image: proximity indicates a tendency for mutual citation.” meaning: the map is more than just a pretty face. the placement of nodes has actual social meaning.

but this is even more sexy, as a suggestion:

Time stability is an interesting problem. One way to do this is to fix nodes in location (or certain nodes). Alternatively, you could allow nodes to become more lethargic in movement according to how long they have been there. This seems like a good idea. Are you going for some form of animated representation?
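since my programming computer is elsewhere, here is only a sketch of the 'lethargic nodes' idea: one step of a force layout where each node's movement is scaled down by its age. the forces are fruchterman-reingold style, which is my choice and not necessarily what datamining uses, and the 1 / (1 + age) damping is an assumption of mine too.

```python
import math


def layout_step(pos, edges, age, step=0.1, k=1.0):
    """One iteration of a simple force layout with age-based damping.

    pos:   {node: (x, y)}
    edges: list of (node, node) pairs
    age:   {node: number of steps the node has existed}
    """
    force = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)

    # every pair of nodes repels, more strongly when close
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            rep = k * k / d
            force[a][0] += rep * dx / d
            force[a][1] += rep * dy / d
            force[b][0] -= rep * dx / d
            force[b][1] -= rep * dy / d

    # linked nodes attract, more strongly when far apart
    for a, b in edges:
        dx = pos[a][0] - pos[b][0]
        dy = pos[a][1] - pos[b][1]
        d = math.hypot(dx, dy) or 1e-9
        att = d * d / k
        force[a][0] -= att * dx / d
        force[a][1] -= att * dy / d
        force[b][0] += att * dx / d
        force[b][1] += att * dy / d

    # older nodes are more lethargic: same force, smaller move
    new_pos = {}
    for n in pos:
        damp = 1.0 / (1.0 + age.get(n, 0))
        new_pos[n] = (
            pos[n][0] + step * damp * force[n][0],
            pos[n][1] + step * damp * force[n][1],
        )
    return new_pos
```

fixing a node in place is the limit of the same idea: a very large age makes the damping factor close to zero.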

dangit, where is my programming computer when i need it!

[update 1]: ok, i heart datamining. this visualization method is pretty darn inspiring, and pretty straightforward to understand (compared to other methods i’ve read)

we start by giving some amount of money to some user (initiator) in LJ network telling him to evenly distribute it among his friends, then his friends are performing the same action among their friends and so on. Obviously, if these guys are the members of some clique it will not take too long until all of them have an equal amount of money (thanks to small-world property), meanwhile only some small part of the initial amount will leave this community. So the amount of money of a particular user defines his thermodynamic distance from the initiator. If we have two initiators – we can plot the figure like the one shown here.
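the money passing above can be sketched in a few lines. this is my reading of the description, not the author's code: each round, every user gives away everything they hold, split evenly among their friends. the friends dict is hypothetical, and i assume every user appears as a key in it.

```python
def spread_money(friends, initiator, rounds, amount=1.0):
    """Pass money around a friend network for a number of rounds.

    friends: {user: list of that user's friends}
    Returns {user: money held after the last round}.
    """
    money = {u: 0.0 for u in friends}
    money[initiator] = amount
    for _ in range(rounds):
        nxt = {u: 0.0 for u in friends}
        for user, cash in money.items():
            if not friends[user]:
                nxt[user] += cash  # nobody to give to, so keep it
                continue
            share = cash / len(friends[user])
            for f in friends[user]:
                nxt[f] += share
        money = nxt
    return money


# made-up network: a, b, c form a clique; d hangs off c
friends = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
after_one = spread_money(friends, "a", rounds=1)
```

the money a user ends up holding is what the quote calls the thermodynamic distance from the initiator. running it once per initiator gives two numbers per user, which is presumably what gets plotted as the two axes.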

expect more updates as i read through the whole archives this weekend.

[update 2]: don’t run too far through the links. i accidentally made it to ‘linked’, a book that makes me angry. hulk angry