Re: Clustering [message #9839 is a reply to message #9820]
Sun, 29 May 2016 17:53
Reduced-For(u)m
Messages: 292 Registered: March 2013
Senior Member
I think the general rule of thumb here is to cluster at whichever level is larger: the level of your aggregate variable (city) or the level you need in order to account for the sampling design (PSU). It sounds like your "city" variable can include multiple PSUs, in which case I would suggest clustering at the city level. You do not need to cluster up to the level of the strata.
I don't know what you mean by "account for the impact of two clusters". Obviously, you need many clusters in order to "cluster" your standard error estimates ("many" meaning something like more than 40, and at the very least more than 15 or 20). Do you mean: what if a city includes more than one "cluster"? In that case, the answer is above: you want to cluster at the city level.
This way of thinking about it might help: if everyone in a city has the same value for your variable of interest, are you getting N = # of observations worth of information, or C = # of cities worth of information? Probably we want to think "something in between", but it is probably closer to C than N... you get much less information from additional observations that all share the same value of the right-hand-side variable than you would from a whole new city. Additional observations within a city (that already has many observations) give you a little bit more information, but not a full new observation's worth. At the far extreme, you could imagine collapsing everything down to the city level and running your regression on those C observations... each new piece of information within a city would just make each of those C observations slightly less variable (a slightly better estimate of the outcome for that city).
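If it helps to see the "closer to C than N" point in numbers, here is a rough sketch in plain Python (standard library only). Everything in it is made up for illustration: the data are simulated, the sizes (40 cities, 50 observations each), the city-level shock, and the helper names `ols`, `naive_se` and `cluster_se` are my own assumptions, and the cluster-robust formula is the basic sandwich estimator with no small-sample correction. It is not your survey setup, just a toy version of the argument.

```python
import math
import random


def ols(x, y):
    """Fit y = a + b*x; return slope, residuals, centered x, and Sxx."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    xt = [xi - xbar for xi in x]
    sxx = sum(v * v for v in xt)
    b = sum(v * (yi - ybar) for v, yi in zip(xt, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    return b, resid, xt, sxx


def naive_se(resid, sxx):
    """Conventional slope SE that treats every observation as independent."""
    s2 = sum(e * e for e in resid) / (len(resid) - 2)
    return math.sqrt(s2 / sxx)


def cluster_se(resid, xt, sxx, groups):
    """Cluster-robust slope SE (basic sandwich, no small-sample correction)."""
    scores = {}
    for g, v, e in zip(groups, xt, resid):
        scores[g] = scores.get(g, 0.0) + v * e  # sum of x*e within each cluster
    return math.sqrt(sum(s * s for s in scores.values())) / sxx


# Simulated data (assumed): x varies only across cities, and each city
# also has its own unobserved shock shared by everyone in it.
rng = random.Random(0)
C, m = 40, 50
x, y, city = [], [], []
for c in range(C):
    x_c = rng.gauss(0, 1)   # city-level regressor
    u_c = rng.gauss(0, 1)   # city-level shock
    for _ in range(m):
        x.append(x_c)
        y.append(1.0 + 0.5 * x_c + u_c + rng.gauss(0, 1))
        city.append(c)

slope, resid, xt, sxx = ols(x, y)
se_naive = naive_se(resid, sxx)
se_cluster = cluster_se(resid, xt, sxx, city)

# Far extreme: collapse to one observation per city (city mean of y).
x_city = [x[c * m] for c in range(C)]
y_city = [sum(y[c * m:(c + 1) * m]) / m for c in range(C)]
slope_c, resid_c, _, sxx_c = ols(x_city, y_city)
se_collapsed = naive_se(resid_c, sxx_c)

print("slope (pooled)      :", slope)
print("SE, unclustered     :", se_naive)
print("SE, clustered (city):", se_cluster)
print("slope (collapsed)   :", slope_c)
print("SE, collapsed       :", se_collapsed)
```

With equal-sized cities the collapsed regression reproduces the pooled slope, and its standard error should land near the city-clustered one rather than the unclustered one, which acts as if all N observations were independent. With unequal city sizes the collapsed and pooled slopes would no longer match exactly, so treat this only as intuition.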